ELT MongoDB Data Using Airbyte
Airbyte is an open-source data integration platform that provides a quick and easy way to ELT (Extract, Load, and Transform) your data between a plethora of data sources. Airbyte can also be used as part of a workflow orchestration solution like Apache Airflow to address data movement. In this post, we will install Airbyte and replicate the sample database, “sample_restaurants,” found in MongoDB Atlas, out to a CSV file.
Airbyte is available as a cloud service or can be installed self-hosted using Docker containers. In this post, we will deploy Airbyte locally using Docker.
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker-compose up
When the containers are ready, you will see the Airbyte logo printed in the compose logs as follows:
Navigate to http://localhost:8000 to launch the Airbyte portal. Note that the default username is “admin” and the password is “password.”
To create a source connector, click on the Sources menu item on the left side of the portal and then the “Connect to your first source” button. This will launch the New Source page as follows:
Type “mongodb” and select “MongoDb.”
The MongoDB Connector can be used with both self-hosted and MongoDB Atlas clusters.
Select the appropriate MongoDB instance type and fill out the rest of the configuration information. In this post, we will be using MongoDB Atlas and have set our configuration as follows:
| Setting | Value |
| --- | --- |
| MongoDB Instance Type | MongoDB Atlas |
| Cluster URL | demo.ikyil.mongodb.net |
| Database Name | sample_restaurants |
| Username | ab_user |
| Password | ********** |
| Authentication Source | admin |
Note: If you’re using MongoDB Atlas, be sure to create a database user and allow network access from the host running Airbyte. By default, MongoDB Atlas does not allow remote connections.
Click “Set up source” and Airbyte will test the connection. If it’s successful, you’ll be sent to the Add destination page. Click the “Add destination” button and select “Local CSV” from the drop-down.
Next, provide a destination name, “restaurant-samples,” and destination path, “/local.” The Airbyte portal provides a setup guide for the Local CSV connector on the right side of the page. This is useful for a quick reference on connector configuration.
Click “Set up destination” and Airbyte will test the connection with the destination. Upon success, you’ll be redirected to a page where you can define the details of the stream you’d like to sync.
Airbyte provides a variety of sync options, including full refresh and incremental.
Select “Full Refresh | Overwrite” and then click “Set up sync.”
Airbyte will kick off the sync process and if successful, you’ll see the Sync Succeeded message.
Let’s take a look at the CSV files created. The CSV connector writes to the /local Docker mount on the Airbyte server container. By default, this mount is mapped to /tmp/airbyte_local on the host and can be changed by defining the LOCAL_ROOT Docker environment variable.
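If you’d prefer the files to land somewhere else, you can point the mount at a different directory before starting Airbyte. A minimal sketch, assuming LOCAL_ROOT is picked up from the shell environment by Airbyte’s docker-compose configuration (the path below is an example, not a required location):

```shell
export LOCAL_ROOT=/tmp/my_airbyte_data  # example path; any writable directory works
mkdir -p "$LOCAL_ROOT"                  # create the directory before mounting it
# docker-compose up -d                  # restart Airbyte so the new mount takes effect
```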
To view the CSV files, launch bash in the server container with the docker exec command as follows:
docker exec -it airbyte-server bash
Once connected, navigate to the /local folder and view the CSV files:
bash-4.2# cd /tmp/airbyte_local/
bash-4.2# ls
_airbyte_raw_neighborhoods.csv _airbyte_raw_restaurants.csv
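It can be handy to inspect a row of this output from the command line. The sketch below assumes Airbyte’s raw CSV layout (columns `_airbyte_ab_id`, `_airbyte_emitted_at`, and `_airbyte_data`, where `_airbyte_data` holds the original MongoDB document as JSON); the sample row is fabricated for illustration rather than copied from a real sync:

```shell
# Fabricated sample row mimicking Airbyte's raw CSV layout (assumption:
# columns _airbyte_ab_id, _airbyte_emitted_at, _airbyte_data).
cat > /tmp/sample_raw.csv <<'EOF'
_airbyte_ab_id,_airbyte_emitted_at,_airbyte_data
example-id-1,1638316800000,"{""name"": ""Morris Park Bake Shop"", ""borough"": ""Bronx""}"
EOF

# Pull out the JSON payload; Python's csv module handles the quoted commas.
python3 -c '
import csv, json
with open("/tmp/sample_raw.csv") as f:
    row = next(csv.DictReader(f))
doc = json.loads(row["_airbyte_data"])
print(doc["name"])   # prints: Morris Park Bake Shop
'
```

Because the document lands as a single JSON column, downstream tools can parse it with any JSON library rather than depending on a fixed set of CSV columns per collection.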
In today’s data-rich world, building data pipelines to collect and transform heterogeneous data is an essential part of many business processes. Whether the goal is deriving business insights through analytics or creating a single view of the customer, Airbyte makes it easy to move data between MongoDB and many other data sources.