AWS Glue Visual ETL for Your Data in MongoDB Atlas

Venkatesh Shanbhag, Igor Alekseev, Anuj Panchal3 min read • Published Nov 22, 2024 • Updated Nov 22, 2024

Atlas

Rate this tutorial

AWS Glue is a serverless data integration service. It simplifies data processing for customers. Migrating data between MongoDB Atlas and AWS becomes efficient with AWS Glue's visual ETL (Extract, Transform, and Load) capabilities. The visual interface and built-in features allow data extraction and insertion to and from MongoDB Atlas collections, with optional transformations, before loading it to the target location. This serverless and scalable approach ensures cost-effective data movement while maintaining security through IAM roles. In this post, we will go through a visual approach to utilize MongoDB Atlas and AWS Glue for building pipelines between the two platforms.

To follow this post and test out AWS Glue’s capabilities with MongoDB Atlas, we need an AWS account and a subscription to MongoDB Atlas. You can subscribe to MongoDB Atlas from the AWS marketplace.

The post describes a process for transferring data between MongoDB Atlas and AWS S3 using AWS Glue's visual ETL capabilities. These capabilities allow developers to create ETL pipelines without the knowledge of Spark or SQL by leveraging AWS Glue Studio. This post highlights the benefits of using AWS Glue for data transformation and integration with other AWS services. AWS S3—being highly scalable, durable, and cost-effective object storage—can be used as data lakes, a data warehousing solution, machine learning, media streaming, backup and recovery, and web hosting.

Set up MongoDB Atlas

Configure a MongoDB cluster on AWS. For instructions, refer to How to Set Up a MongoDB Cluster.
Configure PrivateLink by following the steps described in Connecting Applications Securely to a MongoDB Atlas Data Plane with AWS PrivateLink. With AWS PrivateLink, we will simplify our networking architecture and make sure the traffic stays on the AWS network.
To obtain your MongoDB cluster connection string from the Connect UI on the MongoDB Atlas console, navigate to your Atlas home screen and click on Connect for the AWS cluster you want to connect. Select the Private Endpoint and Connection Method.
Copy the SRV connection string. We use this SRV connection string in the subsequent steps.

The following screenshot shows that we have loaded a sample collection (in this case, sample weather data) in MongoDB Atlas, which we will connect to in the next steps. Note: The records in this collection include several arrays as well as nested data.

Set up the MongoDB Atlas connection with AWS Glue

Before we can configure the AWS Glue crawler, we need to create the MongoDB Atlas connection in AWS Glue.

On the AWS Glue Studio console, choose Connectors in the navigation pane.
Choose Create connection.

When filling out the connection details, use the SRV connection string we obtained earlier in MongoDB Atlas.
In the Network options section, add the VPC and subnet. Important: The VPC and the subnet must correspond to the PrivateLink settings you configured earlier.

Create and run an ETL job to extract data from MongoDB Atlas into S3

Once the connection is configured, Navigate to the Glue | ETL jobs | Visual ETL to create the ETL job.

To open the Glue visual editor, either click on Visual ETL or Author and edit ETL jobs, and then click on visual ETL on the next screen.

If you have saved jobs, you can access them from the AWS Glue studio, as well.

Click on (+) on the home screen of the Glue visual editor to add nodes. Search for MongoDB and select MongoDB as the source to read from the previously created connection.

Select the MongoDB connection and provide the database and collection name. Save the pipeline.

Click on (+) to add one more node and search for S3. Select Amazon S3 as Target. Select MongoDB as Node parent.

Select the data format for your destination data file. Select the S3 bucket where you want to write the data and click on Save.

Run the job by clicking the button in the top right corner.

The job will take a few minutes to complete. You can monitor your job on the Runs tab, as shown below.

You can verify your data written to S3 by navigating to your S3 bucket.

Watch this Video to see the steps in action

AWS Glue Visual ETL simplifies creating and managing data transformations, allowing developers to create ETL pipelines without the specialized knowledge of data engineering tools. It offers connectors to numerous third-party and AWS-native products and services. This enables you to enrich data from various sources for analysis in data warehousing, while building efficient pipelines effortlessly with Glue Visual ETL. For advanced data transformations involving MongoDB Atlas, refer to the AWS Glue documentation.

Questions? Join us in the MongoDB Developer Community.

Top Comments in Forums

There are no comments on this article yet.

Start the Conversation

Rate this tutorial

Tutorial

How to Deploy MongoDB Atlas with AWS CDK in TypeScript

Jan 23, 2024 | 5 min read

Tutorial

Building a Knowledge Base and Visualization Graphs for RAG With MongoDB

Sep 02, 2024 | 12 min read

Tutorial

Getting Started with MongoDB Atlas and Azure Functions using Node.js

Jan 13, 2025 | 8 min read

Tutorial

Getting Started With Deno 2.0 & MongoDB

Oct 22, 2024 | 13 min read

Set up MongoDB Atlas
Set up the MongoDB Atlas connection with AWS Glue
Create and run an ETL job to extract data from MongoDB Atlas into S3

Atlas

AWS Glue Visual ETL for Your Data in MongoDB Atlas

Set up MongoDB Atlas

Set up the MongoDB Atlas connection with AWS Glue

Top Comments in Forums

Related

How to Deploy MongoDB Atlas with AWS CDK in TypeScript

Building a Knowledge Base and Visualization Graphs for RAG With MongoDB

Getting Started with MongoDB Atlas and Azure Functions using Node.js

Getting Started With Deno 2.0 & MongoDB

Table of Contents