How to Query from Multiple MongoDB Databases Using MongoDB Atlas Data Federation

Joe Karlsson • 7 min read • Published Feb 07, 2022 • Updated Jan 23, 2024

AWS Atlas Data Federation

Have you ever needed to run queries across databases, clusters, or data centers, or even combine them with data stored in an AWS S3 bucket? You probably haven't had to do all of these at once, but I'm guessing you've needed to do at least one of them at some point in your career. I'll also bet that you didn't know that this is possible (and easy) to do with MongoDB Atlas Data Federation! It allows you to configure multiple remote MongoDB deployments and run federated queries across all of them.

MongoDB Atlas Data Federation allows you to query many MongoDB systems at once, including clusters, databases, and even AWS S3 buckets. Here's how it works in practice.

Note: In this post, we will be demoing how to query from two separate databases. However, if you want to query data from two separate collections that are in the same database, I would personally recommend that you use the $lookup aggregation pipeline stage instead. $lookup performs a left outer join to an unsharded collection in the same database, pulling matching documents from the "joined" collection into each input document for processing. In this scenario, using a federated database instance is not necessary.
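To make the left outer join idea concrete, here is a minimal sketch. The pipeline uses the movies and comments collections from the sample_mflix sample dataset (comments reference a movie through movie_id). The leftOuterJoin helper and the hand-made documents are hypothetical and only imitate the join semantics in plain JavaScript; they are not how the server implements $lookup.

```javascript
// A $lookup stage you could pass to db.movies.aggregate(...) in sample_mflix.
const lookupPipeline = [
  {
    $lookup: {
      from: 'comments',         // collection to join with
      localField: '_id',        // field on the movies documents
      foreignField: 'movie_id', // field on the comments documents
      as: 'comments',           // name of the output array field
    },
  },
];

// Hypothetical helper: a plain-JS imitation of the left outer join semantics.
// Every left document is kept, even when nothing on the right matches.
function leftOuterJoin(left, right, localField, foreignField, as) {
  return left.map((doc) => ({
    ...doc,
    [as]: right.filter((other) => other[foreignField] === doc[localField]),
  }));
}

// Made-up documents, just for illustration.
const movies = [
  { _id: 1, title: 'Movie A' },
  { _id: 2, title: 'Movie B' },
];
const comments = [
  { movie_id: 1, text: 'Great' },
  { movie_id: 1, text: 'Loved it' },
];

const joined = leftOuterJoin(movies, comments, '_id', 'movie_id', 'comments');
console.log(JSON.stringify(joined, null, 2));
```

Movie B has no comments, but it still appears in the result with an empty comments array. That is the "left outer" part of the join.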

tl;dr: In this post, I will guide you through the process of creating and connecting to a virtual database in MongoDB Atlas, configuring paths to collections in two separate MongoDB databases stored in separate datacenters, and querying data from both databases using only a single query.

Prerequisites

To follow along with this tutorial, you need to:

Create at least two M10 clusters in MongoDB Atlas. For this demo, I have created two clusters deployed to separate cloud providers (AWS and GCP). Click here for information on setting up a new MongoDB Atlas cluster.
✅ Already have an AWS account? Atlas supports paying for usage via the AWS Marketplace (AWS MP) without any upfront commitment; simply sign up for MongoDB Atlas via AWS Marketplace.
Ensure that each cluster has been seeded by loading the sample data into it.
Have the MongoDB Shell installed.

Deploy a Federated Database Instance

First, make sure you are logged into MongoDB Atlas. Next, select the Data Federation option on the left-hand navigation.

Create a Virtual Database

Click "set up manually" in the "create new federated database" dropdown in the top right corner of the UI.

Click Add Data Source on the Data Federation Configuration page, and select MongoDB Atlas Cluster. Select your first cluster, then input sample_mflix as the database and theaters as the collection. Repeat these steps for your second cluster, this time with sample_restaurants as the database and restaurants as the collection. For this tutorial, we will be analyzing restaurant data and some movie theater sample data to determine the number of theaters and restaurants in each zip code.

Next, drag these new data stores into your federated database instance and click Save. Both data sources should now appear under the same virtual collection.
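The UI steps above produce a storage configuration behind the scenes. The sketch below shows roughly what that mapping looks like, written as a JavaScript object. Treat it as an illustration under assumptions: the store names, cluster names (Cluster0, Cluster1), and project ID placeholder are made up, and the field names reflect my understanding of the Data Federation storage configuration format, so check the official documentation before relying on them. The findUnknownStores helper is hypothetical and only sanity-checks the object.

```javascript
// Assumed shape of a federated database storage configuration.
// All names and the project ID placeholder are hypothetical.
const storageConfig = {
  stores: [
    { name: 'awsClusterStore', provider: 'atlas', clusterName: 'Cluster0', projectId: '<your-project-id>' },
    { name: 'gcpClusterStore', provider: 'atlas', clusterName: 'Cluster1', projectId: '<your-project-id>' },
  ],
  databases: [
    {
      name: 'VirtualDatabase0',
      collections: [
        {
          name: 'VirtualCollection0',
          // Two data sources feed one virtual collection.
          dataSources: [
            { storeName: 'awsClusterStore', database: 'sample_mflix', collection: 'theaters' },
            { storeName: 'gcpClusterStore', database: 'sample_restaurants', collection: 'restaurants' },
          ],
        },
      ],
    },
  ],
};

// Hypothetical helper: list data sources that point at a store that is not defined.
function findUnknownStores(config) {
  const known = new Set(config.stores.map((store) => store.name));
  const unknown = [];
  for (const database of config.databases) {
    for (const collection of database.collections) {
      for (const source of collection.dataSources) {
        if (!known.has(source.storeName)) unknown.push(source.storeName);
      }
    }
  }
  return unknown;
}

console.log(findUnknownStores(storageConfig)); // prints []
```

The key point is that both data sources sit under a single virtual collection, which is why one query against VirtualCollection0 can see documents from both clusters.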

Connect to Your Federated Database Instance

Now that the federated database instance is set up, we need to connect to it so we can start running queries on all of our data. First, click Connect in the first box on the Data Federation overview page.

Click Add Your Current IP Address. Enter your IP address and an optional description, then click Add IP Address. In the Create a MongoDB User step of the dialog, enter a Username and a Password for your database user. (Note: You'll use this username and password combination to access data on your cluster.)

Run Queries Against Your Virtual Database

You can run your queries any way you feel comfortable. You can use MongoDB Compass, the MongoDB Shell, an application connected through a driver, or anything else you see fit. For this demo, I'm going to be running my queries using the MongoDB Visual Studio Code plugin and leveraging its Playgrounds feature. For more information on using this plugin, check out this post on our Developer Hub.

Make sure you are using the connection string for your federated database instance and not for your individual MongoDB databases. To get the connection string for your new federated database instance, click the connect button on the MongoDB Atlas Data Federation overview page. Then click on Connect using MongoDB Compass. Copy this connection string to your clipboard. Note: You will need to add the password of the user that you authorized to access your virtual database here.

You're going to paste this connection string into the MongoDB Visual Studio Code plugin when you add a new connection.
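The connection string that Atlas gives you contains a <password> placeholder. If your password has special characters such as @ or /, it needs to be percent-encoded before it goes into the URI. Here is a minimal sketch of that step. The withPassword helper, the username, and the hostname are all made up for illustration; use the exact string that Atlas shows you.

```javascript
// Hypothetical helper: fill in the <password> placeholder of a connection
// string, percent-encoding characters that are not allowed in a URI.
function withPassword(uriTemplate, password) {
  return uriTemplate.replace('<password>', encodeURIComponent(password));
}

// Made-up connection string; copy the real one from the Atlas UI.
const template =
  'mongodb://joe:<password>@federateddatabaseinstance0-abcde.a.query.mongodb.net/?ssl=true&authSource=admin';

const uri = withPassword(template, 'p@ss/word');
console.log(uri);
// prints mongodb://joe:p%40ss%2Fword@federateddatabaseinstance0-abcde.a.query.mongodb.net/?ssl=true&authSource=admin
```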

Note: If you need assistance with getting started with the MongoDB Visual Studio Code Plugin, be sure to check out my post, How To Use The MongoDB Visual Studio Code Plugin, and the official documentation.

You can run operations using the MongoDB Query Language (MQL), which includes most, but not all, standard server commands. To learn which MQL operations are supported, see the MQL Support documentation.

The following queries use the paths that you added to your Federated Database Instance during deployment.

For this query, I wanted to construct an aggregation that is only possible when both sample datasets are combined through a federated query. For this example, we will run a query to determine the number of theaters and restaurants in each zip code by analyzing the sample_restaurants.restaurants and sample_mflix.theaters datasets that we loaded into our clusters above.

I want to make it clear that these data sources are still being stored in different MongoDB databases in completely different datacenters, but by leveraging MongoDB Atlas Data Federation, we can query all of our databases at once as if all of our data is in a single collection! The following query is only possible using federated queries! How cool is that?

// MongoDB Playground

// Select the database to use. VirtualDatabase0 is the default name for a MongoDB Atlas Data Federation database. If you renamed your database, be sure to put in your virtual database name here.
use('VirtualDatabase0');

// We are connecting to `VirtualCollection0` since this is the default name that MongoDB Atlas Data Federation gives your collection. If you renamed it, be sure to put in your virtual collection name here.
db.VirtualCollection0.aggregate([

  // In the first stage of our aggregation pipeline, we normalize the documents so that only zip code data is kept.
  // Restaurants store the zip code at `address.zipcode` and theaters at `location.address.zipcode`, so $ifNull picks whichever one is present.
  {
    '$project': {
      'restaurant_zipcode': '$address.zipcode',
      'theater_zipcode': '$location.address.zipcode',
      'zipcode': {
        '$ifNull': [
          '$address.zipcode', '$location.address.zipcode'
        ]
      }
    }
  },

  // In the second stage of our aggregation, we group the data based on the zip code it resides in. We also push each restaurant's and each theater's zip code into its own array, so we can get a count of each in the next stage.
  // We are calculating the `total` number of theaters and restaurants by using the $sum accumulator on $group. This counts all the documents that share a common zip code.
  {
    '$group': {
      '_id': '$zipcode',
      'total': {
        '$sum': 1
      },
      'theaters': {
        '$push': '$theater_zipcode'
      },
      'restaurants': {
        '$push': '$restaurant_zipcode'
      }
    }
  },

  // In the third stage, we get the size or length of the `theaters` and `restaurants` arrays from the previous stage. This gives us our totals for each category.
  {
    '$project': {
      'zipcode': '$_id',
      'total': '$total',
      'total_theaters': {
        '$size': '$theaters'
      },
      'total_restaurants': {
        '$size': '$restaurants'
      }
    }
  },

  // In our final stage, we sort our data in descending order so that the zip codes with the most restaurants and theaters are listed at the top.
  {
    '$sort': {
      'total': -1
    }
  }
])

This outputs the zip codes with the most theaters and restaurants.

[
  {
    "_id": "10003",
    "zipcode": "10003",
    "total": 688,
    "total_theaters": 2,
    "total_restaurants": 686
  },
  {
    "_id": "10019",
    "zipcode": "10019",
    "total": 676,
    "total_theaters": 1,
    "total_restaurants": 675
  },
  {
    "_id": "10036",
    "zipcode": "10036",
    "total": 611,
    "total_theaters": 0,
    "total_restaurants": 611
  },
  {
    "_id": "10012",
    "zipcode": "10012",
    "total": 408,
    "total_theaters": 1,
    "total_restaurants": 407
  },
  {
    "_id": "11354",
    "zipcode": "11354",
    "total": 379,
    "total_theaters": 1,
    "total_restaurants": 378
  },
  {
    "_id": "10017",
    "zipcode": "10017",
    "total": 378,
    "total_theaters": 1,
    "total_restaurants": 377
  }
]
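If you want to convince yourself of what the pipeline computes without connecting to Atlas, here is a rough plain-JavaScript imitation of its stages. This is only a sketch: the countByZip function and the sample documents are made up, and the code mimics the pipeline's logic rather than running anything on the server.

```javascript
// Hand-made documents shaped like the two sample collections (values are made up).
const docs = [
  { name: 'Cafe A', address: { zipcode: '10003' } },              // restaurant shape
  { name: 'Diner B', address: { zipcode: '10003' } },             // restaurant shape
  { theaterId: 1, location: { address: { zipcode: '10003' } } },  // theater shape
  { name: 'Grill C', address: { zipcode: '10019' } },             // restaurant shape
];

// Hypothetical helper that imitates the $project, $group, $project, and $sort stages.
function countByZip(documents) {
  const groups = new Map();
  for (const doc of documents) {
    // Stage 1: pull the zip code from whichever path the document has.
    const restaurantZip = doc.address?.zipcode;
    const theaterZip = doc.location?.address?.zipcode;
    const zipcode = restaurantZip ?? theaterZip ?? null;

    // Stages 2 and 3: count documents per zip code and per category.
    const group = groups.get(zipcode) ??
      { zipcode, total: 0, total_theaters: 0, total_restaurants: 0 };
    group.total += 1;
    if (theaterZip !== undefined) group.total_theaters += 1;
    if (restaurantZip !== undefined) group.total_restaurants += 1;
    groups.set(zipcode, group);
  }
  // Stage 4: zip codes with the most documents first.
  return [...groups.values()].sort((a, b) => b.total - a.total);
}

const result = countByZip(docs);
console.log(result);
```

For the four documents above, zip code 10003 comes first with a total of 3 (one theater and two restaurants), followed by 10019 with a single restaurant. In both the sketch and the real output, total equals total_theaters plus total_restaurants for every zip code.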

Wrap-Up

Congratulations! You just set up a Federated Database Instance that contains databases running in different cloud providers. Then, you queried both databases using the MongoDB aggregation pipeline by leveraging Atlas Data Federation and federated queries. This makes it easier to run queries on data that is stored in multiple MongoDB database deployments across clusters, data centers, and even in different formats, including S3 blob storage.

Screenshot from the MongoDB Atlas Data Federation overview page showing the information for our new Virtual Database.

If you have questions, please head to our developer community website where the MongoDB engineers and the MongoDB community will help you build your next big idea with MongoDB.
