Deploy a Federated Database Instance
This page describes how to deploy a federated database instance for accessing data in your AWS S3 buckets.
Required Access
To deploy a federated database instance, you must have Project Owner
access to the project.
Users with Organization Owner
access must add themselves as a Project Owner
to the project before deploying a federated database instance.
Prerequisites
Before you begin, you will need to:
Create a MongoDB Atlas account, if you do not have one already.
Configure the AWS CLI to access your AWS account. Alternatively, have access to the AWS Management Console with permission to create IAM roles.
Optional. Set Up Unified AWS Access.
Procedure
To create a new Data Federation database using the Atlas CLI, run the following command:
atlas dataFederation create <name> [options]
To learn more about the command syntax and parameters, see the Atlas CLI documentation for atlas dataFederation create.
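For example, a hedged invocation might look like the following; the instance name is hypothetical, and the flags shown are assumptions based on the Atlas CLI's common options for this command:

```shell
# Create a federated database instance named "myFederatedInstance"
# in the given project (IDs and role/bucket values are placeholders).
atlas dataFederation create myFederatedInstance \
  --projectId <project-id> \
  --awsRoleId <aws-iam-role-id> \
  --awsTestS3Bucket <bucket-name>
```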
Select the cloud provider where Atlas Data Federation will process your queries against your federated database instance.
You can select AWS, Azure, or Google Cloud. Once your federated database instance is created, you can't change the cloud provider where Atlas Data Federation processes your queries.
If you are configuring a federated database instance for data in an AWS S3 bucket, you can't choose a cloud provider that is different from the cloud provider that is hosting your data. That is, you must choose AWS.
Specify your AWS S3 data store and configure federated databases and virtual collections that map to your data store.
Select the dataset for your federated database instance from the Data Sources section.
Click Add Data Sources to select your data store.
Specify your data store.
Choose Amazon S3 to configure a federated database instance for data in AWS S3 buckets.
Corresponds to the stores.[n].provider JSON configuration setting.
Select an AWS IAM role for Atlas.
You can select an existing AWS IAM role that Atlas is authorized for from the role selection dropdown list or choose Authorize an AWS IAM Role to authorize a new role.
If you selected an existing role that Atlas is authorized for, proceed to the next step to list your AWS S3 buckets.
If you are authorizing Atlas for an existing role or are creating a new role, complete the following steps before proceeding to the next step:
Select Authorize an AWS IAM Role to authorize a new role or select an existing role from the dropdown and click Next.
Use the AWS ARN and unique External ID in the Add Atlas to the trust relationships of your AWS IAM role section to add Atlas to the trust relationships of an existing or new AWS IAM role.
In the Atlas UI, click and expand one of the following:
The Create New Role with the AWS CLI section shows how to use the ARN and the unique External ID to add Atlas to the trust relationships of a new AWS IAM role. Follow the steps in the Atlas UI to create a new role. To learn more, see Create New Role with the AWS CLI.
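As a sketch of what that trust relationship looks like, the trust policy document you attach to the role typically has the following shape; the principal ARN and external ID are placeholders for the values Atlas displays in the UI:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<atlas AWS account ARN>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<unique external ID>"
        }
      }
    }
  ]
}
```

The sts:ExternalId condition ensures that only requests originating from your Atlas project can assume the role.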
When authorizing a new role, if you quit the Configure a New Data Lake workflow:
Before validating the role, Atlas doesn't create the federated database instance. You can go to the Atlas Integrations page to authorize a new role and resume the workflow when you have the AWS IAM role ARN.
After validating the role, Atlas doesn't create the federated database instance. However, the role is available in the role selection dropdown and can be used to create a federated database instance. You don't need to authorize the role again.
The Add Trust Relationships to an Existing Role section shows how to use the ARN and the unique External ID to add Atlas to the trust relationships of an existing AWS IAM role. Follow the steps in the Atlas UI to add Atlas to the trust relationship of an existing role. To learn more, see Add Trust Relationships to an Existing Role.
Important
If you modify your custom AWS role ARN in the future, ensure that the access policy of the role includes the appropriate access to the S3 resources for the federated database instance.
Click Next.
Enter the S3 bucket information.
Enter the name of your S3 bucket.
Corresponds to the stores.[n].bucket JSON configuration setting.
Specify whether the bucket is Read-only or both Read and write.
Atlas can only query Read-only buckets; if you wish to query and save query results to your S3 bucket, choose Read and write. To save query results to your S3 bucket, the role policy that grants Atlas access to your AWS resources must include the s3:PutObject and s3:DeleteObject permissions in addition to the s3:ListBucket, s3:GetObject, s3:GetObjectVersion, and s3:GetBucketLocation permissions, which grant read access. See step 4 below to learn more about assigning an access policy to your AWS IAM role.
Select the region of the S3 bucket.
Corresponds to the stores.[n].region JSON configuration setting.
Note
You can't create a federated database instance if Atlas Data Federation is unable to retrieve the region of the specified S3 bucket.
Optional. Specify a prefix that Data Federation should use when searching the files in the S3 bucket. If omitted, Data Federation does a recursive search for all files from the root of the S3 bucket.
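For example, assuming a hypothetical bucket named sales-data, a prefix narrows the search scope to a subtree of the bucket:

```json
"stores" : [
  {
    "name" : "salesStore",
    "provider" : "s3",
    "region" : "us-east-1",
    "bucket" : "sales-data",
    "prefix" : "invoices/2023/"
  }
]
```

With this setting, Data Federation only searches files under s3://sales-data/invoices/2023/ instead of the entire bucket.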
Corresponds to the stores.[n].prefix JSON configuration setting.
Click Next.
Assign an access policy to your AWS IAM role.
Follow the steps in the Atlas user interface to assign an access policy to your AWS IAM role.
Your role policy for read-only or read and write access should look similar to the following:
Read-only:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket-name>",
        "arn:aws:s3:::<bucket-name>/*"
      ]
    }
  ]
}
Read and write:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:GetBucketLocation",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket-name>",
        "arn:aws:s3:::<bucket-name>/*"
      ]
    }
  ]
}
Click Next.
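If you manage the role with the AWS CLI instead of the console, one way to attach such a policy is with aws iam put-role-policy; the role name, policy name, and file name below are hypothetical:

```shell
# Attach the access policy saved in adl-s3-policy.json
# as an inline policy on the IAM role that Atlas assumes.
aws iam put-role-policy \
  --role-name atlas-data-federation-role \
  --policy-name atlas-s3-access \
  --policy-document file://adl-s3-policy.json
```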
Define the path structure for your files in the S3 bucket and click Next.
For example:
s3://<bucket-name>/<path>/<to>/<files>/<filename>.<file-extension>
To add additional paths to data on your S3 bucket, click Add Data Source and enter the path. To learn more about paths, see Define Path for S3 Data.
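Beyond a literal path, paths can include wildcards and partition attributes; as an illustration (bucket and field names are hypothetical), a path such as the following maps path segments to queryable fields:

```
s3://sales-data/invoices/{year int}/{month int}/*
```

Here, the first two segments after invoices/ surface as year and month fields in query results, and * matches any file name.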
Corresponds to the databases.[n].collections.[n].dataSources.[n].path JSON configuration setting.
Create the virtual databases, collections, and views, and map them to your data store.
(Optional) Click the edit icon for the:
Database to edit the database name. Defaults to VirtualDatabase[n]. Corresponds to the databases.[n].name JSON configuration setting.
Collection to edit the collection name. Defaults to VirtualCollection[n]. Corresponds to the databases.[n].collections.[n].name JSON configuration setting.
View to edit the view name.
You can click:
Add Database to add databases and collections.
The add icon associated with the database to add collections to the database.
The add icon associated with the collection to add views on the collection. To create a view, you must specify:
The name of the view.
The pipeline to apply to the view.
The view definition pipeline cannot include the $out or the $merge stage. If the view definition includes nested pipeline stages such as $lookup or $facet, this restriction applies to those nested pipelines as well.
To learn more about views, see the MongoDB documentation on views.
The remove icon associated with the database, collection, or view to remove it.
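For instance, a view in the storage configuration might resemble the following, where the pipeline is a stringified aggregation and all names are hypothetical; note that it uses only allowed stages ($group here, no $out or $merge):

```json
"views" : [
  {
    "name" : "totalsByMonth",
    "source" : "invoices",
    "pipeline" : "[{\"$group\": {\"_id\": \"$month\", \"total\": {\"$sum\": \"$amount\"}}}]"
  }
]
```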
Select AWS S3 from the dropdown in the Data Sources section.
Drag and drop the data store to map with the collection.
Corresponds to the databases.[n].collections.[n].dataSources JSON configuration setting.
Your configuration for AWS S3 data store should look similar to the following:
{
  "stores" : [
    {
      "name" : "<string>",
      "provider" : "<string>",
      "region" : "<string>",
      "bucket" : "<string>",
      "additionalStorageClasses" : ["<string>"],
      "prefix" : "<string>",
      "includeTags" : <boolean>,
      "delimiter" : "<string>",
      "public" : <boolean>
    }
  ],
  "databases" : [
    {
      "name" : "<string>",
      "collections" : [
        {
          "name" : "<string>",
          "dataSources" : [
            {
              "storeName" : "<string>",
              "path" : "<string>",
              "defaultFormat" : "<string>",
              "provenanceFieldName" : "<string>",
              "omitAttributes" : true | false
            }
          ]
        }
      ],
      "maxWildcardCollections" : <integer>,
      "views" : [
        {
          "name" : "<string>",
          "source" : "<string>",
          "pipeline" : "<string>"
        }
      ]
    }
  ]
}
For more information on the configuration settings, see Define Data Stores for a Federated Database Instance.
Define your AWS S3 data store.
Edit the JSON configuration settings shown in the UI for stores. Your stores configuration setting should resemble the following:
"stores" : [
  {
    "name" : "<string>",
    "provider" : "<string>",
    "region" : "<string>",
    "bucket" : "<string>",
    "additionalStorageClasses" : ["<string>"],
    "prefix" : "<string>",
    "delimiter" : "<string>",
    "includeTags" : <boolean>,
    "public" : <boolean>
  }
]
To learn more about these configuration settings, see stores.
Define your federated database instance's virtual databases, collections, and views.
Edit the JSON configuration settings shown in the UI for databases. Your databases configuration setting should resemble the following:
"databases" : [
  {
    "name" : "<string>",
    "collections" : [
      {
        "name" : "<string>",
        "dataSources" : [
          {
            "storeName" : "<string>",
            "defaultFormat" : "<string>",
            "path" : "<string>",
            "provenanceFieldName" : "<string>",
            "omitAttributes" : <boolean>
          }
        ]
      }
    ],
    "maxWildcardCollections" : <integer>,
    "views" : [
      {
        "name" : "<string>",
        "source" : "<string>",
        "pipeline" : "<string>"
      }
    ]
  }
]
To learn more about these configuration settings, see databases.
Optional: Repeat the steps in the Visual Editor or JSON Editor tab above to define additional AWS S3 data stores.
To add other data stores for federated queries, see:
Note
You can't connect an Azure Blob Storage data store for running federated queries across cloud providers.