
Deploy a Federated Database Instance

On this page

  • Required Access
  • Prerequisites
  • Procedure

This page describes how to deploy a federated database instance for accessing data in your AWS S3 buckets.

Required Access

To deploy a federated database instance, you must have Project Owner access to the project. Users with Organization Owner access must add themselves as a Project Owner to the project before deploying a federated database instance.

Prerequisites

Before you begin, you will need to:

  • Create a MongoDB Atlas account, if you do not have one already.

  • Install the AWS CLI.

  • Configure the AWS CLI to access your AWS account (a minimal configuration example follows this list). Alternatively, you must have access to the AWS Management Console with permission to create IAM roles.

  • Optional. Set Up Unified AWS Access.
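
If you still need to set up AWS CLI credentials, the standard first-time setup uses the aws configure command. This is only a minimal sketch; supply credentials for an identity in your own AWS account:

aws configure

The command prompts for an AWS access key ID, secret access key, default region, and output format. The identity you supply should have permission to create IAM roles if you plan to create the new role from the AWS CLI later in this procedure.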

Procedure

To create a new federated database instance using the Atlas CLI, run the following command:

atlas dataFederation create <name> [options]

To learn more about the command syntax and parameters, see the Atlas CLI documentation for atlas dataFederation create.
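
For example, to create a federated database instance with the placeholder name exampleFederatedInstance, you might run:

atlas dataFederation create exampleFederatedInstance

Depending on your environment, you may need to pass additional options (for example, to identify the project or supply cloud provider details); see the linked Atlas CLI reference for the options that apply to your deployment.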

Tip

See: Related Links

1
2
3
  1. Click the Create New Federated Database dropdown.

  2. Select Manual Setup.

4

Select the cloud provider where you want Atlas Data Federation to process your queries. You can select AWS, Azure, or Google Cloud. Once your federated database instance is created, you can't change the cloud provider where Atlas Data Federation processes your queries.

If you are configuring a federated database instance for data in an AWS S3 bucket, you can't choose a cloud provider that is different from the cloud provider hosting your data. That is, you must choose AWS.

5

Enter a name for your federated database instance. Defaults to FederatedDatabaseInstance[n]. Once your federated database instance is created, you can't change its name.

6

Choose how to configure your federated database instance:

  • For a guided experience, click Visual Editor.

  • To edit the raw JSON, click JSON Editor.

7
  1. Select the dataset for your federated database instance from the Data Sources section.

    Click Add Data Sources to select your data store.

  2. Specify your data store.

    Choose Amazon S3 to configure a federated database instance for data in AWS S3 buckets.

    Corresponds to stores.[n].provider JSON configuration setting.

  3. Select an AWS IAM role for Atlas.

    You can select an existing AWS IAM role that Atlas is authorized for from the role selection dropdown list or choose Authorize an AWS IAM Role to authorize a new role.

    If you selected an existing role that Atlas is authorized for, proceed to the next step to list your AWS S3 buckets.

    If you are authorizing Atlas for an existing role or are creating a new role, complete the following steps before proceeding to the next step:

    1. Select Authorize an AWS IAM Role to authorize a new role or select an existing role from the dropdown and click Next.

    2. Use the AWS ARN and unique External ID in the Add Atlas to the trust relationships of your AWS IAM role section to add Atlas to the trust relationships of an existing or new AWS IAM role. A generic example of such a trust policy appears after these sub-steps.

      In the Atlas UI, click and expand one of the following:

      • The Create New Role with the AWS CLI section shows how to use the ARN and the unique External ID to add Atlas to the trust relationships of a new AWS IAM role. Follow the steps in the Atlas UI to create the new role. To learn more, see Create New Role with the AWS CLI.

        When authorizing a new role, if you quit the Configure a New Data Lake workflow:

        • Before validating the role, Atlas will not create the federated database instance. You can go to the Atlas Integrations page to authorize a new role. You can resume the workflow when you have the AWS IAM role ARN.

        • After validating the role, Atlas will not create the federated database instance. However, the role is available in the role selection dropdown and can be used to create a federated database instance. You do not need to authorize the role again.

      • The Add Trust Relationships to an Existing Role section shows how to use the ARN and the unique External ID to add Atlas to the trust relationships of an existing AWS IAM role. Follow the steps in the Atlas UI to add Atlas to the trust relationship of an existing role. To learn more, see Add Trust Relationships to an Existing Role.

      Important

      If you modify your custom AWS role ARN in the future, ensure that the access policy of the role includes the appropriate access to the S3 resources for the federated database instance.

    3. Click Next.

  4. Enter the S3 bucket information.

    1. Enter the name of your S3 bucket.

      Corresponds to stores.[n].bucket JSON configuration setting.

    2. Specify whether the bucket is Read-only or both Read and write.

      Atlas can only run queries against Read-only buckets. If you want to both query your data and save query results to your S3 bucket, choose Read and write. To save query results to your S3 bucket, the role policy that grants Atlas access to your AWS resources must include the s3:PutObject and s3:DeleteObject permissions in addition to the read permissions s3:ListBucket, s3:GetObject, s3:GetObjectVersion, and s3:GetBucketLocation. See the Assign an access policy to your AWS IAM role step below to learn more.

    3. Select the region of the S3 bucket.

      Corresponds to stores.[n].region JSON configuration setting.

      Note

      You can't create a federated database instance if Atlas Data Federation is unable to retrieve the region of the specified S3 bucket.

    4. Optional. Specify a prefix that Data Federation should use when searching the files in the S3 bucket. If omitted, Data Federation does a recursive search for all files from the root of the S3 bucket.

      Corresponds to stores.[n].prefix JSON configuration setting.

    5. Click Next.

  5. Assign an access policy to your AWS IAM role.

    1. Follow the steps in the Atlas user interface to assign an access policy to your AWS IAM role.

      Your role policy should look similar to one of the following, depending on whether you granted read-only or read and write access:

      Read-only access:

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "s3:ListBucket",
              "s3:GetObject",
              "s3:GetObjectVersion",
              "s3:GetBucketLocation"
            ],
            "Resource": [
              <role arn>
            ]
          }
        ]
      }

      Read and write access:

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "s3:ListBucket",
              "s3:GetObject",
              "s3:GetObjectVersion",
              "s3:GetBucketLocation",
              "s3:PutObject",
              "s3:DeleteObject"
            ],
            "Resource": [
              <role arn>
            ]
          }
        ]
      }
    2. Click Next.

  6. Define the path structure for your files in the S3 bucket and click Next.

    For example:

    s3://<bucket-name>/<path>/<to>/<files>/<filename>.<file-extension>

    To add additional paths to data on your S3 bucket, click Add Data Source and enter the path. To learn more about paths, see Define Path for S3 Data.

    Corresponds to databases.[n].collections.[n].dataSources.[n].path JSON configuration setting.

  7. Create the virtual databases, collections, and views, and map them to your data store.

    1. (Optional) Click the edit icon for the:

      • Database to edit the database name. Defaults to VirtualDatabase[n].

        Corresponds to databases.[n].name JSON configuration setting.

      • Collection to edit the collection name. Defaults to VirtualCollection[n].

        Corresponds to databases.[n].collections.[n].name JSON configuration setting.

      • View to edit the view name.

      You can click:

      • Add Database to add databases and collections.

      • the add icon associated with the database to add collections to the database.

      • the add icon associated with the collection to add views on the collection. To create a view, you must specify:

        • The name of the view.

        • The pipeline to apply to the view.

          The view definition pipeline cannot include the $out or the $merge stage. If the view definition includes nested pipeline stages such as $lookup or $facet, this restriction applies to those nested pipelines as well.

        To learn more about views, see:

      • the remove icon associated with the database, collection, or view to remove it.

    2. Select AWS S3 from the dropdown in the Data Sources section.

    3. Drag and drop the data store to map with the collection.

      Corresponds to databases.[n].collections.[n].dataSources JSON configuration setting.
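
For reference, the trust relationship you configure in the Select an AWS IAM role for Atlas sub-step follows the standard AWS cross-account pattern: the Atlas AWS principal is allowed to assume your role only when it presents your project's unique External ID. The following is a generic sketch with placeholder values; the Atlas UI displays the exact ARN and External ID to use:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<Atlas AWS ARN from the Atlas UI>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<unique External ID from the Atlas UI>"
        }
      }
    }
  ]
}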

Your configuration for AWS S3 data store should look similar to the following:

{
  "stores" : [
    {
      "name" : "<string>",
      "provider": "<string>",
      "region" : "<string>",
      "bucket" : "<string>",
      "additionalStorageClasses" : ["<string>"],
      "prefix" : "<string>",
      "includeTags": <boolean>,
      "delimiter": "<string>",
      "public": <boolean>
    }
  ],
  "databases" : [
    {
      "name" : "<string>",
      "collections" : [
        {
          "name" : "<string>",
          "dataSources" : [
            {
              "storeName" : "<string>",
              "path" : "<string>",
              "defaultFormat" : "<string>",
              "provenanceFieldName": "<string>",
              "omitAttributes": <boolean>
            }
          ]
        }
      ],
      "maxWildcardCollections" : <integer>,
      "views" : [
        {
          "name" : "<string>",
          "source" : "<string>",
          "pipeline" : "<string>"
        }
      ]
    }
  ]
}

For more information on the configuration settings, see Define Data Stores for a Federated Database Instance.
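
For illustration only, here is a minimal sketch of a filled-in configuration for a single AWS S3 data store. The store name, bucket, region, database and collection names, and path below are hypothetical example values, not defaults:

{
  "stores" : [
    {
      "name" : "exampleS3Store",
      "provider" : "s3",
      "region" : "us-east-1",
      "bucket" : "example-sales-bucket"
    }
  ],
  "databases" : [
    {
      "name" : "sales",
      "collections" : [
        {
          "name" : "orders",
          "dataSources" : [
            {
              "storeName" : "exampleS3Store",
              "defaultFormat" : ".json",
              "path" : "/orders/{year int}/*"
            }
          ]
        }
      ],
      "maxWildcardCollections" : 100
    }
  ]
}

In this sketch, a file such as s3://example-sales-bucket/orders/2024/region1.json would map into the sales.orders virtual collection, and the {year int} path segment would surface as a queryable year field.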

  1. Define your AWS S3 data store.

    Edit the JSON configuration settings shown in the UI for stores. Your stores configuration setting should resemble the following:

    "stores" : [
    {
    "name" : "<string>",
    "provider" : "<string>",
    "region" : "<string>",
    "bucket" : "<string>",
    "additionalStorageClasses" : ["<string>"],
    "prefix" : "<string>",
    "delimiter" : "<string>",
    "includeTags": <boolean>,
    "public": <boolean>
    }
    ]

    To learn more about these configuration settings, see stores.

  2. Define your federated database instance virtual databases, collections, and views.

    Edit the JSON configuration settings shown in the UI for databases. Your databases configuration setting should resemble the following:

    "databases" : [
    {
    "name" : "<string>",
    "collections" : [
    {
    "name" : "<string>",
    "dataSources" : [
    {
    "storeName" : "<string>",
    "defaultFormat" : "<string>",
    "path" : "<string>",
    "provenanceFieldName": "<string>",
    "omitAttributes": <boolean>
    }
    ]
    }
    ],
    "maxWildcardCollections" : <integer>,
    "views" : [
    {
    "name" : "<string>",
    "source" : "<string>",
    "pipeline" : "<string>"
    }
    ]
    }
    ]

    To learn more about these configuration settings, see databases.

8

To add other data stores for federated queries, see:

Note

You can't connect an Azure Blob Storage data store for running federated queries across cloud providers.

9

Click Save.
