
Deploy a Federated Database Instance

On this page

  • Required Access
  • Prerequisites
  • Procedure

This page describes how to deploy a federated database instance for accessing data in your Google Cloud Storage buckets.

Required Access

To deploy a federated database instance, you must have Project Owner access to the project. Users with Organization Owner access must add themselves as a Project Owner to the project before deploying a federated database instance.

Prerequisites

Before you begin, you will need to:

  • Create a MongoDB Atlas account, if you do not have one already.

  • Install the gcloud CLI.

  • Configure the gcloud CLI to access your Google Cloud account. Alternatively, ensure that you have access to the Google Cloud console with permission to create IAM roles.

  • Optional. Set up a Google Cloud service account, as sketched below.
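If you choose to set up a Google Cloud service account yourself, the following is a minimal sketch using standard gcloud commands; the project ID and service account name are placeholders:

# Authenticate the gcloud CLI and select the project that hosts your buckets
gcloud auth login
gcloud config set project my-gcp-project

# Create a dedicated service account for Atlas Data Federation (placeholder name)
gcloud iam service-accounts create atlas-data-federation --display-name="Atlas Data Federation"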

Procedure

To create a new federated database instance using the Atlas CLI, run the following command:

atlas dataFederation create <name> [options]

To learn more about the command syntax and parameters, see the Atlas CLI documentation for atlas dataFederation create.
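For example, a minimal invocation might look like the following; the instance name and project ID are placeholders, and --projectId is the standard Atlas CLI option for selecting the target project:

# Create a federated database instance in the specified Atlas project
atlas dataFederation create myFederatedInstance --projectId 32b6e34b3d91647abb20e7b8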

Tip

See: Related Links

To set up your federated database instance in the Atlas user interface:
  1. Click the Create New Federated Database dropdown.

  2. Select Manual Setup.

  3. Select the cloud provider where Atlas Data Federation will process your queries.

    You can select AWS, Azure, or Google Cloud. Once your federated database instance is created, you can't change the cloud provider where Atlas Data Federation processes your queries.

    You must configure your federated database instance on the same cloud provider as the data stores to which your federated database instance maps.

  4. Enter a name for your federated database instance.

    Defaults to FederatedDatabaseInstance[n]. Once your federated database instance is created, you can't change its name.

  5. Choose a configuration method:

  • For a guided experience, click Visual Editor.

  • To edit the raw JSON, click JSON Editor.

  6. Configure your federated database instance.

    If you use the Visual Editor:
  1. Select the dataset for your federated database instance from the Data Sources section.

    Click Add Data Sources to select your data store.

  2. Specify your data store.

    Choose Google Cloud Storage to configure a federated database instance for data in Google Cloud Storage buckets.

    Corresponds to stores.[n].provider JSON configuration setting.

  3. Select a Google Cloud Service Account for Atlas.

    From the role selection dropdown list, you can select an existing Google Cloud Service Account that Atlas is authorized for, or choose Create a Google Service Account.

    If you selected an existing account that Atlas is authorized for, click Next and proceed to the next step to list your Google Cloud Storage buckets.

    If you are creating a new service account, select Create a Google Service Account and click Next.

  4. In the Configure Google Cloud Storage modal, follow the provided instructions to configure the Google Cloud CLI, then click Next.

  5. Configure Google Cloud Storage.

    1. Enter the name of your Google Cloud Storage bucket.

      Corresponds to the stores.[n].bucket JSON configuration setting.

    2. Specify whether the bucket is Read-only or both Read and write.

      Atlas can only query Read-only buckets; if you wish to query and save query results to your Google Cloud Storage bucket, choose Read and write.

    3. Select the region of the Google Cloud Storage bucket.

      Corresponds to the stores.[n].region JSON configuration setting.

      Note

      You can't create a federated database instance if Atlas Data Federation is unable to retrieve the region of the specified Google Cloud Storage bucket.

    4. Grant access to your Google Cloud project.

      1. In the Google Cloud console for the project that hosts your Google Cloud Storage bucket, navigate to IAM and Admin, then navigate to IAM.

      2. Click Grant Access. In the modal that appears, in the New principals field, enter the Google Cloud Service Account associated with your federated database instance.

      3. To grant read-only access to the bucket, apply the storage.viewer role. To grant read-write access to the bucket, additionally apply the storage.editor role. For a gcloud CLI alternative, see the sketch that follows the example configuration below.

    5. Optional. Specify a prefix that Data Federation should use when searching the files in the Google Cloud Storage bucket. If omitted, Data Federation does a recursive search for all files from the root of the Google Cloud Storage bucket.

      Corresponds to the stores.[n].prefix JSON configuration setting.

    6. Click Validate and finish.

  6. Define the path structure for your files in the Google Cloud Storage bucket and click Next.

    For example:

    https://storage.googleapis.com/<path>/<to>/<files>/<filename>.<file-extension>

    To add additional paths to data on your Google Cloud Storage bucket, click Add Data Source and enter the path. To learn more about paths, see Define Path for S3 Data.

    Corresponds to the databases.[n].collections.[n].dataSources.[n].path JSON configuration setting.

  7. Create the virtual databases, collections, and views and map the databases, collections, and views to your data store.

    1. (Optional) Click the edit icon for the:

      • Database to edit the database name. Defaults to VirtualDatabase[n].

        Corresponds to databases.[n].name JSON configuration setting.

      • Collection to edit the collection name. Defaults to VirtualCollection[n].

        Corresponds to databases.[n].collections.[n].name JSON configuration setting.

      • View to edit the view name.

      You can click:

      • Add Database to add databases and collections.

      • The add icon associated with the database to add collections to the database.

      • The add icon associated with the collection to add views on the collection. To create a view, you must specify:

        • The name of the view.

        • The pipeline to apply to the view.

          The view definition pipeline cannot include the $out or the $merge stage. If the view definition includes nested pipeline stages such as $lookup or $facet, this restriction applies to those nested pipelines as well.

        To learn more about views, see the related links.

      • The remove icon associated with the database, collection, or view to remove it.

    2. Select Google Cloud Storage from the dropdown in the Data Sources section.

    3. Drag and drop the data store to map with the collection.

      Corresponds to databases.[n].collections.[n].dataSources JSON configuration setting.

Your configuration for the Google Cloud Storage data store should look similar to the following:

{
  "stores" : [
    {
      "name" : "<string>",
      "provider" : "<string>",
      "region" : "<string>",
      "bucket" : "<string>",
      "prefix" : "<string>",
      "delimiter" : "<string>"
    }
  ],
  "databases" : [
    {
      "name" : "<string>",
      "collections" : [
        {
          "name" : "<string>",
          "dataSources" : [
            {
              "storeName" : "<string>",
              "path" : "<string>",
              "defaultFormat" : "<string>",
              "provenanceFieldName" : "<string>",
              "omitAttributes" : <boolean>
            }
          ]
        }
      ],
      "maxWildcardCollections" : <integer>,
      "views" : [
        {
          "name" : "<string>",
          "source" : "<string>",
          "pipeline" : "<string>"
        }
      ]
    }
  ]
}

For more information on the configuration settings, see Define Data Stores for a Federated Database Instance.
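You can also retrieve the bucket's region and grant the service account access from the command line. The following is a minimal gcloud CLI sketch; the bucket name, project, and service account address are placeholders, and roles/storage.objectViewer is assumed here as the role that provides the read-only access described above.

# Show bucket metadata; the output includes the bucket's location (region)
gcloud storage buckets describe gs://my-data-bucket

# Grant the Atlas service account read-only access to the bucket
# (add a write-capable role if you chose Read and write)
gcloud storage buckets add-iam-policy-binding gs://my-data-bucket \
  --member="serviceAccount:atlas-data-federation@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"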

    If you use the JSON Editor:

  1. Define your Google Cloud Storage data store.

    Edit the JSON configuration settings shown in the UI for stores. Your stores configuration setting should resemble the following:

    "stores" : [
    {
    "name" : "<string>",
    "provider" : "<string>",
    "region" : "<string>",
    "bucket" : "<string>",
    "additionalStorageClasses" : ["<string>"],
    "prefix" : "<string>",
    "delimiter" : "<string>",
    "includeTags": <boolean>,
    "public": <boolean>
    }
    ]

    To learn more about these configuration settings, see stores.

  2. Define your federated database instance virtual databases, collections, and views.

    Edit the JSON configuration settings shown in the UI for databases. Your databases configuration setting should resemble the following:

    "databases" : [
    {
    "name" : "<string>",
    "collections" : [
    {
    "name" : "<string>",
    "dataSources" : [
    {
    "storeName" : "<string>",
    "defaultFormat" : "<string>",
    "path" : "<string>",
    "provenanceFieldName": "<string>",
    "omitAttributes": <boolean>
    }
    ]
    }
    ],
    "maxWildcardCollections" : <integer>,
    "views" : [
    {
    "name" : "<string>",
    "source" : "<string>",
    "pipeline" : "<string>"
    }
    ]
    }
    ]

    To learn more about these configuration settings, see databases.
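If you prefer to keep this configuration in a file, you can pass it to the Atlas CLI when you create the instance. This sketch assumes your Atlas CLI version supports the --file option on atlas dataFederation create; the file name and project ID are placeholders.

# Create the instance from a saved JSON configuration file
atlas dataFederation create myFederatedInstance --projectId 32b6e34b3d91647abb20e7b8 --file ./federated-config.json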

  7. Optional. Add other data stores.

    To add other data stores for federated queries, see the related links.

Note

You can't connect an Azure Blob Storage data store for running federated queries across cloud providers.

  8. Click Save to create the federated database instance.

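After you deploy the federated database instance, one way to confirm it from the command line is with the Atlas CLI; the project ID is a placeholder:

# List federated database instances in the project to confirm the new deployment
atlas dataFederation list --projectId 32b6e34b3d91647abb20e7b8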