Archive Data

Important

Feature unavailable in Flex Clusters and Serverless Instances

Flex clusters and Serverless instances don't support this feature at this time. To learn more, see Atlas Flex Limitations and Serverless Instance Limitations.

Overview

Atlas moves infrequently accessed data from your Atlas cluster to a MongoDB-managed, read-only federated database instance on cloud object storage. Once Atlas archives the data, you have a unified view of your Atlas and Online Archive data through a read-only federated database instance.

Atlas archives data based on the criteria you specify in an archiving rule. The criteria vary based on the type of collection you want to archive:

For standard collections, the criteria can be one of the following:

A combination of a date field and the number of days to keep data on the Atlas cluster. Atlas archives a document once the current date exceeds the value of the specified date field by more than that number of days.
A custom query. Atlas runs the query specified in the archiving rule to select the documents to archive.

For time series collections, the criteria are a combination of a time field and the number of days to keep data on the Atlas cluster. Atlas archives a document once the current time exceeds the value of the specified time field by more than that number of days.
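As a rough illustration of the date-based rule, the following Python sketch treats a document as eligible once its date field is older than the current time minus the retention period. The function name is_eligible_for_archive and its parameters are hypothetical. Atlas evaluates the rule internally, so this is only one reading of the description above, not the actual implementation.

```python
from datetime import datetime, timedelta, timezone


def is_eligible_for_archive(field_value: datetime, keep_days: int, now: datetime) -> bool:
    """Return True if the document's date field is older than now - keep_days.

    Hypothetical helper: an interpretation of the date-based archiving rule,
    not Atlas's actual implementation.
    """
    cutoff = now - timedelta(days=keep_days)
    return field_value < cutoff


# Example values chosen for illustration only.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
old_doc = datetime(2024, 1, 1, tzinfo=timezone.utc)
recent_doc = datetime(2024, 6, 20, tzinfo=timezone.utc)

print(is_eligible_for_archive(old_doc, keep_days=30, now=now))     # True
print(is_eligible_for_archive(recent_doc, keep_days=30, now=now))  # False
```

With a 30-day retention period, the cutoff in this example is May 31, 2024, so only the January document qualifies.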

When you configure an Online Archive on your cluster, Atlas creates two federated database instances:

Federated Database Instance for your archive that allows you to query data on your archive only.
Federated Database Instance for your cluster and archive that allows you to query both your cluster and archived data.

Cluster Requirements

Online Archive in Atlas is available only on M10 and greater clusters.

Required Permissions

To create or delete an Online Archive, you must have one of these roles:

How Atlas Archives Data

To archive data:

For each archive, Atlas runs a query in the archive's namespace to identify the documents that match the criteria for archiving. Atlas refers to this query on a particular archive's namespace as a job.
By default, Atlas runs the job every five minutes. If the size of the documents to archive doesn't meet the threshold, Atlas extends the job interval by five minutes, up to a maximum of four hours. If the job interval reaches the maximum or the size of the documents to archive reaches the threshold, Atlas runs the job again and resets the job interval to five minutes. The threshold is 1.8 GiB per job.
Atlas might initiate the job from any node in the cluster. However, since the job might need to perform delete operations, it always connects to the primary member.
If you specify a time window when you want to run the job, Atlas runs the job continuously during that time window as long as there is at least 5 MiB of data to archive. To learn more, see Limitations. If a running job doesn't complete during the time window, Atlas continues to run the job until it completes. If all archiving jobs reach the maximum threshold for either the size or number of documents to archive during three consecutive archive windows, we recommend that you increase the frequency.
Atlas runs an index sufficiency query to determine the efficiency of the archival process. If the ratio of documents scanned to documents returned is 10 or more, the query result triggers an Index Sufficiency Warning. This warning indicates that you have insufficient indexes for an efficient archival process. For date-based archives, you must index the date field. For custom criteria that use an expression, Atlas might first convert a value before it evaluates it against the query.
For documents that match the archival criteria, Atlas writes up to 1.8 GiB of document data to partitions on the cloud object storage, grouped and sorted using the partitioning scheme that you provided during archive creation. Atlas periodically rebalances the partitions and stores them in a format optimized for both query performance and the ability to expire data in a reasonable amount of time.
For newly created online archives on time series collections, the threshold is 1.8 GiB or 100,000 documents, whichever limit is reached first.
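The job interval and index sufficiency rules described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Atlas's scheduler: the function and constant names are hypothetical, the interval logic is one interpretation of the text, and the handling of a query that returns zero documents is an assumption the text doesn't cover.

```python
FIVE_MINUTES = 5
MAX_INTERVAL_MINUTES = 4 * 60            # four-hour ceiling
THRESHOLD_BYTES = int(1.8 * 1024**3)     # 1.8 GiB per job
INDEX_WARNING_RATIO = 10                 # scanned : returned


def next_job_interval(current_minutes: int, pending_bytes: int) -> int:
    """Return the interval (in minutes) before the next archive job.

    Resets to five minutes when the threshold is met or the interval is
    already at its maximum; otherwise grows by five minutes, capped at
    four hours.
    """
    if pending_bytes >= THRESHOLD_BYTES or current_minutes >= MAX_INTERVAL_MINUTES:
        return FIVE_MINUTES
    return min(current_minutes + FIVE_MINUTES, MAX_INTERVAL_MINUTES)


def needs_index_warning(docs_scanned: int, docs_returned: int) -> bool:
    """Return True when the scanned-to-returned ratio is 10 or more."""
    if docs_returned == 0:
        # Assumption: scanning documents without returning any is inefficient.
        return docs_scanned > 0
    return docs_scanned / docs_returned >= INDEX_WARNING_RATIO


print(next_job_interval(5, pending_bytes=0))                  # 10
print(next_job_interval(240, pending_bytes=0))                # 5
print(next_job_interval(30, pending_bytes=THRESHOLD_BYTES))   # 5
print(needs_index_warning(docs_scanned=1000, docs_returned=100))  # True
```

In this model, a quiet collection drifts toward the four-hour interval, while a collection that keeps hitting the 1.8 GiB threshold stays on the five-minute cadence.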

Note

The time it takes to complete an archival job depends on a number of factors including the cluster resources. The next archive job runs only after the current job finishes.

Online Archive runs on your Atlas cluster and uses the same underlying resources, such as IOPS. The default limit of 1.8 GiB per job prevents the operation from using too many resources. If your cluster is already serving workloads at the edge of its resource limits, activating Online Archive could push it past its capacity. Ensure that your Atlas cluster has excess resources before activating Online Archive.

If you activate Online Archive, you can select one of the following regions to store your archived data.

Data Federation Regions	AWS Regions
Virginia, USA	us-east-1
Oregon, USA	us-west-2
Sao Paulo, Brazil	sa-east-1
Ireland	eu-west-1
London, England	eu-west-2
Frankfurt, Germany	eu-central-1
Tokyo, Japan	ap-northeast-1
Mumbai, India	ap-south-1
Singapore	ap-southeast-1
Sydney, Australia	ap-southeast-2
Montreal, Canada	ca-central-1

Important

Atlas encrypts your archived data using Amazon's server-side encryption with S3-managed keys (SSE-S3). Atlas can't use any encryption-at-rest encryption keys that you used on your cluster data.

Data Federation Regions	Azure Regions
Virginia, USA	US_EAST_2
Netherlands	EUROPE_WEST

Important

Atlas encrypts your archived data using Azure Storage service-side encryption. Atlas can't use any encryption-at-rest encryption keys that you used on your cluster data.

Data Federation Regions	Google Cloud Regions
Belgium	europe-west1
Iowa, USA	us-central1

Important

Atlas encrypts your archived data using Google Cloud Storage service-side encryption. Atlas can't use any encryption-at-rest encryption keys that you used on your cluster data.

When you archive data, Atlas first copies the data to the cloud object storage and then deletes the data from your Atlas cluster. During archival, you might briefly see duplicate documents on your Atlas cluster and the Online Archive. After archival completes and your Online Archive state is idle, the archived documents are no longer present in your Atlas cluster.

Important

Online Archive deletes documents from the cluster using only the _id. You must enforce _id uniqueness across all shards in your application. If documents with a duplicate _id are present in the cluster during an archival job, Atlas might delete all documents with the same _id, even if only one of them satisfied the archival criteria.
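To see why duplicate _id values are risky, consider this small Python simulation of a delete that matches on _id alone. It models the worst case described above using in-memory lists; the shard names, documents, and the delete_by_id helper are all hypothetical and don't reflect how Atlas issues its deletes.

```python
def delete_by_id(shards: dict, archived_ids: set) -> dict:
    """Remove every document whose _id is in archived_ids from every shard.

    Hypothetical model of an _id-only delete: it can't tell apart two
    documents that share an _id, so both are removed.
    """
    return {
        name: [doc for doc in docs if doc["_id"] not in archived_ids]
        for name, docs in shards.items()
    }


# Two shards hold different documents that share _id 1.
shards = {
    "shard0": [{"_id": 1, "created": "2020-01-01"}],
    "shard1": [
        {"_id": 1, "created": "2024-06-01"},
        {"_id": 2, "created": "2024-06-02"},
    ],
}

# Only the 2020 document matched the archival criteria.
archived_ids = {1}

remaining = delete_by_id(shards, archived_ids)
print(remaining)
```

In this model the 2024 document with _id 1 is removed even though it never matched the archival criteria, which is why _id values need to be unique across shards.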

WiredTiger doesn't release the storage blocks of the deleted data back to the OS for performance reasons. However, Atlas eventually automatically reuses these storage blocks for new data. This helps the Atlas cluster to avoid fragmentation. To learn more, see How do I reclaim disk space in WiredTiger?.

Your Online Archive is read-only. Atlas doesn't update archived data. You can configure deletion of archived data after a certain period of time. To purge archived data, configure the Deletion Age Limit setting for your Online Archive when you create or modify the Online Archive. Atlas doesn't sync your Online Archive with the Atlas cluster to maintain consistency after the data is archived.

Atlas provides a unified endpoint. You can use it to query all databases and collections on your live cluster and archived data using the same database and collection name that you use in your Atlas cluster. You can't use the unified endpoint over a Network Peering Connection, but you can set up a private endpoint or use a standard internet connection over TLS.

Note

Configuring an Online Archive doesn't eliminate the need for a backup policy. We recommend that you configure a backup policy that meets your requirements. To learn more about configuring a backup policy, see Back Up Your Cluster.

Atlas Data Federation for Online Archive

When you configure your M10 or greater Atlas cluster for Online Archive, Atlas creates a read-only Federated Database Instance, one per cluster, for your archived data.

Limitations

Online Archive doesn't support the following:

Writing to the Online Archive.
Configuring or administering the Online Archive federated database instance through the Atlas console, Atlas Data Federation CLI, or Atlas Data Federation API.
Archiving a capped collection.
Archiving data below the size of 5 MiB after 7 days. To learn more, see Limitations.
GridFS.
Deleting individual documents.

Viewing the Online Archive

To view your federated database instance for the Online Archive:

In Atlas, go to your federated database instance for your project.

If it's not already displayed, select the organization that contains your project from the Organizations menu in the navigation bar.
If it's not already displayed, select your project from the Projects menu in the navigation bar.
In the sidebar, click Data Federation under the Services heading.
The Data Federation page displays.

View your federated database instance for the Online Archive.

Querying the Online Archive

To query your Online Archive data, connect to the federated database instance using the connection string that you get from the Connect button for the Online Archive or the federated database instance.

You can also query your Online Archive data with SQL. To learn more, see Query with Atlas SQL.
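A query against the federated database instance looks the same as a query against a regular cluster. The following Python sketch assumes the PyMongo driver is installed and that you copy the connection string from the Connect button. The helper names, the database and collection names, and the date field are hypothetical placeholders.

```python
from datetime import datetime, timezone


def build_date_filter(date_field: str, before: datetime) -> dict:
    """Build a filter that matches documents older than the given time."""
    return {date_field: {"$lt": before}}


def find_archived(uri: str, db_name: str, coll_name: str, query: dict, limit: int = 10) -> list:
    """Run a read-only find against the federated database instance.

    The uri is the connection string copied from the Connect button.
    PyMongo is imported lazily so that the sketch loads without it.
    """
    from pymongo import MongoClient  # requires: pip install pymongo

    client = MongoClient(uri)
    try:
        return list(client[db_name][coll_name].find(query).limit(limit))
    finally:
        client.close()


# Hypothetical field name and cutoff, for illustration only.
query = build_date_filter("createdAt", datetime(2023, 1, 1, tzinfo=timezone.utc))
print(query)
```

To run the query, call find_archived with your own connection string, for example find_archived(uri, "sales", "orders", query). Because the archive is read-only, only read operations such as find succeed against archived data.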

Managing Query Limits for Online Archive

You can configure limits on the amount of data that is processed for your queries against archived data to control the data processing costs for your Online Archive. When the amount of processed data reaches any applicable configured limit, Atlas won't execute any new queries and returns an error to the client application that a limit has been reached. You can also optionally configure query termination to terminate queries that exceed the limit. To learn more, see Manage Atlas Data Federation Query Limits.

Editing Online Archives

Once Atlas creates the Online Archive, you can't change the archiving criteria from Date Match to Custom Filter, or vice versa.

Deleting Online Archives

If you delete all the Online Archives, Atlas deletes the federated database instances. After deleting all the Online Archives, if you create an Online Archive with the same settings as a deleted Online Archive, Atlas creates a new federated database instance for the new Online Archive.

Online Archive Costs

Online Archive stores infrequently accessed data to lower the data storage costs on your Atlas cluster. However, you incur costs for the amount of data that you transfer and query. To learn more, see Online Archive Costs.

Manage Your Online Archive

You can configure an Online Archive for a collection on your cluster through your Atlas console and API. Once you create an Online Archive, you can:
