Restore Archived Data
Important
Feature unavailable in Serverless Instances
Serverless instances don't support this feature at this time. To learn more, see Serverless Instance Limitations.
You can restore archived data to your Atlas cluster using the alternate syntax that Atlas Data Federation provides for the $merge pipeline stage. This moves the data back into the same or a different Atlas cluster, database, or collection within the same Atlas project.
Note
Ensure that your cluster is adequately provisioned for the amount of data that will be restored from your archive so that it doesn't run out of space during or after restoration of archived data. Contact Support for additional technical guidance on setting up the size of the oplog or for troubleshooting any space issues on your Atlas cluster.
This page describes how to restore archived data using the $merge pipeline stage or MongoDB Tools.
Required Access
To follow this procedure, you must have Project Data Access Admin access or higher to the project.
Procedure
If your dataset is small, you can use the $merge stage to move your archived data back to your Atlas cluster. This approach is not recommended for large datasets (around 1 TB of data) with a large number of partitions.
Pause the Online Archive associated with the collection whose archived data you wish to restore.
See Pause and Resume Archiving for more information.
Connect to Online Archive using the connection string.
You must use the Archive Only connection string to connect to the Online Archive. To learn more, see Connect to Online Archive.
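For example, you might connect with mongosh as follows. This is a sketch: the hostname, username, and password are placeholders, and the exact Archive Only connection string should be copied from the Atlas UI.

# Connect to the Online Archive with the Archive Only connection string
# copied from the Atlas UI. All values here are placeholders.
mongosh "mongodb://<username>:<password>@<archive-only-host>/?ssl=true&authSource=admin"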
Use $merge to move the data from your archive to your Atlas cluster. To learn more about the $merge pipeline stage syntax and usage for moving data back into your Atlas cluster, see the $merge pipeline stage.
Example
Consider the following documents in an S3 archive:
{ "_id" : 1, "item": "cucumber", "source": "nepal", "released": ISODate("2016-05-18T16:00:00Z") }
{ "_id" : 2, "item": "miso", "source": "canada", "released": ISODate("2016-05-18T16:00:00Z") }
{ "_id" : 3, "item": "oyster", "source": "luxembourg", "released": ISODate("2016-05-18T16:00:00Z") }
{ "_id" : 4, "item": "mushroom", "source": "ghana", "released": ISODate("2016-05-18T16:00:00Z") }
Suppose the $merge syntax for restoring these documents into the Atlas cluster identifies documents based on the item and source fields during the $merge stage.
db.<collection>.aggregate([
  {
    "$merge": {
      "into": {
        "atlas": {
          "clusterName": "<atlas-cluster-name>",
          "db": "<db-name>",
          "coll": "<collection-name>"
        }
      },
      "on": [ "item", "source" ],
      "whenMatched": "keepExisting",
      "whenNotMatched": "insert"
    }
  }
])
In this example, when an archived document matches a document on the Atlas cluster on those two fields, Atlas keeps the existing document in the cluster because the copy of the document on the Atlas cluster is more recent than the copy of the document in the archive. When an archived document doesn't match any document in the Atlas cluster, Atlas inserts the document into the specified collection on the Atlas cluster.
When restoring data back into the Atlas cluster, the archived data might contain duplicate _id fields. For this example, we can include a $sort stage on the _id and released fields before the $merge stage to ensure that, if there are duplicates to resolve, Atlas chooses the document with the most recent date.
Note
If there are multiple on fields, you must create a compound unique index on the on identifier fields:
db.<collection>.createIndex( { item: 1, source: 1 }, { unique: true } )
Alternatively, specify merges sequentially, one for each on identifier field, to a temporary collection. Then merge the data in the temporary collection to the target collection using the cluster's connection string. You must still create a unique index for each on identifier field.
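The sequential approach can be sketched as follows. This is illustrative only: the temporary collection name restoreTemp is hypothetical, and the placeholder cluster, database, and collection names follow the earlier example.

// 1. While connected with the Archive Only connection string, merge the
//    archive into a temporary collection, one merge per "on" field.
//    "restoreTemp" is a hypothetical collection name.
db.<collection>.aggregate([
  {
    "$merge": {
      "into": {
        "atlas": {
          "clusterName": "<atlas-cluster-name>",
          "db": "<db-name>",
          "coll": "restoreTemp"
        }
      },
      "on": "item",
      "whenMatched": "keepExisting",
      "whenNotMatched": "insert"
    }
  }
])
// Repeat with "on": "source", and so on for each identifier field.

// 2. While connected with the cluster's connection string, merge the
//    temporary collection into the target collection. Each "on" field
//    still needs a unique index on the collection it targets.
db.restoreTemp.aggregate([
  {
    "$merge": {
      "into": "<collection-name>",
      "on": "item",
      "whenMatched": "keepExisting",
      "whenNotMatched": "insert"
    }
  }
])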
You can run the aggregation in the background by setting the background flag to true. To run this command in mongosh, use db.runCommand:
db.runCommand( {
  "aggregate": "<collection>",
  "pipeline": [
    {
      "$sort": { "_id": 1, "released": 1 }
    },
    {
      "$merge": {
        "into": {
          "atlas": {
            "clusterName": "<atlas-cluster-name>",
            "db": "<db-name>",
            "coll": "<collection-name>"
          }
        },
        "on": [ "item", "source" ],
        "whenMatched": "keepExisting",
        "whenNotMatched": "insert"
      }
    }
  ],
  "cursor": {},
  "background": true
} )
To learn more about resolving duplicate fields, see the $merge considerations.
Verify data in the Atlas cluster and delete the online archive.
See Delete an Online Archive for more information.
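Before deleting the online archive, you might verify the restore with a quick sanity check. This is a sketch, assuming the example data above; the collection name is a placeholder.

// While connected with the cluster's connection string, confirm that the
// restored documents are present in the target collection.
db.<collection-name>.countDocuments()
db.<collection-name>.find( { "item": "cucumber", "source": "nepal" } )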
Note
If you run into issues while migrating data back to your Atlas cluster, contact Support.