Data Migration within Database

Hi,
I am quite new to MongoDB.
I wish to move 10 million documents from one collection to another within a database. The source documents are nested and need to be flattened and transformed before loading into the target collection.
What is the best way to do this, considering I need it done at a rapid pace with monitoring, logging, error handling, and recovery features?
Should I use an ETL tool like AWS Glue, which is serverless and supports Spark, or is built-in MongoDB Atlas functionality (the aggregation pipeline) better suited here?
Thanks,
Pankaj

We do this quite a lot using the aggregation framework. If you can express the transform as an aggregation, you can make the last stage a $merge or $out to write the results to another collection on the same server.
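As a sketch of what that could look like, assuming a hypothetical `source` collection with a nested `details.address` subdocument (all collection and field names here are made up, not from the original post):

```python
# Sketch of an aggregation pipeline that flattens a nested subdocument and
# writes the result server-side into a target collection with $merge.
# Collection and field names (source, target, details.*) are hypothetical.
pipeline = [
    # Promote nested fields to the top level.
    {"$project": {
        "_id": 1,
        "name": 1,
        "city": "$details.address.city",
        "zip": "$details.address.zip",
    }},
    # Write into the target collection; upsert keyed on _id.
    {"$merge": {
        "into": "target",
        "on": "_id",
        "whenMatched": "replace",
        "whenNotMatched": "insert",
    }},
]
# With pymongo this would be run as:
#   db.source.aggregate(pipeline)
```

Using `$merge` rather than `$out` lets you upsert into an existing collection instead of replacing it, which also makes re-runs after a failure safer.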
If you need to run it on a schedule then you'd obviously need some method to do that; you could also split the data up and process it in batches from a shell script.
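One way to batch, sketched here with a hypothetical helper and illustrative `_id` boundaries (not from the original post), is to prepend a `$match` range stage to the same transform pipeline for each batch:

```python
# Sketch: split the copy into batches on an indexed key (here _id) by
# prepending a $match range stage to the transform pipeline.
# Boundary values and collection names are illustrative.
def batched_pipeline(transform_stages, lower, upper):
    """Return the transform pipeline restricted to lower <= _id < upper."""
    match = {"$match": {"_id": {"$gte": lower, "$lt": upper}}}
    return [match] + transform_stages

transform = [
    {"$project": {"flat": "$nested.value"}},
    {"$merge": {"into": "target"}},
]
batch = batched_pipeline(transform, 0, 100_000)
# Each batch would then be run as db.source.aggregate(batch), e.g. from a
# driver script that loops over the boundaries and logs failures per batch.
```

Batching this way gives you natural checkpoints for logging and recovery: a failed batch can be retried on its own without re-running the whole 10M-document copy.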
After the copy, you could run another aggregation to compare the source and target document counts.
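That check can be as small as a `$count` stage run against both collections; a minimal sketch, with the actual server calls hedged in comments:

```python
# Sketch of a post-copy sanity check: count documents on both sides
# and compare. The $count stage returns one document, e.g. {"total": N}.
count_pipeline = [{"$count": "total"}]

def counts_match(source_total, target_total):
    """Return True when source and target hold the same number of docs."""
    return source_total == target_total

# In a real run, with pymongo:
#   src = next(db.source.aggregate(count_pipeline))["total"]
#   tgt = next(db.target.aggregate(count_pipeline))["total"]
#   if not counts_match(src, tgt): log and alert
```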
A benefit of using an aggregation with $out/$merge is that it runs server-side, so there is no need to pull the data off the server and push it back.
If you need more complicated monitoring then it may be best to look at an ETL toolkit. How complicated is your data, and have you tested how long a transform aggregation takes on your 10M documents?
