Hi,
I am quite new to MongoDB.
I wish to move 10 million documents from one collection to another within a database. The source documents are nested and need to be flattened and transformed before loading into the target collection.
What is the best way to do this, considering I need it done at a rapid pace with monitoring, logging, error handling, and recovery features?
Should I use an ETL tool like AWS Glue, which is serverless and supports Spark, or is some built-in MongoDB Atlas functionality (the aggregation pipeline) better suited here?
Thanks,
Pankaj
We do this quite a lot using the aggregation framework. If you can express the transform as an aggregation, you can make the last stage a $merge or $out to write the results to another collection on the same server.
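As a minimal sketch of that approach (the field names `customer`, `customerName`, `city` and the target collection `flat_orders` are hypothetical, and the stages are written as PyMongo-style dicts rather than mongosh syntax), a flatten-and-merge pipeline might look like:

```python
# Sketch of a flatten/transform pipeline ending in $merge.
# Field and collection names here are made-up examples.

def build_flatten_pipeline(target: str) -> list:
    """Build an aggregation pipeline that flattens nested fields
    and merges the result into the `target` collection."""
    return [
        # Promote nested fields to the top level.
        {"$set": {
            "customerName": "$customer.name",
            "city": "$customer.address.city",
        }},
        # Drop the original nested subdocument.
        {"$unset": ["customer"]},
        # Write server side into the target collection; matching on _id
        # with whenMatched="replace" makes the job safe to re-run (recovery).
        {"$merge": {
            "into": target,
            "on": "_id",
            "whenMatched": "replace",
            "whenNotMatched": "insert",
        }},
    ]

pipeline = build_flatten_pipeline("flat_orders")
# With a live connection this runs entirely on the server, e.g. with PyMongo:
#   client.mydb.source.aggregate(pipeline)
```

Because $merge upserts on the `on` field, re-running the same pipeline after a partial failure simply overwrites what was already copied rather than duplicating it.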
If you need to run it on a schedule then you'd obviously need some method to do that; you could also split the data up and process it in batches from a shell script if you wanted.
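One way to split the copy into batches (a sketch, assuming the `_id` values sort monotonically; the boundary values themselves would come from a quick query against the source) is to prepend a `$match` on an `_id` range to the same transform pipeline, so each batch can be run, logged, and retried independently:

```python
# Sketch: generate per-batch $match stages from sorted _id boundary
# values, so each slice of the copy can be run and retried on its own.

def batch_filters(boundaries: list) -> list:
    """Given sorted boundary ids [b0, ..., bn], return n+1 $match stages
    covering (-inf, b0], (b0, b1], ..., (bn, +inf)."""
    filters = []
    prev = None
    for b in boundaries:
        if prev is None:
            cond = {"_id": {"$lte": b}}
        else:
            cond = {"_id": {"$gt": prev, "$lte": b}}
        filters.append({"$match": cond})
        prev = b
    # Final open-ended batch above the last boundary.
    filters.append({"$match": {"_id": {"$gt": prev}}})
    return filters

batches = batch_filters([1000, 2000, 3000])
# Each batch pipeline would be [match_stage] + transform_stages + [merge_stage],
# executed one at a time so a failed batch can be re-run without the rest.
```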
Post copy, you could run another aggregation to check that the source and target document counts match.
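The simplest version of that check is comparing `count_documents({})` on both collections. A small helper like the following (a sketch; the logger name is an assumption) gives you a hook for the logging and error-handling requirements, by signalling a mismatch so the job can be marked failed and retried:

```python
import logging

log = logging.getLogger("copy_check")

def verify_counts(source_count: int, target_count: int) -> bool:
    """Compare source/target document counts after the copy; log the
    result and return False on a mismatch so callers can fail the job."""
    if source_count != target_count:
        log.error("count mismatch: source=%d target=%d",
                  source_count, target_count)
        return False
    log.info("counts match: %d documents", source_count)
    return True

# With a live connection the counts would come from the server, e.g.:
#   verify_counts(db.source.count_documents({}),
#                 db.flat_orders.count_documents({}))
ok = verify_counts(10_000_000, 10_000_000)
```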
A benefit of using an aggregation with $out/$merge is that it runs server side, so there is no need to pull the data off the server and push it back.
If you need more complicated monitoring then it may be best to look at an ETL toolkit. How complicated is your data, and have you tested how long an aggregation takes to transform your 10M documents?