Oh yes sorry.

  • Replica set of 4 nodes.

Where is the script running compared to the server?

  • Script is in another machine but communicate with Internal network (1GB/s).

When you write read a collection X and size of X being 27_776_896_250 do you mean you read all of it and then write all of it back into a collection Y?

I read the data from the collection X, do some processing (if needed) and then I apply it to the new collection Y. I work in batches of 300. I create an empty collection, write all data and then create index (maybe it’s better to create the index before? I don’t think so because mongo will have to keep it up-to-date…).

Let me know if you have suggestion / tips / questions.
Thanks.