I am using a Quartz + Spring Batch cluster to read documents from MongoDB for bulk processing. Since I cannot tag the original document as read (for example, by adding a read flag), I add the ID of each read document to a migration collection and compare the IDs across the two collections using a $lookup, with the code below in the aggregation pipeline:

$lookup: { from: 'migration_coll', localField: '_id', foreignField: '_id', pipeline: [ { $project: { "_id": 1 } } ], as: 'migratedDocuments' }

I typed the snippet above by hand to give an idea of how I use the pipeline: the lookup matches on _id across the two collections, and I project only _id to speed it up. However, with a large collection the query is really slow. With more than 2 million records, it takes more than 10 to 15 seconds to return a count.
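To make the shape of the query concrete, here is a minimal sketch of what such a count pipeline might look like as a whole. Only the $lookup stage comes from the snippet above; the $match and $count stages, the helper name `buildUnmigratedCountPipeline`, the output field `unmigrated`, and the source collection name `source_coll` are assumptions added for illustration. Using localField/foreignField together with pipeline inside $lookup also assumes MongoDB 5.0 or later.

```javascript
// Hypothetical helper: builds a pipeline that counts source documents
// whose _id is NOT yet present in the migration collection.
function buildUnmigratedCountPipeline(migrationCollection) {
  return [
    {
      // Join each source document to the migration collection on _id,
      // keeping only _id from the joined side.
      $lookup: {
        from: migrationCollection,
        localField: '_id',
        foreignField: '_id',
        pipeline: [{ $project: { _id: 1 } }],
        as: 'migratedDocuments'
      }
    },
    // Assumed stage: keep documents with no match, i.e. not yet read.
    { $match: { migratedDocuments: { $size: 0 } } },
    // Assumed stage: return the number of remaining documents.
    { $count: 'unmigrated' }
  ];
}

const pipeline = buildUnmigratedCountPipeline('migration_coll');

// In mongosh this would be run as (source_coll is a placeholder name):
// db.source_coll.aggregate(pipeline)
```

Written this way, the $lookup stage runs for every document in the source collection before the $match can discard anything, which is consistent with the count getting slower as the collection grows.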

Questions:

  1. Is there a better way to do this?
  2. What else can I use to keep track of which documents have been read if I can't modify the existing documents?

I am kind of stuck on this issue. Any help is appreciated.