$lookup across 2 collections to check whether an ID exists takes too much time

Chalk_Marker · 2024-11-18T12:08:21.208Z

I am using a Quartz + Spring Batch cluster to read documents from MongoDB for bulk processing. Since I cannot tag the original document as read (add a read flag), I add the ID of each read document to a migration collection and compare the IDs across the two collections using a $lookup, with the code below in the aggregation pipeline:

$lookup: {
  from: 'migration_coll',
  localField: '_id',
  foreignField: '_id',
  pipeline: [
    // return only _id from the joined documents
    { $project: { "_id": 1 } }
  ],
  as: 'migratedDocuments'
}

I typed in the above to give an idea of how I am using the pipeline: a lookup on _id across the two collections, followed by a projection on _id only, to increase speed. However, with a large collection, the query is really slow. With 2 million plus records, it takes more than 10 to 15 seconds to return a count.
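For readers who want the whole picture, here is a minimal sketch of how such a stage might sit in a complete count pipeline. It is not the exact code from my job: the source collection name `source_coll`, the output field spelling `migratedDocuments`, the count field `unmigrated`, and the `$match` / `$count` stages are assumptions for illustration. Combining `localField`/`foreignField` with `pipeline` in one `$lookup` requires MongoDB 5.0 or later.

```javascript
// Sketch (assumed names): count source documents whose _id is NOT yet
// present in migration_coll, i.e. an anti-join built on $lookup.
const pipeline = [
  {
    $lookup: {
      from: 'migration_coll',
      localField: '_id',
      foreignField: '_id',
      // Return only _id from the joined documents to keep them small.
      pipeline: [{ $project: { _id: 1 } }],
      as: 'migratedDocuments'
    }
  },
  // Keep only documents with no match in migration_coll.
  { $match: { migratedDocuments: { $size: 0 } } },
  // Return a single document holding the number of remaining documents.
  { $count: 'unmigrated' }
];

// Usage in mongosh (source_coll is a hypothetical collection name):
// db.source_coll.aggregate(pipeline)
```

Every document in the source collection passes through the `$lookup` before the `$match` can discard it, so the work grows with the size of the source collection.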

Questions:

Is there a better way to do this?
What else can I use to keep track of processed documents if I can't modify the existing documents?

I am kind of stuck on this issue. Any help is appreciated.
