2 / 2
May 2024

I’m trying to do something conceptually simple. I believe I need to use the aggregation framework and I’ve looked at the doc for aggregation, but it is incomprehensible to me. I have two collections and I want to find the difference between them, excluding the _id field.

That is, if I have a document in collection A that looks like:

{
“_id” : ObjectId(“66438f2e9e425d33267d3527”),
“subjectId” : “GTEX-111CU”,
“sex” : “male”,
“ageBracket” : “50-59”,
}

and an entry in collection B that is identical except for the _id, that should not represent a difference.

Basically, I’m debugging a new loading script and I want to make sure that the results of my loading into a new collection are the same as the contents of the existing collection.

I’m aware that Studio 3T has a tool in the Pro version that will do this, but I only have a license for the Basic version, so that isn’t a viable solution for me.

This will be an expensive operation. Essentially, for each document in A, you will need to read each document in B, and compare every field and every value of every field.

If you have a compound index on the fields you want to compare, you could use $lookup with multiple join conditions:

db.a.aggregate([ { $lookup: { from: "b", let: {"subjectId": "$subjectId", "sex": "$sex", "ageBracket": "$ageBracket"}, pipeline: [ {$match: {$expr: {$and: [ {$eq: ["$subjectId", "$$subjectId"], {$eq: ["$sex", "$$sex"], ... } ])

See https://www.mongodb.com/docs/manual/reference/operator/aggregation/lookup/#perform-multiple-joins-and-a-correlated-subquery-with--lookup