2 / 2
Aug 2024

Let me explain the data I’m working on:

  • The collection properties contains attributes for items (e.g. “main subject”, “language”, “publication date”, “publisher”). The field id matches the regular expression P[0-9]+.
  • The collection items mainly contains literary works (e.g. academic books). The field id matches the regular expression Q[0-9]+. The field statements is a dictionary where the keys are the id of itmes in the collection properties.

Here’s a minimal working example:

db.properties.deleteMany({}) db.properties.insertMany([{'id': 'P1', 'label': 'main subject'}]) db.properties.insertMany([{'id': 'P2', 'label': 'language of work'}]) db.items.deleteMany({}); db.items.insertMany([ {'id': 'Q1', 'label': 'algorithm'}, {'id': 'Q2', 'label': 'C++'}, {'id': 'Q3', 'label': 'English'}, { 'id': 'Q4', 'label': 'Algorithms in C++', 'statements': { 'P1': [ {'property': 'P1', 'value': 'Q1'}, {'property': 'P1', 'value': 'Q2'} ], 'P2': [ {'property': 'P2', 'value': 'Q3'} ] } }, ]); print('properties') printjson(db.properties.find({})) print('items') printjson(db.items.find({})) properties [ { _id: ObjectId('66b4f9031669d5840e2202d8'), id: 'P1', label: 'main subject' }, { _id: ObjectId('66b4f9031669d5840e2202d9'), id: 'P2', label: 'language of work' } ] items [ { _id: ObjectId('66b4f9031669d5840e2202da'), id: 'Q1', label: 'algorithm' }, { _id: ObjectId('66b4f9031669d5840e2202db'), id: 'Q2', label: 'C++' }, { _id: ObjectId('66b4f9031669d5840e2202dc'), id: 'Q3', label: 'English' }, { _id: ObjectId('66b4f9031669d5840e2202dd'), id: 'Q4', label: 'Algorithms in C++', statements: { P1: [ { property: 'P1', value: 'Q1' }, { property: 'P1', value: 'Q2' } ], P2: [ { property: 'P2', value: 'Q3' } ] } } ]

I want to get the following result. That is, lookup the field property and value of each item in the array stored in the value of the key-value pairs in the field statements.

I don’t know how to instruct $lookup to iterate through all key-value pairs in the dictionary statements. Then, iterate through all items in the array stored in the value of the key-value pairs of statements and perform $lookup in the fields property and value.

[ { _id: ObjectId('66b4f699c56a647e922202dd'), id: 'Q4', label: 'Algorithms in C++', statements: { P1: [ { property: {'id': 'P1', 'label': 'main subject'} value: {'id': 'Q1', 'label': 'algorithm'} }, { property: {'id': 'P1', 'label': 'main subject'} value: {'id': 'Q2', 'label': 'C++'} } ], P2: [ { property: {'id': 'P2', 'label': 'language'} value: {'id': 'Q3', 'label': 'English'} } ] } } ]

I searched information on the Internet, but I only found examples that showed how to perform $lookup on explicit fields. I didn’t find examples on how to perform $lookup on an arbitrary number of fields.

I managed to do it on my own. Consider the following data:

db.properties.deleteMany({}) db.properties.insertMany([{'_id': 'P1', 'label': 'main subject'}]) db.properties.insertMany([{'_id': 'P2', 'label': 'language'}]) db.properties.insertMany([{'_id': 'P3', 'label': 'instance of'}]) db.items.deleteMany({}); db.items.insertMany([ {'_id': 'Q1', 'label': 'algorithm'}, {'_id': 'Q2', 'label': 'C++'}, {'_id': 'Q3', 'label': 'English'}, {'_id': 'Q4', 'label': 'written work'}, { '_id': 'Q5', 'label': 'Algorithms in C++', 'statements': { 'P1': [{'value': 'Q1'}, {'value': 'Q2'}], 'P2': [{'value': 'Q3'}], 'P3': [{'value': 'Q4'}] } }, ]);

This is the query I came up with:

printjson(db.items.aggregate([ // We use objectToArray so that afterwards we can unwind $statements. { $project: { statements: { $objectToArray: '$statements' } } }, // We $unwind $statements so that we have an object for each // different property. { $unwind: "$statements" }, // We $unwind $statements.v so that we have an object for each // different pair of property and value. { $unwind: "$statements.v" }, // lookup properties { $lookup: { 'from': 'properties', 'localField': 'statements.k', 'foreignField': '_id', 'as': 'statements.k' } }, // convert array to object in key "k" { $unwind: "$statements.k" }, // lookup items { $lookup: { 'from': 'items', 'localField': 'statements.v.value', 'foreignField': '_id', 'as': 'statements.v' } }, // convert array to object in key "v" { $unwind: "$statements.v" }, // group by item id and property id { $group: { "_id": { "_id": "$_id", "property_id": "$statements.k._id" }, "statements": { "$push": { "property": "$statements.k", "value": "$statements.v" } } } }, // group by property id { $group: { "_id": "$_id._id", "statements": { "$push": { k: "$_id.property_id", v: "$statements" } } } }, // make statements be an object { $project: { "statements": { $arrayToObject: "$statements" } } }, ]))

output:

[ { _id: 'Q5', statements: { P1: [ { property: { _id: 'P1', label: 'main subject' }, value: { _id: 'Q1', label: 'algorithm' } }, { property: { _id: 'P1', label: 'main subject' }, value: { _id: 'Q2', label: 'C++' } } ], P2: [ { property: { _id: 'P2', label: 'language' }, value: { _id: 'Q3', label: 'English' } } ], P3: [ { property: { _id: 'P3', label: 'instance of' }, value: { _id: 'Q4', label: 'written work' } } ] } } ]

This is the first time I write an aggregate query that involves more than 2 operations, so I wonder if there’s a shorter way to get the same output.