I’m asking for your help MongoDB experts! First of all, thank you for taking the time to read this.
I encountered the following scenario: I’m doing an aggregation for the MasterCollection collection. I’m “joining” this collection with other 9 collections in the aggregation.
In the end, I’m merging everything into the same MasterCollection. The aggregation execution time took 30 minutes, which is not acceptable. We have a single MongoDb instance with 16GB RAM and we are running it in a docker container.
The MasterCollection has 1015787 documents. The average document size is 1.8kB for the MasterCollection. Additional stats for the collections:
Collection name Number of documents Avg Doc size
collection 1016878 40B
collection2 0 0B
collection3 232 94B
collection4 10289 97B
collection5 10289 97B
collection6 1747 102B
collection 1326 103B
collection8 1016878 42B
collection9 1016878 58B
Compound indexes are created for the fields that are used in the lookups.
My aggregation looks like this:
MasterCollection.aggregate([
{
$project: {
_id: 1,
field1: 1,
field2: 1,
field3: 1,
},
},
{
$lookup: {
from: 'collection1',
localField: '_id',
foreignField: '_id',
as: 'collection1',
},
},
{
$lookup: {
from: 'collection8',
localField: '_id',
foreignField: '_id',
as: 'collection8',
},
},
{
$lookup: {
from: 'collection9',
localField: '_id',
foreignField: '_id',
as: 'collection9',
},
},
{
$lookup: {
from: 'collection2',
let: {
field1Id: '$field1',
field2Id: '$field2',
},
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ['$_id.field1', '$$field1Id'] },
{ $eq: ['$_id.field2', '$$field2Id'] },
],
},
},
},
{
$project: {
_id: 0,
fieldFromCollection2: 1,
},
},
],
as: 'collection2',
},
},
{
$lookup: {
from: 'colelction3',
let: {
field1Id: '$field1',
field2Id: '$field2',
},
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ['$_id.field1', '$$field1Id'] },
{ $eq: ['$_id.field2', '$$field2Id'] },
],
},
},
},
{
$project: {
_id: 0,
fieldFromCollection3: 1,
},
},
],
as: 'colelction3',
},
},
{
$lookup: {
from: 'collection4',
let: {
field1Id: '$field1',
field2Id: '$field2',
},
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ['$_id.field1', '$$field1Id'] },
{ $eq: ['$_id.field2', '$$field2Id'] },
],
},
},
},
{
$project: {
_id: 0,
fieldFromCollection4: 1,
},
},
],
as: 'collection4',
},
},
{
$lookup: {
from: 'collection5',
let: {
field1Id: '$field1',
field2Id: '$field2',
},
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ['$_id.field1', '$$field1Id'] },
{ $eq: ['$_id.field2', '$$field2Id'] },
],
},
},
},
{
$project: {
_id: 0,
fieldFromCollection5: 1,
},
},
],
as: 'collection5',
},
},
{
$lookup: {
from: 'collection6',
let: {
field1Id: '$field1',
field3Id: '$field3',
},
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ['$_id.field1', '$$field1Id'] },
{ $eq: ['$_id.field3', '$$field3Id'] },
],
},
},
},
{
$project: {
_id: 0,
fieldFromCollection6: 1,
},
},
],
as: 'collection6',
},
},
{
$lookup: {
from: 'collection7',
let: {
field1Id: '$field1',
field2Id: '$field2',
},
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ['$_id.field1', '$$field1Id'] },
{ $eq: ['$_id.field2', '$$field2Id'] },
],
},
},
},
{
$project: {
_id: 0,
fieldFromCollection7: 1,
},
},
],
as: 'collection7',
},
},
{
$unwind: // from each collection
},
{
$project: {
_id: 1,
// project from each collection
},
},
{
$merge: {
into: 'MasterCollection',
on: '_id',
whenMatched: 'merge',
whenNotMatched: 'discard',
},
},
], { allowDiskUse: true })