7 / 7
Nov 2024

I am using mongodb in nodejs. I want to early select only 50 Documents per pallet number inside my pipeline first stage. How can I do it? I want to do this because given a pallet number it can have 1 to 1000 serial numbers associalted with it.

My Document schema and pipeline looks like below

{ PalletNumber: string // This might repeat SerialNumber: string and unique } const documents = await collection.aggregate([{ $match:{ // I want to early select only 50 Documents per unique pallet number PalletNumber:{ $in: PalletNumbers} }, { $group: {_id: "PalletNumber", SerialNumbers: {$push:"$SerialNumbers"}} } }])

example of documents
{
_id: A
PalletNumber: SS // This remains same
SerialNumber: 01 // this can range from 1 to 1000
},
{
_id: A
PalletNumber: SS1 // This remains same
SerialNumber: 01 // this can range from 1 to 1000
},
{
_id: A
PalletNumber: SS2 // This remains same
SerialNumber: 01 // this can range from 1 to 1000
}

Couple of questions…

Why do you want to do this, is it to get it running faster as it currently runs slow?
How many documents / big is the collection
How do you define when to include a Serial Number? You’re not sorting so is it the first 50 or random 50?

How about something like a $lookup onto itself with a pipeline limiting the matches?

I am going to answer your questions

Why do you want to do this, is it to get it running faster as it currently runs slow?

I come from sql world. Honestly it is not running slow as of now. I just want to reduce the number of reads.

$match:{ // I want to early select only 50 Documents per unique pallet number PalletNumber:{ $in: ["ABC1", "ABC2"]} }, saw this inside my monogdb nodejs query planner and this is just two pallet numbers { in: 6720 // I want to reduce this number that's also out:6720 }

As the number of pallets grow this is going to increase.

How do you define when to include a Serial Number? You’re not sorting so is it the first 50 or random 50

But in reality all i care about it Getting first 50 or 50 random pallets for each pallet number inside my match. Considering pallet numbers are always unique.

How about something like a $lookup onto itself with a pipeline limiting the matches?
So for $lookup it seems I need to create two Collection one for Pallet and one for Serial Numbers ? As of now all of this field is combined into one single Collection like below

{
_id: Objectid
PalletNumber: string // This might repeat
SerialNumber: string and unique
}

You can join a collection onto itself, something like this:

var insertData = []; for (var x=0; x<100;x++ ){ var palletNumber = 'P' + Math.floor(Math.random() * 10000); for (var y=0;y<1000 ; y++ ){ insertData.push({ PalletNumber: palletNumber, SerialNumber: y }) } } db.getCollection("Demo").insertMany(insertData) db.getCollection("Demo").aggregate([ { $match:{ 'PalletNumber':{$in:['P3072']} } }, { $group:{ _id:'$PalletNumber' } }, { $lookup:{ from:'Demo', let:{ pallet_palletID:'$_id' }, pipeline:[ { $match:{ $expr:{ $eq:[ '$$pallet_palletID', '$PalletNumber' ] } } }, { $limit:50 } ], as:'lookuips' } } ])

I’ve left out indexes above as the data is so small but you can setup an index so that the pallet grouping etc is supplied from a covered index so that query can be read directly from the index / ram for performance and then the lookup is again handled by the index so fast.

Thanks for you help.

No problem! Hopefully this points you in a direction that works.