Most likely the $group stage is part of the problem. It blocks until all documents are processed and potentially uses a lot of memory before the first resulting document comes out.
I have had success moving the expensive computation (the last_event code) out of the $group and into the pipeline of a $lookup added after the $group. Something like:
{ "$group": {
"stage_id" : "$stage_id" , "up_value" : "$up_value"
} } ,
{ "$lookup": {
"from": "collection" ,
"as": "last_event" ,
"let": {
"stage_id" : "$stage_id" ,
"up_value" : "$up_value"
} ,
"pipeline": [
{ "$match" : { timestamp, stage_id, uq_value, ... } } ,
{ "$sort" : { timestamp : -1 } } ,
{ "$limit" : 1 }
]
} }
The theory is that the $group now uses a lot less memory.
You also use $$ROOT in your $group, so all documents of a given group must be fetched. With the $lookup, only one document per group is fetched, provided an index covers the $match and $sort.
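As a sketch, assuming the collection and field names from the snippet above, an index like this should let the $match, $sort, and $limit walk straight to the last event of each group:

    db.collection.createIndex( { "stage_id": 1, "up_value": 1, "timestamp": -1 } )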
I think it could be improved even further by doing the $group in two steps. The first step is a $group on stage_id alone, followed by a $lookup that matches on stage_id and timestamp; inside this $lookup you have another $group, this time on up_value. With this you need a second $lookup that would look a lot like the one I shared above. A rough sketch follows.
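This is only a sketch of the two-step idea, using the same assumed collection and field names; the timestamp predicate stands in for whatever your original $match uses:

    { "$group": { "_id": { "stage_id": "$stage_id" } } },
    { "$lookup": {
        "from": "collection",
        "as": "up_values",
        "let": { "stage_id": "$_id.stage_id" },
        "pipeline": [
            { "$match": { "$expr": { "$eq": [ "$stage_id", "$$stage_id" ] } } },
            // plus the timestamp condition from the original query
            { "$group": { "_id": { "up_value": "$up_value" } } }
        ]
    } },
    { "$unwind": "$up_values" }
    // then a second $lookup, much like the first one, fetches the
    // last event for each (stage_id, up_value) pair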
Another thing I would try is to use $documents as the first stage rather than a $match with $in. The first $group would then be instantaneous, since the key values are supplied directly rather than computed from the collection.
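A minimal sketch of that, assuming the key values are known up front (note that $documents requires MongoDB 5.1+ and, as the first stage, must be run with db.aggregate() rather than db.collection.aggregate()):

    db.aggregate( [
        { "$documents": [
            { "stage_id": 1, "up_value": "a" },
            { "stage_id": 2, "up_value": "b" }
        ] }
        // followed by the same last_event $lookup as in the first snippet
    ] )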
To give a more detailed answer I would need a playground with sample documents to work with. Time is scarce, so I try to avoid creating documents that others already have.