I am attempting to execute a query to count documents based on the ledgerName field, which is used to search for the required ledger name. Additionally, I need to obtain distinct counts of each status for the searched ledger name and store all data that matches each status into a list.
Below is the demo data for three collections involved in my query:
1. Settings: (Collection Size: 30K documents)
{
_id: // ObjectId of doc
company_id: // ObjectId of company
ledgerName_id: // ID of ledger in String
}
2. History: (Collection Size: 600K documents)
{
_id: // ObjectId of doc
company_id: // ObjectId of company
ledgerName_id: // ID of ledger in String
createdAt: // Date of creation of history
status: // Status 1 or 0
}
3. Ledger_Name: (Collection Size 50M documents)
{
_id: // ObjectId of doc
company_id: // ObjectId of company
name_id: // ID of ledger in String
ledgerName: // Name of ledger in string
}
Here is the query that utilizes indexed scans across the Settings collection and also leverages indices in the other two collections during the lookup stages:
Query:
db.getCollection("Settings").aggregate(
[{
$match:{
"company_id": // Some Company ID
}
},
{
$lookup:{
as: "SettingsData",
from: "Settings",
"let": {
"company_id": "$company_id",
"ledgerName_id": "$ledgerName_id"
},
pipeline:[
{
"$match": {
"$expr": {
"$and": [
{ "$eq": [ "$company_id", "$company_id" ] },
{ "$eq": [ "$ledgerName_id", "$ledgerName_id" ] }
]
},
"createdAt": {
"$gte": ISODate("2024-04-01T00:00:00.000+0530"),
"$lt": ISODate("2025-04-01T00:00:00.000+0530")
},
}
},
{ "$sort": { "createdAt": -1 } }, // Sort by createdAt in descending order
{ "$limit": 1 }, //
{
"$project": {
"_id": 1,
"createdAt": 1,
"status": 1,
}
}
]
}
},
{
"$unwind": {
"path": "$SettingsData",
"preserveNullAndEmptyArrays": true
}
},
{
"$lookup": {
"as": "ledgersData",
"from": "Ledger_Name",
"let": {
"company_id": "$company_id",
"ledgerName_id": "$ledgerName_id"
},
"pipeline": [
{
"$match": {
"$expr": {
"$and": [
{ "$eq": [ "$company_id", "$company_id" ] },
{ "$eq": [ "$name_id", "$ledgerName_id" ] }
]
}
}
},
{
"$project": {
"_id": 0,
"ledgerName": 1 // Search Term
}
}
]
}
},
},
{
"$unwind": {
"path": "$ledgersData",
"preserveNullAndEmptyArrays": true
}
},
{
"$count": "count"
}
When I run this query without any aggregation operations like $count
or $group
, it completes within 500 milliseconds. However, adding these stages extends the execution time to 15-20 seconds. Could anyone help me in optimizing this query?