3 / 3
Apr 4

I’m trying to do an aggregation match on a couple of field but the index is not used properly. The filed inside the $Or are not used and need to be fetch afterward. Am i missing something?

I have an compound index on
TenantId_1_LifecycleState_1_State_1_Name_1_Username_1_ExternalReferenceId1_1_ExternalReferenceId2_1

If i’m doing this aggregation the index will be used for the TenantId, LifecycleState and state filter but after that it does a fetch for the $or

collection.aggregate([
{
$match: {
$and: [
{ TenantId: “tenant1” },
{ LifecycleState: “Alive” },
{ “State”: “open” },
{
$or: [
{ “Name”: { $regex: “map” } },
{ “Username”: { $regex: “map” } },
{ “ExternalReferenceId1”: { $regex: “map” } },
{ “ExternalReferenceId2”: { $regex: “map” } }
]
}
]
}
}
])

I was thinking that it could be because of the regex but if i change the aggregation it kind of work.
In this second aggregate, it use the index but because of the way it’s done, it does 4 ixscan before merging the result. Which doesn’t seems to be the best thing.

collection.aggregate([
{
$match: {
$or: [
{
$and: [
{ TenantId: “tenant1” },
{ LifecycleState: “Alive” },
{ “State”: “open” },
{ “Name”: { $regex: “map” } }
]
},
{
$and: [
{ TenantId: “tenant1” },
{ LifecycleState: “Alive” },
{ “State”: “open” },
{ “Username”: { $regex: “map” } }
]
},
{
$and: [
{ TenantId: “tenant1” },
{ LifecycleState: “Alive” },
{ “State”: “open” },
{ “ExternalReferenceId1”: { $regex: “map” } }
]
},
{
$and: [
{ TenantId: “tenant1” },
{ LifecycleState: “Alive” },
{ “State”: “open” },
{ “ExternalReferenceId2”: { $regex: “map” } }
]
}
]
}
}
])

Creating 4 index won’t work with the first aggregation, it will use the first one with name and than fetch the result. So this is the same result from using the index i sent.
For the second one it’s already working, so creating 4 index that in total will take more memory than just one that include them all doesn’t seems to make sense.

Verified, indeed it is. I was thinking of index usage of sort which requires fields to be in the same order and direction; or only uses it partially if not all left-most fields are used.

TIL for fields which aren’t part of the check, it uses indexBounds MinKey, MaxKey and then uses actual values for other fields. Like here for “TenantId AND LifecycleState AND State AND ExternalReferenceId1” (skipping Name and Username):

{ "stage": "IXSCAN", "filter": { "ExternalReferenceId1": { "$regex": "map" } }, // ... "indexName": "TenantId_1_LifecycleState_1_State_1_Name_1_Username_1_ExternalReferenceId1_1_ExternalReferenceId2_1", "indexBounds": { "TenantId": [ "[\"tenant1\", \"tenant1\"]" ], "LifecycleState": [ "[\"Alive\", \"Alive\"]" ], "State": ["[\"open\", \"open\"]"], "Name": ["[MinKey, MaxKey]"], "Username": ["[MinKey, MaxKey]"], "ExternalReferenceId1": [ "[\"\", {})", "[/map/, /map/]" ], "ExternalReferenceId2": [ "[MinKey, MaxKey]" ] } }