IXSCAN with very large keysExamined value

Mark_Barinstein · November 26, 2024, 5:20am

Hello.
Mongodb 4.4.12 Community.

I try to understand very strange performance characteristics of the query like below. There is a unique index on the myID field of the mycoll collection having quite a large number of documents (XX millions).

db.mycoll.aggregate([
  { $match: {
      $and: [
        { 'myId': { $in: [~ 1000 constants] } } 
      ]
    }
  }
])

I may see very different performance values for queries with the same queryHash and planCacheKey pair and slightly different number of ids in $in (but still ~1000). For example:

{
  attr: {
    keysExamined: 8 000,
    docsExamined: 4 000,
    storage: {
      data: {
        bytesRead: 500 000 000
      }
    },
    durationMillis: 1 000
  }
}

{
  attr: {
    keysExamined: 350 000,
    docsExamined: 180 000,
    storage: {
      data: {
        bytesRead: 13 000 000 000
      }
    },
    durationMillis: 50 000
  }
}

It doesn’t depend on if a query was replanned or not.
Query plans look the same in both cases (w/ and w/o $and in $match) like below.

...
        "optimizedPipeline" : true,
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "myId" : 1
                },
                "indexName" : "myId",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "myId" : [ ]
                },
                "isUnique" : true,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "myId" : [
                        "[\"id1\", \"id1\"]",
                        ...,
                        "[\"idX\", \"idX\"]"
                    ]
                }
...

Questions:

What would be a reason for such a huge value in keysExamined (even in the better case) when not more than 1000 ids only passed to the query to check against a unique index?
Is there any chance, that the reason is - unnecessary $and inside $match, and we must run the following query instead to get better performance?

db.mycoll.aggregate([
  { $match: 
    { 'myId': { $in: [~ 1000 constants] } } 
  }
])

Could be the reason of such a huge keysExamined value some parallel write activity into the collection?

Regards,
Mark

steevej · December 2, 2024, 2:48am

Starting with a $documents stage like explain here might provide a different result.

Basically, your 1000 constants are the _id of the document of $documents stage and they you $lookup in mycoll to get the matching documents. A $lookup with localField and foreignField should be quite efficient with the index on myId.

Mark_Barinstein · December 2, 2024, 2:27pm

There is no $documents aggregation stage in MongoDB v4.4.
It was introduced in v6.0 only.
$documents (aggregation)

steevej · December 3, 2024, 1:52am

Sorry, I missed that part of your post despite being the first sentence. Ooops.