We have a cluster with a collection of ~1M documents. Each document is also moderately large, at ~1 KB.
I tried using Mongosync to sync this remote cluster to a MongoDB cluster running on my machine. During the collection copy phase, however, the tool frequently gets “stuck”: the sync stays in the collection copy phase and mongod continuously consumes a large amount of CPU, but the mongosync logs do not show any increase in `estimatedCopiedBytes`.
Recently I had to stop the sync during the collection copy phase and start it again after several hours. I noticed that the progress, as measured by `estimatedCopiedBytes`, was still stuck, with mongod still using a lot of CPU. Stopping mongosync did not stop mongod’s CPU usage, so I used mongosh to run `db.currentOp()` to figure out what was going on.
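(Roughly what I ran — the namespace below is a placeholder; `db.currentOp()` accepts a filter document to narrow the output to active operations on one collection:)

```js
// Show only active operations on the collection being copied
// ("mydb.MyCollection" is a placeholder namespace)
db.currentOp({ active: true, ns: "mydb.MyCollection" })
```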
I noticed that mongosync had issued several commands to delete entire `_id` ranges, roughly the following:
```js
db.MyCollection.deleteMany({
  $and: [
    { '$expr': { '$gte': [ '$_id', 'some id' ] } },
    { '$expr': { '$lt': [ '$_id', 'a greater id' ] } }
  ]
})
```
Which is fair, but the deletion itself was also progressing very slowly, at a few hundred documents per second. Mind you, I have a fairly recent machine with a half-decent CPU and a fast NVMe SSD. So I ran an `explain()` on the query, and the results were interesting:
```js
db.MyCollection.find(
  { $and: [
      { '$expr': { '$gte': [ '$_id', 'some id' ] } },
      { '$expr': { '$lt': [ '$_id', 'a greater id' ] } }
  ] },
  { _id: 1 }
).explain()

winningPlan: {
  stage: 'PROJECTION_SIMPLE',
  transformBy: { _id: 1 },
  inputStage: {
    stage: 'FETCH',
    filter: {
      '$and': [
        { '$expr': { '$gte': [ '$_id', { '$const': 'some id' } ] } },
        { '$expr': { '$lt': [ '$_id', { '$const': 'a greater id' } ] } }
      ]
    },
    inputStage: {
      stage: 'IXSCAN',
      keyPattern: { _id: 1 },
      indexName: '_id_',
      isMultiKey: false,
      multiKeyPaths: { _id: [] },
      isUnique: true,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: { _id: [ '["some id", "a greater id")' ] }
    }
  }
}
```
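The execution statistics tell the same story; a quick way to see the per-document work, using the same placeholder bounds (I’d expect `totalDocsExamined` to be non-zero here and zero for the covered form shown further down):

```js
// Ask for execution statistics and check how many documents were fetched
db.MyCollection.find(
  { $and: [
      { '$expr': { '$gte': [ '$_id', 'some id' ] } },
      { '$expr': { '$lt': [ '$_id', 'a greater id' ] } }
  ] },
  { _id: 1 }
).explain('executionStats').executionStats.totalDocsExamined
```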
This does not look efficient. Not only is MongoDB doing a completely unnecessary fetch and filter, it does not even realize that the `_id` index covers the query. Changing the query to an equivalent one without `$expr` fixes the issue:
```js
db.MyCollection.find(
  { $and: [ { _id: { $gte: 'some id' } }, { _id: { $lt: 'a greater id' } } ] },
  { _id: 1 }
).explain()

winningPlan: {
  stage: 'PROJECTION_COVERED',
  transformBy: { _id: 1 },
  inputStage: {
    stage: 'IXSCAN',
    keyPattern: { _id: 1 },
    indexName: '_id_',
    isMultiKey: false,
    multiKeyPaths: { _id: [] },
    isUnique: true,
    isSparse: false,
    isPartial: false,
    indexVersion: 2,
    direction: 'forward',
    indexBounds: { _id: [ '["some id", "a greater id")' ] }
  }
}
```
That looks much better, and is how I’d expect MongoDB to run the query: a bounded index scan over the `_id` index feeding a covered projection, with no fetch to be seen.
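If mongosync’s range deletes were expressed the same way, I’d expect them to avoid the per-document `$expr` evaluation as well. For comparison, a sketch of the equivalent delete without `$expr` (same placeholder bounds as above; I haven’t benchmarked this against the original form):

```js
// Equivalent _id range delete expressed with plain comparison operators,
// so the filter can be satisfied by the _id index bounds alone
db.MyCollection.deleteMany({ _id: { $gte: 'some id', $lt: 'a greater id' } })
```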
So I guess my question is two-fold:

- Is this a known limitation of the query planner with `$expr` queries? We are using MongoDB 6.0.
- Why is mongosync deleting documents in the first place? And how can I avoid it getting stuck?