Hello MongoDB Community,
I’m currently using MongoDB 7.0 and developing with Golang 1.21. I’ve encountered an issue with cursor behavior and would greatly appreciate your insights.
Here’s the situation:
When querying a large dataset that requires multiple batches (triggering getMore), I’ve observed that documents inserted before a getMore operation may sometimes be included in subsequent results, but not always, leading to inconsistent behavior.
Context:
- Each document in my dataset has a uid field, which is a UUID string.
- New documents are inserted with randomly generated UUIDs.
- The uid field has an index.
- If I force the query to use this index (via hint) or sort by uid, I encounter the issue where newly inserted documents can appear in the query results.
- However, if I sort by a different field that has no index, newly inserted documents do not appear in the results, even when the query still uses the uid index.
Experimental Observations:
For the cases where newly inserted documents do appear, I conducted the following experiment:
- Query results are sorted by uid.
- Before triggering a getMore operation, I insert new documents.
- Assume the last uid in the current batch is 8aa71ab2-034b-40a8-b9d7-9889ee37665b.
- Inserted documents have the following uid values:
  - 5aa71ab2-034b-40a8-b9d7-9889ee37665b
  - 6aa71ab2-034b-40a8-b9d7-9889ee37665b
  - 9aa71ab2-034b-40a8-b9d7-9889ee37665b
  - aaa71ab2-034b-40a8-b9d7-9889ee37665b
- Only documents 9aa71ab2-034b-40a8-b9d7-9889ee37665b and aaa71ab2-034b-40a8-b9d7-9889ee37665b were included in the results of the subsequent getMore.
- Documents 5aa71ab2-034b-40a8-b9d7-9889ee37665b and 6aa71ab2-034b-40a8-b9d7-9889ee37665b were skipped.
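The pattern in this experiment is what one would expect if the cursor simply resumed a forward scan of the uid index from the last key it returned. The sketch below is a toy model of that idea in plain Go, not the server's actual implementation: the `index` type, `nextBatch`, and `resumeAfterInsert` are hypothetical names, and the pre-existing uids (7aa…, 8aa…, baa…) are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// index is a toy stand-in for an ordered index on uid: a sorted slice of keys.
// Assumption: the real cursor behaves like a forward scan over such an index.
type index struct{ keys []string }

// insert places k at its sorted position, like a new index entry.
func (ix *index) insert(k string) {
	i := sort.SearchStrings(ix.keys, k)
	ix.keys = append(ix.keys, "")
	copy(ix.keys[i+1:], ix.keys[i:])
	ix.keys[i] = k
}

// nextBatch returns up to n keys strictly greater than last, in index order.
// An empty last means "start from the beginning".
func (ix *index) nextBatch(last string, n int) []string {
	i := sort.Search(len(ix.keys), func(j int) bool { return ix.keys[j] > last })
	end := i + n
	if end > len(ix.keys) {
		end = len(ix.keys)
	}
	return append([]string(nil), ix.keys[i:end]...)
}

// resumeAfterInsert replays the experiment: read a first batch, insert four
// documents, then read the next batch from the saved position.
func resumeAfterInsert() []string {
	const suffix = "aa71ab2-034b-40a8-b9d7-9889ee37665b"
	ix := &index{}
	for _, p := range []string{"7", "8", "b"} { // hypothetical existing documents
		ix.insert(p + suffix)
	}

	first := ix.nextBatch("", 2)  // first batch ends at 8aa71ab2-...
	last := first[len(first)-1]   // position the cursor would resume from

	for _, p := range []string{"5", "6", "9", "a"} { // inserts before "getMore"
		ix.insert(p + suffix)
	}
	return ix.nextBatch(last, 10) // simulated getMore
}

func main() {
	for _, uid := range resumeAfterInsert() {
		fmt.Println(uid)
	}
}
```

In this model the keys 5aa… and 6aa… land behind the saved position and are never visited, while 9aa… and aaa… land ahead of it and are returned together with the remaining original document.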
Questions:
What is causing this behavior?
I’ve referred to the official documentation regarding cursor iteration and concurrent updates and read isolation.
The documentation mentions that updates to unique indexes can lead to duplicate results but doesn’t explicitly address whether newly inserted documents or deleted documents during cursor iteration can be included or skipped.
How can I prevent the cursor from including newly inserted documents during iteration?
Would adding a query condition such as "_id" < {start_timestamp} (converting a timestamp to ObjectID with second-level precision) be a reliable approach?
Or would using Read Concern “snapshot” with atClusterTime help in this case?
I am concerned about potential overhead, as the query may handle 20M+ documents, and I’m worried about excessive memory usage or pressure on WiredTiger’s cache.
Thank you in advance for your guidance! Any advice or insights would be greatly appreciated.
Best regards.