CRITICAL ALERT

Atlas, Ops Manager, and Cloud Manager "rolling index builds" can introduce index inconsistencies

Issue Description and Impact

An issue was identified with the "rolling index build" feature of MongoDB Atlas, Cloud Manager, and Ops Manager under any of the following version combinations:

  • Atlas running MongoDB versions 4.2.19+, 4.4.13+, 5.0.6+ (including rapid release versions 5.1-5.3), and 6.0.0+. A fix has been released on Atlas, but clusters may have been impacted in the past.
  • Cloud Manager running MongoDB Agent versions 11.13.0.7438-1 through 12.4.0.7702-1, with MongoDB versions 4.2.19+, 4.4.13+, 5.0.6+, and 6.0.0+.
  • Ops Manager versions 5.0.10-5.0.14 and 6.0.0-6.0.2 running MongoDB versions 4.2.19+, 4.4.13+, 5.0.6+, and 6.0.0+.
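
As a rough illustration, the version ranges above can be expressed as a small check. This is a minimal Python sketch, not an official tool: the function names are hypothetical, pre-release or build suffixes are ignored, "6.0.0+" is assumed to cover every later release, and the 5.1-5.3 rapid releases (listed above for Atlas only) are treated as affected everywhere.

```python
# Lowest affected patch release per release series, taken from the list above.
MIN_AFFECTED_PATCH = {(4, 2): 19, (4, 4): 13, (5, 0): 6}


def parse_version(text):
    """Turn '5.0.6' or '5.0.6-ent' into (5, 0, 6); missing parts become 0."""
    core = text.split("-")[0]
    parts = [int(p) for p in core.split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def server_version_affected(version):
    """True if a MongoDB server version falls in one of the listed ranges."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= (5, 1):
        # Assumption: 5.1-5.3 and everything from 6.0.0 onward is affected.
        return True
    minimum = MIN_AFFECTED_PATCH.get((major, minor))
    return minimum is not None and patch >= minimum


def ops_manager_affected(version):
    """True for Ops Manager 5.0.10-5.0.14 and 6.0.0-6.0.2 (inclusive)."""
    v = parse_version(version)
    return (5, 0, 10) <= v <= (5, 0, 14) or (6, 0, 0) <= v <= (6, 0, 2)
```

A deployment matches an affected combination only when both the management component and the server version are in range, for example `ops_manager_affected("5.0.12") and server_version_affected("4.4.15")`.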

In these combinations, new indexes created with the "rolling index build" feature are vulnerable to index key inconsistencies during a subsequent replication rollback or server crash.

Each affected index includes "log=(enabled=true)" in the indexDetails.<index name>.creationString field of the collStats command output. For collections outside the local database, this setting is not compatible with normal replica set operations.
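
The check described above can be sketched in Python. This is an illustrative example under stated assumptions, not MongoDB's own tooling: the helper names are hypothetical, `db` is assumed to be a PyMongo `Database` object connected directly to a single replica set member, and the raw collStats command is assumed to return the indexDetails section for the node's storage engine. Because a rolling build creates the index separately on each member, the scan would need to be repeated on every node.

```python
# Marker named in the alert; affected indexes carry it in creationString.
LOGGING_MARKER = "log=(enabled=true)"


def logged_indexes(coll_stats):
    """Return the names of indexes whose creationString contains the marker.

    coll_stats is the document returned by the collStats command.
    """
    details = coll_stats.get("indexDetails", {})
    return sorted(
        name
        for name, info in details.items()
        if LOGGING_MARKER in info.get("creationString", "")
    )


def scan_database(db):
    """Map collection name -> suspicious index names for one database.

    db is assumed to behave like a PyMongo Database (hypothetical usage).
    The local database is skipped because the setting is expected there.
    """
    if db.name == "local":
        return {}
    findings = {}
    # Restrict to real collections; collStats is not supported on views.
    for coll in db.list_collection_names(filter={"type": "collection"}):
        hits = logged_indexes(db.command("collStats", coll))
        if hits:
            findings[coll] = hits
    return findings
```

An empty result from `scan_database` on every member suggests that no index in that database carries the setting; any index that is reported is a candidate for the steps in the Remediation section.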

As a result of this setting, indexes can become inconsistent when either:

  • A secondary replica set member experiences replication rollback, and the window of changes rolled back includes writes that updated the index.
  • A primary replica set member crashes for unrelated reasons, and the journalCommitInterval (default 50ms) includes writes that updated the index.

In these circumstances, MongoDB rolls documents back to the correct version of the data. However, because of the invalid index configuration, affected indexes are not rolled back correctly. This can lead to index inconsistencies in the form of:

  • Extra index entries, which lead to incorrect query results for covered queries.
  • Out-of-date or missing index entries, which lead to incomplete query results for queries using the index.

Note: In many cases, operations issued with writeConcern majority that are retried on failure will immediately correct the index inconsistencies that result from this issue.

Incorrect query results can cause applications to introduce logical data inconsistencies. See the Workarounds and Remediation sections below.

Note: Unique indexes (including the _id index) will not be affected by this issue because unique indexes are not created using the "rolling index build" process.

Workarounds

The workarounds for this issue involve avoiding or correcting the invalid table logging settings on indexes built using the "rolling index build" feature. For information about remediating index inconsistencies introduced during replication rollbacks or crashes, see the Remediation section below.

For Atlas Customers:

A fix was deployed in Atlas on August 22, 2023 that makes "rolling index builds" safe for all MongoDB versions. However, some clusters may have been impacted in the past. MongoDB is performing validations and will notify you directly if any action is necessary.

If you have previously performed a "rolling index build" on Atlas and would like to act immediately, open a support case or start a chat to request assistance confirming that your cluster is not impacted by this issue.

For Cloud and Ops Manager customers:

Cloud Manager and Ops Manager "rolling index builds" are currently affected for the version combinations listed above. Unique indexes (including _id indexes) and indexes built using the documented manual process for building rolling indexes in replica sets are not affected.

MongoDB Agent versions 11.0.20.7105-1+ (for Ops Manager 5.0), 12.0.11.7605-1+ (for Ops Manager 6.0), and 12.4.0.7703-1+ (for Cloud Manager) correct this issue. Avoid building indexes using the "rolling index build" feature until your cluster is managed by a MongoDB Agent running a fixed version.
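
The comparison against the fixed Agent versions can be sketched as follows. This is a minimal Python example under assumptions: the product labels and function names are hypothetical, and the version string is assumed to consist of dot-separated numbers followed by a "-N" build suffix, as in the versions quoted above. Comparisons are only meaningful within one product line.

```python
import re

# First fixed MongoDB Agent version per product, as listed in the alert.
# The dictionary keys are hypothetical labels chosen for this sketch.
FIXED_AGENT = {
    "ops-manager-5.0": "11.0.20.7105-1",
    "ops-manager-6.0": "12.0.11.7605-1",
    "cloud-manager": "12.4.0.7703-1",
}


def agent_key(version):
    """Turn '12.4.0.7703-1' into (12, 4, 0, 7703, 1) for ordered comparison."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def agent_has_fix(product, version):
    """True if the Agent version is at or beyond the fixed version."""
    return agent_key(version) >= agent_key(FIXED_AGENT[product])
```

For example, `agent_has_fix("cloud-manager", "12.4.0.7702-1")` evaluates to False, because that build is the last one in the affected range.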

If an index build is critical and you are unable to upgrade your MongoDB Agent version, build the index using the documented manual process for building rolling indexes in replica sets, which is not affected by this issue. If you have any questions, please open a support case.

Remediation

Index inconsistencies on a node can be remediated by performing a logical initial sync. The workarounds described above prevent the introduction of new inconsistencies but do not correct existing inconsistencies.

For Atlas Customers:

If your cluster is hosted on MongoDB Atlas, we are performing validations and will reach out to you if you need to take action. No action is needed at this time.

For Cloud and Ops Manager customers:

After reviewing the validate command documentation for important considerations, run the validate command on each collection on each node of your replica set. If validate reports any failures, the ideal resolution is to resync the impacted node from another node.
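
The per-node validation pass can be sketched in Python. This is an illustrative example, not an official procedure: the function name is hypothetical, and `db` is assumed to be a PyMongo `Database` obtained from a client connected directly to one replica set member (for example, a `MongoClient` created with `directConnection=True`). The validate command can be resource intensive, so the considerations in its documentation still apply.

```python
def failed_validations(db):
    """Run validate on every collection and collect the failures.

    db is assumed to behave like a PyMongo Database (hypothetical usage).
    Returns a mapping of collection name -> reported errors and the
    indexes that validate marked as invalid.
    """
    failures = {}
    # Restrict to real collections; views cannot be validated.
    for coll in db.list_collection_names(filter={"type": "collection"}):
        result = db.command("validate", coll)
        if not result.get("valid", False):
            failures[coll] = {
                "errors": result.get("errors", []),
                "bad_indexes": sorted(
                    name
                    for name, info in result.get("indexDetails", {}).items()
                    if not info.get("valid", True)
                ),
            }
    return failures
```

Running this against each database on each member and finding an empty result everywhere indicates that validate reported no failures on that node; any non-empty result marks the node as a candidate for resync.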

If possible, consider the indexes that were impacted by inconsistencies and evaluate whether applications could have introduced logical data inconsistencies in response to incorrect query results. Identifying and responding to this possibility requires application-specific knowledge and, potentially, reference to backups.

If you are unsure which indexes were impacted by inconsistencies, you can prevent further impact by doing one of the following:

  • Resyncing the impacted node from another node.
  • Rebuilding indexes in a rolling fashion by following the documented manual process for building rolling indexes in replica sets.
  • (For MongoDB 4.4+) Rebuilding the index with the optimized, non-blocking build process. This may not be suitable for workloads that cannot tolerate the performance decrease caused by index builds.
