`mongosync` Behavior

The mongosync binary is the primary process used in Cluster-to-Cluster Sync. mongosync migrates data from one cluster to another and can keep the clusters in continuous sync.

For an overview of the mongosync process, see About mongosync.

To get started with mongosync, refer to the Quick Start Guide.

For more detailed information, refer to the Installation or Connecting mongosync page that best fits your situation.

Embedded Verifier Disclaimer

Starting in 1.9, mongosync includes an embedded verifier to perform a series of verification checks on all supported collections on the destination cluster to confirm that it was successful in transferring documents from the source cluster to the destination.

When you start the mongosync process, it provides the following disclaimer:

Embedded verification is enabled by default. Verification checks for data
consistency between the source and destination clusters. Verification will
cause mongosync to fail if any inconsistencies are detected, but it does not
check for all possible data inconsistencies. Please see the documentation at
https://www.mongodb.com/docs/cluster-to-cluster-sync/current/reference/verification/embedded
for more details. Verification requires approximately 0.5 GB of memory per 1
million documents on the source cluster and will fail if insufficient memory
is available. Accepting this disclaimer indicates that you understand the
limitations and memory requirements for this tool. To skip this disclaimer
prompt, use –-acceptDisclaimer.
To disable the embedded verifier, specify 'verification: false' when starting
mongosync. Please see https://www.mongodb.com/docs/cluster-to-cluster-sync/current/reference/verification/
for alternative verification methods.
Do you want to continue? (y/n):

If you have already read and accepted the disclaimer, you can start mongosync with the --acceptDisclaimer option to skip this notification.

Settings

Cluster Independence

mongosync syncs collection data between a source cluster and destination cluster. mongosync does not synchronize users or roles. As a result, you can create users with different access permissions on each cluster.

Configuration File

Options for mongosync can be set in a YAML configuration file. Use the --config option. For example:

$ mongosync --config /etc/mongosync.conf

For information on available settings, see Configuration.

Cluster and Collection Types

Sharded Clusters

Cluster-to-Cluster Sync supports replication between sharded clusters. mongosync replicates individual shards in parallel from the source cluster to the destination cluster. However mongosync does not preserve the source cluster's sharding configuration.

Important

When the source or destination cluster is a sharded cluster, you must stop the balancer on both clusters and not run the moveChunk or moveRange commands for the duration of the migration. To stop the balancer, run the balancerStop command and wait for the command to complete.

Pre-Split Chunks

When mongosync syncs to a sharded destination cluster, it pre-splits chunks for sharded collections on the destination cluster. For each sharded collection, mongosync creates twice as many chunks as there are shards in the destination cluster.

Chunk Distribution

mongosync does not preserve chunk distribution from the source to the destination, even with multiple mongosync instances. It is not possible to reproduce a particular pre-split of chunks from a source cluster on the destination cluster.

The only sharding configuration that mongosync preserves from the source cluster to the destination cluster is the sharding key. Once the migration finishes, you can enable the destination cluster's balancer which distributes documents independently of the source cluster's distribution.

Primary Shards

When you sync to a sharded destination cluster, mongosync assigns a primary shard to each database by means of a round-robin.

Warning

Running movePrimary on the source or desintation cluster during migration may result in a fatal error or require you to restart the migration from the start. For more information, see Sharded Clusters.

Multiple Clusters

To sync a source cluster to multiple destination clusters, use one mongosync instance for each destination cluster. For more information, see Multiple Clusters Limitations.

Capped Collections

Starting in 1.3.0, Cluster-to-Cluster Sync supports capped collections with some limitations.

convertToCapped is not supported. If you run convertToCapped, mongosync exits with an error.
cloneCollectionAsCapped is not supported.

Capped collections on the source cluster work normally during sync.

Capped collections on the destination cluster have temporary changes during sync:

There is no maximum number of documents.
The maximum collection size is 1PB.

mongosync restores the original values for maximum number of documents and maximum document size during commit.

Reads and Writes

Write Blocking

By default, mongosync enables destination-only write-blocking on the destination cluster. mongosync unblocks writes right before the /progress endpoint reports that canWrite is true. You can explicitly enable destination-only write-blocking by using the /start endpoint to set enableUserWriteBlocking to "destinationOnly".

You can enable dual write-blocking. If you enable dual write-blocking, mongosync blocks writes:

On the destination cluster during the migration. mongosync unblocks writes right before it sets canWrite to true
On the source cluster after you call /commit

To enable dual write-blocking, use /start to set enableUserWriteBlocking to "sourceAndDestination".

You can use /start to set enableUserWriteBlocking to "none".

You cannot enable dual write-blocking or disable write-blocking after the sync starts.

If you want to use reverse synchronization later, you must enable dual write-blocking when you start mongosync.

User Permissions

To set enableUserWriteBlocking, the mongosync user must have a role that includes the setUserWriteBlockMode and bypassWriteBlockingMode ActionTypes.

Note

When using enableUserWriteBlocking, writes are only blocked for users that do not have the bypassWriteBlockingMode ActionType. Users who have this ActionType are able to perform writes.

Permissible Reads

Read operations on the source cluster are always permitted.

When the /progress endpoint reports canWrite is true, the data on the source and destination clusters is consistent.

Permissible Writes

To see what state mongosync is in, call the /progress API endpoint. The /progress output includes a boolean value, canWrite.

When canWrite is true, it is safe to write to the destination cluster.
When canWrite is false, do not write to the destination cluster.

You can safely write to the source cluster while mongosync is syncing. Do not write to the destination cluster unless canWrite is true.

Read and Write Concern

By default, mongosync sets the read concern level to "majority" for reads on the source cluster. For writes on the destination cluster, mongosync sets the write concern level to "majority" with j: true.

For more information on read and write concern configuration and behavior, see Read Concern and Write Concern.

Read Preference

mongosync requires the primary read preference when connecting to the source and destination clusters. For more information, see Read Preference Options.

Legacy Index Handling

mongosync rewrites legacy index values, like 0 or an empty string, to 1 on the destination. mongosync also removes any invalid index options on the destination.

Considerations for Continuous Sync

For any continuous synchronization use cases with mongosync, ensure that mongosync commits before cutting over from the source to the destination.

If the source cluster shuts down before mongosync can commit, such as in a disaster scenario, the destination cluster might not have a consistent snapshot of the source data. To learn more, see Consistency.

Note

After commit, you can't resume continuous sync between two clusters since mongosync can only sync into empty destination clusters. If you need to use the same two clusters after cutover, you can call the reverse endpoint to keep the clusters in sync. Otherwise, start a new continuous sync operation by using a new empty destination cluster.

Temporary Changes to Collection Characteristics

mongosync temporarily alters the following collection characteristics during synchronization. The original values are restored during the commit process.

Change	Description
Unique Indexes	Unique indexes on the source cluster are synced as non-unique indexes on the destination cluster.
TTL Indexes	Synchronization sets `expireAfterSeconds` to the value of `MAX_INT` on the destination cluster.
Hidden Indexes	Synchronization replicates hidden indexes as non-hidden.
Write Blocking	If you enable dual write-blocking, `mongosync` blocks writes: On the destination cluster during sync. On the source cluster when `commit` is received. `mongosync` enables destination-only write-blocking by default. To learn more, see Write Blocking.
Capped Collections	Synchronization sets capped collections to the maximum allowable size.
Dummy Indexes	In some cases, synchronization may create dummy indexes on the destination to support writes on sharded or collated collections.

Rolling Index Builds

mongosync does not support rolling index builds during migration. To avoid building indexes in a rolling fashion during migration, use one of the following methods to ensure that your destination indexes match your source indexes:

Build the index on the source before migration.
Build the index on the source during migration with a default index build.
Build the index on the destination after migration.

Destination Clusters

Consistency

mongosync supports eventual consistency on the destination cluster. Read consistency is not guaranteed on the destination cluster until commit. Before committing, the source and destination clusters may differ at a given point in time. To learn more, see Considerations for Continuous Sync.

While mongosync is syncing, mongosync may reorder or combine writes as it relays them from source to destination. For a given document, the total number of writes may differ between source and destination.

Transactions might not appear atomically on the destination cluster. Retryable writes may not be retryable on the destination cluster.

Profiling

If profiling is enabled on a source database, MongoDB creates a special collection named <db>.system.profile. After synchronization is complete, Cluster-to-Cluster Sync will not drop the <db>.system.profile collection from the destination even if the source database is dropped at a later time. The <db>.system.profile collection will not change the accuracy of user data on the destination.

Views

If a database with views is dropped on the source, the destination may show an empty system.views collection in that database. The empty system.views collection will not change the accuracy of user data on the destination.

System Collections

Cluster-to-Cluster Sync does not replicate system collections to the destination cluster.

If you issue a dropDatabase command on the source cluster, this change is not directly applied on the destination cluster. Instead, Cluster-to-Cluster Sync drops user collections and views in the database on the destination cluster, but it does not drop system collections on that database.

For example, on the destination cluster:

The drop operation does not affect a user-created system.js collection.
If you enable profiling, the system.profile collection remains.
If you create views on the source cluster and then drop the database, replicating the drop removes the views, but leaves an empty system.views collection.

In these cases, the replication of dropDatabase removes all user-created collections from the database, but leaves its system collections on the destination cluster.

UUIDs

mongosync creates collections with new UUIDs on the destination cluster. There is no relationship between UUIDs on the source cluster and the destination cluster. If applications contain hard-coded UUIDs (which MongoDB does not recommend), you may need to update those applications before they work properly with the migrated cluster.

Sorting

mongosync inserts documents on the destination cluster in an undefined order which does not preserve natural sort order from the source cluster. If applications depend on document order but don't have a defined sort method, you may need to update those applications to specify the expected sort order before the applications work properly with the migrated cluster.

Performance

Resilience

mongosync is resilient and able to handle non-fatal errors. Logs that contain the word "error" or "failure" do not indicate that mongosync is failing or corrupting data. For example, if a network error occurs, the mongosync log may contain the word "error' but mongosync is still able to complete the sync. In the case that a sync does not complete, mongosync writes a fatal log entry.

Data Definition Language (DDL) Operations

Using DDL operations (operations that act on collections or databases such as db.createCollection() and db.dropDatabase()) during sync increase the risk of migration failure and may negatively impact mongosync performance. For best performance, refrain from performing DDL operations on the source cluster while the sync is in progress.

For more information on DDL operations, see Pending DDL Operations and Transactions.

Network Latency

Network latency or long physical distances between migration components can negatively affect sync speed.

Latency between mongosync and destination shards: For each operation on the source cluster, mongosync does two roundtrips to the destination server. The larger the latency, the slower the sync.
Latency between destination shards: mongosync runs operations and updates its own metadata in batches in a transaction on the destination cluster. This can result in cross-shard transactions, which may be more costly if the shards are far apart.
Latency between the nodes of any replica set on the source or destination cluster: mongosync uses "majority" writes and "majority" reads, which require acknowledgement from multiple nodes in a replica set, including shard-backing replica sets. If the majority of these nodes aren't in the same region, there will be negative performance implications.

Interruptions During Sync

The following considerations pertain to interruptions during the mongosync process.

Errors and Crashes

If mongosync encounters an error or becomes unavailable during synchronization, or you can resume your mongosync operation from where it stopped. The mongosync binary is stateless and stores the metadata for a restart on the destination cluster.

To continue sync, restart mongosync once it becomes available again and use the same parameters as your interupted sync. Once you restart mongosync, the process resumes from where it stopped.

Cluster Availability

If your source or destination cluster crashes unexpectedly, you can safely restart mongosync from where it left off. Once your cluster is available again, restart mongosync and use the same parameters as your interupted sync.

Paused Sync

If mongosync is in the PAUSED state, mongosync does not support the following actions:

Upgrading the MongoDB version of the source or destination cluster
Enabling and then disabling the balancer

You can upgrade mongosync while it is in the PAUSED state.

Learn More

Back

mongosync

Configuration