
Filtered Sync

On this page

  • Filter Syntax
  • Configure a Filter
  • Replace an Existing Filter
  • Adding and Renaming Collections
  • Filtering with mapReduce and $out
  • Limitations
  • Examples

New in version 1.1.

Cluster-to-Cluster Sync provides continuous data synchronization or a one-time data migration between two MongoDB clusters. You can use filtered sync to specify which databases and collections the mongosync utility transfers between the source and destination clusters.

Starting in 1.1, mongosync supports inclusion filters to specify which databases and collections to include in sync. Starting in 1.6, mongosync also supports exclusion filters and regular expressions.

  • With inclusion filters, mongosync syncs matching databases and collections.

  • With exclusion filters, mongosync syncs all databases and collections, except for those that match the filters.

  • With both inclusion and exclusion filters, mongosync first selects the databases and collections that match the inclusion filters, then excludes any that also match the exclusion filters.

  • With no filters, mongosync syncs all databases and collections.

The start API endpoint accepts two fields that configure filtered sync: includeNamespaces and excludeNamespaces. Each field takes an array of filters that specify the databases and collections to include or exclude from sync.

Note

If the start call uses both includeNamespaces and excludeNamespaces parameters, mongosync first matches databases and collections from the inclusion filters, then excludes those that also match an exclusion filter.

Filters have the following syntax:

"includeNamespaces": [
{
"database": "<database-name>",
"collections": [
"<collection-name>"
],
"databaseRegex": {
"pattern": "<regex-pattern>",
"options": "<options>"
},
"collectionsRegex": {
"pattern": "<regex-pattern>",
"options": "<options>"
}
}
],
"excludeNamespaces": [
{
"database": "<database-name>",
"collections": [
"<collection-name>"
],
"databaseRegex": {
"pattern": "<regex-pattern>",
"options": "<options>"
},
"collectionsRegex": {
"pattern": "<regex-pattern>",
"options": "<options>"
}
}
]

Filters must include either the database field or the databaseRegex field.

If you need the filter to match specific collections, you can use either the collections array to specify collections individually or define a regular expression using the collectionsRegex field.
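
The matching rules described above can be sketched as a small JavaScript model. This is an illustration only, not mongosync's implementation: the function names matchesFilter and shouldSync are hypothetical, and the sketch assumes that JavaScript RegExp flags behave closely enough to the regex options mongosync accepts.

```javascript
// Illustrative model only; not mongosync's actual implementation.
// Returns true if a single filter entry matches the namespace db.coll.
function matchesFilter(filter, db, coll) {
  const dbMatches = filter.databaseRegex
    ? new RegExp(filter.databaseRegex.pattern, filter.databaseRegex.options || "").test(db)
    : filter.database === db;
  if (!dbMatches) return false;
  if (filter.collections) return filter.collections.includes(coll);
  if (filter.collectionsRegex) {
    const { pattern, options } = filter.collectionsRegex;
    return new RegExp(pattern, options || "").test(coll);
  }
  return true; // no collection restriction: the whole database matches
}

// Applies inclusion filters first, then removes anything an exclusion filter matches.
// With no inclusion filters, every namespace is a candidate.
function shouldSync(db, coll, includeNamespaces = [], excludeNamespaces = []) {
  const included =
    includeNamespaces.length === 0 ||
    includeNamespaces.some((f) => matchesFilter(f, db, coll));
  const excluded = excludeNamespaces.some((f) => matchesFilter(f, db, coll));
  return included && !excluded;
}
```

For instance, with an inclusion filter on the sales database and an exclusion filter that lists one collection in sales, shouldSync returns false for that collection and true for every other collection the inclusion filter matches.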

Important

Once you start mongosync with a filter in place, the filter cannot be modified. If you do need to create a new filter, see: Replace an Existing Filter.

1

Identify the databases and collections that you want to sync to the destination cluster.

  • When you add a set of databases to the filter, you also exclude any other databases in the cluster.

  • When you specify a collection in your filter, you also exclude any other collections that are in the same database.

2

The start API accepts two parameters that configure optional filters:

  • The includeNamespaces parameter takes an array of filters, which are used to determine which databases and collections mongosync should include in the sync.

  • The excludeNamespaces parameter takes an array of filters, which are used to determine which databases and collections mongosync should exclude from the sync.

If you don't specify a filter, mongosync performs a full cluster sync.

Create inclusion and/or exclusion filters to identify the databases and collections you want to sync.

For example, these inclusion and exclusion filters configure mongosync to sync only the collections in the sales database whose names begin with accounts_, except for the accounts_old collection:

"includeNamespaces": [
{
"database": "sales",
"collectionsRegex": {
"pattern": "^accounts_.+?$",
"options": "ms"
}
}
],
"excludeNamespaces": [
{
"database": "sales",
"collections": [
"accounts_old"
]
}
]

For more information on filters, see Filter Syntax.

3

To use the filter, attach the filter JSON when you make the /start API call to begin syncing.

curl -X POST "http://localhost:27182/api/v1/start" --data '
{
"source": "cluster0",
"destination": "cluster1",
"includeNamespaces": [
{
"database": "sales",
"collectionsRegex": {
"pattern": "^accounts_.+$",
"options": "i"
}
}, {
"database": "marketing"
}
]
} '

For an example configuration, see: Start mongosync with a Filter.
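
The same request can be assembled programmatically. The following is a minimal sketch, assuming Node.js 18 or later (for the global fetch function) and a mongosync instance listening on localhost:27182 as in the curl example. The helper names validateFilters, buildStartBody, and startSync are hypothetical, and the Content-Type header is an assumption that the curl example does not set.

```javascript
// Hypothetical helper: enforces the rule that every filter needs
// either a "database" or a "databaseRegex" field.
function validateFilters(filters) {
  for (const f of filters) {
    if (f.database === undefined && f.databaseRegex === undefined) {
      throw new Error("each filter needs a database or databaseRegex field");
    }
  }
  return filters;
}

// Builds the JSON body for /start; filter arrays are optional.
function buildStartBody({ source, destination, includeNamespaces, excludeNamespaces }) {
  const body = { source, destination };
  if (includeNamespaces) body.includeNamespaces = validateFilters(includeNamespaces);
  if (excludeNamespaces) body.excludeNamespaces = validateFilters(excludeNamespaces);
  return body;
}

// Sends the request. The base URL and the JSON Content-Type header are assumptions.
async function startSync(options, baseUrl = "http://localhost:27182") {
  const response = await fetch(`${baseUrl}/api/v1/start`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildStartBody(options)),
  });
  return response.json();
}
```

Validating before sending matters here because a filter cannot be changed once the sync has started.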

You cannot update an existing filter. You must stop the ongoing sync process, prepare the destination cluster, and restart mongosync with a new filter.

When mongosync ran your original filter, it created databases with your data ("user databases") and the mongosync_reserved_for_internal_use system database on the destination cluster. You must remove those databases before restarting mongosync with your new filter.

Follow these steps to prepare the destination cluster for a new filter.

1
  1. Stop the mongosync process.

  2. Connect to the destination cluster with mongosh. If the destination is a sharded cluster, connect to the mongos instance. If the destination is a replica set, connect to the primary mongod instance.

  3. Drop the mongosync_reserved_for_internal_use system database.

    use mongosync_reserved_for_internal_use
    db.dropDatabase()
2
  1. List the databases in the cluster.

    show databases
  2. Remove user databases. The admin, local, and config databases are system databases. You should not edit these system databases without instructions from MongoDB support.

    If the show databases command lists any user databases on the destination cluster, you must remove them.

    Repeat this step for each user database listed:

    use <user database name>
    db.dropDatabase()

    Note: After the first db.dropDatabase() operation completes, you may need to run it a second time to remove the database.

3
  1. Decide which databases and collections you want to filter on. Add the databases and collections to the includeNamespaces array. For configuration details, see Configure a Filter.

  2. Run mongosync to reconnect to the source and destination clusters.

  3. Use the /start API endpoint to start syncing. Be sure to attach your new filter when you call /start.
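
The cleanup in steps 1 and 2 above can be sketched as a helper that decides which databases to drop on the destination cluster. This is a hedged illustration: databasesToDrop is a hypothetical name, and the commented mongosh usage permanently deletes data, so review the list before running it.

```javascript
// System databases that the steps above say to leave alone.
const SYSTEM_DATABASES = new Set(["admin", "local", "config"]);
const MONGOSYNC_DB = "mongosync_reserved_for_internal_use";

// Given the database names on the destination cluster, returns the ones to drop:
// the mongosync internal database first (if present), then every user database.
function databasesToDrop(names) {
  const userDatabases = names.filter(
    (name) => !SYSTEM_DATABASES.has(name) && name !== MONGOSYNC_DB
  );
  return names.includes(MONGOSYNC_DB) ? [MONGOSYNC_DB, ...userDatabases] : userDatabases;
}

// Possible use in mongosh, connected to the destination cluster (destructive):
//   const names = db.adminCommand({ listDatabases: 1, nameOnly: true })
//     .databases.map((d) => d.name);
//   for (const name of databasesToDrop(names)) {
//     db.getSiblingDB(name).dropDatabase();
//   }
```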

You can, with some restrictions, add or rename a collection during a filtered sync.

Warning

If your renaming operation violates the renaming restrictions, mongosync stops syncing and reports an error.

To clean up and restart after an error, follow the steps to replace an existing filter.

You can add new collections or rename an existing collection if the entire database is part of the filter.

You can also rename a collection if the old name and the new name are both specified in the filter.

See the renaming examples.

You can only rename a collection across databases if the entire target database is part of a filter. If the filter specifies individual collections in the target database, renaming across databases does not work.

See the renaming examples.

To use the $out aggregation stage or the mapReduce command (when set to create or replace a collection) with filtering, you must filter the whole database and not just the specified collection.

For example, consider this aggregation pipeline:

use library
db.books.aggregate( [
{ $group : { _id : "$author", titles: { $push: "$title" } } },
{ $out : "authors" }
] )

The $out stage creates the authors collection in the library database. If you want to sync the authors collection, you must specify the entire library database in your filter. The filter will not work if you only specify the authors collection.

This filter works:

"includeNamespaces": [
{
"database": "library"
}
]

This filter does not work:

"includeNamespaces": [
{
"database": "library",
"collections": [ "authors", "books" ] // DOES NOT WORK WITH $OUT
}
]
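
A quick pre-flight check for this rule might look like the sketch below. The function name filterCoversWholeDatabase is hypothetical, and the check only handles filters that name the database exactly; it does not evaluate databaseRegex.

```javascript
// Returns true if some inclusion filter names the database exactly
// and places no restriction on its collections.
function filterCoversWholeDatabase(includeNamespaces, database) {
  return includeNamespaces.some(
    (f) => f.database === database && !f.collections && !f.collectionsRegex
  );
}

const wholeDatabase = [{ database: "library" }];
const collectionsOnly = [{ database: "library", collections: ["authors", "books"] }];

console.log(filterCoversWholeDatabase(wholeDatabase, "library"));   // true: $out can be synced
console.log(filterCoversWholeDatabase(collectionsOnly, "library")); // false: $out is not supported
```
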
  • Filtering is not supported with reversible sync.

  • The destination cluster must not contain user data prior to starting.

  • The destination cluster must not contain the mongosync_reserved_for_internal_use system database prior to starting.

  • You cannot modify a filter that is in use. To create a new filter, see: Replace an Existing Filter.

  • You can only rename collections in certain situations. For more details see: Adding and Renaming Collections.

  • If a filter includes a view but not the base collection, only the view metadata syncs to the destination cluster. To include the view documents, you must also sync the base collection.

  • You cannot specify system collections or system databases in a filter.

  • To use the $out aggregation stage or the mapReduce command (when set to create or replace a collection) with filtering, you must configure the filter to use the entire database. You cannot limit the filter to collections within the database.

    For more information, see Filtering with mapReduce and $out.

The following example starts a sync job between cluster0 and cluster1. The source cluster is cluster0 and the destination cluster is cluster1.

cluster0 contains the sales, marketing, and engineering databases.

The sales database contains the EMEA, APAC, and AMER collections.

The includeNamespaces array in this example defines a filter on two of the databases, sales and marketing.

The sales database also filters on the EMEA and APAC collections.

"includeNamespaces" : [
{
"database" : "sales",
"collections": [ "EMEA", "APAC" ]
},
{
"database" : "marketing"
}
]

After you call the /start API with this filter in place, mongosync:

  • Syncs all of the collections in the marketing database

  • Filters out the engineering database

  • Syncs the EMEA and APAC collections from the sales database

  • Filters out the AMER collection

The following example starts a sync job between cluster0 and cluster1. The source cluster is cluster0 and the destination cluster is cluster1.

cluster0 contains the students, staff, and prospects databases.

  • The students database contains the undergrad and graduate collections.

  • The staff database contains the employees and contractors collections.

The includeNamespaces array in this example defines a filter on two of the databases:

{
"source": "cluster0",
"destination": "cluster1",
"includeNamespaces":
[
{ "database" : "students", "collections": ["undergrad", "graduate", "adjuncts"] },
{ "database" : "staff" }
]
}

With this filter in place, mongosync syncs:

  • The entire staff database

  • The undergrad, graduate, and adjuncts collections in the students database

mongosync does not sync any information from the prospects database.

mongosync syncs the entire staff database. If you add new collections to the staff database, mongosync syncs them too.

mongosync does not sync new collections that are added to the students database unless the collection is a part of the filter.

For example, mongosync does not sync the new collection if you add the postdocs collection to the students database. If you add the adjuncts collection, mongosync syncs it since adjuncts is part of the filter.

You can rename any collection in the staff database.

// This code works
use admin
db.runCommand( { renameCollection: "staff.employees", to: "staff.salaried" } )

You can only rename a collection within the students database if the new and old names are both in the filter. If either of the names is not in the filter, mongosync reports an error and exits.

// This code works
use admin
db.runCommand( { renameCollection: "students.graduate", to: "students.adjuncts" } )

If a collection is specified in the filter, you can drop it, but you cannot rename it to remove it from the filter.

// This code produces an error and mongosync stops syncing
use admin
db.runCommand( { renameCollection: "students.graduate", to: "students.notAFilteredCollection" } )

When the whole target database is included in the filter, you can rename collections to add them to the filter:

  • Source collection is specified in the filter

    use admin
    db.runCommand( { renameCollection: "students.adjuncts", to: "staff.adjuncts" } )
  • Source collection is not specified in the filter

    use admin
    db.runCommand( { renameCollection: "prospects.current", to: "staff.newHires" } )

You can also rename collections in the source database when the whole target database is in the filter:

use admin
db.runCommand( { renameCollection: "staff.employees", to: "staff.onPayroll" } )

Important

If you anticipate renaming collections, consider adding the entire database to the filter rather than specifying individual collections.
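
The renaming rules from these examples can be summarized in a small model. This is a sketch of the rules as stated on this page, not mongosync's logic: renameOutcome and its helpers are hypothetical names, only exact database and collections filters are handled, and cases this page does not describe return "not covered" rather than a guess.

```javascript
// Splits "db.coll" at the first dot (database names cannot contain dots).
function parseNamespace(ns) {
  const i = ns.indexOf(".");
  return { db: ns.slice(0, i), coll: ns.slice(i + 1) };
}

// True if a filter includes the database with no collection restriction.
function wholeDatabaseIncluded(filters, db) {
  return filters.some((f) => f.database === db && !f.collections && !f.collectionsRegex);
}

// True if a filter lists this collection individually.
function collectionListed(filters, db, coll) {
  return filters.some(
    (f) => f.database === db && Array.isArray(f.collections) && f.collections.includes(coll)
  );
}

// Returns "allowed", "error", or "not covered" for a renameCollection call.
function renameOutcome(filters, from, to) {
  const src = parseNamespace(from);
  const dst = parseNamespace(to);
  // Whole target database in the filter: renames into it are allowed.
  if (wholeDatabaseIncluded(filters, dst.db)) return "allowed";
  const targetListsCollections = filters.some(
    (f) => f.database === dst.db && Array.isArray(f.collections)
  );
  if (!targetListsCollections) return "not covered";
  // Cross-database renames into individually listed collections do not work.
  if (src.db !== dst.db) return "error";
  // Same database: both the old and the new name must be in the filter.
  return collectionListed(filters, src.db, src.coll) && collectionListed(filters, dst.db, dst.coll)
    ? "allowed"
    : "error";
}

const filters = [
  { database: "students", collections: ["undergrad", "graduate", "adjuncts"] },
  { database: "staff" },
];
console.log(renameOutcome(filters, "students.graduate", "students.adjuncts"));
console.log(renameOutcome(filters, "students.graduate", "students.notAFilteredCollection"));
```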
