Filtered Sync
New in version 1.1.
Cluster-to-Cluster Sync provides continuous data synchronization or a one-time data migration between two MongoDB clusters. You can use filtered sync to specify which databases and collections the mongosync utility transfers between the source and destination clusters.
Starting in 1.1, mongosync supports inclusion filters to specify which databases and collections to include in sync. Starting in 1.6, mongosync also supports exclusion filters and regular expressions.
With inclusion filters, mongosync syncs matching databases and collections.
With exclusion filters, mongosync syncs all databases and collections, except for those that match the filters.
With both inclusion and exclusion filters, mongosync only syncs databases and collections that match the inclusion filters, then excludes any that also match the exclusion filters.
With no filters, mongosync syncs all databases and collections.
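The four cases above can be sketched as a small decision function. This is an illustrative model only, not mongosync's implementation: the names matchesFilter and shouldSync are hypothetical, JavaScript's RegExp stands in for whatever regex engine mongosync uses (so pattern behavior may differ), and the sketch assumes that collections takes priority if a filter sets both collections and collectionsRegex.

```javascript
// Illustrative model of the filter rules; not mongosync's actual code.
// Assumption: JavaScript RegExp approximates mongosync's regex matching.

// Returns true if the namespace { db, coll } matches one filter object.
function matchesFilter(ns, filter) {
  const dbMatches = filter.databaseRegex
    ? new RegExp(filter.databaseRegex.pattern, filter.databaseRegex.options || "").test(ns.db)
    : filter.database === ns.db;
  if (!dbMatches) return false;

  // A filter with no collection fields covers the whole database.
  if (filter.collections) return filter.collections.includes(ns.coll);
  if (filter.collectionsRegex) {
    const re = new RegExp(filter.collectionsRegex.pattern, filter.collectionsRegex.options || "");
    return re.test(ns.coll);
  }
  return true;
}

// Applies the four cases: include only, exclude only, both, or neither.
function shouldSync(ns, includeNamespaces = [], excludeNamespaces = []) {
  const included =
    includeNamespaces.length === 0 ||
    includeNamespaces.some((f) => matchesFilter(ns, f));
  const excluded = excludeNamespaces.some((f) => matchesFilter(ns, f));
  return included && !excluded;
}

const include = [{ database: "sales", collectionsRegex: { pattern: "^accounts_.+$", options: "i" } }];
const exclude = [{ database: "sales", collections: ["accounts_old"] }];

console.log(shouldSync({ db: "sales", coll: "accounts_2024" }, include, exclude)); // true
console.log(shouldSync({ db: "sales", coll: "accounts_old" }, include, exclude));  // false
console.log(shouldSync({ db: "hr", coll: "staff" }, include, exclude));            // false
console.log(shouldSync({ db: "hr", coll: "staff" }));                              // true
```

Checking exclusion after inclusion mirrors the order the page describes: a namespace must first pass the inclusion filters (or there must be none), and any exclusion match then removes it.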
Filter Syntax
The start API endpoint accepts two fields that configure filtered sync: includeNamespaces and excludeNamespaces.
Each field takes an array of filters that specify the databases and collections to include or exclude from sync.
Note
If the start call uses both includeNamespaces and excludeNamespaces parameters, mongosync first matches databases and collections from the inclusion filters, then excludes those that also match an exclusion filter.
Filters have the following syntax:
    "includeNamespaces": [
      {
        "database": "<database-name>",
        "collections": [
          "<collection-name>"
        ],
        "databaseRegex": {
          "pattern": "<regex-pattern>",
          "options": "<options>"
        },
        "collectionsRegex": {
          "pattern": "<regex-pattern>",
          "options": "<options>"
        }
      }
    ],
    "excludeNamespaces": [
      {
        "database": "<database-name>",
        "collections": [
          "<collection-name>"
        ],
        "databaseRegex": {
          "pattern": "<regex-pattern>",
          "options": "<options>"
        },
        "collectionsRegex": {
          "pattern": "<regex-pattern>",
          "options": "<options>"
        }
      }
    ]
Filters must include either the database field or the databaseRegex field.
If you need the filter to match specific collections, you can use either the collections array to specify collections individually or define a regular expression using the collectionsRegex field.
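As a rough illustration of these rules, the following sketch checks a single filter object before you send it. The helper name validateFilter and its error messages are hypothetical; the checks cover only the constraints stated above and are not mongosync's own validation.

```javascript
// Hypothetical helper: checks one filter object against the rules stated above.
// It is a sketch of the documented constraints, not mongosync's own validation.
function validateFilter(filter) {
  const errors = [];
  const hasDb = typeof filter.database === "string";
  const hasDbRegex = typeof filter.databaseRegex === "object" && filter.databaseRegex !== null;

  // Rule: a filter must name a database or supply a database regex.
  if (!hasDb && !hasDbRegex) {
    errors.push("filter needs either database or databaseRegex");
  }
  // Optional collection matching: an array of names or a regex object.
  if ("collections" in filter && !Array.isArray(filter.collections)) {
    errors.push("collections must be an array of collection names");
  }
  for (const key of ["databaseRegex", "collectionsRegex"]) {
    if (key in filter && typeof filter[key]?.pattern !== "string") {
      errors.push(`${key}.pattern must be a string`);
    }
  }
  return errors;
}

// A valid filter produces no errors.
console.log(validateFilter({ database: "sales", collections: ["EMEA"] }));
// A filter without database or databaseRegex produces one error.
console.log(validateFilter({ collections: ["EMEA"] }));
```

Returning a list of messages rather than throwing lets you report every problem in a filter at once.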
Configure a Filter
Important
Once you start mongosync with a filter in place, the filter cannot be modified. If you do need to create a new filter, see: Replace an Existing Filter.
Identify Databases and Collections.
Identify the databases and collections that you want to sync to the destination cluster.
When you add a set of databases to the filter, you also exclude any other databases in the cluster.
When you specify a collection in your filter, you also exclude any other collections that are in the same database.
Create a Filter.
The start API accepts two parameters that configure optional filters:
The includeNamespaces parameter takes an array of filters, which determine which databases and collections mongosync should include in the sync.
The excludeNamespaces parameter takes an array of filters, which determine which databases and collections mongosync should exclude from the sync.
If you don't specify a filter, mongosync performs a full cluster sync.
Create inclusion and/or exclusion filters to identify the databases and collections you want to sync.
For example, these filters configure mongosync to sync only collections whose names begin with accounts_ from the sales database, except for the accounts_old collection:
    "includeNamespaces": [
      {
        "database": "sales",
        "collectionsRegex": {
          "pattern": "^accounts_.+?$",
          "options": "ms"
        }
      }
    ],
    "excludeNamespaces": [
      {
        "database": "sales",
        "collections": [ "accounts_old" ]
      }
    ]
For more information on filters, see Filter Syntax.
Use the Filter.
To use the filter, include the filter JSON in the request body when you call the /start API to begin syncing.
    curl -X POST "http://localhost:27182/api/v1/start" --data '
      {
        "source": "cluster0",
        "destination": "cluster1",
        "includeNamespaces": [
          {
            "database": "sales",
            "collectionsRegex": {
              "pattern": "^accounts_.+$",
              "options": "i"
            }
          },
          { "database": "marketing" }
        ]
      } '
For an example configuration, see: Start mongosync with a Filter.
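If you build the request in a script rather than by hand, the body can be assembled along these lines. This is a minimal sketch: buildStartBody is a hypothetical helper, and the URL and cluster names are copied from the curl example above. The commented-out fetch call assumes Node 18 or later and has not been checked against a running mongosync.

```javascript
// Hypothetical helper that assembles the /start request body shown above.
// The URL and cluster names are taken from the curl example; adjust as needed.
const START_URL = "http://localhost:27182/api/v1/start";

function buildStartBody(source, destination, { includeNamespaces, excludeNamespaces } = {}) {
  const body = { source, destination };
  // Leave a filter field out entirely when it is not set. With no filter,
  // mongosync performs a full cluster sync.
  if (includeNamespaces?.length) body.includeNamespaces = includeNamespaces;
  if (excludeNamespaces?.length) body.excludeNamespaces = excludeNamespaces;
  return body;
}

const body = buildStartBody("cluster0", "cluster1", {
  includeNamespaces: [
    { database: "sales", collectionsRegex: { pattern: "^accounts_.+$", options: "i" } },
    { database: "marketing" },
  ],
});

console.log(JSON.stringify(body, null, 2));

// To send it (assumption: Node 18+ with a global fetch), something like:
// const res = await fetch(START_URL, { method: "POST", body: JSON.stringify(body) });
// console.log(res.status);
```

Because a filter cannot be modified after the sync starts, printing and reviewing the body before sending it is a cheap safeguard.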
Replace an Existing Filter
You cannot update an existing filter. You must stop the ongoing sync process, prepare the destination cluster, and restart mongosync with a new filter.
When mongosync ran your original filter, it created databases with your data ("user databases") and the mongosync_reserved_for_internal_use system database on the destination cluster. You must remove those databases before restarting mongosync with your new filter.
Follow these steps to prepare the destination cluster for a new filter.
Remove mongosync_reserved_for_internal_use.
Stop the mongosync process.
Connect to the destination cluster with mongosh. If the destination is a sharded cluster, connect to the mongos instance. If the destination is a replica set, connect to the primary mongod instance.
Drop the mongosync_reserved_for_internal_use system database.
    use mongosync_reserved_for_internal_use
    db.dropDatabase()
Remove user databases.
List the databases in the cluster:
    show databases
Remove user databases. The admin, local, and config databases are system databases. You should not edit these system databases without instructions from MongoDB support.
If the show databases command lists any user databases on the destination cluster, you must remove them.
Repeat this step for each user database listed:
    use <user database name>
    db.dropDatabase()
Note: After the first db.dropDatabase() operation completes, you may need to run it a second time to remove the database.
Configure a new filter.
Decide which databases and collections you want to filter on. Add the databases and collections to the includeNamespaces array. For configuration details, see Configure a Filter.
Run mongosync to reconnect to the source and destination clusters.
Use the /start API endpoint to start syncing. Be sure to attach your new filter when you call /start.
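For the "Remove user databases" step, a short script can separate user databases from the system databases named above. The helper databasesToDrop is hypothetical. The commented mongosh lines are a sketch only: they permanently delete data, so review the list of names first.

```javascript
// Databases the page identifies as system databases; never drop these.
const SYSTEM_DATABASES = new Set(["admin", "local", "config"]);

// Hypothetical helper: given all database names on the destination cluster,
// returns the ones that would need to be dropped before restarting mongosync.
function databasesToDrop(allNames) {
  return allNames.filter((name) => !SYSTEM_DATABASES.has(name));
}

console.log(
  databasesToDrop(["admin", "config", "local", "sales", "mongosync_reserved_for_internal_use"])
);
// [ 'sales', 'mongosync_reserved_for_internal_use' ]

// In mongosh, connected to the destination cluster, the same idea might look like:
// const names = db.adminCommand({ listDatabases: 1, nameOnly: true }).databases.map((d) => d.name);
// for (const name of databasesToDrop(names)) {
//   db.getSiblingDB(name).dropDatabase();
// }
```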
Adding and Renaming Collections
You can, with some restrictions, add or rename a collection during a filtered sync.
Warning
If your renaming operation violates the renaming restrictions, mongosync stops syncing and reports an error.
To clean up and restart after an error, follow the steps to replace an existing filter.
Adding and Renaming Within a Single Database
You can add new collections or rename an existing collection if the entire database is part of the filter.
You can also rename a collection if the old name and the new name are both specified in the filter.
See the renaming examples.
Renaming Across Different Databases
You can only rename a collection across databases if the entire target database is part of a filter. If the filter specifies individual collections in the target database, renaming across databases does not work.
See the renaming examples.
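The renaming rules in this section can be approximated with a small checker. This is a sketch under stated assumptions, not mongosync's logic: the function names are hypothetical, only exact-name inclusion filters are modeled (no regular expressions or exclusion filters), and renames that never touch a filtered database are reported as "unspecified" because this page does not describe them. The sample filter is the one used in the renaming examples on this page.

```javascript
// Sketch of the documented rename rules; exact-name inclusion filters only.
// Function names are hypothetical and this is not mongosync's own logic.

// Splits "db.coll" at the first dot (collection names may contain dots).
function parseNamespace(ns) {
  const i = ns.indexOf(".");
  return { db: ns.slice(0, i), coll: ns.slice(i + 1) };
}

// True when a filter entry names the database without narrowing to collections.
function wholeDatabaseInFilter(filters, db) {
  return filters.some((f) => f.database === db && !f.collections && !f.collectionsRegex);
}

// True when the collection is listed by name in a filter entry for its database.
function collectionInFilter(filters, { db, coll }) {
  return filters.some(
    (f) => f.database === db && Array.isArray(f.collections) && f.collections.includes(coll)
  );
}

function checkRename(filters, fromNs, toNs) {
  const from = parseNamespace(fromNs);
  const to = parseNamespace(toNs);

  // Whole target database in the filter: renames within or into it are allowed.
  if (wholeDatabaseInFilter(filters, to.db)) return "allowed";

  // Same database, filtered by collection: both names must be in the filter.
  if (from.db === to.db && collectionInFilter(filters, from) && collectionInFilter(filters, to)) {
    return "allowed";
  }

  // Assumption: renames that touch no filtered database are not covered by this page.
  const touchesFilter = filters.some((f) => f.database === from.db || f.database === to.db);
  return touchesFilter ? "error" : "unspecified";
}

const filters = [
  { database: "students", collections: ["undergrad", "graduate", "adjuncts"] },
  { database: "staff" },
];

console.log(checkRename(filters, "staff.employees", "staff.salaried"));                    // allowed
console.log(checkRename(filters, "students.graduate", "students.adjuncts"));               // allowed
console.log(checkRename(filters, "students.graduate", "students.notAFilteredCollection")); // error
console.log(checkRename(filters, "prospects.current", "staff.newHires"));                  // allowed
```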
Filtering with mapReduce and $out
To use the $out aggregation stage or the mapReduce command (when set to create or replace a collection) with filtering, you must filter the whole database and not just the specified collection.
For example, consider this aggregation pipeline:
    use library
    db.books.aggregate( [
      { $group : { _id : "$author", titles: { $push: "$title" } } },
      { $out : "authors" }
    ] )
The $out stage creates the authors collection in the library database. If you want to sync the authors collection, you must specify the entire library database in your filter. The filter will not work if you only specify the authors collection.
This filter works:
"includeNamespaces": [ { "database": "library" } ]
This filter does not work:
    "includeNamespaces": [
      {
        "database": "library",
        "collections": [ "authors", "books" ]  // DOES NOT WORK WITH $OUT
      }
    ]
Limitations
Filtering is not supported with reversible sync.
The destination cluster must not contain user data prior to starting.
The destination cluster must not contain the mongosync_reserved_for_internal_use system database prior to starting.
You cannot modify a filter that is in use. To create a new filter, see: Replace an Existing Filter.
You can only rename collections in certain situations. For more details see: Adding and Renaming Collections.
If a filter includes a view but not the base collection, only the view metadata syncs to the destination cluster. To include the view documents, you must also sync the base collection.
You cannot specify system collections or system databases in a filter.
To use the $out aggregation stage or the mapReduce command (when set to create or replace a collection) with filtering, you must configure the filter to use the entire database. You cannot limit the filter to collections within the database. For more information, see Filtering with mapReduce and $out.
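The $out limitation above can be checked before you start a sync. The sketch below assumes exact-name inclusion filters only, and its helper names (outTarget, filterSupportsOut) are hypothetical. It finds the collection an aggregation pipeline writes to with $out and reports whether the filter covers that whole database.

```javascript
// Hypothetical check for the $out limitation; exact-name inclusion filters only.

// Finds the namespace a pipeline writes to, or null if it has no $out stage.
function outTarget(pipeline, currentDb) {
  const stage = pipeline.find((s) => "$out" in s);
  if (!stage) return null;
  const out = stage.$out;
  // $out takes either a collection name or a { db, coll } document.
  return typeof out === "string"
    ? { db: currentDb, coll: out }
    : { db: out.db, coll: out.coll };
}

// True when the filter includes the entire database that $out writes to.
function filterSupportsOut(includeNamespaces, pipeline, currentDb) {
  const target = outTarget(pipeline, currentDb);
  if (!target) return true; // no $out stage, nothing to check
  if (includeNamespaces.length === 0) return true; // no inclusion filter: everything is included
  return includeNamespaces.some(
    (f) => f.database === target.db && !f.collections && !f.collectionsRegex
  );
}

const pipeline = [
  { $group: { _id: "$author", titles: { $push: "$title" } } },
  { $out: "authors" },
];

console.log(filterSupportsOut([{ database: "library" }], pipeline, "library")); // true
console.log(
  filterSupportsOut([{ database: "library", collections: ["authors", "books"] }], pipeline, "library")
); // false
```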
Examples
Start mongosync with a Filter
The following example starts a sync job between cluster0 and cluster1. The source cluster is cluster0 and the destination cluster is cluster1.
cluster0 contains the sales, marketing, and engineering databases.
The sales database contains the EMEA, APAC, and AMER collections.
The includeNamespaces array in this example defines a filter on two of the databases, sales and marketing.
The filter on the sales database is further limited to the EMEA and APAC collections.
    "includeNamespaces" : [
      {
        "database" : "sales",
        "collections": [ "EMEA", "APAC" ]
      },
      {
        "database" : "marketing"
      }
    ]
After you call the /start API with this filter in place, mongosync:
Syncs all of the collections in the marketing database
Filters out the engineering database
Syncs the EMEA and APAC collections from the sales database
Filters out the AMER collection
Adding and Renaming Collections While Syncing
The following example starts a sync job between cluster0 and cluster1. The source cluster is cluster0 and the destination cluster is cluster1.
cluster0 contains the students, staff, and prospects databases.
The students database contains the undergrad and graduate collections.
The staff database contains the employees and contractors collections.
The includeNamespaces array in this example defines a filter on two of the databases:
    {
      "source": "cluster0",
      "destination": "cluster1",
      "includeNamespaces": [
        {
          "database" : "students",
          "collections": [ "undergrad", "graduate", "adjuncts" ]
        },
        {
          "database" : "staff"
        }
      ]
    }
With this filter in place, mongosync syncs:
The entire staff database
The undergrad, graduate, and adjuncts collections in the students database
mongosync does not sync any information from the prospects database.
Adding a Collection
mongosync syncs the entire staff database. If you add new collections to the staff database, mongosync syncs them too.
mongosync does not sync new collections that are added to the students database unless the collection is a part of the filter.
For example, if you add the postdocs collection to the students database, mongosync does not sync it. If you add the adjuncts collection, mongosync syncs it since adjuncts is part of the filter.
Renaming a Collection
You can rename any collection in the staff database.
    // This code works
    use admin
    db.runCommand( { renameCollection: "staff.employees", to: "staff.salaried" } )
You can only rename a collection within the students database if the new and old names are both in the filter. If either of the names is not in the filter, mongosync reports an error and exits.
    // This code works
    use admin
    db.runCommand( { renameCollection: "students.graduate", to: "students.adjuncts" } )
If a collection is specified in the filter, you can drop it, but you cannot rename it to remove it from the filter.
    // This code produces an error and mongosync stops syncing
    use admin
    db.runCommand( { renameCollection: "students.graduate", to: "students.notAFilteredCollection" } )
When the whole target database is included in the filter, you can rename collections to add them to the filter:
Source collection is specified in the filter:
    use admin
    db.runCommand( { renameCollection: "students.adjuncts", to: "staff.adjuncts" } )
Source collection is not specified in the filter:
    use admin
    db.runCommand( { renameCollection: "prospects.current", to: "staff.newHires" } )
You can also rename collections in the source database when the whole target database is in the filter:
    use admin
    db.runCommand( { renameCollection: "staff.employees", to: "staff.onPayroll" } )
Important
If you anticipate renaming collections, consider adding the entire database to the filter rather than specifying individual collections.