Curious about your motivation to move from Atlas to self-managed EC2. Managing your own instance can come with hidden costs and management headaches, but it is definitely possible. Here's how I'd start to think about doing this:
- Getting the Data Over: First, you'll want to use mongodump to create a backup of your data from Atlas. Then, you can restore it onto your EC2 instance with mongorestore. Here's how that looks:
mongodump --uri "<your Atlas connection string>" --gzip --archive=backup.gz
- Restore to EC2: After transferring the dump file to your EC2 instance, you can use mongorestore to load it into your self-hosted setup:
mongorestore --gzip --archive=backup.gz --uri "mongodb://<EC2 instance connection string>"
- Keeping Things in Sync: Depending on your update frequency and the app in place, you can use MongoDB Change Streams. Change Streams let you watch your collections in real time and capture any inserts, updates, or deletes, so you can run a process that listens for changes in Atlas and applies them to your EC2 instance. Another option to explore is Atlas Triggers, which can fire off scripts/actions based on changes in your data (for example, docs inserted or modified); you could use a trigger to push updates to your EC2 instance.
Not sure what language or frameworks you're using, but here's some pseudo-code in Python leveraging Change Streams:
from pymongo import MongoClient
from pymongo.errors import ConnectionFailure
import pprint

# Replace these connection strings with your actual Atlas and EC2 MongoDB connection strings
ATLAS_URI = "mongodb+srv://<atlas-username>:<password>@<atlas-cluster-url>/<database>?retryWrites=true&w=majority"
EC2_URI = "mongodb://<ec2-username>:<password>@<ec2-instance-url>:27017/<database>"

# Set up MongoDB clients for the Atlas and EC2 instances
atlas_client = MongoClient(ATLAS_URI)
ec2_client = MongoClient(EC2_URI)

# Specify the database and collection to sync
db_name = "<database>"
collection_name = "<collection>"

atlas_db = atlas_client[db_name]
ec2_db = ec2_client[db_name]
atlas_collection = atlas_db[collection_name]
ec2_collection = ec2_db[collection_name]

# Apply a single change event from Atlas to the EC2 collection
def sync_changes(change):
    operation_type = change["operationType"]
    if operation_type == "insert":
        # Insert events carry the full new document; delete events do not,
        # so only read "fullDocument" inside this branch
        document = change["fullDocument"]
        ec2_collection.insert_one(document)
        print(f"Document inserted in EC2: {document}")
    elif operation_type == "update":
        # Apply only the changed fields to the matching EC2 document
        document_id = change["documentKey"]["_id"]
        updated_fields = change["updateDescription"]["updatedFields"]
        ec2_collection.update_one({"_id": document_id}, {"$set": updated_fields})
        print(f"Document updated in EC2: {updated_fields}")
    elif operation_type == "delete":
        # Delete the document from the EC2 collection
        document_id = change["documentKey"]["_id"]
        ec2_collection.delete_one({"_id": document_id})
        print(f"Document deleted from EC2: {document_id}")

# Watch the Atlas collection for changes
try:
    with atlas_collection.watch() as stream:
        print("Listening for changes in Atlas...")
        for change in stream:
            pprint.pprint(change)
            sync_changes(change)
except ConnectionFailure as e:
    print(f"Error connecting to MongoDB: {e}")
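One caveat with a listener like this: if the process restarts, any changes that happened while it was down are missed. Change streams support resuming from a token that each event carries in its `_id` field, which you can pass back via `watch(resume_after=...)`. Here's a minimal sketch of persisting that token between runs; the file path and helper names are my own illustration, not part of pymongo:

```python
import json
import os

# Illustrative file path for storing the resume token (an assumption)
TOKEN_FILE = "resume_token.json"

def save_resume_token(change):
    # Each change event carries a resume token in its "_id" field
    with open(TOKEN_FILE, "w") as f:
        json.dump(change["_id"], f)

def load_resume_token():
    # Return the last saved token, or None on first run
    if os.path.exists(TOKEN_FILE):
        with open(TOKEN_FILE) as f:
            return json.load(f)
    return None

# Usage with the watch loop above (requires a live connection):
# with atlas_collection.watch(resume_after=load_resume_token()) as stream:
#     for change in stream:
#         sync_changes(change)
#         save_resume_token(change)
```

Storing the token after each applied event means a restarted process picks up where it left off instead of re-reading or skipping events.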
Hope this helps. Let us know how you make out.