The Great Continuous Migration: CDC Jobs With Confluent Cloud and Relational Migrator
Are you ready to finally move your relational data over to MongoDB while ensuring every change to your database is properly handled? While this process can be daunting, MongoDB’s Relational Migrator is here to help simplify things. In this tutorial, we will go through, in depth, how to run change data capture (CDC) jobs that move your relational data from MySQL to MongoDB Atlas using Confluent Cloud and Relational Migrator.
Change data capture, or CDC, jobs are processes that track any and all changes in a database. Even if there is a small update to one row (or 100), a change data capture job will ensure the change is accurately reflected. This is very important in a world where people want accurate results immediately: data needs to be updated constantly. From basic CRUD (create, read, update, delete) operations to more complex data changes, CDC jobs are incredibly important when dealing with data.
MongoDB Relational Migrator is our tool to help developers migrate their relational databases to MongoDB. The great part is that Relational Migrator will actually help you write new code or edit existing code to ensure your migration process works as smoothly as possible, as well as automate the conversion of your database's schema design. This means less complexity, less downtime, and fewer errors than if you handled the migration manually.
Built by the original creators of Apache Kafka®, Confluent Cloud is a cloud-native and complete data streaming platform available everywhere businesses need it: in the cloud, across clouds, on-premises, and in hybrid environments. We are using it to stream real-time changes from our relational database to our MongoDB Atlas cluster. The great thing about Confluent Cloud is that it’s simple to set up and provides 120+ pre-built connectors for easy, seamless connectivity with any other data system. Note that you don’t need Kafka to run production migrations; Relational Migrator's embedded mode is sufficient for the majority of migrations.
We also recommend starting with the embedded version for a quick start, even if you plan to use Kafka with Relational Migrator later, since it has all of the same features except for the additional resilience in long-running jobs.
Kafka can be relatively complex, so it’s best added to your migration job as a specific step to ensure there is limited confusion with the process. We recommend working on your migration plan and schema design first and then adding Kafka when planning your production cutover.
Let’s get started.
- MongoDB Relational Migrator — this tutorial uses version 1.5.
- MySQL Workbench — this tutorial uses version 8.0.36. Workbench lets you visually interact with your MySQL database, so it is optional, but if you’d like to follow the tutorial exactly, please download it onto your machine.
Please make sure you download Relational Migrator on your machine. The version we are using for this tutorial is 1.5.0. Make sure it runs and you can see it in your browser before moving on.
While our relational database is our source database, where our data ends up is called our sink database. In this tutorial, we want our data and all our changes to end up in MongoDB, so let’s create a MongoDB Atlas cluster to ensure that happens.
If you need help creating a cluster, please refer to the documentation.
Please keep note of the region you’re creating your cluster in and ensure you are choosing to host your cluster in AWS. Keep your username and password somewhere safe since you’ll need them later on in this tutorial, and please make sure you’ve allowed access from anywhere (0.0.0.0/0) in your “Network Access” tab. If you do not have the proper network access in place, you will not be able to connect to any of the other necessary platforms. Note that “Access from Anywhere” is not recommended for production and is used in this tutorial for simplicity.
Grab your cluster’s connection string and save it in a safe place. We will need it later.
As of right now, Confluent Cloud’s Custom Connector only supports Amazon instances, so please ensure you’re using Amazon RDS for your relational database, since other cloud providers will not work at the moment. Since it’s important to keep everything secure, you will need to set up the appropriate network access, which may require AWS PrivateLink.
Sign in to your Amazon account and head over to “Amazon RDS.” You can find it in the search bar at the top of the screen.
Click on “Databases” on the left-hand side of the screen. If you don’t have a database ready to use (specifically in your Amazon account), please create one by clicking the orange button.
You’ll be taken to this page. Please select the MySQL option:
After selecting this, scroll down and change the MySQL version to the version compatible with your version of Workbench. For the tutorial, we are using version 8.0.36.

Then, please fill out the Settings area. For your DB cluster identifier, choose a name for your database cluster. Choose a Master username, hit the Self managed credentials toggle, and fill in a password. Please do not forget this username and password; you will need them throughout the tutorial to successfully set up your various connections.

For the rest of this database set-up process, you can keep everything default, except please press the toggle to ensure the database allows Public Access. This is crucial! Follow the rest of the steps to complete and create your database.
When you see the green “Available” status button, that means your database is ready to go.
Now that our database is set up, we need to create a parameter group and modify some settings so that we can successfully handle our CDC jobs.
On the left-hand side of your Amazon RDS homepage, you’ll see the “Parameter groups” button. Please press that and create a new parameter group.
Under the “Parameter group family” dropdown, please pick mysql8.0 since that is the version we are running for this tutorial. If you’re using something different, please feel free to use a different version. Give the parameter group a name and a description and hit the orange “Create” button.

Once it’s created, click on the parameter name, hit the “Edit” button, search for binlog_format, and change the “Value” column from “MIXED” to “ROW.”

This is important to do because changing this setting records database changes at the row level, meaning each and every change to your database will be accurately captured. Without making this change, you won’t be able to properly conduct any CDC jobs.
Now, let’s associate our database with this new parameter group.
Click on “Databases,” choose the one we just created, and hit “Modify.” Scroll all the way down to “DB Parameter Group.” Click on the drop-down and associate it with the group you just created. As an example, here is mine:
Modify the instance and click “Save.” Once you’re done, go in and “Reboot” your database to ensure these changes are properly saved. Please keep in mind that you’re unable to reboot while the database is being modified and need to wait until it’s in the “Available” state.
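If you’d like to double-check that the parameter change took effect after the reboot, you can run a quick query once you’re connected through a MySQL client (we’ll connect MySQL Workbench in the next step); it should return ROW:

```sql
-- Confirm the parameter group change was applied after the reboot
SHOW VARIABLES LIKE 'binlog_format';
```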
Head over to the “Connectivity & security” tab in your database and copy your “Endpoint” under where it says “Endpoint & port.”
Now, we’re going to connect our Amazon RDS database to our MySQL Workbench!
Launch MySQL Workbench and click the “+” button to establish a new connection.
Your endpoint that was copied above will go into your “Hostname.” Keep the port the same. (It should be 3306.) Your username and password are from when you created your cluster. It should look something like this:
Click on “Test Connection” and you should see a successful connection.
If you’re unable to connect when you click on “Test Connection,” go into your Amazon RDS database, click on the VPC security group, click “Edit inbound rules,” click “Add rule,” select “All traffic” under “Type,” select “Anywhere-IPv4” under “Source,” and save it. Try again, and it should work.
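Once the connection succeeds, you can optionally open a query tab in Workbench and run a quick sanity check to confirm you’re talking to the right server:

```sql
-- Optional sanity check: server version and available schemas
SELECT VERSION();
SHOW DATABASES;
```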
For this tutorial, the data used is taken from Kaggle in the form of a .csv file, if you want to use the same one: World Happiness Index: 2019. You can run the following SQL script in MySQL Workbench to create a new table and load the data from the CSV (be sure to update the file path). Alternatively, if you have existing data, you could use that instead.

```sql
CREATE TABLE world_happiness_report (
    overall_rank int DEFAULT NULL,
    country_or_region text,
    score double DEFAULT NULL,
    gdp_per_capita double DEFAULT NULL,
    social_support double DEFAULT NULL,
    healthy_life_expectancy double DEFAULT NULL,
    freedom_to_make_life_choices double DEFAULT NULL,
    generosity double DEFAULT NULL,
    perceptions_of_corruption double DEFAULT NULL
);

LOAD DATA LOCAL INFILE 'C:/Your/Path/2019.csv'
INTO TABLE world_happiness_report
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
```
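To make sure the load worked before moving on, a quick check on the new table helps:

```sql
-- Confirm the CSV loaded: row count and a preview of the first few rows
SELECT COUNT(*) FROM world_happiness_report;
SELECT * FROM world_happiness_report LIMIT 5;
```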
Now, let’s configure our Confluent Cloud account!
Our first step is to create a new environment. We can use a free account here as well:
On the cluster page, please choose the “Basic” tier. This tier is free as well. Please make sure you have configured your zones and your region for where you are. These need to match up with both your MongoDB Atlas cluster region and your Amazon RDS database region.
Once your cluster is configured, take note of a number of keys and IDs that we’ll need in order to properly connect to Relational Migrator:
- Cluster ID.
- Environment ID.
- Bootstrap server.
- REST endpoint.
- Cloud API key and secret.
- Kafka API key and secret.
You can find most of these from your “Cluster Settings,” and the Environment ID can be found on the right-hand side of your environment page in Confluent.
For Cloud API keys, click on the three lines on the right-hand side of Confluent’s homepage.
Click on “Cloud API keys” and grab the “key” and “secret” if you’ve already created them, or create them if necessary.
For the Kafka API keys, head over to your Cluster Overview, and on the left-hand side, click “API Keys” to create them. Once again, save your “key” and “secret.”
All of this information is crucial since you’re going to need it in your user.properties file to configure the connection between Confluent Cloud and MongoDB’s Relational Migrator.

As you can see from the documentation linked above, your Cloud API keys will be saved in your user.properties file as:
- migrator.confluent.cloud-credentials.api-key
- migrator.confluent.cloud-credentials.api-secret
And your Kafka API keys as:
- migrator.confluent.kafka-credentials.api-key
- migrator.confluent.kafka-credentials.api-secret
Now that we have our Confluent Cloud configured and all our necessary information saved, let’s configure our connection to MongoDB Relational Migrator.
Prior to this step, please ensure you have successfully downloaded Relational Migrator locally.
We are going to use our terminal to access our user.properties file located inside our Relational Migrator download and edit it accordingly to ensure a smooth connection takes place.

```
cd ~/Library/Application\ Support/MongoDB/Relational\ Migrator/
ls
```
Once you see your user.properties file, open it with:

```
nano user.properties
```
Once your file is opened, we need to make some edits. At the very top of the file, uncomment the line that says:
```
spring.profiles.active: confluent
```
Be sure to comment out anything else in this section that is uncommented. We only want the Confluent profile active. Immediately under this section, we need to add in all our keys from above. Do it as such:
```
migrator.confluent.environment.environment-id: <Environment ID>
migrator.confluent.environment.cluster-id: <Cluster ID>
migrator.confluent.environment.bootstrap-server: <xx.region.provider.confluent.cloud:9092>
migrator.confluent.environment.rest-endpoint: <https://xxx.region.provider.confluent.cloud:443>

migrator.confluent.cloud-credentials.api-key: <Cloud API key>
migrator.confluent.cloud-credentials.api-secret: <Cloud API Secret>

migrator.confluent.kafka-credentials.api-key: <Cluster API key>
migrator.confluent.kafka-credentials.api-secret: <Cluster API Secret>
```
There is no need to edit anything else in this file. Just please make sure you’re using the correct server port: 8278.
Once this is properly edited, save the file using Ctrl + O, press Enter, and exit the file using Ctrl + X.
Now, once the file is saved, let’s run MongoDB Relational Migrator.
We can get it up and running straight from our terminal. Use the commands shown below to do so:
1 cd "/Applications/MongoDB Relational Migrator.app/Contents/app" 2 java -jar application-1.5.0.jar
This will open Spring and the Relational Migrator in your browser:
Once Relational Migrator is running in your browser, connect it to your MySQL database:
You want to put in your host name (what we used to connect our Amazon RDS to MySQL Workbench in the beginning), the database with your data in it (mine is called amazonTest but yours will be different), and then your username and password. Hit the “Test connection” button to ensure the connection is successful. You’ll see a green bar at the bottom if it is.
Now, we want to select the tables to use. We are just going to click our database:
Then, define your initial schema. We are just going to start with a recommended MongoDB schema because it’s a little easier to work with.
Once this is done, you’ll see what your relational schema will look like once it’s migrated as documents in MongoDB Atlas!
Now, click on the “Data Migration” tab at the top of the screen. Remember we created a MongoDB cluster at the beginning of this tutorial for our sink data? We need all that connection information.
First, enter all of your AWS RDS information again, which we loaded in earlier. That is our source database; now, we are setting up our destination, or sink, database.
Enter in the MongoDB connection string for your cluster. Please ensure you are putting in the correct username and password.
Then, hit “Test connection” to make sure you can properly connect to your Atlas database.
When you first specify that you want a continuous migration, you will get this message saying you need to generate a script to do so. Click the button and a script will download; you’ll then run it in MySQL Workbench. The script looks like this:
```sql
/*
 * Relational Migrator needs source database to allow change data capture.
 * The following scripts must be executed on MySQL source database before starting migration.
 * For more details, please see https://debezium.io/documentation/reference/stable/connectors/mysql.html#setting-up-mysql
 */

/*
 * Before initiating migration job, the MySQL user is required to be able to connect to the source database.
 * This MySQL user must have appropriate permissions on all databases for which the Relational Migrator is supposed to capture changes.
 *
 * Connect to Amazon RDS Mysql instance, follow the below link for instructions:
 * https://dev.mysql.com/doc/mysql-cluster-excerpt/8.0/en/mysql-cluster-replication-schema.html
 *
 * Grant the required permissions to the user
 */

GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'anaiya'@'%' ;

/* Finalize the user’s permissions: */
FLUSH PRIVILEGES;

/* Furthermore, binary logging must be enabled for MySQL replication on AWS RDS instance. Please see the below for instructions:
 * https://aws.amazon.com/premiumsupport/knowledge-center/enable-binary-logging-aurora/
 *
 * If the instance is using the default parameter group, you will need to create a new one before you can make any changes.
 * For MySQL RDS instances, create a Parameter Group for your chosen MySQL version.
 * For Aurora MySQL clusters, create a DB Cluster Parameter Group for your chosen MySQL version.
 * Edit the group and set the "binlog_format" parameter to "ROW".
 * Make sure your database or cluster is configured to use the new Parameter Group.
 *
 * Please note that you must reboot the database cluster or instance to apply changes, follow below for instructions:
 * https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_RebootCluster.html
 */
```
Run this script in MySQL Workbench by hitting the lightning button. You’ll know it was successful if you don’t see any error messages in Workbench. You will also see that in Relational Migrator, the “Generate Script” message is gone, telling you that you can now use continuous snapshot.
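If you’d like to confirm the script actually granted the replication privileges, you can check the user’s grants directly (swap in your own MySQL username if it differs from the one in the generated script):

```sql
-- Verify the replication-related privileges were granted (replace 'anaiya' with your MySQL username)
SHOW GRANTS FOR 'anaiya'@'%';
```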
Start it and it’ll run! Your snapshot stage will finish first, and then your continuous stage will run:
While the continuous snapshot is running, make a change in your database. I am changing the happiness score for Finland from 7.8 to 5.8:
```sql
UPDATE world_happiness_report
SET score = 5.800
WHERE country_or_region = 'Finland'
LIMIT 1;
```
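Before heading over to Atlas, you can confirm the update landed on the MySQL side:

```sql
-- Verify the change on the source side before checking MongoDB Atlas
SELECT country_or_region, score
FROM world_happiness_report
WHERE country_or_region = 'Finland';
```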
Once you run your change in MySQL Workbench, click on the “Complete CDC” button in Relational Migrator.
Now, let’s check out our MongoDB Atlas cluster and see if the data is properly loaded with the correct schema and our change has been properly streamed:
As you can see, all your information from your original MySQL database has been migrated to MongoDB Atlas, and you’re capable of streaming in any changes to your database!
In this tutorial, we have successfully migrated your MySQL data and set up continuous change data capture to MongoDB Atlas using Confluent Cloud and MongoDB Relational Migrator. This is important because it means you can see real-time changes in your MongoDB Atlas database, mirroring the changes made to your relational database.
For more information and help, please use the following resources: