Easy Migration: From Relational Database to MongoDB with MongoDB Relational Migrator
Defining the process of data migration from a relational database to MongoDB has always been a complex task. Some have opted for custom solutions such as scripts, whereas others have preferred third-party tools.
It is in this context that the Relational Migrator enters the picture, dissolving the complexity of the transition from a relational database to MongoDB as naturally as the sun melts the snow.
In the context of a relational database to MongoDB migration project, several questions arise — for example:
- What tool should you use to best perform this migration?
- How can the migration be completed in an acceptable time for a medium- or large-sized database?
- How will the data need to be modeled on MongoDB?
- How much time/resources will it take to restructure SQL queries to MQL?
Consider the following architecture, as an example:
We present a first approach, which does not use Relational Migrator but instead orchestrates the migration within this architecture with a third-party tool: Logstash.
We will make observations and define what could be the limitations of this tool.
Logstash is a free and open server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to your favorite "stash."
This tool effectively achieves the goal by dynamically performing the following operations:
- Ingestion of data from the source — PostgreSQL
- Data transformation — Logstash
- Distribution of the transformed data to the destination — MongoDB
Great! So, it will be possible to migrate the data and benefit from high flexibility in its transformation, and, assuming the pipeline has been tuned well, in relatively short time frames. However, the different pipelines will have to be defined manually.
Let's make what we have discussed so far concrete with an example, considering the following schema:
Student table:
matriculation | name | surname | birthdate | address | email | phone |
---|---|---|---|---|---|---|
1 | Mario | Rossi | 1990-05-15 | Via Roma, 123 | mario.rossi@email.com | 123-456-7890 |
2 | Laura | Bianchi | 1992-08-22 | Via Garibaldi, 56 | laura.b@email.com | 987-654-3210 |
3 | Giuseppe | Verdi | 1991-03-10 | Via Dante, 78 | giuseppe.verdi@email | 555-123-4567 |
4 | Sofia | Conti | 1993-11-30 | Via Milano, 45 | sofia.conti@email.com | 333-555-9999 |
5 | Alessia | Moretti | 1992-06-18 | Via Firenze, 7 | alessia.more@email.com | 111-222-3333 |
6 | Luca | Ferrari | 1994-09-25 | Via Venezia, 21 | luca.f@email.com | 444-777-8888 |
7 | Marta | Romano | 1990-02-03 | Via Bologna, 34 | marta.rom@email.com | 999-888-7777 |
8 | Marco | Galli | 1993-07-12 | Via Genova, 56 | marco.galli@email.com | 666-333-1111 |
9 | Elena | Piazza | 1995-04-09 | Via Torino, 78 | elena.p@email.com | 222-999-4444 |
10 | Andrea | Mancini | 1991-12-28 | Via Padova, 12 | andrea.m@email.com | 777-111-5555 |
Enrollment table:
enrollmentid | studentmatriculation | coursecode | enrollmentdate |
---|---|---|---|
11 | 1 | CS101 | 2023-01-15 |
12 | 2 | LI301 | 2023-02-02 |
13 | 3 | MA201 | 2023-03-10 |
14 | 1 | CH401 | 2023-04-05 |
15 | 4 | CS101 | 2023-04-15 |
16 | 5 | PH501 | 2023-05-20 |
17 | 6 | EC601 | 2023-06-08 |
18 | 7 | HI701 | 2023-07-12 |
19 | 8 | EN801 | 2023-08-05 |
20 | 9 | MU901 | 2023-09-01 |
Course table:
coursecode | coursename | credits | department |
---|---|---|---|
CS101 | Informatics | 4 | Informatics |
MA201 | Mathematics | 3 | Mathematics |
LI301 | Italian Language | 5 | Letters |
CH401 | Chemistry | 4 | Chemistry |
PH501 | Physics | 4 | Physics |
EC601 | Economics | 5 | Economics |
HI701 | History | 3 | History |
EN801 | English | 4 | Languages |
MU901 | Music | 2 | Music |
GE1001 | Geography | 3 | Geography |
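For reference, a minimal DDL sketch of this relational schema might look like the following. The column types and constraints are assumptions made for illustration; they are not taken from the original database.

```sql
-- Hypothetical DDL for the example schema; types and constraints are assumptions.
CREATE TABLE student (
    matriculation INTEGER PRIMARY KEY,
    name          VARCHAR(50),
    surname       VARCHAR(50),
    birthdate     DATE,
    address       VARCHAR(100),
    email         VARCHAR(100),
    phone         VARCHAR(20)
);

CREATE TABLE course (
    coursecode  VARCHAR(10) PRIMARY KEY,
    coursename  VARCHAR(50),
    credits     INTEGER,
    department  VARCHAR(50)
);

CREATE TABLE enrollment (
    enrollmentid         INTEGER PRIMARY KEY,
    studentmatriculation INTEGER REFERENCES student (matriculation),
    coursecode           VARCHAR(10) REFERENCES course (coursecode),
    enrollmentdate       DATE
);
```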
Our goal, based on an analysis done previously, is to model all of these tables in a single document. Therefore, we will proceed by defining a Logstash pipeline consisting of:
- Input stage: reports the basic parameters for connecting to PostgreSQL, such as the driver, the connection string, user, password, and statement
- Filter stage: defines how the data should be modeled
- Output stage: reports the basic parameters for connecting to MongoDB, such as the connection string, database, and collection
This will allow us to generate documents in accordance with the desired specification:
```
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/postgresql-42.7.0.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://postgresql.test.com:5432/formigration?sslmode=disable"
    jdbc_user => "migratoruser"
    jdbc_password => "test"
    statement => "SELECT *
                  FROM enrollment INNER JOIN student ON student.matriculation=enrollment.studentmatriculation
                  INNER JOIN course ON course.coursecode=enrollment.coursecode
                  ORDER BY student.matriculation"
  }
}

filter {
  aggregate {
    task_id => "%{matriculation}"
    code => "
      map['matriculation'] = event.get('matriculation')
      map['address'] = event.get('address')
      map['birthdate'] = event.get('birthdate')
      map['email'] = event.get('email')
      map['name'] = event.get('name')
      map['phone'] = event.get('phone')
      map['surname'] = event.get('surname')
      map['enrollments'] ||= []
      map['enrollments'] << {'enrollmentid' => event.get('enrollmentid'), 'studentmatriculation' => event.get('studentmatriculation'), 'coursecode' => event.get('coursecode'), 'enrollmentdate' => event.get('enrollmentdate'), 'course' => {'coursecode' => event.get('coursecode'), 'coursename' => event.get('coursename'), 'credits' => event.get('credits'), 'department' => event.get('department')} }
      event.cancel()
    "
    push_previous_map_as_event => true
  }
}

output {
  mongodb {
    uri => "mongodb://mongodbstandalone.test.com:27017/test"
    database => "test"
    collection => "test_migration"
    isodate => true
  }
}
```
And the result will be as follows:
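For example, the student with matriculation 1 would end up represented by a single document along the lines of the one below. This is an illustrative sketch: Logstash metadata fields (such as `@timestamp` and `@version`) are omitted, and the actual types of fields like the dates depend on the JDBC input and the `isodate` output setting.

```json
{
  "matriculation": 1,
  "name": "Mario",
  "surname": "Rossi",
  "birthdate": "1990-05-15",
  "address": "Via Roma, 123",
  "email": "mario.rossi@email.com",
  "phone": "123-456-7890",
  "enrollments": [
    {
      "enrollmentid": 11,
      "studentmatriculation": 1,
      "coursecode": "CS101",
      "enrollmentdate": "2023-01-15",
      "course": {
        "coursecode": "CS101",
        "coursename": "Informatics",
        "credits": 4,
        "department": "Informatics"
      }
    },
    {
      "enrollmentid": 14,
      "studentmatriculation": 1,
      "coursecode": "CH401",
      "enrollmentdate": "2023-04-05",
      "course": {
        "coursecode": "CH401",
        "coursename": "Chemistry",
        "credits": 4,
        "department": "Chemistry"
      }
    }
  ]
}
```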
This results in the data modeled as documents, but it is also necessary to draw attention to some points:
- How much time/resources will it take to restructure SQL queries into MQL? (See the short example after this list.)
- How much effort was required to model the filter part?
- No data can be written to the source during the migration; otherwise, the consistency of the data is lost.
- It will also be useful to define a custom solution for data validation.
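To give an idea of the first point, a query that needs two JOINs on the relational schema becomes a single find on the embedded documents. The following is only an illustrative sketch, using the collection and field names produced by the pipeline above:

```sql
-- Relational model: students enrolled in CS101
SELECT s.name, s.surname, c.coursename
FROM student s
JOIN enrollment e ON e.studentmatriculation = s.matriculation
JOIN course c ON c.coursecode = e.coursecode
WHERE c.coursecode = 'CS101';
```

```javascript
// Document model: the same question against the embedded documents
db.test_migration.find(
  { "enrollments.course.coursecode": "CS101" },
  { name: 1, surname: 1, "enrollments.course.coursename": 1 }
)
```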
Also, it should be noted that Logstash does not allow direct migration of data to the target MongoDB cluster. It is necessary to go through an intermediate machine and then perform a dump/restore from the standalone MongoDB to the target MongoDB Atlas cluster, which will introduce additional effort into the migration.
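In practice, that intermediate hop could be performed with a dump/restore along these lines; the Atlas connection string below is a placeholder, not one taken from the example environment:

```bash
# Dump the collection written by Logstash on the intermediate standalone MongoDB
mongodump --uri="mongodb://mongodbstandalone.test.com:27017/test" \
  --collection=test_migration --out=/tmp/migration_dump

# Restore it into the target MongoDB Atlas cluster (placeholder user/host)
mongorestore --uri="mongodb+srv://<user>:<password>@<cluster>.mongodb.net/test" /tmp/migration_dump
```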
Referring to the last part of this migration process, it is appropriate to ask:
Would customers be willing to invest additional economic resources in a bridge server? (There might be budget constraints.)
Having made several considerations about this architecture, we can move on to discuss the next architecture.
Let's consider this other architecture:
To orchestrate the migration within this architecture, we chose to adopt Relational Migrator, the official product of MongoDB. But what is Relational Migrator?
MongoDB Relational Migrator is a tool to help you migrate relational workloads to MongoDB. Relational Migrator allows you to:
- Design an effective MongoDB schema, derived from an existing relational schema.
- Migrate data from Oracle, SQL Server, MySQL, and PostgreSQL to MongoDB while transforming to the target schema.
- Generate code artifacts to reduce the time required to update application code.
Taking as reference the schema and tables defined previously, once the connection to the relational database is made, we will be able to see, as in the following picture, the visual representation of the relational schema at the top, and below that, the collection modeled by Relational Migrator:
Once the data model is defined, creating a new sync job is very easy. We need to configure the source, the destination, and the migration options. It is also possible to enable a data validation check and to drop any collections left over from previous steps.
Once everything is configured, simply start the synchronization job, getting the following result:
Last but not least, we have the possibility to use the code app generator and the query converter, currently in private preview!
Armed with what we've discussed so far about the features of the two products, here’s what we can say:
- Relational Migrator
- Its ease of configuration provides a significant advantage to users, saving them time and effort in implementation.
- Its intuitive interface and advanced features make the overall experience more efficient and effective.
- It represents an ideal solution for those seeking a user-friendly experience.
- The user can contact support for product-related issues.
- Logstash
- It allows great flexibility in document modeling.
- It is possible to define multiple pipelines and run more than one pipeline in the same process.
Whichever solution you use to migrate, you can contact MongoDB support to ensure a successful migration to its infrastructure.