Easy Migration: From Relational Database to MongoDB with MongoDB Relational Migrator
Defining the process of data migration from a relational database to MongoDB has always been a complex task. Some have opted for custom solutions such as scripts, whereas others have preferred third-party tools.
It is in this context that the Relational Migrator enters the picture, dissolving the complexity of the transition from a relational database to MongoDB as naturally as the sun melts the snow.
In the context of a relational database to MongoDB migration project, several questions arise — for example:
- What tool should you use to best perform this migration?
- How can the migration be completed in an acceptable time for a medium- or large-sized database?
- How will the data need to be modeled on MongoDB?
- How much time/resources will it take to restructure SQL queries to MQL?
Consider the following architecture, as an example:
We present a first approach, which does not use Relational Migrator but instead orchestrates the migration within this architecture with a third-party tool: Logstash.
We will make observations and define what could be the limitations of this tool.
Logstash is a free and open server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to your favorite "stash."
This tool effectively achieves the goal by dynamically performing the following operations:
- Ingestion of data from the source — PostgreSQL
- Data transformation — Logstash
- Distribution of the transformed data to the destination — MongoDB
Great! So, it will be possible to migrate the data and benefit from high flexibility in its transformation, and, assuming the pipeline has been tuned well, in relatively short time frames. However, the different pipelines will have to be defined manually.
Let's make what we have discussed so far concrete with an example, considering the following schema:
Student table:
matriculation | name | surname | birthdate | address | email | phone |
---|---|---|---|---|---|---|
1 | Mario | Rossi | 1990-05-15 | Via Roma, 123 | mario.rossi@email.com | 123-456-7890 |
2 | Laura | Bianchi | 1992-08-22 | Via Garibaldi, 56 | laura.b@email.com | 987-654-3210 |
3 | Giuseppe | Verdi | 1991-03-10 | Via Dante, 78 | giuseppe.verdi@email | 555-123-4567 |
4 | Sofia | Conti | 1993-11-30 | Via Milano, 45 | sofia.conti@email.com | 333-555-9999 |
5 | Alessia | Moretti | 1992-06-18 | Via Firenze, 7 | alessia.more@email.com | 111-222-3333 |
6 | Luca | Ferrari | 1994-09-25 | Via Venezia, 21 | luca.f@email.com | 444-777-8888 |
7 | Marta | Romano | 1990-02-03 | Via Bologna, 34 | marta.rom@email.com | 999-888-7777 |
8 | Marco | Galli | 1993-07-12 | Via Genova, 56 | marco.galli@email.com | 666-333-1111 |
9 | Elena | Piazza | 1995-04-09 | Via Torino, 78 | elena.p@email.com | 222-999-4444 |
10 | Andrea | Mancini | 1991-12-28 | Via Padova, 12 | andrea.m@email.com | 777-111-5555 |
Enrollment table:
enrollmentid | studentmatriculation | coursecode | enrollmentdate |
---|---|---|---|
11 | 1 | CS101 | 2023-01-15 |
12 | 2 | LI301 | 2023-02-02 |
13 | 3 | MA201 | 2023-03-10 |
14 | 1 | CH401 | 2023-04-05 |
15 | 4 | CS101 | 2023-04-15 |
16 | 5 | PH501 | 2023-05-20 |
17 | 6 | EC601 | 2023-06-08 |
18 | 7 | HI701 | 2023-07-12 |
19 | 8 | EN801 | 2023-08-05 |
20 | 9 | MU901 | 2023-09-01 |
Course table:
coursecode | coursename | credits | department |
---|---|---|---|
CS101 | Informatics | 4 | Informatics |
MA201 | Mathematics | 3 | Mathematics |
LI301 | Italian Language | 5 | Letters |
CH401 | Chemistry | 4 | Chemistry |
PH501 | Physics | 4 | Physics |
EC601 | Economics | 5 | Economics |
HI701 | History | 3 | History |
EN801 | English | 4 | Languages |
MU901 | Music | 2 | Music |
GE1001 | Geography | 3 | Geography |
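For reference, a minimal DDL sketch of this relational schema might look like the following. The column types and constraints are assumptions made for illustration; they are not taken from the original database.

```sql
-- Hypothetical DDL for the example schema; types and constraints are assumptions.
CREATE TABLE student (
    matriculation INTEGER PRIMARY KEY,
    name          VARCHAR(50),
    surname       VARCHAR(50),
    birthdate     DATE,
    address       VARCHAR(100),
    email         VARCHAR(100),
    phone         VARCHAR(20)
);

CREATE TABLE course (
    coursecode  VARCHAR(10) PRIMARY KEY,
    coursename  VARCHAR(50),
    credits     INTEGER,
    department  VARCHAR(50)
);

CREATE TABLE enrollment (
    enrollmentid         INTEGER PRIMARY KEY,
    studentmatriculation INTEGER REFERENCES student (matriculation),
    coursecode           VARCHAR(10) REFERENCES course (coursecode),
    enrollmentdate       DATE
);
```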
Our goal, based on an analysis done previously, is to model all of these tables in a single document. Therefore, we will proceed by defining a Logstash pipeline consisting of:
- Input stage: reports the basic parameters for connecting to PostgreSQL, such as the driver, the connection string, user, password, and statement
- Filter stage: defines how the data should be modeled
- Output stage: reports the basic parameters for connecting to MongoDB, such as the connection string, database, and collection
This will allow us to generate documents in accordance with the desired specification:
```
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/postgresql-42.7.0.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://postgresql.test.com:5432/formigration?sslmode=disable"
    jdbc_user => "migratoruser"
    jdbc_password => "test"
    statement => "SELECT *
                  FROM enrollment INNER JOIN student ON student.matriculation=enrollment.studentmatriculation
                  INNER JOIN course ON course.coursecode=enrollment.coursecode
                  ORDER BY student.matriculation"
  }
}

filter {
  aggregate {
    task_id => "%{matriculation}"
    code => "
      map['matriculation'] = event.get('matriculation')
      map['address'] = event.get('address')
      map['birthdate'] = event.get('birthdate')
      map['email'] = event.get('email')
      map['name'] = event.get('name')
      map['phone'] = event.get('phone')
      map['surname'] = event.get('surname')
      map['enrollments'] ||= []
      map['enrollments'] << {'enrollmentid' => event.get('enrollmentid'), 'studentmatriculation' => event.get('studentmatriculation'), 'coursecode' => event.get('coursecode'), 'enrollmentdate' => event.get('enrollmentdate'), 'course' => {'coursecode' => event.get('coursecode'), 'coursename' => event.get('coursename'), 'credits' => event.get('credits'), 'department' => event.get('department')} }
      event.cancel()
    "
    push_previous_map_as_event => true
  }
}

output {
  mongodb {
    uri => "mongodb://mongodbstandalone.test.com:27017/test"
    database => "test"
    collection => "test_migration"
    isodate => true
  }
}
```
And the result will be as follows:
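For example, the student with matriculation 1 would end up represented by a single document along the lines of the one below. This is an illustrative sketch: Logstash metadata fields (such as `@timestamp` and `@version`) are omitted, and the actual types of fields like the dates depend on the JDBC input and the `isodate` output setting.

```json
{
  "matriculation": 1,
  "name": "Mario",
  "surname": "Rossi",
  "birthdate": "1990-05-15",
  "address": "Via Roma, 123",
  "email": "mario.rossi@email.com",
  "phone": "123-456-7890",
  "enrollments": [
    {
      "enrollmentid": 11,
      "studentmatriculation": 1,
      "coursecode": "CS101",
      "enrollmentdate": "2023-01-15",
      "course": {
        "coursecode": "CS101",
        "coursename": "Informatics",
        "credits": 4,
        "department": "Informatics"
      }
    },
    {
      "enrollmentid": 14,
      "studentmatriculation": 1,
      "coursecode": "CH401",
      "enrollmentdate": "2023-04-05",
      "course": {
        "coursecode": "CH401",
        "coursename": "Chemistry",
        "credits": 4,
        "department": "Chemistry"
      }
    }
  ]
}
```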
This results in the data modeled as documents, but it is also necessary to draw attention to some points:
- How much time/resources will it take to restructure SQL queries into MQL? (See the short example after this list.)
- How much effort was required to model the filter part?
- No data can be written to the source during the migration; otherwise, the consistency of the data is lost.
- It will also be useful to define a custom solution for data validation.
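To give an idea of the first point, a query that needs two JOINs on the relational schema becomes a single find on the embedded documents. The following is only an illustrative sketch, using the collection and field names produced by the pipeline above:

```sql
-- Relational model: students enrolled in CS101
SELECT s.name, s.surname, c.coursename
FROM student s
JOIN enrollment e ON e.studentmatriculation = s.matriculation
JOIN course c ON c.coursecode = e.coursecode
WHERE c.coursecode = 'CS101';
```

```javascript
// Document model: the same question against the embedded documents
db.test_migration.find(
  { "enrollments.course.coursecode": "CS101" },
  { name: 1, surname: 1, "enrollments.course.coursename": 1 }
)
```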
Also, it should be noted that Logstash does not allow direct migration of data to the target MongoDB cluster. It is necessary to go through an intermediate machine and then perform a dump/restore from the standalone MongoDB to the target MongoDB Atlas cluster, which will introduce additional effort into the migration.
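In practice, that intermediate hop could be performed with a dump/restore along these lines; the Atlas connection string below is a placeholder, not one taken from the example environment:

```bash
# Dump the collection written by Logstash on the intermediate standalone MongoDB
mongodump --uri="mongodb://mongodbstandalone.test.com:27017/test" \
  --collection=test_migration --out=/tmp/migration_dump

# Restore it into the target MongoDB Atlas cluster (placeholder user/host)
mongorestore --uri="mongodb+srv://<user>:<password>@<cluster>.mongodb.net/test" /tmp/migration_dump
```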
Referring to the last part of this migration process, it is appropriate to ask:
Would customers be willing to invest additional economic resources in a bridge server? (There might be budget constraints.)
Having made several considerations about this architecture, we can move on to discuss the next architecture.
Let's consider this other architecture:
To orchestrate the migration within this architecture, we chose to adopt Relational Migrator, the official product of MongoDB. But what is Relational Migrator?
MongoDB Relational Migrator is a tool to help you migrate relational workloads to MongoDB. Relational Migrator allows you to:
- Design an effective MongoDB schema, derived from an existing relational schema.
- Migrate data from Oracle, SQL Server, MySQL, and PostgreSQL to MongoDB while transforming to the target schema.
- Generate code artifacts to reduce the time required to update application code.
Taking as reference the schema and tables defined previously, once the connection to the relational database is made, we will be able to see, as in the following picture, the visual representation of the relational schema at the top, and below that, the collection modeled by Relational Migrator:
Once the data model is defined, creating a new sync job is very easy. We need to configure the source, the destination, and the migration options. It is also possible to enable a data validation check and to drop any collections left over from previous steps.
Once everything is configured, simply start the synchronization job, getting the following result:
Last but not least, we have the possibility to use the code app generator and the query converter, currently in private preview!
Armed with what we've discussed so far about the features of the two products, here’s what we can say:
- Relational Migrator
- Its ease of configuration provides a significant advantage to users, saving them time and effort in implementation.
- Its intuitive interface and advanced features make the overall experience more efficient and effective.
- It represents an ideal solution for those seeking a user-friendly experience.
- The user can contact support for product-related issues.
- Logstash
- It allows great flexibility in document modeling.
- It is possible to define multiple pipelines and run more than one pipeline in the same process.
Whichever solution you use to migrate, you can contact MongoDB support to ensure a successful migration to its infrastructure.