Easy Migration: From Relational Database to MongoDB with MongoDB Relational Migrator

Fabio Ramohitaj • 6 min read • Published Jan 04, 2024 • Updated Jan 04, 2024
Defining the process of data migration from a relational database to MongoDB has always been a complex task. Some have opted for custom solutions such as hand-written scripts, whereas others have preferred to use third-party tools.
It is in this context that Relational Migrator enters the picture, melting away the complexity of the transition from a relational database to MongoDB as naturally as the sun melts the snow.

How Relational Migrator comes to our aid

In the context of a relational database to MongoDB migration project, several questions arise — for example:
  • What tool should you use to best perform this migration?
  • How can the migration be completed in an acceptable time frame for a medium or large database?
  • How will the data need to be modeled on MongoDB?
  • How much time and effort will it take to restructure SQL queries into MQL?
Consider the following architecture, as an example:
Data Architecture Example
Let's start with a first approach that does not use Relational Migrator but instead orchestrates the migration within this architecture with a third-party tool: Logstash.
Along the way, we will note the limitations of this tool.
What exactly is Logstash? For a correct definition, we can refer to the official documentation:
Logstash is a free and open server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to your favorite "stash."
This tool effectively achieves the goal by dynamically performing the following operations:
  • Ingestion of data from the source — PostgreSQL
  • Data transformation — Logstash
  • Distribution of the transformed data to the destination — MongoDB
Great! So, it will be possible to migrate the data with great flexibility in how it is transformed and, assuming some careful tuning, in relatively short time frames. However, each pipeline will have to be defined manually.
Let's make what we have discussed so far concrete with an example, based on the following schema:
Database Schema for Student Enrollment
Student table:

| matriculation | name | surname | birthdate | address | email | phone |
|---|---|---|---|---|---|---|
| 1 | Mario | Rossi | 1990-05-15 | Via Roma, 123 | mario.rossi@email.com | 123-456-7890 |
| 2 | Laura | Bianchi | 1992-08-22 | Via Garibaldi, 56 | laura.b@email.com | 987-654-3210 |
| 3 | Giuseppe | Verdi | 1991-03-10 | Via Dante, 78 | giuseppe.verdi@email | 555-123-4567 |
| 4 | Sofia | Conti | 1993-11-30 | Via Milano, 45 | sofia.conti@email.com | 333-555-9999 |
| 5 | Alessia | Moretti | 1992-06-18 | Via Firenze, 7 | alessia.more@email.com | 111-222-3333 |
| 6 | Luca | Ferrari | 1994-09-25 | Via Venezia, 21 | luca.f@email.com | 444-777-8888 |
| 7 | Marta | Romano | 1990-02-03 | Via Bologna, 34 | marta.rom@email.com | 999-888-7777 |
| 8 | Marco | Galli | 1993-07-12 | Via Genova, 56 | marco.galli@email.com | 666-333-1111 |
| 9 | Elena | Piazza | 1995-04-09 | Via Torino, 78 | elena.p@email.com | 222-999-4444 |
| 10 | Andrea | Mancini | 1991-12-28 | Via Padova, 12 | andrea.m@email.com | 777-111-5555 |
Enrollment table:

| enrollmentid | studentmatriculation | coursecode | enrollmentdate |
|---|---|---|---|
| 11 | 1 | CS101 | 2023-01-15 |
| 12 | 2 | LI301 | 2023-02-02 |
| 13 | 3 | MA201 | 2023-03-10 |
| 14 | 1 | CH401 | 2023-04-05 |
| 15 | 4 | CS101 | 2023-04-15 |
| 16 | 5 | PH501 | 2023-05-20 |
| 17 | 6 | EC601 | 2023-06-08 |
| 18 | 7 | HI701 | 2023-07-12 |
| 19 | 8 | EN801 | 2023-08-05 |
| 20 | 9 | MU901 | 2023-09-01 |
Course table:

| coursecode | coursename | credits | department |
|---|---|---|---|
| CS101 | Informatics | 4 | Informatics |
| MA201 | Mathematics | 3 | Mathematics |
| LI301 | Italian Language | 5 | Letters |
| CH401 | Chemistry | 4 | Chemistry |
| PH501 | Physics | 4 | Physics |
| EC601 | Economics | 5 | Economics |
| HI701 | History | 3 | History |
| EN801 | English | 4 | Languages |
| MU901 | Music | 2 | Music |
| GE1001 | Geography | 3 | Geography |
Our goal, based on an analysis done previously, is to model all of these tables into a single document. We will therefore define a Logstash pipeline consisting of three stages:
  • Input stage: the basic parameters for connecting to PostgreSQL, such as the driver, the connection string, the user, the password, and the statement to execute
  • Filter stage: how the data should be modeled
  • Output stage: the basic parameters for connecting to MongoDB, such as the connection string, the database, and the collection
This will allow us to generate documents in accordance with the desired specification:
```
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/postgresql-42.7.0.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://postgresql.test.com:5432/formigration?sslmode=disable"
    jdbc_user => "migratoruser"
    jdbc_password => "test"
    statement => "SELECT *
                  FROM enrollment INNER JOIN student ON student.matriculation=enrollment.studentmatriculation
                  INNER JOIN course ON course.coursecode=enrollment.coursecode
                  ORDER BY student.matriculation"
  }
}

filter {
  aggregate {
    task_id => "%{matriculation}"
    code => "
      map['matriculation'] = event.get('matriculation')
      map['address'] = event.get('address')
      map['birthdate'] = event.get('birthdate')
      map['email'] = event.get('email')
      map['name'] = event.get('name')
      map['phone'] = event.get('phone')
      map['surname'] = event.get('surname')
      map['enrollments'] ||= []
      map['enrollments'] << {'enrollmentid' => event.get('enrollmentid'), 'studentmatriculation' => event.get('studentmatriculation'), 'coursecode' => event.get('coursecode'), 'enrollmentdate' => event.get('enrollmentdate'), 'course' => {'coursecode' => event.get('coursecode'), 'coursename' => event.get('coursename'), 'credits' => event.get('credits'), 'department' => event.get('department')}}
      event.cancel()
    "
    push_previous_map_as_event => true
  }
}

output {
  mongodb {
    uri => "mongodb://mongodbstandalone.test.com:27017/test"
    database => "test"
    collection => "test_migration"
    isodate => true
  }
}
```
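One operational note: the aggregate filter groups rows for the same student as they stream through, so events must arrive in order (hence the ORDER BY in the statement), and the Logstash documentation recommends running such pipelines with a single worker. Assuming the configuration above is saved as `migration.conf` (a hypothetical file name), that would look like `bin/logstash -f migration.conf -w 1`.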
And the result will be as follows:
MongoDB Schema Example
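Concretely, based on the filter stage above and the sample data, the document for student 1 should look roughly like the following sketch (the actual `_id` value and the representation of dates depend on the output plugin settings):

```javascript
{
  "_id": ObjectId("..."),
  "matriculation": 1,
  "name": "Mario",
  "surname": "Rossi",
  "birthdate": "1990-05-15",
  "address": "Via Roma, 123",
  "email": "mario.rossi@email.com",
  "phone": "123-456-7890",
  "enrollments": [
    {
      "enrollmentid": 11,
      "studentmatriculation": 1,
      "coursecode": "CS101",
      "enrollmentdate": "2023-01-15",
      "course": { "coursecode": "CS101", "coursename": "Informatics", "credits": 4, "department": "Informatics" }
    },
    {
      "enrollmentid": 14,
      "studentmatriculation": 1,
      "coursecode": "CH401",
      "enrollmentdate": "2023-04-05",
      "course": { "coursecode": "CH401", "coursename": "Chemistry", "credits": 4, "department": "Chemistry" }
    }
  ]
}
```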
This results in the data modeled as documents, but it is also necessary to draw attention to some points:
  • How much time and effort will it take to restructure SQL queries into MQL? (See the sketch after this list.)
  • How much effort was required to model the filter part?
  • No data can be written to the source during the migration; otherwise, data consistency is lost.
  • It will also be useful to define a custom solution for data validation (also sketched below).
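To make the first point concrete, here is a rough sketch of how a query might change once the data is embedded. The SQL targets the original three tables; the mongosh equivalent assumes the document shape shown above:

```javascript
// SQL against the relational schema:
//   SELECT s.name, s.surname, c.coursename
//   FROM student s
//   JOIN enrollment e ON e.studentmatriculation = s.matriculation
//   JOIN course c ON c.coursecode = e.coursecode
//   WHERE c.coursecode = 'CS101';

// MQL against the embedded documents: the joins disappear entirely.
db.test_migration.find(
  { "enrollments.course.coursecode": "CS101" },
  { name: 1, surname: 1, "enrollments.course.coursename": 1 }
)
```

As for the validation point, a minimal custom check could compare record counts between the two systems, for example from mongosh (the expected values would come from `SELECT COUNT(*)` on the source tables):

```javascript
// One document per student: should equal SELECT COUNT(*) FROM student (10 here).
db.test_migration.countDocuments()

// Total embedded enrollments: should equal SELECT COUNT(*) FROM enrollment (10 here).
db.test_migration.aggregate([
  { $unwind: "$enrollments" },
  { $count: "totalEnrollments" }
])
```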
Also, it should be noted that Logstash does not allow direct migration of data to the target MongoDB cluster. It is necessary to go through an intermediate machine and then perform a dump/restore from the standalone MongoDB to the target MongoDB Atlas cluster, which will introduce additional effort into the migration.
Regarding this last part of the migration process, it is fair to ask: would customers be willing to invest additional resources in a bridge server, given possible budget constraints?
Having weighed these considerations, we can move on to the next architecture:
MongoDB Relational Migrator Architecture
To orchestrate the migration within this architecture, we chose Relational Migrator, MongoDB's official migration tool. But what is Relational Migrator?
MongoDB Relational Migrator is a tool to help you migrate relational workloads to MongoDB. Relational Migrator allows you to:
  • Design an effective MongoDB schema, derived from an existing relational schema.
  • Migrate data from Oracle, SQL Server, MySQL, and PostgreSQL to MongoDB while transforming to the target schema.
  • Generate code artifacts to reduce the time required to update application code.
Taking as reference the schema and tables defined previously, once connected to the relational database, we can see in the following picture the visual representation of the relational schema at the top and, below it, the collection modeled by Relational Migrator:
Schema Modeled by MongoDB Relational Migrator
Once the data model is defined, it is very easy to create a new sync job: we only need to configure the source, the destination, and the migration options. It is also possible to enable a data validation check and to drop any collections left over from previous runs.
Migration Job Instructions
Once everything is configured, simply start the synchronization job, getting the following result:
Relational Migrator Results
Last but not least, we have the possibility to use the app code generator and the query converter, available in private preview!
Code Generator and Query Converter

Conclusion

Armed with what we've discussed so far about the features of the two products, here’s what we can say:
  • Relational Migrator
    • Its ease of configuration provides a significant advantage to users, saving them time and effort in implementation.
    • Its intuitive interface and advanced features make the overall experience more efficient and effective.
    • It represents an ideal solution for those seeking a user-friendly experience.
    • The user can contact support for product-related issues.
  • Logstash
    • It allows great flexibility in document modeling.
    • It is possible to define multiple pipelines and run more than one pipeline in the same process.
Regardless of the solution you use to migrate, you can contact MongoDB support to ensure a successful migration to MongoDB's infrastructure.
If you have questions or comments, you can find us in the MongoDB Developer Community!
