Mapping Terms and Concepts from SQL to MongoDB
Rate this article
Perhaps, like me, you grew up on SQL databases. You can skillfully
normalize a database, and, after years of working with tables, you think
in rows and columns as well.
But now you've decided to dip your toe into the wonderful world of NoSQL
databases, and you're exploring MongoDB. Perhaps you're wondering what
you need to do differently. Can you just translate your rows and columns
into fields and values and call it a day? Do you really need to change
the way you think about storing your data?
We'll answer those questions and more in this three-part article series.
Below is a summary of what we'll cover today:
I'm a huge fan of the best tv show ever created: Parks and Recreation.
Yes, I wrote that previous sentence as if it were a fact, because it
actually is.
This is Ron. Ron likes strong women, bacon, and staying off the grid.
In season 6, Ron discovers Yelp. Ron thinks Yelp
is amazing, because he loves the idea of reviewing places he's been.
However, Yelp is way too "on the grid" for Ron. He pulls out his beloved
typewriter and starts typing reviews that he intends to send via snail
mail.
Ron writes some amazing reviews. Below is one of my favorites.
Unfortunately, I see three big problems with his plan:
- Snail mail is way slower than posting the review to Yelp where it will be instantly available for anyone to read.
- The business he is reviewing may never open the letter he sends as they may just assume it's junk mail.
- No one else will benefit from his review. (These are exactly the type of reviews I like to find on Amazon!)
Ok, so why am I talking about Ron in the middle of this article about
moving from SQL to MongoDB?
Ron saw the value of Yelp and was inspired by the new technology.
However, he brought his old-school ways with him and did not realize the
full value of the technology.
This is similar to what we commonly see as people move from a SQL
database to a NoSQL database such as MongoDB. They love the idea of
MongoDB, and they are inspired by the power of the flexible document
data model. However, they frequently bring with them their SQL mindsets
and don't realize the full value of MongoDB. In fact, when people don't
change the way they think about modeling their data, they struggle and
sometimes fail.
Don't be like Ron. (At least in this case, because, in most cases, Ron
is amazing.) Don't be stuck in your SQL ways. Change your mindset and
realize the full value of MongoDB.
Before we jump into how to change your mindset, let's begin by answering
some common questions about non-relational databases and discussing the
basics of how to store data in MongoDB.
When I talk with developers, they often ask me questions like, "What use
cases are good for MongoDB?" Developers often have this feeling that
non-relational
databases
(or NoSQL databases) like MongoDB are for specific, niche use cases.
MongoDB is a general-purpose database that can be used in a variety of
use cases across nearly every industry. For more details, see MongoDB
Use Cases, MongoDB
Industries, and the MongoDB Use
Case Guidance
Whitepaper
that includes a summary of when you should evaluate other database
options.
MongoDB is considered a non-relational database. However, that doesn't
mean MongoDB doesn't store relationship data well. (I know I just used a
double-negative. Stick with me.) MongoDB stores relationship data in a
different way. In fact, many consider the way MongoDB stores
relationship data to be more intuitive and more reflective of the
real-world relationships that are being modeled.
Let's take a look at how MongoDB stores data.
Instead of tables, MongoDB stores data in documents. No, Clippy, I'm not
talking about Microsoft Word Documents.
I'm talking about BSON
documents. BSON is a
binary representation of JSON (JavaScript Object Notation)
documents.
Documents will likely feel comfortable to you if you've used any of the
C-family of programming languages such as C, C#, Go, Java, JavaScript,
PHP, or Python.
Documents typically store information about one object as well as any
information related to that object. Related documents are grouped
together in collections. Related collections are grouped together and
stored in a database.
Let's discuss some of the basics of a document. Every document begins
and ends with curly braces.
1 { 2 }
Inside of those curly braces, you'll find an unordered set of
field/value pairs that are separated by commas.
1 { 2 field: value, 3 field: value, 4 field: value 5 }
The fields are strings that describe the pieces of data being stored.
The values can be any of the BSON data types.
BSON has a variety of data
types including
Double, String, Object, Array, Binary Data, ObjectId, Boolean, Date,
Null, Regular Expression, JavaScript, JavaScript (with scope), 32-bit
Integer, Timestamp, 64-bit Integer, Decimal128, Min Key, and Max Key.
With all of these types available for you to use, you have the power to
model your data as it exists in the real world.
Every document is required to have a field named
_id. The
value of
_id
must be unique for each document in a collection, is
immutable, and can be of any type other than an array.Ok, that's enough definitions. Let's take a look at a real example, and
compare and contrast how we would model the data in SQL vs MongoDB.
Let's say we need to store information about a user named Leslie. We'll
store her contact information including her first name, last name, cell
phone number, and city. We'll also store some extra information about
her including her location, hobbies, and job history.
Let's begin with Leslie's contact information. When using SQL, we'll
create a table named
Users
. We can create columns for each piece of
contact information we need to store: first name, last name, cell phone
number, and city. To ensure we have a unique way to identify each row,
we'll include an ID column.Users
ID | first_name | last_name | cell | city |
---|---|---|---|---|
1 | Leslie | Yepp | 8125552344 | Pawnee |
Now let's store that same information in MongoDB. We can create a new
document for Leslie where we'll add field/value pairs for each piece of
contact information we need to store. We'll use
_id
to uniquely
identify each document. We'll store this document in a collection named
Users
.Users
1 { 2 "_id": 1, 3 "first_name": "Leslie", 4 "last_name": "Yepp", 5 "cell": "8125552344", 6 "city": "Pawnee" 7 }
Now that we've stored Leslie's contact information, let's store the
coordinates of her current location.
When using SQL, we'll need to split the latitude and longitude between
two columns.
Users
ID | first_name | last_name | cell | city | latitude | longitude |
---|---|---|---|---|---|---|
1 | Leslie | Yepp | 8125552344 | Pawnee | 39.170344 | -86.536632 |
MongoDB has an array data type, so we can store the latitude and
longitude together in a single field.
Users
1 { 2 "_id": 1, 3 "first_name": "Leslie", 4 "last_name": "Yepp", 5 "cell": "8125552344", 6 "city": "Pawnee", 7 "location": [ -86.536632, 39.170344 ] 8 }
Bonus Tip: MongoDB has a few different built-in ways to visualize
location data including the schema analyzer in MongoDB
Compass
and the Geospatial Charts in MongoDB
Charts.
I generated the map below with just a few clicks in MongoDB Charts.
We're successfully storing Leslie's contact information and current
location. Now let's store her hobbies.
When using SQL, we could choose to add more columns to the Users table.
However, since a single user could have many hobbies (meaning we need to
represent a one-to-many relationship), we're more likely to create a
separate table just for hobbies. Each row in the table will contain
information about one hobby for one user. When we need to retrieve
Leslie's hobbies, we'll join the
Users
table and our new Hobbies
table.Hobbies
ID | user_id | hobby |
---|---|---|
10 | 1 | scrapbooking |
11 | 1 | eating waffles |
12 | 1 | working |
Since MongoDB supports arrays, we can simply add a new field named
"hobbies" to our existing document. The array can contain as many or as
few hobbies as we need (assuming we don't exceed the 16 megabyte
document size
limit).
When we need to retrieve Leslie's hobbies, we don't need to do an
expensive join to bring the data together; we can simply retrieve her
document in the
Users
collection.Users
1 { 2 "_id": 1, 3 "first_name": "Leslie", 4 "last_name": "Yepp", 5 "cell": "8125552344", 6 "city": "Pawnee", 7 "location": [ -86.536632, 39.170344 ], 8 "hobbies": ["scrapbooking", "eating waffles", "working"] 9 }
Let's say we also need to store Leslie's job history.
Just as we did with hobbies, we're likely to create a separate table
just for job history information. Each row in the table will contain
information about one job for one user.
ID | user_id | job_title | year_started |
---|---|---|---|
20 | 1 | "Deputy Director" | 2004 |
21 | 1 | "City Councillor" | 2012 |
22 | 1 | "Director, National Parks Service, Midwest Branch" | 2014 |
So far in this article, we've used arrays in MongoDB to store
geolocation data and a list of Strings. Arrays can contain values of any
type, including objects. Let's create a document for each job Leslie has
held and store those documents in an array.
Users
1 { 2 "_id": 1, 3 "first_name": "Leslie", 4 "last_name": "Yepp", 5 "cell": "8125552344", 6 "city": "Pawnee", 7 "location": [ -86.536632, 39.170344 ], 8 "hobbies": ["scrapbooking", "eating waffles", "working"], 9 "jobHistory": [ 10 { 11 "title": "Deputy Director", 12 "yearStarted": 2004 13 }, 14 { 15 "title": "City Councillor", 16 "yearStarted": 2012 17 }, 18 { 19 "title": "Director, National Parks Service, Midwest Branch", 20 "yearStarted": 2014 21 } 22 ] 23 }
Now that we've decided how we'll store information about our users in
both tables and documents, let's store information about Ron. Ron will
have almost all of the same information as Leslie. However, Ron does his
best to stay off the grid, so he will not be storing his location in the
system.
Let's begin by examining how we would store Ron's information in the
same tables that we used for Leslie's. When using SQL, we are required
to input a value for every cell in the table. We will represent Ron's
lack of location data with
NULL
. The problem with using NULL
is that
it's unclear whether the data does not exist or if the data is unknown,
so many people discourage the use of NULL
.Users
ID | first_name | last_name | cell | city | latitude | longitude |
---|---|---|---|---|---|---|
1 | Leslie | Yepp | 8125552344 | Pawnee | 39.170344 | -86.536632 |
2 | Ron | Swandaughter | 8125559347 | Pawnee | NULL | NULL |
Hobbies
ID | user_id | hobby |
---|---|---|
10 | 1 | scrapbooking |
11 | 1 | eating waffles |
12 | 1 | working |
13 | 2 | woodworking |
14 | 2 | fishing |
JobHistory
ID | user_id | job_title | year_started |
---|---|---|---|
20 | 1 | "Deputy Director" | 2004 |
21 | 1 | "City Councillor" | 2012 |
22 | 1 | "Director, National Parks Service, Midwest Branch" | 2014 |
23 | 2 | "Director" | 2002 |
24 | 2 | "CEO, Kinda Good Building Company" | 2014 |
25 | 2 | "Superintendent, Pawnee National Park" | 2018 |
In MongoDB, we have the option of representing Ron's lack of location
data in two ways: we can omit the
location
field from the document or
we can set location
to null
. Best practices suggest that we omit the
location
field to save space. You can choose if you want omitted
fields and fields set to null
to represent different things in your
applications.Users
1 { 2 "_id": 2, 3 "first_name": "Ron", 4 "last_name": "Swandaughter", 5 "cell": "8125559347", 6 "city": "Pawnee", 7 "hobbies": ["woodworking", "fishing"], 8 "jobHistory": [ 9 { 10 "title": "Director", 11 "yearStarted": 2002 12 }, 13 { 14 "title": "CEO, Kinda Good Building Company", 15 "yearStarted": 2014 16 }, 17 { 18 "title": "Superintendent, Pawnee National Park", 19 "yearStarted": 2018 20 } 21 ] 22 }
Let's say we are feeling pretty good about our data models and decide to
launch our apps using them.
Then we discover we need to store information about a new user: Lauren
Burhug. She's a fourth grade student who Ron teaches about government.
We need to store a lot of the same information about Lauren as we did
with Leslie and Ron: her first name, last name, city, and hobbies.
However, Lauren doesn't have a cell phone, location data, or job
history. We also discover that we need to store a new piece of
information: her school.
Let's begin by storing Lauren's information in the SQL tables as they
already exist.
Users
ID | first_name | last_name | cell | city | latitude | longitude |
---|---|---|---|---|---|---|
1 | Leslie | Yepp | 8125552344 | Pawnee | 39.170344 | -86.536632 |
2 | Ron | Swandaughter | 8125559347 | Pawnee | NULL | NULL |
3 | Lauren | Burhug | NULL | Pawnee | NULL | NULL |
Hobbies
ID | user_id | hobby |
---|---|---|
10 | 1 | scrapbooking |
11 | 1 | eating waffles |
12 | 1 | working |
13 | 2 | woodworking |
14 | 2 | fishing |
15 | 3 | soccer |
We have two options for storing information about Lauren's school. We
can choose to add a column to the existing Users table, or we can create
a new table. Let's say we choose to add a column named "school" to the
Users table. Depending on our access rights to the database, we may need
to talk to the DBA and convince them to add the field. Most likely, the
database will need to be taken down, the "school" column will need to be
added, NULL values will be stored in every row in the Users table where
a user does not have a school, and the database will need to be brought
back up.
Let's examine how we can store Lauren's information in MongoDB.
Users
1 { 2 "_id": 3, 3 "first_name": "Lauren", 4 "last_name": "Burhug", 5 "city": "Pawnee", 6 "hobbies": ["soccer"], 7 "school": "Pawnee Elementary" 8 }
As you can see above, we've added a new field named "school" to Lauren's
document. We do not need to make any modifications to Leslie's document
or Ron's document when we add the new "school" field to Lauren's
document. MongoDB has a flexible schema, so every document in a
collection does not need to have the same fields.
For those of you with years of experience using SQL databases, you might
be starting to panic at the idea of a flexible schema. (I know I started
to panic a little when I was introduced to the idea.)
Don't panic! This flexibility can be hugely valuable as your
application's requirements evolve and change.
MongoDB provides schema
validation so
you can lock down your schema as much or as little as you'd like when
you're ready.
Now that we've compared how you model data in SQL and MongoDB, let's be a bit more explicit with the terminology. Let's map terms and concepts from SQL to MongoDB.
Row ⇒ Document
A row maps roughly to a document.
Depending on how you've normalized your data, rows across several tables could map to a single document. In our examples above, we saw that rows for Leslie in the
A row maps roughly to a document.
Depending on how you've normalized your data, rows across several tables could map to a single document. In our examples above, we saw that rows for Leslie in the
Users
, Hobbies
, and JobHistory
tables mapped to a single document.Column ⇒ Field
A column maps roughly to a field. For example, when we modeled Leslie's data, we had a
A column maps roughly to a field. For example, when we modeled Leslie's data, we had a
first_name
column in the Users
table and a first_name
field in a User document.Table ⇒ Collection
A table maps roughly to a collection. Recall that a collection is a group of documents. Continuing with our example above, our
A table maps roughly to a collection. Recall that a collection is a group of documents. Continuing with our example above, our
Users
table maps to our Users
collection.Database ⇒ Database
The term
The term
database
is used fairly similarly in both SQL and MongoDB.
Groups of tables are stored in SQL databases just as groups of
collections are stored in MongoDB databases.Index ⇒ Index
Indexes provide fairly similar functionality in both SQL and MongoDB. Indexes are data structures that optimize queries. You can think of them like an index that you'd find in the back of a book; indexes tell the database where to look for specific pieces of information. Without an index, all information in a table or collection must be searched.
Indexes provide fairly similar functionality in both SQL and MongoDB. Indexes are data structures that optimize queries. You can think of them like an index that you'd find in the back of a book; indexes tell the database where to look for specific pieces of information. Without an index, all information in a table or collection must be searched.
New MongoDB users often forget how much indexes can impact performance.
If you have a query that is taking a long time to run, be sure you have
an index to support it. For example, if we know we will be commonly
searching for users by first or last name, we should add a text index on
the first and last name fields.
Remember: indexes slow down write performance but speed up read
performance. For more information on indexes including the types of
indexes that MongoDB supports, see the MongoDB
Manual.
View ⇒ View
Views are fairly similar in both SQL and MongoDB. In MongoDB, a view is defined by an aggregation pipeline. The results of the view are not stored—they are generated every time the view is queried.
Views are fairly similar in both SQL and MongoDB. In MongoDB, a view is defined by an aggregation pipeline. The results of the view are not stored—they are generated every time the view is queried.
MongoDB added support for On-Demand Materialized Views in version 4.2.
To learn more, see the MongoDB
Manual.
Join ⇒ Embedding
When you use SQL databases, joins are fairly common. You normalize your data to prevent data duplication, and the result is that you commonly need to join information from multiple tables in order to perform a single operation in your application
When you use SQL databases, joins are fairly common. You normalize your data to prevent data duplication, and the result is that you commonly need to join information from multiple tables in order to perform a single operation in your application
In MongoDB, we encourage you to model your data differently. Our rule of
thumb is Data that is accessed together should be stored together. If
you'll be frequently creating, reading, updating, or deleting a chunk of
data together, you should probably be storing it together in a document
rather than breaking it apart across several documents.
You can use embedding to model data that you may have broken out into separate tables when using SQL. When we modeled Leslie's data for MongoDB earlier, we saw that we embedded her job history in her User document instead of creating a separate
JobHistory
document.For more information, see the MongoDB Manual's pages on modeling one-to-one relationships with embedding and modeling one-to-many relationships with embedding.
Join ⇒ Database References
As we discussed in the previous section, embedding is a common solution for modeling data in MongoDB that you may have split across one or more tables in a SQL database.
As we discussed in the previous section, embedding is a common solution for modeling data in MongoDB that you may have split across one or more tables in a SQL database.
However, sometimes embedding does not make sense. Let's say we wanted to
store information about our Users' employers like their names,
addresses, and phone numbers. The number of Users that could be
associated with an employer is unbounded. If we were to embed
information about an employer in a
User
document, the employer data
could be replicated hundreds or perhaps thousands of times. Instead, we
can create a new Employers
collection and create a database
reference between User
documents and Employer
documents.For more information on modeling one-to-many relationships with
database references, see the MongoDB
Manual.
Left Outer Join ⇒ $lookup (Aggregation Pipeline)
When you need to pull all of the information from one table and join it with any matching information in a second table, you can use a left outer join in SQL.
When you need to pull all of the information from one table and join it with any matching information in a second table, you can use a left outer join in SQL.
MongoDB has a stage similar to a left outer join that you can use with
the aggregation framework.
For those not familiar with the aggregation framework, it allows you to
analyze your data in real-time. Using the framework, you can create an
aggregation pipeline that consists of one or more stages. Each stage
transforms the documents and passes the output to the next stage.
$lookup is an aggregation framework stage that allows you to perform a
left outer join to an unsharded collection in the same database.
MongoDB University has a fantastic free course on the aggregation
pipeline that will walk you in detail through using
$lookup
: MongoDB Aggregation.Recursive Common Table Expressions ⇒ $graphLookup (Aggregation Pipeline)*
When you need to query hierarchical data like a company's organization
chart in SQL, we can use recursive common table expressions.
MongoDB provides an aggregation framework stage that is similar to
recursive common table expressions:
$graphLookup
. $graphLookup
performs a recursive search on a collection.For more information, see the MongoDB Manual's page on $graphLookup and MongoDB University's free course on the aggregation
framework.
Multi-Record ACID Transaction ⇒ Multi-Document ACID Transaction
Finally, let's talk about ACID transactions. Transactions group database operations together so they all succeed or none succeed. In SQL, we call these multi-record ACID transactions. In MongoDB, we call these multi-document ACID transactions.
Finally, let's talk about ACID transactions. Transactions group database operations together so they all succeed or none succeed. In SQL, we call these multi-record ACID transactions. In MongoDB, we call these multi-document ACID transactions.
We've just covered a lot of concepts and terminology. The three term
mappings I recommend you internalize as you get started using MongoDB
are:
- Rows map to documents.
- Columns map to fields.
- Tables map to collections.
I created the following diagram you can use as a reference in the future
as you begin your journey using MongoDB.
Be on the lookout for the next post in this series where we'll discuss
the top four reasons you should use MongoDB.