An Introduction to Indexes for MongoDB Atlas Search
Imagine reading a long book like "A Song of Ice and Fire," "The Lord of
the Rings," or "Harry Potter." Now imagine that there was a specific
detail in one of those books that you needed to revisit. You wouldn't
want to search every page of such a long book to find it. Instead,
you'd use the book's index to quickly locate what you were looking for.
The same concept of indexing the content of a book carries over to
MongoDB Atlas Search with search indexes.
Atlas Search makes it easy to build fast, relevant, full-text search on
top of your data in the cloud. It's fully integrated, fully managed, and
available with every MongoDB Atlas cluster running MongoDB version 4.2
or higher.
Correctly defining your indexes is important because they determine
whether you receive relevant results when using Atlas Search. There is
no one-size-fits-all solution, and different indexes will bring you
different benefits.
In this tutorial, we're going to get a gentle introduction to creating
indexes that will be valuable for various full-text search use cases.
Before we get too invested in this introduction, it's important to note
that Atlas Search uses Apache Lucene. This means that search indexes are
not unique to Atlas Search, and if you're already comfortable with
Apache Lucene, your existing knowledge of indexing will transfer. Even
so, this tutorial can act as a solid refresher.
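A book index and a Lucene search index share the same basic idea: an inverted index that maps each term to the places where it appears, so a lookup does not need to scan every page or document. The toy JavaScript below is a simplified illustration of that idea only, not how Lucene or Atlas Search actually store or score data; `tokenize`, `buildInvertedIndex`, and `lookup` are hypothetical names.

```javascript
// Toy inverted index: maps each term to the ids of the documents containing it.
// A simplified illustration of the concept, not Lucene's real data structures.

function tokenize(text) {
  // Lowercase, then split on anything that is not an ASCII letter or digit.
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function buildInvertedIndex(docs) {
  const index = new Map();
  for (const [id, text] of Object.entries(docs)) {
    for (const term of tokenize(text)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return index;
}

function lookup(index, term) {
  // Like flipping to a term in a book's index: no page-by-page scan.
  const ids = index.get(term.toLowerCase());
  return ids ? [...ids].sort() : [];
}

const moves = {
  m1: "Thunder Shock",
  m2: "Thunder Wave",
  m3: "Quick Attack",
};
const index = buildInvertedIndex(moves);
console.log(lookup(index, "thunder")); // [ 'm1', 'm2' ]
```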
Before we start creating indexes, we should probably define what our
data model will be for the example. In an effort to cover various
indexing scenarios, the data model will be complex.
Take the following for example:
```json
{
  "_id": "cea29beb0b6f7b9187666cbed2f070b3",
  "name": "Pikachu",
  "pokedex_entry": {
    "red": "When several of these Pokemon gather, their electricity could build and cause lightning storms.",
    "yellow": "It keeps its tail raised to monitor its surroundings. If you yank its tail, it will try to bite you."
  },
  "moves": [
    {
      "name": "Thunder Shock",
      "description": "A move that may cause paralysis."
    },
    {
      "name": "Thunder Wave",
      "description": "An electrical attack that may paralyze the foe."
    }
  ],
  "location": {
    "type": "Point",
    "coordinates": [-127, 37]
  }
}
```
The above example document is about Pokemon, but Atlas Search can be
used on whatever documents are part of your application.
Example documents like the one above allow us to use text search, geo
search, and potentially other kinds of search. For each of these search
scenarios, the index might change.
When we create an index for Atlas Search, it is created at the
collection level.
There are two ways to map fields within a document when creating an
index:
- Dynamic Mappings
- Static Mappings
If your document schema is still changing or your use case doesn't allow
for it to be rigidly defined, you might choose to map your document
fields dynamically. A dynamic mapping automatically indexes fields as
new data is inserted.
Take the following for example:
```json
{
  "mappings": {
    "dynamic": true
  }
}
```
The above JSON represents a valid index. When you add it to a
collection, you are essentially mapping every field that exists in the
documents and any field that might exist in the future.
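To picture what "every field" means for the example document, the sketch below lists each leaf field path in dot notation, descending into nested objects and arrays. It is a simplified model, not Atlas Search internals: `leafPaths` is a hypothetical helper, and it does not model which data types a document-level dynamic mapping actually indexes.

```javascript
// Hypothetical helper: list every leaf field path of a document in dot notation.
// Approximates the paths a document-level dynamic mapping can reach; it does
// not model which data types Atlas Search indexes dynamically.
function leafPaths(value, prefix = "") {
  if (Array.isArray(value)) {
    // Array elements share their parent's path, e.g. "moves.name".
    return [...new Set(value.flatMap((item) => leafPaths(item, prefix)))];
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, child]) =>
      leafPaths(child, prefix ? `${prefix}.${key}` : key)
    );
  }
  // A scalar value is a leaf at the current path.
  return prefix ? [prefix] : [];
}

// Copy of the example document.
const pikachu = {
  _id: "cea29beb0b6f7b9187666cbed2f070b3",
  name: "Pikachu",
  pokedex_entry: {
    red: "When several of these Pokemon gather, their electricity could build and cause lightning storms.",
    yellow: "It keeps its tail raised to monitor its surroundings. If you yank its tail, it will try to bite you.",
  },
  moves: [
    { name: "Thunder Shock", description: "A move that may cause paralysis." },
    { name: "Thunder Wave", description: "An electrical attack that may paralyze the foe." },
  ],
  location: { type: "Point", coordinates: [-127, 37] },
};

console.log(leafPaths(pikachu));

// A field that only appears in a later document is picked up with no
// change to the index definition.
console.log(leafPaths({ ...pikachu, generation: 1 }).includes("generation")); // true
```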
We can do a simple search using this index like the following:
```javascript
db.pokemon.aggregate([
  {
    "$search": {
      "text": {
        "query": "thunder",
        "path": ["moves.name"]
      }
    }
  }
]);
```
We didn't explicitly define the fields for this index, but attempting to
search for "thunder" within the `moves` array will give us matching
results based on our example data.

To be clear, dynamic mappings can be applied at the document level or
the field level. At the document level, a dynamic mapping automatically
indexes all common data types. At both levels, it automatically indexes
all new and existing data.
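The earlier "thunder" query matches because the path `moves.name` is resolved across every element of the `moves` array and the text is compared term by term. The sketch below models only that much; it is an assumption-laden simplification, since real Atlas Search runs a Lucene analyzer (`lucene.standard` by default) and ranks results by relevance score, neither of which is reproduced here. `valuesAtPath` and `textMatches` are hypothetical helpers whose `{ query, path }` argument merely mirrors the shape of the `text` operator.

```javascript
// Simplified model of a text match on a dot-notation path.
// Not Atlas Search's implementation: no real analyzer, no scoring.

function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    current = current
      .flatMap((v) => (Array.isArray(v) ? v : [v])) // step into arrays
      .filter((v) => v !== null && typeof v === "object" && key in v)
      .map((v) => v[key]);
  }
  // Flatten a final array of leaf values, such as an array of strings.
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

function tokenize(text) {
  // Lowercase and split on anything that is not an ASCII letter or digit.
  return String(text).toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function textMatches(doc, { query, path }) {
  const queryTerms = tokenize(query);
  const paths = Array.isArray(path) ? path : [path];
  return paths.some((p) =>
    valuesAtPath(doc, p).some((value) => {
      const terms = tokenize(value);
      return queryTerms.some((q) => terms.includes(q));
    })
  );
}

const pikachu = {
  name: "Pikachu",
  moves: [
    { name: "Thunder Shock", description: "A move that may cause paralysis." },
    { name: "Thunder Wave", description: "An electrical attack that may paralyze the foe." },
  ],
};

console.log(textMatches(pikachu, { query: "thunder", path: ["moves.name"] })); // true
console.log(textMatches(pikachu, { query: "thunder", path: ["name"] })); // false
```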
While convenient, having a dynamic mapping index on all fields of a
document comes at a cost. These indexes will take up more disk space and
may be less performant.
The alternative is to use a static mapping, in which case you specify
the fields to map and what type of fields they are. Take the following
for example:
```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": {
        "type": "string"
      }
    }
  }
}
```
In the above example, the only field within our document that is being
indexed is the `name` field. The following search query would return
results:
```javascript
db.pokemon.aggregate([
  {
    "$search": {
      "text": {
        "query": "pikachu",
        "path": ["name"]
      }
    }
  }
]);
```
If we try to search on any other field within our document, we won't end
up with results because those fields are not statically mapped nor is
the document schema dynamically mapped.
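To make the static versus dynamic behavior concrete, the hypothetical helper below decides whether a dot-notation path is covered by the inner `mappings` object of an index definition. It is a simplified sketch, not part of any MongoDB API: it follows static `fields` entries and answers yes as soon as it meets `"dynamic": true`, whether at the document level or on a document-type field (the field level described earlier), and it ignores type compatibility and which data types dynamic mappings actually index.

```javascript
// Hypothetical helper: is a dot-notation path covered by an Atlas Search
// "mappings" object? Simplified: static "fields" plus "dynamic": true only.
function isPathIndexed(mappings, path) {
  let node = mappings; // the top-level mappings object acts like a document
  for (const key of path.split(".")) {
    if (node.dynamic === true) return true; // everything below is mapped
    const child =
      node.fields && Object.prototype.hasOwnProperty.call(node.fields, key)
        ? node.fields[key]
        : undefined;
    if (!child) return false; // neither statically nor dynamically mapped
    node = child;
  }
  return true;
}

// The static mapping from above: only "name" is indexed.
const staticMappings = {
  dynamic: false,
  fields: { name: { type: "string" } },
};

console.log(isPathIndexed(staticMappings, "name")); // true
console.log(isPathIndexed(staticMappings, "moves.name")); // false

// A field-level dynamic mapping covers every sub-field of that document.
const fieldLevelDynamic = {
  dynamic: false,
  fields: { pokedex_entry: { type: "document", dynamic: true } },
};
console.log(isPathIndexed(fieldLevelDynamic, "pokedex_entry.yellow")); // true
```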
There is, however, a way to get the best of both worlds if we need it.
Take the following which uses static and dynamic mappings:
```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": {
        "type": "string"
      },
      "pokedex_entry": {
        "type": "document",
        "dynamic": true
      }
    }
  }
}
```
In the above example, we are still using a static mapping for the
`name` field. However, we are using a dynamic mapping on the
`pokedex_entry` field. The `pokedex_entry` field is an object, so any
field within that object will get the dynamic mapping treatment. This
means all sub-fields are automatically mapped, as well as any new fields
that might exist in the future. This could be useful if you want to
specify which top-level fields to map, but also map all fields within a
particular object.

Take the following search query as an example:
```javascript
db.pokemon.aggregate([
  {
    "$search": {
      "text": {
        "query": "pokemon",
        "path": ["name", "pokedex_entry.red"]
      }
    }
  }
]);
```
The above search will return results if "pokemon" appears in the
`name` field or the `red` field within the `pokedex_entry` object.

When using a static mapping, you need to specify a type for the field or
set `dynamic` to true on the field. If you only specify a type,
`dynamic` defaults to false. If you only set `dynamic` to true, Atlas
Search can automatically apply default mappings for certain field types
(e.g., string, date, number).

With the basic dynamic versus static mapping discussion out of the way
for MongoDB Atlas Search indexes, we can now focus on more complicated
or specific scenarios.
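Before moving on, the static-mapping rule described above (every statically mapped field needs a `type`, `"dynamic": true`, or both) can be expressed as a small check. This is a hypothetical teaching sketch, not the validation Atlas Search itself performs; `fieldsMissingTypeOrDynamic` is an invented name, and it simply walks nested `fields` recursively and reports offending paths.

```javascript
// Hypothetical check: report statically mapped fields that have neither a
// "type" nor "dynamic": true. Not the validation Atlas Search performs.
function fieldsMissingTypeOrDynamic(mappings) {
  const problems = [];
  const visit = (fields, prefix) => {
    for (const [key, def] of Object.entries(fields || {})) {
      const path = prefix ? `${prefix}.${key}` : key;
      const hasType = typeof def.type === "string";
      const isDynamic = def.dynamic === true;
      if (!hasType && !isDynamic) problems.push(path);
      visit(def.fields, path); // nested document fields follow the same rule
    }
  };
  visit(mappings.fields, "");
  return problems;
}

const mappings = {
  dynamic: false,
  fields: {
    name: { type: "string" }, // OK: type only, dynamic defaults to false
    pokedex_entry: { dynamic: true }, // OK: dynamic only
    moves: { fields: { name: { type: "string" } } }, // "moves" has neither
  },
};

console.log(fieldsMissingTypeOrDynamic(mappings)); // [ 'moves' ]
```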
Let's first take a look at what our fully mapped index would look like
for the document in our example:
```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": {
        "type": "string"
      },
      "moves": {
        "type": "document",
        "fields": {
          "name": {
            "type": "string"
          },
          "description": {
            "type": "string"
          }
        }
      },
      "pokedex_entry": {
        "type": "document",
        "fields": {
          "red": {
            "type": "string"
          },
          "yellow": {
            "type": "string"
          }
        }
      },
      "location": {
        "type": "geo"
      }
    }
  }
}
```
In the above example, we are using a static mapping for every field
within our documents. An interesting thing to note is the `moves` array
and the `pokedex_entry` object in the example document. Even though one
is an array and the other is an object, both are indexed with the
`document` type. While writing searches isn't the focus of this
tutorial, searching an array and an object would be similar, using dot
notation (for example, `moves.name` or `pokedex_entry.red`).

Had any of the fields been nested deeper within the document, the same
approach would be applied. For example, we could have something like
this:
```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "pokedex_entry": {
        "type": "document",
        "fields": {
          "gameboy": {
            "type": "document",
            "fields": {
              "red": {
                "type": "string"
              },
              "yellow": {
                "type": "string"
              }
            }
          }
        }
      }
    }
  }
}
```
In the above example, the `pokedex_entry` field was changed slightly to
have another level of objects. This is probably not a realistic way to
model data for this dataset, but it should get the point across about
mapping more deeply nested fields.

Up until now, each of the indexes has only had its field types defined
in the mapping, so the default options are being applied to every field.
Options are a way to refine the index further based on your data to
ultimately get more relevant search results. Let's play around with
some of the options within the mappings of our index.
Most of the fields in our example use the `string` data type, which
supports a number of options. Let's look at a few of them:
```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": {
        "type": "string",
        "searchAnalyzer": "lucene.spanish",
        "ignoreAbove": 3000
      }
    }
  }
}
```
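In this example, `searchAnalyzer` sets the analyzer that is applied to
the query text at search time, and `ignoreAbove` tells Atlas Search not
to index `name` values longer than 3000 characters. The value of 3000 is
arbitrary here, but adding a limit suited to your use case could improve
performance or reduce the index size.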
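The effect of `ignoreAbove` can be modeled in a few lines: a string value longer than the limit is skipped for that field rather than truncated, so queries on the field cannot match it. The sketch below is a simplified model under that assumption; `wouldIndexValue` is a hypothetical helper, it only handles strings, and it counts characters with JavaScript's `length`, which may not match how Atlas Search counts some non-ASCII text.

```javascript
// Simplified model of the "ignoreAbove" string option (not Atlas Search code).
function wouldIndexValue(fieldDef, value) {
  if (typeof value !== "string") return false; // this sketch only models strings
  if (fieldDef.ignoreAbove === undefined) return true; // no limit configured
  return value.length <= fieldDef.ignoreAbove; // longer values are skipped
}

// The "name" field options from the index above.
const nameField = {
  type: "string",
  searchAnalyzer: "lucene.spanish",
  ignoreAbove: 3000,
};

console.log(wouldIndexValue(nameField, "Pikachu")); // true
console.log(wouldIndexValue(nameField, "x".repeat(3000))); // true
console.log(wouldIndexValue(nameField, "x".repeat(3001))); // false
console.log(wouldIndexValue({ type: "string" }, "x".repeat(3001))); // true
```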
In a future tutorial, we're going to explore the finer details of what
search analyzers are and what they can accomplish.
These are just some of the available options for the string data type.
Each data type will have its own set of options. If you want to use the
default for any particular option, it does not need to be explicitly
added to the mapped field.
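Because defaults don't need to be written out, a mapping can be tidied by dropping options that only restate them. The hypothetical helper below does that against a small defaults table. The table is an assumption for illustration: `lucene.standard` is Atlas Search's default analyzer, but any other defaults you add should be checked against the official documentation for each data type.

```javascript
// Hypothetical helper: drop options whose value equals an assumed default.
// ASSUMED_DEFAULTS is illustrative; verify defaults in the documentation.
const ASSUMED_DEFAULTS = {
  string: { analyzer: "lucene.standard" },
};

function withoutDefaultOptions(fieldDef) {
  const defaults = ASSUMED_DEFAULTS[fieldDef.type] || {};
  const result = {};
  for (const [option, value] of Object.entries(fieldDef)) {
    // Keep "type" and every option whose value differs from the default.
    if (option !== "type" && defaults[option] === value) continue;
    result[option] = value;
  }
  return result;
}

const verboseNameField = {
  type: "string",
  analyzer: "lucene.standard", // restates the default, so it is dropped
  searchAnalyzer: "lucene.spanish",
  ignoreAbove: 3000,
};

console.log(withoutDefaultOptions(verboseNameField));
```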
You just received what was hopefully a gentle introduction to creating
indexes to be used in Atlas Search. To use Atlas Search, you will need
at least one index on your collection, even if it is a default dynamic
index. However, if you know your schema and are able to create static
mappings, it is usually the better way to go to fine-tune relevancy and
performance.
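If you know your schema, one hypothetical way to get a starting point for a static mapping is to infer it from a representative document and then refine it by hand. The sketch below rests on stated assumptions: strings map to `string`, objects shaped like GeoJSON (`type` plus `coordinates`) map to `geo`, other objects and arrays of objects map to `document` with nested `fields`, and `_id`, numbers, dates, and other types are skipped. `inferStaticMapping` is an invented name, not an Atlas feature.

```javascript
// Hypothetical helper: infer a starting static mapping from one sample document.
// Assumed type rules only; review and adjust the output before using it.

function isGeoJson(value) {
  return typeof value.type === "string" && Array.isArray(value.coordinates);
}

function inferField(value) {
  if (typeof value === "string") return { type: "string" };
  if (Array.isArray(value)) {
    // Use the first element as the sample for the array's contents.
    return value.length > 0 ? inferField(value[0]) : null;
  }
  if (value instanceof Date) return null; // dates are skipped in this sketch
  if (value !== null && typeof value === "object") {
    if (isGeoJson(value)) return { type: "geo" };
    return { type: "document", fields: inferFields(value) };
  }
  return null; // numbers, booleans, etc. are not handled here
}

function inferFields(obj, skip = []) {
  const fields = {};
  for (const [key, value] of Object.entries(obj)) {
    if (skip.includes(key)) continue;
    const def = inferField(value);
    if (def) fields[key] = def;
  }
  return fields;
}

function inferStaticMapping(sampleDoc) {
  // "_id" is skipped, matching the full mapping shown earlier.
  return { mappings: { dynamic: false, fields: inferFields(sampleDoc, ["_id"]) } };
}

const pikachu = {
  _id: "cea29beb0b6f7b9187666cbed2f070b3",
  name: "Pikachu",
  pokedex_entry: { red: "When several of these Pokemon gather...", yellow: "It keeps its tail raised..." },
  moves: [{ name: "Thunder Shock", description: "A move that may cause paralysis." }],
  location: { type: "Point", coordinates: [-127, 37] },
};

console.log(JSON.stringify(inferStaticMapping(pikachu), null, 2));
```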
To learn more about Atlas Search indexes and the various data types,
options, and analyzers available, check out the official
documentation.
To learn how to build more on Atlas Search, check out my other
tutorials: Building an Autocomplete Form Element with Atlas Search and
JavaScript
and Visually Showing Atlas Search Highlights with JavaScript and
HTML.
Have a question or feedback about this tutorial? Head to the MongoDB
Community Forums and let's chat!