Exact Matches in Atlas Search: Beginner's Guide

Erik Hatcher • 5 min read • Published Aug 19, 2022 • Updated Oct 09, 2024

Atlas Search

Contributors

Much of this article was contributed by a MongoDB Intern, Humayara Karim. Thanks for spending your summer with us!

Introduction

Search engines are powerful tools that users rely on when they're looking for information. Users often count on them to handle misspelled words through a feature called fuzzy matching. Fuzzy matching identifies text, strings, and even queries that are very similar but not the same. This is very useful.

But a lot of the time, the search that is most useful is an exact match. I'm looking for a word, foobar, and I want foobar, not foobarr and not greenfoobart.

Luckily, Atlas Search has solutions for both fuzzy searches and exact matches. There are quite a few ways to achieve exact matches with Atlas Search, and this tutorial focuses on those different ways and the pros and cons of each.

(Let us count the) Ways to Exact Matching with Atlas Search

Just like the NYC subway system, there are many ways to get to the same destination, and not all of them are good. So let's talk about the various methods of doing exact match searches, and the pros and cons.

Here, "exact matching" means matching values that are literally exact or roughly the same. Roughly exact could mean case-insensitive matching, matching regardless of whether diacritic marks are used, or even matching words regardless of whether they are singular or plural.

The token field type

If you want to find an exact match for a string of text, the best field type to use is the token type, as it indexes a string field as a single term. The token type can also normalize values to lowercase for case-insensitive matches. The token type works with the equals and in operators. Here's an index configuration to map a category field for exact value matching and sorting:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "category": [
        {
          "type": "token"
        }
      ]
    }
  }
}
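
For case-insensitive exact matching, the token mapping can carry a normalizer. The sketch below builds such an index definition as a plain JavaScript object. The helper name `tokenIndexDefinition` is hypothetical, and the `normalizer` setting (with values "lowercase" and "none") reflects my reading of the token type's options rather than anything shown in the configuration above, so treat it as an assumption to verify.

```javascript
// Hypothetical helper: returns an Atlas Search index definition that maps
// one string field as a token, optionally normalized to lowercase.
function tokenIndexDefinition(fieldName, { caseInsensitive = false } = {}) {
  return {
    mappings: {
      dynamic: true,
      fields: {
        [fieldName]: [
          {
            type: "token",
            // Assumed option: "lowercase" folds values to lowercase,
            // "none" keeps them exactly as stored.
            normalizer: caseInsensitive ? "lowercase" : "none",
          },
        ],
      },
    },
  };
}

const definition = tokenIndexDefinition("category", { caseInsensitive: true });
console.log(JSON.stringify(definition, null, 2));
```

The printed JSON can be pasted into an index definition in place of the configuration above when case should not matter.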

Here's an example matching the category token field type using equals:

[
  {
    $search: {
      equals: {
        path: "category",
        value: "Technology"
      }
    }
  }
]

To see that query in action, check it out in the search playground.
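
The token type also works with the in operator, which matches a field against any one of several exact values. Here is a minimal sketch of building such a stage. The helper name `inQuery` and the second category value "Science" are made up for illustration.

```javascript
// Hypothetical helper: builds a $search stage that matches documents whose
// token-mapped field equals any one of the given values.
function inQuery(path, values) {
  if (!Array.isArray(values) || values.length === 0) {
    throw new Error("values must be a non-empty array");
  }
  return [{ $search: { in: { path, value: values } } }];
}

// "Science" is an invented value, used only to show a second candidate.
const pipeline = inQuery("category", ["Technology", "Science"]);
console.log(JSON.stringify(pipeline));
// In mongosh, the pipeline would be passed to aggregate() on your collection.
```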

Pros: This is as exact as it gets, with an option for case insensitive matching. As an added bonus, the token field type is sortable as well.

Cons: This is exact, or case-insensitively exact, so there's no room for fuzziness with the token field type.

Atlas Search Index Analyzers

If you want to return matches that contain a specific word, the standard analyzer is your go-to, as it divides text on word boundaries. It's crucial to first identify and understand the analyzer that fits your use case. This is where MongoDB makes life easier: all the built-in analyzers Atlas Search supports, and their purposes, are listed in one place.

Pros: Users can also make custom and multi analyzers to cater to specific application needs. There are examples on the MongoDB Developer Community Forums demonstrating folks doing this in the wild.

Here's some code for case-insensitive search using a custom analyzer with the keyword tokenizer and a lowercase token filter:

"analyzers": [
  {
    "charFilters": [],
    "name": "search_keyword_lowercaser",
    "tokenFilters": [
      {
        "type": "lowercase"
      }
    ],
    "tokenizer": {
      "type": "keyword"
    }
  }
]
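
To see why the keyword tokenizer gives exact matching, it helps to compare the terms each kind of analysis would produce. The functions below are rough local approximations written for illustration; they are not the Lucene implementations that Atlas Search runs, and both function names are made up.

```javascript
// Rough local approximations of two analyzers, for illustration only.

// Keyword tokenizer + lowercase filter: the whole string is one term.
function keywordLowercase(text) {
  return [text.toLowerCase()];
}

// Standard-like analysis: split on non-alphanumeric characters, lowercase.
function standardLike(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((term) => term.length > 0);
}

console.log(keywordLowercase("Red Robin")); // a single term for the whole value
console.log(standardLike("Red Robin"));     // one term per word
```

With the single-term output, a query for "red" alone has nothing to match against, while the whole value in any letter case does. With word-level terms, "red" on its own matches.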

Alternatively, use the lucene.keyword analyzer for single-word exact match queries and a phrase query for multi-word exact match queries, like this:

{
  $search: {
    "index": "movies_search_index",
    "phrase": {
      "query": "Red Robin",
      "path": "title"
    }
  }
}

And here's an example of diacritic insensitivity.
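
As a minimal sketch of diacritic-insensitive matching, a custom analyzer can pair the keyword tokenizer with a folding token filter. The analyzer name `diacriticFolder` is illustrative, and the choice of the `icuFolding` filter is my assumption about a suitable filter, not something this article specifies. The `foldDiacritics` function only approximates the effect locally so you can see which values would end up as the same term.

```javascript
// Assumed analyzer definition: keyword tokenizer plus the icuFolding token
// filter. The analyzer name "diacriticFolder" is illustrative.
const diacriticFolder = {
  name: "diacriticFolder",
  charFilters: [],
  tokenizer: { type: "keyword" },
  tokenFilters: [{ type: "icuFolding" }],
};

// Local approximation of the effect: decompose characters (NFD), drop
// combining marks, and lowercase. This is not ICU folding itself.
function foldDiacritics(text) {
  return text
    .normalize("NFD")
    .replace(/\p{M}/gu, "")
    .toLowerCase();
}

// "Caf\u00e9" is "Café" written with an escape to avoid encoding issues.
console.log(foldDiacritics("Caf\u00e9") === foldDiacritics("cafe"));
```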

Cons: When an analyzer is involved, queries operate on the terms from the analyzed string; matching depends on the combination of the analyzed terms and the type of query constructed.

Text Operator

As the name suggests, this operator allows users to search text. Here is how the syntax for the text operator looks:

{
  $search: {
    "index": <index name>, // optional, defaults to "default"
    "text": {
      "query": "<search-string>",
      "path": "<field-to-search>",
      "fuzzy": <options>,
      "score": <options>,
      "synonyms": "<synonyms-mapping-name>"
    }
  }
}

If you're searching for a single term and want to use full text search to do it, this is the operator for you. Simple, effective, no frills. Its simplicity means it's hard to mess up, and you can use it in complex use cases without worrying. You can also combine the text operator with other operators.

The text operator also supports synonyms and returns a relevance score for each match, as shown here:

db.movies.aggregate([
  {
    $search: {
      "text": {
        "path": "title",
        "query": "automobile",
        "synonyms": "transportSynonyms"
      }
    }
  },
  {
    $limit: 10
  },
  {
    $project: {
      "_id": 0,
      "title": 1,
      "score": { $meta: "searchScore" }
    }
  }
])

db.movies.aggregate([
  {
    $search: {
      "text": {
        "query": "Helsinki",
        "path": "plot"
      }
    }
  },
  {
    $project: {
      plot: 1,
      title: 1,
      score: { $meta: "searchScore" }
    }
  }
])

Pros: Straightforward, easy to use.

Cons: Matches require all, or any, of the terms to match. There's no middle ground of more than one must match but not necessarily all of the terms.
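
The "all or any" choice can be sketched as a parameter on the query. The helper `textQuery` below is hypothetical, and the `matchCriteria` option (with values "any" and "all") is an assumption based on my understanding of the text operator. It does not appear in the syntax listing above, so verify it against the current documentation before relying on it.

```javascript
// Hypothetical helper: builds a text query, optionally stating whether
// any or all of the query terms must match.
function textQuery(path, query, matchCriteria) {
  const text = { path, query };
  if (matchCriteria !== undefined) {
    if (matchCriteria !== "any" && matchCriteria !== "all") {
      throw new Error('matchCriteria must be "any" or "all"');
    }
    // Assumed option name; not part of the syntax shown in this article.
    text.matchCriteria = matchCriteria;
  }
  return [{ $search: { text } }];
}

const anyTerm = textQuery("title", "man moon", "any");
const allTerms = textQuery("title", "man moon", "all");
console.log(JSON.stringify(anyTerm));
console.log(JSON.stringify(allTerms));
```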

The Phrase Operator

The phrase operator can get exact match queries on multiple words (terms) in a field. But why use a phrase operator instead of text? It’s because the phrase operator searches for an ordered sequence of terms with the help of an analyzer defined in the index configuration. Take a look at this example, where we want to search the phrases “the man” and “the moon” in a movie titles collection:

db.movies.aggregate([
  {
    "$search": {
      "phrase": {
        "path": "title",
        "query": ["the man", "the moon"]
      }
    }
  },
  { $limit: 10 },
  {
    $project: {
      "_id": 0,
      "title": 1,
      score: { $meta: "searchScore" }
    }
  }
])

As you can see, the query returns all results that contain the ordered term sequence “the man” or “the moon.”

{ "title" : "The Man in the Moon", "score" : 4.500046730041504 }
{ "title" : "Shoot the Moon", "score" : 3.278003215789795 }
{ "title" : "Kick the Moon", "score" : 3.278003215789795 }
{ "title" : "The Man", "score" : 2.8860299587249756 }
{ "title" : "The Moon and Sixpence", "score" : 2.8754563331604004 }
{ "title" : "The Moon Is Blue", "score" : 2.8754563331604004 }
{ "title" : "Racing with the Moon", "score" : 2.8754563331604004 }
{ "title" : "Mountains of the Moon", "score" : 2.8754563331604004 }
{ "title" : "Man on the Moon", "score" : 2.8754563331604004 }
{ "title" : "Castaway on the Moon", "score" : 2.8754563331604004 }
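
The difference between matching any term and matching an ordered sequence can be illustrated locally. The sketch below is a simplified stand-in for analysis and matching, not how Atlas Search is implemented. All function names are made up, and "Moon Man" is an invented title added to show a non-match.

```javascript
// Illustration only: simplified term splitting, lowercased.
function terms(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// text-style match: at least one query term occurs anywhere in the title.
function matchesAnyTerm(title, query) {
  const titleTerms = new Set(terms(title));
  return terms(query).some((term) => titleTerms.has(term));
}

// phrase-style match: the query terms occur adjacently and in order.
function matchesPhrase(title, phrase) {
  const t = terms(title);
  const p = terms(phrase);
  if (p.length === 0) return false;
  for (let i = 0; i + p.length <= t.length; i++) {
    if (p.every((term, j) => t[i + j] === term)) return true;
  }
  return false;
}

// "Moon Man" is an invented title, not taken from the results above.
const titles = ["The Man in the Moon", "Man on the Moon", "Moon Man"];
for (const title of titles) {
  console.log(title, matchesAnyTerm(title, "the moon"), matchesPhrase(title, "the moon"));
}
```

In this sketch, all three titles match on any term, but "Moon Man" fails the phrase check because "the" and "moon" never appear next to each other in that order.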

Pros: There are quite a few options you can use with phrase that give users the flexibility to customize the exact phrases they want to match.

Cons: All of the terms in the query are required to match, in that specified order.

Highlighting

Although this feature isn't about matching, it's worth highlighting. (See what I did there?!)

I love this feature. It's super useful. Highlighting lets users see exact matches visually, with the search terms returned in their original context.

If you’re interested in learning how to build an application like this, there is a step-by-step tutorial showing Atlas Search highlights with JavaScript and HTML.

Pros: Aesthetically, this feature enhances user search experience because users can easily see what they are searching for in a given text.

Cons: It can be costly if passages are long because a lot more RAM will be needed to hold the data.
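
As a minimal sketch of how highlighting might be wired up, the pipeline below requests highlights on the plot field and projects them with the searchHighlights metadata. The `renderHighlight` and `escapeHtml` helpers and the sample data are made up for illustration. The sample assumes each highlight carries a `texts` array whose entries have a `value` and a `type` of "hit" or "text".

```javascript
// Pipeline sketch: request highlights on "plot" and project them.
const pipeline = [
  {
    $search: {
      text: { query: "Helsinki", path: "plot" },
      highlight: { path: "plot" },
    },
  },
  {
    $project: {
      _id: 0,
      title: 1,
      highlights: { $meta: "searchHighlights" },
    },
  },
];

// Escapes the characters that matter when inserting text into HTML.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Turns one highlight's `texts` array into HTML, wrapping hits in <mark>.
function renderHighlight(highlight) {
  return highlight.texts
    .map(({ value, type }) =>
      type === "hit" ? `<mark>${escapeHtml(value)}</mark>` : escapeHtml(value)
    )
    .join("");
}

// Made-up sample shaped like one entry of the projected highlights array.
const sample = {
  path: "plot",
  texts: [
    { value: "A detective arrives in ", type: "text" },
    { value: "Helsinki", type: "hit" },
    { value: " to solve a case.", type: "text" },
  ],
  score: 1.0,
};
console.log(renderHighlight(sample));
```

Rendering on the client keeps the query simple. The cost noted above still applies, since the highlighted passages have to be held and returned by the search.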

Conclusion

Ultimately, there are many ways to achieve exact(ish) matches with Atlas Search. Your best approach is to skim through a few of the tutorials in the documentation, take a look at the Atlas Search section here in the DevCenter, and then tinker with it.
