Exact Matches in Atlas Search: Beginner's Guide
Much of this article was contributed by a MongoDB Intern, Humayara Karim. Thanks for spending your summer with us!
Search engines are powerful tools that users rely on when they're looking for information. They often rely on search engines to handle misspelled words through a feature called fuzzy matching, which identifies text, strings, and even queries that are very similar but not identical. This is very useful.
But a lot of the time, the most useful search is an exact match. I'm looking for a word, foobar, and I want foobar, not foobarr and not greenfoobart. Luckily, Atlas Search supports both fuzzy searches and exact matches. In fact, there are quite a few ways to achieve exact matches with Atlas Search, and this tutorial focuses on those approaches and the pros and cons of each.
Just like the NYC subway system, there are many ways to get to the same destination, and not all of them are good. So let's walk through the various methods of doing exact match searches.
Here, we use "exact matching" to mean literally exact, or roughly exact, values. Roughly exact could mean case insensitive matching, matching regardless of whether diacritic marks are used, or even matching words regardless of whether they are singular or plural.
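To make "roughly exact" concrete, here is a small JavaScript sketch of one way to compare two strings case insensitively and regardless of diacritics. It is a conceptual model only, not how Atlas Search implements matching; the helper names normalizeForComparison and roughlyEqual are hypothetical, and singular/plural matching (stemming) is not covered.

```javascript
// Conceptual sketch only: models "roughly exact" equality in plain JavaScript.
// normalizeForComparison and roughlyEqual are hypothetical helpers,
// not part of any MongoDB API.
function normalizeForComparison(text) {
  return text
    .normalize("NFD")       // split letters from their combining diacritic marks
    .replace(/\p{M}/gu, "") // drop the combining marks (é -> e)
    .toLowerCase();         // ignore case
}

function roughlyEqual(a, b) {
  return normalizeForComparison(a) === normalizeForComparison(b);
}

console.log(roughlyEqual("Café", "cafe"));      // true
console.log(roughlyEqual("foobar", "foobarr")); // false
```

A literally exact comparison would simply be a === b, which is the behavior you get from the token type without a normalizer.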
If you want to find an exact match for a string of text, the best field type to use is the token type, because it indexes a string field as a single term. The token type can also normalize values to lowercase for case insensitive matches, and it works with the equals and in operators. Here's an index configuration that maps a category field for exact value matching and sorting:

```json
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "category": [
        {
          "type": "token"
        }
      ]
    }
  }
}
```
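For the case insensitive variant, the token type accepts a normalizer option, either "lowercase" or "none" (the default). Below is a minimal sketch that builds that variant of the index definition as a plain JavaScript object. The helper name tokenIndexDefinition is hypothetical; the JSON it prints is what you would use as the index definition.

```javascript
// Hypothetical helper: builds an Atlas Search index definition that maps one
// field as the token type, optionally lowercasing values at index time.
function tokenIndexDefinition(field, { caseInsensitive = false } = {}) {
  const tokenMapping = { type: "token" };
  if (caseInsensitive) {
    tokenMapping.normalizer = "lowercase"; // omitted -> default "none"
  }
  return {
    mappings: {
      dynamic: true,
      fields: { [field]: [tokenMapping] },
    },
  };
}

console.log(
  JSON.stringify(tokenIndexDefinition("category", { caseInsensitive: true }), null, 2)
);
```

With the lowercase normalizer, a value stored as "Technology" is indexed as the single term "technology".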
Here's an example matching the category token field type using equals:

```javascript
[
  {
    $search: {
      equals: {
        path: "category",
        value: "Technology"
      }
    }
  }
]
```
To see that query in action, check it out in the search playground.
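The in operator mentioned earlier is handy when any one of several exact values should match. Here is a minimal sketch, assuming the same category token field; the helper name inPipeline and the second value "Science" are hypothetical. In mongosh you would pass the returned array to aggregate() on your collection.

```javascript
// Hypothetical helper: builds a $search pipeline that uses the `in` operator
// to match documents whose token-indexed field equals ANY of the given values.
function inPipeline(path, values) {
  if (!Array.isArray(values) || values.length === 0) {
    throw new Error("values must be a non-empty array");
  }
  return [
    {
      $search: {
        in: {
          path,
          value: values, // each entry is compared against the single indexed term
        },
      },
    },
  ];
}

const pipeline = inPipeline("category", ["Technology", "Science"]);
console.log(JSON.stringify(pipeline));
```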
Pros: This is as exact as it gets, with an option for case insensitive matching. As an added bonus, the token field type is sortable as well.
Cons: This is exact, or case insensitively exact, so there's no room for fuzziness with the token field type.
If you want to return matches that contain a specific word, the standard analyzer (lucene.standard) would be your go-to, as it divides text into terms based on word boundaries. It's crucial to first identify and understand the appropriate analyzer for your use case. This is where MongoDB makes our life easier, because the Atlas Search documentation lists all the built-in analyzers and their purposes in one place.
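To see why the choice of analyzer matters, here is a deliberately simplified JavaScript model of two built-in analyzers: lucene.standard (split on word boundaries, then lowercase) and lucene.keyword (keep the whole value as one term). The real lucene.standard analyzer follows Unicode word-boundary rules; the regex below is only an approximation, and the function names are hypothetical.

```javascript
// Simplified model of how two built-in analyzers turn a string into terms.
// Approximation only: the real lucene.standard analyzer uses Unicode
// word-boundary rules, not this regex.
function standardAnalyze(text) {
  // Keep runs of letters/digits, then lowercase each one.
  return (text.match(/[\p{L}\p{N}]+/gu) ?? []).map((t) => t.toLowerCase());
}

function keywordAnalyze(text) {
  // lucene.keyword indexes the entire value as a single, unchanged term.
  return [text];
}

const title = "The Man in the Moon";
console.log(standardAnalyze(title)); // [ 'the', 'man', 'in', 'the', 'moon' ]
console.log(keywordAnalyze(title));  // [ 'The Man in the Moon' ]
```

A query for the single word moon finds a term in the first list but not in the second, which is exactly the difference between "contains this word" and "is exactly this value".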
Pros: Users can also make custom and multi analyzers to cater to specific application needs. There are examples on the MongoDB Developer Community Forums demonstrating folks doing this in the wild.
Here's some code for case insensitive search using a custom analyzer with the keyword tokenizer and a lowercase token filter. The analyzer definition goes in the analyzers array of the index definition:

```json
{
  "analyzers": [
    {
      "charFilters": [],
      "name": "search_keyword_lowercaser",
      "tokenFilters": [
        {
          "type": "lowercase"
        }
      ],
      "tokenizer": {
        "type": "keyword"
      }
    }
  ]
}
```
Or, use the lucene.keyword analyzer for single-word exact match queries and a phrase query for multi-word exact match queries:

```javascript
{
  $search: {
    "index": "movies_search_index",
    "phrase": {
      "query": "Red Robin",
      "path": "title"
    }
  }
}
```
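Here is a sketch of the lucene.keyword approach: a field mapping that analyzes title with lucene.keyword, and a text query against it. Because the whole title is indexed as one unmodified term, the query should only match documents whose title is exactly the query string, case included. The index name movies_search_index comes from the example above; the helper name exactTitlePipeline and the choice of the text operator are assumptions for illustration.

```javascript
// Index mapping fragment: analyze `title` with the built-in lucene.keyword
// analyzer so each title is indexed as one unmodified term.
const keywordMappings = {
  mappings: {
    dynamic: false,
    fields: {
      title: { type: "string", analyzer: "lucene.keyword" },
    },
  },
};

// Hypothetical helper: builds a pipeline that matches the whole title exactly.
function exactTitlePipeline(title, index = "movies_search_index") {
  return [
    { $search: { index, text: { query: title, path: "title" } } },
    { $project: { _id: 0, title: 1 } },
  ];
}

console.log(JSON.stringify(keywordMappings));
console.log(JSON.stringify(exactTitlePipeline("Red Robin")));
```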
Cons: When an analyzer is involved, queries operate on the terms from the analyzed string; matching depends on the combination of the analyzed terms and the type of query constructed.
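Whichever analyzer you choose, it only applies to fields whose mapping references it. As a sketch, here is a complete index definition that applies the search_keyword_lowercaser custom analyzer from above to a title field (the field name is an assumption), plus a one-line model of what that analyzer produces: the whole value as one lowercased term. The keywordLowercase function is a hypothetical simplification, not Atlas Search internals.

```javascript
// Sketch: a full index definition that defines the custom analyzer and
// applies it to a (hypothetical) "title" field.
const indexDefinition = {
  analyzers: [
    {
      name: "search_keyword_lowercaser",
      charFilters: [],
      tokenizer: { type: "keyword" },        // keep the whole value as one token
      tokenFilters: [{ type: "lowercase" }], // then lowercase that token
    },
  ],
  mappings: {
    dynamic: false,
    fields: {
      title: { type: "string", analyzer: "search_keyword_lowercaser" },
    },
  },
};

// Simplified model of the analyzer's output for a single value.
function keywordLowercase(text) {
  return [text.toLowerCase()];
}

console.log(keywordLowercase("Red Robin")); // [ 'red robin' ]
```

In this model, "Red Robin" and "RED robin" produce the same term, while "Red" alone does not, which is the case insensitive exact match the custom analyzer is for.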
As the name suggests, the text operator allows users to search text, using the analyzer defined in the index configuration for that field.
Here is the syntax for the text operator:
```javascript
{
  $search: {
    "index": <index name>, // optional, defaults to "default"
    "text": {
      "query": "<search-string>",
      "path": "<field-to-search>",
      "fuzzy": <options>,
      "score": <options>,
      "synonyms": "<synonyms-mapping-name>"
    }
  }
}
```
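To turn that template into a concrete query, here is a sketch of a hypothetical helper, textStage, that fills in only the options you pass. The fuzzy variant uses maxEdits, the maximum number of single-character edits a term may differ by (1 or 2); leaving fuzzy out keeps the search to exact terms.

```javascript
// Hypothetical helper: builds a $search stage for the text operator, adding
// optional fields (index, fuzzy, score, synonyms) only when provided.
function textStage({ query, path, index, fuzzy, score, synonyms }) {
  const text = { query, path };
  if (fuzzy !== undefined) text.fuzzy = fuzzy;
  if (score !== undefined) text.score = score;
  if (synonyms !== undefined) text.synonyms = synonyms;

  const search = { text };
  if (index !== undefined) search.index = index; // omitted -> "default" index
  return { $search: search };
}

// Exact-term search (no fuzziness) versus a more tolerant fuzzy variant.
const exact = textStage({ query: "foobar", path: "title" });
const fuzzy = textStage({ query: "foobar", path: "title", fuzzy: { maxEdits: 1 } });
console.log(JSON.stringify(exact));
console.log(JSON.stringify(fuzzy));
```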
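If you're searching for a single term and want to use full text search to do it, this is the operator for you. Simple, effective, no frills. Its simplicity means it's hard to mess up, and you can use it in complex use cases without worrying. You can also combine the text operator with options such as synonyms, and with later pipeline stages such as $limit and $project. For example, this query searches movie titles for "automobile" and also matches terms from a synonym mapping named transportSynonyms: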
```javascript
db.movies.aggregate([
  {
    $search: {
      "text": {
        "path": "title",
        "query": "automobile",
        "synonyms": "transportSynonyms"
      }
    }
  },
  {
    $limit: 10
  },
  {
    $project: {
      "_id": 0,
      "title": 1,
      "score": { $meta: "searchScore" }
    }
  }
])
```
Here is a simpler query that searches movie plots for "Helsinki" and returns each match's relevance score:

```javascript
db.movies.aggregate([
  {
    $search: {
      "text": {
        "query": "Helsinki",
        "path": "plot"
      }
    }
  },
  {
    $project: {
      plot: 1,
      title: 1,
      score: { $meta: "searchScore" }
    }
  }
])
```
Pros: Straightforward, easy to use.
Cons: Either all or any of the query terms must match. There's no middle ground, such as requiring more than one term to match without requiring all of them.
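The all-or-any distinction is easy to see in a toy model. The sketch below is not Atlas Search code: it treats a document as a set of already-analyzed terms and checks a multi-term query in either mode. The function name matchesTerms and its mode parameter are hypothetical.

```javascript
// Toy model: does a document's term set satisfy a multi-term query?
// mode "any": at least one query term present; mode "all": every term present.
function matchesTerms(docTerms, queryTerms, mode = "any") {
  const terms = new Set(docTerms);
  return mode === "all"
    ? queryTerms.every((t) => terms.has(t))
    : queryTerms.some((t) => terms.has(t));
}

const doc = ["shoot", "the", "moon"];
console.log(matchesTerms(doc, ["man", "moon"], "any")); // true
console.log(matchesTerms(doc, ["man", "moon"], "all")); // false
```

Neither mode can express "at least two of these three terms", which is the missing middle ground described above.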
The phrase operator performs exact match queries on multiple words (terms) in a field. But why use the phrase operator instead of text? Because the phrase operator searches for an ordered sequence of terms, using the analyzer defined in the index configuration. Take a look at this example, where we search a movie titles collection for the phrases “the man” and “the moon”:

```javascript
db.movies.aggregate([
  {
    "$search": {
      "phrase": {
        "path": "title",
        "query": ["the man", "the moon"]
      }
    }
  },
  { $limit: 10 },
  {
    $project: {
      "_id": 0,
      "title": 1,
      score: { $meta: "searchScore" }
    }
  }
])
```
As you can see, the query returns the top 10 results that contain the ordered term sequence “the man” or “the moon”:
```
{ "title" : "The Man in the Moon", "score" : 4.500046730041504 }
{ "title" : "Shoot the Moon", "score" : 3.278003215789795 }
{ "title" : "Kick the Moon", "score" : 3.278003215789795 }
{ "title" : "The Man", "score" : 2.8860299587249756 }
{ "title" : "The Moon and Sixpence", "score" : 2.8754563331604004 }
{ "title" : "The Moon Is Blue", "score" : 2.8754563331604004 }
{ "title" : "Racing with the Moon", "score" : 2.8754563331604004 }
{ "title" : "Mountains of the Moon", "score" : 2.8754563331604004 }
{ "title" : "Man on the Moon", "score" : 2.8754563331604004 }
{ "title" : "Castaway on the Moon", "score" : 2.8754563331604004 }
```
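To illustrate what "ordered sequence of terms" means, here is a simplified model of phrase matching with no slop: the phrase's terms must appear consecutively and in order. It uses a regex approximation of the standard analyzer, and, like the query above, an array of phrases matches if any one of them does. This is a conceptual sketch; analyze, containsPhrase, and phraseMatches are hypothetical names.

```javascript
// Approximate lucene.standard: lowercase runs of letters/digits.
function analyze(text) {
  return (text.match(/[\p{L}\p{N}]+/gu) ?? []).map((t) => t.toLowerCase());
}

// True if `phrase` occurs in `fieldText` as consecutive, in-order terms.
function containsPhrase(fieldText, phrase) {
  const field = analyze(fieldText);
  const target = analyze(phrase);
  if (target.length === 0) return false;
  for (let i = 0; i + target.length <= field.length; i++) {
    if (target.every((term, j) => field[i + j] === term)) return true;
  }
  return false;
}

// Like `query: ["the man", "the moon"]`: any one phrase is enough.
function phraseMatches(fieldText, phrases) {
  return phrases.some((p) => containsPhrase(fieldText, p));
}

console.log(phraseMatches("Shoot the Moon", ["the man", "the moon"])); // true
console.log(phraseMatches("Moon over the Man", ["the moon"]));         // false
```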
Pros: There are quite a few options you can use with phrase, such as slop, that give users the flexibility to customize the exact phrases they want to match.
Cons: All of the terms in the query are required to match, in that specified order.
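The slop option relaxes that strictness somewhat: it sets the allowable distance between the words in the phrase, and its default of 0 means the words must be adjacent. Here is a minimal sketch of a hypothetical helper, phrasePipeline, that sets it, assuming the same movies collection and title field; the slop value 2 is an arbitrary illustrative choice.

```javascript
// Hypothetical helper: phrase query with an explicit slop.
// slop 0 (the default) requires adjacent words; larger values allow gaps.
function phrasePipeline(path, query, slop = 0, limit = 10) {
  if (!Number.isInteger(slop) || slop < 0) {
    throw new Error("slop must be a non-negative integer");
  }
  return [
    { $search: { phrase: { path, query, slop } } },
    { $limit: limit },
    { $project: { _id: 0, [path]: 1, score: { $meta: "searchScore" } } },
  ];
}

// Allow some distance between "man" and "moon" (2 is an illustrative value).
const pipeline = phrasePipeline("title", "man moon", 2);
console.log(JSON.stringify(pipeline[0]));
```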
Although highlighting isn't about matching, it's worth highlighting. (See what I did there?!)
I love this feature. It's super useful. The highlight option lets users visually see exact matches by returning the matched search terms in their original context, so your application UI can display them with emphasis.
If you’re interested in learning how to build an application like this, check out the step-by-step tutorial on visually showing Atlas Search highlights with JavaScript and HTML.
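Here is a sketch of both halves. The pipeline asks for highlights on the plot field and returns them with $meta: "searchHighlights"; each returned highlight has a texts array of { value, type } pieces, where type is "hit" for matched text and "text" for surrounding context. The renderHighlight and escapeHtml helpers that wrap hits in mark tags are hypothetical, and the sample highlight object is made-up data for illustration.

```javascript
// Pipeline: text search on `plot`, asking Atlas Search for highlights.
const highlightPipeline = [
  {
    $search: {
      text: { query: "Helsinki", path: "plot" },
      highlight: { path: "plot" },
    },
  },
  { $limit: 5 },
  { $project: { title: 1, highlights: { $meta: "searchHighlights" } } },
];

// Escape text before inserting it into HTML.
function escapeHtml(s) {
  return s.replace(/[&<>"']/g, (c) => ({
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;",
  })[c]);
}

// Hypothetical renderer: joins one highlight's `texts` into HTML,
// wrapping matched pieces ("hit") in <mark>.
function renderHighlight(highlight) {
  return highlight.texts
    .map(({ value, type }) =>
      type === "hit" ? `<mark>${escapeHtml(value)}</mark>` : escapeHtml(value))
    .join("");
}

// Made-up sample shaped like one searchHighlights entry.
const sample = {
  path: "plot",
  texts: [
    { value: "A love story set in ", type: "text" },
    { value: "Helsinki", type: "hit" },
    { value: ".", type: "text" },
  ],
  score: 1.2,
};
console.log(renderHighlight(sample));
// A love story set in <mark>Helsinki</mark>.
```

Escaping every piece before adding the mark tags keeps document text from being interpreted as HTML in your UI.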
Pros: Aesthetically, this feature enhances the user search experience, because users can easily see what they searched for within a given text.
Cons: It can be costly if passages are long because a lot more RAM will be needed to hold the data.
Ultimately, there are many ways to achieve exact(ish) matches with Atlas Search. Your best approach is to skim a few of the tutorials in the documentation, take a look at the Atlas Search section here in the DevCenter, and then tinker with it.