If your queries rely on regex matching, you can improve their performance
and efficiency by creating a MongoDB Search index and running a $search aggregation pipeline
stage. $regex is inefficient because it cannot always make use
of indexes, whereas MongoDB Search indexes
significantly improve the performance of your queries and offer more
options for customizing query parameters.
This page describes some common MongoDB Search index and query configurations for
$regex use cases.
Examples
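The examples use the sample_mflix.movies namespace. To run the
sample queries, add this collection to your cluster
or use the pre-configured snapshots in the MongoDB Search Playground. The sample
queries demonstrate how to use $search instead of
$regex for the following use cases:
- Matching string values that start with a prefix
- Matching strings that are present anywhere in the field
- Matching string values that end with a suffix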
If your application frequently queries for string values that start with
a set of characters or prefix, it might use $regex with the ^ anchor,
which matches from the start of the string value, and the i option, which
makes the match case-insensitive.
Instead, we recommend MongoDB Search queries that
use the $search aggregation pipeline stage. The following
queries search for movie titles that start with the prefix back.
➤ Try this in the MongoDB Search Playground.
| $regex Queries | $search Query |
|---|---|
| | |
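The query bodies for this comparison are not reproduced here, so the following is a minimal sketch of what the two approaches might look like for the prefix back. It assumes the text operator, an index named default, and a $project stage that returns only the title; none of these choices are confirmed by the comparison itself. The queries are built as plain JavaScript objects so that they can be inspected without a cluster.

```javascript
// Sketch only: the index name "default" and the text operator are assumptions.
const prefix = "back";

// $regex filter: ^ anchors the match at the start, i makes it case-insensitive.
const regexFilter = { title: { $regex: "^" + prefix, $options: "i" } };

// $search pipeline: the edge-gram tokens in the index carry the prefix logic,
// so the query itself is a plain term lookup.
const searchPipeline = [
  { $search: { index: "default", text: { path: "title", query: prefix } } },
  { $project: { _id: 0, title: 1 } }
];

// In mongosh, against the sample_mflix database:
//   db.movies.find(regexFilter, { _id: 0, title: 1 })
//   db.movies.aggregate(searchPipeline)
console.log(JSON.stringify(regexFilter));
console.log(JSON.stringify(searchPipeline));
```

Note that when the indexed text is split into words, as the standard tokenizer does, a sketch like this would match a word that starts with back anywhere in the title, whereas the ^ anchor applies to the start of the whole string.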
To run this $search query, create a MongoDB Search
index similar to the following:
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "string",
          "analyzer": "autocomplete-search",
          "searchAnalyzer": "lucene.standard"
        }
      ]
    }
  },
  "analyzers": [
    {
      "name": "autocomplete-search",
      "tokenizer": { "type": "standard" },
      "tokenFilters": [
        { "type": "lowercase" },
        { "type": "edgeGram", "minGram": 4, "maxGram": 10 }
      ]
    }
  ]
}
This index definition indexes the title field in the movies
collection as the string type that uses the
autocomplete-search custom analyzer for indexed fields and the
lucene.standard analyzer as the searchAnalyzer for queries. The custom analyzer named
autocomplete-search applies the following:
- standard tokenizer to split the text into words
- lowercase token filter to transform all characters to lowercase to support case-insensitive queries
- edgeGram token filter to create tokens of between 4 and 10 characters in length
Note
This custom analyzer only supports words up to ten characters in
length. If you expect words and queries longer than ten characters,
increase the maxGram value. We don't recommend setting a
maxGram value higher than fifteen because higher values
increase the size of the index and might impact performance and
availability.
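To make the tokenization concrete, the following sketch approximates what the autocomplete-search analyzer emits for a title. It is a simplification written for illustration, not the Lucene implementation: the helper name edgeGramTokens is hypothetical, the standard tokenizer is approximated by splitting on non-alphanumeric ASCII characters, and words shorter than minGram are assumed to produce no tokens.

```javascript
// Rough approximation of the autocomplete-search analyzer; not the Lucene implementation.
function edgeGramTokens(text, minGram = 4, maxGram = 10) {
  // Approximate the standard tokenizer and lowercase filter (ASCII text only).
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  const tokens = [];
  for (const word of words) {
    // Emit every leading slice between minGram and maxGram characters long.
    const longest = Math.min(word.length, maxGram);
    for (let len = minGram; len <= longest; len++) {
      tokens.push(word.slice(0, len));
    }
  }
  return tokens;
}

console.log(edgeGramTokens("Back to the Future"));
// [ 'back', 'futu', 'futur', 'future' ]
```

Under this approximation, a twelve-letter word such as backpressure yields no token longer than backpressu, so a query for the full word has nothing to match. This is the limit that the note above describes.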
If your application frequently queries for strings that are present
anywhere in the field, you might run $regex queries, which
check every document and return all matches in no particular order.
Instead, we recommend MongoDB Search queries that
use the $search aggregation pipeline stage. The following
queries search for movie titles that contain the term park
anywhere in the title field.
➤ Try this in the MongoDB Search Playground.
| $regex Query | $search Query |
|---|---|
| | |
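The query bodies for this comparison are not reproduced here. As a hedged sketch, the two approaches for the term park might look like the following. The text operator and the index name default are assumptions, and the sample strings at the end are illustrative inputs for a local check of the regex semantics, not query results.

```javascript
// Sketch only: the index name "default" and the text operator are assumptions.
const term = "park";

// Unanchored, case-insensitive $regex filter.
const regexFilter = { title: { $regex: term, $options: "i" } };

// $search pipeline that looks the term up among the indexed tokens.
const searchPipeline = [
  { $search: { index: "default", text: { path: "title", query: term } } },
  { $project: { _id: 0, title: 1 } }
];

// Local check of the regex semantics on illustrative strings (not query results).
const pattern = new RegExp(regexFilter.title.$regex, regexFilter.title.$options);
const samples = ["Jurassic Park", "Parker", "The Ring"];
console.log(samples.filter((title) => pattern.test(title)));
// [ 'Jurassic Park', 'Parker' ]

// In mongosh, against the sample_mflix database:
//   db.movies.find(regexFilter, { _id: 0, title: 1 })
//   db.movies.aggregate(searchPipeline)
```

With the text operator, a query term matches when it equals one of the tokens stored in the index, so which titles the $search side returns depends on the analyzer configured for the title field.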
To run this $search query, create a MongoDB Search
index with the following definition:
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "string",
        "analyzer": "contains",
        "searchAnalyzer": "lucene.standard"
      }
    }
  },
  "analyzers": [
    {
      "name": "contains",
      "tokenizer": { "type": "standard" },
      "tokenFilters": [
        { "type": "lowercase" },
        { "type": "reverse" },
        { "type": "edgeGram", "minGram": 4, "maxGram": 15 },
        { "type": "reverse" }
      ]
    }
  ]
}
This index definition indexes the title field in the movies collection
as the string type using a custom analyzer named contains that applies
the following:
- standard tokenizer to split the words by whitespace or punctuation.
- lowercase token filter to transform the letters to lowercase to support case-insensitive queries.
- reverse token filter (twice) to reverse the words to support efficient unanchored queries.
- edgeGram token filter to create tokens of between four and fifteen characters in length.
Note
This custom analyzer only supports words up to fifteen characters in
length. If you have words longer than fifteen characters, increase the
maxGram value. We don't recommend setting a maxGram value higher
than fifteen because higher values increase the size of the index and might
impact performance and availability.
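The filter chain of the contains analyzer (lowercase, reverse, edgeGram, reverse) can be traced with a small sketch. This is an approximation for illustration only, not the Lucene implementation: the helper names are hypothetical, the standard tokenizer is approximated by splitting on non-alphanumeric ASCII characters, and words shorter than minGram are assumed to produce no tokens.

```javascript
// Rough approximation of the contains analyzer; not the Lucene implementation.
const reverse = (s) => [...s].reverse().join("");

function containsTokens(text, minGram = 4, maxGram = 15) {
  // Approximate the standard tokenizer and lowercase filter (ASCII text only).
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  const tokens = [];
  for (const word of words) {
    const reversed = reverse(word); // first reverse filter
    const longest = Math.min(reversed.length, maxGram);
    for (let len = minGram; len <= longest; len++) {
      // edgeGram on the reversed word, then the second reverse filter.
      tokens.push(reverse(reversed.slice(0, len)));
    }
  }
  return tokens;
}

console.log(containsTokens("Jurassic Park"));
// [ 'ssic', 'assic', 'rassic', 'urassic', 'jurassic', 'park' ]
```

In this approximation, the emitted tokens are the trailing four- to fifteen-character slices of each word. Reversing twice restores normal reading order, so a query term analyzed with lucene.standard can be compared against the tokens directly.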
If your application frequently queries for string field values that end
with a set of characters or suffix, you might run regex queries
with the $ anchor, which matches at the end of the string value,
and the i option, which makes the match case-insensitive.
Instead, we recommend MongoDB Search queries that
use the $search aggregation pipeline stage. The following
queries search for movie titles that end with the term ring.
➤ Try this in the MongoDB Search Playground.
| $regex Queries | $search Query |
|---|---|
| | |
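The query bodies for this comparison are not reproduced here. As a hedged sketch, the two approaches for the suffix ring might look like the following. It uses the autocomplete operator, which queries fields indexed as the autocomplete type; the index name default and the $project stage are assumptions.

```javascript
// Sketch only: the index name "default" and the $project stage are assumptions.
const suffix = "ring";

// $regex filter: $ anchors the match at the end, i makes it case-insensitive.
const regexFilter = { title: { $regex: suffix + "$", $options: "i" } };

// $search pipeline: the autocomplete operator queries a field
// that is indexed as the autocomplete type.
const searchPipeline = [
  { $search: { index: "default", autocomplete: { path: "title", query: suffix } } },
  { $project: { _id: 0, title: 1 } }
];

// In mongosh, against the sample_mflix database:
//   db.movies.find(regexFilter, { _id: 0, title: 1 })
//   db.movies.aggregate(searchPipeline)
console.log(JSON.stringify(regexFilter));
console.log(JSON.stringify(searchPipeline));
```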
To run this $search query, create a MongoDB Search
index similar to the following:
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "autocomplete",
          "minGrams": 4,
          "maxGrams": 10,
          "analyzer": "lucene.keyword",
          "tokenization": "rightEdgeGram"
        }
      ]
    }
  }
}
This index definition indexes the title field using the following:
- The autocomplete type with the rightEdgeGram tokenization strategy to split the text into substrings or "grams" of between 4 (minimum) and 10 (maximum) characters in length, which supports partial searches starting from the end of the string.
- The lucene.keyword analyzer to ensure matches only at the end of the text, and not at the end of intermediary words. To find suffix matches on intermediary words, use lucene.standard.
Learn More
To learn more about MongoDB Search queries, see Queries and Indexes.
To learn more about regex queries in MongoDB, see $regex.
MongoDB University offers a free course on optimizing MongoDB Performance. To learn more, see Monitoring and Insights.