If your queries rely on regex matching, you can improve their performance and efficiency by creating a MongoDB Search index and running a $search aggregation pipeline stage. $regex is inefficient because it can't always use indexes effectively: case-insensitive and unanchored patterns, for example, can't take advantage of an index the way a case-sensitive prefix pattern can. MongoDB Search indexes can significantly improve the performance of these queries and offer more options for customizing query parameters.
This page describes some common MongoDB Search index and query configurations for $regex use cases.
Examples
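The examples use the sample_mflix.movies namespace. To run the sample queries, add this collection to your cluster or use the pre-configured snapshots in the MongoDB Search Playground. The sample queries demonstrate how to use $search instead of $regex for the following use cases:
- Searching for strings that start with a prefix
- Searching for strings that contain a term anywhere in a field
- Searching for strings that end with a suffix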
If your application frequently queries for string values that start with a set of characters, or prefix, it might use $regex with the ^ anchor, which matches from the start of the string value, and the i option, which makes the match case-insensitive.
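To see what the ^ anchor and the i option actually match, the following Python sketch evaluates a filter document of the kind a $regex prefix query uses against a few hypothetical titles. It uses Python's re module as a stand-in for MongoDB's PCRE-based engine; the two agree on simple patterns like this one, but not on every construct. The helper name regex_matches is hypothetical.

```python
import re

# Filter document for a case-insensitive prefix query on "title":
# "^" anchors the pattern to the start of the string, "i" ignores case.
prefix_filter = {"title": {"$regex": "^back", "$options": "i"}}


def regex_matches(value, spec):
    """Locally emulate a simple {"$regex": ..., "$options": ...} condition."""
    flags = re.IGNORECASE if "i" in spec.get("$options", "") else 0
    return re.search(spec["$regex"], value, flags) is not None


# Hypothetical sample titles for illustration.
titles = ["Back to the Future", "Backdraft", "The Way Back", "Brokeback Mountain"]

matching = [t for t in titles if regex_matches(t, prefix_filter["title"])]
print(matching)  # ['Back to the Future', 'Backdraft']
```

The ^ anchor applies to the start of the whole string, so "The Way Back" doesn't match. The same dictionary is the filter you would pass to a driver method such as PyMongo's collection.find().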
Instead, we recommend MongoDB Search queries that use the $search aggregation pipeline stage. The following queries search for movie titles that start with the prefix back.
[Table: $regex Queries | $search Query]
To run this $search query, create a MongoDB Search index similar to the following:
```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        { "type": "string", "analyzer": "autocomplete-search", "searchAnalyzer": "lucene.standard" }
      ]
    }
  },
  "analyzers": [
    {
      "name": "autocomplete-search",
      "tokenizer": { "type": "standard" },
      "tokenFilters": [
        { "type": "lowercase" },
        { "type": "edgeGram", "minGram": 4, "maxGram": 10 }
      ]
    }
  ]
}
```
This index definition indexes the title field in the movies collection as the string type. It uses the autocomplete-search custom analyzer for indexed values and the lucene.standard analyzer as the searchAnalyzer for queries. The autocomplete-search analyzer applies the following:
- standard tokenizer to split the text into words by whitespace and punctuation.
- lowercase token filter to transform all characters to lowercase to support case-insensitive queries.
- edgeGram token filter to create tokens of between 4 and 10 characters in length.
Note
This custom analyzer only supports words up to ten characters in length. If you expect words and queries longer than ten characters, increase the maxGram value. We don't recommend setting a maxGram value higher than fifteen because higher values increase the size of the index and might impact performance and availability.
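The following Python sketch approximates what the autocomplete-search analyzer stores for a title and how a single-term query, analyzed by lucene.standard, is compared against it. It is a simplification under stated assumptions: a regular expression stands in for the standard tokenizer (which follows Unicode word-break rules), and words shorter than minGram are assumed to be dropped rather than indexed as-is. The function names are hypothetical.

```python
import re


def standard_words(text):
    # Rough stand-in for the standard tokenizer plus the lowercase filter.
    return [w.lower() for w in re.findall(r"\w+", text)]


def edge_grams(word, min_gram, max_gram):
    # edgeGram filter: prefixes of min_gram..max_gram characters.
    # Assumption: words shorter than min_gram produce no tokens.
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


def index_tokens(title):
    # autocomplete-search analyzer: standard tokenizer, lowercase, edgeGram(4, 10).
    return {gram for word in standard_words(title) for gram in edge_grams(word, 4, 10)}


def matches(title, term):
    # lucene.standard lowercases the query term but doesn't split it into
    # grams, so a single-term query matches only if it equals an indexed gram.
    return term.lower() in index_tokens(title)


print(sorted(index_tokens("Back to the Future")))  # ['back', 'futu', 'futur', 'future']
print(matches("The Way Back", "back"))  # True
print(matches("Extraterrestrial", "extraterrestrial"))  # False: longer than maxGram
```

In this sketch, "back" also matches "The Way Back", because grams are built for every word rather than only for the start of the field. A query longer than maxGram, such as "extraterrestrial", finds no matching gram, which is the limit the note above describes; queries shorter than minGram, such as "bac", don't match either.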
If your application frequently queries for strings that are present anywhere in the field, you might run $regex queries, which check every document and return all matches in no particular order.
Instead, we recommend MongoDB Search queries that use the $search aggregation pipeline stage. The following queries search for movie titles that contain the term park anywhere in the title field.
[Table: $regex Query | $search Query]
To run this $search query, create a MongoDB Search index with the following definition:
```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": { "type": "string", "analyzer": "contains", "searchAnalyzer": "lucene.standard" }
    }
  },
  "analyzers": [
    {
      "name": "contains",
      "tokenizer": { "type": "standard" },
      "tokenFilters": [
        { "type": "lowercase" },
        { "type": "reverse" },
        { "type": "edgeGram", "minGram": 4, "maxGram": 15 },
        { "type": "reverse" }
      ]
    }
  ]
}
```
This index definition indexes the title field in the movies collection as the string type using a custom analyzer named contains that applies the following:
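- standard tokenizer to split the words by whitespace or punctuation.
- lowercase token filter to transform the letters to lowercase to support case-insensitive queries.
- reverse token filter to reverse the characters of each word.
- edgeGram token filter to create tokens of between four and fifteen characters in length.
- reverse token filter again to restore the original character order, so that each token is anchored at the end of a word instead of the start, which supports efficient unanchored queries.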
Note
This custom analyzer only supports words up to fifteen characters in length. If you have words longer than fifteen characters, you can increase the maxGram value, but we don't recommend setting it higher than fifteen because higher values increase the size of the index and might impact performance and availability.
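To make the double reverse concrete, this Python sketch approximates the tokens the contains analyzer would store for a title. It assumes a regular expression is a good enough stand-in for the standard tokenizer and that words shorter than minGram are dropped rather than indexed as-is; contains_tokens is a hypothetical name.

```python
import re


def contains_tokens(title, min_gram=4, max_gram=15):
    tokens = []
    # Rough stand-in for the standard tokenizer, followed by the lowercase filter.
    for word in (w.lower() for w in re.findall(r"\w+", title)):
        reversed_word = word[::-1]  # first reverse filter
        # edgeGram on the reversed word; assumption: words shorter than
        # min_gram produce no tokens.
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(reversed_word[:n][::-1])  # second reverse filter
    return tokens


print(contains_tokens("Jurassic Park"))
# ['ssic', 'assic', 'rassic', 'urassic', 'jurassic', 'park']
```

Every token is a suffix of a word of at least four characters: edgeGram on its own builds prefixes, and wrapping it between the two reverse filters turns those prefixes into suffixes.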
If your application frequently queries for string field values that end with a set of characters, or suffix, you might run $regex queries with the $ anchor, which matches at the end of the string value, and the i option, which makes the match case-insensitive.
Instead, we recommend MongoDB Search queries that use the $search aggregation pipeline stage. The following queries search for movie titles that end with the term ring.
[Table: $regex Queries | $search Query]
To run this $search query, create a MongoDB Search index similar to the following:
```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "autocomplete",
          "minGrams": 4,
          "maxGrams": 10,
          "analyzer": "lucene.keyword",
          "tokenization": "rightEdgeGram"
        }
      ]
    }
  }
}
```
This index definition indexes the title field using the following:
- The autocomplete type with the rightEdgeGram tokenization strategy to split the text into substrings, or "grams", of between 4 (minimum) and 10 (maximum) characters in length, which supports partial searches starting from the end of the string.
- The lucene.keyword analyzer to ensure matches only at the end of the text, and not at the end of intermediary words. To find suffix matches on intermediary words, use lucene.standard.
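The following Python sketch contrasts the grams that rightEdgeGram tokenization would produce for one title with lucene.keyword and with lucene.standard. It assumes lucene.keyword keeps the whole value as a single token and approximates lucene.standard as word splitting plus lowercasing; it doesn't model any other normalization, such as case or diacritic folding, that the autocomplete type may apply. The function names are hypothetical.

```python
import re


def right_edge_grams(value, min_grams=4, max_grams=10):
    # rightEdgeGram: suffixes of min_grams..max_grams characters.
    return [value[-n:] for n in range(min_grams, min(max_grams, len(value)) + 1)]


def keyword_grams(title):
    # lucene.keyword: the whole field value is one token, so only
    # suffixes of the full title are produced.
    return right_edge_grams(title)


def standard_grams(title):
    # Rough stand-in for lucene.standard: split into lowercase words,
    # then produce suffixes of each word.
    grams = []
    for word in re.findall(r"\w+", title):
        grams.extend(right_edge_grams(word.lower()))
    return grams


title = "The Fellowship of the Ring"
print(keyword_grams(title))
print("ship" in keyword_grams(title), "ship" in standard_grams(title))  # False True
```

With lucene.keyword, "ship" isn't a gram because "Fellowship" isn't at the end of the title. With lucene.standard, each word gets its own suffix grams, so "ship" is one.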
Learn More
To learn more about MongoDB Search queries, see Queries and Indexes.
To learn more about regex queries in MongoDB, see $regex.
MongoDB University offers a free course on optimizing MongoDB Performance. To learn more, see Monitoring and Insights.