Use MongoDB Search Instead of Regex Queries

If your queries rely on regex matching, you can improve their performance and efficiency by creating a MongoDB Search index and running a $search aggregation pipeline stage. $regex queries can be inefficient because they can't always use indexes efficiently: case-insensitive patterns and patterns that aren't anchored to the start of the string can require MongoDB to examine every index key or document. MongoDB Search indexes can significantly improve the performance of these queries and offer more options for customizing query parameters.

This page describes some common MongoDB Search index and query configurations for $regex use cases.

Examples

The examples use the sample_mflix.movies namespace. To run the sample queries, add this collection to your cluster or use the preconfigured snapshots in the MongoDB Search Playground. The sample queries demonstrate how to use $search instead of $regex for the following use cases:

Prefix queries
Substring ("contains") queries
Suffix queries

Prefix Queries

If your application frequently queries for string values that start with a set of characters, or prefix, it might run $regex queries with the ^ anchor, which matches the start of the string value, and the i option, which makes the match case-insensitive.

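Instead, we recommend MongoDB Search queries that use the $search aggregation pipeline stage. The following queries search for movie titles that start with the prefix back. The results differ slightly: the $regex queries match only titles that begin with back, while the $search query matches titles that contain any word starting with back (for example, The Way Back) and returns them sorted by relevance score.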


$regex Queries

db.movies.find( { title: { $regex: /^back/i } }, { title: 1, _id: 0 } )  // Query 1
db.movies.find( { title: { $regex: "^back", $options: "i" } }, { title: 1, _id: 0 } )  // Query 2

[
  { title: 'Back to the Future' },
  { title: 'Back to School' },
  { title: 'Back to the Future Part II' },
  { title: 'Back to the USSR - takaisin Ryssiin' },
  { title: 'Back to the Future Part III' },
  { title: 'Backdraft' },
  { title: 'Backbeat' },
  { title: 'Backstage' },
  { title: 'Backdoor' },
  { title: 'Backstage' },
  { title: 'Back Soon' },
  { title: 'Backlight' },
  { title: 'Back to Stay' },
  { title: 'Back Issues: The Hustler Magazine Story' }
]

$search Query

db.movies.aggregate([
  {
    "$search": {
      "index": "default",
      "text": {
        "query": "back",
        "path": "title",
        "matchCriteria": "all"
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "score": { $meta: "searchScore" }
    }
  }
])

[
  { title: 'Backdraft', score: 3.8287878036499023 },
  { title: 'Backbeat', score: 3.8287878036499023 },
  { title: 'Backstage', score: 3.8287878036499023 },
  { title: 'Backdoor', score: 3.8287878036499023 },
  { title: 'Backstage', score: 3.8287878036499023 },
  { title: 'The Backwoods', score: 3.8287878036499023 },
  { title: 'The Backwoods', score: 3.8287878036499023 },
  { title: 'The Way Back', score: 3.8287878036499023 },
  { title: '3 Backyards', score: 3.8287878036499023 },
  { title: 'Backlight', score: 3.8287878036499023 },
  { title: 'The Way Way Back', score: 3.8287878036499023 },
  { title: 'Back to the Future', score: 3.455096483230591 },
  { title: 'Back to School', score: 3.455096483230591 },
  { title: 'The Cat Came Back', score: 3.455096483230591 },
  { title: "Jack's Back", score: 3.455096483230591 },
  { title: 'The Dark Backward', score: 3.455096483230591 },
  { title: 'T-Rex: Back to the Cretaceous', score: 3.455096483230591 },
  { title: 'The Dark Backward', score: 3.455096483230591 },
  { title: 'No Turning Back', score: 3.455096483230591 },
  { title: "The Devil's Backbone", score: 3.455096483230591 }
]
Type "it" for more

To run this $search query, create a MongoDB Search index similar to the following:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "string",
          "analyzer": "autocomplete-search",
          "searchAnalyzer": "lucene.standard"
        }
      ]
    }
  },
  "analyzers": [
    {
      "name": "autocomplete-search",
      "tokenizer": {
        "type": "standard"
      },
      "tokenFilters": [
        {
          "type": "lowercase"
        },
        {
          "type": "edgeGram",
          "minGram": 4,
          "maxGram": 10
        }
      ]
    }
  ]
}

This index definition indexes the title field in the movies collection as the string type. It uses the custom analyzer named autocomplete-search as the analyzer for indexed values and lucene.standard as the searchAnalyzer for queries. The autocomplete-search analyzer applies the following:

standard tokenizer to split the words by whitespace or punctuation.
lowercase token filter to transform all characters to lowercase to support case-insensitive queries.
edgeGram token filter to create tokens of between four and ten characters in length, starting from the beginning of each word.
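To see why the $search query also returns titles such as The Way Back, the following JavaScript sketch approximates what this index stores for each title and how a text query with matchCriteria "all" is compared against it. It is a simplified model, not MongoDB Search's implementation: the tokenizer only splits on non-alphanumeric characters (Lucene's standard tokenizer follows Unicode word-break rules), words shorter than minGram are assumed to produce no grams, and the helper names are hypothetical. The titles Flashback and Backstabbers are hypothetical examples.

```javascript
// Simplified model of the autocomplete-search index analyzer and the
// lucene.standard search analyzer. Helper names are hypothetical.

// Approximation of the standard tokenizer: split on any run of characters
// that are not letters or digits (Lucene uses Unicode word-break rules).
function tokenize(text) {
  return text.split(/[^\p{L}\p{N}]+/u).filter((t) => t.length > 0);
}

// edgeGram token filter: prefixes of the token whose length is between
// minGram and maxGram. Tokens shorter than minGram yield nothing
// (assumption: out-of-bounds terms are omitted).
function edgeGrams(token, minGram, maxGram) {
  const grams = [];
  for (let len = minGram; len <= Math.min(maxGram, token.length); len++) {
    grams.push(token.slice(0, len));
  }
  return grams;
}

// Index side: standard tokenizer -> lowercase -> edgeGram(4, 10).
function indexTokens(title) {
  return new Set(
    tokenize(title)
      .map((t) => t.toLowerCase())
      .flatMap((t) => edgeGrams(t, 4, 10))
  );
}

// Query side (lucene.standard): tokenize and lowercase, no grams.
function queryTokens(query) {
  return tokenize(query).map((t) => t.toLowerCase());
}

// matchCriteria "all": every query token must be present in the index.
function matchesAll(title, query) {
  const indexed = indexTokens(title);
  return queryTokens(query).every((t) => indexed.has(t));
}

const titles = ["Back to the Future", "The Way Back", "3 Backyards", "Flashback"];

// $regex /^back/i only matches titles that begin with "back".
console.log(titles.filter((t) => /^back/i.test(t)));
// → [ 'Back to the Future' ]

// The gram-based match finds "back" at the start of any word.
console.log(titles.filter((t) => matchesAll(t, "back")));
// → [ 'Back to the Future', 'The Way Back', '3 Backyards' ]

// maxGram is 10, so a 12-character query word is never indexed in full.
console.log(matchesAll("Backstabbers", "backstabbers")); // false
console.log(matchesAll("Backstabbers", "backstab")); // true
```

The last two lines show the limit described in the following note: query words longer than maxGram can't match.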

Note

This custom analyzer only supports words up to ten characters in length. If you expect words and queries longer than ten characters, increase the maxGram value. We don't recommend setting a maxGram value higher than fifteen because higher values increase the size of the index and might impact performance and availability.

Substring "Contains" Queries

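If your application frequently queries for strings that are present anywhere in the field, you might run $regex queries. Because the pattern isn't anchored to the start of the string, these queries can't use an index efficiently: they examine every index key or document and return all matches in no particular order.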

Instead, we recommend MongoDB Search queries that use the $search aggregation pipeline stage. The following queries search for movie titles that contain the term park anywhere in the title field.


$regex Query

db.movies.find({ title: { $regex: "park", $options: "i" } }, { title: 1, _id: 0 })

[
  { title: 'Barefoot in the Park' },
  { title: 'The Panic in Needle Park' },
  { title: 'Gorky Park' },
  { title: 'The Park Is Mine' },
  { title: 'Jurassic Park' },
  { title: 'Mrs. Parker and the Vicious Circle' },
  { title: 'The Lost World: Jurassic Park' },
  { title: 'Dog Park' },
  { title: 'South Park: Bigger Longer & Uncut' },
  { title: 'Jurassic Park III' },
  { title: 'Mansfield Park' },
  { title: 'Jurassic Park III' },
  { title: 'Gosford Park' },
  { title: 'The Rosa Parks Story' },
  { title: 'The Delicate Art of Parking' },
  { title: 'Wicker Park' },
  { title: 'Chestnut: Hero of Central Park' },
  { title: 'Trailer Park Boys: The Movie' },
  { title: 'Ellie Parker' },
  { title: 'Paranoid Park' }
]

$search Query

db.movies.aggregate([
  {
    "$search": {
      "index": "default",
      "wildcard": {
        "query": "park*",
        "path": "title",
        "allowAnalyzedField": true
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "score": { "$meta": "searchScore" }
    }
  }
])

[
  { title: 'Barefoot in the Park', score: 1 },
  { title: 'The Panic in Needle Park', score: 1 },
  { title: 'Gorky Park', score: 1 },
  { title: 'The Park Is Mine', score: 1 },
  { title: 'Jurassic Park', score: 1 },
  { title: 'Mrs. Parker and the Vicious Circle', score: 1 },
  { title: 'The Lost World: Jurassic Park', score: 1 },
  { title: 'Dog Park', score: 1 },
  { title: 'South Park: Bigger Longer & Uncut', score: 1 },
  { title: 'Jurassic Park III', score: 1 },
  { title: 'Mansfield Park', score: 1 },
  { title: 'Jurassic Park III', score: 1 },
  { title: 'Gosford Park', score: 1 },
  { title: 'The Rosa Parks Story', score: 1 },
  { title: 'Wicker Park', score: 1 },
  { title: 'The Delicate Art of Parking', score: 1 },
  { title: 'Chestnut: Hero of Central Park', score: 1 },
  { title: 'Trailer Park Boys: The Movie', score: 1 },
  { title: 'Ellie Parker', score: 1 },
  { title: 'Paranoid Park', score: 1 }
]
Type "it" for more

To run this $search query, create a MongoDB Search index with the following definition:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "string",
        "analyzer": "contains",
        "searchAnalyzer": "lucene.standard"
      }
    }
  },
  "analyzers": [
    {
      "name": "contains",
      "tokenizer": {
        "type": "standard"
      },
      "tokenFilters": [
        {
          "type": "lowercase"
        },
        {
          "type": "reverse"
        },
        {
          "type": "edgeGram",
          "minGram": 4,
          "maxGram": 15
        },
        {
          "type": "reverse"
        }
      ]
    }
  ]
}

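This index definition indexes the title field in the movies collection as the string type. It uses a custom analyzer named contains for indexed values and lucene.standard as the searchAnalyzer for queries. The contains analyzer applies the following, in order:

standard tokenizer to split the words by whitespace or punctuation.
lowercase token filter to transform the letters to lowercase to support case-insensitive queries.
reverse token filter to reverse each word.
edgeGram token filter to create tokens of between four and fifteen characters in length from the start of each reversed word.
reverse token filter again to restore the original character order, so that each token is a suffix of the original word. A prefix wildcard query such as park* against these suffixes finds the term anywhere in a word, which supports efficient unanchored queries.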
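The reverse, edgeGram, reverse sequence can be hard to picture, so the following JavaScript sketch models it: the index ends up holding every 4- to 15-character suffix of each word, and a wildcard pattern such as park* only needs to match the start of one of those suffixes. This is a simplified model under stated assumptions: the tokenizer is cruder than Lucene's standard tokenizer, the wildcard pattern is compared against the indexed terms as-is, and the helper names and the titles Skatepark Memories and Parkingenforcement are hypothetical.

```javascript
// Simplified model of the "contains" index analyzer and a wildcard query.
// Helper names are hypothetical; this is not MongoDB Search's code.

// Approximation of the standard tokenizer (split on non-letters/digits).
function tokenize(text) {
  return text.split(/[^\p{L}\p{N}]+/u).filter((t) => t.length > 0);
}

// reverse token filter.
function reverse(token) {
  return [...token].reverse().join("");
}

// edgeGram token filter: prefixes between minGram and maxGram characters.
function edgeGrams(token, minGram, maxGram) {
  const grams = [];
  for (let len = minGram; len <= Math.min(maxGram, token.length); len++) {
    grams.push(token.slice(0, len));
  }
  return grams;
}

// Index side: tokenize -> lowercase -> reverse -> edgeGram(4, 15) -> reverse.
// The result is every 4- to 15-character suffix of every word.
function containsTokens(title) {
  return new Set(
    tokenize(title)
      .map((t) => t.toLowerCase())
      .map(reverse)
      .flatMap((t) => edgeGrams(t, 4, 15))
      .map(reverse)
  );
}

// Translate a wildcard pattern (* = any characters, ? = one character)
// into an anchored RegExp; other regex metacharacters are escaped.
function wildcardToRegExp(pattern) {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

// A title matches if any indexed token matches the pattern.
function matchesWildcard(title, pattern) {
  const re = wildcardToRegExp(pattern);
  return [...containsTokens(title)].some((token) => re.test(token));
}

console.log([...containsTokens("Parking")]);
// → [ 'king', 'rking', 'arking', 'parking' ]

console.log(matchesWildcard("The Delicate Art of Parking", "park*")); // true
console.log(matchesWildcard("Skatepark Memories", "park*")); // true
console.log(matchesWildcard("Back to the Future", "park*")); // false

// An 18-character word keeps only its last 15 characters as suffixes,
// so "park" at the start of that word is not found.
console.log(matchesWildcard("Parkingenforcement", "park*")); // false
```

The last call illustrates the word-length limit described in the following note.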

Note

This custom analyzer only supports words up to fifteen characters in length. Although you can increase the maxGram value to support longer words, we don't recommend setting it higher than fifteen because higher values increase the size of the index and might impact performance and availability.

Suffix Queries

If your application frequently queries for string field values that end with a set of characters, or suffix, you might run $regex queries with the $ anchor, which matches the end of the string value, and the i option, which makes the match case-insensitive.

Instead, we recommend MongoDB Search queries that use the $search aggregation pipeline stage. The following queries search for movie titles that end with the term ring.


$regex Queries

db.movies.find( { title: { $regex: "ring$" } }, { title: 1, _id: 0 } ) // Case-sensitive Query 1
db.movies.find( { title: { $regex: /ring$/ } }, { title: 1, _id: 0 } ) // Case-sensitive Query 2
db.movies.find( { title: { $regex: /ring$/i } }, { title: 1, _id: 0 } ) // Case-insensitive Query 1
db.movies.find( { title: { $regex: "ring$", $options: "i" } }, { title: 1, _id: 0 } ) // Case-insensitive Query 2

[
  { title: 'It Happens Every Spring' },
  { title: 'Larks on a String' },
  { title: 'Release the Prisoners to Spring' },
  { title: 'Manon of the Spring' },
  { title: 'Floundering' },
  { title: 'Autumn Spring' },
  { title: 'The Gathering' },
  { title: 'Blue Spring' },
  { title: 'Blue Spring' },
  { title: 'Girl with a Pearl Earring' },
  { title: 'Spring, Summer, Fall, Winter... and Spring' },
  { title: 'Breaking and Entering' },
  { title: 'Hunting and Gathering' },
  { title: 'Blood Tea and Red String' },
  { title: 'Warm Spring' },
  { title: 'The Conjuring' },
  { title: 'Thanks for Sharing' },
  { title: 'Leaving on the 15th Spring' }
]

$search Query

db.movies.aggregate([
  {
    "$search": {
      "index": "default",
      "autocomplete": {
        "query": "ring",
        "path": "title"
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "score": { $meta: "searchScore" }
    }
  }
])

[
  { title: 'It Happens Every Spring', score: 4.683838844299316 },
  { title: 'Larks on a String', score: 4.683838844299316 },
  {
    title: 'Release the Prisoners to Spring',
    score: 4.683838844299316
  },
  { title: 'Manon of the Spring', score: 4.683838844299316 },
  { title: 'Floundering', score: 4.683838844299316 },
  {
    title: 'The Lord of the Rings: The Fellowship of the Ring',
    score: 4.683838844299316
  },
  { title: 'Autumn Spring', score: 4.683838844299316 },
  { title: 'The Gathering', score: 4.683838844299316 },
  { title: 'The Ring', score: 4.683838844299316 },
  { title: 'Tom and Jerry: The Magic Ring', score: 4.683838844299316 },
  { title: 'Blue Spring', score: 4.683838844299316 },
  { title: 'Blue Spring', score: 4.683838844299316 },
  { title: 'Girl with a Pearl Earring', score: 4.683838844299316 },
  {
    title: 'Spring, Summer, Fall, Winter... and Spring',
    score: 4.683838844299316
  },
  { title: 'Curse of the Ring', score: 4.683838844299316 },
  { title: 'Breaking and Entering', score: 4.683838844299316 },
  { title: 'Closing the Ring', score: 4.683838844299316 },
  { title: 'Hunting and Gathering', score: 4.683838844299316 },
  { title: 'Blood Tea and Red String', score: 4.683838844299316 },
  { title: 'Warm Spring', score: 4.683838844299316 }
]
Type "it" for more

To run this $search query, create a MongoDB Search index similar to the following:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "autocomplete",
          "minGrams": 4,
          "maxGrams": 10,
          "analyzer": "lucene.keyword",
          "tokenization": "rightEdgeGram"
        }
      ]
    }
  }
}

This index definition indexes the title field using the following:

autocomplete type with the rightEdgeGram tokenization strategy to split the text into substrings, or "grams," of between 4 (minimum) and 10 (maximum) characters in length, starting from the end of the string, which supports partial searches on the end of the value.
lucene.keyword analyzer to index the entire title as a single token, so that matches occur only at the end of the whole title and not at the end of intermediate words. To find suffix matches on intermediate words, use lucene.standard.
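Because lucene.keyword keeps the whole title as one token, the rightEdgeGram grams are suffixes of the entire title rather than of each word. The following JavaScript sketch models that behavior under stated assumptions: matching is treated as an exact comparison between the query and one gram, both sides are lowercased to mirror the sample output (where The Ring is returned for ring), and the helper names are hypothetical. It is not MongoDB Search's implementation.

```javascript
// Simplified model of an autocomplete field with lucene.keyword and
// rightEdgeGram tokenization. Helper names are hypothetical.

// rightEdgeGram: substrings anchored at the END of the value, between
// minGrams and maxGrams characters long.
function rightEdgeGrams(value, minGrams, maxGrams) {
  const grams = [];
  for (let len = minGrams; len <= Math.min(maxGrams, value.length); len++) {
    grams.push(value.slice(value.length - len));
  }
  return grams;
}

// lucene.keyword keeps the whole title as a single token, so the grams are
// suffixes of the entire title. Lowercasing is an assumption made to mirror
// the sample output.
function suffixTokens(title) {
  return new Set(rightEdgeGrams(title.toLowerCase(), 4, 10));
}

// A title matches when the query equals one of its suffix grams.
function matchesSuffix(title, query) {
  return suffixTokens(title).has(query.toLowerCase());
}

console.log([...suffixTokens("The Ring")]);
// → [ 'ring', ' ring', 'e ring', 'he ring', 'the ring' ]

console.log(matchesSuffix("Floundering", "ring")); // true

// "Rings" appears mid-title, but the title itself doesn't end with "ring".
console.log(matchesSuffix("The Lord of the Rings: The Return of the King", "ring")); // false
```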

Learn More

To learn more about MongoDB Search queries, see Queries and Indexes.
To learn more about regex queries in MongoDB, see $regex.
MongoDB University offers a free course on optimizing MongoDB performance. To learn more, see Monitoring and Insights.