regex

On this page

Definition
Syntax
Options
Behavior
Lucene Regular Expression Behavior
Reserved characters
Supported Operators
Unsupported Operators
Examples
Index Definition

Definition

regex

regex interprets the query field as a regular expression. regex is a term-level operator, meaning that the query field isn't analyzed.

Note

The regular expression language available to the regex operator is a limited subset of the PCRE library.

For detailed information, see the Class RegExp documentation.

Syntax

regex has the following syntax:

1 {
2   $search: {
3     "index": <index name>, // optional, defaults to "default"
4     "regex": {
5       "query": "<search-string>",
6       "path": "<field-to-search>",
7       "allowAnalyzedField": <boolean>,
8       "score": <options>
9     }
10   }
11 }

Options

regex uses the following terms to construct a query:

Field	Type	Description	Necessity	Default
`query`	string or array of strings	String or strings to search for.	yes
`path`	string or array of strings	Indexed field or fields to search. You can also specify a wildcard path to search.	yes
`allowAnalyzedField`	boolean	Must be set to `true` if the query is run against an analyzed field.	no	`false`
`score`	object	Score to assign to matching search term results. Options are: `boost`: multiply the result score by the given number. `constant`: replace the result score with the given number. `function`: replace the result score with the given expression.	no

Behavior

regex is a term-level operator, meaning that the query field is not analyzed. Regular expression searches work well with the keyword analyzer, because it indexes fields one word at a time. To do a case-sensitive search, do not use the default analyzer, standard analyzer, because the standard analyzer lower cases all terms. Specify a different analyzer instead.

It is possible to use the regex operator to perform searches on an analyzed field by setting the allowAnalyzedField option to true, but you may get unexpected results.

Example

Searching for .*Star Trek.* on a field indexed with the keyword analyzer finds all documents in which the field contains the string Star Trek in any context. Searching for .*Star Trek.* on a field indexed with the standard analyzer finds nothing, because there is a space between Star and Trek, and the index contains no spaces.

Lucene Regular Expression Behavior

The Atlas Search regex operator uses the Lucene regular expression engine, which differs from Perl Compatible Regular Expressions.

Reserved characters

The following characters are reserved as operators when used in regular expressions:

. ? + * | { } [ ] ( ) < > " \ @ #

To use any of the above characters literally in a matching expression, precede it with a \ character.

Example

who\? matches "who?"

When using the escape character in mongosh or with a driver, you must use a double backslash before the character to be escaped.

Example

To create a wildcard expression which searches for any string containing a literal asterisk in an aggregation pipeline, use the following expression:

"*\\**"

The first and last asterisks act as wildcards which match any characters, and the \\* matches a literal asterisk.

Note

Use the following expression to escape a literal backslash:

"*\\\*"

Supported Operators

Operator	Description	Example
`.`	Matches any character.	`x.z` matches "xyz", "xaz", etc.
`?`	The preceding character is optional and matches if it occurs no more than once.	`xyz?` matches "xy" and "xyz"
`+`	The preceding character matches if it occurs one or more times.	`xy+` matches "xy", "xyy", "xyyy", etc.
`*`	The preceding character matches if it occurs any number of times.	`xyz` matches "xy", "xyz", "xyzz", etc.*
`\|`	The `OR` operator. The expression matches if either of the two patterns on either side of the `\|` operator matches.	`abc\|xyz` matches "abc" or "xyz"
`{<number>}`	The preceding character matches if it occurs exactly <number> times.	`xyz{3}` matches "xyzzz"
`()`	Characters inside parentheses are treated as a single unit for matching purposes.	`xyz(abc)[2]` matches "xyzabcabc"
`[]`	Matches any of the characters inside the square brackets. Adding a `^` to the beginning matches any character except those within the square brackets. Inside the square brackets, `-` indicates a range, unless `-` is the first character or is escaped with a `\`.	`[xyz]` matches "x", "y", and "z" `[^abc]` matches any character except "a", "b", or "c" `[a-d]` matches "a", "b", "c", or "d" `[0-9]` matches any numeric character from 0 through 9
`<>`	Match a numeric range.	`<1-3>` matches "1", "2", and "3"
`#`	The empty language operator. The `#` operator does not match any string, including an empty string.	`#\|xyz` matches "xyz" and nothing else

Unsupported Operators

regex does not support anchor operators. For example, these operators are not supported:

^, which matches the beginning of a line.
$, which matches the end of a line.

To match a term, the regular expression must match the entire string.

Examples

The following examples use the movies collection in the sample_mflix database with a custom index definition that uses the keyword analyzer. If you have the sample dataset on your cluster, you can create an Atlas Search index on the movies collection and run the example queries on your cluster.

Index Definition

The following index definition indexes the title field in the movies collection with the keyword analyzer:

1 {
2   "mappings": {
3     "fields": {
4       "title": {
5         "analyzer": "lucene.keyword",
6         "type": "string"
7       },
8       "genres": {
9         "type": "stringFacet"
10       }
11     }
12   }
13 }

The following example searches all title fields for movie titles that end with the word Seattle. The (.*) regular expression matches any number of characters.

1 db.movies.aggregate([
2   {
3     "$search": {
4         "regex": {
5           "path": "title",
6           "query": "(.*) Seattle"
7         }
8     }
9   },
10   {
11     $project: {
12         "_id": 0,
13         "title": 1
14     }
15   }
16 ])

{ "title" : "Sleepless in Seattle" }
{ "title" : "Battle in Seattle" }

The following example uses the regular expression [0-9]{2} (.){4}s to find movie titles which begin with a 2-digit number followed by a space, and end with a 5-letter word ending in s.

1 db.movies.aggregate([
2   {
3     "$search": {
4         "regex": {
5           "path": "title",
6           "query": "[0-9]{2} (.){4}s"
7         }
8     }
9   },
10   {
11     $project: {
12         "_id": 0,
13         "title": 1
14     }
15   }
16 ])

{ "title" : "20 Dates" }
{ "title" : "25 Watts" }
{ "title" : "21 Grams" }
{ "title" : "13 Lakes" }
{ "title" : "18 Meals" }
{ "title" : "17 Girls" }
{ "title" : "16 Acres" }
{ "title" : "26 Years" }
{ "title" : "99 Homes" }
{ "title" : "45 Years" }

The following example uses the regular expression .* to find movie titles that contain the term summer anywhere in the title and returns the number of movies that match the criteria in each genre.

1 db.movies.aggregate([
2   {
3     "$searchMeta": {
4       "facet": {
5         "operator": {
6           "regex": {
7             "path": "title",
8             "query": ".*summer.*"
9           }
10         },
11         "facets": {
12           "genresFacet": {
13             "type": "string",
14             "path": "genres"
15           }
16         }
17       }
18     }
19   }
20 ])

[
  {
    count: { lowerBound: Long('6') },
    facet: {
      genresFacet: {
        buckets: [
          { _id: 'Comedy', count: Long('5') },
          { _id: 'Fantasy', count: Long('3') },
          { _id: 'Romance', count: Long('3') },
          { _id: 'Drama', count: Long('2') },
          { _id: 'Horror', count: Long('1') },
          { _id: 'Mystery', count: Long('1') }
        ]
      }
    }
  }
]

Back

range

span (Deprecated)

1	{
2	$search: {
3	"index": <index name>, // optional, defaults to "default"
4	"regex": {
5	"query": "<search-string>",
6	"path": "<field-to-search>",
7	"allowAnalyzedField": <boolean>,
8	"score": <options>
9	}
10	}
11	}

1	{
2	"mappings": {
3	"fields": {
4	"title": {
5	"analyzer": "lucene.keyword",
6	"type": "string"
7	},
8	"genres": {
9	"type": "stringFacet"
10	}
11	}
12	}
13	}

1	db.movies.aggregate([
2	{
3	"$search": {
4	"regex": {
5	"path": "title",
6	"query": "(.*) Seattle"
7	}
8	}
9	},
10	{
11	$project: {
12	"_id": 0,
13	"title": 1
14	}
15	}
16	])

1	db.movies.aggregate([
2	{
3	"$searchMeta": {
4	"facet": {
5	"operator": {
6	"regex": {
7	"path": "title",
8	"query": ".summer."
9	}
10	},
11	"facets": {
12	"genresFacet": {
13	"type": "string",
14	"path": "genres"
15	}
16	}
17	}
18	}
19	}
20	])