Simple Analyzer

The simple analyzer divides text into searchable terms (tokens) wherever it finds a non-letter character, such as whitespace, punctuation, or one or more digits. It also converts all tokens to lowercase.
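This tokenizing rule can be approximated in a few lines of JavaScript. The following is a minimal sketch, not the Lucene implementation behind `lucene.simple`. It assumes that a "letter" is any Unicode letter (`\p{L}`) and that JavaScript's `toLowerCase()` is close enough to Lucene's lowercasing. `simpleAnalyze` is a hypothetical name.

```javascript
// Hypothetical helper approximating lucene.simple (not the Lucene code):
// split on runs of non-letter characters, then lowercase each token.
function simpleAnalyze(text) {
  return text
    .split(/[^\p{L}]+/u) // \p{L} matches any Unicode letter; everything else separates tokens
    .filter((token) => token.length > 0) // drop empty strings produced at the edges
    .map((token) => token.toLowerCase());
}

// Digits, the colon, the apostrophe, and spaces all act as separators.
simpleAnalyze("The Lion King 2: Simba's Pride");
// returns ['the', 'lion', 'king', 'simba', 's', 'pride']
```

Because digits count as separators, a title such as `The Lion King 1 1/2` produces only `the`, `lion`, and `king`.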

When you create or edit an index in the Atlas UI Visual Editor, you can see the tokens that the simple analyzer creates for a built-in sample document and query string. If you select Refine Your Index, the Index Configurations section includes a section titled View text analysis of your selected index configuration. Expand this section to see the index and search tokens that the simple analyzer generates for each sample string.

Important

MongoDB Search doesn't index a string field if any token that the analyzer produces for it exceeds 32766 bytes. Because the keyword analyzer emits the entire field value as a single token, a string field that exceeds 32766 bytes isn't indexed when you use that analyzer.
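To catch oversized values before indexing, measure token size in bytes, not characters. The following sketch assumes the limit applies to the UTF-8 encoding of each token, which is the encoding Lucene uses for terms. `utf8Length`, `oversizedTokens`, and `keywordValueIsIndexable` are hypothetical helpers and are not part of any MongoDB API.

```javascript
// Hypothetical pre-indexing check; assumes the 32766-byte limit is measured
// on the UTF-8 encoding of each token.
const MAX_TOKEN_BYTES = 32766;
const encoder = new TextEncoder();

// Byte length of a string when encoded as UTF-8.
function utf8Length(text) {
  return encoder.encode(text).length;
}

// Returns the tokens that would exceed the limit.
function oversizedTokens(tokens) {
  return tokens.filter((token) => utf8Length(token) > MAX_TOKEN_BYTES);
}

// With the keyword analyzer, the whole field value is one token,
// so the whole string must fit within the limit.
function keywordValueIsIndexable(value) {
  return utf8Length(value) <= MAX_TOKEN_BYTES;
}

// "\u00e9" (é) takes 2 bytes in UTF-8, so 16384 of them (32768 bytes)
// exceed the limit even though the string has only 16384 characters.
keywordValueIsIndexable("a".repeat(32766)); // true
keywordValueIsIndexable("\u00e9".repeat(16384)); // false
```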

Example

The following example index definition specifies an index on the title field in the sample_mflix.movies collection using the simple analyzer. To follow along with this example, load the sample data on your cluster. Then either use mongosh or go to the Create a Search Index page in the Atlas UI by following the steps in the Create a MongoDB Search Index tutorial.

Then, using the movies collection as your data source, follow the example procedure to create an index from mongosh or from the Atlas UI Visual Editor or JSON Editor.

Atlas UI Visual Editor

Click Refine Your Index to configure your index.
In the Index Configurations section, toggle Dynamic Mapping to off.
In the Field Mappings section, click Add Field to open the Add Field Mapping window.
Click Customized Configuration.
Select title from the Field Name dropdown.
Click the Data Type dropdown and select String if it isn't already selected.

Expand String Properties and make the following changes:

Index Analyzer	Select `lucene.simple` from the dropdown.
Search Analyzer	Select `lucene.simple` from the dropdown.
Index Options	Use the default `offsets`.
Store	Use the default `true`.
Ignore Above	Keep the default setting.
Norms	Use the default `include`.

Click Add.
Click Save Changes.
Click Create Search Index.

Atlas UI JSON Editor

Replace the default index definition with the following:

{
  "mappings": {
    "fields": {
      "title": {
        "type": "string",
        "analyzer": "lucene.simple"
      }
    }
  }
}

Click Next.
Click Create Search Index.

To create the index in mongosh, run the following command:

db.movies.createSearchIndex(
  "default",
  {
    "mappings": {
      "fields": {
        "title": {
          "type": "string",
          "analyzer": "lucene.simple"
        }
      }
    }
  }
)

The following query searches for the term lion in the title field. The mongosh version also limits the output to five results and returns only the title field.

Atlas UI

Click the Query button for your index.
Click Edit Query to edit the query.
Click the query bar and select the database and collection.

Replace the default query with the following and click Find:

[
  {
    "$search": {
      "text": {
        "query": "lion",
        "path": "title"
      }
    }
  }
]

SCORE: 3.9090898036956787  _id:  "573a13cbf29313caabd8135d"
   awards: Object
   cast: Array (4)
   countries: Array (1)
   directors: Array (1)
   fullplot: "According to the legend of the Shangaan, white lions are the messenger…"
   genres: Array (2)
   imdb: Object
   languages: Array (1)
   lastupdated: "2015-09-02 00:45:38.833000000"
   num_mflix_comments: 2
   plot: "According to the legend of the Shangaan, white lions are the messenger…"
   poster: "https://m.media-amazon.com/images/M/MV5BMTcwMTAyMzg5OV5BMl5BanBnXkFtZT…"
   rated: "PG"
   released: 2010-02-19T00:00:00.000+00:00
   runtime: 88
   title: "White Lion"
   type: "movie"
   writers: Array (3)
   year: 2010
SCORE: 3.363236427307129    _id:  "573a1399f29313caabcee7fc"
   awards: Object
   cast: Array (4)
   countries: Array (1)
   directors: Array (2)
   fullplot: "A young lion Prince is cast out of his pride by his cruel uncle, who c…"
   genres: Array (3)
   imdb: Object
   languages: Array (4)
   lastupdated: "2015-08-31 00:04:32.670000000"
   metacritic: 83
   num_mflix_comments: 132
   plot: "Lion cub and future king Simba searches for his identity. His eagernes…"
   poster: "https://m.media-amazon.com/images/M/MV5BYTYxNGMyZTYtMjE3MS00MzNjLWFjNm…"
   rated: "G"
   released: 1994-06-24T00:00:00.000+00:00
   runtime: 89
   title: "The Lion King"
   tomatoes: Object
   type: "movie"
   writers: Array (29)
   year: 1994
SCORE: 3.363236427307129    _id:  "573a13a9f29313caabd1f600"
   awards: Object
   cast: Array (4)
   countries: Array (2)
   directors: Array (1)
   fullplot: "Timon and Pumbaa start to watch the original Lion King movie, but Timo…"
   genres: Array (3)
   imdb: Object
   languages: Array (1)
   lastupdated: "2015-09-14 00:01:14.313000000"
   num_mflix_comments: 0
   plot: "Timon the meerkat and Pumbaa the warthog retell the story of The Lion …"
   poster: "https://m.media-amazon.com/images/M/MV5BYzg2N2Y1ODYtY2QyMi00ZDAyLWE3MT…"
   rated: "G"
   released: 2004-02-10T00:00:00.000+00:00
   runtime: 77
   title: "The Lion King 1 1/2"
   tomatoes: Object
   type: "movie"
   writers: Array (5)
   year: 2004
SCORE: 3.363236427307129    _id:  "573a13abf29313caabd24af6"
   awards: Object
   cast: Array (4)
   countries: Array (2)
   directors: Array (1)
   fullplot: "Timon and Pumbaa start to watch the original Lion King movie, but Timo…"
   genres: Array (3)
   imdb: Object
   languages: Array (1)
   lastupdated: "2015-08-31 05:44:38.700000000"
   num_mflix_comments: 0
   plot: "Timon the meerkat and Pumbaa the warthog retell the story of The Lion …"
   poster: "https://m.media-amazon.com/images/M/MV5BYzg2N2Y1ODYtY2QyMi00ZDAyLWE3MT…"
   rated: "G"
   released: 2004-02-10T00:00:00.000+00:00
   runtime: 77
   title: "The Lion King 1 1/2"
   tomatoes: Object
   type: "movie"
   writers: Array (5)
   year: 2004
SCORE: 2.9511470794677734   _id:  "573a1396f29313caabce366e"
   awards: Object
   cast: Array (4)
   countries: Array (2)
   directors: Array (1)
   fullplot: "Christmas 1183--an aging and conniving King Henry II plans a reunion w…"
   genres: Array (2)
   imdb: Object
   languages: Array (1)
   lastupdated: "2015-09-17 01:39:32.220000000"
   num_mflix_comments: 0
   plot: "1183 AD: King Henry II's three sons all want to inherit the throne, bu…"
   poster: "https://m.media-amazon.com/images/M/MV5BMTkzNzYyMzA5N15BMl5BanBnXkFtZT…"
   rated: "PG"
   released: 1968-10-30T00:00:00.000+00:00
   runtime: 134
   title: "The Lion in Winter"
   tomatoes: Object
   type: "movie"
   writers: Array (2)
   year: 1968
SCORE: 2.9511470794677734   _id:  "573a13c1f29313caabd63be7"
   awards: Object
   cast: Array (4)
   countries: Array (1)
   directors: Array (1)
   genres: Array (1)
   imdb: Object
   languages: Array (1)
   lastupdated: "2015-04-24 02:38:23.767000000"
   num_mflix_comments: 0
   poster: "https://m.media-amazon.com/images/M/MV5BMTg4Mzg4NDk5MF5BMl5BanBnXkFtZT…"
   released: 2009-11-06T00:00:00.000+00:00
   runtime: 92
   title: "Son of a Lion"
   tomatoes: Object
   type: "movie"
   writers: Array (1)
   year: 2007
SCORE: 2.9511470794677734   _id:  "573a13dbf29313caabdaf30d"
   awards: Object
   cast: Array (4)
   countries: Array (2)
   directors: Array (1)
   fullplot: "Neo-Nazi falls in love with a woman who has a black son and finds hims…"
   genres: Array (2)
   imdb: Object
   languages: Array (1)
   lastupdated: "2015-08-15 00:13:18.457000000"
   num_mflix_comments: 0
   plot: "Neo-Nazi falls in love with a woman who has a black son and finds hims…"
   poster: "https://m.media-amazon.com/images/M/MV5BY2M4ZjI5NmMtZjcyNy00NWU3LWI2Zj…"
   released: 2013-10-18T00:00:00.000+00:00
   runtime: 104
   title: "Heart of a Lion"
   tomatoes: Object
   type: "movie"
   writers: Array (1)
   year: 2013
SCORE: 2.629019260406494    _id:  "573a1397f29313caabce5e62"
   awards: Object
   cast: Array (4)
   countries: Array (1)
   directors: Array (1)
   fullplot: "At the beginning of the 20th century an American woman is abducted in …"
   genres: Array (3)
   imdb: Object
   languages: Array (1)
   lastupdated: "2015-09-02 00:17:16.943000000"
   num_mflix_comments: 2
   plot: "At the beginning of the 20th century an American woman is abducted in …"
   poster: "https://m.media-amazon.com/images/M/MV5BYTNhODI4NWYtYzc1Zi00OGIxLTk5ZW…"
   rated: "PG"
   released: 1975-10-26T00:00:00.000+00:00
   runtime: 119
   title: "The Wind and the Lion"
   tomatoes: Object
   type: "movie"
   writers: Array (1)
   year: 1975
SCORE: 2.629019260406494    _id:  "573a13ebf29313caabdcfc8d"
   awards: Object
   cast: Array (4)
   countries: Array (1)
   directors: Array (1)
   fullplot: "A documentary on young actress, Marianna Palka, as she confronts her r…"
   genres: Array (3)
   imdb: Object
   languages: Array (1)
   lastupdated: "2015-09-03 00:37:45.227000000"
   num_mflix_comments: 0
   plot: "A documentary on young actress, Marianna Palka, as she confronts her r…"
   poster: "https://m.media-amazon.com/images/M/MV5BMTgzMTc2OTg2N15BMl5BanBnXkFtZT…"
   released: 2014-01-18T00:00:00.000+00:00
   runtime: 15
   title: "The Lion's Mouth Opens"
   type: "movie"
   writers: Array (1)
   year: 2014
SCORE: 2.3702940940856934   _id:  "573a139af29313caabcf0ccd"
   awards: Object
   cast: Array (4)
   countries: Array (2)
   directors: Array (2)
   fullplot: "Simba and Nala have a daughter, Kiara. Timon and Pumbaa are assigned t…"
   genres: Array (3)
   imdb: Object
   languages: Array (1)
   lastupdated: "2015-08-24 00:49:09.900000000"
   num_mflix_comments: 0
   plot: "Simba's daughter is the key to a resolution of a bitter feud between S…"
   poster: "https://m.media-amazon.com/images/M/MV5BY2Y3MTk2MDgtOTc1Yy00ZmFjLThlNT…"
   rated: "G"
   released: 1998-10-27T00:00:00.000+00:00
   runtime: 81
   title: "The Lion King 2: Simba's Pride"
   tomatoes: Object
   type: "movie"
   writers: Array (10)
   year: 1998

To run the query in mongosh, use the following pipeline:

db.movies.aggregate([
  {
    "$search": {
      "text": {
        "query": "lion",
        "path": "title"
      }
    }
  },
  {
    "$limit": 5
  },
  {
    "$project": {
      "_id": 0,
      "title": 1
    }
  }
])

[
  { title: 'White Lion' },
  { title: 'The Lion King' },
  { title: 'The Lion King 1 1/2' },
  { title: 'The Lion King 1 1/2' },
  { title: "Lion's Den" }
]

To return these documents, MongoDB Search uses the lucene.simple analyzer to do the following with the text in the title field:

Convert text to lowercase.
Create separate tokens by dividing text wherever there is a non-letter character.
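The following in-memory sketch applies these two steps to both the indexed titles and the query string. It assumes that a title matches when it shares at least one token with the analyzed query, and it ignores scoring. `simpleAnalyze` and `matchesQuery` are hypothetical helpers. The titles come from the results above, plus `Lionheart`, which is added only for contrast.

```javascript
// Step 1: convert to lowercase. Step 2: split wherever there is a non-letter character.
function simpleAnalyze(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}]+/u)
    .filter((token) => token.length > 0);
}

// A title matches if any analyzed query token appears among its tokens
// (assumption: simplified any-term matching, no scoring).
function matchesQuery(title, query) {
  const titleTokens = new Set(simpleAnalyze(title));
  return simpleAnalyze(query).some((token) => titleTokens.has(token));
}

const titles = [
  "White Lion",
  "The Lion King",
  "Lion's Den",
  "The Lion in Winter",
  "Lionheart", // added for contrast: "lionheart" is a single token, so "lion" doesn't match it
];

const matches = titles.filter((title) => matchesQuery(title, "lion"));
// matches: ['White Lion', 'The Lion King', "Lion's Den", 'The Lion in Winter']
```

Matching works on whole tokens, not substrings. This is why `Lionheart` is excluded while `Lion's Den` is included.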

The following table shows the tokens that MongoDB Search creates using the Simple Analyzer and, by contrast, the Standard Analyzer and Whitespace Analyzer for the documents in the results:

Title	Simple Analyzer Tokens	Standard Analyzer Tokens	Whitespace Analyzer Tokens
`White Lion`	`white`, `lion`	`white`, `lion`	`White`, `Lion`
`The Lion King`	`the`, `lion`, `king`	`the`, `lion`, `king`	`The`, `Lion`, `King`
`The Lion King 1 1/2`	`the`, `lion`, `king`	`the`, `lion`, `king`, `1`, `1`, `2`	`The`, `Lion`, `King`, `1`, `1/2`
`Lion's Den`	`lion`, `s`, `den`	`lion's`, `den`	`Lion's`, `Den`

MongoDB Search returns the document Lion's Den because the simple analyzer creates a separate lion token, which matches the query term lion. By contrast, the standard analyzer creates the token lion's and the whitespace analyzer creates the token Lion's, and neither creates a lion token. If you index the title field with either of those analyzers, MongoDB Search would still return some of these documents for the query, but not Lion's Den.
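The comparison can be reproduced approximately in JavaScript. This sketch makes the following assumptions, and none of these functions are the Lucene implementations. The helper names are hypothetical.

- Unicode word segmentation (UAX #29, exposed in JavaScript as `Intl.Segmenter`) followed by lowercasing is a close stand-in for the standard analyzer.
- Splitting on whitespace without lowercasing models the whitespace analyzer.

`Intl.Segmenter` requires Node.js 16 or later.

```javascript
// Hypothetical approximations of the three analyzers (not the Lucene code).
const wordSegmenter = new Intl.Segmenter("en", { granularity: "word" });

const analyzers = {
  // lucene.simple: split on non-letters, lowercase.
  simple: (text) =>
    text
      .split(/[^\p{L}]+/u)
      .filter((t) => t.length > 0)
      .map((t) => t.toLowerCase()),
  // lucene.standard (approximation): UAX #29 word segments, lowercased.
  standard: (text) =>
    Array.from(wordSegmenter.segment(text))
      .filter((s) => s.isWordLike) // skip spaces and punctuation such as "/"
      .map((s) => s.segment.toLowerCase()),
  // lucene.whitespace: split on whitespace, keep the original case.
  whitespace: (text) => text.split(/\s+/).filter((t) => t.length > 0),
};

// Analyze the field and the query with the same analyzer;
// match if any query token appears among the field tokens.
function queryMatches(analyzerName, fieldValue, query) {
  const analyze = analyzers[analyzerName];
  const fieldTokens = new Set(analyze(fieldValue));
  return analyze(query).some((token) => fieldTokens.has(token));
}

const title = "Lion's Den";
const comparison = Object.keys(analyzers).map((name) => ({
  analyzer: name,
  tokens: analyzers[name](title),
  matchesLion: queryMatches(name, title, "lion"),
}));
// simple:     tokens lion, s, den  -> matchesLion true
// standard:   tokens lion's, den   -> matchesLion false
// whitespace: tokens Lion's, Den   -> matchesLion false
```

The apostrophe is the deciding character. Word segmentation keeps `lion's` together, whitespace splitting never looks at it, and only the simple analyzer treats it as a separator.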
