Whitespace Analyzer

The whitespace analyzer divides text into searchable terms (tokens) wherever it finds a whitespace character. It leaves all text in its original letter case.

In the Atlas UI Visual Editor, you can see the tokens that the whitespace analyzer creates for a built-in static string when you Refine Your Index. In the Index Configurations section, expand View text analysis for your selected index configuration to display the index and search tokens that the whitespace analyzer creates. This can help you select the analyzer to use in your index.

Important

Atlas Search doesn't index string fields whose analyzer tokens exceed 32766 bytes in size. If you use the keyword analyzer, string fields that exceed 32766 bytes aren't indexed.

The following example index definition specifies an index on the title field in the sample_mflix.movies collection using the whitespace analyzer. If you loaded the collection on your cluster, you can create the example index using the Atlas UI Visual Editor or the JSON Editor. After you select your preferred configuration method, select the database and collection.

Visual Editor

  1. Click Refine Your Index to configure your index.

  2. In the Field Mappings section, click Add Field to open the Add Field Mapping window.

  3. Select title from the Field Name dropdown.

  4. Click Customized Configuration.

  5. Click the Data Type dropdown and select String if it isn't already selected.

  6. Expand String Properties and make the following changes:

    Index Analyzer: Select lucene.whitespace from the dropdown.
    Search Analyzer: Select lucene.whitespace from the dropdown.
    Index Options: Use the default offsets.
    Store: Use the default true.
    Ignore Above: Keep the default setting.
    Norms: Use the default include.
  7. Click Add.

  8. Click Save Changes.

  9. Click Create Search Index.

JSON Editor

  1. Replace the default index definition with the following index definition.

    {
      "mappings": {
        "fields": {
          "title": {
            "type": "string",
            "analyzer": "lucene.whitespace",
            "searchAnalyzer": "lucene.whitespace"
          }
        }
      }
    }
  2. Click Next.

  3. Click Create Search Index.
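
As an alternative to the UI steps above, the same index definition can be submitted from mongosh. The following is a minimal sketch, not part of the original procedure: it assumes a mongosh version and cluster that support the createSearchIndex() collection method, and both the helper name createWhitespaceIndex and the index name "default" are illustrative assumptions.

```javascript
// Same definition as in the JSON Editor steps above.
const whitespaceIndexDefinition = {
  mappings: {
    fields: {
      title: {
        type: "string",
        analyzer: "lucene.whitespace",
        searchAnalyzer: "lucene.whitespace"
      }
    }
  }
};

// Hypothetical helper: `collection` is a mongosh collection handle such as
// db.movies, and "default" is an assumed index name, not one the page specifies.
function createWhitespaceIndex(collection, indexName = "default") {
  return collection.createSearchIndex(indexName, whitespaceIndexDefinition);
}
```

In mongosh, after switching to the sample_mflix database, calling createWhitespaceIndex(db.movies) would submit the definition for the movies collection.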

The following query searches for the term Lion's in the title field.

db.movies.aggregate([
  {
    "$search": {
      "text": {
        "query": "Lion's",
        "path": "title"
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1
    }
  }
])

[
  { title: "Lion's Den" },
  { title: "The Lion's Mouth Opens" }
]

Atlas Search returns these documents because the lucene.whitespace analyzer does the following for the text in the title field:

  • Retains the original letter case of the text.

  • Divides the text into tokens wherever it finds a whitespace character.

The following table shows the tokens (searchable terms) that Atlas Search creates using the Whitespace Analyzer and, by contrast, the Simple Analyzer and Keyword Analyzer for the documents in the results:

Title | Whitespace Analyzer Tokens | Simple Analyzer Tokens | Keyword Analyzer Tokens
Lion's Den | Lion's, Den | lion, s, den | Lion's Den
The Lion's Mouth Opens | The, Lion's, Mouth, Opens | the, lion, s, mouth, opens | The Lion's Mouth Opens
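
The differences in the table can be reproduced approximately in plain JavaScript. This sketch only imitates the three analyzers for illustration and does not call Atlas Search or Lucene. The function names are hypothetical, and the regular expressions are simplifications of the analyzers' actual rules (for example, JavaScript's \s may not classify every character the same way Lucene does).

```javascript
// Approximates lucene.whitespace: split on runs of whitespace, keep letter case.
function whitespaceTokens(text) {
  return text.split(/\s+/).filter(Boolean);
}

// Approximates lucene.simple: lowercase, then split on runs of non-letters.
function simpleTokens(text) {
  return text.toLowerCase().split(/[^\p{L}]+/u).filter(Boolean);
}

// Approximates lucene.keyword: the whole string is a single token.
function keywordTokens(text) {
  return [text];
}

for (const title of ["Lion's Den", "The Lion's Mouth Opens"]) {
  console.log({
    title,
    whitespace: whitespaceTokens(title),
    simple: simpleTokens(title),
    keyword: keywordTokens(title)
  });
}
```

The apostrophe is not whitespace, so the whitespace sketch keeps Lion's intact, while the simple sketch splits it into lion and s.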

An index that uses the whitespace analyzer is case-sensitive. Therefore, Atlas Search can match the query term Lion's to the token Lion's that the whitespace analyzer creates.
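
The effect of case sensitivity can be illustrated with a small JavaScript model. This is a simplified sketch, not how Atlas Search is implemented: it assumes that both the title and the query are split on whitespace and that a document matches when any query token exactly equals an indexed token. The function names are hypothetical.

```javascript
// Approximates lucene.whitespace: split on runs of whitespace, keep letter case.
function whitespaceTokens(text) {
  return text.split(/\s+/).filter(Boolean);
}

// Hypothetical matcher: true if any query token equals an indexed token exactly.
function matchesTitle(query, title) {
  const indexed = new Set(whitespaceTokens(title));
  return whitespaceTokens(query).some((token) => indexed.has(token));
}

console.log(matchesTitle("Lion's", "Lion's Den")); // true
console.log(matchesTitle("lion's", "Lion's Den")); // false: case differs
```

Under this model, a lowercase query such as lion's finds no matching token, because the indexed token keeps its capital L.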
