How to Index Fields for Autocompletion
You can use the Atlas Search autocomplete type to index text values in string fields for autocompletion. You can query fields indexed as the autocomplete type using the autocomplete operator.
You can also use the autocomplete type to index:
Fields whose value is an array of strings. To learn more, see How to Index the Elements of an Array.
String fields inside an array of documents indexed as the embeddedDocuments type.
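As a rough illustration of how a field indexed this way is queried, the following Python sketch builds an aggregation pipeline whose $search stage uses the autocomplete operator. The helper name autocomplete_stage, the index name "default", and the title field are assumptions made for this sketch; the code only builds the pipeline and does not connect to a cluster.

```python
from typing import Any


def autocomplete_stage(query: str, path: str, index: str = "default") -> dict[str, Any]:
    """Build a $search stage that uses the autocomplete operator.

    `index` is assumed to be the name of an Atlas Search index in which
    `path` is mapped as the autocomplete type.
    """
    return {
        "$search": {
            "index": index,
            "autocomplete": {"query": query, "path": path},
        }
    }


# Pipeline that returns up to five titles for what the user has typed so far.
pipeline = [
    autocomplete_stage("god", "title"),
    {"$limit": 5},
    {"$project": {"_id": 0, "title": 1}},
]
print(pipeline)
```

You would pass a pipeline like this to your driver's aggregate method on the collection that holds the index.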
Tip
If you have a large number of documents and a wide range of data against which you want to run Atlas Search queries using the autocomplete operator, building this index can take some time. To reduce the impact on other indexes and queries while the index builds, you can create a separate index that contains only fields of the autocomplete type.
To learn more, see Atlas Search Index Performance Considerations.
Atlas Search doesn't dynamically index fields of type autocomplete. You must use static mappings to index autocomplete fields. You can use the Visual Editor or the JSON Editor in the Atlas UI to index fields of type autocomplete.
Define the Index for the autocomplete Type
To define the index for the autocomplete type, choose your preferred configuration method in the Atlas UI and then select the database and collection.
Click Refine Your Index to configure your index.
In the Field Mappings section, click Add Field to open the Add Field Mapping window.
Click Customized Configuration.
Select the field to index from the Field Name dropdown.
Note
You can't index fields that contain the dollar ($) sign at the start of the field name.
For field names that contain the term email or url, the Atlas Search Visual Editor recommends using a custom analyzer with the uaxUrlEmail tokenizer for indexing email addresses or URL values. Click Create urlEmailAnalyzer to create and apply the custom analyzer to the Autocomplete Properties for the field.
Click the Data Type dropdown and select Autocomplete.
(Optional) Expand and configure the Autocomplete Properties for the field. To learn more, see Configure autocomplete Field Properties.
Click Add.
The following is the JSON syntax for the autocomplete type. Replace the default index definition with the following. To learn more about the fields, see Field Properties.

```
{
  "mappings": {
    "dynamic": true|false,
    "fields": {
      "<field-name>": {
        "type": "autocomplete",
        "analyzer": "<lucene-analyzer>",
        "tokenization": "edgeGram|rightEdgeGram|nGram",
        "minGrams": <2>,
        "maxGrams": <15>,
        "foldDiacritics": true|false
      }
    }
  }
}
```
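If you generate index definitions from code rather than typing them into the JSON Editor, a small helper can fill in the syntax above and catch obvious mistakes early. The following Python sketch is one way to do that. The function names are hypothetical, the defaults mirror the ones documented on this page, and the range check (1 <= minGrams <= maxGrams) is a sanity check assumed for this sketch, not a rule quoted from Atlas Search.

```python
import json
from typing import Any

# Tokenization values taken from the JSON syntax shown on this page.
VALID_TOKENIZATION = {"edgeGram", "rightEdgeGram", "nGram"}


def autocomplete_mapping(
    analyzer: str = "lucene.standard",
    tokenization: str = "edgeGram",
    min_grams: int = 2,
    max_grams: int = 15,
    fold_diacritics: bool = True,
) -> dict[str, Any]:
    """Return one autocomplete field mapping, rejecting obviously bad values."""
    if tokenization not in VALID_TOKENIZATION:
        raise ValueError(f"unknown tokenization strategy: {tokenization!r}")
    # Assumed sanity check for this sketch.
    if min_grams < 1 or max_grams < min_grams:
        raise ValueError("expected 1 <= minGrams <= maxGrams")
    return {
        "type": "autocomplete",
        "analyzer": analyzer,
        "tokenization": tokenization,
        "minGrams": min_grams,
        "maxGrams": max_grams,
        "foldDiacritics": fold_diacritics,
    }


def index_definition(field: str, mapping: dict[str, Any]) -> dict[str, Any]:
    """Wrap a field mapping in an index definition that uses static mappings."""
    return {"mappings": {"dynamic": False, "fields": {field: mapping}}}


definition = index_definition("title", autocomplete_mapping(min_grams=3, max_grams=5))
print(json.dumps(definition, indent=2))
```

The printed JSON can be pasted into the JSON Editor in place of the default index definition.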
Configure autocomplete Field Properties
The Atlas Search autocomplete type takes the following parameters:

Option | Type | Necessity | Description | Default
---|---|---|---|---
type | string | required | Human-readable label that identifies this field type. Value must be autocomplete. |
analyzer | string | optional | Name of the analyzer to use with this autocomplete mapping. You can use most Atlas Search analyzers; a few are not supported with this type. | lucene.standard
maxGrams | int | optional | Maximum number of characters per indexed sequence. The value limits the character length of indexed tokens. When you search for terms longer than the maxGrams value, Atlas Search truncates the tokens to the maxGrams length. | 15
minGrams | int | optional | Minimum number of characters per indexed sequence. We recommend 4 for the minimum value. A value that is less than 4 could impact performance because the size of the index can become very large. We recommend the default value of 2 for edgeGram only. | 2
tokenization | enum | optional | Tokenization strategy to use when indexing the field for autocompletion. Value can be one of the following: edgeGram (index character sequences starting at the left side of the words), rightEdgeGram (index character sequences starting at the right side of the words), or nGram (index character sequences of the given sizes taken from anywhere in the words). For the specified tokenization strategy, Atlas Search concatenates sequential tokens before emitting them, a process sometimes referred to as "shingling", and emits tokens between minGrams and maxGrams characters in length. | edgeGram
foldDiacritics | boolean | optional | Flag that indicates whether to include or remove diacritics from the indexed text. Value can be one of the following: true (remove diacritic marks from the index and query text) or false (include diacritic marks in the index and query text). | true
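To make the three tokenization values more concrete, here is a simplified Python model of the character sequences each strategy produces for a single term. It is a sketch for building intuition only: it ignores analyzers, shingling across tokens, and every other detail of how Atlas Search builds the index, and the function names are invented for this illustration.

```python
def edge_grams(term: str, min_grams: int, max_grams: int) -> list[str]:
    """Prefixes of `term` with lengths min_grams..max_grams (assumes min_grams >= 1)."""
    longest = min(max_grams, len(term))
    return [term[:n] for n in range(min_grams, longest + 1)]


def right_edge_grams(term: str, min_grams: int, max_grams: int) -> list[str]:
    """Suffixes of `term` with lengths min_grams..max_grams (assumes min_grams >= 1)."""
    longest = min(max_grams, len(term))
    return [term[-n:] for n in range(min_grams, longest + 1)]


def n_grams(term: str, min_grams: int, max_grams: int) -> list[str]:
    """Every substring of `term` with a length from min_grams to max_grams."""
    grams = []
    for start in range(len(term)):
        for n in range(min_grams, max_grams + 1):
            if start + n <= len(term):
                grams.append(term[start:start + n])
    return grams


print(edge_grams("search", 3, 5))        # ['sea', 'sear', 'searc']
print(right_edge_grams("search", 3, 5))  # ['rch', 'arch', 'earch']
print(n_grams("search", 3, 4))           # ['sea', 'sear', 'ear', 'earc', 'arc', 'arch', 'rch']
```

Even in this toy model, nGram produces noticeably more sequences than the edge-based strategies for the same term, which is consistent with the guidance above about keeping minGrams from being too small.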
Try an Example for the autocomplete Type
The following index definition example uses the sample_mflix.movies collection. If you have the sample data already loaded on your cluster, you can use the Visual Editor or JSON Editor in the Atlas UI to configure the index. After you select your preferred configuration method, select the database and collection, and refine your index to add field mappings.
The following index definition example indexes only the title field as the autocomplete type to support search-as-you-type queries against that field using the autocomplete operator. The index definition also specifies the following:
Use the standard analyzer to divide text values into terms based on word boundaries.
Use the edgeGram tokenization strategy to index characters starting at the left side of the words.
Index a minimum of 3 characters per indexed sequence.
Index a maximum of 5 characters per indexed sequence.
Include diacritic marks in the index and query text.
In the Add Field Mapping window, select title from the Field Name dropdown.
Click the Data Type dropdown and select Autocomplete.
Make the following changes to the Autocomplete Properties:
Max Grams: Set value to 5.
Min Grams: Set value to 3.
Tokenization: Select edgeGram from dropdown.
Fold Diacritics: Select false from dropdown.
Click Add.
Replace the default index definition with the following index definition.

```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "autocomplete",
        "analyzer": "lucene.standard",
        "tokenization": "edgeGram",
        "minGrams": 3,
        "maxGrams": 5,
        "foldDiacritics": false
      }
    }
  }
}
```
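Because this definition sets foldDiacritics to false, accented and unaccented spellings are treated as different text. The Python sketch below approximates that effect with Unicode decomposition. It is an assumption-laden model, not Atlas Search's implementation: the helper names are invented, the title is a hypothetical value, and lower-casing stands in for the lower-casing that the standard analyzer performs.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def prefix_matches(title: str, typed: str, fold: bool) -> bool:
    """Case-insensitive prefix check, optionally folding diacritics on both sides."""
    if fold:
        title, typed = fold_diacritics(title), fold_diacritics(typed)
    return title.lower().startswith(typed.lower())


title = "Am\u00e9lie"  # "Amélie", written with an escape to pin the code point

print(prefix_matches(title, "ame", fold=False))       # False
print(prefix_matches(title, "ame", fold=True))        # True
print(prefix_matches(title, "am\u00e9", fold=False))  # True
```

In this model, a user who types "ame" finds the title only when folding is on. With folding off, as in the definition above, the user has to type the accented character.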
The following index definition example also uses the sample_mflix.movies collection, and you can configure it with the Visual Editor or JSON Editor in the Atlas UI in the same way.
You can also index a field as multiple types by specifying the types in an array. For example, the following index definition indexes the title field as the following types:
autocomplete type to support autocompletion for queries using the autocomplete operator.
string type to support text search using operators such as text, phrase, and so on.
In the Add Field Mapping window, select title from the Field Name dropdown.
Click the Data Type dropdown and select Autocomplete.
Make the following changes to the Autocomplete Properties:
Max Grams: Set value to 15.
Min Grams: Set value to 2.
Tokenization: Select edgeGram from dropdown.
Fold Diacritics: Select false from dropdown.
Click Add.
Repeat the steps to open the Add Field Mapping window and select title from the Field Name dropdown again.
Click the Data Type dropdown and select String.
Accept the default String Properties settings and click Add.
Replace the default index definition with the following index definition.

```
{
  "mappings": {
    "dynamic": true|false,
    "fields": {
      "title": [
        {
          "type": "autocomplete",
          "analyzer": "lucene.standard",
          "tokenization": "edgeGram",
          "minGrams": 2,
          "maxGrams": 15,
          "foldDiacritics": false
        },
        {
          "type": "string"
        }
      ]
    }
  }
}
```
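With title indexed as both types, one query can use both mappings. The following Python sketch builds a $search stage that combines the autocomplete and text operators in a compound query, so a document can match on a partial prefix or on a full term. The helper name and the index name "default" are assumptions for this sketch; it builds the stage only and does not run it.

```python
from typing import Any


def title_search_stage(user_input: str, index: str = "default") -> dict[str, Any]:
    """Build a $search stage that queries title through both of its mappings."""
    return {
        "$search": {
            "index": index,
            "compound": {
                "should": [
                    # Uses the autocomplete mapping of the title field.
                    {"autocomplete": {"query": user_input, "path": "title"}},
                    # Uses the string mapping of the title field.
                    {"text": {"query": user_input, "path": "title"}},
                ],
                # Require at least one of the two clauses to match.
                "minimumShouldMatch": 1,
            },
        }
    }


stage = title_search_stage("godfather")
print(stage)
```

Documents that satisfy both clauses typically score higher than documents that satisfy only one, which tends to rank complete-word matches above partial ones.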
The following index definition example uses the sample_mflix.users collection. If you have the sample data already loaded on your cluster, you can use the Visual Editor or JSON Editor in the Atlas UI to configure the index. After you select your preferred configuration method, select the database and collection, and refine your index to add field mappings.
The following index definition example indexes only the email field as the autocomplete type to support search-as-you-type queries against that field using the autocomplete operator. The index definition specifies the following:
Use the keyword analyzer to accept a string or array of strings as a parameter and index them as a single term (token).
Use the nGram tokenizer to tokenize text into chunks, or "n-grams", of given sizes.
Index a minimum of 3 characters per indexed sequence.
Index a maximum of 15 characters per indexed sequence.
Include diacritic marks in the index and query text.
You can also use the uaxUrlEmail tokenizer to tokenize URLs and email addresses. To learn more, see uaxUrlEmail.
In the Add Field Mapping window, select email from the Field Name dropdown.
Click the Data Type dropdown and select Autocomplete.
Make the following changes to the Autocomplete Properties:
Analyzer: Select lucene.keyword from the dropdown.
Max Grams: Set value to 15.
Min Grams: Set value to 3.
Tokenization: Select nGram from the dropdown.
Fold Diacritics: Select false from dropdown.
Click Add.
Replace the default index definition with the following index definition.

```json
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "email": {
        "type": "autocomplete",
        "analyzer": "lucene.keyword",
        "tokenization": "nGram",
        "minGrams": 3,
        "maxGrams": 15,
        "foldDiacritics": false
      }
    }
  }
}
```
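Because the keyword analyzer keeps each email address as a single term and nGram indexes chunks from anywhere in that term, this definition supports matching in the middle of an address at the cost of many indexed sequences. The Python sketch below estimates that cost under a simplified model that counts one sequence per start position and length. The function names and the sample address are hypothetical, and the real index may differ in ways this model ignores.

```python
def ngram_count(length: int, min_grams: int, max_grams: int) -> int:
    """Count substrings of lengths min_grams..max_grams in a string of `length` characters."""
    return sum(max(0, length - n + 1) for n in range(min_grams, max_grams + 1))


def is_indexed_substring(value: str, typed: str, min_grams: int, max_grams: int) -> bool:
    """True if `typed` would be one of the n-grams of `value` in this simplified model."""
    return min_grams <= len(typed) <= max_grams and typed in value


email = "jane.doe@example.com"  # hypothetical address, not taken from the sample data

print(ngram_count(len(email), 3, 15))                  # sequences with minGrams 3
print(ngram_count(len(email), 2, 15))                  # more sequences with minGrams 2
print(is_indexed_substring(email, "example", 3, 15))   # True
print(is_indexed_substring(email, "ex", 3, 15))        # False: shorter than minGrams
```

Lowering minGrams or raising maxGrams increases the count for every document, which is one reason to choose these values deliberately for nGram tokenization.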