Language Analyzers

Use language-specific analyzers to create indexes tailored to a particular language. Each language analyzer has built-in stop words and word divisions based on that language's usage patterns.

MongoDB Search offers the following language analyzers:

lucene.arabic

lucene.armenian

lucene.basque

lucene.bengali

lucene.brazilian

lucene.bulgarian

lucene.catalan

lucene.chinese

lucene.cjk [1]

lucene.czech

lucene.danish

lucene.dutch

lucene.english

lucene.finnish

lucene.french

lucene.galician

lucene.german

lucene.greek

lucene.hindi

lucene.hungarian

lucene.indonesian

lucene.irish

lucene.italian

lucene.japanese

lucene.korean

lucene.kuromoji [2]

lucene.latvian

lucene.lithuanian

lucene.morfologik [3]

lucene.nori [4]

lucene.norwegian

lucene.persian

lucene.polish

lucene.portuguese

lucene.romanian

lucene.russian

lucene.smartcn [5]

lucene.sorani

lucene.spanish

lucene.swedish

lucene.thai

lucene.turkish

lucene.ukrainian

[1] lucene.cjk is a generic Chinese, Japanese, and Korean analyzer.

[2] lucene.kuromoji is a Japanese analyzer.

[3] lucene.morfologik is a Polish analyzer.

[4] lucene.nori is a Korean analyzer.

[5] lucene.smartcn is a Chinese analyzer.

Consider a collection named cars with the following documents:

{
  "_id": 1,
  "subject": {
    "en": "It is better to equip our cars to understand the causes of the accident.",
    "fr": "Mieux équiper nos voitures pour comprendre les causes d'un accident.",
    "he": "עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה."
  }
}
{
  "_id": 2,
  "subject": {
    "en": "The best time to do this is immediately after you've filled up with fuel",
    "fr": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant.",
    "he": "הזמן הטוב ביותר לעשות זאת הוא מיד לאחר שמילאת דלק."
  }
}

The following example index definition specifies an index on the subject.fr field using the lucene.french analyzer:

{
  "mappings": {
    "fields": {
      "subject": {
        "fields": {
          "fr": {
            "analyzer": "lucene.french",
            "type": "string"
          }
        },
        "type": "document"
      }
    }
  }
}

The following MongoDB Search query searches for the string pour in the subject.fr field. To run this query, connect to your cluster using mongosh and switch to the database that contains the cars collection.

db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "pour",
        "path": "subject.fr"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.fr": 1
    }
  }
])

The previous query returns no results with the lucene.french analyzer because pour is one of its built-in stop words, so the term is removed during analysis. With the lucene.standard analyzer, the same query would return both documents.
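The stop word behavior described above can be illustrated with a small plain-JavaScript model. This is only a sketch under stated assumptions: the `analyze` and `search` helpers and the `FRENCH_STOP_WORDS` subset are hypothetical, and the code does not reproduce the real lucene.french analyzer, which also applies elision handling and stemming.

```javascript
// Toy model of text analysis, NOT the Lucene implementation:
// lowercase the text, split it into runs of letters and digits,
// then drop any stop words.
const FRENCH_STOP_WORDS = new Set(["pour", "le", "les", "de", "un", "que"]); // hypothetical subset

function analyze(text, stopWords) {
  const words = text.toLowerCase().match(/[\p{L}\p{M}\p{N}]+/gu) || [];
  return words.filter((word) => !stopWords.has(word));
}

const docs = [
  { _id: 1, fr: "Mieux équiper nos voitures pour comprendre les causes d'un accident." },
  { _id: 2, fr: "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." },
];

// Return the _id of every document that shares at least one token with the query.
// The same analysis is applied to the query and to the documents.
function search(query, stopWords) {
  const queryTokens = analyze(query, stopWords);
  return docs
    .filter((doc) => {
      const docTokens = new Set(analyze(doc.fr, stopWords));
      return queryTokens.some((token) => docTokens.has(token));
    })
    .map((doc) => doc._id);
}

const withStopWords = search("pour", FRENCH_STOP_WORDS); // query analyzes to zero tokens
const withoutStopWords = search("pour", new Set());      // "pour" survives analysis

console.log(withStopWords);    // []
console.log(withoutStopWords); // [ 1, 2 ]
```

Because the stop word is removed from the query as well as from the indexed text, the query has no tokens left to match, which is why the result set is empty rather than an error.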

The following MongoDB Search query searches for the string carburant in the subject.fr field. To run this query, connect to your cluster using mongosh and switch to the database that contains the cars collection.

db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "carburant",
        "path": "subject.fr"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.fr": 1
    }
  }
])
{ subject: { fr: "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." } }

MongoDB Search returns the document with _id: 2 in the results because the query matched a token that the lucene.french analyzer created for that document. The lucene.french analyzer creates the following tokens for the subject.fr field in the document with _id: 2:

meileu

moment

fair

est

imediat

aprè

fait

plein

carburant

You can also create indexes for unsupported languages by creating a custom analyzer with the icuFolding and stopword token filters.

The following example index definition specifies an index on the subject.he field using a custom analyzer called myHebrewAnalyzer to analyze and create tokens for Hebrew text:

{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "subject": {
        "fields": {
          "he": {
            "analyzer": "myHebrewAnalyzer",
            "type": "string"
          }
        },
        "type": "document"
      }
    }
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "myHebrewAnalyzer",
      "tokenFilters": [
        {
          "type": "icuFolding"
        },
        {
          "tokens": [
            "אן",
            "שלנו",
            "זה",
            "אל"
          ],
          "type": "stopword"
        }
      ],
      "tokenizer": {
        "type": "standard"
      }
    }
  ]
}

The following MongoDB Search query searches for the string המכוניות in the subject.he field. To run this query, connect to your cluster using mongosh and switch to the database that contains the cars collection.

db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "המכוניות",
        "path": "subject.he"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.he": 1
    }
  }
])
{ subject: { he: 'עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה.' } }

MongoDB Search returns the document with _id: 1 in the results because the query matched a token that the myHebrewAnalyzer analyzer created for that document. The myHebrewAnalyzer analyzer creates the following tokens for the subject.he field in the document with _id: 1:

עדיף

לצייד

את

המכוניות

כדי

להבין

את

הגורמים

לתאונה
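The token list above can be reproduced with a rough plain-JavaScript approximation of the custom analyzer pipeline. This is a sketch, not the Lucene implementation: the `tokenize`, `fold`, and `analyzeHebrew` helpers are hypothetical, the regular expression only stands in for the standard tokenizer, and Unicode normalization only approximates icuFolding. The stop word list is taken from the index definition.

```javascript
// Stop words copied from the myHebrewAnalyzer definition.
const HEBREW_STOP_WORDS = new Set(["אן", "שלנו", "זה", "אל"]);

// Stand-in for the standard tokenizer: runs of letters, marks, and digits.
function tokenize(text) {
  return text.match(/[\p{L}\p{M}\p{N}]+/gu) || [];
}

// Rough stand-in for icuFolding: compatibility decomposition,
// removal of combining marks, and lowercasing.
function fold(token) {
  return token.normalize("NFKD").replace(/\p{M}+/gu, "").toLowerCase();
}

// Tokenize, fold each token, then apply the stop word filter,
// in the same order as the tokenFilters array in the definition.
function analyzeHebrew(text) {
  return tokenize(text)
    .map(fold)
    .filter((token) => !HEBREW_STOP_WORDS.has(token));
}

const hebrewTokens = analyzeHebrew(
  "עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה."
);
console.log(hebrewTokens);
```

In this sketch the only word dropped from the sentence is שלנו, because it is the only one of the four configured stop words that appears in the text. Words such as את remain because they are not in the custom list.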

You can also create an index that uses multiple language analyzers to perform a multilingual search.

The following example index definition specifies an index with dynamic mappings on the sample_mflix.movies collection. The definition applies the lucene.italian language analyzer to the fullplot field and uses the multi option to specify lucene.english as an alternate analyzer under the name fullplot_english. MongoDB Search uses the lucene.standard analyzer, which the definition sets as the default, for all other fields that it dynamically indexes in the movies collection.

{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": true,
    "fields": {
      "fullplot": {
        "type": "string",
        "analyzer": "lucene.italian",
        "multi": {
          "fullplot_english": {
            "type": "string",
            "analyzer": "lucene.english"
          }
        }
      }
    }
  }
}
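If you maintain several multilingual fields, you might generate this kind of definition programmatically. The following sketch builds the same structure in plain JavaScript. The helper names `stringFieldWithAlternates` and `multiPath` are hypothetical and are not part of any MongoDB API.

```javascript
// Hypothetical helper: builds a string field mapping with a primary analyzer
// and any number of named alternate analyzers under "multi".
function stringFieldWithAlternates(primaryAnalyzer, alternates = {}) {
  const multi = {};
  for (const [name, analyzer] of Object.entries(alternates)) {
    multi[name] = { type: "string", analyzer };
  }
  return { type: "string", analyzer: primaryAnalyzer, multi };
}

// Hypothetical helper: builds the query path object that targets
// one alternate analyzer of a field.
function multiPath(field, alternateName) {
  return { value: field, multi: alternateName };
}

const indexDefinition = {
  analyzer: "lucene.standard",
  mappings: {
    dynamic: true,
    fields: {
      fullplot: stringFieldWithAlternates("lucene.italian", {
        fullplot_english: "lucene.english",
      }),
    },
  },
};

// JSON.stringify emits strict JSON, so the output can be pasted
// into any editor that expects a JSON index definition.
console.log(JSON.stringify(indexDefinition, null, 2));
console.log(multiPath("fullplot", "fullplot_english"));
```

In mongosh, an object like `indexDefinition` could then be passed to `db.movies.createSearchIndex()` together with an index name, assuming your deployment and mongosh version support that helper.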

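The following MongoDB Search query uses the compound operator to query the collection in multiple languages. The query references an index named multilingual-tutorial, so it assumes that you created the preceding index definition under that name. To run this query, connect to your cluster using mongosh and switch to the sample_mflix database.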

The compound operator contains the following clauses:

  • must clause searches for movie plots in English and Italian that contain the term Bella using the text operator

  • mustNot clause excludes movies released between 1984 and 2016 using the range operator

  • should clause specifies preference for the Comedy genre using the text operator

db.movies.aggregate([
  {
    $search: {
      "index": "multilingual-tutorial",
      "compound": {
        "must": [{
          "text": {
            "query": "Bella",
            "path": { "value": "fullplot", "multi": "fullplot_english" }
          }
        }],
        "mustNot": [{
          "range": {
            "path": "released",
            "gt": ISODate("1984-01-01T00:00:00.000Z"),
            "lt": ISODate("2016-01-01T00:00:00.000Z")
          }
        }],
        "should": [{
          "text": {
            "query": "Comedy",
            "path": "genres"
          }
        }]
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "title": 1,
      "plot": 1,
      "genres": 1,
      "runtime": 1,
      "fullplot": 1,
      "released": 1,
      "score": { "$meta": "searchScore" }
    }
  }
])
[
{
plot: "Giovanna e' una bella ragazza, ma ha qualche problema con gli uomini: tutti la vogliono solo usare, anche il suo fidanzata Claudio. Trovera' una via d'uscita diventando vigile urbano. Come ...",
genres: [ 'Comedy' ],
runtime: 100,
title: 'Policewoman',
fullplot: "Giovanna e' una bella ragazza, ma ha qualche problema con gli uomini: tutti la vogliono solo usare, anche il suo fidanzata Claudio. Trovera' una via d'uscita diventando vigile urbano. Come Giovanna d'Arco, il suo idolo, non guardera' in faccia a nessuno e con l'aiuto del pretore Patane', innamorato di lei, smascherera' una serie di intrallazzi e corruzione denunciando perfino il suo capo, Marcellini. I due paladini della giustizia coroneranno il loro sogno d'amore, trasferiti in una lontana isoletta a sud della Sicilia, ma i corrotti resteranno al loro posto.",
released: ISODate('1974-11-15T00:00:00.000Z'),
score: 3.4109344482421875
},
{
plot: `Gerardo è un attore o almeno cerca di esserlo, ma il pubblico non è del suo parere. Cosè, per arrotondare gli introiti, aiuta l'amico Lallo in un suo "lavoretto". Questo gli costa perè la ...`,
genres: [ 'Comedy' ],
runtime: 95,
title: 'Love and Larceny',
fullplot: `Gerardo è un attore o almeno cerca di esserlo, ma il pubblico non è del suo parere. Cosè, per arrotondare gli introiti, aiuta l'amico Lallo in un suo "lavoretto". Questo gli costa perè la prigione, dove incontra Chinotto e Gloria Patri. Uscito inizia, con l'opposizione di Annalisa che lo vuole sposare, una carriera come truffatore, dapprima in societè con Chinotto e quindi con la bella Elena. Tutto sembra filare a gonfie vele, e le truffe divengono sempre piè grosse e di successo. Ma a volte è destino che il ragno resti preso dalla stessa tela che tesse.`,
released: ISODate('1960-02-10T00:00:00.000Z'),
score: 3.3489856719970703
},
{
plot: 'He is a revenge-obssessed stevedore... She is a wealthy, elusive woman. They try hard to get together... or do they?',
genres: [ 'Drama' ],
runtime: 137,
title: 'The Moon in the Gutter',
fullplot: "Nightly, Gerard broods in an alley hoping to catch his sister's attacker. He lives with his lover Bella whom he neglects, an alcoholic brother who lurks about, and his father who's stayed drunk since the daughter's death, ignoring work and his own companion. At a seedy bar, Gerard meets a wealthy, nihilistic hedonist and his beautiful sister. Gerard flips for her and thinks she's his ticket out of the slum...",
released: ISODate('1983-05-18T00:00:00.000Z'),
score: 3.2985665798187256
},
{
plot: 'Dr Tremayne is an enigmatic Psychiatrist running a Futuristic asylum housing four very special cases. Visited by colleague Nicholas, Tremayne explains his amazing and controversial theories...',
genres: [ 'Horror' ],
runtime: 90,
title: 'Tales That Witness Madness',
fullplot: "Dr Tremayne is an enigmatic Psychiatrist running a Futuristic asylum housing four very special cases. Visited by colleague Nicholas, Tremayne explains his amazing and controversial theories as to why each of the four patients went mad... cue four distinct tales each with a different set of characters: 'Mr Tiger' tells of Paul, the sensitive and troubled young son of prosperous but constantly bickering and unlovely parents, and the boy's 'imaginary' friend, a tiger. 'Penny Farthing' tells of Timothy, an antique store owner propelled backwards in time by a penny-farthing bicycle in his shop, all the while being watched over by the constantly changing photograph of Uncle Albert, which endangers the lives of both Timothy and his beautiful wife, Ann. 'Mel' tells of Brian, a man who brings home an old dead tree and prominently displays it in his living room as a work of art. His fiery wife Bella soon becomes jealous of the tree, which the husband has lovingly named Mel, and it seems to be developing a will of its own. 'Luau' tells of Auriol, a flamboyant and ambitious literary agent who will do anything to impress her sinister new client, though he seems more interested in Auriol's beautiful and precocious young daughter Ginny. Ginny sneaks off on holiday while Auriol plans a sumptuous feast for her client.",
released: ISODate('1973-10-31T00:00:00.000Z'),
score: 1.9504895210266113
}
]