Use language-specific analyzers to create indexes tailored to a particular language. Each language analyzer has built-in stop words and word divisions based on that language's usage patterns.
MongoDB Search offers built-in language analyzers whose names use the lucene. prefix. Examples include lucene.english, lucene.french, and lucene.italian, which the examples on this page use. Some analyzers are specialized for particular languages or language groups:
lucene.cjk is a generic Chinese, Japanese, and Korean analyzer.
lucene.kuromoji is a Japanese analyzer.
lucene.morfologik is a Polish analyzer.
lucene.nori is a Korean analyzer.
lucene.smartcn is a Chinese analyzer.
Examples
Consider a collection named cars with the following documents:
{ "_id": 1, "subject": { "en": "It is better to equip our cars to understand the causes of the accident.", "fr": "Mieux équiper nos voitures pour comprendre les causes d'un accident.", "he": "עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה." } }
{ "_id": 2, "subject": { "en": "The best time to do this is immediately after you've filled up with fuel", "fr": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant.", "he": "הזמן הטוב ביותר לעשות זאת הוא מיד לאחר שמילאת דלק." } }
Built-In Language Analyzer Example
The following example index definition specifies an index on the subject.fr field using the lucene.french analyzer:
{ "mappings": { "fields": { "subject": { "fields": { "fr": { "analyzer": "lucene.french", "type": "string" } }, "type": "document" } } } }
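If you generate index definitions from application code, a small function can produce this shape for any language subfield. The sketch below is plain JavaScript. languageFieldIndex is a hypothetical helper, not a MongoDB API. In mongosh, you could pass the returned object as the definition argument of db.cars.createSearchIndex(), assuming your deployment supports that method.

```javascript
// Hypothetical helper (not part of any MongoDB API): builds an index
// definition that applies one analyzer to a string field nested inside a
// document field, in the same shape as the subject.fr definition above.
function languageFieldIndex(parentField, childField, analyzer) {
  return {
    mappings: {
      fields: {
        [parentField]: {
          type: "document",
          fields: {
            [childField]: { type: "string", analyzer: analyzer },
          },
        },
      },
    },
  };
}

const frenchIndex = languageFieldIndex("subject", "fr", "lucene.french");
console.log(JSON.stringify(frenchIndex, null, 2));
```

To index another translation, pass a different subfield and analyzer. For example, languageFieldIndex("subject", "en", "lucene.english") produces a definition for the English text.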
The following MongoDB Search query searches for the string pour in the subject.fr field. To run this query, connect to your cluster using mongosh and switch to the database that contains the cars collection.
db.cars.aggregate([ { $search: { "text": { "query": "pour", "path": "subject.fr" } } }, { $project: { "_id": 0, "subject.fr": 1 } } ])
The previous query returns no results with the lucene.french analyzer because pour is one of its built-in stop words. With the lucene.standard analyzer, the same query would return both documents, because both subject.fr values contain pour.
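To see why a query for a stop word returns nothing, consider the toy analyzer below, written in plain JavaScript. It is an illustrative sketch, not the lucene.french implementation. toyAnalyze and FRENCH_STOP_WORDS_SUBSET are hypothetical, the stop word set is a small subset chosen for this example, and real French analysis also handles elision and stemming. Because the same analyzer processes both the indexed text and the query text by default, a query made only of stop words leaves no tokens to match.

```javascript
// Toy illustration only: NOT the lucene.french analyzer. The set below is a
// small hypothetical subset of French stop words, chosen for this example.
const FRENCH_STOP_WORDS_SUBSET = new Set(["pour", "le", "les", "de", "que", "nos", "un", "d"]);

// Lowercase the text, split it on anything that isn't a letter or digit,
// then drop every token that appears in the stop word set.
function toyAnalyze(text, stopWords) {
  const tokens = text.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];
  return tokens.filter((token) => !stopWords.has(token));
}

const queryTokens = toyAnalyze("pour", FRENCH_STOP_WORDS_SUBSET);
const docTokens = toyAnalyze(
  "Mieux équiper nos voitures pour comprendre les causes d'un accident.",
  FRENCH_STOP_WORDS_SUBSET
);

console.log(queryTokens); // []
console.log(docTokens);
```

The query analyzes to an empty token list, while the document keeps content words such as voitures and causes. No query token is left to compare against the document's tokens, so nothing matches.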
The following MongoDB Search query searches for the string carburant in the subject.fr field. To run this query, connect to your cluster using mongosh and switch to the database that contains the cars collection.
db.cars.aggregate([ { $search: { "text": { "query": "carburant", "path": "subject.fr" } } }, { $project: { "_id": 0, "subject.fr": 1 } } ])
{ subject: { fr: "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." } }
MongoDB Search returns the document with _id: 2 because the query matched a token that the lucene.french analyzer created for the subject.fr field of that document. The document with _id: 1 doesn't contain carburant, so it isn't in the results.
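The queries in this section share one pipeline shape: a $search stage that uses the text operator, followed by a $project stage that keeps only the searched field. A hypothetical helper such as textSearchPipeline below builds that pipeline for any query string and field path. It is a plain JavaScript sketch, and you would still run the result with db.cars.aggregate() in mongosh.

```javascript
// Hypothetical helper: builds the two-stage pipeline used by the queries
// above. The computed key [path] keeps a dotted path such as "subject.fr"
// as one $project field name, which MongoDB reads as an embedded field.
function textSearchPipeline(query, path) {
  return [
    { $search: { text: { query: query, path: path } } },
    { $project: { _id: 0, [path]: 1 } },
  ];
}

const pipeline = textSearchPipeline("carburant", "subject.fr");
console.log(JSON.stringify(pipeline));
// In mongosh, you could then run: db.cars.aggregate(pipeline)
```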
Custom Language Analyzer Example
For languages that don't have a built-in language analyzer, you can create a custom analyzer that uses the icuFolding and stopword token filters.
The following example index definition specifies an index on the subject.he field using a custom analyzer called myHebrewAnalyzer to analyze and create tokens for Hebrew text:
{ "analyzer": "lucene.standard", "mappings": { "dynamic": false, "fields": { "subject": { "fields": { "he": { "analyzer": "myHebrewAnalyzer", "type": "string" } }, "type": "document" } } }, "analyzers": [ { "charFilters": [], "name": "myHebrewAnalyzer", "tokenFilters": [ { "type": "icuFolding" }, { "tokens": [ "אן", "שלנו", "זה", "אל" ], "type": "stopword" } ], "tokenizer": { "type": "standard" } } ] }
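You can reuse a custom analyzer like myHebrewAnalyzer for another language by swapping its stop word list. The sketch below builds the analyzer entry and the surrounding index definition in plain JavaScript. stopwordFoldingAnalyzer is a hypothetical helper. It keeps the token filters in the same order as the definition above, with icuFolding first and then stopword.

```javascript
// Hypothetical helper: returns a custom analyzer entry with the same shape as
// myHebrewAnalyzer: a standard tokenizer, then the icuFolding and stopword
// token filters, listed in that order.
function stopwordFoldingAnalyzer(name, stopwords) {
  return {
    name: name,
    charFilters: [],
    tokenizer: { type: "standard" },
    tokenFilters: [
      { type: "icuFolding" },
      { type: "stopword", tokens: stopwords },
    ],
  };
}

// Rebuild the Hebrew index definition from the example above.
const hebrewDefinition = {
  analyzer: "lucene.standard",
  mappings: {
    dynamic: false,
    fields: {
      subject: {
        type: "document",
        fields: { he: { type: "string", analyzer: "myHebrewAnalyzer" } },
      },
    },
  },
  analyzers: [stopwordFoldingAnalyzer("myHebrewAnalyzer", ["אן", "שלנו", "זה", "אל"])],
};

console.log(JSON.stringify(hebrewDefinition, null, 2));
```

The field mapping refers to the analyzer by name, so the name passed to the helper must match the analyzer value on subject.he.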
The following MongoDB Search query searches for the string המכוניות in the subject.he field. To run this query, connect to your cluster using mongosh and switch to the database that contains the cars collection.
db.cars.aggregate([ { $search: { "text": { "query": "המכוניות", "path": "subject.he" } } }, { $project: { "_id": 0, "subject.he": 1 } } ])
{ subject: { he: 'עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה.' } }
MongoDB Search returns the document with _id: 1 because the query matched a token that myHebrewAnalyzer created for the subject.he field of that document. Because שלנו is in the analyzer's stop word list, the stopword token filter removes it from that document's tokens.
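The following toy sketch approximates what myHebrewAnalyzer does to both the document text and the query text, and then checks for a shared token. It is not the real analyzer. A regular expression stands in for the standard tokenizer. NFKD normalization with combining marks removed, plus lowercasing, stands in for icuFolding, which performs more extensive folding. toyHebrewAnalyze and toyMatches are hypothetical names. Treating any shared token as a match is a simplification of how the text operator selects and scores documents.

```javascript
// Toy approximation of myHebrewAnalyzer. It is NOT the Lucene implementation:
// - the regular expression split stands in for the standard tokenizer
// - NFKD + removing combining marks + toLowerCase() stands in for icuFolding
// - the Set holds the same stop words as the index definition above
const HEBREW_STOP_WORDS = new Set(["אן", "שלנו", "זה", "אל"]);

function toyHebrewAnalyze(text) {
  const rawTokens = text.match(/[\p{L}\p{M}\p{N}]+/gu) || [];
  return rawTokens
    .map((token) => token.normalize("NFKD").replace(/\p{M}/gu, "").toLowerCase())
    .filter((token) => token.length > 0 && !HEBREW_STOP_WORDS.has(token));
}

// Simplification: a document matches if any analyzed query token also
// appears among the document's analyzed tokens.
function toyMatches(query, documentText) {
  const documentTokens = new Set(toyHebrewAnalyze(documentText));
  return toyHebrewAnalyze(query).some((token) => documentTokens.has(token));
}

const doc1 = "עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה.";
const doc2 = "הזמן הטוב ביותר לעשות זאת הוא מיד לאחר שמילאת דלק.";

console.log(toyMatches("המכוניות", doc1)); // true
console.log(toyMatches("המכוניות", doc2)); // false
```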
Multilingual Search Example
You can also create an index that uses multiple language analyzers to perform a multilingual search.
The following example index definition, for an index named multilingual-tutorial, specifies an index with dynamic mapping on the sample_mflix.movies collection. The definition applies the lucene.italian language analyzer to the fullplot field. It uses the multi option to specify lucene.english as an alternate analyzer named fullplot_english. For all other fields that it dynamically indexes in the movies collection, MongoDB Search uses the lucene.standard analyzer set by the top-level analyzer option.
{ "analyzer": "lucene.standard", "mappings": { "dynamic": true, "fields": { "fullplot": { "type": "string", "analyzer": "lucene.italian", "multi": { "fullplot_english": { "type": "string", "analyzer": "lucene.english" } } } } } }
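The fullplot mapping pairs a primary analyzer with an alternate analyzer under multi. A hypothetical helper can build that pattern for other fields or language pairs. This plain JavaScript sketch reproduces the definition above. The multi name it takes, fullplot_english in this example, is the name that a query references in path.multi.

```javascript
// Hypothetical helper: builds a string field mapping that indexes the field
// with a primary analyzer and registers one alternate analyzer under the
// multi option, in the same shape as the fullplot mapping above.
function multiAnalyzerField(primaryAnalyzer, multiName, alternateAnalyzer) {
  return {
    type: "string",
    analyzer: primaryAnalyzer,
    multi: {
      [multiName]: { type: "string", analyzer: alternateAnalyzer },
    },
  };
}

const multilingualDefinition = {
  analyzer: "lucene.standard", // used for all other dynamically indexed fields
  mappings: {
    dynamic: true,
    fields: {
      fullplot: multiAnalyzerField("lucene.italian", "fullplot_english", "lucene.english"),
    },
  },
};

console.log(JSON.stringify(multilingualDefinition, null, 2));
```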
The following MongoDB Search query uses the compound operator to query the collection in multiple languages. To run this query, connect to your cluster using mongosh and switch to the sample_mflix database.
The compound operator contains the following clauses:
must clause: searches for movie plots in English and Italian that contain the term Bella, using the text operator.
mustNot clause: excludes movies released after January 1, 1984 and before January 1, 2016, using the range operator.
should clause: specifies a preference for the Comedy genre, using the text operator.
db.movies.aggregate([ { $search: { "index": "multilingual-tutorial", "compound": { "must": [{ "text": { "query": "Bella", "path": { "value": "fullplot", "multi": "fullplot_english" } } }], "mustNot": [{ "range": { "path": "released", "gt": ISODate("1984-01-01T00:00:00.000Z"), "lt": ISODate("2016-01-01T00:00:00.000Z") } }], "should": [{ "text": { "query": "Comedy", "path": "genres" } }] } } }, { $project: { "_id": 0, "title": 1, "plot": 1, "genres": 1, "runtime": 1, "fullplot": 1, "released": 1, "score": { "$meta": "searchScore" } } } ])
[ { plot: "Giovanna e' una bella ragazza, ma ha qualche problema con gli uomini: tutti la vogliono solo usare, anche il suo fidanzata Claudio. Trovera' una via d'uscita diventando vigile urbano. Come ...", genres: [ 'Comedy' ], runtime: 100, title: 'Policewoman', fullplot: "Giovanna e' una bella ragazza, ma ha qualche problema con gli uomini: tutti la vogliono solo usare, anche il suo fidanzata Claudio. Trovera' una via d'uscita diventando vigile urbano. Come Giovanna d'Arco, il suo idolo, non guardera' in faccia a nessuno e con l'aiuto del pretore Patane', innamorato di lei, smascherera' una serie di intrallazzi e corruzione denunciando perfino il suo capo, Marcellini. I due paladini della giustizia coroneranno il loro sogno d'amore, trasferiti in una lontana isoletta a sud della Sicilia, ma i corrotti resteranno al loro posto.", released: ISODate('1974-11-15T00:00:00.000Z'), score: 3.4109344482421875 }, { plot: `Gerardo è un attore o almeno cerca di esserlo, ma il pubblico non è del suo parere. Cosè, per arrotondare gli introiti, aiuta l'amico Lallo in un suo "lavoretto". Questo gli costa perè la ...`, genres: [ 'Comedy' ], runtime: 95, title: 'Love and Larceny', fullplot: `Gerardo è un attore o almeno cerca di esserlo, ma il pubblico non è del suo parere. Cosè, per arrotondare gli introiti, aiuta l'amico Lallo in un suo "lavoretto". Questo gli costa perè la prigione, dove incontra Chinotto e Gloria Patri. Uscito inizia, con l'opposizione di Annalisa che lo vuole sposare, una carriera come truffatore, dapprima in societè con Chinotto e quindi con la bella Elena. Tutto sembra filare a gonfie vele, e le truffe divengono sempre piè grosse e di successo. Ma a volte è destino che il ragno resti preso dalla stessa tela che tesse.`, released: ISODate('1960-02-10T00:00:00.000Z'), score: 3.3489856719970703 }, { plot: 'He is a revenge-obssessed stevedore... She is a wealthy, elusive woman. They try hard to get together... or do they?', genres: [ 'Drama' ], runtime: 137, title: 'The Moon in the Gutter', fullplot: "Nightly, Gerard broods in an alley hoping to catch his sister's attacker. He lives with his lover Bella whom he neglects, an alcoholic brother who lurks about, and his father who's stayed drunk since the daughter's death, ignoring work and his own companion. At a seedy bar, Gerard meets a wealthy, nihilistic hedonist and his beautiful sister. Gerard flips for her and thinks she's his ticket out of the slum...", released: ISODate('1983-05-18T00:00:00.000Z'), score: 3.2985665798187256 }, { plot: 'Dr Tremayne is an enigmatic Psychiatrist running a Futuristic asylum housing four very special cases. Visited by colleague Nicholas, Tremayne explains his amazing and controversial theories...', genres: [ 'Horror' ], runtime: 90, title: 'Tales That Witness Madness', fullplot: "Dr Tremayne is an enigmatic Psychiatrist running a Futuristic asylum housing four very special cases. Visited by colleague Nicholas, Tremayne explains his amazing and controversial theories as to why each of the four patients went mad... cue four distinct tales each with a different set of characters: 'Mr Tiger' tells of Paul, the sensitive and troubled young son of prosperous but constantly bickering and unlovely parents, and the boy's 'imaginary' friend, a tiger. 'Penny Farthing' tells of Timothy, an antique store owner propelled backwards in time by a penny-farthing bicycle in his shop, all the while being watched over by the constantly changing photograph of Uncle Albert, which endangers the lives of both Timothy and his beautiful wife, Ann. 'Mel' tells of Brian, a man who brings home an old dead tree and prominently displays it in his living room as a work of art. His fiery wife Bella soon becomes jealous of the tree, which the husband has lovingly named Mel, and it seems to be developing a will of its own. 'Luau' tells of Auriol, a flamboyant and ambitious literary agent who will do anything to impress her sinister new client, though he seems more interested in Auriol's beautiful and precocious young daughter Ginny. Ginny sneaks off on holiday while Auriol plans a sumptuous feast for her client.", released: ISODate('1973-10-31T00:00:00.000Z'), score: 1.9504895210266113 } ]
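The query above uses ISODate(), a helper that mongosh provides. If you build the same $search stage in plain JavaScript, for example for use with the Node.js driver, JavaScript Date objects serve the same purpose. The sketch below is hypothetical. multilingualSearchStage is not a driver API, and it only builds the stage object. To run the query, you would still pass this stage and the $project stage to an aggregation.

```javascript
// Hypothetical helper: rebuilds the $search stage from the query above.
// ISODate() is a mongosh shell helper, so plain JavaScript uses Date instead.
function multilingualSearchStage(term, excludeAfter, excludeBefore, preferredGenre) {
  return {
    $search: {
      index: "multilingual-tutorial",
      compound: {
        // Search the Italian-analyzed field and its English multi analyzer.
        must: [{ text: { query: term, path: { value: "fullplot", multi: "fullplot_english" } } }],
        // Exclude releases strictly between the two dates.
        mustNot: [{ range: { path: "released", gt: excludeAfter, lt: excludeBefore } }],
        // Boost, but don't require, the preferred genre.
        should: [{ text: { query: preferredGenre, path: "genres" } }],
      },
    },
  };
}

const searchStage = multilingualSearchStage(
  "Bella",
  new Date("1984-01-01T00:00:00.000Z"),
  new Date("2016-01-01T00:00:00.000Z"),
  "Comedy"
);
console.log(JSON.stringify(searchStage, null, 2));
```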