Flexible Querying With Atlas Search

Ethan Steininger • 3 min read • Published Oct 04, 2022 • Updated Jul 12, 2024

GraphQL Atlas Search JavaScript

Introduction

In this walkthrough, I will show how the flexibility of Atlas Search's inverted indexes makes them a powerful alternative to traditional B-tree indexes for supporting ad hoc queries.

What is flexible querying?

Flexible query engines provide the ability to execute a performant query that spans multiple indexes in your data store. This means you can write ad-hoc, dynamically generated queries, where you don't need to know the query, fields, or ordering of fields in advance.

Be sure to check out the MongoDB documentation on this subject!

It's very rare that MongoDB’s query planner selects a plan that involves multiple indexes. In this tutorial, we’ll walk through a scenario in which this becomes a requirement.

Your application is in a constant state of evolution

Let’s say you have a movie application with documents like:

{
  "title": "Fight Club",
  "year": 1999,
  "imdb": {
    "rating": 8.9,
    "votes": 1191784,
    "id": 137523
  },
  "cast": [
    "Edward Norton",
    "Brad Pitt"
  ]
}

Initial product requirements

For version 1.0 of the application, you need to query on title and year, so you first create a compound index:

db.movies.createIndex( { "title": 1, "year": 1 } )

Then issue the query:

db.movies.find({"title":"Fight Club", "year":1999})

When you run an explain plan, you have a perfect query with a 1:1 documents-examined to documents-returned ratio:

{
  "executionStats": {
    "executionSuccess": true,
    "nReturned": 1,
    "executionTimeMillis": 0,
    "totalKeysExamined": 1,
    "totalDocsExamined": 1
  }
}

Our query then needs to evolve

Now the application requirements have evolved, and you need to query on cast and imdb.rating. First, create the index:

db.movies.createIndex( { "cast": 1, "imdb.rating": 1 } )

Then issue the query:

db.movies.find({"cast":"Edward Norton", "imdb.rating":{ $gte:9 } })

Not the greatest documents-examined to documents-returned ratio, but still not terrible:

{
  "executionStats": {
    "executionSuccess": true,
    "nReturned": 7,
    "executionTimeMillis": 0,
    "totalKeysExamined": 17,
    "totalDocsExamined": 17
  }
}

Now our query evolves again

Now the application requires a new query that filters on only a subset of the previous query's fields:

db.movies.find({"imdb.rating" : { $gte:9 } })

The query above results in the dreaded collection scan, even though the previous compound index (cast_1_imdb.rating_1) includes the field this query filters on. This is because "imdb.rating" is not the index prefix, and the query contains no filter condition on the "cast" field.
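To make the prefix rule concrete, here is a minimal JavaScript sketch. It is a simplified model, not the MongoDB query planner: the hypothetical usablePrefix() helper walks an index's key pattern from left to right, keeps each key the query filters on, and stops at the first key it does not. An empty result means the index cannot bound the scan.

```javascript
// Simplified model of the compound-index prefix rule (illustration only).
// Returns the leading index keys the query constrains, stopping at the
// first key the query does not filter on.
function usablePrefix(indexKeys, queryFields) {
  const prefix = [];
  for (const key of indexKeys) {
    if (!queryFields.includes(key)) break; // a gap ends the usable prefix
    prefix.push(key);
  }
  return prefix;
}

// Key patterns of the two compound indexes created above.
const titleYear = ["title", "year"];
const castRating = ["cast", "imdb.rating"];

console.log(usablePrefix(titleYear, ["title", "year"]));        // [ 'title', 'year' ]
console.log(usablePrefix(castRating, ["cast", "imdb.rating"])); // [ 'cast', 'imdb.rating' ]
console.log(usablePrefix(castRating, ["imdb.rating"]));         // [] (no index bounds)
```

The real planner weighs more than this (sorts, covered queries, and multikey index bounds, for example), but the leading-key check is what explains the collection scan in this scenario.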

Note: Collection scans should be avoided. Not only do they make the cursor examine every document in the collection, which is slow, but they can also force other documents out of memory, increasing I/O pressure.

The resulting execution stats for the imdb.rating query:

{
  "executionStats": {
    "executionSuccess": true,
    "nReturned": 31,
    "executionTimeMillis": 26,
    "totalKeysExamined": 0,
    "totalDocsExamined": 23532
  }
}
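The documents-examined-to-documents-returned ratio can be computed directly from explain output. The sketch below uses a hypothetical examinedPerReturned() helper and the executionStats values from the three explain outputs above. It assumes the object shape shown in those outputs.

```javascript
// Hypothetical helper: documents examined per document returned
// (lower is better; 1 is ideal).
function examinedPerReturned(stats) {
  const { totalDocsExamined, nReturned } = stats.executionStats;
  // Avoid dividing by zero when a query returns nothing.
  return nReturned === 0 ? Infinity : totalDocsExamined / nReturned;
}

// executionStats values copied from the three explain outputs above.
const plans = {
  "title + year (compound index)": { executionStats: { nReturned: 1, totalKeysExamined: 1, totalDocsExamined: 1 } },
  "cast + imdb.rating (compound index)": { executionStats: { nReturned: 7, totalKeysExamined: 17, totalDocsExamined: 17 } },
  "imdb.rating only (collection scan)": { executionStats: { nReturned: 31, totalKeysExamined: 0, totalDocsExamined: 23532 } },
};

for (const [name, stats] of Object.entries(plans)) {
  console.log(`${name}: ${examinedPerReturned(stats).toFixed(1)} docs examined per doc returned`);
}
```

Running it prints 1.0, 2.4, and 759.1 documents examined per document returned. That puts a number on how much worse the collection scan is.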

You could certainly create a new index on just imdb.rating, which would let the query above use an index scan. However, that leaves three different indexes for the query planner to evaluate when selecting the most performant plan.

Alternatively: Atlas Search

Atlas Search is built on Apache Lucene, which uses a different index data structure (inverted indexes rather than B-tree indexes). It is purpose-built to run queries that span multiple indexed fields.

Unlike with compound indexes, the order of fields in an Atlas Search index definition does not matter: fields can be defined in any order. As a result, Atlas Search is not subject to the limitation above, where a query on only a non-prefix field of a compound index cannot use that index.
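A toy in-memory inverted index helps show why field order stops mattering. This is a hypothetical illustration, not how Lucene or Atlas Search actually store or analyze data. The buildInvertedIndex() function maps each field:value pair to the set of document ids that contain it. The search() function ANDs any combination of conditions by intersecting those sets. It supports exact matches only, and the two sample movies other than Fight Club are invented for the example.

```javascript
// Toy inverted index (hypothetical illustration, exact matches only).
// Maps "field:value" to the Set of _id values of documents containing it.
function buildInvertedIndex(docs, fields) {
  const index = new Map();
  for (const doc of docs) {
    for (const field of fields) {
      // Resolve dotted paths such as "imdb.rating".
      const raw = field.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
      if (raw === undefined) continue;
      // Array fields like cast contribute one entry per element.
      for (const value of Array.isArray(raw) ? raw : [raw]) {
        const key = `${field}:${value}`;
        if (!index.has(key)) index.set(key, new Set());
        index.get(key).add(doc._id);
      }
    }
  }
  return index;
}

// AND any subset of conditions, in any order, by intersecting posting sets.
function search(index, conditions) {
  let result = null;
  for (const [field, value] of Object.entries(conditions)) {
    const postings = index.get(`${field}:${value}`) ?? new Set();
    result = result === null
      ? new Set(postings)
      : new Set([...result].filter((id) => postings.has(id)));
  }
  return result === null ? [] : [...result].sort((a, b) => a - b);
}

// Fight Club comes from the sample document above; the other two are invented.
const movies = [
  { _id: 1, title: "Fight Club", year: 1999, imdb: { rating: 8.9 }, cast: ["Edward Norton", "Brad Pitt"] },
  { _id: 2, title: "Example Movie A", year: 1998, imdb: { rating: 7.5 }, cast: ["Edward Norton"] },
  { _id: 3, title: "Example Movie B", year: 1999, imdb: { rating: 6.1 }, cast: ["Actor X"] },
];

const index = buildInvertedIndex(movies, ["title", "year", "cast", "imdb.rating"]);

console.log(search(index, { year: 1999 }));                        // [ 1, 3 ]
console.log(search(index, { cast: "Edward Norton" }));             // [ 1, 2 ]
console.log(search(index, { year: 1999, cast: "Edward Norton" })); // [ 1 ]
console.log(search(index, { "imdb.rating": 8.9 }));                // [ 1 ]
```

Every field gets its own posting sets. A query on imdb.rating alone is therefore just as direct as a query on title and year: no field has to come first.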

Suppose instead you create a single Atlas Search index that maps all four fields above (title, year, cast, and imdb.rating):

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "string",
        "dynamic": false
      },
      "year": {
        "type": "number",
        "dynamic": false
      },
      "cast": {
        "type": "string",
        "dynamic": false
      },
      "imdb.rating": {
        "type": "number",
        "dynamic": false
      }
    }
  }
}

You can then issue a query that spans title and year through a compound must (AND) clause, roughly the equivalent of db.movies.find({"title":"Fight Club", "year":1999}):

[{
  "$search": {
    "compound": {
      "must": [{
          "text": {
            "query": "Fight Club",
            "path": "title"
          }
        },
        {
          "range": {
            "path": "year",
            "gte": 1999,
            "lte": 1999
          }
        }
      ]
    }
  }
}]

The corresponding query planner results:

{
  '$_internalSearchIdLookup': {},
  'executionTimeMillisEstimate': 6,
  'nReturned': 0
}
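Because any combination of mapped fields can be queried, a compound must clause can be generated on demand from whatever filters a client sends. The sketch below assumes a hypothetical toSearchPipeline() helper (not part of any MongoDB driver). It maps string values to the text operator, numbers to a closed range, and $gt/$gte/$lt/$lte objects to range bounds. For the title and year filter, it produces the same pipeline as the one above.

```javascript
// Hypothetical helper (not a MongoDB driver API): turn a simple find()-style
// filter into a $search pipeline whose conditions are ANDed in compound.must.
function toSearchPipeline(filter) {
  const must = Object.entries(filter).map(([path, value]) => {
    if (typeof value === "string") {
      // Strings are matched with the full-text "text" operator.
      return { text: { query: value, path } };
    }
    if (typeof value === "number") {
      // Numeric equality becomes a closed range [value, value].
      return { range: { path, gte: value, lte: value } };
    }
    if (value !== null && typeof value === "object") {
      // Copy $gt/$gte/$lt/$lte comparison operators onto range bounds.
      const bounds = {};
      for (const op of ["gt", "gte", "lt", "lte"]) {
        if (`$${op}` in value) bounds[op] = value[`$${op}`];
      }
      if (Object.keys(bounds).length > 0) return { range: { path, ...bounds } };
    }
    throw new Error(`Unsupported filter for ${path}: ${JSON.stringify(value)}`);
  });
  return [{ $search: { compound: { must } } }];
}

const titleYearQuery = toSearchPipeline({ title: "Fight Club", year: 1999 });
console.log(JSON.stringify(titleYearQuery, null, 2));

// Any subset of fields works, e.g. a client that only filters on rating.
const ratingQuery = toSearchPipeline({ "imdb.rating": { $gte: 9 } });
console.log(JSON.stringify(ratingQuery));
```

You would pass the result to db.movies.aggregate(). Keep in mind that the text operator performs analyzed full-text matching, so it is not an exact equivalent of find() equality on a string.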

Then, when you add cast and imdb.rating to the query, you still get performant results:

[{
  "$search": {
    "compound": {
      "must": [{
          "text": {
            "query": "Fight",
            "path": "title"
          }
        },
        {
          "range": {
            "path": "year",
            "gte": 1999,
            "lte": 1999
          }
        },
        {
          "text": {
            "query": "Edward Norton",
            "path": "cast"
          }
        },
        {
          "range": {
            "path": "imdb.rating",
            "gte": 9
          }
        }
      ]
    }
  }
}]

The corresponding query planner results:

{
  '$_internalSearchIdLookup': {},
  'executionTimeMillisEstimate': 6,
  'nReturned': 0
}
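With a static mapping ("dynamic": false), only mapped fields are indexed, so a generated query is useful only if every path it references appears in the index definition. The hypothetical unmappedPaths() helper below checks this. It assumes each clause has a single string path and that field names are written exactly as in the mapping, including the dotted "imdb.rating" key used in the definition above.

```javascript
// Hypothetical helper: return the paths used by a $search compound query that
// are missing from a static index mapping. Assumes every clause is a single
// operator object with one string "path".
function unmappedPaths(indexDefinition, pipeline) {
  const mapped = new Set(Object.keys(indexDefinition.mappings.fields));
  const compound = pipeline[0].$search.compound;
  // Collect clauses from all four compound clause types.
  const clauses = ["must", "mustNot", "should", "filter"].flatMap((kind) => compound[kind] ?? []);
  // Each clause looks like { text: { path, ... } } or { range: { path, ... } }.
  const paths = clauses.map((clause) => Object.values(clause)[0].path);
  return [...new Set(paths)].filter((path) => !mapped.has(path));
}

// Field names from the index definition above (leaf options omitted for brevity).
const indexDefinition = {
  mappings: {
    dynamic: false,
    fields: {
      title: { type: "string" },
      year: { type: "number" },
      cast: { type: "string" },
      "imdb.rating": { type: "number" },
    },
  },
};

// A four-clause query on title, year, cast, and imdb.rating.
const castQuery = [{
  $search: {
    compound: {
      must: [
        { text: { query: "Fight", path: "title" } },
        { range: { path: "year", gte: 1999, lte: 1999 } },
        { text: { query: "Edward Norton", path: "cast" } },
        { range: { path: "imdb.rating", gte: 9 } },
      ],
    },
  },
}];

console.log(unmappedPaths(indexDefinition, castQuery)); // []
```

Adding a clause on an unmapped field, such as a hypothetical genres path, would make the helper return [ 'genres' ]. That field would have to be added to the mapping before the query could match on it.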

This isn’t a peculiar scenario

Applications evolve as users’ expectations and requirements do. When it comes to supporting those evolving requirements, standard B-tree indexes simply cannot adapt at the rate that an inverted index can.

Use cases

Here are several examples where Atlas Search's inverted index data structures can come in handy:

GraphQL: If your database's entry point is GraphQL, where the queries are defined by the client, then you're a perfect candidate for inverted indexes.
Advanced Search: You need to expand the filtering criteria for your search bar beyond several fields.
Wildcard Search: You need to search across fields for values that match combinations of characters and wildcards.
Ad-Hoc Querying: You need to generate queries dynamically, on demand, for your clients.

Resources

Full code walkthrough via a Jupyter Notebook
