A MongoDB Vector Search query takes the form of an aggregation pipeline that uses $vectorSearch as the first
stage. This guide explains the syntax, options, and behavior of the
$vectorSearch stage.
$vectorSearch
The $vectorSearch stage performs a semantic search for a query vector on the specified field or fields. The field or fields must be indexed as the MongoDB Vector Search vector type inside a vectorSearch type index.
Note
MongoDB Vector Search supports ANN search on clusters running MongoDB v6.0.11, v7.0.2, or later and ENN search on clusters running MongoDB v6.0.16, v7.0.10, v7.3.2, or later.
Supported Clients
You can run $vectorSearch queries by using the
Atlas UI, mongosh, and any MongoDB driver.
You can also use MongoDB Vector Search with local Atlas deployments that you create with the Atlas CLI. To learn more, see Create a Local Atlas Deployment.
$vectorSearch is supported only on Atlas clusters
running the following MongoDB versions:
v6.0.11
v7.0.2 and later (including RCs).
Syntax
The field that you want to search must be indexed as the MongoDB Vector Search vector type inside a vectorSearch type index.
A $vectorSearch pipeline has the following prototype form:
{
  "$vectorSearch": {
    "exact": true | false,
    "filter": {<filter-specification>},
    "index": "<index-name>",
    "limit": <number-of-results>,
    "numCandidates": <number-of-candidates>,
    "path": "<field-to-search>",
    "queryVector": [<array-of-numbers>],
    "explainOptions": {
      "traceDocumentIds": [<array-of-documentIDs>]
    }
  }
}
Fields
The $vectorSearch stage takes a document with the following fields:
| Field | Type | Necessity | Description |
|---|---|---|---|
| exact | boolean | Optional | This is required if numCandidates is omitted. Flag that specifies whether to run ENN or ANN search. Value can be one of the following: false to run ANN search, or true to run ENN search. If omitted, defaults to false. MongoDB Vector Search supports ANN search on clusters running MongoDB v6.0.11, v7.0.2, or later and ENN search on clusters running MongoDB v6.0.16, v7.0.10, v7.3.2, or later. To learn more about these search types, see Vector Search Types. |
| filter | document | Optional | MQL expression that compares an indexed field to use as a pre-filter. You can filter on boolean, date, objectId, numeric, string, and UUID values, including arrays of these types. To learn which MQL operators MongoDB Vector Search supports in your filter, see MongoDB Vector Search Pre-Filtering. |
| index | string | Required | Name of the MongoDB Vector Search index to use. MongoDB Vector Search doesn't return results if you misspell the index name or if the specified index doesn't already exist on the cluster. |
| limit | number | Required | Number (of type int only) of documents to return in the results. |
| numCandidates | number | Conditional | This field is required if exact is false or omitted. Number of nearest neighbors to use during the search. Value must be less than or equal to (<=) 10000. You can't specify a number less than the number of documents to return (limit). We recommend that you specify a number at least 20 times higher than the number of documents to return (limit) to increase accuracy. This overrequest pattern is the recommended way to trade off latency and recall in your ANN searches, and we recommend tuning this parameter based on your specific dataset size and query requirements. To learn more about other variables that might impact this parameter, see numCandidates Selection. |
| path | string | Required | Indexed vector type field to search. |
| queryVector | array of numbers | Required | Array of numbers of float32, BSON binData vector subtype float32, or BSON binData vector subtype int1 or int8 type that represent the query vector. The array size must match the number of vector dimensions specified in the index definition for the field. You must embed your query with the same model that you used to embed the data. You can query your embeddings with full-fidelity vectors, as long as the vector subtype is the same. This is only possible with binData vectors with float32 subtype. |
| explainOptions | document | Optional | Trace a list of vectors (identified by their document IDs). |
| explainOptions.traceDocumentIds | array of document IDs | Required | List of document IDs to trace. |
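The necessity rules for these fields can be checked before a query is sent. The following is a minimal sketch, not part of any MongoDB driver: validateVectorSearchStage is a hypothetical helper, and the rule that limit can't exceed numCandidates reflects our reading of the field descriptions rather than a server-side guarantee.

```javascript
// Hypothetical helper (not a MongoDB API): returns a list of problems found
// in the document passed to the $vectorSearch stage.
function validateVectorSearchStage(stage) {
  const errors = [];

  // Fields marked Required in the fields table.
  for (const field of ["index", "limit", "path", "queryVector"]) {
    if (stage[field] === undefined) {
      errors.push(`${field} is required`);
    }
  }

  // numCandidates is Conditional: needed unless exact is true (ENN search).
  if (stage.exact !== true && stage.numCandidates === undefined) {
    errors.push("numCandidates is required when exact is false or omitted");
  }

  if (stage.limit !== undefined && !Number.isInteger(stage.limit)) {
    errors.push("limit must be an integer");
  }

  // Assumption: limit can't be larger than numCandidates when both are set.
  if (
    stage.limit !== undefined &&
    stage.numCandidates !== undefined &&
    stage.limit > stage.numCandidates
  ) {
    errors.push("limit can't exceed numCandidates");
  }

  return errors;
}

// An ANN stage without numCandidates reports one problem.
console.log(
  validateVectorSearchStage({
    index: "vector_index", // assumed index name
    path: "plot_embedding_voyage_3_large",
    queryVector: [0.1, 0.2],
    limit: 10,
  })
);
```

Running a check like this on the client only catches structural mistakes; the server remains the authority on what it accepts.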
Vector Search Types
When you define a $vectorSearch stage, you can use the
exact field to specify whether to run an ANN or ENN search.
ANN Search
For Approximate Nearest Neighbors (ANN) search, MongoDB Vector Search finds vector embeddings in your data that are closest to the vector embedding in your query based on their proximity in multi-dimensional space and based on the number of neighbors that it considers. It uses the Hierarchical Navigable Small Worlds algorithm and finds the vector embeddings most similar to the vector embedding in your query without scanning every vector. Therefore, ANN search is ideal for querying large datasets without significant filtering.
Note
Optimal recall for ANN search is typically considered to
be around 90-95% overlap in results with ENN search but
with significantly lower latency. This provides a good balance
between accuracy and performance. To achieve this with MongoDB Vector Search,
tune the numCandidates parameter
at query time.
numCandidates Selection
You must specify the numCandidates field to run ANN search.
This field determines how many nearest neighbors MongoDB Vector Search considers
during the search.
We recommend that you specify a numCandidates number at least 20
times higher than the number of documents to return (limit) to
increase accuracy and reduce discrepancies between your ENN and
ANN query results. For example, if you set limit to return
5 results, consider setting numCandidates to 100 as a
starting point. To learn more, see How to Measure the Accuracy of Your Query Results.
This overrequest pattern is the recommended way to trade off latency
and recall in your ANN searches. However, we recommend
tuning the numCandidates parameter based on your specific dataset
size and query requirements. To ensure that you get accurate results,
consider the following variables:
Index Size: Larger collections typically require higher numCandidates values to maintain recall. A collection with millions of vectors might need significantly more candidates than one with thousands of vectors.
Limit Value: Because numCandidates is highly correlated with the index size, low limit values require proportionally higher numCandidates values to maintain recall.
Vector Quantization: Quantized vectors reduce storage at the cost of accuracy. Using quantized vectors (int8 or int1 subtypes) might require higher numCandidates values compared to full precision float32 vectors to maintain similar recall.
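As a starting point, the 20x overrequest guideline can be expressed as a small calculation. This is a sketch only: suggestNumCandidates is a hypothetical helper, and the 10000 cap is an assumption about the upper bound that numCandidates accepts. Treat the result as an initial value to tune, not a final setting.

```javascript
const OVERREQUEST_FACTOR = 20;    // "at least 20 times higher" guideline
const MAX_NUM_CANDIDATES = 10000; // assumed upper bound for numCandidates

// Hypothetical helper: returns a starting numCandidates value for a limit.
function suggestNumCandidates(limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  return Math.min(limit * OVERREQUEST_FACTOR, MAX_NUM_CANDIDATES);
}

console.log(suggestNumCandidates(5));  // 100
console.log(suggestNumCandidates(10)); // 200
```

Large collections, low limit values, and quantized vectors may all call for a higher value than this baseline.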
ENN Search
For an Exact Nearest Neighbors (ENN) search, MongoDB Vector Search exhaustively searches all the indexed vector embeddings by calculating the distance between all the embeddings and finds the exact nearest neighbor for the vector embedding in your query. This is computationally intensive and might negatively impact query latency. Therefore, we recommend ENN searches for the following use cases:
You want to determine the recall and accuracy of your ANN query using the ideal, exact results for the ENN query.
You want to query fewer than 10,000 documents without having to tune the number of nearest neighbors to consider.
You want to include selective pre-filters in your query against collections where less than 5% of your data meets the given pre-filter.
If you enable automatic quantization, MongoDB Vector Search uses only the full-fidelity vectors for ENN queries.
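One way to use ENN results as ground truth is to compute how many of the exact results also appear in the ANN results for the same query. The sketch below assumes you have already run both queries and collected the returned _id values; recallAgainstExact is a hypothetical helper, and the IDs shown are made up for illustration.

```javascript
// Hypothetical helper: fraction of ENN (exact) result IDs that the ANN
// query also returned. IDs are compared by their string form.
function recallAgainstExact(annIds, ennIds) {
  if (ennIds.length === 0) {
    throw new RangeError("ENN result list must not be empty");
  }
  const annSet = new Set(annIds.map(String));
  const hits = ennIds.filter((id) => annSet.has(String(id))).length;
  return hits / ennIds.length;
}

// Illustrative IDs only: three of the four exact results were found by ANN.
const annIds = ["a1", "b2", "c3", "d4"];
const ennIds = ["a1", "b2", "c3", "e5"];
console.log(recallAgainstExact(annIds, ennIds)); // 0.75
```

If the measured recall is lower than you need, raising numCandidates in the ANN query and measuring again is the usual next step.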
Behavior
$vectorSearch must be the first stage of any pipeline where it appears.
Limitations
$vectorSearch can't be used in a view definition or in the following pipeline
stages:
[1] You can pass the results of $vectorSearch to this stage.
MongoDB Vector Search Indexing
You must index the fields to search using the $vectorSearch
stage inside a vectorSearch type index
definition. You can index the following types of fields in a MongoDB Vector Search
vectorSearch type index definition:
Fields that contain vector embeddings as vector type.
Additional fields as the filter type to enable vector search on pre-filtered data.
To learn more about these MongoDB Vector Search field types, see How to Index Fields for Vector Search.
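The two field types can be combined in a single index definition. The following sketch builds such a definition as a plain object. The helper name, the index name vector_index, the 2048 dimensions, the dotProduct similarity, and the year filter field are all assumptions for illustration; use the values that match your own embeddings and queries.

```javascript
// Hypothetical helper: builds a vectorSearch index definition with one
// vector field and any number of filter fields.
function buildVectorIndexDefinition(vectorPath, numDimensions, similarity, filterPaths = []) {
  return {
    fields: [
      { type: "vector", path: vectorPath, numDimensions, similarity },
      ...filterPaths.map((path) => ({ type: "filter", path })),
    ],
  };
}

const definition = buildVectorIndexDefinition(
  "plot_embedding_voyage_3_large",
  2048,         // assumed embedding dimensions
  "dotProduct", // assumed similarity function
  ["year"]      // assumed pre-filter field
);
console.log(JSON.stringify(definition, null, 2));

// In mongosh, the definition could then be passed to createSearchIndex:
// db.embedded_movies.createSearchIndex("vector_index", "vectorSearch", definition);
```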
MongoDB Vector Search Scoring
MongoDB Vector Search assigns a score, in a fixed range from 0 to 1
(where 0 indicates low similarity and 1 indicates high
similarity), to every document that it returns.
The score is calculated according to the similarity function that
you specify in the MongoDB Vector Search index definition. To learn more about the
similarity options you can choose from, see
About the Similarity Functions.
Each returned document includes the score as metadata. To return each
document's score along with the result set, use a
$project stage in your aggregation pipeline and configure
the score as a field to project. In the score field, specify a
$meta expression
with the value vectorSearchScore. The syntax is as follows:
db.<collection>.aggregate([
  {
    "$vectorSearch": {
      <query-syntax>
    }
  },
  {
    "$project": {
      "<field-to-include>": 1,
      "<field-to-exclude>": 0,
      "score": { "$meta": "vectorSearchScore" }
    }
  }
])
Note
You can use vectorSearchScore as a score $meta expression only after the
$vectorSearch pipeline stage. If you use
vectorSearchScore after any other query, MongoDB logs a warning
starting in MongoDB v8.2.
Note
Pre-filtering your data doesn't affect the score that MongoDB Vector Search returns
using vectorSearchScore for $vectorSearch queries.
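To build intuition for how a raw similarity value can land in the 0 to 1 range, the sketch below applies two common normalizations: (1 + cosine) / 2 for cosine similarity and 1 / (1 + distance) for Euclidean distance. These formulas are not stated in this section; they are an assumption based on our understanding of the similarity functions reference, so confirm them there before relying on exact values.

```javascript
// Dot product of two equal-length numeric arrays.
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Assumed normalization for cosine similarity: maps [-1, 1] onto [0, 1].
function cosineScore(a, b) {
  const cosine = dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
  return (1 + cosine) / 2;
}

// Assumed normalization for Euclidean distance: maps [0, inf) onto (0, 1].
function euclideanScore(a, b) {
  const distance = Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
  return 1 / (1 + distance);
}

console.log(cosineScore([1, 0], [1, 0]));  // 1 (identical direction)
console.log(cosineScore([1, 0], [-1, 0])); // 0 (opposite direction)
console.log(euclideanScore([0, 0], [0, 0])); // 1 (zero distance)
```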
MongoDB Vector Search Pre-Filtering
The $vectorSearch filter option matches BSON
boolean, date, objectId, numeric, string, and UUID values, including arrays of these types.
You must index the fields that you want to filter your data by as the filter type in a vectorSearch type index definition. Filtering your data is useful to narrow the scope of your semantic search and ensure that not all vectors are considered for comparison.
MongoDB Vector Search supports the $vectorSearch filter option for
the following MQL operators:
Note
The $vectorSearch filter option doesn't support
other query operators,
aggregation pipeline operators, or MongoDB Search operators.
Filtering Considerations
MongoDB Vector Search supports the short form of $eq. In the short form, you don't need to specify $eq in the query.
For example, consider the following filter with $eq:
"filter": { "_id": { "$eq": ObjectId("5a9427648b0beebeb69537a5") } }
This is equivalent to the following filter, which uses the short form of $eq:
"filter": { "_id": ObjectId("5a9427648b0beebeb69537a5") }
You can use the $and MQL operator to specify an array of filters in a single query.
For example, consider the following pre-filter for documents with a genres field equal to Action and a year field with the value 1999, 2000, or 2001:
"filter": { "$and": [ { "genres": "Action" }, { "year": { "$in": [ 1999, 2000, 2001 ] } } ] }
For advanced filtering capabilities such as fuzzy search, phrase matching, location filtering, and other analyzed text, use the vectorSearch operator in a $search stage.
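The equivalence between the short and explicit forms of $eq can be made concrete with a small rewrite function. This is an illustrative sketch, not something MongoDB requires you to do: expandShortEq is a hypothetical helper that only understands the $and operator and rewrites bare values into explicit $eq clauses.

```javascript
// True for plain object literals such as { $in: [...] }, false for arrays,
// primitives, and class instances such as ObjectId.
function isPlainObject(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    Object.getPrototypeOf(value) === Object.prototype
  );
}

// Hypothetical helper: rewrites short-form equality into explicit $eq.
function expandShortEq(filter) {
  const expanded = {};
  for (const [key, value] of Object.entries(filter)) {
    if (key === "$and") {
      expanded[key] = value.map((clause) => expandShortEq(clause));
    } else if (isPlainObject(value) && Object.keys(value).some((k) => k.startsWith("$"))) {
      expanded[key] = value; // already uses an explicit operator
    } else {
      expanded[key] = { $eq: value };
    }
  }
  return expanded;
}

const filter = {
  $and: [{ genres: "Action" }, { year: { $in: [1999, 2000, 2001] } }],
};
console.log(JSON.stringify(expandShortEq(filter)));
```

Both forms select the same documents, so the short form is usually preferable for readability.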
Parallel Query Execution Across Segments
We recommend dedicated Search Nodes to isolate vector search query processing. You might see improved query performance on dedicated Search Nodes, and high-CPU systems might provide a greater performance improvement. When MongoDB Vector Search runs on Search Nodes, it parallelizes query execution across segments of data.
Parallelization of query processing improves the response time in many cases, such as queries on large datasets. Using intra-query parallelism during MongoDB Vector Search query processing utilizes more resources, but improves latency for each individual query.
Note
MongoDB Vector Search doesn't guarantee that each query will run concurrently. For example, when too many concurrent queries are queued, MongoDB Vector Search might fall back to single-threaded execution.
You might see inconsistent results for the same successive queries.
To mitigate this, increase the value of numCandidates in your
$vectorSearch queries.
Examples
The following queries search the sample
sample_mflix.embedded_movies
collection using the $vectorSearch stage. The queries
search the plot_embedding_voyage_3_large field, which contains
embeddings created using the voyage-3-large embedding model from
Voyage AI.
Prerequisites
Before you run these examples, perform the following actions:
Add the sample collection to your Atlas cluster.
Create MongoDB Vector Search indexes for the collection. For instructions, see the Create a MongoDB Vector Search Index procedure and copy the configurations for the basic or filter examples in your desired language.
Note
If you use mongosh, pasting the queryVector from the sample code
into your terminal might take a while depending on your machine.
Basic ANN Example
The following query uses the $vectorSearch stage to search
the plot_embedding_voyage_3_large field using vector embeddings
for the string time travel. It considers up to 150 nearest
neighbors, and returns 10 documents in the results. The query also
specifies a $project stage to do the following:
Exclude the _id field and include only the plot and title fields in the results.
Add a field named score that shows the vector search score for each document in the results.
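A pipeline matching this description might look like the following sketch. The index name vector_index is an assumption, and queryVector must be the voyage-3-large embedding of the string time travel, which is too long to reproduce here; the short array below is a placeholder that only keeps the snippet runnable.

```javascript
// Builds the basic ANN pipeline: 150 candidates, 10 results, projected
// plot, title, and score. The index name is an assumption.
function basicAnnPipeline(queryVector) {
  return [
    {
      $vectorSearch: {
        index: "vector_index",
        path: "plot_embedding_voyage_3_large",
        queryVector,
        numCandidates: 150,
        limit: 10,
      },
    },
    {
      $project: {
        _id: 0,
        plot: 1,
        title: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}

// Placeholder vector for illustration; replace with the real embedding.
const pipeline = basicAnnPipeline([0.1, 0.2, 0.3]);
console.log(JSON.stringify(pipeline, null, 2));

// In mongosh, against the sample_mflix database:
// db.embedded_movies.aggregate(basicAnnPipeline(<embedding for "time travel">));
```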
Filtered ANN Example
The following query filters the documents for movies released between
January 01, 1955 and January 01, 1975 before performing the
semantic search against the sample vector data. It uses the
$and operator to perform a logical AND operation of the
specified dates. It then searches the
plot_embedding_voyage_3_large field in the filtered documents for
150 nearest neighbors using the vector embeddings for the string
kids adventure, and returns 10 documents in the results. The
query also specifies a $project stage to do the following:
Exclude the _id field and include only the plot, title, and year fields in the results.
Add a field named score that shows the vector search score of the documents in the results.
MongoDB Vector Search filters the documents based on the year field value that
ranges between 1955 and 1975. It returns documents that summarize
children's adventures in the plot for movies released between 1955 and
1975.
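A sketch of such a filtered query follows. It assumes an index named vector_index that also indexes year as a filter field, and it assumes the $gt and $lt range operators are available in the filter option. The queryVector placeholder stands in for the voyage-3-large embedding of the string kids adventure.

```javascript
// Builds the filtered ANN pipeline: pre-filter on year, then search
// 150 candidates and return 10 results. Index name is an assumption.
function filteredAnnPipeline(queryVector) {
  return [
    {
      $vectorSearch: {
        index: "vector_index",
        path: "plot_embedding_voyage_3_large",
        // Logical AND of the two year bounds (assumed operators: $gt, $lt).
        filter: {
          $and: [{ year: { $gt: 1955 } }, { year: { $lt: 1975 } }],
        },
        queryVector,
        numCandidates: 150,
        limit: 10,
      },
    },
    {
      $project: {
        _id: 0,
        plot: 1,
        title: 1,
        year: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}

// Placeholder vector for illustration; replace with the real embedding.
const filteredPipeline = filteredAnnPipeline([0.1, 0.2, 0.3]);
console.log(JSON.stringify(filteredPipeline[0].$vectorSearch.filter));

// In mongosh: db.embedded_movies.aggregate(filteredAnnPipeline(<embedding>));
```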
Tip
Additional Filter Examples
The How to Perform Semantic Search Against Data in Your Atlas Cluster tutorial demonstrates other
pre-filters in semantic search queries against the embedded data in
the sample_mflix.embedded_movies collection.
ENN Example
The following query uses the $vectorSearch stage to search
the plot_embedding_voyage_3_large field using vector embeddings
for the string world war. It requests exact matches and limits the
results to 10 documents only. The query also specifies a
$project stage to do the following:
Exclude the _id field and include only the plot, title, and year fields in the results.
Add a field named score that shows the vector search score of the documents in the results.
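An ENN query of this shape sets exact to true and omits numCandidates. The sketch below again assumes an index named vector_index and uses a short placeholder in place of the voyage-3-large embedding of the string world war.

```javascript
// Builds the ENN pipeline: exact search, 10 results, no numCandidates.
// The index name is an assumption.
function ennPipeline(queryVector) {
  return [
    {
      $vectorSearch: {
        index: "vector_index",
        path: "plot_embedding_voyage_3_large",
        queryVector,
        exact: true,
        limit: 10,
      },
    },
    {
      $project: {
        _id: 0,
        plot: 1,
        title: 1,
        year: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}

// Placeholder vector for illustration; replace with the real embedding.
const exactPipeline = ennPipeline([0.1, 0.2, 0.3]);
console.log(JSON.stringify(exactPipeline, null, 2));

// In mongosh: db.embedded_movies.aggregate(ennPipeline(<embedding>));
```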