
Run Vector Search Queries

A MongoDB Vector Search query takes the form of an aggregation pipeline that uses $vectorSearch as the first stage. This guide explains the syntax, options, and behavior of the $vectorSearch stage.

$vectorSearch

The $vectorSearch stage performs a semantic search for a query vector on the specified field or fields. The field or fields must be indexed as the MongoDB Vector Search vector type inside a vectorSearch type index.

Note

MongoDB Vector Search supports ANN search on clusters running MongoDB v6.0.11, v7.0.2, or later and ENN search on clusters running MongoDB v6.0.16, v7.0.10, v7.3.2, or later.

You can run $vectorSearch queries by using the Atlas UI, mongosh, and any MongoDB driver.

You can also use MongoDB Vector Search with local Atlas deployments that you create with the Atlas CLI. To learn more, see Create a Local Atlas Deployment.

$vectorSearch is supported only on Atlas clusters running the following MongoDB versions:

  • v6.0.11

  • v7.0.2 and later (including RCs).

The field that you want to search must be indexed as MongoDB Vector Search vector type inside a vectorSearch index type.

A $vectorSearch pipeline has the following prototype form:

{
  "$vectorSearch": {
    "exact": true | false,
    "filter": {<filter-specification>},
    "index": "<index-name>",
    "limit": <number-of-results>,
    "numCandidates": <number-of-candidates>,
    "path": "<field-to-search>",
    "queryVector": [<array-of-numbers>],
    "explainOptions": {
      "traceDocumentIds": [<array-of-documentIDs>]
    }
  }
}

The $vectorSearch stage takes a document with the following fields:

Field
Type
Necessity
Description

exact

boolean

Optional

This is required if numCandidates is omitted.

Flag that specifies whether to run ENN or ANN search. Value can be one of the following:

  • false - to run ANN search

  • true - to run ENN search

If omitted, defaults to false.

MongoDB Vector Search supports ANN search on clusters running MongoDB v6.0.11, v7.0.2, or later and ENN search on clusters running MongoDB v6.0.16, v7.0.10, v7.3.2, or later.

To learn more about these search types, see Vector Search Types.

filter

document

Optional

MQL expression that compares an indexed field to use as a pre-filter. You can filter on boolean, date, objectId, numeric, string, and UUID values, including arrays of these types.

To learn which MQL operators MongoDB Vector Search supports in your filter, see MongoDB Vector Search Pre-Filtering.

index

string

Required

Name of the MongoDB Vector Search index to use.

MongoDB Vector Search doesn't return results if you misspell the index name or if the specified index doesn't already exist on the cluster.

limit

number

Required

Number (of type int only) of documents to return in the results. This value can't exceed the value of numCandidates if you specify numCandidates.

numCandidates

number

Conditional

This field is required if exact is false or omitted.

Number of nearest neighbors to use during the search. Value must be less than or equal to (<=) 10000. You can't specify a number less than the number of documents to return (limit).

We recommend that you specify a number at least 20 times higher than the number of documents to return (limit) to increase accuracy.

This overrequest pattern is the recommended way to trade off latency and recall in your ANN searches, and we recommend tuning this parameter based on your specific dataset size and query requirements.

To learn more about other variables that might impact this parameter, see numCandidates Selection.

path

string

Required

Indexed vector type field to search.

queryVector

array of numbers

Required

Query vector, given as an array of float32 numbers, a BSON BinData vector with subtype float32, or a BSON BinData vector with subtype int1 or int8.

To learn more about generating BSON BinData vectors with subtype float32, int8, or int1, see How to Ingest Pre-Quantized Vectors.

The array size must match the number of vector dimensions specified in the index definition for the field.

You must embed your query with the same model that you used to embed the data.

You can query your embeddings with full-fidelity vectors, as long as the vector subtype is the same. This is only possible with BinData vectors with subtype float32. If you use any other subtype (int8 or int1), MongoDB Vector Search returns neither results nor an error.

explainOptions

document

Optional

Trace a list of vectors (identified by their _id) in an explain executionStats query. You can't use this option without explain. To learn more, see Explain MongoDB Vector Search Results.

explainOptions.traceDocumentIds

array of document IDs

Required

List of document _ids.
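
The option rules in the table above can be checked on the client before a query is sent. The following is a minimal sketch, not part of any MongoDB driver: buildVectorSearchStage is a hypothetical helper name, and for simplicity it accepts only plain arrays for queryVector (not BinData vectors) and omits numCandidates whenever exact is true.

```javascript
// Hypothetical helper (not a MongoDB API): assembles a $vectorSearch stage
// and enforces the constraints described in the field table.
function buildVectorSearchStage({ index, path, queryVector, limit, numCandidates, exact = false, filter }) {
  if (!index || !path) {
    throw new Error("index and path are required");
  }
  // Simplification: this sketch accepts only plain arrays, not BinData vectors.
  if (!Array.isArray(queryVector) || queryVector.length === 0) {
    throw new Error("queryVector must be a non-empty array of numbers");
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }

  const stage = { index, path, queryVector, limit };

  if (exact) {
    // ENN search: numCandidates is not needed, so this sketch leaves it out.
    stage.exact = true;
  } else {
    // ANN search: numCandidates is required, must be <= 10000 and >= limit.
    if (!Number.isInteger(numCandidates)) {
      throw new Error("numCandidates is required when exact is false or omitted");
    }
    if (numCandidates > 10000) {
      throw new Error("numCandidates must be less than or equal to 10000");
    }
    if (numCandidates < limit) {
      throw new Error("numCandidates can't be less than limit");
    }
    stage.numCandidates = numCandidates;
  }

  if (filter) {
    stage.filter = filter;
  }
  return { $vectorSearch: stage };
}
```

A call such as buildVectorSearchStage({ index: "<index-name>", path: "<field-to-search>", queryVector: [0.1, 0.2], limit: 10, numCandidates: 200 }) returns a stage document that you can place first in an aggregation pipeline.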

When you define a $vectorSearch stage, you can use the exact field to specify whether to run an ANN or ENN search.

For Approximate Nearest Neighbors (ANN) search, MongoDB Vector Search finds vector embeddings in your data that are closest to the vector embedding in your query based on their proximity in multi-dimensional space and based on the number of neighbors that it considers. It uses the Hierarchical Navigable Small Worlds algorithm and finds the vector embeddings most similar to the vector embedding in your query without scanning every vector. Therefore, ANN search is ideal for querying large datasets without significant filtering.

Note

Optimal recall for ANN search is typically considered to be around 90-95% overlap in results with ENN search but with significantly lower latency. This provides a good balance between accuracy and performance. To achieve this with MongoDB Vector Search, tune the numCandidates parameter at query time.

You must specify the numCandidates field to run ANN search. This field determines how many nearest neighbors MongoDB Vector Search considers during the search.

We recommend that you specify a numCandidates number at least 20 times higher than the number of documents to return (limit) to increase accuracy and reduce discrepancies between your ENN and ANN query results. For example, if you set limit to return 5 results, consider setting numCandidates to 100 as a starting point. To learn more, see How to Measure the Accuracy of Your Query Results.

This overrequest pattern is the recommended way to trade off latency and recall in your ANN searches. However, we recommend tuning the numCandidates parameter based on your specific dataset size and query requirements. To ensure that you get accurate results, consider the following variables:

  • Index Size: Larger collections typically require higher numCandidates values to maintain recall. A collection with millions of vectors might need significantly more candidates than one with thousands of vectors.

  • Limit Value: Because numCandidates is highly correlated with the index size, low limit values require proportionally higher numCandidates values to maintain recall.

  • Vector Quantization: Quantized vectors reduce storage at the cost of accuracy. Using quantized vectors (int8 or int1 subtypes) might require higher numCandidates values compared to full precision float32 vectors to maintain similar recall.
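
As a rough illustration of the 20x starting point and the 10000 ceiling described above, the arithmetic might look like the following sketch. suggestNumCandidates and its multiplier parameter are hypothetical names, and the result is only an initial value to tune against your own data, not a recommendation for every workload.

```javascript
// Upper bound on numCandidates stated in the field table.
const MAX_NUM_CANDIDATES = 10000;

// Hypothetical helper: returns a starting numCandidates value for a given limit.
// The default multiplier of 20 follows the guidance above; raise it for larger
// indexes, low limit values, or quantized vectors.
function suggestNumCandidates(limit, multiplier = 20) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  const requested = Math.ceil(limit * multiplier);
  // Never go below limit and never exceed the documented maximum.
  return Math.min(MAX_NUM_CANDIDATES, Math.max(limit, requested));
}

console.log(suggestNumCandidates(5));    // 100
console.log(suggestNumCandidates(1000)); // 10000 (capped at the maximum)
```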

For an Exact Nearest Neighbors (ENN) search, MongoDB Vector Search exhaustively searches all the indexed vector embeddings by calculating the distance between all the embeddings and finds the exact nearest neighbor for the vector embedding in your query. This is computationally intensive and might negatively impact query latency. Therefore, we recommend ENN searches for the following use-cases:

  • You want to determine the recall and accuracy of your ANN query using the ideal, exact results for the ENN query.

  • You want to query fewer than 10000 documents without having to tune the number of nearest neighbors to consider.

  • You want to include selective pre-filters in your query against collections where less than 5% of your data meets the given pre-filter.

If you enable automatic quantization, MongoDB Vector Search uses only the full-fidelity vectors for ENN queries.
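
To use ENN results as the baseline for an ANN query, one simple approach is to run both queries with the same limit and compare the returned _id values. The sketch below assumes you have already collected the two ID lists. recallAgainstEnn is a hypothetical helper, and it compares IDs by their string form, which is an assumption that suits ObjectId and string IDs.

```javascript
// Hypothetical helper: fraction of the exact (ENN) result IDs that also
// appear in the approximate (ANN) result IDs.
function recallAgainstEnn(annIds, ennIds) {
  if (ennIds.length === 0) {
    throw new Error("ennIds must contain at least one ID");
  }
  // Compare by string form so that equal IDs match regardless of object identity.
  const annSet = new Set(annIds.map((id) => String(id)));
  const hits = ennIds.filter((id) => annSet.has(String(id))).length;
  return hits / ennIds.length;
}

// Illustrative IDs only: four of the five exact results were also found by ANN.
const annIds = ["a", "b", "c", "d", "x"];
const ennIds = ["a", "b", "c", "d", "e"];
console.log(recallAgainstEnn(annIds, ennIds)); // 0.8
```

If the value is lower than you need, increasing numCandidates in the ANN query and measuring again is the usual next step.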

  • $vectorSearch must be the first stage of any pipeline where it appears.

$vectorSearch can't be used in a view definition or in the following pipeline stages:

[1] You can pass the results of $vectorSearch to this stage.

You must index the fields to search using the $vectorSearch stage inside a vectorSearch type index definition. You can index the following types of fields in a MongoDB Vector Search vectorSearch type index definition:

  • Fields that contain vector embeddings as vector type.

  • Additional fields as the filter type to enable vector search on pre-filtered data.

To learn more about these MongoDB Vector Search field types, see How to Index Fields for Vector Search.

MongoDB Vector Search assigns a score, in a fixed range from 0 to 1 (where 0 indicates low similarity and 1 indicates high similarity), to every document that it returns.

The score is calculated according to the similarity function that you specify in the MongoDB Vector Search index definition. To learn more about the similarity options you can choose from, see About the Similarity Functions.

Each returned document includes the score as metadata. To return each document's score along with the result set, use a $project stage in your aggregation pipeline and configure the score as a field to project. In the score field, specify a $meta expression with the value vectorSearchScore. The syntax is as follows:

db.<collection>.aggregate([
  {
    "$vectorSearch": {
      <query-syntax>
    }
  },
  {
    "$project": {
      "<field-to-include>": 1,
      "<field-to-exclude>": 0,
      "score": { "$meta": "vectorSearchScore" }
    }
  }
])

Note

You can use vectorSearchScore as a score $meta expression only after the $vectorSearch pipeline stage. If you use vectorSearchScore after any other query, MongoDB logs a warning starting in MongoDB v8.2.

Note

Pre-filtering your data doesn't affect the score that MongoDB Vector Search returns using vectorSearchScore for $vectorSearch queries.

The $vectorSearch filter option matches BSON boolean, date, objectId, numeric, string, and UUID values, including arrays of these types.

You must index the fields that you want to filter your data by as the filter type in a vectorSearch type index definition. Filtering your data is useful to narrow the scope of your semantic search and ensure that not all vectors are considered for comparison.

MongoDB Vector Search supports the $vectorSearch filter option for the following MQL operators:

Type
MQL operator

Equals

Range

In set

Logical

  • MongoDB Vector Search supports the short form of $eq. In the short form, you don't need to specify $eq in the query.

    For example, consider the following filter with $eq:

    "filter": { "_id": { "$eq": ObjectId("5a9427648b0beebeb69537a5") } }

    This is equivalent to the following filter, which uses the short form of $eq:

    "filter": { "_id": ObjectId("5a9427648b0beebeb69537a5") }
  • You can use the $and MQL operator to specify an array of filters in a single query.

    For example, consider the following pre-filter for documents with a genres field equal to Action and a year field with the value 1999, 2000, or 2001:

    "filter": {
      "$and": [
        { "genres": "Action" },
        { "year": { "$in": [ 1999, 2000, 2001 ] } }
      ]
    }
  • For advanced filtering capabilities such as fuzzy search, phrase matching, location filtering, and other analyzed text, use the vectorSearch operator in a $search stage.

We recommend dedicated Search Nodes to isolate vector search query processing, which might improve query performance. High-CPU systems might provide a larger improvement. When MongoDB Vector Search runs on Search Nodes, it parallelizes query execution across segments of data.

Parallelization of query processing improves the response time in many cases, such as queries on large datasets. Using intra-query parallelism during MongoDB Vector Search query processing utilizes more resources, but improves latency for each individual query.

Note

MongoDB Vector Search doesn't guarantee that each query will run concurrently. For example, when too many concurrent queries are queued, MongoDB Vector Search might fall back to single-threaded execution.

You might see inconsistent results across successive runs of the same query. To mitigate this, increase the value of numCandidates in your $vectorSearch queries.

The following queries search the sample_mflix.embedded_movies sample collection using the $vectorSearch stage. The queries search the plot_embedding_voyage_3_large field, which contains embeddings created using the voyage-3-large embedding model from Voyage AI.

Before you run these examples, perform the following actions:

Note

If you use mongosh, pasting the queryVector from the sample code into your terminal might take a while depending on your machine.

The following query uses the $vectorSearch stage to search the plot_embedding_voyage_3_large field using vector embeddings for the string time travel. It considers up to 150 nearest neighbors, and returns 10 documents in the results. The query also specifies a $project stage to do the following:

  • Exclude the _id field and include only the plot and title fields in the results.

  • Add a field named score that shows the vector search score for each document in the results.
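
A minimal sketch of the query described above might look like the following. The index name vector_index is an assumption (use the name of your own vectorSearch index), buildAnnPipeline is a hypothetical helper, and the query vector is passed in as a parameter because the full embedding for time travel isn't reproduced here.

```javascript
// Sketch only: "vector_index" is an assumed index name.
// queryVector must be the voyage-3-large embedding for the string "time travel".
function buildAnnPipeline(queryVector) {
  return [
    {
      $vectorSearch: {
        index: "vector_index",
        path: "plot_embedding_voyage_3_large",
        queryVector: queryVector,
        numCandidates: 150, // nearest neighbors to consider
        limit: 10,          // documents to return
      },
    },
    {
      $project: {
        _id: 0,
        plot: 1,
        title: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}

// In mongosh, after selecting the sample_mflix database:
// db.embedded_movies.aggregate(buildAnnPipeline(<time-travel-embedding>));
```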

The following query filters the documents for movies released between January 01, 1955 and January 01, 1975 before performing the semantic search against the sample vector data. It uses the $and operator to perform a logical AND operation of the specified dates. It then searches the plot_embedding_voyage_3_large field in the filtered documents for 150 nearest neighbors using the vector embeddings for the string kids adventure, and returns 10 documents in the results. The query also specifies a $project stage to do the following:

  • Exclude the _id field and include only plot, title, and year fields in the results.

  • Add a field named score that shows the vector search score of the documents in the results.

MongoDB Vector Search filters the documents based on the year field value that ranges between 1955 and 1975. It returns documents that summarize children's adventures in the plot for movies released between 1955 and 1975.
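
The pre-filtered query described above could be sketched as follows. Several details are assumptions: the index name vector_index, the helper name buildFilteredPipeline, the use of exclusive bounds ($gt and $lt) on a numeric year field, and that year is indexed as the filter type in the same index. The query vector for kids adventure is passed in as a parameter rather than reproduced.

```javascript
// Sketch only: index name, helper name, and exclusive year bounds are assumptions.
// The year field must be indexed as the filter type for this pre-filter to work.
function buildFilteredPipeline(queryVector) {
  return [
    {
      $vectorSearch: {
        index: "vector_index",
        path: "plot_embedding_voyage_3_large",
        // Pre-filter: logical AND of the lower and upper year bounds.
        filter: {
          $and: [
            { year: { $gt: 1955 } },
            { year: { $lt: 1975 } },
          ],
        },
        queryVector: queryVector,
        numCandidates: 150,
        limit: 10,
      },
    },
    {
      $project: {
        _id: 0,
        plot: 1,
        title: 1,
        year: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}

// In mongosh, after selecting the sample_mflix database:
// db.embedded_movies.aggregate(buildFilteredPipeline(<kids-adventure-embedding>));
```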

Tip

Additional Filter Examples

The How to Perform Semantic Search Against Data in Your Atlas Cluster tutorial demonstrates other pre-filters in semantic search queries against the embedded data in the sample_mflix.embedded_movies collection.

The following query uses the $vectorSearch stage to search the plot_embedding_voyage_3_large field using vector embeddings for the string world war. It runs an ENN (exact) search and limits the results to 10 documents. The query also specifies a $project stage to do the following:

  • Exclude the _id field and include only the plot, title, and year fields in the results.

  • Add a field named score that shows the vector search score of the documents in the results.
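
An ENN version of the query could be sketched as shown below. As before, vector_index is an assumed index name, buildEnnPipeline is a hypothetical helper, and the embedding for world war is supplied by the caller. Because exact is true, the sketch omits numCandidates.

```javascript
// Sketch only: "vector_index" is an assumed index name.
// exact: true requests an ENN search, so numCandidates is left out.
function buildEnnPipeline(queryVector) {
  return [
    {
      $vectorSearch: {
        index: "vector_index",
        path: "plot_embedding_voyage_3_large",
        queryVector: queryVector,
        exact: true,
        limit: 10,
      },
    },
    {
      $project: {
        _id: 0,
        plot: 1,
        title: 1,
        year: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}

// In mongosh, after selecting the sample_mflix database:
// db.embedded_movies.aggregate(buildEnnPipeline(<world-war-embedding>));
```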
