
Text Embeddings

Voyage AI's text embedding models convert text into high-dimensional vectors that capture semantic meaning. The models are inherently multilingual: texts with similar meaning map to similar vectors regardless of the language they are written in. Use the following models to power your AI search applications with state-of-the-art retrieval accuracy.
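Semantic similarity between two embeddings is commonly measured with cosine similarity. The following is a minimal sketch in plain Python. The three-dimensional vectors are made-up stand-ins for real model output, which has hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (illustrative values only).
query = [0.2, 0.8, 0.1]
doc_close = [0.25, 0.75, 0.05]
doc_far = [0.9, -0.1, 0.4]

# The document pointing in a similar direction scores higher.
print(cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far))  # True
```

A vector search index ranks stored document embeddings by a score like this one against the query embedding.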

Voyage AI provides the following text embedding models:

General-Purpose Models

| Model | Context Length | Dimensions | Description |
| --- | --- | --- | --- |
| voyage-4-large | 32,000 tokens | 1024 (default), 256, 512, 2048 | The best general-purpose and multilingual retrieval quality. All embeddings created with the 4 series are compatible with each other. To learn more, see the blog post. |
| voyage-4 | 32,000 tokens | 1024 (default), 256, 512, 2048 | Optimized for general-purpose and multilingual retrieval quality. All embeddings created with the 4 series are compatible with each other. To learn more, see the blog post. |
| voyage-4-lite | 32,000 tokens | 1024 (default), 256, 512, 2048 | Optimized for latency and cost. All embeddings created with the 4 series are compatible with each other. To learn more, see the blog post. |
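Choosing an output dimension affects storage. The following is a minimal sketch that validates a requested dimension against the values in the table above and estimates raw vector size. It assumes each component is stored as a 32-bit float; the result is a lower bound that ignores how your database actually encodes vectors, plus index and metadata overhead. The helper names are hypothetical and not part of any Voyage AI or MongoDB API.

```python
from typing import Optional

# Output dimensions for the general-purpose 4-series models, copied from the
# table above. The first entry in each tuple is the default.
SUPPORTED_DIMENSIONS = {
    "voyage-4-large": (1024, 256, 512, 2048),
    "voyage-4": (1024, 256, 512, 2048),
    "voyage-4-lite": (1024, 256, 512, 2048),
}

# Assumption: each vector component is stored as a 32-bit float.
BYTES_PER_FLOAT32 = 4


def resolve_dimension(model: str, requested: Optional[int] = None) -> int:
    """Return the model's default dimension, or the requested one after checking support."""
    dims = SUPPORTED_DIMENSIONS[model]
    if requested is None:
        return dims[0]
    if requested not in dims:
        raise ValueError(
            f"{model} does not support {requested} dimensions; choose from {sorted(dims)}"
        )
    return requested


def storage_bytes(model: str, num_vectors: int, requested: Optional[int] = None) -> int:
    """Raw vector size only; ignores index structures, metadata, and database overhead."""
    return num_vectors * resolve_dimension(model, requested) * BYTES_PER_FLOAT32


# Compare the raw footprint of one million vectors at each supported dimension.
for dim in sorted(SUPPORTED_DIMENSIONS["voyage-4"]):
    mib = storage_bytes("voyage-4", 1_000_000, dim) / 2**20
    print(f"{dim:>4} dimensions: ~{mib:,.0f} MiB")
```

Whatever dimension you choose, embed documents and queries at the same dimension so their vectors can be compared.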

Domain-Specific Models

| Model | Context Length | Dimensions | Description |
| --- | --- | --- | --- |
| voyage-code-3 | 32,000 tokens | 1024 (default), 256, 512, 2048 | Optimized for code retrieval and documentation. To learn more, see the blog post. |
| voyage-finance-2 | 32,000 tokens | 1024 | Optimized for finance retrieval and RAG applications. To learn more, see the blog post. |
| voyage-law-2 | 16,000 tokens | 1024 | Optimized for legal retrieval and RAG applications. To learn more, see the blog post. |
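Context length differs across the domain-specific models, so a document that fits one model may need to be split for another. The following is a minimal sketch that checks a token count against the limits in the table above. It assumes you already have the token count from the model's tokenizer (this sketch does not compute it), and the domain-to-model routing is a hypothetical example, not a Voyage AI API.

```python
import math

# Context length per domain-specific model, copied from the table above.
CONTEXT_LENGTH_TOKENS = {
    "voyage-code-3": 32_000,
    "voyage-finance-2": 32_000,
    "voyage-law-2": 16_000,
}

# Hypothetical routing from a corpus domain to a model; adapt to your data.
DOMAIN_TO_MODEL = {
    "code": "voyage-code-3",
    "finance": "voyage-finance-2",
    "legal": "voyage-law-2",
}


def fits_context(domain: str, token_count: int) -> bool:
    """True if a document of token_count tokens fits the routed model's context window."""
    return token_count <= CONTEXT_LENGTH_TOKENS[DOMAIN_TO_MODEL[domain]]


def chunks_needed(domain: str, token_count: int) -> int:
    """Minimum chunk count, assuming non-overlapping chunks split at any token boundary."""
    limit = CONTEXT_LENGTH_TOKENS[DOMAIN_TO_MODEL[domain]]
    return max(1, math.ceil(token_count / limit))


# A 20,000-token contract exceeds voyage-law-2's window but fits voyage-finance-2's.
print(fits_context("legal", 20_000), chunks_needed("legal", 20_000))
print(fits_context("finance", 20_000), chunks_needed("finance", 20_000))
```

In practice, chunking strategies often add overlap or split on semantic boundaries, which can raise the chunk count above this minimum.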

Open Models

| Model | Context Length | Dimensions | Description |
| --- | --- | --- | --- |
| voyage-4-nano | 32,000 tokens | 512 (default), 128, 256 | Open-weight model available on Hugging Face. All embeddings created with the 4 series are compatible with each other. To learn more, see the blog post. |
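Compatibility across the 4 series still requires vectors of the same length to compare them. The following is a minimal sketch, assuming "compatible" means embeddings from different 4-series models can be compared directly when their dimensions match. It finds the dimensions usable when documents and queries are embedded by different 4-series models, using the values from the tables on this page. The function name is hypothetical.

```python
from typing import Dict, FrozenSet, List

# Supported output dimensions for the 4-series models, copied from the tables on this page.
DIMENSIONS: Dict[str, FrozenSet[int]] = {
    "voyage-4-large": frozenset({256, 512, 1024, 2048}),
    "voyage-4": frozenset({256, 512, 1024, 2048}),
    "voyage-4-lite": frozenset({256, 512, 1024, 2048}),
    "voyage-4-nano": frozenset({128, 256, 512}),
}


def shared_dimensions(doc_model: str, query_model: str) -> List[int]:
    """Dimensions both models support; vectors must be the same length to be compared."""
    if doc_model not in DIMENSIONS or query_model not in DIMENSIONS:
        raise ValueError("cross-model compatibility is documented only for the 4 series")
    return sorted(DIMENSIONS[doc_model] & DIMENSIONS[query_model])


# Example: index documents with voyage-4-large, embed queries locally with voyage-4-nano.
print(shared_dimensions("voyage-4-large", "voyage-4-nano"))  # [256, 512]
```

Note that the defaults differ: voyage-4-nano defaults to 512 dimensions while the other 4-series models default to 1024, so a mixed setup must set the dimension explicitly on at least one side.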

The following older models are still available through our API, but we recommend the newer models above for better quality and efficiency.

Our latest models outperform these legacy models in quality, context length, latency, and throughput.

| Model | Context Length | Dimensions | Description |
| --- | --- | --- | --- |
| voyage-3-large | 32,000 tokens | 1024 (default), 256, 512, 2048 | Previous generation of text embeddings for general-purpose and multilingual retrieval quality. To learn more, see the blog post. |
| voyage-3.5 | 32,000 tokens | 1024 (default), 256, 512, 2048 | Previous generation of text embeddings optimized for general-purpose and multilingual retrieval quality. To learn more, see the blog post. |
| voyage-3.5-lite | 32,000 tokens | 1024 (default), 256, 512, 2048 | Previous generation of text embeddings optimized for latency and cost. To learn more, see the blog post. |
| voyage-code-2 | 16,000 tokens | 1536 | Optimized for code retrieval (17% better than alternatives). Previous generation of code embeddings. To learn more, see the blog post. |

For tutorials on using text embeddings, see the following resources: