Text Embeddings

Voyage AI's text embedding models convert your text into high-dimensional vectors that capture semantic meaning. The models are inherently multilingual: semantic similarity between texts holds regardless of the language they are written in. Use the following models to power your AI search applications with state-of-the-art retrieval accuracy.
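To make "vectors that capture semantic meaning" concrete, here is a minimal sketch of how embedding vectors are commonly compared in search applications. It assumes cosine similarity as the metric, which this page does not specify, and uses made-up 4-dimensional vectors rather than real model output. The names `cosine_similarity`, `query`, and `docs` are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration only (assumption); real embeddings
# have hundreds or thousands of dimensions.
query = [0.9, 0.1, 0.0, 0.2]
docs = {
    "doc_similar": [0.8, 0.2, 0.1, 0.3],
    "doc_unrelated": [0.0, 0.9, 0.4, 0.0],
}

# Rank documents from most to least similar to the query.
ranked = sorted(
    docs,
    key=lambda name: cosine_similarity(query, docs[name]),
    reverse=True,
)
print(ranked)
```

In a real application the vectors would come from one of the models listed below, and the ranking step would typically be handled by a vector database rather than an in-memory sort.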

Available Models

Voyage AI provides the following text embedding models:

General Purpose Models

Model	Context Length	Dimensions	Description
`voyage-4-large`	32,000 tokens	1024 (default), 256, 512, 2048	The best general-purpose and multilingual retrieval quality. All embeddings created with the 4 series are compatible with each other. To learn more, see the blog post.
`voyage-4`	32,000 tokens	1024 (default), 256, 512, 2048	Optimized for general-purpose and multilingual retrieval quality. All embeddings created with the 4 series are compatible with each other. To learn more, see the blog post.
`voyage-4-lite`	32,000 tokens	1024 (default), 256, 512, 2048	Optimized for latency and cost. All embeddings created with the 4 series are compatible with each other. To learn more, see the blog post.

Domain-Specific Models

Model	Context Length	Dimensions	Description
`voyage-code-3`	32,000 tokens	1024 (default), 256, 512, 2048	Optimized for code retrieval and documentation. To learn more, see the blog post.
`voyage-finance-2`	32,000 tokens	1024	Optimized for finance retrieval and RAG applications. To learn more, see the blog post.
`voyage-law-2`	16,000 tokens	1024	Optimized for legal retrieval and RAG applications. To learn more, see the blog post.

Open Models

Model	Context Length	Dimensions	Description
`voyage-4-nano`	32,000 tokens	512 (default), 128, 256	Open-weight model available on Hugging Face. All embeddings created with the 4 series are compatible with each other. To learn more, see the blog post.
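Because each model accepts only certain output dimensions, it can help to validate a requested dimension before sending anything to the API. The sketch below copies the dimensions from the tables above and builds a request body without making a network call. The helper `build_embedding_request` is hypothetical, and the field names `input`, `model`, and `output_dimension` are assumptions about the request schema, so check them against the API reference before relying on them.

```python
import json

# Dimensions copied from the tables above; the first entry is the default.
SUPPORTED_DIMENSIONS = {
    "voyage-4-large": [1024, 256, 512, 2048],
    "voyage-4": [1024, 256, 512, 2048],
    "voyage-4-lite": [1024, 256, 512, 2048],
    "voyage-code-3": [1024, 256, 512, 2048],
    "voyage-finance-2": [1024],
    "voyage-law-2": [1024],
    # Open-weight model; whether it is served by the API is not stated here.
    "voyage-4-nano": [512, 128, 256],
}


def build_embedding_request(texts, model, output_dimension=None):
    """Build a request body (hypothetical helper; field names are assumed)."""
    if model not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"unknown model: {model}")
    allowed = SUPPORTED_DIMENSIONS[model]
    if output_dimension is not None and output_dimension not in allowed:
        raise ValueError(
            f"{model} supports dimensions {sorted(allowed)}, "
            f"got {output_dimension}"
        )
    body = {"input": list(texts), "model": model}
    # Omit the field entirely to fall back to the model's default dimension.
    if output_dimension is not None:
        body["output_dimension"] = output_dimension
    return body


request_body = build_embedding_request(
    ["How do I reset my password?"],
    model="voyage-4-lite",
    output_dimension=512,
)
print(json.dumps(request_body, indent=2))
```

Smaller dimensions reduce storage and search cost, so a check like this catches unsupported combinations (for example, 2048 with `voyage-finance-2`) before they reach the service.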

Older Models

The following older models are still accessible from our API, but we recommend using the new models above for better quality and efficiency.

Our latest models outperform these older models in every respect, including quality, context length, latency, and throughput.

Model	Context Length	Dimensions	Description
`voyage-3-large`	32,000 tokens	1024 (default), 256, 512, 2048	Previous generation of text embeddings for general-purpose and multilingual retrieval quality. To learn more, see the blog post.
`voyage-3.5`	32,000 tokens	1024 (default), 256, 512, 2048	Previous generation of text embeddings optimized for general-purpose and multilingual retrieval quality. To learn more, see the blog post.
`voyage-3.5-lite`	32,000 tokens	1024 (default), 256, 512, 2048	Previous generation of text embeddings optimized for latency and cost. To learn more, see the blog post.
`voyage-code-2`	16,000 tokens	1536	Optimized for code retrieval (17% better than alternatives). Previous generation of code embeddings. To learn more, see the blog post.

Tutorials

For tutorials on using text embeddings, see the following resources:

Usage
