Voyage AI's text embedding models convert your text into high-dimensional vectors that capture semantic meaning. The models are inherently multilingual: texts with similar meaning produce similar embeddings regardless of the language they are written in. Use the following models to power your AI search applications with state-of-the-art retrieval accuracy.
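Once texts are embedded, semantic similarity is typically measured as the cosine similarity between their vectors: texts that mean similar things score close to 1, unrelated texts score lower. A minimal sketch with toy three-dimensional vectors standing in for real model output (actual embeddings have hundreds or thousands of dimensions, per the tables below):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings.
query = [0.1, 0.9, 0.2]
doc_similar = [0.15, 0.85, 0.25]   # hypothetically close in meaning to the query
doc_unrelated = [0.9, 0.1, 0.1]    # hypothetically unrelated

# The semantically closer document scores higher.
print(cosine_similarity(query, doc_similar) > cosine_similarity(query, doc_unrelated))
```

In a retrieval application, you embed the query and every document with the same model, then rank documents by this score.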
Available Models
Voyage AI provides the following text embedding models:
| Model | Context Length | Dimensions | Description |
|---|---|---|---|
| | 32,000 tokens | 1024 (default), 256, 512, 2048 | The best general-purpose and multilingual retrieval quality. All embeddings created with the 4 series are compatible with each other. To learn more, see the blog post. |
| | 32,000 tokens | 1024 (default), 256, 512, 2048 | Optimized for general-purpose and multilingual retrieval quality. All embeddings created with the 4 series are compatible with each other. To learn more, see the blog post. |
| | 32,000 tokens | 1024 (default), 256, 512, 2048 | Optimized for latency and cost. All embeddings created with the 4 series are compatible with each other. To learn more, see the blog post. |
| Model | Context Length | Dimensions | Description |
|---|---|---|---|
| | 32,000 tokens | 1024 (default), 256, 512, 2048 | Optimized for code retrieval and documentation. To learn more, see the blog post. |
| | 32,000 tokens | 1024 | Optimized for finance retrieval and RAG applications. To learn more, see the blog post. |
| | 16,000 tokens | 1024 | Optimized for legal retrieval and RAG applications. To learn more, see the blog post. |
| Model | Context Length | Dimensions | Description |
|---|---|---|---|
| | 32,000 tokens | 512 (default), 128, 256 | Open-weight model available on Hugging Face. All embeddings created with the 4 series are compatible with each other. To learn more, see the blog post. |
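The Dimensions column directly affects index size and query cost: each float32 component takes 4 bytes, so halving the output dimension roughly halves vector storage. A back-of-the-envelope sketch (the corpus size and float32 encoding here are illustrative assumptions, not Voyage AI guidance):

```python
# Approximate raw storage for a vector index at different output dimensions,
# assuming float32 encoding (4 bytes per component) and a hypothetical corpus.
BYTES_PER_FLOAT32 = 4
NUM_DOCUMENTS = 1_000_000  # illustrative corpus size

for dims in (256, 512, 1024, 2048):
    gb = dims * BYTES_PER_FLOAT32 * NUM_DOCUMENTS / 1e9
    print(f"{dims:>4} dims -> {gb:.1f} GB of raw vectors")
```

Smaller dimensions trade some retrieval quality for lower storage and latency, which is worth benchmarking on your own data before committing to a setting.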
Our latest models outperform the legacy models across the board, including quality, context length, latency, and throughput.
| Model | Context Length | Dimensions | Description |
|---|---|---|---|
| | 32,000 tokens | 1024 (default), 256, 512, 2048 | Previous generation of text embeddings for general-purpose and multilingual retrieval quality. To learn more, see the blog post. |
| | 32,000 tokens | 1024 (default), 256, 512, 2048 | Previous generation of text embeddings optimized for general-purpose and multilingual retrieval quality. To learn more, see the blog post. |
| | 32,000 tokens | 1024 (default), 256, 512, 2048 | Previous generation of text embeddings optimized for latency and cost. To learn more, see the blog post. |
| | 16,000 tokens | 1536 | Previous generation of code embeddings, optimized for code retrieval (17% better than alternatives). To learn more, see the blog post. |
Tutorials
For tutorials on using text embeddings, see the following resources: