Voyage AI embedding models support flexible dimensions and quantization to help you optimize storage and search costs for your vector-based applications. This page explains how to use these features to reduce costs while maintaining high retrieval quality.
Learn about flexible dimensions and quantization through an interactive tutorial in Google Colab.
Overview
When working with large-scale vector search applications, such as code retrieval across massive repositories, storage and computational costs can be significant. These costs scale linearly with the following factors:
Embedding dimensionality: The number of dimensions in each vector
Precision: The number of bits used to encode each number in the vector
By reducing either or both of these factors, you can dramatically lower costs without significantly impacting retrieval quality. Voyage AI models support two complementary techniques to achieve this:
Matryoshka embeddings: Allows you to use smaller versions of your embeddings by truncating to fewer dimensions
Quantization: Reduces the precision of each number in your embeddings from 32-bit floats to lower-precision formats
These techniques are enabled through Matryoshka learning and quantization-aware training, which train the models to maintain quality even with reduced dimensions or quantized values.
Matryoshka Embeddings
Matryoshka embeddings are a special type of vector embedding that contains multiple valid embeddings of different sizes nested within a single vector. This gives you the flexibility to choose the dimensionality that best balances your performance and cost requirements.
The latest Voyage embedding models generate Matryoshka embeddings and support multiple output dimensions directly through the output_dimension parameter. To learn more, see Models Overview.
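For example, the following minimal sketch requests lower-dimensional embeddings directly through the output_dimension parameter. It assumes voyage-4-large supports 512 as one of its output dimensions; substitute any supported dimension for your model.

```python
import voyageai

vo = voyageai.Client()

# Request 512-dimensional embeddings directly instead of truncating client-side
result = vo.embed(
    ['Sample text 1', 'Sample text 2'],
    model='voyage-4-large',
    output_dimension=512,
)
print(len(result.embeddings[0]))  # 512
```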
How Matryoshka Embeddings Work
With Matryoshka learning, a single embedding contains a nested family of embeddings at various lengths. For example, a 2048-dimensional Voyage embedding contains valid embeddings at multiple shorter lengths:
The first 256 dimensions form a valid 256-dimensional embedding
The first 512 dimensions form a valid 512-dimensional embedding
The first 1024 dimensions form a valid 1024-dimensional embedding
All 2048 dimensions form the full-fidelity embedding
Each shorter version provides slightly lower retrieval quality than the full embedding, but requires less storage and computational resources.
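To illustrate the nesting property, the following sketch compares query-document similarity at each nested length of a single 2048-dimensional embedding. The sample texts and the cosine helper are illustrative assumptions, not part of the Voyage API.

```python
import numpy as np
import voyageai

vo = voyageai.Client()

# One 2048-dimensional embedding for a query and one for a document (sample texts are illustrative)
query, doc = vo.embed(
    ['How do I truncate an embedding?', 'Matryoshka embeddings nest shorter embeddings inside longer ones.'],
    model='voyage-4-large',
    output_dimension=2048,
).embeddings

def cosine(a, b):
    # Cosine similarity between two vectors
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Each leading prefix is itself a usable embedding
for dim in (256, 512, 1024, 2048):
    print(dim, round(cosine(query[:dim], doc[:dim]), 4))
```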
How to Truncate Matryoshka Embeddings
Truncate Matryoshka embeddings by keeping the leading subset of dimensions. The following example demonstrates how to truncate 1024-dimensional vectors to 256 dimensions:
```python
import voyageai
import numpy as np

def embd_normalize(v: np.ndarray) -> np.ndarray:
    # Normalize rows of a 2D array to unit vectors
    row_norms = np.linalg.norm(v, axis=1, keepdims=True)
    if np.any(row_norms == 0):
        raise ValueError("Cannot normalize rows with a norm of zero.")
    return v / row_norms

vo = voyageai.Client()

# Generate 1024-dimensional embeddings
embd = vo.embed(['Sample text 1', 'Sample text 2'], model='voyage-4-large').embeddings

# Truncate to 256 dimensions and normalize
short_dim = 256
resized_embd = embd_normalize(np.array(embd)[:, :short_dim]).tolist()
```
Quantization
Quantization reduces the precision of embeddings by converting high-precision floating-point numbers into lower-precision formats. This process can dramatically reduce storage and computational costs while maintaining strong retrieval quality.
The latest Voyage embedding models are trained using quantization-aware training, which means they maintain high retrieval quality even when quantized. To learn more, see Models Overview.
Note
Many databases that support vector storage and retrieval also support quantized embeddings, including MongoDB. To learn more about quantization in MongoDB Vector Search, see Vector Quantization.
How Quantization Works
Quantization reduces the precision of embeddings by representing each dimension with fewer bits than the standard 32-bit floating-point format. Instead of using 4 bytes per dimension, quantized embeddings use:
8-bit integers (1 byte per dimension): Reduces storage by 4x
Binary (1 bit per dimension): Reduces storage by 32x
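As a back-of-the-envelope illustration, the following sketch shows what these reductions mean for one million 2,048-dimensional vectors. The collection size and dimension are arbitrary assumptions chosen only to make the arithmetic concrete.

```python
# Approximate storage for 1,000,000 vectors at 2,048 dimensions (illustrative numbers)
num_vectors = 1_000_000
dims = 2048

float32_bytes = num_vectors * dims * 4   # 4 bytes per dimension
int8_bytes = num_vectors * dims * 1      # 1 byte per dimension (4x smaller)
binary_bytes = num_vectors * dims // 8   # 1 bit per dimension (32x smaller)

print(float32_bytes / 1e9, int8_bytes / 1e9, binary_bytes / 1e9)  # GB: 8.192, 2.048, 0.256
```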
Despite this dramatic reduction in size, models trained with quantization awareness, like Voyage's, maintain high retrieval quality. With supported Voyage models, you enable quantization by specifying the output data type with the output_dtype parameter:
| Data Type | Description |
|---|---|
| `float` | Each returned embedding is a list of 32-bit (4-byte) single-precision floating-point numbers. This is the default and provides the highest precision and retrieval accuracy. |
| `int8` and `uint8` | Each returned embedding is a list of 8-bit (1-byte) integers ranging from -128 to 127 and 0 to 255, respectively. |
| `binary` and `ubinary` | Each returned embedding is a list of 8-bit integers that represent bit-packed, quantized single-bit embedding values: `int8` for `binary` and `uint8` for `ubinary`. The returned list is one-eighth the length of the requested output dimension. |
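For instance, the following sketch requests int8-quantized embeddings; any of the data types in the table can be substituted for the output_dtype value.

```python
import voyageai

vo = voyageai.Client()

# Request int8-quantized embeddings instead of the default float output
result = vo.embed(['Sample text 1'], model='voyage-4-large', output_dtype='int8')
print(result.embeddings[0][:5])  # A few 8-bit integers in the range -128 to 127
```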
Example
Understanding binary quantization
Consider the following embedding values:
-0.0396, 0.0062, -0.0745, -0.0390, 0.0046, 0.0003, -0.0850, 0.0399

Binary quantization converts each value to a single bit, using the following rules:

- Values less than `0` are converted to `0`
- Values greater than or equal to `0` are converted to `1`

Applying these rules to the values above produces the following bits:

0, 1, 0, 0, 1, 1, 0, 1

The eight bits pack into one 8-bit integer: `01001101`. This integer converts to `77` in decimal.

To convert to the final output type, apply the following conversions:

| Output Type | Conversion Method | Result |
|---|---|---|
| `ubinary` | `uint8`: Use the value directly as an unsigned integer. | `77` |
| `binary` | `int8`: Apply the offset binary method by subtracting `128`. | `-51` (which equals `77 - 128`) |
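The same arithmetic can be reproduced with a few lines of NumPy. This is only a sketch of the worked example above, not the Voyage implementation itself.

```python
import numpy as np

# The eight example embedding values from above
values = np.array([-0.0396, 0.0062, -0.0745, -0.0390, 0.0046, 0.0003, -0.0850, 0.0399])

# Threshold at 0: negative values become 0, non-negative values become 1
bits = (values >= 0).astype(np.uint8)      # [0, 1, 0, 0, 1, 1, 0, 1]

# Pack the eight bits into a single byte
ubinary_value = int(np.packbits(bits)[0])  # 77  (ubinary: uint8)
binary_value = ubinary_value - 128         # -51 (binary: int8, offset binary)

print(bits.tolist(), ubinary_value, binary_value)
```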
Offset Binary
Offset binary is a method for representing signed integers in binary form. Voyage AI uses this method for the binary output type to encode bit-packed binary embeddings as signed integers (int8).
The offset binary method works by adding or subtracting an offset value:
- When converting to binary: Add `128` to the signed integer before encoding.
- When converting from binary: Subtract `128` from the integer after decoding.
For 8-bit signed integers (range -128 to 127), the offset is always 128.
Example
Signed integer to binary
To represent -32 as an 8-bit binary number:
1. Add the offset (`128`) to `-32`, resulting in `96`.
2. Convert `96` to binary: `01100000`.
Example
Binary to signed integer
To determine the signed integer from the 8-bit binary number 01010101:
1. Convert it directly to an integer: `85`.
2. Subtract the offset (`128`) from `85`, resulting in `-43`.
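A small sketch of both conversions ties these two examples together; the helper function names here are illustrative, not part of any Voyage library.

```python
def to_offset_binary(signed_value: int) -> str:
    # Encode a signed 8-bit integer (-128 to 127) as an offset-binary bit string
    return format(signed_value + 128, '08b')

def from_offset_binary(bits: str) -> int:
    # Decode an 8-bit offset-binary bit string back to a signed integer
    return int(bits, 2) - 128

print(to_offset_binary(-32))           # '01100000'
print(from_offset_binary('01010101'))  # -43
```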
How to Use Quantization with Voyage AI
You can convert float embeddings to binary format manually or unpack binary embeddings back to individual bits. The following examples demonstrate both operations:
```python
import numpy as np
import voyageai

vo = voyageai.Client()

# Generate float embeddings
embd_float = vo.embed('Sample text 1', model='voyage-4-large', output_dimension=2048).embeddings[0]

# Compute 512-dimensional bit-packed binary and ubinary embeddings from 2048-dimensional float embeddings
embd_binary_calc = (np.packbits(np.array(embd_float) > 0, axis=0) - 128).astype(np.int8).tolist()  # Quantize and apply the binary offset
embd_binary_512_calc = embd_binary_calc[0:64]  # Truncate. Binary is 1/8 the length of the embedding dimension.

embd_ubinary_calc = np.packbits(np.array(embd_float) > 0, axis=0).astype(np.uint8).tolist()  # Quantize (no offset for ubinary)
embd_ubinary_512_calc = embd_ubinary_calc[0:64]  # Truncate. Binary is 1/8 the length of the embedding dimension.
```
```python
import numpy as np
import voyageai

vo = voyageai.Client()

# Generate binary embeddings
embd_binary = vo.embed('Sample text 1', model='voyage-4-large', output_dtype='binary', output_dimension=2048).embeddings[0]
embd_ubinary = vo.embed('Sample text 1', model='voyage-4-large', output_dtype='ubinary', output_dimension=2048).embeddings[0]

# Unpack bits
embd_binary_bits = [format(x, '08b') for x in np.array(embd_binary) + 128]  # List of bit strings
embd_binary_unpacked = [bit == '1' for bit in ''.join(embd_binary_bits)]  # List of booleans
embd_ubinary_bits = [format(x, '08b') for x in np.array(embd_ubinary)]  # List of bit strings
embd_ubinary_unpacked = [bit == '1' for bit in ''.join(embd_ubinary_bits)]  # List of booleans
```
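As an optional sanity check, the following sketch compares the manually quantized bits with the binary embedding returned by the API for the same text, model, and dimension. The two are expected to typically agree, but treat this as an assumption rather than a documented guarantee.

```python
import numpy as np
import voyageai

vo = voyageai.Client()

# Embed the same text twice: once as floats, once as bit-packed binary
embd_float = vo.embed('Sample text 1', model='voyage-4-large', output_dimension=2048).embeddings[0]
embd_binary = vo.embed('Sample text 1', model='voyage-4-large', output_dtype='binary', output_dimension=2048).embeddings[0]

# Manually quantize the float embedding and compare with the API's binary output
embd_binary_calc = (np.packbits(np.array(embd_float) > 0) - 128).astype(np.int8)
print(np.array_equal(embd_binary_calc, np.array(embd_binary, dtype=np.int8)))  # Typically True
```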