
Manage Rate Limits

Rate limits are restrictions on the frequency and number of tokens you can request from Voyage AI within a specified period of time. To learn more about rate limits, see Best Practices.

Atlas enforces rate limits based on model API key usage: requests per minute (RPM) and tokens per minute (TPM). If you exceed the allowed number of requests or tokens in the most recent minute, the API rejects subsequent requests and returns a 429 (Rate Limit Exceeded) HTTP status code.
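
The snippet below is a minimal sketch of what hitting a rate limit looks like from client code. It assumes the standard Voyage AI REST endpoint (https://api.voyageai.com/v1/embeddings), a model API key in the VOYAGE_API_KEY environment variable, and the usual JSON response shape; adjust the URL, key location, and model name for your deployment.

```python
# Minimal sketch: send one embedding request and detect a 429 response.
# The endpoint URL, environment variable, and response shape are assumptions.
import os

import requests

API_URL = "https://api.voyageai.com/v1/embeddings"  # assumed endpoint
API_KEY = os.environ["VOYAGE_API_KEY"]              # assumed key location

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "voyage-3.5", "input": ["hello world"]},
)

if response.status_code == 429:
    # You exceeded the RPM or TPM limit within the most recent minute.
    print("Rate limit exceeded; back off before retrying.")
else:
    response.raise_for_status()
    print(response.json().get("usage"))  # token usage counts toward your TPM limit
```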

The following sections describe how to manage rate limits in the Atlas UI.

To set and reset rate limits at the project level, you must have Project Owner access or higher to Atlas.

To view rate limits:

  • At the organization and project levels, you must have Organization Read Only or higher access to Atlas.

  • At the project level only, you must have Project Read Only or higher access to Atlas.

You can set different rate limits for each project. Project-level rate limits can't exceed the rate limits for the organization. Rate limits set at the project level apply to all model API keys for the project.

  1. If it's not already displayed, select your desired organization from the Organizations menu in the navigation bar.

  2. If it's not already displayed, select your desired project from the Projects menu in the navigation bar.

  3. At the project level, click AI Models under the Services header in the navigation bar.

  4. From the navigation bar, select Rate Limits.

  5. In the Actions column corresponding to the embedding model for which you want to modify rate limits, click the edit icon.

  6. Modify the TPM and RPM values.

    Project-level rate limits for each model can be any value less than or equal to the organization's rate limit.

    Example

    At usage tier 1, rate limits for the voyage-4 embedding model for a project can be set to 2,000 RPM and 8,000,000 TPM, or lower.

  7. Click to apply the rate limit.

You can view the rate limits at the organization and project levels.

  1. If it's not already displayed, select your desired organization from the Organizations menu in the navigation bar.

  2. At the organization level, click Rate Limits under the Services header in the navigation bar.

The page displays the following information:

  • Model: List of Voyage AI embedding models.

  • Tokens Per Minute (TPM): Number of tokens that you can request within a minute from the Embedding and Reranking API endpoints.

  • Requests Per Min (RPM): Number of API requests that you can send within a minute to the Embedding and Reranking API endpoints.

  1. If it's not already displayed, select your desired organization from the Organizations menu in the navigation bar.

  2. If it's not already displayed, select your desired project from the Projects menu in the navigation bar.

  3. At the project level, click AI Models under the Services header in the navigation bar.

  4. From the navigation bar, select Rate Limits.

The page displays the following information about the rate limits:

  • Model: List of Voyage AI embedding models.

  • Tokens Per Minute (TPM): Number of tokens that you can request within a minute from the Voyage AI Embedding and Reranking API endpoints.

  • Requests Per Min (RPM): Number of requests that you can send within a minute to the Voyage AI Embedding and Reranking API endpoints.

  • Actions: Actions that you can take. You can:

    • Reduce the number of tokens and requests per minute for the project.

    • Undo a custom tokens or requests per minute value while you are setting it.

If you set custom limits, the page also displays a Reset all limits button that reverts all custom rate limits on the page to the default rate limits for the organization.

You can reset all the custom limits that you set for a project at any time. When you reset the limits, the rate limits for the project revert to the default rate limits for the organization.

  1. If it's not already displayed, select your desired organization from the Organizations menu in the navigation bar.

  2. If it's not already displayed, select your desired project from the Projects menu in the navigation bar.

  3. At the project level, click AI Models under the Services header in the navigation bar.

  4. From the navigation bar, select Rate Limits.

  5. Click Reset all limits in the top right corner of the page.

Rate limits follow a tiered system, with higher tiers offering increased limits. Qualification for a tier is based on billed usage (excluding free tokens). Atlas offers 200 million free tokens for each model. The multimodal models also include 150 billion free pixels. Once you qualify for a tier, you are never downgraded. As your usage and spending increase, Atlas automatically promotes you to the next usage tier, raising rate limits across all models.

To learn more, see Rate Limits and Usage Tiers.

This section describes the default rate limits for each usage tier that are applied at the organization level. It also describes the rate limits that you can configure for each project.

The following tables show the default rate limits (TPM and RPM) for each Voyage AI model by usage tier, starting with Usage Tier 1.

Model | Tokens Per Min (TPM) | Requests Per Min (RPM)
voyage-4-lite, voyage-3.5-lite | 16,000,000 | 2,000
voyage-4, voyage-3.5 | 8,000,000 | 2,000
voyage-4-large | 3,000,000 | 2,000
voyage-3-large, voyage-context-3, voyage-code-3, voyage-code-2, voyage-law-2, voyage-finance-2 | 3,000,000 | 2,000
voyage-multimodal-3.5, voyage-multimodal-3 | 2,000,000 | 2,000
rerank-2-lite, rerank-2.5-lite | 4,000,000 | 2,000
rerank-2, rerank-2.5 | 2,000,000 | 2,000

The rate limits for Usage Tier 2 are twice those of Usage Tier 1.

Model | Tokens Per Min (TPM) | Requests Per Min (RPM)
voyage-4-lite, voyage-3.5-lite | 32,000,000 | 4,000
voyage-4, voyage-3.5 | 16,000,000 | 4,000
voyage-4-large | 6,000,000 | 4,000
voyage-3-large, voyage-context-3, voyage-code-3, voyage-code-2, voyage-law-2, voyage-finance-2 | 6,000,000 | 4,000
voyage-multimodal-3.5, voyage-multimodal-3 | 4,000,000 | 4,000
rerank-2-lite, rerank-2.5-lite | 8,000,000 | 4,000
rerank-2, rerank-2.5 | 4,000,000 | 4,000

The rate limits for Usage Tier 3 are three times those of Usage Tier 1.

Model | Tokens Per Min (TPM) | Requests Per Min (RPM)
voyage-4-lite, voyage-3.5-lite | 48,000,000 | 6,000
voyage-4, voyage-3.5 | 24,000,000 | 6,000
voyage-4-large | 9,000,000 | 6,000
voyage-3-large, voyage-context-3, voyage-code-3, voyage-code-2, voyage-law-2, voyage-finance-2 | 9,000,000 | 6,000
voyage-multimodal-3.5, voyage-multimodal-3 | 6,000,000 | 6,000
rerank-2-lite, rerank-2.5-lite | 12,000,000 | 6,000
rerank-2, rerank-2.5 | 6,000,000 | 6,000

By default, projects inherit the organization's rate limits. However, you can set different limits for each project at the project level. Project-level rate limits can't exceed the rate limits for the organization. Rate limits set at the project level apply to all model API keys for the project. Note that if the organization rate limit is reached first, projects might be rate-limited to a lower rate than their individual limits. This can occur when the sum of all project rate limits exceeds the organization limit.

Example

Consider an organization with rate limit O and three projects with rate limits P1, P2, and P3. The scenarios below illustrate the three cases where the sum of the project rate limits is less than, equal to, or greater than the organization rate limit. Each scenario indicates whether the organization limit can be reached and whether one project's usage can impact another.

Scenario 1: P1 + P2 + P3 < O

  • Description: Sum of all project rate limits is less than the organization limit.

  • Can the organization limit be reached? No, even if all projects reach their rate limits, the organization rate limit will not be exceeded.

  • Can one project's usage impact another? No.

Scenario 2: P1 + P2 + P3 = O

  • Description: Sum of all project rate limits is equal to the organization limit.

  • Can the organization limit be reached? Yes, if all projects reach their rate limits, the organization limit will also be reached.

  • Can one project's usage impact another? No.

Scenario 3: P1 + P2 + P3 > O

  • Description: Sum of all project rate limits is greater than the organization limit.

  • Can the organization limit be reached? Yes, as the sum of all project rate limits exceeds the organization limit, the organization limit can be reached before individual projects hit their own limits.

  • Can one project's usage impact another? Yes. If projects collectively consume enough usage to reach the organization limit before any or all projects reach their individual limits, projects can be rate-limited to a lower rate than their individual limits.
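
As a quick illustration of the three scenarios, the following sketch compares the sum of hypothetical project-level TPM limits against a hypothetical organization limit. The numbers are placeholders, not defaults.

```python
# Compare the sum of project rate limits (P1, P2, P3) against the
# organization limit (O). All values are hypothetical TPM figures.
org_limit_tpm = 8_000_000
project_limits_tpm = {"P1": 3_000_000, "P2": 3_000_000, "P3": 3_000_000}

total = sum(project_limits_tpm.values())
if total < org_limit_tpm:
    print("Scenario 1: projects can never collectively reach the organization limit.")
elif total == org_limit_tpm:
    print("Scenario 2: the organization limit is reached only if every project maxes out.")
else:
    print("Scenario 3: heavy usage in one project can rate-limit the others.")
```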

Rate limits ensure a balanced and efficient utilization of the API's resources, preventing excessive traffic that could impact the overall performance and accessibility of the service. Specifically, rate limits serve the following vital purposes:

  • Rate limits promote equitable access to the API for all users. If one individual or organization generates an excessive volume of requests, it could potentially impede the API's performance for others. Through rate limiting, we ensure that a larger number of users can utilize the API without encountering performance issues.

  • Rate limits enable Voyage AI to effectively manage the workload on its infrastructure. Sudden and substantial spikes in API requests could strain server resources and lead to performance degradation. By establishing rate limits, Voyage AI can effectively maintain a consistent and reliable experience for all users.

  • Rate limits act as a safeguard against potential abuse or misuse of the API. For instance, malicious actors might attempt to inundate the API with excessive requests to overload it or disrupt its services. By instituting rate limits, Voyage AI can thwart such activities.

To avoid and manage rate limit errors, we recommend the following best practices.

If you have many documents to embed, you can increase your overall throughput by sending larger batches, which embed more documents per request. A "batch" is the collection of documents you embed in one request, and the "batch size" is the number of documents in the batch, that is, the length of the list of documents.

Example

Suppose you want to vectorize 512 documents. If you used a batch size of 1, then this would require 512 requests and you could hit your RPM limit. However, if you used a batch size of 128, then this would require only 4 requests and you would not hit your RPM limit. You can control the batch size by changing the number of documents you provide in the request, and using larger batch sizes will reduce your overall RPM for a given number of documents.

When selecting your batch size, consider the API's maximum batch size and token limits. You cannot exceed the API's maximum batch size, and if you have longer documents, the token limit per request might constrain you to a smaller batch size.
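
The following sketch shows one way to batch documents before sending them. The endpoint URL, environment variable, model name, and batch size of 128 are assumptions taken from the example above; check the maximum batch size and token limit for the model you use.

```python
# Sketch: embed a list of documents in batches instead of one request per document.
import os

import requests

API_URL = "https://api.voyageai.com/v1/embeddings"  # assumed endpoint
API_KEY = os.environ["VOYAGE_API_KEY"]              # assumed key location
BATCH_SIZE = 128                                    # stay within the API's max batch size

def embed_in_batches(documents, model="voyage-3.5"):
    """Embed documents BATCH_SIZE at a time and return one embedding per document."""
    embeddings = []
    for start in range(0, len(documents), BATCH_SIZE):
        batch = documents[start:start + BATCH_SIZE]
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"model": model, "input": batch},
        )
        response.raise_for_status()
        embeddings.extend(item["embedding"] for item in response.json()["data"])
    return embeddings

# 512 documents require only 4 requests at a batch size of 128.
vectors = embed_in_batches([f"document {i}" for i in range(512)])
```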

Make requests less frequently by pacing them. The most straightforward approach is to insert a wait period between requests.
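
A minimal pacing sketch follows. The pause length and the request function are placeholders you would replace with your own values.

```python
# Sketch: insert a fixed wait between consecutive requests to spread them out.
import time

def send_paced(payloads, send_request, pause_seconds=0.5):
    """Call send_request on each payload, sleeping between calls to stay under RPM."""
    results = []
    for payload in payloads:
        results.append(send_request(payload))
        time.sleep(pause_seconds)  # hypothetical pause; tune it to your RPM limit
    return results
```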

Back off once you've hit your rate limit (that is, received a 429 error). You can wait for an exponentially increasing time after each rate limit error before trying again, until the request succeeds or a maximum number of retries is reached.

Example

If your initial wait time was one second and you got three consecutive rate limit errors before success, you would wait one, two, and four seconds after each rate limit error, respectively, before resending the request.
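
Below is a minimal sketch of this retry loop. The endpoint URL, environment variable, model name, and retry count are assumptions; the doubling wait times (one, two, four seconds, and so on) follow the example above.

```python
# Sketch: retry with exponential backoff after a 429 (Rate Limit Exceeded).
import os
import time

import requests

API_URL = "https://api.voyageai.com/v1/embeddings"  # assumed endpoint
API_KEY = os.environ["VOYAGE_API_KEY"]              # assumed key location

def embed_with_backoff(texts, model="voyage-3.5", max_retries=5):
    """Send one embedding request, backing off exponentially on 429 responses."""
    wait = 1.0  # initial wait time in seconds
    for attempt in range(max_retries + 1):
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"model": model, "input": texts},
        )
        if response.status_code != 429:
            response.raise_for_status()
            return [item["embedding"] for item in response.json()["data"]]
        if attempt < max_retries:
            time.sleep(wait)  # wait, then retry
            wait *= 2         # double the wait after each rate limit error
    raise RuntimeError("Request still rate-limited after maximum retries.")
```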
