Rate limits

We enforce rate limits to ensure fair usage and stable performance for all users. Rate limits vary by subscription plan and are applied per account, shared across all API keys.

Rate limits by product

Rate limits and concurrency limits depend on your subscription plan. Please see the Pricing Page for detailed limits by tier:

TTS limits — Concurrent generations, WebSocket connections, voice design and cloning limits
STT limits — Streaming concurrency
Realtime API limits — Concurrent sessions
LLM Router limits — Concurrent generations, requests per second

Higher-tier subscription plans come with higher rate limits. If your use case requires limits beyond what any standard plan offers, please reach out to our team to discuss enterprise options.


Handling rate-limited requests

When you exceed your rate limit, the API returns an HTTP 429 Too Many Requests response. Your request is not processed — you need to wait and retry. Retrying immediately or in a tight loop will not help and can make the situation worse. If many clients retry at the same time (a “thundering herd”), they collectively sustain the overload and keep getting rejected. The standard solution is exponential backoff with jitter.
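To illustrate why synchronized retries are a problem, here is a toy simulation (not part of the API). It assumes a group of clients that were all rate limited at the same moment, and compares when their next retry lands with and without a random offset. The function name retry_times and all values are hypothetical, chosen only for illustration.

```python
import random

def retry_times(num_clients, delay, jitter, seed=0):
    """Return the time (in seconds) at which each client sends its retry.

    Hypothetical helper: every client was rejected at t=0 and waits
    `delay` seconds plus a random offset in [0, jitter].
    """
    rng = random.Random(seed)  # seeded so the demo is repeatable
    return [delay + rng.uniform(0, jitter) for _ in range(num_clients)]

# Without jitter, all 100 clients retry at exactly the same instant.
synchronized = retry_times(100, delay=1.0, jitter=0.0)
print(len(set(synchronized)))  # 1

# With up to 1 second of jitter, the retries spread across a 1-second window.
spread = retry_times(100, delay=1.0, jitter=1.0)
print(min(spread) >= 1.0, max(spread) <= 2.0)  # True True
```

In the first case the API sees the same burst again one second later; in the second, the same number of requests arrives spread over a window, which is easier to absorb.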

Exponential backoff with jitter

Exponential backoff doubles the delay after each failed attempt: the first retry waits 1 second, the second waits 2 seconds, the third waits 4 seconds, and so on, up to a maximum delay. Adding random jitter spreads retries across clients so they don’t all hit the API at the same instant. The delay before each retry, with attempt counted from 0, is:

delay = min(base_delay × 2^attempt, max_delay) + random(0, jitter)
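To make the formula concrete, the sketch below evaluates it for the first few attempts. The function name backoff_delay is hypothetical, and the defaults (1 second base delay, 30 second cap, 1 second of jitter) are simply the example values used on this page, not limits imposed by the API.

```python
import random

def backoff_delay(attempt, base_delay=1.0, max_delay=30.0, jitter=1.0):
    """Return the wait in seconds before retrying, for a 0-based attempt number.

    Illustrative helper; the defaults are this page's example values.
    """
    capped = min(base_delay * (2 ** attempt), max_delay)
    return capped + random.uniform(0, jitter)

# Deterministic part of the schedule (jitter disabled) for attempts 0 to 6.
schedule = [backoff_delay(attempt, jitter=0.0) for attempt in range(7)]
print(schedule)           # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]

# Total wait across 5 retries, before jitter.
print(sum(schedule[:5]))  # 31.0
```

The cap takes effect from the sixth retry onward (attempt 5), where the uncapped value would be 32 seconds.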

import time
import random
import requests

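def request_with_backoff(method, url, max_retries=5, **kwargs):
    """Send a request, retrying on HTTP 429 with exponential backoff and jitter."""
    base_delay = 1   # seconds to wait before the first retry
    max_delay = 30   # cap on the exponential part of the delay, in seconds
    jitter = 1       # maximum random extra delay, in seconds

    # One initial attempt plus up to max_retries retries.
    for attempt in range(max_retries + 1):
        response = requests.request(method, url, **kwargs)

        # Any status other than 429 (success or a different error) goes back to the caller.
        if response.status_code != 429:
            return response

        # Out of retries: raise requests.HTTPError for the final 429 response.
        if attempt == max_retries:
            response.raise_for_status()

        # Exponential delay, capped at max_delay, plus random jitter.
        delay = min(base_delay * (2 ** attempt), max_delay)
        delay += random.uniform(0, jitter)
        print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1}/{max_retries})")
        time.sleep(delay)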

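async function requestWithBackoff(url, options = {}, maxRetries = 5) {
  const baseDelay = 1000; // milliseconds to wait before the first retry
  const maxDelay = 30000; // cap on the exponential part of the delay, in milliseconds
  const jitter = 1000;    // maximum random extra delay, in milliseconds

  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);

    // Any status other than 429 (success or a different error) goes back to the caller.
    if (response.status !== 429) {
      return response;
    }

    // Out of retries: surface the failure to the caller.
    if (attempt === maxRetries) {
      throw new Error(`Rate limited after ${maxRetries} retries`);
    }

    // Exponential delay, capped at maxDelay, plus random jitter.
    const delay = Math.min(baseDelay * 2 ** attempt, maxDelay)
      + Math.random() * jitter;
    console.log(`Rate limited. Retrying in ${(delay / 1000).toFixed(1)}s (attempt ${attempt + 1}/${maxRetries})`);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}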

Best practices

Set a maximum retry count. Don’t retry forever. With a 1-second base delay, 5 retries add up to about 31 seconds of waiting before jitter (1 + 2 + 4 + 8 + 16). If the request still fails, surface the error to the caller.
Always add jitter. Without jitter, clients that hit the limit together will retry together, perpetuating the overload.
Log retry attempts. Include the attempt number and delay in your logs so you can identify rate-limiting patterns and adjust your request volume.
Reduce concurrent requests. If you’re consistently hitting limits, throttle your request rate or use a queue rather than relying on retries alone.
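As one possible way to reduce concurrent requests, the sketch below runs work through a fixed-size thread pool, which queues extra tasks until a worker is free. It is a minimal illustration under stated assumptions: run_throttled and fake_api_call are hypothetical names, MAX_CONCURRENT is a placeholder rather than a real plan limit, and the stand-in call only sleeps instead of contacting the API.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 3  # placeholder; use the concurrency limit of your own plan

def run_throttled(tasks, max_concurrent=MAX_CONCURRENT):
    """Run callables with at most `max_concurrent` in flight at once.

    The executor holds waiting tasks in its internal queue. Results are
    returned in the same order as the input tasks.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]

# Demo: a stand-in for an API call that records peak concurrency.
state = {"in_flight": 0, "peak": 0}
lock = threading.Lock()

def fake_api_call():
    with lock:
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
    time.sleep(0.05)  # simulate network latency
    with lock:
        state["in_flight"] -= 1
    return "ok"

results = run_throttled([fake_api_call] * 10)
print(len(results))                     # 10
print(state["peak"] <= MAX_CONCURRENT)  # True
```

In a real client, each task would wrap a retrying request function, so that throttling keeps you under the limit most of the time and backoff handles the occasional 429.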
