Rate limits by product
Rate limits and concurrency limits depend on your subscription plan. See the Pricing Page for detailed limits by tier:
- TTS limits: concurrent generations, WebSocket connections, voice design and cloning limits
- STT limits: streaming concurrency
- Realtime API limits: concurrent sessions
- LLM Router limits: concurrent generations, requests per second
Rate limits apply per account and are shared across your API keys.
Handling rate-limited requests
When you exceed your rate limit, the API returns an HTTP 429 Too Many Requests response. The request is not processed, so you need to wait and retry.
Retrying immediately or in a tight loop will not help and can make the situation worse. If many clients retry at the same time (a “thundering herd”), they collectively sustain the overload and keep getting rejected. The standard solution is exponential backoff with jitter.
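A rough simulation can make the thundering-herd effect concrete. This sketch models only the first retry of 100 clients that were all rejected at the same moment. It assumes a 1-second backoff, up to 1 second of uniform random jitter, and 0.1-second time slots. The function names are hypothetical, and the numbers illustrate the shape of the problem rather than measuring the API.

```python
import random
from collections import Counter


def first_retry_times(clients, jitter, rng):
    """Retry instants (seconds) for clients that were all rejected at t = 0."""
    # Every client waits the same 1 s backoff; jitter adds up to 1 s more.
    return [1.0 + (rng.uniform(0.0, 1.0) if jitter else 0.0) for _ in range(clients)]


def peak_load(times, window=0.1):
    """Largest number of retries landing in any single `window`-second slot."""
    return max(Counter(int(t / window) for t in times).values())


rng = random.Random(42)
print(peak_load(first_retry_times(100, jitter=False, rng=rng)))  # 100
print(peak_load(first_retry_times(100, jitter=True, rng=rng)))   # far below 100
```

Without jitter, every client retries in the same instant and recreates the original spike. With jitter, the same retries are spread across roughly ten slots.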
Exponential backoff with jitter
Exponential backoff doubles the delay between retries: the first retry waits 1 second, the second waits 2 seconds, the third waits 4 seconds, and so on. Adding random jitter spreads retries across clients so they don’t all hit the API at the same instant. The delay before retry n (counting from 1) is therefore:
delay = 2^(n − 1) seconds + random jitter
Best practices
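Here is a minimal Python sketch of the doubling schedule described above. The function name `backoff_delay`, the 1-second base, and the choice of additive uniform jitter capped at 1 second are assumptions for illustration. "Full jitter", which picks a random delay between 0 and the exponential value, is a common alternative.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, max_jitter: float = 1.0) -> float:
    """Delay in seconds before retry number `attempt` (1 = first retry).

    Exponential part: base * 2**(attempt - 1), i.e. 1 s, 2 s, 4 s, ... for base=1.
    Jitter (assumed scheme): a uniform random value in [0, max_jitter] added on top.
    """
    if attempt < 1:
        raise ValueError("attempt counts from 1")
    return base * 2 ** (attempt - 1) + random.uniform(0, max_jitter)


# Delays for the first five retries; exact values vary run to run because of jitter.
print([round(backoff_delay(n), 2) for n in range(1, 6)])
```

Because the jitter is added on top of the exponential part, each delay is never shorter than the 1, 2, 4, ... second schedule. This keeps the documented waits as a lower bound.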
- Set a maximum retry count. Don’t retry forever: 5 retries with exponential backoff cover 1 + 2 + 4 + 8 + 16 = 31 seconds of wait time before jitter. If the request still fails, surface the error to the caller.
- Always add jitter. Without jitter, clients that hit the limit together will retry together, perpetuating the overload.
- Log retry attempts. Include the attempt number and delay in your logs so you can identify rate-limiting patterns and adjust your request volume.
- Reduce concurrent requests. If you’re consistently hitting limits, throttle your request rate or use a queue rather than relying on retries alone.
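The practices above can be combined in one asyncio sketch. It caps retries, logs each attempt and delay, re-raises once retries run out, and limits how many requests are in flight. `RateLimitedError`, `call_with_retries`, and `run_all` are hypothetical names. Your own client would need to raise `RateLimitedError` when it sees a 429. The defaults of 5 retries and a 1-second base follow the text above. The 1-second jitter cap and `max_concurrent=4` are assumptions; set the concurrency cap to your plan's limit.

```python
import asyncio
import logging
import random

log = logging.getLogger("rate_limit_retry")


class RateLimitedError(Exception):
    """Hypothetical error your client raises when the API answers HTTP 429."""


async def call_with_retries(send, *, max_retries=5, base_delay=1.0, max_jitter=1.0):
    """Await `send()`, retrying on 429 with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return await send()
        except RateLimitedError:
            if attempt == max_retries:
                raise  # out of retries: surface the 429 to the caller
            # attempt 0 -> ~1 s, attempt 1 -> ~2 s, attempt 2 -> ~4 s, ...
            delay = base_delay * 2 ** attempt + random.uniform(0, max_jitter)
            log.warning("HTTP 429, retry %d/%d in %.2f s", attempt + 1, max_retries, delay)
            await asyncio.sleep(delay)


async def run_all(jobs, max_concurrent=4, **retry_options):
    """Run many requests while keeping at most `max_concurrent` in flight."""
    # One semaphore for all jobs: limits are per account, not per API key.
    limiter = asyncio.Semaphore(max_concurrent)

    async def guarded(send):
        # The slot stays held during backoff, so a waiting retry does not
        # free room for new requests to pile onto an overloaded account.
        async with limiter:
            return await call_with_retries(send, **retry_options)

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Each job is assumed to be a zero-argument async function that performs one API request. Call `asyncio.run(run_all(jobs))` to run them all with results returned in order. Holding the concurrency slot during backoff is a deliberate choice. It means retries slow down the whole batch instead of letting fresh requests keep hitting the limit.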