# Concurrency Limits

Concurrency limits specify the guaranteed number of TTS generations that can be in progress at the same time for your account. They apply across both HTTP and WebSocket requests. Requests above these limits are best-effort based on available capacity.

## Default Limits

| **Plan**   | **Concurrent generations** |
| :--------- | :------------------------- |
| On-Demand  | 5                          |
| Creator    | 10                         |
| Builder    | 50                         |
| Developer  | 150                        |
| Growth     | 500                        |
| Enterprise | Custom                     |

## How Concurrency Is Counted

The way a generation counts toward your limit depends on the protocol:

* **HTTP requests:** Each in-flight request counts as one concurrent generation. The slot is held for the duration of the request. Requests beyond the limit are served on a best-effort basis; when no spare capacity is available, they return a `429` error.
* **WebSocket contexts:** Each active context counts as one concurrent generation. A context is considered active while packets are being exchanged — for example, `send_text` from the client or audio chunks from the server. Once a context goes idle, it stops counting toward your limit. See [Errors](#errors) for details on how limit errors are surfaced within a WebSocket connection.
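
On the HTTP side, one way to avoid `429` errors is to cap in-flight requests on the client at your plan's guaranteed limit. The sketch below does this with an `asyncio.Semaphore`. `fake_synthesize` is a hypothetical placeholder for your real TTS call, and the limit of 10 (the Creator plan) is an assumption you should replace with your own.

```python
import asyncio

# Assumption: set this to your plan's guaranteed limit (10 = Creator).
MAX_CONCURRENT_GENERATIONS = 10


class GenerationLimiter:
    """Caps in-flight TTS requests so the client stays within its guaranteed limit."""

    def __init__(self, limit: int) -> None:
        self._sem = asyncio.Semaphore(limit)
        self.in_flight = 0
        self.peak = 0

    async def run(self, make_request):
        # Each HTTP request holds one slot for its whole duration,
        # mirroring how the service counts it.
        async with self._sem:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await make_request()
            finally:
                self.in_flight -= 1


async def fake_synthesize(text: str) -> bytes:
    # Hypothetical stand-in for an HTTP TTS call; replace with your real client.
    await asyncio.sleep(0.01)
    return text.encode()


async def main() -> None:
    limiter = GenerationLimiter(MAX_CONCURRENT_GENERATIONS)
    texts = [f"line {i}" for i in range(25)]
    results = await asyncio.gather(
        *(limiter.run(lambda t=t: fake_synthesize(t)) for t in texts)
    )
    print(len(results), "requests completed, peak in flight:", limiter.peak)


if __name__ == "__main__":
    asyncio.run(main())
```

A semaphore only limits a single process. If several workers share one account, each needs its own share of the budget (or a shared counter) so that their combined total stays within the limit.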

## Concurrent Connections

Concurrent connections refer to the number of WebSocket connections your account can maintain simultaneously. This is a separate limit from concurrent generations, set at 10× your concurrent generation limit. Exceeding it returns an HTTP `429` error.
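
For capacity planning, the two limits can be derived from the Default Limits table and the 10× rule. This is a minimal sketch. Enterprise limits are custom, so they are left out.

```python
# Guaranteed concurrent-generation limits from the Default Limits table.
# Enterprise limits are custom, so they are omitted here.
GENERATION_LIMITS = {
    "On-Demand": 5,
    "Creator": 10,
    "Builder": 50,
    "Developer": 150,
    "Growth": 500,
}

# Concurrent WebSocket connections are capped at 10x the generation limit.
CONNECTION_MULTIPLIER = 10


def limits_for(plan: str) -> tuple[int, int]:
    """Return (concurrent generations, concurrent WebSocket connections) for a plan."""
    generations = GENERATION_LIMITS[plan]
    return generations, generations * CONNECTION_MULTIPLIER


for plan in GENERATION_LIMITS:
    gens, conns = limits_for(plan)
    print(f"{plan:<10} generations={gens:<4} connections={conns}")
```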

## How Many Concurrent Generations Do You Actually Need?

For conversational use cases like voice agents, most applications need far fewer concurrent generations than expected. Because speech is generated much faster than it is spoken, a single generation slot can serve multiple conversations. The diagram below shows three simultaneous conversations and how their generation windows interact. Most generation windows are brief and staggered — only when two overlap does the concurrent generation count increase.

```mermaid theme={"system"}
gantt
    title Concurrent Generation Slots Across 3 Simultaneous Conversations
    dateFormat X
    axisFormat %Ss

    section Conversation 1
    User speaking    :a1, 0, 8
    Generating       :crit, a2, 8, 10
    Agent speaking   :a3, 10, 20
    User speaking    :a4, 20, 27
    Generating       :crit, a5, 27, 29
    Agent speaking   :a6, 29, 38

    section Conversation 2
    Agent speaking   :b1, 0, 12
    User speaking    :b2, 12, 20
    Generating       :crit, b3, 20, 22
    Agent speaking   :b4, 22, 38

    section Conversation 3
    User speaking    :c1, 0, 6
    Generating       :crit, c2, 6, 9
    Agent speaking   :c3, 9, 19
    User speaking    :c4, 19, 26
    Generating       :crit, c5, 26, 29
    Agent speaking   :c6, 29, 38

    section Concurrent Slots Counted
    1 concurrent generation  :done, d1, 6, 8
    2 concurrent generations :crit, d2, 8, 9
    1 concurrent generation  :done, d3, 9, 10
    1 concurrent generation  :done, d4, 20, 22
    1 concurrent generation  :done, d5, 26, 27
    2 concurrent generations :crit, d6, 27, 29
```

The red **2 concurrent generations** periods are the moments where two generation windows overlap, and only these count as two concurrent generations. Outside those windows, a single slot is enough to serve all three conversations. We typically find that the number of simultaneous conversations you can handle is at least **4×** your concurrent generation limit. Exact numbers vary with your use case, conversation patterns, and response lengths.
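
To check the diagram's arithmetic, or to measure your own traffic, you can compute concurrency from a list of generation windows with a simple sweep line. This sketch hard-codes the windows from the diagram above. With real traffic, you would log the start and end time of each generation yourself.

```python
# Generation windows (start, end) in seconds, copied from the diagram above.
GENERATION_WINDOWS = {
    "Conversation 1": [(8, 10), (27, 29)],
    "Conversation 2": [(20, 22)],
    "Conversation 3": [(6, 9), (26, 29)],
}


def concurrency_timeline(windows):
    """Return (start, end, count) segments where at least one generation is active."""
    events = []
    for spans in windows.values():
        for start, end in spans:
            events.append((start, 1))   # a generation begins
            events.append((end, -1))    # a generation ends
    # At equal timestamps, process ends before starts so that back-to-back
    # windows are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    segments, active, prev_t = [], 0, None
    for t, delta in events:
        if prev_t is not None and t > prev_t and active > 0:
            segments.append((prev_t, t, active))
        active += delta
        prev_t = t
    return segments


segments = concurrency_timeline(GENERATION_WINDOWS)
for start, end, count in segments:
    print(f"{start:>2}s-{end:>2}s: {count} concurrent")
print("peak:", max(count for _, _, count in segments))
```

The segments it produces match the "Concurrent Slots Counted" row of the diagram: three conversations never need more than two slots at once.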

## Idle WebSocket Contexts

A WebSocket TTS context only counts toward your limit while it is **active** — that is, while packets are being exchanged (e.g., `send_text` from the client or audio chunks from the server). Closing a context releases its slot immediately once all audio chunks have been returned. If a context is left open, it becomes idle shortly after activity stops and no longer counts toward your limit.
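
The rule above can be modeled on the client to estimate how many of your contexts are currently counted. This is a sketch, not the service's implementation. The docs only say a context goes idle "shortly after activity stops", so the 5-second idle threshold below is an illustrative assumption, and `ContextTracker` is a hypothetical helper.

```python
from dataclasses import dataclass, field

# Assumption: the real idle threshold is not documented; 5 seconds is illustrative.
IDLE_AFTER_SECONDS = 5.0


@dataclass
class ContextTracker:
    """Client-side model of which WebSocket contexts count toward the limit."""

    idle_after: float = IDLE_AFTER_SECONDS
    last_activity: dict[str, float] = field(default_factory=dict)

    def touch(self, context_id: str, now: float) -> None:
        # Call on any packet exchanged for the context
        # (send_text from the client, audio chunks from the server).
        self.last_activity[context_id] = now

    def close(self, context_id: str) -> None:
        # Call once the context is closed and all its audio has been returned;
        # at that point its slot is released.
        self.last_activity.pop(context_id, None)

    def active_count(self, now: float) -> int:
        # A context counts only if it saw activity within the idle threshold.
        return sum(1 for t in self.last_activity.values() if now - t < self.idle_after)


tracker = ContextTracker()
tracker.touch("turn-1", now=0.0)
tracker.touch("turn-2", now=1.0)
print(tracker.active_count(now=2.0))  # both contexts recently active
print(tracker.active_count(now=5.5))  # turn-1 has gone idle
tracker.close("turn-2")
print(tracker.active_count(now=5.5))  # nothing left counting
```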

## Errors

When your account exceeds its concurrent generation limit, the affected request is rejected with an error JSON object containing `code: 8` (`ResourceExhausted`). For example, on a WebSocket connection with a limit of 5:

```json theme={"system"}
{
  "error": {
    "code": 8,
    "message": "request failed: rpc error: code = ResourceExhausted desc = maximum allowed number of active WebSocket TTS contexts: 5 is reached",
    "details": []
  }
}
```

Your WebSocket connection and other active contexts are unaffected.

* **If you close and reopen contexts per agent turn**, the error occurs on `create_context` and the context is not created. Any messages sent for that context after the failed `create_context` may have been dropped.
* **If you keep contexts open while idle**, the error can occur on any message that resumes activity (e.g., `send_text`, `flush`). Any messages sent since the first `send_text` after the context went idle may have been dropped.
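
The limit error can be recognized by its `code` field. The sketch below targets the close-and-reopen pattern. It assumes each server reply arrives as a JSON string and that you wrap your own context-creation call in a function. `create_context_with_retry` and its backoff parameters are illustrative, not part of the Inworld API; follow the rate-limit guide for the recommended retry strategy.

```python
import json
import random
import time

RESOURCE_EXHAUSTED = 8  # the "code" value in the concurrency-limit error payload


def is_concurrency_limit_error(raw: str) -> bool:
    """True if a server message is the concurrent-generation limit error."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("code") == RESOURCE_EXHAUSTED


def create_context_with_retry(create_context, max_attempts=5, base_delay=0.5,
                              sleep=time.sleep):
    """Retry a hypothetical create_context() callable with exponential backoff.

    create_context is assumed to return the raw server reply as a string.
    """
    for attempt in range(max_attempts):
        reply = create_context()
        if not is_concurrency_limit_error(reply):
            return reply
        if attempt < max_attempts - 1:
            # Full jitter: wait a random time up to base_delay * 2**attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise RuntimeError("concurrency limit still reached after retries")
```

Because messages sent after a failed `create_context` may have been dropped, resend that turn's text once the retry succeeds rather than assuming it was delivered.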

For retry strategies when hitting rate limits, see [Handling rate-limited requests](/resources/rate-limits#handling-rate-limited-requests).

If you do reach your concurrency ceiling, [contact us](https://inworld.ai/contact-sales) and we will work with you to accommodate your needs.