Concurrency limits define how many operations can run simultaneously for your account. There are two distinct types of concurrency to understand: concurrent generations and concurrent connections.

Concurrent Generations vs. Concurrent Connections

| | Concurrent Generations | Concurrent Connections |
| --- | --- | --- |
| Definition | The number of generation contexts actively being processed at the same time | The number of simultaneous WebSocket connections to the proxy per account |
| Typical limit | Set per account | Up to 10x the number of concurrent generations |

These two concepts are often confused but serve different purposes.

Concurrent Generations

Concurrent generations are the number of AI generation contexts being actively processed at the same moment. A generation context is active only while the system is computing a response, not for the full duration of a conversation. In a typical voice conversation, the agent and user each speak roughly half the time. Because generation is much faster than playback, the active generation window is a small fraction of the overall conversation. Only generation windows that overlap in time count together toward the concurrent generation limit.
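To make the counting rule concrete, here is a minimal sketch of how the peak number of overlapping generation windows could be computed from a list of start and end times. The function name and the sample timings are hypothetical and chosen for illustration only; this is not how the service measures usage internally.

```python
from typing import Iterable, Tuple


def peak_concurrent_generations(windows: Iterable[Tuple[float, float]]) -> int:
    """Return the largest number of generation windows active at once.

    Each window is (start, end) in seconds, treated as half-open [start, end),
    so a window that ends exactly when another starts does not overlap it.
    """
    events = []
    for start, end in windows:
        if end <= start:
            continue  # ignore empty or inverted windows
        events.append((start, 1))   # a generation begins
        events.append((end, -1))    # a generation finishes
    # Tuples sort by time first; at equal times -1 sorts before +1,
    # so a finishing window frees its slot before the next one starts.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


# Hypothetical generation windows (seconds) across several conversations.
sample_windows = [(0.0, 0.4), (0.3, 0.7), (1.0, 1.3), (5.0, 5.4)]
print(peak_concurrent_generations(sample_windows))  # prints 2
```

In this sample only the first two windows overlap, so the peak is 2 even though four generations ran in total.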

Concurrent Connections

Concurrent connections refer to the number of WebSocket connections your account can maintain to the proxy simultaneously. This limit is separate from, and typically much higher than, the concurrent generation limit — usually up to 10x the number of concurrent generations allowed.
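The difference between the two limits can be illustrated with a small simulation: many connections stay open, but each one holds a generation slot only while a response is being computed. This is a sketch under stated assumptions. The limit values, the `run_connection` helper, and the sleep-based timings are all hypothetical stand-ins, and no real WebSocket or service API is used. Capping in-flight generations with a client-side semaphore is one possible approach, not a documented requirement.

```python
import asyncio

GENERATION_LIMIT = 2   # hypothetical concurrent generation limit
OPEN_CONNECTIONS = 8   # hypothetical number of open connections


async def run_connection(slots: asyncio.Semaphore, stats: dict, conn_id: int) -> None:
    # The connection is considered open for the whole coroutine.
    await asyncio.sleep(0.01 * conn_id)   # stand-in for waiting on the user
    async with slots:                     # a slot is held only while generating
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.02)         # stand-in for computing a response
        stats["active"] -= 1


async def main() -> int:
    slots = asyncio.Semaphore(GENERATION_LIMIT)
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(
        *(run_connection(slots, stats, i) for i in range(OPEN_CONNECTIONS))
    )
    return stats["peak"]


peak_in_flight = asyncio.run(main())
print(f"{OPEN_CONNECTIONS} connections, peak generations in flight: {peak_in_flight}")
```

The semaphore guarantees the peak never exceeds `GENERATION_LIMIT`, regardless of how many connections are open.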

How Many Concurrent Generations Do You Actually Need?

Most applications need far fewer concurrent generations than expected. Because speech is generated much faster than it is spoken, a single generation slot can serve several conversations at once.

The diagram below shows three simultaneous conversations and how their generation windows interact. Most generation windows are brief and staggered, so the concurrent generation count rises only when two of them overlap. The Slots in Use row tracks this count over time. The periods marked "2 slots counted" in red are the moments when two generation windows overlap, and they are the only points at which the concurrent generation limit comes into play. Outside those periods, a single slot handles all three conversations.

We typically find that the number of simultaneous conversations you can handle is at least 4x your concurrent generation limit. Exact numbers will vary depending on your use case, conversation patterns, and response lengths. If you do reach your concurrency ceiling, contact us and we will work with you to accommodate your needs.
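As a rough planning aid, the 4x rule of thumb can be turned into a back-of-the-envelope estimate. The sketch below assumes the ratio holds for your workload, which may not be the case. The function name and default ratio are illustrative, and the result is a starting point rather than a guarantee.

```python
import math


def estimate_generation_limit(simultaneous_conversations: int,
                              conversations_per_slot: float = 4.0) -> int:
    """Estimate the concurrent generation limit needed for a workload.

    conversations_per_slot defaults to the 4x rule of thumb; replace it with a
    value measured from your own traffic when you have one.
    """
    if simultaneous_conversations < 0:
        raise ValueError("simultaneous_conversations must be non-negative")
    if conversations_per_slot <= 0:
        raise ValueError("conversations_per_slot must be positive")
    return math.ceil(simultaneous_conversations / conversations_per_slot)


print(estimate_generation_limit(100))  # prints 25
print(estimate_generation_limit(10))   # prints 3
```

Because the 4x figure is described as a minimum, this estimate leans conservative: workloads with short responses may need fewer slots than it suggests.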