- Stream TTS output - Instead of waiting for the full generation to finish (which can take several seconds for long text), start playback as soon as the first audio chunk arrives so the user doesn’t have to wait. Inworld’s websocket streaming should be the lowest-latency option, but chunked streaming over HTTP is still far better than a non-streaming setup.
- Chunk streaming LLM output into TTS - For the fastest time to first audio, consider breaking streaming LLM output into sentence chunks and sending them one by one to TTS. The Inworld Agent Runtime provides built-in tools to handle this in a performant manner.
- Use JWT authentication to stream directly to the client - For applications like mobile apps or browser-based experiences, use JWT authentication to stream TTS directly to the client rather than proxying through your server and adding extra latency.
- Reuse connections with keep-alive - The first request to the API incurs a TCP and TLS handshake. Use `Connection: keep-alive` (and persistent sessions in Python) to reuse the established connection on subsequent requests. See our low-latency Python and JavaScript examples for this technique in practice.
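To illustrate the streaming-playback pattern from the first bullet, here is a minimal Python sketch that plays audio as chunks arrive instead of buffering the whole response. `fake_tts_stream` stands in for a real Inworld stream, and the audio sink is a hypothetical placeholder:

```python
import time
from typing import Iterable, Iterator, Tuple

def fake_tts_stream(n_chunks: int = 5, delay_s: float = 0.01) -> Iterator[bytes]:
    # Stand-in for a streaming TTS response: a real client would yield
    # audio chunks from Inworld's websocket or a chunked HTTP body.
    for _ in range(n_chunks):
        time.sleep(delay_s)          # simulated network / synthesis delay
        yield b"\x00" * 320          # 10 ms of 16-bit, 16 kHz silence

def play_stream(chunks: Iterable[bytes]) -> Tuple[float, float]:
    """Feed each chunk to the audio sink as soon as it arrives.
    Returns (time_to_first_audio, total_time) in seconds."""
    start = time.monotonic()
    time_to_first = None
    for chunk in chunks:
        if time_to_first is None:
            time_to_first = time.monotonic() - start
        # audio_sink.write(chunk)    # hypothetical playback device
    return time_to_first or 0.0, time.monotonic() - start
```

With this shape, playback begins after the first chunk rather than after the full synthesis completes, which is where the latency win comes from.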
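The sentence-chunking idea can be sketched without the Agent Runtime as a small generator that buffers streamed LLM tokens and emits a chunk as soon as a sentence boundary appears; the regex boundary rule here is a simplifying assumption (a production splitter should handle abbreviations, decimals, and similar cases):

```python
import re
from typing import Iterable, Iterator

# Naive boundary: ., !, or ? followed by whitespace.
_SENTENCE_END = re.compile(r"([.!?])\s")

def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Accumulate streamed LLM tokens and yield each complete sentence
    as soon as it is finished, so it can be sent to TTS immediately."""
    buf = ""
    for token in tokens:
        buf += token
        while True:
            m = _SENTENCE_END.search(buf)
            if not m:
                break
            yield buf[: m.end(1)].strip()  # emit up to the terminator
            buf = buf[m.end():]            # keep the unfinished remainder
    tail = buf.strip()
    if tail:
        yield tail                         # flush whatever is left at EOS
```

Each yielded sentence would then be posted to TTS while the LLM keeps generating, so audio for the first sentence can start before the full reply exists.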
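The JWT approach amounts to minting a short-lived token on your server and letting the client present it when opening the TTS stream directly. Below is a stdlib-only HS256 sketch of that server-side step; the claim names, TTL, and signing details are illustrative assumptions, not Inworld's exact requirements:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    # JWT uses unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_client_jwt(secret: str, user_id: str, ttl_s: int = 300) -> str:
    """Mint a short-lived HS256 JWT on the server; the client uses it
    to open the TTS stream without proxying audio through your backend."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    payload = {"sub": user_id, "iat": now, "exp": now + ttl_s}
    signing_input = (
        _b64url(json.dumps(header).encode())
        + "."
        + _b64url(json.dumps(payload).encode())
    )
    sig = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)
```

Keeping the TTL short limits exposure if a token leaks, since the client only needs it long enough to establish the stream.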
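For connection reuse, a single `http.client.HTTPSConnection` keeps its TCP/TLS socket open across requests, so only the first call pays the handshake cost. A minimal sketch, assuming an illustrative host, path, and payload shape (not the exact Inworld API):

```python
import http.client
import json

# One connection object, reused for every request; the socket is opened
# lazily on the first request and kept alive for subsequent ones.
conn = http.client.HTTPSConnection("api.inworld.ai", timeout=10)

def synthesize(text: str) -> bytes:
    """POST a synthesis request over the shared connection.
    Endpoint path and auth header are placeholders."""
    body = json.dumps({"text": text})
    conn.request(
        "POST",
        "/tts/v1/voice",  # illustrative path
        body=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic <API_KEY>",  # placeholder credential
        },
    )
    resp = conn.getresponse()
    return resp.read()
```

Higher-level clients achieve the same thing with `requests.Session()` in Python or an HTTP agent with `keepAlive: true` in Node.js, as shown in the low-latency examples.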