Synthesize Speech

You send text, and the server generates the entire audio and returns it in a single HTTP response. There is no streaming and no open connection: one request, one response containing the complete audio file. This suits batch or offline work such as audiobooks, voiceovers, and podcasts, or any workflow where you can wait for the full file before playback.
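The request/response pattern described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the API's actual contract: the URL, the `text` and `voice` fields, the bearer-token header, and the assumption that the response body is raw audio bytes are all placeholders. Check the API reference for the real endpoint, parameters, and response format.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the URL from the API reference.
SYNTHESIZE_URL = "https://api.example.com/v1/synthesize"


def build_request(text, api_key, voice="default"):
    """Build one POST request carrying the full text to synthesize.

    The JSON field names and auth scheme are assumptions for illustration.
    """
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize_to_file(text, api_key, out_path, opener=urllib.request.urlopen):
    """Send the request, wait for the complete audio, and write it to disk.

    Assumes the response body is the audio file itself. `opener` is
    injectable so the function can be exercised without a network call.
    Returns the number of bytes written.
    """
    request = build_request(text, api_key)
    with opener(request) as response:
        audio = response.read()  # returns only after the whole body has arrived
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

A call such as `synthesize_to_file("Chapter one.", api_key, "chapter1.audio")` returns only once the whole file has been written, which is why this pattern fits batch jobs better than live playback.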

For real-time playback or low-latency scenarios, use the Streaming API or WebSocket API. See the latency best practices for details.