Intro to Realtime TTS - Inworld AI Documentation

Inworld’s Realtime TTS models offer ultra-realistic, context-aware speech synthesis and precise voice cloning with zero data retention, so developers can build natural, engaging experiences with human-like speech at an accessible price point. The models are available through the API (streaming and non-streaming) or the TTS Playground.

Developer quickstart

Learn how to make your first API call with a guided tutorial.

TTS Playground

Try different TTS models and voice cloning in TTS Playground.

Code Examples

Browse ready-to-use GitHub samples for common use cases.

Using AI to code? Paste https://docs.inworld.ai/llms.txt into your assistant so it knows every page on this site. Want live search? Add the MCP server.

Models

Realtime TTS-2

Our flagship, top-ranked model — the best quality plus steerability

Natural language steering for more contextually aware speech
Support for 200+ languages and locales
Optimized for real-time use
High-quality instant voice cloning
Enhanced timestamps with phonetic details and visemes

Realtime TTS 1.5 Max

Rich, expressive speech with maximum stability

Support for 15 languages
Optimized for real-time use (<200 ms median latency)
High-quality instant voice cloning

Realtime TTS 1.5 Mini

Our ultra-fast model — for when latency is the top priority

Ultra-low latency (~120 ms median)
Support for 15 languages
High-quality instant voice cloning

See the Models page for model IDs and full details.

Features

Feature	Realtime TTS-2	Realtime TTS 1.5 Max	Realtime TTS 1.5 Mini
Quality	Top-ranked flagship — best quality and steerability	High quality, maximum stability	Great quality at ultra-low latency
P50 Latency	200 ms	200 ms	120 ms
Instant voice cloning
Professional voice cloning
Custom pronunciation
Multilingual	200+ languages	15 languages	15 languages
Steering
Pause controls
Timestamp alignment
On-premises deployments
Zero data retention
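
As a rough illustration of how the latency and language figures in the table above might drive model choice, here is a minimal Python sketch. The `ModelInfo` class, the `MODELS` tuple, and the `pick_model` helper are hypothetical and are not part of any Inworld SDK. The names are the display names from this page, not API model IDs (see the Models page for those), and the language count is only a crude proxy for whether a specific language is supported.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    name: str            # display name from this page, NOT an API model ID
    p50_latency_ms: int  # "P50 Latency" row of the Features table
    min_languages: int   # "Multilingual" row; "200+" is stored as 200


# Listed in the same order as the Features table (flagship first).
MODELS = (
    ModelInfo("Realtime TTS-2", 200, 200),
    ModelInfo("Realtime TTS 1.5 Max", 200, 15),
    ModelInfo("Realtime TTS 1.5 Mini", 120, 15),
)


def pick_model(max_p50_ms: int, languages_needed: int = 1) -> Optional[ModelInfo]:
    """Return the first listed model that meets both constraints, or None."""
    for model in MODELS:
        fast_enough = model.p50_latency_ms <= max_p50_ms
        broad_enough = model.min_languages >= languages_needed
        if fast_enough and broad_enough:
            return model
    return None
```

For example, `pick_model(150)` selects Realtime TTS 1.5 Mini, since it is the only model listed with a P50 latency at or below 150 ms, while `pick_model(150, 50)` returns `None` because no listed model combines that latency with coverage of 50 or more languages.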
