Inworld’s Realtime TTS models offer ultra-realistic, context-aware speech synthesis, precise voice cloning, and zero data retention. Developers can use them to build natural, engaging experiences with human-like speech at an accessible price. The models are available through the API (streaming and non-streaming) and in the TTS Playground.

Developer quickstart

Learn how to make your first API call with a guided tutorial.

TTS Playground

Try different TTS models and voice cloning in TTS Playground.

Code Examples

Browse ready-to-use GitHub samples for common use cases.

Models

Realtime TTS 1.5 Max

Our flagship model, delivering the best balance of quality and speed.

  • Rich, expressive, contextually aware speech
  • Support for 15 languages
  • Optimized for real-time use (<200ms median latency)
  • High quality instant voice cloning

Realtime TTS 1.5 Mini

Our ultra-fast, most cost-efficient model. For when latency is the top priority.

  • Ultra-low latency (~120ms median latency)
  • Support for 15 languages
  • Radically affordable pricing
  • High quality instant voice cloning

Features

Feature                        Realtime TTS 1.5 Max            Realtime TTS 1.5 Mini
Radically accessible pricing   See pricing                     See pricing
Quality                        #1 ranked, maximum stability    #1 ranked
P50 latency                    200 ms                          120 ms
Instant voice cloning
Professional voice cloning
Custom pronunciation
Multilingual                   15 languages                    15 languages
Pause controls
Timestamp alignment
On-premises deployments
Zero data retention
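The choice between Max and Mini mostly comes down to latency. As a rough illustration, the helper below encodes the P50 figures from the table and returns the flagship Max model when its median latency fits a caller's latency budget, falling back to Mini otherwise. `TTSModel` and `pick_model` are hypothetical names for this sketch and are not part of any Inworld SDK or API. It also assumes that Max should be preferred whenever it fits. P50 is a median, so individual requests can exceed these numbers, and a real selection should also weigh quality and price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTSModel:
    """Hypothetical record pairing a model name with its P50 latency."""
    name: str
    p50_latency_ms: int  # median latency, not a worst-case bound


# Figures taken from the Features table. Ordered from preferred (Max, the
# flagship model) to fastest (Mini); this ordering is an assumption of the sketch.
MODELS = (
    TTSModel("Realtime TTS 1.5 Max", 200),
    TTSModel("Realtime TTS 1.5 Mini", 120),
)


def pick_model(latency_budget_ms: int) -> str:
    """Return the first model in preference order whose median latency fits the budget."""
    for model in MODELS:
        if model.p50_latency_ms <= latency_budget_ms:
            return model.name
    # No model's median latency fits: fail loudly instead of guessing.
    raise ValueError(f"no model has a median latency within {latency_budget_ms} ms")
```

For example, a 150 ms budget selects Mini, while a 250 ms budget selects Max. The function raises `ValueError` when no model's median latency fits, rather than silently returning a model that is likely to miss the budget.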