Release Notes

May 5, 2026

Realtime TTS-2

Launched Realtime TTS-2 (inworld-tts-2), our most powerful and expressive TTS model:

Natural Language Steering: Direct any voice with bracketed instructions like [say excitedly], [whisper in a hushed style], or free-form directions like [speak as if barely holding back rage]. Covers articulation, intonation, volume, pitch, range, speed, vocal style, and non-verbals ([laugh], [sigh], etc.). See the Steering guide.
Stronger Multilingual Support: Production-quality synthesis across 15 languages, plus experimental support for 90+ additional languages. See Languages.
Cross-Lingual Voice Synthesis: Reuse the same voice across multiple languages. For best results, specify the language field.
Voice Localization: Localize your voice for the most consistent, native-sounding speech in a target language. See Voice Localization.
Delivery Mode: New deliveryMode field (STABLE, BALANCED, CREATIVE) controls the trade-off between consistency and emotional range.
Updated Voice Design: Released an updated version of Voice Design with improved generations. See Voice Design.
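
As a rough illustration of how these options might combine in one request, here is a minimal sketch that only assembles a JSON body. The model ID inworld-tts-2, the language and deliveryMode fields, and the three delivery modes come from the notes above. The helper name build_tts2_request, the keys modelId, voiceId, and text, the voice ID "example-voice", and the language value "en" are assumptions made for illustration, so check the API reference for the real request schema.

```python
import json

# Delivery modes named in the release notes.
DELIVERY_MODES = {"STABLE", "BALANCED", "CREATIVE"}


def build_tts2_request(text, voice_id, language=None, delivery_mode=None):
    """Assemble a request body for Realtime TTS-2 (hypothetical helper).

    The keys "modelId", "voiceId", and "text" are assumed names;
    "language" and "deliveryMode" are the fields named in the notes.
    """
    if delivery_mode is not None and delivery_mode not in DELIVERY_MODES:
        raise ValueError(f"unknown delivery mode: {delivery_mode!r}")

    payload = {
        "modelId": "inworld-tts-2",  # model ID from the release notes
        "voiceId": voice_id,         # assumed key name
        "text": text,                # may contain bracketed steering instructions
    }
    # Only send optional fields when the caller sets them.
    if language is not None:
        payload["language"] = language
    if delivery_mode is not None:
        payload["deliveryMode"] = delivery_mode
    return payload


# Steering instructions are written inline, in square brackets.
body = build_tts2_request(
    text="[say excitedly] We just shipped! [laugh] It took a while.",
    voice_id="example-voice",  # placeholder voice ID
    language="en",             # assumed language code format
    delivery_mode="CREATIVE",
)
print(json.dumps(body, indent=2))
```

The sketch leaves out authentication and the HTTP call, since neither is described in these notes.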

January 21, 2026

Launched Inworld TTS 1.5, our newest generation of realtime TTS models featuring:

Two New Models: Our flagship model inworld-tts-1.5-max is ideal for most use cases, with the best balance of quality and speed. For use cases where latency is the top priority, we also offer inworld-tts-1.5-mini.
Latency Improvements: Our new TTS-1.5 models achieve P90 latency for first audio chunk delivery of under 250 ms for the Max model and under 130 ms for the Mini model, a 4x improvement over TTS-1.
More Expressive and More Stable: TTS-1.5 is 30% more expressive than prior generations and demonstrates a 40% reduction in word error rate.
Additional Languages: We’ve added support for additional languages, including Hindi, Arabic, and Hebrew, bringing the total number of supported languages to 15.
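
To compare your own measurements against the P90 figures above, one option is a nearest-rank percentile over time-to-first-chunk samples. The sketch below is an illustration only: the function name p90 and the sample values are made up, and the notes do not say which percentile method was used for the published numbers.

```python
def p90(samples_ms):
    """Return the nearest-rank 90th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest-rank: 1-based rank = ceil(0.9 * n), computed with integer math.
    rank = (9 * n + 9) // 10
    return ordered[rank - 1]


# Hypothetical first-audio-chunk latencies in milliseconds.
latencies_ms = [180, 195, 201, 210, 215, 222, 230, 238, 244, 310]

value = p90(latencies_ms)
print(f"P90 first-chunk latency: {value} ms")
print("within 250 ms target:", value < 250)
```

A single slow outlier (310 ms here) does not move the P90, which is why this is a more stable target than the maximum.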

August 22, 2025

Released an upgraded version of the Inworld TTS models with higher overall quality.

Speech Quality: Clearer, more natural speech with smoother pacing and more accurate pronunciation.
Voice Similarity: Cloned voices sound closer to the originals, preserving each voice’s unique style.
Non-English Languages: More consistent, reliable output across supported non-English languages.
Custom Pronunciation: New support for inline IPA, giving you control over exact word pronunciations. See Key Features for details.