To celebrate the launch of Inworld TTS, for a limited time, all usage in TTS Playground is free.
Inworld’s text-to-speech (TTS) models offer ultra-realistic, context-aware speech synthesis and precise voice cloning capabilities, enabling developers to build natural and engaging experiences with human-like speech quality at an accessible price point. Our models can be accessed via API or the TTS Playground.
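
As a rough illustration of API access, the sketch below builds (but does not send) a JSON request for speech synthesis. Everything specific in it is an assumption made for illustration: the endpoint URL is a placeholder, and the field names (`text`, `voice_id`, `model_id`) and bearer-token header are hypothetical. The API reference is the source for the actual endpoint, parameters, and authentication scheme.

```python
import json
import urllib.request

# Placeholder endpoint, NOT the real URL; take the real one from the API reference.
TTS_ENDPOINT = "https://api.example.com/tts"


def build_tts_request(text, voice_id, model_id, api_key):
    """Build (but do not send) a JSON POST request for speech synthesis.

    The payload field names and the auth header are assumptions for
    illustration only.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    payload = {
        "text": text,          # the text to synthesize
        "voice_id": voice_id,  # hypothetical field name
        "model_id": model_id,  # hypothetical field name
    }
    return urllib.request.Request(
        TTS_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
```

Once the real endpoint and field names are filled in, a request built this way could be sent with `urllib.request.urlopen(req)`. Keeping request construction separate from sending makes the payload easy to inspect and test without network access.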

Models

Inworld TTS

Our flagship model, offering cost-efficient, ultra-realistic speech

  • Rich, expressive speech with low latency
  • Support for 12 languages
  • Optimized for real-time use
  • Available on Portal and API
  • Experimental support for audio markups

Inworld TTS Max

Our most powerful and expressive model

  • More expressive, contextually aware speech
  • Stronger multilingual capabilities
  • Available on Portal and API
  • Experimental support for audio markups

Features

 Available in Preview
Radically accessible pricing (see Pricing)
State-of-the-art quality
Real-time latency
Free instant (zero-shot) voice cloning
Professional voice cloning
Custom pronunciation
Multilingual
Crosslingual (using the same voice across multiple languages)
Audio markups for emotion, style, and non-verbals
Multiple model sizes for every use case
Embedded safeguards
SOC 2 Type II
On-premises deployments
Open-source training and modeling code