Transcribe audio to text using leading STT providers through a single API.
Inworld’s Speech-to-Text (STT) API provides a unified integration point for industry-leading transcription providers. You get consistent authentication, request formatting, and response handling across providers, without managing multiple SDKs or credentials.
The API supports both synchronous transcription for complete audio files and real-time bidirectional streaming over WebSocket for live audio.
Recommended audio defaults: 16,000 Hz sample rate, 16-bit depth, mono. For container formats (MP3, FLAC, OGG_OPUS), sampleRateHertz is optional because the API auto-detects it from the file header.
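As a rough illustration of the audio guidance above, the sketch below checks a local WAV file against the recommended defaults using Python's standard `wave` module, and builds a small settings dictionary that leaves out sampleRateHertz for container formats. Only `sampleRateHertz` and the format names MP3, FLAC, and OGG_OPUS come from this page. The helper names, the `"encoding"` key, and the `"LINEAR16"` label for raw PCM are assumptions made for illustration and may not match the actual request schema.

```python
import wave

# Recommended defaults from this page.
RECOMMENDED_SAMPLE_RATE_HZ = 16_000
RECOMMENDED_SAMPLE_WIDTH_BYTES = 2  # 16-bit depth
RECOMMENDED_CHANNELS = 1            # mono

# Container formats whose header carries the sample rate.
CONTAINER_FORMATS = {"MP3", "FLAC", "OGG_OPUS"}


def wav_matches_recommended_defaults(path):
    """Return True if the WAV file is 16 kHz, 16-bit, mono."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == RECOMMENDED_SAMPLE_RATE_HZ
            and wav.getsampwidth() == RECOMMENDED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == RECOMMENDED_CHANNELS
        )


def build_audio_config(encoding, sample_rate_hertz=None):
    """Build a settings dict (hypothetical shape, not the confirmed schema).

    An explicit sample rate is always passed through. Without one,
    container formats omit sampleRateHertz so it can be auto-detected,
    and any other encoding falls back to the recommended 16,000 Hz.
    """
    config = {"encoding": encoding}  # "encoding" is an assumed key name
    if sample_rate_hertz is not None:
        config["sampleRateHertz"] = sample_rate_hertz
    elif encoding not in CONTAINER_FORMATS:
        config["sampleRateHertz"] = RECOMMENDED_SAMPLE_RATE_HZ
    return config
```

For example, `build_audio_config("MP3")` leaves the sample rate out entirely, while `build_audio_config("LINEAR16")` fills in 16,000 Hz. Check the API reference for the real field names before sending a request.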