The Inworld STT model (inworld/inworld-stt-1) supports 30 languages for speech recognition.

Supported languages

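| Language | Code |
| --- | --- |
| Arabic | ar |
| Cantonese | yue |
| Chinese | zh |
| Czech | cs |
| Danish | da |
| Dutch | nl |
| English | en |
| Filipino | fil |
| Finnish | fi |
| French | fr |
| German | de |
| Greek | el |
| Hindi | hi |
| Hungarian | hu |
| Indonesian | id |
| Italian | it |
| Japanese | ja |
| Korean | ko |
| Macedonian | mk |
| Malay | ms |
| Persian | fa |
| Polish | pl |
| Portuguese | pt |
| Romanian | ro |
| Russian | ru |
| Spanish | es |
| Swedish | sv |
| Thai | th |
| Turkish | tr |
| Vietnamese | vi |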

Specifying a language

The language field is a language hint: it tells the model which language to prefer, but the hint is not guaranteed to be respected. The model automatically detects the spoken language from the audio, so speakers can switch languages in the middle of a conversation without you changing the hint. The field accepts the language codes listed in the table above. Most are two-letter ISO 639-1 codes (e.g., en, ja); yue and fil are three-letter ISO 639 codes.
BCP-47 codes (e.g., en-US, ja-JP) are also accepted and are automatically converted to the base ISO 639 language code; for example, en-US becomes en. Regional variants do not affect recognition behavior.
If you know the primary language of the audio in advance, providing a hint generally produces more accurate transcripts, especially for short utterances where auto-detection may not have enough context.
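
The conversion from a BCP-47 tag to its base code can be mirrored on the client, for example to validate user input before sending a request. The following is a minimal Python sketch, not the service's actual logic: the helper names normalize_language_hint and build_stt_config are hypothetical, and the "model" key in the config dictionary is an assumption for illustration. Only the language field name, the model ID, and the code list come from this page.

```python
from typing import Optional

# Codes copied from the supported-languages table on this page.
SUPPORTED_CODES = frozenset({
    "ar", "yue", "zh", "cs", "da", "nl", "en", "fil", "fi", "fr",
    "de", "el", "hi", "hu", "id", "it", "ja", "ko", "mk", "ms",
    "fa", "pl", "pt", "ro", "ru", "es", "sv", "th", "tr", "vi",
})


def normalize_language_hint(tag: Optional[str]) -> Optional[str]:
    """Reduce a BCP-47 tag such as "en-US" to its base language code.

    Returns None when no usable hint is available, so the caller can
    omit the language field and rely on automatic detection.
    """
    if not tag:
        return None
    # The primary language subtag is everything before the first hyphen.
    base = tag.strip().split("-", 1)[0].lower()
    return base if base in SUPPORTED_CODES else None


def build_stt_config(tag: Optional[str] = None) -> dict:
    """Hypothetical request config; the "model" key is an assumption."""
    config = {"model": "inworld/inworld-stt-1"}
    hint = normalize_language_hint(tag)
    if hint is not None:
        # Only send a hint when the language is known and supported.
        config["language"] = hint
    return config
```

With this sketch, build_stt_config("ja-JP") carries the hint ja, while build_stt_config() leaves the language field out entirely and relies on auto-detection.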

Third-party provider languages

The Inworld STT API also supports models from third-party providers, each with their own language coverage. See the provider documentation for details:
| Provider | Models | Language documentation |
| --- | --- | --- |
| Groq | groq/whisper-large-v3 | Whisper: supported languages |
| AssemblyAI | assemblyai/universal-streaming-multilingual, assemblyai/u3-rt-pro, assemblyai/whisper-rt | AssemblyAI: supported languages |
| Soniox | soniox/stt-rt-v4 | Soniox: supported languages |
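
Because language coverage depends on the provider, a client may want to check which provider a model ID belongs to before applying the Inworld language table. The sketch below assumes that every model ID follows the provider/model pattern seen in the IDs listed on this page; that pattern is inferred from those IDs rather than documented here, and the function names are hypothetical.

```python
from typing import Tuple


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split an ID such as "groq/whisper-large-v3" into (provider, model)."""
    provider, sep, name = model_id.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, name


def uses_inworld_language_table(model_id: str) -> bool:
    """True when the supported-languages table on this page applies.

    Third-party models have their own coverage, documented by each provider.
    """
    provider, _ = split_model_id(model_id)
    return provider == "inworld"
```

For example, uses_inworld_language_table("soniox/stt-rt-v4") is False, which signals that the Soniox language documentation should be consulted instead.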

Next steps

Developer Quickstart

Make your first STT API call and get a transcript.

API Reference

View the complete API specification.