Timestamp alignment currently supports English only; other languages are experimental.
Use the `timestampType` request parameter to control granularity:
- `WORD`: Return timestamps for each word and, for TTS 1.5 models, detailed phoneme-level timing with viseme symbols.
- `CHARACTER`: Return timestamps for each character or punctuation mark.
Enabling timestamp alignment can increase latency (especially for the non-streaming endpoint).
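
A minimal sketch of a request body that turns on word-level alignment. Only `timestampType` and its `WORD`/`CHARACTER` values come from this page. The `text`, `voiceId`, and `modelId` field names, the `build_tts_request` helper, and the `YOUR_VOICE_ID` placeholder are assumptions for illustration, so check them against the API reference before relying on them. Validating the value client-side turns a typo into an immediate error instead of a failed request.

```python
import json

# Only timestampType and its two values come from the docs; the other field
# names and this helper are hypothetical.
VALID_TIMESTAMP_TYPES = ("WORD", "CHARACTER")


def build_tts_request(text: str, voice_id: str, model_id: str,
                      timestamp_type: str = "WORD") -> dict:
    """Return a JSON-serializable request body with timestamp alignment enabled."""
    if timestamp_type not in VALID_TIMESTAMP_TYPES:
        raise ValueError(
            f"timestampType must be one of {VALID_TIMESTAMP_TYPES}, got {timestamp_type!r}"
        )
    return {
        "text": text,                     # assumed field name
        "voiceId": voice_id,              # assumed field name
        "modelId": model_id,              # assumed field name
        "timestampType": timestamp_type,  # documented parameter
    }


body = build_tts_request("Hello world.", "YOUR_VOICE_ID", "inworld-tts-1.5-mini")
print(json.dumps(body, indent=2))
```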
The alignment data in the response depends on `timestampType`:
- `WORD`: `timestampInfo.wordAlignment` with `words`, `wordStartTimeSeconds`, and `wordEndTimeSeconds`
  - For TTS 1.5 models, also `phoneticDetails`, containing detailed phoneme-level timing with viseme symbols
- `CHARACTER`: `timestampInfo.characterAlignment` with `characters`, `characterStartTimeSeconds`, and `characterEndTimeSeconds`
Phoneme and viseme timings (`phoneticDetails`) are currently returned only for `WORD` alignment, not `CHARACTER`.
Streaming behavior
Each streamed chunk includes the alignment data for that specific chunk, so audio and alignment arrive together, in sync.
Response structure
TTS 1.5 models (`inworld-tts-1.5-mini`, `inworld-tts-1.5-max`)
Returns enhanced alignment data with phonetic details: detailed phoneme-level timing with viseme symbols for precise lip-sync animation.
Phonetic details structure
Each entry in `phoneticDetails` contains:

| Field | Description |
|---|---|
| `wordIndex` | Index of the word this phonetic detail belongs to (0-based). |
| `phones` | Array of phonemes that make up this word. |
| `isPartial` | `true` when the server considers the word potentially unstable (e.g., the last word in a non-final streaming update). Clients may choose to delay processing partial words until `isPartial` becomes `false`. |
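
One way to act on `isPartial` is to buffer entries until a stable version arrives. This sketch assumes that `wordIndex` identifies the same word across streaming updates, that a newer entry for a word supersedes an older partial one, and that a missing `isPartial` means `false`. `PhoneticDetailBuffer` is a hypothetical helper, not part of the API; its `flush` method releases anything still partial when the stream ends, so the last word is not lost.

```python
from collections.abc import Iterable


class PhoneticDetailBuffer:
    """Hold phoneticDetails entries back until the server marks them stable."""

    def __init__(self) -> None:
        self._pending: dict[int, dict] = {}  # wordIndex -> newest partial entry
        self._emitted: set[int] = set()      # wordIndex values already released

    def update(self, details: Iterable[dict]) -> list[dict]:
        """Ingest one update's phoneticDetails; return newly final entries in word order."""
        ready = []
        for d in details:
            idx = d["wordIndex"]
            if idx in self._emitted:
                continue  # this word was already released to the caller
            if d.get("isPartial", False):
                self._pending[idx] = d  # a newer partial replaces the older one
            else:
                self._pending.pop(idx, None)
                self._emitted.add(idx)
                ready.append(d)
        return sorted(ready, key=lambda d: d["wordIndex"])

    def flush(self) -> list[dict]:
        """At end of stream, release whatever is still marked partial."""
        rest = sorted(self._pending.values(), key=lambda d: d["wordIndex"])
        self._emitted.update(self._pending)
        self._pending.clear()
        return rest
```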

Each entry in `phones` contains:

| Field | Description |
|---|---|
| `phoneSymbol` | The phoneme symbol in IPA notation. |
| `startTimeSeconds` | Start time of the phoneme, in seconds. May be omitted for the first phoneme of a word. |
| `durationSeconds` | Duration of the phoneme, in seconds. |
| `visemeSymbol` | The viseme symbol for lip-sync animation. |
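
Because `startTimeSeconds` may be omitted for a word's first phoneme, a client has to fill it in before placing phones on a timeline. This sketch assumes that phone start times use the same clock as `wordStartTimeSeconds`, that a missing first start equals the word's `wordStartTimeSeconds` entry, and that any later phone without a start begins where the previous one ended. `phone_timeline` and `TimedPhone` are hypothetical names.

```python
from typing import NamedTuple


class TimedPhone(NamedTuple):
    word_index: int
    phone: str    # phoneSymbol (IPA)
    viseme: str   # visemeSymbol
    start: float  # seconds
    end: float    # seconds


def phone_timeline(phonetic_details: list[dict], word_starts: list[float]) -> list[TimedPhone]:
    """Flatten phoneticDetails into phone intervals on one timeline.

    Assumption: a missing startTimeSeconds means "start at the word's start"
    for the first phone, or "start where the previous phone ended" otherwise.
    """
    timeline = []
    for detail in phonetic_details:
        idx = detail["wordIndex"]
        cursor = word_starts[idx]  # wordStartTimeSeconds[wordIndex]
        for p in detail["phones"]:
            start = p.get("startTimeSeconds", cursor)
            end = start + p["durationSeconds"]
            timeline.append(TimedPhone(idx, p["phoneSymbol"], p["visemeSymbol"], start, end))
            cursor = end
    return timeline
```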
Viseme symbols
The following viseme symbols are used for lip-sync animation:

| Viseme | Description |
|---|---|
| `aei` | Open mouth vowels (a, e, i, ə, ʌ, æ, ɑ, etc.) |
| `o` | Rounded vowels (o, ʊ, əʊ, oʊ, etc.) |
| `ee` | Front vowels (i, ɪ, eɪ, etc.) |
| `bmp` | Bilabial consonants (b, m, p) |
| `fv` | Labiodental consonants (f, v) |
| `l` | Lateral consonant (l) |
| `r` | Rhotic sounds (r, ɝ, ɚ) |
| `th` | Dental fricatives (θ, ð) |
| `qw` | Rounded consonants (w, ʍ) |
| `cdgknstxyz` | Alveolar/velar consonants (c, d, g, k, n, s, t, x, y, z) |
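
A minimal sketch of driving lip-sync from these symbols: build a track from `(start, end, visemeSymbol)` intervals, merge back-to-back repeats of the same viseme, and look up the active viseme at a playback time with `bisect`. The `VisemeTrack` class and the `"rest"` pose used between phones are hypothetical; only the ten symbols come from the table above. Binary search keeps each per-frame lookup at O(log n), which helps when sampling at display rate over long utterances.

```python
import bisect

# Symbols from the viseme table; REST is a hypothetical neutral pose, not a documented viseme.
VISEMES = {"aei", "o", "ee", "bmp", "fv", "l", "r", "th", "qw", "cdgknstxyz"}
REST = "rest"


class VisemeTrack:
    """Lip-sync track built from (start, end, visemeSymbol) intervals in seconds."""

    def __init__(self, intervals: list[tuple[float, float, str]]) -> None:
        self._starts: list[float] = []
        self._ends: list[float] = []
        self._visemes: list[str] = []
        for start, end, viseme in sorted(intervals):  # order by start time
            if viseme not in VISEMES:
                raise ValueError(f"unknown viseme symbol: {viseme!r}")
            # Merge an interval into the previous keyframe when it continues the same viseme.
            if self._visemes and self._visemes[-1] == viseme and abs(self._ends[-1] - start) < 1e-9:
                self._ends[-1] = end
                continue
            self._starts.append(start)
            self._ends.append(end)
            self._visemes.append(viseme)

    def at(self, t: float) -> str:
        """Return the viseme active at time t, or REST in gaps and outside the track."""
        i = bisect.bisect_right(self._starts, t) - 1
        if i >= 0 and t < self._ends[i]:
            return self._visemes[i]
        return REST

    def __len__(self) -> int:
        return len(self._visemes)
```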