Timestamp alignment returns timing information that matches the generated audio, which is useful for experiences such as word highlighting, karaoke-style captions, and lip sync. Set the timestampType request parameter to control granularity:
  • WORD: Return timestamps for each word
  • CHARACTER: Return timestamps for each character or punctuation mark
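As a minimal sketch of setting this parameter, the helper below builds a request body and rejects values other than the two listed above. Only timestampType and its values WORD and CHARACTER come from this page; the function name build_request_body and the text field are hypothetical placeholders for the rest of the request, so check the API reference for the actual request shape.

```python
import json

# The two values documented for the timestampType request parameter.
VALID_TIMESTAMP_TYPES = {"WORD", "CHARACTER"}


def build_request_body(text, timestamp_type=None):
    """Build a JSON-serializable request body (hypothetical shape).

    Only `timestampType` is taken from the documentation; `text` is an
    assumed placeholder field for the content to synthesize.
    """
    body = {"text": text}  # assumed field name
    if timestamp_type is not None:
        if timestamp_type not in VALID_TIMESTAMP_TYPES:
            raise ValueError(f"unsupported timestampType: {timestamp_type!r}")
        body["timestampType"] = timestamp_type
    return body


if __name__ == "__main__":
    # Serialize the body as it might be sent in a request.
    print(json.dumps(build_request_body("Hello, world.", "WORD")))
```

Omitting timestamp_type leaves the parameter out of the body entirely, which in this sketch stands for a request with timestamp alignment not enabled.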
When enabled, the response includes timestamp arrays that correspond to the selected type:
  • WORD: timestampInfo.wordAlignment with words, wordStartTimeSeconds, wordEndTimeSeconds
  • CHARACTER: timestampInfo.characterAlignment with characters, characterStartTimeSeconds, characterEndTimeSeconds
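The word-level arrays above can be combined into one record per word, which is the form most highlighting or caption code needs. The sketch below assumes the three arrays are index-aligned (entry i of each array describes the same word) and that the response has already been parsed into a dictionary. The names Span, word_spans, and active_word are illustrative, and the sample data is made up rather than real API output.

```python
from typing import NamedTuple


class Span(NamedTuple):
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def word_spans(response):
    """Zip the parallel wordAlignment arrays into one Span per word."""
    alignment = response["timestampInfo"]["wordAlignment"]
    words = alignment["words"]
    starts = alignment["wordStartTimeSeconds"]
    ends = alignment["wordEndTimeSeconds"]
    if not (len(words) == len(starts) == len(ends)):
        raise ValueError("alignment arrays have different lengths")
    return [Span(w, s, e) for w, s, e in zip(words, starts, ends)]


def active_word(spans, t):
    """Return the Span being spoken at playback time t, or None."""
    for span in spans:
        if span.start <= t < span.end:
            return span
    return None


# Made-up sample data for illustration only.
sample = {
    "timestampInfo": {
        "wordAlignment": {
            "words": ["Hello", "world"],
            "wordStartTimeSeconds": [0.0, 0.5],
            "wordEndTimeSeconds": [0.4, 0.9],
        }
    }
}
spans = word_spans(sample)
```

Calling active_word(spans, t) with the current playback position gives the word to highlight, or None during a gap between words. The same approach should apply to characterAlignment by swapping in the character field names.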
See the API reference for full details.
Timestamp alignment is currently supported for English only; support for other languages is experimental.
Enabling timestamp alignment slightly increases latency; internal experiments show an average increase of about 100 ms.