Bidirectional streaming API for real-time speech-to-text transcription over WebSocket.
This method accepts streaming audio input and returns recognized text incrementally, emitting each text chunk as soon as it is ready. All audio chunks sent in a stream are expected to belong to a single voice input. It is suitable for live conversations, microphone input, and other streaming audio sources.
To use the API:
1. Send a transcribe_config message first to configure the session (model, language, audio encoding, etc.).
2. Send audio_chunk messages containing raw audio bytes.
3. Receive transcription results as they become available, including both interim (partial) and final results.
4. Send end_turn to signal the end of a speaker’s turn.
5. Send close_stream when the session is finished.
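The steps above can be sketched as the ordered sequence of messages a client sends, plus a small handler that folds incoming interim and final results into a transcript. This is a minimal sketch under stated assumptions, not the documented wire format: the JSON shape (a "type" field, a base64-encoded "audio" field, result objects with "text" and "is_final") and the config keys and values (model, language, encoding, "example-model", "pcm_s16le") are hypothetical placeholders, as is the 3200-byte chunk size. It uses only the standard library and opens no connection; in practice each message would be sent over the WebSocket with a client library of your choice.

```python
import base64
import json
from typing import Iterator


def session_messages(audio: bytes, chunk_size: int = 3200,
                     model: str = "example-model",
                     language: str = "en",
                     encoding: str = "pcm_s16le") -> Iterator[str]:
    """Yield the client-to-server messages for one speaker turn, in order.

    HYPOTHETICAL schema: each message is a JSON object with a "type" field.
    Audio is base64-encoded here only because JSON cannot carry raw bytes.
    chunk_size=3200 is an assumed value (100 ms of 16 kHz, 16-bit mono PCM).
    """
    # Configure the session before sending any audio.
    yield json.dumps({"type": "transcribe_config", "model": model,
                      "language": language, "encoding": encoding})
    # Stream consecutive pieces of the same voice input.
    for start in range(0, len(audio), chunk_size):
        piece = audio[start:start + chunk_size]
        yield json.dumps({"type": "audio_chunk",
                          "audio": base64.b64encode(piece).decode("ascii")})
    # Mark the end of this speaker's turn.
    yield json.dumps({"type": "end_turn"})
    # Close the stream once no more audio will be sent.
    yield json.dumps({"type": "close_stream"})


class TranscriptAccumulator:
    """Fold incoming results into a running transcript.

    HYPOTHETICAL result shape: {"text": str, "is_final": bool}.
    Each interim result replaces the previous one; final results are kept.
    """

    def __init__(self) -> None:
        self.final_parts: list[str] = []
        self.interim = ""

    def handle(self, raw: str) -> None:
        result = json.loads(raw)
        if result.get("is_final"):
            self.final_parts.append(result["text"])
            self.interim = ""  # a final result supersedes the pending partial
        else:
            self.interim = result["text"]

    @property
    def transcript(self) -> str:
        # Committed text followed by the latest partial, if any.
        parts = self.final_parts + ([self.interim] if self.interim else [])
        return " ".join(parts)
```

For example, 8000 bytes of audio with the assumed chunk size produce one config message, three audio_chunk messages, then end_turn and close_stream. Keeping interim text separate from committed final text lets a UI redraw the partial line on every update without duplicating words once the final result arrives.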