Real-time, multimodal AI interactions over WebSocket. Enables low-latency speech-to-speech conversations with AI models, supporting both audio and text modalities.
The API maintains a persistent WebSocket connection over which clients configure the session, add conversation items, stream input audio, and trigger model responses, while the server streams back transcriptions, text, audio, and lifecycle events.
This API implements the Realtime interface. Refer to the Realtime overview for hands-on guides.
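For orientation, client events are plain JSON objects written to the socket. Below is a minimal sketch of builders for the three most common events; the helper names are illustrative, not part of any SDK, but the field names follow the example payloads in this reference:

```python
import json

def session_update(instructions, voice="Dennis", speed=1):
    """Build a session.update event setting instructions and output voice."""
    return {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "audio": {"output": {"voice": voice, "speed": speed}},
        },
    }

def user_message(text):
    """Build a conversation.item.create event carrying a user text message."""
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    }

def create_response(instructions=None, modalities=("audio", "text")):
    """Build a response.create event asking the model to respond."""
    response = {"output_modalities": list(modalities)}
    if instructions:
        response["instructions"] = instructions
    return {"type": "response.create", "response": response}

# Each event is serialized to JSON before being sent over the WebSocket.
payload = json.dumps(user_message("Hello, how are you?"))
```

A typical turn sends a session.update once, then alternates conversation.item.create and response.create.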
Client events (payloads labeled by event type; in the schema stubs, "const" stands for the event's fixed "type" string):

session.update
{
  "type": "session.update",
  "session": {
    "instructions": "You are a friendly voice assistant.",
    "audio": {
      "input": {
        "transcription": {
          "model": "assemblyai/universal-streaming-multilingual"
        },
        "turn_detection": {
          "type": "semantic_vad",
          "eagerness": "medium",
          "create_response": true,
          "interrupt_response": true
        }
      },
      "output": {
        "voice": "Dennis",
        "speed": 1
      }
    }
  }
}

conversation.item.create
{
  "type": "conversation.item.create",
  "item": {
    "type": "message",
    "role": "user",
    "content": [
      {
        "type": "input_text",
        "text": "Hello, how are you?"
      }
    ]
  }
}

conversation.item.truncate (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "item_id": "<string>",
  "content_index": 123,
  "audio_end_ms": 123
}

conversation.item.delete (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "item_id": "<string>"
}

conversation.item.retrieve (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "item_id": "<string>"
}

response.create
{
  "type": "response.create",
  "response": {
    "output_modalities": [
      "audio",
      "text"
    ],
    "instructions": "Respond in a cheerful tone."
  }
}

response.cancel (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>"
}

input_audio_buffer.append (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "audio": "<string>"
}

input_audio_buffer.commit (schema)
{
  "const": "<string>",
  "event_id": "<string>"
}

input_audio_buffer.clear (schema)
{
  "const": "<string>",
  "event_id": "<string>"
}

output_audio_buffer.clear (schema)
{
  "const": "<string>",
  "event_id": "<string>"
}

Server events (no examples are published for session.created, session.updated, conversation.item.created, conversation.item.done, response.created, response.done, response.output_item.added, response.output_item.done, response.content_part.added, or response.content_part.done):

error
{
  "event_id": "2c23cfd4-a4b5-4a96-83b8-a6a151f3989e",
  "type": "error",
  "error": {
    "type": "server_error",
    "code": null,
    "message": "Failed to read content stream.",
    "param": null,
    "event_id": null
  }
}

conversation.item.deleted (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "item_id": "<string>"
}

conversation.item.retrieved (schema)
{
  "const": "<string>",
  "event_id": "<string>"
}

conversation.item.truncated (schema)
{
  "const": "<string>",
  "event_id": "<string>"
}

conversation.item.input_audio_transcription.delta (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "item_id": "<string>",
  "content_index": 123,
  "delta": "<string>"
}

conversation.item.input_audio_transcription.completed (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "item_id": "<string>",
  "content_index": 123,
  "transcript": "<string>"
}

response.output_text.delta (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>",
  "item_id": "<string>",
  "output_index": 123,
  "content_index": 123,
  "delta": "<string>"
}

response.output_text.done (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>",
  "item_id": "<string>",
  "output_index": 123,
  "content_index": 123,
  "text": "<string>"
}

response.output_audio_transcript.delta (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>",
  "item_id": "<string>",
  "output_index": 123,
  "content_index": 123,
  "delta": "<string>"
}

response.output_audio_transcript.done (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>",
  "item_id": "<string>",
  "output_index": 123,
  "content_index": 123,
  "transcript": "<string>"
}

response.output_audio.done (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>",
  "item_id": "<string>",
  "output_index": 123,
  "content_index": 123
}

response.function_call_arguments.delta (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>",
  "item_id": "<string>",
  "output_index": 123,
  "content_index": 123,
  "delta": "<string>"
}

response.function_call_arguments.done (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>",
  "item_id": "<string>",
  "output_index": 123,
  "content_index": 123,
  "arguments": "<string>"
}

input_audio_buffer.speech_started (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "audio_start_ms": 123,
  "item_id": "<string>"
}

input_audio_buffer.speech_stopped (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "audio_end_ms": 123,
  "item_id": "<string>"
}

input_audio_buffer.committed (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "item_id": "<string>"
}

The remaining lifecycle events (input_audio_buffer.cleared, input_audio_buffer.timeout_triggered, output_audio_buffer.started, output_audio_buffer.stopped, output_audio_buffer.cleared, rate_limits.updated) share the same minimal schema:
{
  "const": "<string>",
  "event_id": "<string>"
}

Use your API key for authentication. See Authentication for details.
session.update: Update the session configuration. The server responds with a session.updated event.
conversation.item.create: Add a conversation item (message, function call result, etc.).
conversation.item.truncate: Truncate an assistant message's audio.
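When the user interrupts, a client typically truncates the assistant message at the point playback actually stopped. Below is a sketch of computing audio_end_ms from played PCM samples; the 24 kHz sample rate and the "conversation.item.truncate" type string are assumptions for illustration:

```python
def audio_end_ms(samples_played: int, sample_rate_hz: int = 24000) -> int:
    """Convert a count of played PCM samples to milliseconds (assumed 24 kHz)."""
    return samples_played * 1000 // sample_rate_hz

def truncate_event(item_id: str, content_index: int, samples_played: int) -> dict:
    """Build a truncation payload at the current playback position.
    The type string is assumed here, not taken from this reference."""
    return {
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": content_index,
        "audio_end_ms": audio_end_ms(samples_played),
    }
```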
conversation.item.delete: Delete a conversation item by ID.
conversation.item.retrieve: Retrieve a conversation item by ID.
response.create: Trigger a model response. The server streams back response events.
response.cancel: Cancel an in-progress response.
input_audio_buffer.append: Append audio bytes to the input buffer.
input_audio_buffer.commit: Commit the buffered audio as a user message.
input_audio_buffer.clear: Discard all audio in the input buffer.
output_audio_buffer.clear: Clear the server's output audio buffer, stopping playback.
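The input-audio flow (a series of input_audio_buffer.append events followed by input_audio_buffer.commit) can be sketched as follows; the base64 encoding of the audio field and the chunk size are assumptions for illustration:

```python
import base64

def append_events(pcm_bytes: bytes, chunk_size: int = 8192):
    """Split raw audio into input_audio_buffer.append events.
    Base64 encoding and the 8 KiB chunk size are assumptions, not
    documented requirements."""
    for start in range(0, len(pcm_bytes), chunk_size):
        chunk = pcm_bytes[start:start + chunk_size]
        yield {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }

def commit_event():
    """Commit the buffered audio as a user message."""
    return {"type": "input_audio_buffer.commit"}
```

With server-side turn detection enabled (see the session.update example), the commit can instead happen automatically when speech stops.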
Session creation events are not currently supported; the session starts immediately with default configuration. Send a session.update to configure the session.
session.updated: Confirms a session.update was applied.
error: Indicates an error occurred.
conversation.item.created: A new item was added to the conversation.
conversation.item.done: An item finished being populated.
conversation.item.deleted: An item was deleted from the conversation.
conversation.item.retrieved: Response to conversation.item.retrieve.
conversation.item.truncated: An assistant audio item was truncated.
conversation.item.input_audio_transcription.delta: Streaming partial transcription for user audio.
conversation.item.input_audio_transcription.completed: Final transcription for a user audio item.
response.created: A new response was created. Contains the full response object in its initial state.
response.done: The response finished. Contains the completed response object with final status and output.
response.output_item.added: An output item was added to the response.
response.output_item.done: An output item finished.
response.content_part.added: A content part was added to an output item.
response.content_part.done: A content part finished.
response.output_text.delta: Streaming text chunk from the model.
response.output_text.done: Text output finished.
response.output_audio_transcript.delta: Streaming transcript for generated audio.
response.output_audio_transcript.done: Final transcript for generated audio.
response.output_audio.done: Audio output for a content part finished.
response.function_call_arguments.delta: Streaming function call arguments.
response.function_call_arguments.done: Function call arguments finished.
input_audio_buffer.speech_started: Voice activity detected; the user started speaking.
input_audio_buffer.speech_stopped: Voice activity ended; the user stopped speaking.
input_audio_buffer.committed: Buffered audio was committed as a conversation item.
input_audio_buffer.cleared: Input audio buffer was cleared.
input_audio_buffer.timeout_triggered: An idle timeout was triggered on the input buffer.
output_audio_buffer.started: Server started sending output audio.
output_audio_buffer.stopped: Server stopped sending output audio.
output_audio_buffer.cleared: Output audio buffer was cleared.
rate_limits.updated: Reports current rate limit state.
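A receiving client typically dispatches on the event type and accumulates delta events until the matching done event arrives. A minimal sketch for text output, using the field names from the schemas in this reference (the accumulation strategy itself is an assumption):

```python
def accumulate_text(events):
    """Collect response.output_text.delta chunks per item until
    response.output_text.done arrives."""
    partial = {}   # item_id -> text accumulated so far
    finished = {}  # item_id -> final text
    for event in events:
        etype = event.get("type")
        if etype == "response.output_text.delta":
            item = event["item_id"]
            partial[item] = partial.get(item, "") + event["delta"]
        elif etype == "response.output_text.done":
            # Prefer the server's final text; fall back to the accumulation.
            item = event["item_id"]
            finished[item] = event.get("text", partial.get(item, ""))
    return finished

# Simulated server stream for one item.
stream = [
    {"type": "response.output_text.delta", "item_id": "i1", "delta": "Hel"},
    {"type": "response.output_text.delta", "item_id": "i1", "delta": "lo!"},
    {"type": "response.output_text.done", "item_id": "i1", "text": "Hello!"},
]
```

The same pattern applies to response.output_audio_transcript.delta/.done and response.function_call_arguments.delta/.done.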