Real-time, multimodal AI interactions over WebSocket. Enables low-latency speech-to-speech conversations with AI models, supporting both audio and text modalities.
The API maintains a persistent WebSocket connection over which clients configure the session, add conversation items, stream input audio, and trigger model responses, while the server streams back transcriptions, text, audio, and lifecycle events.
This API implements the Realtime interface. Refer to the Realtime overview for hands-on guides.
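For orientation, client events are plain JSON objects written to the socket. Below is a minimal sketch of builders for the three most common events; the helper names are illustrative, not part of any SDK, but the field names follow the example payloads in this reference:

```python
import json

def session_update(instructions, voice="Dennis", speed=1):
    """Build a session.update event setting instructions and output voice."""
    return {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "audio": {"output": {"voice": voice, "speed": speed}},
        },
    }

def user_message(text):
    """Build a conversation.item.create event carrying a user text message."""
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    }

def create_response(instructions=None, modalities=("audio", "text")):
    """Build a response.create event asking the model to respond."""
    response = {"output_modalities": list(modalities)}
    if instructions:
        response["instructions"] = instructions
    return {"type": "response.create", "response": response}

# Each event is serialized to JSON before being sent over the WebSocket.
payload = json.dumps(user_message("Hello, how are you?"))
```

A typical turn sends a session.update once, then alternates conversation.item.create and response.create.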
Client events (payloads labeled by event type; in the schema stubs, "const" stands for the event's fixed "type" string):

session.update
{
  "type": "session.update",
  "session": {
    "instructions": "You are a friendly voice assistant.",
    "audio": {
      "input": {
        "transcription": {
          "model": "assemblyai/universal-streaming-multilingual"
        },
        "turn_detection": {
          "type": "semantic_vad",
          "eagerness": "medium",
          "create_response": true,
          "interrupt_response": true
        }
      },
      "output": {
        "voice": "Dennis",
        "speed": 1
      }
    }
  }
}

conversation.item.create
{
  "type": "conversation.item.create",
  "item": {
    "type": "message",
    "role": "user",
    "content": [
      {
        "type": "input_text",
        "text": "Hello, how are you?"
      }
    ]
  }
}

conversation.item.truncate (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "item_id": "<string>",
  "content_index": 123,
  "audio_end_ms": 123
}

conversation.item.delete (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "item_id": "<string>"
}

conversation.item.retrieve (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "item_id": "<string>"
}

response.create
{
  "type": "response.create",
  "response": {
    "output_modalities": [
      "audio",
      "text"
    ],
    "instructions": "Respond in a cheerful tone."
  }
}

response.cancel (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>"
}

input_audio_buffer.append (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "audio": "<string>"
}

input_audio_buffer.commit (schema)
{
  "const": "<string>",
  "event_id": "<string>"
}

input_audio_buffer.clear (schema)
{
  "const": "<string>",
  "event_id": "<string>"
}

output_audio_buffer.clear (schema)
{
  "const": "<string>",
  "event_id": "<string>"
}

Server events (no examples are published for session.created, session.updated, conversation.item.created, conversation.item.done, response.created, response.done, response.output_item.added, response.output_item.done, response.content_part.added, or response.content_part.done):

error
{
  "event_id": "2c23cfd4-a4b5-4a96-83b8-a6a151f3989e",
  "type": "error",
  "error": {
    "type": "server_error",
    "code": null,
    "message": "Failed to read content stream.",
    "param": null,
    "event_id": null
  }
}

conversation.item.deleted (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "item_id": "<string>"
}

conversation.item.retrieved (schema)
{
  "const": "<string>",
  "event_id": "<string>"
}

conversation.item.truncated (schema)
{
  "const": "<string>",
  "event_id": "<string>"
}

conversation.item.input_audio_transcription.delta (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "item_id": "<string>",
  "content_index": 123,
  "delta": "<string>"
}

conversation.item.input_audio_transcription.completed (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "item_id": "<string>",
  "content_index": 123,
  "transcript": "<string>"
}

response.output_text.delta (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>",
  "item_id": "<string>",
  "output_index": 123,
  "content_index": 123,
  "delta": "<string>"
}

response.output_text.done (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>",
  "item_id": "<string>",
  "output_index": 123,
  "content_index": 123,
  "text": "<string>"
}

response.output_audio_transcript.delta (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>",
  "item_id": "<string>",
  "output_index": 123,
  "content_index": 123,
  "delta": "<string>"
}

response.output_audio_transcript.done (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>",
  "item_id": "<string>",
  "output_index": 123,
  "content_index": 123,
  "transcript": "<string>"
}

response.output_audio.done (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>",
  "item_id": "<string>",
  "output_index": 123,
  "content_index": 123
}

response.function_call_arguments.delta (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>",
  "item_id": "<string>",
  "output_index": 123,
  "content_index": 123,
  "delta": "<string>"
}

response.function_call_arguments.done (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "response_id": "<string>",
  "item_id": "<string>",
  "output_index": 123,
  "content_index": 123,
  "arguments": "<string>"
}

input_audio_buffer.speech_started (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "audio_start_ms": 123,
  "item_id": "<string>"
}

input_audio_buffer.speech_stopped (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "audio_end_ms": 123,
  "item_id": "<string>"
}

input_audio_buffer.committed (schema)
{
  "const": "<string>",
  "event_id": "<string>",
  "item_id": "<string>"
}

The remaining lifecycle events (input_audio_buffer.cleared, input_audio_buffer.timeout_triggered, output_audio_buffer.started, output_audio_buffer.stopped, output_audio_buffer.cleared, rate_limits.updated) share the same minimal schema:
{
  "const": "<string>",
  "event_id": "<string>"
}

Use your API key for authentication. See Authentication for details.
session.update: Update the session configuration. The server responds with a session.updated event.
conversation.item.create: Add a conversation item (message, function call result, etc.).
conversation.item.truncate: Truncate an assistant message's audio.
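When the user interrupts, a client typically truncates the assistant message at the point playback actually stopped. Below is a sketch of computing audio_end_ms from played PCM samples; the 24 kHz sample rate and the "conversation.item.truncate" type string are assumptions for illustration:

```python
def audio_end_ms(samples_played: int, sample_rate_hz: int = 24000) -> int:
    """Convert a count of played PCM samples to milliseconds (assumed 24 kHz)."""
    return samples_played * 1000 // sample_rate_hz

def truncate_event(item_id: str, content_index: int, samples_played: int) -> dict:
    """Build a truncation payload at the current playback position.
    The type string is assumed here, not taken from this reference."""
    return {
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": content_index,
        "audio_end_ms": audio_end_ms(samples_played),
    }
```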
conversation.item.delete: Delete a conversation item by ID.
conversation.item.retrieve: Retrieve a conversation item by ID.
response.create: Trigger a model response. The server streams back response events.
response.cancel: Cancel an in-progress response.
input_audio_buffer.append: Append audio bytes to the input buffer.
input_audio_buffer.commit: Commit the buffered audio as a user message.
input_audio_buffer.clear: Discard all audio in the input buffer.
output_audio_buffer.clear: Clear the server's output audio buffer, stopping playback.
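The input-audio flow (a series of input_audio_buffer.append events followed by input_audio_buffer.commit) can be sketched as follows; the base64 encoding of the audio field and the chunk size are assumptions for illustration:

```python
import base64

def append_events(pcm_bytes: bytes, chunk_size: int = 8192):
    """Split raw audio into input_audio_buffer.append events.
    Base64 encoding and the 8 KiB chunk size are assumptions, not
    documented requirements."""
    for start in range(0, len(pcm_bytes), chunk_size):
        chunk = pcm_bytes[start:start + chunk_size]
        yield {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }

def commit_event():
    """Commit the buffered audio as a user message."""
    return {"type": "input_audio_buffer.commit"}
```

With server-side turn detection enabled (see the session.update example), the commit can instead happen automatically when speech stops.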
Session creation events are not currently supported; the session starts immediately with default configuration. Send a session.update to configure the session.
session.updated: Confirms a session.update was applied.
error: Indicates an error occurred.
conversation.item.created: A new item was added to the conversation.
conversation.item.done: An item finished being populated.
conversation.item.deleted: An item was deleted from the conversation.
conversation.item.retrieved: Response to conversation.item.retrieve.
conversation.item.truncated: An assistant audio item was truncated.
conversation.item.input_audio_transcription.delta: Streaming partial transcription for user audio.
conversation.item.input_audio_transcription.completed: Final transcription for a user audio item.
response.created: A new response was created. Contains the full response object in its initial state.
response.done: The response finished. Contains the completed response object with final status and output.
response.output_item.added: An output item was added to the response.
response.output_item.done: An output item finished.
response.content_part.added: A content part was added to an output item.
response.content_part.done: A content part finished.
response.output_text.delta: Streaming text chunk from the model.
response.output_text.done: Text output finished.
response.output_audio_transcript.delta: Streaming transcript for generated audio.
response.output_audio_transcript.done: Final transcript for generated audio.
response.output_audio.done: Audio output for a content part finished.
response.function_call_arguments.delta: Streaming function call arguments.
response.function_call_arguments.done: Function call arguments finished.
input_audio_buffer.speech_started: Voice activity detected; the user started speaking.
input_audio_buffer.speech_stopped: Voice activity ended; the user stopped speaking.
input_audio_buffer.committed: Buffered audio was committed as a conversation item.
input_audio_buffer.cleared: Input audio buffer was cleared.
input_audio_buffer.timeout_triggered: An idle timeout was triggered on the input buffer.
output_audio_buffer.started: Server started sending output audio.
output_audio_buffer.stopped: Server stopped sending output audio.
output_audio_buffer.cleared: Output audio buffer was cleared.
rate_limits.updated: Reports current rate limit state.
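A receiving client typically dispatches on the event type and accumulates delta events until the matching done event arrives. A minimal sketch for text output, using the field names from the schemas in this reference (the accumulation strategy itself is an assumption):

```python
def accumulate_text(events):
    """Collect response.output_text.delta chunks per item until
    response.output_text.done arrives."""
    partial = {}   # item_id -> text accumulated so far
    finished = {}  # item_id -> final text
    for event in events:
        etype = event.get("type")
        if etype == "response.output_text.delta":
            item = event["item_id"]
            partial[item] = partial.get(item, "") + event["delta"]
        elif etype == "response.output_text.done":
            # Prefer the server's final text; fall back to the accumulation.
            item = event["item_id"]
            finished[item] = event.get("text", partial.get(item, ""))
    return finished

# Simulated server stream for one item.
stream = [
    {"type": "response.output_text.delta", "item_id": "i1", "delta": "Hel"},
    {"type": "response.output_text.delta", "item_id": "i1", "delta": "lo!"},
    {"type": "response.output_text.done", "item_id": "i1", "text": "Hello!"},
]
```

The same pattern applies to response.output_audio_transcript.delta/.done and response.function_call_arguments.delta/.done.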