# Realtime API (WebRTC)

> Technical specification for Realtime WebRTC signaling and event schemas, including SDP call creation and data channel event contracts.

## Overview

The Realtime WebRTC API has three parts:

1. **Signaling endpoint** — Used to negotiate the WebRTC connection via SDP offer/answer exchange.

```
POST https://api.inworld.ai/v1/realtime/calls
```

2. **ICE servers** — STUN/TURN server configurations for NAT traversal and reliable connectivity.

```
GET https://api.inworld.ai/v1/realtime/ice-servers
```

3. **Data channel events** — Once the WebRTC peer connection is established, all Realtime events flow over a data channel named `oai-events`.

```
Data Channel: oai-events
```

## Examples

Get started quickly with these reference implementations: [JavaScript](https://github.com/inworld-ai/inworld-api-examples/tree/main/realtime/js/webrtc) and [Python](https://github.com/inworld-ai/inworld-api-examples/tree/main/realtime/python/webrtc).

## Authentication

Use your API key for authentication. See [Authentication](/api-reference/introduction) for details.

```
Authorization: Bearer <API_KEY>
```

***

## Signaling Endpoints

### Create Call

<div style={{fontFamily: 'monospace', fontWeight: 600, marginBottom: '0.5rem'}}>POST [https://api.inworld.ai/v1/realtime/calls](https://api.inworld.ai/v1/realtime/calls)</div>

Creates a WebRTC call by posting an SDP offer and optional session configuration. Returns the server's SDP answer.

**Request body:**

| Field     | Type                       | Required | Description                                            |
| --------- | -------------------------- | -------- | ------------------------------------------------------ |
| `sdp`     | string                     | Yes      | SDP offer generated by the client `RTCPeerConnection`. |
| `session` | [Session](#session-object) | No       | Initial session configuration.                         |

**Response body:**

| Field         | Type      | Description                                                                                           |
| ------------- | --------- | ----------------------------------------------------------------------------------------------------- |
| `id`          | string    | Server-assigned call identifier.                                                                      |
| `sdp`         | string    | SDP answer returned by the server.                                                                    |
| `ice_servers` | object\[] | Array of ICE server configurations (same schema as the [Get ICE Servers](#get-ice-servers) response). |

<Accordion title="Example">
  ```json Request theme={"system"}
  {
    "sdp": "v=0\r\no=- 4611731400430051336 2 IN IP4 127.0.0.1\r\n...",
    "session": {
      "model": "llama-3.3-70b-versatile",
      "instructions": "You are a helpful assistant.",
      "output_modalities": ["audio", "text"],
      "audio": {
        "input": {
          "transcription": {
            "model": "inworld/inworld-stt-1"
          },
          "turn_detection": {
            "type": "semantic_vad",
            "eagerness": "medium"
          }
        },
        "output": {
          "model": "inworld-tts-2",
          "voice": "Dennis",
          "speed": 1.0
        }
      }
    }
  }
  ```

  ```json Response theme={"system"}
  {
    "id": "call_abc123",
    "sdp": "v=0\r\no=- 1234567890 2 IN IP4 0.0.0.0\r\n...",
    "ice_servers": [
      {
        "urls": ["stun:stun.l.google.com:19302"]
      },
      {
        "urls": ["turn:turn.example.com:3478"],
        "username": "<TURN_USERNAME>",
        "credential": "<TURN_CREDENTIAL>"
      }
    ]
  }
  ```
</Accordion>
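
The request above could be assembled along these lines. This is a minimal sketch using only the Python standard library; `build_create_call_request` is a hypothetical helper, and the `Content-Type: application/json` header is an assumption (the body is shown as JSON, but the content type is not stated here). The function only builds the request and does not send it.

```python
import json
import urllib.request
from typing import Optional

CALLS_URL = "https://api.inworld.ai/v1/realtime/calls"


def build_create_call_request(
    sdp_offer: str, api_key: str, session: Optional[dict] = None
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a call."""
    body = {"sdp": sdp_offer}
    if session is not None:
        # `session` is optional, so the key is omitted when nothing is configured.
        body["session"] = session
    return urllib.request.Request(
        CALLS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the endpoint accepts a JSON request body.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` and decoding the JSON response would yield the `sdp` answer to apply as the remote description.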

***

### Get ICE Servers

<div style={{fontFamily: 'monospace', fontWeight: 600, marginBottom: '0.5rem'}}>GET [https://api.inworld.ai/v1/realtime/ice-servers](https://api.inworld.ai/v1/realtime/ice-servers)</div>

Returns STUN and TURN server configurations for WebRTC connectivity. Pass these ICE servers when creating the `RTCPeerConnection` so the connection can traverse NATs and firewalls.

**Response body:**

| Field                      | Type      | Description                                       |
| -------------------------- | --------- | ------------------------------------------------- |
| `ice_servers`              | object\[] | Array of ICE server configurations.               |
| `ice_servers[].urls`       | string\[] | STUN or TURN server URLs.                         |
| `ice_servers[].username`   | string    | TURN credential username (only for TURN servers). |
| `ice_servers[].credential` | string    | TURN credential (only for TURN servers).          |

<Accordion title="Example">
  ```json Response theme={"system"}
  {
    "ice_servers": [
      {
        "urls": [
          "stun:stun.l.google.com:19302",
          "stun:stun1.l.google.com:19302"
        ]
      },
      {
        "urls": [
          "turn:34.41.153.85:3478",
          "turn:34.41.153.85:3479?transport=tcp"
        ],
        "username": "1772761055:6e3fa6ea-2ed0-4306-971e-aa5092cb3736",
        "credential": "6eBPxGW2nsktPFzjjbJSF5PK8ow="
      }
    ]
  }
  ```
</Accordion>
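
The response uses snake_case `ice_servers`, while the standard WebRTC `RTCConfiguration` dictionary expects an `iceServers` key. A minimal sketch of that mapping, with `to_rtc_configuration` as a hypothetical helper name:

```python
from typing import Any, Dict, List


def to_rtc_configuration(response: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Map the `ice_servers` response to a W3C-style RTCConfiguration dict."""
    ice_servers = []
    for server in response.get("ice_servers", []):
        entry: Dict[str, Any] = {"urls": list(server["urls"])}
        # username/credential are only present for TURN servers.
        for key in ("username", "credential"):
            if key in server:
                entry[key] = server[key]
        ice_servers.append(entry)
    return {"iceServers": ice_servers}
```

The same function should work on the `ice_servers` array returned by Create Call, since both responses share the schema.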

***

## Data Channel Events

Once the WebRTC connection is established, events are exchanged as JSON messages over the `oai-events` data channel. The event protocol is the same as the [Realtime WebSocket API](/api-reference/realtimeAPI/realtime/realtime-websocket).
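
Because every event is a JSON object with a `type` field, a thin encode/decode layer is often enough on the client. The sketch below is illustrative only; `encode_event` and `decode_event` are hypothetical helpers, and accepting both `str` and `bytes` is an assumption about how a given WebRTC library delivers messages.

```python
import json
from typing import Any, Dict, Union


def encode_event(event_type: str, **fields: Any) -> str:
    """Serialize a client event to the JSON text sent over the data channel."""
    return json.dumps({"type": event_type, **fields})


def decode_event(raw: Union[str, bytes, bytearray]) -> Dict[str, Any]:
    """Parse an incoming data channel message into an event dict."""
    if isinstance(raw, (bytes, bytearray)):
        # Assumption: binary frames, if any, carry UTF-8 encoded JSON.
        raw = raw.decode("utf-8")
    event = json.loads(raw)
    if not isinstance(event, dict) or "type" not in event:
        raise ValueError("not a Realtime event: missing 'type'")
    return event
```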

### Client Events

Events sent from the client to the server.

#### session.update

Update the session configuration. The server responds with a [`session.updated`](#sessionupdated) event.

| Field      | Type                       | Required | Description                         |
| ---------- | -------------------------- | -------- | ----------------------------------- |
| `type`     | session.update             | Yes      | Event type.                         |
| `event_id` | string                     | No       | Optional client-generated event ID. |
| `session`  | [Session](#session-object) | Yes      | Session configuration to apply.     |

<Accordion title="Example">
  ```json theme={"system"}
  {
    "type": "session.update",
    "session": {
      "instructions": "You are a friendly voice assistant.",
      "audio": {
        "input": {
          "transcription": {
            "model": "inworld/inworld-stt-1"
          },
          "turn_detection": {
            "type": "semantic_vad",
            "eagerness": "medium",
            "create_response": true,
            "interrupt_response": true
          }
        },
        "output": {
          "model": "inworld-tts-2",
          "voice": "Dennis",
          "speed": 1.0
        }
      }
    }
  }
  ```
</Accordion>

#### conversation.item.create

Add a conversation item (message, function call result, etc.).

| Field              | Type                                  | Required | Description                         |
| ------------------ | ------------------------------------- | -------- | ----------------------------------- |
| `type`             | conversation.item.create              | Yes      | Event type.                         |
| `event_id`         | string                                | No       | Optional client-generated event ID. |
| `previous_item_id` | string                                | No       | Insert after this item ID.          |
| `item`             | [ConversationItem](#conversationitem) | Yes      | The item to add.                    |

<Accordion title="Example">
  ```json theme={"system"}
  {
    "type": "conversation.item.create",
    "item": {
      "type": "message",
      "role": "user",
      "content": [
        { "type": "input_text", "text": "Hello, how are you?" }
      ]
    }
  }
  ```
</Accordion>

#### conversation.item.truncate

Truncate the audio of an assistant message at the given millisecond offset.

| Field           | Type                       | Required | Description                                       |
| --------------- | -------------------------- | -------- | ------------------------------------------------- |
| `type`          | conversation.item.truncate | Yes      | Event type.                                       |
| `event_id`      | string                     | No       | Optional client-generated event ID.               |
| `item_id`       | string                     | Yes      | The ID of the assistant message item to truncate. |
| `content_index` | integer                    | Yes      | Index of the content part to truncate.            |
| `audio_end_ms`  | integer                    | Yes      | Millisecond offset to truncate the audio at.      |
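
One plausible use is to trim an assistant message to the portion the user actually heard. The sketch below assumes the client tracks how many milliseconds of audio it has played; `build_truncate_event` is a hypothetical helper, and defaulting `content_index` to `0` is an assumption.

```python
def build_truncate_event(item_id: str, played_ms: float, content_index: int = 0) -> dict:
    """Build a conversation.item.truncate event from a playback position."""
    if played_ms < 0:
        raise ValueError("played_ms must be non-negative")
    return {
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": content_index,
        # The field is an integer, so fractional milliseconds are dropped.
        "audio_end_ms": int(played_ms),
    }
```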

#### conversation.item.delete

Delete a conversation item by ID.

| Field      | Type                     | Required | Description                                |
| ---------- | ------------------------ | -------- | ------------------------------------------ |
| `type`     | conversation.item.delete | Yes      | Event type.                                |
| `event_id` | string                   | No       | Optional client-generated event ID.        |
| `item_id`  | string                   | Yes      | The ID of the conversation item to delete. |

#### conversation.item.retrieve

Retrieve a conversation item by ID.

| Field      | Type                       | Required | Description                                  |
| ---------- | -------------------------- | -------- | -------------------------------------------- |
| `type`     | conversation.item.retrieve | Yes      | Event type.                                  |
| `event_id` | string                     | No       | Optional client-generated event ID.          |
| `item_id`  | string                     | Yes      | The ID of the conversation item to retrieve. |

#### response.create

Trigger a model response. The server streams back response events.

| Field      | Type                              | Required | Description                                  |
| ---------- | --------------------------------- | -------- | -------------------------------------------- |
| `type`     | response.create                   | Yes      | Event type.                                  |
| `event_id` | string                            | No       | Optional client-generated event ID.          |
| `response` | [ResponseConfig](#responseconfig) | No       | Override session defaults for this response. |

<Accordion title="Example">
  ```json theme={"system"}
  {
    "type": "response.create",
    "response": {
      "output_modalities": ["audio", "text"],
      "instructions": "Respond in a cheerful tone."
    }
  }
  ```
</Accordion>
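
A text turn can be sketched as a `conversation.item.create` followed by a `response.create`. The helper name `text_turn` is hypothetical, and whether an explicit `response.create` is needed may depend on the session's turn detection settings, so treat this as one possible flow rather than the required one.

```python
import json
from typing import Any, Dict, List, Optional


def text_turn(text: str, instructions: Optional[str] = None) -> List[str]:
    """Return the JSON messages for one user text turn, in send order."""
    item_event: Dict[str, Any] = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    }
    response_event: Dict[str, Any] = {"type": "response.create"}
    if instructions is not None:
        # Per-response override of the session instructions.
        response_event["response"] = {"instructions": instructions}
    return [json.dumps(item_event), json.dumps(response_event)]
```

Each returned string would be sent as one message on the `oai-events` data channel.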

#### response.cancel

Cancel an in-progress response.

| Field         | Type            | Required | Description                                                                |
| ------------- | --------------- | -------- | -------------------------------------------------------------------------- |
| `type`        | response.cancel | Yes      | Event type.                                                                |
| `event_id`    | string          | No       | Optional client-generated event ID.                                        |
| `response_id` | string          | No       | Cancel a specific response by ID. If omitted, cancels the active response. |

#### input\_audio\_buffer.append

Append audio bytes to the input buffer.

| Field      | Type                        | Required | Description                                                                    |
| ---------- | --------------------------- | -------- | ------------------------------------------------------------------------------ |
| `type`     | input\_audio\_buffer.append | Yes      | Event type.                                                                    |
| `event_id` | string                      | No       | Optional client-generated event ID.                                            |
| `audio`    | string                      | Yes      | Base64-encoded audio chunk (\~100–200ms) matching the configured input format. |
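
The chunking described in the table can be sketched as follows. The defaults (24 kHz, 16-bit mono PCM) are assumptions chosen for illustration, not values stated on this page; use whatever input format the session is configured with. `audio_append_events` is a hypothetical helper.

```python
import base64
from typing import Iterator


def audio_append_events(
    pcm: bytes,
    sample_rate: int = 24000,   # assumption: 24 kHz input
    bytes_per_sample: int = 2,  # assumption: 16-bit mono PCM
    chunk_ms: int = 100,
) -> Iterator[dict]:
    """Yield input_audio_buffer.append events of roughly chunk_ms each."""
    chunk_bytes = sample_rate * bytes_per_sample * chunk_ms // 1000
    if chunk_bytes <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }
```

With these defaults each full chunk is 4,800 bytes of raw audio, which corresponds to 100 ms.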

#### input\_audio\_buffer.commit

Commit the buffered audio as a user message.

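| Field      | Type                        | Required | Description                         |
| ---------- | --------------------------- | -------- | ----------------------------------- |
| `type`     | input\_audio\_buffer.commit | Yes      | Event type.                         |
| `event_id` | string                      | No       | Optional client-generated event ID. |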

#### input\_audio\_buffer.clear

Discard all audio in the input buffer.

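| Field      | Type                       | Required | Description                         |
| ---------- | -------------------------- | -------- | ----------------------------------- |
| `type`     | input\_audio\_buffer.clear | Yes      | Event type.                         |
| `event_id` | string                     | No       | Optional client-generated event ID. |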

#### output\_audio\_buffer.clear

Clear the server's output audio buffer, stopping playback.

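| Field      | Type                        | Required | Description                         |
| ---------- | --------------------------- | -------- | ----------------------------------- |
| `type`     | output\_audio\_buffer.clear | Yes      | Event type.                         |
| `event_id` | string                      | No       | Optional client-generated event ID. |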

***

### Server Events

Events emitted by the server to the client.
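
Since server events are distinguished by their `type` field, a client can route them with a small lookup table. The `EventRouter` class below is a hypothetical sketch, not part of any SDK; it records events without a registered handler rather than raising, on the assumption that a client should tolerate event types it does not use.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class EventRouter:
    """Route decoded server events to handlers keyed by event type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self.unhandled: List[Dict[str, Any]] = []

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, event: Dict[str, Any]) -> bool:
        """Call the matching handler; return False if none is registered."""
        handler = self._handlers.get(event.get("type", ""))
        if handler is None:
            self.unhandled.append(event)
            return False
        handler(event)
        return True
```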

#### session.created

<Info>
  Delivered over the data channel once it opens, carrying the session's default configuration. You don't need to wait for this event; you can send a `session.update` to configure the session as soon as the data channel opens.
</Info>

| Field      | Type                       | Description                    |
| ---------- | -------------------------- | ------------------------------ |
| `type`     | session.created            | Event type.                    |
| `event_id` | string                     | Server-generated event ID.     |
| `session`  | [Session](#session-object) | Current session configuration. |

#### session.updated

Confirms a `session.update` was applied.

| Field      | Type                       | Description                    |
| ---------- | -------------------------- | ------------------------------ |
| `type`     | session.updated            | Event type.                    |
| `event_id` | string                     | Server-generated event ID.     |
| `session`  | [Session](#session-object) | Updated session configuration. |

#### error

Indicates an error occurred.

| Field            | Type   | Description                                               |
| ---------------- | ------ | --------------------------------------------------------- |
| `type`           | error  | Event type.                                               |
| `event_id`       | string | Server-generated event ID.                                |
| `error.type`     | string | Error category.                                           |
| `error.code`     | string | Error code.                                               |
| `error.message`  | string | Human-readable error description.                         |
| `error.param`    | string | Related parameter, if applicable.                         |
| `error.event_id` | string | The client event ID that caused the error, if applicable. |
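
For logging, the nested `error` fields can be flattened into one line. In this sketch `format_error` is a hypothetical helper; it assumes `error.param` and `error.event_id` may be missing or `null` and skips them in that case.

```python
def format_error(event: dict) -> str:
    """Render an error event as a single log line."""
    if event.get("type") != "error":
        raise ValueError("not an error event")
    err = event.get("error") or {}
    category = err.get("type", "unknown")
    code = err.get("code", "unknown")
    parts = [f"[{category}/{code}] {err.get('message', '')}"]
    if err.get("param"):
        parts.append(f"param={err['param']}")
    if err.get("event_id"):
        # Lets the client correlate the failure with the event it sent.
        parts.append(f"caused_by={err['event_id']}")
    return " ".join(parts)
```

Setting `event_id` on outgoing client events makes the `caused_by` value useful for tracing which request failed.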

#### conversation.item.added

A new item was added to the conversation.

| Field              | Type                                  | Description                                           |
| ------------------ | ------------------------------------- | ----------------------------------------------------- |
| `type`             | conversation.item.added               | Event type.                                           |
| `event_id`         | string                                | Server-generated event ID.                            |
| `previous_item_id` | string                                | The ID of the preceding conversation item, or `null`. |
| `item`             | [ConversationItem](#conversationitem) | The item that was added.                              |

#### conversation.item.done

An item finished being populated.

| Field              | Type                                  | Description                                           |
| ------------------ | ------------------------------------- | ----------------------------------------------------- |
| `type`             | conversation.item.done                | Event type.                                           |
| `event_id`         | string                                | Server-generated event ID.                            |
| `previous_item_id` | string                                | The ID of the preceding conversation item, or `null`. |
| `item`             | [ConversationItem](#conversationitem) | The completed item.                                   |

#### Other conversation item events

| Event type                    | Description                                |
| ----------------------------- | ------------------------------------------ |
| `conversation.item.retrieved` | Response to `conversation.item.retrieve`.  |
| `conversation.item.deleted`   | An item was deleted from the conversation. |
| `conversation.item.truncated` | An assistant audio item was truncated.     |

#### conversation.item.input\_audio\_transcription.delta

Streaming partial transcription for user audio.

| Field           | Type                                                | Description                                  |
| --------------- | --------------------------------------------------- | -------------------------------------------- |
| `type`          | conversation.item.input\_audio\_transcription.delta | Event type.                                  |
| `event_id`      | string                                              | Server-generated event ID.                   |
| `item_id`       | string                                              | The conversation item being transcribed.     |
| `content_index` | integer                                             | Index of the content part being transcribed. |
| `delta`         | string                                              | Partial transcription text.                  |

#### conversation.item.input\_audio\_transcription.completed

Final transcription for a user audio item.

| Field           | Type                                                    | Description                                     |
| --------------- | ------------------------------------------------------- | ----------------------------------------------- |
| `type`          | conversation.item.input\_audio\_transcription.completed | Event type.                                     |
| `event_id`      | string                                                  | Server-generated event ID.                      |
| `item_id`       | string                                                  | The conversation item that was transcribed.     |
| `content_index` | integer                                                 | Index of the content part that was transcribed. |
| `transcript`    | string                                                  | Complete transcription text.                    |
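
The delta and completed events can be combined into a live transcript. This sketch assumes each `delta` is an incremental fragment to be appended (the table only says "partial transcription text") and treats the `completed` event's `transcript` as authoritative. `TranscriptBuffer` is a hypothetical name.

```python
from typing import Dict, Tuple

DELTA = "conversation.item.input_audio_transcription.delta"
COMPLETED = "conversation.item.input_audio_transcription.completed"

Key = Tuple[str, int]  # (item_id, content_index)


class TranscriptBuffer:
    """Accumulate user transcription per content part."""

    def __init__(self) -> None:
        self.partial: Dict[Key, str] = {}
        self.final: Dict[Key, str] = {}

    def handle(self, event: dict) -> None:
        etype = event.get("type")
        if etype not in (DELTA, COMPLETED):
            return  # ignore unrelated events
        key = (event["item_id"], event["content_index"])
        if etype == DELTA:
            # Assumption: deltas are fragments, concatenated in arrival order.
            self.partial[key] = self.partial.get(key, "") + event["delta"]
        else:
            self.final[key] = event["transcript"]
            self.partial.pop(key, None)
```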

#### response.created

A new response was created. Contains the full response object in its initial state.

| Field                        | Type               | Description                                 |
| ---------------------------- | ------------------ | ------------------------------------------- |
| `type`                       | response.created   | Event type.                                 |
| `event_id`                   | string             | Server-generated event ID.                  |
| `response.id`                | string             | Response identifier.                        |
| `response.object`            | realtime.response  | Object type.                                |
| `response.status`            | string             | `"in_progress"` at creation.                |
| `response.status_details`    | object \| null     | Status details, if any.                     |
| `response.output`            | array              | Output items (empty at creation).           |
| `response.conversation_id`   | string             | Conversation this response belongs to.      |
| `response.output_modalities` | string\[]          | `"text"`, `"audio"`, or both.               |
| `response.max_output_tokens` | integer \| `"inf"` | Token limit for this response.              |
| `response.audio`             | object             | Audio output config echoed from session.    |
| `response.usage`             | object \| null     | Token usage (populated in `response.done`). |
| `response.metadata`          | object \| null     | Response metadata.                          |

#### response.done

The response finished. Contains the completed response object with final status and output.

| Field                        | Type                                     | Description                                                                                             |
| ---------------------------- | ---------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| `type`                       | response.done                            | Event type.                                                                                             |
| `event_id`                   | string                                   | Server-generated event ID.                                                                              |
| `response.id`                | string                                   | Response identifier.                                                                                    |
| `response.object`            | realtime.response                        | Object type.                                                                                            |
| `response.status`            | string                                   | `"completed"`, `"cancelled"`, or `"failed"`.                                                            |
| `response.status_details`    | object                                   | Status details. `type` matches `status`. For cancelled: includes `reason` (e.g., `"client_cancelled"`). |
| `response.output`            | [ConversationItem](#conversationitem)\[] | Completed output items with content.                                                                    |
| `response.conversation_id`   | string                                   | Conversation this response belongs to.                                                                  |
| `response.output_modalities` | string\[]                                | `"text"`, `"audio"`, or both.                                                                           |
| `response.max_output_tokens` | integer \| `"inf"`                       | Token limit for this response.                                                                          |
| `response.audio`             | object                                   | Audio output config.                                                                                    |
| `response.usage`             | object \| null                           | Token usage statistics.                                                                                 |
| `response.metadata`          | object \| null                           | Response metadata.                                                                                      |
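
A client will typically branch on `response.status` when this event arrives. The sketch below pulls out the fields needed for that decision; `summarize_response_done` is a hypothetical helper, and it assumes `status_details.reason` may be absent.

```python
def summarize_response_done(event: dict) -> dict:
    """Extract the outcome of a response.done event."""
    response = event["response"]
    details = response.get("status_details") or {}
    return {
        "response_id": response["id"],
        "status": response["status"],
        "ok": response["status"] == "completed",
        # Present for cancelled responses, e.g. "client_cancelled".
        "reason": details.get("reason"),
        "output_items": len(response.get("output") or []),
    }
```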

#### response.output\_item.added

An output item was added to the response.

| Field          | Type                                  | Description                                |
| -------------- | ------------------------------------- | ------------------------------------------ |
| `type`         | response.output\_item.added           | Event type.                                |
| `event_id`     | string                                | Server-generated event ID.                 |
| `response_id`  | string                                | The response this item belongs to.         |
| `output_index` | integer                               | Index of the output item in the response.  |
| `item`         | [ConversationItem](#conversationitem) | The output item (initially empty content). |

#### response.output\_item.done

An output item finished.

| Field          | Type                                  | Description                               |
| -------------- | ------------------------------------- | ----------------------------------------- |
| `type`         | response.output\_item.done            | Event type.                               |
| `event_id`     | string                                | Server-generated event ID.                |
| `response_id`  | string                                | The response this item belongs to.        |
| `output_index` | integer                               | Index of the output item in the response. |
| `item`         | [ConversationItem](#conversationitem) | The completed output item with content.   |

#### response.content\_part.added

A content part was added to an output item.

| Field           | Type                         | Description                                                                            |
| --------------- | ---------------------------- | -------------------------------------------------------------------------------------- |
| `type`          | response.content\_part.added | Event type.                                                                            |
| `event_id`      | string                       | Server-generated event ID.                                                             |
| `response_id`   | string                       | Response identifier.                                                                   |
| `item_id`       | string                       | Item identifier.                                                                       |
| `output_index`  | integer                      | Index of the output item.                                                              |
| `content_index` | integer                      | Index of the content part.                                                             |
| `part`          | object                       | The content part. `type`: `"audio"` or `"text"`. `transcript`: initially empty string. |

#### response.content\_part.done

A content part finished.

| Field           | Type                        | Description                                         |
| --------------- | --------------------------- | --------------------------------------------------- |
| `type`          | response.content\_part.done | Event type.                                         |
| `event_id`      | string                      | Server-generated event ID.                          |
| `response_id`   | string                      | Response identifier.                                |
| `item_id`       | string                      | Item identifier.                                    |
| `output_index`  | integer                     | Index of the output item.                           |
| `content_index` | integer                     | Index of the content part.                          |
| `part`          | object                      | The completed content part with final `transcript`. |

#### response.output\_text.delta

Streaming text chunk from the model.

| Field           | Type                        | Description                |
| --------------- | --------------------------- | -------------------------- |
| `type`          | response.output\_text.delta | Event type.                |
| `event_id`      | string                      | Server-generated event ID. |
| `response_id`   | string                      | Response identifier.       |
| `item_id`       | string                      | Item identifier.           |
| `output_index`  | integer                     | Index of the output item.  |
| `content_index` | integer                     | Index of the content part. |
| `delta`         | string                      | Text chunk.                |

#### response.output\_text.done

Text output finished.

| Field           | Type                       | Description                |
| --------------- | -------------------------- | -------------------------- |
| `type`          | response.output\_text.done | Event type.                |
| `event_id`      | string                     | Server-generated event ID. |
| `response_id`   | string                     | Response identifier.       |
| `item_id`       | string                     | Item identifier.           |
| `output_index`  | integer                    | Index of the output item.  |
| `content_index` | integer                    | Index of the content part. |
| `text`          | string                     | Complete text output.      |
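
The text events can be folded into final strings per content part. This sketch assumes deltas are appended in arrival order and that the `text` field of `response.output_text.done` supersedes whatever was accumulated. `collect_output_text` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, int]  # (response_id, item_id, content_index)

TEXT_DELTA = "response.output_text.delta"
TEXT_DONE = "response.output_text.done"


def collect_output_text(events: Iterable[dict]) -> Dict[Key, str]:
    """Fold output_text delta/done events into text per content part."""
    texts: Dict[Key, str] = {}
    for event in events:
        etype = event.get("type")
        if etype not in (TEXT_DELTA, TEXT_DONE):
            continue
        key = (event["response_id"], event["item_id"], event["content_index"])
        if etype == TEXT_DELTA:
            texts[key] = texts.get(key, "") + event["delta"]
        else:
            # The done event carries the complete text, so it replaces the buffer.
            texts[key] = event["text"]
    return texts
```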

#### response.output\_audio\_transcript.delta

Streaming transcript for generated audio.

| Field           | Type                                     | Description                |
| --------------- | ---------------------------------------- | -------------------------- |
| `type`          | response.output\_audio\_transcript.delta | Event type.                |
| `event_id`      | string                                   | Server-generated event ID. |
| `response_id`   | string                                   | Response identifier.       |
| `item_id`       | string                                   | Item identifier.           |
| `output_index`  | integer                                  | Index of the output item.  |
| `content_index` | integer                                  | Index of the content part. |
| `delta`         | string                                   | Transcript chunk.          |

#### response.output\_audio\_transcript.done

Final transcript for generated audio.

| Field           | Type                                    | Description                |
| --------------- | --------------------------------------- | -------------------------- |
| `type`          | response.output\_audio\_transcript.done | Event type.                |
| `event_id`      | string                                  | Server-generated event ID. |
| `response_id`   | string                                  | Response identifier.       |
| `item_id`       | string                                  | Item identifier.           |
| `output_index`  | integer                                 | Index of the output item.  |
| `content_index` | integer                                 | Index of the content part. |
| `transcript`    | string                                  | Complete transcript.       |

#### response.output\_audio.delta

Streaming audio alignment data for generated speech. Over WebRTC, audio travels on the RTP media track; this data-channel event carries only `timestamp_info` (the `delta` field is empty). Present only when `providerData.tts.timestamp_type` is set.

| Field            | Type                         | Description                                                                                                                                                       |
| ---------------- | ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `type`           | response.output\_audio.delta | Event type.                                                                                                                                                       |
| `event_id`       | string                       | Server-generated event ID.                                                                                                                                        |
| `response_id`    | string                       | Response identifier.                                                                                                                                              |
| `item_id`        | string                       | Item identifier.                                                                                                                                                  |
| `output_index`   | integer                      | Index of the output item.                                                                                                                                         |
| `content_index`  | integer                      | Index of the content part.                                                                                                                                        |
| `delta`          | string                       | Always empty over WebRTC (audio is on the media track).                                                                                                           |
| `timestamp_info` | object                       | TTS alignment data. Contains `word_alignment` or `character_alignment`. See [TTS timestamps and alignment](/realtime/provider-data#tts-timestamps-and-alignment). |
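
As a minimal sketch of consuming this event from the data channel, the helper below reports which alignment variant a message carries. The function name `alignment_kind` is hypothetical, and the alignment payload is treated as opaque because its inner shape is documented on the linked page rather than here.

```python
import json


def alignment_kind(raw_message: str):
    """Return (kind, payload) for a response.output_audio.delta event, else None.

    The payload is passed through untouched; its inner structure is not
    assumed here.
    """
    event = json.loads(raw_message)
    if event.get("type") != "response.output_audio.delta":
        return None
    info = event.get("timestamp_info") or {}
    # Check the two documented alignment keys in a fixed order.
    for kind in ("word_alignment", "character_alignment"):
        if kind in info:
            return kind, info[kind]
    return None
```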

#### response.output\_audio.done

Audio output for a content part finished.

| Field           | Type                        | Description                |
| --------------- | --------------------------- | -------------------------- |
| `type`          | response.output\_audio.done | Event type.                |
| `event_id`      | string                      | Server-generated event ID. |
| `response_id`   | string                      | Response identifier.       |
| `item_id`       | string                      | Item identifier.           |
| `output_index`  | integer                     | Index of the output item.  |
| `content_index` | integer                     | Index of the content part. |

#### response.function\_call\_arguments.delta

Streaming function call arguments.

| Field           | Type                                     | Description                             |
| --------------- | ---------------------------------------- | --------------------------------------- |
| `type`          | response.function\_call\_arguments.delta | Event type.                             |
| `event_id`      | string                                   | Server-generated event ID.              |
| `response_id`   | string                                   | Response identifier.                    |
| `item_id`       | string                                   | Item identifier.                        |
| `output_index`  | integer                                  | Index of the output item.               |
| `content_index` | integer                                  | Index of the content part.              |
| `delta`         | string                                   | Arguments chunk (JSON string fragment). |

#### response.function\_call\_arguments.done

Function call arguments finished.

| Field           | Type                                    | Description                                     |
| --------------- | --------------------------------------- | ----------------------------------------------- |
| `type`          | response.function\_call\_arguments.done | Event type.                                     |
| `event_id`      | string                                  | Server-generated event ID.                      |
| `response_id`   | string                                  | Response identifier.                            |
| `item_id`       | string                                  | Item identifier.                                |
| `output_index`  | integer                                 | Index of the output item.                       |
| `content_index` | integer                                 | Index of the content part.                      |
| `arguments`     | string                                  | Complete function call arguments (JSON string). |
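
A client typically buffers the `delta` fragments per `item_id` and parses the JSON once the `done` event arrives. The sketch below illustrates that pattern; the class name `FunctionCallAssembler` is hypothetical, and preferring the `arguments` string from the `done` event over the locally joined fragments is a design assumption, not documented behavior.

```python
import json
from collections import defaultdict


class FunctionCallAssembler:
    """Collects streamed argument fragments and parses them on completion."""

    def __init__(self):
        self._fragments = defaultdict(list)  # item_id -> list of JSON fragments

    def handle(self, event: dict):
        """Return the parsed arguments on a done event, otherwise None."""
        etype = event.get("type")
        if etype == "response.function_call_arguments.delta":
            self._fragments[event["item_id"]].append(event["delta"])
            return None
        if etype == "response.function_call_arguments.done":
            streamed = "".join(self._fragments.pop(event["item_id"], []))
            # Assumption: the done event's complete string wins; fall back
            # to the locally joined fragments if it is missing or empty.
            raw = event.get("arguments") or streamed
            return json.loads(raw) if raw else {}
        return None
```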

#### input\_audio\_buffer.speech\_started

Voice activity detected — user started speaking.

| Field            | Type                                 | Description                                                       |
| ---------------- | ------------------------------------ | ----------------------------------------------------------------- |
| `type`           | input\_audio\_buffer.speech\_started | Event type.                                                       |
| `event_id`       | string                               | Server-generated event ID.                                        |
| `audio_start_ms` | integer                              | Millisecond offset in the audio stream where speech was detected. |
| `item_id`        | string                               | The conversation item ID associated with this speech segment.     |

#### input\_audio\_buffer.speech\_stopped

Voice activity ended — user stopped speaking.

| Field          | Type                                 | Description                                                   |
| -------------- | ------------------------------------ | ------------------------------------------------------------- |
| `type`         | input\_audio\_buffer.speech\_stopped | Event type.                                                   |
| `event_id`     | string                               | Server-generated event ID.                                    |
| `audio_end_ms` | integer                              | Millisecond offset in the audio stream where speech ended.    |
| `item_id`      | string                               | The conversation item ID associated with this speech segment. |

#### input\_audio\_buffer.committed

Buffered audio was committed as a conversation item.

| Field              | Type                           | Description                                           |
| ------------------ | ------------------------------ | ----------------------------------------------------- |
| `type`             | input\_audio\_buffer.committed | Event type.                                           |
| `event_id`         | string                         | Server-generated event ID.                            |
| `previous_item_id` | string                         | The ID of the preceding conversation item, or `null`. |
| `item_id`          | string                         | The new conversation item ID for the committed audio. |
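
The three events above share an `item_id`, so a client can correlate them to measure how long each speech segment lasted and to record commit order. The following is an illustrative sketch only; `SpeechSegments` is a hypothetical helper, and it assumes `audio_start_ms` and `audio_end_ms` are offsets on the same stream clock.

```python
class SpeechSegments:
    """Tracks speech segments by item_id using the VAD and commit events."""

    def __init__(self):
        self.open = {}       # item_id -> audio_start_ms of an ongoing segment
        self.durations = {}  # item_id -> segment length in milliseconds
        self.committed = []  # (previous_item_id, item_id) in commit order

    def handle(self, event: dict) -> None:
        etype = event.get("type")
        item_id = event.get("item_id")
        if etype == "input_audio_buffer.speech_started":
            self.open[item_id] = event["audio_start_ms"]
        elif etype == "input_audio_buffer.speech_stopped":
            start = self.open.pop(item_id, None)
            if start is not None:
                self.durations[item_id] = event["audio_end_ms"] - start
        elif etype == "input_audio_buffer.committed":
            # previous_item_id may be null (None) for the first item.
            self.committed.append((event.get("previous_item_id"), item_id))
```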

#### input\_audio\_buffer.timeout\_triggered

Emitted when the idle timeout fires on the input buffer. Applies to server VAD only and is controlled by `turn_detection.idle_timeout_ms`.

| Field            | Type                                    | Description                                                        |
| ---------------- | --------------------------------------- | ------------------------------------------------------------------ |
| `type`           | input\_audio\_buffer.timeout\_triggered | Event type.                                                        |
| `event_id`       | string                                  | Server-generated event ID.                                         |
| `audio_start_ms` | integer                                 | Audio buffer start offset (ms) at the time the idle timeout fired. |
| `audio_end_ms`   | integer                                 | Audio buffer end offset (ms) at the time the idle timeout fired.   |
| `item_id`        | string                                  | Conversation item ID associated with the idle audio buffer.        |

#### input\_audio\_buffer.turn\_suggestion

The server VAD smart-turn detector predicts an end-of-turn boundary. Use this signal to drive low-latency UI cues or to pre-warm a response without waiting for the final `speech_stopped` commit. It may be followed by `input_audio_buffer.turn_suggestion_revoked` if the user resumes speaking.

| Field                 | Type                                  | Description                                                                                             |
| --------------------- | ------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| `type`                | input\_audio\_buffer.turn\_suggestion | Event type.                                                                                             |
| `event_id`            | string                                | Server-generated event ID.                                                                              |
| `item_id`             | string                                | The conversation item ID associated with this utterance.                                                |
| `utterance_index`     | integer                               | Monotonic index of the utterance within the session. Pairs with the matching `turn_suggestion_revoked`. |
| `probability`         | number                                | Smart-turn model end-of-turn probability (0.0–1.0).                                                     |
| `trailing_silence_ms` | number                                | Trailing silence at the time of inference, in milliseconds.                                             |
| `audio_duration_ms`   | number                                | Audio duration of the utterance at the time of inference, in milliseconds.                              |
| `inference_ms`        | number                                | Smart-turn model inference latency, in milliseconds.                                                    |

#### input\_audio\_buffer.turn\_suggestion\_revoked

Emitted when the user resumes speaking after a previous `turn_suggestion`. Pairs with the most recent `turn_suggestion` sharing the same `utterance_index`.

| Field             | Type                                           | Description                                                               |
| ----------------- | ---------------------------------------------- | ------------------------------------------------------------------------- |
| `type`            | input\_audio\_buffer.turn\_suggestion\_revoked | Event type.                                                               |
| `event_id`        | string                                         | Server-generated event ID.                                                |
| `item_id`         | string                                         | The conversation item ID associated with this utterance.                  |
| `utterance_index` | integer                                        | Index of the utterance whose previous `turn_suggestion` is being revoked. |
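
Because a suggestion can be revoked, speculative work should be keyed by `utterance_index` so it can be cancelled cleanly. A minimal sketch follows; the `TurnSuggestionTracker` class, the returned action strings, and the `min_probability` cutoff are all hypothetical client-side choices, not part of the API.

```python
class TurnSuggestionTracker:
    """Pairs turn_suggestion events with later revocations by utterance_index."""

    def __init__(self, min_probability: float = 0.5):
        # Hypothetical client-side cutoff; tune for your application.
        self.min_probability = min_probability
        self.pending = {}  # utterance_index -> suggestion event

    def handle(self, event: dict):
        """Return "prewarm", "cancel_prewarm", or None."""
        etype = event.get("type")
        idx = event.get("utterance_index")
        if etype == "input_audio_buffer.turn_suggestion":
            if event["probability"] >= self.min_probability:
                self.pending[idx] = event
                return "prewarm"
            return None
        if etype == "input_audio_buffer.turn_suggestion_revoked":
            # Only cancel work that was actually started for this utterance.
            if self.pending.pop(idx, None) is not None:
                return "cancel_prewarm"
        return None
```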

#### Other audio buffer events

| Event type                    | Description                          |
| ----------------------------- | ------------------------------------ |
| `input_audio_buffer.cleared`  | Input audio buffer was cleared.      |
| `output_audio_buffer.started` | Server started sending output audio. |
| `output_audio_buffer.stopped` | Server stopped sending output audio. |
| `output_audio_buffer.cleared` | Output audio buffer was cleared.     |

#### response.backchannel.audio.delta

Streaming audio chunk for a low-latency back-channel interjection (e.g. "uh-huh", "right") emitted while the user is mid-utterance. These events are out-of-band from the main response stream; use `backchannel_id` to group chunks belonging to the same interjection.

| Field            | Type                             | Description                                                                                                                                                                           |
| ---------------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `type`           | response.backchannel.audio.delta | Event type.                                                                                                                                                                           |
| `event_id`       | string                           | Server-generated event ID.                                                                                                                                                            |
| `backchannel_id` | string                           | Synthetic ID grouping deltas + done for a single back-channel interjection. Use as the playback bucket key so chunks of one interjection don't collide with the active response item. |
| `delta`          | string                           | Base64-encoded audio chunk in the session's configured `audio.output.format` (PCM16, `audio/pcmu`, or `audio/pcma`).                                                                  |

#### response.backchannel.audio.done

All audio for a back-channel interjection has been streamed. No teardown is required; queued audio plays until it is exhausted.

| Field            | Type                            | Description                                                                                                                     |
| ---------------- | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `type`           | response.backchannel.audio.done | Event type.                                                                                                                     |
| `event_id`       | string                          | Server-generated event ID.                                                                                                      |
| `backchannel_id` | string                          | Identifies which back-channel interjection finished streaming.                                                                  |
| `phrase`         | string                          | The chosen back-channel utterance (e.g. `"uh-huh"`). Optional — omitted when the decider doesn't surface the phrase to clients. |
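
The grouping by `backchannel_id` can be sketched as a small buffer that decodes each Base64 `delta` and hands back the full interjection when `done` arrives. `BackchannelBuffers` is a hypothetical name, and returning the joined bytes is only one option; a real client might instead feed each chunk to its playback queue as it arrives.

```python
import base64


class BackchannelBuffers:
    """Buckets back-channel audio chunks by backchannel_id."""

    def __init__(self):
        self._chunks = {}  # backchannel_id -> list of decoded byte chunks

    def handle(self, event: dict):
        """Return (backchannel_id, audio_bytes, phrase) on done, else None."""
        etype = event.get("type")
        bid = event.get("backchannel_id")
        if etype == "response.backchannel.audio.delta":
            chunk = base64.b64decode(event["delta"])
            self._chunks.setdefault(bid, []).append(chunk)
            return None
        if etype == "response.backchannel.audio.done":
            audio = b"".join(self._chunks.pop(bid, []))
            # phrase is optional, so it may be None.
            return bid, audio, event.get("phrase")
        return None
```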

#### response.backchannel.skipped

Emitted when an evaluation tick chose not to fire a back-channel. Useful for client-side telemetry; clients that don't need it can ignore this event.

| Field      | Type                         | Description                                                                                                                                                                                 |
| ---------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `type`     | response.backchannel.skipped | Event type.                                                                                                                                                                                 |
| `event_id` | string                       | Server-generated event ID.                                                                                                                                                                  |
| `reason`   | string                       | Short machine-readable string describing why no back-channel was emitted on this evaluation tick (e.g. `min_gap_not_elapsed`, `deadline_missed`, `no_phrase`). Stable enough for telemetry. |
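
For telemetry, it is usually enough to tally the `reason` values. A minimal sketch, assuming events have already been parsed into dictionaries; `tally_skips` is a hypothetical helper name.

```python
from collections import Counter


def tally_skips(events) -> Counter:
    """Count response.backchannel.skipped events by their reason string."""
    counts = Counter()
    for event in events:
        if event.get("type") == "response.backchannel.skipped":
            counts[event.get("reason", "unknown")] += 1
    return counts
```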

#### rate\_limits.updated

Reports the current rate limit state.

| Field      | Type                 | Description                |
| ---------- | -------------------- | -------------------------- |
| `type`     | rate\_limits.updated | Event type.                |
| `event_id` | string               | Server-generated event ID. |

***

## Schemas

### Session object

The session object configures model behavior, audio settings, tools, and more. It appears in signaling requests, `session.update`, `session.created`, and `session.updated` events.

| Field                    | Type                                            | Description                                                                                                                                                                                             |
| ------------------------ | ----------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `object`                 | realtime.session                                | Object type identifier (read-only).                                                                                                                                                                     |
| `type`                   | realtime                                        | Fixed value.                                                                                                                                                                                            |
| `id`                     | string                                          | Server-assigned session ID (read-only).                                                                                                                                                                 |
| `model`                  | string                                          | Model identifier.                                                                                                                                                                                       |
| `instructions`           | string                                          | System instructions for the model.                                                                                                                                                                      |
| `output_modalities`      | string\[]                                       | Output types: `"text"`, `"audio"`, or both.                                                                                                                                                             |
| `temperature`            | number                                          | The sampling temperature used for response generation.                                                                                                                                                  |
| `max_output_tokens`      | integer \| `"inf"`                              | Maximum tokens per response (1–4096 or `"inf"`).                                                                                                                                                        |
| `audio`                  | [AudioConfig](#audioconfig)                     | Audio input/output settings.                                                                                                                                                                            |
| `tools`                  | [Tool](#tool)\[]                                | Function tools available to the model.                                                                                                                                                                  |
| `tool_choice`            | string \| [ToolChoiceTarget](#toolchoicetarget) | `"none"`, `"auto"`, `"required"`, or a specific tool target.                                                                                                                                            |
| `truncation`             | string \| object                                | `"auto"`, `"disabled"`, or a `retention_ratio` config.                                                                                                                                                  |
| `tracing`                | string \| object                                | `"auto"` or a tracing config with `workflow_name`, `group_id`, `metadata`.                                                                                                                              |
| `include`                | string\[]                                       | Optional data to include, e.g. `"item.input_audio_transcription.logprobs"`.                                                                                                                             |
| `text_generation_config` | object                                          | Fine-grained LLM generation parameters including `reasoning` (`effort`, `maxTokens`, `exclude`). See [text\_generation\_config](/realtime/provider-data#text-generation-config-text_generation_config). |
| `providerData`           | object                                          | Inworld extensions: `stt`, `tts`, `memory`, `backchannel`, `responsiveness`. See [API Extensions](/realtime/provider-data).                                                                             |
| `expires_at`             | integer                                         | Unix timestamp for session expiration (read-only).                                                                                                                                                      |
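
To illustrate how the writable fields fit together, the sketch below assembles a session object and wraps it in a `session.update` event. Several things here are assumptions: the helper name `build_session_update`, the `{"type": "session.update", "session": ...}` envelope shape, the nesting of `audio.input` and `audio.output` as objects, and the specific values chosen. Read-only fields (`object`, `id`, `expires_at`) are deliberately omitted.

```python
import json


def build_session_update(model: str, instructions: str, voice: str = "Dennis") -> str:
    """Serialize a session.update event carrying a partial session object."""
    audio_format = {"type": "audio/pcm", "rate": 24000}
    session = {
        "type": "realtime",
        "model": model,  # caller supplies a valid model identifier
        "instructions": instructions,
        "output_modalities": ["audio"],
        "max_output_tokens": 1024,  # illustrative value within 1-4096
        "audio": {
            "input": {
                "format": audio_format,
                "transcription": {"model": "inworld/inworld-stt-1"},
                "turn_detection": {"type": "semantic_vad", "eagerness": "auto"},
            },
            "output": {"format": audio_format, "voice": voice, "speed": 1.0},
        },
    }
    # Assumed envelope: the session object travels under a "session" key.
    return json.dumps({"type": "session.update", "session": session})
```

The resulting string would be sent over the `oai-events` data channel once it is open.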

### AudioConfig

| Field                   | Type                            | Description                                                                                                                                                                                                                                                                            |
| ----------------------- | ------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `input.format`          | [AudioFormat](#audioformat)     | Input audio format.                                                                                                                                                                                                                                                                    |
| `input.noise_reduction` | object                          | Noise reduction config. `type`: `"near_field"` or `"far_field"`.                                                                                                                                                                                                                       |
| `input.transcription`   | object                          | Transcription config. `model`: transcription model identifier (e.g., `inworld/inworld-stt-1`, `assemblyai/u3-rt-pro`, `soniox/stt-rt-v4`). `language`: optional language code. `prompt`: optional transcription prompt.                                                                |
| `input.turn_detection`  | [TurnDetection](#turndetection) | Turn detection config.                                                                                                                                                                                                                                                                 |
| `output.format`         | [AudioFormat](#audioformat)     | Output audio format.                                                                                                                                                                                                                                                                   |
| `output.voice`          | string                          | Voice preset for audio output (e.g., `Dennis`). See the [List Voices](https://docs.inworld.ai/api-reference/voiceAPI/voiceservice/list-voices) API or the [Voice library](https://platform.inworld.ai/voice-library) page in the Inworld Portal for the full list of supported voices. |
| `output.model`          | string                          | The TTS model used for audio output.                                                                                                                                                                                                                                                   |
| `output.speed`          | number                          | Playback speed (0.25–1.5).                                                                                                                                                                                                                                                             |

### AudioFormat

| Field  | Type    | Description                                                  |
| ------ | ------- | ------------------------------------------------------------ |
| `type` | string  | MIME type: `"audio/pcm"`, `"audio/pcmu"`, or `"audio/pcma"`. |
| `rate` | integer | Sample rate in Hz. Currently `24000`.                        |
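
When handling Base64 audio payloads, it can help to convert a chunk's byte length into a duration. The sketch below assumes mono audio, 16-bit samples for `audio/pcm`, and 8-bit samples for the G.711 formats `audio/pcmu` and `audio/pcma`; the channel count and sample width are not stated in the table above, so verify them before relying on this.

```python
# Assumed sample widths in bytes (mono): 16-bit PCM, 8-bit G.711 mu-law/A-law.
BYTES_PER_SAMPLE = {"audio/pcm": 2, "audio/pcmu": 1, "audio/pcma": 1}


def chunk_duration_ms(num_bytes: int, audio_format: dict) -> float:
    """Estimate the duration of a raw audio chunk from an AudioFormat object."""
    bytes_per_sample = BYTES_PER_SAMPLE[audio_format["type"]]
    samples = num_bytes / bytes_per_sample
    return samples * 1000.0 / audio_format["rate"]
```

Under these assumptions, 4800 bytes of `audio/pcm` at 24000 Hz corresponds to 100 ms of audio.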

### TurnDetection

Turn detection has two modes, selected by the `type` field.

**Server VAD** (`type: "server_vad"`):

| Field                 | Type        | Description                                                                                |
| --------------------- | ----------- | ------------------------------------------------------------------------------------------ |
| `type`                | server\_vad | Mode selector.                                                                             |
| `threshold`           | number      | VAD sensitivity (0–1).                                                                     |
| `prefix_padding_ms`   | integer     | Milliseconds of audio to include before speech onset.                                      |
| `silence_duration_ms` | integer     | Silence duration (ms) before speech is considered ended.                                   |
| `create_response`     | boolean     | Auto-trigger `response.create` after speech ends.                                          |
| `interrupt_response`  | boolean     | Allow new speech to interrupt active responses.                                            |
| `idle_timeout_ms`     | integer     | Idle timeout (ms). When it fires, the server emits `input_audio_buffer.timeout_triggered`. |

**Semantic VAD** (`type: "semantic_vad"`):

| Field                | Type          | Description                                       |
| -------------------- | ------------- | ------------------------------------------------- |
| `type`               | semantic\_vad | Mode selector.                                    |
| `eagerness`          | string        | `"low"`, `"medium"`, `"high"`, or `"auto"`.       |
| `create_response`    | boolean       | Auto-trigger `response.create` after speech ends. |
| `interrupt_response` | boolean       | Allow new speech to interrupt active responses.   |
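
The two modes can be sketched as small builder functions that validate the documented ranges before the config is sent. The function names and every default value below are illustrative assumptions, not documented server defaults.

```python
def server_vad(threshold: float = 0.5, prefix_padding_ms: int = 300,
               silence_duration_ms: int = 500, create_response: bool = True,
               interrupt_response: bool = True, idle_timeout_ms=None) -> dict:
    """Build a server VAD TurnDetection object (defaults are illustrative)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    config = {
        "type": "server_vad",
        "threshold": threshold,
        "prefix_padding_ms": prefix_padding_ms,
        "silence_duration_ms": silence_duration_ms,
        "create_response": create_response,
        "interrupt_response": interrupt_response,
    }
    if idle_timeout_ms is not None:
        config["idle_timeout_ms"] = idle_timeout_ms
    return config


def semantic_vad(eagerness: str = "auto", create_response: bool = True,
                 interrupt_response: bool = True) -> dict:
    """Build a semantic VAD TurnDetection object."""
    if eagerness not in {"low", "medium", "high", "auto"}:
        raise ValueError("eagerness must be low, medium, high, or auto")
    return {
        "type": "semantic_vad",
        "eagerness": eagerness,
        "create_response": create_response,
        "interrupt_response": interrupt_response,
    }
```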

### ConversationItem

| Field     | Type                           | Required | Description                                                                             |
| --------- | ------------------------------ | -------- | --------------------------------------------------------------------------------------- |
| `object`  | realtime.item                  | No       | Object type identifier (read-only, present in server responses).                        |
| `id`      | string                         | No       | Item ID.                                                                                |
| `type`    | string                         | Yes      | Item type (e.g., `"message"`, `"function_call_result"`).                                |
| `status`  | string                         | No       | Item status: `"completed"` or `"in_progress"` (read-only, present in server responses). |
| `role`    | string                         | No       | `"system"`, `"user"`, `"assistant"`, or `"tool"`.                                       |
| `content` | [ContentPart](#contentpart)\[] | No       | Array of content parts.                                                                 |

### ContentPart

| Field        | Type   | Required | Description                                                                |
| ------------ | ------ | -------- | -------------------------------------------------------------------------- |
| `type`       | string | Yes      | Content type (e.g., `"input_text"`, `"input_audio"`, `"text"`, `"audio"`). |
| `text`       | string | No       | Text content.                                                              |
| `audio`      | string | No       | Base64-encoded audio.                                                      |
| `transcript` | string | No       | Human-readable transcript accompanying audio.                              |
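
As an example of how `ConversationItem` and `ContentPart` compose, the sketch below builds a user text message. `user_text_item` is a hypothetical helper; it sets only client-writable fields and leaves `object` and `status` to the server.

```python
def user_text_item(text: str, item_id=None) -> dict:
    """Build a ConversationItem holding one input_text ContentPart."""
    item = {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": text}],
    }
    if item_id is not None:
        item["id"] = item_id  # optional, per the table above
    return item
```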

### ResponseConfig

Per-response overrides for session defaults.

| Field               | Type                                            | Description                              |
| ------------------- | ----------------------------------------------- | ---------------------------------------- |
| `conversation`      | string                                          | `"auto"` or a conversation ID.           |
| `output_modalities` | string\[]                                       | `"text"`, `"audio"`, or both.            |
| `instructions`      | string                                          | Override instructions for this response. |
| `voice`             | string                                          | Override voice for this response.        |
| `max_output_tokens` | integer \| `"inf"`                              | Override max tokens.                     |
| `tool_choice`       | string \| [ToolChoiceTarget](#toolchoicetarget) | Override tool choice.                    |
| `tools`             | [Tool](#tool)\[]                                | Override available tools.                |
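
Since every field is an optional override, a client can build this object from only the values it wants to change. A minimal sketch, with a hypothetical helper `response_config` that rejects field names not listed in the table above:

```python
RESPONSE_CONFIG_FIELDS = {
    "conversation", "output_modalities", "instructions", "voice",
    "max_output_tokens", "tool_choice", "tools",
}


def response_config(**overrides) -> dict:
    """Return a ResponseConfig containing only the overrides that were set."""
    unknown = set(overrides) - RESPONSE_CONFIG_FIELDS
    if unknown:
        raise ValueError(f"unknown ResponseConfig fields: {sorted(unknown)}")
    # Drop None values so unset fields fall back to session defaults.
    return {key: value for key, value in overrides.items() if value is not None}
```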

### Tool

| Field         | Type     | Required | Description                          |
| ------------- | -------- | -------- | ------------------------------------ |
| `type`        | function | Yes      | Tool type.                           |
| `name`        | string   | Yes      | Function name.                       |
| `description` | string   | No       | What the function does.              |
| `parameters`  | object   | No       | JSON Schema for function parameters. |

### ToolChoiceTarget

Specifies tool choice behavior. The server always returns this as an object.

| Field          | Type   | Required | Description                                                 |
| -------------- | ------ | -------- | ----------------------------------------------------------- |
| `type`         | string | Yes      | `"auto"`, `"none"`, `"required"`, `"function"`, or `"mcp"`. |
| `name`         | string | No       | Function name (when `type` is `"function"`).                |
| `server_label` | string | No       | MCP server label (when `type` is `"mcp"`).                  |
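
To show how `Tool` and `ToolChoiceTarget` relate, the sketch below defines a function tool and a tool choice that targets it by name. The helper names and the `get_weather` function are hypothetical examples, not part of the API.

```python
def function_tool(name: str, description: str, parameters: dict) -> dict:
    """Build a Tool object; parameters is a JSON Schema object."""
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": parameters,
    }


def force_function(name: str) -> dict:
    """Build a ToolChoiceTarget that selects one function by name."""
    return {"type": "function", "name": name}


# Hypothetical tool used purely for illustration.
get_weather = function_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)
tool_choice = force_function(get_weather["name"])
```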