# WebRTC

> Connect to the Realtime API over WebRTC for low-latency audio, with JSON events over a data channel.

Connect via WebRTC for browser-native, low-latency voice. A WebRTC proxy bridges your peer connection to the same realtime service used by the [WebSocket](/realtime/connect/websocket) transport, transcoding OPUS ↔ PCM16 and forwarding events transparently.

## Endpoint

```
https://api.inworld.ai
```

| Endpoint                   | Method | Description                    |
| -------------------------- | ------ | ------------------------------ |
| `/v1/realtime/calls`       | POST   | SDP offer/answer exchange      |
| `/v1/realtime/ice-servers` | GET    | STUN/TURN server configuration |

## Authentication

Pass your Inworld API key as a Bearer token. The proxy forwards it to the realtime service.

```
Authorization: Bearer <base64-api-key>
```

<Warning>
  Do not hardcode the API key in client-side code. Store it on your server and serve it to the browser through a backend endpoint (see the examples below).
</Warning>
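
A minimal server-side sketch of building this header. `inworldAuthHeaders` is a hypothetical helper name, not part of any Inworld SDK; it fails fast when the key is missing instead of sending an empty Bearer token.

```javascript
// Hypothetical helper: builds request headers for a server-side call to the proxy.
// `apiKey` is assumed to be the Inworld API key, e.g. read from process.env.INWORLD_API_KEY.
function inworldAuthHeaders(apiKey, extra = {}) {
  if (!apiKey) throw new Error('INWORLD_API_KEY is not set');
  // Authorization is spread last so callers cannot override it by accident.
  return { ...extra, Authorization: `Bearer ${apiKey}` };
}
```

For the SDP exchange, the call might look like `inworldAuthHeaders(key, { 'Content-Type': 'application/sdp' })`.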

## Flow

1. Fetch config from your server (API key + ICE servers)
2. Create `RTCPeerConnection` with ICE servers
3. Create data channel `oai-events` + add microphone track
4. Create SDP offer → POST to `/v1/realtime/calls` → set SDP answer
5. Data channel opens → send `session.update` → start conversation

Audio flows over RTP tracks, so no manual encoding or decoding is needed. Events flow over the data channel using the same JSON schema as [WebSocket](/realtime/connect/websocket).

## Session Config

Same `session.update` as WebSocket, sent through the data channel. See [Configuring Models](/realtime/usage/using-realtime-models) for the full breakdown of STT, LLM, and TTS configuration.

```javascript theme={"system"}
dc.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    model: 'openai/gpt-4o-mini',
    instructions: 'You are a concise concierge.',
    output_modalities: ['audio', 'text'],
    audio: {
      input: {
        turn_detection: {
          type: 'semantic_vad',
          eagerness: 'medium',
          create_response: true,
          interrupt_response: true
        }
      },
      output: {
        voice: 'Clive',
        model: 'inworld-tts-2',
        speed: 1.0
      }
    },
    tools: [{
      type: 'function',
      name: 'get_weather',
      description: 'Fetch weather for a location',
      parameters: {
        type: 'object',
        properties: { location: { type: 'string' } },
        required: ['location']
      }
    }],
    providerData: {
      stt: {
        voice_profile: true,
        language_hints: ['en-US', 'es-MX'],
        end_of_turn_confidence_threshold: 0.7,
        min_end_of_turn_silence: 200,
        max_turn_silence: 5000,
        vad_threshold: 0.5
      },
      tts: {
        segmenter_strategy: 'sentence',
        steering_handling: 'emit_once',
        language: 'en-US',
        delivery_mode: 'CREATIVE',
        conversational: false
      },
      memory: {
        enabled: true,
        turn_interval: 5,
        max_facts: 50
      },
      backchannel: {
        enabled: true
      },
      responsiveness: {
        enabled: true
      }
    }
  }
}));
```

`providerData` carries Inworld-specific extensions to the OpenAI-compatible session shape — STT tuning, TTS segmentation/steering, automatic memory, and more. Most `providerData` fields are hot-swappable via partial `session.update`, but a few are read only at session open and ignored afterwards (notably `providerData.tts.conversational` and `providerData.tts.user_turn_mode`). See [Inworld Realtime API Extensions](/realtime/provider-data) for the field-by-field reference.
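
A partial update can be sketched as below. `buildProviderDataUpdate` is a hypothetical helper, and repeating `type: 'realtime'` in a partial update is an assumption carried over from the full example. The helper drops the two TTS fields named above as read only at session open, since sending them mid-session has no effect.

```javascript
// TTS fields that are read only at session open and ignored afterwards.
const OPEN_ONLY_TTS_FIELDS = ['conversational', 'user_turn_mode'];

// Hypothetical helper: wraps a providerData patch in a partial session.update,
// removing TTS fields that would be ignored mid-session.
function buildProviderDataUpdate(patch) {
  const providerData = { ...patch };
  if (patch.tts) {
    providerData.tts = Object.fromEntries(
      Object.entries(patch.tts).filter(([key]) => !OPEN_ONLY_TTS_FIELDS.includes(key))
    );
  }
  // Assumption: `type: 'realtime'` is repeated here as in the full session.update.
  return { type: 'session.update', session: { type: 'realtime', providerData } };
}
```

Usage, assuming `stt.vad_threshold` is among the hot-swappable fields: `dc.send(JSON.stringify(buildProviderDataUpdate({ stt: { vad_threshold: 0.6 } })))`.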

## Audio

Unlike WebSocket (manual base64 PCM), WebRTC handles audio natively:

* **Input**: the browser captures the microphone and sends OPUS over RTP automatically
* **Output**: the proxy sends AI audio back as an RTP track; attach it to an `<audio>` element to play it

```javascript theme={"system"}
pc.ontrack = (e) => {
  const audio = document.createElement('audio');
  audio.autoplay = true;
  audio.srcObject = new MediaStream([e.track]);
  document.body.appendChild(audio);
};
```

<Note>
  `response.output_audio.delta` events are **not** sent through the data channel — audio is delivered via the RTP track instead.
</Note>
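
Because input audio travels on the microphone track, muting can be done on the track itself rather than through an event. A minimal sketch using the standard `MediaStreamTrack.enabled` flag; `setMicMuted` is a hypothetical helper name, and `stream` is assumed to be the `MediaStream` returned by `getUserMedia`.

```javascript
// Hypothetical helper: mutes or unmutes every audio track of a MediaStream.
// A disabled track stays attached to the peer connection but carries silence.
function setMicMuted(stream, muted) {
  stream.getAudioTracks().forEach((track) => {
    track.enabled = !muted;
  });
}
```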

## Text & Responses

Same as WebSocket, but sent through the data channel:

```javascript theme={"system"}
dc.send(JSON.stringify({
  type: 'conversation.item.create',
  item: { type: 'message', role: 'user', content: [{ type: 'input_text', text: 'Hello!' }] }
}));
dc.send(JSON.stringify({ type: 'response.create' }));
```
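
Calling `send` before the data channel is open throws, so it can help to guard on `readyState`. The sketch below wraps the two events above; `sendEvent` and `sendUserText` are hypothetical helper names, not part of the API.

```javascript
// Hypothetical helper: serializes an event and sends it only when the channel is open.
// Returns true if the event was sent, false otherwise.
function sendEvent(dc, event) {
  if (dc.readyState !== 'open') return false;
  dc.send(JSON.stringify(event));
  return true;
}

// Adds a user text message to the conversation, then requests a response.
function sendUserText(dc, text) {
  const created = sendEvent(dc, {
    type: 'conversation.item.create',
    item: { type: 'message', role: 'user', content: [{ type: 'input_text', text }] },
  });
  return created && sendEvent(dc, { type: 'response.create' });
}
```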

## Events

Same event types as [WebSocket](/realtime/connect/websocket#events), received on the data channel.
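
One way to organize handling is a small router keyed by event type. This is a sketch rather than a prescribed pattern; `createEventRouter` is a hypothetical helper, and the event names in the usage note are the ones used elsewhere on this page.

```javascript
// Hypothetical helper: returns an onmessage handler that dispatches by event type.
function createEventRouter(handlers) {
  return (e) => {
    let msg;
    try {
      msg = JSON.parse(e.data);
    } catch {
      return; // ignore frames that are not valid JSON
    }
    if (!msg || typeof msg.type !== 'string') return;
    // Only call handlers the caller registered explicitly.
    if (Object.prototype.hasOwnProperty.call(handlers, msg.type)) {
      handlers[msg.type](msg);
    }
  };
}
```

Usage: `dc.onmessage = createEventRouter({ 'response.output_text.delta': (m) => console.log(m.delta), error: (m) => console.error(m.error?.message) })`.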

## Option 1: Direct WebRTC

Server: serves the page and a `/api/config` endpoint that fetches ICE servers and returns them together with the API key, so the key never appears in the page source:

```javascript theme={"system"}
import 'dotenv/config';
import { readFileSync } from 'fs';
import { createServer } from 'http';

const html = readFileSync('index.html');
const API_KEY = process.env.INWORLD_API_KEY || '';
const PROXY = 'https://api.inworld.ai';

const server = createServer(async (req, res) => {
  if (req.url === '/api/config') {
    let ice = [];
    try {
      const r = await fetch(`${PROXY}/v1/realtime/ice-servers`, {
        headers: { Authorization: `Bearer ${API_KEY}` },
      });
      if (r.ok) ice = (await r.json()).ice_servers || [];
    } catch {
      // Fall back to an empty ICE server list if the request fails.
    }
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ api_key: API_KEY, ice_servers: ice, url: `${PROXY}/v1/realtime/calls` }));
    return;
  }
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(html);
});
let port = 3000;
server.on('error', (e) => {
  if (e.code === 'EADDRINUSE') { console.warn(`Port ${port} in use, trying ${++port}…`); server.listen(port); }
  else throw e;
});
server.listen(port, () => console.log(`http://localhost:${port}`));
```

Client — full WebRTC flow in the browser:

```javascript theme={"system"}
const cfg = await (await fetch('/api/config')).json();
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

const pc = new RTCPeerConnection({ iceServers: cfg.ice_servers });
const dc = pc.createDataChannel('oai-events', { ordered: true });
stream.getAudioTracks().forEach(t => pc.addTrack(t, stream));

pc.ontrack = (e) => {
  const audio = document.createElement('audio');
  audio.autoplay = true;
  audio.srcObject = new MediaStream([e.track]);
  document.body.appendChild(audio);
};

dc.onopen = () => {
  dc.send(JSON.stringify({
    type: 'session.update',
    session: {
      type: 'realtime',
      model: 'openai/gpt-4o-mini',
      instructions: 'You are a helpful voice assistant.',
      output_modalities: ['audio', 'text'],
      audio: {
        input: { turn_detection: { type: 'semantic_vad', eagerness: 'medium', create_response: true, interrupt_response: true } },
        output: { voice: 'Clive', model: 'inworld-tts-2' }
      }
    }
  }));
};

dc.onmessage = (e) => {
  const msg = JSON.parse(e.data);
  if (msg.type === 'response.output_text.delta') console.log(msg.delta);
  if (msg.type === 'error') console.error(msg.error?.message);
};

const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
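// Wait for ICE gathering to complete so the posted SDP includes all candidates.
if (pc.iceGatheringState !== 'complete') {
  await new Promise((resolve) => {
    pc.addEventListener('icegatheringstatechange', function onChange() {
      if (pc.iceGatheringState === 'complete') {
        pc.removeEventListener('icegatheringstatechange', onChange);
        resolve();
      }
    });
  });
}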
const res = await fetch(cfg.url, {
  method: 'POST',
  headers: { 'Content-Type': 'application/sdp', Authorization: `Bearer ${cfg.api_key}` },
  body: pc.localDescription.sdp,
});
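// Surface HTTP errors instead of passing an error body to setRemoteDescription.
if (!res.ok) throw new Error(`SDP exchange failed with status ${res.status}`);
await pc.setRemoteDescription({ type: 'answer', sdp: await res.text() });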
```

## Option 2: OpenAI Agents SDK

The [OpenAI Agents SDK](https://github.com/openai/openai-agents-js) manages the full WebRTC lifecycle — peer connection, SDP exchange, mic, and audio playback:

```javascript theme={"system"}
import { RealtimeSession, RealtimeAgent, OpenAIRealtimeWebRTC } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'assistant',
  instructions: 'You are a helpful voice assistant.',
  model: 'openai/gpt-4o-mini',
});

const cfg = await (await fetch('/api/config')).json();
const audioEl = document.createElement('audio');
audioEl.autoplay = true;

const session = new RealtimeSession(agent, {
  transport: new OpenAIRealtimeWebRTC({
    useInsecureApiKey: true,
    audioElement: audioEl,
    changePeerConnection: async (pc) => {
      if (cfg.ice_servers?.length) pc.setConfiguration({ iceServers: cfg.ice_servers });
      return pc;
    },
  }),
  model: 'gpt-4o-realtime-preview-2025-06-03',
});

await session.connect({ url: cfg.url, apiKey: cfg.api_key });
session.sendMessage('Hello!');
```

The server-side `/api/config` endpoint is identical to Option 1.

## WebSocket vs WebRTC

|                   | WebSocket             | WebRTC                    |
| ----------------- | --------------------- | ------------------------- |
| **Audio**         | PCM16 base64 (manual) | OPUS via RTP (native)     |
| **Latency**       | Higher                | Lower (UDP)               |
| **NAT traversal** | Not needed            | ICE (STUN/TURN)           |
| **Events**        | WS messages           | DataChannel (same schema) |
| **Best for**      | Server-side / Node.js | Browser voice apps        |

See the [API reference](/api-reference/realtimeAPI/realtime/realtime-websocket) for full event schemas.