WebRTC Voice Agent

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.inworld.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# WebRTC Quickstart

Build a browser-based voice agent that streams audio to the Inworld Realtime API using WebRTC. The browser handles audio natively, so no manual PCM encoding or base64 conversion is needed.

WebRTC is ideal for browser voice apps with low latency. For server-side integrations, see the [WebSocket Quickstart](/realtime/quickstart-websocket).

## Get Started

Create an [Inworld account](https://platform.inworld.ai/signup).

In [Inworld Portal](https://platform.inworld.ai/), generate an API key by going to **Settings** > **API Keys**. Copy the Base64 credentials.

Create a `.env` file:

```shell .env theme={"system"}
INWORLD_API_KEY=your-base64-api-key-here
```

Create `server.js`. It serves the page and exposes a `/api/config` endpoint that fetches ICE servers from the WebRTC proxy and returns them to the browser together with the call URL and the API key. The key is loaded from `.env` on the server rather than hard-coded in the page.

```javascript server.js theme={"system"}
import 'dotenv/config';
import { readFileSync } from 'fs';
import { createServer } from 'http';

const html = readFileSync('index.html');
const API_KEY = process.env.INWORLD_API_KEY || '';
const PROXY = 'https://api.inworld.ai';

const server = createServer(async (req, res) => {
  if (req.url === '/api/config') {
    // Fetch ICE servers from the proxy; fall back to an empty list on failure.
    let ice = [];
    try {
      const r = await fetch(`${PROXY}/v1/realtime/ice-servers`, {
        headers: { Authorization: `Bearer ${API_KEY}` },
      });
      if (r.ok) ice = (await r.json()).ice_servers || [];
    } catch {}
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({
      api_key: API_KEY,
      ice_servers: ice,
      url: `${PROXY}/v1/realtime/calls`,
    }));
    return;
  }
  // Every other path serves the page.
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(html);
});

// If the port is already taken, retry on the next one.
let port = 3000;
server.on('error', (e) => {
  if (e.code === 'EADDRINUSE') {
    console.warn(`Port ${port} in use, trying ${++port}…`);
    server.listen(port);
  } else throw e;
});
server.listen(port, () => console.log(`Open http://localhost:${port}`));
```

Create `index.html` in the same directory. It connects via WebRTC, streams mic audio automatically, and plays agent audio through an RTP track.

```html index.html theme={"system"}
WebRTC Voice Agent
```

```bash theme={"system"}
npm init -y && npm pkg set type=module
npm install dotenv
node server.js
```

Open [http://localhost:3000](http://localhost:3000) and click **Start Conversation**. The agent greets you with audio.

## Option 2: Using OpenAI Agents SDK

If you're building a more advanced voice agent with features like agent handoffs, tool calling, and guardrails, you can use the [OpenAI Agents SDK](https://github.com/openai/openai-agents-js) with Inworld's WebRTC proxy.
We provide a ready-to-run playground based on OpenAI's realtime agents demo.

```bash theme={"system"}
git clone https://github.com/inworld-ai/experimental-oai-realtime-agents-playground.git
cd experimental-oai-realtime-agents-playground
npm install
```

If you are unable to access this repository, please contact **[support@inworld.ai](mailto:support@inworld.ai)** for access.

Open `.env` and set `OPENAI_API_KEY` to your **Inworld** API key (the same Base64 credentials from [Inworld Portal](https://platform.inworld.ai/)):

```shell .env theme={"system"}
OPENAI_API_KEY=your-inworld-base64-api-key-here
```

Despite the variable name `OPENAI_API_KEY`, this must be your **Inworld** API key, not an OpenAI key. The SDK uses this variable name by convention, but the playground routes all traffic through the Inworld WebRTC proxy.

```bash theme={"system"}
npm run dev
```

Open [http://localhost:3000](http://localhost:3000). Select a scenario from the **Scenario** dropdown and start talking.

The playground includes two agentic patterns:

* **Chat-Supervisor**: A realtime chat agent handles basic conversation while a more capable text model (e.g. `gpt-4.1`) handles tool calls and complex responses.
* **Sequential Handoff**: Specialized agents transfer the user between them to handle specific intents (e.g. authentication → returns → sales).

For full details on customizing agents, see the playground's README.
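
To make the Sequential Handoff pattern more concrete, it can be pictured as a small state machine in which each agent lists the agents it may transfer the user to. The sketch below is plain JavaScript written purely for illustration. It does not use the OpenAI Agents SDK or the playground's code, and the `agents` map, the `handoff` function, and the agent names are all hypothetical.

```javascript
// Illustration only: a hand-rolled handoff graph, NOT the OpenAI Agents SDK API.
// Agent names follow the example chain: authentication -> returns -> sales.
const agents = new Map([
  ['authentication', { handoffs: ['returns', 'sales'] }],
  ['returns', { handoffs: ['sales'] }],
  ['sales', { handoffs: [] }],
]);

// Return the new active agent if `current` is allowed to hand off to `target`.
function handoff(current, target) {
  const agent = agents.get(current);
  if (!agent) throw new Error(`Unknown agent: ${current}`);
  if (!agent.handoffs.includes(target)) {
    throw new Error(`${current} cannot hand off to ${target}`);
  }
  return target;
}

let active = 'authentication';
active = handoff(active, 'returns');
active = handoff(active, 'sales');
console.log(active); // prints "sales"
```

Restricting each agent to an explicit list of targets keeps transfers predictable: a handoff that is not in the list fails loudly instead of silently routing the user to the wrong agent.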
***

## How It Works

| Component                | Role                                                            |
| ------------------------ | --------------------------------------------------------------- |
| **Browser**              | Captures mic audio via WebRTC, plays agent audio from RTP track |
| **Node.js server**       | Serves the page and `/api/config` (ICE servers + API key)       |
| **WebRTC proxy**         | Bridges WebRTC ↔ WebSocket, transcodes OPUS ↔ PCM16             |
| **Inworld Realtime API** | Handles speech-to-text, LLM processing, and text-to-speech      |

Key differences from WebSocket:

* Audio flows via **RTP tracks** (no base64 encoding)
* Events flow via **DataChannel** (same JSON schema)
* Browser handles **OPUS codec** natively

## Next Steps

* Full connection details, session config, and SDK integration.
* VAD configuration, audio formats, and conversation flow.
* Migrate from OpenAI Realtime API to Inworld.
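
As a rough illustration of the DataChannel point in How It Works: because events arrive as JSON strings, a client typically parses each message and dispatches on a discriminator field. The sketch below shows one way to do that. The `createEventRouter` helper, the `type` field, and the event names `transcript.delta` and `error` are hypothetical placeholders rather than the Inworld event schema, so check the API reference for the real names.

```javascript
// Hypothetical helper: routes JSON events by their `type` field.
// The field name and the event names are assumptions for illustration only.
function createEventRouter(handlers, onUnknown = () => {}) {
  return function handleMessage(raw) {
    let event;
    try {
      event = JSON.parse(raw);
    } catch {
      return false; // Ignore frames that are not valid JSON.
    }
    if (event === null || typeof event !== 'object') return false;
    // Only dispatch to handlers that were registered explicitly.
    if (!Object.prototype.hasOwnProperty.call(handlers, event.type)) {
      onUnknown(event);
      return false;
    }
    handlers[event.type](event);
    return true;
  };
}

// Example wiring with placeholder event names.
const transcript = [];
const route = createEventRouter({
  'transcript.delta': (e) => transcript.push(e.text),
  error: (e) => console.error('Agent error:', e.message),
});

// In the browser this would be attached to an RTCDataChannel:
//   dataChannel.onmessage = (msg) => route(msg.data);
route(JSON.stringify({ type: 'transcript.delta', text: 'Hello' }));
route('not json');
console.log(transcript.join('')); // prints "Hello"
```

Keeping the router independent of `RTCDataChannel` means the same handler could be reused with a WebSocket connection, which fits the note that both transports share one JSON schema.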