WebRTC is well suited to low-latency voice apps that run in the browser. For server-side integrations, see the WebSocket Quickstart.
Get Started
Create an API key
Create an Inworld account. In Inworld Portal, generate an API key by going to Settings > API Keys. Copy the Base64 credentials.
Create a .env file:
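As a minimal sketch of how the server might read and sanity-check the stored credentials, the helpers below parse `.env`-style text and check that a value has the shape of Base64. The variable name `INWORLD_API_KEY` is a hypothetical placeholder, not a name confirmed by this guide.

```javascript
// Sketch only: INWORLD_API_KEY (used in the usage note) is a hypothetical name.

// Parse simple KEY=value lines, skipping blank lines and # comments.
function parseEnv(text) {
  const vars = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    // Split on the first "=" only, so Base64 padding in the value survives.
    vars[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return vars;
}

// Rough shape check: Base64 alphabet, length a multiple of 4, optional padding.
function looksLikeBase64(value) {
  return (
    value.length > 0 &&
    value.length % 4 === 0 &&
    /^[A-Za-z0-9+/]+={0,2}$/.test(value)
  );
}
```

For example, `looksLikeBase64(parseEnv(fileText).INWORLD_API_KEY || "")` could be checked at startup so that a missing or mangled key fails early rather than at connection time.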
Create the server
Create server.js. It serves the page and provides a /api/config endpoint that fetches ICE servers from the WebRTC proxy while keeping the API key server-side.
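The following is a rough sketch of what the `/api/config` handler could look like, not the guide's actual `server.js`. The `fetchIceServers` function is a hypothetical stand-in for the call to the WebRTC proxy (whose URL and authentication are not given here), and the `ice_servers` response field is likewise an assumed name.

```javascript
// Sketch only: fetchIceServers is a hypothetical function that would call the
// WebRTC proxy using the API key held in the server's environment.
function createConfigHandler(fetchIceServers) {
  return async function handleConfig(req, res) {
    if (req.method !== "GET" || req.url !== "/api/config") {
      res.writeHead(404, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "not found" }));
      return;
    }
    try {
      const iceServers = await fetchIceServers();
      res.writeHead(200, { "Content-Type": "application/json" });
      // This sketch returns only the ICE servers to the browser.
      res.end(JSON.stringify({ ice_servers: iceServers }));
    } catch (err) {
      // The proxy call failed; report an upstream error without leaking details.
      res.writeHead(502, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "could not fetch ICE servers" }));
    }
  };
}
```

Because the handler takes the standard `(req, res)` pair, it could be passed to `http.createServer` from Node's built-in `http` module and combined with static file serving for the page.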
Create the frontend
Create index.html in the same directory. It connects via WebRTC, streams mic audio automatically, and plays agent audio through an RTP track.
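The browser-side flow described above can be sketched roughly as follows, using standard WebRTC APIs. This is not the guide's actual page script: the `ice_servers` field and the `"events"` data channel label are assumptions, and the SDP exchange with the proxy is left out because its endpoint is not specified here.

```javascript
// Sketch only: the /api/config response shape and the channel label are
// assumptions, and signaling with the WebRTC proxy is intentionally omitted.

// Pure helper: turn the server's config response into an RTCPeerConnection config.
function toRtcConfiguration(config) {
  return {
    iceServers: Array.isArray(config.ice_servers) ? config.ice_servers : [],
  };
}

// Browser-only: capture the mic, prepare playback, and create an SDP offer.
async function prepareConnection(config, audioElement) {
  const pc = new RTCPeerConnection(toRtcConfiguration(config));

  // Play the agent's audio as soon as its remote track arrives.
  pc.ontrack = (event) => {
    audioElement.srcObject = event.streams[0];
  };

  // Stream microphone audio to the peer.
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  for (const track of mic.getTracks()) pc.addTrack(track, mic);

  // JSON events travel over a data channel ("events" is a placeholder label).
  const events = pc.createDataChannel("events");

  // The offer would next be sent to the proxy and its answer applied with
  // pc.setRemoteDescription(); that step is not shown.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  return { pc, events, offer };
}
```

The audio element passed in would typically have `autoplay` set so that playback starts once the remote stream is attached.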
Install and run
Option 2: Using OpenAI Agents SDK
If you’re building a more advanced voice agent with features like agent handoffs, tool calling, and guardrails, you can use the OpenAI Agents SDK with Inworld’s WebRTC proxy. We provide a ready-to-run playground based on OpenAI’s realtime agents demo.
Clone the playground
If you are unable to access this repository, please contact support@inworld.ai for access.
Configure the API key
Open .env and set OPENAI_API_KEY to your Inworld API key (the same Base64 credentials from Inworld Portal):
Run
- Chat-Supervisor: A realtime chat agent handles basic conversation while a more capable text model (e.g. gpt-4.1) handles tool calls and complex responses.
- Sequential Handoff: Specialized agents transfer the user between them to handle specific intents (e.g. authentication → returns → sales).
How It Works
| Component | Role |
|---|---|
| Browser | Captures mic audio via WebRTC, plays agent audio from RTP track |
| Node.js server | Serves the page and /api/config (ICE servers + API key) |
| WebRTC proxy | Bridges WebRTC ↔ WebSocket, transcodes OPUS ↔ PCM16 |
| Inworld Realtime API | Handles speech-to-text, LLM processing, and text-to-speech |
- Audio flows via RTP tracks (no base64 encoding)
- Events flow via DataChannel (same JSON schema)
- Browser handles OPUS codec natively
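Since events travel as JSON over the DataChannel, the client needs to serialize outgoing events and route incoming ones by type. The helpers below are a minimal sketch of that pattern. They assume only that each event is a JSON object with a string `type` field, and the function names are illustrative.

```javascript
// Sketch only: assumes each event is a JSON object with a string "type" field.

// Serialize an event object for sending over the data channel.
function encodeEvent(event) {
  if (!event || typeof event.type !== "string") {
    throw new TypeError("event must have a string 'type' field");
  }
  return JSON.stringify(event);
}

// Parse an incoming message; return null for anything that is not a typed JSON event.
function decodeEvent(data) {
  try {
    const event = JSON.parse(data);
    return event && typeof event.type === "string" ? event : null;
  } catch {
    return null;
  }
}

// Route incoming data channel messages to per-type handlers.
function attachEventHandlers(channel, handlers) {
  channel.onmessage = (message) => {
    const event = decodeEvent(message.data);
    if (event && handlers[event.type]) handlers[event.type](event);
  };
}
```

A caller might send a message with `channel.send(encodeEvent({ type: "session.update", session: {} }))`. The `session.update` name here is borrowed from the OpenAI Realtime API as an assumption; the WebRTC reference lists the events the proxy actually accepts.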
Next Steps
WebRTC reference
Full connection details, session config, and SDK integration.
Voice agents
VAD configuration, audio formats, and conversation flow.
OpenAI migration
Migrate from OpenAI Realtime API to Inworld.