Voice Agent

# WebSocket Quickstart

Build a browser-based voice agent that streams audio to the Inworld Realtime API using WebSocket.

The WebSocket transport is best for server-side and proxied connections where you can set custom headers. For browser-native voice with lower latency, see the [WebRTC Quickstart](/realtime/quickstart-webrtc).

## Get Started

Create an [Inworld account](https://platform.inworld.ai/signup).

In [Inworld Portal](https://platform.inworld.ai/), generate an API key by going to **Settings** > **API Keys**. Copy the Base64 credentials.

Set your API key as an environment variable.

```shell macOS and Linux theme={"system"}
export INWORLD_API_KEY='your-base64-api-key-here'
```

```shell Windows theme={"system"}
setx INWORLD_API_KEY "your-base64-api-key-here"
```

Create `server.js`. It proxies WebSocket events between the browser and Inworld, configures the voice session, and triggers an initial greeting.

```javascript server.js theme={"system"}
import { readFileSync } from 'fs';
import { createServer } from 'http';
import { WebSocketServer, WebSocket } from 'ws';

// Serve index.html for every HTTP request.
const html = readFileSync('index.html');
const server = createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(html);
});

// Browsers connect to this proxy at /ws.
const wss = new WebSocketServer({ server, path: '/ws' });

// Session configuration, sent once Inworld reports session.created.
const SESSION_CFG = JSON.stringify({
  type: 'session.update',
  session: {
    instructions: 'You are a friendly voice assistant. Keep responses brief.',
  }
});

// Synthetic user message that prompts the agent to speak first.
const GREET = JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'user',
    content: [{ type: 'input_text', text: 'Greet the user' }]
  }
});

wss.on('connection', (browser) => {
  // Setup state: 0 = waiting for session.created, 1 = config sent, 2 = greeting sent.
  let setup = 0;

  // One upstream Inworld connection per browser; the API key never leaves the server.
  const api = new WebSocket(
    `wss://api.inworld.ai/api/v1/realtime/session?key=voice-${Date.now()}&protocol=realtime`,
    { headers: { Authorization: `Basic ${process.env.INWORLD_API_KEY}` } }
  );

  api.on('message', (raw) => {
    // Inspect events only until the greeting has been triggered.
    if (setup < 2) {
      const t = JSON.parse(raw.toString()).type;
      if (t === 'session.created') {
        api.send(SESSION_CFG);
        setup = 1;
      } else if (t === 'session.updated' && setup === 1) {
        api.send(GREET);
        api.send('{"type":"response.create"}');
        setup = 2;
      }
    }
    // Forward every Inworld event to the browser.
    if (browser.readyState === WebSocket.OPEN) browser.send(raw.toString());
  });

  // Forward browser events (for example, mic audio) to Inworld.
  browser.on('message', (msg) => {
    if (api.readyState === WebSocket.OPEN) api.send(msg.toString());
  });

  // Closing either side closes the other.
  browser.on('close', () => api.close());
  api.on('close', () => {
    if (browser.readyState === WebSocket.OPEN) browser.close();
  });
  api.on('error', (e) => console.error('API error:', e.message));
});

// Start on port 3000 and move to the next port if it is already in use.
let port = 3000;
server.on('error', (e) => {
  if (e.code === 'EADDRINUSE') {
    console.warn(`Port ${port} in use, trying ${++port}…`);
    server.listen(port);
  } else throw e;
});
server.listen(port, () => console.log(`Open http://localhost:${port}`));
```

Create `index.html` in the same directory. It captures microphone audio, plays agent audio, and displays transcripts that fade after each turn.

```html index.html theme={"system"}
Voice Agent
```

Install the `ws` dependency and start the server.

```bash theme={"system"}
npm init -y && npm pkg set type=module
npm install ws
node server.js
```

Open [http://localhost:3000](http://localhost:3000) and click **Start Conversation**. The agent greets you with audio.

## How It Works

| Component                | Role                                                                                |
| ------------------------ | ----------------------------------------------------------------------------------- |
| **Browser**              | Captures mic audio (PCM16, 24 kHz), plays agent audio                               |
| **Server**               | Proxies events between browser and Inworld, holds the API key server-side           |
| **Inworld Realtime API** | Handles speech-to-text, LLM processing, and text-to-speech in one WebSocket session |

Key events used:

* `input_audio_buffer.append`: streams mic audio to Inworld
* `response.output_audio.delta`: agent audio chunks for playback
* `input_audio_buffer.speech_started`: triggers interruption (stops agent playback)

## Next Steps

* Full connection details, session config, and event handling.
* Configure the key elements of your voice agent.
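As a supplement to the key events listed under How It Works, here is a minimal sketch of how a browser might package microphone samples into an `input_audio_buffer.append` event. It assumes the samples arrive as `Float32Array` values in the range -1 to 1 (the Web Audio API's native format), already captured or resampled at 24 kHz. It also assumes the event carries base64-encoded PCM16 in an `audio` field; that field name and encoding are assumptions, not confirmed by this page, so check the API reference before relying on them. The helper names `floatToPcm16`, `bytesToBase64`, and `appendEvent` are hypothetical.

```javascript
// Convert Float32 samples in [-1, 1] to 16-bit little-endian PCM bytes.
function floatToPcm16(float32) {
  const view = new DataView(new ArrayBuffer(float32.length * 2));
  for (let i = 0; i < float32.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, float32[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(view.buffer);
}

// Base64-encode raw bytes (btoa is available in browsers and in Node 16+).
function bytesToBase64(bytes) {
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

// Build the JSON event string to send over the WebSocket.
// ASSUMPTION: the payload field is named `audio` and holds base64 PCM16.
function appendEvent(float32) {
  return JSON.stringify({
    type: 'input_audio_buffer.append',
    audio: bytesToBase64(floatToPcm16(float32)),
  });
}
```

In a page served by `server.js`, the result could be sent with `socket.send(appendEvent(samples))`, where `socket` is a WebSocket connected to the proxy's `/ws` path and `samples` is one captured audio chunk.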