This integration relies on the Inworld Router API’s LLM + TTS streaming response. Because LLM inference and TTS synthesis happen in one call, the first sentence of audio starts playing before the LLM has finished writing the full response, which significantly reduces time to first audio.
Prerequisites
- Node.js v20 or later
- ngrok account with a reserved static domain (the free tier is sufficient)
- Twilio account with a phone number that has Voice capability
- Inworld account with a Router API key
Setup
The steps below walk through the reference implementation in inworld-ai/inworld-api-examples.
1. Clone the example repo
2. Get your Inworld API key
Sign in to the Inworld Portal, open your workspace, and create an API key. The same key works for Router (LLM) and TTS because the integration uses the combined LLM + TTS endpoint.
3. Get a Twilio phone number
In the Twilio Console, buy a phone number with Voice capability. This is the number callers will dial.
4. Reserve an ngrok static domain (for local development)
Install ngrok and reserve a free static domain in the ngrok dashboard. A static domain matters here because Twilio’s webhook URL needs to stay stable between restarts. Without one, every new ngrok session changes the tunnel URL and you have to update the Twilio webhook by hand.
5. Configure environment
Copy the example env file and fill in the required variables:
6. Install and run
Install dependencies:
7. Point your Twilio number at the webhook
In the Twilio Console, go to Phone Numbers → Manage → Active Numbers → your number. Under Voice Configuration, set A call comes in to Webhook, enter https://your-ngrok-domain.ngrok-free.app/voice, and choose HTTP POST. Save the configuration.
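Taken together, the local-setup steps might look like this in a shell. This is a sketch, not the repo’s documented commands: the repo subdirectory, env variable names, npm script name, and port 3000 are all assumptions to check against the example’s README.

```shell
# Clone the example repo
git clone https://github.com/inworld-ai/inworld-api-examples.git
cd inworld-api-examples

# Configure environment (variable names are assumptions)
cp .env.example .env
#   INWORLD_API_KEY=...   key from the Inworld Portal
#   NGROK_DOMAIN=your-ngrok-domain.ngrok-free.app

# Install dependencies and start the server
npm install
npm start

# In a second terminal: tunnel the local port (3000 is an assumption)
# through your reserved static domain
ngrok http --domain=your-ngrok-domain.ngrok-free.app 3000
```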
ngrok is only needed for local development so Twilio can reach a server running on your machine. Once you deploy the server to production, update the Twilio webhook to point at your server’s public URL (for example, https://voice.yourdomain.com/voice) and you can drop ngrok entirely.
How it works
- An inbound call hits /voice, and the server returns TwiML that hands the call off to ConversationRelay.
- ConversationRelay handles the call audio and runs speech-to-text (Deepgram by default), then opens a WebSocket to your server and streams user transcripts as they arrive.
- For each user turn, the server calls the Inworld Router API’s chat completions endpoint with an audio block, which returns an SSE stream of both text deltas and base64-encoded PCM audio chunks. This is the LLM + TTS feature.
- Inworld’s chunking engine groups the response into text segments at natural sentence boundaries, and each segment’s audio may arrive as multiple base64-encoded PCM chunks. The server assembles the audio for each segment, wraps it in a WAV header, hosts it at a short-lived HTTP URL, and sends a play message to ConversationRelay so the caller hears the first sentence before the LLM has finished generating the rest.
- On barge-in, ConversationRelay sends an interrupt message and the server aborts the in-flight Inworld stream via AbortController. Partial responses already spoken are kept in conversation history so context is preserved.
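The per-segment audio handling above can be sketched in Node.js. This is a minimal illustration, not the reference implementation: the SSE field names (choices[0].delta.content and choices[0].delta.audio.data) and the 24 kHz, 16-bit mono PCM format are assumptions to verify against the LLM + TTS documentation.

```javascript
// Parse a chunk of SSE text from the Inworld stream into text deltas and
// raw PCM bytes. Field paths are assumptions for illustration.
function parseSseChunk(sseText) {
  const textDeltas = [];
  const pcmChunks = [];
  for (const line of sseText.split("\n")) {
    if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
    const event = JSON.parse(line.slice("data: ".length));
    const delta = event.choices?.[0]?.delta ?? {};
    if (delta.content) textDeltas.push(delta.content);
    if (delta.audio?.data) pcmChunks.push(Buffer.from(delta.audio.data, "base64"));
  }
  return { text: textDeltas.join(""), pcm: Buffer.concat(pcmChunks) };
}

// Wrap raw 16-bit mono PCM in a minimal 44-byte RIFF/WAV header so the
// segment can be served over HTTP and referenced in a play message.
function pcmToWav(pcm, sampleRate = 24000) {
  const header = Buffer.alloc(44);
  header.write("RIFF", 0);
  header.writeUInt32LE(36 + pcm.length, 4);
  header.write("WAVE", 8);
  header.write("fmt ", 12);
  header.writeUInt32LE(16, 16);             // fmt chunk size
  header.writeUInt16LE(1, 20);              // audio format: PCM
  header.writeUInt16LE(1, 22);              // channels: mono
  header.writeUInt32LE(sampleRate, 24);     // sample rate
  header.writeUInt32LE(sampleRate * 2, 28); // byte rate (16-bit mono)
  header.writeUInt16LE(2, 32);              // block align
  header.writeUInt16LE(16, 34);             // bits per sample
  header.write("data", 36);
  header.writeUInt32LE(pcm.length, 40);
  return Buffer.concat([header, pcm]);
}
```

In the real server, the fetch that produces this SSE stream would be created with an AbortController signal so an interrupt message from ConversationRelay can cancel it mid-stream.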
The TwiML returned from /voice looks like this:
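A minimal sketch of that TwiML response — the WebSocket URL and the welcomeGreeting text are placeholders, and the reference implementation may set additional ConversationRelay attributes:

```xml
<Response>
  <Connect>
    <ConversationRelay
      url="wss://your-ngrok-domain.ngrok-free.app/ws"
      welcomeGreeting="Hi! How can I help you today?" />
  </Connect>
</Response>
```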
Test your integration
Call your Twilio number. The bot should greet you and hold a conversation.
Example implementation
Twilio ConversationRelay integration example
A complete Node.js reference implementation that bridges Twilio ConversationRelay to the Inworld Router API with LLM + TTS.
Further reading
LLM + TTS (Voice Responses)
How to request combined LLM text and Inworld TTS audio in a single streaming call.
ConversationRelay TwiML Reference
Twilio’s reference for configuring ConversationRelay, including voice hints, parameters, and language options.
ConversationRelay WebSocket Protocol
Full specification of the WebSocket messages exchanged between ConversationRelay and your server.
List TTS Voices
API reference for fetching available Inworld TTS voice IDs to use in the audio block.