- Speech-to-text (STT) - for understanding speech inputs
- LLM - for generating the agent text response
- Text-to-speech (TTS) - for generating agent speech audio
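The three stages above run in sequence on every conversational turn. The sketch below illustrates that flow with stub stages. The type names, `runTurn`, and `stubPipeline` are hypothetical and are not part of the Inworld Runtime SDK; the template itself wires these stages through the Runtime.

```typescript
// Hypothetical stage signatures, used only to illustrate the data flow.
type SpeechToText = (audio: Uint8Array) => Promise<string>;
type LanguageModel = (prompt: string) => Promise<string>;
type TextToSpeech = (text: string) => Promise<Uint8Array>;

interface VoicePipeline {
  stt: SpeechToText;
  llm: LanguageModel;
  tts: TextToSpeech;
}

interface TurnResult {
  transcript: string; // what the user said
  reply: string;      // the agent's text response
  audio: Uint8Array;  // the agent's spoken response
}

// Runs one turn: speech in, transcript, text reply, speech out.
async function runTurn(pipeline: VoicePipeline, input: Uint8Array): Promise<TurnResult> {
  const transcript = await pipeline.stt(input);
  const reply = await pipeline.llm(transcript);
  const audio = await pipeline.tts(reply);
  return { transcript, reply, audio };
}

// Stub stages so the sketch runs without any API keys.
const stubPipeline: VoicePipeline = {
  stt: async (audio) => `heard ${audio.length} bytes`,
  llm: async (prompt) => `You said: ${prompt}`,
  // Fake audio: one zero byte per character of the reply.
  tts: async (text) => new Uint8Array(text.length),
};
```

Swapping a stub for a real provider call should only require matching the stage signature, which is the main reason to keep the stages separate.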
Architecture
- Backend: Inworld Agent Runtime + Express.js
- Frontend: Vite + React
- Communication: WebSocket
Prerequisites
- Node.js v20 or higher: Download here
- Assembly.AI API key (required for speech-to-text functionality): Get your API key
- Inworld API key (required): Sign up here or see quickstart guide
Run the Template
Start the Server
- Clone the Voice Agent GitHub repo.
- Navigate to the server directory.
- Copy the .env-sample file to .env.
- Configure your .env file with the required API keys. Get your Assembly.AI API key for speech-to-text functionality.
- Install dependencies.
- Start the server.
The server will start on port 4000.
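A common source of startup failures is a .env file with a missing or blank API key. The sketch below shows one way to check for that before the server begins handling requests. The variable names in `REQUIRED_KEYS` are assumptions for illustration only; use the names listed in .env-sample. `findMissingKeys` and `assertConfigured` are hypothetical helpers, not part of the template.

```typescript
// Environment-like map: values may be undefined when a variable is not set.
type Env = Record<string, string | undefined>;

// Returns the names of required variables that are missing or blank.
function findMissingKeys(env: Env, required: string[]): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// ASSUMPTION: placeholder names. Check .env-sample for the real ones.
const REQUIRED_KEYS = ["INWORLD_API_KEY", "ASSEMBLY_AI_API_KEY"];

// Throws a descriptive error if any required setting is absent.
function assertConfigured(env: Env): void {
  const missing = findMissingKeys(env, REQUIRED_KEYS);
  if (missing.length > 0) {
    throw new Error(`Missing required settings in .env: ${missing.join(", ")}`);
  }
}
```

On a Node.js server this could be called as `assertConfigured(process.env)` once the .env file has been loaded, so that a misconfiguration fails fast with a readable message.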
Start the Client
- Open a new terminal window.
- Navigate to the client directory.
- (Optional) Create a .env file to customize client behavior.
- Install dependencies.
- Start the client.
The client will start on port 3000 (or the next available port if 3000 is in use) and should automatically open in your default browser.
Chat with Your Agent
- Configure the agent:
  - Enter the agent system prompt
  - Click “Create Agent”
- Start chatting:
  - Voice input: Click the microphone icon to unmute yourself, speak, then click again to mute
  - Text input: Type in the input field and press Enter to send
- Monitor performance:
  - View dashboards, traces, and logs in the Inworld Portal
  - Set VITE_ENABLE_LATENCY_REPORTING=true in the client .env to see latency metrics in the UI
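Vite exposes variables prefixed with `VITE_` to client code as strings, so a flag such as VITE_ENABLE_LATENCY_REPORTING arrives as the text "true" rather than a boolean. The sketch below shows one way a client might interpret such a flag and format a latency sample. `isFlagEnabled` and `formatLatency` are hypothetical helpers, and treating only "true" as enabled is an assumption, not confirmed template behavior.

```typescript
// Interprets an env-style string flag.
// ASSUMPTION: only "true" (case-insensitive, surrounding whitespace ignored) enables it.
function isFlagEnabled(raw: string | undefined): boolean {
  return raw !== undefined && raw.trim().toLowerCase() === "true";
}

// Formats the elapsed time between two millisecond timestamps for display.
function formatLatency(label: string, startMs: number, endMs: number): string {
  return `${label}: ${Math.round(endMs - startMs)} ms`;
}
```

In a Vite client the raw value would typically be read from `import.meta.env.VITE_ENABLE_LATENCY_REPORTING` and passed to `isFlagEnabled`.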
Next steps
Explore templates
Explore more templates for building with the Runtime SDK.
Vibe Code Your App
Learn how to vibe code any workflow or agent.