- Speech-to-text (STT) - for understanding speech inputs
- LLM - for generating the agent text response
- Text-to-speech (TTS) - for generating agent speech audio
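Conceptually, each voice turn passes through these three components in order. STT turns the user's audio into text, the LLM produces a reply, and TTS voices that reply. The sketch below shows that chaining under stated assumptions. SpeechToText, LanguageModel, TextToSpeech, and runTurn are hypothetical stand-ins, not the Inworld Runtime API.

```typescript
// Hypothetical stage interfaces; NOT the Inworld Runtime API.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface LanguageModel {
  respond(userText: string): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

// One conversational turn: user audio in, agent text and audio out.
async function runTurn(
  audio: Uint8Array,
  stt: SpeechToText,
  llm: LanguageModel,
  tts: TextToSpeech,
): Promise<{ userText: string; agentText: string; agentAudio: Uint8Array }> {
  const userText = await stt.transcribe(audio); // speech -> text
  const agentText = await llm.respond(userText); // text -> reply text
  const agentAudio = await tts.synthesize(agentText); // reply text -> speech
  return { userText, agentText, agentAudio };
}
```

For typed input, a turn would presumably skip the STT stage and start at the LLM. A production pipeline would typically stream partial results between stages rather than awaiting each one in full, which reduces response latency.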
Architecture
- Backend: Inworld Runtime + Express.js
- Frontend: Vite + React
- Communication: WebSocket
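The client and server exchange messages over the WebSocket connection. This section does not describe the template's actual message format. The following is therefore a hypothetical sketch of how such a protocol could be typed in TypeScript, with a defensive parser for incoming server messages. Every message type and field name below is an assumption.

```typescript
// Hypothetical message shapes; the template's real protocol may differ.
type ClientMessage =
  | { type: "text"; text: string } // typed user input
  | { type: "audio"; data: string }; // base64-encoded mic chunk (assumption)

type ServerMessage =
  | { type: "transcript"; text: string } // STT result for the user's speech
  | { type: "agent_text"; text: string } // LLM reply text
  | { type: "agent_audio"; data: string }; // base64-encoded TTS chunk

// Serializes an outgoing message for WebSocket.send().
function encodeClientMessage(msg: ClientMessage): string {
  return JSON.stringify(msg);
}

// Parses an incoming frame; returns null for malformed or unknown messages.
function parseServerMessage(raw: string): ServerMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const m = value as Record<string, unknown>;
  if (m.type === "transcript" && typeof m.text === "string") {
    return { type: "transcript", text: m.text };
  }
  if (m.type === "agent_text" && typeof m.text === "string") {
    return { type: "agent_text", text: m.text };
  }
  if (m.type === "agent_audio" && typeof m.data === "string") {
    return { type: "agent_audio", data: m.data };
  }
  return null;
}
```

On the client, a browser WebSocket's onmessage handler could pass event.data to parseServerMessage and ignore null results. Validating each field before trusting it prevents a malformed or unexpected frame from breaking the UI.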
Run the Template
Start the Server
- Clone the Voice Agent GitHub repo.
- Navigate to the server directory.
- Copy the .env-sample file to .env.
- Configure your .env file with the required API keys. Get your Assembly.AI API key for speech-to-text functionality.
- Install dependencies.
- Start the server.
The server will start on port 4000.
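The server depends on external API keys, so a startup check that fails fast when a key is missing can save debugging time. The sketch below assumes the variables are already loaded into an object such as process.env. The variable name in the comment is a hypothetical placeholder; use the names listed in .env-sample instead.

```typescript
// Returns the names of required variables that are unset or blank.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[],
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Throws with a readable message if any required variable is missing.
// Intended to be called once at server startup, e.g.
// assertEnv(process.env, ["EXAMPLE_STT_API_KEY"]) (hypothetical name).
function assertEnv(
  env: Record<string, string | undefined>,
  required: readonly string[],
): void {
  const missing = missingEnvVars(env, required);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}
```

Blank values count as missing because an empty key left over from a copied sample file usually fails later, and less clearly.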
Start the Client
- Open a new terminal window.
- Navigate to the client directory.
- (Optional) Create a .env file to customize client behavior.
- Install dependencies.
- Start the client.
The client will start on port 3000 (or the next available port if 3000 is in use) and should automatically open in your default browser.
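Vite serves on port 5173 by default, so the port-3000 and auto-open behavior described above presumably comes from the client's configuration. The sketch below shows a vite.config.ts that would produce this behavior using Vite's server.port, server.strictPort, and server.open options. This is an assumption about how the template is set up. Its actual config may differ; for example, it could pass --port on the command line instead.

```typescript
// vite.config.ts (sketch; the template's real config may differ)
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

export default defineConfig({
  plugins: [react()],
  server: {
    port: 3000, // preferred dev-server port
    strictPort: false, // if 3000 is taken, Vite tries the next available port
    open: true, // open the app in the default browser on start
  },
});
```

Setting strictPort to true would instead make Vite exit when port 3000 is busy. This can be useful if the server is configured to accept connections only from that exact origin.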
Chat with Your Agent
- Configure the agent:
  - Enter the agent system prompt.
  - Click “Create Agent”.
- Start chatting:
  - Voice input: Click the microphone icon to unmute yourself, speak, then click it again to mute.
  - Text input: Type in the input field and press Enter to send.
- Monitor performance:
  - View dashboards, traces, and logs in the Inworld Portal.
  - Set VITE_ENABLE_LATENCY_REPORTING=true in the client .env file to see latency metrics in the UI.
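Vite exposes only VITE_-prefixed variables to client code through import.meta.env, and their values are strings. A flag like VITE_ENABLE_LATENCY_REPORTING therefore has to be compared against the literal "true". The sketch below pairs that check with a tracker for one plausible metric: the time from the end of the user's turn to the first agent audio. The template's actual latency metrics may be defined differently, and the class and method names are hypothetical.

```typescript
// In the browser you would pass import.meta.env; a plain object works for testing.
function isLatencyReportingEnabled(env: Record<string, unknown>): boolean {
  return env.VITE_ENABLE_LATENCY_REPORTING === "true";
}

// Hypothetical tracker: measures ms from the end of the user's turn
// (mic muted or text sent) to the first chunk of agent audio.
class TurnLatencyTracker {
  private turnEndedAt: number | null = null;

  // Record the moment the user's turn ends, e.g. performance.now().
  markUserTurnEnd(now: number): void {
    this.turnEndedAt = now;
  }

  // Record the first agent audio chunk; returns the latency once per turn,
  // or null if no user turn is pending.
  markFirstAgentAudio(now: number): number | null {
    if (this.turnEndedAt === null) return null;
    const latencyMs = now - this.turnEndedAt;
    this.turnEndedAt = null; // later chunks in the same turn are ignored
    return latencyMs;
  }
}
```

Resetting the pending timestamp after the first chunk ensures that each turn reports a single number, even though TTS audio usually arrives as many chunks.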