# Multimodal Companion

The Real-time Multimodal Companion Template demonstrates how to build an AI companion that combines speech-to-text, image understanding, and text-to-speech over WebSocket communication. The template includes both a Node.js server and a Unity client, which together provide a complete real-time interactive experience.

Key concepts demonstrated:

* Speech-to-text (STT) - Voice input processing with VAD (voice activity detection) based segmentation
* Multimodal image chat - Combined text and image understanding
* Text-to-speech (TTS) - Streaming audio response generation
* WebSocket communication - Real-time bidirectional data exchange
* Unity integration - Full client implementation for mobile and desktop
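This page does not describe how the template's VAD works internally. To illustrate what VAD-based segmentation means in general, the sketch below splits a mono PCM buffer into utterances using a simple RMS energy threshold. It is not the template's implementation. The names `segmentSpeech`, `VadOptions`, and `Segment`, and the frame size and threshold values, are all hypothetical. Real VADs are usually more robust to background noise than a fixed energy threshold.

```typescript
// Hypothetical illustration of VAD-based segmentation using an RMS energy
// threshold. This is NOT the template's actual VAD implementation.

interface Segment {
  start: number; // index of the first sample in the segment (inclusive)
  end: number;   // index one past the last speech sample (exclusive)
}

interface VadOptions {
  frameSize: number;        // samples per analysis frame (hypothetical parameter)
  threshold: number;        // RMS level at or above which a frame counts as speech
  minSilenceFrames: number; // consecutive silent frames needed to close a segment
}

// Root-mean-square energy of samples[from, to).
function rms(samples: Float32Array, from: number, to: number): number {
  let sum = 0;
  for (let i = from; i < to; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / Math.max(1, to - from));
}

function segmentSpeech(samples: Float32Array, opts: VadOptions): Segment[] {
  const segments: Segment[] = [];
  let start = -1;        // start of the currently open segment, or -1 if none
  let lastSpeechEnd = 0; // end index of the most recent speech frame
  let silentRun = 0;     // consecutive silent frames since the last speech frame

  for (let from = 0; from < samples.length; from += opts.frameSize) {
    const to = Math.min(from + opts.frameSize, samples.length);
    const isSpeech = rms(samples, from, to) >= opts.threshold;

    if (isSpeech) {
      if (start < 0) start = from; // first speech frame opens a new segment
      lastSpeechEnd = to;
      silentRun = 0;
    } else if (start >= 0) {
      silentRun++;
      if (silentRun >= opts.minSilenceFrames) {
        // Enough trailing silence: close the segment at the last speech frame,
        // so short pauses inside an utterance do not split it.
        segments.push({ start, end: lastSpeechEnd });
        start = -1;
        silentRun = 0;
      }
    }
  }

  // Flush a segment that is still open when the buffer ends.
  if (start >= 0) segments.push({ start, end: lastSpeechEnd });
  return segments;
}
```

For example, with 160-sample frames, a buffer of one silent frame, two loud frames, three silent frames, and one final loud frame produces two segments, `{ start: 160, end: 480 }` and `{ start: 960, end: 1120 }`. In a streaming setup, each closed segment is the unit that would be handed to STT. Tuning `minSilenceFrames` trades responsiveness, since a segment closes sooner, against the risk of cutting an utterance at a natural pause.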