Voice Activity Detector - Inworld AI Documentation

Class: VoiceActivityDetector | Inherits from: PlayerVoiceDetector | Implements: ICalibrateAudioHandler

Advanced voice activity detection module that uses the Inworld VAD system to detect speech. It extends PlayerVoiceDetector and calls native DLL functions to analyze the audio, which gives more accurate results than simple volume-based detection.

Reference

DetectPlayerSpeaking

Detects whether the player is currently speaking using the Inworld VAD (Voice Activity Detection) system. Uses native DLL functions to analyze audio characteristics beyond simple volume thresholds.

Returns

Type: bool
Description: True if voice activity is detected in the current audio data; otherwise, false.

GetAudioChunk

Converts a list of float audio samples into an InworldAudioChunk for processing by the VAD system. Creates the appropriate data structure required by the native DLL voice activity detection functions.

Parameters

Parameter	Type	Description
audioData	`List<float>`	The audio sample data as normalized float values.

Returns

Type: AudioChunk
Description: An InworldAudioChunk object containing the audio data formatted for VAD processing.

Voice Activity Detection Process

The VoiceActivityDetector performs the following steps for voice detection:

Prerequisites Check

Controller Validation: Verifies InworldController instance exists
VAD Module Check: Ensures VAD module is available
Audio Data Validation: Confirms audio data is present

Detection Process

Audio Chunk Creation: Converts float samples to AudioChunk format
VAD Analysis: Calls native DLL function DetectVoiceActivity()
Result Interpretation: Returns true if the VAD result is >= 0 (positive detection); a result of -1 means no voice activity
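
The prerequisite checks and detection steps above can be sketched in a few lines. This is a language-neutral illustration in Python, not the SDK's C# implementation: `detect_player_speaking`, `VadFunction`, and `energy_vad` are hypothetical names, a plain list stands in for the audio chunk type, and returning false when a prerequisite fails is an assumption, since the reference does not state what happens in that case.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for the native call: takes the samples and returns
# -1 for "no voice" or a value >= 0 for "voice detected".
VadFunction = Callable[[Sequence[float]], int]


def detect_player_speaking(
    vad: Optional[VadFunction],
    audio_data: Optional[Sequence[float]],
) -> bool:
    """Mirror the documented flow: check prerequisites, run VAD, interpret."""
    # Prerequisites: a VAD module must be available and audio must be present.
    # Returning False here is an assumption, not documented behavior.
    if vad is None:
        return False
    if not audio_data:
        return False

    # Audio chunk creation: a plain list copy stands in for the chunk type.
    chunk = list(audio_data)

    # VAD analysis and result interpretation: >= 0 is a positive detection.
    result = vad(chunk)
    return result >= 0


def energy_vad(chunk: Sequence[float]) -> int:
    """Toy detector used only to exercise the flow; NOT the Inworld algorithm."""
    mean_square = sum(s * s for s in chunk) / len(chunk)
    return 0 if mean_square > 0.01 else -1
```

The toy `energy_vad` is exactly the kind of volume-based check the real module is meant to improve on; it is here only so the control flow can be run end to end.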

Audio Format Requirements

Sample Rate: 16kHz (16000 Hz)
Data Format: Normalized float values (-1.0 to 1.0)
Data Structure: InworldVector<float> containing audio samples
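
As an illustration of these requirements, the sketch below converts 16-bit PCM bytes into normalized floats and checks the sample rate and value range. It is written in Python for brevity and is not part of the SDK: `pcm16_to_normalized` and `validate_audio` are hypothetical helpers, and the assumption that the input is little-endian 16-bit PCM is not taken from the reference.

```python
import struct
from typing import List

EXPECTED_SAMPLE_RATE = 16000  # Hz, per the requirements above


def pcm16_to_normalized(pcm_bytes: bytes) -> List[float]:
    """Convert little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(pcm_bytes) // 2
    samples = struct.unpack("<%dh" % count, pcm_bytes[: count * 2])
    # Dividing by 32768 maps the int16 range onto [-1.0, 1.0).
    return [s / 32768.0 for s in samples]


def validate_audio(samples: List[float], sample_rate: int) -> None:
    """Raise ValueError if the audio does not match the documented format."""
    if sample_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz, got {sample_rate} Hz"
        )
    if any(s < -1.0 or s > 1.0 for s in samples):
        raise ValueError("samples must be normalized to the range -1.0 to 1.0")
```

Audio captured at another rate would need to be resampled to 16 kHz before detection; that step is outside the scope of this sketch.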

Inheritance and Interface Implementation

PlayerVoiceDetector Base Class

This class extends PlayerVoiceDetector, which provides:

Basic voice detection infrastructure
Audio data collection and management
Integration with the audio processing pipeline

ICalibrateAudioHandler Interface

The base class implements ICalibrateAudioHandler, providing:

OnStartCalibration() - Called by InworldAudioManager.StartCalibrate()
OnStopCalibration() - Called by InworldAudioManager.StopCalibrate()
OnCalibrate() - Performs calibration-specific operations

Important Notes

Native DLL Integration

This detector uses native DLL functions for voice activity detection:

Function: InworldController.VAD.DetectVoiceActivity(audioChunk)
Return Values:
- `-1`: Negative detection (no voice activity)
- `0` and above: Positive detection (voice activity present)

Advanced Detection Capabilities

Unlike simple volume-based detection, this module:

Analyzes audio characteristics beyond amplitude
Uses sophisticated algorithms for speech pattern recognition
Provides more accurate voice activity detection
Reduces false positives from background noise

Integration with Audio Pipeline

The VoiceActivityDetector integrates with the audio processing pipeline:

Receives audio data from the collection modules
Processes audio through the VAD system
Updates the IsPlayerSpeaking state in InworldAudioManager
Triggers appropriate events for voice start/stop detection
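
The last two steps amount to edge detection on a boolean state: events fire only when the speaking state changes, not on every audio frame. The sketch below shows the idea in Python. `SpeakingStateTracker` and its callbacks are hypothetical names for illustration and do not reflect how InworldAudioManager is actually implemented.

```python
from typing import Callable, List


class SpeakingStateTracker:
    """Tracks a boolean speaking state and reports start/stop transitions."""

    def __init__(
        self,
        on_start: Callable[[], None],
        on_stop: Callable[[], None],
    ) -> None:
        self.is_player_speaking = False
        self._on_start = on_start
        self._on_stop = on_stop

    def update(self, detected: bool) -> None:
        """Feed one detection result; fire a callback only on a state change."""
        if detected and not self.is_player_speaking:
            self.is_player_speaking = True
            self._on_start()
        elif not detected and self.is_player_speaking:
            self.is_player_speaking = False
            self._on_stop()


# Usage: two consecutive positive results produce a single "start" event.
events: List[str] = []
tracker = SpeakingStateTracker(
    on_start=lambda: events.append("start"),
    on_stop=lambda: events.append("stop"),
)
for detected in [False, True, True, False]:
    tracker.update(detected)
```

A production pipeline would likely add some hold time before reporting a stop so that short pauses between words do not end the utterance; the reference does not say whether the SDK does this.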

Performance Considerations

Uses native DLL functions for efficient processing
Requires proper audio data format for accurate detection
Depends on InworldController and VAD module availability
Processes audio in real-time during the audio coroutine

​Methods

​Reference

​DetectPlayerSpeaking

​Returns

​GetAudioChunk

​Parameters

​Returns

​Voice Activity Detection Process

​Prerequisites Check

​Detection Process

​Audio Format Requirements

​Inheritance and Interface Implementation

​PlayerVoiceDetector Base Class

​ICalibrateAudioHandler Interface

​Important Notes

​Native DLL Integration

​Advanced Detection Capabilities

​Integration with Audio Pipeline

​Performance Considerations

Methods

Reference

DetectPlayerSpeaking

Returns

GetAudioChunk

Parameters

Returns

Voice Activity Detection Process

Prerequisites Check

Detection Process

Audio Format Requirements

Inheritance and Interface Implementation

PlayerVoiceDetector Base Class

ICalibrateAudioHandler Interface

Important Notes

Native DLL Integration

Advanced Detection Capabilities

Integration with Audio Pipeline

Performance Considerations