Voice Activity Detector

Class: VoiceActivityDetector | Inherits from: PlayerVoiceDetector | Implements: ICalibrateAudioHandler

Advanced voice activity detection module that uses the Inworld VAD system for speech detection. Extends PlayerVoiceDetector to provide more sophisticated voice activity analysis using native DLL functions. This detector provides more accurate results than simple volume-based detection.

Methods

  • DetectPlayerSpeaking
  • GetAudioChunk

Reference

DetectPlayerSpeaking

Detects whether the player is currently speaking using the Inworld VAD (Voice Activity Detection) system. Uses native DLL functions to analyze audio characteristics beyond simple volume thresholds.

Returns

Type: bool
Description: True if voice activity is detected in the current audio data; otherwise, false.

GetAudioChunk

Converts a list of float audio samples into an InworldAudioChunk for processing by the VAD system. Creates the appropriate data structure required by the native DLL voice activity detection functions.

Parameters

Parameter | Type | Description
audioData | List<float> | The audio sample data as normalized float values.

Returns

Type: AudioChunk
Description: An InworldAudioChunk object containing the audio data formatted for VAD processing.
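The conversion can be pictured as copying the float samples into a container that also records how they should be interpreted. The following is a minimal, language-neutral sketch in Python rather than SDK code: the `AudioChunk` dataclass and its `sample_rate` field are hypothetical stand-ins for InworldAudioChunk, whose real layout is not described here, and the 16000 Hz value is taken from the audio format requirements on this page.

```python
from dataclasses import dataclass, field
from typing import List

SAMPLE_RATE_HZ = 16000  # assumed from the audio format requirements on this page


@dataclass
class AudioChunk:
    """Hypothetical stand-in for InworldAudioChunk (not the SDK type)."""
    sample_rate: int
    data: List[float] = field(default_factory=list)


def get_audio_chunk(audio_data: List[float]) -> AudioChunk:
    """Wrap normalized float samples in a chunk for VAD processing."""
    # Copy the samples so later changes to the caller's buffer
    # do not alter the chunk that was handed to the detector.
    return AudioChunk(sample_rate=SAMPLE_RATE_HZ, data=list(audio_data))


chunk = get_audio_chunk([0.0, 0.25, -0.5])
```

Copying the buffer is a design choice of this sketch, not a documented behavior of GetAudioChunk.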

Voice Activity Detection Process

The VoiceActivityDetector performs the following steps for voice detection:

Prerequisites Check

  1. Controller Validation: Verifies that an InworldController instance exists
  2. VAD Module Check: Ensures that the VAD module is available
  3. Audio Data Validation: Confirms that audio data is present

Detection Process

  1. Audio Chunk Creation: Converts the float samples to AudioChunk format
  2. VAD Analysis: Calls the native DLL function DetectVoiceActivity()
  3. Result Interpretation: Returns true if the VAD result is >= 0 (positive detection)
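The prerequisite checks and detection steps above can be sketched as a single function. This is an illustrative Python sketch, not the SDK implementation: the `vad` callable stands in for the native DetectVoiceActivity() call, `fake_vad` is a hypothetical placeholder with no relation to the real algorithm, and returning false when a prerequisite fails is an assumption, since this page does not state what happens in that case.

```python
from typing import Callable, List, Optional

# A VAD is modeled as a callable that returns -1 (no voice) or >= 0 (voice).
VadFunction = Callable[[List[float]], int]


def detect_player_speaking(
    controller_available: bool,
    vad: Optional[VadFunction],
    audio_data: List[float],
) -> bool:
    # Prerequisites check (assumption: any failed check yields False).
    if not controller_available:
        return False
    if vad is None:
        return False
    if not audio_data:
        return False

    # Detection process.
    chunk = list(audio_data)   # 1. audio chunk creation (simplified stand-in)
    result = vad(chunk)        # 2. VAD analysis
    return result >= 0         # 3. result interpretation


def fake_vad(chunk: List[float]) -> int:
    """Hypothetical placeholder: NOT the Inworld VAD algorithm."""
    return 0 if any(abs(s) > 0.1 for s in chunk) else -1


speaking = detect_player_speaking(True, fake_vad, [0.0, 0.5, -0.3])
```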

Audio Format Requirements

  • Sample Rate: 16kHz (16000 Hz)
  • Data Format: Normalized float values (-1.0 to 1.0)
  • Data Structure: InworldVector<float> containing audio samples
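As an illustration of these requirements, the sketch below scales signed 16-bit PCM samples into the normalized range and checks a buffer against the sample rate and value range listed above. It is a Python sketch under assumptions: the 16-bit PCM input is hypothetical (this page does not say where samples originate or whether the SDK performs such a conversion itself), and all function names are invented for illustration.

```python
from typing import List

SAMPLE_RATE_HZ = 16000  # required sample rate from the list above


def pcm16_to_normalized(pcm: List[int]) -> List[float]:
    """Scale signed 16-bit PCM samples into the range -1.0 to 1.0."""
    return [sample / 32768.0 for sample in pcm]


def is_valid_vad_input(sample_rate: int, samples: List[float]) -> bool:
    """Check the sample rate and value range described above."""
    if sample_rate != SAMPLE_RATE_HZ:
        return False
    return all(-1.0 <= s <= 1.0 for s in samples)


def samples_for_duration(milliseconds: int) -> int:
    """Number of samples covering the given duration at 16 kHz."""
    return SAMPLE_RATE_HZ * milliseconds // 1000


normalized = pcm16_to_normalized([-32768, 0, 16384])
```

At 16 kHz, a 100 ms buffer holds 1600 samples, which is what `samples_for_duration(100)` computes.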

Inheritance and Interface Implementation

PlayerVoiceDetector Base Class

This class extends PlayerVoiceDetector, which provides:
  • Basic voice detection infrastructure
  • Audio data collection and management
  • Integration with the audio processing pipeline

ICalibrateAudioHandler Interface

The base class implements ICalibrateAudioHandler, and VoiceActivityDetector inherits that implementation.

Important Notes

Native DLL Integration

This detector uses native DLL functions for voice activity detection:
  • Function: InworldController.VAD.DetectVoiceActivity(audioChunk)
  • Return Values:
    • -1: Negative detection (no voice activity)
    • 0 and above: Positive detection (voice activity present)

Advanced Detection Capabilities

Unlike simple volume-based detection, this module:
  • Analyzes audio characteristics beyond amplitude
  • Uses sophisticated algorithms for speech pattern recognition
  • Provides more accurate voice activity detection
  • Reduces false positives from background noise
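For contrast, a volume-only detector of the kind this module improves on can be sketched in a few lines. The Python example below is a generic RMS threshold check, not code from PlayerVoiceDetector; the threshold of 0.1 and the synthetic 50 Hz hum are arbitrary values chosen for illustration. It shows why amplitude alone is a weak signal: a loud hum with no speech content still passes the threshold.

```python
import math
from typing import List


def rms(samples: List[float]) -> float:
    """Root-mean-square level of a buffer of normalized samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def volume_based_detect(samples: List[float], threshold: float = 0.1) -> bool:
    """Report 'speaking' whenever the buffer is loud enough (amplitude only)."""
    return rms(samples) >= threshold


# 100 ms of a 50 Hz hum at half scale, sampled at 16 kHz: loud, but not speech.
hum = [0.5 * math.sin(2 * math.pi * 50 * n / 16000) for n in range(1600)]
quiet = [0.01] * 1600

hum_flagged = volume_based_detect(hum)      # the hum is misreported as speech
quiet_flagged = volume_based_detect(quiet)  # the quiet buffer is not flagged
```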

Integration with Audio Pipeline

The VoiceActivityDetector integrates with the audio processing pipeline:
  • Receives audio data from the collection modules
  • Processes audio through the VAD system
  • Updates the IsPlayerSpeaking state in InworldAudioManager
  • Triggers appropriate events for voice start/stop detection
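The last two steps above amount to tracking a boolean state and raising an event only when it changes. The Python sketch below illustrates that pattern under assumptions: `SpeakingStateTracker`, `on_start`, and `on_stop` are hypothetical names, and the actual event names and update logic in InworldAudioManager are not documented on this page.

```python
from typing import Callable, List


class SpeakingStateTracker:
    """Hypothetical stand-in for the IsPlayerSpeaking bookkeeping."""

    def __init__(self, on_start: Callable[[], None], on_stop: Callable[[], None]):
        self.is_player_speaking = False
        self._on_start = on_start
        self._on_stop = on_stop

    def update(self, detected: bool) -> None:
        """Store the latest detection result and fire events on transitions."""
        was_speaking = self.is_player_speaking
        self.is_player_speaking = detected
        if detected and not was_speaking:
            self._on_start()   # silence -> speech
        elif was_speaking and not detected:
            self._on_stop()    # speech -> silence


events: List[str] = []
tracker = SpeakingStateTracker(
    on_start=lambda: events.append("start"),
    on_stop=lambda: events.append("stop"),
)

# Feed one detection result per audio update, as the audio coroutine would.
for detected in [False, True, True, False]:
    tracker.update(detected)
```

Firing events only on transitions keeps listeners from being notified on every audio update while the player keeps talking.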

Performance Considerations

  • Uses native DLL functions for efficient processing
  • Requires proper audio data format for accurate detection
  • Depends on InworldController and VAD module availability
  • Processes audio in real-time during the audio coroutine