▶ Interactive Lab

Voice Activity Detection (VAD)

Energy threshold VAD: detect speech vs silence in synthetic audio.

Simulated audio: silence → speech → silence. VAD fires when energy crosses threshold.

What you're seeing

Simple energy-threshold VAD: compute the RMS energy of each 20 ms frame and flag the frame as speech if it exceeds the threshold. Cheap and fast, but vulnerable to background noise spikes.
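A minimal sketch of the energy-threshold VAD described above, using only the Python standard library. It assumes 16 kHz mono audio (so a 20 ms frame is 320 samples) and builds a hypothetical test signal mirroring the lab's silence → speech → silence layout: low-level Gaussian noise for silence and a 200 Hz tone plus noise as a crude stand-in for speech. The threshold value 0.05 is an illustrative assumption, not a recommended setting.

```python
import math
import random

SAMPLE_RATE = 16_000                         # assumption: 16 kHz mono audio
FRAME_MS = 20                                # frame length used in the lab
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000   # 320 samples per 20 ms frame


def synth_audio(seed: int = 0) -> list[float]:
    """Hypothetical test signal: 0.5 s silence, 1 s 'speech', 0.5 s silence.

    Silence is low-level Gaussian noise; 'speech' is a 200 Hz tone plus the
    same noise. This is a stand-in for real speech, not a speech model.
    """
    rng = random.Random(seed)
    n_sil, n_speech = SAMPLE_RATE // 2, SAMPLE_RATE
    silence1 = [rng.gauss(0.0, 0.01) for _ in range(n_sil)]
    speech = [
        0.3 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) + rng.gauss(0.0, 0.01)
        for t in range(n_speech)
    ]
    silence2 = [rng.gauss(0.0, 0.01) for _ in range(n_sil)]
    return silence1 + speech + silence2


def frame_rms(samples: list[float], frame_len: int = FRAME_LEN) -> list[float]:
    """RMS energy of each non-overlapping frame (a trailing partial frame is dropped)."""
    out = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        out.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return out


def energy_vad(samples: list[float], threshold: float) -> list[bool]:
    """Flag a frame as speech when its RMS energy exceeds `threshold`."""
    return [rms > threshold for rms in frame_rms(samples)]


if __name__ == "__main__":
    flags = energy_vad(synth_audio(), threshold=0.05)
    # One character per 20 ms frame: '#' = speech, '.' = silence.
    print("".join("#" if f else "." for f in flags))
```

Each frame is judged independently, which is what makes this approach cheap; it is also why a single loud noise frame is enough to trigger a false detection.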

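Model-based VAD replaces the fixed threshold with a classifier. WebRTC's VAD (wrapped in Python by py-webrtcvad) uses Gaussian mixture models over sub-band energies; Silero VAD is a small neural network. Both typically produce far fewer false positives on stationary noise than a raw energy threshold, and both are widely used in production voice agents.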

★ KEY TAKEAWAY
VAD distinguishes speech from silence. An energy threshold is simple but noisy. Neural VAD (e.g. Silero) is a common production default.
▶ WHAT TO TRY
  • Slide Threshold and watch the VAD activation change.
  • Too low: triggers on noise. Too high: misses quiet speech.
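The threshold trade-off in the last bullet can also be reproduced offline. The sketch below assumes 20 ms frames at 16 kHz and uses hypothetical labelled frames: quiet background noise, a brief loud noise burst, loud "speech", and quiet "speech" (tones standing in for real speech). For each threshold it counts false positives (non-speech frames flagged as speech) and misses (speech frames not flagged). The specific amplitudes and thresholds are illustrative assumptions.

```python
import math
import random

FRAME_LEN = 320  # assumption: 20 ms frames at 16 kHz


def rms(frame: list[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def make_frames(seed: int = 0) -> list[tuple[list[float], bool]]:
    """Hypothetical labelled frames as (samples, is_speech) pairs."""
    rng = random.Random(seed)

    def noise(sigma: float) -> list[float]:
        return [rng.gauss(0.0, sigma) for _ in range(FRAME_LEN)]

    def tone(amp: float) -> list[float]:
        # 200 Hz tone plus background noise, a crude stand-in for speech.
        return [amp * math.sin(2 * math.pi * 200 * t / 16_000) + rng.gauss(0.0, 0.01)
                for t in range(FRAME_LEN)]

    frames = [(noise(0.01), False) for _ in range(20)]  # quiet background
    frames += [(noise(0.08), False) for _ in range(3)]  # brief noise burst
    frames += [(tone(0.3), True) for _ in range(20)]    # loud speech
    frames += [(tone(0.04), True) for _ in range(10)]   # quiet speech
    return frames


def evaluate(frames: list[tuple[list[float], bool]], threshold: float) -> tuple[int, int]:
    """Return (false_positives, misses) for an energy-threshold VAD."""
    false_positives = misses = 0
    for samples, is_speech in frames:
        detected = rms(samples) > threshold
        false_positives += int(detected and not is_speech)
        misses += int(is_speech and not detected)
    return false_positives, misses


if __name__ == "__main__":
    frames = make_frames()
    for threshold in (0.005, 0.02, 0.05, 0.1):
        fp, miss = evaluate(frames, threshold)
        print(f"threshold={threshold:<6} false_positives={fp:<3} misses={miss}")
```

With these synthetic frames the noise burst is louder than the quiet speech, so no single threshold gets both counts to zero: a low threshold keeps the quiet speech but fires on the burst, and a high one rejects the burst but drops the quiet speech. That is the weakness a trained classifier is meant to address.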