A jitter buffer trades latency for smoothness. A static buffer either runs dry (audio glitches) or overshoots (needless lag). Adaptive jitter buffers, which nearly every modern real-time audio stack ships, adjust their target depth to recent jitter statistics.


Target depth from jitter percentile

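Sample packet inter-arrival times over a sliding window and derive each packet's jitter (how late it arrived relative to its expected time). Set the target buffer depth to the p95 of recent jitter, clamped to a floor (10 ms) and a ceiling (200 ms for conversational audio, 500 ms for one-way streaming). Recompute the target every 100 to 500 ms.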
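A minimal Python sketch of this rule follows. It assumes each packet carries a sender timestamp convertible to milliseconds (for example, an RTP timestamp scaled by its clock rate) and treats a packet's jitter as how much later it arrived than the fastest packet in the window. The class `JitterTarget` and its parameters are hypothetical, not taken from any particular stack.

```python
import math
from collections import deque


class JitterTarget:
    """Hypothetical helper: derives a target buffer depth from recent packet jitter."""

    def __init__(self, window=200, floor_ms=10.0, ceiling_ms=200.0, percentile=95):
        self.transit = deque(maxlen=window)  # arrival minus send time, in ms
        self.floor_ms = floor_ms
        self.ceiling_ms = ceiling_ms         # 200 conversational, 500 one-way
        self.percentile = percentile         # integer percent, e.g. 95

    def on_packet(self, send_ms, arrival_ms):
        # Absolute transit includes an unknown clock offset; only differences matter.
        self.transit.append(arrival_ms - send_ms)

    def target_depth_ms(self):
        if not self.transit:
            return self.floor_ms
        fastest = min(self.transit)
        # Jitter per packet = how much later it arrived than the fastest packet.
        jitter = sorted(t - fastest for t in self.transit)
        # Nearest-rank percentile, using integer math to avoid float rounding.
        rank = max(1, (self.percentile * len(jitter) + 99) // 100)
        p = jitter[rank - 1]
        # Clamp to the configured floor and ceiling.
        return min(self.ceiling_ms, max(self.floor_ms, p))
```

Call `target_depth_ms()` on a timer (every 100 to 500 ms) rather than on every packet, so the target does not thrash. Subtracting the window minimum cancels the unknown sender/receiver clock offset, which is why absolute transit times are never used directly.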

Adjusting without artifacts

Sudden depth changes cause clicks. Instead, stretch or compress the audio gradually: time-scale modification algorithms such as WSOLA and PSOLA change duration while preserving pitch. Many stacks also fade chunks in and out (crossfading) during depth adjustments.
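The sketch below is a simplified WSOLA (waveform-similarity overlap-add) in plain Python, assuming mono float samples in a list. For each output frame it searches within `tolerance` samples of the nominal read position for the segment that best cross-correlates with the natural continuation of the previous segment, then overlap-adds it with a Hann window at a fixed 50% hop. The function name and defaults are illustrative; a real jitter buffer would likely apply this incrementally to small chunks and use a faster correlation search.

```python
import math


def wsola(x, rate, frame=256, tolerance=64):
    """Return x time-scaled to roughly len(x) / rate samples, keeping pitch.

    rate > 1 shortens the audio (drains the buffer); rate < 1 lengthens it.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    hop = frame // 2  # synthesis hop: consecutive output frames overlap by 50%
    # Periodic Hann window: two copies offset by frame/2 sum to exactly 1.
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    out = []
    prev = None  # input offset of the previously copied segment
    k = 0
    while True:
        nominal = round(k * hop * rate)  # read position plain OLA would use
        if prev is None:
            best = nominal
        else:
            natural = prev + hop  # input that would seamlessly follow prev
            if natural + frame > len(x):
                break
            best, best_score = None, -math.inf
            for cand in range(nominal - tolerance, nominal + tolerance + 1):
                if cand < 0 or cand + frame > len(x):
                    continue
                # Waveform similarity: cross-correlation with the natural continuation.
                score = sum(x[natural + i] * x[cand + i] for i in range(frame))
                if score > best_score:
                    best, best_score = cand, score
            if best is None:
                break
        if best + frame > len(x):
            break
        # Overlap-add the chosen segment at the fixed output position k * hop.
        start = k * hop
        out.extend([0.0] * (start + frame - len(out)))
        for i in range(frame):
            out[start + i] += win[i] * x[best + i]
        prev = best
        k += 1
    return out
```

A `rate` slightly above 1 (say 1.05) drains the buffer gradually; slightly below 1 refills it. Because each segment is chosen to line up with the preceding waveform, the output keeps the input's pitch, which a naive resampler would not.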


When to drop instead of buffer

Catastrophic delay (more than roughly 500 ms behind): drop the backlog and resync. One audible discontinuity is better than 5 seconds of perpetual lag. In practice, most VoIP stacks set the trigger somewhat higher, around 800 ms to 1 s behind.
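The decision logic can be sketched as a small policy function. `decide`, its thresholds, and the convention that "behind" means buffered delay in excess of the current adaptive target are hypothetical choices for illustration; the 800 ms default simply follows the range quoted above.

```python
from dataclasses import dataclass


@dataclass
class PlayoutDecision:
    action: str           # "drop", "compress", "stretch", or "normal"
    drop_ms: float = 0.0  # how much backlog to discard when action == "drop"


def decide(buffered_ms, target_ms, resync_ms=800.0, slack_ms=20.0):
    """Hypothetical playout policy: emergency drop, else nudge toward target."""
    behind = buffered_ms - target_ms  # excess latency over the adaptive target
    if behind > resync_ms:
        # Catastrophic: discard the backlog in one step and resync at the target.
        return PlayoutDecision("drop", drop_ms=behind)
    if behind > slack_ms:
        return PlayoutDecision("compress")  # e.g. time-scale with rate > 1
    if behind < -slack_ms:
        return PlayoutDecision("stretch")   # e.g. time-scale with rate < 1
    return PlayoutDecision("normal")
```

Keeping the drop threshold well above the adaptive ceiling means time-scaling absorbs ordinary drift, and the hard resync fires only when stretching could not catch up in a reasonable time.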

Adaptive target = p95 of recent jitter + WSOLA for smooth changes + emergency drop at catastrophic delay.