Echo Cancellation for Speakerphone

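Acoustic Echo Cancellation (AEC) removes the far-end talker's voice, which is played from your loudspeaker and picked up again by your microphone, from the microphone signal before it is sent back to the far end. Without it, the far-end talker hears a delayed copy of their own voice, and when both sides use speakerphones the loop can build into howling feedback. Modern AEC uses adaptive filters that learn the acoustic path from speaker to mic and subtract the estimated echo in real time.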

The signal model

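The far-end signal x(n) plays through the loudspeaker, passes through the room acoustics, and reaches the mic with delay and reverberation. The mic (near-end) signal is d(n) = h(n) * x(n) + s(n) + v(n), where * denotes convolution, h(n) is the unknown impulse response of the speaker-room-mic path, s(n) is the near-end talker, and v(n) is background noise. AEC forms an estimate ĥ(n) of h(n), filters x(n) through it to predict the echo, and subtracts that prediction from d(n).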
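To make the model concrete, here is a minimal sketch that synthesizes a mic signal from a far-end signal. The function names (convolve, simulate_mic) and the five-tap toy impulse response are illustrative assumptions; a real room response runs to thousands of taps.

```python
import random

def convolve(h, x):
    """Direct linear convolution truncated to len(x): y[n] = sum_k h[k] * x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(x))]

def simulate_mic(x, h, s, noise_std=0.0, seed=0):
    """Mic signal d(n) = (h * x)(n) + s(n) + noise(n)."""
    rng = random.Random(seed)
    echo = convolve(h, x)
    return [e + sn + rng.gauss(0.0, noise_std) for e, sn in zip(echo, s)]

# Toy room (hypothetical): direct path delayed by 2 samples plus one weaker reflection.
h = [0.0, 0.0, 0.5, 0.0, 0.25]
x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # far-end impulse
s = [0.0] * 6                         # near-end talker silent
d = simulate_mic(x, h, s)
print(d)  # [0.0, 0.0, 0.5, 0.0, 0.25, 0.0]
```

With an impulse as the far-end signal and a silent near end, the mic simply records the room's impulse response, which is exactly what the canceller has to learn.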

NLMS adaptive filter

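Normalized Least Mean Squares (NLMS) updates the filter coefficients ĥ(n) in proportion to the error times the input, divided by the input energy: ĥ(n+1) = ĥ(n) + μ e(n) x_N(n) / (‖x_N(n)‖² + ε), where x_N(n) holds the last N far-end samples, e(n) is the mic signal minus the echo estimate, and ε is a small constant that guards against division by zero. The normalization makes the effective step size independent of the far-end level. Complexity is low (roughly 2N multiply-adds per sample) and the update is stable for 0 < μ < 2; μ = 0.2 is typical. Convergence is fast for white input but noticeably slower for speech, which is strongly correlated. The filter length should cover the expected reverberation tail (e.g., 200 ms at 16 kHz = 3200 taps).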
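A minimal time-domain NLMS sketch follows, assuming a white-noise far-end signal, no near-end speech, and a hypothetical four-tap echo path so that it converges in a few thousand samples. The function name nlms and the regularizer eps are choices made for this illustration.

```python
import random

def nlms(x, d, n_taps, mu=0.2, eps=1e-8):
    """Run NLMS over far-end x and mic d; return (error signal, final filter)."""
    h_hat = [0.0] * n_taps
    buf = [0.0] * n_taps                  # recent far-end samples, newest first
    e = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(w * b for w, b in zip(h_hat, buf))    # echo estimate
        err = dn - y                                   # echo-cancelled output
        norm = sum(b * b for b in buf) + eps           # input energy (regularized)
        step = mu * err / norm
        h_hat = [w + step * b for w, b in zip(h_hat, buf)]
        e.append(err)
    return e, h_hat

rng = random.Random(1)
h = [0.0, 0.5, -0.3, 0.1]                        # hypothetical short echo path
x = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
d = [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
     for n in range(len(x))]                     # mic = echo only
e, h_hat = nlms(x, d, n_taps=4)
print([round(w, 3) for w in h_hat])
```

In this noise-free setup h_hat should end up very close to h and the tail of e should be near zero. With real speech and a 3200-tap filter, convergence takes far longer, and recomputing the energy sum every sample (done here for clarity) would normally be replaced by a running update.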

Double-talk detection (DTD)

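When the far-end and near-end talkers speak at the same time, the near-end speech shows up in the error signal as a large disturbance that is uncorrelated with x(n), and NLMS adapts toward it, which makes the filter 'unlearn' the room model. A double-talk detector (DTD) freezes adaptation during double-talk. A common method compares the normalized cross-correlation of the mic and reference signals: high correlation means the mic contains mostly echo (no double-talk), while low correlation during far-end activity means both sides are talking.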
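The correlation test can be sketched as below. This is a simplified variant that correlates the mic window with the current echo estimate (the reference filtered through ĥ) rather than with the raw reference, so the speaker-to-mic delay is already accounted for. The 0.9 threshold and the function names are assumptions made for this illustration, not values from any particular product.

```python
import math
import random

def ncc(d_win, y_win, eps=1e-12):
    """Normalized cross-correlation of a mic window and an echo-estimate window."""
    num = sum(a * b for a, b in zip(d_win, y_win))
    den = math.sqrt(sum(a * a for a in d_win) * sum(b * b for b in y_win)) + eps
    return num / den

def is_double_talk(d_win, y_win, threshold=0.9):
    """Low correlation between mic and echo estimate suggests near-end speech."""
    return ncc(d_win, y_win) < threshold

rng = random.Random(2)
echo = [rng.uniform(-1.0, 1.0) for _ in range(512)]   # stand-in echo estimate
near = [rng.uniform(-1.0, 1.0) for _ in range(512)]   # stand-in near-end talker
mic_single = echo                                      # far end only
mic_double = [e + s for e, s in zip(echo, near)]       # both talking

print(is_double_talk(mic_single, echo))  # False
print(is_double_talk(mic_double, echo))  # True
```

In an adaptive loop the detector's output would gate the update, for example by setting the step size μ to zero for windows flagged as double-talk. A production detector also needs a far-end activity check, since the statistic is meaningless when the reference is silent.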

Frequency-domain implementation

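A time-domain filter with N taps costs O(N) operations per sample. A frequency-domain implementation processes a whole block with FFTs: with a single block of N samples the cost amortizes to O(log N) per sample, but the output is delayed by N samples. Partitioned block convolution splits the filter into P partitions of length B, which costs O(log B + P) per sample and cuts the delay to B samples. WebRTC's AEC3 uses a frequency-domain filter with multiple partitions, which is essential for long reverberation tails.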
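The filtering half of a partitioned scheme can be sketched as follows. This is a uniformly partitioned overlap-save convolution with a fixed filter, assuming a block length that is a power of two. It leaves out the adaptive update entirely and is not how AEC3 is written; the small recursive FFT is included only to keep the example free of dependencies.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def ifft(a):
    return [v / len(a) for v in fft(a, inverse=True)]

def partitioned_convolve(x, h, block):
    """y = x * h (truncated to len(x)) via partitions of length `block`."""
    n_parts = -(-len(h) // block)                       # ceil division
    H = []
    for p in range(n_parts):                            # transform each partition once
        part = h[p * block:(p + 1) * block]
        H.append(fft(part + [0.0] * (2 * block - len(part))))
    n_blocks = -(-len(x) // block)
    xp = x + [0.0] * (n_blocks * block - len(x))
    fdl = [[0j] * (2 * block) for _ in range(n_parts)]  # frequency-domain delay line
    prev, y = [0.0] * block, []
    for b in range(n_blocks):
        cur = xp[b * block:(b + 1) * block]
        fdl = [fft(prev + cur)] + fdl[:-1]              # newest input spectrum first
        Y = [sum(fdl[p][k] * H[p][k] for p in range(n_parts))
             for k in range(2 * block)]
        y.extend(v.real for v in ifft(Y)[block:])       # keep the valid second half
        prev = cur
    return y[:len(x)]

h = [0.5, 0.0, -0.25, 0.0, 0.0, 0.125, 0.0, 0.0, 0.0, 0.0625]  # 10 taps, 3 partitions
x = [1.0] + [0.0] * 15                                          # impulse input
y = partitioned_convolve(x, h, block=4)
print([round(v, 4) for v in y[:10]])
```

Each block needs one forward FFT, one inverse FFT, and one spectral multiply-accumulate per partition, which is where the O(log B + P) per-sample cost comes from. Feeding an impulse should reproduce h at the output, up to rounding.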

Residual echo + AES

Linear AEC typically removes about 30-40 dB of echo. The remaining residual (mainly nonlinear distortion from cheap speakers, which a linear filter cannot model) goes through an Acoustic Echo Suppressor (AES), which applies frequency-domain gain reduction during far-end activity. Together they reach 50-60 dB of echo reduction, at which point the residual echo is generally inaudible.
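As a rough illustration of the two stages, the sketch below measures the linear stage's echo reduction (ERLE) in dB and computes per-band suppressor gains with a simple spectral-subtraction-style rule. The gain rule, the -20 dB floor, and the function names are assumptions chosen for this example; real suppressors such as the one in AEC3 are considerably more elaborate.

```python
import math

def erle_db(mic_power, residual_power, eps=1e-12):
    """Echo return loss enhancement: dB of echo removed by the linear stage."""
    return 10.0 * math.log10((mic_power + eps) / (residual_power + eps))

def suppressor_gains(err_power, residual_echo_power, far_end_active, floor_db=-20.0):
    """Per-band amplitude gains in [floor, 1] for the post-AEC error spectrum."""
    if not far_end_active:
        return [1.0] * len(err_power)      # no echo expected: pass through
    floor = 10.0 ** (floor_db / 20.0)
    gains = []
    for pe, pr in zip(err_power, residual_echo_power):
        ratio = pr / pe if pe > 0.0 else 1.0      # share of band power that is echo
        g = math.sqrt(max(0.0, 1.0 - ratio))      # attenuate echo-dominated bands
        gains.append(min(1.0, max(floor, g)))
    return gains

print(round(erle_db(1.0, 0.001), 1))               # 30.0
print(suppressor_gains([1.0, 1.0, 1.0], [0.0, 0.75, 1.0], True))
```

A band with no estimated residual echo passes unchanged, a band that is three-quarters echo is halved in amplitude, and a band that is all echo is pushed down to the floor. With a 30-40 dB linear stage and a 20 dB floor, the combined reduction lands in the 50-60 dB range quoted above.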

In summary, a practical canceller combines NLMS adaptation, double-talk detection, a frequency-domain implementation, and an AES residual suppressor. WebRTC AEC3 is the open-source reference; don't roll your own from scratch.