▶ Interactive Lab

Residual Gradient Flow

With and without residuals: how the gradient survives depth.

Without residuals: gradient ≈ ‖∂f‖^L, which vanishes for deep nets whenever ‖∂f‖ < 1. With residuals: each layer's Jacobian is I + ∂f, and the direct (identity) path keeps the gradient ≈ 1.
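A short derivation of both regimes. It assumes each layer computes x_{l+1} = f_l(x_l) without a skip and x_{l+1} = x_l + f_l(x_l) with one. It writes J_l = ∂f_l/∂x_l for the layer Jacobian and ℒ for the loss (notation introduced here, not by the lab), and it treats gradients as row vectors:

```latex
\begin{aligned}
\text{No residual:}\quad
  & \frac{\partial \mathcal{L}}{\partial x_0}
    = \frac{\partial \mathcal{L}}{\partial x_L}\, J_{L-1} J_{L-2} \cdots J_0,
  \qquad
  \Big\lVert \frac{\partial \mathcal{L}}{\partial x_0} \Big\rVert
    \le \Big\lVert \frac{\partial \mathcal{L}}{\partial x_L} \Big\rVert
        \prod_{l=0}^{L-1} \lVert J_l \rVert \\[4pt]
\text{Residual:}\quad
  & \frac{\partial \mathcal{L}}{\partial x_0}
    = \frac{\partial \mathcal{L}}{\partial x_L}\,(I + J_{L-1}) \cdots (I + J_0)
    = \frac{\partial \mathcal{L}}{\partial x_L}
      \Big( I + \sum_{l=0}^{L-1} J_l + \sum_{l > m} J_l J_m + \cdots \Big)
\end{aligned}
```

If every ‖J_l‖ ≤ g < 1, the first line decays at least as fast as g^L. The expanded residual product always contains the bare I term, a direct path that hands ∂ℒ/∂x_L to x_0 untouched. When the J_l are small, that term dominates and the norm stays near 1.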

What you're seeing

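The plot shows ‖∂L/∂x_0‖, the norm of the loss gradient at the network input, as a function of depth L. Without skip connections, that gradient is a product of L per-layer Jacobians, so it shrinks exponentially when each factor has norm below 1. With skips, each layer contributes I + ∂f instead, and the identity path carries the gradient through unchanged, keeping its norm ~1 as long as the ∂f terms stay small.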
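A minimal NumPy sketch of the experiment behind the plot. It rests on assumptions the lab does not state. Every layer is treated as linear, so its Jacobian J_l is a random Gaussian matrix with gain branch_scale (0.1 here). The upstream gradient ∂L/∂x_L is taken to be a fixed unit vector. The function name grad_norm_at_input and its parameters are hypothetical, chosen for illustration:

```python
import numpy as np


def grad_norm_at_input(depth, residual, dim=64, branch_scale=0.1, seed=0):
    """Return ||dL/dx_0|| after backpropagating a unit gradient through `depth` layers.

    Hypothetical toy model: each layer is linear, with Jacobian J_l drawn with
    i.i.d. N(0, branch_scale**2 / dim) entries, so ||J_l^T v|| ~ branch_scale * ||v||.
    """
    rng = np.random.default_rng(seed)
    grad = np.ones(dim) / np.sqrt(dim)  # unit-norm dL/dx_L (assumed upstream gradient)
    for _ in range(depth):
        jac = rng.normal(0.0, branch_scale / np.sqrt(dim), size=(dim, dim))
        if residual:
            # x_{l+1} = x_l + f(x_l): Jacobian is I + J_l, so the identity path passes grad through.
            grad = grad + jac.T @ grad
        else:
            # x_{l+1} = f(x_l): Jacobian is J_l alone, so grad shrinks by ~branch_scale per layer.
            grad = jac.T @ grad
    return float(np.linalg.norm(grad))


if __name__ == "__main__":
    for depth in (1, 8, 24, 48):
        plain = grad_norm_at_input(depth, residual=False)
        skip = grad_norm_at_input(depth, residual=True)
        print(f"L={depth:3d}  no residual: {plain:.2e}   residual: {skip:.3f}")
```

With branch_scale = 0.1, the no-residual norm falls roughly like 0.1^L. The residual norm grows only like (1 + 0.1²)^(L/2) in expectation, about 1.27 at L = 48. The residual curve stays near 1 because the branch gain is small. With a gain near 1, the product of (I + J_l) can grow instead.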

★ KEY TAKEAWAY
Residual connections keep gradients from vanishing as depth grows. Without them, gradients typically shrink exponentially with the number of layers.
▶ WHAT TO TRY
  • Increase Depth L to 48 and watch the red (no-residual) curve crash to 10⁻¹⁰.
  • The green (with-residual) curve stays near 1 regardless of depth. This is a large part of why 100-layer transformers can be trained at all.
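A back-of-envelope reading of the first bullet. It assumes the red curve decays as a pure power g^L, because the lab does not state its per-layer factor g. Under that assumption, reaching 10⁻¹⁰ at L = 48 implies g ≈ 0.62 per layer. Extending the same decay to 100 layers gives:

```latex
g^{48} = 10^{-10}
\;\Rightarrow\;
g = 10^{-10/48} \approx 0.619,
\qquad
g^{100} = 10^{-1000/48} \approx 10^{-20.8} \approx 1.5 \times 10^{-21}
```

At that per-layer factor and without the identity path, a 100-layer stack would hand its first layer a gradient about 21 orders of magnitude smaller than the one arriving at its last layer.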