Advertisement
Forward: compute outputs. Backward: compute gradients via chain rule.
What you're seeing
3-layer network: x → h1 → h2 → ŷ. Forward computes activations layer by layer. Backward computes ∂L/∂params via chain rule: from loss back to weights, multiplying Jacobians.
PyTorch's autograd builds a computation graph during forward and traverses it backward automatically. You write forward; backward is free.
★ KEY TAKEAWAY
Backward = forward, traversed in reverse. Each layer's gradient is the chain rule applied to its local Jacobian.
▶ WHAT TO TRY
- Click Forward pass to see activations flow left-to-right.
- Click Backward pass to see gradients flow back right-to-left.
- This is what PyTorch's autograd does for you, automatically, for any computation graph.