▶ Interactive Lab

Cross-Entropy Loss Surface

See how loss changes as the model's predicted probability shifts.

loss = -log(prob of correct class). Confident wrong → huge loss. Confident right → near 0.

What you're seeing

3-class example. Adjust the predicted distribution. Loss = -log(p[true_class]), using the natural log. P(class 2) is set automatically to 1 - P(0) - P(1), so the three probabilities always sum to 1.
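The calculation behind the sliders can be sketched in a few lines of Python. This is a minimal illustration, not the lab's actual code: the function name `cross_entropy_3class` and its argument names are assumptions made for this example.

```python
import math

def cross_entropy_3class(p0, p1, true_class):
    """Loss for a 3-class prediction where P(class 2) is derived.

    p0, p1     -- predicted probabilities for classes 0 and 1
    true_class -- index (0, 1, or 2) of the correct class
    (Hypothetical helper; names are illustrative only.)
    """
    # P(class 2) is whatever probability mass is left over.
    p2 = 1.0 - p0 - p1
    # Absorb tiny negative values caused by floating-point rounding.
    if -1e-12 < p2 < 0.0:
        p2 = 0.0
    probs = [p0, p1, p2]
    if any(p < 0.0 for p in probs):
        raise ValueError("P(0) and P(1) must be non-negative and sum to at most 1")
    p_true = probs[true_class]
    # -log(0) is unbounded; report it as infinity instead of raising.
    if p_true == 0.0:
        return math.inf
    return -math.log(p_true)

# Same prediction, scored against each possible true class.
for k in range(3):
    print(k, round(cross_entropy_3class(0.7, 0.2, k), 3))
```

The same predicted distribution yields a different loss depending on which class is the true one, because only the probability assigned to the true class enters the formula.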

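The asymmetry matters: the loss for a confident right prediction can only fall as far as 0, while the loss for a confident wrong prediction grows without bound as the probability of the true class approaches 0. A confident mistake is therefore punished far more than a confident correct answer is rewarded. This is why a single bad batch can spike training loss.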

★ KEY TAKEAWAY
Cross-entropy = -log(prob of correct class). Confident-and-right is cheap; confident-and-wrong is catastrophic.
▶ WHAT TO TRY
  • Set the predicted prob of the true class very low (1%) — loss spikes to ~4.6.
  • Set it high (95%) — loss drops to ~0.05.
  • Compare the two values: the 1% case costs roughly 90 times more than the 95% case, which is why one bad batch can spike training loss.
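The numbers in the list above can be reproduced with a short sketch. The batch of 32 examples, with 4 of them confidently wrong, is an assumption chosen purely for illustration; the helper names `nll` and `mean_loss` are hypothetical.

```python
import math

def nll(p_true):
    """Cross-entropy for one example: -log(prob of correct class)."""
    return -math.log(p_true)

def mean_loss(p_true_values):
    """Average loss over a batch of true-class probabilities."""
    return sum(nll(p) for p in p_true_values) / len(p_true_values)

confident_wrong = nll(0.01)   # true class given only 1%
confident_right = nll(0.95)   # true class given 95%
ratio = confident_wrong / confident_right

# Illustrative batches (sizes are assumptions, not from the lab):
clean_batch = [0.95] * 32               # every example confident and right
bad_batch = [0.95] * 28 + [0.01] * 4    # four examples confident and wrong

print(f"loss at 1%:  {confident_wrong:.3f}")
print(f"loss at 95%: {confident_right:.3f}")
print(f"ratio:       {ratio:.1f}")
print(f"clean batch mean loss: {mean_loss(clean_batch):.3f}")
print(f"bad batch mean loss:   {mean_loss(bad_batch):.3f}")
```

Because the batch loss is an average, a handful of confidently wrong examples can dominate it even when most of the batch is predicted well.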