▶ Interactive Lab

MoE Top-K Routing

Each token is routed to its top-K experts; how evenly tokens spread across experts (load balance) matters.

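Active expert params per token = K × params per expert. Total expert params = N × params per expert, where N is the number of experts. Memory scales with the total, not with the active count.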
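
The arithmetic above can be checked with a small sketch. The function name `moe_param_counts` and the example sizes (8 experts, K = 2, 100M parameters per expert) are illustrative assumptions, not values from the lab, and the count covers expert parameters only (shared layers such as attention and embeddings are ignored).

```python
def moe_param_counts(num_experts, top_k, params_per_expert):
    """Return (active, total) expert parameter counts for one MoE layer.

    Hypothetical helper: counts expert weights only, ignoring the router
    and any shared (non-expert) parameters.
    """
    if not 1 <= top_k <= num_experts:
        raise ValueError("top_k must be between 1 and num_experts")
    active = top_k * params_per_expert        # touched per token
    total = num_experts * params_per_expert   # must all be resident in memory
    return active, total


# Illustrative numbers only (assumed, not from the lab).
active, total = moe_param_counts(num_experts=8, top_k=2,
                                 params_per_expert=100_000_000)
print(f"active per token: {active:,}")
print(f"total (memory):   {total:,}")
print(f"active fraction:  {active / total:.2%}")
```

With these assumed sizes, each token uses a quarter of the expert parameters, while memory still has to hold all of them.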

What you're seeing

The router decides, for each token, which experts process it. Without a balancing loss, some experts end up receiving few or no tokens (dead experts).
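
A minimal sketch of per-token top-K routing, using only the standard library. It assumes router scores are turned into probabilities with a softmax and that the gate weights of the selected experts are renormalized to sum to 1, which is one common variant and not necessarily what the lab does. The names `route_top_k`, `NUM_EXPERTS`, `TOP_K`, and `NUM_TOKENS` are hypothetical, and random logits stand in for a learned router.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def route_top_k(logits, k):
    """Return [(expert_index, gate_weight), ...] for a single token.

    Picks the k experts with the highest router probability and
    renormalizes their gates to sum to 1 (an assumed convention).
    """
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Assumed sizes for illustration; 16 tokens mirrors the lab's button.
NUM_EXPERTS, TOP_K, NUM_TOKENS = 8, 2, 16
rng = random.Random(0)

load = [0] * NUM_EXPERTS  # number of token assignments per expert
for _ in range(NUM_TOKENS):
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router
    for expert, _gate in route_top_k(logits, TOP_K):
        load[expert] += 1

print("assignments per expert:", load)
```

Each run of the loop produces NUM_TOKENS × TOP_K assignments in total; how unevenly they fall across experts is the load imbalance the lab visualizes.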

★ KEY TAKEAWAY
MoE routes each token to its top-K experts. Total parameter count is high, but active compute per token stays low. Memory cost is not reduced: all experts must be loaded.
▶ WHAT TO TRY
  • Click Route 16 tokens repeatedly — watch load balance vary.
  • Without an auxiliary balance loss, some experts go unused and others overload.
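
The lab does not say which auxiliary loss it has in mind. As a hedged sketch, the block below uses one widely used form, in the style of the Switch Transformer load-balancing loss: N × Σ f_i × P_i, where f_i is the fraction of assignments dispatched to expert i and P_i is the mean router probability for expert i. The function name `balance_loss` and both toy inputs are assumptions made for illustration.

```python
def balance_loss(router_probs, assignments, num_experts):
    """Auxiliary load-balancing loss: N * sum_i f_i * P_i.

    router_probs: per-token list of probabilities over experts.
    assignments:  per-token list of chosen expert indices.
    The value is smallest (1.0) when both dispatch and router
    probability are uniform across experts.
    """
    num_tokens = len(router_probs)
    total_assignments = sum(len(chosen) for chosen in assignments)

    # f_i: fraction of all assignments that went to expert i.
    f = [0.0] * num_experts
    for chosen in assignments:
        for e in chosen:
            f[e] += 1.0 / total_assignments

    # P_i: mean router probability given to expert i.
    p = [sum(tok[e] for tok in router_probs) / num_tokens
         for e in range(num_experts)]

    return num_experts * sum(fi * pi for fi, pi in zip(f, p))


# Toy case 1: perfectly balanced routing over 4 experts.
balanced = balance_loss([[0.25] * 4] * 4, [[0], [1], [2], [3]], 4)

# Toy case 2: collapsed routing, every token sent to expert 0.
collapsed = balance_loss([[0.97, 0.01, 0.01, 0.01]] * 4, [[0]] * 4, 4)

print(f"balanced:  {balanced:.2f}")
print(f"collapsed: {collapsed:.2f}")
```

The collapsed case scores well above the balanced one, so adding a term like this to the training objective penalizes routers that overload a few experts and starve the rest.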