MoE Top-K Routing — Belgavi.AI Lab

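Let N be the number of experts and K the number of experts each token is routed to. Active params per token = K × params per expert. Total params = N × params per expert. Memory follows the total, not the active count. These counts cover expert weights only; shared layers such as attention and embeddings add to both.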
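The arithmetic above can be sketched in a few lines. This is an illustrative helper, not part of any library: the function name `moe_param_counts` and the example sizes (8 experts, K = 2, one million parameters per expert) are assumptions chosen for the example, and only expert weights are counted.

```python
def moe_param_counts(num_experts, top_k, params_per_expert):
    """Return (active, total) expert parameter counts for one MoE layer.

    Hypothetical helper: counts expert weights only and ignores shared
    layers (attention, embeddings, the router itself).
    """
    if not 1 <= top_k <= num_experts:
        raise ValueError("top_k must be between 1 and num_experts")
    active = top_k * params_per_expert       # weights touched per token
    total = num_experts * params_per_expert  # weights that must be in memory
    return active, total


# Assumed example sizes: 8 experts, top-2 routing, 1M params per expert.
active, total = moe_param_counts(num_experts=8, top_k=2, params_per_expert=1_000_000)
print(active, total, active / total)  # 2000000 8000000 0.25
```

With these numbers each token uses a quarter of the expert weights for compute, while memory still has to hold all of them.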

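The router decides, token by token, which experts process it. Without a load-balancing loss, routing tends to collapse onto a few experts, leaving dead experts that receive almost no tokens and stop learning.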
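A minimal pure-Python sketch of top-K routing and one possible balancing term follows. The page does not say which balancing loss it has in mind, so the form used here (number of experts times the sum over experts of load fraction times mean router probability) is an assumption, chosen because it equals 1 under perfectly even routing and grows as routing collapses. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def route_top_k(logits, k):
    """Return [(expert_index, weight)] for the k most probable experts.

    Weights are the router probabilities renormalised over the chosen k.
    """
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def load_fractions(batch_logits, k):
    """Fraction of routing slots each expert receives across a batch."""
    counts = [0] * len(batch_logits[0])
    for logits in batch_logits:
        for i, _ in route_top_k(logits, k):
            counts[i] += 1
    slots = len(batch_logits) * k
    return [c / slots for c in counts]


def balance_loss(batch_logits, k):
    """Assumed auxiliary loss: n * sum_i(load_fraction_i * mean_prob_i)."""
    n = len(batch_logits[0])
    f = load_fractions(batch_logits, k)
    all_probs = [softmax(logits) for logits in batch_logits]
    mean_p = [sum(p[i] for p in all_probs) / len(all_probs) for i in range(n)]
    return n * sum(fi * pi for fi, pi in zip(f, mean_p))


# Collapsed router: every token prefers experts 0 and 1, so 2 and 3 are dead.
collapsed = [[2.0, 1.0, 0.0, 0.0]] * 4
print(load_fractions(collapsed, k=2))  # [0.5, 0.5, 0.0, 0.0]
```

Adding a term like `balance_loss` to the training objective penalises the collapsed case, which pushes the router to spread tokens so that no expert is starved.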

★ KEY TAKEAWAY

MoE routes each token to its top-K experts. Total parameter count is high while compute per token stays low. The memory cost does not shrink: all experts must be loaded.

▶ WHAT TO TRY