Sparse Activation Explorer

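Live metrics

These four numbers capture the core MoE efficiency tradeoff for a layer with N experts, of which k are active per token. Activation rate and FLOPs saved are both set by k/N: the activation rate is k/N, and the fraction of expert FLOPs saved is 1 - k/N. The capacity/compute ratio is N/k. The memory footprint stays fixed at N experts regardless of k.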
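
As a rough sketch of how these four numbers relate, the function below computes them from N and k. The function name `moe_metrics` and the dictionary keys are illustrative choices, not names taken from the demo, and "FLOPs saved" is assumed to be measured against running all N experts for every token.

```python
def moe_metrics(num_experts: int, top_k: int) -> dict:
    """Return four headline ratios for N experts with k active per token.

    Assumption: savings are measured against running all N experts.
    """
    if not 1 <= top_k <= num_experts:
        raise ValueError("top_k must satisfy 1 <= top_k <= num_experts")
    activation_rate = top_k / num_experts
    return {
        "activation_rate": activation_rate,           # k / N
        "flops_saved": 1.0 - activation_rate,         # 1 - k / N
        "capacity_per_compute": num_experts / top_k,  # N / k
        "memory_in_experts": num_experts,             # always N, independent of k
    }


metrics = moe_metrics(num_experts=64, top_k=2)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Changing `top_k` while holding `num_experts` fixed moves the first three values but leaves `memory_in_experts` untouched, which is the tradeoff the metrics panel is meant to show.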

Expert pool: active experts highlighted per token

Each cell is one expert. Purple = active for this token; grey = dormant. Every token independently routes to its own top-k subset, so the grid updates with each step of the animation. Click any cell to identify it by index.

Token routing simulation

Each incoming token is independently routed to k experts by the gating network. Different tokens typically activate different experts, so no single expert sees every token. When experts are spread across GPUs, each token has to be sent to the devices that hold its chosen experts, which is why all-to-all communication is needed on real multi-GPU hardware.
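
The routing step can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not the demo's actual code: random Gaussian logits stand in for a trained gating network, the gate weights are a softmax over only the selected logits (one common convention), and the names `route_token` and `simulate` are hypothetical.

```python
import math
import random


def route_token(logits: list[float], top_k: int) -> list[tuple[int, float]]:
    """Pick the top_k experts for one token and normalise their gate weights."""
    # Indices of the k largest gate logits, best first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the selected logits only (an assumed convention).
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def simulate(num_tokens: int, num_experts: int, top_k: int, seed: int = 0):
    """Route each token independently; random logits replace a real gate."""
    rng = random.Random(seed)
    routes = []
    for _ in range(num_tokens):
        logits = [rng.gauss(0.0, 1.0) for _ in range(num_experts)]
        routes.append(route_token(logits, top_k))
    return routes


routes = simulate(num_tokens=8, num_experts=16, top_k=2)
for token_id, route in enumerate(routes):
    print(token_id, [(expert, round(weight, 3)) for expert, weight in route])

# Count how many tokens each expert received in this batch.
load = [0] * 16
for route in routes:
    for expert, _ in route:
        load[expert] += 1
print("tokens per expert:", load)
```

The per-expert counts at the end show how a batch spreads over the pool. In a multi-GPU setting each of those (token, expert) pairs would correspond to a token being shipped to the device holding that expert.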

Compute & memory comparison vs. equivalent dense model

Compute (FLOPs) scales with k because only active experts run. Memory stays fixed at N because all experts must be loaded into VRAM. This asymmetry is the central MoE hardware constraint: you pay full memory cost to get partial compute cost.
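
To put illustrative numbers on that asymmetry, the sketch below estimates per-token FLOPs and weight memory for one MoE layer. Everything concrete here is an assumption made for the example: each expert is a two-matrix feed-forward block with hypothetical sizes `d_model` and `d_ff`, a matrix multiply costs about 2 FLOPs per weight per token, weights are stored in 2 bytes each, the gating network's own cost is ignored, and the "equivalent dense model" is taken to be one that runs all N experts' worth of weights for every token.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights in one two-matrix feed-forward expert (biases ignored)."""
    return 2 * d_model * d_ff


def compare(num_experts: int, top_k: int, d_model: int, d_ff: int,
            bytes_per_param: int = 2) -> dict:
    """Per-token FLOPs and weight memory: MoE layer vs. dense equivalent."""
    per_expert = ffn_params(d_model, d_ff)
    # Roughly one multiply and one add per weight per token.
    flops_per_expert = 2 * per_expert
    # All N experts must be resident in memory, whatever k is.
    resident_bytes = num_experts * per_expert * bytes_per_param
    return {
        "moe_flops_per_token": top_k * flops_per_expert,
        "dense_flops_per_token": num_experts * flops_per_expert,
        "moe_memory_bytes": resident_bytes,
        "dense_memory_bytes": resident_bytes,
    }


result = compare(num_experts=8, top_k=2, d_model=1024, d_ff=4096)
flops_ratio = result["moe_flops_per_token"] / result["dense_flops_per_token"]
memory_ratio = result["moe_memory_bytes"] / result["dense_memory_bytes"]
print(f"compute vs dense: {flops_ratio:.2f}x")
print(f"memory vs dense:  {memory_ratio:.2f}x")
print(f"resident weights: {result['moe_memory_bytes'] / 2**20:.0f} MiB")
```

With these made-up sizes the compute ratio equals k/N while the memory ratio stays at 1, which matches the comparison the panel draws.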