
Gating Network Visualizer

Input token
💻
def sort(arr):
Code / syntax
∫
∫ f(x) dx
Mathematical
🌐
Bonjour
Multilingual
🔗
If A then B
Logical
📖
The capital of
Factual / entity
Routing config
What am I seeing?
The gating network is a single linear layer that maps the token embedding to N logit scores, one per expert. TopK masking keeps only the K highest logits, softmax normalizes those into weights, and the selected experts process the token, with each expert's output scaled by its weight.
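The pipeline described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the demo's actual code: the function name `gate`, the toy 4-dimensional embedding, and the three-expert weight matrix are all made up for the example, and the noise term is left out.

```python
import math

def gate(x, W_g, k=2):
    """Return {expert_index: weight} for the k highest-scoring experts."""
    # Linear layer: one logit per expert (each row of W_g belongs to one expert).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W_g]
    # TopK: keep the indices of the k largest logits; the rest are masked out.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the surviving logits only (subtract the max for stability).
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Toy example: 4-dim embedding and 3 experts (the demo shows 768 dims).
x = [1.0, 0.0, -1.0, 2.0]
W_g = [
    [0.5, 0.1, 0.0, 0.2],  # expert 0 -> logit 0.9
    [0.0, 0.3, 0.4, 0.0],  # expert 1 -> logit -0.4
    [0.2, 0.0, 0.1, 0.5],  # expert 2 -> logit 1.1
]
weights = gate(x, W_g, k=2)
```

With these numbers, experts 2 and 0 are selected and expert 1 receives no weight. Applying softmax after the TopK step means the selected weights always sum to 1.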
Input
Token embedding
def sort(arr):
768-dim vector (simplified)
W_g · x
Gating network
Gate
Linear → TopK → Softmax
H = W_g · x + noise
G = softmax(topK(H, 2))
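To see what the noise term in H does, the sketch below perturbs a fixed set of logits and records which two experts win the TopK step. The demo only says "noise", so the Gaussian distribution, the `noise_std` parameter, and the helper names `noisy_logits` and `top_k` are assumptions made for illustration.

```python
import random

def noisy_logits(clean_logits, noise_std=1.0, rng=random):
    # H = W_g · x + noise, with the W_g · x part passed in precomputed.
    # Gaussian noise is an assumption; the demo does not name a distribution.
    return [h + rng.gauss(0.0, noise_std) for h in clean_logits]

def top_k(logits, k=2):
    # Indices of the k largest logits, highest first.
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

clean = [0.9, -0.4, 1.1, 1.0]  # hypothetical W_g · x for four experts
rng = random.Random(0)         # seeded so the run is repeatable

# Route the same token many times and collect the selected expert pairs.
selections = [
    tuple(sorted(top_k(noisy_logits(clean, 1.0, rng))))
    for _ in range(1000)
]
distinct = set(selections)
```

Without noise the token always goes to experts 2 and 3. With noise, experts whose logits are close to the cutoff are sometimes selected instead, so `distinct` contains more than one pair.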
weights
Experts

Routing probabilities: raw logits vs. softmax weights

Expert
Weight (softmax)
Logit
Prob
Output formula
Output = w₁·E₁(x) + w₂·E₂(x)
Active experts:
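The output formula above is a weighted sum of the active experts' outputs. A minimal sketch follows, assuming each expert is a function from a vector to a vector of the same length. The `combine` helper and the two stand-in experts are hypothetical; real experts would typically be feed-forward networks.

```python
def combine(x, experts, weights):
    """Output = sum of w_i * E_i(x) over the active experts."""
    out = [0.0] * len(x)
    for i, w in weights.items():
        y = experts[i](x)  # run only the experts that were selected
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

# Hypothetical stand-in experts, keyed by expert index.
experts = {
    0: lambda v: [2.0 * vi for vi in v],
    2: lambda v: [vi + 1.0 for vi in v],
}
weights = {0: 0.25, 2: 0.75}  # gate weights for the two active experts
x = [1.0, 2.0]
output = combine(x, experts, weights)  # [2.0, 3.25]
```

Only the experts listed in `weights` are evaluated, which is where the compute savings of sparse routing come from.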