Interactive demo — autoregressive vs diffusion generation

Autoregressive panel: each row shows one more token committed, left to right. Each token is locked in before any later token exists.

Diffusion panel: all 42 positions start masked (█). Each step reveals about 2–3 tokens at positions scattered across the sequence, not left to right.

Initial readout: step 0 / 20, 0 / 42 revealed, t = 20.
Key difference: the autoregressive model commits to each token left to right, so every choice is final before the rest of the sentence exists. The diffusion model sees the whole sequence at once and fills in 2–3 tokens per step at positions anywhere in the sequence, using whatever has already been revealed as context.

Both panels use the same 42-character phrase, with each character occupying one position. In the diffusion panel, watch the newly revealed tokens land at scattered positions each step rather than sequentially from the left.
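The two reveal schedules can be imitated in a few lines of Python. This is only a sketch of the animation, not of either model: nothing is predicted, and the characters are copied from a known phrase. The phrase, the function names, and the choice of a seeded random permutation for the diffusion order are all assumptions made for illustration; a real masked-diffusion model would choose which positions to unmask by its own rule, and the demo's actual phrase is not given here.

```python
import random

MASK = "█"
# Hypothetical stand-in phrase, chosen only because it is 42 characters long.
PHRASE = "the quick brown fox jumps over a lazy dog."


def autoregressive_steps(phrase):
    """Yield the visible text after each left-to-right commit (one per step)."""
    for i in range(1, len(phrase) + 1):
        yield phrase[:i] + MASK * (len(phrase) - i)


def diffusion_steps(phrase, num_steps=20, seed=0):
    """Yield the visible text after each unmasking step, starting fully masked.

    Assumption: positions are unmasked in a random order, split as evenly
    as possible over num_steps (2 or 3 per step for 42 positions, 20 steps).
    """
    n = len(phrase)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    visible = [MASK] * n
    yield "".join(visible)  # step 0: everything masked
    for step in range(1, num_steps + 1):
        start = (step - 1) * n // num_steps
        end = step * n // num_steps
        for pos in order[start:end]:
            visible[pos] = phrase[pos]
        yield "".join(visible)


if __name__ == "__main__":
    print("autoregressive (first 5 steps):")
    for row in list(autoregressive_steps(PHRASE))[:5]:
        print(" ", row)
    print("diffusion (first 5 states, including step 0):")
    for row in list(diffusion_steps(PHRASE))[:5]:
        print(" ", row)
```

Running the script prints the first few rows of each schedule. The autoregressive rows grow from the left edge, while the diffusion rows fill in at scattered positions. The first schedule needs 42 steps and the second only 20.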