FlashAttention Tiling — Belgavi.AI Lab


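Each Q tile × K tile pair is small enough to fit in on-chip SRAM. An online softmax carries a running max and sum across the K tiles, so the full N×N score matrix is never formed.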

The math is the same as standard attention; only the order of computation changes, so the full attention matrix is never materialized. 5-10× faster on long context.
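To give a rough sense of why avoiding the full matrix matters, here is a back-of-the-envelope sketch. The sequence length (32768), the 2-byte element size (fp16), and the 128×128 tile shape are illustrative assumptions, not values taken from this page, and the helper names are hypothetical.

```python
def attention_matrix_bytes(n_tokens: int, bytes_per_element: int = 2) -> int:
    """Bytes needed to store one full N x N score matrix (one head, one sequence)."""
    return n_tokens * n_tokens * bytes_per_element


def tile_bytes(block_q: int, block_k: int, bytes_per_element: int = 2) -> int:
    """Bytes needed to store one (Q tile) x (K tile) block of scores."""
    return block_q * block_k * bytes_per_element


# Assumed values for illustration only: 32768 tokens, fp16, 128 x 128 tiles.
full = attention_matrix_bytes(32768)
tile = tile_bytes(128, 128)

print(f"full N x N matrix: {full / 2**30:.0f} GiB")
print(f"one score tile:    {tile / 2**10:.0f} KiB")
```

Under these assumptions the full matrix is gigabytes per head, while a single score tile is tens of kilobytes, which is the scale on-chip SRAM can hold.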

★ KEY TAKEAWAY

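FlashAttention computes attention block by block in SRAM and never materializes the N×N matrix, which avoids writing it to and reading it back from slower GPU memory. The result is mathematically the same as standard attention, and 5-10× faster on long context.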

▶ WHAT TO TRY

Click Step to advance through the (Q tile, K tile) pairs one at a time.
Watch how the online softmax (a running max and a running sum per query row) gives the exact result without ever storing the full matrix.
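The tile-by-tile loop and the online softmax described above can be sketched in NumPy. This is a minimal CPU illustration of the idea, not the actual GPU kernel: the function names, the tile sizes, and the random test data are assumptions made for the example, and "exact" here means equal to the naive result up to floating-point rounding.

```python
import numpy as np


def naive_attention(q, k, v):
    """Reference: builds the full N x N score matrix, then applies softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def tiled_attention(q, k, v, block_q=4, block_k=4):
    """Same result, but only one (block_q x block_k) score tile exists at a time."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.empty((n, v.shape[1]))
    for qs in range(0, n, block_q):
        q_tile = q[qs:qs + block_q]
        rows = q_tile.shape[0]
        running_max = np.full((rows, 1), -np.inf)  # max score seen so far, per row
        running_sum = np.zeros((rows, 1))          # softmax denominator so far
        acc = np.zeros((rows, v.shape[1]))         # unnormalized output so far
        for ks in range(0, k.shape[0], block_k):
            k_tile = k[ks:ks + block_k]
            v_tile = v[ks:ks + block_k]
            s = (q_tile @ k_tile.T) * scale        # scores for this tile only
            new_max = np.maximum(running_max, s.max(axis=-1, keepdims=True))
            # Rescale earlier partial results to the new max (0 on the first tile).
            correction = np.exp(running_max - new_max)
            p = np.exp(s - new_max)
            running_sum = running_sum * correction + p.sum(axis=-1, keepdims=True)
            acc = acc * correction + p @ v_tile
            running_max = new_max
        out[qs:qs + rows] = acc / running_sum      # normalize once at the end
    return out


# Illustrative data: 10 tokens, head dimension 8 (10 is not a multiple of the
# tile size, so the last tile in each direction is smaller).
rng = np.random.default_rng(0)
q = rng.standard_normal((10, 8))
k = rng.standard_normal((10, 8))
v = rng.standard_normal((10, 8))

print(np.allclose(tiled_attention(q, k, v), naive_attention(q, k, v)))
```

The key step is the `correction` factor: whenever a later tile raises the running max, the sum and the accumulated output from earlier tiles are rescaled, so the final division yields the same softmax-weighted average as the naive version.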