Cache Blocking for Matmul — Belgavi.AI Lab



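Each bs × bs block of C is accumulated from a row of tiles of A and a column of tiles of B.

C[i:i+bs, j:j+bs] += A[i:i+bs, k:k+bs] · B[k:k+bs, j:j+bs], with k stepping over the shared dimension in increments of bs. Each tile is small enough to stay in L1, so every element loaded is reused bs times before it is evicted, instead of being refetched from slower memory.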
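A minimal sketch of the tiled loop nest in pure Python, assuming matrices stored as lists of lists. The function names `matmul_naive` and `matmul_blocked` are hypothetical. The sketch tiles the shared dimension k as well as i and j, so each inner step multiplies one bs × bs tile of A by one bs × bs tile of B. Pure Python will not show the cache speedup, because interpreter overhead dominates. The point is the loop structure, which carries over directly to C or another compiled language.

```python
def matmul_naive(A, B):
    """Reference triple-loop matmul on lists of lists."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C


def matmul_blocked(A, B, bs=2):
    """Blocked matmul: C[i0:i0+bs, j0:j0+bs] += A[i0:i0+bs, k0:k0+bs] @ B[k0:k0+bs, j0:j0+bs]."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    # Outer three loops walk over tiles; min() handles ragged edge tiles.
    for i0 in range(0, n, bs):
        for j0 in range(0, p, bs):
            for k0 in range(0, m, bs):
                # Multiply one tile of A by one tile of B, accumulating into the C tile.
                for i in range(i0, min(i0 + bs, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(k0, min(k0 + bs, m)):
                        a, Bk = Ai[k], B[k]
                        for j in range(j0, min(j0 + bs, p)):
                            Ci[j] += a * Bk[j]
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(matmul_blocked(A, B, bs=2))  # [[58.0, 64.0], [139.0, 154.0]]
```

Inside each tile the loops run in i, k, j order. In a row-major language like C, this order makes the innermost loop walk along contiguous rows of B and C.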

★ KEY TAKEAWAY

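Cache blocking: tile the matmul so that each block fits in L1, and reuse the loaded data many times before it is evicted. This can give roughly a 10× speedup over the naive triple loop, though the actual gain depends on matrix size and hardware.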
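One rough way to choose bs is sketched below. It assumes that three bs × bs tiles (one each from A, B and C) must be resident in L1 at the same time. It also assumes a 32 KiB L1 data cache, which is a common size but should be checked for your CPU. `max_block_size` is a hypothetical helper. With float64 elements, 3 · 36² · 8 = 31 104 bytes fits in 32 768, while bs = 37 would need 32 856 bytes. In practice it is common to pick something somewhat smaller than this bound, leaving headroom for other data in the cache.

```python
def max_block_size(l1_bytes, elem_bytes=8, tiles=3):
    """Return the largest bs with tiles * bs * bs * elem_bytes <= l1_bytes.

    Assumes `tiles` square tiles (A, B and C by default) must share the
    L1 data cache; elem_bytes=8 corresponds to float64.
    """
    bs = 0
    while tiles * (bs + 1) ** 2 * elem_bytes <= l1_bytes:
        bs += 1
    return bs


print(max_block_size(32 * 1024))                # 36  (float64, 32 KiB L1)
print(max_block_size(32 * 1024, elem_bytes=4))  # 52  (float32, 32 KiB L1)
```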

▶ WHAT TO TRY