Advertisement
Y[i,j] = sum_k X[i,k] * W[k,j]. Click Step to compute one entry at a time.
What you're seeing
For shapes [N,d] · [d,m] = [N,m]. Each output entry is a dot product of a row of X with a column of W. Cost: N·m dot products of length d. Total: O(N·m·d) multiply-adds.
BLAS libraries (OpenBLAS, MKL) implement this with cache-blocking, SIMD vectorization, and threading. ~10-100× faster than naive Python loops.
★ KEY TAKEAWAY
Matrix multiplication = a grid of dot products. Each output cell is one row × one column.
▶ WHAT TO TRY
- Click Step one entry to watch one Y[i,j] get computed at a time.
- Click Compute all to see the final result instantly.
- Notice that BLAS libraries do this same math, but ~100× faster via cache blocking + SIMD.