▶ Interactive Lab

Q4_K Block Layout

A block of 256 weights = 8 sub-groups × 32 four-bit values, plus scale metadata.

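256 weights → 8 sub-groups × 32 weights (128 bytes of 4-bit values) + 16 bytes of scale metadata (two fp16 super-block values plus 12 bytes of packed 6-bit sub-group scales and minimums) = 144 bytes.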
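The byte budget can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the layout of ggml's `block_q4_K` struct (an fp16 `d`, an fp16 `dmin`, 12 bytes of packed 6-bit scales and minimums, and 128 bytes of 4-bit values); the variable names are illustrative only.

```python
# Byte budget of one Q4_K super-block (assumed layout: ggml block_q4_K).
QK_K = 256                        # weights per super-block
SUB_BLOCK = 32                    # weights per sub-group
N_SUB = QK_K // SUB_BLOCK         # 8 sub-groups

d_bytes = 2                       # fp16 scale applied to the sub-group scales
dmin_bytes = 2                    # fp16 scale applied to the sub-group minimums
scale_bytes = N_SUB * 2 * 6 // 8  # 8 scales + 8 minimums at 6 bits each
quant_bytes = QK_K * 4 // 8       # 256 four-bit values

block_bytes = d_bytes + dmin_bytes + scale_bytes + quant_bytes
bits_per_weight = block_bytes * 8 / QK_K
metadata_bits_per_weight = (block_bytes - quant_bytes) * 8 / QK_K

print(block_bytes, bits_per_weight, metadata_bits_per_weight)
```

Under these assumptions the metadata adds up to 16 bytes per block, which is where the 0.5 bits of overhead per weight comes from.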

What you're seeing

Each sub-group's own scale tracks the local range of its 32 weights. This costs only a little metadata per block and gives a large quality gain over a single per-tensor scale.
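A small experiment can make the quality gain concrete. The sketch below is not the Q4_K quantizer itself: it uses a simple min/max 4-bit round-trip (an assumption chosen for brevity) and synthetic weights whose spread differs between sub-groups, then compares one scale for all 256 weights against one scale per 32-weight sub-group. The helper names `quantize_4bit` and `mse` are hypothetical.

```python
import random

def quantize_4bit(values, group_size):
    """Min/max 4-bit round-trip with one (scale, minimum) pair per group."""
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / 15 or 1.0  # 15 is the largest 4-bit code; avoid a zero scale
        for v in group:
            q = max(0, min(15, round((v - lo) / scale)))  # 4-bit code in 0..15
            out.append(q * scale + lo)                    # reconstructed value
    return out

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
# Synthetic weights: 8 sub-groups of 32, each with a different spread.
weights = [random.gauss(0.0, 0.01 * (g + 1)) for g in range(8) for _ in range(32)]

err_whole = mse(weights, quantize_4bit(weights, 256))  # one scale for the whole block
err_sub = mse(weights, quantize_4bit(weights, 32))     # one scale per sub-group

print(f"single scale: {err_whole:.3e}   per-sub-group: {err_sub:.3e}")
```

Sub-groups with a narrow range get a finer step size when they are scaled on their own, so the per-sub-group error should come out noticeably lower on data like this.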

★ KEY TAKEAWAY
Q4_K packs 256 weights into 144 bytes (4.5 bits/weight) using a separate scale for each 32-weight sub-group. Block-wise quantization captures local variation cheaply.
▶ WHAT TO TRY
  • See how each of the 8 sub-groups gets its own 6-bit scale and minimum, on top of two FP16 values shared by the whole block.
  • Total cost: 4 bits per weight + 0.5 bits of metadata overhead = 4.5 bits per weight.
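To see how the stored values turn back into weights, here is a minimal sketch of dequantizing one sub-group. It assumes the reconstruction used by ggml's Q4_K reference code, w = d · sc · q − dmin · m, and takes the 6-bit scale `sc` and minimum `m` as already unpacked; the bit-packing of those values is deliberately left out. The function name `dequantize_sub_block` is hypothetical.

```python
def dequantize_sub_block(d, dmin, sc, m, quants):
    """Reconstruct one sub-group's weights as w = d * sc * q - dmin * m.

    d, dmin : fp16 values shared by the whole 256-weight block
    sc, m   : this sub-group's 6-bit scale and minimum (integers 0..63)
    quants  : the sub-group's 4-bit codes (integers 0..15)
    """
    scale = d * sc      # effective step size for this sub-group
    offset = dmin * m   # effective minimum, subtracted from every weight
    return [scale * q - offset for q in quants]

# Toy values, chosen only to make the arithmetic easy to follow.
print(dequantize_sub_block(d=0.5, dmin=0.25, sc=2, m=4, quants=[0, 1, 15]))
```

With these toy inputs the step size and the offset are both 1.0, so codes 0, 1, and 15 map to -1.0, 0.0, and 14.0.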