Lower bit width = smaller memory + more rounding error. Visualize the trade-off.
What you're seeing
FP16: 1 sign bit + 5-bit exponent + 10-bit mantissa. Wide dynamic range, fine precision. 2 bytes per weight.
INT8: integers from -128 to 127 with a scale factor. 1 byte per weight. Under 0.5% quality drop on most LLMs.
INT4: 16 levels. 0.5 bytes per weight. 1-3% quality drop with AWQ/GPTQ. Standard for memory-constrained deployment.
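The integer formats above can be sketched in a few lines of Python. This is a minimal illustration, assuming symmetric round-to-nearest quantization with a single per-tensor scale; it is not AWQ or GPTQ, and the helper names (`quantize_symmetric`, `dequantize`, `storage_bytes`) and the Gaussian test weights are hypothetical choices made for this example.

```python
import random

def quantize_symmetric(weights, bits):
    """Map floats to signed integer codes using one per-tensor scale (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1        # 127 for INT8, 7 for INT4
    qmin = -(2 ** (bits - 1))         # -128 for INT8, -8 for INT4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code, then clamp into the representable range.
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Reconstruct approximate float weights from integer codes."""
    return [c * scale for c in codes]

def storage_bytes(num_weights, bits):
    """Raw weight storage, ignoring the small overhead of the scale factor."""
    return num_weights * bits / 8

# Assumption: weights are roughly Gaussian with a small standard deviation.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(1024)]

codes8, scale8 = quantize_symmetric(weights, 8)
codes4, scale4 = quantize_symmetric(weights, 4)

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {storage_bytes(len(weights), bits):.0f} bytes for {len(weights)} weights")
```

With a single scale, the rounding error per weight is at most half a quantization step, so the step size (and therefore the error) grows as the number of levels shrinks from 256 to 16.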
★ KEY TAKEAWAY
Lower bit width = smaller memory + more rounding error. Quality drop for typical transformer weights: <0.5% at INT8, ~1-3% at INT4.
▶ WHAT TO TRY
- Click Resample for new weights.
- Compare each precision's reconstruction. MSE (mean squared error) rises as bits drop.
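The comparison in the steps above can be approximated offline. The sketch below is an assumption about what such a demo might compute, not the page's actual code: `resample` is a hypothetical stand-in for the Resample button, weights are drawn from a Gaussian, FP16 is simulated by a round trip through Python's half-precision `struct` format, and the integer formats use simple symmetric per-tensor rounding.

```python
import random
import struct

def to_fp16(x):
    """Round-trip a float through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("e", struct.pack("e", x))[0]

def fake_quantize(weights, bits):
    """Quantize to signed integers with one scale, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) * scale for w in weights]

def mse(original, reconstructed):
    """Mean squared error between the original and reconstructed weights."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def resample(n=4096, seed=None):
    """Hypothetical stand-in for the Resample button: draw fresh Gaussian weights."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.02) for _ in range(n)]

weights = resample(seed=0)
errors = {
    "FP16": mse(weights, [to_fp16(w) for w in weights]),
    "INT8": mse(weights, fake_quantize(weights, 8)),
    "INT4": mse(weights, fake_quantize(weights, 4)),
}
for name, err in errors.items():
    print(f"{name}: MSE = {err:.3e}")
```

Calling `resample` with a different seed changes the individual numbers, but under these assumptions the ordering should stay the same: FP16 error is smallest and INT4 error is largest.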