GGUF layout: header (magic, version, counts, metadata key-value pairs) + tensor index + aligned tensor data. Self-describing, mmap-friendly.
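A minimal sketch of reading the start of that layout, assuming a little-endian GGUF v3 file: 4-byte magic `GGUF`, a uint32 version, uint64 tensor and metadata counts, then key-value pairs whose keys and string values are a uint64 length followed by UTF-8 bytes. `build_header` and `parse_header` are hypothetical helpers written for illustration, and they handle only the string and uint32 value types; a full reader must also cover the other scalar types and arrays.

```python
import struct

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_UINT32 = 4   # value-type ids from the GGUF metadata type enum
GGUF_TYPE_STRING = 8


def gguf_string(s: str) -> bytes:
    """GGUF string: uint64 byte length, then UTF-8 bytes (no terminator)."""
    data = s.encode("utf-8")
    return struct.pack("<Q", len(data)) + data


def build_header(kvs: dict, tensor_count: int = 0, version: int = 3) -> bytes:
    """Hypothetical helper: emit magic, version, counts and str/uint32 metadata."""
    out = bytearray(GGUF_MAGIC)
    out += struct.pack("<IQQ", version, tensor_count, len(kvs))
    for key, value in kvs.items():
        out += gguf_string(key)
        if isinstance(value, str):
            out += struct.pack("<I", GGUF_TYPE_STRING) + gguf_string(value)
        else:
            out += struct.pack("<II", GGUF_TYPE_UINT32, value)
    return bytes(out)


def parse_header(buf: bytes):
    """Hypothetical helper: decode the fixed fields and the metadata pairs.

    Returns (info, offset); offset is where the tensor index would begin.
    """
    if buf[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", buf, 4)
    pos = 4 + struct.calcsize("<IQQ")  # 4 + 4 + 8 + 8 = 24 bytes, no padding with "<"

    def read_string(at: int):
        (n,) = struct.unpack_from("<Q", buf, at)
        start = at + 8
        return buf[start:start + n].decode("utf-8"), start + n

    metadata = {}
    for _ in range(kv_count):
        key, pos = read_string(pos)
        (vtype,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        if vtype == GGUF_TYPE_STRING:
            value, pos = read_string(pos)
        elif vtype == GGUF_TYPE_UINT32:
            (value,) = struct.unpack_from("<I", buf, pos)
            pos += 4
        else:
            raise NotImplementedError(f"metadata type {vtype} not handled in this sketch")
        metadata[key] = value
    info = {"version": version, "tensor_count": tensor_count, "metadata": metadata}
    return info, pos


hdr = build_header({"general.architecture": "llama", "general.alignment": 32})
info, tensor_index_offset = parse_header(hdr)
print(info)
```

Because every key carries its own type tag, a reader can walk the metadata without knowing the model architecture in advance, which is what "self-describing" means here.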
What you're seeing
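GGUF (GGML Unified Format) is the model file format used by llama.cpp, Ollama, LM Studio and other GGML-based tools. A single file holds the model metadata, the tokenizer (stored as metadata entries), and the tensors. Because tensor data sits at aligned offsets, the file can be memory-mapped at load time: weights are paged in on demand instead of being copied up front, which gives a fast cold start.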
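To make the memory-mapping point concrete, here is a small sketch using Python's standard `mmap` module. It assumes a little-endian host, a tensor stored as plain float32, and the GGUF default alignment of 32 bytes; the toy file's 24-byte "header" is a zero-filled placeholder, not a valid GGUF header, and `align_up` / `read_f32_tensor` are hypothetical helper names.

```python
import mmap
import os
import struct
import tempfile

DEFAULT_ALIGNMENT = 32  # GGUF default when general.alignment is not set


def align_up(offset: int, alignment: int = DEFAULT_ALIGNMENT) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def read_f32_tensor(path: str, offset: int, count: int) -> list:
    """Map the file read-only and decode `count` float32 values at `offset`."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        raw = memoryview(mm)[offset:offset + 4 * count]  # a window onto the mapping, no copy
        floats = raw.cast("f")                           # reinterpret as native float32
        values = floats.tolist()                         # pages are faulted in on first access
        floats.release()                                 # views must be released before mm closes
        raw.release()
    return values


# Toy file (hypothetical layout): 24 placeholder header bytes, padding, 4 float32 weights.
header = b"GGUF" + bytes(20)
data_offset = align_up(len(header))
weights = struct.pack("<4f", 0.5, -1.0, 2.0, 0.25)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.gguf")
    with open(path, "wb") as f:
        f.write(header + bytes(data_offset - len(header)) + weights)
    values = read_f32_tensor(path, data_offset, 4)

print(data_offset, values)
```

Mapping the file costs almost nothing up front; the operating system reads a page only when a view over it is touched, and pages can be shared between processes mapping the same file.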
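Quant suffixes: Q4_K_M = 4-bit k-quant ("K" is llama.cpp's k-quant scheme, not k-means: weights are grouped into 256-weight super-blocks with per-block scales) in the "M" (medium) mix, where a few sensitive tensors are kept at a higher-precision type such as Q6_K. Q8_0 = 8-bit, blocks of 32 weights sharing one fp16 scale. F16 = half precision (16 bits per weight). Fewer bits per weight means a smaller file and less RAM, at some cost in output quality.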
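A sketch of the Q8_0 idea for a single block, assuming the scheme as described above: the scale is the block's largest magnitude divided by 127, each weight is rounded to an int8 code, and the scale is stored as fp16. This is an illustrative reimplementation in pure Python, not llama.cpp's code; the function names are hypothetical.

```python
import math
import struct

QK8_0 = 32  # weights per Q8_0 block


def round_half_away(v: float) -> int:
    """C roundf semantics (Python's round() is half-to-even)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize_q8_0_block(xs):
    """Quantize one block of 32 floats to (fp16 scale, 32 int8 codes)."""
    assert len(xs) == QK8_0
    amax = max(abs(x) for x in xs)
    d = amax / 127.0                     # largest magnitude maps to +/-127
    inv_d = 1.0 / d if d else 0.0
    qs = [round_half_away(x * inv_d) for x in xs]
    d_fp16 = struct.unpack("<e", struct.pack("<e", d))[0]  # scale is stored as fp16
    return d_fp16, qs


def dequantize_q8_0_block(d, qs):
    return [d * q for q in qs]


BLOCK_BYTES = struct.calcsize("<e") + QK8_0 * struct.calcsize("<b")  # 2 + 32 = 34
BITS_PER_WEIGHT = BLOCK_BYTES * 8 / QK8_0                             # 8.5

block = [math.sin(i) for i in range(QK8_0)]
d, qs = quantize_q8_0_block(block)
recon = dequantize_q8_0_block(d, qs)
print(BLOCK_BYTES, BITS_PER_WEIGHT, max(abs(a - b) for a, b in zip(block, recon)))
```

The per-block scale is why "8-bit" formats cost slightly more than 8 bits per weight: every 32 weights carry 2 extra bytes. The reconstruction error stays within about half a quantization step per weight.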
★ KEY TAKEAWAY
GGUF: a single, self-describing file holding header, tokenizer, and tensors. Used across the llama.cpp ecosystem.
▶ WHAT TO TRY
- Switch quant variants (Q4_K_M, Q5_K_M, etc.) and model sizes.
- See how tensor data dominates total file size.
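The size comparison can be worked out by hand from each type's block layout. The sketch below assumes the block sizes of ggml's quantization structs as I understand them (for example Q4_K: two fp16 values, 12 bytes of packed scales and 128 bytes of 4-bit codes per 256 weights); the 7-billion-parameter count, the 4 MiB header size, and the Q4_K/Q6_K split in the mixed example are all hypothetical numbers, not the real Q4_K_M recipe, and alignment padding is ignored.

```python
# (weights per block, bytes per block) for a few ggml tensor types
BLOCK_LAYOUT = {
    "F16": (1, 2),
    "Q8_0": (32, 34),    # fp16 scale + 32 int8 codes
    "Q4_0": (32, 18),    # fp16 scale + 32 packed 4-bit codes
    "Q4_K": (256, 144),  # 2 fp16 + 12 bytes of scales/mins + 128 bytes of 4-bit codes
    "Q5_K": (256, 176),
    "Q6_K": (256, 210),
}


def bits_per_weight(qtype: str) -> float:
    n, nbytes = BLOCK_LAYOUT[qtype]
    return nbytes * 8 / n


def tensor_bytes(n_weights: int, qtype: str) -> int:
    n, nbytes = BLOCK_LAYOUT[qtype]
    if n_weights % n:
        raise ValueError(f"{qtype} needs a multiple of {n} weights")
    return n_weights // n * nbytes


def file_bytes(parts: dict, header_bytes: int) -> int:
    """parts maps qtype -> number of weights stored in that type."""
    return header_bytes + sum(tensor_bytes(n, t) for t, n in parts.items())


N = 7_000_000_000          # hypothetical parameter count
HEADER = 4 * 1024 * 1024   # hypothetical header + metadata + tensor index size

for qtype in ("F16", "Q8_0", "Q4_K"):
    total = file_bytes({qtype: N}, HEADER)
    share = tensor_bytes(N, qtype) / total
    print(f"{qtype:5} {bits_per_weight(qtype):7.4f} bpw {total / 1e9:6.2f} GB tensor share {share:.4f}")

# A hypothetical mixed file: most weights Q4_K, a slice kept at Q6_K.
mixed = file_bytes({"Q4_K": 6_400_000_000, "Q6_K": 600_000_000}, HEADER)
print(f"mixed {mixed / 1e9:.2f} GB")
```

Even with a generous allowance for metadata and tokenizer, the tensor data accounts for well over 99% of the file at every quant level, so choosing the quant type is effectively choosing the file size.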