Small language models (roughly 1B to 9B parameters) are where much of the production LLM workload looks set to land in 2026: they are cheaper, faster, and increasingly capable enough for domain tasks. Three families to know: Microsoft Phi, Alibaba Qwen, Google Gemma. Each has a flavor.
Phi family — quality per parameter
Microsoft's bet: high-quality synthetic data over raw size. Phi-3-mini (3.8B) competes with 7B-class models on standard benchmarks. Strong on reasoning and code. Weaker on knowledge breadth, since a small model has less capacity to store facts.
Qwen — multilingual + tool use
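Alibaba's models, with strong Chinese and English coverage, native tool-calling, and long context (up to 128K tokens on the larger sizes; the smallest models have shorter windows). Qwen2.5 ranges from 0.5B to 72B, so only the lower end of the lineup counts as small. Often the best non-English baseline.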
Gemma — open and lightweight
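Google's smaller, open-weight cousins to Gemini. Gemma 2 ships in 2B, 9B, and 27B sizes; the 2B and 9B models are the ones aimed at on-device and single-GPU use. The weights are released under Google's Gemma terms of use, which allow commercial use but carry usage restrictions. Strong English performance. Less tool-use polish than Qwen.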
Choosing for your task
Reasoning or code over short, self-contained inputs: Phi. Multilingual or tool-heavy: Qwen. On-device English: Gemma. None of these will beat GPT-4-class models on hard tasks; they win on cost and latency for tractable ones.
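The rule of thumb above can be written down as a tiny routing function. This is only a sketch: the `Task` fields, the `pick_family` name, and the order in which the checks run are illustrative assumptions, not something any of the three vendors prescribe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task description; every field is an illustrative assumption."""
    multilingual: bool = False     # non-English or mixed-language inputs
    needs_tools: bool = False      # relies on function / tool calling
    on_device: bool = False        # must run on a phone or laptop
    hard_open_ended: bool = False  # hard task where a small model is unlikely to suffice


def pick_family(task: Task) -> str:
    """Map a task to a model family using the heuristics from the text.

    The precedence (hard task first, then Qwen, then Gemma, then Phi)
    is an assumption made for this sketch.
    """
    if task.hard_open_ended:
        return "frontier"  # GPT-4-class model; small models lose here
    if task.multilingual or task.needs_tools:
        return "qwen"
    if task.on_device:
        return "gemma"
    return "phi"  # default: reasoning or code over short inputs


if __name__ == "__main__":
    print(pick_family(Task(multilingual=True)))  # qwen
    print(pick_family(Task(on_device=True)))     # gemma
    print(pick_family(Task()))                   # phi
```

In practice a heuristic like this is a starting shortlist; an evaluation on your own data should make the final call.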
Fine-tuning lifts the floor
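A 3B-7B model fine-tuned on domain data can match or beat GPT-4 on that one narrow task, though not on anything outside it. QLoRA, which trains small LoRA adapters on top of a 4-bit quantized base model, fine-tunes these on a single GPU in hours. This is the 2026 'cheap and accurate' pattern.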
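A back-of-the-envelope calculation shows why the single-GPU claim is plausible. The sketch below assumes a hypothetical 7B-parameter model with 32 layers, hidden size 4096, and rank-16 LoRA adapters on four square attention projections per layer; real models differ in shape, and the estimate ignores activations, optimizer state, and quantization overhead.

```python
def weight_gib(n_params: float, bits: int) -> float:
    """Memory needed to store n_params weights at the given bit width, in GiB."""
    return n_params * bits / 8 / 2**30


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters when a d_out x d_in weight update is factored
    as B (d_out x rank) times A (rank x d_in)."""
    return rank * (d_in + d_out)


# Hypothetical 7B-class shape (assumed for illustration only).
N_PARAMS = 7e9
LAYERS = 32
HIDDEN = 4096
RANK = 16
ADAPTED_PER_LAYER = 4  # assumed: four hidden x hidden attention projections

trainable = LAYERS * ADAPTED_PER_LAYER * lora_params(HIDDEN, HIDDEN, RANK)

if __name__ == "__main__":
    print(f"fp16 base weights : {weight_gib(N_PARAMS, 16):.2f} GiB")
    print(f"4-bit base weights: {weight_gib(N_PARAMS, 4):.2f} GiB")
    print(f"trainable LoRA params: {trainable:,} "
          f"({100 * trainable / N_PARAMS:.2f}% of the base model)")
```

Under these assumptions the frozen base shrinks from about 13 GiB in fp16 to a little over 3 GiB in 4-bit, and only about 17 million adapter parameters receive gradients, which is why the whole job fits on one GPU.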