Examples
3-digit multiplication. IPA transliteration. Chain-of-thought benefits. All ~poor below ~50B params, sudden competence at 100B+.
Advertisement
Critique
Schaeffer et al 2023: emergence often artifact of harsh metrics (exact-match). Continuous metrics show gradual improvement.
Advertisement
Design implication
Test with SMALL models first. If task fails at 8B, may work at 70B. But: expensive test. And smooth-metric emergence is real for hard tasks.