Examples

3-digit multiplication. IPA transliteration. Chain-of-thought benefits. All ~poor below ~50B params, sudden competence at 100B+.

Advertisement

Critique

Schaeffer et al 2023: emergence often artifact of harsh metrics (exact-match). Continuous metrics show gradual improvement.

Advertisement

Design implication

Test with SMALL models first. If task fails at 8B, may work at 70B. But: expensive test. And smooth-metric emergence is real for hard tasks.