Are Emergent Abilities of Large Language Models a Mirage?
https://arxiv.org/pdf/2304.15004.pdf
Authored by Rylan Schaeffer, Brando Miranda, and Sanmi Koyejo
Computer Science, Stanford University
This work challenges the notion of emergent abilities in large language models, arguing that these abilities are not inherent to model scale but rather an artifact of the metrics researchers choose. Emergent abilities are defined as new capabilities that appear abruptly and unpredictably as a model scales up. The authors propose that, for a fixed set of model outputs on a given task and model family, the appearance of emergence depends on the metric applied: nonlinear or discontinuous metrics tend to produce apparent emergent abilities, whereas linear or continuous metrics reveal smooth, predictable improvements in performance.
To support this hypothesis, the authors present a simple mathematical model and conduct three types of analyses:
- Examining the effect of metric choice on the InstructGPT/GPT-3 family in tasks where emergent abilities were previously claimed.
- Performing a meta-analysis on the BIG-Bench project to test predictions about metric choices in relation to emergent abilities.
- Demonstrating how metric selection can create the illusion of emergent abilities in various vision tasks across different deep networks.
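The core of the authors' mathematical model can be illustrated with a short simulation. The sketch below uses hypothetical numbers (the scaling-law exponent, constants, and sequence length are my own assumptions, not the paper's parameterization): per-token accuracy improves smoothly with parameter count, but an exact-match metric over a sequence of L tokens raises that accuracy to the L-th power, so the measured score stays near zero and then appears to jump abruptly at scale.

```python
def per_token_accuracy(n_params: float) -> float:
    """Smoothly increasing per-token accuracy under an assumed
    power-law improvement with parameter count (hypothetical constants)."""
    return 1.0 - 0.5 * (n_params / 1e7) ** -0.3

def exact_match(n_params: float, seq_len: int = 20) -> float:
    """Nonlinear metric: the answer counts only if all seq_len
    tokens are correct, i.e. per-token accuracy ** seq_len."""
    return per_token_accuracy(n_params) ** seq_len

# The underlying quantity changes smoothly, but the nonlinear metric
# makes the same models look like they suddenly "acquire" the ability.
for n in [1e8, 1e9, 1e10, 1e11, 1e12]:
    p = per_token_accuracy(n)
    print(f"N={n:.0e}  per-token={p:.3f}  exact-match(len=20)={exact_match(n):.3f}")
```

Plotting the two columns against parameter count shows the contrast directly: the per-token curve is a gentle slope, while the exact-match curve is nearly flat before rising steeply, which is exactly the "emergence" shape the paper attributes to metric choice.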
Their findings suggest that what has been perceived as emergent abilities could be an artifact of certain metrics or insufficient statistical analysis, implying that these abilities might not be a fundamental aspect of scaling AI models.
—
Emergent abilities of large language models are created by the researcher’s chosen metrics, not unpredictable changes in model behavior with scale.
The term “emergent abilities of LLMs” was recently and crisply defined as “abilities that are not
present in smaller-scale models but are present in large-scale models; thus they cannot be predicted
by simply extrapolating the performance improvements on smaller-scale models”. Such emergent abilities were first discovered in the GPT-3 family. Subsequent work emphasized the discovery, writing that “[although model] performance is predictable at a general level, performance on a
specific task can sometimes emerge quite unpredictably and abruptly at scale”.
