Loss Landscape Degeneracy and Stagewise Development of Transformers

Authors

Jesse Hoogland
Timaeus
George Wang
Timaeus
Matthew Farrugia-Roberts
Timaeus
Liam Carroll
Timaeus
Susan Wei
University of Melbourne
Daniel Murfet
University of Melbourne

Publication Details

Published:
February 4, 2024
Venue:
TMLR
Award:
🏆 Best Paper at 2024 ICML HiLD Workshop

Abstract

We show that in-context learning emerges in transformers in discrete developmental stages, when they are trained on either language modeling or linear regression tasks. We introduce two methods for detecting the milestones that separate these stages, by probing the geometry of the population loss in both parameter space and function space. We study the stages revealed by these new methods using a range of behavioral and structural metrics to establish their validity.

Research Details

Main contributions:

  • The local learning coefficient (LLC) automatically detects hidden developmental stages. Probing the geometry of the loss landscape via the LLC reveals stagewise development in the formation of in-context learning, much of which is invisible in the loss curve.
  • Essential dynamics discovers emergent behaviors. This paper introduces essential dynamics, a function-space analysis of the training trajectory, and shows that it can be used to discover emergent behaviors. We expect follow-ups to this technique to lead to new kinds of evals that automatically discover emergent capabilities (rather than manually evaluating hand-picked capabilities).
  • Developmental interpretability works. Upon further inspection, we find that these hidden stages can be interpreted both behaviorally and structurally; the developmental approach supports mechanistic as well as behavioral analyses.
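To give a rough sense of the kind of estimator the LLC bullet refers to, the sketch below estimates an LLC at a local minimum w* by sampling nearby parameters with SGLD and comparing the mean sampled loss to the loss at w*. This is a generic illustration on a toy 1-D regression problem, not the paper's implementation; the hyperparameters (`eps`, `gamma`, step counts) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: targets are all zero, so the empirical loss
# L_n(w) = mean((w * x_i)^2) has its minimum at w* = 0.
n = 1000
x = rng.normal(size=n)

def loss(w):
    return np.mean((w * x) ** 2)

def grad(w):
    return np.mean(2 * x**2 * w)

def estimate_llc(w_star, eps=1e-4, gamma=1.0, steps=5000, burn_in=1000):
    """SGLD-based LLC estimate: n * beta * (E[L_n(w)] - L_n(w*))."""
    beta = 1.0 / np.log(n)  # inverse temperature ~ 1 / log n
    w = w_star
    samples = []
    for t in range(steps):
        noise = rng.normal(scale=np.sqrt(eps))
        # Langevin step on the tempered loss, with a localizing term
        # gamma * (w - w_star) keeping samples near the minimum.
        w = w - 0.5 * eps * (beta * n * grad(w) + gamma * (w - w_star)) + noise
        if t >= burn_in:
            samples.append(loss(w))
    return n * beta * (np.mean(samples) - loss(w_star))

llc = estimate_llc(0.0)
print(f"estimated LLC: {llc:.2f}")
```

For this regular one-parameter model the estimate should land near the theoretical value d/2 = 0.5; for singular models such as transformers, the LLC can be much smaller than the parameter count would suggest, which is what makes it a useful stage detector.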

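As a minimal sketch of the computation behind the essential dynamics idea: record the model's outputs on a fixed probe set at each training checkpoint, then run PCA on that function-space trajectory and examine its path through the top principal components. The data below is synthetic (a hypothetical two-stage trajectory), so this only illustrates the shape of the analysis, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for checkpoints: each row is the model's outputs
# on a fixed probe set at one checkpoint (200 checkpoints, 50 outputs),
# following a simple two-stage path plus a little noise.
t = np.linspace(0, 1, 200)
stage1 = np.minimum(t, 0.5)[:, None] * rng.normal(size=(1, 50))
stage2 = np.maximum(t - 0.5, 0.0)[:, None] * rng.normal(size=(1, 50))
outputs = stage1 + stage2 + 0.01 * rng.normal(size=(200, 50))

# PCA of the function-space trajectory: center over time, take the top
# principal components via SVD, and project each checkpoint onto them.
centered = outputs - outputs.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
k = 2
trajectory = centered @ Vt[:k].T          # (200, 2) path in PC space
explained = S[:k] ** 2 / np.sum(S ** 2)   # variance explained per PC

print("variance explained by top 2 PCs:", explained.round(3))
```

Geometric features of the projected path, such as sharp turns between otherwise smooth segments, are the kind of signal one would inspect for candidate stage boundaries.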
See the accompanying tweet thread (distillation coming soon).