Update

See our paper, The Developmental Landscape of In-Context Learning, for a developmental analysis of induction head formation.

Background

If grokking is the first example that comes to mind when thinking of phase transitions in neural networks, induction heads are probably the second.
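
For readers new to induction heads, the behavior they implement is usually summarized as the pattern [A][B] ... [A] → [B]: when the current token has appeared earlier in the context, predict the token that followed it last time. The sketch below is an idealized, non-neural illustration of that rule only. It says nothing about how attention heads actually compute it, and the function name `induction_prediction` is hypothetical.

```python
from typing import Optional, Sequence


def induction_prediction(tokens: Sequence[str]) -> Optional[str]:
    """Idealized induction rule [A][B] ... [A] -> [B] (illustrative only).

    Finds the most recent earlier occurrence of the final token and returns
    the token that followed it. Returns None if the final token has not
    appeared before (or the sequence is empty).
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions, most recent first, excluding the final position.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Copy the token that came right after the earlier match.
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    print(induction_prediction("the cat sat on the".split()))
```

Running this prints `cat`: the final "the" matches the first "the", so the rule copies the token that followed it.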

What does the "induction bump" look like from the perspective of the learning coefficient? Can we detect the formation of induction heads using this quantity? When comparing models of different sizes, does the learning coefficient distinguish single-layer transformers, which cannot form induction heads, from multi-layer transformers, which can?

Where to Begin