devinterp Project

Induction Heads

Detect the formation of induction heads in their original setting.

Project Details

Status: Completed
Difficulty: Medium
Type: Applied

Team & Contact

Lead: George Wang
Discord: @_protocol

Tags

devinterp

Update

See our paper, The Developmental Landscape of In-context Learning, for a developmental analysis of induction head formation.


Background

If grokking is the first example that comes to mind when thinking of phase transitions in neural networks, then induction heads are the second example.

What does the “induction bump” in the loss curve look like from the perspective of the learning coefficient? Can we detect the formation of induction heads using this quantity? When comparing models of different sizes, does the learning coefficient distinguish single-layer transformers, which cannot form induction heads, from multi-layer transformers, which can?
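The learning coefficient is not read off a loss curve directly; in practice it is estimated from samples. As a rough illustration, the sketch below implements an SGLD-based estimator of the local learning coefficient, λ̂ = nβ(E_w[L_n(w)] - L_n(w*)) with β = 1/log n, where the expectation is over a posterior tempered by β and localized around w* by a Gaussian term of strength gamma. Everything here is an assumption for illustration: the model is a 3-parameter toy linear regression standing in for a transformer, and the helper names (make_data, estimate_llc) and hyperparameters (step_size, batch_size, gamma, num_steps) are hypothetical choices, not the project's code or the devinterp package's API. Because the toy model is regular, its true learning coefficient is d/2 = 1.5, so the estimate should land in that vicinity.

```python
import numpy as np


def make_data(n, d, rng):
    """Toy linear-regression data: y = x @ w_true + unit-variance Gaussian noise."""
    x = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = x @ w_true + rng.normal(size=n)
    return x, y


def loss(w, x, y):
    """Average negative log-likelihood for unit-variance Gaussian noise, up to a constant."""
    residual = x @ w - y
    return 0.5 * np.mean(residual ** 2)


def grad(w, x, y):
    """Gradient of `loss` with respect to w on the given (mini)batch."""
    return x.T @ (x @ w - y) / len(y)


def estimate_llc(x, y, w_star, rng, num_chains=4, num_steps=3000,
                 burn_in=500, step_size=1e-4, batch_size=1000, gamma=1.0):
    """Hypothetical helper: lambda_hat = n * beta * (E[L_n(w)] - L_n(w_star)).

    Draws come from SGLD targeting the localized, tempered posterior
    exp(-n * beta * L_n(w) - gamma / 2 * |w - w_star|^2) with beta = 1 / log(n).
    """
    n, d = x.shape
    beta = 1.0 / np.log(n)
    base_loss = loss(w_star, x, y)
    sampled_losses = []
    for _ in range(num_chains):
        w = w_star.copy()  # each chain starts at the trained parameter
        for step in range(num_steps):
            idx = rng.integers(0, n, size=batch_size)  # minibatch, sampled with replacement
            drift = -n * beta * grad(w, x[idx], y[idx]) - gamma * (w - w_star)
            w = w + 0.5 * step_size * drift + np.sqrt(step_size) * rng.normal(size=d)
            if step >= burn_in:
                sampled_losses.append(loss(w, x, y))  # full-data L_n at each draw
    return n * beta * (np.mean(sampled_losses) - base_loss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = make_data(n=10_000, d=3, rng=rng)
    # For this toy model, w_star is the least-squares fit, the exact minimizer of L_n.
    w_star = np.linalg.lstsq(x, y, rcond=None)[0]
    print(f"LLC estimate: {estimate_llc(x, y, w_star, rng):.2f}")
```

The gamma term keeps chains near w* so the estimate stays local, and the small step size keeps the discretization and minibatch-noise bias modest (both push the estimate slightly above 1.5). Applying this to the questions above would presumably mean computing λ̂ at a series of transformer training checkpoints, with w* set to each checkpoint's weights, and comparing the resulting curve with the timing of the induction bump; that workflow and suitable hyperparameters for it are assumptions, not something this listing specifies.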

Where to Begin

Before starting this project, we recommend familiarizing yourself with these resources:

Ready to contribute? Let us know in our Discord community. We'll update this listing so that other people interested in this project can find you.