Algorithmic Tasks
Can we detect phase transitions in settings like modular arithmetic, multitask sparse parity, and computing the greatest common divisor using the learning coefficient?
One class of settings we’re particularly interested in exploring is algorithmic tasks. Examples include modular arithmetic (the typical “grokking” setting), multitask sparse parity, finding the greatest common divisor, and various teacher-student setups.
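To make these settings concrete, here is a minimal sketch of how training data for three of them might be generated. The function names are hypothetical, and the multitask sparse parity construction reflects our understanding of the task (one-hot control bits select which sparse parity subtask to solve). Tasks are sampled uniformly here for simplicity; the original setup may weight them differently.

```python
import math
import random


def modular_addition_dataset(p: int) -> list[tuple[int, int, int]]:
    """All pairs (a, b) with label (a + b) mod p, the usual grokking task."""
    return [(a, b, (a + b) % p) for a in range(p) for b in range(p)]


def gcd_dataset(max_value: int) -> list[tuple[int, int, int]]:
    """All pairs 1 <= a, b <= max_value labelled with gcd(a, b)."""
    return [
        (a, b, math.gcd(a, b))
        for a in range(1, max_value + 1)
        for b in range(1, max_value + 1)
    ]


def multitask_sparse_parity_dataset(
    n_tasks: int, n_bits: int, k: int, n_samples: int, seed: int = 0
) -> list[tuple[list[int], int]]:
    """Inputs are n_tasks one-hot control bits followed by n_bits task bits.

    Each task owns a fixed random subset of k task bits (an assumption of
    this sketch); the label is the parity of that subset for whichever
    task the control bits select. Tasks are drawn uniformly here.
    """
    rng = random.Random(seed)
    subsets = [rng.sample(range(n_bits), k) for _ in range(n_tasks)]
    data = []
    for _ in range(n_samples):
        task = rng.randrange(n_tasks)
        control = [int(t == task) for t in range(n_tasks)]
        bits = [rng.randrange(2) for _ in range(n_bits)]
        label = sum(bits[i] for i in subsets[task]) % 2
        data.append((control + bits, label))
    return data
```

For example, `modular_addition_dataset(5)` contains the triple `(3, 4, 2)`, and `gcd_dataset(12)` contains `(12, 8, 4)`.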
In many of these models, researchers have observed behaviors that qualitatively look like phase transitions. Can we make this more precise? Are they phase transitions in the Bayesian sense? Using techniques like learning coefficient estimation, can we detect new signals of phase transitions?
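Learning coefficient estimation can be illustrated with a minimal sketch in the spirit of the SGLD-based local learning coefficient estimator of Lau et al. (2023). It samples from a tempered, localized posterior proportional to exp(-nβL(w) - γ/2 ||w - w*||²) with β = 1/log n and returns nβ(E[L(w)] - L(w*)). Several simplifications are assumptions of this sketch: full-batch Langevin updates instead of minibatch SGLD, a toy quadratic loss instead of a trained network, and hand-picked values for `gamma`, `step`, `num_steps`, and `burn_in`. The function `estimate_llc` is hypothetical and not any library's API.

```python
import math
import random


def estimate_llc(loss, grad, w_star, n, *, gamma=1.0, step=1e-4,
                 num_steps=20_000, burn_in=1_000, seed=0):
    """Sketch of a Langevin-based local learning coefficient estimate.

    Samples from exp(-n*beta*L(w) - gamma/2 * ||w - w_star||^2) with
    beta = 1 / log(n) and returns n * beta * (mean sampled L - L(w_star)).
    """
    rng = random.Random(seed)
    beta = 1.0 / math.log(n)
    w = list(w_star)  # chain starts at the point being analysed
    losses = []
    for t in range(num_steps):
        g = grad(w)
        # Langevin step: drift toward low loss and toward w_star, plus noise.
        w = [
            wi - 0.5 * step * (n * beta * gi + gamma * (wi - wsi))
            + rng.gauss(0.0, math.sqrt(step))
            for wi, gi, wsi in zip(w, g, w_star)
        ]
        if t >= burn_in:
            losses.append(loss(w))
    return n * beta * (sum(losses) / len(losses) - loss(w_star))


# Toy check: a regular quadratic loss in d = 2 dimensions has learning
# coefficient d / 2 = 1 at its minimum.
def quadratic_loss(w):
    return 0.5 * sum(wi * wi for wi in w)


def quadratic_grad(w):
    return list(w)


llc = estimate_llc(quadratic_loss, quadratic_grad, [0.0, 0.0], n=10_000)
print(f"estimated LLC: {llc:.2f}")
```

On this regular toy loss the estimate should land close to 1, with a small upward bias from the finite step size. In the algorithmic settings above, one could compute such an estimate at successive training checkpoints and look for abrupt changes, although whether those changes correspond to Bayesian phase transitions is exactly the open question.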
These kinds of settings offer a partial ground truth (we know how to implement the given algorithms using conventional models of computation), which makes them useful test beds for the experimental techniques of developmental interpretability.
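As an illustration of what such a ground truth can look like for modular addition, the sketch below computes (a + b) mod p using only sines and cosines of the inputs, in the style of the Fourier-based algorithm that mechanistic interpretability work has reported in grokked modular-addition transformers. The frequency set `freqs` is a hypothetical choice; a trained network selects its own. The argmax is unique because, for prime p and frequencies in 1..p-1, every cosine term equals 1 only when c ≡ a + b (mod p).

```python
import math


def fourier_modular_addition(a: int, b: int, p: int, freqs=(1, 5, 7)) -> int:
    """Predict (a + b) mod p from trigonometric features of a, b and c.

    For each frequency k (w = 2*pi*k/p) the logit for candidate c is
    cos(w*(a + b - c)), assembled from single-input terms via the
    angle-addition identities. Assumes p is prime and 0 < k < p.
    """
    logits = []
    for c in range(p):
        total = 0.0
        for k in freqs:
            w = 2 * math.pi * k / p
            # cos(w(a+b)) and sin(w(a+b)) from terms in a and b separately.
            cos_ab = math.cos(w * a) * math.cos(w * b) - math.sin(w * a) * math.sin(w * b)
            sin_ab = math.sin(w * a) * math.cos(w * b) + math.cos(w * a) * math.sin(w * b)
            # cos(w(a+b-c)) = cos(w(a+b))cos(wc) + sin(w(a+b))sin(wc)
            total += cos_ab * math.cos(w * c) + sin_ab * math.sin(w * c)
        logits.append(total)
    # The correct answer is the unique candidate where every term equals 1.
    return max(range(p), key=logits.__getitem__)


print(fourier_modular_addition(7, 9, 13))  # → 3
```

Comparing a trained model's internals against a known algorithm like this is what makes these tasks attractive as test beds.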
One of the best places to look for inspiration on this front is Callum McDougall’s monthly mechanistic interpretability challenges. These are excellent toy problems with short feedback loops that will quickly get you up to speed with the techniques of developmental interpretability. This work is also highly parallelizable: pick any one of these models, and chances are nobody has looked at its development yet.
Where to begin:
- Quantifying degeneracy (Lau et al. 2023)
- Modular arithmetic
- Multitask sparse parity
- Greatest common divisor
- Grokking notebook (start here)
- Monthly algorithmic problems in Mech Interp
If you have decided to start working on this, please let us know in the Discord. We'll update this listing so that other people who are interested in this project can find you.