devinterp Project

Algorithmic Tasks

Can we detect phase transitions in settings like modular arithmetic, multitask sparse parity, and greatest common divisor using the learning coefficient?

Project Details

Status: Unstarted

Difficulty: Easy

Type: Applied

Tags

devinterp

One class of settings we’re particularly interested in exploring is algorithmic tasks. These include, for example, modular arithmetic (the typical “grokking” setting), multitask sparse parity, finding the greatest common divisor, and various teacher-student setups.
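To give a sense of how cheap these settings are to set up, here is a minimal sketch of data generators for three of the tasks above. The defaults (p = 113, the number of tasks, bits per input, subset size k, and the power-law exponent alpha) are illustrative assumptions rather than values prescribed by this project. The multitask sparse parity layout (one-hot control bits that select a task, followed by task bits, with tasks drawn from a power law) reflects our reading of the usual setup and should be checked against the original.

```python
import math

import numpy as np


def modular_addition_dataset(p=113, train_frac=0.3, seed=0):
    """All pairs (a, b) with label (a + b) mod p, randomly split into train/test."""
    a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    inputs = np.stack([a.ravel(), b.ravel()], axis=1)
    labels = (inputs[:, 0] + inputs[:, 1]) % p
    perm = np.random.default_rng(seed).permutation(len(labels))
    n_train = int(train_frac * len(labels))
    train, test = perm[:n_train], perm[n_train:]
    return (inputs[train], labels[train]), (inputs[test], labels[test])


def multitask_sparse_parity(num_samples, n_tasks=10, n_bits=20, k=3, alpha=0.4, seed=0):
    """Inputs are [one-hot task id | random bits]; label is the parity of the task's k bits."""
    rng = np.random.default_rng(seed)
    # Each task owns a fixed random subset of k bit positions (hypothetical defaults).
    subsets = np.stack([rng.choice(n_bits, size=k, replace=False) for _ in range(n_tasks)])
    # Assumed power-law task frequencies: p(task i) proportional to i^-(alpha + 1).
    probs = np.arange(1, n_tasks + 1, dtype=float) ** -(alpha + 1)
    probs /= probs.sum()
    tasks = rng.choice(n_tasks, size=num_samples, p=probs)
    bits = rng.integers(0, 2, size=(num_samples, n_bits))
    control = np.eye(n_tasks, dtype=int)[tasks]
    x = np.concatenate([control, bits], axis=1)
    # Gather each sample's k relevant bits and take their sum mod 2.
    y = bits[np.arange(num_samples)[:, None], subsets[tasks]].sum(axis=1) % 2
    return x, y, subsets


def gcd_dataset(num_samples, max_int=10**6, seed=0):
    """Uniformly random integer pairs in [1, max_int] labelled with their GCD."""
    rng = np.random.default_rng(seed)
    pairs = rng.integers(1, max_int + 1, size=(num_samples, 2))
    labels = np.array([math.gcd(int(a), int(b)) for a, b in pairs])
    return pairs, labels
```

Because every label is computed by a conventional algorithm, any model trained on this data can be compared against a known reference implementation.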

In many of these models, researchers have observed behaviors that qualitatively look like phase transitions. Can we make this more precise? Are they phase transitions in the Bayesian sense? Using techniques like learning coefficient estimation, can we detect novel signals of phase transitions?
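For readers new to learning coefficient estimation, here is a minimal sketch of the SGLD-style local learning coefficient (LLC) estimator described in Lau et al. (2023): sample from a tempered posterior localized around the trained parameters w*, then compute lambda_hat = n * beta * (E[L_n(w)] - L_n(w*)) with beta = 1 / log n. Our simplifying assumptions are full-batch gradients instead of minibatches, a toy linear-regression model, and hand-picked hyperparameters (eps, gamma, chain and step counts). Because this toy model is regular, its true learning coefficient is d / 2, which makes a convenient sanity check. For real experiments, a maintained implementation such as the devinterp Python package is a better starting point than this sketch.

```python
import numpy as np


def mse_loss(w, X, y):
    """Empirical loss L_n(w) = (1 / 2n) * sum_i (y_i - x_i . w)^2."""
    residual = y - X @ w
    return 0.5 * np.mean(residual ** 2)


def mse_grad(w, X, y):
    """Gradient of mse_loss with respect to w."""
    return -(X.T @ (y - X @ w)) / len(y)


def estimate_llc(w_star, X, y, eps=3e-4, gamma=1.0, num_chains=4,
                 num_steps=4000, burn_in=1000, seed=0):
    """Sketch of an SGLD-style LLC estimate around w_star (full-batch gradients)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    beta = 1.0 / np.log(n)              # inverse temperature beta* = 1 / log n
    loss_star = mse_loss(w_star, X, y)
    chain_means = []
    for _ in range(num_chains):
        w = w_star.copy()               # each chain starts at the trained parameters
        losses = []
        for step in range(num_steps):
            # Drift: tempered-likelihood term plus a localizing pull back toward w_star.
            drift = -n * beta * mse_grad(w, X, y) - gamma * (w - w_star)
            noise = np.sqrt(eps) * rng.standard_normal(w.shape)
            w = w + 0.5 * eps * drift + noise
            if step >= burn_in:
                losses.append(mse_loss(w, X, y))
        chain_means.append(np.mean(losses))
    # LLC estimate: n * beta * (E[L_n(w)] - L_n(w_star)).
    return n * beta * (np.mean(chain_means) - loss_star)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 2000, 10
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
    w_star, *_ = np.linalg.lstsq(X, y, rcond=None)  # exact minimizer of L_n
    # For this regular model, theory says the estimate should land near d / 2 = 5.
    print(estimate_llc(w_star, X, y))
```

One way to look for phase transitions would be to run an estimator like this on each saved training checkpoint and watch for abrupt changes in the resulting LLC curve; treat that as a suggested starting point rather than an established protocol.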

These kinds of settings offer a partial ground truth (we know how to implement the given algorithms using conventional models of computation), which makes them interesting test beds for the experimental techniques of developmental interpretability.

One of the best places to look for inspiration is Callum McDougall’s monthly mechanistic interpretability challenges. These are excellent toy problems with fast feedback loops that will quickly get you up to speed with the techniques of developmental interpretability. This work is also extremely parallelizable: pick any one of these models, and chances are nobody has looked at its development yet.

Where to Begin

Before starting this project, we recommend familiarizing yourself with these resources:

Quantifying degeneracy (Lau et al. 2023)

Modular arithmetic

Multitask sparse parity

Greatest common divisor

Grokking notebook (start here)

Monthly algorithmic problems in Mech Interp

Ready to contribute? Let us know in our Discord community. We'll update this listing so that other people interested in this project can find you.