Project Ideas

Various project ideas to inspire you if you're interested in getting involved in DevInterp and SLT research.

A good place to start is with the open problems lecture from the 2023 DevInterp conference. This comprehensive overview covers key research directions and opportunities in developmental interpretability.

DevInterp Conference 2023 - Open Problems

Starter Notebooks

Before diving into a new project, we recommend building familiarity by going through some of the starter notebooks in the devinterp repo. These notebooks can also serve as a starting point for further investigation.

Active Projects

We encourage replication but discourage scooping each other: there are enough interesting problems that we shouldn't be unnecessarily duplicating effort, which slows progress in AI safety and is bad for the community. That said, if you're particularly interested in one of the active projects, please reach out to see if there's an opportunity to get involved and collaborate.

Natural Gradient Descent

Lead: Moosa & Zach
Discord: cxtraa
Type: Applied
Difficulty: Easy
Status: In-progress

SLT suggests natural gradient descent should generalize worse. Can we formalize this with the RLCT?
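
For reference, "natural gradient descent" here means preconditioning the gradient with the (inverse) Fisher information. Below is a minimal sketch of a single NGD step on a toy linear model using the empirical Fisher; the model, data, learning rate, and damping value are all illustrative and not part of the project itself.

```python
import torch

# Minimal sketch (not the project code): one natural-gradient step on a tiny
# linear model, using the empirical Fisher as the preconditioner.
torch.manual_seed(0)
X = torch.randn(64, 3)
y = X @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(64)

theta = torch.zeros(3, requires_grad=True)
lr, damping = 0.1, 1e-3

def loss_fn(theta, x, y):
    return 0.5 * (x @ theta - y) ** 2  # per-sample squared error

# Empirical Fisher: average of per-sample gradient outer products.
grads = []
for i in range(len(X)):
    g, = torch.autograd.grad(loss_fn(theta, X[i], y[i]), theta)
    grads.append(g)
G = torch.stack(grads)                       # (n, d) per-sample gradients
fisher = G.T @ G / len(X)                    # (d, d) empirical Fisher
mean_grad = G.mean(dim=0)

# Natural-gradient step: precondition the mean gradient by the inverse Fisher.
ng_step = torch.linalg.solve(fisher + damping * torch.eye(3), mean_grad)
with torch.no_grad():
    theta -= lr * ng_step
print(theta)
```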

DevInterp-Flavored Projects

The number one thing we encourage for people who want to get involved in DevInterp, especially on the empirical side, is to just go out and study the development of models that haven't been studied yet.

The easiest place to start is with transformers trained on algorithmic tasks. Just choose one (or come up with your own) and start applying tools from devinterp, such as local learning coefficient estimation, essential dynamics (coming soon), and Oku-Aihara covariance analysis (also coming soon), alongside more "traditional" tools from mechinterp, such as progress measures.
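
To give a concrete sense of the main tool, here is a hand-rolled sketch of SGLD-based local learning coefficient (LLC) estimation on a toy model. The devinterp package provides a polished, tested version of this estimator, so treat this purely as an illustration of the idea; the architecture, hyperparameters, and sample counts below are arbitrary.

```python
import math
import torch
import torch.nn as nn

# Hand-rolled sketch of SGLD-based LLC estimation on a toy regression model.
# The devinterp package implements this properly; hyperparameters here are
# purely illustrative.
torch.manual_seed(0)
n = 1024
X = torch.randn(n, 2)
y = torch.sin(X[:, :1]) + 0.1 * torch.randn(n, 1)

model = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()

# Train the model to (approximately) a local minimum w*.
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

w_star = [p.detach().clone() for p in model.parameters()]
init_loss = loss_fn(model(X), y).item()

# SGLD sampling localized around w*; lambda_hat = n * beta * (E[L] - L(w*)).
eps, gamma, beta = 1e-4, 100.0, 1.0 / math.log(n)
draws = []
for step in range(2000):
    idx = torch.randint(0, n, (256,))
    loss = loss_fn(model(X[idx]), y[idx])
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p, p0 in zip(model.parameters(), w_star):
            drift = -(eps / 2) * (n * beta * p.grad + gamma * (p - p0))
            p.add_(drift + math.sqrt(eps) * torch.randn_like(p))
    if step >= 500:  # discard burn-in
        with torch.no_grad():
            draws.append(loss_fn(model(X), y).item())

llc_hat = n * beta * (sum(draws) / len(draws) - init_loss)
print(f"estimated LLC: {llc_hat:.2f}")
```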

Type: Applied
Difficulty: Easy
Status: Unstarted

Can we detect phase transitions in settings like modular arithmetic, multitask sparse parity, and greatest common divisor using the learning coefficient?
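
As an illustration of one of these settings, the modular-addition dataset used in grokking experiments can be built in a few lines; the phase-transition question then amounts to estimating the learning coefficient at a series of training checkpoints and looking for sharp changes. The prime and train fraction below are illustrative.

```python
import torch

# Illustrative modular-addition dataset (a + b mod p), the classic grokking
# setup. Each example is the pair (a, b); the label is (a + b) % p.
p = 113
a, b = torch.meshgrid(torch.arange(p), torch.arange(p), indexing="ij")
inputs = torch.stack([a.flatten(), b.flatten()], dim=1)   # (p*p, 2)
labels = (inputs[:, 0] + inputs[:, 1]) % p                # (p*p,)

# Random train/test split; grokking-style runs often train on ~30% of pairs.
perm = torch.randperm(p * p)
split = int(0.3 * p * p)
train_idx, test_idx = perm[:split], perm[split:]
train_x, train_y = inputs[train_idx], labels[train_idx]
test_x, test_y = inputs[test_idx], labels[test_idx]

# A phase-transition study would then train a small transformer on
# (train_x, train_y), save checkpoints, and estimate the LLC at each
# checkpoint, looking for jumps alongside the train/test loss curves.
```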

Type: Applied
Difficulty: Medium
Status: Unstarted

LayerNorm can have a large impact on learning dynamics. Can we characterize this in a simple toy model?
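
One simple way to set this up (a sketch only; the task, widths, and optimizer are arbitrary) is to train two otherwise identical toy models, with and without LayerNorm, on the same data and compare their loss curves, and later their LLC trajectories.

```python
import torch
import torch.nn as nn

# Sketch: two otherwise-identical toy MLPs, with and without LayerNorm,
# trained on the same synthetic task so their dynamics can be compared.
def make_model(use_layernorm: bool) -> nn.Sequential:
    layers = [nn.Linear(8, 64)]
    if use_layernorm:
        layers.append(nn.LayerNorm(64))
    layers += [nn.ReLU(), nn.Linear(64, 1)]
    return nn.Sequential(*layers)

torch.manual_seed(0)
X = torch.randn(2048, 8)
y = (X[:, :4].sum(dim=1, keepdim=True) > 0).float()  # arbitrary toy target

for use_ln in (False, True):
    model = make_model(use_ln)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.BCEWithLogitsLoss()
    for step in range(1000):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
        if step % 200 == 0:
            print(f"layernorm={use_ln} step={step} loss={loss.item():.4f}")
```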

SLT-Flavored Projects

If you're interested in something slightly more theoretical, there are many interesting questions in the context of SLT.

Type: Applied
Difficulty: Medium
Status: Unstarted

Comparing LLC estimation in weight space to different forms of ablation in activation space.
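
On the activation side, a basic intervention to compare against is mean ablation of a layer's output, which can be implemented with a PyTorch forward hook. The model and choice of layer below are placeholders.

```python
import torch
import torch.nn as nn

# Sketch: mean-ablating one layer's activations with a forward hook, as an
# example of the kind of activation-level intervention to compare against
# weight-space LLC estimates. The model and layer choice are placeholders.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
X = torch.randn(256, 10)

# Precompute the mean activation of the hidden layer over a reference batch.
acts = {}
def save_hook(module, inputs, output):
    acts["hidden"] = output.detach()

handle = model[1].register_forward_hook(save_hook)
model(X)
handle.remove()
mean_act = acts["hidden"].mean(dim=0)

# Replace the layer's output with its mean on subsequent forward passes.
def ablate_hook(module, inputs, output):
    return mean_act.expand_as(output)

handle = model[1].register_forward_hook(ablate_hook)
ablated_logits = model(X)
handle.remove()
```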

Type: Applied
Difficulty: Medium
Status: Unstarted

Investigating how unlearning procedures like LEACE affect the Local Learning Coefficient.
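
LEACE itself is available as the concept-erasure package; the sketch below instead shows a much cruder stand-in, projecting a least-squares concept direction out of a batch of activations, just to illustrate the kind of edit whose effect on the LLC one would measure. The synthetic activations and binary concept are made up for the example.

```python
import torch

# Simplified sketch of linear concept erasure (a crude stand-in for LEACE):
# fit the least-squares direction predicting a binary concept from activations,
# then project that direction out of every activation vector.
torch.manual_seed(0)
n, d = 1000, 32
concept = torch.randint(0, 2, (n,)).float()                    # binary concept
direction_true = torch.randn(d)
acts = torch.randn(n, d) + concept[:, None] * direction_true   # fake activations

# Least-squares fit of the concept from activations gives the direction to erase.
w = torch.linalg.lstsq(acts, concept[:, None]).solution.squeeze()
w = w / w.norm()

# Project the learned direction out of the activations.
erased = acts - (acts @ w)[:, None] * w[None, :]

# Sanity check: the concept should be much harder to read off linearly now.
print((acts @ w).var().item(), (erased @ w).var().item())
```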

Engineering Projects

Are you more of a research engineer than a research scientist? Consider filing a PR to add features or fix bugs in the devinterp repo. There's plenty to do.

Theoretical Projects

We discourage you from working on more theoretical projects unless you really, really know what you're doing. Reach out to us.

Completed Projects

Just because a project is marked as "completed" here doesn't mean that direction is closed off. It's often very helpful to begin with replications because they give you a clear reference to compare results against. You're also sure to run into follow-up questions that the original authors didn't address, so you can always go deeper.

Induction Heads

Lead: George Wang
Discord: @_protocol
Type: Applied
Difficulty: Medium
Status: Completed

Detect the formation of induction heads in the original in-context learning setting.