Various project ideas to inspire you if you're interested in getting involved in DevInterp and SLT research.
A good place to start is the open problems lecture from the 2023 DevInterp conference, which surveys the main research directions and open opportunities in developmental interpretability.
Before diving into a new project, we recommend building familiarity by going through some of the starter notebooks in the devinterp repo. These notebooks can also serve as a starting point for further investigation.
We encourage replication but discourage scooping each other: there are enough interesting problems to go around that we shouldn't duplicate effort unnecessarily, since that slows progress in AI safety and is bad for the community. That said, if you're particularly interested in one of the active projects, please reach out to see whether there's an opportunity to get involved and collaborate.
SLT suggests natural gradient descent should generalize worse. Can we formalize this with the RLCT? (See the free-energy sketch below this list for where the RLCT enters.)
SLT requires the assumption of "relatively finite variance". Is this assumption satisfied?
SLT is about Bayesian learning. What can it say about SGD?
Investigating the use of the Local Learning Coefficient (LLC) for detecting trojans in neural networks
Explore the development of vision circuits.
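For orientation on the questions above: the RLCT (the local/learning coefficient λ) is the coefficient of the log n term in Watanabe's asymptotic expansion of the Bayes free energy, and it also controls the expected Bayes generalization error. A sketch of the standard statements, with L_n the empirical negative log likelihood, w_0 a true parameter, and m the multiplicity:

```latex
% Asymptotic expansion of the Bayes free energy (stochastic complexity):
F_n = n L_n(w_0) + \lambda \log n - (m - 1) \log\log n + O_p(1)

% Expected Bayes generalization error at sample size n:
\mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\tfrac{1}{n}\right)
```

A smaller λ means a smaller complexity penalty and better Bayesian generalization, which is the sense in which the natural gradient and SGD questions above can be made quantitative.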
The number one thing we encourage people who want to get involved in DevInterp to do, especially if they're interested in the empirical side, is simply to go out and study the development of models that haven't been studied yet.
The easiest place to start is transformers trained on algorithmic tasks. Just choose one (or come up with your own), and start applying tools from devinterp (such as local learning coefficient estimation, essential dynamics (coming soon), Oku-Aihara covariance analysis (ditto), etc.) as well as more "traditional" tools from mechinterp (i.e., "progress measures").
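To make "local learning coefficient estimation" concrete, here is a minimal sketch of an SGLD-based LLC estimator in plain PyTorch, in the spirit of the estimator implemented in the devinterp repo. The function name, hyperparameters, and training-loop details are illustrative placeholders rather than the library's API; use the repo's notebooks for the real thing.

```python
import copy
import math
import torch

def estimate_llc(model, loader, loss_fn, n_sgld_steps=500, lr=1e-5, gamma=100.0, device="cpu"):
    """Rough LLC estimate at the model's current parameters w*, via SGLD sampling
    from a localized, tempered posterior. Illustrative sketch only."""
    model = copy.deepcopy(model).to(device)      # don't disturb the caller's model
    n = len(loader.dataset)
    beta = 1.0 / math.log(n)                     # inverse temperature ~ 1 / log n
    w_star = [p.detach().clone() for p in model.parameters()]

    # L_n(w*): average loss over the dataset at the center point
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            total += loss_fn(model(x), y).item() * len(x)
            count += len(x)
    loss_at_center = total / count

    # SGLD chain localized around w*
    model.train()
    batches, chain_losses = iter(loader), []
    for _ in range(n_sgld_steps):
        try:
            x, y = next(batches)
        except StopIteration:
            batches = iter(loader)
            x, y = next(batches)
        x, y = x.to(device), y.to(device)

        model.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()

        with torch.no_grad():
            for p, p0 in zip(model.parameters(), w_star):
                grad = p.grad if p.grad is not None else torch.zeros_like(p)
                # drift of the localized, tempered log-posterior:
                #   -n * beta * grad(L_batch)  -  gamma * (w - w*)
                drift = -n * beta * grad - gamma * (p - p0)
                p.add_(0.5 * lr * drift + math.sqrt(lr) * torch.randn_like(p))

        chain_losses.append(loss.item())

    # drop a burn-in prefix, then average the chain's (minibatch) losses
    tail = chain_losses[len(chain_losses) // 2:]
    expected_loss = sum(tail) / len(tail)

    # lambda_hat = n * beta * (E_chain[L_n(w)] - L_n(w*))
    return n * beta * (expected_loss - loss_at_center)
```

The key knobs are the step size, the number of SGLD steps, and the localization strength gamma; check that the chain's loss trace has stabilized before trusting the estimate.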
Can we detect phase transitions in settings like modular arithmetic, multitask sparse parity, and greatest common divisor using the learning coefficient? (A checkpoint-tracking sketch follows this list.)
Can we classify further transitions in toy models?
Investigating grokking through the lens of the local learning coefficient
Is the lottery ticket hypothesis compatible with DevInterp, or do the two contradict each other?
LayerNorm can have a large impact on learning dynamics. Can we characterize this in a simple toy model?
Analyzing how adversarial training affects the Local Learning Coefficient and exploring its relationship with adversarial robustness
Examining epoch-wise and model-wise double descent through the lens of the learning coefficient
Investigating the relationship between Local Learning Coefficient dynamics and susceptibility to jailbreaks in large language models
Train board game bots like tic-tac-toe, Othello, etc., and track learning coefficients.
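As a concrete starting point for the transition-detection ideas above (modular arithmetic, board game bots, and so on), here is a hypothetical sketch that records an LLC estimate at regular checkpoints during training, reusing the estimate_llc sketch from earlier; the model, optimizer settings, and schedule are placeholders.

```python
import torch

def train_and_track_llc(model, loader, loss_fn, n_epochs=200, eval_every=10, device="cpu"):
    """Train a small model and record LLC estimates at regular checkpoints.
    Sharp changes in the recorded trace are candidate developmental transitions."""
    model = model.to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
    llc_trace = []
    for epoch in range(n_epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        if epoch % eval_every == 0:
            # estimate_llc (sketched earlier) deep-copies the model,
            # so the training run itself is unaffected
            llc_trace.append((epoch, estimate_llc(model, loader, loss_fn, device=device)))
    return llc_trace

# Plot the trace alongside train/test accuracy and other progress measures;
# abrupt changes in the LLC are the transitions worth investigating further.
```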
If you're interested in something slightly more theoretical, there are many interesting questions in the context of SLT.
Comparing LLC estimation in weight space to different forms of ablation in activation space.
Studying the Local Learning Coefficient in neural networks compiled from known programs.
Investigating how unlearning procedures like LEACE affect the Local Learning Coefficient.
A comprehensive review and comparison of different notions of effective dimensionality in machine learning models.
Investigating the connection between the learning coefficient and the Minimum Description Length principle (see the note after this list).
Extending Singular Learning Theory to saddle points and investigating metastability in neural networks
Investigating how the Local Learning Coefficient (LLC) varies across model sizes and architectures.
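On the MDL connection above, a useful anchor (stated as a sketch of standard results, not a new claim): for a regular d-parameter model, the two-part / BIC description length penalizes complexity at rate (d/2) log n, while in singular models the corresponding coefficient in the stochastic complexity is the learning coefficient λ, with λ ≤ d/2.

```latex
% BIC / two-part MDL description length for a regular d-parameter model:
\mathrm{DL}(D_n) \approx n L_n(\hat{w}) + \frac{d}{2} \log n

% In singular models, the free-energy expansion given earlier replaces the
% penalty coefficient d/2 with the learning coefficient \lambda \le d/2.
```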
Are you more of a research engineer than a research scientist? Consider filing a PR to add features or fix bugs in the devinterp repo. There's plenty to do.
We discourage you from working on more theoretical projects unless you really, really know what you're doing. Reach out to us.
Just because a project is marked as "completed" here doesn't mean that this direction is closed off. It's often very helpful to begin with replications because it gives you a clear reference to compare results against. You're also sure to run into follow-up questions that the original authors didn't address, so you can always go deeper.
Explore the effects of changing the number of tasks or digits in MNIST.
Detect the formation of induction heads in the in-context learning setting.