LLC Analysis of Grokking Phenomena

Investigating grokking through the lens of the local learning coefficient

Type: Applied
Difficulty: Medium
Status: Unstarted

This project aims to investigate the grokking phenomenon using the local learning coefficient (LLC) and other developmental interpretability techniques.

Key research questions:

  1. Can we use the LLC to detect and characterize the phase transition associated with grokking in modular arithmetic tasks?
  2. How does the LLC behave differently for “pizza” vs. “clock” solutions (the two circuit families identified by Zhong et al. in transformers trained on modular addition)?
  3. Can we distinguish between different types of tasks (e.g., addition vs. division) using LLC analysis?
  4. How does the LLC behave in multi-task settings where models learn multiple operations simultaneously?
  5. Can LLC analysis provide insights into the underlying mechanisms of grokking?

Methodology:

  1. Implement various grokking scenarios, including modular arithmetic with different operations and multi-task settings.
  2. Train models on these tasks, tracking the LLC and other relevant metrics throughout training.
  3. Analyze LLC trajectories to identify potential phase transitions associated with grokking.
  4. Compare LLC behavior for different solution types (e.g., “pizza” vs. “clock”) and task types.
  5. Investigate LLC behavior in multi-task settings where models learn multiple operations.
  6. Explore ways to improve the numerical stability of LLC estimates in grokking scenarios.
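To make step 2 concrete, here is a minimal numpy sketch of one common way to estimate the LLC: sample from a tempered, localized posterior around the trained weights with SGLD and plug the average loss into the WBIC-style formula. The step size, localization strength, sample counts, and the toy quadratic loss below are all illustrative choices for this sketch, not part of the project specification; a real run would use a library such as devinterp on the actual model.

```python
import numpy as np

def estimate_llc(loss, grad_loss, w_star, n, n_steps=10_000, burn_in=1_000,
                 eps=1e-4, gamma=1.0, beta=None, seed=0):
    """Sketch of an SGLD-based LLC estimate: sample from the tempered,
    localized posterior around w_star, then use
    lambda_hat = n * beta * (E[L_n(w)] - L_n(w_star)), with beta = 1/log n.
    """
    rng = np.random.default_rng(seed)
    if beta is None:
        beta = 1.0 / np.log(n)  # WBIC inverse temperature
    w = w_star.copy()
    losses = []
    for step in range(n_steps):
        # Langevin step on the potential
        #   n * beta * L_n(w) + (gamma / 2) * |w - w_star|^2
        drift = -(eps / 2) * (n * beta * grad_loss(w) + gamma * (w - w_star))
        w = w + drift + np.sqrt(eps) * rng.standard_normal(w.shape)
        if step >= burn_in:
            losses.append(loss(w))
    return n * beta * (np.mean(losses) - loss(w_star))

# Sanity check on a toy loss: L(w) = |w|^2 in d = 2 dimensions has a
# nondegenerate minimum at 0, so the true learning coefficient is d/2 = 1,
# and the estimate should come out close to 1.
lam = estimate_llc(lambda w: float(w @ w), lambda w: 2 * w,
                   np.zeros(2), n=1000)
```

Tracking this estimate at checkpoints throughout training is what would reveal (or fail to reveal) a phase transition at the grokking point.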

Expected outcomes:

  1. Characterization of grokking phase transitions using LLC analysis.
  2. Insights into the differences between “pizza” and “clock” solutions from an LLC perspective.
  3. Understanding of how LLC behaves across different types of arithmetic operations and in multi-task settings.
  4. Improved techniques for numerically stable LLC estimation in grokking scenarios.
  5. Potential new insights into the mechanisms underlying the grokking phenomenon.

This research will extend our understanding of grokking through the lens of Singular Learning Theory and developmental interpretability. It may provide new tools for detecting and analyzing sudden generalization in neural networks.

Update

Nina Rimsky and Dmitry Vaintrob put out an investigation into the learning coefficient in modular arithmetic and a random commutative operation. Check out their work here.

Note: there’s still a lot to explore in this setting. For example, a natural follow-up question is whether we can distinguish the pizza vs. clock solutions using the learning coefficient. Other questions in this vein include comparing the learning coefficient for different kinds of tasks (addition vs. division, etc.). Another interesting extension would be to look at modular arithmetic settings with multiple tasks. If we include the operator and modulus as tokens in the context window, then models should be able to learn multiple operations. How does varying the tasks affect the learning coefficient?
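One way the multi-task setup above might look in practice: enumerate every example for each (operator, modulus) pair, with the operator and modulus encoded as extra input tokens so a single model sees all tasks. The token names and the particular operations below are illustrative assumptions, not a prescribed format.

```python
from itertools import product

def make_multitask_data(moduli, ops=("add", "sub")):
    """Enumerate (input_tokens, answer) pairs for several modular tasks.

    Each input is [a, OP_token, b, MOD_token], so the operator and modulus
    are part of the context and one model can learn all tasks at once.
    """
    op_fns = {"add": lambda a, b, p: (a + b) % p,
              "sub": lambda a, b, p: (a - b) % p}
    data = []
    for p in moduli:
        for op in ops:
            for a, b in product(range(p), repeat=2):
                tokens = (a, f"OP_{op}", b, f"MOD_{p}")
                data.append((tokens, op_fns[op](a, b, p)))
    return data

# Two operations over moduli 5 and 7: 2 * (5^2 + 7^2) = 148 examples.
pairs = make_multitask_data([5, 7])
```

Sweeping the number of tasks in this setup (one modulus and operation, then several) and re-estimating the learning coefficient at each checkpoint is one way to probe how task variety affects it.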

Where to begin:

If you have decided to start working on this, please let us know in the Discord. We'll update this listing so that other people who are interested in this project can find you.