Scaling Local Learning Coefficients

Investigating how the Local Learning Coefficient (LLC) varies across model sizes and architectures.

Type: Applied
Difficulty: Hard
Status: Unstarted

We’d like to better understand how the Local Learning Coefficient (LLC) varies with model size. This project aims to investigate the scaling behavior of LLCs and to develop rigorous methods for comparing LLCs between models of different sizes.
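
Concretely, the LLC of a trained parameter $w^*$ is estimated (following Lau et al., 2023) as

$$
\hat\lambda(w^*) = n\beta^* \Big( \mathbb{E}^{\beta^*}_{w \mid w^*}\big[L_n(w)\big] - L_n(w^*) \Big), \qquad \beta^* = \frac{1}{\log n},
$$

where $L_n$ is the empirical loss on $n$ samples and the expectation is over a tempered posterior localized near $w^*$, sampled in practice with SGLD. Scaling questions are then questions about how $\hat\lambda$, and the sampler hyperparameters behind it, behave as parameter count grows.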

Key research questions include:

  1. How does the LLC scale with model size for different architectures (e.g., MLPs, CNNs, Transformers)?
  2. Under what conditions is it valid to compare LLCs between models of different sizes?
  3. What hyperparameter adjustments are needed to make these comparisons valid?
  4. How does a reparametrization such as µP (the Maximal Update Parametrization) affect LLC estimates? (A sketch of a µP setup follows this list.)
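
On question 4, µP is a natural test case because it is designed to keep training behavior comparable as width grows. Below is a sketch using Microsoft's `mup` package, following the pattern in its README; the `MLP` model, its widths, and the learning rate are hypothetical choices, not part of the package.

```python
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuAdam

class MLP(nn.Module):
    # Hypothetical width-parametrized model; only the readout differs from plain PyTorch.
    def __init__(self, width, d_in=784, d_out=10):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(d_in, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )
        self.readout = MuReadout(width, d_out)  # mup-aware output layer

    def forward(self, x):
        return self.readout(self.body(x))

# Base/delta models tell mup which dimensions count as "width" and should scale.
base, delta = MLP(width=8), MLP(width=16)

for width in (64, 256, 1024):
    model = MLP(width=width)
    set_base_shapes(model, base, delta=delta)  # rescales params to µP init
    opt = MuAdam(model.parameters(), lr=1e-3)  # lr intended to transfer across widths
    # ... train, then estimate the LLC at the resulting w* ...
```

Running the same width sweep in standard parametrization and comparing the two LLC curves would speak directly to question 4.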

Methodology:

  1. Implement efficient LLC estimation techniques that can scale to large models (a minimal sketch follows this list).
  2. Train a series of models of increasing size across different architectures.
  3. Estimate LLCs at various checkpoints during training for each model.
  4. Develop and test methods for normalizing LLC estimates across model sizes.
  5. Investigate the impact of different parametrizations (e.g., µP) on LLC estimates.
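
To make step 1 concrete, here is a minimal PyTorch sketch of the SGLD-based estimator from Lau et al. (2023), the same estimator implemented in the devinterp library. The function name, hyperparameter defaults, and loop structure are illustrative choices, not a fixed API.

```python
import copy
import math
import torch

def estimate_llc(model, loss_fn, loader, n_samples,
                 lr=1e-4, gamma=100.0, num_burnin=200, num_draws=500,
                 device="cpu"):
    """Sketch of lambda-hat = n * beta * (E_beta[L_n(w)] - L_n(w*)), beta = 1/log n.

    Samples the localized tempered posterior around w* with SGLD; `gamma`
    sets the strength of the Gaussian localization term.
    """
    beta = 1.0 / math.log(n_samples)
    model = model.to(device)
    sampler = copy.deepcopy(model)  # chain starts at w*
    anchor = [p.detach().clone() for p in sampler.parameters()]

    def batch_loss(m, batch):
        x, y = (t.to(device) for t in batch)
        return loss_fn(m(x), y)

    with torch.no_grad():  # reference loss L_n(w*)
        ref_loss = sum(batch_loss(model, b).item() for b in loader) / len(loader)

    draws, step = [], 0
    while step < num_burnin + num_draws:
        for batch in loader:
            loss = batch_loss(sampler, batch)
            sampler.zero_grad()
            loss.backward()
            with torch.no_grad():
                for p, a in zip(sampler.parameters(), anchor):
                    # SGLD drift: tempered loss gradient + localization pull to w*
                    drift = n_samples * beta * p.grad + gamma * (p - a)
                    p.add_(-0.5 * lr * drift)
                    p.add_(math.sqrt(lr) * torch.randn_like(p))  # injected noise
            if step >= num_burnin:
                draws.append(loss.item())
            step += 1
            if step >= num_burnin + num_draws:
                break

    return n_samples * beta * (sum(draws) / len(draws) - ref_loss)
```

In practice you would run multiple chains and check convergence diagnostics (e.g., comparing loss traces across chains) before trusting any single estimate, especially as models get large.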

Expected outcomes:

  1. Empirical data on how LLCs scale with model size across different architectures.
  2. A set of best practices for comparing LLCs between models of different sizes.
  3. Insights into how different parameterizations affect LLC estimates.
  4. Potential discovery of scaling laws for LLCs (a toy power-law fit is sketched below).
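
If LLC estimates do turn out to follow a power law in parameter count, a first-pass check is a linear fit in log-log space. The numbers below are placeholders, not measurements:

```python
import numpy as np

# Placeholder (parameter count, LLC estimate) pairs -- not real data.
n_params = np.array([1e5, 1e6, 1e7, 1e8])
llc_hat = np.array([3e1, 1.1e2, 4e2, 1.5e3])

# Fit log llc_hat = alpha * log n_params + c, i.e. llc_hat ~ n_params ** alpha.
alpha, c = np.polyfit(np.log(n_params), np.log(llc_hat), deg=1)
print(f"fitted exponent: {alpha:.3f}")
```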

This research could provide valuable insights into how model complexity changes with scale, which is crucial for understanding the development of capabilities in large models.

Where to begin:

If you have decided to start working on this, please let us know in the Discord. We'll update this listing so that other people who are interested in this project can find you.