Scaling Local Learning Coefficients
Investigating how the Local Learning Coefficient (LLC) varies across model sizes and architectures.
Project Details
We’d like to better understand how the Local Learning Coefficient (LLC) varies across model size. This project aims to investigate the scaling behavior of LLCs and develop rigorous methods for comparing LLCs between models of different sizes.
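For background, the LLC $\lambda(w^*)$ can be read off the asymptotic expansion of the Bayesian free energy around a parameter $w^*$, and in practice it is estimated from a tempered posterior localized at $w^*$. This is a sketch of the standard estimator with notation as in the SLT literature, not a full definition:

```latex
F_n = n L_n(w^*) + \lambda(w^*) \log n + O(\log \log n),
\qquad
\hat{\lambda}(w^*) = n \beta \left( \mathbb{E}^{\beta}_{w \mid w^*}\!\left[ L_n(w) \right] - L_n(w^*) \right),
```

where $L_n$ is the empirical loss, $\beta$ is an inverse temperature (often taken to be $1/\log n$), and the expectation is over samples, e.g. drawn by SGLD, concentrated near $w^*$.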
Key research questions include:
- How does the LLC scale with model size for different architectures (e.g., MLPs, CNNs, Transformers)?
- Under what conditions is it valid to compare LLCs between models of different sizes?
- Which hyperparameter adjustments are needed to make such comparisons valid?
- How does a reparametrization such as μP (the Maximal Update Parametrization) affect LLC estimates?
Methodology:
- Implement efficient LLC estimation techniques that can scale to large models.
- Train a series of models of increasing size across different architectures.
- Estimate LLCs at various checkpoints during training for each model.
- Develop and test methods for normalizing LLC estimates across model sizes.
- Investigate the impact of different parameterizations (e.g., μP, the Maximal Update Parametrization) on LLC estimates.
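To make the estimation step concrete, here is a minimal numpy sketch of an SGLD-based LLC estimator in the style of the SLT literature (Lau et al.): sample from a tempered posterior localized at a trained parameter `w_star`, then compare the mean sampled loss to the loss at `w_star`. The function name, the localization strength `gamma`, and all hyperparameter defaults are illustrative assumptions, not a reference implementation; real experiments would use minibatch gradients and a neural network loss.

```python
import numpy as np

def estimate_llc(loss, grad, w_star, n, beta=None, gamma=1.0,
                 lr=1e-4, steps=5000, burn_in=1000, seed=0):
    """Sketch of the SGLD-based LLC estimator:
        llc_hat = n * beta * (E_beta[L(w)] - L(w_star)),
    where the expectation is over a tempered posterior localized
    at w_star by a quadratic pull of strength gamma (illustrative)."""
    rng = np.random.default_rng(seed)
    if beta is None:
        beta = 1.0 / np.log(n)  # a common inverse-temperature choice
    w = w_star.copy()
    L_star = loss(w_star)
    draws = []
    for t in range(steps):
        # Langevin step on U(w) = n*beta*L(w) + (gamma/2)*||w - w_star||^2
        drift = n * beta * grad(w) + gamma * (w - w_star)
        w = w - 0.5 * lr * drift + np.sqrt(lr) * rng.standard_normal(w.shape)
        if t >= burn_in:
            draws.append(loss(w))
    return n * beta * (np.mean(draws) - L_star)

# Toy sanity check on a 2-D quadratic loss, where the LLC is d/2 = 1:
loss = lambda w: 0.5 * np.dot(w, w)
grad = lambda w: w
llc_hat = estimate_llc(loss, grad, w_star=np.zeros(2), n=1000)
```

On a regular (non-singular) quadratic loss the estimate should land near $d/2$; singular losses, where the LLC falls below $d/2$, are exactly the interesting cases for this project.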
Expected outcomes:
- Empirical data on how LLCs scale with model size across different architectures.
- A set of best practices for comparing LLCs between models of different sizes.
- Insights into how different parameterizations affect LLC estimates.
- Potential discovery of scaling laws for LLCs.
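If LLC estimates do follow a power law in model size, a natural first-pass analysis is a least-squares fit in log-log space. The sketch below uses synthetic numbers purely for illustration; the sizes, the prefactor, and the exponent are made up, not measurements.

```python
import numpy as np

def fit_power_law(sizes, llcs):
    """Fit llc ~ a * N**b by least squares in log-log space.
    Returns (a, b). np.polyfit returns [slope, intercept]."""
    log_n, log_llc = np.log(sizes), np.log(llcs)
    b, log_a = np.polyfit(log_n, log_llc, 1)
    return np.exp(log_a), b

# Synthetic data for illustration only (not real LLC measurements):
sizes = np.array([1e4, 1e5, 1e6, 1e7])   # hypothetical parameter counts
llcs = 0.05 * sizes ** 0.4               # hypothetical power law
a, b = fit_power_law(sizes, llcs)
```

With real (noisy) estimates, the fitted exponent `b` would come with error bars, and checking whether it is stable across architectures is one way to phrase the scaling-law question above.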
This research could provide valuable insights into how model complexity changes with scale, which is crucial for understanding the development of capabilities in large models.
Where to Begin
Before starting this project, we recommend familiarizing yourself with these resources:
Ready to contribute? Let us know in our Discord community. We'll update this listing so that other people interested in this project can find you.