LLCs and Ablations
Comparing LLC estimation in weights to different forms of ablations in activations.
This project aims to systematically compare local learning coefficient (LLC) estimates, which are computed in weight space, with different forms of ablation, which intervene on activations. By studying how LLCs relate to various ablation techniques, we hope to learn how the LLC reflects the functional structure of neural networks.
Key research questions:
- How do different ablation techniques affect the LLC of a neural network?
- Can changes in the LLC due to ablations be used to identify important circuits or components in the network?
- How does the relationship between LLCs and ablations vary across different model architectures?
- Can LLC analysis complement or improve existing ablation-based interpretability techniques?
Methodology:
- Implement a variety of ablation techniques (e.g., neuron ablation, attention head pruning, causal scrubbing) for different model architectures.
- Estimate LLCs before and after applying each ablation technique.
- Compare the impact of ablations on the LLC to their impact on model performance and behavior.
- Investigate whether changes in the LLC can be used to identify important circuits or components in the network.
- Explore the use of LLC analysis in conjunction with existing ablation-based interpretability techniques.
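The core loop of the methodology (estimate the LLC, apply an ablation, re-estimate) can be sketched on a toy problem. This is a minimal sketch under several assumptions that are not part of this listing. The "network" is linear regression with unit-variance Gaussian noise. The ablation is zero-ablation of a single input activation, a toy stand-in for neuron ablation. The LLC is estimated with full-batch Langevin dynamics (SGLD without minibatching) using the estimator nβ·(E[L_n(w)] - L_n(w*)), where β = 1/log n and γ is a localization strength. The helper names (`half_mse`, `estimate_llc`) and all hyperparameters are hypothetical, chosen for illustration, and are not taken from any particular library.

```python
import numpy as np


def half_mse(W, X, y):
    """Per-chain empirical loss L_n(w) = mean((Xw - y)^2) / 2, the Gaussian NLL up to a constant."""
    resid = W @ X.T - y  # shape (chains, n)
    return 0.5 * np.mean(resid ** 2, axis=1)


def estimate_llc(w_star, X, y, gamma=1.0, eps=1e-3, steps=5000, burn_in=1000, chains=8, seed=0):
    """Hypothetical helper: LLC estimate at w_star via full-batch Langevin sampling of
    p(w) ∝ exp(-n*beta*L_n(w) - gamma/2 * ||w - w_star||^2), with beta = 1/log(n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_beta = n / np.log(n)
    W = np.tile(w_star, (chains, 1))  # every chain starts at w_star
    mean_losses = []
    for t in range(steps):
        grad = (W @ X.T - y) @ X / n  # gradient of L_n for each chain
        drift = n_beta * grad + gamma * (W - w_star)  # tempered loss + localization pull
        W = W - 0.5 * eps * drift + np.sqrt(eps) * rng.standard_normal(W.shape)
        if t >= burn_in:
            mean_losses.append(half_mse(W, X, y).mean())
    loss_star = half_mse(w_star[None, :], X, y)[0]
    return n_beta * (np.mean(mean_losses) - loss_star)


# Toy data (assumption): y = X w_true + unit Gaussian noise, d = 5 regular parameters.
rng = np.random.default_rng(42)
n, d = 1000, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + rng.standard_normal(n)

# LLC of the intact model, at its least-squares minimum.
w_full = np.linalg.lstsq(X, y, rcond=None)[0]
llc_full = estimate_llc(w_full, X, y)

# Zero-ablate input activation 0, then refit so w_abl is a minimum of the ablated model.
X_abl = X.copy()
X_abl[:, 0] = 0.0
w_abl = np.linalg.lstsq(X_abl, y, rcond=None)[0]
llc_abl = estimate_llc(w_abl, X_abl, y)

print(f"LLC intact: {llc_full:.2f}  LLC after ablation: {llc_abl:.2f}")
```

Refitting the surviving weights after the ablation keeps the estimate anchored at a minimum of the ablated model, which the estimator assumes. Because this toy model is regular, the two estimates should land roughly near d/2 = 2.5 and (d-1)/2 = 2.0, up to small biases from the finite step size and the localization term. Real networks are singular, so neither number is predictable in advance there.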
Expected outcomes:
- Empirical data on how different ablation techniques affect the LLC.
- Insights into the relationship between functional structure (as revealed by ablations) and model complexity (as measured by the LLC).
- Potential development of LLC-based techniques for identifying important components in neural networks.
- Improved understanding of how the LLC reflects the functional organization of neural networks.
This research could enhance our ability to interpret neural networks by providing a bridge between weight-space complexity measures (LLCs) and activation-space interpretability techniques (ablations).
Where to begin:
If you have decided to start working on this, please let us know in the Discord. We'll update this listing so that other people who are interested in this project can find you.