Learn about SLT

SLT for Alignment

Start here:

SLT

Start here: Dialogue introduction to SLT

I want to learn theoretical SLT:

I want to learn applied SLT:

Theoretical SLT

Essential reading:

Advanced materials:

Textbooks

The core SLT textbooks are:

The Grey Book
Sumio Watanabe, “Algebraic Geometry and Statistical Learning Theory” 2009

The Green Book
Sumio Watanabe, “Mathematical Theory of Bayesian Statistics” 2018

There is also an exercise textbook:
Joe Suzuki, “WAIC and WBIC with R Stan: 100 Exercises for Building Logic” 2019

Applied/Experimental SLT

LLC Estimation

Currently, the key experimental technique in applying SLT to real-world models is local learning coefficient (LLC) estimation, introduced in Lau et al. (2023).
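To make the estimator concrete, here is a toy sketch in plain Python. It follows the general form of the estimator in Lau et al. (2023): λ̂(w*) = nβ (E[L_n(w)] − L_n(w*)) with β = 1/log n, where the expectation is taken over a tempered posterior localized around w* by a Gaussian term of strength γ. Several simplifications are assumptions of this sketch, not part of the published method: the parameter is one-dimensional, the empirical loss L_n is replaced by a fixed toy loss (so n only enters through β and the nβ prefactor), and sampling uses full-batch Langevin dynamics instead of minibatch SGLD. The helper `estimate_llc` and its defaults are hypothetical and are not the devinterp API.

```python
import math
import random


def estimate_llc(loss, grad, w_star, n, gamma=1.0, step=1e-3,
                 num_steps=200_000, burn_in=2_000, seed=0):
    """Toy local learning coefficient estimate for a 1-D parameter.

    Hypothetical helper (not the devinterp API). Samples the localized,
    tempered posterior
        p(w) ∝ exp(-n*beta*loss(w) - gamma/2 * (w - w_star)**2),  beta = 1/log(n)
    with full-batch unadjusted Langevin dynamics and returns
        n*beta * (mean sampled loss - loss(w_star)).
    """
    rng = random.Random(seed)
    beta = 1.0 / math.log(n)
    w = w_star  # start the chain at the point being analyzed
    total, count = 0.0, 0
    for t in range(num_steps):
        # Gradient of the potential n*beta*loss(w) + gamma/2 * (w - w_star)**2
        drift = n * beta * grad(w) + gamma * (w - w_star)
        # Langevin step: drift scaled by step/2 plus Gaussian noise of variance `step`
        w = w - 0.5 * step * drift + rng.gauss(0.0, math.sqrt(step))
        if t >= burn_in:
            total += loss(w)
            count += 1
    return n * beta * (total / count - loss(w_star))


# Assumption: fixed toy losses stand in for an empirical loss L_n.
n = 1_000
regular = estimate_llc(lambda w: 0.5 * w**2, lambda w: w, w_star=0.0, n=n)
singular = estimate_llc(lambda w: w**4, lambda w: 4 * w**3, w_star=0.0, n=n)
print(f"regular (true value 1/2):  {regular:.3f}")
print(f"singular (true value 1/4): {singular:.3f}")
```

The quadratic loss behaves like a regular one-parameter model, whose learning coefficient is d/2 = 1/2, while the quartic loss w⁴ is singular at 0 and has learning coefficient 1/4. The estimates should land near those values; the localization strength γ, the Langevin step size, and Monte Carlo noise all shift them slightly.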

Putting it into practice: Once you’ve read the above materials, get some hands-on practice with the example notebooks in devinterp, starting with this introductory notebook.

Developmental interpretability

Developmental interpretability proposes to study changes in neural network structure over the course of training (rather than trying to interpret isolated snapshots). This draws on ideas and methods from a range of areas of mathematics, statistics, and the (biological) sciences.

At the moment, the key techniques (chiefly, applying LLC estimation over the course of training) come from Singular Learning Theory (SLT) and, to a lesser extent, from developmental biology and statistical physics.

The readings focus on SLT:

Bonus

Alignment

Basics

Interpretability

Bonus