Timaeus is a new AI safety organization dedicated to scoping "developmental interpretability," a research agenda that aims to detect, locate, and interpret phase transitions in neural networks.

Developmental Interpretability

Developmental interpretability is a new alignment research agenda grounded in singular learning theory (SLT), statistical physics, and developmental biology. Its aim is to build tools for detecting, locating, and interpreting the phase transitions that govern training and in-context learning. This has the potential to reduce the alignment tax of existing techniques and to inform new, scalable methods for interpreting neural networks.

2023 SLT & Alignment Summit

The SLT & Alignment Summit ("Singularities against the Singularity") began in June 2023 and is still ongoing. In the first week, we recorded more than 20 hours of lectures on the necessary background, all of which you can find here. In the second week, we are starting research collaborations on the open problems.

Over the next month, we'll post a review of the summit, along with lecture notes and more informal posts.