Timaeus is an AI safety research organization working on applications of singular learning theory (SLT) to alignment. These applications include:
- Developmental Interpretability, which aims to characterize the fundamental units of computational structure in neural networks and to invent scalable, automated methods for finding and classifying them.
- Structural Generalization, which aims to characterize out-of-distribution generalization and to invent scalable, automated methods for detecting mechanistic anomalies and predicting adversarial vulnerabilities.
- Geometry of Program Synthesis, which aims to characterize and understand inductive biases towards dangerous forms of algorithmic reasoning like search, mesa-optimization, and deception.
Check out Timaeus's public announcement.
Recent Work
- The Developmental Landscape of In-Context Learning by Hoogland et al. (2024).
- Estimating the Local Learning Coefficient at Scale by Furman and Lau (2024); the estimator this work scales up is sketched below.
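For context on what this estimation involves, here is a minimal sketch of the local learning coefficient (LLC) estimator that this line of work scales up, assuming the standard SLT conventions (the symbols $\hat{\lambda}$, $L_n$, $\beta^*$, and $\beta_0$ follow common usage and are not fixed by this page):

$$
\hat{\lambda}(w^*) \;=\; n\beta^*\Big(\mathbb{E}_{w \sim p_{\beta^*}(w \mid w^*)}\big[L_n(w)\big] - L_n(w^*)\Big), \qquad \beta^* = \frac{\beta_0}{\log n},
$$

where $L_n$ is the average training loss (negative log-likelihood) over $n$ samples and the expectation is taken over a posterior tempered at inverse temperature $\beta^*$ and localized near the trained parameter $w^*$; in practice it is approximated by sampling, e.g. with SGLD.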
Check out more work by Timaeus and collaborators.
Developmental Interpretability
Developmental interpretability is an alignment research agenda grounded in singular learning theory (SLT), statistical physics, and developmental biology. It aims to build tools for detecting, locating, and interpreting the phase transitions that govern training and in-context learning. This has the potential to reduce the alignment tax of existing techniques and to inform scalable new methods for interpreting neural networks.
Check out the research agenda.
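To give a sense of why phase transitions are natural objects of study here, consider a sketch of the standard free energy asymptotics from SLT (Watanabe's expansion; the notation below follows common usage rather than anything specific to the agenda document): the Bayesian free energy of a neighborhood of a parameter $w^*$ behaves as

$$
F_n \;\approx\; n L_n(w^*) + \lambda(w^*) \log n,
$$

where $\lambda(w^*)$ is the local learning coefficient, a measure of how degenerate the loss landscape is near $w^*$. Different regions of parameter space trade off loss against $\lambda$ differently, so the region that dominates the posterior can change abruptly as data accumulates. These exchanges of dominance are the phase transitions that developmental interpretability aims to detect and interpret over the course of training.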
Structural Generalization
More information coming soon.
Geometry of Program Synthesis
More information coming soon.
The research agenda we are contributing to was established by Daniel Murfet, a mathematician at the University of Melbourne and an expert in singular learning theory, algebraic geometry, and mathematical logic.