Timaeus is an AI safety research organization working on applications of singular learning theory (SLT) to alignment. These applications include:
- Developmental Interpretability, which aims to characterize the fundamental units of computational structure in neural networks and to invent scalable, automated methods for finding and classifying them.
- Structural Generalization, which aims to characterize out-of-distribution generalization and to invent scalable, automated methods for detecting mechanistic anomalies and predicting adversarial vulnerabilities.
- Geometry of Program Synthesis, which aims to characterize and understand inductive biases towards dangerous forms of algorithmic reasoning like search, mesa-optimization, and deception.
Check out Timaeus's public announcement.
Recent Work
- Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient by Wang et al. (2024).
- Loss landscape geometry reveals stagewise development of transformers by Wang et al. (2024).
Check out more work by Timaeus and collaborators.
Developmental Interpretability
Developmental interpretability is an alignment research agenda grounded in singular learning theory (SLT), statistical physics, and developmental biology. Its aim is to build tools for detecting, locating, and interpreting the phase transitions that govern training and in-context learning. These tools have the potential to reduce the alignment tax of existing techniques and to inform scalable new methods for interpreting neural networks.
Check out the research agenda.
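To make "detecting phase transitions" concrete, here is a minimal, hypothetical sketch (not Timaeus code, and a stand-in for the SLT-based estimators used in the papers above): a simple change-point heuristic that flags training steps where a loss curve drops sharply relative to its typical step-to-step change, the kind of signature associated with stagewise development. The function name, parameters, and synthetic data are all illustrative assumptions.

```python
import numpy as np

def detect_transitions(losses, window=5, threshold=8.0):
    """Flag steps where the training loss drops unusually fast.

    A crude change-point heuristic, purely illustrative: smooth the loss
    curve with a moving average, then flag steps where the per-step
    decrease exceeds `threshold` times the median absolute decrease.
    Real developmental-interpretability tools use SLT-based quantities
    (e.g. local learning coefficient estimates), not this heuristic.
    """
    losses = np.asarray(losses, dtype=float)
    # Moving average to suppress minibatch noise.
    kernel = np.ones(window) / window
    smooth = np.convolve(losses, kernel, mode="valid")
    drops = -np.diff(smooth)  # positive where the loss is decreasing
    scale = np.median(np.abs(drops)) + 1e-12
    # Indices (in smoothed coordinates) of anomalously fast drops.
    return np.flatnonzero(drops > threshold * scale)

# Synthetic loss curve: two plateaus separated by a sharp drop at step 50.
rng = np.random.default_rng(0)
steps = np.arange(100)
loss = np.where(steps < 50, 2.0, 1.0) + 0.002 * rng.standard_normal(100)
transitions = detect_transitions(loss)
```

On the synthetic curve, the flagged indices cluster around the drop at step 50; on a real training run one would apply the same idea to logged loss (or, better, to an SLT observable) across checkpoints.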
Structural Generalization
More information coming soon.
Geometry of Program Synthesis
More information coming soon.
The research agenda we are contributing to was established by Daniel Murfet, a mathematician at the University of Melbourne and an expert in singular learning theory, algebraic geometry, and mathematical logic.