Timaeus Update November 2023
A quick round of updates from the past month:
- We ran the DevInterp conference, had a lot of fun, made headway on some projects, and seeded new research collaborations. You can watch the lectures here. To highlight one talk, here’s an update on the projects we’re currently finishing up, in which we investigate the development of structure in a family of transformers trained to do linear regression.
- We followed this up with a smaller workshop on linear logic involving senior researchers from around Europe. This is relevant to a research direction we’re pursuing that we call “Geometry of Program Synthesis,” where we hope to probe the relationship between loss landscape geometry and computational structure in easier-to-understand settings outside NNs.
- We put out a bunch of writing: a comprehensive review of generalization theory, a distillation of the development of structure in the toy models of superposition, and some helpful details on learning coefficient estimation. There was also a great discussion in the comments following Joar Skalse’s critiques of SLT.
Over the next month:
- We’re working hard towards the ICML deadline and plan to finish two of our current projects in time for it:
- “ICL 1”, which studies the development of in-context learning in transformers trained to perform linear regression, where we find a strong connection between changes in geometry and changes in computational structure,
- “Quantifying Degeneracy 2”, a follow-up to Lau et al. (2023) which advances the techniques we’re developing for estimating learning coefficients and other properties of the local loss landscape (see the sketch at the end of this post for the flavor of these estimators).
- We’ll also be publishing a research agenda we call “Geometry of Program Synthesis”, with a first example coming out of a set of notes known internally as SLT40, which argues that “simplicity bias = speed bias” (sort of).
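For readers who want a concrete sense of what estimating a learning coefficient involves, here is a minimal sketch (not our actual research code) of the kind of WBIC-style local estimator studied in Lau et al. (2023): sample with SGLD in a neighborhood of a trained parameter and compare the average tempered loss to the loss at that parameter. The toy linear-regression model, the hyperparameters, and the helper names (`estimate_llc`, `L_n`, `grad_L_n`) are illustrative choices made for this post.

```python
# Illustrative sketch: a WBIC-style local learning coefficient estimate in the
# spirit of Lau et al. (2023), using SGLD localized around a trained parameter
# w_star. Everything here (model, data, hyperparameters) is a toy choice.
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: linear regression with two parameters.
n = 1000
x = rng.normal(size=(n, 2))
w_true = np.array([1.0, -2.0])
y = x @ w_true + 0.1 * rng.normal(size=n)

def L_n(w):
    """Empirical loss (mean squared error, standing in for a negative log-likelihood)."""
    return 0.5 * np.mean((x @ w - y) ** 2)

def grad_L_n(w, idx):
    """Minibatch gradient of L_n."""
    xb, yb = x[idx], y[idx]
    return xb.T @ (xb @ w - yb) / len(idx)

def estimate_llc(w_star, steps=2000, eps=1e-5, gamma=100.0, batch_size=100):
    """SGLD chain targeting the tempered posterior exp(-n * beta * L_n(w)) with a
    Gaussian localizer at w_star; returns
    lambda_hat = n * beta * (E_beta[L_n(w)] - L_n(w_star)), with beta = 1 / log(n)."""
    beta = 1.0 / np.log(n)
    w = w_star.copy()
    losses = []
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        # Drift = gradient of the negative log of the localized tempered posterior.
        drift = n * beta * grad_L_n(w, idx) + gamma * (w - w_star)
        w = w - 0.5 * eps * drift + np.sqrt(eps) * rng.normal(size=w.shape)
        losses.append(L_n(w))
    burn_in = steps // 4
    return n * beta * (np.mean(losses[burn_in:]) - L_n(w_star))

w_star = np.linalg.lstsq(x, y, rcond=None)[0]  # a minimum of L_n for this toy model
print("estimated learning coefficient:", estimate_llc(w_star))
```

For a regular model like this toy example the estimate should land near half the parameter count; the cases we care about are singular models, where the learning coefficient can fall well below that and so carries information about degeneracy in the loss landscape.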