Stagewise Reinforcement Learning and the Geometry of the Regret Landscape

We extend singular learning theory to reinforcement learning, showing that phase transitions in policy development are governed by the local learning coefficient, which detects transitions even when policies appear identical in terms of regret.

Authors
Chris Elliott¹, Einar Urdshals¹, David Quarel¹, Matthew Farrugia-Roberts², Daniel Murfet¹
¹Timaeus · ²University of Oxford
Published
January 12, 2026

Build on our work

Our tools for susceptibilities, local learning coefficients, and SGMCMC sampling are open source in the devinterp library.
