SGD vs. Bayes in Toy Landscapes
SLT is about Bayesian learning. What can it say about SGD?
Project Details
Update
See Corlouer and Macé’s update here.
SLT is a theory about Bayesian learning. The transitions we encounter are “quasistatic”: they don’t really involve a time aspect and instead involve an equilibrium distribution changing as a function of the number of samples. What does this say about SGD, which is non-equilibrium and inherently dynamic?
Chen et al. 2023 look at one possible link in terms of the Bayesian Antecedent Hypothesis (BAH): that dynamical transitions in SGD are “backed” by an underlying Bayesian transition. We don’t hold this hypothesis particularly strongly, and it would be interesting to look for violations.
One way to explore this question is to investigate the differences between these learning processes in toy settings. The person to talk to about this is Guillaume Corlouer (guillaume5439 in the Discord).
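As a rough starting point, such a comparison could be sketched as below. This is a minimal illustration under stated assumptions, not the setup used in the work cited above: the one-dimensional potential, the flat prior on [-2, 2], the Gaussian gradient noise standing in for minibatch noise, and all function names and hyperparameters are hypothetical choices. The potential has two global minima with zero loss, a quadratic one at w = -1 and a flatter, more degenerate quartic one at w = +1, so the two learning processes can be compared by how much weight each puts on the degenerate minimum.

```python
import math
import random


def loss(w):
    # Hypothetical toy potential: zero loss at w = -1 (quadratic minimum)
    # and at w = +1 (quartic, more degenerate minimum).
    return (w + 1.0) ** 2 * (w - 1.0) ** 4


def grad(w):
    # Analytic derivative of loss(w) via the product rule.
    return 2.0 * (w + 1.0) * (w - 1.0) ** 4 + 4.0 * (w + 1.0) ** 2 * (w - 1.0) ** 3


def posterior_mass_right(n, lo=-2.0, hi=2.0, num=4001):
    """Mass on w > 0 of p(w) proportional to exp(-n * loss(w)).

    Assumes a flat prior on [lo, hi] and approximates the integrals
    with a Riemann sum on an evenly spaced grid.
    """
    step = (hi - lo) / (num - 1)
    total = 0.0
    right = 0.0
    for i in range(num):
        w = lo + i * step
        p = math.exp(-n * loss(w))
        total += p
        if w > 0.0:
            right += p
    return right / total


def sgd_fraction_right(steps=1000, lr=0.005, noise=0.5, runs=100, seed=0):
    """Fraction of noisy gradient descent runs that end at w > 0.

    Assumption: minibatch noise is modelled as additive Gaussian noise
    on the exact gradient; iterates are clipped to [-2, 2].
    """
    rng = random.Random(seed)
    right = 0
    for _ in range(runs):
        w = rng.uniform(-1.5, 1.5)
        for _ in range(steps):
            g = grad(w) + noise * rng.gauss(0.0, 1.0)
            w = max(-2.0, min(2.0, w - lr * g))
        if w > 0.0:
            right += 1
    return right / runs


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(f"n={n:>6}  posterior mass on w>0: {posterior_mass_right(n):.3f}")
    print(f"SGD runs ending at w>0: {sgd_fraction_right():.3f}")
```

In this sketch the Bayesian quantity depends only on the number of samples n, whereas the SGD quantity depends on initialization, learning rate, noise scale, and training time. Sweeping those settings and checking where the two disagree is one way to look for the kind of differences discussed above.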
Where to Begin
Before starting this project, we recommend familiarizing yourself with these resources:
Ready to contribute? Let us know in our Discord community. We'll update this listing so that other people interested in this project can find you.