SGD vs. Bayes in Toy Landscapes

Lead: Guillaume Corlouer (guillaume5439 on Discord)

SLT is about Bayesian learning. What can it say about SGD?

Type: Applied
Difficulty: Hard
Status: In-progress

Update

See Corlouer and Macé’s update here.


SLT is a theory about Bayesian learning. The transitions we encounter are “quasistatic”: they have no explicit time dependence and instead involve an equilibrium distribution (the posterior) changing as a function of the number of samples. What does this say about SGD, which is non-equilibrium and inherently dynamic?

Chen et al. (2023) examine one possible link, the Bayesian Antecedent Hypothesis (BAH): that dynamical transitions in SGD are “backed” by an underlying Bayesian transition. We don’t hold this hypothesis particularly strongly, and it would be interesting to look for violations.

One way to explore this question is to investigate the differences between these learning processes in toy settings. The person to talk to about this is Guillaume Corlouer (guillaume5439 on the Discord).
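
As a rough illustration of what such a toy comparison might look like, here is a minimal sketch in Python. Everything in it is an assumption made for illustration and is not taken from the project or from Chen et al.: the 1D potential `loss` (a regular minimum at w = -1 and a more degenerate one at w = +1), the flat prior on [-3, 3], the Gaussian gradient noise standing in for minibatch noise, and all hyperparameters. It compares the posterior mass that a Gibbs posterior exp(-n L(w)) places near the degenerate minimum with the fraction of noisy gradient-descent runs that end up there.

```python
import numpy as np

def loss(w):
    # Hypothetical toy potential: quadratic (regular) minimum at w = -1,
    # quartic (more degenerate) minimum at w = +1. Both have loss 0.
    return (w + 1.0) ** 2 * (w - 1.0) ** 4

def grad(w):
    # Analytic derivative of loss(w).
    return 2.0 * (w + 1.0) * (w - 1.0) ** 4 + 4.0 * (w + 1.0) ** 2 * (w - 1.0) ** 3

def posterior_mass_right(n, lo=-3.0, hi=3.0, points=20001):
    # Gibbs posterior proportional to exp(-n * loss(w)) with a flat prior
    # on [lo, hi], normalised numerically on a uniform grid.
    grid = np.linspace(lo, hi, points)
    log_p = -n * loss(grid)
    p = np.exp(log_p - log_p.max())
    p /= p.sum()
    return float(p[grid > 0].sum())

def sgd_fraction_right(steps=2000, lr=0.005, noise=0.5, runs=500, seed=0):
    # Noisy gradient descent as a crude stand-in for SGD: the additive
    # Gaussian noise is an assumption, not a model of real minibatch noise.
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1.5, 1.5, size=runs)
    for _ in range(steps):
        g = grad(w) + noise * rng.standard_normal(runs)
        w = np.clip(w - lr * g, -3.0, 3.0)  # clip as a safety guard
    return float(np.mean(w > 0))

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(f"n={n:>5}  posterior mass near w=+1: {posterior_mass_right(n):.3f}")
    print(f"fraction of SGD runs ending near w=+1: {sgd_fraction_right():.3f}")
```

In this sketch the posterior increasingly favours the degenerate minimum as n grows, while at low noise the SGD runs mostly stay in whichever basin they started in. Whether and how that gap closes as noise, step size, or the landscape changes is the kind of question this project is about.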

Where to begin:

If you have decided to start working on this, please let us know in the Discord. We'll update this listing so that other people who are interested in this project can find you.