Singular Learning Theory

Natural Gradient Descent

SLT suggests natural gradient descent should generalize worse than SGD. Can we formalize this with the RLCT?

Project Details

Status: In-progress
Difficulty: Easy
Type: Applied

Team & Contact

Lead: Moosa & Zach
Discord: cxtraa

Tags

slt

In natural gradient descent, the gradient of the cost function is pre-multiplied by the inverse of the Fisher information matrix (FIM). This transforms the gradient to account for the curvature of the parameter space, preconditioning updates so that all directions in parameter space are treated equally.
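As a concrete illustration, here is a minimal sketch of a single natural gradient step on a toy PyTorch model, using a damped empirical Fisher information matrix (per-example gradient outer products). The model, data, and hyperparameters are placeholders and not part of the project.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(2, 1)            # tiny model so the full FIM is tractable
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 2), torch.randn(32, 1)

params = [p for p in model.parameters() if p.requires_grad]
loss = loss_fn(model(x), y)
grads = torch.autograd.grad(loss, params)
g = torch.cat([gr.reshape(-1) for gr in grads])   # flattened batch gradient

# Empirical Fisher: average of per-example gradient outer products.
dim = g.numel()
fisher = torch.zeros(dim, dim)
for xi, yi in zip(x, y):
    li = loss_fn(model(xi.unsqueeze(0)), yi.unsqueeze(0))
    gi = torch.cat([gr.reshape(-1) for gr in torch.autograd.grad(li, params)])
    fisher += torch.outer(gi, gi) / x.shape[0]

# Precondition the gradient: solve (F + damping * I) v = g rather than inverting F.
damping, lr = 1e-3, 1e-1
natural_grad = torch.linalg.solve(fisher + damping * torch.eye(dim), g)

# Apply the preconditioned update in place.
with torch.no_grad():
    offset = 0
    for p in params:
        n = p.numel()
        p -= lr * natural_grad[offset:offset + n].view_as(p)
        offset += n
```

In practice one would use a scalable approximation (e.g. block-diagonal or Kronecker-factored) rather than the full Fisher, but the toy version above makes the preconditioning explicit.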

Intuitively, SLT suggests this should generalize worse. Is this true? You could try running natural gradient descent vs. SGD on a model and comparing their final model complexities using the learning coefficient. Does the learning coefficient end up higher for natural gradient descent? What does this say about generalization?
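One rough way to set up this comparison is to train two copies of the same architecture, one with SGD and one with natural gradient descent, and then estimate the local learning coefficient (LLC) of each trained model via SGLD sampling, as in the estimator of Lau et al. (2023). The sketch below is a minimal from-scratch version under the toy setup of the previous snippet; the hyperparameters are illustrative, and in practice you would likely use a maintained estimator such as the devinterp library.

```python
import math
import copy
import torch

def estimate_llc(model, loss_fn, x, y, n_steps=2000, eps=1e-4, gamma=100.0):
    """SGLD estimate of the local learning coefficient around the trained parameters w*."""
    n = x.shape[0]
    beta = 1.0 / math.log(n)                       # inverse temperature, beta* = 1 / log n
    sampler = copy.deepcopy(model)                 # chain starts at w*
    w_star = [p.detach().clone() for p in model.parameters()]
    with torch.no_grad():
        loss_star = loss_fn(model(x), y).item()    # L_n(w*)

    running_loss = 0.0
    for _ in range(n_steps):
        loss = loss_fn(sampler(x), y)
        grads = torch.autograd.grad(loss, list(sampler.parameters()))
        with torch.no_grad():
            for p, g, w0 in zip(sampler.parameters(), grads, w_star):
                # Drift of the localized, tempered posterior: n*beta*grad L + gamma*(w - w*).
                drift = n * beta * g + gamma * (p - w0)
                p -= 0.5 * eps * drift
                p += math.sqrt(eps) * torch.randn_like(p)   # Langevin noise
        running_loss += loss.item()

    avg_loss = running_loss / n_steps
    # LLC estimate: n * beta * (E_w[L_n(w)] - L_n(w*)).
    return n * beta * (avg_loss - loss_star)

# After training two copies of the same architecture, one with SGD and one with
# natural gradient descent (see the step above), compare their estimated LLCs:
# llc_sgd = estimate_llc(model_sgd, loss_fn, x, y)
# llc_ngd = estimate_llc(model_ngd, loss_fn, x, y)
```

A higher estimated LLC for the natural-gradient-trained model would be evidence for the intuition that it settles in a less degenerate, more complex region of parameter space.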

Where to Begin

Before starting this project, we recommend familiarizing yourself with these resources:

Ready to contribute? Let us know in our Discord community. We'll update this listing so that other people interested in this project can find you.