Natural Gradient Descent
SLT suggests natural gradient descent should generalize worse. Can we formalize this with the RLCT?
Project Details
In natural gradient descent, the gradient of the cost function is pre-multiplied by the inverse of the Fisher information matrix (FIM), so the update takes the form θ ← θ − η F⁻¹ ∇L(θ). This transformation accounts for the curvature of the parameter space: it preconditions the gradient so that all directions in parameter space are treated equally.
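For concreteness, here is a minimal sketch of a single natural-gradient update in PyTorch. It uses the empirical Fisher (average of per-example gradient outer products) as a stand-in for the exact FIM, plus a damping term for numerical stability; the function name, the default hyperparameters, and the assumption that the model is small enough to materialize the full Fisher matrix are illustrative choices, not part of the project spec.

```python
# Minimal sketch of one natural-gradient step, assuming a small model where the
# full Fisher matrix fits in memory.  The empirical Fisher (average of
# per-example gradient outer products) stands in for the exact FIM.
import torch

def natural_gradient_step(model, loss_fn, x, y, lr=1e-2, damping=1e-3):
    params = [p for p in model.parameters() if p.requires_grad]

    # Ordinary gradient of the batch loss.
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])

    # Empirical Fisher: average outer product of per-example gradients.
    fisher = torch.zeros(flat_grad.numel(), flat_grad.numel())
    for xi, yi in zip(x, y):
        li = loss_fn(model(xi.unsqueeze(0)), yi.unsqueeze(0))
        gi = torch.cat([g.reshape(-1) for g in torch.autograd.grad(li, params)])
        fisher += torch.outer(gi, gi) / len(x)

    # Precondition the gradient: solve (F + damping*I) v = grad, i.e. v ≈ F^{-1} grad.
    identity = torch.eye(fisher.shape[0])
    precond = torch.linalg.solve(fisher + damping * identity, flat_grad)

    # Apply theta <- theta - lr * F^{-1} grad, slicing per parameter tensor.
    offset = 0
    with torch.no_grad():
        for p in params:
            n = p.numel()
            p -= lr * precond[offset:offset + n].reshape(p.shape)
            offset += n
    return loss.item()
```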
Intuitively, SLT suggests this should generalize worse. Is this true? You could try running natural gradient descent vs. SGD on the same model and comparing the complexity of the final solutions using the learning coefficient. Does the learning coefficient end up being higher for natural gradient descent? What does this say about generalization?
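As a rough starting point for that comparison, here is a self-contained sketch of an SGLD-based estimate of the local learning coefficient, in the spirit of λ̂ ≈ nβ(E_w[L_n(w)] − L_n(w*)) with β = 1/log n. The hyperparameters (sgld_lr, gamma, num_draws) are placeholders you would need to tune, and the function is an illustrative sketch rather than a reference implementation; the devinterp library provides more careful estimators.

```python
# Sketch of an SGLD-based estimate of the local learning coefficient (LLC):
#   lambda_hat ~= n * beta * (E_w[L_n(w)] - L_n(w*)),  beta = 1/log(n),
# where w* are the trained weights.  Hyperparameters are placeholders to tune.
import copy
import math
import torch

def estimate_llc(model, loss_fn, loader, n, sgld_lr=1e-5, gamma=100.0, num_draws=500):
    beta = 1.0 / math.log(n)
    w_star = [p.detach().clone() for p in model.parameters()]

    # Loss of the trained solution, averaged over the dataset: L_n(w*).
    with torch.no_grad():
        init_loss = sum(loss_fn(model(x), y).item() for x, y in loader) / len(loader)

    # Sample weights near w* with SGLD, tempered by n*beta and localized by gamma.
    sampler = copy.deepcopy(model)
    losses, data_iter = [], iter(loader)
    for _ in range(num_draws):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            x, y = next(data_iter)
        loss = loss_fn(sampler(x), y)
        grads = torch.autograd.grad(loss, list(sampler.parameters()))
        with torch.no_grad():
            for p, g, w0 in zip(sampler.parameters(), grads, w_star):
                drift = n * beta * g + gamma * (p - w0)  # tempered gradient + pull back to w*
                p -= 0.5 * sgld_lr * drift
                p += math.sqrt(sgld_lr) * torch.randn_like(p)
        losses.append(loss.item())

    expected_loss = sum(losses) / len(losses)
    return n * beta * (expected_loss - init_loss)
```

Training two copies of the same architecture, one with SGD and one with natural gradient descent, and calling estimate_llc on each final checkpoint would give the comparison the project asks about; the SLT intuition predicts a higher estimate for the natural-gradient run.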
Where to Begin
Before starting this project, we recommend familiarizing yourself with these resources:
Ready to contribute? Let us know in our Discord community. We'll update this listing so that other people interested in this project can find you.