Natural Gradient Descent
SLT suggests natural gradient descent should generalize worse than SGD. Can we formalize this with the RLCT?
In natural gradient descent, the gradient of the cost function is pre-multiplied by the inverse of the Fisher information matrix (FIM). This preconditioning accounts for the curvature of the parameter space, so that all directions in parameter space are treated equally.
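For concreteness, here is a minimal sketch (in PyTorch) of a single natural-gradient step on a toy model, using the damped empirical Fisher as a tractable stand-in for the true FIM. The model, data, and hyperparameters (`damping`, `lr`) are illustrative placeholders, not part of the project itself.

```python
# Minimal sketch: one natural-gradient step with a damped empirical Fisher.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and data (illustrative).
model = nn.Linear(4, 1)
x = torch.randn(64, 4)
y = torch.randn(64, 1)
loss_fn = nn.MSELoss()

params = [p for p in model.parameters() if p.requires_grad]
n_params = sum(p.numel() for p in params)

def flat_grad(loss):
    """Gradient of `loss` w.r.t. the model parameters, as one flat vector."""
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

# Empirical Fisher: average of per-example gradient outer products.
fisher = torch.zeros(n_params, n_params)
for i in range(x.shape[0]):
    g_i = flat_grad(loss_fn(model(x[i : i + 1]), y[i : i + 1]))
    fisher += torch.outer(g_i, g_i)
fisher /= x.shape[0]

# Full-batch gradient of the cost function.
grad = flat_grad(loss_fn(model(x), y))

# Natural gradient: precondition by the (damped) inverse Fisher.
damping = 1e-3
natural_grad = torch.linalg.solve(fisher + damping * torch.eye(n_params), grad)

# Apply one natural-gradient step.
lr = 0.1
with torch.no_grad():
    offset = 0
    for p in params:
        n = p.numel()
        p -= lr * natural_grad[offset : offset + n].reshape(p.shape)
        offset += n
```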
Intuitively, SLT suggests this should generalize worse: SLT ties good generalization to highly degenerate (low-RLCT) solutions, and by treating all directions equally, natural gradient preconditioning may wash out the optimizer's bias toward such solutions. Is this true? You could try running natural gradient descent vs. SGD on a model and comparing the complexity of the final solutions using the learning coefficient. Does the learning coefficient end up being higher for natural gradient descent? What does this say about generalization?
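As a starting point, here is a hedged sketch of one way to estimate the local learning coefficient of a trained checkpoint with SGLD, using the WBIC-style estimator lambda_hat = n * beta * (E[L_n(w)] - L_n(w*)) with beta = 1 / log(n) and a localizing term around w*. The function name `estimate_llc` and all hyperparameters (`epsilon`, `gamma`, chain length) are assumptions made for illustration; in practice an off-the-shelf estimator such as the devinterp package may serve you better than rolling your own.

```python
# Hedged sketch: local learning coefficient estimation via localized SGLD.
import copy
import math
import torch

def estimate_llc(model, data_loader, loss_fn, n_samples,
                 epsilon=1e-5, gamma=100.0, n_steps=500, burn_in=100):
    """Estimate the local learning coefficient at the model's current parameters w*."""
    beta = 1.0 / math.log(n_samples)
    anchor = [p.detach().clone() for p in model.parameters()]

    # Average loss at the trained parameters w*.
    model.eval()
    with torch.no_grad():
        base_loss = sum(loss_fn(model(xb), yb).item()
                        for xb, yb in data_loader) / len(data_loader)

    # Run a tempered, localized SGLD chain starting from w*.
    sampler = copy.deepcopy(model)
    losses = []
    data_iter = iter(data_loader)
    for step in range(n_steps):
        try:
            xb, yb = next(data_iter)
        except StopIteration:
            data_iter = iter(data_loader)
            xb, yb = next(data_iter)

        loss = loss_fn(sampler(xb), yb)
        grads = torch.autograd.grad(loss, list(sampler.parameters()))

        with torch.no_grad():
            for p, g, a in zip(sampler.parameters(), grads, anchor):
                # Tempered log-likelihood gradient + localization + Gaussian noise.
                drift = n_samples * beta * g + gamma * (p - a)
                p -= 0.5 * epsilon * drift
                p += math.sqrt(epsilon) * torch.randn_like(p)

        if step >= burn_in:
            losses.append(loss.item())

    mean_loss = sum(losses) / len(losses)
    return n_samples * beta * (mean_loss - base_loss)
```

Running this on an SGD-trained checkpoint and a natural-gradient-trained checkpoint of the same architecture and dataset would give the comparison the project asks about.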
Where to begin:
If you have decided to start working on this, please let us know in the Discord. We'll update this listing so that other people who are interested in this project can find you.