Optimizers

Jihong on June 27, 2017

Training a neural network with gradient descent is hard because it is a non-convex optimization problem: with an inappropriate update strategy, the updates can get trapped at so-called saddle points or on plateaus. In this post, we talk about the common problems of weight updates and the state-of-the-art solutions.
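
To fix the baseline that all the variants below modify, here is a minimal sketch of the vanilla gradient-descent update; the quadratic loss and its gradient are made-up stand-ins for a real network's loss.

```python
import numpy as np

# Toy objective: a quadratic bowl standing in for a real loss surface.
def loss(w):
    return 0.5 * np.sum(w ** 2)

def grad(w):
    return w  # gradient of the quadratic above

# Vanilla gradient descent: step against the gradient at a fixed rate.
w = np.array([3.0, -2.0])
lr = 0.1
for step in range(100):
    w -= lr * grad(w)

print(loss(w))  # shrinks toward 0 as w approaches the minimum
```

On a convex bowl like this, a fixed learning rate is enough; the optimizers below exist because real loss surfaces are not this friendly.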

Table of Contents

SGD

Momentum

Nesterov Momentum

Adagrad

RMSProp

AdaDelta

Adam