Optimizers - Jihong Ju

Training a neural network with gradient descent is not easy, because the loss is non-convex. With an inappropriate update strategy, the updates can get trapped at saddle points or stall on plateaus. In this post, we are going to talk about the common problems of weight updates and the state-of-the-art solutions.
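
To make the saddle-point problem concrete, here is a minimal sketch on a toy function rather than a real network. It assumes the surface f(x, y) = x² − y², which has a saddle at (0, 0); the function names, the learning rate of 0.1, and the 100 steps are all illustrative choices, not taken from any library.

```python
def grad(x, y):
    # Gradient of the toy surface f(x, y) = x**2 - y**2 (assumed for illustration).
    # The origin is a saddle: a minimum along x, a maximum along y.
    return 2.0 * x, -2.0 * y


def gradient_descent(x, y, lr=0.1, steps=100):
    # Plain gradient descent: step against the gradient with a fixed learning rate.
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y


# Starting exactly on the x-axis, the y-gradient is zero at every step,
# so the iterate slides into the saddle and stays there.
stuck = gradient_descent(1.0, 0.0)

# A tiny offset in y is amplified at every step, so the iterate
# eventually moves away from the saddle.
escaped = gradient_descent(1.0, 1e-8)

print("start on axis:  ", stuck)
print("start off axis: ", escaped)
```

In the first run the gradient vanishes near the origin even though it is not a minimum, which is the kind of stall described above. The second run only gets away because of the small offset, and it takes many steps to do so. Since this toy surface is unbounded below, "escaping" here just means moving off along y.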