This paper theoretically justifies the phenomenon that deep learning generalizes well when a large learning rate is used in the early stage of training. To do so, the paper considers a fairly simple problem setting and shows that a two-layer neural network generalizes better when it is trained first with a large learning rate that is later annealed than when it is trained with a small learning rate throughout. This claim is supported by numerical experiments on CIFAR-10.

This is an interesting paper that gives rigorous insight into a well-known phenomenon. It could open up a new line of research on this topic that many researchers would follow.
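To make the compared schedules concrete, a minimal PyTorch sketch of the two regimes is given below; the network width, learning-rate values, and decay epoch are illustrative assumptions, not the paper's actual choices.

```python
import torch
import torch.nn as nn

# Toy two-layer network; the input size matches CIFAR-10, but the
# width (512) is an illustrative choice, not the paper's setting.
def make_model():
    return nn.Sequential(nn.Flatten(),
                         nn.Linear(3 * 32 * 32, 512),
                         nn.ReLU(),
                         nn.Linear(512, 10))

# Schedule A: large initial learning rate, annealed partway through training.
model_a = make_model()
opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1)
sched_a = torch.optim.lr_scheduler.MultiStepLR(opt_a, milestones=[30], gamma=0.1)

# Schedule B: small learning rate kept constant throughout.
model_b = make_model()
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.01)

for epoch in range(60):
    # ... one pass over CIFAR-10 with each optimizer would go here ...
    sched_a.step()  # after epoch 30, schedule A's lr drops from 0.1 to 0.01
```

The paper's result is that schedule A ends up generalizing better than schedule B, even though B uses the small learning rate from the start.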