Part of Advances in Neural Information Processing Systems 3 (NIPS 1990)
Yves Chauvin
For a simple linear case, a mathematical analysis of the training and generalization (validation) performance of networks trained by gradient descent on a Least Mean Square cost function is provided as a function of the learning parameters and of the statistics of the training data base. The analysis predicts that generalization error dynamics are very dependent on the a priori initial weights. In particular, the generalization error might sometimes weave within a computable range during extended training. In some cases, the analysis provides bounds on the optimal number of training cycles for minimal validation error. For a speech labeling task, predicted weaving effects were qualitatively tested and observed by computer simulations in networks trained by the linear and non-linear back-propagation algorithm.
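As a rough illustration of the setting the abstract describes (not the paper's actual analysis, data, or parameters), the sketch below trains a linear network by batch gradient descent on an LMS cost for a synthetic regression task and records the validation error at every epoch, so that its non-monotone behaviour and the empirically optimal number of training cycles can be read off. All dimensions, the noise level, the learning rate, and the initial weights are arbitrary assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear teacher: y = X w* + noise (an illustrative stand-in for a
# training data base; sizes and noise level are arbitrary choices).
n_train, n_val, d = 50, 200, 20
w_star = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
X_val = rng.normal(size=(n_val, d))
y_train = X_train @ w_star + 0.5 * rng.normal(size=n_train)
y_val = X_val @ w_star + 0.5 * rng.normal(size=n_val)

# Linear network trained by batch gradient descent on the LMS (mean squared
# error) cost, starting from non-zero initial weights (which, per the
# abstract, strongly influence the generalization error dynamics).
w = rng.normal(size=d)
lr = 0.01
n_epochs = 2000

val_errors = []
for epoch in range(n_epochs):
    # Gradient of the mean squared training error with respect to the weights.
    grad = X_train.T @ (X_train @ w - y_train) / n_train
    w -= lr * grad
    val_errors.append(np.mean((X_val @ w - y_val) ** 2))

# The validation error need not decrease monotonically; its minimum over
# epochs gives an empirical "optimal number of training cycles" for this run.
best_epoch = int(np.argmin(val_errors))
print(f"best validation error {val_errors[best_epoch]:.4f} at epoch {best_epoch}")
```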