Part of Advances in Neural Information Processing Systems 11 (NIPS 1998)
Christoph Neukirchen, Gerhard Rigoll
This paper introduces a method for regularization of HMM systems that avoids parameter overfitting caused by insufficient training data. Regularization is done by augmenting the EM training method with a penalty term that favors simple and smooth HMM systems. The penalty term is constructed as a mixture model of negative exponential distributions that is assumed to generate the state-dependent emission probabilities of the HMMs. This new method is a successful transfer of a well-known regularization approach for neural networks to the HMM domain and can be interpreted as a generalization of traditional state-tying for HMM systems. The effect of regularization is demonstrated on continuous speech recognition tasks by improving overfitted triphone models and by speaker adaptation with limited training data.
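The idea of a penalized training objective can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the mixture parameters (`weights`, `rates`), and the penalty weight `lam` are all hypothetical; the sketch only shows how a log penalty built from a mixture of exponential densities would be added to the data log-likelihood maximized by EM.

```python
import numpy as np

def exp_mixture_log_density(x, weights, rates):
    # Log density of a mixture of (negative) exponential distributions:
    # p(x) = sum_k w_k * a_k * exp(-a_k * x), evaluated elementwise on x.
    dens = sum(w * a * np.exp(-a * x) for w, a in zip(weights, rates))
    return np.log(dens)

def penalized_objective(log_likelihood, emission_probs, weights, rates, lam=0.1):
    # Augment the EM objective with a penalty favoring emission
    # probabilities that are likely under the exponential mixture.
    penalty = exp_mixture_log_density(emission_probs, weights, rates).sum()
    return log_likelihood + lam * penalty
```

In an EM-style training loop one would maximize `penalized_objective` instead of the plain log-likelihood, so that emission parameters are pulled toward configurations the mixture prior considers simple.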