Higher Order Statistical Decorrelation without Information Loss

Part of Advances in Neural Information Processing Systems 7 (NIPS 1994)


Authors

Gustavo Deco, Wilfried Brauer

Abstract

A neural network learning paradigm based on information theory is proposed as a way to perform, in an unsupervised fashion, redundancy reduction among the elements of the output layer without loss of information from the sensory input. The model performs nonlinear decorrelation up to higher orders of the cumulant tensors and yields probabilistically independent components of the output layer. This means that no Gaussian distribution need be assumed, either at the input or at the output. The theory presented is related to the unsupervised-learning theory of Barlow, which proposes redundancy reduction as the goal of cognition. When nonlinear units are used, nonlinear principal component analysis is obtained; in this case nonlinear manifolds can be reduced to minimum-dimension manifolds. If linear units are used, the network performs a generalized principal component analysis in the sense that non-Gaussian distributions can be linearly decorrelated and higher orders of the correlation tensors are also taken into account. The basic structure of the architecture involves a general transformation that is volume-conserving and therefore conserves the entropy, yielding a map without loss of information. Minimization of the mutual information among the output neurons eliminates the redundancy between the outputs and results in statistical decorrelation of the extracted features. This is known as factorial learning.
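The abstract describes two ingredients: a volume-conserving (unit-Jacobian-determinant) transformation, which keeps the output entropy equal to the input entropy, and a redundancy penalty built from higher-order cross-cumulants of the outputs. The following is a minimal NumPy sketch of both ideas, not the authors' code: the additive coupling split used here is an assumed simplification of their triangular volume-conserving architecture, and the function names `coupling` and `cross_cumulant_penalty` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def coupling(x, W, b):
    """Volume-conserving map: the first half of the coordinates passes
    through unchanged; the second half is shifted by a nonlinear function
    of the first half. The Jacobian is unit triangular, so det J = 1 and
    the differential entropy of the signal is preserved."""
    d = x.shape[1] // 2
    x1, x2 = x[:, :d], x[:, d:]
    y2 = x2 + np.tanh(x1 @ W + b)        # additive, trivially invertible shift
    return np.concatenate([x1, y2], axis=1)

def cross_cumulant_penalty(y):
    """Redundancy measure: squared off-diagonal cross-cumulants of the
    centered outputs up to order 4. It vanishes only when the sampled
    cumulant tensors have no cross terms, i.e. the outputs are
    statistically decorrelated up to fourth order."""
    z = y - y.mean(axis=0)
    n, d = z.shape
    c2 = z.T @ z / n                                   # covariance matrix
    pen = np.sum(np.triu(c2, k=1) ** 2)                # 2nd-order cross terms
    for i in range(d):
        for j in range(d):
            if i == j:
                continue
            # 3rd-order cross-cumulant of centered variables: E[z_i^2 z_j]
            pen += np.mean(z[:, i] ** 2 * z[:, j]) ** 2
            # 4th-order cross-cumulant: E[z_i^3 z_j] - 3 E[z_i^2] E[z_i z_j]
            pen += (np.mean(z[:, i] ** 3 * z[:, j])
                    - 3 * c2[i, i] * c2[i, j]) ** 2
    return pen

# Toy usage: inputs with injected higher-order dependence.
x = rng.normal(size=(1000, 4))
x[:, 2:] += 0.8 * x[:, :2] ** 2
W = rng.normal(scale=0.1, size=(2, 2))
b = np.zeros(2)
y = coupling(x, W, b)
print("redundancy penalty:", cross_cumulant_penalty(y))
```

Minimizing such a penalty with respect to the coupling parameters (by gradient descent, for instance) is one way to realize the paper's objective: because the map conserves volume, no information about the input can be discarded, so reducing the cross-cumulants only removes redundancy between the output components rather than compressing the signal away.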
