Part of Advances in Neural Information Processing Systems 6 (NIPS 1993)
Richard Zemel, Geoffrey E. Hinton
The Minimum Description Length principle (MDL) can be used to train the hidden units of a neural network to extract a representation that is cheap to describe but nonetheless allows the input to be reconstructed accurately. We show how MDL can be used to develop highly redundant population codes. Each hidden unit has a location in a low-dimensional implicit space. If the hidden unit activities form a bump of a standard shape in this space, they can be cheaply encoded by the center of this bump. So the weights from the input units to the hidden units in an autoencoder are trained to make the activities form a standard bump. The coordinates of the hidden units in the implicit space are also learned, thus allowing flexibility, as the network develops a discontinuous topography when presented with different input classes. Population coding in a space other than the input space enables a network to extract nonlinear higher-order properties of the inputs.
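The bump-coding idea can be pictured with a small numerical sketch. Assuming a one-dimensional implicit space and a Gaussian as the standard bump shape, a population of hidden activities is summarized cheaply by the center of the best-fitting bump; the unit locations, bump width, and grid search over candidate centers below are illustrative assumptions, not the authors' training procedure.

```python
# Minimal sketch (not the paper's code): hidden units sit at implicit-space
# coordinates; their activity vector is summarized by the best-fitting bump.
import numpy as np

rng = np.random.default_rng(0)

n_hidden = 20
# Implicit-space coordinates of the hidden units (learned in the paper; random here).
locations = np.sort(rng.uniform(0.0, 1.0, n_hidden))
sigma = 0.1  # assumed width of the standard bump shape

def bump(center):
    """Standard-shape (Gaussian) activity pattern centered at `center`."""
    return np.exp(-0.5 * ((locations - center) / sigma) ** 2)

# Suppose the encoder produced these (noisy) hidden activities for one input.
true_center = 0.6
activities = bump(true_center) + 0.05 * rng.standard_normal(n_hidden)

# Cheap description: transmit only the bump center that best matches the activities.
candidate_centers = np.linspace(0.0, 1.0, 201)
errors = [np.sum((activities - bump(c)) ** 2) for c in candidate_centers]
best_center = candidate_centers[int(np.argmin(errors))]

print(f"encoded center: {best_center:.3f} (true center: {true_center})")
# The residual mismatch between `activities` and bump(best_center) is the kind of
# deviation the bit-cost terms of the MDL objective would penalize.
```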
Most existing unsupervised learning algorithms can be understood using the Minimum Description Length (MDL) principle (Rissanen, 1989). Given an ensemble of input vectors, the aim of the learning algorithm is to find a method of coding each input vector that minimizes the total cost, in bits, of communicating the input vectors to a receiver. There are three terms in the total description length:
• The code-cost is the number of bits required to communicate the code that the algorithm assigns to each input vector.