Human Reading and the Curse of Dimensionality

Part of Advances in Neural Information Processing Systems 8 (NIPS 1995)

Authors

Gale Martin

Abstract

Whereas optical character recognition (OCR) systems learn to classify single characters, people learn to classify long character strings in parallel, within a single fixation. This difference is surprising because high dimensionality is associated with poor classification learning. This paper suggests that the human reading system avoids these problems because the number of to-be-classified images is reduced by consistent and optimal eye fixation positions, and by character sequence regularities.

An interesting difference exists between human reading and optical character recognition (OCR) systems. The input/output dimensionality of character classification in human reading is much greater than that for OCR systems (see Figure 1). OCR systems classify one character at a time, while the human reading system classifies as many as 8-13 characters per eye fixation (Rayner, 1979), and within a fixation, character category and sequence information is extracted in parallel (Blanchard, McConkie, Zola, and Wolverton, 1984; Reicher, 1969).
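
As a rough illustration of the dimensionality gap described above, the sketch below counts how many distinct output strings exist when 1, 8, or 13 characters are classified at once. The 26-letter alphabet and the function name `num_string_classes` are assumptions made for illustration and do not come from the paper. The count treats every letter sequence as equally possible, so it is only an upper bound; it does not model the sequence regularities the paper argues reduce this number.

```python
# Assumption for illustration: a 26-letter alphabet, ignoring case,
# digits, punctuation, and spaces.
ALPHABET_SIZE = 26


def num_string_classes(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Return the number of distinct character strings of the given length."""
    return alphabet_size ** length


# One character at a time (OCR) versus 8-13 characters per fixation (reading).
for n in (1, 8, 13):
    print(f"{n:>2} characters: {num_string_classes(n):.3e} possible strings")
```

Under these assumptions the output space grows exponentially with the number of characters classified in parallel, which is why the high dimensionality of human reading is surprising.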

[Figure 1, OCR panel (low dimensionality): the input text "Dorothy lived in the ...." with single-character outputs "D", "o".]