Predictive Representations of State

Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)

Bibtex Metadata Paper

Authors

Michael L. Littman, Richard S. Sutton

Abstract

We show that states of a dynamical system can be usefully repre(cid:173) sented by multi-step, action-conditional predictions of future ob(cid:173) servations. State representations that are grounded in data in this way may be easier to learn, generalize better, and be less depen(cid:173) dent on accurate prior models than, for example, POMDP state representations. Building on prior work by Jaeger and by Rivest and Schapire, in this paper we compare and contrast a linear spe(cid:173) cialization of the predictive approach with the state representa(cid:173) tions used in POMDPs and in k-order Markov models. Ours is the first specific formulation of the predictive idea that includes both stochasticity and actions (controls). We show that any system has a linear predictive state representation with number of predictions no greater than the number of states in its minimal POMDP model.

In predicting or controlling a sequence of observations, the concepts of state and state estimation inevitably arise. There have been two dominant approaches. The generative-model approach, typified by research on partially observable Markov de(cid:173) cision processes (POMDPs), hypothesizes a structure for generating observations and estimates its state and state dynamics. The history-based approach, typified by k-order Markov methods, uses simple functions of past observations as state, that is, as the immediate basis for prediction and control. (The data flow in these two ap(cid:173) proaches are diagrammed in Figure 1.) Of the two, the generative-model approach is more general. The model's internal state gives it temporally unlimited memory(cid:173) the ability to remember an event that happened arbitrarily long ago--whereas a history-based approach can only remember as far back as its history extends. The bane of generative-model approaches is that they are often strongly dependent on a good model of the system's dynamics. Most uses of POMDPs, for example, assume a perfect dynamics model and attempt only to estimate state. There are algorithms for simultaneously estimating state and dynamics (e.g., Chrisman, 1992), analogous to the Baum-Welch algorithm for the uncontrolled case (Baum et al., 1970), but these are only effective at tuning parameters that are already approximately cor(cid:173) rect (e.g., Shatkay & Kaelbling, 1997).

observations (and actions)