Part of Advances in Neural Information Processing Systems 12 (NIPS 1999)
Samuel Choi, Dit-Yan Yeung, Nevin Zhang
Reinforcement learning in nonstationary environments is generally regarded as an important yet difficult problem. This paper partially addresses the problem by formalizing a subclass of nonstationary environments. The environment model, called the hidden-mode Markov decision process (HM-MDP), assumes that environmental changes are always confined to a small number of hidden modes. A mode basically indexes a Markov decision process (MDP) and evolves with time according to a Markov chain. While the HM-MDP is a special case of the partially observable Markov decision process (POMDP), modeling an HM-MDP environment via the more general POMDP model unnecessarily increases the problem complexity. A variant of the Baum-Welch algorithm is developed for model learning; it requires less data and time than learning via the general POMDP route.
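To make the model structure concrete, below is a minimal Python sketch of an HM-MDP as the abstract describes it: a hidden mode that indexes one of several MDPs sharing the same states and actions, and that evolves according to its own Markov chain. It also includes a single forward-filtering step over modes, the kind of computation that underlies Baum-Welch-style model learning. All names (HiddenModeMDP, mode_chain, update_mode_belief) and the specific numbers (e.g. the 0.95 mode self-transition probability) are illustrative assumptions, not the paper's notation or results.

import numpy as np

# Illustrative sketch only: class and field names, and the near-diagonal
# mode chain, are assumptions for demonstration, not taken from the paper.

class HiddenModeMDP:
    """A hidden mode indexes one of several MDPs sharing the same state
    and action spaces; the mode evolves according to a Markov chain."""

    def __init__(self, n_modes, n_states, n_actions, seed=0):
        self.rng = np.random.default_rng(seed)
        # Mode Markov chain: assumed strongly diagonal, so the environment
        # dwells in a mode for a while before switching.
        off_diag = 0.05 / (n_modes - 1)
        self.mode_chain = np.full((n_modes, n_modes), off_diag)
        np.fill_diagonal(self.mode_chain, 0.95)
        # One transition model P[m, a, s, s'] and reward R[m, s, a] per mode;
        # random parameters stand in for a real problem specification.
        self.P = self.rng.dirichlet(np.ones(n_states),
                                    size=(n_modes, n_actions, n_states))
        self.R = self.rng.standard_normal((n_modes, n_states, n_actions))
        self.mode, self.state = 0, 0

    def step(self, action):
        """Reward and next state come from the current mode's MDP; the
        hidden mode then drifts one step along its Markov chain."""
        reward = self.R[self.mode, self.state, action]
        n_states = self.P.shape[-1]
        self.state = self.rng.choice(n_states,
                                     p=self.P[self.mode, action, self.state])
        n_modes = self.mode_chain.shape[0]
        self.mode = self.rng.choice(n_modes, p=self.mode_chain[self.mode])
        return self.state, reward


def update_mode_belief(belief, P, mode_chain, s, action, s_next):
    """One forward-algorithm step over hidden modes: predict the mode
    drift, then reweight by how well each mode explains (s, a, s')."""
    predicted = mode_chain.T @ belief          # sum_m' A(m', m) b(m')
    likelihood = P[:, action, s, s_next]       # P(s' | s, a, mode) per mode
    posterior = likelihood * predicted
    return posterior / posterior.sum()

The key point the sketch illustrates is why the HM-MDP is cheaper than a generic POMDP encoding: only the mode is hidden, so belief tracking and learning operate over the small mode set rather than over the full joint hidden state a POMDP formulation would require.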