Part of Advances in Neural Information Processing Systems 15 (NIPS 2002)
Ralf Schoknecht, Artur Merke
Convergence of iterative reinforcement learning algorithms like TD(0) depends on the sampling strategy for the transitions. However, in practical applications it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. Our main theorem yields sufficient conditions of convergence for combinations of reinforcement learning algorithms and linear function approximation. This allows one to analyse whether a certain reinforcement learning algorithm and a certain function approximator are compatible. For the combination of the residual gradient algorithm with grid-based linear interpolation we show that there exists a universal constant learning rate such that the iteration converges independently of the concrete transition data.
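To make the setting concrete, the following is a minimal sketch (not code from the paper) of a synchronous update over a fixed transition set with linear function approximation, covering both the TD(0) direction and the residual gradient direction mentioned in the abstract. All names and parameters (`phi`, `alpha`, `gamma`, the transition format) are illustrative assumptions.

```python
import numpy as np

def synchronous_update(theta, transitions, phi, alpha=0.1, gamma=0.9,
                       residual_gradient=False):
    """One synchronous update of the linear-FA weights theta.

    transitions: fixed list of (s, r, s_next) tuples reused on every sweep.
    phi: feature map returning a vector of the same dimension as theta.
    residual_gradient=False gives the TD(0) update direction;
    residual_gradient=True gives the residual gradient direction.
    """
    delta = np.zeros_like(theta)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        # Temporal-difference error under the current weights.
        td_error = r + gamma * (f_next @ theta) - f @ theta
        # TD(0) updates along phi(s); the residual gradient algorithm
        # follows the gradient of the squared Bellman residual instead.
        direction = f - gamma * f_next if residual_gradient else f
        delta += td_error * direction
    return theta + alpha * delta
```

Iterating `theta = synchronous_update(theta, transitions, phi)` is the repeated synchronous update the paper studies; whether the iteration converges, and for which constant learning rates, is exactly what the main theorem's conditions address.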