Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

Yu, Jin; Aberdeen, Douglas; Schraudolph, Nicol

Part of Advances in Neural Information Processing Systems 18 (NIPS 2005)

Authors

Jin Yu, Douglas Aberdeen, Nicol N. Schraudolph

Abstract

Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms outperform previously employed online stochastic, offline conjugate, and natural policy gradient methods.
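
To illustrate what gain vector adaptation means, the following is a minimal sketch of a stochastic meta-descent (SMD) style update. It is not the paper's policy gradient setup. It assumes a toy minimization problem (a diagonal quadratic, chosen here so that the Hessian-vector product is exact and trivial), and the function names, the meta-gain `mu`, the decay `lam`, and the initial gain `gain0` are all illustrative choices rather than values from the paper. Each parameter keeps its own gain, which grows while successive gradients agree and is cut by at most half per step when they disagree.

```python
import numpy as np


def loss(curvatures, theta):
    """Toy objective (an assumption): f(theta) = 0.5 * sum(a_i * theta_i**2)."""
    return 0.5 * float(np.sum(curvatures * theta ** 2))


def smd_minimize(curvatures, theta0, gain0=0.01, mu=0.1, lam=1.0, steps=500):
    """Minimize the toy quadratic with per-parameter gains adapted SMD-style.

    All hyperparameter defaults are illustrative assumptions.
    """
    a = np.asarray(curvatures, dtype=float)
    theta = np.asarray(theta0, dtype=float).copy()
    gains = np.full_like(theta, gain0)  # one gain (step size) per parameter
    v = np.zeros_like(theta)            # tracks how theta depends on the log-gains

    for _ in range(steps):
        grad = a * theta                # exact gradient of the toy objective

        # Multiplicative gain update: grow a gain when grad and v point in
        # opposite directions, shrink it (by at most a factor of 2) otherwise.
        gains = gains * np.maximum(0.5, 1.0 - mu * grad * v)

        # Hessian-vector product. For this diagonal quadratic it is a * v;
        # a real problem would need a dedicated fast H*v routine instead.
        hv = a * v

        theta = theta - gains * grad                # gain-scaled gradient step
        v = lam * v - gains * (grad + lam * hv)     # update the auxiliary vector

    return theta, gains


if __name__ == "__main__":
    a = np.array([1.0, 10.0])      # ill-conditioned curvatures (assumed)
    start = np.array([1.0, 1.0])
    theta, gains = smd_minimize(a, start)
    print("initial loss:", loss(a, start))
    print("final loss:  ", loss(a, theta))
    print("final gains: ", gains)
```

The sketch uses the minimization form of the update. A reward-maximizing policy gradient method would flip the relevant signs and replace the exact gradient with a stochastic estimate.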
