Part of Advances in Neural Information Processing Systems 18 (NIPS 2005)
Yaakov Engel, Peter Szabo, Dmitry Volkinshtein
The Octopus arm is a highly versatile and complex limb. How the Octopus controls such a hyper-redundant arm (not to mention eight of them!) is as yet unknown. Robotic arms based on the same mechanical principles may render present day robotic arms obsolete. In this paper, we tackle this control problem using an online reinforcement learning algorithm, based on a Bayesian approach to policy evaluation known as Gaussian process temporal difference (GPTD) learning. Our substitute for the real arm is a computer simulation of a 2-dimensional model of an Octopus arm. Even with the simplifications inherent to this model, the state space we face is a high-dimensional one. We apply a GPTD-based algorithm to this domain, and demonstrate its operation on several learning tasks of varying degrees of difficulty.