# Publications of Freek Stulp

## Robot Skill Learning: From Reinforcement Learning to Evolution Strategies

Freek Stulp and Olivier Sigaud. *Robot Skill Learning: From Reinforcement Learning to Evolution Strategies.* Paladyn. Journal of Behavioral Robotics, 4(1):49–61, September 2013.

Download: [HTML] (244 kB)

**Abstract.** Due to trends towards searching in parameter space and using reward-weighted averaging, reinforcement learning (RL) algorithms for policy improvement are now able to learn sophisticated robot skills. A side-effect of these trends has been that RL algorithms have become more and more similar to evolution strategies, which treat policy improvement as a black-box optimization problem, and thus do not leverage the problem structure as RL algorithms do. We demonstrate how two straightforward simplifications to the state-of-the-art RL algorithm PI2 suffice to convert it into the black-box optimization algorithm (μ_W, λ)-ES. Furthermore, we show that (μ_W, λ)-ES empirically outperforms PI2 on several tasks. It is striking that PI2 and (μ_W, λ)-ES share a common core, and that the simpler, older algorithm outperforms the more sophisticated, newer one. We argue that this is due to a third trend in robot skill learning: the predominant use of dynamic movement primitives (DMPs). We show how DMPs dramatically simplify the learning problem, and discuss the implications of this for past and future work on robot skill learning.

BibTeX
```bibtex
@Article{stulp13robot,
  title    = {Robot Skill Learning: From Reinforcement Learning to Evolution Strategies},
  author   = {Freek Stulp and Olivier Sigaud},
  journal  = {Paladyn. Journal of Behavioral Robotics},
  year     = {2013},
  month    = {September},
  number   = {1},
  pages    = {49--61},
  volume   = {4},
  abstract = {Due to trends towards searching in parameter space and using
    reward-weighted averaging, reinforcement learning (RL) algorithms for
    policy improvement are now able to learn sophisticated robot skills. A
    side-effect of these trends has been that RL algorithms have become more
    and more similar to evolution strategies, which treat policy improvement
    as a black-box optimization problem, and thus do not leverage the problem
    structure as RL algorithms do. We demonstrate how two straightforward
    simplifications to the state-of-the-art RL algorithm PI2 suffice to
    convert it into the black-box optimization algorithm (\mu_W,\lambda)-ES.
    Furthermore, we show that (\mu_W,\lambda)-ES empirically outperforms PI2
    on several tasks. It is striking that PI2 and (\mu_W,\lambda)-ES share a
    common core, and that the simpler, older algorithm outperforms the more
    sophisticated, newer one. We argue that this is due to a third trend in
    robot skill learning: the predominant use of dynamic movement primitives
    (DMPs). We show how DMPs dramatically simplify the learning problem, and
    discuss the implications of this for past and future work on robot skill
    learning.},
  bib2html_pubtype = {Journal},
  bib2html_rescat  = {Reinforcement Learning of Robot Skills}
}
```
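The common core the abstract refers to, sampling perturbed policy parameters and updating the mean by reward-weighted averaging, can be sketched in a few lines. This is a minimal illustrative (μ_W, λ)-ES loop, not the paper's PI2 implementation; all function and parameter names here are assumptions chosen for the example.

```python
import numpy as np

def reward_weighted_es(cost, theta_init, sigma=0.1, n_samples=10,
                       n_updates=100, h=10.0):
    """Minimal (mu_W, lambda)-ES sketch: sample parameter vectors around the
    current mean, evaluate their costs, and recombine them by reward-weighted
    averaging (low cost -> high weight). Names are illustrative only."""
    theta = np.asarray(theta_init, dtype=float)
    for _ in range(n_updates):
        # Sample lambda (= n_samples) perturbations from an isotropic Gaussian.
        samples = theta + sigma * np.random.randn(n_samples, theta.size)
        costs = np.array([cost(s) for s in samples])
        # Normalize costs to [0, 1] and map them to weights with an
        # exponential, as in PI2-style reward-weighted averaging.
        z = (costs - costs.min()) / (costs.max() - costs.min() + 1e-12)
        weights = np.exp(-h * z)
        weights /= weights.sum()
        # The new mean is the weighted average of the sampled parameters.
        theta = weights @ samples
    return theta

# Usage: minimize a simple quadratic cost with minimum at theta = 1.
np.random.seed(0)
cost = lambda th: float(np.sum((th - 1.0) ** 2))
best = reward_weighted_es(cost, np.zeros(5))
```

Treating the cost function as a black box is exactly what makes this an evolution strategy: nothing in the loop uses trajectories, states, or rewards over time, only the scalar cost of each sampled parameter vector.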

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.

Generated by bib2html.pl (written by Patrick Riley) on Mon Jul 20, 2015 21:50:11