
Freek Stulp and Olivier Sigaud. Robot Skill Learning: From Reinforcement Learning to Evolution Strategies. Paladyn. Journal
of Behavioral Robotics, 4(1):49–61, September 2013.






Due to trends towards searching in parameter space and using reward-weighted averaging, reinforcement learning (RL) algorithms
for policy improvement are now able to learn sophisticated robot skills. A side-effect of these trends has been that RL algorithms
have become more and more similar to evolution strategies, which treat policy improvement as a black-box optimization problem,
and thus do not leverage the problem structure as RL algorithms do. We demonstrate how two straightforward simplifications
to the state-of-the-art RL algorithm PI^2 suffice to convert it into the black-box optimization algorithm (\mu_W,\lambda)-ES.
Furthermore, we show that (\mu_W,\lambda)-ES empirically outperforms PI^2 on several tasks. It is striking that PI^2 and (\mu_W,\lambda)-ES
share a common core, and that the simpler, older algorithm outperforms the more sophisticated, newer one. We argue that this
is due to a third trend in robot skill learning: the predominant use of dynamic movement primitives (DMPs). We show how DMPs
dramatically simplify the learning problem, and discuss the implications of this for past and future work on robot skill learning.
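
For readers unfamiliar with reward-weighted averaging in a black-box setting, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a fixed exploration magnitude `sigma` (no covariance adaptation), an exponential cost-to-weight mapping with an arbitrary constant of 10, and a toy quadratic cost. All function and parameter names are hypothetical.

```python
import math
import random


def reward_weighted_update(mean, sigma, cost_fn, n_samples=20, n_elite=10, rng=random):
    """One update of a simplified evolution strategy with weighted recombination."""
    # Sample n_samples (lambda) parameter vectors around the current mean.
    samples = [[m + rng.gauss(0.0, sigma) for m in mean] for _ in range(n_samples)]
    # Black-box evaluation: only one scalar cost per sample is observed.
    costs = [cost_fn(s) for s in samples]
    # Keep the n_elite (mu) lowest-cost samples.
    elite = sorted(zip(costs, samples), key=lambda cs: cs[0])[:n_elite]
    # Map costs to weights so that lower cost gives a larger weight.
    # The exponential form and the constant 10.0 are assumptions for illustration.
    c_min, c_max = elite[0][0], elite[-1][0]
    span = (c_max - c_min) or 1.0
    weights = [math.exp(-10.0 * (c - c_min) / span) for c, _ in elite]
    total = sum(weights)
    # The new mean is the weighted average of the elite samples.
    return [
        sum(w * s[i] for w, (_, s) in zip(weights, elite)) / total
        for i in range(len(mean))
    ]


def quadratic_cost(theta):
    """Toy cost with its minimum at the origin (stands in for a task cost)."""
    return sum(x * x for x in theta)


def optimize(mean, n_updates=50, sigma=0.3, seed=0):
    """Repeat the update with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    for _ in range(n_updates):
        mean = reward_weighted_update(mean, sigma, quadratic_cost, rng=rng)
    return mean
```

Calling `optimize([2.0, -1.5])` moves the parameter mean towards the minimum of the toy cost using only scalar cost evaluations, which is the sense in which such methods ignore the problem structure.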



@Article{stulp13robot,
title = {Robot Skill Learning: From Reinforcement Learning to Evolution Strategies},
author = {Freek Stulp and Olivier Sigaud},
journal = {Paladyn. Journal of Behavioral Robotics},
year = {2013},
month = {September},
number = {1},
pages = {49--61},
volume = {4},
abstract = {Due to trends towards searching in parameter space and using reward-weighted averaging, reinforcement learning (RL) algorithms for policy improvement are now able to learn sophisticated robot skills. A side-effect of these trends has been that RL algorithms have become more and more similar to evolution strategies, which treat policy improvement as a black-box optimization problem, and thus do not leverage the problem structure as RL algorithms do.
We demonstrate how two straightforward simplifications to the state-of-the-art RL algorithm PI$^2$ suffice to convert it into the black-box optimization algorithm $(\mu_W,\lambda)$-ES. Furthermore, we show that $(\mu_W,\lambda)$-ES empirically outperforms PI$^2$ on several tasks. It is striking that PI$^2$ and $(\mu_W,\lambda)$-ES share a common core, and that the simpler, older algorithm outperforms the more sophisticated, newer one.
We argue that this is due to a third trend in robot skill learning: the predominant use of dynamic movement primitives (DMPs). We show how DMPs dramatically simplify the learning problem, and discuss the implications of this for past and future work on robot skill learning.},
bib2html_pubtype = {Journal},
bib2html_rescat = {Reinforcement Learning of Robot Skills}
}

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein
are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the
terms and constraints.