
Freek Stulp and Olivier Sigaud. Path Integral Policy Improvement with Covariance Matrix Adaptation. In The 10th European
Workshop on Reinforcement Learning (EWRL), 2012. Non-archival. Accepted for the workshop as a double submission with ICML.






There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized
policies. PI2 is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control
with tools from statistical estimation theory. In this paper, we consider PI2 as a member of the wider family of methods
which share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. At
the conceptual level, we compare PI2 to other members of the same family, namely Cross-Entropy Methods and CMA-ES. The comparison
suggests the derivation of a novel algorithm which we call PI2-CMA for "Path Integral Policy Improvement with Covariance
Matrix Adaptation". PI2-CMA's main advantage is that it determines the magnitude of the exploration noise automatically.
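The probability-weighted averaging update that the abstract identifies as the common core of PI2, Cross-Entropy Methods, and CMA-ES can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the exponential cost-to-weight mapping, and the toy quadratic cost are all illustrative assumptions for a black-box parameter-optimization setting.

```python
import numpy as np

def probability_weighted_averaging(cost, theta, n_samples=20, sigma=0.1,
                                   h=10.0, n_iters=100, seed=0):
    """Illustrative sketch of probability-weighted averaging, the update
    shared conceptually by PI2, Cross-Entropy Methods, and CMA-ES."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    for _ in range(n_iters):
        # Sample perturbed parameter vectors around the current mean.
        samples = theta + sigma * rng.standard_normal((n_samples, theta.size))
        costs = np.array([cost(s) for s in samples])
        # Map costs to probabilities: lower cost -> exponentially higher weight.
        c_min, c_max = costs.min(), costs.max()
        if c_max > c_min:
            w = np.exp(-h * (costs - c_min) / (c_max - c_min))
        else:
            w = np.ones(n_samples)
        w /= w.sum()
        # The new mean is the probability-weighted average of the samples.
        theta = w @ samples
    return theta

# Toy usage: minimize a simple quadratic cost (illustrative only).
cost = lambda th: float(np.sum(th ** 2))
theta_final = probability_weighted_averaging(cost, np.ones(5))
```

Note that in this plain sketch the exploration magnitude `sigma` is fixed; adapting it automatically (e.g. by also updating a covariance matrix from the weighted samples) is precisely the advantage the abstract attributes to PI2-CMA.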



@InProceedings{stulp12patha,
title = {Path Integral Policy Improvement with Covariance Matrix Adaptation},
author = {Freek Stulp and Olivier Sigaud},
booktitle = {The 10th European Workshop on Reinforcement Learning (EWRL)},
year = {2012},
note = {Non-archival. Accepted for the workshop as a double submission with ICML.},
abstract = {There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI2 is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI2 as a member of the wider family of methods which share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. At the conceptual level, we compare PI2 to other members of the same family, namely Cross-Entropy Methods and CMA-ES. The comparison suggests the derivation of a novel algorithm which we call PI2-CMA for ``Path Integral Policy Improvement with Covariance Matrix Adaptation''. PI2-CMA's main advantage is that it determines the magnitude of the exploration noise automatically.},
bib2html_pubtype = {Refereed Workshop Paper},
bib2html_rescat = {Reinforcement Learning of Robot Skills}
}

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein
are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the
terms and constraints.
Generated by bib2html.pl (written by Patrick Riley) on Mon Jul 20, 2015 21:50:11.