
Freek Stulp and Olivier Sigaud. Path Integral Policy Improvement with Covariance Matrix Adaptation. In Proceedings of the
29th International Conference on Machine Learning (ICML), 2012.






There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized
policies. PI2 is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control
with tools from statistical estimation theory. In this paper, we consider PI2 as a member of the wider family of methods
which share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. At
the conceptual level, we compare PI2 to other members of the same family, being Cross-Entropy Methods and CMA-ES. The comparison
suggests the derivation of a novel algorithm which we call PI2-CMA for "Path Integral Policy Improvement with Covariance
Matrix Adaptation". PI2-CMA's main advantage is that it determines the magnitude of the exploration noise automatically.
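
The probability-weighted averaging mentioned in the abstract can be illustrated with a small sketch. The code below is not the algorithm from the paper; it is a generic, simplified member of the same family, applied to a toy quadratic cost with a diagonal Gaussian search distribution. The function names, the constant `h`, and the toy cost are assumptions made for illustration only.

```python
import math
import random


def weighted_averaging_step(mean, sigma, cost_fn, n_samples=20, h=10.0, rng=random):
    """One probability-weighted update of a diagonal Gaussian search distribution."""
    dim = len(mean)
    # Explore: sample parameter vectors around the current mean.
    samples = [[rng.gauss(mean[d], sigma[d]) for d in range(dim)]
               for _ in range(n_samples)]
    costs = [cost_fn(s) for s in samples]

    # Map costs to probabilities: lower cost gives higher weight.
    # h is a hypothetical constant controlling how strongly low costs are favoured.
    c_min, c_max = min(costs), max(costs)
    spread = (c_max - c_min) or 1.0  # avoid division by zero when all costs are equal
    weights = [math.exp(-h * (c - c_min) / spread) for c in costs]
    total = sum(weights)
    probs = [w / total for w in weights]

    # New mean: probability-weighted average of the sampled parameters.
    new_mean = [sum(p * s[d] for p, s in zip(probs, samples)) for d in range(dim)]
    # New exploration magnitude: probability-weighted spread around the old mean.
    new_sigma = [
        math.sqrt(sum(p * (s[d] - mean[d]) ** 2 for p, s in zip(probs, samples)))
        for d in range(dim)
    ]
    return new_mean, new_sigma


def quadratic_cost(theta):
    """Toy cost with its minimum at the origin (assumed for illustration)."""
    return sum(x * x for x in theta)


def optimize(n_updates=50, seed=0):
    """Repeat the update from a fixed start and return the final mean and sigma."""
    rng = random.Random(seed)
    mean, sigma = [2.0, -3.0], [1.0, 1.0]
    for _ in range(n_updates):
        mean, sigma = weighted_averaging_step(mean, sigma, quadratic_cost, rng=rng)
    return mean, sigma
```

Calling `optimize()` repeats the update from a fixed starting point. Because the spread of the samples is re-estimated from the weighted samples at every step, the exploration magnitude is adapted rather than set by hand, which is the property the abstract highlights.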



@InProceedings{stulp12path,
title = {Path Integral Policy Improvement with Covariance Matrix Adaptation},
author = {Freek Stulp and Olivier Sigaud},
booktitle = {Proceedings of the 29th International Conference on Machine Learning (ICML)},
year = {2012},
abstract = {There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI2 is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI2 as a member of the wider family of methods which share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. At the conceptual level, we compare PI2 to other members of the same family, being Cross-Entropy Methods and CMA-ES. The comparison suggests the derivation of a novel algorithm which we call PI2-CMA for ``Path Integral Policy Improvement with Covariance Matrix Adaptation''. PI2-CMA's main advantage is that it determines the magnitude of the exploration noise automatically.},
bib2html_pubtype = {Refereed Conference Paper},
bib2html_rescat = {Reinforcement Learning of Robot Skills}
}

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein
are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the
terms and constraints.