Summary

Link

- PILCO: a model-based and data-efficient approach to policy search
- PILCO - Takahashi Lab Model-Based Reinforcement Learning Study Group #1

Author/Institution

Marc Peter Deisenroth and Carl Edward Rasmussen (ICML 2011)

What is this

Abstract Quote

In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
Comparison with previous research. What are the novelties/good points?
Key points
- The framework consists of three components: a probabilistic (Gaussian process) dynamics model, analytic approximate policy evaluation, and gradient-based policy improvement.
- Compute the predictive state distribution p_\theta(x_t) at each time step t, then evaluate the expected long-term cost J^\pi(\theta) = \sum_{t=0}^{T} \mathbb{E}_{x_t}[c(x_t)].
- The expectation of the immediate cost c(x) under the Gaussian state distribution can be computed analytically (Eq. (25) in the paper; see the reconstruction after this list).
- Analytic derivatives of J^\pi(\theta) with respect to the policy parameters can be computed, and "standard gradient-based non-convex optimization methods, e.g., CG or L-BFGS" are used to update the parameters \theta (a minimal sketch of this loop follows below).
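For concreteness, here is a hedged reconstruction of the objective in the notation used above; it is a sketch from the paper's definitions, not a verbatim copy of its equations. The saturating-cost form is the one the paper uses, with x_target denoting the target state and \sigma_c the cost width; its Gaussian expectation in closed form is what the note above cites as Eq. (25).

```latex
% Expected long-term cost, minimized with respect to the policy parameters \theta
% (initial state x_0 \sim \mathcal{N}(\mu_0, \Sigma_0), horizon T):
J^\pi(\theta) = \sum_{t=0}^{T} \mathbb{E}_{x_t}\big[ c(x_t) \big],
\qquad x_t \sim p_\theta(x_t) \approx \mathcal{N}(\mu_t, \Sigma_t).

% Saturating immediate cost; its expectation under a Gaussian state
% distribution is available in closed form:
c(x) = 1 - \exp\!\left( -\frac{\lVert x - x_{\mathrm{target}} \rVert^2}{2\sigma_c^2} \right).
```

Because each p_\theta(x_t) is approximated by a Gaussian via moment matching through the GP dynamics model, both the sum above and its gradient with respect to \theta remain analytic.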
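To make the three components concrete, here is a minimal, hedged sketch of the policy-evaluation/improvement loop in Python. It is not the authors' implementation: the GP moment-matching prediction is replaced by a hypothetical linear-Gaussian `propagate()` stand-in, and the analytic gradients dJ/d\theta are replaced by SciPy's default numerical approximation inside L-BFGS-B. Only the shape of the loop (propagate p(x_t), accumulate E[c(x_t)], optimize \theta) mirrors the paper.

```python
# Hedged sketch of PILCO's policy-evaluation / policy-improvement loop.
# NOT the authors' code: the GP moment matching is replaced by a toy
# linear-Gaussian propagate(), and analytic gradients by finite differences.
import numpy as np
from scipy.optimize import minimize

D, T = 2, 20                      # state dimension, planning horizon
x_target = np.zeros(D)            # target state (assumed for this demo)
sigma_c = 0.5                     # width of the saturating cost

def expected_saturating_cost(mu, Sigma):
    """E[1 - exp(-||x - x_target||^2 / (2 sigma_c^2))] for x ~ N(mu, Sigma).

    Standard Gaussian expectation of an exponentiated quadratic; this plays
    the role of the analytic cost expectation cited above as Eq. (25)."""
    W = np.eye(D) / sigma_c**2
    S = np.eye(D) + Sigma @ W
    d = mu - x_target
    return 1.0 - np.exp(-0.5 * d @ W @ np.linalg.solve(S, d)) / np.sqrt(np.linalg.det(S))

def propagate(mu, Sigma, theta):
    """Placeholder one-step prediction p(x_{t+1}) from p(x_t) under policy theta.

    Real PILCO obtains (mu_{t+1}, Sigma_{t+1}) by moment-matching the GP
    dynamics model; here a toy linear system with controller u = -K x."""
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    K = theta.reshape(1, D)                   # linear policy parameters
    A_cl = A - B @ K                          # closed-loop transition
    Q = 0.01 * np.eye(D)                      # process-noise covariance
    return A_cl @ mu, A_cl @ Sigma @ A_cl.T + Q

def J(theta):
    """Expected long-term cost J^pi(theta) = sum_t E[c(x_t)]."""
    mu, Sigma = np.ones(D), 0.1 * np.eye(D)   # initial state distribution
    total = 0.0
    for _ in range(T):
        mu, Sigma = propagate(mu, Sigma, theta)
        total += expected_saturating_cost(mu, Sigma)
    return total

# "Standard gradient-based non-convex optimization", here L-BFGS-B.
result = minimize(J, x0=np.zeros(D), method="L-BFGS-B")
print("optimized policy parameters:", result.x, "expected cost:", result.fun)
```

In the actual method, this optimization alternates with re-fitting the GP dynamics model on data collected by running the improved policy, which is where the data efficiency comes from.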
How did the authors prove the effectiveness of the proposal?
- Cart-Pole (real)
- Unicycle (simulation)
Any discussions?
What should I read next?