Summary

Link

- PILCO: a model-based and data-efficient approach to policy search
- PILCO - Takahashi Lab Model-Based Reinforcement Learning Study Group #1

Author/Institution

Marc Peter Deisenroth and Carl Edward Rasmussen (ICML 2011)

What is this

Abstract Quote

In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
Comparison with previous research. What are the novelties/good points?
Key points
- The framework consists of three components: a probabilistic (Gaussian process) dynamics model, analytic approximate policy evaluation, and gradient-based policy improvement.
- Compute the predictive state distribution p_\theta(x_t) at each time step t, then evaluate the expected long-term cost J^\pi(\theta) = \sum_{t=0}^{T} \mathbb{E}_{x_t}[c(x_t)].
- The expectation of the immediate cost c(x) under the Gaussian state distribution can be computed analytically (Eq. (25) in the paper; see the reconstruction after this list).
- Analytic derivatives of J^\pi(\theta) with respect to the policy parameters can be computed, and "standard gradient-based non-convex optimization methods, e.g., CG or L-BFGS" are used to update the parameters \theta (a minimal sketch of this loop follows below).
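For concreteness, here is a hedged reconstruction of the objective in the notation used above; it is a sketch from the paper's definitions, not a verbatim copy of its equations. The saturating-cost form is the one the paper uses, with x_target denoting the target state and \sigma_c the cost width; its Gaussian expectation in closed form is what the note above cites as Eq. (25).

```latex
% Expected long-term cost, minimized with respect to the policy parameters \theta
% (initial state x_0 \sim \mathcal{N}(\mu_0, \Sigma_0), horizon T):
J^\pi(\theta) = \sum_{t=0}^{T} \mathbb{E}_{x_t}\big[ c(x_t) \big],
\qquad x_t \sim p_\theta(x_t) \approx \mathcal{N}(\mu_t, \Sigma_t).

% Saturating immediate cost; its expectation under a Gaussian state
% distribution is available in closed form:
c(x) = 1 - \exp\!\left( -\frac{\lVert x - x_{\mathrm{target}} \rVert^2}{2\sigma_c^2} \right).
```

Because each p_\theta(x_t) is approximated by a Gaussian via moment matching through the GP dynamics model, both the sum above and its gradient with respect to \theta remain analytic.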
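To make the three components concrete, here is a minimal, hedged sketch of the policy-evaluation/improvement loop in Python. It is not the authors' implementation: the GP moment-matching prediction is replaced by a hypothetical linear-Gaussian `propagate()` stand-in, and the analytic gradients dJ/d\theta are replaced by SciPy's default numerical approximation inside L-BFGS-B. Only the shape of the loop (propagate p(x_t), accumulate E[c(x_t)], optimize \theta) mirrors the paper.

```python
# Hedged sketch of PILCO's policy-evaluation / policy-improvement loop.
# NOT the authors' code: the GP moment matching is replaced by a toy
# linear-Gaussian propagate(), and analytic gradients by finite differences.
import numpy as np
from scipy.optimize import minimize

D, T = 2, 20                      # state dimension, planning horizon
x_target = np.zeros(D)            # target state (assumed for this demo)
sigma_c = 0.5                     # width of the saturating cost

def expected_saturating_cost(mu, Sigma):
    """E[1 - exp(-||x - x_target||^2 / (2 sigma_c^2))] for x ~ N(mu, Sigma).

    Standard Gaussian expectation of an exponentiated quadratic; this plays
    the role of the analytic cost expectation cited above as Eq. (25)."""
    W = np.eye(D) / sigma_c**2
    S = np.eye(D) + Sigma @ W
    d = mu - x_target
    return 1.0 - np.exp(-0.5 * d @ W @ np.linalg.solve(S, d)) / np.sqrt(np.linalg.det(S))

def propagate(mu, Sigma, theta):
    """Placeholder one-step prediction p(x_{t+1}) from p(x_t) under policy theta.

    Real PILCO obtains (mu_{t+1}, Sigma_{t+1}) by moment-matching the GP
    dynamics model; here a toy linear system with controller u = -K x."""
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    K = theta.reshape(1, D)                   # linear policy parameters
    A_cl = A - B @ K                          # closed-loop transition
    Q = 0.01 * np.eye(D)                      # process-noise covariance
    return A_cl @ mu, A_cl @ Sigma @ A_cl.T + Q

def J(theta):
    """Expected long-term cost J^pi(theta) = sum_t E[c(x_t)]."""
    mu, Sigma = np.ones(D), 0.1 * np.eye(D)   # initial state distribution
    total = 0.0
    for _ in range(T):
        mu, Sigma = propagate(mu, Sigma, theta)
        total += expected_saturating_cost(mu, Sigma)
    return total

# "Standard gradient-based non-convex optimization", here L-BFGS-B.
result = minimize(J, x0=np.zeros(D), method="L-BFGS-B")
print("optimized policy parameters:", result.x, "expected cost:", result.fun)
```

In the actual method, this optimization alternates with re-fitting the GP dynamics model on data collected by running the improved policy, which is where the data efficiency comes from.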
How did the authors prove the effectiveness of the proposal?
- Cart-Pole (real)
- Unicycle (simulation)
Any discussions?
What should I read next?