PILCO: A Model-Based and Data-Efficient Approach to Policy Search #44

Open
nagataka opened this issue Jan 27, 2022 · 0 comments

Summary

Link

PILCO: a model-based and data-efficient approach to policy search
PILCO - 1st Takahashi Lab Model-Based Reinforcement Learning Study Group

Author/Institution

What is this

Abstract Quote

In this paper, we introduce pilco, a practical, data-efficient model-based policy search method. Pilco reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, pilco can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.

Comparison with previous research. What are the novelties/good points?

Key points

The framework consists of three components: a probabilistic dynamics model, analytic approximate policy evaluation, and gradient-based policy improvement.

Screenshot 2022-01-27 11 54 06

Compute the state distribution p_\theta (x_t) at each time step t, then use these distributions to compute the expected long-term cost J^\pi(\theta).
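Written out, the objective described above looks roughly as follows. This is a sketch in the note's notation: the horizon T and the Gaussian initial state distribution with mean \mu_0 and covariance \Sigma_0 are assumptions taken from the usual finite-horizon setup, not from the text above.

```latex
J^\pi(\theta) = \sum_{t=0}^{T} \mathbb{E}_{x_t}\!\left[ c(x_t) \right],
\qquad
\mathbb{E}_{x_t}\!\left[ c(x_t) \right] = \int c(x_t)\, p_\theta(x_t)\, \mathrm{d}x_t,
\qquad
x_0 \sim \mathcal{N}(\mu_0, \Sigma_0)
```

Each p_\theta(x_t) depends on \theta because the policy chooses the controls that drive the state from one time step to the next.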

The expected cost E[c(x_t)] under the Gaussian state distribution can be computed analytically (eq. 25).
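To make the closed-form claim concrete, here is a minimal sketch of the expected value of a saturating cost under a Gaussian state distribution. The parameterization c(x) = 1 - exp(-||x - target||^2 / (2 * width^2)), the factor of 1/2, and the names `expected_saturating_cost`, `target` and `width` are assumptions chosen for illustration; they may not match the paper's exact notation.

```python
import numpy as np


def expected_saturating_cost(mu, sigma, target, width):
    """E[c(x)] for x ~ N(mu, sigma), where
    c(x) = 1 - exp(-||x - target||^2 / (2 * width^2)).

    The parameterization is an illustrative assumption, not the paper's.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    target = np.asarray(target, dtype=float)
    d = mu.shape[0]

    t_inv = np.eye(d) / width**2        # precision matrix of the cost, T^{-1}
    m = np.eye(d) + sigma @ t_inv       # I + Sigma T^{-1}
    s_tilde = t_inv @ np.linalg.inv(m)  # T^{-1} (I + Sigma T^{-1})^{-1}
    diff = mu - target

    # Gaussian integral of the exponential term, evaluated in closed form.
    expected_exp = np.exp(-0.5 * diff @ s_tilde @ diff) / np.sqrt(np.linalg.det(m))
    return 1.0 - expected_exp
```

With zero covariance the result reduces to c(mu) itself; as the covariance grows, the expected cost moves toward 1 because more probability mass lies far from the target.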

The derivatives of J^\pi(\theta) with respect to \theta can be computed analytically, so "standard gradient-based non-convex optimization methods, e.g., CG or L-BFGS" can be used to update the policy parameters \theta.
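The evaluate-then-improve loop can be illustrated on a toy problem. The sketch below is not PILCO itself: it swaps the learned probabilistic dynamics model for a known 1D linear-Gaussian system x_{t+1} = a x_t + b u_t + noise with a linear policy u = theta * x, so the Gaussian state distribution propagates exactly. All names and constants (`objective`, `optimize_policy`, `a`, `b`, `q`, `w`, `horizon`) are hypothetical. It shows the three steps from the notes: propagate the state distribution, sum the analytic expected costs into J, and pass J with its analytic derivative to L-BFGS via `scipy.optimize.minimize`.

```python
import math


def expected_cost_and_grads(mu, s, w):
    """Expected saturating cost for x ~ N(mu, s) with target 0 and squared
    width w, plus its partial derivatives with respect to mu and s."""
    g = math.exp(-mu * mu / (2.0 * (w + s))) / math.sqrt(1.0 + s / w)
    d_mu = g * mu / (w + s)
    d_s = g * (0.5 / (w + s) - mu * mu / (2.0 * (w + s) ** 2))
    return 1.0 - g, d_mu, d_s


def objective(theta, a=1.0, b=0.5, q=0.01, mu0=1.0, s0=0.1, w=1.0, horizon=20):
    """Return (J, dJ/dtheta) for the policy u = theta * x on the toy system.

    The linear dynamics stand in for a learned model (an assumption made
    purely so that the example stays short and exact).
    """
    k = a + b * theta            # closed-loop gain
    mu, s = mu0, s0              # mean and variance of the state
    dmu, ds = 0.0, 0.0           # their derivatives with respect to theta
    total, dtotal = 0.0, 0.0
    for _ in range(horizon):
        # Chain rule through one step (right-hand sides use the old mu, s).
        dmu, ds = b * mu + k * dmu, 2.0 * k * b * s + k * k * ds
        # Exact Gaussian propagation through the linear closed loop.
        mu, s = k * mu, k * k * s + q
        c, c_mu, c_s = expected_cost_and_grads(mu, s, w)
        total += c
        dtotal += c_mu * dmu + c_s * ds
    return total, dtotal


def optimize_policy(theta0=0.0):
    """Minimize J with L-BFGS using the analytic gradient."""
    # Imported here so the cost and gradient code above needs only the stdlib.
    import numpy as np
    from scipy.optimize import minimize

    def fun(th):
        value, grad = objective(float(th[0]))
        return value, np.array([grad])

    res = minimize(fun, x0=np.array([theta0]), jac=True, method="L-BFGS-B")
    return float(res.x[0]), float(res.fun)
```

Calling `optimize_policy()` returns the optimized gain and its cost. In PILCO proper the propagation step is approximate rather than exact, but the chain-rule structure of the gradient is the same idea.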

How did the authors demonstrate the effectiveness of the proposal?

  • Cart-Pole (real)
  • Unicycle (simulation)

Any discussions?

What should I read next?
