Learning Latent Dynamics for Planning from Pixels #32

Open · nagataka opened this issue May 5, 2020 · 0 comments

nagataka commented May 5, 2020

Summary

Link

Learning Latent Dynamics for Planning from Pixels

Official repo: google-research/planet

Author/Institution

Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson
Google Brain, University of Toronto, DeepMind, Google Research, University of Michigan

What is this

  • Proposed the Deep Planning Network (PlaNet)
    • a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space
  • Proposed a multi-step variational inference objective called latent overshooting
  • Showed that the agent solves continuous control tasks with partial observability and sparse rewards using only pixel observations
  • Achieved performance close to, and sometimes above, that of strong model-free algorithms
  • Control: Model Predictive Control (MPC)
    • replan at each step (sounds computationally expensive)
  • Planning algorithm: cross-entropy method (CEM) to search for the best action sequence under the model (a minimal sketch follows this list)
    • Why CEM?: "We decided on this algorithm because of its robustness and because it solved all considered tasks when given the true dynamics for planning"
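
The CEM planner is simple to state in code. Below is a minimal sketch of the inner optimization of the MPC loop, assuming hypothetical `transition(state, action)` and `reward(state)` callables that stand in for the learned latent dynamics and reward models; the defaults (horizon 12, 1000 candidates, 100 elites, 10 iterations) roughly follow the settings reported in the paper.

```python
import numpy as np

def cem_plan(initial_state, transition, reward,
             horizon=12, candidates=1000, top_k=100,
             iterations=10, action_dim=1):
    """Search for a good action sequence with the cross-entropy method."""
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):
        # Sample candidate action sequences from the current belief.
        plans = mean + std * np.random.randn(candidates, horizon, action_dim)
        returns = np.zeros(candidates)
        for i, plan in enumerate(plans):
            state = initial_state
            for action in plan:                 # roll out in latent space
                state = transition(state, action)
                returns[i] += reward(state)
        # Refit the sampling distribution to the best-performing plans.
        elite = plans[np.argsort(returns)[-top_k:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean[0]  # MPC: execute only the first action, then replan
```

Because only the first action is executed before replanning, the full rollout loop is paid at every environment step, which is the expense noted in the MPC bullet above.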

"Model" in this architecture refers three thigs:

  • transition model $p(s_t|s_{t-1}, a_{t-1})$
    • Gaussian with mean and variance parameterized by a feed-forward neural network
  • observation model $p(o_t|s_t)$
    • Gaussian with mean parameterized by a deconvolutional neural network and identity covariance
  • reward model $p(r_t|s_t)$
    • scalar Gaussian with mean parameterized by a feed-forward neural network and unit variance

and the policy $p(a_t|o_{\le t},a_{<t})$ aims to maximize the expected sum of rewards.
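
For concreteness, here is a minimal sketch of how these three distributions can be parameterized, assuming 64×64×3 image observations; the class names and layer sizes are illustrative assumptions, not the official google-research/planet implementation.

```python
import torch
import torch.nn as nn
import torch.distributions as td

class TransitionModel(nn.Module):
    """p(s_t | s_{t-1}, a_{t-1}): Gaussian with NN-parameterized mean and variance."""
    def __init__(self, state_dim=30, action_dim=1, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * state_dim))

    def forward(self, state, action):
        mean, std = self.net(torch.cat([state, action], -1)).chunk(2, -1)
        return td.Normal(mean, nn.functional.softplus(std) + 1e-4)

class ObservationModel(nn.Module):
    """p(o_t | s_t): Gaussian with deconv-parameterized mean, identity covariance."""
    def __init__(self, state_dim=30):
        super().__init__()
        self.fc = nn.Linear(state_dim, 1024)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(1024, 128, 5, 2), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 5, 2), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 6, 2), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 6, 2))  # -> 3x64x64 image mean

    def forward(self, state):
        h = self.fc(state).reshape(-1, 1024, 1, 1)
        return td.Normal(self.deconv(h), 1.0)  # unit variance per pixel

class RewardModel(nn.Module):
    """p(r_t | s_t): scalar Gaussian with NN-parameterized mean, unit variance."""
    def __init__(self, state_dim=30, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ELU(), nn.Linear(hidden, 1))

    def forward(self, state):
        return td.Normal(self.net(state), 1.0)
```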


Comparison with previous research. What are the novelties/good points?

  • The robotics community focuses on video prediction models for planning (Agrawal et al., 2016; Finn & Levine, 2017; Ebert et al., 2018; Zhang et al., 2018) that deal with the visual complexity of the real world and solve tasks with a simple gripper, such as grasping or pushing objects.
    • In comparison, we focus on simulated environments, where we leverage latent planning to scale to larger state and action spaces, longer planning horizons, as well as sparse reward tasks
  • E2C (Watter et al., 2015) and RCE (Banijamali et al., 2017) embed images into a latent space, where they learn local-linear latent transitions and plan for actions using LQR. These methods balance simulated cartpoles and control 2-link arms from images, but have been difficult to scale up.
    • We lift the Markov assumption of these models, making our method applicable under partial observability, and present results on more challenging environments that include longer planning horizons, contact dynamics, and sparse rewards.

Key points

Regarding recurrent network for planning, they claim the following:

our experiments show that both stochastic and deterministic paths in the transition model are crucial for successful planning

and the network architecture looks like Figure 2 (c), which is called the recurrent state-space model (RSSM).
(Screenshot: Figure 2 from the paper, showing the recurrent state-space model.)
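
A minimal sketch of one RSSM step may help make the "both paths" claim concrete: the deterministic path is a GRU that carries information reliably across many steps, while the stochastic path samples the latent state from a Gaussian conditioned on the GRU output. The class and variable names are mine; the sizes (30 stochastic, 200 deterministic units) roughly follow the paper's reported settings.

```python
import torch
import torch.nn as nn
import torch.distributions as td

class RSSMCell(nn.Module):
    """One step of the recurrent state-space model (Figure 2c): the latent
    state is split into a deterministic part h_t and a stochastic part s_t."""
    def __init__(self, stoch_dim=30, det_dim=200, action_dim=1, hidden=200):
        super().__init__()
        self.inp = nn.Sequential(nn.Linear(stoch_dim + action_dim, det_dim), nn.ELU())
        self.gru = nn.GRUCell(det_dim, det_dim)   # deterministic path
        self.prior = nn.Sequential(               # stochastic path
            nn.Linear(det_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * stoch_dim))

    def forward(self, h, s, action):
        # Deterministic recurrence: h_t = f(h_{t-1}, s_{t-1}, a_{t-1})
        h = self.gru(self.inp(torch.cat([s, action], -1)), h)
        # Stochastic state: s_t ~ p(s_t | h_t)
        mean, std = self.prior(h).chunk(2, -1)
        s = td.Normal(mean, nn.functional.softplus(std) + 1e-4).rsample()
        return h, s
```

During training, a posterior conditioned on the current observation would replace this prior sample; that encoder is omitted here for brevity.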

How did the authors prove the effectiveness of the proposal?

Experiments in continuous control tasks:
Cartpole Swing Up, Reacher Easy, Cheetah Run, Finger Spin, Cup Catch, and Walker Walk from the DeepMind Control Suite

Confirmed that the proposed model achieved comparable performance to the best model-free algorithms while using 200× fewer episodes and similar or less computation time.

Any discussions?

What should I read next?

Broader contextual review:
