Learning Latent Dynamics for Planning from Pixels #32

Open · nagataka opened this issue May 5, 2020 · 0 comments

nagataka commented May 5, 2020

Summary

Link

Learning Latent Dynamics for Planning from Pixels

Official repo: google-research/planet

Author/Institution

Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson
Google Brain, University of Toronto, DeepMind, Google Research, University of Michigan

What is this

  • Proposed the Deep Planning Network (PlaNet)
    • a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space
  • Proposed a multi-step variational inference objective called latent overshooting
  • Showed that the agent solves continuous control tasks with partial observability and sparse rewards using only pixel observations
  • Achieved performance close to, and sometimes above, that of strong model-free algorithms
  • Control: Model Predictive Control (MPC)
    • replan at each step (sounds computationally expensive)
  • Planning algorithm: cross-entropy method (CEM) to search for the best action sequence under the model (a minimal sketch follows this list)
    • Why CEM?: "We decided on this algorithm because of its robustness and because it solved all considered tasks when given the true dynamics for planning"
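
The CEM planner is simple to state in code. Below is a minimal sketch of the inner optimization of the MPC loop, assuming hypothetical `transition(state, action)` and `reward(state)` callables that stand in for the learned latent dynamics and reward models; the defaults (horizon 12, 1000 candidates, 100 elites, 10 iterations) roughly follow the settings reported in the paper.

```python
import numpy as np

def cem_plan(initial_state, transition, reward,
             horizon=12, candidates=1000, top_k=100,
             iterations=10, action_dim=1):
    """Search for a good action sequence with the cross-entropy method."""
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):
        # Sample candidate action sequences from the current belief.
        plans = mean + std * np.random.randn(candidates, horizon, action_dim)
        returns = np.zeros(candidates)
        for i, plan in enumerate(plans):
            state = initial_state
            for action in plan:                 # roll out in latent space
                state = transition(state, action)
                returns[i] += reward(state)
        # Refit the sampling distribution to the best-performing plans.
        elite = plans[np.argsort(returns)[-top_k:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean[0]  # MPC: execute only the first action, then replan
```

Because only the first action is executed before replanning, the full rollout loop is paid at every environment step, which is the expense noted in the MPC bullet above.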

"Model" in this architecture refers three thigs:

  • transition model $p(s_t|s_{t-1}, a_{t-1})$
    • Gaussian with mean and variance parameterized by a feed-forward neural network
  • observation model $p(o_t|s_t)$
    • Gaussian with mean parameterized by a deconvolutional neural network and identity covariance
  • reward model $p(r_t|s_t)$
    • scalar Gaussian with mean parameterized by a feed-forward neural network and unit variance

and the policy $p(a_t|o_{\le t},a_{<t})$ aims to maximize the expected sum of rewards.
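
For concreteness, here is a minimal sketch of how these three distributions can be parameterized, assuming 64×64×3 image observations; the class names and layer sizes are illustrative assumptions, not the official google-research/planet implementation.

```python
import torch
import torch.nn as nn
import torch.distributions as td

class TransitionModel(nn.Module):
    """p(s_t | s_{t-1}, a_{t-1}): Gaussian with NN-parameterized mean and variance."""
    def __init__(self, state_dim=30, action_dim=1, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * state_dim))

    def forward(self, state, action):
        mean, std = self.net(torch.cat([state, action], -1)).chunk(2, -1)
        return td.Normal(mean, nn.functional.softplus(std) + 1e-4)

class ObservationModel(nn.Module):
    """p(o_t | s_t): Gaussian with deconv-parameterized mean, identity covariance."""
    def __init__(self, state_dim=30):
        super().__init__()
        self.fc = nn.Linear(state_dim, 1024)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(1024, 128, 5, 2), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 5, 2), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 6, 2), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 6, 2))  # -> 3x64x64 image mean

    def forward(self, state):
        h = self.fc(state).reshape(-1, 1024, 1, 1)
        return td.Normal(self.deconv(h), 1.0)  # unit variance per pixel

class RewardModel(nn.Module):
    """p(r_t | s_t): scalar Gaussian with NN-parameterized mean, unit variance."""
    def __init__(self, state_dim=30, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ELU(), nn.Linear(hidden, 1))

    def forward(self, state):
        return td.Normal(self.net(state), 1.0)
```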


Comparison with previous research. What are the novelties/good points?

  • The robotics community focuses on video prediction models for planning (Agrawal et al., 2016; Finn & Levine, 2017; Ebert et al., 2018; Zhang et al., 2018) that deal with the visual complexity of the real world and solve tasks with a simple gripper, such as grasping or pushing objects.
    • In comparison, we focus on simulated environments, where we leverage latent planning to scale to larger state and action spaces, longer planning horizons, as well as sparse reward tasks
  • E2C (Watter et al., 2015) and RCE (Banijamali et al., 2017) embed images into a latent space, where they learn local-linear latent transitions and plan for actions using LQR. These methods balance simulated cartpoles and control 2-link arms from images, but have been difficult to scale up.
    • We lift the Markov assumption of these models, making our method applicable under partial observability, and present results on more challenging environments that include longer planning horizons, contact dynamics, and sparse rewards.

Key points

Regarding recurrent network for planning, they claim the following:

our experiments show that both stochastic and deterministic paths in the transition model are crucial for successful planning

and the network architecture looks like Figure 2 (c), which is called the recurrent state-space model (RSSM).
(Screenshot: Figure 2 from the paper, showing the recurrent state-space model.)
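
A minimal sketch of one RSSM step may help make the "both paths" claim concrete: the deterministic path is a GRU that carries information reliably across many steps, while the stochastic path samples the latent state from a Gaussian conditioned on the GRU output. The class and variable names are mine; the sizes (30 stochastic, 200 deterministic units) roughly follow the paper's reported settings.

```python
import torch
import torch.nn as nn
import torch.distributions as td

class RSSMCell(nn.Module):
    """One step of the recurrent state-space model (Figure 2c): the latent
    state is split into a deterministic part h_t and a stochastic part s_t."""
    def __init__(self, stoch_dim=30, det_dim=200, action_dim=1, hidden=200):
        super().__init__()
        self.inp = nn.Sequential(nn.Linear(stoch_dim + action_dim, det_dim), nn.ELU())
        self.gru = nn.GRUCell(det_dim, det_dim)   # deterministic path
        self.prior = nn.Sequential(               # stochastic path
            nn.Linear(det_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * stoch_dim))

    def forward(self, h, s, action):
        # Deterministic recurrence: h_t = f(h_{t-1}, s_{t-1}, a_{t-1})
        h = self.gru(self.inp(torch.cat([s, action], -1)), h)
        # Stochastic state: s_t ~ p(s_t | h_t)
        mean, std = self.prior(h).chunk(2, -1)
        s = td.Normal(mean, nn.functional.softplus(std) + 1e-4).rsample()
        return h, s
```

During training, a posterior conditioned on the current observation would replace this prior sample; that encoder is omitted here for brevity.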

How did the authors prove the effectiveness of the proposal?

Experiments in continuous control tasks:
Cartpole Swing Up, Reacher Easy, Cheetah Run, Finger Spin, Cup Catch, and Walker Walk from the DeepMind Control Suite

Confirmed that the proposed model achieved comparable performance to the best model-free algorithms while using 200× fewer episodes and similar or less computation time.

Any discussions?

What should I read next?

Broader contextual review:
