
DARLA: Improving Zero-Shot Transfer in Reinforcement Learning #46

Open
nagataka opened this issue Feb 24, 2022 · 0 comments

Summary

Link

DARLA: Improving Zero-Shot Transfer in Reinforcement Learning

Author/Institution

What is this

  • Proposes DARLA (DisentAngled Representation Learning Agent)
  • Problems/Motivation
    • Learning good internal representations with both source and target domain data
      • The reliance on target domain information can be problematic, as the data may be expensive or difficult to obtain
    • Learning exclusively on the source domain with a deep RL approach
      • Poor domain adaptation performance
    • DARLA tackles both issues by focusing on learning an underlying low-dimensional factorised representation of the world
  • Demonstrate how disentangled representations can improve the robustness of RL algorithms in domain adaptation scenarios
    • The theoretical utility of disentangled representations for reinforcement learning has been described before, but it has not been empirically validated
  • RL algorithms
    • DQN
    • A3C
    • Model-Free Episodic Control

Comparison with previous research. What are the novelties/good points?

Key points

  • Consists of a three-stage pipeline

    1. learning to see
    2. learning to act
    3. transfer
  • replaces the pixel-based reconstruction loss in the β-VAE objective as follows

    • Eq. 1 (β-VAE objective): L(θ, φ; x, z, β) = E_{q_φ(z|x)}[log p_θ(x|z)] − β D_KL(q_φ(z|x) ‖ p(z))
    • Eq. 2 (β-VAE_DAE) swaps the pixel log-likelihood for a feature-space reconstruction term: −E_{q_φ(z|x)} ‖J(x̂) − J(x)‖²₂ − β D_KL(q_φ(z|x) ‖ p(z)), where x̂ ∼ p_θ(x|z)
    • J(·) maps an image to the internal activations of a pre-trained denoising autoencoder (DAE)
  • "the disentangled model used for DARLA was trained with a β hyperparameter value of 1"

    • "Note that by replacing the pixel based reconstruction loss in Eq. 1 with high-level feature reconstruction loss in Eq. 2 we are no longer optimising the variational lower bound, and β-VAE_DAE with β = 1 loses its equivalence to the Variational Autoencoder (VAE) framework as proposed by (Kingma & Welling, 2014; Rezende et al., 2014)."
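The modified objective above can be sketched as follows. This is a minimal numpy sketch, not the paper's code: `J` here is a hypothetical stand-in for the pre-trained DAE feature map, and the KL term assumes the usual diagonal-Gaussian posterior against a standard normal prior.

```python
import numpy as np

def kl_diag_gaussian(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def beta_vae_dae_loss(J, x_hat, x, mu, logvar, beta=1.0):
    """beta-VAE_DAE objective (to minimise): reconstruction error measured
    in the DAE feature space J, plus a beta-weighted KL term."""
    recon = np.sum((J(x_hat) - J(x)) ** 2, axis=-1)  # ||J(x_hat) - J(x)||_2^2
    return np.mean(recon + beta * kl_diag_gaussian(mu, logvar))

# Toy usage: J is a fixed random projection standing in for DAE features.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
J = lambda x: x @ W                        # hypothetical feature map
x = rng.standard_normal((2, 8))
mu, logvar = np.zeros((2, 4)), np.zeros((2, 4))
print(beta_vae_dae_loss(J, x, x, mu, logvar))  # 0.0: perfect reconstruction, zero KL
```

With β = 1 (as used for DARLA's disentangled model) the only change from a plain VAE is where the reconstruction error is measured; larger β would further pressure the posterior toward the factorised prior.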

How did the authors prove the effectiveness of the proposal?

  • Experiments
    • DeepMind Lab
    • Jaco robotic arm (including sim2real set-up: Mujoco simulation is the source domain and the real robotic arm is the target domain)

Any discussions?

What should I read next?
