CCFDM: Sample-efficient Reinforcement Learning Representation Learning with Curiosity Contrastive Forward Dynamics Model
This repository is the official implementation of CCFDM for the DeepMind control experiments. Our implementation of SAC is based on SAC+AE by Denis Yarats.
All of the dependencies are in the conda_env.yml file. They can be installed manually or with the following command:
conda env create -f conda_env.yml
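Then activate the environment before running anything; the actual name is set by the name field at the top of conda_env.yml (ccfdm below is a placeholder):
conda activate ccfdm   # replace ccfdm with the name field from conda_env.yml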
To train CCFDM on all the tasks from image-based observations, run bash script/run_all_ri.sh from the root of this directory. You can try different environments / hyperparameters by modifying the scripts in the script folder.
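Each script ultimately invokes train.py. As a sketch, a single-task run might look like the following; --domain_name, --task_name, and --seed are assumed here from the SAC+AE-style interface this code builds on, so check train.py for the exact flag names:
python train.py --domain_name cheetah --task_name run --seed 1 --save_model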
In your console, you should see printouts that look like:
| train | E: 221 | S: 28000 | D: 18.1 s | R: 785.2634 | BR: 3.8815 | A_LOSS: -305.7328 | CR_LOSS: 190.9854 | CU_LOSS: 0.0000
| train | E: 225 | S: 28500 | D: 18.6 s | R: 832.4937 | BR: 3.9644 | A_LOSS: -308.7789 | CR_LOSS: 126.0638 | CU_LOSS: 0.0000
| train | E: 229 | S: 29000 | D: 18.8 s | R: 683.6702 | BR: 3.7384 | A_LOSS: -311.3941 | CR_LOSS: 140.2573 | CU_LOSS: 0.0000
| train | E: 233 | S: 29500 | D: 19.6 s | R: 838.0947 | BR: 3.7254 | A_LOSS: -316.9415 | CR_LOSS: 136.5304 | CU_LOSS: 0.0000
Log abbreviation mapping:
train - training episode
E - total number of episodes
S - total number of environment steps
D - duration in seconds to train 1 episode
R - mean episode reward
BR - average reward of sampled batch
A_LOSS - average loss of actor
CR_LOSS - average loss of critic
CU_LOSS - average loss of the CURL encoder
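If you need these numbers in machine-readable form (e.g., for custom plots), the printouts are easy to parse. Below is a minimal Python sketch based on the sample lines above; parse_train_line is a hypothetical helper, not part of the repository:

def parse_train_line(line):
    # Split one "| train | E: ... |" printout into {abbreviation: float}.
    parts = [p.strip() for p in line.strip().strip('|').split('|')]
    fields = {}
    for part in parts[1:]:  # skip the leading "train" tag
        key, value = part.split(':', 1)
        fields[key.strip()] = float(value.strip().rstrip(' s'))  # drop the "s" unit on D
    return fields

line = '| train | E: 221 | S: 28000 | D: 18.1 s | R: 785.2634 | BR: 3.8815 | A_LOSS: -305.7328 | CR_LOSS: 190.9854 | CU_LOSS: 0.0000'
print(parse_train_line(line)['R'])  # 785.2634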
All data related to the run is stored in the specified working_dir. To enable model or video saving, use the --save_model or --save_video flags. For all available flags, inspect train.py. To visualize progress with tensorboard, run:
tensorboard --logdir log --port 6006
and go to localhost:6006 in your browser. If you're running headlessly, try port forwarding with ssh.
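For example (user@remote is a placeholder for your own login; the remote port matches the --port flag above):
ssh -N -L 6006:localhost:6006 user@remote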
For GPU-accelerated rendering, make sure EGL is installed on your machine and run export MUJOCO_GL=egl. For environment troubleshooting, see the DeepMind control documentation.
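To quickly verify that EGL rendering works before launching a full run, you can render a single frame from dm_control (a minimal sketch; cheetah/run is just an example task):

import os
os.environ['MUJOCO_GL'] = 'egl'  # must be set before dm_control is imported

from dm_control import suite

env = suite.load(domain_name='cheetah', task_name='run')  # example task
frame = env.physics.render(height=84, width=84, camera_id=0)
print(frame.shape)  # expected: (84, 84, 3)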
This is the code for the paper:
Thanh Nguyen, Tung M. Luu, Thang Vu, Chang D. Yoo. Sample-efficient Reinforcement Learning Representation Learning with Curiosity Contrastive Forward Dynamics Model. IROS 2021 - 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems [ArXiv]
If you want to cite this paper:
@article{nguyen2021sample,
title={Sample-efficient Reinforcement Learning Representation Learning with Curiosity Contrastive Forward Dynamics Model},
author={Nguyen, Thanh and Luu, Tung M and Vu, Thang and Yoo, Chang D},
journal={arXiv preprint arXiv:2103.08255},
year={2021}
}
This work was partly supported by Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01396, Development of framework for analyzing, detecting, mitigating of bias in AI model and training data, and No. 2021-0-01381, Development of Causal AI through Video Understanding and Reinforcement Learning).