The original repository is under the google-research repo, so this is our "fork."
Very minor changes, mostly to get things working with newer libraries and to make evaluation consistent with our own work. If VPACE is already installed, there is nothing new to install. The only other expected requirement (which should also be set during the VPACE install) is the environment variable VPACE_TOP_DIR.
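If it is not already set from the VPACE install, it can be exported manually (the path below is a placeholder for wherever VPACE is checked out on your machine):

# Hypothetical path; substitute your actual VPACE location.
export VPACE_TOP_DIR=/path/to/vpace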
To make installation a bit more streamlined, we have updated some packages and the requirements.txt file. After entering a conda or virtual env, try:
pip install tf-agents==0.19.0
pip install -r requirements.txt
- This will cause some gym/cloudpickle dependency issues, which can be ignored.
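Despite pip's complaints, the core packages should still import; a quick sanity check:

# If this prints without errors, the dependency warnings above were benign.
python3 -c "import tf_agents, gym; print('imports OK')"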
In addition to the above steps:
pip install tensorflow[and-cuda]
- Since the above will replace the tensorflow version, run
pip install tensorflow==2.15
- Confirm with
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
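If a GPU is visible, this should print something like the following (device index and name may vary):

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]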
First, ensure VPACE_TOP_DIR is set.
To produce sqil-theirs and rce-theirs results in VPACE, we used, e.g.:
bash all_seeds.bash sawyer_drawer_open test 0 true false
See all_seeds.bash for details on the meaning of these arguments.
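The same pattern should apply to the other tasks, e.g. (assuming sawyer_push is handled the same way by all_seeds.bash):

bash all_seeds.bash sawyer_push test 0 true false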
Benjamin Eysenbach, Sergey Levine, Ruslan Salakhutdinov
NeurIPS 2021 (oral)
tldr: In many scenarios, the user is unable to describe the task in words or numbers, but can readily provide examples of what the world would look like if the task were solved. To address this problem of example-based control, we propose an off-policy algorithm, recursive classification of examples (RCE). In contrast to prior approaches, our method does not require expert demonstrations and never learns an explicit reward function. Nonetheless, this method retains many of the theoretical properties of reward-based learning, such as Bellman equations, where the standard reward function term is replaced by data. Experiments show that our approach outperforms prior methods on a range of simulated robotic manipulation tasks.
If you use this code, please consider adding the corresponding citation:
@inproceedings{
eysenbach2021replacing,
title={Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification},
author={Benjamin Eysenbach and Sergey Levine and Ruslan Salakhutdinov},
booktitle={Advances in Neural Information Processing Systems},
year={2021},
}
These instructions were tested in Google Cloud Compute with Ubuntu version 18.04.
Copy your MuJoCo key to ~/.mujoco/mjkey.txt, then complete the steps below:
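For example, assuming the key was downloaded to ~/Downloads (create ~/.mujoco first if it does not exist):

mkdir -p ~/.mujoco
cp ~/Downloads/mjkey.txt ~/.mujoco/mjkey.txt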
sudo apt install unzip gcc libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf
sudo ln -s /usr/lib/x86_64-linux-gnu/libGL.so.1 /usr/lib/x86_64-linux-gnu/libGL.so
wget https://www.roboti.us/download/mujoco200_linux.zip -P /tmp
unzip /tmp/mujoco200_linux.zip -d ~/.mujoco
mv ~/.mujoco/mujoco200_linux ~/.mujoco/mujoco200
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco200/bin" >> ~/.bashrc
wget https://repo.anaconda.com/miniconda/Miniconda2-latest-Linux-x86_64.sh
chmod +x Miniconda2-latest-Linux-x86_64.sh
./Miniconda2-latest-Linux-x86_64.sh
Restart your terminal so the changes take effect.
Updated 12/28/22. Thanks to Kevin Lin and Mingdong Wu for developing these revised instructions.
conda env create -f environment.yml
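Then activate the environment (assuming it is named rce in environment.yml; check the name field there if activation fails):

conda activate rce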
git clone https://github.com/rail-berkeley/d4rl.git
- In d4rl/setup.py, remove the installation requirements of mjrl@master and dm_control@master (for a non-interactive way to do this, see the sketch below).
- In the d4rl repository, run
pip install -e .
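As a non-interactive alternative to editing setup.py by hand, a sed sketch (the patterns are assumptions; verify d4rl/setup.py afterwards), followed by a quick import check:

# Run before `pip install -e .`: drop the mjrl and dm_control git dependencies.
sed -i '/mjrl/d;/dm_control/d' setup.py
# After installation, confirm that d4rl imports cleanly.
python3 -c "import d4rl"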
The following lines replicate the RCE experiments on the Sawyer and Adept tasks using state-based observations. Please see the discussion below for an explanation of what the various command line options do.
python train_eval.py --root_dir=~/rce/sawyer_drawer_open --gin_bindings='train_eval.env_name="sawyer_drawer_open"'
python train_eval.py --root_dir=~/rce/sawyer_push --gin_bindings='train_eval.env_name="sawyer_push"'
python train_eval.py --root_dir=~/rce/sawyer_lift --gin_bindings='train_eval.env_name="sawyer_lift"'
python train_eval.py --root_dir=~/rce/door --gin_bindings='train_eval.env_name="door-human-v0"'
python train_eval.py --root_dir=~/rce/sawyer_box_close --gin_bindings='train_eval.env_name="sawyer_box_close"'
python train_eval.py --root_dir=~/rce/sawyer_bin_picking --gin_bindings='train_eval.env_name="sawyer_bin_picking"' --gin_bindings='critic_loss.q_combinator="max"' --gin_bindings='actor_loss.q_combinator="max"'
python train_eval.py --root_dir=~/rce/hammer --gin_bindings='train_eval.env_name="hammer-human-v0"'
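Training progress can typically be monitored with TensorBoard, assuming the tf-agents summaries are written under the given root_dir:

tensorboard --logdir ~/rce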
To run SQIL, add the following additional arguments:
--gin_bindings='critic_loss.loss_name="q"' --gin_bindings='train_eval.n_step=None'
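For example, a full SQIL run on the drawer task might look like this (the root_dir name is just a suggestion):

python train_eval.py --root_dir=~/rce/sawyer_drawer_open_sqil --gin_bindings='train_eval.env_name="sawyer_drawer_open"' --gin_bindings='critic_loss.loss_name="q"' --gin_bindings='train_eval.n_step=None'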
If you have any questions, comments, or suggestions, please reach out to Benjamin Eysenbach ([email protected]).