Balancing policy constraint and ensemble size in uncertainty-based offline reinforcement learning.

This repository contains the code used to produce the results in our paper - https://link.springer.com/article/10.1007/s10994-023-06458-y.

The algorithms used in the work can be found in the folder "Algorithms".

Our work makes use of the D4RL benchmarking suite. Installation instructions can be found here - https://github.com/Farama-Foundation/D4RL. Note that D4RL is based on OpenAI's MuJoCo bindings - https://github.com/openai/mujoco-py (not DeepMind's).
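As a rough illustration of what loading a D4RL dataset typically looks like, here is a minimal sketch. It assumes `gym` and `d4rl` are installed as described above; the helper names `load_d4rl_dataset` and `summarise_dataset` are hypothetical and do not come from this repository, and the task name in the usage comment is only an example of D4RL's naming scheme.

```python
def load_d4rl_dataset(env_name):
    """Return (env, dataset) for a D4RL task name such as "hopper-medium-v2".

    Imports are done lazily so this module can be imported without D4RL.
    """
    import gym
    import d4rl  # importing d4rl registers its environments with gym

    env = gym.make(env_name)
    # qlearning_dataset returns a dict of arrays keyed by
    # observations, actions, next_observations, rewards, terminals.
    dataset = d4rl.qlearning_dataset(env)
    return env, dataset


def summarise_dataset(dataset):
    """Map each dataset key to its number of transitions (hypothetical helper)."""
    return {key: len(values) for key, values in dataset.items()}


# Example usage (requires a working D4RL installation):
# env, dataset = load_d4rl_dataset("hopper-medium-v2")
# print(summarise_dataset(dataset))
```

A quick summary like this is a cheap way to confirm that the installation works before starting a long training run.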

Offline reinforcement learning

We provide individual examples trained on the D4RL datasets, one for each domain (MuJoCo, Maze2d, AntMaze, Adroit). To train on a different dataset, simply replace the dataset name under the "Load environment" section of the code. Note that agents are evaluated every 10,000 gradient updates only to track progress and to check that the code runs correctly. In our paper, all agents are trained for 1M gradient updates and the policy at the last iteration is used for evaluation.
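The evaluation schedule described above can be sketched as follows. This is a simplified illustration, not the repository's training loop: `update_fn` and `eval_fn` are hypothetical stand-ins for one gradient update and one policy evaluation, and the function name is an assumption.

```python
def train_with_periodic_eval(update_fn, eval_fn,
                             total_updates=1_000_000, eval_every=10_000):
    """Run `total_updates` gradient updates, evaluating every `eval_every`.

    Returns a list of (step, score) pairs. The periodic scores are for
    monitoring only; the score at the final step is the one to report.
    """
    history = []
    for step in range(1, total_updates + 1):
        update_fn()  # one gradient update (hypothetical callable)
        if step % eval_every == 0:
            history.append((step, eval_fn()))
    return history


# Tiny dry run with dummy callables to check the loop logic.
history = train_with_periodic_eval(lambda: None, lambda: 0.0,
                                   total_updates=50, eval_every=10)
final_step, final_score = history[-1]
```

When `total_updates` is a multiple of `eval_every`, the last entry of `history` corresponds to the policy at the final iteration.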

Online fine-tuning

We provide individual examples, one for each approach (TD3-BC-N and SAC-BC-N). To fine-tune on a different dataset, replace the dataset names under the "Load environment" section of the code. Remember to apply any data transformations to newly acquired interactions (e.g. state normalisation, reward scaling).
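To make the point about data transformations concrete, here is a minimal sketch of applying offline statistics to a new online transition. It assumes states were normalised with the mean and standard deviation of the offline dataset and that rewards were transformed affinely; the function names, the `eps` value, and the `reward_scale` / `reward_shift` parameters are illustrative assumptions rather than the repository's settings.

```python
import numpy as np


def fit_state_normaliser(offline_states, eps=1e-3):
    """Compute per-dimension mean and std from the offline states."""
    mean = offline_states.mean(axis=0)
    std = offline_states.std(axis=0) + eps  # eps avoids division by zero
    return mean, std


def transform_transition(state, action, reward, next_state, done,
                         mean, std, reward_scale=1.0, reward_shift=0.0):
    """Apply the offline transformations to one newly collected transition."""
    return (
        (state - mean) / std,
        action,
        reward_scale * reward + reward_shift,
        (next_state - mean) / std,
        done,
    )


# The statistics come from the offline data and stay fixed during fine-tuning.
offline_states = np.array([[0.0, 0.0], [2.0, 4.0]])
mean, std = fit_state_normaliser(offline_states)
transition = transform_transition(
    np.array([1.0, 2.0]), np.array([0.5]), 2.0, np.array([2.0, 4.0]), False,
    mean, std, reward_scale=0.5, reward_shift=-1.0,
)
```

Reusing the offline statistics, rather than recomputing them from online data, keeps the inputs seen by the networks on the same scale as during offline training.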

Computational efficiency

We provide examples that measure the computation time of 10,000 gradient updates for TD3-BC-N and SAC-BC-N. Other algorithms can be tested by simply amending the import.
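A timing measurement of this kind can be sketched with the standard library alone. This is a minimal illustration under assumptions: `update_fn` is a hypothetical stand-in for one gradient update, and the function name is not taken from the speed-test scripts.

```python
import time


def time_updates(update_fn, n_updates=10_000):
    """Return the wall-clock seconds taken by `n_updates` calls to `update_fn`.

    Note: if updates run on a GPU, the device should be synchronised
    before reading the clock, otherwise queued kernels are not counted.
    """
    start = time.perf_counter()
    for _ in range(n_updates):
        update_fn()  # one gradient update (hypothetical callable)
    return time.perf_counter() - start


# Dry run with a dummy update to check the harness itself.
elapsed = time_updates(lambda: None, n_updates=1_000)
```

Because the agent is passed in as a callable, comparing algorithms only requires swapping the update function, which mirrors the idea of amending the import.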

Feedback

If you experience any problems or have any queries, please raise an issue or pull request.

