DSRL (Datasets for Safe Reinforcement Learning) provides a collection of datasets designed specifically for offline safe reinforcement learning (RL). Created to foster progress in offline safe RL research, DSRL fills a gap in the availability of public, safety-centric benchmarks and datasets.
DSRL provides:
- Diverse datasets: 38 datasets spanning different safe RL environments and difficulty levels in SafetyGymnasium, BulletSafetyGym, and MetaDrive, each recording both rewards and safety costs.
- API consistent with D4RL: for easy use and evaluation of offline learning methods.
- Data post-processing filters: for altering data density, noise level, and reward distributions to simulate various data collection conditions.
This package is part of a comprehensive benchmarking suite that also includes FSRL and OSRL, and aims to promote advancements in the development and evaluation of safe learning algorithms.
We provide a detailed breakdown of the datasets, including all the environments we use, the dataset sizes, and the cost-reward-return plot for each dataset. These details can be found in the docs folder.
To learn more, please visit our project website.
DSRL is currently hosted on PyPI; you can install it with:
pip install dsrl
By default, this also installs the bullet-safety-gym and safety-gymnasium environments.
If you want to use the MetaDrive environment, please install it via:
pip install git+https://github.com/HenryLHH/metadrive_clean.git@main
Alternatively, clone this repository and install it from source:
git clone https://github.com/liuzuxin/DSRL.git
cd DSRL
pip install -e .
You can also install the MetaDrive package by specifying the metadrive extra:
pip install -e .[metadrive]
DSRL uses the Gymnasium API. Tasks are created via the gymnasium.make function. Each task is associated with a fixed offline dataset, which can be obtained with the env.get_dataset() method. This method returns a dictionary with:
- observations: An N × obs_dim array of observations.
- next_observations: An N × obs_dim array of next observations.
- actions: An N × act_dim array of actions.
- rewards: An N-dimensional array of rewards.
- costs: An N-dimensional array of costs.
- terminals: An N-dimensional array of episode termination flags. A flag is true when an episode ends due to a termination condition such as falling over.
- timeouts: An N-dimensional array of truncation flags. A flag is true when an episode ends because it reached the maximum episode length.
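Because terminals and timeouts together mark where every episode ends, the flat arrays can be cut back into individual episodes, for example to compute per-episode returns and cost returns. The sketch below assumes the dictionary layout described above, with plain Python lists standing in for the NumPy arrays; split_episodes is a hypothetical helper, not part of DSRL.

```python
from typing import Dict, List, Sequence


def split_episodes(dataset: Dict[str, Sequence]) -> List[Dict[str, list]]:
    """Hypothetical helper: split a flat offline dataset into episodes.

    An episode ends at index i when terminals[i] or timeouts[i] is true.
    Transitions after the last end flag are kept as a final,
    unfinished episode.
    """
    keys = list(dataset.keys())
    n = len(dataset["rewards"])
    episodes: List[Dict[str, list]] = []
    start = 0
    for i in range(n):
        if dataset["terminals"][i] or dataset["timeouts"][i]:
            # Slice every field with the same index range.
            episodes.append({k: list(dataset[k][start:i + 1]) for k in keys})
            start = i + 1
    if start < n:
        episodes.append({k: list(dataset[k][start:n]) for k in keys})
    return episodes


# Toy dataset with N = 5 transitions: one terminated and one timed-out episode.
toy = {
    "observations": [[0.0], [1.0], [2.0], [3.0], [4.0]],
    "next_observations": [[1.0], [2.0], [3.0], [4.0], [5.0]],
    "actions": [[0.1], [0.2], [0.3], [0.4], [0.5]],
    "rewards": [1.0, 1.0, 1.0, 2.0, 2.0],
    "costs": [0.0, 1.0, 0.0, 0.0, 1.0],
    "terminals": [False, False, True, False, False],
    "timeouts": [False, False, False, False, True],
}
episodes = split_episodes(toy)
for ep in episodes:
    # Episode length, undiscounted return, undiscounted cost return.
    print(len(ep["rewards"]), sum(ep["rewards"]), sum(ep["costs"]))
```

The same slicing works unchanged on NumPy arrays, since they support the same indexing.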
Usage is similar to D4RL. Here is an example:
import gymnasium as gym
import dsrl
# Create the environment
env = gym.make('OfflineCarCircle-v0')
# Each task is associated with a dataset
# dataset contains observations, next_observations, actions, rewards, costs, terminals, timeouts
dataset = env.get_dataset()
print(dataset['observations']) # An N x obs_dim Numpy array of observations
# DSRL follows the Gymnasium interface
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
cost = info["cost"]  # the per-step safety cost is reported in info
# Apply dataset filters [optional]
# dataset = env.pre_process_data(dataset, filter_cfgs)
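The commented-out pre_process_data call applies DSRL's own post-processing filters, whose filter_cfgs format is not described here. Purely to illustrate what "altering data density and noise level" means, the following is a hypothetical sketch that is not DSRL's implementation: subsample_and_perturb keeps a random fraction of transitions and adds Gaussian noise to the actions.

```python
import random
from typing import Dict, List, Sequence


def subsample_and_perturb(dataset: Dict[str, Sequence], keep_ratio: float,
                          action_noise_std: float, seed: int = 0) -> Dict[str, list]:
    """Hypothetical filter (not DSRL's pre_process_data).

    Keeps a random fraction of transitions (lower data density) and adds
    zero-mean Gaussian noise to every action dimension (higher noise level).
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if action_noise_std < 0.0:
        raise ValueError("action_noise_std must be non-negative")
    rng = random.Random(seed)
    n = len(dataset["rewards"])
    n_keep = max(1, round(keep_ratio * n))
    # Sample indices without replacement, then sort so kept transitions stay in order.
    idx: List[int] = sorted(rng.sample(range(n), n_keep))
    out = {key: [values[i] for i in idx] for key, values in dataset.items()}
    out["actions"] = [
        [a + rng.gauss(0.0, action_noise_std) for a in action]
        for action in out["actions"]
    ]
    return out


toy = {
    "observations": [[0.0], [1.0], [2.0], [3.0]],
    "actions": [[0.1, -0.1], [0.2, -0.2], [0.3, -0.3], [0.4, -0.4]],
    "rewards": [1.0, 2.0, 3.0, 4.0],
    "costs": [0.0, 1.0, 0.0, 1.0],
}
filtered = subsample_and_perturb(toy, keep_ratio=0.5, action_noise_std=0.05)
print(len(filtered["rewards"]), filtered["rewards"])
```

Dropping individual transitions breaks episode boundaries, so a filter that selects or discards whole trajectories is usually preferable when episode-level quantities such as cost returns matter.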
Datasets are automatically downloaded to the ~/.dsrl/datasets directory when get_dataset() is called. To change the location of this directory, set the $DSRL_DATASET_DIR environment variable to the directory of your choosing, or pass the dataset file path directly to the get_dataset method.
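The lookup order described above (environment variable first, then the default directory) can be mirrored in a few lines, for example to check where datasets will end up before downloading. This is a sketch of the documented behaviour only, not DSRL's code; resolve_dataset_dir is a hypothetical helper, and treating an empty $DSRL_DATASET_DIR as unset is an assumption.

```python
import os
from typing import Mapping, Optional

# Default location documented above.
DEFAULT_DATASET_DIR = os.path.join("~", ".dsrl", "datasets")


def resolve_dataset_dir(environ: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: return the directory datasets would be stored in.

    Uses $DSRL_DATASET_DIR when it is set to a non-empty value (treating an
    empty value as unset is an assumption), otherwise ~/.dsrl/datasets.
    """
    environ = os.environ if environ is None else environ
    chosen = environ.get("DSRL_DATASET_DIR") or DEFAULT_DATASET_DIR
    return os.path.abspath(os.path.expanduser(chosen))


print(resolve_dataset_dir({}))
print(resolve_dataset_dir({"DSRL_DATASET_DIR": "/data/dsrl"}))
```

Passing a mapping instead of reading os.environ directly keeps the helper easy to test; calling it with no argument uses the real process environment.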
You can run the following example scripts to explore the offline datasets of all supported environments:
python examples/run_mujoco.py --agent [your_agent] --task [your_task]
python examples/run_bullet.py --agent [your_agent] --task [your_task]
python examples/run_metadrive.py --road [your_road] --traffic [your_traffic]
- Set the target cost with the env.set_target_cost(target_cost) function, where target_cost is the undiscounted sum of costs of an episode.
- Use the env.get_normalized_score(return, cost_return) function to compute the normalized reward and cost of an episode, where return and cost_return are the undiscounted sums of rewards and costs, respectively, of that episode.
- The individual min and max reference returns are stored in dsrl/infos.py for reference.
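To make these quantities concrete, here is a hedged sketch of computing an episode's undiscounted return and cost return and normalizing them. It assumes min-max scaling of the return against the min and max reference returns, and a cost return divided by the target cost; we believe this matches get_normalized_score, but please check the DSRL source before relying on it. The helper names and the reference values below are hypothetical and are not taken from dsrl/infos.py.

```python
from typing import Sequence, Tuple


def episode_returns(rewards: Sequence[float], costs: Sequence[float]) -> Tuple[float, float]:
    """Undiscounted sums of rewards and costs over one episode."""
    return float(sum(rewards)), float(sum(costs))


def normalized_score(ret: float, cost_ret: float, min_ret: float, max_ret: float,
                     target_cost: float) -> Tuple[float, float]:
    """Assumed normalization: min-max scaling of the return,
    and the cost return divided by the target cost."""
    if max_ret == min_ret:
        raise ValueError("max_ret and min_ret must differ")
    if target_cost <= 0:
        raise ValueError("target_cost must be positive")
    return (ret - min_ret) / (max_ret - min_ret), cost_ret / target_cost


# Hypothetical episode and reference values, for illustration only.
ret, cost_ret = episode_returns([1.0, 2.0, 3.0, 4.0], [0.0, 5.0, 0.0, 5.0])
print(normalized_score(ret, cost_ret, min_ret=0.0, max_ret=20.0, target_cost=20.0))
```

Under this normalization, a normalized cost at or below 1 means the episode's cost return stayed within the target cost.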
All datasets are licensed under the Creative Commons Attribution 4.0 License (CC BY), and code is licensed under the Apache 2.0 License.