The Fast Safe Reinforcement Learning (FSRL) package provides modularized implementations of Safe RL algorithms based on PyTorch and the Tianshou framework. Safe RL is a rapidly evolving subfield of RL, focusing on ensuring the safety of learning agents during the training and deployment process. The study of Safe RL is essential because it addresses the critical challenge of preventing unintended or harmful actions while still optimizing an agent's performance in complex environments.
This project offers high-quality and fast implementations of popular Safe RL algorithms, serving as an ideal starting point for those looking to explore and experiment in this field. By providing a comprehensive and accessible toolkit, the FSRL package aims to accelerate research in this crucial area and contribute to the development of safer and more reliable RL-powered systems. Your feedback and contributions are highly appreciated, as they help us improve the FSRL package.
To learn more, please visit our project website. If you find this code useful, please cite our paper, which has been accepted by the DMLR journal:
@article{
liu2024offlinesaferl,
title={Datasets and Benchmarks for Offline Safe Reinforcement Learning},
author={Zuxin Liu and Zijian Guo and Haohong Lin and Yihang Yao and Jiacheng Zhu and Zhepeng Cen and Hanjiang Hu and Wenhao Yu and Tingnan Zhang and Jie Tan and Ding Zhao},
journal={Journal of Data-centric Machine Learning Research},
year={2024}
}
FSRL is designed with several key aspects in mind:
- High-quality implementations. For instance, the CPO implementation by SafetyGym fails to satisfy constraints according to their benchmark results. As a result, many safe RL papers that adopt that implementation may also report constraint violations for CPO. However, we found that with appropriate hyper-parameters and our implementation, CPO can achieve good safety performance in most tasks.
- Fast training speed. FSRL aims to accelerate experimentation and benchmarking by providing fast training times for popular safe RL tasks. For example, most algorithms can solve the SafetyCarCircle-v0 task in 10 minutes with 4 CPUs. The CVPO implementation also trains 5x faster than the original implementation.
- Well-tuned hyper-parameters. We carefully studied the effects of key hyper-parameters for these algorithms and plan to provide a practical guide for tuning them. We believe both implementations and hyper-parameters play a crucial role in learning a successful safe RL agent.
- Modular design and easy usability. FSRL is built upon the elegant RL framework Tianshou. We provide an agent wrapper, refactored loggers for both Tensorboard and Wandb, and pyrallis configuration support to further facilitate usage. Our algorithms also support multiple constraints and standard RL tasks (like Mujoco).
The implemented safe RL algorithms include:
Algorithm | Type | Description |
---|---|---|
CPO | on-policy | Constrained Policy Optimization |
FOCOPS | on-policy | First Order Constrained Optimization in Policy Space |
PPOLagrangian | on-policy | PPO with PID Lagrangian |
TRPOLagrangian | on-policy | TRPO with PID Lagrangian |
DDPGLagrangian | off-on-policy (1) | DDPG with PID Lagrangian |
SACLagrangian | off-on-policy (1) | SAC with PID Lagrangian |
CVPO | off-policy | Constrained Variational Policy Optimization |
(1): Off-on-policy means that the base learning algorithm is off-policy, but the Lagrange multiplier is updated in an on-policy fashion. Our previous findings suggest that an off-policy style Lagrange multiplier update may result in poor performance.
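To make the "PID Lagrangian" entries in the table more concrete, here is a minimal sketch of a PID-controlled Lagrange multiplier. The class name, gain values, and exact update rule are assumptions chosen for illustration and are not taken from FSRL's code; the multiplier is updated from the episode cost observed under the current policy, which is the "on-policy fashion" the footnote refers to.

```python
from dataclasses import dataclass


@dataclass
class PIDLagrangeMultiplier:
    """Illustrative PID controller for a Lagrange multiplier (hypothetical, not FSRL's class)."""

    cost_limit: float
    kp: float = 0.05   # proportional gain (illustrative value)
    ki: float = 0.0005  # integral gain (illustrative value)
    kd: float = 0.1    # derivative gain (illustrative value)
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, episode_cost: float) -> float:
        """Return the new multiplier given the latest measured episode cost."""
        # Positive error means the constraint is currently violated.
        error = episode_cost - self.cost_limit
        # Accumulate the violation, but never let the integral go negative.
        self.integral = max(0.0, self.integral + error)
        # Only react to increases in the violation.
        derivative = max(0.0, error - self.prev_error)
        self.prev_error = error
        # The multiplier itself must stay non-negative.
        return max(
            0.0,
            self.kp * error + self.ki * self.integral + self.kd * derivative,
        )


multiplier = PIDLagrangeMultiplier(cost_limit=10.0, kp=1.0, ki=0.1, kd=0.0)
lam_violating = multiplier.update(episode_cost=20.0)  # cost above the limit
lam_safe = multiplier.update(episode_cost=5.0)        # cost below the limit
print(lam_violating, lam_safe)
```

In a Lagrangian method, the returned multiplier would weight the cost term in the policy loss, so a larger multiplier pushes the policy toward lower cost.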
The implemented algorithms are well-tuned for many tasks in the following safe RL environments, which cover the majority of tasks in recent safe RL papers:
- BulletSafetyGym. FSRL installs this environment by default as the testing ground.
- SafetyGymnasium. Note that you need to install it from source because our current version adopts the `gymnasium` API.
Note that the latest versions of FSRL and the above environments use the `gymnasium >= 0.26.3` API. If you want to use an environment built on the old `gym` API, such as `safety_gym`, you can simply change the import in the example scripts from `import gymnasium as gym` to `import gym`.
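If you step environments directly in your own code, keep in mind that the two APIs differ in what `step` returns: `gymnasium` returns a 5-tuple with separate `terminated` and `truncated` flags, while the old `gym` API returns a 4-tuple with a single `done` flag. The helper below is a hypothetical sketch (the name `unpack_step` is not part of FSRL, gym, or gymnasium) that normalizes both shapes, assuming the old API marks time-limit truncation with the `TimeLimit.truncated` info key.

```python
from typing import Any, Dict, Tuple


def unpack_step(result: tuple) -> Tuple[Any, float, bool, bool, Dict[str, Any]]:
    """Normalize a step() result to (obs, reward, terminated, truncated, info)."""
    if len(result) == 5:
        # gymnasium-style result: already in the target shape.
        obs, reward, terminated, truncated, info = result
        return obs, reward, terminated, truncated, info
    if len(result) == 4:
        # Old gym-style result: split `done` into terminated / truncated.
        obs, reward, done, info = result
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = bool(done) and not truncated
        return obs, reward, terminated, truncated, info
    raise ValueError(f"unexpected step() result of length {len(result)}")


# Old-style result where the episode ended because of the time limit.
old_style = ("obs", 1.0, True, {"TimeLimit.truncated": True})
# New-style result where the episode ended in a terminal state.
new_style = ("obs", 1.0, True, False, {})
print(unpack_step(old_style))
print(unpack_step(new_style))
```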
The tutorials and API documentation are hosted on fsrl.readthedocs.io.
The majority of the API design in FSRL follows Tianshou, and we aim to reuse their modules as much as possible. For example, the Env, Batch, Buffer, and (most) Net modules are used directly in our repo. This means that you can refer to their comprehensive documentation to gain a good understanding of the code structure. We highly recommend you read the following Tianshou tutorials:
- Get Started with Jupyter Notebook. You can get a quick overview of different modules through this tutorial.
- Basic concepts in Tianshou. Note that the basic concepts in FSRL are almost the same as Tianshou.
- Understanding Batch. Note that the Batch data structure is extensively used in this repo.
We observe that for most existing safe RL environments, a few layers of neural networks can solve them quite effectively. Therefore, we provide an 'Agent' class with default MLP networks to facilitate usage. You can refer to the tutorial for more details.
Example training and evaluation scripts for both the default MLP agent and customized networks are available in the examples folder.
FSRL requires Python >= 3.8. You can install it from source by:
git clone https://github.com/liuzuxin/fsrl.git
cd fsrl
pip install -e .
You can also directly install it with pip through GitHub:
pip install git+https://github.com/liuzuxin/fsrl.git@main --upgrade
You can check whether the installation is successful by:
import fsrl
print(fsrl.__version__)
This is an example of training a PPO-Lagrangian agent with a Tensorboard logger and default parameters.
First, import relevant packages:
import bullet_safety_gym
import gymnasium as gym
from tianshou.env import DummyVectorEnv
from fsrl.agent import PPOLagAgent
from fsrl.utils import TensorboardLogger
Then initialize the environment, logger, and agent:
task = "SafetyCarCircle-v0"
# init logger
logger = TensorboardLogger("logs", log_txt=True, name=task)
# init the PPO Lag agent with default parameters
agent = PPOLagAgent(gym.make(task), logger)
# init the envs
training_num, testing_num = 10, 1
train_envs = DummyVectorEnv([lambda: gym.make(task) for _ in range(training_num)])
test_envs = DummyVectorEnv([lambda: gym.make(task) for _ in range(testing_num)])
Finally, start training:
agent.learn(train_envs, test_envs, epoch=100)
You can check the experiment results in the `logs/SafetyCarCircle-v0` folder.
We provide easy-to-use example training scripts for all the agents in the `examples/mlp` folder. Each training script uses the Wandb logger and the Pyrallis configuration system by default. The default hyper-parameters are located in the `fsrl/config` folder.
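You have three alternatives to run the experiment with your customized hyper-parameters: override them on the command line, store them in a YAML configuration file, or inherit the config dataclass in Python. Each is shown in turn below.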
python examples/mlp/train_ppol_agent.py --arg value --arg2 value2 ...
where `--arg` specifies the parameter you want to override, for example `--task SafetyAntRun-v0`. Note that if you specify `--use_default_cfg 1`, the script will automatically use the task's default parameters for training. We plan to release more default configs in the future.
For example, if you want to use a learning rate and number of training epochs different from our defaults, create a `my_cfg.yaml`:
task: "SafetyDroneCircle-v0"
epoch: 500
lr: 0.001
Then you can start training with the above parameters by:
python examples/mlp/train_ppol_agent.py --config my_cfg.yaml
where `--config` specifies the path of the configuration file.
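When a configuration file and command-line overrides are combined, the usual precedence is defaults first, then file values, then command-line values. The sketch below illustrates that layering with plain dataclasses. `DemoCfg`, `resolve_config`, and the default values are hypothetical names and numbers for illustration only; this is neither Pyrallis' parsing code nor FSRL's actual configuration.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict, Optional


@dataclass
class DemoCfg:
    # Hypothetical defaults for illustration, not FSRL's real values.
    task: str = "SafetyCarCircle-v0"
    epoch: int = 100
    lr: float = 0.0005


def resolve_config(
    defaults: DemoCfg,
    file_values: Optional[Dict[str, Any]] = None,
    cli_values: Optional[Dict[str, Any]] = None,
) -> DemoCfg:
    """Apply file values, then command-line values, on top of the defaults."""
    valid = {f.name for f in fields(defaults)}
    cfg = defaults
    # Later layers win: file values override defaults, CLI overrides both.
    for layer in (file_values or {}, cli_values or {}):
        unknown = set(layer) - valid
        if unknown:
            raise KeyError(f"unknown config keys: {sorted(unknown)}")
        cfg = replace(cfg, **layer)
    return cfg


# Values mirroring the my_cfg.yaml example, plus one command-line override.
cfg = resolve_config(
    DemoCfg(),
    file_values={"task": "SafetyDroneCircle-v0", "epoch": 500, "lr": 0.001},
    cli_values={"epoch": 50},
)
print(cfg)
```

Rejecting unknown keys early is a design choice in this sketch: a misspelled hyper-parameter name fails loudly instead of being silently ignored.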
For example, you can inherit the `PPOLagAgent` config by:
from dataclasses import dataclass

import pyrallis

from fsrl.config.ppol_cfg import TrainCfg


@dataclass
class MyCfg(TrainCfg):
    # Override only the fields that differ from the defaults
    task: str = "SafetyDroneCircle-v0"
    epoch: int = 500
    lr: float = 0.001


@pyrallis.wrap()
def train(args: MyCfg):
    ...
Then, you can start training with your own default configs:
python examples/mlp/train_ppol_agent.py
Note that our example scripts support the `auto_name` feature, which automatically compares your specified hyper-parameters with our default ones and creates the experiment name based on the difference. By default, training statistics are saved in the `logs` directory.
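The idea behind naming an experiment from its differences to the defaults can be sketched as follows. The function below and its `key=value` naming format are assumptions for illustration; FSRL's own `auto_name` implementation and output format may differ.

```python
from typing import Any, Dict


def name_from_diff(
    defaults: Dict[str, Any], current: Dict[str, Any], prefix: str = ""
) -> str:
    """Build an experiment name from the hyper-parameters that differ from defaults."""
    # Iterate in sorted order so the same settings always give the same name.
    parts = [
        f"{key}={current[key]}"
        for key in sorted(defaults)
        if key in current and current[key] != defaults[key]
    ]
    name = "-".join(parts) if parts else "default"
    return f"{prefix}-{name}" if prefix else name


# Illustrative default and user-specified values.
defaults = {"task": "SafetyCarCircle-v0", "epoch": 100, "lr": 0.0005}
current = {"task": "SafetyCarCircle-v0", "epoch": 500, "lr": 0.001}
exp_name = name_from_diff(defaults, current, prefix="ppol")
print(exp_name)
```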
While the pre-defined MLP agent is sufficient for solving many existing safe RL benchmarks, more complex tasks may require customized value and policy networks. Our modular design supports Tianshou-style training scripts. Example training scripts can be found in the `examples/customized` folder. For more details on building networks, please refer to Tianshou's tutorial, as our algorithms are mostly compatible with their networks.
To evaluate a trained model, for example a pre-trained PPOLag model in the `logs/exp_name` folder, run:
python examples/mlp/eval_ppol_agent.py --path logs/exp_name --eval_episodes 20
It will load the saved configuration from `logs/exp_name/config.yaml` and the pre-trained model from `logs/exp_name/checkpoint/model.pt`, run 20 episodes, and print the average reward and cost. If the `best` model is saved during training, you can evaluate it by setting `--best 1`.
FSRL is heavily inspired by the Tianshou project. In addition, there are several other remarkable safe RL-related projects:
- Safety-Gymnasium, a well-maintained and customizable suite of safe RL environments based on Mujoco.
- Bullet-Safety-Gym, a tuned and fast suite of safe RL environments based on PyBullet.
- Safe-Multi-Agent-Mujoco, a suite of multi-agent safe RL environments based on Mujoco.
- Safe-Control-Gym, a learning-based control and RL library with PyBullet.
- OmniSafe, a well-maintained infrastructural framework for safe RL algorithms.
- SafePO, another benchmark repository for safe RL algorithms.
The main maintainers of this project are: Zuxin Liu, Zijian Guo.
If you have any suggestions or find any bugs, please feel free to submit an issue or a pull request. We welcome contributions from the community!