A high-performance framework for reinforcement learning research utilizing NVIDIA's Isaac Gym physics simulation engine.
This framework implements state-of-the-art reinforcement learning algorithms in highly parallelized environments, leveraging NVIDIA's Isaac Gym for physics-based simulation. The implementation focuses on maximizing computational efficiency through GPU acceleration and vectorized environments.
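As a rough illustration of what vectorized, GPU-resident simulation means in practice, the sketch below keeps every environment quantity as a batched tensor so that thousands of environments advance in a single batched operation. This is a conceptual Python sketch under assumed names; it is not Isaac Gym's API and not this framework's internal code.

```python
import torch

class BatchedEnvSketch:
    """Conceptual stand-in for a vectorized GPU environment (not the real API)."""

    def __init__(self, num_envs=4096, obs_dim=48, act_dim=12, device="cuda:0"):
        self.device = torch.device(device if torch.cuda.is_available() else "cpu")
        self.obs = torch.zeros(num_envs, obs_dim, device=self.device)
        self.act_dim = act_dim

    def step(self, actions):
        # All environments advance together in batched tensor ops; there is no
        # per-environment Python loop, which is what makes GPU rollouts fast.
        self.obs = self.obs + 0.01 * torch.randn_like(self.obs)
        rewards = -actions.pow(2).sum(dim=-1)  # placeholder reward
        dones = torch.zeros(self.obs.shape[0], dtype=torch.bool, device=self.device)
        return self.obs, rewards, dones

env = BatchedEnvSketch()
actions = torch.zeros(4096, 12, device=env.device)
obs, rewards, dones = env.step(actions)
```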
- Operating System: Ubuntu 18.04 or 20.04 LTS
- NVIDIA GPU with CUDA support
- Python 3.7+
- NVIDIA Isaac Gym (Preview 4)
- PyTorch
- Weights & Biases (for experiment tracking)
- CUDA Toolkit
- Download Isaac Gym Preview 4 from the NVIDIA Developer Portal
- Follow the installation instructions in the Isaac Gym documentation
- Install additional dependencies:
pip install -r requirements.txt
- Proximal Policy Optimization (PPO)
- Soft Actor-Critic (SAC)
- Highly parallelized environment simulation
- GPU-accelerated physics computation
- Integrated experiment tracking via Weights & Biases
- Multi-GPU support for distributed training
PPO Implementation:
python3 train.py task=RLTask train=TaskPPO headless=True wandb_activate=True
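At the core of the PPO update is the clipped surrogate objective. The snippet below is a minimal PyTorch sketch of that loss; the tensor names and the clipping coefficient are illustrative assumptions rather than this repository's internals.

```python
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the updated policy and the behavior policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Pessimistic (min) of the unclipped and clipped terms, negated so that
    # minimizing the loss maximizes the clipped surrogate objective.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```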
SAC Implementation:
python3 train.py task=RLTask train=TaskSAC headless=True wandb_activate=True
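SAC, in contrast, is an off-policy method that maximizes an entropy-regularized return. A hedged sketch of the actor loss is shown below; the critic, replay buffer, and temperature tuning are omitted, and all names are illustrative.

```python
import torch

def sac_actor_loss(log_probs, q_values, alpha=0.2):
    # Minimize E[ alpha * log pi(a|s) - Q(s, a) ]: push the policy toward
    # high-value actions while keeping its entropy (exploration) high.
    return (alpha * log_probs - q_values).mean()
```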
For distributed training across multiple GPUs:
torchrun --standalone --nnodes=1 --nproc_per_node=3 train.py \
multi_gpu=True \
task=RLTask \
train=TaskPPO \
headless=True \
wandb_activate=True
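When launched with torchrun, each worker process receives its rank through environment variables such as LOCAL_RANK, RANK, and WORLD_SIZE. The sketch below shows the typical bootstrap a torchrun-launched script performs; it is a generic PyTorch pattern, not necessarily what train.py does internally.

```python
import os
import torch
import torch.distributed as dist

# torchrun populates these variables for every spawned worker.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

# NCCL is the usual backend for multi-GPU training on NVIDIA hardware.
dist.init_process_group(backend="nccl")

# Each rank would then wrap its model for gradient averaging, e.g.:
# model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```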
To evaluate a trained model:
python3 train.py task=RLTask train=TaskSAC test=True \
checkpoint=runs/TaskSAC_xxx/nn/TaskSAC.pth
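Conceptually, evaluation amounts to loading the saved weights and running the policy without exploration noise. The following is only a sketch: the checkpoint key, network shape, and observation size are assumptions, not this repository's actual checkpoint format.

```python
import torch
import torch.nn as nn

# Hypothetical policy network; the real architecture is defined by the training config.
policy = nn.Sequential(nn.Linear(64, 256), nn.ELU(), nn.Linear(256, 8))

checkpoint = torch.load("runs/TaskSAC_xxx/nn/TaskSAC.pth", map_location="cpu")
policy.load_state_dict(checkpoint["model"])  # "model" key is an assumption
policy.eval()

with torch.no_grad():
    obs = torch.zeros(1, 64)   # placeholder observation batch
    actions = policy(obs)      # deterministic actions for evaluation
```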
A CPU-based implementation using Stable Baselines3 is available in the sb3_implementation directory for systems without NVIDIA GPU support. It preserves the algorithms faithfully, but it does not match the training throughput of the GPU-accelerated version.
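For orientation, a minimal Stable Baselines3 training loop on CPU looks roughly like the example below; the environment id is a placeholder, and the actual entry point lives in the sb3_implementation directory.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Placeholder environment; the repository registers its own task separately.
env = gym.make("Pendulum-v1")

model = PPO("MlpPolicy", env, verbose=1, device="cpu")
model.learn(total_timesteps=100_000)
model.save("ppo_pendulum_cpu")
```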
Our ongoing research focuses on several key areas:
- Policy Optimization: Investigation of advanced policy optimization techniques beyond vanilla implementations
- Experience Replay: Analysis of sophisticated replay buffer mechanisms for off-policy algorithms
- Reward Engineering: Development of more nuanced reward functions to guide policy learning
- Human Feedback Integration: Exploration of RLHF (Reinforcement Learning from Human Feedback) methodologies
This research builds upon the foundational work of the NVIDIA Isaac Gym team. Core functionalities are derived from the Isaac Gym Environments repository.
This project is released under the MIT License. See the LICENSE file for details.