This repository contains my first stab at training a locomotion policy for the Unitree G1 (23-DOF) using Isaac Lab. I also implemented a symmetry loss and removed the phase-based rewards. A first stab at sim2sim deployment using MuJoCo is also included.
The following README was generated by Cursor - please forgive its slop vibe.
g1-23dof-isaac-lab-current.mp4
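The symmetry loss mentioned above can be sketched as a mirror-consistency penalty: the policy's action for a left/right-mirrored observation should be the mirror of its action for the original observation. The index and sign maps below are hypothetical placeholders, not the ones used in this repo; the real maps are robot-specific (left/right joint swaps, sign flips on lateral quantities).

```python
import torch

def symmetry_loss(policy, obs, obs_mirror_idx, act_mirror_idx, act_sign):
    """Mirror-consistency penalty (a sketch, not this repo's exact loss).

    obs_mirror_idx / act_mirror_idx: index permutations that swap
    left/right entries; act_sign: +-1 flips for lateral components.
    """
    actions = policy(obs)
    mirrored_obs = obs[:, obs_mirror_idx]
    mirrored_actions = policy(mirrored_obs)
    # Mirror the mirrored-obs actions back into the original frame
    # and penalize disagreement with the original actions.
    expected = mirrored_actions[:, act_mirror_idx] * act_sign
    return torch.nn.functional.mse_loss(actions, expected)
```

In training this would typically be added to the PPO surrogate loss with a small weight, alongside the value and entropy terms.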
```bash
# Basic training
python scripts/rsl_rl/train.py --task=Loco --headless

# Play trained policy
python scripts/rsl_rl/play.py --task Loco --num_envs 32

# Run parameter sweep with automatic analysis
python sweep_script_with_analysis.py
```
```
├── scripts/rsl_rl/                  # Training and evaluation scripts
│   ├── train.py                     # Main training script
│   ├── play.py                      # Policy evaluation script
│   └── cli_args.py                  # Command line argument definitions
├── deployment/                      # Sim2sim deployment
│   ├── deploy_sim.py                # MuJoCo deployment script
│   ├── policy.pt                    # Trained policy weights
│   └── g1_description/              # Robot description files
├── source/                          # Isaac Lab environment source
│   └── g1_23dof_locomotion_isaac/
│       └── tasks/manager_based/g1_23dof_locomotion_isaac/
│           ├── g1_23dof_locomotion_isaac_env_cfg.py  # Environment configuration
│           ├── agents/rsl_rl_ppo_cfg.py              # PPO agent configuration
│           └── events.py                             # Domain randomization events
├── sweep_script_with_analysis.py    # Parameter sweep automation
├── sweep_analyzer.py                # Sweep results analysis
└── outputs/                         # Training outputs and logs
```
When running `sweep_script_with_analysis.py`, ensure the experiment name is consistent across:

- `sweep_script_with_analysis.py` (line 8)
- `source/g1_23dof_locomotion_isaac/g1_23dof_locomotion_isaac/tasks/manager_based/g1_23dof_locomotion_isaac/agents/rsl_rl_ppo_cfg.py`

Current settings:

- Experiment name: `g1_23dof_sweep_v16`
- Resume from run: 27 (configurable in the sweep script)
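Since the experiment name has to match in two files, a small sanity check can catch mismatches before a sweep launches. This is a sketch, not part of the repo's sweep script; the file paths passed in are up to the caller.

```python
from pathlib import Path

EXPERIMENT_NAME = "g1_23dof_sweep_v16"

def check_experiment_name(paths, name=EXPERIMENT_NAME):
    """Return the files that do NOT mention the expected experiment name."""
    return [p for p in paths if name not in Path(p).read_text()]
```

Running it over the sweep script and the PPO config should return an empty list; any returned path points at a file whose experiment name drifted.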
The active sweep investigates:

- `env.rewards.feet_air_time.weight`: [1.0, 3.0, 5.0]
- `env.rewards.both_feet_air.weight`: [-0.1, -0.5, 0.0]
- `env.rewards.action_rate_l2.weight`: [-0.005, -0.01]
- `env.rewards.joint_deviation_arms.weight`: [-0.1, -0.2]
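A grid over these reward weights can be enumerated with a Cartesian product. The dictionary below just restates the values listed above; the helper is a sketch of how such a grid could be expanded into per-run overrides, not necessarily how `sweep_script_with_analysis.py` does it.

```python
from itertools import product

# Reward-weight grid, matching the values listed above.
SWEEP = {
    "env.rewards.feet_air_time.weight": [1.0, 3.0, 5.0],
    "env.rewards.both_feet_air.weight": [-0.1, -0.5, 0.0],
    "env.rewards.action_rate_l2.weight": [-0.005, -0.01],
    "env.rewards.joint_deviation_arms.weight": [-0.1, -0.2],
}

def sweep_configs(sweep):
    """Yield one {param: value} dict per point in the Cartesian grid."""
    keys = list(sweep)
    for values in product(*sweep.values()):
        yield dict(zip(keys, values))

configs = list(sweep_configs(SWEEP))
# 3 * 3 * 2 * 2 = 36 configurations
```

Each dict can then be forwarded to `train.py` as Hydra-style `key=value` overrides, one training run per configuration.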
The `deployment/` folder contains MuJoCo-based deployment scripts:

```bash
cd deployment
python deploy_sim.py
```
- WASD: Forward/backward/left/right movement
- Q/E: Yaw rotation
- Arrow Keys: Alternative control scheme
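The keyboard scheme above amounts to mapping keys to velocity-command deltas. The mapping below is a hypothetical sketch with made-up magnitudes, not the values used in `deploy_sim.py`.

```python
# Hypothetical key -> (vx, vy, yaw_rate) contributions mirroring the
# WASD/Q/E scheme described above; magnitudes are illustrative only.
KEY_COMMANDS = {
    "w": (0.5, 0.0, 0.0),   # forward
    "s": (-0.5, 0.0, 0.0),  # backward
    "a": (0.0, 0.5, 0.0),   # left
    "d": (0.0, -0.5, 0.0),  # right
    "q": (0.0, 0.0, 0.5),   # yaw left
    "e": (0.0, 0.0, -0.5),  # yaw right
}

def command_from_keys(pressed):
    """Sum the contributions of all currently pressed keys."""
    vx = vy = wz = 0.0
    for key in pressed:
        dvx, dvy, dwz = KEY_COMMANDS.get(key, (0.0, 0.0, 0.0))
        vx += dvx
        vy += dvy
        wz += dwz
    return vx, vy, wz
```

The resulting `(vx, vy, yaw_rate)` tuple is what the locomotion policy consumes as its command observation each control step.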
- Domain Randomization: Current sim2sim performance is poor - need to significantly expand randomization parameters (friction, mass, motor gains, terrain properties) to improve transfer robustness
- Real-time Policy Control: Need to implement real-time policy switching/control in Isaac Sim to verify if poor performance is due to sim2sim gap vs human control limitations
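Expanding the domain randomization largely means widening the per-episode sampling ranges for physical parameters. The parameter names and ranges below are illustrative guesses, not values from `events.py`; they only sketch the shape such a config could take.

```python
import random

# Hypothetical randomization ranges to expand, per the note above.
# Names and bounds are illustrative, not taken from events.py.
RANDOMIZATION_RANGES = {
    "static_friction": (0.3, 1.2),
    "base_mass_offset_kg": (-1.0, 3.0),
    "motor_kp_scale": (0.8, 1.2),
    "motor_kd_scale": (0.8, 1.2),
}

def sample_randomization(ranges, rng=random):
    """Draw one uniform sample per parameter for a new episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
```

In Isaac Lab this sampling would live in the event terms configured in `events.py`, applied on reset (and optionally at intervals) across the parallel environments.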
This project is based on the Isaac Lab framework and follows the same licensing terms.
Note: This is a work in progress. The current implementation focuses on symmetry-based learning approaches and requires further domain randomization work for robust sim2sim deployment.