About

This is the official codebase for the paper 'A dynamical clipping approach with task feedback for Proximal Policy Optimization' (paper_link). The latest arXiv version will be released soon.

In this research, we treat the selection of the clipping bound as a multi-armed bandit problem. We solve it by introducing the Upper Confidence Bound (UCB): in each iteration, the clipping bound with the highest UCB value is recommended.
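The idea above can be illustrated with a minimal sketch of UCB-based selection over a small set of candidate clipping bounds. This is not the repository's implementation: the class name `UCBClipSelector`, the candidate values, the exploration constant `c`, and the `toy_feedback` function are all hypothetical, and the score uses the standard UCB1 formula, which may differ from the exact form used in the paper.

```python
import math


class UCBClipSelector:
    """Hypothetical UCB1 selector over candidate PPO clipping bounds."""

    def __init__(self, candidates, c=1.0):
        self.candidates = list(candidates)  # candidate clipping bounds (arms)
        self.c = c                          # exploration constant (assumed)
        self.counts = [0] * len(self.candidates)    # pulls per arm
        self.values = [0.0] * len(self.candidates)  # running mean feedback per arm
        self.total = 0                      # total pulls across all arms

    def select(self):
        """Return the index of the arm with the highest UCB value."""
        # Try every arm once before applying the UCB formula.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        scores = [
            v + self.c * math.sqrt(2.0 * math.log(self.total) / n)
            for v, n in zip(self.values, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, feedback):
        """Record the feedback observed after training with the chosen arm."""
        self.counts[arm] += 1
        self.total += 1
        # Incremental update of the running mean.
        self.values[arm] += (feedback - self.values[arm]) / self.counts[arm]


def toy_feedback(clip_bound):
    """Stand-in for task feedback; peaks at 0.2 purely for illustration."""
    return 1.0 - abs(clip_bound - 0.2)


selector = UCBClipSelector([0.1, 0.2, 0.3])
for _ in range(100):
    arm = selector.select()
    # In a real training loop, one PPO update would run here with
    # clip range selector.candidates[arm], and the feedback would come
    # from the task (for example, evaluation return).
    selector.update(arm, toy_feedback(selector.candidates[arm]))

best = max(range(len(selector.counts)), key=selector.counts.__getitem__)
print("most selected clipping bound:", selector.candidates[best])
```

Pulling each arm once before scoring avoids a division by zero in the confidence term; after that, the bonus shrinks for frequently chosen bounds, so rarely tried bounds are still revisited occasionally.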

Project Requirement

Requirement
Linux platform
mujoco
stable_baselines3
gym
torch
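As a quick sanity check before running anything, the requirements above can be verified with a short script. This is only a sketch: it assumes the import names match the package names listed (`mujoco`, `stable_baselines3`, `gym`, `torch`), and the helper `missing_packages` is a hypothetical name, not part of this codebase.

```python
import importlib.util
import sys

# Import names assumed to match the requirement list above.
REQUIRED = ["mujoco", "stable_baselines3", "gym", "torch"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


on_linux = sys.platform.startswith("linux")
print("Linux platform:", on_linux)
print("missing packages:", missing_packages(REQUIRED))
```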

Running Examples

Citations

If you use our codebase, please cite the following:

@misc{zhang2024dynamicalclippingapproachtask,
      title={A dynamical clipping approach with task feedback for Proximal Policy Optimization}, 
      author={Ziqi Zhang and Jingzehua Xu and Zifeng Zhuang and Hongyin Zhang and Jinxin Liu and Donglin Wang and Shuai Zhang},
      year={2024},
      eprint={2312.07624},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2312.07624}, 
}

Thanks

Our codebase is built upon stable_baselines3 (program_link).
