About

This is the official codebase for the paper 'A dynamical clipping approach with task feedback for Proximal Policy Optimization' (paper_link). The latest arXiv version will be released soon.

In this research, we treat the selection of clipping bounds as a multi-armed bandit problem. We solve it by introducing the Upper Confidence Bound (UCB): in each iteration, the clipping bound with the highest UCB value is recommended.
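
The idea can be illustrated with a minimal sketch of a standard UCB1 bandit over a fixed set of candidate clipping bounds. This is not the repository's implementation: the class name `UCBClipSelector`, the candidate values, the exploration coefficient `c`, and the simulated feedback in `run_demo` are all assumptions made for illustration. In real training, the feedback would be some task signal (for example, a normalized episode return) measured after an update that used the selected bound.

```python
import math
import random


class UCBClipSelector:
    """Illustrative UCB1 bandit; each arm is one candidate clipping bound."""

    def __init__(self, candidates, c=1.0):
        self.candidates = list(candidates)        # hypothetical clipping bounds
        self.c = c                                # exploration coefficient (assumed)
        self.counts = [0] * len(self.candidates)  # times each arm was selected
        self.values = [0.0] * len(self.candidates)  # running mean feedback per arm
        self.total = 0                            # total number of selections

    def select(self):
        # Try every arm once before applying the UCB formula.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        scores = [
            v + self.c * math.sqrt(2.0 * math.log(self.total) / n)
            for v, n in zip(self.values, self.counts)
        ]
        # Recommend the arm with the highest UCB value.
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, feedback):
        # Incremental mean update with the observed task feedback.
        self.counts[arm] += 1
        self.total += 1
        self.values[arm] += (feedback - self.values[arm]) / self.counts[arm]


def run_demo(seed=0, iterations=200):
    """Simulate noisy feedback; the per-arm means below are made up."""
    rng = random.Random(seed)
    selector = UCBClipSelector([0.1, 0.2, 0.3], c=0.5)
    true_means = [0.2, 0.8, 0.4]  # hypothetical mean feedback for each bound
    for _ in range(iterations):
        arm = selector.select()
        feedback = true_means[arm] + rng.gauss(0.0, 0.05)
        selector.update(arm, feedback)
    return selector


if __name__ == "__main__":
    s = run_demo()
    best = max(range(len(s.counts)), key=s.counts.__getitem__)
    print("selection counts:", s.counts)
    print("most selected clipping bound:", s.candidates[best])
```

In a PPO training loop, the bound returned by `select()` would be used as the clipping range for the next policy update (stable_baselines3 exposes this as the `clip_range` argument of `PPO`), and `update()` would be called with the feedback observed afterwards. How this repository wires the two together is not shown here.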

Project Requirement

Requirement
Linux platform
mujoco
stable_baselines3
gym
torch

Running Examples

Citations

If you use our codebase, please cite the following:

@misc{zhang2024dynamicalclippingapproachtask,
      title={A dynamical clipping approach with task feedback for Proximal Policy Optimization}, 
      author={Ziqi Zhang and Jingzehua Xu and Zifeng Zhuang and Hongyin Zhang and Jinxin Liu and Donglin Wang and Shuai Zhang},
      year={2024},
      eprint={2312.07624},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2312.07624}, 
}

Thanks

Our codebase is built upon stable_baselines3 (program_link).
