This repository contains reinforcement learning implementations based on PPO (Proximal Policy Optimization), written in PyTorch. Multi-processing is built in: agents are trained with the PAAC (Parallel Advantage Actor-Critic) strategy.
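As a rough illustration of the clipped surrogate objective that PPO optimizes, the sketch below computes the loss for a small batch in plain Python. It is not taken from the scripts in this repository: the function name `ppo_clip_loss`, its arguments, and the default `clip_eps=0.2` are assumptions chosen for the example, and the actual scripts would compute the same quantity on PyTorch tensors.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (a value to minimize) over a batch.

    All names and the default clip_eps are illustrative assumptions,
    not identifiers from this repository.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize, while the objective is maximized.
    return -total / len(advantages)


if __name__ == "__main__":
    print(ppo_clip_loss([0.0, math.log(2.0)], [0.0, 0.0], [1.0, 1.0]))
```

With a positive advantage, a ratio above `1 + clip_eps` stops contributing extra gain, which is what keeps each policy update close to the policy that collected the data.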
- Script : LunarLander_ppo.py
- Environment : LunarLander-v2
- Orange : 8 Process, Blue : 4 Process, Red : 1 Process
- Script : Breakout_ppo.py
- Environment : BreakoutDeterministic-v4
- Red: 8 Process, Blue: 4 Process, Orange: 1 Process
- Script : Breakout_ppo_icm.py
- Environment : BreakoutNoFrameskip-v4 (handled by a custom environment)
- With no environment reward
- Because the key that starts the game is not selected automatically, the curve shows a peak followed by a performance drop.
- Left : Comparison between extrinsic + intrinsic reward (orange) and intrinsic reward only (gray), averaged over three runs
- Right : only intrinsic reward
- 32 process
- Script : MountainCar_ppo_icm.py
- Environment : MountainCar-v0
- With no environment reward
- 32 process
- Script : PushBlock_ppo_icm.py
- Environment : PushBlock
- 32 Environment, PAAC
- Orange : 0.5int + 0.5ext, Blue : only int, Red : only ext
- Reward shaping for a sparse-reward environment : success = 1, otherwise = 0
- The environment does not behave like a sparse-reward environment even when the reward is reduced to two values (0, 1)
- Script : Pyramid_ppo_icm.py
- Environment : Pyramid
- 16 Environment, PAAC
- orange : only ext, blue : 0.01int + 0.99ext
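The weightings listed above (for example `0.5int + 0.5ext` for PushBlock or `0.01int + 0.99ext` for Pyramid) can be read as a per-step weighted sum of the intrinsic (curiosity) reward and the extrinsic (environment) reward. The following is a minimal sketch of that reading, together with the binary success reward used for PushBlock. The helper names `mix_rewards` and `binary_reward` and their parameters are hypothetical and do not come from the scripts.

```python
def mix_rewards(extrinsic, intrinsic, ext_coef=0.5, int_coef=0.5):
    """Combine per-step extrinsic and intrinsic rewards as a weighted sum.

    Hypothetical helper: the names and defaults are assumptions for this sketch.
    """
    return [ext_coef * e + int_coef * i for e, i in zip(extrinsic, intrinsic)]


def binary_reward(success):
    """Two-valued reward: 1 on success, 0 otherwise."""
    return 1.0 if success else 0.0


if __name__ == "__main__":
    ext = [binary_reward(False), binary_reward(True)]
    intr = [0.2, 0.4]
    print(mix_rewards(ext, intr))                              # 0.5int + 0.5ext
    print(mix_rewards(ext, intr, ext_coef=0.99, int_coef=0.01))  # 0.01int + 0.99ext
    print(mix_rewards(ext, intr, ext_coef=0.0, int_coef=1.0))    # only int
```

Setting one coefficient to zero reproduces the "only int" and "only ext" runs, so a single function covers every configuration compared in the plots.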
[1] mario_rl
[2] Proximal Policy Optimization
[3] Efficient Parallel Methods for Deep Reinforcement Learning
[4] High-Dimensional Continuous Control Using Generalized Advantage Estimation
[5] Curiosity-driven Exploration by Self-supervised Prediction
[6] Large-Scale Study of Curiosity-Driven Learning
[7] curiosity-driven-exploration-pytorch
[8] ml-agents