GitHub - shushushulian/ElegantRL: Scalable and Elastic Deep Reinforcement Learning Using PyTorch. Please star. 🔥

ElegantRL “小雅”: Scalable and Elastic Deep Reinforcement Learning

ElegantRL is developed for researchers and practitioners and offers the following advantages:

Lightweight: the core code is fewer than 1,000 lines (see elegantrl/tutorial) and uses PyTorch (training), OpenAI Gym (environments), NumPy, and Matplotlib (plotting).
Efficient: in many test cases, we find it more efficient than Ray RLlib.
Stable: we find it much more stable than Stable Baselines 3. Stable Baselines 3 can only use a single GPU, whereas ElegantRL can use 1~8 GPUs for stable training.

ElegantRL implements the following model-free deep reinforcement learning (DRL) algorithms:

DDPG, TD3, SAC, PPO, PPO (GAE), REDQ for continuous actions
DQN, DoubleDQN, D3QN, SAC for discrete actions
QMIX, VDN; MADDPG, MAPPO, MATD3 for multi-agent environments

For the details of DRL algorithms, please check out the educational webpage OpenAI Spinning Up.

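The name of our library, “小雅” (Xiaoya), comes from the line 「他山之石，可以攻玉」 (“stones from other hills can be used to polish jade”) in the poem “He Ming” (鹤鸣) from the Xiaoya section of the Shijing (诗经, the Classic of Poetry).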

News

[Towardsdatascience] ElegantRL: A Lightweight and Stable Deep Reinforcement Learning Library
[Towardsdatascience] ElegantRL: Mastering PPO Algorithms
[MLearning.ai] ElegantRL Demo: Stock Trading Using DDPG (Part I)
[MLearning.ai] ElegantRL Demo: Stock Trading Using DDPG (Part II)

Framework (Helloworld folder)

An agent (agent.py) with Actor-Critic networks (net.py) is trained (run.py) by interacting with an environment (env.py).

A high-level overview:

1). Instantiate an environment in env.py, and an agent in agent.py with an Actor network and a Critic network from net.py;
2). In each training step in run.py, the agent interacts with the environment, generating transitions that are stored in a Replay Buffer;
3). The agent fetches a batch of transitions from the Replay Buffer to train its networks;
4). After each update, an evaluator evaluates the agent's performance (e.g., fitness score or cumulative return) and saves the agent if the performance is good.
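
Steps 2) and 3) rely on a replay buffer that stores transitions and hands back random batches. The following is a minimal sketch of that idea using only the Python standard library; the class name `SimpleReplayBuffer`, the `Transition` field layout, and the method names are assumptions for illustration and are not ElegantRL's own `ReplayBuffer` API.

```python
import random
from collections import deque, namedtuple

# Hypothetical transition layout; ElegantRL's own buffer may store different fields.
Transition = namedtuple("Transition", ["state", "action", "reward", "done", "next_state"])


class SimpleReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform random sampling."""

    def __init__(self, max_size):
        # deque(maxlen=...) drops the oldest transition once the buffer is full.
        self.memory = deque(maxlen=max_size)

    def append(self, state, action, reward, done, next_state):
        self.memory.append(Transition(state, action, reward, done, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement within a single batch.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = SimpleReplayBuffer(max_size=3)
for step in range(5):
    # Dummy integer states stand in for real environment observations.
    buffer.append(state=step, action=0, reward=1.0, done=False, next_state=step + 1)

batch = buffer.sample(batch_size=2)
print(len(buffer), [t.state for t in batch])
```

Because the capacity is 3, only the three most recent transitions survive, so every sampled state comes from steps 2 to 4.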

Code Structure

Core Codes

elegantrl/agents/net.py # Neural networks.
- Q-Net,
- Actor network,
- Critic network,
elegantrl/agents/Agent___.py # RL algorithms.
- AgentBase,
elegantrl/train/run___.py # run DEMO 1 ~ 4
- Parameter initialization,
- Training loop,
- Evaluator.

Utility Codes

elegantrl/envs/ # gym env or custom env, including FinanceStockEnv.
- gym_utils.py: A PreprocessEnv class for gym-environment modification.
- Stock_Trading_Env: A self-created stock trading environment as an example for user customization.
eRL_demo_BipedalWalker.ipynb # BipedalWalker-v2 in a Jupyter notebook
eRL_demos.ipynb # Demos 1~4 in a Jupyter notebook. Shows how to use the tutorial version and the advanced version.
eRL_demo_SingleFilePPO.py # Trains PPO from a single file; simpler than the tutorial version
eRL_demo_StockTrading.py # Stock Trading Application in Jupyter notebooks

Start to Train

Initialization:

hyper-parameters args.
env = PreprocessEnv() : creates an environment (in the OpenAI gym format).
agent = agent.XXX() : creates an agent for a DRL algorithm.
buffer = ReplayBuffer() : stores the transitions.
evaluator = Evaluator() : evaluates and stores the trained model.

Training (a while-loop):

agent.explore_env(…): the agent explores the environment within target steps, generates transitions, and stores them into the ReplayBuffer.
agent.update_net(…): the agent uses a batch from the ReplayBuffer to update the network parameters.
evaluator.evaluate_save(…): evaluates the agent's performance and keeps the trained model with the highest score.

The while-loop terminates when a stopping condition is met, e.g., reaching a target score, reaching the maximum number of steps, or a manual break.
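
The loop and its stopping conditions can be sketched as follows. This is an illustrative skeleton under stated assumptions, not the code in run.py: `StubAgent`, `StubEvaluator`, the `train` function, and all parameter names are hypothetical stand-ins, the "score" is a random number rather than an episode return, and a plain list replaces the ReplayBuffer. Only the method names `explore_env`, `update_net`, and `evaluate_save` are taken from the description above.

```python
import random


class StubAgent:
    """Hypothetical stand-in for an agent; it learns nothing."""

    def explore_env(self, target_step):
        # Pretend to collect `target_step` transitions from the environment.
        return [("state", "action", 0.0)] * target_step

    def update_net(self, transitions):
        # A real agent would run gradient steps on sampled batches here.
        pass


class StubEvaluator:
    """Hypothetical evaluator that remembers the best score seen so far."""

    def __init__(self, rng):
        self.rng = rng
        self.best_score = float("-inf")

    def evaluate_save(self, agent):
        score = self.rng.uniform(0.0, 100.0)  # placeholder for an episode return
        if score > self.best_score:
            self.best_score = score  # a real evaluator would also save the model
        return score


def train(agent, evaluator, target_score, max_total_step, target_step):
    total_step = 0
    buffer = []  # a plain list stands in for the ReplayBuffer
    try:
        while True:
            transitions = agent.explore_env(target_step)
            buffer.extend(transitions)
            total_step += len(transitions)
            agent.update_net(buffer)
            evaluator.evaluate_save(agent)

            if evaluator.best_score >= target_score:
                return "target_score", total_step
            if total_step >= max_total_step:
                return "max_step", total_step
    except KeyboardInterrupt:
        # Ctrl+C acts as the manual break.
        return "manual_break", total_step


reason, steps = train(
    StubAgent(),
    StubEvaluator(random.Random(0)),
    target_score=200.0,  # unreachable here, since stub scores stay within 0..100
    max_total_step=1000,
    target_step=100,
)
print(reason, steps)
```

With an unreachable target score, the run stops on the step budget; lowering `target_score` makes the first condition fire instead.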

Experiments

Experimental Demos

LunarLanderContinuous-v2

BipedalWalkerHardcore-v2

Note: BipedalWalkerHardcore is a difficult task in a continuous action space. Only a few RL implementations can reach the target reward. Check out an experiment video: Crack the BipedalWalkerHardcore-v2 with total reward 310 using IntelAC.

Requirements

Necessary:
| Python 3.6+     |           
| PyTorch 1.6+    |    

Not necessary:
| NumPy 1.18+     | For ReplayBuffer. NumPy will be installed along with PyTorch.
| gym 0.17.0      | For env. Gym provides tutorial environments for DRL training. (env.render() has a bug with gym==0.18 and pyglet==1.6; use gym==0.17.0 and pyglet==1.5 instead.)
| pybullet 2.7+   | For env. We use PyBullet (free) as an alternative to MuJoCo (not free).
| box2d-py 2.3.8  | For gym. Install with pip install Box2D (instead of box2d-py).
| matplotlib 3.2  | For plots.

pip3 install gym==0.17.0 pybullet Box2D matplotlib

To install the StarCraft II environment:
bash ./elegantrl/envs/installsc2.sh
pip install -r sc2_requirements.txt
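
After installing, a quick way to see which of the packages listed above are importable is a check like the one below. It is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, the lists use import names rather than pip names, and the check does not verify version numbers.

```python
import importlib.util


def missing_packages(import_names):
    """Return the names from `import_names` that cannot be found for import."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Import names (not pip names) for the packages in the requirements list.
required = ["torch"]
optional = ["numpy", "gym", "pybullet", "Box2D", "matplotlib"]

print("Missing required packages:", missing_packages(required))
print("Missing optional packages:", missing_packages(optional))
```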

Citation:

To cite this repository:

@misc{erl,
  author = {Liu, Xiao-Yang and Li, Zechu and Wang, Zhaoran and Zheng, Jiahao},
  title = {{ElegantRL}: A Scalable and Elastic Deep Reinforcement Learning Library},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/AI4Finance-Foundation/ElegantRL}},
}
