A repository for conducting multi-agent reinforcement learning experiments.
- Python 3.10.11
- Conda 23.7.2
(Tested using Ubuntu 20.04)
- Create the Conda environment from the `environment.yml` file. This will install all the necessary Python dependencies on your system.

  ```bash
  conda env create -f environment.yml
  ```
- If you desire to run experiments using the SUMO traffic simulator, install SUMO and set the necessary environment variables (the example below sets the `SUMO_HOME` variable in your `~/.bashrc` file).

  ```bash
  sudo add-apt-repository ppa:sumo/stable
  sudo apt-get update
  sudo apt-get install sumo sumo-tools sumo-doc
  echo 'export SUMO_HOME="/usr/share/sumo"' >> ~/.bashrc
  source ~/.bashrc
  ```
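  After installation, a quick way to confirm that SUMO is visible from Python is the minimal sketch below; it only checks the environment variable and the binary location.

  ```python
  import os
  import shutil

  # Confirm SUMO_HOME is set and the sumo binary is discoverable on PATH.
  print("SUMO_HOME:", os.environ.get("SUMO_HOME"))
  print("sumo binary:", shutil.which("sumo"))
  ```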
- Activate the environment.

  ```bash
  conda activate marl-exp
  ```
- Install the `marl_experiments` package.

  ```bash
  pip install -e .
  ```
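  To verify the editable install succeeded, you can import the package from any directory (a minimal check, assuming the package is importable as `marl_experiments`):

  ```python
  # Quick sanity check that the editable install is importable.
  import marl_experiments

  print(marl_experiments)
  ```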
- It is possible to run experiments using command line arguments, but to make it easier to define experiments and maintain traceability for data processing, configurations are defined using configuration files. These files are then passed to the desired module via the command line. For example, to run the actor-critic single-objective learning algorithm:

  ```bash
  python ac-indpendent-SUMO.py -c experiments/sumo-4x4-dqn-independent.config
  ```

  NOTE: some files contain the "SUMO" suffix. This indicates that the file was specifically updated to support the `sumo-rl` environment, but it should still be compatible with other PettingZoo environments as well. Likewise, it should also be possible to use the `sumo-rl` environment with a module that does not contain the "SUMO" suffix. For example, to run the independent DQN with parameter sharing module using a `sumo-rl` configured experiment:

  ```bash
  python dqn-indpendent-ps.py -c experiments/sumo-2x2-ac-independent-ps.config
  ```

  In the future, the "SUMO" suffix will be removed from file names.
- The experiment results are logged in the `csv` directory and named using various elements defined in the configuration file. Additionally, the results can be viewed with TensorBoard (please install TensorBoard to use this feature).

  ```bash
  tensorboard --logdir runs/
  ```
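  For offline analysis, the logged CSV files can also be loaded with pandas. The file name below is hypothetical; actual names are assembled from values in the configuration file.

  ```python
  import pandas as pd

  # Load one experiment's results; replace the file name with a real one from csv/.
  df = pd.read_csv("csv/sumo-4x4-dqn-independent-example.csv")
  print(df.head())
  ```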
Currently, all pre-defined experiment files are configured to utilize the `sumo-rl` environment, but this can be changed by leveraging the `gym-id` configuration parameter.
If you'd like to create your own experiment using a different environment and set of hyperparameters, we recommend using an existing file as a template. See here for examples of experiment configurations.
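A hypothetical sketch of what an experiment configuration might contain is shown below. The exact file format and full set of supported keys should be taken from the existing files under `experiments/`; only `gym-id` and `max-cycles` are keys mentioned in this README, and the layout itself is an assumption.

```
# Hypothetical configuration sketch -- copy a real file from experiments/ instead.
gym-id = sumo-rl
max-cycles = 1000
```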
Coming soon...
- `max-cycles`: TODO: should be set to `sumo-seconds` when it exists. This variable tells the learning algorithm when to reset during training. It is not used for analysis purposes, so online rollouts will run for all `sumo-seconds` even if `max-cycles` is less.
Coming soon...
This repository primarily conducts experiments on environments from the PettingZoo library. The environment, along with its environment-specific configuration parameters, can be specified through configuration arguments.
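For reference, a minimal PettingZoo-style interaction loop looks like the following. This is a sketch using a random policy; the example environment is arbitrary (it requires the `pettingzoo[butterfly]` extra), and the exact reset/step signatures depend on the installed PettingZoo version.

```python
from pettingzoo.butterfly import pistonball_v6

# Create a parallel-API environment and step it with random actions.
env = pistonball_v6.parallel_env()
observations, infos = env.reset(seed=42)
while env.agents:
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```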
sumo-RL is a Python module that wraps the SUMO traffic simulator in a PettingZoo-style API. This allows a SUMO simulation to be created in the same style as any other PettingZoo environment.
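For example, a sumo-rl parallel environment can be constructed roughly as follows; this is a sketch in which the network and route file paths are placeholders, and the argument names should be checked against the installed sumo-rl version. The resulting environment plugs into the same interaction loop shown above.

```python
import sumo_rl

# Build a PettingZoo-style parallel environment around a SUMO network.
env = sumo_rl.parallel_env(
    net_file="nets/4x4/4x4.net.xml",    # placeholder path to a SUMO network file
    route_file="nets/4x4/4x4.rou.xml",  # placeholder path to a route file
    use_gui=False,
    num_seconds=3600,                   # length of the SUMO simulation
)
```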
The SUMO environment uses a default reward structure defined here. Maximizing the change in total waiting times of all vehicles at a given step may not be the most intuitive reward function. While the method of defining the "best" state of traffic may be up for debate, most experiments in this project use the `queue` reward function, which returns the negative number of all vehicles stopped in the environment at the current step. Stopped vehicles are generally bad for traffic flow, so the agents attempt to minimize this value. The wiki has additional notes regarding the sumo-rl environment.
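If you want to adapt this behavior yourself, sumo-rl allows the reward function to be selected when the environment is created. The sketch below assumes the `reward_fn` argument and the `get_total_queued()` helper on the traffic-signal object, so double-check both against the installed sumo-rl version.

```python
import sumo_rl

# A queue-style reward: the negative count of stopped vehicles at the signal.
def queue_reward(traffic_signal):
    return -traffic_signal.get_total_queued()

env = sumo_rl.parallel_env(
    net_file="nets/2x2/2x2.net.xml",    # placeholder paths, as above
    route_file="nets/2x2/2x2.rou.xml",
    reward_fn=queue_reward,             # or the built-in string name "queue"
)
```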
We would like to thank the contributors of CleanRL for the base DQN implementation. We would also like to thank the contributors of PettingZoo and sumo-RL for maintaining MARL environments.