A repository for conducting multi-agent reinforcement learning experiments.
- Python 3.10.11
- Conda 23.7.2
(Tested using Ubuntu 20.04)
- Create the Conda environment from the `environment.yml` file. This will install all the necessary Python dependencies on your system.

  ```bash
  conda env create -f environment.yml
  ```
- If you desire to run experiments using the SUMO traffic simulator, install SUMO and set the necessary environment variables (the example below sets the `SUMO_HOME` variable in your `~/.bashrc` file).

  ```bash
  sudo add-apt-repository ppa:sumo/stable
  sudo apt-get update
  sudo apt-get install sumo sumo-tools sumo-doc
  echo 'export SUMO_HOME="/usr/share/sumo"' >> ~/.bashrc
  source ~/.bashrc
  ```
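  After installation, a quick way to confirm that SUMO is visible from Python is the minimal sketch below; it only checks the environment variable and the binary location.

  ```python
  import os
  import shutil

  # Confirm SUMO_HOME is set and the sumo binary is discoverable on PATH.
  print("SUMO_HOME:", os.environ.get("SUMO_HOME"))
  print("sumo binary:", shutil.which("sumo"))
  ```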
- Activate the environment.

  ```bash
  conda activate marl-exp
  ```
- Install the `marl_experiments` package.

  ```bash
  pip install -e .
  ```
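  To verify the editable install succeeded, you can import the package from any directory (a minimal check, assuming the package is importable as `marl_experiments`):

  ```python
  # Quick sanity check that the editable install is importable.
  import marl_experiments

  print(marl_experiments)
  ```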
- It is possible to run experiments using command line arguments, but to make it easier to define experiments and maintain traceability for data processing, configurations are defined using configuration files. These files are then passed to the desired module via the command line. For example, to run the actor-critic single-objective learning algorithm:

  ```bash
  python ac-indpendent-SUMO.py -c experiments/sumo-4x4-dqn-independent.config
  ```

  NOTE: some files contain the "SUMO" suffix. This indicates that the file was specifically updated to support the `sumo-rl` environment, but it should still be compatible with other PettingZoo environments as well. Likewise, it should also be possible to use the `sumo-rl` environment with a module that does not contain the "SUMO" suffix. For example, to run the independent DQN with parameter sharing module using a `sumo-rl` configured experiment:

  ```bash
  python dqn-indpendent-ps.py -c experiments/sumo-2x2-ac-independent-ps.config
  ```

  In the future, the "SUMO" suffix will be removed from file names.
- The experiment results are logged in the `csv` directory and named using various elements defined in the configuration file. Additionally, the results can be viewed with TensorBoard (please install TensorBoard to use this feature).

  ```bash
  tensorboard --logdir runs/
  ```
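  For offline analysis, the logged CSV files can also be loaded with pandas. The file name below is hypothetical; actual names are assembled from values in the configuration file.

  ```python
  import pandas as pd

  # Load one experiment's results; replace the file name with a real one from csv/.
  df = pd.read_csv("csv/sumo-4x4-dqn-independent-example.csv")
  print(df.head())
  ```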
Currently, all pre-defined experiment files are configured to utilize the `sumo-rl` environment, but this can be changed by leveraging the `gym-id` configuration parameter.
If you'd like to create your own experiment using a different environment and set of hyperparameters, we recommend using an existing file as a template. See here for examples of experiment configurations.
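A hypothetical sketch of what an experiment configuration might contain is shown below. The exact file format and full set of supported keys should be taken from the existing files under `experiments/`; only `gym-id` and `max-cycles` are keys mentioned in this README, and the layout itself is an assumption.

```
# Hypothetical configuration sketch -- copy a real file from experiments/ instead.
gym-id = sumo-rl
max-cycles = 1000
```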
Coming soon...
- `max-cycles`: TODO: should be set to `sumo-seconds` when it exists. This variable tells the learning algorithm when to reset during training. It is not used for analysis purposes, so online rollouts will run for all `sumo-seconds` even if `max-cycles` is less.
Coming soon...
This repository primarily conducts experiments on environments from the PettingZoo library. The environment, along with its environment-specific configuration parameters, can be specified through configuration arguments.
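For reference, a minimal PettingZoo-style interaction loop looks like the following. This is a sketch using a random policy; the example environment is arbitrary (it requires the `pettingzoo[butterfly]` extra), and the exact reset/step signatures depend on the installed PettingZoo version.

```python
from pettingzoo.butterfly import pistonball_v6

# Create a parallel-API environment and step it with random actions.
env = pistonball_v6.parallel_env()
observations, infos = env.reset(seed=42)
while env.agents:
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```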
sumo-RL is a Python module that wraps the SUMO traffic simulator in a PettingZoo-style API. This allows a SUMO simulation to be created in the same style as any other PettingZoo environment.
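For example, a sumo-rl parallel environment can be constructed roughly as follows; this is a sketch in which the network and route file paths are placeholders, and the argument names should be checked against the installed sumo-rl version. The resulting environment plugs into the same interaction loop shown above.

```python
import sumo_rl

# Build a PettingZoo-style parallel environment around a SUMO network.
env = sumo_rl.parallel_env(
    net_file="nets/4x4/4x4.net.xml",    # placeholder path to a SUMO network file
    route_file="nets/4x4/4x4.rou.xml",  # placeholder path to a route file
    use_gui=False,
    num_seconds=3600,                   # length of the SUMO simulation
)
```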
The SUMO environment uses a default reward structure defined here. Maximizing the change in total waiting times of all vehicles at a given step may not be the most intuitive reward function. While the method of defining the "best" state of traffic may be up for debate, most experiments in this project use the `queue` reward function, which returns the negative number of all vehicles stopped in the environment at the current step. Stopped vehicles are generally bad for traffic flow, so the agents attempt to minimize this value. The wiki has additional notes regarding the sumo-rl environment.
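If you want to adapt this behavior yourself, sumo-rl allows the reward function to be selected when the environment is created. The sketch below assumes the `reward_fn` argument and the `get_total_queued()` helper on the traffic-signal object, so double-check both against the installed sumo-rl version.

```python
import sumo_rl

# A queue-style reward: the negative count of stopped vehicles at the signal.
def queue_reward(traffic_signal):
    return -traffic_signal.get_total_queued()

env = sumo_rl.parallel_env(
    net_file="nets/2x2/2x2.net.xml",    # placeholder paths, as above
    route_file="nets/2x2/2x2.rou.xml",
    reward_fn=queue_reward,             # or the built-in string name "queue"
)
```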
We would like to thank the contributors of CleanRL for the base DQN implementation. We would also like to thank the contributors of PettingZoo and sumo-RL for maintaining MARL environments.