The repository contains the code for the paper:
Optimizing Pretrained Transformers for Autonomous Driving by Vasileios Kochliaridis, Evangelos Kostinoudis, Ioannis Vlahavas
This repository contains code to train an agent in the Carla simulator. Specifically, we use both Imitation Learning (IL) and Reinforcement Learning (RL) techniques to train the agent. The objective is to use IL to train a basic agent and then RL to fine-tune it in order to achieve better performance.
This repository supports only the CIL++ architecture for the agent (with many variations). To use this model, you must download it from the link (_results.tar.gz) provided in the GitHub repo. After that, extract the data into the `models/CILv2_multiview/_results` folder. You can use the commands:

```bash
mkdir -p models/CILv2_multiview/_results
tar -zxvf _results.tar.gz -C models/CILv2_multiview/_results
```

If you want to use the environment to train your model, take a look at `CILv2_env`, which is the environment for the CIL++ architecture.
We recommend using a wrapper class (like in `CILv2_sub_env`) to run the environment in a separate process, so that training does not end when the Carla server crashes. To do this, just copy this file and change the environment class.
You also need to modify the training scripts in order to use your model.
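For orientation, below is a minimal sketch of such a subprocess wrapper. It is not the repository's `CILv2_sub_env`; the names (`SubprocessEnv`, `env_fn`) are illustrative, and the real wrapper is the one you should copy and adapt.

```python
# Minimal sketch of running an environment in a child process, in the spirit
# of CILv2_sub_env. Names (SubprocessEnv, env_fn) are illustrative only.
import multiprocessing as mp

import gymnasium as gym


def _worker(conn, env_fn):
    """Create the real environment in the child process and serve commands."""
    env = env_fn()
    conn.send((env.observation_space, env.action_space))
    while True:
        cmd, data = conn.recv()
        if cmd == "reset":
            conn.send(env.reset(**data))
        elif cmd == "step":
            conn.send(env.step(data))
        else:  # "close"
            env.close()
            conn.close()
            break


class SubprocessEnv(gym.Env):
    """Forwards reset/step to a child process and restarts it if it dies,
    e.g. when a Carla server crash takes the environment down."""

    def __init__(self, env_fn):
        self._env_fn = env_fn
        self._start()

    def _start(self):
        self._conn, child_conn = mp.Pipe()
        self._proc = mp.Process(target=_worker, args=(child_conn, self._env_fn))
        self._proc.daemon = True
        self._proc.start()
        self.observation_space, self.action_space = self._conn.recv()

    def reset(self, **kwargs):
        try:
            self._conn.send(("reset", kwargs))
            return self._conn.recv()
        except (EOFError, BrokenPipeError):
            self._start()  # child died: spawn a fresh environment and retry
            self._conn.send(("reset", kwargs))
            return self._conn.recv()

    def step(self, action):
        self._conn.send(("step", action))
        return self._conn.recv()

    def close(self):
        self._conn.send(("close", None))
        self._proc.join()
```

The point of the pattern is that a Carla crash only kills the child process; the wrapper can then spawn a fresh environment and training continues.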
As we previously mentioned, there are two training phases:
- Imitation Learning
- Reinforcement Learning
In both cases you must install the requirements that we provide. To do this, we recommend first creating a Python 3.10 virtual environment (with the tool of your choice). Then install the requirements with pip using the command:

```bash
pip install -r requirements.txt
```

There are two options for the IL training. These options are:
- `policy_head_training.py`, which uses `DataParallel` to train the model.
- `policy_head_training_accelerate.py`, which uses the Accelerate library to train the model. This library uses Distributed Data Parallel in the background. This is the recommended option!
Both of these options can be configured using the configuration file `IL_CIL_multiview_actor_critic.yaml`.
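For context, the Accelerate option follows the library's standard pattern: the model, optimizer, and dataloader are wrapped by an `Accelerator`, which handles Distributed Data Parallel transparently. The sketch below uses a placeholder model and data, not the repository's actual IL training code.

```python
# Standard Accelerate training pattern (placeholder model/data, not the
# repository's IL code): prepare() wraps everything so the same loop runs
# under Distributed Data Parallel when launched with multiple processes.
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader, TensorDataset

accelerator = Accelerator()

model = torch.nn.Linear(8, 2)  # placeholder standing in for the CIL++ policy head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(64, 8), torch.randn(64, 2))
loader = DataLoader(dataset, batch_size=16)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for inputs, targets in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```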
We provide a script for easier execution of the training (it works only for the second option, with the Accelerate library). To launch the training, run:

```bash
bash train/launch_policy_head_training_accelerate.sh
```

You can also pass flags of the Python script to this bash script, for example:

```bash
bash train/launch_policy_head_training_accelerate.sh --clean --all-weights
```

The data that we used in this phase are the data provided in the CIL++ repository. Specifically, we used parts 7-14 for the training. We also used part 6 for the evaluation, but we recommend using other or more data for a proper evaluation.
For the RL we used Carla 0.9.15. You can download this version from the GitHub releases, or take a look at the Carla website for installation instructions.
NOTE: To avoid adding the Carla files to the Python path, we use the Carla pip package and copied the `agents` folder into this repository.
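As a quick sanity check that the pip client and your server versions match, something like the following can be used (host and port are the Carla defaults; adjust them to your setup):

```python
# Connect to a running Carla server with the pip-installed client
# (pip install carla==0.9.15) and print the versions; this assumes a server
# is already listening on localhost:2000 (the Carla defaults).
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)  # seconds
print("client:", client.get_client_version())
print("server:", client.get_server_version())
```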
For the RL training we use the RLlib library, and we support two algorithms:
- Proximal Policy Optimization (PPO), provided by RLlib and enhanced with more options by us.
- Phasic Policy Gradient (PPG): we implemented this algorithm on top of the PPO implementation of RLlib. If you want to see how we implemented it, take a look at the PPG-related files in the `train` folder.
The PPO algorithm can be configured from the `train_ppo_config.yaml` file. This file controls the algorithm-related options for the training; take a look at it for the available options.
In order to run the training you can use the command:
```bash
RAY_DEDUP_LOGS=0 PYTHONPATH=. python3 train/train_ppo.py
```

NOTE: See below for the environment options.
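The script builds its configuration from the YAML file, but for orientation, a programmatic RLlib PPO setup looks roughly like the sketch below (Ray 2.x API). The import path of `CILv2_env`, the `env_config` key, and the hyperparameter values are assumptions, not the repository's actual settings.

```python
# Sketch of a Ray 2.x RLlib PPO setup; the repository configures this through
# train_ppo_config.yaml instead, so treat the values below as placeholders.
from ray.rllib.algorithms.ppo import PPOConfig

from CILv2_env import CILv2_env  # import path is an assumption, adjust to the repo layout

config = (
    PPOConfig()
    .environment(CILv2_env, env_config={"conf_file": "environment_conf.yaml"})  # key name assumed
    .rollouts(num_rollout_workers=2)   # roughly one Carla server per rollout worker
    .training(lr=1e-5, train_batch_size=4096)
)

algo = config.build()
for _ in range(10):
    result = algo.train()
    print(result["training_iteration"], result.get("episode_reward_mean"))
```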
As with the PPO algorithm, the PPG algorithm can be configured from the `train_ppg_config.yaml` file. This file controls the algorithm-related options for the training; take a look at it for the available options.
We should mention that this is a "hacked" version of the algorithm, adapted to our needs for this project.
In order to run the training you can use the command:
```bash
RAY_DEDUP_LOGS=0 PYTHONPATH=. python3 train/train_ppg.py
```

NOTE: See below for the environment options.
For the environment configuration we use a different file, `environment_conf.yaml`. This configuration file controls options such as the reward, the scenarios or routes on which we train the agent, server options, and more.
We support three different options for routes/scenarios for training the agent. These are:

- Routes: These are the routes that the scenario runner provides. They are predefined routes with various challenges for the agent. This can be configured using the option `run_type: route`.
- Scenarios: These are also provided by the scenario runner and can be used with the option `run_type: scenario`. We didn't use this option for training, but if you are interested you can use it.
- Free ride: In this case the agent drives along a randomly generated route. This can be configured with `run_type: free ride` (or any value except `route` and `scenario`).

For all of these options, the configuration file contains many settings so that you can configure the training to your needs.
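As a small illustration of how the `run_type` option selects between these modes (the key name and its interpretation come from the description above; everything else is illustrative):

```python
# Read environment_conf.yaml and report which training mode run_type selects:
# "route", "scenario", or anything else for a free ride.
import yaml

with open("environment_conf.yaml") as f:
    conf = yaml.safe_load(f)

run_type = conf.get("run_type", "free ride")
if run_type == "route":
    print("Training on predefined scenario-runner routes")
elif run_type == "scenario":
    print("Training on scenario-runner scenarios")
else:
    print("Free ride on randomly generated routes")
```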
In the RL training there are two options for running the Carla server.
- Spawn the Carla server by yourself. To use this option you must set `use_carla_launcher: false` in the environment configuration file. In this setting the server ports must start from the given port (specified by the `port` value in the configuration file) and be separated by 4. For example, if we have 2 servers and `port: 2000`, then the two servers will run on ports 2000 and 2004.
- Let the Carla launcher handle the server spawning. In order to use this option you must:
  - Set `use_carla_launcher: true` in the environment config.
  - Pass a shell command that spawns a Carla server with the `carla_launch_script` key in the environment config. This script gets the server port as its first argument and, as an optional second argument, the index of the GPU on which to spawn the server (to use this, you must give the number of GPUs with the `num_devices` key). We provide the example script `launch_carla_server.sh`. You can modify it for your needs and use it (`carla_launch_script: "bash train/launch_carla_server.sh"`).
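For the first option, the port convention described above boils down to the following (values are examples only):

```python
# Port layout for the manual-server option (use_carla_launcher: false):
# servers start at the configured `port` and are spaced 4 apart.
base_port = 2000   # the `port` value in environment_conf.yaml
num_servers = 2    # e.g. two parallel environments

ports = [base_port + 4 * i for i in range(num_servers)]
print(ports)       # -> [2000, 2004]
```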
We evaluated the models in the Leaderboard 1.0 using Carla version 0.9.15. We evaluated the following models:
- CIL++: The CIL++ model.
- CIL++ (stochastic): Our stochastic version of the CIL++ model. This model has a stochastic output and is trained on the same data as the CIL++.
- RL: The RL fine-tuned model.
You can get the model weights from the drive. The following models are provided:

- CIL++ (stochastic): Named `CIL_multiview_actor_critic_stochastic.pth`.
- RL: Named `CIL_multiview_actor_critic_ppg.pth`.
We used the Leaderboard 1.0 test routes for the evaluation.
| **Metric** | CIL++ | CIL++ (stochastic) | RL |
|---|---|---|---|
| Avg. driving score↑ | 2.593 | 3.047 | 10.019 |
| Avg. route completion↑ | 10.932 | 8.293 | 14.484 |
| Avg. infraction penalty↑ | 0.404 | 0.461 | 0.654 |
| Collisions with pedestrians↓ | 0.0 | 0.0 | 0.0 |
| Collisions with vehicles↓ | 256.214 | 247.457 | 52.842 |
| Collisions with layout↓ | 411.01 | 461.453 | 255.232 |
| Red lights infractions↓ | 9.34 | 0.0 | 7.975 |
| Stop sign infractions↓ | 5.767 | 0.0 | 0.0 |
| Off-road infractions↓ | 253.666 | 332.031 | 186.474 |
| Route deviations↓ | 0.0 | 104.91 | 76.737 |
| Route timeouts↓ | 0.0 | 0.0 | 0.0 |
| Agent blocked↓ | 398.996 | 362.588 | 222.645 |
We used the longest6 benchmark routes for the evaluation.
| **Metric** | CIL++ | CIL++ (stochastic) | RL |
|---|---|---|---|
| Avg. driving score↑ | 2.674 | 1.837 | 5.69 |
| Avg. route completion↑ | 9.052 | 4.881 | 11.182 |
| Avg. infraction penalty↑ | 0.357 | 0.4 | 0.494 |
| Collisions with pedestrians↓ | 5.29 | 16.767 | 0.0 |
| Collisions with vehicles↓ | 262.602 | 392.410 | 105.94 |
| Collisions with layout↓ | 820.255 | 760.246 | 501.88 |
| Red lights infractions↓ | 27.854 | 117.091 | 115.987 |
| Stop sign infractions↓ | 7.647 | 0.0 | 3.617 |
| Off-road infractions↓ | 543.098 | 559.018 | 330.344 |
| Route deviations↓ | 42.459 | 138.670 | 101.009 |
| Route timeouts↓ | 0.0 | 7.055 | 0.0 |
| Agent blocked↓ | 543.86 | 657.512 | 357.555 |
Please cite our work if you find it useful:
@inproceedings{10.1145/3688671.3688778,
author = {Kochliaridis, Vasileios and Kostinoudis, Evaggelos and Vlahavas, Ioannis},
title = {Optimizing Pretrained Transformers for Autonomous Driving},
year = {2024},
isbn = {9798400709821},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3688671.3688778},
doi = {10.1145/3688671.3688778},
abstract = {Vision-based end-to-end driving systems have demonstrated impressive capabilities through the utilization of large Transformer architectures. More specifically, researchers have combined Transformers with Imitation Learning, in order to construct agents that learn to map navigation states to actions from large datasets created by experts. Although this approach usually works well, it relies on specific datasets and expert actions, and thus achieving limited generalization capability, which can be quite catastrophic in uncertain navigation environments, such as urban areas. To overcome this limitation, we further expand the training process of the agent by applying the Phasic Policy Gradient algorithm, a Deep Reinforcement Learning (DRL) method that improves its generalization capability by enabling it to explore and interact with the environment. We further enhance our approach by integrating a custom reward function that penalizes the weaknesses of the pretrained agent, alongside with additional DRL techniques to enhance its efficiency and accelerate convergence. Our experimental results in the CARLA simulation environment demonstrate that our approach not only achieves robustness in comparison to previous approaches, but also shows potential for wider application in similar navigation scenarios.},
booktitle = {Proceedings of the 13th Hellenic Conference on Artificial Intelligence},
articleno = {22},
numpages = {9},
keywords = {Transformers, Deep Reinforcement Learning, Autonomous Navigation, Image Representations},
location = {},
series = {SETN '24}
}
This repository contains code from various sources: