diff --git a/learning/imitation/iil-dagger/README.md b/learning/imitation/iil-dagger/README.md index ccbffdaf..31797883 100644 --- a/learning/imitation/iil-dagger/README.md +++ b/learning/imitation/iil-dagger/README.md @@ -1,39 +1,119 @@ -# Imitation Learning using Dataset Aggregation +# Imitation Learning ## Introduction -In this baseline we train a small squeezenet model on expert trajectories to simply clone the behaviour of the expert. -Using only the expert trajectories would result in a model unable to recover from non-optimal positions ,Hence we use a technique called DAgger a dataset aggregation technique with mixed policies between expert and model. -This technique of random mixing would help the model learn a more general trajectory than the optimal one provided by the expert alone. -## Quickstart -1) Clone this [repo](https://github.com/duckietown/gym-duckietown): +In this baseline we train a small SqueezeNet model on expert trajectories to simply clone the behavior of the expert. +Using only the expert trajectories would result in a model unable to recover from non-optimal positions. Instead, we use DAgger, a dataset aggregation technique that mixes the expert and learner policies while collecting data. - $ git clone https://github.com/duckietown/gym-duckietown.git +## Quick start -2) Change into the directory: +Use the Jupyter notebook notebook.ipynb to quickly start training and testing the DAgger imitation learning baseline. - $ cd gym-duckietown +## Detailed Steps -3) Install the package: +### Clone the repo - $ pip3 install -e . +Clone this [repo](https://github.com/duckietown/gym-duckietown): -4) Start training: +$ git clone https://github.com/duckietown/gym-duckietown.git - $ python -m learning.imitation.iil-dagger.train +$ cd gym-duckietown -5) Test the trained agent specifying the saved model: +### Installing Packages - $ python -m learning.imitation.pytorch-v2.test --model-path ![path] +$ pip3 install -e . 
+## Training -## Acknowledgement -- We started from previous work done by Manfred Díaz as a boilerplate and we would like to thank him for his full support with code and answering our questions +$ python -m learning.imitation.iil-dagger.train + +### Arguments + +* --episode: number of episodes +* --horizon: number of steps per episode +* --learning-rate: index of the learning rate in the list [1e-1, 1e-2, 1e-3, 1e-4, 1e-5] +* --decay: index of the mixing decay between expert and learner in the list [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95] +* --save-path: directory used to save the output model +* --map-name: name of the map used during training +* --num-outputs: number of outputs from the learner model: 1 to predict only angular velocity at a fixed speed, 2 to predict both velocity and angular velocity +* --domain-rand: flag to enable domain randomization, which helps transfer the trained model to the real world. +* --randomize-map: flag to randomize training maps on reset + +## Testing + +$ python -m learning.imitation.iil-dagger.test + +### Arguments + +* --model-path: path of the model to be tested +* --episode: number of episodes +* --horizon: number of steps per episode + +## Submitting +Use the [PyTorch RL Template](https://github.com/duckietown/challenge-aido_LF-template-pytorch) and replace its model with the trained model defined in model/squeezenet.py, +then use the following code snippet to convert velocity and angular velocity to left and right PWM commands. 
+``` Python +velocity, omega = self.compute_action(self.current_image) + +# assuming same motor constants k for both motors +k_r = 27.0 +k_l = 27.0 +gain = 1.0 +trim = 0.0 + +# adjusting k by gain and trim +k_r_inv = (gain + trim) / k_r +k_l_inv = (gain - trim) / k_l +wheel_dist = 0.102  # distance between the two wheels +radius = 0.0318  # wheel radius + +omega_r = (velocity + 0.5 * omega * wheel_dist) / radius +omega_l = (velocity - 0.5 * omega * wheel_dist) / radius + +# conversion from motor rotation rate to duty cycle +u_r = omega_r * k_r_inv +u_l = omega_l * k_l_inv + +# clamping the duty cycle to [-1, 1], the limit for the Duckiebot +pwm_right = max(min(u_r, 1), -1) +pwm_left = max(min(u_l, 1), -1) + +``` + +## Acknowledgment + +* We started from previous work by Manfred Díaz as a boilerplate, and we would like to thank him for his full support with the code and for answering our questions. ## Authors -- [Mostafa ElAraby ](https://www.mostafaelaraby.com/) - - [Linkedin](https://linkedin.com/in/mostafaelaraby) -- Ramon Emiliani - - [Linkedin](https://www.linkedin.com/in/ramonemiliani) + +* [Mostafa ElAraby](https://www.mostafaelaraby.com/) + + [LinkedIn](https://linkedin.com/in/mostafaelaraby) +* Ramon Emiliani + + [LinkedIn](https://www.linkedin.com/in/ramonemiliani) + ## References -- Implementation idea and code skeleton based on Diaz Cabrera, Manfred Ramon (2018)Interactive and Uncertainty-aware Imitation Learning: Theory and Applications. Masters thesis, Concordia University. 
+ +``` + +@mastersthesis{diaz2018interactive, + title={Interactive and Uncertainty-aware Imitation Learning: Theory and Applications}, + author={Diaz Cabrera, Manfred Ramon}, + year={2018}, + school={Concordia University} +} + +@inproceedings{ross2011reduction, + title={A reduction of imitation learning and structured prediction to no-regret online learning}, + author={Ross, St{\'e}phane and Gordon, Geoffrey and Bagnell, Drew}, + booktitle={Proceedings of the fourteenth international conference on artificial intelligence and statistics}, + pages={627--635}, + year={2011} +} + +@article{iandola2016squeezenet, + title={SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size}, + author={Iandola, Forrest N and Han, Song and Moskewicz, Matthew W and Ashraf, Khalid and Dally, William J and Keutzer, Kurt}, + journal={arXiv preprint arXiv:1602.07360}, + year={2016} +} +``` diff --git a/learning/imitation/iil-dagger/learner/neural_network_policy.py b/learning/imitation/iil-dagger/learner/neural_network_policy.py index 9f1dabf4..634299b9 100644 --- a/learning/imitation/iil-dagger/learner/neural_network_policy.py +++ b/learning/imitation/iil-dagger/learner/neural_network_policy.py @@ -148,7 +148,7 @@ def _transform(self, observations, expert_actions): ] ) - observations = [compose_obs(observation).numpy() for observation in observations] + observations = [compose_obs(observation).cpu().numpy() for observation in observations] try: # scaling velocity to become in 0-1 range which is multiplied by max speed to get actual vel # also scaling steering angle to become in range -1 to 1 to make it easier to regress @@ -158,7 +158,7 @@ def _transform(self, observations, expert_actions): ] except: pass - expert_actions = [torch.tensor(expert_action).numpy() for expert_action in expert_actions] + expert_actions = [torch.tensor(expert_action).cpu().numpy() for expert_action in expert_actions] return observations, expert_actions diff --git 
a/learning/imitation/iil-dagger/model/squeezenet.py b/learning/imitation/iil-dagger/model/squeezenet.py index 6ff957bb..4ffc519d 100644 --- a/learning/imitation/iil-dagger/model/squeezenet.py +++ b/learning/imitation/iil-dagger/model/squeezenet.py @@ -38,7 +38,7 @@ def __init__(self, num_outputs=2, max_velocity=0.7, max_steering=np.pi / 2): self._device = torch.device("cuda" if torch.cuda.is_available() else "cpu") self.model = models.squeezenet1_1() self.num_outputs = num_outputs - self.max_velocity_tensor = torch.tensor(max_velocity).to(self._device) + self.max_velocity_tensor = torch.tensor([max_velocity]).to(self._device) self.max_steering = max_steering # using a subset of full squeezenet for input image features @@ -117,12 +117,12 @@ def predict(self, *args): output = self.model(images) if self.num_outputs == 1: omega = output - v_tensor = self.max_velocity_tensor.clone() + v_tensor = self.max_velocity_tensor.clone().unsqueeze(1) else: v_tensor = output[:, 0].unsqueeze(1) omega = output[:, 1].unsqueeze(1) * self.max_steering output = torch.cat((v_tensor, omega), 1).squeeze().detach() - return output + return output.cpu().numpy() if __name__ == "__main__": diff --git a/learning/imitation/iil-dagger/notebook.ipynb b/learning/imitation/iil-dagger/notebook.ipynb new file mode 100644 index 00000000..a0e54afa --- /dev/null +++ b/learning/imitation/iil-dagger/notebook.ipynb @@ -0,0 +1,254 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Dagger.ipynb", + "provenance": [], + "collapsed_sections": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "accelerator": "GPU" + }, + "cells": [ + { + "source": [ + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/duckietown/gym-duckietown/tree/daffy/learning/imitation/iil-dagger/notebook.ipynb)" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "markdown", + "metadata": 
{ + "id": "Obe3xVH5N4f6" + }, + "source": [ + "## Preparing Code and dependencies" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "y7B4akyzOBJQ" + }, + "source": [ + "branch = \"daffy\" #@param ['master', 'daffy']" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "88TrRDRLOXLF", + "outputId": "64bc658f-676e-4c42-f168-9aa01d26dd74", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "! git clone --branch {branch} https://github.com/mostafaelaraby/gym-duckietown" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "IdLCXQT1Oy9U", + "outputId": "26d707af-8bc6-4435-f2c8-acb29f281236", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "!pip install -e gym-duckietown/." + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "XTGEeqIrO5HZ" + }, + "source": [ + "import os \n", + "os.chdir('gym-duckietown')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PZCERj2COqJH" + }, + "source": [ + "### Virtual Display Setup" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Gd7rzmy5OlyJ", + "outputId": "32eb5436-b959-428c-860b-ba34b7954c37", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "!apt-get install python-opengl -y\n", + "!apt install xvfb -y\n", + "!apt-get install x11-utils" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "_vOFpsrQOpv7", + "outputId": "e53357c3-896e-4cbd-c29d-975b38d211e3", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "!pip install pyvirtualdisplay\n", + "!pip install piglet" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "T2Ajwp-HPDKY" + }, + "source": [ + "from pyvirtualdisplay import 
Display\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "%matplotlib inline\n", + "\n", + "# This code creates a virtual display to draw game images on. \n", + "# If you are running locally, just ignore it\n", + "import os\n", + "def create_display():\n", + " display = Display(visible=0, size=(1400, 900))\n", + " display.start()\n", + " if type(os.environ.get(\"DISPLAY\")) is not str or len(os.environ.get(\"DISPLAY\"))==0:\n", + " !bash ../xvfb start\n", + " %env DISPLAY=:1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eiq9plyiO7LQ" + }, + "source": [ + "## Imitation Learning Dagger" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dx6qgGm8QMMZ" + }, + "source": [ + "## Training" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "xyHbCPyURVG3" + }, + "source": [ + "learning_rates = ['1e-1', '1e-2', '1e-3', '1e-4', '1e-5']\n", + "mixing_decays = ['0.5', '0.6', '0.7', '0.8', '0.85', '0.9', '0.95']" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "augg6SvSQOR_" + }, + "source": [ + "save_path = \"imitation_baseline\" #@param {type: \"string\"}\n", + "episode = 10 # @param {type: \"integer\"}\n", + "horizon = 128 # @param {type: \"integer\"}\n", + "learning_rate = \"1e-3\" # @param ['1e-1', '1e-2', '1e-3', '1e-4', '1e-5']\n", + "decay = \"0.7\" # @param ['0.5', '0.6', '0.7', '0.8', '0.85', '0.9', '0.95']\n", + "map_name = \"loop_empty\" #@param {type: \"string\"}\n", + "# number of outputs can be 2 to predict omega and velocity\n", + "# or 1 to fix velocity and predict only omega\n", + "num_outputs = 2 # @param {type: \"integer\"} \n", + "learning_rate = learning_rates.index(learning_rate)\n", + "decay = mixing_decays.index(decay)\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "DmXN_kBVO9xX", + "outputId": 
"05ae09c7-fe9c-4697-8a2a-8413989c12f9", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "create_display()\n", + "!python -m learning.imitation.iil-dagger.train --save-path {save_path} --episode {episode} --horizon {horizon} --learning-rate {learning_rate} --decay {decay} --map-name {map_name} --num-outputs {num_outputs}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yHQCqfoPP9Gj" + }, + "source": [ + "## Testing Imitation Model" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ewy5rVJxmGk-" + }, + "source": [ + "map_name = \"loop_empty\" #@param {type: \"string\"}\n", + "episode = 10 # @param {type: \"integer\"}\n", + "horizon = 128 # @param {type: \"integer\"}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "DWYXt0mQP_CW" + }, + "source": [ + "create_display()\n", + "!python -m learning.imitation.iil-dagger.test --model-path {os.path.join(save_path, \"model.pt\")} --num-outputs {num_outputs} --map-name {map_name} --episode {episode} --horizon {horizon}" + ], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/learning/imitation/iil-dagger/train.py b/learning/imitation/iil-dagger/train.py index 4b707d70..b7f023fd 100644 --- a/learning/imitation/iil-dagger/train.py +++ b/learning/imitation/iil-dagger/train.py @@ -13,7 +13,10 @@ def launch_env(map_name, randomize_maps_on_reset=False, domain_rand=False): environment = DuckietownEnv( - domain_rand=domain_rand, max_steps=math.inf, map_name=map_name, randomize_maps_on_reset=False + domain_rand=domain_rand, + max_steps=math.inf, + map_name=map_name, + randomize_maps_on_reset=randomize_maps_on_reset, ) return environment @@ -25,12 +28,14 @@ def teacher(env, max_velocity): def process_args(): parser = argparse.ArgumentParser() parser.add_argument("--episode", "-i", default=10, type=int) - parser.add_argument("--horizon", "-r", 
default=64, 
type=int) + parser.add_argument("--horizon", "-r", default=128, type=int) parser.add_argument("--learning-rate", "-l", default=2, type=int) parser.add_argument("--decay", "-d", default=2, type=int) parser.add_argument("--save-path", "-s", default="iil_baseline", type=str) parser.add_argument("--map-name", "-m", default="loop_empty", type=str) parser.add_argument("--num-outputs", "-n", default=2, type=int) + parser.add_argument("--domain-rand", "-dr", action="store_true") + parser.add_argument("--randomize-map", "-rm", action="store_true") return parser @@ -50,13 +55,19 @@ def process_args(): if not (os.path.isdir(config.save_path)): os.makedirs(config.save_path) # launching environment - environment = launch_env(config.map_name) + environment = launch_env( + config.map_name, + domain_rand=config.domain_rand, + randomize_maps_on_reset=config.randomize_map, + ) task_horizon = config.horizon task_episode = config.episode model = Squeezenet(num_outputs=config.num_outputs, max_velocity=max_velocity) - policy_optimizer = torch.optim.Adam(model.parameters(), lr=learning_rates[config.learning_rate]) + policy_optimizer = torch.optim.Adam( + model.parameters(), lr=learning_rates[config.learning_rate] + ) dataset = MemoryMapDataset(25000, (3, *input_shape), (2,), config.save_path) learner = NeuralNetworkPolicy(