Imitation learning with dagger #906

Open: wants to merge 390 commits into master

akashvelu (Contributor)

Pull request information

  • Status: ready to merge
  • Kind of changes: new feature
  • Related PR or issue: ? (optional)

Description

Adds support for imitation learning with DAgger (Dataset Aggregation), training a model to imitate an expert.
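For reference, DAgger alternates between four steps: roll out the current learner, label the states it visits with the expert's actions, aggregate those pairs into one growing dataset, and retrain on the whole dataset. The sketch below shows that loop in NumPy; it is not the code in this PR. The toy 1-D dynamics, the linear `expert`, and the helpers `rollout` and `fit_linear` are hypothetical stand-ins chosen so the example runs on its own.

```python
import numpy as np


def expert(obs):
    # Hypothetical expert: a proportional controller that drives the state to 0.
    return -0.5 * obs


def rollout(policy, rng, horizon=20):
    """Collect the states visited by `policy` in a toy system x' = x + a + noise."""
    x = rng.uniform(-1.0, 1.0)
    states = []
    for _ in range(horizon):
        states.append(x)
        x = x + policy(x) + 0.01 * rng.standard_normal()
    return np.array(states)


def fit_linear(obs, acts):
    """Least-squares fit of a single gain w such that acts ~= w * obs."""
    w, *_ = np.linalg.lstsq(obs[:, None], acts, rcond=None)
    return float(w[0])


def dagger(n_iters=5, seed=0):
    rng = np.random.default_rng(seed)
    # Iteration 0: behavioral cloning on states visited by the expert itself.
    data_obs = rollout(expert, rng)
    data_act = expert(data_obs)
    w = fit_linear(data_obs, data_act)
    for _ in range(n_iters):
        def learner(x, w=w):
            return w * x

        # Roll out the *learner*, but label every visited state with the expert.
        new_obs = rollout(learner, rng)
        # Aggregate: keep all previous data and append the newly labelled pairs.
        data_obs = np.concatenate([data_obs, new_obs])
        data_act = np.concatenate([data_act, expert(new_obs)])
        # Retrain on the full aggregated dataset.
        w = fit_linear(data_obs, data_act)
    return w, len(data_obs)
```

Because the toy expert is itself linear, `dagger()` recovers its gain of -0.5 (up to floating-point error) from 120 aggregated samples; the part that carries over to a real setup is the data flow: learner states, expert labels, aggregate, refit.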

Comment on lines 1 to 12
<?xml version="1.0" encoding="UTF-8"?>
<module type="PYTHON_MODULE" version="4">
<component name="NewModuleRootManager">
<content url="file://$MODULE_DIR$" />
<orderEntry type="jdk" jdkName="Python 3.6 (flow)" jdkType="Python SDK" />
<orderEntry type="sourceFolder" forTests="false" />
</component>
<component name="PyDocumentationSettings">
<option name="format" value="PLAIN" />
<option name="myDocStringFormat" value="Plain" />
</component>
</module>
Member

Nit: please remove this file.

Comment on lines 1 to 5
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
Member

Did you mean to commit this file?

Comment on lines 1 to 8
"""Multi-agent I-210 example.
Trains a non-constant number of agents, all sharing the same policy, on the
highway with ramps network.
"""
import os
import numpy as np

from ray.tune.registry import register_env
Member

This file seems identical to existing code?

"""
# Implementation in Tensorflow

def __init__(self, veh_id, action_network, multiagent, car_following_params=None, time_delay=0.0, noise=0, fail_safe=None):
Member

Please add a docstring explaining what action_network is.

Comment on lines 27 to 28
with tf.variable_scope(policy_scope, reuse=tf.AUTO_REUSE):
    self.build_network()
Member

Why do you need AUTO_REUSE here?

Contributor Author

I used AUTO_REUSE so that the existing variables are reused when the network-building code runs again, instead of duplicate copies of the weights and biases being created.
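The get-or-create behaviour described above can be illustrated without TensorFlow. The toy `VariableStore` below is a hypothetical plain-Python analogy, not TensorFlow's API: its `get_variable` creates an entry the first time a name is requested and returns the same object on every later request, which is the effect `reuse=tf.AUTO_REUSE` has on variables obtained through `tf.get_variable` inside a TF1 `variable_scope`.

```python
class VariableStore:
    """Toy analogy of a TF1 variable scope opened with reuse=tf.AUTO_REUSE."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, shape, init=0.0):
        # Create the "variable" only the first time this name is requested...
        if name not in self._vars:
            rows, cols = shape
            self._vars[name] = [[init] * cols for _ in range(rows)]
        # ...and hand back the same object on every later request.
        return self._vars[name]

    def __len__(self):
        return len(self._vars)


def build_network(store, obs_dim=3, hidden=4):
    # Hypothetical two-layer network; only the weight lookups matter here.
    w1 = store.get_variable("policy/w1", (obs_dim, hidden))
    w2 = store.get_variable("policy/w2", (hidden, 1))
    return w1, w2


store = VariableStore()
first = build_network(store)
second = build_network(store)  # "rebuilding" the network reuses both weights
print(first[0] is second[0], len(store))  # True 2
```

In TF1, leaving reuse off makes a second `tf.get_variable` call for an existing name raise an error, while `reuse=True` raises if the variable does not exist yet; AUTO_REUSE covers both the first build and later rebuilds.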

self.action_predictions = pred_action
print("TYPE: ", type(self.obs_placeholder))

if self.inject_noise == 1:
Member

Nit: conventionally, you don't need to compare a bool against 1; `if self.inject_noise:` is enough.

Defines input, output, and training placeholders for neural net
"""
self.obs_placeholder = tf.placeholder(shape=[None, self.obs_dim], name="obs", dtype=tf.float32)
self.action_placeholder = tf.placeholder(shape=[None, self.action_dim], name="action", dtype=tf.float32)
Member

Stochastic algorithms use policies parametrized by the mean and standard deviation of a Gaussian that actions are sampled from. It would be great to add this as an option here so we can also use PPO.

Member

The current implementation already works for deterministic algorithms like DDPG and TD3, which is great.
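The stochastic option suggested in this thread could look roughly like the NumPy sketch below, assuming the network outputs a per-dimension mean and log standard deviation (the names `gaussian_policy_sample`, `deterministic_action`, and `log_std` are hypothetical; nothing like this is in the PR yet). The stochastic policy samples around the mean and also returns the log-probability that PPO's likelihood ratio needs, while the deterministic DDPG/TD3 case simply uses the mean.

```python
import numpy as np


def gaussian_policy_sample(mean, log_std, rng):
    """Sample from a diagonal Gaussian N(mean, exp(log_std)**2).

    Returns the sampled action and its log-density, summed over the last
    (action) axis.
    """
    std = np.exp(log_std)  # parametrizing log std keeps std positive
    action = mean + std * rng.standard_normal(np.shape(mean))
    log_prob = -0.5 * np.sum(
        ((action - mean) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi),
        axis=-1,
    )
    return action, log_prob


def deterministic_action(mean):
    # DDPG/TD3-style policy: the network output is used as the action directly.
    return mean


rng = np.random.default_rng(0)
mean = np.array([[0.5], [-1.0]])  # batch of 2, action_dim = 1 (one acceleration)
log_std = np.array([-1.0])        # one learned log std per action dimension
action, log_prob = gaussian_policy_sample(mean, log_std, rng)
print(action.shape, log_prob.shape)  # (2, 1) (2,)
```

Keeping `log_std` as a separate learned vector (rather than an output per observation) is one common design choice; either way the deterministic path stays unchanged.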

Comment on lines 80 to 81
if len(observation.shape) <= 1:
    observation = observation[None]
Member

Good check!

# network expects an array of arrays (matrix); if single observation (no batch),
# convert to array of arrays
if len(observation.shape) <= 1:
    observation = observation[None]
ret_val = self.sess.run([self.action_predictions], feed_dict={self.obs_placeholder: observation})[0]
Member

Please make it clear here that this returns a single acceleration and will not work correctly if you pass a batch.
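One way to address this is to make the return value depend on whether a batch was passed. The sketch below is a hypothetical helper, not the PR's method: `predict_fn` stands in for the `self.sess.run(...)` call and is assumed to map a `(batch, obs_dim)` array to a `(batch, action_dim)` array; `toy_net` is a made-up network used only for the demo.

```python
import numpy as np


def predict_actions(predict_fn, observation):
    """Return one action for a single observation, or a batch for a batch.

    `predict_fn` is a stand-in for the session call: it maps an array of shape
    (batch, obs_dim) to an array of shape (batch, action_dim).
    """
    observation = np.asarray(observation, dtype=np.float32)
    single = observation.ndim <= 1
    if single:
        # Same check as the PR: wrap a lone observation into a batch of one.
        observation = observation[None]
    actions = predict_fn(observation)
    # Strip the batch axis again only when the caller passed a single observation.
    return actions[0] if single else actions


def toy_net(obs):
    # Hypothetical network: the "acceleration" is the sum of the features.
    return obs.sum(axis=1, keepdims=True)


print(predict_actions(toy_net, [1.0, 2.0, 3.0]))                 # [6.]
print(predict_actions(toy_net, [[1.0, 2.0], [3.0, 4.0]]).shape)  # (2, 1)
```

Single-observation callers still get one action vector back, and batched callers get every row instead of having to know about the batch axis.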

brentgryffindor and others added 15 commits May 25, 2020 16:58
* deleting non-working params from SumoChangeLaneParams

* deleted non-working params, sublane working in highway

* moved imports inside functions

* Apply suggestions from code review

* bug fixes

* bug fix

Co-authored-by: Aboudy Kreidieh
* added function to kernel/vehicle to get the number of vehicles that have not departed

* fixed over indentation of the docstring

* indentation edit

* pep8

Co-authored-by: AboudyKreidieh
* changed _departed_ids, and _arrived_ids in the update function

* fixed bug in get_departed_ids and get_arrived_ids
liljonnystyle and others added 30 commits July 1, 2020 18:58
Time-Space Diagram greyed regions
Add accel penalty, stop penalty, mpg reward, and ability to compute reward for any vehicles upstream of you (i.e. make you less greedy and more social)
* New energy class to inventory multiple energy models

Co-authored-by: Joy Carpio
* Add time-space diagram plotting to experiment.py
* prereq dict added to query

* prereq checking mechanism implemented, not tested yet

* prereq checking tested

* change to more flexible filter handling

* make safety_rate and safety_max_value floats

* ignore nulls in fact_top_scores

* fix typo

* remove unneeded import

* replace unnecessary use of list with set

* add queries to pre-bin histogram data

* fix the serialization issue with sets: convert to a list before writing as JSON

* fix query

* fix query

* fixed query bug

Co-authored-by: liljonnystyle
* update tacoma power demand query, meters/Joules -> mpg conversion
* fix some implementation errors in energy models

* pull i210_dev and fix flake8
* implement HighwayNetwork for Time-Space Diagrams (#979)

* fixed h-baselines bug (#982)

* Replicated changes in 867. Done bug (#980)

* Aimsun changes minus reset

* removed crash attribute

* tensorflow 1.15.2

* merge custom output and failsafes to master (#981)

* add write_to_csv() function to master

* include pipeline README.md

* add data pipeline __init__

* add experiment.py changes

* add write_to_csv() function to master

* change warning print to ValueError message

* update to new update_accel methods

* add display_warnings boolean

* add get_next_speed() function to base vehicle class

* revert addition of get_next_speed

* merge custom output and failsafes to master

* add write_to_csv() function to master

* add display_warnings boolean

* add get_next_speed() function to base vehicle class

* revert addition of get_next_speed

* revert change to get_feasible_action call signature

* change print syntax to be python3.5 compliant

* add tests for new failsafe features

* smooth default to True

* rearrange raise exception for test coverage

* moved simulation logging to the simulation kernel (#991)

* add 210 edgestarts for backwards compatibility (#985)

* fastforward PR 989

* fix typo

* Requirements update (#963)

* updated requirements.txt and environment.yml

* Visualizer tests fixes

* remove .func

* move all miles_per_* rewards to instantaneous_mpg

* update reward fns to new get_accel() method

* made tests faster

* some fixes to utils

* change the column order, modify the pipeline to use SUMO emission file

* write metadata to csv

* change apply_acceleration smoothness setting

* make save_csv return the file paths

Co-authored-by: AboudyKreidieh
Co-authored-by: liljonnystyle
Co-authored-by: Kathy Jang
Co-authored-by: Nathan Lichtlé
Co-authored-by: akashvelu
Co-authored-by: Brent Zhao
* refactor tsd to allow for axes offsets

* update time-space plotter unit tests

8 participants