
Imitation Learning #973

Open: wants to merge 70 commits into i210_dev

Conversation

akashvelu (Contributor)

Pull request information

  • Status: ? ready to merge
  • Kind of changes: adds imitation learning features
  • Related PR or issue: None

Description

? (general description)

akashvelu and others added 30 commits March 26, 2020 10:11
    make_policy_optimizer=choose_policy_optimizer,
    validate_config=validate_config,
    after_optimizer_step=update_kl,
    after_train_result=warn_about_bad_reward_scales)

missing blank line at end of file

Comment on lines 27 to 29
self.action_network = action_network # neural network which specifies action to take
self.multiagent = multiagent # whether env is multiagent or singleagent
self.veh_id = veh_id # vehicle id that controller is controlling

comments not needed; this information belongs in the docstring
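One possible shape for that change, sketched here with a docstring in NumPy style (the class and parameter names come from the PR snippet; the docstring wording is an assumption):

```python
class ImitatingController:
    """Controller that queries a trained network for actions."""

    def __init__(self, action_network, multiagent, veh_id):
        """Set up the controller.

        Parameters
        ----------
        action_network : object
            Neural network which specifies the action to take.
        multiagent : bool
            Whether the env is multiagent or singleagent.
        veh_id : str
            ID of the vehicle this controller is controlling.
        """
        self.action_network = action_network
        self.multiagent = multiagent
        self.veh_id = veh_id
```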

Comment on lines 54 to 58
try:
    rl_ids = env.get_sorted_rl_ids()
except:
    print("Error caught: no get_sorted_rl_ids function, using get_rl_ids instead")
    rl_ids = env.k.vehicle.get_rl_ids()

you can use `hasattr` to do this instead of a try/except

"""
# observation is a dictionary for multiagent envs, list for singleagent envs
if self.multiagent:
    observation = env.get_state()[self.veh_id]

what if you're on a no control edge and your ID isn't in the state?
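One defensive way to handle that case, sketched with a hypothetical helper and a mocked-up state dict (the `.get()` guard is the suggestion; the names are illustrative):

```python
# Mock of the multiagent observation dictionary keyed by vehicle id.
state = {"rl_0": [0.1, 0.2]}

def get_observation(state, veh_id):
    # returns None (meaning "no observation this step") when the
    # vehicle is on an uncontrolled edge and absent from the state,
    # instead of raising KeyError on a direct index
    return state.get(veh_id)

print(get_observation(state, "rl_0"))  # [0.1, 0.2]
print(get_observation(state, "rl_1"))  # None
```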

Comment on lines 6 to 7
from flow.controllers.imitation_learning.utils_tensorflow import *
from flow.controllers.imitation_learning.keras_utils import *

`import *` is not recommended; are you also loading modules you don't need?
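A small self-contained illustration of why star imports are risky: a module-level name can silently shadow a builtin. The `fake_utils` module here is fabricated inline purely for the demo:

```python
import sys
import types

# Build a throwaway module whose top-level name collides with a builtin.
mod = types.ModuleType("fake_utils")
mod.max = lambda *args: "shadowed!"
sys.modules["fake_utils"] = mod

# A star import pulls in every public name, including the colliding one.
from fake_utils import *

print(max(1, 2))  # 'shadowed!' instead of 2
```

Importing the needed names explicitly (`from module import name_a, name_b`) keeps the namespace auditable and avoids this class of bug.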

# load network if specified, or construct network
if load_model:
    self.load_network(load_path)


nit: no blank line between the `if` and `else`

Comment on lines 122 to 123
if len(observation.shape) <= 1:
    observation = observation[None]

nice!

summary = tf.Summary(value=[tf.Summary.Value(tag="Variance norm", simple_value=variance_norm), ])
self.writer.add_summary(summary, global_step=self.action_steps)

cov_matrix = np.diag(var[0])

why var[0]?


A magic number means you should document the expected dimensions of this object; that way the magic number is at least understandable.
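For instance, the indexing could be made self-documenting with a shape comment and an assertion. The `(1, action_dim)` shape below is an assumption about the PR code, used only for illustration:

```python
import numpy as np

# var is assumed to have shape (1, action_dim): a batch of one sample.
var = np.array([[0.1, 0.2]])
assert var.shape[0] == 1, "expected a single-sample batch"

# var[0] selects that single sample; np.diag builds the diagonal
# (action_dim, action_dim) covariance matrix from it.
cov_matrix = np.diag(var[0])
print(cov_matrix.shape)  # (2, 2)
```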

Comment on lines 234 to 246
# build layers for policy
for i in range(num_layers):
    size = self.model.layers[i + 1].output.shape[1].value
    activation = tf.keras.activations.serialize(self.model.layers[i + 1].activation)
    curr_layer = tf.keras.layers.Dense(size, activation=activation, name="policy_hidden_layer_{}".format(i + 1))(curr_layer)
output_layer_policy = tf.keras.layers.Dense(self.model.output.shape[1].value, activation=None, name="policy_output_layer")(curr_layer)

# build layers for value function
curr_layer = input
for i in range(num_layers):
    size = self.fcnet_hiddens[i]
    curr_layer = tf.keras.layers.Dense(size, activation="tanh", name="vf_hidden_layer_{}".format(i + 1))(curr_layer)
output_layer_vf = tf.keras.layers.Dense(1, activation=None, name="vf_output_layer")(curr_layer)

watch out; you're implicitly assuming that vf_share_layers is never true. This should be warned about somewhere
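One way to surface that assumption, sketched as a small guard. `model_config` is an illustrative dict mirroring the RLlib-style `vf_share_layers` key; the function name is hypothetical:

```python
import warnings

def check_vf_share_layers(model_config):
    # the reconstruction above builds separate policy and value towers,
    # so warn loudly if the config expects shared layers
    if model_config.get("vf_share_layers", False):
        warnings.warn(
            "Imitation model rebuilds the policy and value networks as "
            "separate towers; vf_share_layers=True is not supported.")
        return False
    return True

print(check_vf_share_layers({"vf_share_layers": False}))  # True
```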

"""

env_name = config['env']
# agent_cls = get_agent_class(config['env_config']['run'])

nit, remove

Comment on lines 111 to 112
def compare_weights(ppo_model, imitation_path):
imitation_model = tf.keras.models.load_model(imitation_path, custom_objects={'nll_loss': negative_log_likelihood_loss(0.5)})

great call to put this in

(outputs, state)
Tuple, first element is policy output, second element state
"""
# print(self.base_model.get_weights())

nit: remove

Comment on lines 96 to 98
def forward(self, input_dict, state, seq_lens):
    """
    Overrides parent class's method. Used to pass an input through the model and get policy/vf output.

is it necessary to override this function?

Comment on lines 7 to 10
""" Replay buffer class to store state, action, expert_action, reward, next_state, terminal tuples"""

def __init__(self, max_size=100000):


add a docstring

action_dim = (1,)[0]

sess = create_tf_session()
action_network = ImitatingNetwork(sess, action_dim, obs_dim, None, None, None, None, load_existing=True, load_path='/Users/akashvelu/Documents/models2/')

not sure what this file is for; seems to be the equivalent of a unit test? If so, maybe make it a unit test?
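A skeleton of what that conversion might look like with `unittest`. The class name mirrors the PR; the real test would construct `ImitatingNetwork` from a saved model path and check its outputs, which is stubbed out here so the sketch runs anywhere:

```python
import unittest

class TestImitatingNetwork(unittest.TestCase):
    """Skeleton unit test; the real version would load the saved
    imitation model and assert on the network's action outputs."""

    def test_action_dim(self):
        # stand-in for: network output shape matches action_dim
        action_dim = (1,)[0]
        self.assertEqual(action_dim, 1)

if __name__ == "__main__":
    unittest.main(argv=["test"], exit=False)
```

This also lets the check run under the project's existing test runner instead of being invoked by hand.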

Comment on lines 1 to 4
import os
import time
import numpy as np
from flow.controllers.imitation_learning.trainer import Trainer

header to make clear what this file is for

Comment on lines 1 to 2
from flow.controllers.imitation_learning.run import *
from examples.train import *

nit: this file should probably not be in the controllers folder

Comment on lines 1 to 2
import time
from collections import OrderedDict

nit: I think a lot of these files should not be in this folder.

Comment on lines 1 to 2
import numpy as np
import tensorflow as tf

seems like this file could be used to simplify some of your other files?

Comment on lines 88 to 89
# TODO(akashvelu): remove this
# print("NEW CONFIGGG: ", config['env_config']['run'])

nit: remove

Comment on lines 180 to 181
agent.import_model('/Users/akashvelu/Desktop/combined_test3/ppo_model.h5', 'av')


nit: remove

@eugenevinitsky (Member) left a comment:

Really good; main things are minor nits and that we need to figure out the right folder to place this / integrate the imitation into train.py

Labels: none yet · Projects: none yet · 2 participants