
Imitation Learning #973

Open: wants to merge 70 commits into i210_dev

Conversation

akashvelu (Contributor)

Pull request information

  • Status: ? ready to merge
  • Kind of changes: adds imitation learning features
  • Related PR or issue: None

Description

? (general description)

akashvelu and others added 30 commits March 26, 2020 10:11
    make_policy_optimizer=choose_policy_optimizer,
    validate_config=validate_config,
    after_optimizer_step=update_kl,
    after_train_result=warn_about_bad_reward_scales)

missing blank line at end of file

Comment on lines 27 to 29
self.action_network = action_network # neural network which specifies action to take
self.multiagent = multiagent # whether env is multiagent or singleagent
self.veh_id = veh_id # vehicle id that controller is controlling

comments not needed; this information belongs in the docstring
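One possible shape for that change, sketched here with a docstring in NumPy style (the class and parameter names come from the PR snippet; the docstring wording is an assumption):

```python
class ImitatingController:
    """Controller that queries a trained network for actions."""

    def __init__(self, action_network, multiagent, veh_id):
        """Set up the controller.

        Parameters
        ----------
        action_network : object
            Neural network which specifies the action to take.
        multiagent : bool
            Whether the env is multiagent or singleagent.
        veh_id : str
            ID of the vehicle this controller is controlling.
        """
        self.action_network = action_network
        self.multiagent = multiagent
        self.veh_id = veh_id
```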

Comment on lines 54 to 58
try:
    rl_ids = env.get_sorted_rl_ids()
except:
    print("Error caught: no get_sorted_rl_ids function, using get_rl_ids instead")
    rl_ids = env.k.vehicle.get_rl_ids()

you can use `hasattr` to do this instead of a try/except

"""
# observation is a dictionary for multiagent envs, list for singleagent envs
if self.multiagent:
    observation = env.get_state()[self.veh_id]

what if you're on a no control edge and your ID isn't in the state?
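One defensive way to handle that case, sketched with a hypothetical helper and a mocked-up state dict (the `.get()` guard is the suggestion; the names are illustrative):

```python
# Mock of the multiagent observation dictionary keyed by vehicle id.
state = {"rl_0": [0.1, 0.2]}

def get_observation(state, veh_id):
    # returns None (meaning "no observation this step") when the
    # vehicle is on an uncontrolled edge and absent from the state,
    # instead of raising KeyError on a direct index
    return state.get(veh_id)

print(get_observation(state, "rl_0"))  # [0.1, 0.2]
print(get_observation(state, "rl_1"))  # None
```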

Comment on lines 6 to 7
from flow.controllers.imitation_learning.utils_tensorflow import *
from flow.controllers.imitation_learning.keras_utils import *

`import *` is not recommended; are you also loading modules you don't need?
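A small self-contained illustration of why star imports are risky: a module-level name can silently shadow a builtin. The `fake_utils` module here is fabricated inline purely for the demo:

```python
import sys
import types

# Build a throwaway module whose top-level name collides with a builtin.
mod = types.ModuleType("fake_utils")
mod.max = lambda *args: "shadowed!"
sys.modules["fake_utils"] = mod

# A star import pulls in every public name, including the colliding one.
from fake_utils import *

print(max(1, 2))  # 'shadowed!' instead of 2
```

Importing the needed names explicitly (`from module import name_a, name_b`) keeps the namespace auditable and avoids this class of bug.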

# load network if specified, or construct network
if load_model:
    self.load_network(load_path)


nit: no blank line between the `if` and `else`

Comment on lines 122 to 123
if len(observation.shape) <= 1:
    observation = observation[None]

nice!

summary = tf.Summary(value=[tf.Summary.Value(tag="Variance norm", simple_value=variance_norm), ])
self.writer.add_summary(summary, global_step=self.action_steps)

cov_matrix = np.diag(var[0])

why var[0]?


A magic number means you should document the expected dimensions of this object; that way the magic number is at least understandable.
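For instance, the indexing could be made self-documenting with a shape comment and an assertion. The `(1, action_dim)` shape below is an assumption about the PR code, used only for illustration:

```python
import numpy as np

# var is assumed to have shape (1, action_dim): a batch of one sample.
var = np.array([[0.1, 0.2]])
assert var.shape[0] == 1, "expected a single-sample batch"

# var[0] selects that single sample; np.diag builds the diagonal
# (action_dim, action_dim) covariance matrix from it.
cov_matrix = np.diag(var[0])
print(cov_matrix.shape)  # (2, 2)
```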

Comment on lines 234 to 246
# build layers for policy
for i in range(num_layers):
    size = self.model.layers[i + 1].output.shape[1].value
    activation = tf.keras.activations.serialize(self.model.layers[i + 1].activation)
    curr_layer = tf.keras.layers.Dense(size, activation=activation, name="policy_hidden_layer_{}".format(i + 1))(curr_layer)
output_layer_policy = tf.keras.layers.Dense(self.model.output.shape[1].value, activation=None, name="policy_output_layer")(curr_layer)

# build layers for value function
curr_layer = input
for i in range(num_layers):
    size = self.fcnet_hiddens[i]
    curr_layer = tf.keras.layers.Dense(size, activation="tanh", name="vf_hidden_layer_{}".format(i + 1))(curr_layer)
output_layer_vf = tf.keras.layers.Dense(1, activation=None, name="vf_output_layer")(curr_layer)

watch out; you're implicitly assuming that vf_share_layers is never true. This should be warned about somewhere
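One way to surface that assumption, sketched as a small guard. `model_config` is an illustrative dict mirroring the RLlib-style `vf_share_layers` key; the function name is hypothetical:

```python
import warnings

def check_vf_share_layers(model_config):
    # the reconstruction above builds separate policy and value towers,
    # so warn loudly if the config expects shared layers
    if model_config.get("vf_share_layers", False):
        warnings.warn(
            "Imitation model rebuilds the policy and value networks as "
            "separate towers; vf_share_layers=True is not supported.")
        return False
    return True

print(check_vf_share_layers({"vf_share_layers": False}))  # True
```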

"""

env_name = config['env']
# agent_cls = get_agent_class(config['env_config']['run'])

nit, remove

Comment on lines 111 to 112
def compare_weights(ppo_model, imitation_path):
imitation_model = tf.keras.models.load_model(imitation_path, custom_objects={'nll_loss': negative_log_likelihood_loss(0.5)})

great call to put this in

(outputs, state)
Tuple, first element is policy output, second element state
"""
# print(self.base_model.get_weights())

nit: remove

Comment on lines 96 to 98
def forward(self, input_dict, state, seq_lens):
    """
    Overrides parent class's method. Used to pass an input through the model and get policy/vf output.

is it necessary to override this function?

Comment on lines 7 to 10
""" Replay buffer class to store state, action, expert_action, reward, next_state, terminal tuples"""

def __init__(self, max_size=100000):


add a docstring

action_dim = (1,)[0]

sess = create_tf_session()
action_network = ImitatingNetwork(sess, action_dim, obs_dim, None, None, None, None, load_existing=True, load_path='/Users/akashvelu/Documents/models2/')

not sure what this file is for; seems to be the equivalent of a unit test? If so, maybe make it a unit test?
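A skeleton of what that conversion might look like with `unittest`. The class name mirrors the PR; the real test would construct `ImitatingNetwork` from a saved model path and check its outputs, which is stubbed out here so the sketch runs anywhere:

```python
import unittest

class TestImitatingNetwork(unittest.TestCase):
    """Skeleton unit test; the real version would load the saved
    imitation model and assert on the network's action outputs."""

    def test_action_dim(self):
        # stand-in for: network output shape matches action_dim
        action_dim = (1,)[0]
        self.assertEqual(action_dim, 1)

if __name__ == "__main__":
    unittest.main(argv=["test"], exit=False)
```

This also lets the check run under the project's existing test runner instead of being invoked by hand.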

Comment on lines 1 to 4
import os
import time
import numpy as np
from flow.controllers.imitation_learning.trainer import Trainer

header to make clear what this file is for

Comment on lines 1 to 2
from flow.controllers.imitation_learning.run import *
from examples.train import *

nit: this file should probably not be in the controllers folder

Comment on lines 1 to 2
import time
from collections import OrderedDict

nit: I think a lot of these files should not be in this folder.

Comment on lines 1 to 2
import numpy as np
import tensorflow as tf

seems like this file could be used to simplify some of your other files?

Comment on lines 88 to 89
# TODO(akashvelu): remove this
# print("NEW CONFIGGG: ", config['env_config']['run'])

nit: remove

Comment on lines 180 to 181
agent.import_model('/Users/akashvelu/Desktop/combined_test3/ppo_model.h5', 'av')


nit: remove

@eugenevinitsky (Member) left a comment:

Really good; main things are minor nits and that we need to figure out the right folder to place this / integrate the imitation into train.py

Labels: none yet · Projects: none yet · 2 participants