
Test ReuseParams with different variable names #448

Draft · wants to merge 3 commits into base: master
Conversation

Zettelkasten (Member)
The test case from #447, currently failing.

@albertz (Member) commented Feb 22, 2021

So, we have basically outlined a possible solution for this in the discussion in #447.

Basically, what we want (self being the layer):

  • Let's assume you have param = self.add_param(tf.get_variable(name)) (or sth like that).
  • param is what you would directly use in this code, for whatever purpose, e.g. matmul in LinearLayer.
  • self.params[name] is param should be true after this call. This implies:
    • self.params includes potential non-owned vars (shared/reuse) and/or potential transformations (transpose, weight noise, weight norm, etc) (i.e. not the original variable, but just some tensor).
  • Weight noise, weight norm and other transformations, and also L2 losses etc, should only be applied if the tensor or variable is owned by the layer (determined by looking at the namespace).

So we need to change add_param to implement this behavior for self.params. The difficulty is to infer the name. This should be doable by extending var_creation_scope:

  • In case there is no reuse/sharing or transformation, and no custom getter, let's not change anything. Then add_param would also keep the code for this simple case.
  • With a custom getter, we can grab name, and store it somewhere.

We might need a separate way to get a list of variables of a layer. (Or do we really? What would be the use case?) Or we could also simply iterate through all existing variables (via the global collections) and check for the namespace if we need to filter.
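The namespace-filtering idea could look like the following sketch. Here global_variables stands in for TF's global variable collection (tf.global_variables()); the helper name variables_of_layer is hypothetical:

```python
# Sketch: instead of keeping a separate per-layer variable list, filter the
# global variable collection by name-scope prefix. Variables are mocked here
# as (name, object) pairs; in TF you would read v.name instead.
def variables_of_layer(global_variables, layer_scope):
    prefix = layer_scope.rstrip("/") + "/"
    return [v for name, v in global_variables if name.startswith(prefix)]


v1, v2, v3 = object(), object(), object()
all_vars = [("linear/W", v1), ("linear/b", v2), ("lstm/kernel", v3)]
assert variables_of_layer(all_vars, "linear") == [v1, v2]
assert variables_of_layer(all_vars, "lstm") == [v3]
```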

Btw, you might ask, why do we need self.add_param(tf.get_variable(name)), and why not simply do sth like self.create_param(name, ...) instead, and not use tf.get_variable. This is because we want to be able to use other TF code, which makes use of tf.get_variable. E.g. when we use the original LSTM by TF, or other things. We want to support that. (Btw, if this was unclear, you might also extend the documentation in this PR for that.)

Actually, when this is external code, note that it could create multiple variables, i.e. multiple calls to get_variable. We should be careful to handle that correctly. Maybe the whole logic of adding it to self.params should be in var_creation_scope. Or we can keep it in add_param but call to that in var_creation_scope.
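One way to handle multiple get_variable calls from external code is a recording custom getter, as in this plain-Python sketch (make_recording_getter and base_getter are made-up names; TF's real custom_getter mechanism passes an internal getter with a similar shape):

```python
# Sketch of a custom getter that records *every* variable created inside the
# scope, so external code calling get_variable several times is handled too.
def make_recording_getter(base_getter, created):
    def getter(name, **kwargs):
        var = base_getter(name, **kwargs)
        created[name] = var  # one entry per get_variable call, keyed by name
        return var
    return getter


def base_getter(name, **kwargs):
    # Mock of TF's internal getter: just create a fresh object per call.
    return object()


created = {}
getter = make_recording_getter(base_getter, created)
a = getter("lstm/kernel")
b = getter("lstm/bias")  # external code may create several variables
assert created == {"lstm/kernel": a, "lstm/bias": b}
```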

Are you going to try to implement this? (I'm somewhat short on time right now, not sure when I get to this.)

@Zettelkasten (Member, Author)

> Are you going to try to implement this? (I'm somewhat short on time right now, not sure when I get to this.)

Thanks for your detailed notes! Yes, I can certainly give implementing this a try. If I run into issues or open questions, I will of course let you know. Thanks!

@Zettelkasten Zettelkasten force-pushed the frithjof-reuseparams-different-names branch from bf0adbd to 561de92 Compare February 23, 2021 15:07
@@ -836,14 +836,18 @@ def add_param(self, param, custom_update=None, trainable=None, saveable=None, ax
custom_update.set_on_var(param)
if axes_split_info:
tf_util.set_param_axes_split_info(param, axes_split_info)
if self.reuse_params:
name_scope_prefix = self.reuse_params.get_absolute_name_scope_prefix(base_layer=self, param=param)
if getattr(param, "RETURNN_layer_map_name", None) is not None:
Member
It should be _RETURNN_layer_map_name.

@albertz (Member) Feb 23, 2021
But also, I think you don't need this at all, and this approach only leads to problems. (Edit: Or maybe not. Maybe ignore my comment for now.)

base_layer=base_layer, reuse_layer=self.reuse_layer, name=param_name, getter=getter, full_name=name, **kwargs)
# The name of the variable created by custom_func might not match param_name.
# We store it here for LayerBase.add_param.
variable.RETURNN_layer_map_name = param_name
Member
I suggested _RETURNN_layer_map_name because this actually should be a map (dict). This cannot be the name directly, because this param might be used by multiple layers, under different names. So I thought about a mapping layer -> name. However, I'm now thinking, it might even be used by only one single layer under different names...

Member Author
Hm, I see. I had kind of assumed that there are no unrelated calls to custom_getter before the corresponding add_param is called. That's of course not so nice (and probably breaks if e.g. a custom_func calls get_variable multiple times).
Maybe we can make it a layer -> list[name] dict (and then always use the last entry in add_param)? I don't really know a good solution.

Member
> and then always use the last entry ...

Such heuristics are never nice. They mostly work, but in some rare cases, or when we do sth more crazy, they suddenly fail. And then this is very annoying to debug. (Unfortunately, we have a couple of such heuristics, which have worked for the initial use case, but then later on failed, and caused quite some extra work. "Technical debt" is the keyword for this...)

But I think we don't need that here.

We can know when we are in the direct get_variable call from the layer. (Via var_creation_scope.)
And in there, we have the original name, and we can get the param, and we can directly do self.params[name] = param, and add_param doesn't need to do anything anymore.

And when var_creation_scope does not create a custom getter, this implies there will be no special handling. Then add_param can catch this.
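The proposal above can be sketched as a context manager whose getter fills self.params directly, so add_param only keeps the simple no-getter case. This is a plain-Python mock; RETURNN's real var_creation_scope and its interaction with tf.variable_scope differ in detail:

```python
from contextlib import contextmanager


class MockLayer:
    def __init__(self):
        self.params = {}
        self._in_scope = False

    @contextmanager
    def var_creation_scope(self):
        # Installs a "custom getter" for the duration of the with-block.
        self._in_scope = True
        try:
            yield self._getter
        finally:
            self._in_scope = False

    def _getter(self, name, create):
        # We are in the direct get_variable call from the layer, so we have
        # the original name here and can register the param directly.
        param = create(name)       # may return a reused/transformed tensor
        self.params[name] = param
        return param

    def add_param(self, param, name=None):
        # Only handles the simple case: no custom getter was involved.
        if not self._in_scope and name is not None:
            self.params[name] = param
        return param


layer = MockLayer()
with layer.var_creation_scope() as get:
    w = get("W", create=lambda n: object())
assert layer.params["W"] is w
```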

@Zettelkasten Zettelkasten force-pushed the frithjof-reuseparams-different-names branch from 561de92 to caf7313 Compare February 23, 2021 20:21