Test ReuseParams with different variable names #448
base: master
So, we basically have outlined a possible solution for this in the discussion in #447. Basically, what we want (
So we need to change the logic of
We might need a separate way to get a list of variables of a layer. (Or do we really? What would be the use case?) Or we could simply iterate through all existing variables (via the global collections) and filter by namespace if needed.
Btw, you might ask, why do we need
Actually, when this is external code, note that it could create multiple variables, i.e. multiple calls to
Are you going to try to implement this? (I'm somewhat short on time right now, not sure when I get to this.)
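The idea of iterating over all existing variables and filtering by namespace can be sketched roughly as follows. This is only an illustration: `filter_vars_by_scope` is a hypothetical helper, and plain strings stand in for the variable names one would read from a global collection.

```python
# Hypothetical helper, not part of RETURNN. Plain strings stand in for
# the names of variables taken from a global collection.
def filter_vars_by_scope(var_names, scope_prefix):
    """Return the names that live under scope_prefix (e.g. "layer1/")."""
    # Normalize the prefix so that "layer1" does not also match "layer10/W:0".
    if scope_prefix and not scope_prefix.endswith("/"):
        scope_prefix += "/"
    return [name for name in var_names if name.startswith(scope_prefix)]


all_var_names = ["layer1/W:0", "layer1/b:0", "layer10/W:0", "output/W:0"]
layer1_vars = filter_vars_by_scope(all_var_names, "layer1")
print(layer1_vars)
```

Note that a filter like this only finds variables created under the layer's own name scope. A reused param that lives in another layer's scope would not be found this way.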
Thanks for your detailed notes! Yes, I can certainly try implementing this. If I run into issues or open questions, I will of course let you know. Thanks!
bf0adbd to 561de92
returnn/tf/layers/base.py
Outdated
@@ -836,14 +836,18 @@ def add_param(self, param, custom_update=None, trainable=None, saveable=None, ax
custom_update.set_on_var(param)
if axes_split_info:
tf_util.set_param_axes_split_info(param, axes_split_info)
if self.reuse_params:
name_scope_prefix = self.reuse_params.get_absolute_name_scope_prefix(base_layer=self, param=param)
if getattr(param, "RETURNN_layer_map_name", None) is not None:
It should be `_RETURNN_layer_map_name`.
But also, I think you don't need this at all, and this approach only leads to problems. (Edit: Or maybe not. Maybe ignore my comment for now.)
returnn/tf/layers/base.py
Outdated
base_layer=base_layer, reuse_layer=self.reuse_layer, name=param_name, getter=getter, full_name=name, **kwargs)
# The name of the variable created by custom_func might not match param_name.
# We store it here for LayerBase.add_param.
variable.RETURNN_layer_map_name = param_name
I suggested `_RETURNN_layer_map_name` because this should actually be a map (dict). It cannot be the name directly, because this param might be used by multiple layers under different names. So I thought about a mapping `layer -> name`. However, I'm now thinking it might even be used by a single layer under different names...
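As a rough illustration of the `layer -> name` mapping discussed here, the attribute could hold a dict instead of a single string. This is a minimal sketch under assumptions: `FakeVariable` and `register_layer_name` are hypothetical stand-ins, and layer names are used as keys instead of layer objects.

```python
class FakeVariable:
    """Hypothetical stand-in for a TF variable; only attribute storage matters here."""

    def __init__(self, name):
        self.name = name


def register_layer_name(param, layer_name, param_name):
    """Record under which name the layer `layer_name` uses `param` (hypothetical helper)."""
    mapping = getattr(param, "_RETURNN_layer_map_name", None)
    if mapping is None:
        mapping = {}
        param._RETURNN_layer_map_name = mapping
    mapping[layer_name] = param_name


shared = FakeVariable("l1/W")
register_layer_name(shared, "l1", "W")
register_layer_name(shared, "l2", "kernel")  # l2 reuses l1's param under another name
print(shared._RETURNN_layer_map_name)
```

A plain `layer -> name` dict still cannot represent the last case mentioned above, where one layer uses the same param under several names, because the second registration for that layer would overwrite the first.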
Hm, I see. I had assumed that there are no unrelated calls to `custom_getter` before the corresponding `add_param` is called. That's of course not so nice (and probably breaks if e.g. a `custom_func` calls `get_variable` multiple times).
Maybe we can make it a `layer -> list[name]` dict (and then always use the last entry in `add_param`)? I don't really know a good solution.
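The `layer -> list[name]` idea with the "use the last entry" rule could look roughly like the sketch below. All names here (`requested_names`, `custom_getter`, `guess_name_in_add_param`) are hypothetical and only mimic the call order; no TensorFlow is involved.

```python
from collections import defaultdict

# Hypothetical structure: layer name -> names requested through the
# custom getter, in call order.
requested_names = defaultdict(list)


def custom_getter(layer_name, param_name):
    """Record every requested name; stands in for the real getter."""
    requested_names[layer_name].append(param_name)


def guess_name_in_add_param(layer_name):
    """The heuristic: assume the most recent request belongs to the param being added."""
    return requested_names[layer_name][-1]


# A custom_func that requests two variables before add_param runs for the first one:
custom_getter("l2", "W")
custom_getter("l2", "b")
guessed = guess_name_in_add_param("l2")
print(guessed)
```

In this call order the guess is the second requested name, even if `add_param` is being called for the param that was requested first, which is the failure case worried about above.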
> and then always use the last entry ...
Such heuristics are never nice. They mostly work, but in some rare cases, or when we do something more unusual, they suddenly fail, and then this is very annoying to debug. (Unfortunately, we have a couple of such heuristics which worked for the initial use case but later failed and caused quite some extra work. "Technical debt" is the keyword for this...)
But I think we don't need that here.
We can know when we are in the direct `get_variable` call from the layer (via `var_creation_scope`).
In there, we have the original name and can get the param, so we can directly do `self.params[name] = param`, and `add_param` doesn't need to do anything anymore.
And when `var_creation_scope` does not create a custom getter, this implies there will be no special handling. Then `add_param` can catch this.
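The control flow proposed here can be sketched without TensorFlow. This is only a toy model under assumptions: `FakeLayer` and `FakeVariable` are hypothetical stand-ins, and the real `var_creation_scope`, `get_variable` and `add_param` have different signatures. The point is only where `self.params[name] = param` happens.

```python
import contextlib


class FakeVariable:
    """Hypothetical stand-in for a TF variable."""

    def __init__(self, name):
        self.name = name


class FakeLayer:
    """Hypothetical stand-in for a layer; not the RETURNN LayerBase."""

    def __init__(self, name, custom_getter=None):
        self.name = name
        self.params = {}
        self._custom_getter = custom_getter  # e.g. derived from reuse_params
        self._getter_active = False

    @contextlib.contextmanager
    def var_creation_scope(self):
        # Special handling is only installed when a custom getter exists.
        self._getter_active = self._custom_getter is not None
        try:
            yield
        finally:
            self._getter_active = False

    def get_variable(self, name):
        """The direct call from the layer, so the original name is known here."""
        if self._getter_active:
            param = self._custom_getter(name)
            self.params[name] = param  # registered right away under the original name
            return param
        return FakeVariable("%s/%s" % (self.name, name))  # plain creation

    def add_param(self, param):
        # Nothing left to do if the getter path already registered this param.
        if any(p is param for p in self.params.values()):
            return param
        # No custom getter was involved, so the variable name follows the layer scope.
        prefix = self.name + "/"
        assert param.name.startswith(prefix)
        self.params[param.name[len(prefix):]] = param
        return param


shared_w = FakeVariable("l1/W")
reusing = FakeLayer("l2", custom_getter=lambda name: shared_w)
with reusing.var_creation_scope():
    kernel = reusing.add_param(reusing.get_variable("kernel"))

plain = FakeLayer("l3")
with plain.var_creation_scope():
    bias = plain.add_param(plain.get_variable("b"))
```

In the reusing layer the shared variable ends up in `params` under `"kernel"`, the name the layer asked for, even though the variable itself is named `l1/W`. In the plain layer, `add_param` does the registration as before.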
561de92 to caf7313
The test case from #447, currently failing.