-
I'm still thinking about the common development pipeline. During development, you probably don't want to use any commit, but just the current file tree, just to test whether your logic works. At least I cannot think of a better way. Do you have any suggestions about this? I'm also thinking a bit more about where (and how) we would store the cloned repos / checkouts. Go by default stores them in
I already mentioned that I don't like version numbers too much, because they don't really tell you anything (unless you already know the package). I think a date + commit ref is more helpful. (Btw, Go uses (Btw, technical detail: You would probably not store all Git objects multiple times for any such Go directly uses the home directory, there is a So, back to the development workflow: Maybe you can also have one checkout not pinned to any version? Something like
Or something like this. (Btw, technical detail:
-
Btw, one motivation for this concept is that RETURNN really should be more like the stable base core framework, having (mostly) only stable features, and most experimental things would go into such external repos, such that you can easily play around with them, modify them, etc. It should all be modular, so that mixing things in whatever way is not problematic. In case you think that for some aspects this is currently not possible, or difficult, or not convenient enough (e.g. having some custom optimizer, a more custom training loop, custom learning rate, custom dataset, whatever really), then consider this a bug, and open an issue about it. We should build RETURNN such that really everything is flexible.
-
Another thing: The current suggestion is for single Python files (modules). But maybe we should also allow importing Python dirs (packages)?
-
Having a repo with higher-level building blocks (mostly parts of network definitions, I guess) would be nice, I agree. How would this be maintained? Would this be a place for everyone to share code, or more like a "peer-reviewed" set of extensions to core RETURNN? I wasn't familiar with Go imports until now; it seems to be a nice idea. But the more obvious alternative would just be that the user clones (or pip installs) the shared repo/module, which wouldn't require any new import mechanism. So the advantage of having such imports is that version control is built in, right?
Because we don't rely on people not force pushing tags? We usually work with tags, so this would have been the natural way for us. But anyway, in a shared repo, would everyone add the tags they need? This might become messy.
-
Do I understand it correctly that when I import a config all parameters of the imported config are applied to my current config? If yes, an option for partial imports (like
I would not enforce extending the ref by a date. Copying the Git hash is quite easy, but adding the correct date IMHO just adds hassle. Also, in most cases it may be more relevant whether there are updates to the repo since this version than the date itself (which could be shown as info when running the config). The rest sounds good to me so far.
-
Another thought: Such a But I think I know a way to also support auto-completion. We can create a (virtual) package hierarchy, like
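To illustrate the runtime side of a virtual package hierarchy, here is a minimal sketch. It is not the RETURNN implementation; the names `register_virtual_package` and `demo_import` are hypothetical. It relies only on standard Python behavior: a module object in `sys.modules` that has a `__path__` is treated as a package, so its submodules are looked up in the listed directories.

```python
import importlib
import os
import sys
import tempfile
import types


def register_virtual_package(name: str, directory: str) -> types.ModuleType:
    """Register an in-memory package whose submodules are loaded from `directory`.

    Hypothetical helper for illustration only.
    """
    mod = types.ModuleType(name)
    mod.__path__ = [directory]  # having __path__ makes Python treat it as a package
    sys.modules[name] = mod
    return mod


# Stand-in for a checked-out repo: a temporary directory with one Python file.
checkout_dir = tempfile.mkdtemp()
with open(os.path.join(checkout_dir, "blocks.py"), "w") as f:
    f.write("def make_trafo_block():\n    return {'class': 'demo'}\n")

register_virtual_package("demo_import", checkout_dir)
importlib.invalidate_caches()
blocks = importlib.import_module("demo_import.blocks")
print(blocks.make_trafo_block())
```

After registration, a plain `from demo_import.blocks import make_trafo_block` statement works as well. Whether an editor can offer auto-completion for such a hierarchy depends on the tool; this sketch only covers what happens at runtime.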
-
One problem (or maybe not a problem): When you import multiple modules, and they have other dependencies of their own, you get a graph, because multiple modules might share the same module as a dependency. Consider this example. In this example, you might have the same module used multiple times under different versions. In Go, it would only be used once, specifically with the highest version (of all referenced versions). In RETURNN, I think we should not do this. Every
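The difference between the two policies can be sketched as follows. This is only an illustration under assumptions: the repo name and the second hash are made up, and versions are compared as plain strings, which only works here because they start with a date.

```python
from typing import Dict, List, Tuple

# Each import is a (repo, version) pair. All values below are hypothetical.
Import = Tuple[str, str]


def go_style_selection(imports: List[Import]) -> Dict[str, str]:
    """One version per repo: keep the highest of all referenced versions."""
    selected: Dict[str, str] = {}
    for repo, version in imports:
        if repo not in selected or version > selected[repo]:
            selected[repo] = version
    return selected


def per_import_selection(imports: List[Import]) -> List[Import]:
    """Every distinct (repo, version) pair is kept, so versions can coexist."""
    return sorted(set(imports))


imports = [
    ("example.org/team/common", "2019-04-04--dc363cb"),
    ("example.org/team/common", "2020-01-15--0123abc"),
    ("example.org/team/common", "2019-04-04--dc363cb"),
]
print(go_style_selection(imports))
print(per_import_selection(imports))
```

With the first policy, a module that pinned the older version silently gets the newer one. With the second, each importer gets exactly the version it asked for, at the cost of possibly loading the same code twice.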
-
Ok, there is some initial implementation now, in (Meta: I wonder, there seems to be no way to "resolve" a discussion like there is for issues. But maybe this can be considered as solved for now, unless we want to discuss some specific details further.)
-
Sorry for joining late. One remark: I think the default folder should be .returnn instead of returnn. Otherwise, people who are not familiar with this function, but use this import when running configs they took from somewhere, might be confused about where this folder comes from. If it starts with a dot, you directly know that this is some cache/config or other program-specific local data.
I tried using it, but ran into a crash, and I am not sure whether the path is wrong or there is another mistake:
~/experiments/librispeech_tts/config/__init__.py in <module>
11 from returnn.import_ import import_
12
---> 13 ljspeech_config = import_("github.com/rwth-i6/returnn-experiments", "2020-TTS-LJSpeech/tacotron2_ljspeech.config", "20210302-01094be")
global ljspeech_config = undefined
global import_ = <function import_ at 0x7f9a0a82c400>
14 print(ljspeech_config.optimizer)
15
~/src/returnn/returnn/import_/import_.py in import_(repo='github.com/rwth-i6/returnn-experiments', path='2020-TTS-LJSpeech/tacotron2_ljspeech.config', version='20210302-01094be')
20 # `module_name` has the side effect that `import_module` below will just work.
21 mod_name = module_name(repo=repo, repo_path=repo_path, path=path, version=version)
---> 22 return importlib.import_module(mod_name)
global importlib.import_module = <function import_module at 0x7f9a2a55c950>
mod_name = 'returnn_import.github_com.rwth_i6.returnn_experiments.v20210302133012_01094bef2761.2020_TTS_LJSpeech.tacotron2_ljspeech_config'
/work/tools/asr/python/3.7.1/lib/python3.7/importlib/__init__.py in import_module(name='returnn_import.github_com.rwth_i6.returnn_experi...f2761.2020_TTS_LJSpeech.tacotron2_ljspeech_config', package=None)
125 break
126 level += 1
--> 127 return _bootstrap._gcd_import(name[level:], package, level)
global _bootstrap._gcd_import = <function _gcd_import at 0x7f9a2a69ee18>
name = 'returnn_import.github_com.rwth_i6.returnn_experiments.v20210302133012_01094bef2761.2020_TTS_LJSpeech.tacotron2_ljspeech_config'
level = 0
package = None
128
129
/work/tools/asr/python/3.7.1/lib/python3.7/importlib/_bootstrap.py in _gcd_import(name='returnn_import.github_com.rwth_i6.returnn_experi...f2761.2020_TTS_LJSpeech.tacotron2_ljspeech_config', package=None, level=0)
/work/tools/asr/python/3.7.1/lib/python3.7/importlib/_bootstrap.py in _find_and_load(name='returnn_import.github_com.rwth_i6.returnn_experi...f2761.2020_TTS_LJSpeech.tacotron2_ljspeech_config', import_=<function _gcd_import>)
/work/tools/asr/python/3.7.1/lib/python3.7/importlib/_bootstrap.py in _find_and_load_unlocked(name='returnn_import.github_com.rwth_i6.returnn_experi...f2761.2020_TTS_LJSpeech.tacotron2_ljspeech_config', import_=<function _gcd_import>)
ModuleNotFoundError: No module named 'returnn_import.github_com.rwth_i6.returnn_experiments.v20210302133012_01094bef2761.2020_TTS_LJSpeech.tacotron2_ljspeech_config'
The checkout worked correctly; I confirmed that the file exists locally.
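For reading the module name in the traceback, the following sketch reproduces how such a dotted name could be derived from the repo, the path and the resolved version. It is a guess based only on the strings visible above, not the actual RETURNN code: `sketch_module_name` and `mangle` are hypothetical, and the hyphenated form of the full version string passed in is an assumption.

```python
import re


def mangle(part: str) -> str:
    """Replace every character that is not a letter, digit or underscore by '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", part)


def sketch_module_name(repo: str, path: str, full_version: str) -> str:
    """Build a dotted module name from repo, resolved version and file path.

    Hypothetical helper; it mirrors only the pattern seen in the traceback.
    """
    parts = ["returnn_import"]
    parts += [mangle(p) for p in repo.split("/")]
    parts.append("v" + mangle(full_version))
    parts += [mangle(p) for p in path.split("/")]
    return ".".join(parts)


name = sketch_module_name(
    repo="github.com/rwth-i6/returnn-experiments",
    path="2020-TTS-LJSpeech/tacotron2_ljspeech.config",
    full_version="20210302133012-01094bef2761",  # assumed format of the resolved version
)
print(name)
```

Under these assumptions, the result equals the name reported in the `ModuleNotFoundError`, which makes it easier to see which part of the name corresponds to which argument of `import_`.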
-
The idea came up a couple of times that it would be nice to be able to import external files into your config.
The external files would be Python files and could provide building blocks (e.g. a subnetwork dict, or some generic function like make_trafo_block, or also make_librispeech_dataset, or whatever). It should be very easy to share our work among each other.
(Some parts in RETURNN maybe would need to be extended, to make that easier. But that would be a separate discussion. And I think even right now, a lot is already possible that way.)
The import would be from Git repos. I imagine a very simplified version of Go imports. You would specify something like
import("github.com/rwth-i6/returnn-experiments", "2019-librispeech-system/attention/base2.bs18k.config")
(Maybe see related Go code: Git fetch etc.)
(We also might allow imports by local filename, like
import("", "/u/zeyer/....")
during development, but this should really be temporary. And maybe we should also always expect that this is a Git repo.)
The import should enforce explicit versioning, so that an update can never break anything. The user would explicitly specify which version they want, e.g. like
import("github.com/rwth-i6/returnn-experiments", "2019-librispeech-system/attention/base2.bs18k.config", "dc363cb")
(And maybe that also for local files, if we allow that.)
The Git ref should really be a hash. We should not allow "HEAD" or "master" or so. We should not even allow Git tags. It should not be possible to break some configs by an update in the repo. The user would be responsible for updates.
However, we might extend it by meta information, such that the code becomes more readable. E.g. extend the ref by a date, so you could write "2019-04-04--dc363cb" or so. Maybe we should even enforce that.
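As a rough sketch of what such a check on the version string could look like: the function name `parse_version`, the minimum hash length of 7, and the exact `YYYY-MM-DD--hash` layout are assumptions for illustration, taken from the examples in this post rather than from any implementation.

```python
import re
from typing import Optional, Tuple

# Optional "YYYY-MM-DD--" prefix, then a lowercase hex commit hash (7 to 40 chars).
_VERSION_RE = re.compile(r"(?:(\d{4}-\d{2}-\d{2})--)?([0-9a-f]{7,40})")


def parse_version(version: str) -> Tuple[Optional[str], str]:
    """Split a version string into (date or None, commit hash).

    Raises ValueError for anything that does not look like a commit hash,
    e.g. "HEAD", "master" or a typical tag name.
    """
    m = _VERSION_RE.fullmatch(version)
    if not m:
        raise ValueError("expected [YYYY-MM-DD--]<commit hash>, got %r" % version)
    date, commit = m.groups()
    return date, commit


print(parse_version("dc363cb"))
print(parse_version("2019-04-04--dc363cb"))
```

A purely syntactic check like this cannot tell a hash from a branch or tag that happens to consist of hex digits, so the ref would still have to be resolved against the repo and verified to be a commit.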
There should be a local cache for the Git repos of the imports. See the related Go get logic. This can be shallow and thus would not take much space. However, we should maybe clarify how this would be organized. Should this be in /var/tmp like the native-op compile cache, or be more permanent in the home dir (like Go get)?
So, let's discuss some of the details, or the generic API here. Maybe there are other important ideas or aspects we should consider before we design and implement this.
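To make the two cache options concrete, here is a small sketch that maps a repo name to a cache directory. The function `sketch_cache_dir` and the directory names `.returnn_import_cache` and `returnn_import_cache` are placeholders for illustration, not a proposal for the final names.

```python
import os


def sketch_cache_dir(repo: str, permanent: bool = True) -> str:
    """Return the directory where the clone of `repo` would be cached.

    permanent=True:  below the user's home directory (survives reboots).
    permanent=False: below /var/tmp (may be cleaned up by the system).
    All directory names here are hypothetical.
    """
    if permanent:
        base = os.path.join(os.path.expanduser("~"), ".returnn_import_cache")
    else:
        base = os.path.join("/var/tmp", "returnn_import_cache")
    # One subdirectory per repo path component, e.g. github.com/rwth-i6/...
    return os.path.join(base, *repo.split("/"))


print(sketch_cache_dir("github.com/rwth-i6/returnn-experiments", permanent=False))
```

Keeping the repo name as the directory structure makes the cache easy to inspect and to delete per repo, whichever base directory is chosen.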