Implement "native" TES executor #3
Comments
Thanks for the suggestion. Indeed, there is some overhead in calling Snakemake again within the job. The benefit is the increased flexibility: it works with any storage plugin, supports shell commands as well as scripts, notebooks and wrappers, and supports all the different software deployment backends (Conda, Apptainer, later Nix, etc.).
All in all, yes, I agree that it makes sense to go this way. I would just say it is not something to do within a plugin but rather in Snakemake itself.
One final thing to note: Snakemake supports grouping jobs together (DAG partitioning). In such a case, there is really no way around a mini-workflow invocation. This feature is very important for maximizing and fine-tuning the performance of real-world workflows on cluster and cloud, in particular to limit IO and network traffic. One could simply fall back to the current approach in such cases. Keep in mind, though, that this feature can be used quite extensively in sophisticated production workflows.
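A minimal sketch of the dispatch rule implied by this comment: submit a job to TES natively only when it is a single job with a plain shell command, and fall back to the current wrapped approach (Snakemake invoked inside the task) for group jobs and for script, notebook or wrapper rules. JobInfo and choose_mode are hypothetical names used for illustration only; they are not part of Snakemake or the executor plugin interface, and a real implementation would read these properties from the job objects the plugin receives.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobInfo:
    """Hypothetical, simplified view of a Snakemake job (not Snakemake's real API)."""
    is_group: bool            # job belongs to a job group (DAG partition)
    shellcmd: Optional[str]   # plain shell command, if the rule uses `shell:`
    uses_script: bool = False # rule uses `script:`, `notebook:` or `wrapper:`


def choose_mode(job: JobInfo) -> str:
    """Return "native" if the job's command can be sent to TES as-is,
    otherwise "wrapped" (run Snakemake inside the TES task, as today)."""
    if job.is_group:
        # Group jobs need a mini-workflow invocation inside the task.
        return "wrapped"
    if job.uses_script or job.shellcmd is None:
        # Script/notebook/wrapper support is provided by Snakemake itself.
        return "wrapped"
    return "native"


if __name__ == "__main__":
    print(choose_mode(JobInfo(is_group=False, shellcmd="echo hello")))
    print(choose_mode(JobInfo(is_group=True, shellcmd="echo hello")))
```

Keeping the wrapped path as the default for anything that is not a plain shell job preserves today's feature coverage while letting simple jobs skip the Snakemake overhead.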
Thanks @johanneskoester, very useful feedback! To comment on your points:
If you are interested, we would be happy to discuss this in one of the upcoming TES meetings. Please let me know and we will reach out to schedule something.
Yes, let's meet! Very interesting thoughts. Just send me an email at my public email address.
All in all, I am so happy to see TES contributors taking an interest in this plugin! It would be great if you could push it to its limits, and I am happy to extend or upgrade the plugin interface so that it fits the needs of TES even better.
Perfect :) I will reach out with an invite after the holiday break. Happy holidays to you 🎄
It would be good to look into this, @uniqueg. Thoughts?
Yes, let's discuss, @vsmalladi. Sorry for not following up earlier, @johanneskoester. I guess this will become even more relevant now that @vsmalladi and team will look into this for Microsoft's TES on Azure.
Problem
The current GA4GH TES executor wraps every TES task in a Snakemake command, essentially making them 1-step Snakemake workflows. While this design choice aligned with that of other executors and provides a high degree of compatibility in terms of features supported by Snakemake, it comes at a considerable cost:
- tesTask.executors[].command is a snakemake call […]
- […] configfile to forego changing the workflow descriptors when using different remote storage providers was not supported when I tried; admittedly, those could be errors on my side); a native TES executor could deal with cloud storage instead
Solution
Implement a "native" TES executor, i.e., implement the executor in such a way that commands to be executed are not wrapped by Snakemake. Instead:
- tesTask.executors[].command should take the value of the command to be executed,
- tesTask.executors[].image should take the value of the (Docker) image or Conda environment (for supported TES implementations) in which the command is to be executed, and
- tesTask.inputs[] and tesTask.outputs[] should contain the actual command inputs and outputs.
@vsmalladi @MattMcL4475 @svedziok @vschnei @kellrott