PoC: multi-provider buckets in Fusion tasks #6667
Draft
This pull request is a PoC to validate that Nextflow and Fusion can work with buckets hosted by different providers. It allows task input files to live in different cloud storage schemes (AWS S3 and Azure in this PoC) without having to be managed as foreign files. Instead of being copied to the task workdir, the files are accessed directly through Fusion.
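The scheme check that this description implies can be sketched as follows. This is a minimal illustration in plain Java, not the code from the diff: the class `SchemeCheckSketch`, the helper `schemeOf`, and both method signatures are hypothetical, and the rule that a path without a scheme counts as a local file is an assumption. Only the two schemes ('s3' and 'az') come from the PoC itself.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; names and signatures are illustrative, not taken from the PR diff.
class SchemeCheckSketch {
    // The two schemes covered by this PoC.
    static final Set<String> SUPPORTED_CLOUD_SCHEMES = Set.of("s3", "az");

    // Assumption: a URI without a scheme is treated as a local file.
    static String schemeOf(URI uri) {
        String scheme = uri.getScheme();
        return scheme == null ? "file" : scheme.toLowerCase(Locale.ROOT);
    }

    // A foreign file has to be copied to the work dir before the task runs.
    // With Fusion enabled, files in a supported cloud scheme are not foreign.
    static boolean isForeignFile(URI file, String workDirScheme, boolean fusionEnabled) {
        String scheme = schemeOf(file);
        if (scheme.equals("file") || scheme.equals(workDirScheme)) {
            return false;
        }
        return !(fusionEnabled && SUPPORTED_CLOUD_SCHEMES.contains(scheme));
    }

    // Inputs in a supported scheme that differs from the task's own scheme.
    static List<URI> externalInputs(List<URI> inputs, String workDirScheme) {
        List<URI> result = new ArrayList<>();
        for (URI input : inputs) {
            String scheme = schemeOf(input);
            if (!scheme.equals(workDirScheme) && SUPPORTED_CLOUD_SCHEMES.contains(scheme)) {
                result.add(input);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        URI azureInput = URI.create("az://container/data/sample.txt");
        System.out.println(isForeignFile(azureInput, "s3", true));
        System.out.println(isForeignFile(azureInput, "s3", false));
    }
}
```

In this sketch, an `az://` input to a task whose work dir is on `s3://` is left in place when Fusion is enabled, and is treated as foreign otherwise.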
Two main changes have been applied:
Cloud scheme support and foreign file detection:
- Added a `SUPPORTED_CLOUD_SCHEMES` list in `FusionHelper` (currently 's3' and 'az'), and updated the `Executor.isForeignFile` method to recognize files from supported cloud schemes as local when Fusion is enabled. [1] [2]

Fusion environment handling:
The following changes are for testing purposes only and are not intended to be merged to master as they are.
- Added `getEnvironmentFromPath(Path, FusionConfig)` to the `FusionEnv` interface and implemented it in both the AWS and Azure Fusion environment providers. This enables environment variables to be set based on the specific scheme of each input file path, not just the overall task scheme. [1] [2] [3]
- Updated `FusionEnvProvider` to aggregate environment variables from all relevant `FusionEnv` extensions for each external input file, using the new `getEnvironmentFromPath` method.
- Updated `FusionScriptLauncher` to track input files with schemes different from the task's main scheme ("external inputs") and to request environment variables for each of these from the relevant provider. This ensures that the correct authentication and configuration are available for files stored in different cloud providers. [1] [2] [3] [4]

**Test pipeline**
A pipeline has been created to check that an external file is correctly mounted and accessible in a task managed by Fusion.
`main.nf`: