
@jorgee jorgee commented Dec 17, 2025

This pull request is a PoC to validate that Nextflow and Fusion can work with buckets hosted in different providers. It allows task input files to live in different cloud storage schemes (AWS S3 and Azure in this PoC) without having to manage them as foreign files. Instead of being copied to the task workdir, the files are accessed directly through Fusion.

Two main changes have been applied:

Cloud scheme support and foreign file detection:

  • Introduced a SUPPORTED_CLOUD_SCHEMES list in FusionHelper (currently 's3' and 'az'), and updated the Executor.isForeignFile method to properly recognize files from supported cloud schemes as local when Fusion is enabled. [1] [2]
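
The check described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: only `SUPPORTED_CLOUD_SCHEMES` and the `isForeignFile` name come from the description, while the class name, the string-based signature and the `fusionEnabled` flag are assumptions made to keep the example self-contained.

```java
import java.net.URI;
import java.util.List;

// Hypothetical stand-in for the logic described in the PR; not the actual Nextflow classes.
class ForeignFileSketch {

    // Schemes named in the PR description ('s3' and 'az' in this PoC).
    static final List<String> SUPPORTED_CLOUD_SCHEMES = List.of("s3", "az");

    // Returns the URI scheme of a path, treating scheme-less paths as local files.
    static String schemeOf(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null ? "file" : scheme;
    }

    // Assumed rule: a file is foreign when its scheme differs from the work dir scheme,
    // unless Fusion is enabled and the scheme is one Fusion can access directly.
    static boolean isForeignFile(String inputPath, String workDir, boolean fusionEnabled) {
        String scheme = schemeOf(inputPath);
        if (scheme.equals(schemeOf(workDir))) {
            return false; // same storage as the work dir, never foreign
        }
        if (fusionEnabled && SUPPORTED_CLOUD_SCHEMES.contains(scheme)) {
            return false; // read in place through Fusion, no copy to the work dir
        }
        return true; // falls back to the usual foreign-file staging
    }

    public static void main(String[] args) {
        String input = "s3://jorgee-eu-west1-test1/greetings.csv";
        String workDir = "az://test-nf-jorgee/work/";
        System.out.println("with Fusion:    " + isForeignFile(input, workDir, true));
        System.out.println("without Fusion: " + isForeignFile(input, workDir, false));
    }
}
```

Under these assumptions, an `s3://` input with an `az://` work dir is treated as local only when Fusion is enabled. A `gs://` input would still be handled as a foreign file, since `gs` is not in the list.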

Fusion environment handling:

The following changes are for testing purposes only and are not intended to be merged to master as they are.

  • Added a new method getEnvironmentFromPath(Path, FusionConfig) to the FusionEnv interface and implemented it in both AWS and Azure Fusion environment providers. This enables environment variables to be set based on the specific scheme of each input file path, not just the overall task scheme. [1] [2] [3]
  • Updated FusionEnvProvider to aggregate environment variables from all relevant FusionEnv extensions for each external input file, using the new getEnvironmentFromPath method.
  • Modified FusionScriptLauncher to track input files with schemes different from the task's main scheme ("external inputs") and to request environment variables for each of these from the relevant provider. This ensures that the correct authentication and configuration are available for files stored in different cloud providers. [1] [2] [3] [4]
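
The flow in the three bullets above can be sketched as below. This is a simplified illustration under stated assumptions, not the PR's implementation: the real `getEnvironmentFromPath` takes a `Path` and a `FusionConfig`, whereas this sketch uses plain strings, and the `PathEnvProvider` interface, the two provider constants and the environment variable values are hypothetical placeholders.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of per-path environment aggregation.
class ExternalInputEnvSketch {

    // Stand-in for the FusionEnv extension point (the real method also takes a FusionConfig).
    interface PathEnvProvider {
        Map<String, String> getEnvironmentFromPath(String path);
    }

    // Returns the URI scheme of a path, treating scheme-less paths as local files.
    static String schemeOf(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null ? "file" : scheme;
    }

    // Placeholder providers: each one only answers for its own scheme.
    static final PathEnvProvider AWS = path ->
            schemeOf(path).equals("s3") ? Map.of("AWS_ACCESS_KEY_ID", "<placeholder>") : Map.of();
    static final PathEnvProvider AZURE = path ->
            schemeOf(path).equals("az") ? Map.of("AZURE_STORAGE_ACCOUNT", "<placeholder>") : Map.of();

    // "External inputs": input files whose scheme differs from the task's main (work dir) scheme.
    static List<String> externalInputs(String workDir, List<String> inputs) {
        String taskScheme = schemeOf(workDir);
        List<String> result = new ArrayList<>();
        for (String input : inputs) {
            if (!schemeOf(input).equals(taskScheme)) {
                result.add(input);
            }
        }
        return result;
    }

    // Ask every provider about every external input and merge the answers.
    static Map<String, String> aggregate(List<PathEnvProvider> providers, List<String> externalInputs) {
        Map<String, String> env = new LinkedHashMap<>();
        for (String input : externalInputs) {
            for (PathEnvProvider provider : providers) {
                env.putAll(provider.getEnvironmentFromPath(input));
            }
        }
        return env;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of(
                "s3://jorgee-eu-west1-test1/greetings.csv",
                "az://test-nf-jorgee/greetings.csv");
        List<String> external = externalInputs("gs://nf-test-jorgee/work/", inputs);
        System.out.println(aggregate(List.of(AWS, AZURE), external).keySet());
    }
}
```

In this sketch a later provider overwrites an earlier one when both return the same variable name. How the real `FusionEnvProvider` resolves such collisions is not stated in the description.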

Test pipeline:

A pipeline has been created to check that an external file is correctly mounted and accessible in a task managed by Fusion.

  • main.nf:

process list {
        input:
                path in_file
        script:
        """
        ls -l $in_file
        cat $in_file
        """
}

workflow {
        list (file(params.input))
}
  • nextflow.config:
profiles{
        amazon {
                process.executor = 'awsbatch'
                process.queue = 'TowerForge-xxxxx'
                workDir = 's3://jorgee-eu-west1-test1/work/'
        }
        azure {
                process.executor = 'azurebatch'
                workDir = 'az://test-nf-jorgee/work/'
        }
        google {
                process.executor = 'google-batch'
                workDir = 'gs://nf-test-jorgee/work/'
        }
}
aws.region = 'eu-west-1'
aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'

google.project = 'xxx'
google.location = 'europe-west1'

azure {
  storage {
    accountName = "nfazurestore"
    accountKey = "xxxxx"
  }
  batch {
    location = 'westeurope'
    accountName = 'nfbatchtest'
    accountKey = 'xxxxx'
    autoPoolMode = true
  }
}
process.container = 'quay.io/nextflow/bash'
wave.enabled=true
fusion.enabled=true
fusion.exportStorageCredentials=true
  • Tested cases:
$ nextflow run main.nf -profile azure --input s3://jorgee-eu-west1-test1/greetings.csv
$ nextflow run main.nf -profile google --input s3://jorgee-eu-west1-test1/greetings.csv
$ nextflow run main.nf -profile google --input az://test-nf-jorgee/greetings.csv
$ nextflow run main.nf -profile amazon --input az://test-nf-jorgee/greetings.csv


netlify bot commented Dec 17, 2025

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 45729d8
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/69428533c0b20b0008f502ae
