
Feature request: per-process cap on “in-flight outputs” #6638

@asuq

Description

New feature

Hello Nextflow maintainers,
thank you very much for developing Nextflow and for the continuous effort in maintaining it.

I would like to propose a new process-level control to prevent large intermediate outputs from accumulating in work/ when an upstream process is much faster than the downstream one.

In short: a per-process cap on “in-flight outputs” (outputs already produced by a process but not yet fully consumed/drained by downstream), to provide backpressure and bound peak disk usage.

Use case

Main scenario is HPC (or any quota-limited environment) where work/ is on a filesystem with strict disk quota.

Typical pattern:

  • Process A: fast, produces very large output per sample (e.g. 10–100 GB)
  • Process B: slow (or low parallelism), consumes that output
  • With ~100 samples, Process A can finish quickly and produce many huge outputs, while Process B is still processing only a few.
  • These outputs must stay in work/ until downstream is finished, so temporary disk usage grows a lot and can exceed quota.

I also use nf-boost to minimise work/, but it cannot remove these intermediates early because they are still required by downstream tasks, so the peak-usage problem persists.

Why existing features do not solve it:

  • executor.queueSize is global; it does not enforce fairness between processes (fast upstream can fill it first).
  • Reducing the upstream process's maxForks is possible, but it requires manual tuning, can reduce throughput, and does not address submission starvation in a general way.
  • nf-boost helps only when files are no longer needed; it cannot prevent accumulation when downstream is slow.
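For comparison, the closest workaround available today is to throttle the producer directly with the existing maxForks directive (the value below is illustrative):

```groovy
// Current workaround: cap concurrent PROCESS_A tasks.
// This bounds *running* tasks, not *pending outputs*, so it must be
// tuned by hand and can idle the cluster whenever B briefly catches up.
process {
    withName: PROCESS_A {
        maxForks = 2   // illustrative value
    }
}
```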

Related discussions:

Suggested implementation

Add a per-process directive (name is flexible) that limits how many outputs from that process can be “in flight” at the same time.

Example (illustrative):

  process {
    withName: PROCESS_A {
      maxInFlight = 5
    }
  }
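The same hypothetical directive could alternatively be set inside the process definition itself. Everything below is illustrative: `maxInFlight` does not exist in Nextflow today, and the script command is a placeholder:

```groovy
process PROCESS_A {
    maxInFlight 5          // hypothetical directive; name is flexible

    input:
    path sample

    output:
    path 'big_output'

    script:
    """
    produce_big_output.sh $sample    # placeholder command
    """
}
```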

Proposed semantics:

  • For a process with maxInFlight = N, Nextflow should allow at most N tasks worth of outputs to be produced and remain “pending” for downstream.
  • If the limit is reached, new tasks of that process should not be submitted until downstream has progressed enough that some previously produced outputs are no longer pending (e.g. all dependent downstream tasks for those outputs have completed successfully).
  • This is not asking to delete files unsafely: it is only restricting ahead-of-time production to keep disk usage bounded.

Implementation building blocks:

  • Maintain an “in-flight counter” per process (or per output channel edge), increment when a task of that process completes and materialises its outputs.
  • Decrement when all downstream tasks that consume that output are completed (for simple linear A→B it is just when the corresponding B task finishes; for branching, when all dependants finish).
  • Integrate this check into the task dispatch / submission logic so that producers can be paused when they outrun consumers.
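The building blocks above could be sketched as a semaphore-style counter keyed by process name. This is only a minimal illustration of the intended semantics, not a proposal for Nextflow's actual dispatcher internals; the class and method names are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: one semaphore per capped producer process, sized to
// its maxInFlight limit. A producer task must acquire a permit before it is
// submitted; the permit is released only once every downstream task that
// consumes that task's outputs has completed.
class InFlightLimiter {
    private final Map<String, Semaphore> limits = new ConcurrentHashMap<>();

    /** Register a cap for a process (e.g. from a maxInFlight directive). */
    void configure(String processName, int maxInFlight) {
        limits.put(processName, new Semaphore(maxInFlight));
    }

    /** Checked by the dispatcher before submitting a task of processName.
     *  Returns false when the cap is reached, pausing the producer. */
    boolean tryAcquire(String processName) {
        Semaphore s = limits.get(processName);
        return s == null || s.tryAcquire();  // uncapped processes never block
    }

    /** Called when all downstream consumers of one task's outputs finish. */
    void release(String processName) {
        Semaphore s = limits.get(processName);
        if (s != null) s.release();
    }
}
```

For a simple linear A→B pipeline, `release` fires when the corresponding B task finishes; for branching graphs, when the last dependant finishes.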

Environment:

  • Nextflow version: v25.10.2
  • Executor: slurm
  • nf-boost version: v0.6.0

Thanks again for your time.
