New feature
Hello Nextflow maintainers,
thank you very much for developing Nextflow and for the continuous effort in maintaining it.
I would like to propose a new process-level control to prevent large intermediate outputs from accumulating in `work/` when an upstream process is much faster than the downstream one.
In short: a per-process cap on "in-flight" outputs (outputs already produced by a process but not yet fully consumed/drained by downstream), to provide backpressure and bound peak disk usage.
Use case
The main scenario is HPC (or any quota-limited environment) where `work/` is on a filesystem with a strict disk quota.
Typical pattern:
- Process A: fast, produces a very large output per sample (e.g. 10–100 GB)
- Process B: slow (or low parallelism), consumes that output
- With ~100 samples, Process A can finish quickly and produce many huge outputs while Process B has completed only a few.
- These outputs must stay in `work/` until downstream is finished, so temporary disk usage grows substantially and can exceed the quota (a minimal sketch of the pattern follows this list).
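For concreteness, here is a minimal sketch of the fast-producer/slow-consumer pattern in Nextflow DSL2 (process names, sizes, and commands are hypothetical placeholders):

```groovy
// Hypothetical minimal reproduction: a fast producer and a slow consumer.
process PROCESS_A {
    input:
    val sample

    output:
    path "${sample}.big"

    script:
    """
    # fast step emitting a large intermediate (placeholder: 1 GiB of zeros)
    dd if=/dev/zero of=${sample}.big bs=1M count=1024
    """
}

process PROCESS_B {
    maxForks 2  // low parallelism: the slow consumer

    input:
    path big

    output:
    path "${big}.done"

    script:
    """
    sleep 300          # stand-in for a long-running analysis
    touch ${big}.done
    """
}

workflow {
    Channel.of(1..100) | PROCESS_A | PROCESS_B
}
```

Here PROCESS_A can finish all 100 tasks long before PROCESS_B drains them, so up to ~100 large outputs sit in `work/` at once.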
I also use `nf-boost` to minimise `work/`, but it cannot remove these intermediates early because they are still required by downstream tasks, so the peak-usage problem remains.
Why existing features do not solve it:
- `executor.queueSize` is global; it does not enforce fairness between processes (a fast upstream process can fill it first).
- Reducing the upstream process's `maxForks` is possible, but it often requires manual tuning and can reduce throughput; it also does not address submission starvation in a general way.
- `nf-boost` helps only once files are no longer needed; it cannot prevent accumulation while downstream is still slow.
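For reference, the closest workaround today is throttling the producer directly via standard configuration, e.g. (illustrative value):

```groovy
// Current partial workaround: cap the producer's parallelism.
// This bounds in-flight outputs only indirectly, needs manual tuning
// per pipeline/filesystem, and throttles A even when disk is plentiful.
process {
    withName: PROCESS_A {
        maxForks = 4
    }
}
```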
Related discussions:
- Yet another "huge intermediate files" issue #2468 (huge intermediate files; merging processes often suggested but not always desirable)
- Remove single processed files before the end of the whole process #1754 (ideas about earlier cleanup / downstream start; same pain point but different direction)
Suggested implementation
Add a per-process directive (name is flexible) that limits how many outputs from that process can be “in flight” at the same time.
Example (illustrative):

```groovy
process {
    withName: PROCESS_A {
        maxInFlight = 5
    }
}
```
Proposed semantics:
- For a process with `maxInFlight = N`, Nextflow should allow at most N tasks' worth of outputs to be produced and remain "pending" for downstream.
- If the limit is reached, new tasks of that process should not start / not be submitted until downstream has progressed enough that some previously produced outputs are no longer pending (e.g. all dependent downstream tasks for those outputs have completed successfully).
- This is not asking to delete files unsafely: it only restricts ahead-of-time production to keep disk usage bounded (see the back-of-the-envelope bound after this list).
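With the example numbers from the use case above, the effect on peak disk usage is easy to quantify (a hypothetical illustration, assuming 100 GB per output):

```groovy
// Illustrative arithmetic using this issue's example numbers.
def samples      = 100   // total samples
def outputSizeGb = 100   // size of PROCESS_A's output per sample
def maxInFlight  = 5     // proposed cap

def uncappedPeak = samples     * outputSizeGb  // up to 10,000 GB pending
def cappedPeak   = maxInFlight * outputSizeGb  // bounded at 500 GB
println "uncapped peak: ${uncappedPeak} GB, capped peak: ${cappedPeak} GB"
```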
Implementation building blocks:
- Maintain an "in-flight" counter per process (or per output channel edge); increment it when a task of that process completes and materialises its outputs.
- Decrement it when all downstream tasks that consume that output have completed (for a simple linear A→B this is just when the corresponding B task finishes; for branching, when all dependants finish).
- Integrate this check into the task dispatch/submission logic so that producers are paused when they outrun consumers (a minimal sketch follows).
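A minimal sketch of the counting logic, assuming a plain semaphore per capped process (this is not Nextflow's internal API; the class and method names are hypothetical):

```groovy
import java.util.concurrent.Semaphore

// Hypothetical sketch: one limiter per process carrying a maxInFlight directive.
class InFlightLimiter {
    private final Semaphore permits

    InFlightLimiter(int maxInFlight) {
        permits = new Semaphore(maxInFlight)
    }

    /** Called before submitting a producer task; blocks once N outputs are pending. */
    void beforeSubmit() {
        permits.acquire()
    }

    /** Called once every downstream task consuming a given output has finished. */
    void onOutputDrained() {
        permits.release()
    }
}
```

Acquiring at submission time (rather than at task completion) also counts tasks that are already running, so the cap cannot be overshot by a burst of in-progress producers. For branching outputs, a per-output reference count of pending dependants would decide when `onOutputDrained()` fires.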
Environment:
- Nextflow version: v25.10.2
- Executor: slurm
- nf-boost version: v0.6.0
Thanks again for your time.