Feature request: Allow customizable task hashing #6683

@kaizhang

New feature

During workflow development, minor edits to a process script (e.g., adding debug echo statements, refactoring commands, or commenting out lines) frequently invalidate the task cache, even when the inputs and outputs remain functionally identical. The affected tasks are then re-executed unnecessarily on every -resume run, which significantly slows down iterative development.

It would be very useful to have a way to customize how task hashes are computed, particularly to optionally exclude certain components (like the process script content) from the hash calculation in specific scenarios.

Use case

  • Speeds up rapid prototyping and debugging of pipelines.
  • Avoids re-running long tasks just because a cat or echo was added for inspection.
  • Complements the existing caching mechanisms without breaking reproducibility in final runs.

Suggested implementation

Introduce a new process directive (or extend the existing cache directive) that allows users to control which elements contribute to the task hash. Examples:

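// Proposed syntax only: this closure form is the feature being requested
process myProcess {
    cache { task ->
        // Return a custom object/map whose contents are used for the task hash.
        // Anything not listed here (script, container, etc.) is excluded.
        [name: task.process, inputs: task.inputs]
    }
    // ...
}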
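
To make the intended behavior concrete, here is a minimal, language-agnostic sketch in Python. It is not Nextflow's actual hashing code. The task fields (name, script, container, inputs), the task_hash helper, and the example values are all hypothetical and only illustrate the idea of hashing a user-selected subset of components.

```python
import hashlib
import json

# Hypothetical component names, not Nextflow's internal field names.
DEFAULT_COMPONENTS = ("name", "script", "container", "inputs")


def task_hash(task, components=DEFAULT_COMPONENTS):
    """Hash only the selected components of a task description."""
    selected = {key: task[key] for key in components}
    # sort_keys gives a stable serialization regardless of dict ordering.
    payload = json.dumps(selected, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Illustrative task description (all values are made up).
original = {
    "name": "myProcess",
    "script": "bwa mem ref.fa reads.fq > out.sam",
    "container": "example/bwa:latest",
    "inputs": {"reads": "reads.fq", "ref": "ref.fa"},
}

# The same task after adding a debug echo to the script.
edited = dict(original, script="echo start\n" + original["script"])

# Hashing every component: the script edit changes the hash (cache miss).
default_changed = task_hash(original) != task_hash(edited)

# Hashing only name and inputs: the script edit is ignored (cache hit).
custom = ("name", "inputs")
custom_same = task_hash(original, custom) == task_hash(edited, custom)

# A real input change is still detected by the custom hash.
new_input = dict(original, inputs={"reads": "other.fq", "ref": "ref.fa"})
input_change_detected = task_hash(original, custom) != task_hash(new_input, custom)
```

The trade-off is that a custom key like this also ignores script changes that do alter the outputs, so it would only be appropriate as an opt-in during development and not for final runs.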
