
Conversation

@mashehu
Collaborator

@mashehu mashehu commented Sep 30, 2025

Do not merge! This is a PR of dev compared to the TEMPLATE branch for whole-pipeline reviewing purposes. Changes should be made to dev and this PR should not be merged! The actual release PR is at

@mashehu
Collaborator Author

mashehu commented Nov 12, 2025

And I had another longer discussion with other core members about nested params. I see their usefulness in your setup, but given that they are not an officially supported feature of Nextflow and will stop working with the strict syntax in Nextflow 26.04, I think you need to rewrite them as a flat list and organize them via parameter groups.

@quentinblampey
Collaborator

What do you mean by "organize them via parameter groups" @mashehu?

Another important feature of these nested params is that we can pass kwargs to every reader. Since we support about 10 different technologies, it is convenient to be able to pass any technology-specific keyword argument when needed. It won't always be used, but in certain scenarios it can be useful.

If I want to flatten this, I'll need to (i) write every kwarg in the schema (potentially 30+ additional params) and (ii) make sure I update the schema and the pipeline every time there is a new kwarg in the spatialdata readers (which may happen every month or so).

@mashehu
Collaborator Author

mashehu commented Nov 12, 2025

With "parameter groups" I meant how we group them for documentation (not for any pipeline logic) as top-level objects in nextflow_schema.json, e.g. like your "sopa config" group.
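For illustration, such a parameter group is just a top-level entry in the schema's definitions block, used purely for documentation grouping. A minimal hedged sketch (the group and parameter names below are made up, and nf-core/sopa's actual schema will differ; newer schema drafts use "$defs" instead of "definitions"):

```json
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "definitions": {
    "sopa_config": {
      "title": "Sopa config",
      "type": "object",
      "description": "Options passed through to the Sopa CLI.",
      "properties": {
        "technology": { "type": "string", "description": "Input technology, e.g. visium_hd." },
        "cellpose_diameter": { "type": "number", "description": "Expected cell diameter in pixels." }
      }
    }
  },
  "allOf": [{ "$ref": "#/definitions/sopa_config" }]
}
```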

Your kwargs sound like what we use ext.args for (https://nf-co.re/docs/contributing/components/ext_args), I think. Would that be an option for you?

@quentinblampey
Collaborator

with "parameter groups" I meant how we group them for documentation (not for any pipeline logic) as top-level objects in nextflow_schema.json, e.g. like you have "sopa config".

Okay, I see.
Actually, I still think it would be very error-prone to flatten every param; it would add a lot of complexity and maintenance burden. Do you think there is no workaround? Is it certain that this feature will be dropped? I imagine I'm not the only one using it (on Slack, multiple people agreed it's great to support).

Your kwargs sound like what we use ext.args for https://nf-co.re/docs/contributing/components/ext_args, I think. Would that be an option for you?

Yes, I guess it could work, but it also adds some complexity for both the user (kwargs would live in a different place than the other args) and the developers (I'll need to merge the ext.args with the params before formatting them for the Sopa CLI).

@mashehu
Collaborator Author

mashehu commented Nov 12, 2025

Not sure what you mean by "I'll need to merge the ext.args with the params before formatting them to pass them to the Sopa CLI". We usually append a ${args} in the script command that can be filled with ext.args; see for example https://github.com/nf-core/modules/blob/master/modules/nf-core/spaceranger/count/main.nf#L52
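For context, the standard module pattern referenced here looks roughly like this (a generic sketch, not an actual nf-core/sopa module; the tool and file names are made up):

```groovy
// Generic nf-core module sketch: task.ext.args is appended to the command,
// so users can inject extra flags from config without touching the module.
process EXAMPLE_TOOL {
    input:
    tuple val(meta), path(input)

    output:
    tuple val(meta), path("*.out"), emit: results

    script:
    def args = task.ext.args ?: ''
    """
    example-tool --input ${input} ${args} > ${meta.id}.out
    """
}
```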

@quentinblampey
Collaborator

Yes, sorry, I was talking about the way I provide kwargs to the Sopa command line. It's really specific to Sopa, not to Nextflow.
Currently, I can just pass the dictionary of args/kwargs to a process and they will be formatted automatically for Sopa. If the kwargs now live outside the params, I'll need to provide both the params and the ext.args to the function that formats them.
Again, I think it's 100% doable technically; it's just a matter of convenience.

@mashehu
Collaborator Author

mashehu commented Nov 12, 2025

You mean how you do it with your ArgsCLI() util function? Or what special formatting do you need?

@quentinblampey
Collaborator

Yes exactly, this ArgsCLI function

@mashehu
Collaborator Author

mashehu commented Nov 13, 2025

you can do this in a config file instead, e.g.:

process {
    withName: 'AGGREGATE' {
        ext.args = { 
            def args = "--method ${params.aggregate_method} --min-intensity ${params.min_intensity}"
            if (params.technology == 'visium_hd') {
                args += " --dataset-id ${meta.id}"
            }
            args
        }
    }
}

@mahesh-panchal
Member

you can do this in a config file instead, e.g.:

process {
    withName: 'AGGREGATE' {
        ext.args = { 
            def args = "--method ${params.aggregate_method} --min-intensity ${params.min_intensity}"
            if (params.technology == 'visium_hd') {
                args += " --dataset-id ${meta.id}"
            }
            args
        }
    }
}

Use lists instead to form args. It's more maintainable.

process {
    withName: 'AGGREGATE' {
        ext.args = { 
            [ 
                "--method ${params.aggregate_method}", 
                "--min-intensity ${params.min_intensity}",
                params.technology == 'visium_hd' ? "--dataset-id ${meta.id}" : "",
            ].minus("").join(" ")
        }
    }
}

@quentinblampey
Collaborator

This would indeed work, but making it exhaustive becomes more complex:

  1. sopa.aggregate has many more arguments. Although the defaults should be fine in most cases, we sometimes want to pass a specific optional argument. With the current implementation, we can just add the arg in the right params group and it is passed properly to Sopa. With the proposed implementation, every single param would have to be written out. Here that is only about 10 params, so it's okay, but other processes are much more complex.
  2. Some processes, e.g. TO_SPATIALDATA, support about 10 different technologies, each of which comes with up to 10 parameters. This process alone would therefore involve 20 to 100 additional parameters, as well as complex validation. For instance, we would need to ensure that we only combine parameters dedicated to the same technology and that they don't overlap with those of another technology. With the current implementation, we instead simply get a clear error from the corresponding reader if we don't provide the correct arguments.
  3. For cellpose segmentation, we'll use different Docker images depending on whether we run on a CPU or a GPU. Currently the same image is used, but since cellpose v4 is extremely slow on a CPU, I plan to use a cellpose v3 image when running on CPU, and a cellpose v4 image on GPU. Since each cellpose version has a different parameter set, we'll have two groups of parameters depending on the Docker image. With the proposed implementation, we'd need to check which cellpose version is in use, maintain an exhaustive list of all parameters for all cellpose versions, and ensure we only use the parameters of the right version. Again, with the current implementation, we just pass the arguments to Sopa, which handles this internally and gives a clear error if we don't provide the correct ones.
  4. The above example is just for Cellpose, but we also support Baysor (about 20 extra params), proseg, comseg, and Stardist. They all come with many parameters, which may change in the future. This means that at every update we'd need to re-check all the parameters of all of these tools to make sure they are up to date. These packages evolve quickly, and the current implementation is flexible enough to handle a version update gracefully.
  5. By flattening the parameters, some names become confusing, since different tools may use similar parameter names. To avoid confusion, we'd need to include the tool name, so some parameters become very long, like segmentation_baysor_estimate_scale_from_centers. Having a few such parameters is fine, but having hundreds of them becomes very verbose.
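For reference, the CPU/GPU image split described in point 3 can at least be expressed declaratively with a dynamic container directive. A sketch with hypothetical image tags and a hypothetical use_gpu parameter (this does not solve the differing parameter sets between cellpose versions):

```groovy
// conf/modules.config sketch (image names and params.use_gpu are hypothetical)
process {
    withName: 'SEGMENTATION_CELLPOSE' {
        container = { params.use_gpu
            ? 'docker.io/example/cellpose:4'
            : 'docker.io/example/cellpose:3' }
    }
}
```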

More generally, Sopa itself is already heavily tested, and it depends on SpatialData, which is also heavily tested. By providing the parameters directly, we essentially "trust" the underlying methods and add additional nf-core tests on top (currently, 5 test configs). If we flatten everything, we introduce new potential sources of error. Given the complexity of flattening everything, I'm fairly certain it would create many errors and increase the maintenance burden. Honestly, I would really appreciate it if we could find a solution without flattening everything, as otherwise I don't know how I would be able to maintain all this 😊

@pinin4fjords
Member

pinin4fjords commented Nov 13, 2025

@quentinblampey - I understand your concerns, but the nested parameter approach isn't compatible with Nextflow 26.04's strict syntax enforcement. The standard nf-core pattern (prefixed flat parameters + ext.args) actually addresses your points:

Re: 100+ parameters and maintenance:

You don't need to expose every parameter in the schema. Expose only the common ones as prefixed params, then add extra_baysor_args, extra_cellpose_args, etc. for power users. See nf-core/rnaseq's extra_*_args pattern. Alternatively, power users can override ext.args entirely using withName selectors in their own configs.

When tools update, users can immediately use new parameters without you changing anything:

process {
    withName: 'BAYSOR_SEGMENTATION' {
        ext.args = { [
            params.baysor_scale ? "--scale ${params.baysor_scale}" : '',
            params.extra_baysor_args ?: ''
        ].join(' ').trim() }
    }
}

Re: parameter name conflicts:

Prefixes solve this: baysor_scale, cellpose_scale, stardist_scale. See differentialabundance's prefixed parameters (deseq2_*, limma_*, gsea_*).

Re: delegating validation to Sopa:

You can still do this. Build JSON in ext.args and pass it through, appending extra_*_args to preserve the pattern:

withName: 'BAYSOR_SEGMENTATION' {
    ext.args = { 
        def config = [
            scale: params.baysor_scale,
            min_molecules_per_cell: params.baysor_min_molecules
        ]
        "--config '${groovy.json.JsonOutput.toJson(config)}' ${params.extra_baysor_args ?: ''}"
    }
}

Your Sopa library still validates everything, you get clean error messages, and the schema stays flat and compliant. The extra_*_args pattern means you're not locked into maintaining every possible parameter.

@quentinblampey
Collaborator

Hi @pinin4fjords, thanks for your answers and for your understanding, I appreciate it!

I have a few questions regarding this:

Mix of static and dynamic kwargs

For the TO_SPATIALDATA process, I provide some kwargs to the sopa convert CLI. Some of these kwargs may be used by a specific spatialdata_io reader, which itself has some kwargs groups.

Therefore, I could have the following nested params:

reader = [
    technology: "visium_hd",
    kwargs: [
        imread_kwargs: [
            page: 0,
        ],
    ],
]

To flatten them, a user would need to write the complex line below.

withName: 'TO_SPATIALDATA' {
    ext.args = {
        '''--technology visium_hd --kwargs "{'imread_kwargs': {'page': 0}}"'''
    }
}

Question 1: is there a way to avoid this complex quote usage (a mix of ''', ", and ')? Ideally, I don't want users to have to learn this syntax; I want it to be as intuitive as the nested params above.

Then, in the TO_SPATIALDATA process, I need to add some sample-specific values to the --kwargs. But since the kwargs are hardcoded as a string, I now need to make them a map. Assuming I can create a function to convert the string into a map, I can then create:

kwargs: [
    imread_kwargs: [
         page: 0,
    ],
]

to then add some sample-specific values:

kwargs = [
    fullres_image_file: "/path/to/image",
    dataset_id: "DATASET_ID",
    imread_kwargs: [
        page: 0,
    ],
]

and then convert it back to the following string given another conversion function:

'''--kwargs "{fullres_image_file: '/path/to/image', dataset_id: 'DATASET_ID', 'imread_kwargs': {'page': 0}}"'''

Question 2: I think this is not very clean, and I'm unsure how to write such a conversion function robustly. How could I avoid this with the flattened approach? I could also try to update the string directly instead of doing a double conversion, but I think a robust function for that would also be relatively complex.

General concern regarding compatibility

Users can run Sopa via:

  1. the Python API, e.g. on a HPC, with some specific optimization backends such as Dask
  2. using the Sopa CLI
  3. using snakemake
  4. now, using nextflow

Since there are many different ways to run Sopa, I try to make everything as compatible as possible, so that it feels very natural to move from one usage to another. This is one of Sopa's promises: you can easily switch from one segmentation tool to another, from one technology to another, and from one usage mode/platform to another.

The nested approach allowed us to keep the same configuration files in both Snakemake and Nextflow. By moving to the flattened approach, we break that compatibility promise.

Question 3: how can we ensure easy adoption for users who want to move to nf-core/sopa, without a steep learning curve or complex Nextflow-specific syntax to learn?

@pinin4fjords
Member

Re: your three questions:

1. Complex quote usage (your visium_hd example): You should expose the most commonly changed parameters as pipeline params (like visium_hd_imread_page). Users configure these normally:

nextflow run nf-core/sopa --visium_hd_imread_page 5 --technology visium_hd

The pipeline handles the JSON complexity internally via ext.args:

withName: 'TO_SPATIALDATA' {
    ext.args = { 
        def kwargs = [imread_kwargs: [page: params.visium_hd_imread_page ?: 0]]
        "--technology ${params.technology} --kwargs '${groovy.json.JsonOutput.toJson(kwargs)}'"
    }
}

For edge cases where users need advanced configuration beyond what you expose, they can override ext.args in their own config or use extra_to_spatialdata_args. Users shouldn't need to write custom configs routinely.

2. Merging runtime values: Your ext.args closure has access to both params and meta. Following nf-core conventions, pass sample-specific values through the meta map:

withName: 'TO_SPATIALDATA' {
    ext.args = { 
        def kwargs = [
            imread_kwargs: [page: params.visium_hd_imread_page ?: 0],
            file_path: meta.file_path
        ]
        "--technology ${params.technology} --kwargs '${groovy.json.JsonOutput.toJson(kwargs)}'"
    }
}

3. Cross-platform compatibility: Maintaining Snakemake/Nextflow config compatibility isn't something nf-core supports. Each workflow manager has its own conventions. The nested parameters break in Nextflow 26.04 regardless. You could provide a config converter or clear documentation for users migrating between platforms, but the Nextflow pipeline needs to follow Nextflow/nf-core patterns.

@quentinblampey
Collaborator

Regarding your answers:

1. Complex quote usage
I can't expose visium_hd_imread_page as a pipeline parameter because it would break for any technology that is not Visium HD. One obvious fix is to add a check: if technology == "visium_hd", add the parameter; otherwise don't. Some readers may not have imread_kwargs at all, so I would also need to check whether imread_kwargs can be passed.
This logic, although simple, becomes more complex given that we support about 10 different readers, each with a different combination of args/kwargs that are not always cross-technology compatible.

2. Merging runtime values
Thanks, I didn't know we had access to meta in ext.args!
Just to confirm, this withName logic should live in a profile, correct? If so, I have another issue: I can't simply use groovy.json.JsonOutput.toJson. The Sopa CLI uses Typer, so a boolean such as prior: false must be converted to --no-prior (note the no).
This is why I made an ArgsCLI function, but I can't import it in a profile, can I?
Again, I guess I could reproduce this logic directly inside ext.args, but I'm afraid it would become unreadable.

3. Cross-platform compatibility
Okay thanks for your answer

NB: sorry for being annoying, I also understand your concerns, I'm just trying to find a solution that is convenient for everyone.

@pinin4fjords
Member

pinin4fjords commented Nov 14, 2025

@quentinblampey - I think I understand your concerns better now. Let me address each point with some potential solutions (recognising that there are many ways to skin this particular cat):

1. Complex quote usage / Technology-specific parameters

You're concerned about the conditional logic needed when visium_hd_imread_page doesn't apply to other technologies. One solution is to add adapter functions to your utility subworkflow that handle this centrally:

// subworkflows/local/utils_nfcore_sopa_pipeline/main.nf (or similar)

def buildReadConfig(params) {
    def config = [technology: params.technology]
    
    // Technology-specific kwargs - all the conditional logic lives here
    if (params.technology == 'visium_hd' && params.visium_hd_imread_page != null) {
        config.kwargs = [imread_kwargs: [page: params.visium_hd_imread_page]]
    }
    else if (params.technology == 'cosmx' && params.cosmx_custom_param != null) {
        config.kwargs = [custom_param: params.cosmx_custom_param]
    }
    // ... other technologies
    
    return config
}

def buildSegmentationConfig(params) {
    def config = [:]
    
    if (params.cellpose_diameter != null) {
        config.cellpose = [
            diameter: params.cellpose_diameter,
            channels: params.cellpose_channels,
            flow_threshold: params.cellpose_flow_threshold,
            cellprob_threshold: params.cellpose_cellprob_threshold,
            min_area: params.cellpose_min_area
        ]
    }
    
    if (params.baysor_scale != null) {
        config.baysor = [
            config: [
                segmentation: [
                    scale: params.baysor_scale,
                    min_molecules_per_cell: params.baysor_min_molecules
                ]
            ]
        ]
    }
    
    return config
}

// Your existing ArgsCLI stays here too
def ArgsCLI(Map params, String contains = null, List keys = null) {
    // ... your existing implementation
}

Yes, this adds conditional logic for the 10 technologies, but it's centralized in one place. Everywhere you currently use params.read, you'd use buildReadConfig(params) instead.

2. Merging runtime values / Where does ArgsCLI go?

ext.args goes in conf/modules.config, NOT in your profile configs. Your profiles just set flat params:

// conf/predefined/cosmx_cellpose.config (PROFILE - just sets flat params)
params {
  technology = 'cosmx'
  cellpose_diameter = 60
  cellpose_channels = ['DNA']
  cellpose_flow_threshold = 2
  cellpose_cellprob_threshold = -6
  cellpose_min_area = 2000
}

// conf/modules.config (where ext.args and your ArgsCLI logic goes)
process {
    withName: 'TO_SPATIALDATA' {
        ext.args = { 
            def readConfig = buildReadConfig(params)
            // Merge runtime values from meta here
            if (meta.file_path) {
                readConfig.kwargs = readConfig.kwargs ?: [:]
                readConfig.kwargs.file_path = meta.file_path
            }
            ArgsCLI([read: readConfig])
        }
    }
    
    withName: 'SEGMENTATION_CELLPOSE' {
        ext.args = {
            def segConfig = buildSegmentationConfig(params)
            ArgsCLI([segmentation: segConfig])
        }
    }
}

Your ArgsCLI function (which handles the Typer boolean conversion like prior: false → --no-prior) stays in your utility subworkflow and can be called from modules.config.
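For illustration, the Typer-style boolean handling could be sketched as a small helper living alongside the adapters above (a hypothetical sketch, not the real ArgsCLI implementation; the function name is made up):

```groovy
// Sketch of a Typer-style flag builder (hypothetical; the real ArgsCLI may differ).
// Booleans become --flag / --no-flag, everything else becomes --key value.
def typerArgs(Map opts) {
    opts.collect { k, v ->
        def flag = k.toString().replace('_', '-')
        v instanceof Boolean ? (v ? "--${flag}" : "--no-${flag}") : "--${flag} ${v}"
    }.join(' ')
}

// typerArgs([prior: false, scale: 6.25]) would yield "--no-prior --scale 6.25"
```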

The refactoring needed:

  • Flatten the profiles
  • Flatten nextflow_schema.json
  • Add adapter functions like buildReadConfig() and buildSegmentationConfig()
  • Update workflow code to use the adapters instead of accessing params.read directly

The architecture is: profiles set flat params → adapters rebuild nested structures → ArgsCLI converts to CLI args. The nested parameters in the schema break in Nextflow 26.04 and need to be flattened.

@pinin4fjords
Member

Note: I'm on leave after today, so I'm handing this back to other community members for further discussion.

@quentinblampey
Collaborator

1.
Okay, I see. Basically, this approach is equivalent to (i) flattening every parameter and then (ii) reconstructing the params as a nested map? It feels counterintuitive to flatten a map only to reconstruct the original one. We essentially return to my original idea, but with a lot of extra logic to perform the flattening and the "re-nesting".

2.
But, in that case, it means we can't update some kwargs from a profile? We wanted the possibility to add some kwargs depending on the profile, but if we use ext.args only in the modules, how can we be profile-specific? Sorry for the naive question, I'm not sure I follow...

Anyway, even if we did that, I'm not sure how it would work, because (as far as I understand) I can't import ArgsCLI in my config files, can I? I tried and got:

Unexpected input: 'from'

Or maybe it's a more recent Nextflow feature?

@quentinblampey
Collaborator

Hi,
Just a follow-up on the discussion above: does anyone have an idea of how to solve the flattening issue (see my last comment/questions)?

@quentinblampey
Collaborator

I think I have found solutions for a few small blockers on my side. The last remaining question (after which I'll be able to start flattening everything): regarding part 2 of this comment, can you confirm that we can't import this buildSegmentationConfig function inside conf/modules.config? If so, where should we move this logic?

@quentinblampey
Collaborator

quentinblampey commented Dec 12, 2025

I started doing the flattening and ran into new challenges. For one specific annotation method (more specifically, the fluorescence method), we need a map whose keys are not always the same.

How could I flatten this, knowing that we never have the same panel of channel/protein names?

params {
    ...

    annotation = [
        method: "fluorescence",
        args: [
            marker_cell_dict: [
                CK: "Tumoral cell", // see CK here, or CD3, CD20, etc...
                CD3: "T cell",
                CD20: "B cell",
            ]
        ],
    ]
}
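One possible workaround, stated here as an assumption rather than an established nf-core pattern: keep the open-ended mapping as a single JSON-string parameter (flat, so schema-compliant) and parse it where needed with Groovy's built-in JsonSlurper. The process name and CLI flags below are hypothetical:

```groovy
// Sketch (hypothetical names): the open-ended mapping travels as one flat string param, e.g.
//   nextflow run ... --annotation_marker_cell_dict '{"CK": "Tumoral cell", "CD3": "T cell"}'
process {
    withName: 'ANNOTATION' {
        ext.args = {
            // Parse the user-supplied JSON (failing early if malformed) and pass it on to the CLI.
            def markers = new groovy.json.JsonSlurper().parseText(params.annotation_marker_cell_dict ?: '{}')
            "--method ${params.annotation_method} --marker-cell-dict '${groovy.json.JsonOutput.toJson(markers)}'"
        }
    }
}
```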

Another question: in nextflow_schema.json, we define groups of parameters, but can we access these groups later on? For instance, I made a group named Explorer, and I would like to extract all keys of this group at runtime, if possible. It's not crucial, but it would avoid having to list every single parameter that needs to be "nested back into a map" before sending it to the CLI.
