
Conversation

@mashehu
Collaborator

@mashehu mashehu commented Sep 30, 2025

Do not merge! This is a PR of dev compared to the TEMPLATE branch for whole-pipeline reviewing purposes. Changes should be made to dev and this PR should not be merged! The actual release PR is at

@mashehu
Collaborator Author

mashehu commented Nov 12, 2025

And I had another longer discussion with other core members about nested params. I see their usefulness in your setup, but given that they are not an officially supported feature of Nextflow and will stop working with the strict syntax in Nextflow 26.04, I think you need to rewrite them as a flat list and organize them via parameter groups.

@quentinblampey
Collaborator

What do you mean by "organize them via parameter groups" @mashehu?

Another important feature of these nested params is that we can pass kwargs to every reader. Since we support about 10 different technologies, it is convenient to be able to pass any technology-specific keyword argument when needed. It won't always be used, but in certain scenarios it can be useful.

If I want to flatten this, I'll need to (i) write every kwarg in the schema (potentially 30+ additional params) and (ii) make sure I update the schema and the pipeline every time there is a new kwarg in the spatialdata readers (which may happen every month or so).

@mashehu
Collaborator Author

mashehu commented Nov 12, 2025

With "parameter groups" I meant how we group them for documentation (not for any pipeline logic) as top-level objects in nextflow_schema.json, e.g. like your "sopa config" group.
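For illustration, such a parameter group is just a top-level entry in the schema's definitions block, used purely for documentation grouping. A minimal hedged sketch (the group and parameter names below are made up, and nf-core/sopa's actual schema will differ; newer schema drafts use "$defs" instead of "definitions"):

```json
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "definitions": {
    "sopa_config": {
      "title": "Sopa config",
      "type": "object",
      "description": "Options passed through to the Sopa CLI.",
      "properties": {
        "technology": { "type": "string", "description": "Input technology, e.g. visium_hd." },
        "cellpose_diameter": { "type": "number", "description": "Expected cell diameter in pixels." }
      }
    }
  },
  "allOf": [{ "$ref": "#/definitions/sopa_config" }]
}
```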

Your kwargs sound like what we use ext.args for (https://nf-co.re/docs/contributing/components/ext_args), I think. Would that be an option for you?

@quentinblampey
Collaborator

with "parameter groups" I meant how we group them for documentation (not for any pipeline logic) as top-level objects in nextflow_schema.json, e.g. like you have "sopa config".

Okay, I see.
Actually, I still think it would be very error-prone to flatten every param; it would add a lot of complexity and maintenance burden. Do you think there is no workaround? Is it certain that this feature will be dropped? I imagine I'm not the only one using it (on Slack, multiple people agreed it's great to support).

Your kwargs sound like what we use ext.args for https://nf-co.re/docs/contributing/components/ext_args, I think. Would that be an option for you?

Yes, I guess it could work, but it also adds some complexity for both the user (kwargs would live in a different place than the other args) and the developers (I'll need to merge the ext.args with the params before formatting them for the Sopa CLI).

@mashehu
Collaborator Author

mashehu commented Nov 12, 2025

Not sure what you mean by "I'll need to merge the ext.args with the params before formatting them to pass them to the Sopa CLI". We usually append a ${args} in the script command that can be filled with ext.args; see for example https://github.com/nf-core/modules/blob/master/modules/nf-core/spaceranger/count/main.nf#L52
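For context, the standard module pattern referenced here looks roughly like this (a generic sketch, not an actual nf-core/sopa module; the tool and file names are made up):

```groovy
// Generic nf-core module sketch: task.ext.args is appended to the command,
// so users can inject extra flags from config without touching the module.
process EXAMPLE_TOOL {
    input:
    tuple val(meta), path(input)

    output:
    tuple val(meta), path("*.out"), emit: results

    script:
    def args = task.ext.args ?: ''
    """
    example-tool --input ${input} ${args} > ${meta.id}.out
    """
}
```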

@quentinblampey
Collaborator

Yes, sorry, I was talking about the way I provide kwargs to the Sopa command line. It's really specific to Sopa, not to Nextflow.
Currently, I can just pass the dictionary of args/kwargs to a process and they will be formatted automatically for Sopa. If the kwargs now live outside the params, I'll need to provide both the params and the ext.args to the function that formats them.
Again, I think it's 100% doable technically; it's just a matter of convenience.

@mashehu
Collaborator Author

mashehu commented Nov 12, 2025

You mean how you do it with your ArgsCLI() util function? Or what special formatting do you need?

@quentinblampey
Collaborator

Yes exactly, this ArgsCLI function

@mashehu
Collaborator Author

mashehu commented Nov 13, 2025

you can do this in a config file instead, e.g.:

process {
    withName: 'AGGREGATE' {
        ext.args = { 
            def args = "--method ${params.aggregate_method} --min-intensity ${params.min_intensity}"
            if (params.technology == 'visium_hd') {
                args += " --dataset-id ${meta.id}"
            }
            args
        }
    }
}

@mahesh-panchal
Member

you can do this in a config file instead, e.g.:

process {
    withName: 'AGGREGATE' {
        ext.args = { 
            def args = "--method ${params.aggregate_method} --min-intensity ${params.min_intensity}"
            if (params.technology == 'visium_hd') {
                args += " --dataset-id ${meta.id}"
            }
            args
        }
    }
}

Use lists instead to form args. It's more maintainable.

process {
    withName: 'AGGREGATE' {
        ext.args = { 
            [ 
                "--method ${params.aggregate_method}", 
                "--min-intensity ${params.min_intensity}",
                params.technology == 'visium_hd' ? "--dataset-id ${meta.id}" : "",
            ].minus("").join(" ")
        }
    }
}

@quentinblampey
Collaborator

This would indeed work, but making it exhaustive becomes more complex:

  1. sopa.aggregate has many more arguments. Although the defaults should be fine in most cases, we sometimes want to pass a specific optional argument. With the current implementation, we can just add the arg in the right params group and it is passed properly to Sopa. With the proposed implementation, every single param would have to be written out. Here that is only about 10 params, so it's okay, but other processes are much more complex.
  2. Some processes, e.g. TO_SPATIALDATA, support about 10 different technologies, each of which comes with up to 10 parameters. This process alone would therefore involve 20 to 100 additional parameters, as well as complex validation. For instance, we would need to ensure that we only combine parameters dedicated to the same technology and that they don't overlap with those of another technology. With the current implementation, we instead simply get a clear error from the corresponding reader if we don't provide the correct arguments.
  3. For cellpose segmentation, we'll use different Docker images depending on whether we run on a CPU or a GPU. Currently the same image is used, but since cellpose v4 is extremely slow on a CPU, I plan to use a cellpose v3 image when running on CPU, and a cellpose v4 image on GPU. Since each cellpose version has a different parameter set, we'll have two groups of parameters depending on the Docker image. With the proposed implementation, we'd need to check which cellpose version is in use, maintain an exhaustive list of all parameters for all cellpose versions, and ensure we only use the parameters of the right version. Again, with the current implementation, we just pass the arguments to Sopa, which handles this internally and gives a clear error if we don't provide the correct ones.
  4. The above example is just for Cellpose, but we also support Baysor (about 20 extra params), proseg, comseg, and Stardist. They all come with many parameters, which may change in the future. This means that at every update we'd need to re-check all the parameters of all of these tools to make sure they are up to date. These packages evolve quickly, and the current implementation is flexible enough to handle a version update gracefully.
  5. By flattening the parameters, some names become confusing, since different tools may use similar parameter names. To avoid confusion, we'd need to include the tool name, so some parameters become very long, like segmentation_baysor_estimate_scale_from_centers. Having a few such parameters is fine, but having hundreds of them becomes very verbose.
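For reference, the CPU/GPU image split described in point 3 can at least be expressed declaratively with a dynamic container directive. A sketch with hypothetical image tags and a hypothetical use_gpu parameter (this does not solve the differing parameter sets between cellpose versions):

```groovy
// conf/modules.config sketch (image names and params.use_gpu are hypothetical)
process {
    withName: 'SEGMENTATION_CELLPOSE' {
        container = { params.use_gpu
            ? 'docker.io/example/cellpose:4'
            : 'docker.io/example/cellpose:3' }
    }
}
```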

More generally, Sopa itself is already heavily tested, and it depends on SpatialData, which is also heavily tested. By providing the parameters directly, we essentially "trust" the underlying methods and add additional nf-core tests on top (currently, 5 test configs). If we flatten everything, we introduce new potential sources of error. Given the complexity of flattening everything, I'm fairly certain it would create many errors and increase the maintenance burden. Honestly, I would really appreciate it if we could find a solution without flattening everything, as otherwise I don't know how I would be able to maintain all this 😊

@pinin4fjords
Member

pinin4fjords commented Nov 13, 2025

@quentinblampey - I understand your concerns, but the nested parameter approach isn't compatible with Nextflow 26.04's strict syntax enforcement. The standard nf-core pattern (prefixed flat parameters + ext.args) actually addresses your points:

Re: 100+ parameters and maintenance:

You don't need to expose every parameter in the schema. Expose only the common ones as prefixed params, then add extra_baysor_args, extra_cellpose_args, etc. for power users. See nf-core/rnaseq's extra_*_args pattern. Alternatively, power users can override ext.args entirely using withName selectors in their own configs.

When tools update, users can immediately use new parameters without you changing anything:

process {
    withName: 'BAYSOR_SEGMENTATION' {
        ext.args = { [
            params.baysor_scale ? "--scale ${params.baysor_scale}" : '',
            params.extra_baysor_args ?: ''
        ].join(' ').trim() }
    }
}

Re: parameter name conflicts:

Prefixes solve this: baysor_scale, cellpose_scale, stardist_scale. See differentialabundance's prefixed parameters (deseq2_*, limma_*, gsea_*).

Re: delegating validation to Sopa:

You can still do this. Build JSON in ext.args and pass it through, appending extra_*_args to preserve the pattern:

withName: 'BAYSOR_SEGMENTATION' {
    ext.args = { 
        def config = [
            scale: params.baysor_scale,
            min_molecules_per_cell: params.baysor_min_molecules
        ]
        "--config '${groovy.json.JsonOutput.toJson(config)}' ${params.extra_baysor_args ?: ''}"
    }
}

Your Sopa library still validates everything, you get clean error messages, and the schema stays flat and compliant. The extra_*_args pattern means you're not locked into maintaining every possible parameter.

@quentinblampey
Collaborator

Hi @pinin4fjords, thanks for your answers and for your understanding, I appreciate it!

I have a few questions regarding this:

Mix of static and dynamic kwargs

For the TO_SPATIALDATA process, I provide some kwargs to the sopa convert CLI. Some of these kwargs may be used by a specific spatialdata_io reader, which itself has some kwargs groups.

Therefore, I could have the following nested params:

reader = [
    technology: "visium_hd",
    kwargs: [
        imread_kwargs: [
            page: 0,
        ],
    ],
]

To flatten them, a user would need to write the complex line below.

withName: 'TO_SPATIALDATA' {
    ext.args = {
        '''--technology visium_hd --kwargs "{'imread_kwargs': {'page': 0}}"'''
    }
}

Question 1: is there a way to avoid this complex quote usage (a mix of ''', ", and ')? Ideally, I don't want users to have to learn this syntax; I want it to be as intuitive as the nested params above.

Then, in the TO_SPATIALDATA process, I need to add some sample-specific values to the --kwargs. But since the kwargs are hardcoded as a string, I now need to make them a map. Assuming I can create a function to convert the string into a map, I can then create:

kwargs: [
    imread_kwargs: [
         page: 0,
    ],
]

to then add some sample-specific values:

kwargs = [
    fullres_image_file: "/path/to/image",
    dataset_id: "DATASET_ID",
    imread_kwargs: [
        page: 0,
    ],
]

and then convert it back to the following string given another conversion function:

'''--kwargs "{fullres_image_file: '/path/to/image', dataset_id: 'DATASET_ID', 'imread_kwargs': {'page': 0}}"'''

Question 2: I think this is not very clean, and I'm unsure how to write such a conversion function robustly. How could I avoid this with the flattened approach? I could also try to update the string directly instead of doing a double conversion, but I think a robust function for that would also be relatively complex.

General concern regarding compatibility

Users can run Sopa via:

  1. the Python API, e.g. on a HPC, with some specific optimization backends such as Dask
  2. using the Sopa CLI
  3. using snakemake
  4. now, using nextflow

Since there are many different ways to run Sopa, I try to make everything as compatible as possible, so that it feels very natural to move from one usage to another. This is one of Sopa's promises: you can easily switch from one segmentation tool to another, from one technology to another, and from one usage mode/platform to another.

The nested approach allowed us to keep the same configuration files in both Snakemake and Nextflow. By moving to the flattened approach, we break that compatibility promise.

Question 3: how can we ensure easy adoption for users who want to move to nf-core/sopa, without a steep learning curve or complex Nextflow-specific syntax to learn?

@pinin4fjords
Member

Re: your three questions:

1. Complex quote usage (your visium_hd example): You should expose the most commonly changed parameters as pipeline params (like visium_hd_imread_page). Users configure these normally:

nextflow run nf-core/sopa --visium_hd_imread_page 5 --technology visium_hd

The pipeline handles the JSON complexity internally via ext.args:

withName: 'TO_SPATIALDATA' {
    ext.args = { 
        def kwargs = [imread_kwargs: [page: params.visium_hd_imread_page ?: 0]]
        "--technology ${params.technology} --kwargs '${groovy.json.JsonOutput.toJson(kwargs)}'"
    }
}

For edge cases where users need advanced configuration beyond what you expose, they can override ext.args in their own config or use extra_to_spatialdata_args. Users shouldn't need to write custom configs routinely.

2. Merging runtime values: Your ext.args closure has access to both params and meta. Following nf-core conventions, pass sample-specific values through the meta map:

withName: 'TO_SPATIALDATA' {
    ext.args = { 
        def kwargs = [
            imread_kwargs: [page: params.visium_hd_imread_page ?: 0],
            file_path: meta.file_path
        ]
        "--technology ${params.technology} --kwargs '${groovy.json.JsonOutput.toJson(kwargs)}'"
    }
}

3. Cross-platform compatibility: Maintaining Snakemake/Nextflow config compatibility isn't something nf-core supports. Each workflow manager has its own conventions. The nested parameters break in Nextflow 26.04 regardless. You could provide a config converter or clear documentation for users migrating between platforms, but the Nextflow pipeline needs to follow Nextflow/nf-core patterns.

@quentinblampey
Collaborator

Regarding your answers:

1. Complex quote usage
I can't expose visium_hd_imread_page as a pipeline parameter because it would break for any technology that is not Visium HD. One obvious fix is to add a check: if technology == "visium_hd", add the parameter; otherwise don't. Some readers may not have imread_kwargs at all, so I would also need to check whether imread_kwargs can be passed.
This logic, although simple, becomes more complex given that we support about 10 different readers, each with a different combination of args/kwargs that are not always cross-technology compatible.

2. Merging runtime values
Thanks, I didn't know we had access to meta in ext.args!
Just to confirm, this withName logic should live in a profile, correct? If so, I have another issue: I can't simply use groovy.json.JsonOutput.toJson. The Sopa CLI uses Typer, so a boolean such as prior: false must be converted to --no-prior (note the no).
This is why I made an ArgsCLI function, but I can't import it in a profile, can I?
Again, I guess I could reproduce this logic directly inside ext.args, but I'm afraid it would become unreadable.

3. Cross-platform compatibility
Okay thanks for your answer

NB: sorry for being annoying, I also understand your concerns, I'm just trying to find a solution that is convenient for everyone.

@pinin4fjords
Member

pinin4fjords commented Nov 14, 2025

@quentinblampey - I think I understand your concerns better now. Let me address each point with some potential solutions (recognising that there are many ways to skin this particular cat):

1. Complex quote usage / Technology-specific parameters

You're concerned about the conditional logic needed when visium_hd_imread_page doesn't apply to other technologies. One solution is to add adapter functions to your utility subworkflow that handle this centrally:

// subworkflows/local/utils_nfcore_sopa_pipeline/main.nf (or similar)

def buildReadConfig(params) {
    def config = [technology: params.technology]
    
    // Technology-specific kwargs - all the conditional logic lives here
    if (params.technology == 'visium_hd' && params.visium_hd_imread_page != null) {
        config.kwargs = [imread_kwargs: [page: params.visium_hd_imread_page]]
    }
    else if (params.technology == 'cosmx' && params.cosmx_custom_param != null) {
        config.kwargs = [custom_param: params.cosmx_custom_param]
    }
    // ... other technologies
    
    return config
}

def buildSegmentationConfig(params) {
    def config = [:]
    
    if (params.cellpose_diameter != null) {
        config.cellpose = [
            diameter: params.cellpose_diameter,
            channels: params.cellpose_channels,
            flow_threshold: params.cellpose_flow_threshold,
            cellprob_threshold: params.cellpose_cellprob_threshold,
            min_area: params.cellpose_min_area
        ]
    }
    
    if (params.baysor_scale != null) {
        config.baysor = [
            config: [
                segmentation: [
                    scale: params.baysor_scale,
                    min_molecules_per_cell: params.baysor_min_molecules
                ]
            ]
        ]
    }
    
    return config
}

// Your existing ArgsCLI stays here too
def ArgsCLI(Map params, String contains = null, List keys = null) {
    // ... your existing implementation
}

Yes, this adds conditional logic for the 10 technologies, but it's centralized in one place. Everywhere you currently use params.read, you'd use buildReadConfig(params) instead.

2. Merging runtime values / Where does ArgsCLI go?

ext.args goes in conf/modules.config, NOT in your profile configs. Your profiles just set flat params:

// conf/predefined/cosmx_cellpose.config (PROFILE - just sets flat params)
params {
  technology = 'cosmx'
  cellpose_diameter = 60
  cellpose_channels = ['DNA']
  cellpose_flow_threshold = 2
  cellpose_cellprob_threshold = -6
  cellpose_min_area = 2000
}

// conf/modules.config (where ext.args and your ArgsCLI logic goes)
process {
    withName: 'TO_SPATIALDATA' {
        ext.args = { 
            def readConfig = buildReadConfig(params)
            // Merge runtime values from meta here
            if (meta.file_path) {
                readConfig.kwargs = readConfig.kwargs ?: [:]
                readConfig.kwargs.file_path = meta.file_path
            }
            ArgsCLI([read: readConfig])
        }
    }
    
    withName: 'SEGMENTATION_CELLPOSE' {
        ext.args = {
            def segConfig = buildSegmentationConfig(params)
            ArgsCLI([segmentation: segConfig])
        }
    }
}

Your ArgsCLI function (which handles the Typer boolean conversion like prior: false → --no-prior) stays in your utility subworkflow and can be called from modules.config.
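For illustration, the Typer-style boolean handling could be sketched as a small helper living alongside the adapters above (a hypothetical sketch, not the real ArgsCLI implementation; the function name is made up):

```groovy
// Sketch of a Typer-style flag builder (hypothetical; the real ArgsCLI may differ).
// Booleans become --flag / --no-flag, everything else becomes --key value.
def typerArgs(Map opts) {
    opts.collect { k, v ->
        def flag = k.toString().replace('_', '-')
        v instanceof Boolean ? (v ? "--${flag}" : "--no-${flag}") : "--${flag} ${v}"
    }.join(' ')
}

// typerArgs([prior: false, scale: 6.25]) would yield "--no-prior --scale 6.25"
```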

The refactoring needed:

  • Flatten the profiles
  • Flatten nextflow_schema.json
  • Add adapter functions like buildReadConfig() and buildSegmentationConfig()
  • Update workflow code to use the adapters instead of accessing params.read directly

The architecture is: profiles set flat params → adapters rebuild nested structures → ArgsCLI converts to CLI args. The nested parameters in the schema break in Nextflow 26.04 and need to be flattened.

@pinin4fjords
Member

Note: I'm on leave after today, so I'm handing this back to other community members for further discussion.

@quentinblampey
Collaborator

1.
Okay, I see. Basically, this approach is equivalent to (i) flattening every parameter and then (ii) reconstructing the params as a nested map? It feels counterintuitive to flatten a map only to reconstruct the original one. We essentially return to my original idea, but with a lot of extra logic to perform the flattening and the "re-nesting".

2.
But, in that case, it means we can't update some kwargs from a profile? We wanted the possibility to add some kwargs depending on the profile, but if we use ext.args only in the modules, how can we be profile-specific? Sorry for the naive question, I'm not sure I follow...

Anyway, even if we did that, I'm not sure how it would work, because (as far as I understand) I can't import ArgsCLI in my config files, can I? I tried and got:

Unexpected input: 'from'

Or maybe it's a more recent Nextflow feature?

@quentinblampey
Collaborator

Hi,
Just a follow-up on the discussion above: does anyone have an idea of how to solve the flattening issue (see my last comment/questions)?

@quentinblampey
Collaborator

I think I have found solutions for a few small blockers on my side. The last remaining question (after which I'll be able to start flattening everything): regarding part 2 of this comment, can you confirm that we can't import this buildSegmentationConfig function inside conf/modules.config? If so, where should we move this logic?

@quentinblampey
Collaborator

quentinblampey commented Dec 12, 2025

I started doing the flattening and ran into new challenges. For one specific annotation method (more specifically, the fluorescence method), we need a map whose keys are not always the same.

How could I flatten this, knowing that we never have the same panel of channel/protein names?

params {
    ...

    annotation = [
        method: "fluorescence",
        args: [
            marker_cell_dict: [
                CK: "Tumoral cell", // see CK here, or CD3, CD20, etc...
                CD3: "T cell",
                CD20: "B cell",
            ]
        ],
    ]
}
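One possible workaround, stated here as an assumption rather than an established nf-core pattern: keep the open-ended mapping as a single JSON-string parameter (flat, so schema-compliant) and parse it where needed with Groovy's built-in JsonSlurper. The process name and CLI flags below are hypothetical:

```groovy
// Sketch (hypothetical names): the open-ended mapping travels as one flat string param, e.g.
//   nextflow run ... --annotation_marker_cell_dict '{"CK": "Tumoral cell", "CD3": "T cell"}'
process {
    withName: 'ANNOTATION' {
        ext.args = {
            // Parse the user-supplied JSON (failing early if malformed) and pass it on to the CLI.
            def markers = new groovy.json.JsonSlurper().parseText(params.annotation_marker_cell_dict ?: '{}')
            "--method ${params.annotation_method} --marker-cell-dict '${groovy.json.JsonOutput.toJson(markers)}'"
        }
    }
}
```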

Another question: in nextflow_schema.json, we define groups of parameters, but can we access these groups later on? For instance, I made a group named Explorer, and I would like to extract all keys of this group at runtime, if possible. It's not crucial, but it would avoid having to list every single parameter that needs to be "nested back into a map" before sending it to the CLI.
