-
Notifications
You must be signed in to change notification settings - Fork 1
[Do not merge!] Pseudo PR for first release #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: TEMPLATE
Are you sure you want to change the base?
Conversation
|
And I had another longer discussion with other core members about nested params. I see the usefulness in your setup, but given that they are not an officially supported feature of nextflow and currently will stop working with the strict syntax in nextflow 26.04 I think you need to rewrite them to a flat list and organize them via parameter groups. |
|
What do you mean by "organize them via parameter groups" @mashehu? Another important feature of this nested params is that we can pass kwargs to every reader. Since we support about 10 different technology, it is convenient to be able to pass any technology-specific keyword argument if needed. It shouldn't always be used, but in certain scenarios, it can be useful. If I want to flatten this, I'll then need to (i) write every kwarg in the schema (potentially 30+ additional params) and (ii) make sure I update the schema and the pipeline every time there is a new kwarg in the |
|
with "parameter groups" I meant how we group them for documentation (not for any pipeline logic) as top-level objects in Your kwargs sound like what we use |
Okay I see
Yes I guess it could work, but it also adds some complexity for both the user (kwargs would live in another place than the other args) and the developers (I'll need to merge the |
|
not sure what you mean with "'ll need to merge the ext.args with the params before formatting them to pass them to the Sopa CLI". we usually append a |
|
Yes, sorry I was talking about the way I provide kwargs to the Sopa command line. It's really specific to Sopa, and not to nextflow. |
|
you mean how you do it with your |
|
Yes exactly, this |
|
you can do this in a config file instead, e.g.: |
Use lists instead to form args. It's more maintainable. |
|
This would indeed work, but to become exhaustive, it becomes more complex:
More generally, Sopa itself is already heavily tested, and it depends on SpatialData, which is also heavily tested. By directly providing the parameters, we kind of "trust" the underlying methods, and add on top additional nf-core tests (currently, 5 test configs). If we flatten everything, we introduce potential sources of error. Given the complexity of flattening everything, I'm fairly certain it will create many errors and increase maintenance complexity. Honestly, I would really appreciate if we could find a solution without flattening everything, else I don't know how I would be able to maintain all this 😊 |
|
@quentinblampey - I understand your concerns, but the nested parameter approach isn't compatible with Nextflow 26.04's strict syntax enforcement. The standard nf-core pattern (prefixed flat parameters + ext.args) actually addresses your points: Re: 100+ parameters and maintenance: You don't need to expose every parameter in the schema. Expose only the common ones with prefixed params, then add extra_baysor_args, extra_cellpose_args, etc. for power users. See nf-core/rnaseq's extra_*_args pattern. Or, power users can also override ext.args entirely using withName selectors in their own configs. When tools update, users can immediately use new parameters without you changing anything: Re: parameter name conflicts: Prefixes solve this. baysor_scale, cellpose_scale, stardist_scale. See differentialabundance's prefixed parameters (deseq2_, limma_, gsea_*). Re: delegating validation to Sopa: You can still do this. Build JSON in ext.args and pass it through, appending extra_*_args to preserve the pattern: Your Sopa library still validates everything, you get clean error messages, and the schema stays flat and compliant. The extra_*_args pattern means you're not locked into maintaining every possible parameter. |
|
Hi @pinin4fjords, thanks for your answers and for your understanding, I appreciate! I have a few questions regarding this: Mix of static and dynamic kwargsFor the Therefore, I could have the following nested params: To flatten them, a user would need to write the complex line below. Question 1: is there a possibility to avoid this complex quote usage (mix of Then, in the kwargs: [
imread_kwargs: [
page: 0,
],
]to then add some sample-specific values: kwargs = [
fullres_image_file: "/path/to/image",
dataset_id: "DATASET_ID",
imread_kwargs: [
page: 0,
],
]and then convert it back to the following string given another conversion function: '''--kwargs "{fullres_image_file: '/path/to/image', dataset_id: 'DATASET_ID', 'imread_kwargs': {'page': 0}"'''Question 2: I think this is not very clean, and I'm unsure how to write such a conversion function in a very robust manner. How could I avoid that with the flattened approach? I could also try to directly update the string itself instead of doing a double-conversion, but I think it would also be relatively complex to have a robust function to handle that. General concern regarding compatibilityUsers can run Sopa via:
Since there are many different ways to run Sopa, I try to make everything as compatible as possible, so that it feels very natural to move from one usage to another. This is one of the promises of Sopa: you can switch easily from one segmentation tool to another, from one technology to another, and from one usage-mode/platform to another. The nested approach allowed to keep the same configuration files in both snakemake and nextflow. By moving to the flattened approach, we break the compatibility promise. Question 3: How to ensure easy adoption of users who want to move to |
|
Re: your three questions: 1. Complex quote usage (your nextflow run nf-core/sopa --visium_hd_imread_page 5 --technology visium_hdThe pipeline handles the JSON complexity internally via withName: 'TO_SPATIALDATA' {
ext.args = {
def kwargs = [imread_kwargs: [page: params.visium_hd_imread_page ?: 0]]
"--technology ${params.technology} --kwargs '${groovy.json.JsonOutput.toJson(kwargs)}'"
}
}For edge cases where users need advanced configuration beyond what you expose, they can override 2. Merging runtime values: Your withName: 'TO_SPATIALDATA' {
ext.args = {
def kwargs = [
imread_kwargs: [page: params.visium_hd_imread_page ?: 0],
file_path: meta.file_path
]
"--technology ${params.technology} --kwargs '${groovy.json.JsonOutput.toJson(kwargs)}'"
}
}3. Cross-platform compatibility: Maintaining Snakemake/Nextflow config compatibility isn't something nf-core supports. Each workflow manager has its own conventions. The nested parameters break in Nextflow 26.04 regardless. You could provide a config converter or clear documentation for users migrating between platforms, but the Nextflow pipeline needs to follow Nextflow/nf-core patterns. |
|
Regarding your answers: 1. Complex quote usage 2. Merging runtime values 3. Cross-platform compatibility NB: sorry for being annoying, I also understand your concerns, I'm just trying to find a solution that is convenient for everyone. |
|
@quentinblampey - I think I understand your concerns better now. Let me address each point with some potential solutions (recognising that there are many ways to skin this particular cat): 1. Complex quote usage / Technology-specific parameters You're concerned about the conditional logic needed when // subworkflows/local/utils_nfcore_sopa_pipeline/main.nf (or similar)
def buildReadConfig(params) {
def config = [technology: params.technology]
// Technology-specific kwargs - all the conditional logic lives here
if (params.technology == 'visium_hd' && params.visium_hd_imread_page != null) {
config.kwargs = [imread_kwargs: [page: params.visium_hd_imread_page]]
}
else if (params.technology == 'cosmx' && params.cosmx_custom_param != null) {
config.kwargs = [custom_param: params.cosmx_custom_param]
}
// ... other technologies
return config
}
def buildSegmentationConfig(params) {
def config = [:]
if (params.cellpose_diameter != null) {
config.cellpose = [
diameter: params.cellpose_diameter,
channels: params.cellpose_channels,
flow_threshold: params.cellpose_flow_threshold,
cellprob_threshold: params.cellpose_cellprob_threshold,
min_area: params.cellpose_min_area
]
}
if (params.baysor_scale != null) {
config.baysor = [
config: [
segmentation: [
scale: params.baysor_scale,
min_molecules_per_cell: params.baysor_min_molecules
]
]
]
}
return config
}
// Your existing ArgsCLI stays here too
def ArgsCLI(Map params, String contains = null, List keys = null) {
// ... your existing implementation
}Yes, this adds conditional logic for the 10 technologies, but it's centralized in one place. Everywhere you currently use 2. Merging runtime values / Where does ArgsCLI go?
// conf/predefined/cosmx_cellpose.config (PROFILE - just sets flat params)
params {
technology = 'cosmx'
cellpose_diameter = 60
cellpose_channels = ['DNA']
cellpose_flow_threshold = 2
cellpose_cellprob_threshold = -6
cellpose_min_area = 2000
}
// conf/modules.config (where ext.args and your ArgsCLI logic goes)
process {
withName: 'TO_SPATIALDATA' {
ext.args = {
def readConfig = buildReadConfig(params)
// Merge runtime values from meta here
if (meta.file_path) {
readConfig.kwargs = readConfig.kwargs ?: [:]
readConfig.kwargs.file_path = meta.file_path
}
ArgsCLI([read: readConfig])
}
}
withName: 'SEGMENTATION_CELLPOSE' {
ext.args = {
def segConfig = buildSegmentationConfig(params)
ArgsCLI([segmentation: segConfig])
}
}
}Your The refactoring needed:
The architecture is: profiles set flat params → adapters rebuild nested structures → ArgsCLI converts to CLI args. The nested parameters in the schema break in Nextflow 26.04 and need to be flattened. |
|
Note: I'm on leave after today, so I'm handing this back to other community members for further discussion. |
|
1. 2. Anyway, even if we did that, I'm not sure to see how it would work, because (as far as I understand), I can't import Or maybe it's a more recent nextflow feature? |
|
Hi, |
Important! Template update for nf-core/tools v3.5.1
|
I think I have found some solutions for a few small blockers on my side. The last remaining question (and then I'll be able to start flattening everything) is: regarding the part 2 of this comment, do you confirm that we can't import this |
|
I started to do the flattening, and I fell into new challenges. For one specific method of annotation (more specifically, the How could I flatten this, knowing that we never have the same panel of channel/protein names? params {
...
annotation = [
method: "fluorescence",
args: [
marker_cell_dict: [
CK: "Tumoral cell", // see CK here, or CD3, CD20, etc...
CD3: "T cell",
CD20: "B cell",
]
],
]
}Another question: in the nextflow_schema, we define groups of parameters, but can we access these groups later on? For instance, I made a group named |
Do not merge! This is a PR of dev compared to the TEMPLATE branch for whole-pipeline reviewing purposes. Changes should be made to dev and this PR should not be merged! The actual release PR is at