Skip to content

Conversation

@SkyLexS
Copy link

@SkyLexS SkyLexS commented Nov 11, 2025

PR checklist

  • Integration of the gecco_convert module into the funcscan pipeline as requested in Enable GECCO optional features for integration with other tools #450
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/funcscan branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@SkyLexS SkyLexS self-assigned this Nov 11, 2025
@SkyLexS SkyLexS requested a review from jfy133 as a code owner November 11, 2025 12:47
@SkyLexS SkyLexS added the enhancement Improvement for existing functionality label Nov 11, 2025
@SkyLexS SkyLexS added the first-timers-only Good for newcomers label Nov 11, 2025
Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise
Very good first PR @SkyLexS , well done

The code itself looks good, however I've left comments to improve a couple of places.

You will also need to update:

  • CHANGELOG.md: describe you changes for the next release's announcement
  • output.md : to describe the differnet output files we should see in the gecco/ results directory (presumable)
  • README.md: to add yourself as a contributor

"type": "boolean",
"default": false,
"description": "Enable conversion of contig FASTA files to GECCO-compatible format.",
"help_text": "GECCO requires contig FASTA files to be in a specific format. Enabling this option will convert the input FASTA files to the required format before running GECCO.\n\n> Modifies tool parameter(s):\n> - gecco convert: `--enable`"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion(blocking)
Unless I misunderstand #450 , this phrasing is confusing to me.

I thought (and the code says) you run GECCO_CONVERT after GECCO_RUN to convert the files to other formats for downstream processes, not to convert input files into one that GECCO accepts.

Could you please reprhase or re-explain to me the purposes of the GECCO_CONVERT module (I'm not that familiar1)

"type": "string",
"default": "clusters",
"description": "Mode for converting contig FASTA files for GECCO.",
"help_text": "Select the mode for converting contig FASTA files. 'default' uses standard conversion settings, while 'custom' allows for user-defined parameters.\n\n> Modifies tool parameter(s):\n> - gecco convert: `--mode`"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion(if-minor)

If there are different options, please use the pipe-separated list to indicate this (in the nf-core pipelines schema build interface, if you press the gear next to the parameter entry you should see the box to enter it in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does custom mean here though - where does a user specify the additional paramaeters?

@nf-core-bot
Copy link
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.4.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise
Looking very close to being done @SkyLexS !

May seem like lots of comments but it's mostly replacing the parameter names to make sure they follow the same structure as the other pipeline parameters :)

docs/output.md Outdated
[deepBGC](https://github.com/Merck/deepbgc) detects BGCs in bacterial and fungal genomes using deep learning. DeepBGC employs a Bidirectional Long Short-Term Memory Recurrent Neural Network and a word2vec-like vector embedding of Pfam protein domains. Product class and activity of detected BGCs is predicted using a Random Forest classifier.

#### GECCO
#### GECCO & GECCO CONVERT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion

Suggested change
#### GECCO & GECCO CONVERT
#### GECCO

They are from the same package so should be fine as just GECCO

nextflow.config Outdated
Comment on lines 253 to 255
run_gecco_convert = false
gecco_convert_mode = 'clusters'
gecco_convert_format = 'gff'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo

Suggested change
run_gecco_convert = false
gecco_convert_mode = 'clusters'
gecco_convert_format = 'gff'
bgc_gecco_runconvert = false
bgc_gecco_convertmode = 'clusters'
bgc_gecco_convertformat = 'gff'

For consistency with the rest of the parameter system.

Make sure to update the parameters throughout the code base too

docs/output.md Outdated
- `*.region*.gbk`: Converted and aliased GenBank files so that they can be loaded by BiG-SLiCE
- `*.faa`: Amino-acid FASTA converted GenBank files of all the proteins in a cluster
- `*.fna`:Nucleotide sequence FASTA converted GenBank files from the cluster
**ONLY IF --run_gecco_convert**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion

Suggested change
**ONLY IF --run_gecco_convert**
- **GECCO CONVERT**
- `*.gff`: GFF3 converted cluster tables containing the position and metadata for all the predicted clusters (only if `--bgc_gecco_runconvert`)
- `*.region*.gbk`: Converted and aliased GenBank files so that they can be loaded by BiG-SLiCE (only if `--bgc_gecco_runconvert`)
- `*.faa`: Amino-acid FASTA converted GenBank files of all the proteins in a cluster (only if `--bgc_gecco_runconvert`)
- `*.fna`:Nucleotide sequence FASTA converted GenBank files from the cluster (only if `--bgc_gecco_runconvert`)

For the change of parameter name, see comment below.

docs/output.md Outdated
Comment on lines 475 to 476

[GECCO CONVERT] (https://gecco.embl.de) is an option in gecco which does file conversion into formats like GFF3, GenBank, or FASTA for further analysis.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[GECCO CONVERT] (https://gecco.embl.de) is an option in gecco which does file conversion into formats like GFF3, GenBank, or FASTA for further analysis.
The additional GFF3, GenBank, or FASTA files from `--bgc_gecco_runconvert`, can be useful for additional further analysis of the BGC hits.

CHANGELOG.md Outdated
### `Added`

- [#500](https://github.com/nf-core/funcscan/pull/500) Updated pipeline template to nf-core/tools version 3.4.1 (by @jfy133)
- [#506](https://github.com/nf-core/funcscan/pull/506) Gecco convert (by @SkyLexS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick

Suggested change
- [#506](https://github.com/nf-core/funcscan/pull/506) Gecco convert (by @SkyLexS)
- [#506](https://github.com/nf-core/funcscan/pull/506) Added support GECCO convert for generation of additional files useful for downstream analysis (by @SkyLexS)

"description": "Enable GECCO file conversion.",
"help_text": "Converts GECCO output into formats like GFF3, GenBank, or FASTA for further analysis. Modifies tool parameter - gecco convert: `--run_gecco_convert true`"
},
"gecco_convert_mode": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"gecco_convert_mode": {
"bgc_gecco_convertmode": {

"description": "Select conversion mode.",
"help_text": "Either clusters or gbk folder output, depending on what is reformatted. Modifies tool parameter: - gecco convert: `--gecco_convert_mode `"
},
"gecco_convert_format": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"gecco_convert_format": {
"bgc_gecco_convertformat": {

"gecco_convert_mode": {
"type": "string",
"default": "clusters",
"description": "Select conversion mode.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"description": "Select conversion mode.",
"description": "Specify conversion mode for GECCO convert.",

"gecco_convert_format": {
"type": "string",
"default": "gff",
"description": "Set output format.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"description": "Set output format.",
"description": "Specify output format for GECCO convert.",

"type": "string",
"default": "gff",
"description": "Set output format.",
"help_text": "Choose output format: 'gff', 'fasta', or 'genbank'. Modifies tool parameter: - gecco convert: `--gecco_convert_format`"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My previous comment from the first review has not been addressed yet #506 (comment)

If you need more guidance just ask on Slack :)

@jfy133
Copy link
Member

jfy133 commented Dec 1, 2025

@SkyLexS what's the status on this, it it ready for re-review? I see one of the commetns on output.md wasn't addressed yet

@SkyLexS SkyLexS requested a review from jfy133 December 1, 2025 18:06
Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the last thing missing now is including gecco convert in one of the test profiles, and have the corresponding tests/* nf-test and snapshot files updated accordingly!

And then I think we are ready!

"fa_icon": "fas fa-angle-double-right"
},

"bgc_gecco_convert": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this in the first review, these three should go within the exsiting bgc_gecco section, not in it's own

@amizeranschi amizeranschi requested a review from jfy133 December 8, 2025 17:43
@jfy133 jfy133 linked an issue Dec 9, 2025 that may be closed by this pull request
Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SkyLexS ! This is looking very good now, however apparently my crappy couple of months work wise meant I've not been doing a very good at doing comprehensive reviews - which I feel very guilty about. Sorry about this.

So you don't have to go through another rigamole of reviews, I'm just going to make the changes directly myself so you can see what else was missing (mostly related to the tests).

So what you've done so far is high quality I just forgot a few extra things mostly nf-core specific that also need to be added (but this isn't your fault).

Self notes:

Missing explicit capture of output files in modules.config Didn't need

In addition, while you have updated the test_*.configs, you've not yet updated the nf-test assertions (in tests/*.nf.test) and then run nf-test test --tag <test_name> --profile +docker --update-snapshots to update snapshots themselves

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in my previous few commits

  • test_*config: I had to change the samplesheets used in the BGC tests to actually produce sufficient BGC hits for GECCO convert to actually run
  • utils_nfcore...: I added additional input validation checks so a gecco convert format cannot be requested with the wrong mode
  • bgc.nf: Added the versions mixing
  • *.nf.test *.snap Added new assertions to the test files and updated their snapshots accordingly
  • nextflow_schema.json: improved the descriptions, and used the correct enums

Please review my changes @SkyLexS and @amizeranschi , and if you're happy I think we are NOW ready to merge 🎉 :D

@jfy133
Copy link
Member

jfy133 commented Dec 21, 2025

Ok, I think I will need to update the bakta module. It was working locally at least and given it's failing on database downloading is not caused by the changes in this PR, so will merge.

Thanks for all the hard work @SkyLexS /@Gbs08407 /@amizeranschi !

@jfy133 jfy133 merged commit b0c0f23 into nf-core:dev Dec 21, 2025
121 of 135 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Improvement for existing functionality first-timers-only Good for newcomers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable GECCO optional features for integration with other tools

5 participants