-
Notifications
You must be signed in to change notification settings - Fork 33
Gecco convert #506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gecco convert #506
Conversation
Release PR for 3.0.0
jfy133
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good first PR @SkyLexS , well done
The code itself looks good, however I've left comments to improve a couple of places.
You will also need to update:
CHANGELOG.md: describe you changes for the next release's announcementoutput.md: to describe the differnet output files we should see in thegecco/results directory (presumable)README.md: to add yourself as a contributor
nextflow_schema.json
Outdated
| "type": "boolean", | ||
| "default": false, | ||
| "description": "Enable conversion of contig FASTA files to GECCO-compatible format.", | ||
| "help_text": "GECCO requires contig FASTA files to be in a specific format. Enabling this option will convert the input FASTA files to the required format before running GECCO.\n\n> Modifies tool parameter(s):\n> - gecco convert: `--enable`" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I misunderstand #450 , this phrasing is confusing to me.
I thought (and the code says) you run GECCO_CONVERT after GECCO_RUN to convert the files to other formats for downstream processes, not to convert input files into one that GECCO accepts.
Could you please reprhase or re-explain to me the purposes of the GECCO_CONVERT module (I'm not that familiar1)
nextflow_schema.json
Outdated
| "type": "string", | ||
| "default": "clusters", | ||
| "description": "Mode for converting contig FASTA files for GECCO.", | ||
| "help_text": "Select the mode for converting contig FASTA files. 'default' uses standard conversion settings, while 'custom' allows for user-defined parameters.\n\n> Modifies tool parameter(s):\n> - gecco convert: `--mode`" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does custom mean here though - where does a user specify the additional paramaeters?
|
Warning Newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.4.1. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation. |
jfy133
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking very close to being done @SkyLexS !
May seem like lots of comments but it's mostly replacing the parameter names to make sure they follow the same structure as the other pipeline parameters :)
docs/output.md
Outdated
| [deepBGC](https://github.com/Merck/deepbgc) detects BGCs in bacterial and fungal genomes using deep learning. DeepBGC employs a Bidirectional Long Short-Term Memory Recurrent Neural Network and a word2vec-like vector embedding of Pfam protein domains. Product class and activity of detected BGCs is predicted using a Random Forest classifier. | ||
|
|
||
| #### GECCO | ||
| #### GECCO & GECCO CONVERT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nextflow.config
Outdated
| run_gecco_convert = false | ||
| gecco_convert_mode = 'clusters' | ||
| gecco_convert_format = 'gff' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| run_gecco_convert = false | |
| gecco_convert_mode = 'clusters' | |
| gecco_convert_format = 'gff' | |
| bgc_gecco_runconvert = false | |
| bgc_gecco_convertmode = 'clusters' | |
| bgc_gecco_convertformat = 'gff' |
For consistency with the rest of the parameter system.
Make sure to update the parameters throughout the code base too
docs/output.md
Outdated
| - `*.region*.gbk`: Converted and aliased GenBank files so that they can be loaded by BiG-SLiCE | ||
| - `*.faa`: Amino-acid FASTA converted GenBank files of all the proteins in a cluster | ||
| - `*.fna`:Nucleotide sequence FASTA converted GenBank files from the cluster | ||
| **ONLY IF --run_gecco_convert** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| **ONLY IF --run_gecco_convert** | |
| - **GECCO CONVERT** | |
| - `*.gff`: GFF3 converted cluster tables containing the position and metadata for all the predicted clusters (only if `--bgc_gecco_runconvert`) | |
| - `*.region*.gbk`: Converted and aliased GenBank files so that they can be loaded by BiG-SLiCE (only if `--bgc_gecco_runconvert`) | |
| - `*.faa`: Amino-acid FASTA converted GenBank files of all the proteins in a cluster (only if `--bgc_gecco_runconvert`) | |
| - `*.fna`:Nucleotide sequence FASTA converted GenBank files from the cluster (only if `--bgc_gecco_runconvert`) |
For the change of parameter name, see comment below.
docs/output.md
Outdated
|
|
||
| [GECCO CONVERT] (https://gecco.embl.de) is an option in gecco which does file conversion into formats like GFF3, GenBank, or FASTA for further analysis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| [GECCO CONVERT] (https://gecco.embl.de) is an option in gecco which does file conversion into formats like GFF3, GenBank, or FASTA for further analysis. | |
| The additional GFF3, GenBank, or FASTA files from `--bgc_gecco_runconvert`, can be useful for additional further analysis of the BGC hits. |
CHANGELOG.md
Outdated
| ### `Added` | ||
|
|
||
| - [#500](https://github.com/nf-core/funcscan/pull/500) Updated pipeline template to nf-core/tools version 3.4.1 (by @jfy133) | ||
| - [#506](https://github.com/nf-core/funcscan/pull/506) Gecco convert (by @SkyLexS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nextflow_schema.json
Outdated
| "description": "Enable GECCO file conversion.", | ||
| "help_text": "Converts GECCO output into formats like GFF3, GenBank, or FASTA for further analysis. Modifies tool parameter - gecco convert: `--run_gecco_convert true`" | ||
| }, | ||
| "gecco_convert_mode": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "gecco_convert_mode": { | |
| "bgc_gecco_convertmode": { |
nextflow_schema.json
Outdated
| "description": "Select conversion mode.", | ||
| "help_text": "Either clusters or gbk folder output, depending on what is reformatted. Modifies tool parameter: - gecco convert: `--gecco_convert_mode `" | ||
| }, | ||
| "gecco_convert_format": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "gecco_convert_format": { | |
| "bgc_gecco_convertformat": { |
nextflow_schema.json
Outdated
| "gecco_convert_mode": { | ||
| "type": "string", | ||
| "default": "clusters", | ||
| "description": "Select conversion mode.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "description": "Select conversion mode.", | |
| "description": "Specify conversion mode for GECCO convert.", |
nextflow_schema.json
Outdated
| "gecco_convert_format": { | ||
| "type": "string", | ||
| "default": "gff", | ||
| "description": "Set output format.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "description": "Set output format.", | |
| "description": "Specify output format for GECCO convert.", |
nextflow_schema.json
Outdated
| "type": "string", | ||
| "default": "gff", | ||
| "description": "Set output format.", | ||
| "help_text": "Choose output format: 'gff', 'fasta', or 'genbank'. Modifies tool parameter: - gecco convert: `--gecco_convert_format`" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous comment from the first review has not been addressed yet #506 (comment)
If you need more guidance just ask on Slack :)
|
@SkyLexS what's the status on this, it it ready for re-review? I see one of the commetns on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the last thing missing now is including gecco convert in one of the test profiles, and have the corresponding tests/* nf-test and snapshot files updated accordingly!
And then I think we are ready!
nextflow_schema.json
Outdated
| "fa_icon": "fas fa-angle-double-right" | ||
| }, | ||
|
|
||
| "bgc_gecco_convert": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed this in the first review, these three should go within the exsiting bgc_gecco section, not in it's own
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @SkyLexS ! This is looking very good now, however apparently my crappy couple of months work wise meant I've not been doing a very good at doing comprehensive reviews - which I feel very guilty about. Sorry about this.
So you don't have to go through another rigamole of reviews, I'm just going to make the changes directly myself so you can see what else was missing (mostly related to the tests).
So what you've done so far is high quality I just forgot a few extra things mostly nf-core specific that also need to be added (but this isn't your fault).
Self notes:
Missing explicit capture of output files in Didn't needmodules.config
In addition, while you have updated the test_*.configs, you've not yet updated the nf-test assertions (in tests/*.nf.test) and then run nf-test test --tag <test_name> --profile +docker --update-snapshots to update snapshots themselves
…k of invalid combinations
…gecco_convert_dis
jfy133
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in my previous few commits
test_*config: I had to change the samplesheets used in the BGC tests to actually produce sufficient BGC hits for GECCO convert to actually runutils_nfcore...: I added additional input validation checks so a gecco convert format cannot be requested with the wrong modebgc.nf: Added the versions mixing*.nf.test*.snapAdded new assertions to the test files and updated their snapshots accordinglynextflow_schema.json: improved the descriptions, and used the correct enums
Please review my changes @SkyLexS and @amizeranschi , and if you're happy I think we are NOW ready to merge 🎉 :D
|
Ok, I think I will need to update the bakta module. It was working locally at least and given it's failing on database downloading is not caused by the changes in this PR, so will merge. Thanks for all the hard work @SkyLexS /@Gbs08407 /@amizeranschi ! |
PR checklist
nf-core pipelines lint).nextflow run . -profile test,docker --outdir <OUTDIR>).nextflow run . -profile debug,test,docker --outdir <OUTDIR>).docs/usage.mdis updated.docs/output.mdis updated.CHANGELOG.mdis updated.README.mdis updated (including new tool citations and authors/contributors).