-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wrappers for ppanggolin all and ppanggolin msa #6645
wrappers for ppanggolin all and ppanggolin msa #6645
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful. Added a few comments.
tools/ppanggolin/.shed.yml
Outdated
homepage_url: https://ppanggolin.readthedocs.io/en/latest/ | ||
long_description: | | ||
Depicting microbial species diversity via a Partitioned PanGenome Graph Of Linked Neighbors. | ||
remote_repository_url: https://github.com/labgem/PPanGGOLiN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should point to the IUC repo. Use this one for homepage_url
. The readthedocs URL might be cool to mention in the tool's help section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remote_repository_url: https://github.com/labgem/PPanGGOLiN | |
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/ppanggolin |
Thanks @bernt-matthias , I implemented your suggestions, does it look good to you ? I couldn't get the validator to work for checking the consistency of the input files format, I didn't find a way to loop through multiple datasets or dataset collection and didn't find an example of such a validator. I implemented this check in the section by raising an exception. Best, Thomas |
tools/ppanggolin/ppanggolin_all.xml
Outdated
#set extension_input_files = $file.ext | ||
#else: | ||
#if $file.ext != $extension_input_files: | ||
#raise Exception("All the genome files must be of similar format, either all genbank files or all fasta files.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if raising an exception is a good idea. Have you tried what happens in such a case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tools/ppanggolin/ppanggolin_all.xml
Outdated
#raise Exception("All the genome files must be of similar format, either all genbank files or all fasta files.") | ||
#end if | ||
#end if | ||
#set base_name = str($file.element_identifier).split('.')[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if users have .
in the identifiers. Maybe ".".join(str($file.element_identifier).split('.')[:-1])
Hi @bernt-matthias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay and thanks for the ping. I gave the tool a final look and have a few minor suggestions and requests.
tools/ppanggolin/ppanggolin_all.xml
Outdated
#import os | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#import os | |
tools/ppanggolin/ppanggolin_all.xml
Outdated
mkdir -p "./tmp_ppanggolin/organism_list" && | ||
mkdir -p "./tmp_ppanggolin/ln_input_genomes" && | ||
|
||
echo -n > "./tmp_ppanggolin/organism_list/organism.list" && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
echo -n > "./tmp_ppanggolin/organism_list/organism.list" && | |
touch "./tmp_ppanggolin/organism_list/organism.list" && |
tools/ppanggolin/ppanggolin_all.xml
Outdated
#set extension_input_files = $file.ext | ||
#else: | ||
#if $file.ext != $extension_input_files: | ||
#raise Exception("All the genome files must be of similar format, either all genbank files or all fasta files.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#raise Exception("All the genome files must be of similar format, either all genbank files or all fasta files.") | |
#raise Exception("All the genome files must be of the same datatype, either all genbank files or all fasta files.") |
tools/ppanggolin/ppanggolin_all.xml
Outdated
#set base_name = ".".join(str($file.element_identifier).split('.')[:-1]) | ||
echo -e '${base_name}\t${file}' >> "./tmp_ppanggolin/organism_list/organism.list" && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just assume that the user prepared the collection such that the element_identifiers
are the desired sample names and do not make any changes. On collection creation users can choose to remove extensions.
#set base_name = ".".join(str($file.element_identifier).split('.')[:-1]) | |
echo -e '${base_name}\t${file}' >> "./tmp_ppanggolin/organism_list/organism.list" && | |
echo -e '${file.element_identifier}\t${file}' >> "./tmp_ppanggolin/organism_list/organism.list" && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bernt-matthias , I implemented all your recommendations except this one. This change uncovered a problem with PPanGGOLiN that happens for files that have a space in their name or for compressed files using Galaxy as the suffix " uncompressed" (the first char is a space) is sometimes added when choosing gzip files as input. My solution was to use :
#set base_name = str($file.element_identifier).replace(" ", "_")
echo -e '${base_name}\t${file}' >> "./tmp_ppanggolin/organism_list/organism.list" &&
What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems fine. Maybe document this in the parameter/tool help.
tools/ppanggolin/ppanggolin_all.xml
Outdated
<section name="advanced_pangenome_output_files" title="Advanced pangenome output files" expanded="False"> | ||
<param name="advanced_pangenome_optional_files" type="select" label="Add the following pangenome output files in the Galaxy history" multiple="true" optional="true" display="checkboxes" > | ||
<!-- Basic files --> | ||
<option value="output_gene_presence_absence" selected="true">Gene presence absence</option> | ||
<option value="output_gene_families" selected="true">Gene families</option> | ||
<option value="output_mean_persistent_duplication" selected="true">Mean persistent duplication</option> | ||
<option value="output_matrix" selected="true">Matrix</option> | ||
<!-- plot files --> | ||
<option value="output_Ushaped_plot" selected="true">Ushaped plot</option> | ||
<option value="output_tile_plot" selected="true">Tile plot</option> | ||
<!-- graph files --> | ||
<option value="output_pangenomeGraph_light_gexf" selected="true">PanGenome Graph Light (Gexf)</option> | ||
<option value="output_pangenomeGraph_gexf" selected="true">PanGenome Graph (Gexf)</option> | ||
<option value="output_pangenomeGraph_json" selected="true">PanGenome Graph (Json)</option> | ||
<!-- advanced files --> | ||
<option value="output_summarize_spots" selected="true">Summarize spots</option> | ||
<option value="output_spots" selected="true">Spots</option> | ||
<option value="output_spot_borders" selected="true">Spot borders</option> | ||
<option value="output_border_protein_genes" selected="true">Border protein genes</option> | ||
<option value="output_modules_summary" selected="true">Modules summary</option> | ||
<option value="output_modules_in_genomes" selected="true">Modules in genomes</option> | ||
<option value="output_modules_spots" selected="true">Modules spots</option> | ||
<option value="output_modules_RGP_lists" selected="true">Modules RGP lists</option> | ||
<option value="output_functional_modules" selected="true">Functional modules</option> | ||
</param> | ||
</section> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<section name="advanced_pangenome_output_files" title="Advanced pangenome output files" expanded="False"> | |
<param name="advanced_pangenome_optional_files" type="select" label="Add the following pangenome output files in the Galaxy history" multiple="true" optional="true" display="checkboxes" > | |
<!-- Basic files --> | |
<option value="output_gene_presence_absence" selected="true">Gene presence absence</option> | |
<option value="output_gene_families" selected="true">Gene families</option> | |
<option value="output_mean_persistent_duplication" selected="true">Mean persistent duplication</option> | |
<option value="output_matrix" selected="true">Matrix</option> | |
<!-- plot files --> | |
<option value="output_Ushaped_plot" selected="true">Ushaped plot</option> | |
<option value="output_tile_plot" selected="true">Tile plot</option> | |
<!-- graph files --> | |
<option value="output_pangenomeGraph_light_gexf" selected="true">PanGenome Graph Light (Gexf)</option> | |
<option value="output_pangenomeGraph_gexf" selected="true">PanGenome Graph (Gexf)</option> | |
<option value="output_pangenomeGraph_json" selected="true">PanGenome Graph (Json)</option> | |
<!-- advanced files --> | |
<option value="output_summarize_spots" selected="true">Summarize spots</option> | |
<option value="output_spots" selected="true">Spots</option> | |
<option value="output_spot_borders" selected="true">Spot borders</option> | |
<option value="output_border_protein_genes" selected="true">Border protein genes</option> | |
<option value="output_modules_summary" selected="true">Modules summary</option> | |
<option value="output_modules_in_genomes" selected="true">Modules in genomes</option> | |
<option value="output_modules_spots" selected="true">Modules spots</option> | |
<option value="output_modules_RGP_lists" selected="true">Modules RGP lists</option> | |
<option value="output_functional_modules" selected="true">Functional modules</option> | |
</param> | |
</section> | |
<param name="advanced_pangenome_optional_files" type="select" label="Add the following pangenome output files in the Galaxy history" multiple="true" optional="true" display="checkboxes" > | |
<!-- Basic files --> | |
<option value="output_gene_presence_absence" selected="true">Gene presence absence</option> | |
<option value="output_gene_families" selected="true">Gene families</option> | |
<option value="output_mean_persistent_duplication" selected="true">Mean persistent duplication</option> | |
<option value="output_matrix" selected="true">Matrix</option> | |
<!-- plot files --> | |
<option value="output_Ushaped_plot" selected="true">Ushaped plot</option> | |
<option value="output_tile_plot" selected="true">Tile plot</option> | |
<!-- graph files --> | |
<option value="output_pangenomeGraph_light_gexf" selected="true">PanGenome Graph Light (Gexf)</option> | |
<option value="output_pangenomeGraph_gexf" selected="true">PanGenome Graph (Gexf)</option> | |
<option value="output_pangenomeGraph_json" selected="true">PanGenome Graph (Json)</option> | |
<!-- advanced files --> | |
<option value="output_summarize_spots" selected="true">Summarize spots</option> | |
<option value="output_spots" selected="true">Spots</option> | |
<option value="output_spot_borders" selected="true">Spot borders</option> | |
<option value="output_border_protein_genes" selected="true">Border protein genes</option> | |
<option value="output_modules_summary" selected="true">Modules summary</option> | |
<option value="output_modules_in_genomes" selected="true">Modules in genomes</option> | |
<option value="output_modules_spots" selected="true">Modules spots</option> | |
<option value="output_modules_RGP_lists" selected="true">Modules RGP lists</option> | |
<option value="output_functional_modules" selected="true">Functional modules</option> | |
</param> |
Du you think that the typical user really needs all outputs (all seem to be selected). If yes, fine for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discussed the topic of the many output files with the creator of PPanGGOLiN and he thinks it is better to leave the default output files are they are now.
tools/ppanggolin/ppanggolin_msa.xml
Outdated
<tool id="ppanggolin_msa" name="PPanGGOLiN msa" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.0"> | ||
<description>computes msa of pangenome's gene families</description> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we capitalize MSA in the name and description.
@@ -0,0 +1,11 @@ | |||
name: ppanggolin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you modify the shed.yml file to a suite? Like here: https://github.com/galaxyproject/tools-iuc/blob/main/tools/ampvis2/.shed.yml
Hi @bernt-matthias , |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go?
yes, good to go :) |
FOR CONTRIBUTOR: