Releases: jolespin/veba
VEBA_v1.3.0
Release v1.3.0:
- `VEBA` Modules:
  - Added `profile-pathway.py` module and associated scripts for building `HUMAnN` databases from de novo genomes and annotations. Essentially, a reads-based functional profiling method via `HUMAnN` using binned genomes as the database.
  - Added `marker_gene_clustering.py` script which identifies core marker proteins that are present in all genomes within a genome cluster (i.e., pangenome) and unique to only that genome cluster. Clusters in either protein or nucleotide space.
  - Added `module_completion_ratios.py` script which calculates KEGG module completion ratios for genomes and pangenomes. Automatically run in the backend of `annotate.py`.
  - Updated `annotate.py` and `merge_annotations.py` to provide better annotations for clustered proteins.
  - Added `merge_genome_quality.py` and `merge_taxonomy_classifications.py` which compile genome quality and taxonomy, respectively, for all organisms.
  - Added BGC clustering in protein and nucleotide space to `biosynthetic.py`. Also produces prevalence tables that can be used for further clustering of BGCs.
  - Added `pangenome_core_sequences` in `cluster.py` which writes both protein and CDS sequences for each genome cluster.
  - Added PDF visualization of newick trees in `phylogeny.py`.
- `VEBA` Database (`VDB_v5.2`):
  - Added `CAZy`
  - Added `MicrobeAnnotator-KEGG`
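The core-marker idea described for `marker_gene_clustering.py` (present in every genome of a genome cluster, absent everywhere else) can be illustrated with plain sets. This is only a sketch under assumed inputs, not the script's actual implementation: the `genome_to_cluster` and `genome_to_proteins` mappings and the `find_marker_proteins` function are hypothetical names chosen for illustration.

```python
from collections import defaultdict

def find_marker_proteins(genome_to_cluster, genome_to_proteins):
    """Return {genome_cluster: set of protein clusters} where each protein
    cluster occurs in every genome of the genome cluster and in no genome
    outside it. Both input mappings are hypothetical."""
    # Group genomes by the genome cluster they belong to
    cluster_to_genomes = defaultdict(set)
    for genome, cluster in genome_to_cluster.items():
        cluster_to_genomes[cluster].add(genome)

    markers = {}
    for cluster, genomes in cluster_to_genomes.items():
        # Core: protein clusters shared by all genomes in this genome cluster
        core = set.intersection(*(set(genome_to_proteins[g]) for g in genomes))
        # Everything detected in genomes outside this genome cluster
        outside = set()
        for genome, proteins in genome_to_proteins.items():
            if genome not in genomes:
                outside.update(proteins)
        # Markers: core protein clusters never seen outside the genome cluster
        markers[cluster] = core - outside
    return markers

genome_to_cluster = {"g1": "SLC1", "g2": "SLC1", "g3": "SLC2"}
genome_to_proteins = {
    "g1": {"pc1", "pc2", "pc3"},
    "g2": {"pc1", "pc2"},
    "g3": {"pc2", "pc4"},
}
markers = find_marker_proteins(genome_to_cluster, genome_to_proteins)
print(markers)
```

Here `pc2` is core to both genome clusters and therefore not a marker for either, while `pc3` is unique to `SLC1` but missing from one of its genomes.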
**Release v1.3.0 Details**
- Update `annotate.py` and `merge_annotations.py` to handle `CAZy`. They also properly address clustered protein annotations now.
- Added `module_completion_ratio.py` script which is a fork of `MicrobeAnnotator` `ko_mapper.py`. Also included a database Zenodo: 10020074 which will be included in `VDB_v5.2`
- Added a checkpoint for `tRNAscan-SE` in `binning-prokaryotic.py` and `eukaryotic_gene_modeling_wrapper.py`.
- Added `profile-pathway.py` module and `VEBA-profile_env` environments which is a wrapper around `HUMAnN` for the custom database created from `annotate.py` and `compile_custom_humann_database_from_annotations.py`
- Added `GenoPype version` to log output
- Added `merge_genome_quality.py` which combines `CheckV`, `CheckM2`, and `BUSCO` results.
- Added `compile_custom_humann_database_from_annotations.py` which compiles a `HUMAnN` protein database table from the output of `annotate.py` and taxonomy classifications.
- Added functionality to `merge_taxonomy_classifications.py` to allow for `--no_domain` and `--no_header` which will serve as input to `compile_custom_humann_database_from_annotations.py`
- Added `marker_gene_clustering.py` script which gets core marker genes unique to each SLC (i.e., pangenome). `average_number_of_copies_per_genome` to protein clusters.
- Added `--minimum_core_prevalence` in `global_clustering.py`, `local_clustering.py`, and `cluster.py` which sets the prevalence ratio at which protein clusters in a SLC are considered core. Also removed `--no_singletons` from `cluster.py` to avoid complications with marker genes. Relabeled `--input` to `--genomes_table` in clustering scripts/module.
- Added a check in `coverage.py` to see if the `mapped.sorted.bam` files have already been created and, if so, skip them. Not yet implemented for GNU parallel option.
- Changed default representative sequence format from table to fasta for `mmseqs2_wrapper.py`.
- Added `--nucleotide_fasta_output` to `antismash_genbank_to_table.py` which outputs the actual BGC DNA sequence. Changed `--fasta_output` to `--protein_fasta_output` and added output to `biosynthetic.py`. Changed BGC component identifiers to `[bgc_id]_[position_in_bgc]|[start]:[end]([strand])` to match with `MetaEuk` identifiers. Changed `bgc_type` to `protocluster_type`. `biosynthetic.py` now supports GFF files from `MetaEuk` (exon and gene features not supported by `antiSMASH`). Fixed error related to `antiSMASH` adding CDS (i.e., `allorf_[start]_[end]`) that are not in GFF so `antismash_genbank_to_table.py` failed in those cases.
- Added `ete3` to `VEBA-phylogeny_env.yml` and automatically renders trees to PDF.
- Added presets for `MEGAHIT` using the `--megahit_preset` option.
- The change for using `--mash_db` with `GTDB-Tk` violated the assumption that all prokaryotic classifications had a `msa_percent` field which caused the cluster-level taxonomy to fail. `compile_prokaryotic_genome_cluster_classification_scores_table.py` fixes this by using `fastani_ani` as the weight when genomes were classified using ANI and `msa_percent` for everything else. Initial error caused unclassified prokaryotic for all cluster-level classifications.
- Fixed small error where empty gff files with an asterisk in the name were created for samples that didn't have any prokaryotic MAGs.
- Fixed critical error where descriptions in header were not being removed in `eukaryota.scaffolds.list` and did not remove eukaryotic scaffolds in `seqkit grep` so `DAS_Tool` output eukaryotic MAGs in `identifier_mapping.tsv` and `__DASTool_scaffolds2bin.no_eukaryota.txt`
- Fixed `krona.html` in `biosynthetic.py` which was being created incorrectly from `compile_krona.py` script.
- Create `pangenome_core_sequences` in `global_clustering.py` and `local_clustering.py` which writes both protein and CDS sequences for each SLC. Also made default in `cluster.py` to NOT do local clustering, switching `--no_local_clustering` to `--local_clustering`.
- `pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects` was raised in `biosynthetic.py` when `Diamond` finds multiple matching regions in one hit. Added `--sort_by` and `--ascending` to `concatenate_dataframes.py` along with automatic detection and removal of duplicate indices. Also added `--sort_by bitscore` in `biosynthetic.py`.
- Added core pangenome and singleton hits to clustering output
- Updated `--megahit_memory` default from 0.9 to 0.99
- Fixed error in `genomad_taxonomy_wrapper.py` where `viral_taxonomy.tsv` should have been `taxonomy.tsv`.
- Fixed minor error in `assembly.py`, caused by incorrect string formatting, that was preventing users from using `SPAdes` programs other than `spades.py`, `metaspades.py`, or `rnaspades.py`.
- Updated `bowtie2` in preprocess, assembly, and mapping modules. Updated `fastp` and `fastq_preprocessor` in preprocess module.
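The BGC component identifier layout introduced in this release, `[bgc_id]_[position_in_bgc]|[start]:[end]([strand])`, lends itself to a small parser. The sketch below is an illustration only: `parse_component_id` is a hypothetical helper, the strand is assumed to be `+` or `-`, and the example identifier is adapted from the v1.1.0 example further down rather than taken from real v1.3.0 output.

```python
import re

# Layout quoted in the release notes:
#   [bgc_id]_[position_in_bgc]|[start]:[end]([strand])
# Assumption: strand is a single "+" or "-" character.
COMPONENT_PATTERN = re.compile(
    r"^(?P<bgc_id>.+)_(?P<position>\d+)\|(?P<start>\d+):(?P<end>\d+)\((?P<strand>[+-])\)$"
)

def parse_component_id(component_id):
    """Split a component identifier into its fields (hypothetical helper)."""
    match = COMPONENT_PATTERN.match(component_id)
    if match is None:
        raise ValueError(f"Unrecognized component identifier: {component_id}")
    fields = match.groupdict()
    # Convert the numeric fields from strings to integers
    for key in ("position", "start", "end"):
        fields[key] = int(fields[key])
    return fields

# Hypothetical identifier in the new layout
example = "SRR17458614__CONCOCT__P.2__9|NODE_3319_length_2682_cov_2.840502|region001_1|2:2681(+)"
parsed = parse_component_id(example)
print(parsed["bgc_id"])
print(parsed["position"], parsed["start"], parsed["end"], parsed["strand"])
```

Anchoring the pattern at the end of the string keeps underscores and pipes inside the genome and contig names from being mistaken for field separators.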
VEBA_v1.2.0
Release v1.2.0:
- Fixed minor error in `binning-prokaryotic.py` where the `--veba_database` argument wasn't utilized and only the environment variable `VEBA_DATABASE` could be used.
- Updated the Docker images to have `/volumes/input`, `/volumes/output`, and `/volumes/database` directories to mount.
- Replaced `prodigal` with `pyrodigal` as it is faster and under active development.
- Added support for missing classifications in `compile_krona.py` and `consensus_genome_classification.py`.
- Updated `GTDB-Tk` from version `2.1.3` → `2.3.0` and `GTDB` from version `r202_v2` → `r214`. Changed `${VEBA_DATABASE}/Classify/GTDBTk` → `${VEBA_DATABASE}/Classify/GTDB`. Added `gtdb_r214.msh` to `GTDB` database for ANI screening.
- Added pangenome and singularity tables to `cluster.py` (and associated global/local clustering scripts) to output automatically.
- Added `compile_gff.py` to merge CDS, rRNA, and tRNA GFF files. Used in `binning-prokaryotic.py` and `binning-viral.py`. `binning-eukaryotic.py` uses the source of this in the backend of `filter_busco_results.py`. Includes GC content for contigs and various tags.
- Updated `BUSCO v5.3.2 -> v5.4.3` which changes the json output structure and made the appropriate changes in `filter_busco_results.py`.
- Added `eukaryotic_gene_modeling_wrapper.py` which 1) splits nuclear, mitochondrial, and plastid genomes; 2) performs gene modeling via `MetaEuk` and `Pyrodigal`; 3) performs rRNA detection via `BARRNAP`; 4) performs tRNA detection via `tRNAscan-SE`; 5) merges processed GFF files; and 6) calculates sequence statistics.
- Added `gene_biotype=protein_coding` to `P(y)rodigal(-GV)` GFF output.
- Added `VFDB` to `annotate.py` and database.
- Compiled and pushed `gtdb_r214.msh` mash file to Zenodo:8048187 which is now used by default in `classify-prokaryotic.py`. It is now included in `VDB_v5.1`.
- Cleaned up global and local clustering intermediate files. Added pangenome tables and singleton information to outputs.
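The `--veba_database` fix in this release describes an argument that should take precedence over the `VEBA_DATABASE` environment variable. A minimal sketch of that precedence, assuming a plain `argparse` setup, is shown here; `resolve_veba_database` is a hypothetical function and does not reproduce the module's own argument handling.

```python
import argparse
import os

def resolve_veba_database(argv=None, environ=None):
    """Prefer --veba_database; fall back to the VEBA_DATABASE environment
    variable (hypothetical helper, not the module's own code)."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--veba_database", default=None)
    # parse_known_args ignores any other options a real module would define
    args, _ = parser.parse_known_args(argv)
    database = args.veba_database or environ.get("VEBA_DATABASE")
    if not database:
        raise SystemExit("Provide --veba_database or set VEBA_DATABASE")
    return database

# The explicit argument wins over the environment variable
print(resolve_veba_database(["--veba_database", "/db/from_argument"],
                            {"VEBA_DATABASE": "/db/from_env"}))
# Without the argument, the environment variable is used
print(resolve_veba_database([], {"VEBA_DATABASE": "/db/from_env"}))
```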
VEBA_v1.1.2
Release v1.1.2
- Created Docker images for all modules
- Replaced all absolute path symlinks with relative symlinks
- Changed `prokaryotic_taxonomy.tsv` and `prokaryotic_taxonomy.clusters.tsv` in `classify-prokaryotic.py` (along with eukaryotic and viral) files to `taxonomy.tsv` and `taxonomy.clusters.tsv` for uniformity.
- Updating all symlinks to relative links (also in `fastq_preprocessor`) to prepare for dockerization and updating all environments to use updated GenoPype 2023.4.13.
- Changed `nr` to `uniref` in `annotate.py` and added `propagate_annotations_from_representatives.py` script while simplifying `merge_annotations_and_taxonomy.py` to `merge_annotations.py` and excluding taxonomy operations.
- Changed `nr` to `UniRef90` and `UniRef50` in `VDB_v5`
- Changed `orfs_to_orthogroups.tsv` to `proteins_to_orthogroups.tsv` for consistency with the `cluster.py` module. Will eventually find some consistency with `scaffolds_to_bins/scaffolds_to_mags` but this will be later.
- Added a `scaffolds_to_mags.tsv` in the clustering output.
- Added `convert_counts_table.py` which converts a counts table (and metadata) to Pandas pickle, Anndata h5ad, or Biom hdf5
- Fixed output directory for `mapping.py` which now uses `output_directory/${NAME}` structure like `binning-*.py`.
- Removed "python" prefix for script calls and now uses shebang in script for executable. Also added single quotes around script filepath (e.g., `'[script_filepath]'`) to escape characters/spaces in filepath.
- Added support for `index.py` to accept individual `--references [file.fasta]` and `--gene_models [file.gff]`.
- Added `stdin` support for `scaffolds_to_bins.py` along with the ability to input genome tables [id_genome][filepath]. Also added progress bars.
- As a result of issues/22, `assembly.py`, `assembly-sequential.py`, `binning-*.py`, and `mapping.py` will use `-p --countReadPairs` for `featureCounts` and updates `subread 2.0.1 -> subread 2.0.3`. For `binning-*.py`, long reads can be used with the `--long_reads` flag.
- Updated `cluster.py` and associated `global_clustering.py`/`local_clustering.py` scripts to use `mmseqs2_wrapper.py` which now automatically outputs representative sequences.
- Added `check_fasta_duplicates.py` script that gives `0` and `1` exit codes for fasta without and with duplicates, respectively. Added `reformat_representative_sequences.py` to reformat representative sequences from `MMSEQS2` into either a table or fasta file where the identifiers are cluster labels. Removed `--dbtype` from `[global/local]_clustering.py`. Removed appended prefix for `.graph.pkl` and `dict.pkl` in `edgelist_to_clusters.py`. Added `mmseqs2_wrapper.py` and `hmmer_wrapper.py` scripts.
- Added an option to `merge_generalized_mapping.py` to include the sample index in a filepath and also an option to remove empty features (useful for Salmon). Added an `executable='/bin/bash'` option to the `subprocess.Popen` calls in `GenoPype` to address issues/23.
- Added `genbanks/[id_genome]/` to output directory of `biosynthetic.py` which has symlinks to all the BGC genbanks from `antiSMASH`.
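The exit-code convention described for `check_fasta_duplicates.py` (`0` without duplicates, `1` with duplicates) can be sketched as follows. This is a rough illustration, not the script itself: it assumes "duplicates" means repeated sequence identifiers (the first token after `>`), and `find_duplicate_ids` and `exit_code_for` are hypothetical names.

```python
import io
from collections import Counter

def find_duplicate_ids(handle):
    """Return sorted identifiers that appear more than once in a FASTA stream.
    Assumption: the identifier is the first whitespace-delimited token after '>'."""
    counts = Counter(
        line[1:].split()[0]
        for line in handle
        if line.startswith(">") and line[1:].strip()
    )
    return sorted(identifier for identifier, n in counts.items() if n > 1)

def exit_code_for(handle):
    """0 when no duplicate identifiers are found, 1 otherwise."""
    return 1 if find_duplicate_ids(handle) else 0

unique = io.StringIO(">a desc\nACGT\n>b\nGGCC\n")
duplicated = io.StringIO(">a\nACGT\n>a other description\nTTTT\n")
print(exit_code_for(unique), exit_code_for(duplicated))  # prints "0 1"
```

In a command-line script the returned value would typically be passed to `sys.exit`, so that a shell pipeline can branch on the result.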
VEBA_v1.1.1
Minor updates from v1.1.0.
- Most important update includes fixing a broken `VEBA-binning-viral.yml` install recipe which had package conflicts for aria2 30e8b0a.
- Fixes on conda-related environment variables in the install scripts.
- Added MIBiG to database and `annotate.py`
- Added a composite label for annotations in `annotate.py`
- Added `--dastool_minimum_score` to `binning-prokaryotic.py` module
- Added a wrapper around `STAR` aligner
- Updated `merge_generalized_mapping.py` script to take in BAM files instead of being dependent on a specific directory.
- Added option to have no header in `subst_table.py`
VEBA_v1.1.0
Release v1.1.0
- Modules:
  - `annotate.py`
    - Added `NCBIfam-AMRFinder` AMR domain annotations
    - Added `AntiFam` contamination annotations
    - Uses `taxopy` instead of `ete3` in backend with `merge_annotations_and_score_taxonomy.py`
  - `assembly.py`
    - Added a `transcripts_to_genes.py` script which creates a `genes_to_transcripts.tsv` table that can be used with `TransDecoder`.
  - `binning-prokaryotic.py`
    - Updated `CheckM` → `CheckM2`. This removes the dependency on `GTDB-Tk` and EXTREMELY REDUCES compute resource requirements (e.g., memory and time) as `CheckM2` automatically handles candidate phyla radiation. With this, several backend scripts were deprecated. This cleans up the binning pipeline and error messages SUBSTANTIALLY.
    - Uses `binning_wrapper.py` for all binning. This makes it easier to add new binning algorithms in the future (e.g., `VAMB`). Also, check out the new multi-split binning functionality described below.
    - Added `--skip_concoct` in addition to the already existing `--skip_maxbin2` option as `MaxBin2` takes very long when there are a lot of contigs and `CONCOCT` takes a long time when there are a lot of samples (i.e., BAM files). `MetaBAT2` is not optional.
  - `binning-viral.py`
    - Complete rewrite of this module which now uses `geNomad` as the default binning algorithm but still supports `VirFinder`.
    - If `VirFinder` is used, the `genomad annotate` is run via the `genomad_taxonomy_wrapper.py` script included in the update.
    - Updated `Prodigal` → `Prodigal-GV` to handle additional viral genetic codes.
  - `biosynthetic.py`
    - Introduces `component_id` and `bgc_id` which are unique, parseable, and informative. For example, `component_id = SRR17458614__CONCOCT__P.2__9|NODE_3319_length_2682_cov_2.840502|region001_1|2-2681(+)` contains the unique `bgc_id` (i.e., `SRR17458614__CONCOCT__P.2__9|NODE_3319_length_2682_cov_2.840502|region001`), shows that it is the 1st gene in the cluster (the `_1` in `region001_1`), and the gene start/end/strand. The `bgc_id` is composed of the `genome_id|contig_id|region_id`.
  - `classify-prokaryotic.py`
    - Updated `GTDB-Tk v2.1.1` → `GTDB-Tk v2.2.3`. For now, `--skip_ani_screen` is the only option because of this thread. However, `--mash_db` may be an option in the near future.
    - Added functionality to classify prokaryotic genomes that were not binned via `VEBA` which is available with the `--genomes` option (`--prokaryotic_binning_directory` is still available which can leverage existing intermediate files).
  - `classify-eukaryotic.py`
    - Added functionality to classify eukaryotic genomes that were not binned via `VEBA` which is available with the `--genomes` option (`--eukaryotic_binning_directory` is still available which can leverage existing intermediate files). This is implemented by using the `eukaryota_odb10` markers from the `VEBA Microeukaryotic Database` to substantially improve performance and decrease resources required for gene models.
  - `classify-viral.py`
    - Complete rewrite of this module which does not rely on (deprecated) intermediate files from `CheckV`.
    - Uses taxonomy generated from `geNomad` and `consensus_genome_classification_unranked.py` (a wrapper around `taxopy`) that can handle the chaotic taxonomy of viruses.
    - Added functionality to classify viral genomes that were not binned via `VEBA` which is available with the `--genomes` option (`--viral_binning_directory` is still available which can leverage existing intermediate files).
  - `cluster.py`
    - Complete rewrite of this module which now uses `MMSEQS2` as the orthogroup detection algorithm instead of `OrthoFinder`. `OrthoFinder` is overkill for creating protein clusters and it generates thousands of intermediate files (e.g., fasta, alignments, trees, etc.) which substantially increases the compute time. `MMSEQS2` has very similar performance with a fraction of the resources and compute time. Clustered the entire Plastisphere dataset on a local machine in ~30 minutes compared to several days on a HPC.
    - Now that the resources are minimal, clustering is performed at global level as before (i.e., all samples in the dataset) and now at the local level, optionally but ON by default, which clusters all genomes within a sample. Accompanying wrapper scripts are `global_clustering.py` and `local_clustering.py`.
    - The genomic and functional feature compression ratios (FCR) (described here) are now calculated automatically. The calculation is `1 - number_of_clusters/number_of_features` which can easily be converted into an unsupervised biodiversity metric. This is calculated at the global (original implementation) and local levels.
    - Input is now a table with the following columns: `[organism_type]<tab>[id_sample]<tab>[id_mag]<tab>[genome]<tab>[proteins]` and is generated easily with the `compile_genomes_table.py` script. This allows clustering to be performed for prokaryotes, eukaryotes, and viruses all at the same time.
    - SLC-specific orthogroups (SSO) are now referred to as SLC-specific protein clusters (SSPC).
    - Support zfilling (e.g., `zfill=3, SLC7 → SLC007`) for genomic and protein clusters.
    - Deprecated `fastani_to_clusters.py` to now use the more generalizable `edgelist_to_clusters.py` which is used for both genomic and protein clusters. This also outputs a `NetworkX` graph and a pickled dictionary `{"cluster_a":{"component_1", "component_2", ..., "component_n"}}`
  - `phylogeny.py`
    - Updated `MUSCLE` to `v5` which has `-align` and `-super5` algorithms which are now accessible with `--alignment_algorithm`. Cannot use `stdin` so now the fasta files are not gzipped. The `merge_msa.py` script now outputs uncompressed fasta by default and can output gzipped with the `--gzip` flag.
- `VEBA Database`: `VDB_v3.1` → `VDB_v4`
  - Updated `CheckV DB v1.0` → `CheckV DB v1.5`
  - Added `geNomad DB v1.2`
  - Added `CheckM2 DB`
  - Removed `CheckM DB`
  - Removed `taxa.sqlite` and `taxa.sqlite.traverse.pkl`
  - Added `reference.eukaryota_odb10.list` and corresponding `MMSEQS2` database (i.e., `microeukaryotic.eukaryota_odb10`)
  - Added `NCBIfam-AMRFinder` marker set for annotation
  - Added `AntiFam` marker set for contamination
  - Marker sets HMMs are now all gzipped (previously could not gzip because `CheckM` CPR workflow)
- Scripts:
  - Added:
    - `append_geneid_to_transdecoder_gff.py`
    - `bowtie2_wrapper.py`
    - `compile_genomes_table.py`
    - `consensus_genome_classification_unranked.py`
    - `cut_table.py`
    - `cut_table_by_column_labels.py`
    - `drop_missing_values.py`
    - `edgelist_to_clusters.py`
    - `filter_checkm2_results.py`
    - `genomad_taxonomy_wrapper.py`
    - `global_clustering.py`
    - `local_clustering.py`
    - `partition_multisplit_bins.py`
    - `scaffolds_to_clusters.py`
    - `scaffolds_to_samples.py`
    - `transcripts_to_genes.py`
    - `transdecoder_wrapper.py` (Note: Requires separate environment to run due to dependency conflicts)
  - Updated:
    - `antismash_genbanks_to_table.py` - Added option to output biosynthetic gene cluster (BGC) fasta. Adds unique (and parseable) BGC identifiers making the output much more useful.
    - `binning_wrapper.py` - This binning wrapper now includes functionality to use multi-split binning (i.e., concatenate contigs from different assemblies, map all reads to the contigs, bin all together, and then partition bins by sample). This concept AFAIK was first introduced in the `VAMB` paper.
    - `compile_reads_table.py` - Minimal change but now the extension excludes the `.` to make usage more consistent with other tools.
    - `consensus_genome_classification.py` - Changed the output to match that of `consensus_genome_classification_unranked.py`.
    - `filter_checkv_results.py` - Option to use taxonomy and viral summaries generated by `geNomad`.
    - `scaffolds_to_bins.py` - Support for getting scaffolds to bins for a list of genomes via `--genomes` argument while maintaining original support with `--binning_directory` argument.
    - `subset_table.py` - Added option to set index column and to drop duplicates.
    - `virfinder_wrapper.r` - Used to be `VirFinder_wrapper.R`. This now has an option to use FDR values instead of P values.
    - `merge_annotations_and_score_taxonomy.py` - Completely rewritten. Uses `taxopy` instead of `ete3`.
    - `merge_msa.py` - Output uncompressed protein fasta files by default and can compress with `--gzip` flag.
  - Deprecated:
    - `adjust_genomes_for_cpr.py`
    - `filter_checkm_results.py`
    - `fastani_to_clusters.py`
    - `partition_orthogroups.py`
    - `partition_clusters.py`
    - `compile_viral_classifications.py`
    - `build_taxa_sqlite.py`
- Miscellaneous:
  - Updated environments and now add versions to environments.
  - Added `mamba` to installation to speed up.
  - Added `transdecoder_wrapper.py` which is a wrapper around `TransDecoder` with direct support for `Diamond` and `HMMSearch` homology searches. Also includes `append_geneid_to_transdecoder_gff.py` which is run in the backend to clean up the GFF file and make them compatible with what is output by `Prodigal` and `MetaEuk` runs of `VEBA`.
  - Added support for using `n_jobs -1` to use all available threads (similar to `scikit-learn` methodology).
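The feature compression ratio formula and the zfill convention listed under `cluster.py` in this release are simple enough to show in a few lines. The sketch below only restates `1 - number_of_clusters/number_of_features` and the `SLC7 → SLC007` example; the function names and the `prefix` parameter are hypothetical and are not taken from the module.

```python
def feature_compression_ratio(number_of_clusters, number_of_features):
    """FCR = 1 - number_of_clusters / number_of_features."""
    if number_of_features <= 0:
        raise ValueError("number_of_features must be positive")
    return 1 - number_of_clusters / number_of_features

def zfill_label(label, prefix="SLC", zfill=3):
    """Zero-pad the numeric part of a cluster label (hypothetical helper)."""
    number = label[len(prefix):]
    return prefix + number.zfill(zfill)

# 100 features collapsing into 25 clusters
fcr = feature_compression_ratio(number_of_clusters=25, number_of_features=100)
print(fcr)                  # 0.75
print(zfill_label("SLC7"))  # SLC007
```

A value near 0 means almost every feature forms its own cluster, while a value near 1 means many features collapse into few clusters.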
VEBA_v1.0.4
Release v1.0.4
- Added `biopython` to `VEBA-assembly_env` which is needed when running `MEGAHIT` as the scaffolds are rewritten and an error was raised. aea51c3
- Updated Microeukaryotic protein database to exclude a few higher eukaryotes that were present in database, changed naming scheme to hash identifiers (from `cat reference.faa | seqkit fx2tab -s -n > id_to_hash.tsv`). Switching database from FigShare to Zenodo. Uses database version `VDB_v3` which has the updated microeukaryotic protein database (`VDB-Microeukaryotic_v2`) 0845ba6
VEBA_v1.0.3e
If you have 1.0.3 ≤ version < 1.0.3e, you can update easily on Patch Fix #1
Release v1.0.3e
- Patch fix for `install_veba.sh` where `install/environments/VEBA-assembly_env.yml` raised a compatibility error when creating the `VEBA-assembly_env` environment c2ab957
- Patch fix for `VirFinder_wrapper.R` where `__version__ =` variable was throwing an R error when running `binning-viral.py` module. 19e8f38
- Patch fix for `filter_busco_results.py` where an error arose that produced empty `identifier_mapping.metaeuk.tsv` subset tables. 359e4569
- Patch fix for `compile_metaeuk_identifiers.py` where a Python error arose when duplicate gene identifiers were present. c248527
- Patch fix for `install_veba.sh` where `install/environments/VEBA-preprocess_env.yml` raised a compatibility error when creating the `VEBA-preprocess_env` environment 8ed6eea
- Added `biosynthetic.py` module which runs antiSMASH and converts genbank files to tabular format. 6c0ed82
- Added `megahit` support for `assembly.py` module (not yet available in `assembly-sequential.py`). 6c0ed82
- Changed `-P/--spades_program` to `-P/--program` for `assembly.py`. 6c0ed82
- Replaced penultimate step in `binning-prokaryotic.py` to use `adjust_genomes_for_cpr.py` instead of the extremely long series of bash commands. This will make it easier to diagnose errors in this critical step. 6c0ed82
- Added support for contig descriptions and added MAG identifier in fasta files in `binning-eukaryotic.py`. Now uses the `metaeuk_wrapper.py` script for the `MetaEuk` step. 6c0ed82
- Added separate option of `--run_metaplasmidspades` for `assembly-sequential.py` instead of making it mandatory (now it just runs `biosyntheticSPAdes` and `metaSPAdes` by default). 6c0ed82
- Added `--use_mag_as_description` in `partition_gene_models.py` script to include the MAG identifier in the contig description of the fasta header which is default in `binning-prokaryotic.py`. 6c0ed82
- Added `adjust_genomes_for_cpr.py` script to make the CPR adjustment step of `binning-prokaryotic.py` easier to run and understand. 6c0ed82
- Added support for fasta header descriptions in `binning-prokaryotic.py`. 6c0ed82
- Added functionality to `replace_fasta_descriptions.py` script to be able to use a string for replacing fasta headers in addition to the original functionality. 6c0ed82
VEBA_v1.0.2a
Release v1.0.2a
Not to be confused with v1.0.2 which is deprecated
- Updated GTDB-Tk in `VEBA-binning-prokaryotic_env` from `1.x` to `2.x` (this version uses much less memory): f3507dd
- Updated the GTDB-Tk database from `R202` to `R207_v2` to be compatible with GTDB-Tk v2.x: f3507dd
- Updated the GRCh38 no-alt analysis set to T2T CHM13v2.0 for the default human reference: 5ccb4e2
- Added an experimental `amplicon.py` module for short-read ASV detection via the DADA2 workflow of QIIME2: cd4ed2b
- Added additional functionality to `compile_reads_table.py` to handle advanced parsing of samples from fastq directories while also maintaining support for parsing filenames from `veba_output/preprocess`: cd4ed2b
- Added `sra-tools` to `VEBA-preprocess_env`: f3507dd
- Fixed symlinks to scripts for `install_veba.sh`: d1fad03
- Added missing `CHECKM_DATA_PATH` environment variable to `VEBA-binning-prokaryotic_env` and `VEBA-classify_env`: d1fad03

⚠️ In this version, contigs/scaffolds cannot have descriptions in fasta header for prokaryotic binning (Fixed in versions after 2022.11.07)
Module Versions:
amplicon.py __version__ = "2022.10.24"
annotate.py __version__ = "2021.7.8"
assembly.py __version__ = "2022.03.25"
binning-eukaryotic.py __version__ = "2022.10.20"
binning-prokaryotic.py __version__ = "2022.10.25"
binning-viral.py __version__ = "2022.7.13"
classify-eukaryotic.py __version__ = "2022.7.8"
classify-prokaryotic.py __version__ = "2022.06.07"
classify-viral.py __version__ = "2022.7.13"
cluster.py __version__ = "2022.10.16"
coverage.py __version__ = "2022.06.03"
index.py __version__ = "2022.02.17"
mapping.py __version__ = "2022.8.17"
phylogeny.py __version__ = "2022.06.22"
preprocess.py __version__ = "2022.01.19"
scripts/append_geneid_to_prodigal_gff.py __version__ = "2021.06.19"
scripts/binning_wrapper.py __version__ = "2022.04.11"
scripts/build_taxa_sqlite.py __version__ = "2022.04.18"
scripts/check_scaffolds_to_bins.py __version__ = "2021.08.20"
scripts/compile_binning.py __version__ = "2022.03.23"
scripts/compile_eukaryotic_classifications.py __version__ = "2022.7.8"
scripts/compile_metaeuk_identifiers.py __version__ = "2022.03.18"
scripts/compile_reads_table.py __version__ = "2022.10.24"
scripts/compile_scaffold_identifiers.py __version__ = "2022.02.23"
scripts/compile_viral_classifications.py __version__ = "2022.03.08"
scripts/concatenate_dataframes.py __version__ = "2022.03.24"
scripts/concatenate_fasta.py __version__ = "2022.02.17"
scripts/concatenate_gff.py __version__ = "2022.02.17"
scripts/consensus_domain_classification.py __version__ = "2022.02.28"
scripts/consensus_genome_classification.py __version__ = "2022.7.13"
scripts/consensus_orthogroup_annotation.py __version__ = "2022.02.02"
scripts/determine_trim_position.py __version__ = "2022.8.11"
scripts/fasta_to_saf.py __version__ = "2021.04.04"
scripts/fasta_utility.py __version__ = "2021.07.31"
scripts/fastani_to_clusters.py __version__ = "2021.11.16"
scripts/fastq_position_statistics.py __version__ = "2022.10.24"
scripts/filter_busco_results.py __version__ = "2022.04.04"
scripts/filter_checkm_results.py __version__ = "2022.03.28"
scripts/filter_checkv_results.py __version__ = "2021.08.10"
scripts/filter_hmmsearch_results.py __version__ = "2021.06.16"
scripts/genome_coverage_from_spades.py __version__ = "2022.7.14"
scripts/genome_spatial_coverage.py __version__ = "2022.08.17"
scripts/groupby_table.py __version__ = "2022.08.17"
scripts/hmmer_to_proteins.py __version__ = "2021.08.03"
scripts/insert_column_to_table.py __version__ = "2022.03.24"
scripts/merge_annotations_and_score_taxonomy.py __version__ = "2021.08.25"
scripts/merge_busco_json.py __version__ = "2022.03.10"
scripts/merge_contig_mapping.py __version__ = "2022.06.27"
scripts/merge_fastq_statistics.py __version__ = "2022.03.08"
scripts/merge_gtdbtk.py __version__ = "2022.03.24"
scripts/merge_msa.py __version__ = "2022.06.21"
scripts/merge_orf_mapping.py __version__ = "2021.03.27"
scripts/metaeuk_wrapper.py __version__ = "2022.08.27"
scripts/partition_clusters.py __version__ = "2021.08.12"
scripts/partition_gene_models.py __version__ = "2021.08.24"
scripts/partition_hmmsearch.py __version__ = "2022.06.20"
scripts/partition_multisplit_bins.py __version__ = "2022.04.08"
scripts/partition_orthogroups.py __version__ = "2022.04.01"
scripts/partition_unbinned.py __version__ = "2021.08.05"
scripts/replace_fasta_descriptions.py __version__ = "2022.9.1"
scripts/scaffolds_to_bins.py __version__ = "2021.03.26"
scripts/subset_table.py __version__ = "2022.04.20"
scripts/subset_table_by_column.py __version__ = "2022.04.20"
VEBA_v1.0.1
Small patch fix:
- Fixed the fatal binning-eukaryotic.py error: 7c5addf
- Fixed the minor file naming in cluster.py: 5803845
- Removes left-over human genome tar.gz during database download/config: 5803845
⚠️ In this version, contigs/scaffolds cannot have descriptions in fasta header for prokaryotic binning (Fixed in versions after 2022.11.07)
Module Versions:
annotate.py __version__ = "2021.7.8"
assembly.py __version__ = "2022.03.25"
binning-eukaryotic.py __version__ = "2022.10.20"
binning-prokaryotic.py __version__ = "2022.7.8"
binning-viral.py __version__ = "2022.7.13"
classify-eukaryotic.py __version__ = "2022.7.8"
classify-prokaryotic.py __version__ = "2022.06.07"
classify-viral.py __version__ = "2022.7.13"
cluster.py __version__ = "2022.10.16"
coverage.py __version__ = "2022.06.03"
index.py __version__ = "2022.02.17"
mapping.py __version__ = "2022.8.17"
phylogeny.py __version__ = "2022.06.22"
preprocess.py __version__ = "2022.01.19"
scripts/append_geneid_to_prodigal_gff.py __version__ = "2021.06.19"
scripts/binning_wrapper.py __version__ = "2022.04.11"
scripts/build_taxa_sqlite.py __version__ = "2022.04.18"
scripts/check_scaffolds_to_bins.py __version__ = "2021.08.20"
scripts/compile_binning.py __version__ = "2022.03.23"
scripts/compile_eukaryotic_classifications.py __version__ = "2022.7.8"
scripts/compile_metaeuk_identifiers.py __version__ = "2022.03.18"
scripts/compile_reads_table.py __version__ = "2021.7.18"
scripts/compile_scaffold_identifiers.py __version__ = "2022.02.23"
scripts/compile_viral_classifications.py __version__ = "2022.03.08"
scripts/concatenate_dataframes.py __version__ = "2022.03.24"
scripts/concatenate_fasta.py __version__ = "2022.02.17"
scripts/concatenate_gff.py __version__ = "2022.02.17"
scripts/consensus_domain_classification.py __version__ = "2022.02.28"
scripts/consensus_genome_classification.py __version__ = "2022.7.13"
scripts/consensus_orthogroup_annotation.py __version__ = "2022.02.02"
scripts/fasta_to_saf.py __version__ = "2021.04.04"
scripts/fasta_utility.py __version__ = "2021.07.31"
scripts/fastani_to_clusters.py __version__ = "2021.06.16"
scripts/filter_busco_results.py __version__ = "2022.04.04"
scripts/filter_checkm_results.py __version__ = "2022.03.28"
scripts/filter_checkv_results.py __version__ = "2021.08.10"
scripts/filter_hmmsearch_results.py __version__ = "2021.06.16"
scripts/genome_coverage_from_spades.py __version__ = "2022.7.14"
scripts/genome_spatial_coverage.py __version__ = "2022.08.17"
scripts/groupby_table.py __version__ = "2022.08.17"
scripts/hmmer_to_proteins.py __version__ = "2021.08.03"
scripts/insert_column_to_table.py __version__ = "2022.03.24"
scripts/merge_annotations_and_score_taxonomy.py __version__ = "2021.08.25"
scripts/merge_busco_json.py __version__ = "2022.03.10"
scripts/merge_contig_mapping.py __version__ = "2022.06.27"
scripts/merge_fastq_statistics.py __version__ = "2022.03.08"
scripts/merge_gtdbtk.py __version__ = "2022.03.24"
scripts/merge_msa.py __version__ = "2022.06.21"
scripts/merge_orf_mapping.py __version__ = "2021.03.27"
scripts/metaeuk_wrapper.py __version__ = "2022.08.27"
scripts/partition_clusters.py __version__ = "2021.08.12"
scripts/partition_gene_models.py __version__ = "2021.08.24"
scripts/partition_hmmsearch.py __version__ = "2022.06.20"
scripts/partition_multisplit_bins.py __version__ = "2022.04.08"
scripts/partition_orthogroups.py __version__ = "2022.04.01"
scripts/partition_unbinned.py __version__ = "2021.08.05"
scripts/scaffolds_to_bins.py __version__ = "2021.03.26"
scripts/subset_table.py __version__ = "2022.04.20"
scripts/subset_table_by_column.py __version__ = "2022.04.20"
VEBA_v1.0.0
Version released for manuscript submission.
⚠️ In this version, contigs/scaffolds cannot have descriptions in fasta header for prokaryotic binning (Fixed in versions after 2022.11.07)
Module Versions:
annotate.py __version__ = "2021.7.8"
assembly.py __version__ = "2022.03.25"
binning-eukaryotic.py __version__ = "2022.7.8"
binning-prokaryotic.py __version__ = "2022.7.8"
binning-viral.py __version__ = "2022.7.13"
classify-eukaryotic.py __version__ = "2022.7.8"
classify-prokaryotic.py __version__ = "2022.06.07"
classify-viral.py __version__ = "2022.7.13"
cluster.py __version__ = "2022.06.04"
coverage.py __version__ = "2022.06.03"
index.py __version__ = "2022.02.17"
mapping.py __version__ = "2022.8.17"
phylogeny.py __version__ = "2022.06.22"
preprocess.py __version__ = "2022.01.19"
scripts/append_geneid_to_prodigal_gff.py __version__ = "2021.06.19"
scripts/binning_wrapper.py __version__ = "2022.04.11"
scripts/build_taxa_sqlite.py __version__ = "2022.04.18"
scripts/check_scaffolds_to_bins.py __version__ = "2021.08.20"
scripts/compile_binning.py __version__ = "2022.03.23"
scripts/compile_eukaryotic_classifications.py __version__ = "2022.7.8"
scripts/compile_metaeuk_identifiers.py __version__ = "2022.03.18"
scripts/compile_reads_table.py __version__ = "2021.7.18"
scripts/compile_scaffold_identifiers.py __version__ = "2022.02.23"
scripts/compile_viral_classifications.py __version__ = "2022.03.08"
scripts/concatenate_dataframes.py __version__ = "2022.03.24"
scripts/concatenate_fasta.py __version__ = "2022.02.17"
scripts/concatenate_gff.py __version__ = "2022.02.17"
scripts/consensus_domain_classification.py __version__ = "2022.02.28"
scripts/consensus_genome_classification.py __version__ = "2022.7.13"
scripts/consensus_orthogroup_annotation.py __version__ = "2022.02.02"
scripts/fasta_to_saf.py __version__ = "2021.04.04"
scripts/fasta_utility.py __version__ = "2021.07.31"
scripts/fastani_to_clusters.py __version__ = "2021.06.16"
scripts/filter_busco_results.py __version__ = "2022.04.04"
scripts/filter_checkm_results.py __version__ = "2022.03.28"
scripts/filter_checkv_results.py __version__ = "2021.08.10"
scripts/filter_hmmsearch_results.py __version__ = "2021.06.16"
scripts/genome_coverage_from_spades.py __version__ = "2022.7.14"
scripts/genome_spatial_coverage.py __version__ = "2022.08.17"
scripts/groupby_table.py __version__ = "2022.08.17"
scripts/hmmer_to_proteins.py __version__ = "2021.08.03"
scripts/insert_column_to_table.py __version__ = "2022.03.24"
scripts/merge_annotations_and_score_taxonomy.py __version__ = "2021.08.25"
scripts/merge_busco_json.py __version__ = "2022.03.10"
scripts/merge_contig_mapping.py __version__ = "2022.06.27"
scripts/merge_fastq_statistics.py __version__ = "2022.03.08"
scripts/merge_gtdbtk.py __version__ = "2022.03.24"
scripts/merge_msa.py __version__ = "2022.06.21"
scripts/merge_orf_mapping.py __version__ = "2021.03.27"
scripts/metaeuk_wrapper.py __version__ = "2022.08.27"
scripts/partition_clusters.py __version__ = "2021.08.12"
scripts/partition_gene_models.py __version__ = "2021.08.24"
scripts/partition_hmmsearch.py __version__ = "2022.06.20"
scripts/partition_multisplit_bins.py __version__ = "2022.04.08"
scripts/partition_orthogroups.py __version__ = "2022.04.01"
scripts/partition_unbinned.py __version__ = "2021.08.05"
scripts/scaffolds_to_bins.py __version__ = "2021.03.26"
scripts/subset_table.py __version__ = "2022.04.20"
scripts/subset_table_by_column.py __version__ = "2022.04.20"