
Releases: jolespin/veba

VEBA_v1.3.0

27 Oct 21:13
bb683ab

Release v1.3.0:

  • VEBA Modules:

    • Added profile-pathway.py module and associated scripts for building HUMAnN databases from de novo genomes and annotations. In essence, this is reads-based functional profiling via HUMAnN that uses the binned genomes as the database.
    • Added marker_gene_clustering.py script, which identifies core marker proteins that are present in all genomes within a genome cluster (i.e., pangenome) and unique to that genome cluster. Clustering can be performed in either protein or nucleotide space.
    • Added module_completion_ratios.py script, which calculates KEGG module completion ratios for genomes and pangenomes. It is run automatically in the backend of annotate.py.
    • Updated annotate.py and merge_annotations.py to provide better annotations for clustered proteins.
    • Added merge_genome_quality.py and merge_taxonomy_classifications.py, which compile genome quality and taxonomy, respectively, for all organisms.
    • Added BGC clustering in protein and nucleotide space to biosynthetic.py. It also produces prevalence tables that can be used for further clustering of BGCs.
    • Added pangenome_core_sequences to cluster.py, which writes both protein and CDS sequences for each genome cluster.
    • Added PDF visualization of newick trees in phylogeny.py.
  • VEBA Database (VDB_v5.2):

    • Added CAZy
    • Added MicrobeAnnotator-KEGG
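The idea behind a KEGG module completion ratio can be illustrated with a simplified sketch. It assumes a module is a list of steps, each step a list of alternative KO sets, and that a step is complete when any one alternative is fully present. Real KEGG module definitions (and MicrobeAnnotator's ko_mapper.py, which module_completion_ratios.py is based on) include nested and optional components that this sketch ignores, and the module and KO identifiers below are placeholders.

```python
from typing import FrozenSet, List, Set

# A step is a list of alternatives; an alternative is satisfied only if
# every KO in it is present (e.g., all subunits of a complex).
Step = List[FrozenSet[str]]


def step_is_complete(step: Step, kos_present: Set[str]) -> bool:
    """A step counts as complete if any one of its alternatives is fully present."""
    return any(alternative <= kos_present for alternative in step)


def module_completion_ratio(steps: List[Step], kos_present: Set[str]) -> float:
    """Fraction of module steps that are complete for the given KO set."""
    if not steps:
        raise ValueError("a module must have at least one step")
    completed = sum(step_is_complete(step, kos_present) for step in steps)
    return completed / len(steps)


# Hypothetical three-step module (not a real KEGG definition).
example_module: List[Step] = [
    [frozenset({"K00001"})],
    [frozenset({"K00002"}), frozenset({"K00003"})],
    [frozenset({"K00004", "K00005"})],
]

# Genome level: KOs annotated in a single genome.
genome_kos = {"K00001", "K00003", "K00004"}
genome_ratio = module_completion_ratio(example_module, genome_kos)

# Pangenome level: union of KOs across all genomes in a genome cluster.
pangenome_kos = genome_kos | {"K00005"}
pangenome_ratio = module_completion_ratio(example_module, pangenome_kos)
```

Here the single genome completes two of three steps (the third step is missing K00005), while the pangenome union completes all three, which is why pangenome-level ratios can be higher than any individual genome's.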

Release v1.3.0 Details:

  • Updated annotate.py and merge_annotations.py to handle CAZy. They now also properly handle clustered protein annotations.
  • Added module_completion_ratios.py script, which is a fork of MicrobeAnnotator's ko_mapper.py. Also included a database (Zenodo: 10020074), which will be included in VDB_v5.2.
  • Added a checkpoint for tRNAscan-SE in binning-prokaryotic.py and eukaryotic_gene_modeling_wrapper.py.
  • Added profile-pathway.py module and VEBA-profile_env environment, a wrapper around HUMAnN for the custom database created from annotate.py and compile_custom_humann_database_from_annotations.py.
  • Added GenoPype version to log output
  • Added merge_genome_quality.py which combines CheckV, CheckM2, and BUSCO results.
  • Added compile_custom_humann_database_from_annotations.py which compiles a HUMAnN protein database table from the output of annotate.py and taxonomy classifications.
  • Added --no_domain and --no_header options to merge_taxonomy_classifications.py so that its output can serve as input to compile_custom_humann_database_from_annotations.py.
  • Added marker_gene_clustering.py script, which gets core marker genes unique to each SLC (i.e., pangenome). Also added average_number_of_copies_per_genome to protein clusters.
  • Added --minimum_core_prevalence to global_clustering.py, local_clustering.py, and cluster.py, which sets the minimum prevalence ratio for a protein cluster in an SLC to be considered core. Also removed --no_singletons from cluster.py to avoid complications with marker genes. Renamed --input to --genomes_table in the clustering scripts and module.
  • Added a check in coverage.py that skips creating mapped.sorted.bam files that already exist. Not yet implemented for the GNU parallel option.
  • Changed default representative sequence format from table to fasta for mmseqs2_wrapper.py.
  • Added --nucleotide_fasta_output to antismash_genbank_to_table.py, which outputs the actual BGC DNA sequence. Changed --fasta_output to --protein_fasta_output and added this output to biosynthetic.py. Changed BGC component identifiers to [bgc_id]_[position_in_bgc]|[start]:[end]([strand]) to match MetaEuk identifiers. Changed bgc_type to protocluster_type. biosynthetic.py now supports GFF files from MetaEuk (exon and gene features are not supported by antiSMASH). Fixed an error where antiSMASH added CDS features (i.e., allorf_[start]_[end]) that are not in the GFF, which caused antismash_genbank_to_table.py to fail.
  • Added ete3 to VEBA-phylogeny_env.yml; trees are now automatically rendered to PDF.
  • Added presets for MEGAHIT using the --megahit_preset option.
  • The change to using --mash_db with GTDB-Tk violated the assumption that all prokaryotic classifications have an msa_percent field, which caused the cluster-level taxonomy to fail. compile_prokaryotic_genome_cluster_classification_scores_table.py fixes this by using fastani_ani as the weight when genomes were classified using ANI and msa_percent for everything else. The original error caused all prokaryotic cluster-level classifications to be unclassified.
  • Fixed a small error where empty GFF files with an asterisk in the name were created for samples that didn't have any prokaryotic MAGs.
  • Fixed a critical error where header descriptions were not being removed in eukaryota.scaffolds.list, so seqkit grep did not remove eukaryotic scaffolds and DAS_Tool output eukaryotic MAGs in identifier_mapping.tsv and __DASTool_scaffolds2bin.no_eukaryota.txt.
  • Fixed krona.html in biosynthetic.py, which was being created incorrectly by the compile_krona.py script.
  • Created pangenome_core_sequences in global_clustering.py and local_clustering.py, which writes both protein and CDS sequences for each SLC. Also changed cluster.py to NOT perform local clustering by default, replacing --no_local_clustering with --local_clustering.
  • Fixed pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects, raised in biosynthetic.py when Diamond reports multiple matching regions for one hit. Added --sort_by and --ascending to concatenate_dataframes.py along with automatic detection and removal of duplicate indices. Also added --sort_by bitscore in biosynthetic.py.
  • Added core pangenome and singleton hits to clustering output
  • Updated --megahit_memory default from 0.9 to 0.99
  • Fixed error in genomad_taxonomy_wrapper.py where viral_taxonomy.tsv should have been taxonomy.tsv.
  • Fixed a minor error in assembly.py, caused by incorrect string formatting, that prevented users from using SPAdes programs other than spades.py, metaspades.py, or rnaspades.py.
  • Updated bowtie2 in preprocess, assembly, and mapping modules. Updated fastp and fastq_preprocessor in preprocess module.
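The core and marker definitions in the --minimum_core_prevalence and marker_gene_clustering.py bullets above can be sketched as set operations. This is a minimal illustration, not the code in marker_gene_clustering.py or cluster.py. It assumes prevalence is the fraction of genomes in an SLC that contain the protein cluster, that a cluster is core when that fraction meets the threshold, and that a marker is a core cluster found in no genome outside that SLC. The inputs cluster_to_genomes and genome_to_slc are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def core_and_marker_clusters(
    cluster_to_genomes: Dict[str, Set[str]],
    genome_to_slc: Dict[str, str],
    minimum_core_prevalence: float = 1.0,
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (core, marker) protein clusters keyed by SLC."""
    # Group genomes by their genome cluster (SLC).
    slc_to_genomes: Dict[str, Set[str]] = defaultdict(set)
    for genome, slc in genome_to_slc.items():
        slc_to_genomes[slc].add(genome)

    core: Dict[str, Set[str]] = defaultdict(set)
    marker: Dict[str, Set[str]] = defaultdict(set)
    for cluster, genomes in cluster_to_genomes.items():
        # SLCs in which this protein cluster occurs at all.
        slcs_with_cluster = {genome_to_slc[genome] for genome in genomes}
        for slc in slcs_with_cluster:
            members = slc_to_genomes[slc]
            prevalence = len(genomes & members) / len(members)
            if prevalence >= minimum_core_prevalence:
                core[slc].add(cluster)
                # Marker: core in this SLC and absent from every other SLC.
                if slcs_with_cluster == {slc}:
                    marker[slc].add(cluster)
    return dict(core), dict(marker)


genome_to_slc = {"g1": "SLC1", "g2": "SLC1", "g3": "SLC2"}
cluster_to_genomes = {
    "PC1": {"g1", "g2"},
    "PC2": {"g1", "g2", "g3"},
    "PC3": {"g1"},
    "PC4": {"g3"},
}
core, marker = core_and_marker_clusters(cluster_to_genomes, genome_to_slc)
```

With the default threshold of 1.0, PC2 is core in both SLCs but is not a marker because it is shared; lowering the threshold to 0.5 would make PC3 (present in one of the two SLC1 genomes) core and, since it occurs only in SLC1, a marker as well.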
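The weighting fix for cluster-level prokaryotic taxonomy can be sketched as a weighted vote within a genome cluster. The per-genome records here are hypothetical: classified_by_ani is an invented flag, and summing weights per label is an assumption about how the cluster-level score is aggregated, so compile_prokaryotic_genome_cluster_classification_scores_table.py may differ. The point shown is only that a genome classified by ANI, which may lack msa_percent, contributes its fastani_ani instead of breaking the calculation.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def genome_weight(record: Dict[str, Any]) -> float:
    """Use fastani_ani for ANI-based classifications and msa_percent otherwise."""
    # classified_by_ani is a hypothetical flag; ANI-screened genomes may have no msa_percent.
    if record["classified_by_ani"]:
        return float(record["fastani_ani"])
    return float(record["msa_percent"])


def cluster_consensus(records: List[Dict[str, Any]]) -> Tuple[str, float]:
    """Return the classification with the highest summed weight, and that weight."""
    scores: Dict[str, float] = defaultdict(float)
    for record in records:
        scores[record["classification"]] += genome_weight(record)
    if not scores:
        raise ValueError("no genomes to score")
    return max(scores.items(), key=lambda item: item[1])


records = [
    {"classification": "s__A", "classified_by_ani": True, "fastani_ani": 99.1, "msa_percent": None},
    {"classification": "s__A", "classified_by_ani": False, "fastani_ani": None, "msa_percent": 85.0},
    {"classification": "s__B", "classified_by_ani": False, "fastani_ani": None, "msa_percent": 90.0},
]
label, score = cluster_consensus(records)
```

Before the fix, the first record would have had no usable msa_percent; with the per-genome weight choice it still counts toward s__A.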

VEBA_v1.2.0

12 Jul 00:22

Release v1.2.0:

  • Fixed minor error in binning-prokaryotic.py where the --veba_database argument wasn't utilized and only the environment variable VEBA_DATABASE could be used.
  • Updated the Docker images to have /volumes/input, /volumes/output, and /volumes/database directories to mount.
  • Replaced prodigal with pyrodigal as it is faster and under active development.
  • Added support for missing classifications in compile_krona.py and consensus_genome_classification.py.
  • Updated GTDB-Tk from version 2.1.3 → 2.3.0 and GTDB from version r202_v2 → r214. Changed ${VEBA_DATABASE}/Classify/GTDBTk → ${VEBA_DATABASE}/Classify/GTDB. Added gtdb_r214.msh to GTDB database for ANI screening.
  • Added pangenome and singleton tables to cluster.py (and the associated global/local clustering scripts), which are now output automatically.
  • Added compile_gff.py to merge CDS, rRNA, and tRNA GFF files. It is used in binning-prokaryotic.py and binning-viral.py; binning-eukaryotic.py uses the same code in the backend of filter_busco_results.py. Output includes GC content for contigs and various tags.
  • Updated BUSCO v5.3.2 -> v5.4.3, which changes the JSON output structure, and made the corresponding changes in filter_busco_results.py.
  • Added eukaryotic_gene_modeling_wrapper.py, which 1) splits nuclear, mitochondrial, and plastid genomes; 2) performs gene modeling via MetaEuk and Pyrodigal; 3) performs rRNA detection via BARRNAP; 4) performs tRNA detection via tRNAscan-SE; 5) merges processed GFF files; and 6) calculates sequence statistics.
  • Added gene_biotype=protein_coding to P(y)rodigal(-GV) GFF output.
  • Added VFDB to annotate.py and database.
  • Compiled and pushed gtdb_r214.msh mash file to Zenodo:8048187 which is now used by default in classify-prokaryotic.py. It is now included in VDB_v5.1.
  • Cleaned up global and local clustering intermediate files. Added pangenome tables and singleton information to outputs.
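A pangenome table of the kind mentioned above can be sketched as a genome-by-protein-cluster count matrix, with singletons taken to be protein clusters containing a single protein. Both the input mappings (protein_to_cluster, protein_to_genome) and that singleton definition are assumptions for illustration; the actual tables written by cluster.py may use different columns and definitions.

```python
from collections import Counter, defaultdict
from typing import Dict, Set


def pangenome_table(
    protein_to_cluster: Dict[str, str],
    protein_to_genome: Dict[str, str],
) -> Dict[str, Dict[str, int]]:
    """Count proteins per (genome, protein cluster) pair."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for protein, cluster in protein_to_cluster.items():
        table[protein_to_genome[protein]][cluster] += 1
    return {genome: dict(counts) for genome, counts in table.items()}


def singleton_clusters(protein_to_cluster: Dict[str, str]) -> Set[str]:
    """Protein clusters that contain exactly one protein."""
    sizes = Counter(protein_to_cluster.values())
    return {cluster for cluster, size in sizes.items() if size == 1}


protein_to_cluster = {"p1": "PC1", "p2": "PC1", "p3": "PC1", "p4": "PC2"}
protein_to_genome = {"p1": "g1", "p2": "g1", "p3": "g2", "p4": "g2"}
table = pangenome_table(protein_to_cluster, protein_to_genome)
singletons = singleton_clusters(protein_to_cluster)
```

Keeping counts rather than presence/absence preserves copy-number information (g1 has two PC1 proteins); a presence/absence view is just `count > 0`.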

VEBA_v1.1.2

17 May 03:34
Release v1.1.2
  • Created Docker images for all modules
  • Replaced all absolute path symlinks with relative symlinks
  • Renamed prokaryotic_taxonomy.tsv and prokaryotic_taxonomy.clusters.tsv in classify-prokaryotic.py (along with the eukaryotic and viral equivalents) to taxonomy.tsv and taxonomy.clusters.tsv for uniformity.
  • Updated all symlinks to relative links (also in fastq_preprocessor) to prepare for dockerization, and updated all environments to use GenoPype 2023.4.13.
  • Changed nr to uniref in annotate.py and added the propagate_annotations_from_representatives.py script. Also simplified merge_annotations_and_taxonomy.py into merge_annotations.py, which excludes taxonomy operations.
  • Changed nr to UniRef90 and UniRef50 in VDB_v5
  • Changed orfs_to_orthogroups.tsv to proteins_to_orthogroups.tsv for consistency with the cluster.py module. Naming consistency with scaffolds_to_bins/scaffolds_to_mags will be addressed later.
  • Added a scaffolds_to_mags.tsv in the clustering output.
  • Added convert_counts_table.py which converts a counts table (and metadata) to Pandas pickle, Anndata h5ad, or Biom hdf5
  • Fixed output directory for mapping.py which now uses output_directory/${NAME} structure like binning-*.py.
  • Removed the "python" prefix from script calls; scripts are now executed via their shebang. Also added single quotes around script filepaths (e.g., '[script_filepath]') to escape special characters and spaces in filepaths.
  • Added support for index.py to accept individual --references [file.fasta] and --gene_models [file.gff].
  • Added stdin support for scaffolds_to_bins.py along with the ability to input genome tables [id_genome][filepath]. Also added progress bars.
  • As a result of issues/22, assembly.py, assembly-sequential.py, binning-*.py, and mapping.py now use -p --countReadPairs for featureCounts, and subread was updated from 2.0.1 -> 2.0.3. For binning-*.py, long reads can be used with the --long_reads flag.
  • Updated cluster.py and associated global_clustering.py/local_clustering.py scripts to use mmseqs2_wrapper.py which now automatically outputs representative sequences.
  • Added check_fasta_duplicates.py script, which gives 0 and 1 exit codes for fasta files without and with duplicates, respectively. Added reformat_representative_sequences.py to reformat representative sequences from MMSEQS2 into either a table or fasta file where the identifiers are cluster labels. Removed --dbtype from [global/local]_clustering.py. Removed appended prefix for .graph.pkl and dict.pkl in edgelist_to_clusters.py. Added mmseqs2_wrapper.py and hmmer_wrapper.py scripts.
  • Added an option to merge_generalized_mapping.py to include the sample index in a filepath and also an option to remove empty features (useful for Salmon). Added an executable='/bin/bash' option to the subprocess.Popen calls in GenoPype to address issues/23.
  • Added genbanks/[id_genome]/ to output directory of biosynthetic.py which has symlinks to all the BGC genbanks from antiSMASH.
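The exit-code convention of check_fasta_duplicates.py (0 without duplicates, 1 with duplicates) can be sketched as follows. This assumes "duplicates" means repeated record identifiers, taken as the first whitespace-delimited token of each header; the real script may check something else (e.g., sequences). The function names are hypothetical.

```python
from typing import Iterable


def has_duplicate_identifiers(lines: Iterable[str]) -> bool:
    """True if any FASTA record identifier appears more than once."""
    seen = set()
    for line in lines:
        if not line.startswith(">"):
            continue
        # Identifier = first token of the header; descriptions are ignored.
        tokens = line[1:].split()
        identifier = tokens[0] if tokens else ""
        if identifier in seen:
            return True
        seen.add(identifier)
    return False


def fasta_exit_code(path: str) -> int:
    """0 when the FASTA has no duplicate identifiers, 1 when it does."""
    with open(path) as handle:
        return 1 if has_duplicate_identifiers(handle) else 0
```

A command-line wrapper could call `sys.exit(fasta_exit_code(path))`, so a shell pipeline can branch on the exit status without parsing any output.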

VEBA_v1.1.1

21 Mar 00:20

Minor updates from v1.1.0.

  • The most important update fixes a broken VEBA-binning-viral.yml install recipe that had package conflicts for aria2: 30e8b0a
  • Fixed conda-related environment variables in the install scripts.
  • Added MIBiG to database and annotate.py
  • Added a composite label for annotations in annotate.py
  • Added --dastool_minimum_score to binning-prokaryotic.py module
  • Added a wrapper around STAR aligner
  • Updated merge_generalized_mapping.py script to take in BAM files instead of being dependent on a specific directory.
  • Added option to have no header in subset_table.py

VEBA_v1.1.0

02 Mar 20:44
Release v1.1.0
  • Modules:

    • annotate.py

      • Added NCBIfam-AMRFinder AMR domain annotations
      • Added AntiFam contamination annotations
      • Uses taxopy instead of ete3 in backend with merge_annotations_and_score_taxonomy.py
    • assembly.py

      • Added a transcripts_to_genes.py script which creates a genes_to_transcripts.tsv table that can be used with TransDecoder.
    • binning-prokaryotic.py

      • Updated CheckM → CheckM2. This removes the dependency on GTDB-Tk and EXTREMELY REDUCES compute resource requirements (e.g., memory and time) as CheckM2 automatically handles candidate phyla radiation. With this, several backend scripts were deprecated. This cleans up the binning pipeline and error messages SUBSTANTIALLY.
      • Uses binning_wrapper.py for all binning. This makes it easier to add new binning algorithms in the future (e.g., VAMB). Also, check out the new multi-split binning functionality described below.
      • Added --skip_concoct in addition to the already existing --skip_maxbin2 option, as MaxBin2 takes very long when there are many contigs and CONCOCT takes a long time when there are many samples (i.e., BAM files). MetaBAT2 is not optional.
    • binning-viral.py

      • Complete rewrite of this module which now uses geNomad as the default binning algorithm but still supports VirFinder.
      • If VirFinder is used, geNomad annotate is run via the genomad_taxonomy_wrapper.py script included in this update.
      • Updated Prodigal → Prodigal-GV to handle additional viral genetic codes.
    • biosynthetic.py

      • Introduces component_id and bgc_id, which are unique, parseable, and informative. For example, component_id = SRR17458614__CONCOCT__P.2__9|NODE_3319_length_2682_cov_2.840502|region001_1|2-2681(+) contains the unique bgc_id (i.e., SRR17458614__CONCOCT__P.2__9|NODE_3319_length_2682_cov_2.840502|region001), shows that it is the 1st gene in the cluster (the _1 in region001_1), and gives the gene start/end/strand. The bgc_id is composed of the genome_id|contig_id|region_id.
    • classify-prokaryotic.py

      • Updated GTDB-Tk v2.1.1 → GTDB-Tk v2.2.3. For now, --skip_ani_screen is the only option because of this thread. However, --mash_db may be an option in the near future.
      • Added functionality to classify prokaryotic genomes that were not binned via VEBA which is available with the --genomes option (--prokaryotic_binning_directory is still available which can leverage existing intermediate files).
    • classify-eukaryotic.py

      • Added functionality to classify eukaryotic genomes that were not binned via VEBA which is available with the --genomes option (--eukaryotic_binning_directory is still available which can leverage existing intermediate files). This is implemented by using the eukaryota_odb10 markers from the VEBA Microeukaryotic Database to substantially improve performance and decrease resources required for gene models.
    • classify-viral.py

      • Complete rewrite of this module which does not rely on (deprecated) intermediate files from CheckV.
      • Uses taxonomy generated from geNomad and consensus_genome_classification_unranked.py (a wrapper around taxopy) that can handle the chaotic taxonomy of viruses.
      • Added functionality to classify viral genomes that were not binned via VEBA which is available with the --genomes option (--viral_binning_directory is still available which can leverage existing intermediate files).
    • cluster.py

      • Complete rewrite of this module, which now uses MMSEQS2 as the orthogroup detection algorithm instead of OrthoFinder. OrthoFinder is overkill for creating protein clusters and generates thousands of intermediate files (e.g., fasta, alignments, trees, etc.), which substantially increases compute time. MMSEQS2 has very similar performance with a fraction of the resources and compute time: the entire Plastisphere dataset was clustered on a local machine in ~30 minutes, compared to several days on an HPC.
      • Now that the resources are minimal, clustering is performed at global level as before (i.e., all samples in the dataset) and now at the local level, optionally but ON by default, which clusters all genomes within a sample. Accompanying wrapper scripts are global_clustering.py and local_clustering.py.
      • The genomic and functional feature compression ratios (FCR) (described here) are now calculated automatically. The calculation is 1 - number_of_clusters/number_of_features, which can easily be converted into an unsupervised biodiversity metric. This is calculated at the global (original implementation) and local levels.
      • Input is now a table with the following columns: [organism_type]<tab>[id_sample]<tab>[id_mag]<tab>[genome]<tab>[proteins] and is generated easily with the compile_genomes_table.py script. This allows clustering to be performed for prokaryotes, eukaryotes, and viruses all at the same time.
      • SLC-specific orthogroups (SSO) are now referred to as SLC-specific protein clusters (SSPC).
      • Support zfilling (e.g., zfill=3, SLC7 → SLC007) for genomic and protein clusters.
      • Deprecated fastani_to_clusters.py to now use the more generalizable edgelist_to_clusters.py which is used for both genomic and protein clusters. This also outputs a NetworkX graph and a pickled dictionary {"cluster_a":{"component_1", "component_2", ..., "component_n"}}
    • phylogeny.py

      • Updated MUSCLE to v5, which has -align and -super5 algorithms that are now accessible with --alignment_algorithm. MUSCLE v5 cannot read from stdin, so the fasta files are no longer gzipped. merge_msa.py now outputs uncompressed fasta by default and can output gzipped fasta with the --gzip flag.
  • VEBA Database:

    • VDB_v3.1 → VDB_v4
      • Updated CheckV DB v1.0 → CheckV DB v1.5
      • Added geNomad DB v1.2
      • Added CheckM2 DB
      • Removed CheckM DB
      • Removed taxa.sqlite and taxa.sqlite.traverse.pkl
      • Added reference.eukaryota_odb10.list and corresponding MMSEQS2 database (i.e., microeukaryotic.eukaryota_odb10)
      • Added NCBIfam-AMRFinder marker set for annotation
      • Added AntiFam marker set for contamination
      • Marker set HMMs are now all gzipped (previously they could not be gzipped because of the CheckM CPR workflow)
  • Scripts:

    • Added:

      • append_geneid_to_transdecoder_gff.py
      • bowtie2_wrapper.py
      • compile_genomes_table.py
      • consensus_genome_classification_unranked.py
      • cut_table.py
      • cut_table_by_column_labels.py
      • drop_missing_values.py
      • edgelist_to_clusters.py
      • filter_checkm2_results.py
      • genomad_taxonomy_wrapper.py
      • global_clustering.py
      • local_clustering.py
      • partition_multisplit_bins.py
      • scaffolds_to_clusters.py
      • scaffolds_to_samples.py
      • transcripts_to_genes.py
      • transdecoder_wrapper.py (Note: Requires separate environment to run due to dependency conflicts)
    • Updated:

      • antismash_genbanks_to_table.py - Added option to output biosynthetic gene cluster (BGC) fasta. Adds unique (and parseable) BGC identifiers making the output much more useful.
      • binning_wrapper.py - This binning wrapper now includes functionality to use multi-split binning (i.e., concatenate contigs from different assemblies, map all reads to the contigs, bin them all together, and then partition bins by sample). This concept AFAIK was first introduced in the VAMB paper.
      • compile_reads_table.py - Minor change: the extension now excludes the leading . to make usage more consistent with other tools.
      • consensus_genome_classification.py - Changed the output to match that of consensus_genome_classification_unranked.py.
      • filter_checkv_results.py - Option to use taxonomy and viral summaries generated by geNomad.
      • scaffolds_to_bins.py - Support for getting scaffolds to bins for a list of genomes via --genomes argument while maintaining original support with --binning_directory argument.
      • subset_table.py - Added option to set index column and to drop duplicates.
      • virfinder_wrapper.r - Used to be VirFinder_wrapper.R. This now has an option to use FDR values instead of P values.
      • merge_annotations_and_score_taxonomy.py - Completely rewritten. Uses taxopy instead of ete3.
      • merge_msa.py - Output uncompressed protein fasta files by default and can compress with --gzip flag.
    • Deprecated:

      • adjust_genomes_for_cpr.py
      • filter_checkm_results.py
      • fastani_to_clusters.py
      • partition_orthogroups.py
      • partition_clusters.py
      • compile_viral_classifications.py
      • build_taxa_sqlite.py
  • Miscellaneous:

    • Updated environments, which now include package versions.
    • Added mamba to the installation to speed it up.
    • Added transdecoder_wrapper.py which is a wrapper around TransDecoder with direct support for Diamond and HMMSearch homology searches. Also includes append_geneid_to_transdecoder_gff.py which is run in the backend to clean up the GFF file and make them compatible with what is output by Prodigal and MetaEuk runs of VEBA.
    • Added support for using n_jobs -1 to use all available threads (similar to scikit-learn methodology).
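The FCR formula given for cluster.py, 1 - number_of_clusters/number_of_features, can be sketched as below. The global version pools every feature in the dataset; the local version assumes hypothetical (id_sample, cluster_label) pairs from per-sample clustering and computes one ratio per sample. Function names are illustrative, not VEBA's.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def feature_compression_ratio(cluster_labels: Iterable[str]) -> float:
    """FCR = 1 - number_of_clusters / number_of_features."""
    labels = list(cluster_labels)
    if not labels:
        raise ValueError("at least one feature is required")
    return 1.0 - len(set(labels)) / len(labels)


def local_feature_compression_ratios(
    records: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """One FCR per sample from (id_sample, cluster_label) pairs."""
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for id_sample, label in records:
        by_sample[id_sample].append(label)
    return {sample: feature_compression_ratio(labels) for sample, labels in by_sample.items()}


global_fcr = feature_compression_ratio(["PC1", "PC1", "PC2", "PC3"])
local_fcr = local_feature_compression_ratios(
    [("S1", "a"), ("S1", "a"), ("S2", "b"), ("S2", "c")]
)
```

An FCR near 0 means almost every feature forms its own cluster; values closer to 1 mean many features collapse into few clusters, i.e., more redundancy.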
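Because the biosynthetic.py component identifiers are pipe-delimited, a component_id can be split back into its parts. This sketch assumes exactly three pipes (no pipes inside genome or contig identifiers) and, as an assumption, accepts either start-end or start:end coordinates so it should also cover the [start]:[end]([strand]) form described for v1.3.0. parse_component_id is a hypothetical helper, not a VEBA function.

```python
import re
from typing import NamedTuple

# start, end, and strand, e.g. "2-2681(+)" or "2:2681(+)".
COORDINATES = re.compile(r"^(\d+)[-:](\d+)\(([+-])\)$")


class Component(NamedTuple):
    genome_id: str
    contig_id: str
    bgc_id: str
    position_in_bgc: int
    start: int
    end: int
    strand: str


def parse_component_id(component_id: str) -> Component:
    """Split genome_id|contig_id|region_position|coordinates into fields."""
    genome_id, contig_id, region_and_position, coordinates = component_id.split("|")
    # "region001_1" -> region "region001", gene position 1
    region_id, position = region_and_position.rsplit("_", 1)
    match = COORDINATES.match(coordinates)
    if match is None:
        raise ValueError(f"unrecognized coordinates: {coordinates!r}")
    start, end, strand = match.groups()
    return Component(
        genome_id=genome_id,
        contig_id=contig_id,
        bgc_id="|".join([genome_id, contig_id, region_id]),
        position_in_bgc=int(position),
        start=int(start),
        end=int(end),
        strand=strand,
    )


component = parse_component_id(
    "SRR17458614__CONCOCT__P.2__9|NODE_3319_length_2682_cov_2.840502|region001_1|2-2681(+)"
)
```

Splitting the region field with rsplit on the last underscore keeps region identifiers intact even if they contain underscores of their own.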

VEBA_v1.0.4

28 Dec 06:04
Release v1.0.4
  • Added biopython to VEBA-assembly_env, which is needed when running MEGAHIT because the scaffolds are rewritten; without it, an error was raised. aea51c3
  • Updated the Microeukaryotic protein database to exclude a few higher eukaryotes that were present, and changed the naming scheme to hash identifiers (from cat reference.faa | seqkit fx2tab -s -n > id_to_hash.tsv). Switched the database from FigShare to Zenodo. Uses database version VDB_v3, which has the updated microeukaryotic protein database (VDB-Microeukaryotic_v2). 0845ba6
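The hash-based naming step can be approximated in Python by pairing each full FASTA header with a hex digest of its sequence, similar in shape to the seqkit fx2tab -s -n output above. That seqkit's sequence hash is an MD5 digest of the sequence as written is our assumption; check against your seqkit version before relying on the values matching. read_fasta and sequence_hashes are hypothetical helpers.

```python
import hashlib
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def sequence_hashes(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, md5 hex digest of the sequence); case-sensitive."""
    for header, sequence in read_fasta(lines):
        yield header, hashlib.md5(sequence.encode()).hexdigest()
```

A useful property of hash identifiers is that identical sequences receive the same identifier, so exact duplicates across source proteomes collapse naturally.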

VEBA_v1.0.3e

14 Dec 18:47

If you have 1.0.3 ≤ version < 1.0.3e, you can update easily using Patch Fix #1

Release v1.0.3e
  • Patch fix for install_veba.sh where install/environments/VEBA-assembly_env.yml raised a compatibility error when creating the VEBA-assembly_env environment c2ab957
  • Patch fix for VirFinder_wrapper.R where __version__ = variable was throwing an R error when running binning-viral.py module. 19e8f38
  • Patch fix for filter_busco_results.py where an error arose that produced empty identifier_mapping.metaeuk.tsv subset tables. 359e4569
  • Patch fix for compile_metaeuk_identifiers.py where a Python error arose when duplicate gene identifiers were present. c248527
  • Patch fix for install_veba.sh where install/environments/VEBA-preprocess_env.yml raised a compatibility error when creating the VEBA-preprocess_env environment 8ed6eea

  • Added biosynthetic.py module which runs antiSMASH and converts genbank files to tabular format. 6c0ed82
  • Added megahit support for assembly.py module (not yet available in assembly-sequential.py). 6c0ed82
  • Changed -P/--spades_program to -P/--program for assembly.py. 6c0ed82
  • Replaced penultimate step in binning-prokaryotic.py to use adjust_genomes_for_cpr.py instead of the extremely long series of bash commands. This will make it easier to diagnose errors in this critical step. 6c0ed82
  • Added support for contig descriptions and added MAG identifier in fasta files in binning-eukaryotic.py. Now uses the metaeuk_wrapper.py script for the MetaEuk step. 6c0ed82
  • Added separate option of --run_metaplasmidspades for assembly-sequential.py instead of making it mandatory (now it just runs biosyntheticSPAdes and metaSPAdes by default). 6c0ed82
  • Added --use_mag_as_description in partition_gene_models.py script to include the MAG identifier in the contig description of the fasta header, which is the default in binning-prokaryotic.py. 6c0ed82
  • Added adjust_genomes_for_cpr.py script to more easily run and understand the CPR adjustment step of binning-prokaryotic.py. 6c0ed82
  • Added support for fasta header descriptions in binning-prokaryotic.py. 6c0ed82
  • Added functionality to replace_fasta_descriptions.py script to be able to use a string for replacing fasta headers in addition to the original functionality. 6c0ed82

VEBA_v1.0.2a

27 Oct 00:57
Release v1.0.2a

Not to be confused with v1.0.2 which is deprecated

  • Updated GTDB-Tk in VEBA-binning-prokaryotic_env from 1.x to 2.x (this version uses much less memory): f3507dd
  • Updated the GTDB-Tk database from R202 to R207_v2 to be compatible with GTDB-Tk v2.x: f3507dd
  • Updated the GRCh38 no-alt analysis set to T2T CHM13v2.0 for the default human reference: 5ccb4e2
  • Added an experimental amplicon.py module for short-read ASV detection via the DADA2 workflow of QIIME2: cd4ed2b
  • Added additional functionality to compile_reads_table.py to handle advanced parsing of samples from fastq directories while also maintaining support for parsing filenames from veba_output/preprocess: cd4ed2b
  • Added sra-tools to VEBA-preprocess_env: f3507dd
  • Fixed symlinks to scripts for install_veba.sh: d1fad03
  • Added missing CHECKM_DATA_PATH environment variable to VEBA-binning-prokaryotic_env and VEBA-classify_env: d1fad03
  • ⚠️ In this version, contigs/scaffolds cannot have descriptions in fasta header for prokaryotic binning (Fixed in versions after 2022.11.07)

Module Versions:

amplicon.py __version__ = "2022.10.24"
annotate.py __version__ = "2021.7.8"
assembly.py __version__ = "2022.03.25"
binning-eukaryotic.py __version__ = "2022.10.20"
binning-prokaryotic.py __version__ = "2022.10.25"
binning-viral.py __version__ = "2022.7.13"
classify-eukaryotic.py __version__ = "2022.7.8"
classify-prokaryotic.py __version__ = "2022.06.07"
classify-viral.py __version__ = "2022.7.13"
cluster.py __version__ = "2022.10.16"
coverage.py __version__ = "2022.06.03"
index.py __version__ = "2022.02.17"
mapping.py __version__ = "2022.8.17"
phylogeny.py __version__ = "2022.06.22"
preprocess.py __version__ = "2022.01.19"
scripts/append_geneid_to_prodigal_gff.py __version__ = "2021.06.19"
scripts/binning_wrapper.py __version__ = "2022.04.11"
scripts/build_taxa_sqlite.py __version__ = "2022.04.18"
scripts/check_scaffolds_to_bins.py __version__ = "2021.08.20"
scripts/compile_binning.py __version__ = "2022.03.23"
scripts/compile_eukaryotic_classifications.py __version__ = "2022.7.8"
scripts/compile_metaeuk_identifiers.py __version__ = "2022.03.18"
scripts/compile_reads_table.py __version__ = "2022.10.24"
scripts/compile_scaffold_identifiers.py __version__ = "2022.02.23"
scripts/compile_viral_classifications.py __version__ = "2022.03.08"
scripts/concatenate_dataframes.py __version__ = "2022.03.24"
scripts/concatenate_fasta.py __version__ = "2022.02.17"
scripts/concatenate_gff.py __version__ = "2022.02.17"
scripts/consensus_domain_classification.py __version__ = "2022.02.28"
scripts/consensus_genome_classification.py __version__ = "2022.7.13"
scripts/consensus_orthogroup_annotation.py __version__ = "2022.02.02"
scripts/determine_trim_position.py __version__ = "2022.8.11"
scripts/fasta_to_saf.py __version__ = "2021.04.04"
scripts/fasta_utility.py __version__ = "2021.07.31"
scripts/fastani_to_clusters.py __version__ = "2021.11.16"
scripts/fastq_position_statistics.py __version__ = "2022.10.24"
scripts/filter_busco_results.py __version__ = "2022.04.04"
scripts/filter_checkm_results.py __version__ = "2022.03.28"
scripts/filter_checkv_results.py __version__ = "2021.08.10"
scripts/filter_hmmsearch_results.py __version__ = "2021.06.16"
scripts/genome_coverage_from_spades.py __version__ = "2022.7.14"
scripts/genome_spatial_coverage.py __version__ = "2022.08.17"
scripts/groupby_table.py __version__ = "2022.08.17"
scripts/hmmer_to_proteins.py __version__ = "2021.08.03"
scripts/insert_column_to_table.py __version__ = "2022.03.24"
scripts/merge_annotations_and_score_taxonomy.py __version__ = "2021.08.25"
scripts/merge_busco_json.py __version__ = "2022.03.10"
scripts/merge_contig_mapping.py __version__ = "2022.06.27"
scripts/merge_fastq_statistics.py __version__ = "2022.03.08"
scripts/merge_gtdbtk.py __version__ = "2022.03.24"
scripts/merge_msa.py __version__ = "2022.06.21"
scripts/merge_orf_mapping.py __version__ = "2021.03.27"
scripts/metaeuk_wrapper.py __version__ = "2022.08.27"
scripts/partition_clusters.py __version__ = "2021.08.12"
scripts/partition_gene_models.py __version__ = "2021.08.24"
scripts/partition_hmmsearch.py __version__ = "2022.06.20"
scripts/partition_multisplit_bins.py __version__ = "2022.04.08"
scripts/partition_orthogroups.py __version__ = "2022.04.01"
scripts/partition_unbinned.py __version__ = "2021.08.05"
scripts/replace_fasta_descriptions.py __version__ = "2022.9.1"
scripts/scaffolds_to_bins.py __version__ = "2021.03.26"
scripts/subset_table.py __version__ = "2022.04.20"
scripts/subset_table_by_column.py __version__ = "2022.04.20"

VEBA_v1.0.1

21 Oct 00:42

Small patch fix:

  • Fixed the fatal binning-eukaryotic.py error: 7c5addf
  • Fixed the minor file naming in cluster.py: 5803845
  • Removes left-over human genome tar.gz during database download/config: 5803845
  • ⚠️ In this version, contigs/scaffolds cannot have descriptions in fasta header for prokaryotic binning (Fixed in versions after 2022.11.07)

Module Versions:

annotate.py	__version__ = "2021.7.8"
assembly.py	__version__ = "2022.03.25"
binning-eukaryotic.py	__version__ = "2022.10.20"
binning-prokaryotic.py	__version__ = "2022.7.8"
binning-viral.py	__version__ = "2022.7.13"
classify-eukaryotic.py	__version__ = "2022.7.8"
classify-prokaryotic.py	__version__ = "2022.06.07"
classify-viral.py	__version__ = "2022.7.13"
cluster.py	__version__ = "2022.10.16"
coverage.py	__version__ = "2022.06.03"
index.py	__version__ = "2022.02.17"
mapping.py	__version__ = "2022.8.17"
phylogeny.py	__version__ = "2022.06.22"
preprocess.py	__version__ = "2022.01.19"
scripts/append_geneid_to_prodigal_gff.py	__version__ = "2021.06.19"
scripts/binning_wrapper.py	__version__ = "2022.04.11"
scripts/build_taxa_sqlite.py	__version__ = "2022.04.18"
scripts/check_scaffolds_to_bins.py	__version__ = "2021.08.20"
scripts/compile_binning.py	__version__ = "2022.03.23"
scripts/compile_eukaryotic_classifications.py	__version__ = "2022.7.8"
scripts/compile_metaeuk_identifiers.py	__version__ = "2022.03.18"
scripts/compile_reads_table.py	__version__ = "2021.7.18"
scripts/compile_scaffold_identifiers.py	__version__ = "2022.02.23"
scripts/compile_viral_classifications.py	__version__ = "2022.03.08"
scripts/concatenate_dataframes.py	__version__ = "2022.03.24"
scripts/concatenate_fasta.py	__version__ = "2022.02.17"
scripts/concatenate_gff.py	__version__ = "2022.02.17"
scripts/consensus_domain_classification.py	__version__ = "2022.02.28"
scripts/consensus_genome_classification.py	__version__ = "2022.7.13"
scripts/consensus_orthogroup_annotation.py	__version__ = "2022.02.02"
scripts/fasta_to_saf.py	__version__ = "2021.04.04"
scripts/fasta_utility.py	__version__ = "2021.07.31"
scripts/fastani_to_clusters.py	__version__ = "2021.06.16"
scripts/filter_busco_results.py	__version__ = "2022.04.04"
scripts/filter_checkm_results.py	__version__ = "2022.03.28"
scripts/filter_checkv_results.py	__version__ = "2021.08.10"
scripts/filter_hmmsearch_results.py	__version__ = "2021.06.16"
scripts/genome_coverage_from_spades.py	__version__ = "2022.7.14"
scripts/genome_spatial_coverage.py	__version__ = "2022.08.17"
scripts/groupby_table.py	__version__ = "2022.08.17"
scripts/hmmer_to_proteins.py	__version__ = "2021.08.03"
scripts/insert_column_to_table.py	__version__ = "2022.03.24"
scripts/merge_annotations_and_score_taxonomy.py	__version__ = "2021.08.25"
scripts/merge_busco_json.py	__version__ = "2022.03.10"
scripts/merge_contig_mapping.py	__version__ = "2022.06.27"
scripts/merge_fastq_statistics.py	__version__ = "2022.03.08"
scripts/merge_gtdbtk.py	__version__ = "2022.03.24"
scripts/merge_msa.py	__version__ = "2022.06.21"
scripts/merge_orf_mapping.py	__version__ = "2021.03.27"
scripts/metaeuk_wrapper.py	__version__ = "2022.08.27"
scripts/partition_clusters.py	__version__ = "2021.08.12"
scripts/partition_gene_models.py	__version__ = "2021.08.24"
scripts/partition_hmmsearch.py	__version__ = "2022.06.20"
scripts/partition_multisplit_bins.py	__version__ = "2022.04.08"
scripts/partition_orthogroups.py	__version__ = "2022.04.01"
scripts/partition_unbinned.py	__version__ = "2021.08.05"
scripts/scaffolds_to_bins.py	__version__ = "2021.03.26"
scripts/subset_table.py	__version__ = "2022.04.20"
scripts/subset_table_by_column.py	__version__ = "2022.04.20"

VEBA_v1.0.0

18 Sep 22:46

Version released for manuscript submission.

  • ⚠️ In this version, contigs/scaffolds cannot have descriptions in their FASTA headers when running prokaryotic binning (fixed in versions after 2022.11.07).
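If you have to stay on this version, one possible workaround is to strip header descriptions before binning so that each header holds only the identifier. The sketch below is a hypothetical helper and not part of VEBA. It assumes the identifier is everything after `>` up to the first whitespace, and it does not check that identifiers are still unique once descriptions are removed.

```python
import io


def strip_fasta_descriptions(lines):
    """Yield FASTA lines with header descriptions removed (hypothetical helper).

    Header lines keep only the identifier: the text after '>' up to the
    first run of whitespace. Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            header = line[1:]
            # split() with no separator splits on spaces or tabs
            identifier = header.split(maxsplit=1)[0] if header.strip() else ""
            yield f">{identifier}\n"
        else:
            yield line


def clean_fasta(in_path, out_path):
    """Write a copy of in_path with header descriptions stripped."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(strip_fasta_descriptions(src))


if __name__ == "__main__":
    example = io.StringIO(
        ">contig_1 length=5000 cov=12.3\nACGT\n>contig_2\tsome description\nTTGA\n"
    )
    print("".join(strip_fasta_descriptions(example)), end="")
```

With real files, calling `clean_fasta("scaffolds.fasta", "scaffolds.clean.fasta")` (the file names are placeholders) produces the cleaned copy that you would pass to prokaryotic binning.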

Module Versions:

annotate.py	__version__ = "2021.7.8"
assembly.py	__version__ = "2022.03.25"
binning-eukaryotic.py	__version__ = "2022.7.8"
binning-prokaryotic.py	__version__ = "2022.7.8"
binning-viral.py	__version__ = "2022.7.13"
classify-eukaryotic.py	__version__ = "2022.7.8"
classify-prokaryotic.py	__version__ = "2022.06.07"
classify-viral.py	__version__ = "2022.7.13"
cluster.py	__version__ = "2022.06.04"
coverage.py	__version__ = "2022.06.03"
index.py	__version__ = "2022.02.17"
mapping.py	__version__ = "2022.8.17"
phylogeny.py	__version__ = "2022.06.22"
preprocess.py	__version__ = "2022.01.19"
scripts/append_geneid_to_prodigal_gff.py	__version__ = "2021.06.19"
scripts/binning_wrapper.py	__version__ = "2022.04.11"
scripts/build_taxa_sqlite.py	__version__ = "2022.04.18"
scripts/check_scaffolds_to_bins.py	__version__ = "2021.08.20"
scripts/compile_binning.py	__version__ = "2022.03.23"
scripts/compile_eukaryotic_classifications.py	__version__ = "2022.7.8"
scripts/compile_metaeuk_identifiers.py	__version__ = "2022.03.18"
scripts/compile_reads_table.py	__version__ = "2021.7.18"
scripts/compile_scaffold_identifiers.py	__version__ = "2022.02.23"
scripts/compile_viral_classifications.py	__version__ = "2022.03.08"
scripts/concatenate_dataframes.py	__version__ = "2022.03.24"
scripts/concatenate_fasta.py	__version__ = "2022.02.17"
scripts/concatenate_gff.py	__version__ = "2022.02.17"
scripts/consensus_domain_classification.py	__version__ = "2022.02.28"
scripts/consensus_genome_classification.py	__version__ = "2022.7.13"
scripts/consensus_orthogroup_annotation.py	__version__ = "2022.02.02"
scripts/fasta_to_saf.py	__version__ = "2021.04.04"
scripts/fasta_utility.py	__version__ = "2021.07.31"
scripts/fastani_to_clusters.py	__version__ = "2021.06.16"
scripts/filter_busco_results.py	__version__ = "2022.04.04"
scripts/filter_checkm_results.py	__version__ = "2022.03.28"
scripts/filter_checkv_results.py	__version__ = "2021.08.10"
scripts/filter_hmmsearch_results.py	__version__ = "2021.06.16"
scripts/genome_coverage_from_spades.py	__version__ = "2022.7.14"
scripts/genome_spatial_coverage.py	__version__ = "2022.08.17"
scripts/groupby_table.py	__version__ = "2022.08.17"
scripts/hmmer_to_proteins.py	__version__ = "2021.08.03"
scripts/insert_column_to_table.py	__version__ = "2022.03.24"
scripts/merge_annotations_and_score_taxonomy.py	__version__ = "2021.08.25"
scripts/merge_busco_json.py	__version__ = "2022.03.10"
scripts/merge_contig_mapping.py	__version__ = "2022.06.27"
scripts/merge_fastq_statistics.py	__version__ = "2022.03.08"
scripts/merge_gtdbtk.py	__version__ = "2022.03.24"
scripts/merge_msa.py	__version__ = "2022.06.21"
scripts/merge_orf_mapping.py	__version__ = "2021.03.27"
scripts/metaeuk_wrapper.py	__version__ = "2022.08.27"
scripts/partition_clusters.py	__version__ = "2021.08.12"
scripts/partition_gene_models.py	__version__ = "2021.08.24"
scripts/partition_hmmsearch.py	__version__ = "2022.06.20"
scripts/partition_multisplit_bins.py	__version__ = "2022.04.08"
scripts/partition_orthogroups.py	__version__ = "2022.04.01"
scripts/partition_unbinned.py	__version__ = "2021.08.05"
scripts/scaffolds_to_bins.py	__version__ = "2021.03.26"
scripts/subset_table.py	__version__ = "2022.04.20"
scripts/subset_table_by_column.py	__version__ = "2022.04.20"
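The `__version__` strings above are dates, but their zero-padding varies (`"2022.7.8"` vs. `"2022.03.25"`). Because of this, plain string comparison can give the wrong order: `"2022.7.8" > "2022.11.07"` evaluates to `True`. The sketch below parses lines in the `path<TAB>__version__ = "YYYY.M.D"` layout used in these lists and converts each version to a `datetime.date`, so that versions can be compared with the 2022.11.07 cutoff mentioned above. The regex and helper names are illustrative assumptions and are not taken from VEBA.

```python
import datetime
import re

# Matches lines such as: scripts/fasta_utility.py<TAB>__version__ = "2021.07.31"
VERSION_LINE = re.compile(r'^(?P<path>\S+)\s+__version__\s*=\s*"(?P<version>[\d.]+)"$')


def parse_version_date(version):
    """Convert a date-style version ('2022.7.8' or '2022.07.08') to a date."""
    year, month, day = (int(part) for part in version.split("."))
    return datetime.date(year, month, day)


def parse_version_table(text):
    """Return {path: date} for every line matching VERSION_LINE; other lines are skipped."""
    versions = {}
    for line in text.splitlines():
        match = VERSION_LINE.match(line.strip())
        if match:
            versions[match["path"]] = parse_version_date(match["version"])
    return versions


if __name__ == "__main__":
    table = (
        'binning-prokaryotic.py\t__version__ = "2022.7.8"\n'
        'mapping.py\t__version__ = "2022.8.17"\n'
        'assembly.py\t__version__ = "2022.03.25"\n'
    )
    cutoff = datetime.date(2022, 11, 7)
    for path, date in sorted(parse_version_table(table).items()):
        status = "after" if date > cutoff else "on or before"
        print(f"{path}: {date.isoformat()} ({status} 2022.11.07)")
```

Converting to dates normalizes both padding styles, so `"2022.7.8"` and `"2022.07.08"` compare as equal and all versions sort chronologically.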