diff --git a/.DS_Store b/.DS_Store
deleted file mode 100644
index 2acc3c3..0000000
Binary files a/.DS_Store and /dev/null differ
diff --git a/CHANGELOG.md b/CHANGELOG.md
index a00fb2e..7f1709c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,7 +6,73 @@
________________________________________________________________

#### Current Releases:

-**Release v1.3.0:**
+**Release v1.4.0 Highlights:**
+
+* **`VEBA` Modules:**
+
+    * Added `profile-taxonomic.py` module, which uses `sylph` to build a sketch database for genomes and queries it for taxonomic abundance.
+    * Added long-read support for `fastq_preprocessor`, `preprocess.py`, `assembly-long.py`, `coverage-long`, and all binning modules.
+    * Redesigned the `binning-eukaryotic` module to handle custom `MetaEuk` databases.
+    * Added new usage syntax `veba --module preprocess --params "${PARAMS}"`, where the Conda environment is abstracted and determined automatically in the backend (see the sketch below). Updated all the walkthroughs to reflect this change.
+    * Added `skani`, which is the new default for genome-level clustering based on ANI.
+    * Added `Diamond DeepClust` as an alternative to `MMSEQS2` for protein clustering.
+
+* **`VEBA` Database (`VDB_v6`)**:
+
+    * Completely rebuilt `VEBA`'s `Microeukaryotic Protein Database` to produce a clustered database, `MicroEuk100/90/50`, similar to `UniRef100/90/50`. Available at [doi:10.5281/zenodo.10139450](https://zenodo.org/records/10139451).
+
+        * **Number of sequences:**
+
+            * MicroEuk100 = 79,920,431 (19 GB)
+            * MicroEuk90 = 51,767,730 (13 GB)
+            * MicroEuk50 = 29,898,853 (6.5 GB)
+
+        * **Number of source organisms per dataset:**
+
+            * MycoCosm = 2,503
+            * PhycoCosm = 174
+            * EnsemblProtists = 233
+            * MMETSP = 759
+            * TARA_SAGv1 = 8
+            * EukProt = 366
+            * EukZoo = 27
+            * TARA_SMAGv1 = 389
+            * NR_Protists-Fungi = 48,217
+
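The abstracted invocation means users no longer need to know which Conda environment a given module lives in. Below is a minimal sketch of how such a dispatcher can resolve a module name to an environment and delegate via `conda run`; the `MODULE_TO_ENV` mapping and `run_module` helper are illustrative assumptions, not `VEBA`'s actual backend logic.

```python
# Hypothetical sketch of module-to-environment dispatch; the mapping and
# helper below are illustrative assumptions, not VEBA's implementation.
import shlex
import subprocess
import sys

MODULE_TO_ENV = {
    "preprocess": "VEBA-preprocess_env",      # assumed environment names;
    "assembly-long": "VEBA-assembly_env",     # only VEBA-profile_env is
    "profile-taxonomic": "VEBA-profile_env",  # named in this changelog
}

def run_module(module: str, params: str) -> int:
    # Resolve the Conda environment for the requested module, then run the
    # module script inside that environment.
    env = MODULE_TO_ENV.get(module)
    if env is None:
        sys.exit(f"Unknown module: {module}")
    command = ["conda", "run", "-n", env, f"{module}.py", *shlex.split(params)]
    return subprocess.call(command)

# Equivalent in spirit to: veba --module preprocess --params "${PARAMS}"
# run_module("preprocess", "-1 reads_1.fq.gz -2 reads_2.fq.gz -n sample_1")
```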
+**Release v1.4.0 Details**
+
+* [2023.12.15] - Added `profile-taxonomic.py` module, which uses `sylph` to build a sketch database for genomes and queries it, similar to `Kraken`, for taxonomic abundance.
+* [2023.12.14] - Removed the requirement to provide `--estimated_assembly_size` for Flye per [Flye Issue #652](https://github.com/fenderglass/Flye/issues/652).
+* [2023.12.14] - Added `sylph` to `VEBA-profile_env` for abundance profiling of genomes.
+* [2023.12.13] - Dereplicate duplicate contigs in `concatenate_fasta.py` (see the sketch below).
+* [2023.12.12] - Added `--reference_gzipped` to `index.py` and `mapping.py`; the new default assumes the reference fasta is not gzipped.
+* [2023.12.11] - Added `skani` as the new default for genome clustering in `cluster.py`, `global_clustering.py`, and `local_clustering.py`.
+* [2023.12.11] - Added support for long reads in `fastq_preprocessor`, `preprocess.py`, `assembly-long.py`, `coverage-long`, and all binning modules.
+* [2023.11.28] - Fixed the `annotations.protein_clusters.tsv.gz` output from `merge_annotations.py` that was added in the `v1.3.1` patch update.
+* [2023.11.14] - Added support for missing values in `compile_eukaryotic_classifications.py`.
+* [2023.11.13] - Added `--metaeuk_split_memory_limit` argument with an (experimental) default of `36G` in `binning-eukaryotic.py` and `eukaryotic_gene_modeling.py`.
+* [2023.11.10] - Added `--compressed 1` to `mmseqs createdb` in the `download_databases.sh` installation script.
+* [2023.11.10] - Added a check to `check_fasta_duplicates.py` and `clean_fasta.py` to make sure there are no `>` characters in fasta sequences, which can result from concatenating fasta files that are missing linebreaks.
+* [2023.11.10] - Added `Diamond DeepClust` to `clustering_wrapper.py`, `global/local_clustering.py`, and `cluster.py`. Changed `mmseqs2_wrapper.py` to `clustering_wrapper.py`. Changed `easy-cluster` and `easy-linclust` to `mmseqs-cluster` and `mmseqs-linclust`.
+* [2023.11.9] - Fixed viral quality in `merge_genome_quality_assessments.py`.
+* [2023.11.3] - Changed `consensus_genome_classification.py` to `consensus_genome_classification_ranked.py`. Also changed the default behavior to allow for missing taxonomic levels.
+* [2023.11.2] - Fixed `merge_annotations.py` causing a memory leak when creating the `annotations.protein_clusters.tsv.gz` output table. The formatting for empty sets and string lists still needs to be corrected.
+
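The [2023.12.13] dereplication and the [2023.11.10] `>` check above amount to streaming concatenation with an identifier filter. A minimal sketch of the idea, assuming dereplication by contig identifier; this is not the actual `concatenate_fasta.py` or `check_fasta_duplicates.py` code.

```python
# Hypothetical sketch: concatenate FASTA files while dropping contigs whose
# identifier was already written, and flag '>' characters inside sequence
# lines (a symptom of concatenating files that are missing linebreaks).
import sys

def concatenate_dereplicated(paths, out_handle=sys.stdout):
    seen = set()      # contig identifiers already written
    write = False     # whether the current record should be emitted
    for path in paths:
        with open(path) as fh:
            for line in fh:
                line = line.rstrip("\n")
                if line.startswith(">"):
                    identifier = line[1:].split()[0]
                    write = identifier not in seen
                    seen.add(identifier)
                    if write:
                        print(line, file=out_handle)
                elif ">" in line:
                    raise ValueError(f"'>' found in a sequence line of {path}")
                elif write:
                    print(line, file=out_handle)

# concatenate_dereplicated(["sample_1/scaffolds.fasta", "sample_2/scaffolds.fasta"])
```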
+
+**Release v1.3.0 Highlights:**

* **`VEBA` Modules:**

    * Added `profile-pathway.py` module and associated scripts for building `HUMAnN` databases from *de novo* genomes and annotations. Essentially, a reads-based functional profiling method via `HUMAnN` using binned genomes as the database.
@@ -139,6 +205,7 @@
________________________________________________________________

**Release v1.1.0 Details**

* **Modules**:
+
    * `annotate.py`
        * Added `NCBIfam-AMRFinder` AMR domain annotations
        * Added `AntiFam` contamination annotations
@@ -238,6 +305,7 @@
________________________________________________________________

    * `build_taxa_sqlite.py`
* **Miscellaneous**:
+
    * Updated environments and now add versions to environments.
    * Added `mamba` to the installation to speed it up.
    * Added `transdecoder_wrapper.py`, which is a wrapper around `TransDecoder` with direct support for `Diamond` and `HMMSearch` homology searches. Also includes `append_geneid_to_transdecoder_gff.py`, which is run in the backend to clean up the GFF files and make them compatible with what is output by `Prodigal` and `MetaEuk` runs of `VEBA`.
@@ -317,6 +385,8 @@
________________________________________________________________

**Critical:**

+* `binning-prokaryotic.py` doesn't produce an `unbinned.fasta` file for long reads if there aren't any genomes. It also creates a symlink called `genomes` in the working directory.
+* Add a way to show all versions
* Genome checkpoints in `tRNAscan-SE` aren't working properly.
* Dereplicate CDS sequences in the GFF from `MetaEuk` so `antiSMASH` works for eukaryotic genomes
* Error with `amplicon.py` that works when run manually...
@@ -329,39 +399,58 @@
There was a problem importing veba_output/misc/reads_table.tsv:

**Definitely:**

+* Use `pigz` instead of `gzip` (see the sketch after these lists)
+* Create a taxdump for `MicroEuk`
+* Reimplement `compile_eukaryotic_classifications.py`
* Add representative to `identifier_mapping.proteins.tsv.gz`
-* Add coding density to GFF files
* Split `download_databases.sh` into `download_databases.sh` (low memory, high threads) and `configure_databases.sh` (high memory, low-to-mid threads). Use `aria2` in parallel instead of `wget`.
* `NextFlow` support
-* Consistent usage of the following terms: 1) dataframe vs. table; 2) protein-cluster vs. orthogroup.
-* Add support for `FAMSA` in `phylogeny.py`
-* Create a `assembly-longreads.py` module that uses `MetaFlye`
-* Expand Microeukaryotic Protein Database to include more microeukaryotes (`Mycocosm` and `PhycoCosm` from `JGI`)
* Install each module via `bioconda`
* Add support for `Salmon` in `mapping.py` and `index.py`. This can be used instead of `STAR`, which would require adding the `exon` field to the `Prodigal` GFF file (`MetaEuk` modified GFF files already have exon ids).

-**Probably (Yes)?:**
+**Eventually (Yes)?:**

+* Don't load all genomes, proteins, and CDS into memory for clustering.
+* Add support for `FAMSA` in `phylogeny.py`
+* Consistent usage of the following terms: 1) dataframe vs. table; 2) protein-cluster vs. orthogroup.
+* Add coding density to GFF files
+* Add `vRhyme` to `binning_wrapper.py` and support `vRhyme` in `binning-viral.py`.
+* Phylogenetic tree of `MicroEuk100`
* Convert HMMs to `MMSEQS2` (https://github.com/soedinglab/MMseqs2/wiki#how-to-create-a-target-profile-database-from-pfam)?
* Run `cmsearch` before `tRNAscan-SE`
* dN/dS from pangenome analysis
* Add [iPHoP](https://bitbucket.org/srouxjgi/iphop/src/main/) to `binning-viral.py`.
* Add a `metabolic.py` module
* Swap [`TransDecoder`](https://github.com/TransDecoder/TransDecoder) for [`TranSuite`](https://github.com/anonconda/TranSuite)
-* Build a clustered version of the Microeukaryotic Protein Database that is more efficient to run. Similar to UniRef100, UniRef90, UniRef50.
+* For viral binning, use contigs that are not identified as viral via `geNomad -> CheckV` with `vRhyme`.

**...Maybe (Not)?**

* Modify the behavior of `annotate.py` to allow skipping Pfam and/or KOFAM annotations since they take a long time.
-
________________________________________________________________
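For the `pigz` item in the **Definitely** list above, the swap is mechanical: prefer the parallel compressor when it is available and fall back to `gzip`. A small sketch under that assumption; the `compress_file` helper is hypothetical, not part of `VEBA`.

```python
# Hypothetical helper: use parallel pigz when it is on PATH, otherwise fall
# back to single-threaded gzip. Both tools compress the file in place,
# replacing it with a .gz version.
import shutil
import subprocess

def compress_file(path: str, threads: int = 4) -> None:
    pigz = shutil.which("pigz")
    if pigz:
        subprocess.run([pigz, "-p", str(threads), path], check=True)
    else:
        subprocess.run(["gzip", path], check=True)

# compress_file("veba_output/assembly/sample_1/scaffolds.fasta", threads=8)
```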
**Daily Change Log:**

+* [2023.12.15] - Added `profile-taxonomic.py` module, which uses `sylph` to build a sketch database for genomes and queries it, similar to `Kraken`, for taxonomic abundance.
+* [2023.12.14] - Removed the requirement to provide `--estimated_assembly_size` for Flye per [Flye Issue #652](https://github.com/fenderglass/Flye/issues/652).
+* [2023.12.14] - Added `sylph` to `VEBA-profile_env` for abundance profiling of genomes.
+* [2023.12.13] - Dereplicate duplicate contigs in `concatenate_fasta.py`.
+* [2023.12.12] - Added `--reference_gzipped` to `index.py` and `mapping.py`; the new default assumes the reference fasta is not gzipped.
+* [2023.12.11] - Added `skani` as the new default for genome clustering in `cluster.py`, `global_clustering.py`, and `local_clustering.py`.
+* [2023.12.11] - Added support for long reads in `fastq_preprocessor`, `preprocess.py`, `assembly-long.py`, `coverage-long`, and all binning modules.
+* [2023.11.28] - Fixed the `annotations.protein_clusters.tsv.gz` output from `merge_annotations.py` that was added in the `v1.3.1` patch update.
+* [2023.11.14] - Added support for missing values in `compile_eukaryotic_classifications.py`.
+* [2023.11.13] - Added `--metaeuk_split_memory_limit` argument with an (experimental) default of `36G` in `binning-eukaryotic.py` and `eukaryotic_gene_modeling.py`.
+* [2023.11.10] - Added `--compressed 1` to `mmseqs createdb` in the `download_databases.sh` installation script.
+* [2023.11.10] - Added a check to `check_fasta_duplicates.py` and `clean_fasta.py` to make sure there are no `>` characters in fasta sequences, which can result from concatenating fasta files that are missing linebreaks.
+* [2023.11.10] - Added `Diamond DeepClust` to `clustering_wrapper.py`, `global/local_clustering.py`, and `cluster.py`. Changed `mmseqs2_wrapper.py` to `clustering_wrapper.py`. Changed `easy-cluster` and `easy-linclust` to `mmseqs-cluster` and `mmseqs-linclust`.
+* [2023.11.9] - Fixed viral quality in `merge_genome_quality_assessments.py`.
+* [2023.11.3] - Changed `consensus_genome_classification.py` to `consensus_genome_classification_ranked.py`. Also changed the default behavior to allow for missing taxonomic levels.
+* [2023.11.2] - Fixed `merge_annotations.py` causing a memory leak when creating the `annotations.protein_clusters.tsv.gz` output table. The formatting for empty sets and string lists still needs to be corrected.
* [2023.10.27] - Updated `annotate.py` and `merge_annotations.py` to handle `CAZy`. They also properly address clustered protein annotations now.
* [2023.10.18] - Added `module_completion_ratio.py` script, which is a fork of `MicrobeAnnotator`'s [`ko_mapper.py`](https://github.com/cruizperez/MicrobeAnnotator/blob/master/microbeannotator/pipeline/ko_mapper.py). Also included a database [Zenodo: 10020074](https://zenodo.org/records/10020074), which will be included in `VDB_v5.2` (see the sketch below).
* [2023.10.16] - Added a checkpoint for `tRNAscan-SE` in `binning-prokaryotic.py` and `eukaryotic_gene_modeling_wrapper.py`.
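As a rough illustration of what a module completion ratio measures, here is a simplified sketch scored against the parsed KEGG module steps built by the scripts under `data/MicrobeAnnotator_KEGG` below. It is not `module_completion_ratio.py` or `ko_mapper.py` themselves, and it ignores the `_` and nested `,` constructs that the real parser handles.

```python
# Simplified sketch: score a parsed KEGG module against the KO identifiers
# detected in a genome. Each step is a list of alternatives; '+' joins
# required subunits of a complex, and components after '-' are treated as
# optional and dropped here.
def step_is_satisfied(alternatives, detected_kos):
    for alternative in alternatives:
        required = alternative.split("-")[0].split("+")
        if all(ko in detected_kos for ko in required):
            return True
    return False

def module_completion_ratio(parsed_steps, detected_kos):
    satisfied = sum(
        step_is_satisfied(alternatives, detected_kos)
        for alternatives in parsed_steps.values()
    )
    return satisfied / len(parsed_steps)

# Step structure mirrors the parsed entries in 05.Modules_Parsed.txt, e.g.
# the first two steps of M00010: (K01647,K05942) (K01681,K01682)
# steps = {1: ["K01647", "K05942"], 2: ["K01681", "K01682"]}
# module_completion_ratio(steps, {"K01647", "K01682"})  # -> 1.0
```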
diff --git a/MODULE_RESOURCES.xlsx b/MODULE_RESOURCES.xlsx
new file mode 100644
index 0000000..bf99e33
Binary files /dev/null and b/MODULE_RESOURCES.xlsx differ
diff --git a/SOURCES.xlsx b/SOURCES.xlsx
index 33478b5..b0f60d3 100644
Binary files a/SOURCES.xlsx and b/SOURCES.xlsx differ
diff --git a/VERSION b/VERSION
index 0a7e926..a0fef3f 100644
--- a/VERSION
+++ b/VERSION
@@ -1,2 +1,2 @@
-1.3.0
-VDB_v5.2
+1.4.0b
+VDB_v6
diff --git a/data/MicrobeAnnotator_KEGG/01.KEGG_DB/00.KEGG_Data_Scrapper.py b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/00.KEGG_Data_Scrapper.py
new file mode 100644
index 0000000..e1f5f8f
--- /dev/null
+++ b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/00.KEGG_Data_Scrapper.py
@@ -0,0 +1,165 @@
+from bs4 import BeautifulSoup
+import re
+import pickle
+import ast
+import requests
+
+
+"""Script to download and parse KEGG module information and store it in data files."""
+
+def download_kegg_modules(module_name_file, chrome_driver):
+    # Note: chrome_driver is unused; it appears to be left over from an
+    # earlier browser-driven version of this scraper.
+    module_ids = []
+    module_names = {}
+    module_components_raw = {}
+    # Parse module names
+    with open(module_name_file) as module_input:
+        for line in module_input:
+            line = line.strip().split("\t")
+            module_ids.append(line[0])
+            module_names[line[0]] = line[1]
+    # Access KEGG and download module information
+    for identifier in module_ids:
+        url = "https://www.kegg.jp/kegg-bin/show_module?" + identifier
+        site_request = requests.get(url)
+        soup = BeautifulSoup(site_request.text, "html.parser")
+        module_definition = ""
+        module_definition_bool = False
+        definition = soup.find(class_="definition")
+        # The definition string is on the first non-empty line after the
+        # line reading "Definition".
+        for line in definition.text.splitlines():
+            if line.strip() == "":
+                continue
+            elif module_definition_bool:
+                module_definition = line.strip()
+                module_definition_bool = False
+            elif line.strip() == "Definition":
+                module_definition_bool = True
+        print(module_definition)
+        module_components_raw[identifier] = module_definition
+    return module_components_raw
+
+
+def parse_regular_module_dictionary(bifurcating_list_file, structural_list_file, module_components_raw):
+    bifurcating_list = []
+    structural_list = []
+    # Populate bifurcating and structural lists
+    with open(bifurcating_list_file, 'r') as bif_list:
+        for line in bif_list:
+            bifurcating_list.append(line.strip())
+    with open(structural_list_file, 'r') as struct_list:
+        for line in struct_list:
+            structural_list.append(line.strip())
+    # Parse raw module information; bifurcating and structural modules are
+    # curated separately and skipped here.
+    module_steps_parsed = {}
+    for key, values in module_components_raw.items():
+        values = values.replace(" --", "")
+        values = values.replace("-- ", "")
+        if key in bifurcating_list or key in structural_list:
+            continue
+        else:
+            # Replace spaces inside parentheses with underscores so that a
+            # whitespace split yields one token per step, e.g.
+            # "(K00844,K12407) K01810" -> ["(K00844,K12407)", "K01810"].
+            module = []
+            parenthesis_count = 0
+            for character in values:
+                if character == "(":
+                    parenthesis_count += 1
+                    module.append(character)
+                elif character == " ":
+                    if parenthesis_count == 0:
+                        module.append(character)
+                    else:
+                        module.append("_")
+                elif character == ")":
+                    parenthesis_count -= 1
+                    module.append(character)
+                else:
+                    module.append(character)
+            steps = ''.join(module).split()
+            module_steps_parsed[key] = steps
+    # Remove modules depending on other modules
+    temporal_dictionary = module_steps_parsed.copy()
+    for key, values in temporal_dictionary.items():
+        for value in values:
+            if re.search(r'M[0-9]{5}', value) is not None:
+                del module_steps_parsed[key]
+                break
+    return module_steps_parsed
+
+
+def create_final_regular_dictionary(module_steps_parsed, module_components_raw, outfile):
+    final_regular_dict = {}
+    # Parse module steps and export them into a text file
+    with open(outfile, 'w') as output:
+        for key, value in module_steps_parsed.items():
+            output.write("{}\n".format(key))
+            output.write("{}\n".format(module_components_raw[key]))
+            output.write("{}\n".format(value))
+            output.write("{}\n".format("=="))
+            final_regular_dict[key] = {}
+            step_number = 0
+            for step in value:
+                step_number += 1
+                count = 0
+                options = 0
+                temp_string = ""
+                for char in step:
+                    if char == "(":
+                        count += 1
+                        options += 1
+                        # '%' marks a parenthesis that follows a '-' so the
+                        # optional group survives the cleanup pass below.
+                        if len(temp_string) > 1 and temp_string[-1] == "-":
+                            temp_string += "%"
+                    elif char == ")":
+                        count -= 1
+                        if count >= 1:
+                            temp_string += char
+                        else:
+                            continue
+                    elif char == ",":
+                        if count >= 2:
+                            temp_string += char
+                            print(step)
+                        else:
+                            temp_string += " "
+                    else:
+                        temp_string += char
+                if options >= 2:
+                    temp_string = temp_string.replace(")_", "_")
+                    if re.search(r'%.*\)', temp_string) is None:
+                        temp_string = temp_string.replace(")", "")
+                temp_string = "".join(temp_string.rsplit("__", 1))
+                temp_string = temp_string.split()
+                if isinstance(temp_string, str):
+                    temp_string = temp_string.split()
+                temp_string = sorted(temp_string, key=len)
+                final_regular_dict[key][step_number] = temp_string
+                output.write("{}\n".format(temp_string))
+            output.write("{}\n".format("++++++++++++++++++"))
+    return final_regular_dict
+
+
+def export_module_dictionary(dictionary, location):
+    with open(location, "wb") as pickle_out:
+        pickle.dump(dictionary, pickle_out)
+
+
+def transform_module_dictionaries(bifurcating_data, structural_data, output_bifur, output_struct):
+    with open(bifurcating_data) as bif_handle:
+        bifurcating_dictionary = ast.literal_eval(bif_handle.read())
+    export_module_dictionary(bifurcating_dictionary, output_bifur)
+    with open(structural_data) as struct_handle:
+        structural_dictionary = ast.literal_eval(struct_handle.read())
+    export_module_dictionary(structural_dictionary, output_struct)
+
+
+# Execute parsing functions
+
+module_components_raw = download_kegg_modules("00.Module_Names.txt", 'chromedriver')
+module_steps_parsed = parse_regular_module_dictionary("01.Bifurcating_List.txt",
+                                                      "02.Structural_List.txt", module_components_raw)
+final_regular_dict = create_final_regular_dictionary(module_steps_parsed, module_components_raw, "05.Modules_Parsed.txt")
+
+
+export_module_dictionary(final_regular_dict, "../01.KEGG_Regular_Module_Information.pickle")
+transform_module_dictionaries("03.Bifurcating_Modules.dict",
+                              "04.Structural_Modules.dict",
+                              "../02.KEGG_Bifurcating_Module_Information.pickle",
+                              "../03.KEGG_Structural_Module_Information.pickle")
\ No newline at end of file
diff --git a/data/MicrobeAnnotator_KEGG/01.KEGG_DB/00.Module_Names.txt b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/00.Module_Names.txt
new file mode 100644
index 0000000..db9ec87
--- /dev/null
+++ b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/00.Module_Names.txt
@@ -0,0 +1,394 @@
+M00015 Proline biosynthesis, glutamate => proline Arginine and proline metabolism #8a3222
+M00028 Ornithine biosynthesis, glutamate => ornithine Arginine and proline metabolism #8a3222
+M00029 Urea cycle Arginine and proline metabolism #8a3222
+M00047 Creatine pathway Arginine and proline metabolism #8a3222
+M00763 Ornithine biosynthesis, mediated by LysW, glutamate => ornithine Arginine and proline metabolism #8a3222
+M00844 Arginine biosynthesis, ornithine => arginine Arginine and proline metabolism #8a3222
+M00845 Arginine biosynthesis, glutamate => acetylcitrulline => arginine Arginine and proline metabolism #8a3222
+M00879 Arginine succinyltransferase pathway, arginine => glutamate Arginine and proline metabolism #8a3222
+M00022 Shikimate pathway,
phosphoenolpyruvate + erythrose-4P => chorismate Aromatic amino acid metabolism #8641b6 +M00023 Tryptophan biosynthesis, chorismate => tryptophan Aromatic amino acid metabolism #8641b6 +M00024 Phenylalanine biosynthesis, chorismate => phenylalanine Aromatic amino acid metabolism #8641b6 +M00025 Tyrosine biosynthesis, chorismate => tyrosine Aromatic amino acid metabolism #8641b6 +M00037 Melatonin biosynthesis, tryptophan => serotonin => melatonin Aromatic amino acid metabolism #8641b6 +M00038 Tryptophan metabolism, tryptophan => kynurenine => 2-aminomuconate Aromatic amino acid metabolism #8641b6 +M00040 Tyrosine biosynthesis, prephanate => pretyrosine => tyrosine Aromatic amino acid metabolism #8641b6 +M00042 Catecholamine biosynthesis, tyrosine => dopamine => noradrenaline => adrenaline Aromatic amino acid metabolism #8641b6 +M00043 Thyroid hormone biosynthesis, tyrosine => triiodothyronine--thyroxine Aromatic amino acid metabolism #8641b6 +M00044 Tyrosine degradation, tyrosine => homogentisate Aromatic amino acid metabolism #8641b6 +M00533 Homoprotocatechuate degradation, homoprotocatechuate => 2-oxohept-3-enedioate Aromatic amino acid metabolism #8641b6 +M00545 Trans-cinnamate degradation, trans-cinnamate => acetyl-CoA Aromatic amino acid metabolism #8641b6 +M00418 Toluene degradation, anaerobic, toluene => benzoyl-CoA Aromatics degradation #76d25b +M00419 Cymene degradation, p-cymene => p-cumate Aromatics degradation #76d25b +M00534 Naphthalene degradation, naphthalene => salicylate Aromatics degradation #76d25b +M00537 Xylene degradation, xylene => methylbenzoate Aromatics degradation #76d25b +M00538 Toluene degradation, toluene => benzoate Aromatics degradation #76d25b +M00539 Cumate degradation, p-cumate => 2-oxopent-4-enoate + 2-methylpropanoate Aromatics degradation #76d25b +M00540 Benzoate degradation, cyclohexanecarboxylic acid =>pimeloyl-CoA Aromatics degradation #76d25b +M00541 Benzoyl-CoA degradation, benzoyl-CoA => 3-hydroxypimeloyl-CoA Aromatics degradation #76d25b +M00543 Biphenyl degradation, biphenyl => 2-oxopent-4-enoate + benzoate Aromatics degradation #76d25b +M00544 Carbazole degradation, carbazole => 2-oxopent-4-enoate + anthranilate Aromatics degradation #76d25b +M00547 Benzene--toluene degradation, benzene => catechol -- toluene => 3-methylcatechol Aromatics degradation #76d25b +M00548 Benzene degradation, benzene => catechol Aromatics degradation #76d25b +M00551 Benzoate degradation, benzoate => catechol -- methylbenzoate => methylcatechol Aromatics degradation #76d25b +M00568 Catechol ortho-cleavage, catechol => 3-oxoadipate Aromatics degradation #76d25b +M00569 Catechol meta-cleavage, catechol => acetyl-CoA -- 4-methylcatechol => propanoyl-CoA Aromatics degradation #76d25b +M00623 Phthalate degradation 1, phthalate => protocatechuate Aromatics degradation #76d25b +M00624 Terephthalate degradation, terephthalate => 3,4-dihydroxybenzoate Aromatics degradation #76d25b +M00636 Phthalate degradation 2, phthalate => protocatechuate Aromatics degradation #76d25b +M00637 Anthranilate degradation, anthranilate => catechol Aromatics degradation #76d25b +M00638 Salicylate degradation, salicylate => gentisate Aromatics degradation #76d25b +M00878 Phenylacetate degradation, phenylaxetate => acetyl-CoA--succinyl-CoA Aromatics degradation #76d25b +M00142 NADH:ubiquinone oxidoreductase, mitochondria ATP synthesis #cdd346 +M00143 NADH dehydrogenase (ubiquinone) Fe-S protein--flavoprotein complex, mitochondria ATP synthesis #cdd346 +M00144 NADH:quinone oxidoreductase, 
prokaryotes ATP synthesis #cdd346 +M00145 NAD(P)H:quinone oxidoreductase, chloroplasts and cyanobacteria ATP synthesis #cdd346 +M00146 NADH dehydrogenase (ubiquinone) 1 alpha subcomplex ATP synthesis #cdd346 +M00147 NADH dehydrogenase (ubiquinone) 1 beta subcomplex ATP synthesis #cdd346 +M00148 Succinate dehydrogenase (ubiquinone) ATP synthesis #cdd346 +M00149 Succinate dehydrogenase, prokaryotes ATP synthesis #cdd346 +M00150 Fumarate reductase, prokaryotes ATP synthesis #cdd346 +M00151 Cytochrome bc1 complex respiratory unit ATP synthesis #cdd346 +M00152 Cytochrome bc1 complex ATP synthesis #cdd346 +M00153 Cytochrome bd ubiquinol oxidase ATP synthesis #cdd346 +M00154 Cytochrome c oxidase ATP synthesis #cdd346 +M00155 Cytochrome c oxidase, prokaryotes ATP synthesis #cdd346 +M00156 Cytochrome c oxidase, cbb3-type ATP synthesis #cdd346 +M00157 F-type ATPase, prokaryotes and chloroplasts ATP synthesis #cdd346 +M00158 F-type ATPase, eukaryotes ATP synthesis #cdd346 +M00159 V-type ATPase, prokaryotes ATP synthesis #cdd346 +M00160 V-type ATPase, eukaryotes ATP synthesis #cdd346 +M00162 Cytochrome b6f complex ATP synthesis #cdd346 +M00416 Cytochrome aa3-600 menaquinol oxidase ATP synthesis #cdd346 +M00417 Cytochrome o ubiquinol oxidase ATP synthesis #cdd346 +M00672 Penicillin biosynthesis, aminoadipate + cycteine + valine => penicillin Beta-Lactam biosynthesis #3b2882 +M00673 Cephamycin C biosynthesis, aminoadipate + cycteine + valine => cephamycin C Beta-Lactam biosynthesis #3b2882 +M00674 Clavaminate biosynthesis, arginine + glyceraldehyde-3P => clavaminate Beta-Lactam biosynthesis #3b2882 +M00675 Carbapenem-3-carboxylate biosynthesis, pyrroline-5-carboxylate + malonyl-CoA => carbapenem-3-carboxylate Beta-Lactam biosynthesis #3b2882 +M00736 Nocardicin A biosynthesis, L-pHPG + arginine + serine => nocardicin A Beta-Lactam biosynthesis #3b2882 +M00039 Monolignol biosynthesis, phenylalanine--tyrosine => monolignol Biosynthesis of other secondary metabolites #cbde82 +M00137 Flavanone biosynthesis, phenylalanine => naringenin Biosynthesis of other secondary metabolites #cbde82 +M00138 Flavonoid biosynthesis, naringenin => pelargonidin Biosynthesis of other secondary metabolites #cbde82 +M00370 Glucosinolate biosynthesis, tryptophan => glucobrassicin Biosynthesis of other secondary metabolites #cbde82 +M00661 Paspaline biosynthesis, geranylgeranyl-PP + indoleglycerol phosphate => paspaline Biosynthesis of other secondary metabolites #cbde82 +M00785 Cycloserine biosynthesis, arginine--serine => cycloserine Biosynthesis of other secondary metabolites #cbde82 +M00786 Fumitremorgin alkaloid biosynthesis, tryptophan + proline => fumitremorgin C--A Biosynthesis of other secondary metabolites #cbde82 +M00787 Bacilysin biosynthesis, prephenate => bacilysin Biosynthesis of other secondary metabolites #cbde82 +M00788 Terpentecin biosynthesis, GGAP => terpentecin Biosynthesis of other secondary metabolites #cbde82 +M00789 Rebeccamycin biosynthesis, tryptophan => rebeccamycin Biosynthesis of other secondary metabolites #cbde82 +M00790 Pyrrolnitrin biosynthesis, tryptophan => pyrrolnitrin Biosynthesis of other secondary metabolites #cbde82 +M00805 Staurosporine biosynthesis, tryptophan => staurosporine Biosynthesis of other secondary metabolites #cbde82 +M00808 Violacein biosynthesis, tryptophan => violacein Biosynthesis of other secondary metabolites #cbde82 +M00814 Acarbose biosynthesis, sedoheptulopyranose-7P => acarbose Biosynthesis of other secondary metabolites #cbde82 +M00815 Validamycin A biosynthesis, 
sedoheptulopyranose-7P => validamycin A Biosynthesis of other secondary metabolites #cbde82 +M00819 Pentalenolactone biosynthesis, farnesyl-PP => pentalenolactone Biosynthesis of other secondary metabolites #cbde82 +M00835 Pyocyanine biosynthesis, chorismate => pyocyanine Biosynthesis of other secondary metabolites #cbde82 +M00837 Prodigiosin biosynthesis, L-proline => prodigiosin Biosynthesis of other secondary metabolites #cbde82 +M00838 Undecylprodigiosin biosynthesis, L-proline => undecylprodigiosin Biosynthesis of other secondary metabolites #cbde82 +M00848 Aurachin biosynthesis, anthranilate => aurachin A Biosynthesis of other secondary metabolites #cbde82 +M00875 Staphyloferrin B biosynthesis, L-serine => staphyloferrin B Biosynthesis of other secondary metabolites #cbde82 +M00876 Staphyloferrin A biosynthesis, L-ornithine => staphyloferrin A Biosynthesis of other secondary metabolites #cbde82 +M00877 Kanosamine biosynthesis glucose 6-phosphate => kanosamine Biosynthesis of other secondary metabolites #cbde82 +M00019 Valine--isoleucine biosynthesis, pyruvate => valine -- 2-oxobutanoate => isoleucine Branched-chain amino acid metabolism #656cdb +M00036 Leucine degradation, leucine => acetoacetate + acetyl-CoA Branched-chain amino acid metabolism #656cdb +M00432 Leucine biosynthesis, 2-oxoisovalerate => 2-oxoisocaproate Branched-chain amino acid metabolism #656cdb +M00535 Isoleucine biosynthesis, pyruvate => 2-oxobutanoate Branched-chain amino acid metabolism #656cdb +M00570 Isoleucine biosynthesis, threonine => 2-oxobutanoate => isoleucine Branched-chain amino acid metabolism #656cdb +M00165 Reductive pentose phosphate cycle (Calvin cycle) Carbon fixation #408937 +M00166 Reductive pentose phosphate cycle, ribulose-5P => glyceraldehyde-3P Carbon fixation #408937 +M00167 Reductive pentose phosphate cycle, glyceraldehyde-3P => ribulose-5P Carbon fixation #408937 +M00168 CAM (Crassulacean acid metabolism), dark Carbon fixation #408937 +M00169 CAM (Crassulacean acid metabolism), light Carbon fixation #408937 +M00170 C4-dicarboxylic acid cycle, phosphoenolpyruvate carboxykinase type Carbon fixation #408937 +M00171 C4-dicarboxylic acid cycle, NAD - malic enzyme type Carbon fixation #408937 +M00172 C4-dicarboxylic acid cycle, NADP - malic enzyme type Carbon fixation #408937 +M00173 Reductive citrate cycle (Arnon-Buchanan cycle) Carbon fixation #408937 +M00374 Dicarboxylate-hydroxybutyrate cycle Carbon fixation #408937 +M00375 Hydroxypropionate-hydroxybutylate cycle Carbon fixation #408937 +M00376 3-Hydroxypropionate bi-cycle Carbon fixation #408937 +M00377 Reductive acetyl-CoA pathway (Wood-Ljungdahl pathway) Carbon fixation #408937 +M00579 Phosphate acetyltransferase-acetate kinase pathway, acetyl-CoA => acetate Carbon fixation #408937 +M00620 Incomplete reductive citrate cycle, acetyl-CoA => oxoglutarate Carbon fixation #408937 +M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate Central carbohydrate metabolism #c644a5 +M00002 Glycolysis, core module involving three-carbon compounds Central carbohydrate metabolism #c644a5 +M00003 Gluconeogenesis, oxaloacetate => fructose-6P Central carbohydrate metabolism #c644a5 +M00004 Pentose phosphate pathway (Pentose phosphate cycle) Central carbohydrate metabolism #c644a5 +M00005 PRPP biosynthesis, ribose 5P => PRPP Central carbohydrate metabolism #c644a5 +M00006 Pentose phosphate pathway, oxidative phase, glucose 6P => ribulose 5P Central carbohydrate metabolism #c644a5 +M00007 Pentose phosphate pathway, non-oxidative phase, fructose 6P 
=> ribose 5P Central carbohydrate metabolism #c644a5 +M00008 Entner-Doudoroff pathway, glucose-6P => glyceraldehyde-3P + pyruvate Central carbohydrate metabolism #c644a5 +M00009 Citrate cycle (TCA cycle, Krebs cycle) Central carbohydrate metabolism #c644a5 +M00010 Citrate cycle, first carbon oxidation, oxaloacetate => 2-oxoglutarate Central carbohydrate metabolism #c644a5 +M00011 Citrate cycle, second carbon oxidation, 2-oxoglutarate => oxaloacetate Central carbohydrate metabolism #c644a5 +M00307 Pyruvate oxidation, pyruvate => acetyl-CoA Central carbohydrate metabolism #c644a5 +M00308 Semi-phosphorylative Entner-Doudoroff pathway, gluconate => glycerate-3P Central carbohydrate metabolism #c644a5 +M00309 Non-phosphorylative Entner-Doudoroff pathway, gluconate--galactonate => glycerate Central carbohydrate metabolism #c644a5 +M00580 Pentose phosphate pathway, archaea, fructose 6P => ribose 5P Central carbohydrate metabolism #c644a5 +M00633 Semi-phosphorylative Entner-Doudoroff pathway, gluconate--galactonate => glycerate-3P Central carbohydrate metabolism #c644a5 +M00112 Tocopherol--tocotorienol biosynthesis Cofactor and vitamin metabolism #5fda98 +M00115 NAD biosynthesis, aspartate => NAD Cofactor and vitamin metabolism #5fda98 +M00116 Menaquinone biosynthesis, chorismate => menaquinol Cofactor and vitamin metabolism #5fda98 +M00117 Ubiquinone biosynthesis, prokaryotes, chorismate => ubiquinone Cofactor and vitamin metabolism #5fda98 +M00119 Pantothenate biosynthesis, valine--L-aspartate => pantothenate Cofactor and vitamin metabolism #5fda98 +M00120 Coenzyme A biosynthesis, pantothenate => CoA Cofactor and vitamin metabolism #5fda98 +M00121 Heme biosynthesis, plants and bacteria, glutamate => heme Cofactor and vitamin metabolism #5fda98 +M00122 Cobalamin biosynthesis, cobinamide => cobalamin Cofactor and vitamin metabolism #5fda98 +M00123 Biotin biosynthesis, pimeloyl-ACP--CoA => biotin Cofactor and vitamin metabolism #5fda98 +M00124 Pyridoxal biosynthesis, erythrose-4P => pyridoxal-5P Cofactor and vitamin metabolism #5fda98 +M00125 Riboflavin biosynthesis, GTP => riboflavin--FMN--FAD Cofactor and vitamin metabolism #5fda98 +M00126 Tetrahydrofolate biosynthesis, GTP => THF Cofactor and vitamin metabolism #5fda98 +M00127 Thiamine biosynthesis, AIR => thiamine-P--thiamine-2P Cofactor and vitamin metabolism #5fda98 +M00128 Ubiquinone biosynthesis, eukaryotes, 4-hydroxybenzoate => ubiquinone Cofactor and vitamin metabolism #5fda98 +M00140 C1-unit interconversion, prokaryotes Cofactor and vitamin metabolism #5fda98 +M00141 C1-unit interconversion, eukaryotes Cofactor and vitamin metabolism #5fda98 +M00572 Pimeloyl-ACP biosynthesis, BioC-BioH pathway, malonyl-ACP => pimeloyl-ACP Cofactor and vitamin metabolism #5fda98 +M00573 Biotin biosynthesis, BioI pathway, long-chain-acyl-ACP => pimeloyl-ACP => biotin Cofactor and vitamin metabolism #5fda98 +M00577 Biotin biosynthesis, BioW pathway, pimelate => pimeloyl-CoA => biotin Cofactor and vitamin metabolism #5fda98 +M00622 Nicotinate degradation, nicotinate => fumarate Cofactor and vitamin metabolism #5fda98 +M00810 Nicotine degradation, pyridine pathway, nicotine => 2,6-dihydroxypyridine--succinate semialdehyde Cofactor and vitamin metabolism #5fda98 +M00811 Nicotine degradation, pyrrolidine pathway, nicotine => succinate semialdehyde Cofactor and vitamin metabolism #5fda98 +M00836 Coenzyme F430 biosynthesis, sirohydrochlorin => coenzyme F430 Cofactor and vitamin metabolism #5fda98 +M00840 Tetrahydrofolate biosynthesis, mediated by ribA and trpF, 
GTP => THF Cofactor and vitamin metabolism #5fda98 +M00841 Tetrahydrofolate biosynthesis, mediated by PTPS, GTP => THF Cofactor and vitamin metabolism #5fda98 +M00842 Tetrahydrobiopterin biosynthesis, GTP => BH4 Cofactor and vitamin metabolism #5fda98 +M00843 L-threo-Tetrahydrobiopterin biosynthesis, GTP => L-threo-BH4 Cofactor and vitamin metabolism #5fda98 +M00846 Siroheme biosynthesis, glutamate => siroheme Cofactor and vitamin metabolism #5fda98 +M00847 Heme biosynthesis, archaea, siroheme => heme Cofactor and vitamin metabolism #5fda98 +M00868 Heme biosynthesis, animals and fungi, glycine => heme Cofactor and vitamin metabolism #5fda98 +M00880 Molybdenum cofactor biosynthesis, GTP => molybdenum cofactor Cofactor and vitamin metabolism #5fda98 +M00017 Methionine biosynthesis, apartate => homoserine => methionine Cysteine and methionine metabolism #782975 +M00021 Cysteine biosynthesis, serine => cysteine Cysteine and methionine metabolism #782975 +M00034 Methionine salvage pathway Cysteine and methionine metabolism #782975 +M00035 Methionine degradation Cysteine and methionine metabolism #782975 +M00338 Cysteine biosynthesis, homocysteine + serine => cysteine Cysteine and methionine metabolism #782975 +M00368 Ethylene biosynthesis, methionine => ethylene Cysteine and methionine metabolism #782975 +M00609 Cysteine biosynthesis, methionine => cysteine Cysteine and methionine metabolism #782975 +M00625 Methicillin resistance Drug resistance #869534 +M00627 beta-Lactam resistance, Bla system Drug resistance #869534 +M00639 Multidrug resistance, efflux pump MexCD-OprJ Drug resistance #869534 +M00641 Multidrug resistance, efflux pump MexEF-OprN Drug resistance #869534 +M00642 Multidrug resistance, efflux pump MexJK-OprM Drug resistance #869534 +M00643 Multidrug resistance, efflux pump MexXY-OprM Drug resistance #869534 +M00649 Multidrug resistance, efflux pump AdeABC Drug resistance #869534 +M00651 Vancomycin resistance, D-Ala-D-Lac type Drug resistance #869534 +M00652 Vancomycin resistance, D-Ala-D-Ser type Drug resistance #869534 +M00696 Multidrug resistance, efflux pump AcrEF-TolC Drug resistance #869534 +M00697 Multidrug resistance, efflux pump MdtEF-TolC Drug resistance #869534 +M00698 Multidrug resistance, efflux pump BpeEF-OprC Drug resistance #869534 +M00700 Multidrug resistance, efflux pump AbcA Drug resistance #869534 +M00702 Multidrug resistance, efflux pump NorB Drug resistance #869534 +M00704 Tetracycline resistance, efflux pump Tet38 Drug resistance #869534 +M00705 Multidrug resistance, efflux pump MepA Drug resistance #869534 +M00714 Multidrug resistance, efflux pump QacA Drug resistance #869534 +M00718 Multidrug resistance, efflux pump MexAB-OprM Drug resistance #869534 +M00725 Cationic antimicrobial peptide (CAMP) resistance, dltABCD operon Drug resistance #869534 +M00726 Cationic antimicrobial peptide (CAMP) resistance, lysyl-phosphatidylglycerol (L-PG) synthase MprF Drug resistance #869534 +M00730 Cationic antimicrobial peptide (CAMP) resistance, VraFG transporter Drug resistance #869534 +M00744 Cationic antimicrobial peptide (CAMP) resistance, protease PgtE Drug resistance #869534 +M00745 Imipenem resistance, repression of porin OprD Drug resistance #869534 +M00746 Multidrug resistance, repression of porin OmpF Drug resistance #869534 +M00769 Multidrug resistance, efflux pump MexPQ-OpmE Drug resistance #869534 +M00851 Carbapenem resistance Drug resistance #869534 +M00824 9-membered enediyne core biosynthesis, malonyl-CoA => 3-hydroxyhexadeca-4,6,8,10,12,14-hexaenoyl-ACP => 
9-membered enediyne core Enediyne biosynthesis #d27bde +M00825 10-membered enediyne core biosynthesis, malonyl-CoA => 3-hydroxyhexadeca-4,6,8,10,12,14-hexaenoyl-ACP => 10-membered enediyne core Enediyne biosynthesis #d27bde +M00826 C-1027 benzoxazolinate moiety biosynthesis, chorismate => benzoxazolinyl-CoA Enediyne biosynthesis #d27bde +M00827 C-1027 beta-amino acid moiety biosynthesis, tyrosine => 3-chloro-4,5-dihydroxy-beta-phenylalanyl-PCP Enediyne biosynthesis #d27bde +M00828 Maduropeptin beta-hydroxy acid moiety biosynthesis, tyrosine => 3-(4-hydroxyphenyl)-3-oxopropanoyl-PCP Enediyne biosynthesis #d27bde +M00829 3,6-Dimethylsalicylyl-CoA biosynthesis, malonyl-CoA => 6-methylsalicylate => 3,6-dimethylsalicylyl-CoA Enediyne biosynthesis #d27bde +M00830 Neocarzinostatin naphthoate moiety biosynthesis, malonyl-CoA => 2-hydroxy-5-methyl-1-naphthoate => 2-hydroxy-7-methoxy-5-methyl-1-naphthoyl-CoA Enediyne biosynthesis #d27bde +M00831 Kedarcidin 2-hydroxynaphthoate moiety biosynthesis, malonyl-CoA => 3,6,8-trihydroxy-2-naphthoate => 3-hydroxy-7,8-dimethoxy-6-isopropoxy-2-naphthoyl-CoA Enediyne biosynthesis #d27bde +M00832 Kedarcidin 2-aza-3-chloro-beta-tyrosine moiety biosynthesis, azatyrosine => 2-aza-3-chloro-beta-tyrosyl-PCP Enediyne biosynthesis #d27bde +M00833 Calicheamicin biosynthesis, calicheamicinone => calicheamicin Enediyne biosynthesis #d27bde +M00834 Calicheamicin orsellinate moiety biosynthesis, malonyl-CoA => orsellinate-ACP => 5-iodo-2,3-dimethoxyorsellinate-ACP Enediyne biosynthesis #d27bde +M00082 Fatty acid biosynthesis, initiation Fatty acid metabolism #d9a344 +M00083 Fatty acid biosynthesis, elongation Fatty acid metabolism #d9a344 +M00085 Fatty acid elongation in mitochondria Fatty acid metabolism #d9a344 +M00086 beta-Oxidation, acyl-CoA synthesis Fatty acid metabolism #d9a344 +M00087 beta-Oxidation Fatty acid metabolism #d9a344 +M00415 Fatty acid elongation in endoplasmic reticulum Fatty acid metabolism #d9a344 +M00861 beta-Oxidation, peroxisome, VLCFA Fatty acid metabolism #d9a344 +M00873 Fatty acid biosynthesis in mitochondria, animals Fatty acid metabolism #d9a344 +M00874 Fatty acid biosynthesis in mitochondria, fungi Fatty acid metabolism #d9a344 +M00055 N-glycan precursor biosynthesis Glycan biosynthesis #588cd6 +M00056 O-glycan biosynthesis, mucin type core Glycan biosynthesis #588cd6 +M00065 GPI-anchor biosynthesis, core oligosaccharide Glycan biosynthesis #588cd6 +M00068 Glycosphingolipid biosynthesis, globo-series, LacCer => Gb4Cer Glycan biosynthesis #588cd6 +M00069 Glycosphingolipid biosynthesis, ganglio series, LacCer => GT3 Glycan biosynthesis #588cd6 +M00070 Glycosphingolipid biosynthesis, lacto-series, LacCer => Lc4Cer Glycan biosynthesis #588cd6 +M00071 Glycosphingolipid biosynthesis, neolacto-series, LacCer => nLc4Cer Glycan biosynthesis #588cd6 +M00072 N-glycosylation by oligosaccharyltransferase Glycan biosynthesis #588cd6 +M00073 N-glycan precursor trimming Glycan biosynthesis #588cd6 +M00074 N-glycan biosynthesis, high-mannose type Glycan biosynthesis #588cd6 +M00075 N-glycan biosynthesis, complex type Glycan biosynthesis #588cd6 +M00872 O-glycan biosynthesis, mannose type (core M3) Glycan biosynthesis #588cd6 +M00057 Glycosaminoglycan biosynthesis, linkage tetrasaccharide Glycosaminoglycan metabolism #d66432 +M00058 Glycosaminoglycan biosynthesis, chondroitin sulfate backbone Glycosaminoglycan metabolism #d66432 +M00059 Glycosaminoglycan biosynthesis, heparan sulfate backbone Glycosaminoglycan metabolism #d66432 +M00076 Dermatan sulfate 
degradation Glycosaminoglycan metabolism #d66432 +M00077 Chondroitin sulfate degradation Glycosaminoglycan metabolism #d66432 +M00078 Heparan sulfate degradation Glycosaminoglycan metabolism #d66432 +M00079 Keratan sulfate degradation Glycosaminoglycan metabolism #d66432 +M00026 Histidine biosynthesis, PRPP => histidine Histidine metabolism #66d7bf +M00045 Histidine degradation, histidine => N-formiminoglutamate => glutamate Histidine metabolism #66d7bf +M00066 Lactosylceramide biosynthesis Lipid metabolism #d53e55 +M00067 Sulfoglycolipids biosynthesis, ceramide--1-alkyl-2-acylglycerol => sulfatide--seminolipid Lipid metabolism #d53e55 +M00088 Ketone body biosynthesis, acetyl-CoA => acetoacetate--3-hydroxybutyrate--acetone Lipid metabolism #d53e55 +M00089 Triacylglycerol biosynthesis Lipid metabolism #d53e55 +M00090 Phosphatidylcholine (PC) biosynthesis, choline => PC Lipid metabolism #d53e55 +M00091 Phosphatidylcholine (PC) biosynthesis, PE => PC Lipid metabolism #d53e55 +M00092 Phosphatidylethanolamine (PE) biosynthesis, ethanolamine => PE Lipid metabolism #d53e55 +M00093 Phosphatidylethanolamine (PE) biosynthesis, PA => PS => PE Lipid metabolism #d53e55 +M00094 Ceramide biosynthesis Lipid metabolism #d53e55 +M00098 Acylglycerol degradation Lipid metabolism #d53e55 +M00099 Sphingosine biosynthesis Lipid metabolism #d53e55 +M00100 Sphingosine degradation Lipid metabolism #d53e55 +M00113 Jasmonic acid biosynthesis Lipid metabolism #d53e55 +M00060 KDO2-lipid A biosynthesis, Raetz pathway, LpxL-LpxM type Lipopolysaccharide metabolism #83d2de +M00063 CMP-KDO biosynthesis Lipopolysaccharide metabolism #83d2de +M00064 ADP-L-glycero-D-manno-heptose biosynthesis Lipopolysaccharide metabolism #83d2de +M00866 KDO2-lipid A biosynthesis, Raetz pathway, non-LpxL-LpxM type Lipopolysaccharide metabolism #83d2de +M00867 KDO2-lipid A modification pathway Lipopolysaccharide metabolism #83d2de +M00016 Lysine biosynthesis, succinyl-DAP pathway, aspartate => lysine Lysine metabolism #d84e8b +M00030 Lysine biosynthesis, AAA pathway, 2-oxoglutarate => 2-aminoadipate => lysine Lysine metabolism #d84e8b +M00031 Lysine biosynthesis, mediated by LysW, 2-aminoadipate => lysine Lysine metabolism #d84e8b +M00032 Lysine degradation, lysine => saccharopine => acetoacetyl-CoA Lysine metabolism #d84e8b +M00433 Lysine biosynthesis, 2-oxoglutarate => 2-oxoadipate Lysine metabolism #d84e8b +M00525 Lysine biosynthesis, acetyl-DAP pathway, aspartate => lysine Lysine metabolism #d84e8b +M00526 Lysine biosynthesis, DAP dehydrogenase pathway, aspartate => lysine Lysine metabolism #d84e8b +M00527 Lysine biosynthesis, DAP aminotransferase pathway, aspartate => lysine Lysine metabolism #d84e8b +M00773 Tylosin biosynthesis, methylmalonyl-CoA + malonyl-CoA => tylactone => tylosin Macrolide biosynthesis #2e4b26 +M00774 Erythromycin biosynthesis, propanoyl-CoA + methylmalonyl-CoA => deoxyerythronolide B => erythromycin A--B Macrolide biosynthesis #2e4b26 +M00775 Oleandomycin biosynthesis, malonyl-CoA + methylmalonyl-CoA => 8,8a-deoxyoleandolide => oleandomycin Macrolide biosynthesis #2e4b26 +M00776 Pikromycin--methymycin biosynthesis, methylmalonyl-CoA + malonyl-CoA => narbonolide--10-deoxymethynolide => pikromycin--methymycin Macrolide biosynthesis #2e4b26 +M00777 Avermectin biosynthesis, 2-methylbutanoyl-CoA--isobutyryl-CoA => 6,8a-Seco-6,8a-deoxy-5-oxoavermectin 1a--1b aglycone => avermectin A1a--B1a--A1b--B1b Macrolide biosynthesis #2e4b26 +M00611 Oxygenic photosynthesis in plants and cyanobacteria Metabolic capacity #9378c3 +M00612 
Anoxygenic photosynthesis in purple bacteria Metabolic capacity #9378c3 +M00613 Anoxygenic photosynthesis in green nonsulfur bacteria Metabolic capacity #9378c3 +M00614 Anoxygenic photosynthesis in green sulfur bacteria Metabolic capacity #9378c3 +M00615 Nitrate assimilation Metabolic capacity #9378c3 +M00616 Sulfate-sulfur assimilation Metabolic capacity #9378c3 +M00617 Methanogen Metabolic capacity #9378c3 +M00618 Acetogen Metabolic capacity #9378c3 +M00174 Methane oxidation, methanotroph, methane => formaldehyde Methane metabolism #9e7336 +M00344 Formaldehyde assimilation, xylulose monophosphate pathway Methane metabolism #9e7336 +M00345 Formaldehyde assimilation, ribulose monophosphate pathway Methane metabolism #9e7336 +M00346 Formaldehyde assimilation, serine pathway Methane metabolism #9e7336 +M00356 Methanogenesis, methanol => methane Methane metabolism #9e7336 +M00357 Methanogenesis, acetate => methane Methane metabolism #9e7336 +M00358 Coenzyme M biosynthesis Methane metabolism #9e7336 +M00378 F420 biosynthesis Methane metabolism #9e7336 +M00422 Acetyl-CoA pathway, CO2 => acetyl-CoA Methane metabolism #9e7336 +M00563 Methanogenesis, methylamine--dimethylamine--trimethylamine => methane Methane metabolism #9e7336 +M00567 Methanogenesis, CO2 => methane Methane metabolism #9e7336 +M00608 2-Oxocarboxylic acid chain extension, 2-oxoglutarate => 2-oxoadipate => 2-oxopimelate => 2-oxosuberate Methane metabolism #9e7336 +M00175 Nitrogen fixation, nitrogen => ammonia Nitrogen metabolism #2c2351 +M00528 Nitrification, ammonia => nitrite Nitrogen metabolism #2c2351 +M00529 Denitrification, nitrate => nitrogen Nitrogen metabolism #2c2351 +M00530 Dissimilatory nitrate reduction, nitrate => ammonia Nitrogen metabolism #2c2351 +M00531 Assimilatory nitrate reduction, nitrate => ammonia Nitrogen metabolism #2c2351 +M00804 Complete nitrification, comammox, ammonia => nitrite => nitrate Nitrogen metabolism #2c2351 +M00027 GABA (gamma-Aminobutyrate) shunt Other amino acid metabolism #c5d7a9 +M00118 Glutathione biosynthesis, glutamate => glutathione Other amino acid metabolism #c5d7a9 +M00369 Cyanogenic glycoside biosynthesis, tyrosine => dhurrin Other amino acid metabolism #c5d7a9 +M00012 Glyoxylate cycle Other carbohydrate metabolism #872b4e +M00013 Malonate semialdehyde pathway, propanoyl-CoA => acetyl-CoA Other carbohydrate metabolism #872b4e +M00014 Glucuronate pathway (uronate pathway) Other carbohydrate metabolism #872b4e +M00061 D-Glucuronate degradation, D-glucuronate => pyruvate + D-glyceraldehyde 3P Other carbohydrate metabolism #872b4e +M00081 Pectin degradation Other carbohydrate metabolism #872b4e +M00114 Ascorbate biosynthesis, plants, glucose-6P => ascorbate Other carbohydrate metabolism #872b4e +M00129 Ascorbate biosynthesis, animals, glucose-1P => ascorbate Other carbohydrate metabolism #872b4e +M00130 Inositol phosphate metabolism, PI=> PIP2 => Ins(1,4,5)P3 => Ins(1,3,4,5)P4 Other carbohydrate metabolism #872b4e +M00131 Inositol phosphate metabolism, Ins(1,3,4,5)P4 => Ins(1,3,4)P3 => myo-inositol Other carbohydrate metabolism #872b4e +M00132 Inositol phosphate metabolism, Ins(1,3,4)P3 => phytate Other carbohydrate metabolism #872b4e +M00373 Ethylmalonyl pathway Other carbohydrate metabolism #872b4e +M00532 Photorespiration Other carbohydrate metabolism #872b4e +M00549 Nucleotide sugar biosynthesis, glucose => UDP-glucose Other carbohydrate metabolism #872b4e +M00550 Ascorbate degradation, ascorbate => D-xylulose-5P Other carbohydrate metabolism #872b4e +M00552 D-galactonate 
degradation, De Ley-Doudoroff pathway, D-galactonate => glycerate-3P Other carbohydrate metabolism #872b4e +M00554 Nucleotide sugar biosynthesis, galactose => UDP-galactose Other carbohydrate metabolism #872b4e +M00565 Trehalose biosynthesis, D-glucose 1P => trehalose Other carbohydrate metabolism #872b4e +M00630 D-Galacturonate degradation (fungi), D-galacturonate => glycerol Other carbohydrate metabolism #872b4e +M00631 D-Galacturonate degradation (bacteria), D-galacturonate => pyruvate + D-glyceraldehyde 3P Other carbohydrate metabolism #872b4e +M00632 Galactose degradation, Leloir pathway, galactose => alpha-D-glucose-1P Other carbohydrate metabolism #872b4e +M00740 Methylaspartate cycle Other carbohydrate metabolism #872b4e +M00741 Propanoyl-CoA metabolism, propanoyl-CoA => succinyl-CoA Other carbohydrate metabolism #872b4e +M00761 Undecaprenylphosphate alpha-L-Ara4N biosynthesis, UDP-GlcA => undecaprenyl phosphate alpha-L-Ara4N Other carbohydrate metabolism #872b4e +M00854 Glycogen biosynthesis, glucose-1P => glycogen--starch Other carbohydrate metabolism #872b4e +M00855 Glycogen degradation, glycogen => glucose-6P Other carbohydrate metabolism #872b4e +M00097 beta-Carotene biosynthesis, GGAP => beta-carotene Other terpenoid biosynthesis #6e9368 +M00371 Castasterone biosynthesis, campesterol => castasterone Other terpenoid biosynthesis #6e9368 +M00372 Abscisic acid biosynthesis, beta-carotene => abscisic acid Other terpenoid biosynthesis #6e9368 +M00363 EHEC pathogenicity signature, Shiga toxin Pathogenicity #66406d +M00542 EHEC--EPEC pathogenicity signature, T3SS and effectors Pathogenicity #66406d +M00564 Helicobacter pylori pathogenicity signature, cagA pathogenicity island Pathogenicity #66406d +M00574 Pertussis pathogenicity signature, pertussis toxin Pathogenicity #66406d +M00575 Pertussis pathogenicity signature, T1SS Pathogenicity #66406d +M00576 ETEC pathogenicity signature, heat-labile and heat-stable enterotoxins Pathogenicity #66406d +M00850 Vibrio cholerae pathogenicity signature, cholera toxins Pathogenicity #66406d +M00852 Vibrio cholerae pathogenicity signature, toxin coregulated pilus Pathogenicity #66406d +M00853 ETEC pathogenicity signature, colonization factors Pathogenicity #66406d +M00856 Salmonella enterica pathogenicity signature, typhoid toxin Pathogenicity #66406d +M00857 Salmonella enterica pathogenicity signature, Vi antigen Pathogenicity #66406d +M00859 Bacillus anthracis pathogenicity signature, anthrax toxin Pathogenicity #66406d +M00860 Bacillus anthracis pathogenicity signature, polyglutamic acid capsule biosynthesis Pathogenicity #66406d +M00161 Photosystem II Photosynthesis #cfa68a +M00163 Photosystem I Photosynthesis #cfa68a +M00597 Anoxygenic photosystem II [BR:ko00194] Photosynthesis #cfa68a +M00598 Anoxygenic photosystem I [BR:ko00194] Photosynthesis #cfa68a +M00660 Xanthomonas spp. 
pathogenicity signature, T3SS and effectors Plant pathogenicity #461d27 +M00133 Polyamine biosynthesis, arginine => agmatine => putrescine => spermidine Polyamine biosynthesis #a5b3da +M00134 Polyamine biosynthesis, arginine => ornithine => putrescine Polyamine biosynthesis #a5b3da +M00135 GABA biosynthesis, eukaryotes, putrescine => GABA Polyamine biosynthesis #a5b3da +M00136 GABA biosynthesis, prokaryotes, putrescine => GABA Polyamine biosynthesis #a5b3da +M00793 dTDP-L-rhamnose biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00794 dTDP-6-deoxy-D-allose biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00795 dTDP-beta-L-noviose biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00796 dTDP-D-mycaminose biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00797 dTDP-D-desosamine biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00798 dTDP-L-mycarose biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00799 dTDP-L-oleandrose biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00800 dTDP-L-megosamine biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00801 dTDP-L-olivose biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00802 dTDP-D-forosamine biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00803 dTDP-D-angolosamine biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00048 Inosine monophosphate biosynthesis, PRPP + glutamine => IMP Purine metabolism #e0a7d2 +M00049 Adenine ribonucleotide biosynthesis, IMP => ADP,ATP Purine metabolism #e0a7d2 +M00050 Guanine ribonucleotide biosynthesis IMP => GDP,GTP Purine metabolism #e0a7d2 +M00546 Purine degradation, xanthine => urea Purine metabolism #e0a7d2 +M00046 Pyrimidine degradation, uracil => beta-alanine, thymine => 3-aminoisobutanoate Pyrimidine metabolism #25585e +M00051 Uridine monophosphate biosynthesis, glutamine (+ PRPP) => UMP Pyrimidine metabolism #25585e +M00052 Pyrimidine ribonucleotide biosynthesis, UMP => UDP--UTP,CDP--CTP Pyrimidine metabolism #25585e +M00053 Pyrimidine deoxyribonuleotide biosynthesis, CDP--CTP => dCDP--dCTP,dTDP--dTTP Pyrimidine metabolism #25585e +M00018 Threonine biosynthesis, aspartate => homoserine => threonine Serine and threonine metabolism #de7d78 +M00020 Serine biosynthesis, glycerate-3P => serine Serine and threonine metabolism #de7d78 +M00033 Ectoine biosynthesis, aspartate => ectoine Serine and threonine metabolism #de7d78 +M00555 Betaine biosynthesis, choline => betaine Serine and threonine metabolism #de7d78 +M00101 Cholesterol biosynthesis, squalene 2,3-epoxide => cholesterol Sterol biosynthesis #4e96a2 +M00102 Ergocalciferol biosynthesis Sterol biosynthesis #4e96a2 +M00103 Cholecalciferol biosynthesis Sterol biosynthesis #4e96a2 +M00104 Bile acid biosynthesis, cholesterol => cholate--chenodeoxycholate Sterol biosynthesis #4e96a2 +M00106 Conjugated bile acid biosynthesis, cholate => taurocholate--glycocholate Sterol biosynthesis #4e96a2 +M00107 Steroid hormone biosynthesis, cholesterol => prognenolone => progesterone Sterol biosynthesis #4e96a2 +M00108 C21-Steroid hormone biosynthesis, progesterone => corticosterone--aldosterone Sterol biosynthesis #4e96a2 +M00109 C21-Steroid hormone biosynthesis, progesterone => cortisol--cortisone Sterol biosynthesis #4e96a2 +M00110 C19--C18-Steroid hormone biosynthesis, pregnenolone => androstenedione => estrone Sterol biosynthesis #4e96a2 +M00862 beta-Oxidation, peroxisome, tri--dihydroxycholestanoyl-CoA => choloyl--chenodeoxycholoyl-CoA Sterol biosynthesis #4e96a2 +M00176 Assimilatory 
sulfate reduction, sulfate => H2S Sulfur metabolism #4e96a2 +M00595 Thiosulfate oxidation by SOX complex, thiosulfate => sulfate Sulfur metabolism #4e96a2 +M00596 Dissimilatory sulfate reduction, sulfate => H2S Sulfur metabolism #4e96a2 +M00664 Nodulation Symbiosis #88574e +M00095 C5 isoprenoid biosynthesis, mevalonate pathway Terpenoid backbone biosynthesis #4e6089 +M00096 C5 isoprenoid biosynthesis, non-mevalonate pathway Terpenoid backbone biosynthesis #4e6089 +M00364 C10-C20 isoprenoid biosynthesis, bacteria Terpenoid backbone biosynthesis #4e6089 +M00365 C10-C20 isoprenoid biosynthesis, archaea Terpenoid backbone biosynthesis #4e6089 +M00366 C10-C20 isoprenoid biosynthesis, plants Terpenoid backbone biosynthesis #4e6089 +M00367 C10-C20 isoprenoid biosynthesis, non-plant eukaryotes Terpenoid backbone biosynthesis #4e6089 +M00849 C5 isoprenoid biosynthesis, mevalonate pathway, archaea Terpenoid backbone biosynthesis #4e6089 +M00778 Type II polyketide backbone biosynthesis, acyl-CoA + malonyl-CoA => polyketide Type II polyketide biosynthesis #af7194 +M00779 Dihydrokalafungin biosynthesis, octaketide => dihydrokalafungin Type II polyketide biosynthesis #af7194 +M00780 Tetracycline--oxytetracycline biosynthesis, pretetramide => tetracycline--oxytetracycline Type II polyketide biosynthesis #af7194 +M00781 Nogalavinone--aklavinone biosynthesis, deoxynogalonate--deoxyaklanonate => nogalavinone--aklavinone Type II polyketide biosynthesis #af7194 +M00782 Mithramycin biosynthesis, 4-demethylpremithramycinone => mithramycin Type II polyketide biosynthesis #af7194 +M00783 Tetracenomycin C--8-demethyltetracenomycin C biosynthesis, tetracenomycin F2 => tetracenomycin C--8-demethyltetracenomycin C Type II polyketide biosynthesis #af7194 +M00784 Elloramycin biosynthesis, 8-demethyltetracenomycin C => elloramycin A Type II polyketide biosynthesis #af7194 +M00823 Chlortetracycline biosynthesis, pretetramide => chlortetracycline Type II polyketide biosynthesis #af7194 \ No newline at end of file diff --git a/data/MicrobeAnnotator_KEGG/01.KEGG_DB/01.Bifurcating_List.txt b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/01.Bifurcating_List.txt new file mode 100644 index 0000000..8d909f9 --- /dev/null +++ b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/01.Bifurcating_List.txt @@ -0,0 +1,23 @@ +M00373 +M00532 +M00376 +M00378 +M00088 +M00031 +M00763 +M00133 +M00075 +M00872 +M00125 +M00119 +M00122 +M00827 +M00828 +M00832 +M00833 +M00837 +M00838 +M00785 +M00307 +M00048 +M00127 \ No newline at end of file diff --git a/data/MicrobeAnnotator_KEGG/01.KEGG_DB/02.Structural_List.txt b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/02.Structural_List.txt new file mode 100644 index 0000000..7fbba00 --- /dev/null +++ b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/02.Structural_List.txt @@ -0,0 +1,10 @@ +M00144 +M00149 +M00151 +M00152 +M00154 +M00155 +M00153 +M00156 +M00158 +M00160 \ No newline at end of file diff --git a/data/MicrobeAnnotator_KEGG/01.KEGG_DB/03.Bifurcating_Modules.dict b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/03.Bifurcating_Modules.dict new file mode 100644 index 0000000..a09f7b4 --- /dev/null +++ b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/03.Bifurcating_Modules.dict @@ -0,0 +1 @@ 
+{'M00373':{'M00373_1':{1:'K00626',2:'K00023',3:'K17865',4:'K14446',5:'K14447',6:'K14448',7:'K14449',8:'K08691',9:'K14451'},'M00373_2':{1:'K00626',2:'K00023',3:'K17865',4:'K14446',5:'K14447',6:'K14448',7:'K14449',8:'K01965+K01966',9:'K05606',10:'K01847'}},'M00532':{'M00532_1':{1:'K01601-K01602',2:'K19269',3:'K11517',4:'K03781',5:'K14272',6:'K00600',7:'K00830',8:'K15893,K15919',9:'K15918'},'M00532_2':{1:'K01601-K01602',2:'K19269',3:'K11517',4:'K03781',5:'K14272',6:'K00600',7:'K00830',8:'K00281+K00605+K00382+K02437'}},'M00376':{'M00376_1':{1:'K02160+K01961+K01962+K01963',2:'K14468',3:'K14469',4:'K15052',5:'K05606',6:['K01847','K01848+K01849'],7:'K14471+K14472',8:'K00239+K00240+K00241',9:'K01679'},'M00376_2':{1:'K02160+K01961+K01962+K01963',2:'K14468',3:'K14469',4:'K08691',5:'K14449',6:'K14470',7:'K09709'}},'M00378':{'M00378_1':{1:['K11779','K11780+K11781'],2:'K11212',3:'K12234'},'M00378_2':{1:'K14941',2:'K11212',3:'K12234'}},'M00088':{'M00088_1':{1:'K00626',2:'K01641',3:'K01640',4:'K00019'},'M00088_2':{1:'K00626',2:'K01641',3:'K01640',4:'K01574'}},'M00031':{'M00031_1':{1:'K05826',2:'K05827',3:'K05828',4:'K05829',5:'K05830',6:'K05831'}},'M00763':{'M00763_1':{1:'K05826',2:'K19412',3:'K05828',4:'K05829',5:'K05830',6:'K05831'}},'M00133':{'M00133_1':{1:'K01583,K01584,K01585,K02626',2:'K01480',3:'K01611'},'M00133_2':{1:'K00797',2:'K01611'}},'M00075':{'M00075_1':{1:'K01231',2:'K00736',3:'K00737'},'M00075_2':{1:'K01231',2:'K00736',3:'K00738',4:'K00744,K09661',5:'K13748'},'M00075_3':{1:'K01231',2:'K00736',3:'K00717',4:'K07966,K07967,K07968',5:'K00778,K00779'}},'M00872':{'M00872_1':{1:'K00728',2:'K18207',3:'K09654',4:'K17547',5:'K19872',6:'K19873',7:'K21052',8:'K21032',9:'K09668'},'M00872_2':{1:'K21031',2:'K19872',3:'K19873',4:'K21052',5:'K21032',6:'K09668'}},'M00125':{'M00125_1':{1:'K01497,K14652',2:['K01498_K00082','K11752'],3:'K22912,K20860,K20861,K20862,K21063,K21064',4:'K00794',5:'K00793',6:['K00861,K20884_K00953,K22949','K11753']},'M00125_2':{1:'K02858,K14652',2:'K00794',3:'K00793',4:['K00861,K20884_K00953,K22949','K11753']}},'M00119':{'M00119_1':{1:'K00826',2:'K00606',3:'K00077',4:'K01918,K13799'},'M00119_2':{1:'K01579',2:'K01918,K13799'}},'M00122':{'M00122_1':{1:'K00798,K19221',2:'K02232',3:'K02225,K02227',4:'K02231',5:'K02233'},'M00122_2':{1:'K00768',2:'K02226,K22316',3:'K02233'}},'M00827':{'M00827_1':{1:'K21183',2:'K21181',3:'K21182',4:'K16431',5:'K21184',6:'K21185'}},'M00828':{'M00828_1':{1:'K21183',2:'K21181',3:'K21182',4:'K21188'}},'M00832':{'M00832_1':{1:'K21183',2:'K21227',3:'K21228',4:'K16431',5:'K21185'}},'M00833':{'M00833_1':{1:'K21254',2:'K21255',3:'K21256',4:'K21257',5:'K21258',6:'K21261',7:'K21262',8:'K21263'},'M00833_2':{1:'K21259',2:'K21260',3:'K21261',4:'K21262',5:'K21263'}},'M00837':{'M00837_1':{1:'K21780+K21781',2:'K21782',3:'K21783',4:'K21784',5:'K21785',6:'K21786',7:'K21787'},'M00837_2':{1:'K21428',2:'K21778',3:'K21779',4:'K21787'}},'M00838':{'M00838_1':{1:'K21780+K21781',2:'K21782',3:'K21783',4:'K21784',5:'K21785',6:'K21786',7:'K21787'},'M00837_2':{1:'K21791',2:'K21792',3:'K21793',4:'K21787'}},'M00785':{'M00785_1':{1:'K19741',2:'K19723',3:'K19725',4:'K19724',5:'K19727'},'M00785_2':{1:'K19726',2:'K19725',3:'K19724',4:'K19727'}},'M00307':{'M00307_1':{1:'K03737'},'M00307_2':{1:'K00169+K00170+K00171+K00172'},'M00307_3':{1:'K00161+K00162+K00627+K00382-K13997'},'M00307_4':{1:'K00163+K00627+K00382-K13997'}},'M00048':{'M00048_1':{1:'K00764',2:'K01945,K11787,K11788,K13713',3:'K00601,K11175,K08289,K11787,K01492',4:['K01952','K23269+K23264+K23265','K23270+K23265'], 
5:'K01933,K11787',6:'K01923,K01587,K13713',7:'K01756',8:['K00602','K01492','K06863_K11176']}, 'M00048_2':{1:'K00764',2:'K01945,K11787,K11788,K13713',3:'K00601,K11175,K08289,K11787,K01492',4:['K01952','K23269+K23264+K23265','K23270+K23265'],5:'K11788',6:['K01587','K11808','K01589_K01588'],7:'K01923,K01587,K13713',8:'K01756',9:['K00602','K01492','K06863_K11176']}},'M00127':{'M00127_1':{1:'K03147',2:'K00877,K00941,K14153',3:'K00788,K14153,K14154',4:'K00946'},'M00127_2':{1:'K00878,K14154',2:'K00788,K14153,K14154',3:'K00946'}}} \ No newline at end of file diff --git a/data/MicrobeAnnotator_KEGG/01.KEGG_DB/04.Structural_Modules.dict b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/04.Structural_Modules.dict new file mode 100644 index 0000000..b9aa2e8 --- /dev/null +++ b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/04.Structural_Modules.dict @@ -0,0 +1 @@ +{'M00144':['K00330', 'K00331+K00332+K00333,K00331+K13378,K13380','K00334+K00335+K00336+K00337+K00338+K00339+K00340','K00341+K00342,K15863','K00343'],'M00149':['K00241','K00242,K18859,K18860','K00239+K00240'],'M00151':[['K03890+K03891+K03889','K03886+K03887+K03888','K00412+K00413,K00410_K00411']],'M00152':['K00412+K00413,K00410','K00411+K00414+K00415+K00416+K00417+K00418+K00419+K00420'],'M00154':['K02257+K02262+K02256+K02261+K02263+K02264+K02265+K02266+K02267+K02268','K02269,K02270-K02271','K02272-K02273+K02258+K02259+K02260'],'M00155':['K02275','K02274+K02276,K15408','K02277'],'M00153':['K00425+K00426','K00424,K22501'],'M00156':['K00404+K00405,K15862','K00407+K00406'],'M00158':['K02132+K02133+K02136+K02134+K02135+K02137+K02126+K02127+K02128+K02138','K02129,K01549','K02130,K02139','K02140','K02141,K02131','K02142-K02143+K02125'],'M00160':['K02145+K02147+K02148+K02149+K02150+K02151+K02152+K02144+K02154','K03661,K02155','K02146+K02153+K03662']} \ No newline at end of file diff --git a/data/MicrobeAnnotator_KEGG/01.KEGG_DB/05.Modules_Parsed.txt b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/05.Modules_Parsed.txt new file mode 100644 index 0000000..c8229a5 --- /dev/null +++ b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/05.Modules_Parsed.txt @@ -0,0 +1,3343 @@ +M00001 +(K00844,K12407,K00845,K00886,K08074,K00918) (K01810,K06859,K13810,K15916) (K00850,K16370,K00918) (K01623,K01624,K11645,K16305,K16306) K01803 ((K00134,K00150) K00927,K11389) (K01834,K15633,K15634,K15635) K01689 (K00873,K12406) +['(K00844,K12407,K00845,K00886,K08074,K00918)', '(K01810,K06859,K13810,K15916)', '(K00850,K16370,K00918)', '(K01623,K01624,K11645,K16305,K16306)', 'K01803', '((K00134,K00150)_K00927,K11389)', '(K01834,K15633,K15634,K15635)', 'K01689', '(K00873,K12406)'] +== +['K00844', 'K12407', 'K00845', 'K00886', 'K08074', 'K00918'] +['K01810', 'K06859', 'K13810', 'K15916'] +['K00850', 'K16370', 'K00918'] +['K01623', 'K01624', 'K11645', 'K16305', 'K16306'] +['K01803'] +['K11389', 'K00134,K00150_K00927'] +['K01834', 'K15633', 'K15634', 'K15635'] +['K01689'] +['K00873', 'K12406'] +++++++++++++++++++ +M00002 +K01803 ((K00134,K00150) K00927,K11389) (K01834,K15633,K15634,K15635) K01689 (K00873,K12406) +['K01803', '((K00134,K00150)_K00927,K11389)', '(K01834,K15633,K15634,K15635)', 'K01689', '(K00873,K12406)'] +== +['K01803'] +['K11389', 'K00134,K00150_K00927'] +['K01834', 'K15633', 'K15634', 'K15635'] +['K01689'] +['K00873', 'K12406'] +++++++++++++++++++ +M00003 +(K01596,K01610) K01689 (K01834,K15633,K15634,K15635) K00927 (K00134,K00150) K01803 ((K01623,K01624,K11645) (K03841,K02446,K11532,K01086,K04041),K01622) +['(K01596,K01610)', 'K01689', '(K01834,K15633,K15634,K15635)', 'K00927', '(K00134,K00150)', 
'K01803', '((K01623,K01624,K11645)_(K03841,K02446,K11532,K01086,K04041),K01622)'] +== +['K01596', 'K01610'] +['K01689'] +['K01834', 'K15633', 'K15634', 'K15635'] +['K00927'] +['K00134', 'K00150'] +['K01803'] +['K01622', 'K01623,K01624,K11645_K03841,K02446,K11532,K01086,K04041'] +++++++++++++++++++ +M00009 +(K01647,K05942) (K01681,K01682) (K00031,K00030) (K00164+K00658+K00382,K00174+K00175-K00177-K00176) (K01902+K01903,K01899+K01900,K18118) (K00234+K00235+K00236+K00237,K00239+K00240+K00241-(K00242,K18859,K18860),K00244+K00245+K00246-K00247) (K01676,K01679,K01677+K01678) (K00026,K00025,K00024,K00116) +['(K01647,K05942)', '(K01681,K01682)', '(K00031,K00030)', '(K00164+K00658+K00382,K00174+K00175-K00177-K00176)', '(K01902+K01903,K01899+K01900,K18118)', '(K00234+K00235+K00236+K00237,K00239+K00240+K00241-(K00242,K18859,K18860),K00244+K00245+K00246-K00247)', '(K01676,K01679,K01677+K01678)', '(K00026,K00025,K00024,K00116)'] +== +['K01647', 'K05942'] +['K01681', 'K01682'] +['K00031', 'K00030'] +['K00164+K00658+K00382', 'K00174+K00175-K00177-K00176'] +['K18118', 'K01902+K01903', 'K01899+K01900'] +['K00234+K00235+K00236+K00237', 'K00244+K00245+K00246-K00247', 'K00239+K00240+K00241-%K00242,K18859,K18860)'] +['K01676', 'K01679', 'K01677+K01678'] +['K00026', 'K00025', 'K00024', 'K00116'] +++++++++++++++++++ +M00010 +(K01647,K05942) (K01681,K01682) (K00031,K00030) +['(K01647,K05942)', '(K01681,K01682)', '(K00031,K00030)'] +== +['K01647', 'K05942'] +['K01681', 'K01682'] +['K00031', 'K00030'] +++++++++++++++++++ +M00011 +(K00164+K00658+K00382,K00174+K00175-K00177-K00176) (K01902+K01903,K01899+K01900,K18118) (K00234+K00235+K00236+K00237,K00239+K00240+K00241-(K00242,K18859,K18860),K00244+K00245+K00246-K00247) (K01676,K01679,K01677+K01678) (K00026,K00025,K00024,K00116) +['(K00164+K00658+K00382,K00174+K00175-K00177-K00176)', '(K01902+K01903,K01899+K01900,K18118)', '(K00234+K00235+K00236+K00237,K00239+K00240+K00241-(K00242,K18859,K18860),K00244+K00245+K00246-K00247)', '(K01676,K01679,K01677+K01678)', '(K00026,K00025,K00024,K00116)'] +== +['K00164+K00658+K00382', 'K00174+K00175-K00177-K00176'] +['K18118', 'K01902+K01903', 'K01899+K01900'] +['K00234+K00235+K00236+K00237', 'K00244+K00245+K00246-K00247', 'K00239+K00240+K00241-%K00242,K18859,K18860)'] +['K01676', 'K01679', 'K01677+K01678'] +['K00026', 'K00025', 'K00024', 'K00116'] +++++++++++++++++++ +M00004 +(K13937,((K00036,K19243) (K01057,K07404))) K00033 K01783 (K01807,K01808) K00615 K00616 (K01810,K06859,K13810,K15916) +['(K13937,((K00036,K19243)_(K01057,K07404)))', 'K00033', 'K01783', '(K01807,K01808)', 'K00615', 'K00616', '(K01810,K06859,K13810,K15916)'] +== +['K13937', 'K00036,K19243_K01057,K07404'] +['K00033'] +['K01783'] +['K01807', 'K01808'] +['K00615'] +['K00616'] +['K01810', 'K06859', 'K13810', 'K15916'] +++++++++++++++++++ +M00006 +(K13937,((K00036,K19243) (K01057,K07404))) K00033 +['(K13937,((K00036,K19243)_(K01057,K07404)))', 'K00033'] +== +['K13937', 'K00036,K19243_K01057,K07404'] +['K00033'] +++++++++++++++++++ +M00007 +K00615 (K00616,K13810) K01783 (K01807,K01808) +['K00615', '(K00616,K13810)', 'K01783', '(K01807,K01808)'] +== +['K00615'] +['K00616', 'K13810'] +['K01783'] +['K01807', 'K01808'] +++++++++++++++++++ +M00580 +(K08094 (K08093,K13812),K13831) K01807 +['(K08094_(K08093,K13812),K13831)', 'K01807'] +== +['K13831', 'K08094_K08093,K13812'] +['K01807'] +++++++++++++++++++ +M00005 +K00948 +['K00948'] +== +['K00948'] +++++++++++++++++++ +M00008 +K00036 (K01057,K07404) K01690 K01625 +['K00036', '(K01057,K07404)', 'K01690', 'K01625'] +== 
+['K00036'] +['K01057', 'K07404'] +['K01690'] +['K01625'] +++++++++++++++++++ +M00308 +K05308 K00874 K01625 (K00134 K00927,K00131,K18978) +['K05308', 'K00874', 'K01625', '(K00134_K00927,K00131,K18978)'] +== +['K05308'] +['K00874'] +['K01625'] +['K00131', 'K18978', 'K00134_K00927'] +++++++++++++++++++ +M00633 +K05308 K18126 K11395 (K00131,K18978) +['K05308', 'K18126', 'K11395', '(K00131,K18978)'] +== +['K05308'] +['K18126'] +['K11395'] +['K00131', 'K18978'] +++++++++++++++++++ +M00309 +K05308 (K11395,K18127) (K18020+K18021+K18022,K18128,K03738) +['K05308', '(K11395,K18127)', '(K18020+K18021+K18022,K18128,K03738)'] +== +['K05308'] +['K11395', 'K18127'] +['K18128', 'K03738', 'K18020+K18021+K18022'] +++++++++++++++++++ +M00014 +K00012 ((K12447 K16190),(K00699 (K01195,K14756))) K00002 K13247 -- K03331 (K05351,K00008) K00854 +['K00012', '((K12447_K16190),(K00699_(K01195,K14756)))', 'K00002', 'K13247', 'K03331', '(K05351,K00008)', 'K00854'] +== +['K00012'] +['K12447_K16190', 'K00699_K01195,K14756'] +['K00002'] +['K13247'] +['K03331'] +['K05351', 'K00008'] +['K00854'] +++++++++++++++++++ +M00630 +(K18106,K19634) K18102 K18103 K18107 +['(K18106,K19634)', 'K18102', 'K18103', 'K18107'] +== +['K18106', 'K19634'] +['K18102'] +['K18103'] +['K18107'] +++++++++++++++++++ +M00631 +K01812 K00041 (K01685,K16849+K16850) K00874 (K01625,K17463) +['K01812', 'K00041', '(K01685,K16849+K16850)', 'K00874', '(K01625,K17463)'] +== +['K01812'] +['K00041'] +['K01685', 'K16849+K16850'] +['K00874'] +['K01625', 'K17463'] +++++++++++++++++++ +M00061 +K01812 K00040 (K01686,K08323) K00874 (K01625,K17463) +['K01812', 'K00040', '(K01686,K08323)', 'K00874', '(K01625,K17463)'] +== +['K01812'] +['K00040'] +['K01686', 'K08323'] +['K00874'] +['K01625', 'K17463'] +++++++++++++++++++ +M00081 +K01051 K01184 K01213 +['K01051', 'K01184', 'K01213'] +== +['K01051'] +['K01184'] +['K01213'] +++++++++++++++++++ +M00632 +K01785 K00849 K00965 K01784 +['K01785', 'K00849', 'K00965', 'K01784'] +== +['K01785'] +['K00849'] +['K00965'] +['K01784'] +++++++++++++++++++ +M00552 +K01684 K00883 K01631 K00134 K00927 +['K01684', 'K00883', 'K01631', 'K00134', 'K00927'] +== +['K01684'] +['K00883'] +['K01631'] +['K00134'] +['K00927'] +++++++++++++++++++ +M00129 +K00963 K00012 K00699 (K01195,K14756) K00002 K01053 K00103 +['K00963', 'K00012', 'K00699', '(K01195,K14756)', 'K00002', 'K01053', 'K00103'] +== +['K00963'] +['K00012'] +['K00699'] +['K01195', 'K14756'] +['K00002'] +['K01053'] +['K00103'] +++++++++++++++++++ +M00114 +((K01810,K06859,K13810) (K01809,K16011),K15916) (K16881,(K17497,K01840,K15778) (K00966,K00971,K16011)) K10046 K14190 (K10047,K18649) (K00064,K17744) K00225 +['((K01810,K06859,K13810)_(K01809,K16011),K15916)', '(K16881,(K17497,K01840,K15778)_(K00966,K00971,K16011))', 'K10046', 'K14190', '(K10047,K18649)', '(K00064,K17744)', 'K00225'] +== +['K15916', 'K01810,K06859,K13810_K01809,K16011'] +['K16881', 'K17497,K01840,K15778_K00966,K00971,K16011'] +['K10046'] +['K14190'] +['K10047', 'K18649'] +['K00064', 'K17744'] +['K00225'] +++++++++++++++++++ +M00550 +K02821+K02822+K03475 K03476 K03078 K03079 K03077 +['K02821+K02822+K03475', 'K03476', 'K03078', 'K03079', 'K03077'] +== +['K02821+K02822+K03475'] +['K03476'] +['K03078'] +['K03079'] +['K03077'] +++++++++++++++++++ +M00854 +(K00963 (K00693+K00750,K16150,K16153,K13679,K20812)),(K00975 (K00703,K13679,K20812)) (K00700,K16149) +['(K00963_(K00693+K00750,K16150,K16153,K13679,K20812)),(K00975_(K00703,K13679,K20812))', '(K00700,K16149)'] +== +['K00975_K00703,K13679,K20812', 
'K00963_K00693+K00750,K16150,K16153,K13679,K20812'] +['K00700', 'K16149'] +++++++++++++++++++ +M00855 +(K00688,K16153) (K01196,((K00705,K22451) (K02438,K01200))) (K15779,K01835,K15778) +['(K00688,K16153)', '(K01196,((K00705,K22451)_(K02438,K01200)))', '(K15779,K01835,K15778)'] +== +['K00688', 'K16153'] +['K01196', 'K00705,K22451_K02438,K01200'] +['K15779', 'K01835', 'K15778'] +++++++++++++++++++ +M00565 +K00975 K00703 (K00700,K16149) K01214 K06044 K01236 +['K00975', 'K00703', '(K00700,K16149)', 'K01214', 'K06044', 'K01236'] +== +['K00975'] +['K00703'] +['K00700', 'K16149'] +['K01214'] +['K06044'] +['K01236'] +++++++++++++++++++ +M00549 +(K00844,K00845,K12407,K00886) K01835 K00963 +['(K00844,K00845,K12407,K00886)', 'K01835', 'K00963'] +== +['K00844', 'K00845', 'K12407', 'K00886'] +['K01835'] +['K00963'] +++++++++++++++++++ +M00554 +K00849 K00965 +['K00849', 'K00965'] +== +['K00849'] +['K00965'] +++++++++++++++++++ +M00761 +K10011 K07806 K10012 K13014 +['K10011', 'K07806', 'K10012', 'K13014'] +== +['K10011'] +['K07806'] +['K10012'] +['K13014'] +++++++++++++++++++ +M00012 +K01647 (K01681,K01682) K01637 (K01638,K19282) (K00026,K00025,K00024) +['K01647', '(K01681,K01682)', 'K01637', '(K01638,K19282)', '(K00026,K00025,K00024)'] +== +['K01647'] +['K01681', 'K01682'] +['K01637'] +['K01638', 'K19282'] +['K00026', 'K00025', 'K00024'] +++++++++++++++++++ +M00740 +K01647 K01681 K00031 K00261 K19268+K01846 K04835 K19280 K14449 K19281 K19282 K00024 +['K01647', 'K01681', 'K00031', 'K00261', 'K19268+K01846', 'K04835', 'K19280', 'K14449', 'K19281', 'K19282', 'K00024'] +== +['K01647'] +['K01681'] +['K00031'] +['K00261'] +['K19268+K01846'] +['K04835'] +['K19280'] +['K14449'] +['K19281'] +['K19282'] +['K00024'] +++++++++++++++++++ +M00013 +(K00248,K00232) (K07511,K07514,K07515,K14729) K05605 K23146 K00140 +['(K00248,K00232)', '(K07511,K07514,K07515,K14729)', 'K05605', 'K23146', 'K00140'] +== +['K00248', 'K00232'] +['K07511', 'K07514', 'K07515', 'K14729'] +['K05605'] +['K23146'] +['K00140'] +++++++++++++++++++ +M00741 +(K01965+K01966,K11263+(K18472,K19312+K22568),K01964+K15036+K15037) K05606 (K01847,K01848+K01849) +['(K01965+K01966,K11263+(K18472,K19312+K22568),K01964+K15036+K15037)', 'K05606', '(K01847,K01848+K01849)'] +== +['K01965+K01966', 'K01964+K15036+K15037', 'K11263+K18472,K19312+K22568'] +['K05606'] +['K01847', 'K01848+K01849'] +++++++++++++++++++ +M00130 +(K00888,K19801,K13711) (K00889,K13712) (K01116,K05857,K05858,K05859,K05860,K05861) K00911 +['(K00888,K19801,K13711)', '(K00889,K13712)', '(K01116,K05857,K05858,K05859,K05860,K05861)', 'K00911'] +== +['K00888', 'K19801', 'K13711'] +['K00889', 'K13712'] +['K01116', 'K05857', 'K05858', 'K05859', 'K05860', 'K05861'] +['K00911'] +++++++++++++++++++ +M00131 +K01106 (K01107,K15422) K01109 (K01092,K15759,K10047,K18649) +['K01106', '(K01107,K15422)', 'K01109', '(K01092,K15759,K10047,K18649)'] +== +['K01106'] +['K01107', 'K15422'] +['K01109'] +['K01092', 'K15759', 'K10047', 'K18649'] +++++++++++++++++++ +M00132 +(K00913,K01765) K00915 K10572 +['(K00913,K01765)', 'K00915', 'K10572'] +== +['K00913', 'K01765'] +['K00915'] +['K10572'] +++++++++++++++++++ +M00165 +K00855 (K01601-K01602) K00927 (K05298,K00150,K00134) (K01623,K01624) (K03841,K02446,K11532,K01086) K00615 (K01623,K01624) (K01100,K11532,K01086) K00615 (K01807,K01808) +['K00855', '(K01601-K01602)', 'K00927', '(K05298,K00150,K00134)', '(K01623,K01624)', '(K03841,K02446,K11532,K01086)', 'K00615', '(K01623,K01624)', '(K01100,K11532,K01086)', 'K00615', '(K01807,K01808)'] +== +['K00855'] +['K01601-K01602'] 
+['K00927'] +['K05298', 'K00150', 'K00134'] +['K01623', 'K01624'] +['K03841', 'K02446', 'K11532', 'K01086'] +['K00615'] +['K01623', 'K01624'] +['K01100', 'K11532', 'K01086'] +['K00615'] +['K01807', 'K01808'] +++++++++++++++++++ +M00166 +K00855 (K01601-K01602) K00927 (K05298,K00150,K00134) +['K00855', '(K01601-K01602)', 'K00927', '(K05298,K00150,K00134)'] +== +['K00855'] +['K01601-K01602'] +['K00927'] +['K05298', 'K00150', 'K00134'] +++++++++++++++++++ +M00167 +(K01623,K01624) (K03841,K02446,K11532,K01086) K00615 (K01623,K01624) (K01100,K11532,K01086) K00615 (K01807,K01808) +['(K01623,K01624)', '(K03841,K02446,K11532,K01086)', 'K00615', '(K01623,K01624)', '(K01100,K11532,K01086)', 'K00615', '(K01807,K01808)'] +== +['K01623', 'K01624'] +['K03841', 'K02446', 'K11532', 'K01086'] +['K00615'] +['K01623', 'K01624'] +['K01100', 'K11532', 'K01086'] +['K00615'] +['K01807', 'K01808'] +++++++++++++++++++ +M00168 +K01595 (K00025,K00026,K00024) +['K01595', '(K00025,K00026,K00024)'] +== +['K01595'] +['K00025', 'K00026', 'K00024'] +++++++++++++++++++ +M00169 +K00029 K01006 +['K00029', 'K01006'] +== +['K00029'] +['K01006'] +++++++++++++++++++ +M00172 +K01595 K00051 K00029 K01006 +['K01595', 'K00051', 'K00029', 'K01006'] +== +['K01595'] +['K00051'] +['K00029'] +['K01006'] +++++++++++++++++++ +M00171 +K01595 K14454 K14455 (K00025,K00026) K00028 (K00814,K14272) K01006 +['K01595', 'K14454', 'K14455', '(K00025,K00026)', 'K00028', '(K00814,K14272)', 'K01006'] +== +['K01595'] +['K14454'] +['K14455'] +['K00025', 'K00026'] +['K00028'] +['K00814', 'K14272'] +['K01006'] +++++++++++++++++++ +M00170 +K01595 K14454 K14455 K01610 +['K01595', 'K14454', 'K14455', 'K01610'] +== +['K01595'] +['K14454'] +['K14455'] +['K01610'] +++++++++++++++++++ +M00173 +(K00169+K00170+K00171+K00172,K03737) ((K01007,K01006) K01595,K01959+K01960,K01958) K00024 (K01676,K01679,K01677+K01678) (K00239+K00240-K00241-K00242,K00244+K00245-K00246-K00247,K18556+K18557+K18558+K18559+K18560) (K01902+K01903) (K00174+K00175-K00177-K00176) K00031 (K01681,K01682) (K15230+K15231,K15232+K15233 K15234) +['(K00169+K00170+K00171+K00172,K03737)', '((K01007,K01006)_K01595,K01959+K01960,K01958)', 'K00024', '(K01676,K01679,K01677+K01678)', '(K00239+K00240-K00241-K00242,K00244+K00245-K00246-K00247,K18556+K18557+K18558+K18559+K18560)', '(K01902+K01903)', '(K00174+K00175-K00177-K00176)', 'K00031', '(K01681,K01682)', '(K15230+K15231,K15232+K15233_K15234)'] +== +['K03737', 'K00169+K00170+K00171+K00172'] +['K01958', 'K01959+K01960', 'K01007,K01006_K01595'] +['K00024'] +['K01676', 'K01679', 'K01677+K01678'] +['K00239+K00240-K00241-K00242', 'K00244+K00245-K00246-K00247', 'K18556+K18557+K18558+K18559+K18560'] +['K01902+K01903'] +['K00174+K00175-K00177-K00176'] +['K00031'] +['K01681', 'K01682'] +['K15230+K15231', 'K15232+K15233_K15234'] +++++++++++++++++++ +M00375 +K01964+K15037+K15036 K15017 K15039 K15018 K15019 K15020 K05606 K01848+K01849 (K15038,K15017) K14465 (K14466,K18861) K14534 K15016 K00626 +['K01964+K15037+K15036', 'K15017', 'K15039', 'K15018', 'K15019', 'K15020', 'K05606', 'K01848+K01849', '(K15038,K15017)', 'K14465', '(K14466,K18861)', 'K14534', 'K15016', 'K00626'] +== +['K01964+K15037+K15036'] +['K15017'] +['K15039'] +['K15018'] +['K15019'] +['K15020'] +['K05606'] +['K01848+K01849'] +['K15038', 'K15017'] +['K14465'] +['K14466', 'K18861'] +['K14534'] +['K15016'] +['K00626'] +++++++++++++++++++ +M00374 +K00169+K00170+K00171+K00172 K01007 K01595 K00024 (K01676,K01677+K01678) (K00239+K00240-K00241-K18860) K01902+K01903 (K15038,K15017) K14465 (K14467,K18861) K14534 
K15016 K00626 +['K00169+K00170+K00171+K00172', 'K01007', 'K01595', 'K00024', '(K01676,K01677+K01678)', '(K00239+K00240-K00241-K18860)', 'K01902+K01903', '(K15038,K15017)', 'K14465', '(K14467,K18861)', 'K14534', 'K15016', 'K00626'] +== +['K00169+K00170+K00171+K00172'] +['K01007'] +['K01595'] +['K00024'] +['K01676', 'K01677+K01678'] +['K00239+K00240-K00241-K18860'] +['K01902+K01903'] +['K15038', 'K15017'] +['K14465'] +['K14467', 'K18861'] +['K14534'] +['K15016'] +['K00626'] +++++++++++++++++++ +M00377 +K00198 K05299-K15022 K01938 K01491 K00297 K15023 K14138+K00197+K00194 +['K00198', 'K05299-K15022', 'K01938', 'K01491', 'K00297', 'K15023', 'K14138+K00197+K00194'] +== +['K00198'] +['K05299-K15022'] +['K01938'] +['K01491'] +['K00297'] +['K15023'] +['K14138+K00197+K00194'] +++++++++++++++++++ +M00579 +(K00625,K13788,K15024) K00925 +['(K00625,K13788,K15024)', 'K00925'] +== +['K00625', 'K13788', 'K15024'] +['K00925'] +++++++++++++++++++ +M00620 +K00169+K00170+K00171+K00172 K01959+K01960 K00024 K01677+K01678 K18209+K18210 K01902+K01903 K00174+K00175+K00176+K00177 +['K00169+K00170+K00171+K00172', 'K01959+K01960', 'K00024', 'K01677+K01678', 'K18209+K18210', 'K01902+K01903', 'K00174+K00175+K00176+K00177'] +== +['K00169+K00170+K00171+K00172'] +['K01959+K01960'] +['K00024'] +['K01677+K01678'] +['K18209+K18210'] +['K01902+K01903'] +['K00174+K00175+K00176+K00177'] +++++++++++++++++++ +M00567 +(K00200+K00201+K00202+K00203-K11261+(K00205,K11260,K00204)) K00672 K01499 (K00319,K13942) K00320 (K00577+K00578+K00579+K00580+K00581-K00582-K00583+K00584) (K00399+K00401+K00402) (K22480+K22481+K22482,K03388+K03389+K03390,K08264+K08265,K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125)) +['(K00200+K00201+K00202+K00203-K11261+(K00205,K11260,K00204))', 'K00672', 'K01499', '(K00319,K13942)', 'K00320', '(K00577+K00578+K00579+K00580+K00581-K00582-K00583+K00584)', '(K00399+K00401+K00402)', '(K22480+K22481+K22482,K03388+K03389+K03390,K08264+K08265,K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125))'] +== +['K00200+K00201+K00202+K00203-K11261+K00205,K11260,K00204'] +['K00672'] +['K01499'] +['K00319', 'K13942'] +['K00320'] +['K00577+K00578+K00579+K00580+K00581-K00582-K00583+K00584'] +['K00399+K00401+K00402'] +['K08264+K08265', 'K22480+K22481+K22482', 'K03388+K03389+K03390', 'K03388+K03389+K03390+K14127+K14126+K14128,K22516+K00125'] +++++++++++++++++++ +M00357 +(K00925 (K00625,K13788),K01895) (K00193+K00197+K00194) (K00577+K00578+K00579+K00580+K00581-K00582-K00583+K00584) (K00399+K00401+K00402) (K22480+K22481+K22482,K03388+K03389+K03390,K08264+K08265,K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125)) +['(K00925_(K00625,K13788),K01895)', '(K00193+K00197+K00194)', '(K00577+K00578+K00579+K00580+K00581-K00582-K00583+K00584)', '(K00399+K00401+K00402)', '(K22480+K22481+K22482,K03388+K03389+K03390,K08264+K08265,K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125))'] +== +['K01895', 'K00925_K00625,K13788'] +['K00193+K00197+K00194'] +['K00577+K00578+K00579+K00580+K00581-K00582-K00583+K00584'] +['K00399+K00401+K00402'] +['K08264+K08265', 'K22480+K22481+K22482', 'K03388+K03389+K03390', 'K03388+K03389+K03390+K14127+K14126+K14128,K22516+K00125'] +++++++++++++++++++ +M00356 +K14080+K04480+K14081 K00399+K00401+K00402 (K22480+K22481+K22482,K03388+K03389+K03390,K08264+K08265,K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125)) +['K14080+K04480+K14081', 'K00399+K00401+K00402', '(K22480+K22481+K22482,K03388+K03389+K03390,K08264+K08265,K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125))'] +== 
+['K14080+K04480+K14081'] +['K00399+K00401+K00402'] +['K08264+K08265', 'K22480+K22481+K22482', 'K03388+K03389+K03390', 'K03388+K03389+K03390+K14127+K14126+K14128,K22516+K00125'] +++++++++++++++++++ +M00563 +K14082 ((K16177-K16176),(K16179-K16178),(K14084-K14083)) K00399+K00401+K00402 (K22480+K22481+K22482,K03388+K03389+K03390,K08264+K08265,K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125)) +['K14082', '((K16177-K16176),(K16179-K16178),(K14084-K14083))', 'K00399+K00401+K00402', '(K22480+K22481+K22482,K03388+K03389+K03390,K08264+K08265,K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125))'] +== +['K14082'] +['K16177-K16176', 'K16179-K16178', 'K14084-K14083'] +['K00399+K00401+K00402'] +['K08264+K08265', 'K22480+K22481+K22482', 'K03388+K03389+K03390', 'K03388+K03389+K03390+K14127+K14126+K14128,K22516+K00125'] +++++++++++++++++++ +M00358 +K08097 K05979 K05884 K13039+K06034 +['K08097', 'K05979', 'K05884', 'K13039+K06034'] +== +['K08097'] +['K05979'] +['K05884'] +['K13039+K06034'] +++++++++++++++++++ +M00608 +K10977 K16792+K16793 K10978 +['K10977', 'K16792+K16793', 'K10978'] +== +['K10977'] +['K16792+K16793'] +['K10978'] +++++++++++++++++++ +M00174 +((K10944+K10945+K10946),(K16157+K16158+K16159+K16160+K16161+K16162)) ((K14028-K14029),K23995) +['((K10944+K10945+K10946),(K16157+K16158+K16159+K16160+K16161+K16162))', '((K14028-K14029),K23995)'] +== +['K10944+K10945+K10946', 'K16157+K16158+K16159+K16160+K16161+K16162'] +['K23995', 'K14028-K14029'] +++++++++++++++++++ +M00346 +K00600 K00830 K00018 K11529 K01689 K01595 K00024 K08692+K14067 K08691 +['K00600', 'K00830', 'K00018', 'K11529', 'K01689', 'K01595', 'K00024', 'K08692+K14067', 'K08691'] +== +['K00600'] +['K00830'] +['K00018'] +['K11529'] +['K01689'] +['K01595'] +['K00024'] +['K08692+K14067'] +['K08691'] +++++++++++++++++++ +M00345 +(((K08093,K13812) K08094),K13831) (K00850,K16370) K01624 +['(((K08093,K13812)_K08094),K13831)', '(K00850,K16370)', 'K01624'] +== +['K13831', 'K08093,K13812_K08094'] +['K00850', 'K16370'] +['K01624'] +++++++++++++++++++ +M00344 +K17100 K00863 K01624 K03841 +['K17100', 'K00863', 'K01624', 'K03841'] +== +['K17100'] +['K00863'] +['K01624'] +['K03841'] +++++++++++++++++++ +M00422 +K00192+K00195 K00193+K00197+K00194 +['K00192+K00195', 'K00193+K00197+K00194'] +== +['K00192+K00195'] +['K00193+K00197+K00194'] +++++++++++++++++++ +M00175 +K02588+K02586+K02591-K00531,K22896+K22897+K22898+K22899 +['K02588+K02586+K02591-K00531,K22896+K22897+K22898+K22899'] +== +['K02588+K02586+K02591-K00531', 'K22896+K22897+K22898+K22899'] +++++++++++++++++++ +M00531 +(K00367,K10534,K00372-K00360) (K00366,K17877) +['(K00367,K10534,K00372-K00360)', '(K00366,K17877)'] +== +['K00367', 'K10534', 'K00372-K00360'] +['K00366', 'K17877'] +++++++++++++++++++ +M00530 +(K00370+K00371+K00374,K02567+K02568) (K00362+K00363,K03385+K15876) +['(K00370+K00371+K00374,K02567+K02568)', '(K00362+K00363,K03385+K15876)'] +== +['K02567+K02568', 'K00370+K00371+K00374'] +['K00362+K00363', 'K03385+K15876'] +++++++++++++++++++ +M00529 +(K00370+K00371+K00374,K02567+K02568) (K00368,K15864) (K04561+K02305) K00376 +['(K00370+K00371+K00374,K02567+K02568)', '(K00368,K15864)', '(K04561+K02305)', 'K00376'] +== +['K02567+K02568', 'K00370+K00371+K00374'] +['K00368', 'K15864'] +['K04561+K02305'] +['K00376'] +++++++++++++++++++ +M00528 +K10944+K10945+K10946 K10535 +['K10944+K10945+K10946', 'K10535'] +== +['K10944+K10945+K10946'] +['K10535'] +++++++++++++++++++ +M00804 +K10944+K10945+K10946 K10535 K00370+K00371 +['K10944+K10945+K10946', 'K10535', 'K00370+K00371'] +== 
+['K10944+K10945+K10946'] +['K10535'] +['K00370+K00371'] +++++++++++++++++++ +M00176 +(K13811,K00958+K00860,K00955+K00957,K00956+K00957+K00860) K00390 (K00380+K00381,K00392) +['(K13811,K00958+K00860,K00955+K00957,K00956+K00957+K00860)', 'K00390', '(K00380+K00381,K00392)'] +== +['K13811', 'K00958+K00860', 'K00955+K00957', 'K00956+K00957+K00860'] +['K00390'] +['K00392', 'K00380+K00381'] +++++++++++++++++++ +M00596 +K00958 (K00394+K00395) (K11180+K11181) +['K00958', '(K00394+K00395)', '(K11180+K11181)'] +== +['K00958'] +['K00394+K00395'] +['K11180+K11181'] +++++++++++++++++++ +M00595 +K17222+K17223+K17224-K17225-K22622+K17226+K17227 +['K17222+K17223+K17224-K17225-K22622+K17226+K17227'] +== +['K17222+K17223+K17224-K17225-K22622+K17226+K17227'] +++++++++++++++++++ +M00161 +K02703+K02706+K02705+K02704+K02707+K02708 +['K02703+K02706+K02705+K02704+K02707+K02708'] +== +['K02703+K02706+K02705+K02704+K02707+K02708'] +++++++++++++++++++ +M00163 +K02689+K02690+K02691+K02692+K02693+K02694 +['K02689+K02690+K02691+K02692+K02693+K02694'] +== +['K02689+K02690+K02691+K02692+K02693+K02694'] +++++++++++++++++++ +M00597 +K08928+K08929 +['K08928+K08929'] +== +['K08928+K08929'] +++++++++++++++++++ +M00598 +K08940+K08941+K08942+K08943 +['K08940+K08941+K08942+K08943'] +== +['K08940+K08941+K08942+K08943'] +++++++++++++++++++ +M00145 +K05574+K05582+K05581+K05579+K05572+K05580+K05578+K05576+K05577+K05575+K05573-K05583-K05584-K05585 +['K05574+K05582+K05581+K05579+K05572+K05580+K05578+K05576+K05577+K05575+K05573-K05583-K05584-K05585'] +== +['K05574+K05582+K05581+K05579+K05572+K05580+K05578+K05576+K05577+K05575+K05573-K05583-K05584-K05585'] +++++++++++++++++++ +M00142 +K03878+K03879+K03880+K03881+K03882+K03883+K03884 +['K03878+K03879+K03880+K03881+K03882+K03883+K03884'] +== +['K03878+K03879+K03880+K03881+K03882+K03883+K03884'] +++++++++++++++++++ +M00143 +K03934+K03935+K03936+K03937+K03938+K03939+K03940+K03941+K03942+K03943-K03944 +['K03934+K03935+K03936+K03937+K03938+K03939+K03940+K03941+K03942+K03943-K03944'] +== +['K03934+K03935+K03936+K03937+K03938+K03939+K03940+K03941+K03942+K03943-K03944'] +++++++++++++++++++ +M00146 +K03945+K03946+K03947+K03948+K03949+K03950+K03951+K03952+K03953+K03954+K03955+K03956+K11352+K11353 +['K03945+K03946+K03947+K03948+K03949+K03950+K03951+K03952+K03953+K03954+K03955+K03956+K11352+K11353'] +== +['K03945+K03946+K03947+K03948+K03949+K03950+K03951+K03952+K03953+K03954+K03955+K03956+K11352+K11353'] +++++++++++++++++++ +M00147 +K03957+K03958+K03959+K03960+K03961+K03962+K03963+K03964+K03965+K03966+K11351+K03967+K03968 +['K03957+K03958+K03959+K03960+K03961+K03962+K03963+K03964+K03965+K03966+K11351+K03967+K03968'] +== +['K03957+K03958+K03959+K03960+K03961+K03962+K03963+K03964+K03965+K03966+K11351+K03967+K03968'] +++++++++++++++++++ +M00150 +K00244+K00245+K00246+K00247 +['K00244+K00245+K00246+K00247'] +== +['K00244+K00245+K00246+K00247'] +++++++++++++++++++ +M00148 +K00236+K00237+K00234+K00235 +['K00236+K00237+K00234+K00235'] +== +['K00236+K00237+K00234+K00235'] +++++++++++++++++++ +M00162 +K02635+K02637+K02634+K02636+K02642+K02643+K03689+K02640 +['K02635+K02637+K02634+K02636+K02642+K02643+K03689+K02640'] +== +['K02635+K02637+K02634+K02636+K02642+K02643+K03689+K02640'] +++++++++++++++++++ +M00154 +(K02257+K02262+K02256+K02261+K02263+K02264+K02265+K02266+K02267+K02268+(K02269,K02270-K02271)+K02272-K02273+K02258+K02259+K02260) +['(K02257+K02262+K02256+K02261+K02263+K02264+K02265+K02266+K02267+K02268+(K02269,K02270-K02271)+K02272-K02273+K02258+K02259+K02260)'] +== 
+['K02257+K02262+K02256+K02261+K02263+K02264+K02265+K02266+K02267+K02268+K02269,K02270-K02271+K02272-K02273+K02258+K02259+K02260'] +++++++++++++++++++ +M00155 +K02275+(K02274+K02276,K15408)-K02277 +['K02275+(K02274+K02276,K15408)-K02277'] +== +['K15408-K02277', 'K02275+K02274+K02276'] +++++++++++++++++++ +M00153 +K00425+K00426+(K00424,K22501) +['K00425+K00426+(K00424,K22501)'] +== +['K22501', 'K00425+K00426+K00424'] +++++++++++++++++++ +M00417 +K02297+K02298+K02299+K02300 +['K02297+K02298+K02299+K02300'] +== +['K02297+K02298+K02299+K02300'] +++++++++++++++++++ +M00416 +K02827+K02826+K02828+K02829 +['K02827+K02826+K02828+K02829'] +== +['K02827+K02826+K02828+K02829'] +++++++++++++++++++ +M00156 +((K00404+K00405,K15862)+K00407+K00406) +['((K00404+K00405,K15862)+K00407+K00406)'] +== +['K00404+K00405,K15862+K00407+K00406'] +++++++++++++++++++ +M00157 +K02111+K02112+K02113+K02114+K02115+K02108+K02109+K02110 +['K02111+K02112+K02113+K02114+K02115+K02108+K02109+K02110'] +== +['K02111+K02112+K02113+K02114+K02115+K02108+K02109+K02110'] +++++++++++++++++++ +M00158 +K02132+K02133+K02136+K02134+K02135+K02137+K02126+K02127+K02128+K02138+(K02129,K01549)+(K02130,K02139)+K02140+(K02141,K02131)-K02142-K02143+K02125 +['K02132+K02133+K02136+K02134+K02135+K02137+K02126+K02127+K02128+K02138+(K02129,K01549)+(K02130,K02139)+K02140+(K02141,K02131)-K02142-K02143+K02125'] +== +['K01549+K02130', 'K02139+K02140+K02141', 'K02131-K02142-K02143+K02125', 'K02132+K02133+K02136+K02134+K02135+K02137+K02126+K02127+K02128+K02138+K02129'] +++++++++++++++++++ +M00159 +K02117+K02118+K02119+K02120+K02121+K02122+K02107+K02123+K02124 +['K02117+K02118+K02119+K02120+K02121+K02122+K02107+K02123+K02124'] +== +['K02117+K02118+K02119+K02120+K02121+K02122+K02107+K02123+K02124'] +++++++++++++++++++ +M00160 +K02145+K02147+K02148+K02149+K02150+K02151+K02152+K02144+K02154+(K03661,K02155)+K02146+K02153+K03662 +['K02145+K02147+K02148+K02149+K02150+K02151+K02152+K02144+K02154+(K03661,K02155)+K02146+K02153+K03662'] +== +['K02155+K02146+K02153+K03662', 'K02145+K02147+K02148+K02149+K02150+K02151+K02152+K02144+K02154+K03661'] +++++++++++++++++++ +M00082 +(K11262,(K02160+K01961,K11263)+(K01962+K01963,K18472)) (K00665,K00667+K00668,K11533,(K00645 (K00648,K18473))) +['(K11262,(K02160+K01961,K11263)+(K01962+K01963,K18472))', '(K00665,K00667+K00668,K11533,(K00645_(K00648,K18473)))'] +== +['K11262', 'K02160+K01961,K11263+K01962+K01963,K18472'] +['K00665', 'K11533', 'K00667+K00668', 'K00645_K00648,K18473'] +++++++++++++++++++ +M00083 +K00665,(K00667 K00668),K11533,((K00647,K09458) K00059 (K02372,K01716,K16363) (K00208,K02371,K10780,K00209)) +['K00665,(K00667_K00668),K11533,((K00647,K09458)_K00059_(K02372,K01716,K16363)_(K00208,K02371,K10780,K00209))'] +== +['K00665', 'K11533', 'K00667_K00668', 'K00647,K09458_K00059_K02372,K01716,K16363_K00208,K02371,K10780,K00209'] +++++++++++++++++++ +M00873 +K18660 K03955+K00645 K09458 K11539+K13370 K22540 K07512 +['K18660', 'K03955+K00645', 'K09458', 'K11539+K13370', 'K22540', 'K07512'] +== +['K18660'] +['K03955+K00645'] +['K09458'] +['K11539+K13370'] +['K22540'] +['K07512'] +++++++++++++++++++ +M00874 +K11262 K03955+K00645 K09458 K00059 K22541 K07512 +['K11262', 'K03955+K00645', 'K09458', 'K00059', 'K22541', 'K07512'] +== +['K11262'] +['K03955+K00645'] +['K09458'] +['K00059'] +['K22541'] +['K07512'] +++++++++++++++++++ +M00085 +(K07508,K07509) (K00022 K07511,K07515) K07512 +['(K07508,K07509)', '(K00022_K07511,K07515)', 'K07512'] +== +['K07508', 'K07509'] +['K07515', 'K00022_K07511'] +['K07512'] +++++++++++++++++++ 
+M00415 +(K10247,K10205,K10248,K10249,K10244,K10203,K10250,K15397,K10245,K10246) K10251 K10703 K10258 +['(K10247,K10205,K10248,K10249,K10244,K10203,K10250,K15397,K10245,K10246)', 'K10251', 'K10703', 'K10258'] +== +['K10247', 'K10205', 'K10248', 'K10249', 'K10244', 'K10203', 'K10250', 'K15397', 'K10245', 'K10246'] +['K10251'] +['K10703'] +['K10258'] +++++++++++++++++++ +M00086 +K01897,K15013 +['K01897,K15013'] +== +['K01897', 'K15013'] +++++++++++++++++++ +M00087 +(K00232,K00249,K00255,K06445,K09479) (((K01692,K07511,K13767) (K00022,K07516)),K01825,K01782,K07514,K07515,K10527) (K00632,K07508,K07509,K07513) +['(K00232,K00249,K00255,K06445,K09479)', '(((K01692,K07511,K13767)_(K00022,K07516)),K01825,K01782,K07514,K07515,K10527)', '(K00632,K07508,K07509,K07513)'] +== +['K00232', 'K00249', 'K00255', 'K06445', 'K09479'] +['K01825', 'K01782', 'K07514', 'K07515', 'K10527', 'K01692,K07511,K13767_K00022,K07516'] +['K00632', 'K07508', 'K07509', 'K07513'] +++++++++++++++++++ +M00861 +K00232 K12405 (K07513,K08764) +['K00232', 'K12405', '(K07513,K08764)'] +== +['K00232'] +['K12405'] +['K07513', 'K08764'] +++++++++++++++++++ +M00101 +K01852 K05917 K00222 K07750 K07748 (K09827,K13373) K09828 K01824 K00227 K00213 +['K01852', 'K05917', 'K00222', 'K07750', 'K07748', '(K09827,K13373)', 'K09828', 'K01824', 'K00227', 'K00213'] +== +['K01852'] +['K05917'] +['K00222'] +['K07750'] +['K07748'] +['K09827', 'K13373'] +['K09828'] +['K01824'] +['K00227'] +['K00213'] +++++++++++++++++++ +M00102 +K00559 K09829 K00227 K09831 K00223 +['K00559', 'K09829', 'K00227', 'K09831', 'K00223'] +== +['K00559'] +['K09829'] +['K00227'] +['K09831'] +['K00223'] +++++++++++++++++++ +M00103 +K07419 K07438 +['K07419', 'K07438'] +== +['K07419'] +['K07438'] +++++++++++++++++++ +M00104 +K00489 K12408 K07431 K00251 K00037 K00488 K08748 K01796 K10214 K12405 K08764 K11992 +['K00489', 'K12408', 'K07431', 'K00251', 'K00037', 'K00488', 'K08748', 'K01796', 'K10214', 'K12405', 'K08764', 'K11992'] +== +['K00489'] +['K12408'] +['K07431'] +['K00251'] +['K00037'] +['K00488'] +['K08748'] +['K01796'] +['K10214'] +['K12405'] +['K08764'] +['K11992'] +++++++++++++++++++ +M00106 +K08748 K00659 +['K08748', 'K00659'] +== +['K08748'] +['K00659'] +++++++++++++++++++ +M00862 +K10214 K12405 K08764 +['K10214', 'K12405', 'K08764'] +== +['K10214'] +['K12405'] +['K08764'] +++++++++++++++++++ +M00107 +K00498 K00070 +['K00498', 'K00070'] +== +['K00498'] +['K00070'] +++++++++++++++++++ +M00108 +K00513 (K00497,K07433) K07433 +['K00513', '(K00497,K07433)', 'K07433'] +== +['K00513'] +['K00497', 'K07433'] +['K07433'] +++++++++++++++++++ +M00109 +K00512 K00513 K00497 (K15680,K00071) +['K00512', 'K00513', 'K00497', '(K15680,K00071)'] +== +['K00512'] +['K00513'] +['K00497'] +['K15680', 'K00071'] +++++++++++++++++++ +M00110 +K00512 K00070 K07434 +['K00512', 'K00070', 'K07434'] +== +['K00512'] +['K00070'] +['K07434'] +++++++++++++++++++ +M00089 +(K00629,K13506,K13507,K00630,K13508) (K00655,K13509,K13523,K19007,K13513,K13517,K13519,K14674,K22831) (K01080,K15728,K18693) (K11155,K11160,K14456,K22848,K22849) +['(K00629,K13506,K13507,K00630,K13508)', '(K00655,K13509,K13523,K19007,K13513,K13517,K13519,K14674,K22831)', '(K01080,K15728,K18693)', '(K11155,K11160,K14456,K22848,K22849)'] +== +['K00629', 'K13506', 'K13507', 'K00630', 'K13508'] +['K00655', 'K13509', 'K13523', 'K19007', 'K13513', 'K13517', 'K13519', 'K14674', 'K22831'] +['K01080', 'K15728', 'K18693'] +['K11155', 'K11160', 'K14456', 'K22848', 'K22849'] +++++++++++++++++++ +M00098 
+(K01046,K12298,K16816,K13534,K14073,K14074,K14075,K14076,K22283,K14452,K22284,K14674,K14675,K17900) K01054 +['(K01046,K12298,K16816,K13534,K14073,K14074,K14075,K14076,K22283,K14452,K22284,K14674,K14675,K17900)', 'K01054'] +== +['K01046', 'K12298', 'K16816', 'K13534', 'K14073', 'K14074', 'K14075', 'K14076', 'K22283', 'K14452', 'K22284', 'K14674', 'K14675', 'K17900'] +['K01054'] +++++++++++++++++++ +M00090 +(K00866,K14156) K00968 (K00994,K13644) +['(K00866,K14156)', 'K00968', '(K00994,K13644)'] +== +['K00866', 'K14156'] +['K00968'] +['K00994', 'K13644'] +++++++++++++++++++ +M00091 +K00551,(K16369 K00550),K00570 +['K00551,(K16369_K00550),K00570'] +== +['K00551', 'K00570', 'K16369_K00550'] +++++++++++++++++++ +M00092 +(K00894,K14156) K00967 (K00993,K13644) +['(K00894,K14156)', 'K00967', '(K00993,K13644)'] +== +['K00894', 'K14156'] +['K00967'] +['K00993', 'K13644'] +++++++++++++++++++ +M00093 +K00981 (K00998,K17103) K01613 +['K00981', '(K00998,K17103)', 'K01613'] +== +['K00981'] +['K00998', 'K17103'] +['K01613'] +++++++++++++++++++ +M00094 +K00654 K04708 (K04709,K04710,K23727) K04712 +['K00654', 'K04708', '(K04709,K04710,K23727)', 'K04712'] +== +['K00654'] +['K04708'] +['K04709', 'K04710', 'K23727'] +['K04712'] +++++++++++++++++++ +M00066 +K00720 K07553 +['K00720', 'K07553'] +== +['K00720'] +['K07553'] +++++++++++++++++++ +M00067 +K04628 K01019 +['K04628', 'K01019'] +== +['K04628'] +['K01019'] +++++++++++++++++++ +M00099 +K00654 K04708 (K04709,K04710,K23727) K04712 (K01441,K12348,K12349) +['K00654', 'K04708', '(K04709,K04710,K23727)', 'K04712', '(K01441,K12348,K12349)'] +== +['K00654'] +['K04708'] +['K04709', 'K04710', 'K23727'] +['K04712'] +['K01441', 'K12348', 'K12349'] +++++++++++++++++++ +M00100 +K04718 K01634 +['K04718', 'K01634'] +== +['K04718'] +['K01634'] +++++++++++++++++++ +M00113 +K00454 K01723 K10525 K05894 K10526 K00232 K10527 K07513 -- +['K00454', 'K01723', 'K10525', 'K05894', 'K10526', 'K00232', 'K10527', 'K07513'] +== +['K00454'] +['K01723'] +['K10525'] +['K05894'] +['K10526'] +['K00232'] +['K10527'] +['K07513'] +++++++++++++++++++ +M00049 +K01939 K01756 (K00939,K18532,K18533,K00944) (K00940,K00873,K12406) +['K01939', 'K01756', '(K00939,K18532,K18533,K00944)', '(K00940,K00873,K12406)'] +== +['K01939'] +['K01756'] +['K00939', 'K18532', 'K18533', 'K00944'] +['K00940', 'K00873', 'K12406'] +++++++++++++++++++ +M00050 +K00088 K01951 K00942 (K00940,K18533,K00873,K12406) +['K00088', 'K01951', 'K00942', '(K00940,K18533,K00873,K12406)'] +== +['K00088'] +['K01951'] +['K00942'] +['K00940', 'K18533', 'K00873', 'K12406'] +++++++++++++++++++ +M00546 +(K00106,K00087+K13479+K13480,K13481+K13482,K11177+K11178+K13483) (K00365,K16838,K16839,K22879) (K13484,K07127 (K13485,K16838,K16840)) (K01466,K16842) K01477 +['(K00106,K00087+K13479+K13480,K13481+K13482,K11177+K11178+K13483)', '(K00365,K16838,K16839,K22879)', '(K13484,K07127_(K13485,K16838,K16840))', '(K01466,K16842)', 'K01477'] +== +['K00106', 'K13481+K13482', 'K00087+K13479+K13480', 'K11177+K11178+K13483'] +['K00365', 'K16838', 'K16839', 'K22879'] +['K13484', 'K07127_K13485,K16838,K16840'] +['K01466', 'K16842'] +['K01477'] +++++++++++++++++++ +M00051 +(K11540,(K11541 K01465),((K01954,K01955+K01956) (K00609+K00610,K00608) K01465)) (K00226,K00254,K17828) (K13421,K00762 K01591) +['(K11540,(K11541_K01465),((K01954,K01955+K01956)_(K00609+K00610,K00608)_K01465))', '(K00226,K00254,K17828)', '(K13421,K00762_K01591)'] +== +['K11540', 'K11541_K01465', 'K01954,K01955+K01956_K00609+K00610,K00608_K01465'] +['K00226', 'K00254', 'K17828'] +['K13421', 
'K00762_K01591'] +++++++++++++++++++ +M00052 +(K13800,K13809,K09903) (K00940,K18533) K01937 +['(K13800,K13809,K09903)', '(K00940,K18533)', 'K01937'] +== +['K13800', 'K13809', 'K09903'] +['K00940', 'K18533'] +['K01937'] +++++++++++++++++++ +M00053 +(K00524,K00525+K00526,K10807+K10808) (K00940,K18533) (K00527,K21636) K01494 K01520 (K00560,K13998) K00943 K00940 +['(K00524,K00525+K00526,K10807+K10808)', '(K00940,K18533)', '(K00527,K21636)', 'K01494', 'K01520', '(K00560,K13998)', 'K00943', 'K00940'] +== +['K00524', 'K00525+K00526', 'K10807+K10808'] +['K00940', 'K18533'] +['K00527', 'K21636'] +['K01494'] +['K01520'] +['K00560', 'K13998'] +['K00943'] +['K00940'] +++++++++++++++++++ +M00046 +(K00207,K17722+K17723) K01464 (K01431,K06016) +['(K00207,K17722+K17723)', 'K01464', '(K01431,K06016)'] +== +['K00207', 'K17722+K17723'] +['K01464'] +['K01431', 'K06016'] +++++++++++++++++++ +M00020 +K00058 K00831 (K01079,K02203,K22305) +['K00058', 'K00831', '(K01079,K02203,K22305)'] +== +['K00058'] +['K00831'] +['K01079', 'K02203', 'K22305'] +++++++++++++++++++ +M00018 +(K00928,K12524,K12525,K12526) K00133 (K00003,K12524,K12525) (K00872,K02204,K02203) K01733 +['(K00928,K12524,K12525,K12526)', 'K00133', '(K00003,K12524,K12525)', '(K00872,K02204,K02203)', 'K01733'] +== +['K00928', 'K12524', 'K12525', 'K12526'] +['K00133'] +['K00003', 'K12524', 'K12525'] +['K00872', 'K02204', 'K02203'] +['K01733'] +++++++++++++++++++ +M00555 +(K17755,((K00108,K11440,K00499) (K00130,K14085))) +['(K17755,((K00108,K11440,K00499)_(K00130,K14085)))'] +== +['K17755', 'K00108,K11440,K00499_K00130,K14085'] +++++++++++++++++++ +M00033 +K00928 K00133 K00836 K06718 K06720 +['K00928', 'K00133', 'K00836', 'K06718', 'K06720'] +== +['K00928'] +['K00133'] +['K00836'] +['K06718'] +['K06720'] +++++++++++++++++++ +M00021 +(K00640,K23304) (K01738,K13034,K17069) +['(K00640,K23304)', '(K01738,K13034,K17069)'] +== +['K00640', 'K23304'] +['K01738', 'K13034', 'K17069'] +++++++++++++++++++ +M00338 +(K01697,K10150) K01758 +['(K01697,K10150)', 'K01758'] +== +['K01697', 'K10150'] +['K01758'] +++++++++++++++++++ +M00609 +K00789 K17462 K01243 K07173 K17216 K17217 +['K00789', 'K17462', 'K01243', 'K07173', 'K17216', 'K17217'] +== +['K00789'] +['K17462'] +['K01243'] +['K07173'] +['K17216'] +['K17217'] +++++++++++++++++++ +M00017 +(K00928,K12524,K12525) K00133 (K00003,K12524,K12525) (K00651,K00641) K01739 (K01760,K14155) (K00548,K24042,K00549) +['(K00928,K12524,K12525)', 'K00133', '(K00003,K12524,K12525)', '(K00651,K00641)', 'K01739', '(K01760,K14155)', '(K00548,K24042,K00549)'] +== +['K00928', 'K12524', 'K12525'] +['K00133'] +['K00003', 'K12524', 'K12525'] +['K00651', 'K00641'] +['K01739'] +['K01760', 'K14155'] +['K00548', 'K24042', 'K00549'] +++++++++++++++++++ +M00034 +K00789 K01611 K00797 ((K01243,K01244) K00899,K00772) K08963 (K16054,K08964 (K09880,K08965 K08966)) K08967 (K00815,K08969,K23977,K00832,K00838) +['K00789', 'K01611', 'K00797', '((K01243,K01244)_K00899,K00772)', 'K08963', '(K16054,K08964_(K09880,K08965_K08966))', 'K08967', '(K00815,K08969,K23977,K00832,K00838)'] +== +['K00789'] +['K01611'] +['K00797'] +['K00772', 'K01243,K01244_K00899'] +['K08963'] +['K16054', 'K08964_K09880,K08965_K08966'] +['K08967'] +['K00815', 'K08969', 'K23977', 'K00832', 'K00838'] +++++++++++++++++++ +M00035 +K00789 (K00558,K17398,K17399) K01251 (K01697,K10150) +['K00789', '(K00558,K17398,K17399)', 'K01251', '(K01697,K10150)'] +== +['K00789'] +['K00558', 'K17398', 'K17399'] +['K01251'] +['K01697', 'K10150'] +++++++++++++++++++ +M00368 +K00789 (K01762,K20772) K05933 
+['K00789', '(K01762,K20772)', 'K05933'] +== +['K00789'] +['K01762', 'K20772'] +['K05933'] +++++++++++++++++++ +M00019 +K01652+(K01653,K11258) K00053 K01687 K00826 +['K01652+(K01653,K11258)', 'K00053', 'K01687', 'K00826'] +== +['K11258', 'K01652+K01653'] +['K00053'] +['K01687'] +['K00826'] +++++++++++++++++++ +M00535 +K09011 K01703+K01704 K00052 +['K09011', 'K01703+K01704', 'K00052'] +== +['K09011'] +['K01703+K01704'] +['K00052'] +++++++++++++++++++ +M00570 +(K17989,K01754) K01652+(K01653,K11258) K00053 K01687 K00826 +['(K17989,K01754)', 'K01652+(K01653,K11258)', 'K00053', 'K01687', 'K00826'] +== +['K17989', 'K01754'] +['K11258', 'K01652+K01653'] +['K00053'] +['K01687'] +['K00826'] +++++++++++++++++++ +M00432 +K01649 (K01702,K01703+K01704) K00052 +['K01649', '(K01702,K01703+K01704)', 'K00052'] +== +['K01649'] +['K01702', 'K01703+K01704'] +['K00052'] +++++++++++++++++++ +M00036 +K00826 ((K00166+K00167,K11381)+K09699+K00382) (K00253,K00249) (K01968+K01969) (K05607,K13766) K01640 +['K00826', '((K00166+K00167,K11381)+K09699+K00382)', '(K00253,K00249)', '(K01968+K01969)', '(K05607,K13766)', 'K01640'] +== +['K00826'] +['K00166+K00167,K11381+K09699+K00382'] +['K00253', 'K00249'] +['K01968+K01969'] +['K05607', 'K13766'] +['K01640'] +++++++++++++++++++ +M00016 +(K00928,K12524,K12525,K12526) K00133 K01714 K00215 K00674 (K00821,K14267) K01439 K01778 (K01586,K12526) +['(K00928,K12524,K12525,K12526)', 'K00133', 'K01714', 'K00215', 'K00674', '(K00821,K14267)', 'K01439', 'K01778', '(K01586,K12526)'] +== +['K00928', 'K12524', 'K12525', 'K12526'] +['K00133'] +['K01714'] +['K00215'] +['K00674'] +['K00821', 'K14267'] +['K01439'] +['K01778'] +['K01586', 'K12526'] +++++++++++++++++++ +M00525 +K00928 K00133 K01714 K00215 K05822 K00841 K05823 K01778 K01586 +['K00928', 'K00133', 'K01714', 'K00215', 'K05822', 'K00841', 'K05823', 'K01778', 'K01586'] +== +['K00928'] +['K00133'] +['K01714'] +['K00215'] +['K05822'] +['K00841'] +['K05823'] +['K01778'] +['K01586'] +++++++++++++++++++ +M00526 +(K00928,K12524,K12525,K12526) K00133 K01714 K00215 K03340 (K01586,K12526) +['(K00928,K12524,K12525,K12526)', 'K00133', 'K01714', 'K00215', 'K03340', '(K01586,K12526)'] +== +['K00928', 'K12524', 'K12525', 'K12526'] +['K00133'] +['K01714'] +['K00215'] +['K03340'] +['K01586', 'K12526'] +++++++++++++++++++ +M00527 +(K00928,K12524,K12525,K12526) K00133 K01714 K00215 K10206 K01778 (K01586,K12526) +['(K00928,K12524,K12525,K12526)', 'K00133', 'K01714', 'K00215', 'K10206', 'K01778', '(K01586,K12526)'] +== +['K00928', 'K12524', 'K12525', 'K12526'] +['K00133'] +['K01714'] +['K00215'] +['K10206'] +['K01778'] +['K01586', 'K12526'] +++++++++++++++++++ +M00030 +K01655 K17450 K01705 K05824 K00838 K00143 (K00293,K24034) K00290 +['K01655', 'K17450', 'K01705', 'K05824', 'K00838', 'K00143', '(K00293,K24034)', 'K00290'] +== +['K01655'] +['K17450'] +['K01705'] +['K05824'] +['K00838'] +['K00143'] +['K00293', 'K24034'] +['K00290'] +++++++++++++++++++ +M00433 +K01655 (K17450 K01705,K16792+K16793) K05824 +['K01655', '(K17450_K01705,K16792+K16793)', 'K05824'] +== +['K01655'] +['K17450_K01705', 'K16792+K16793'] +['K05824'] +++++++++++++++++++ +M00032 +K14157 K14085 K00825 (K15791+K00658+K00382) K00252 (K07514,(K07515,K07511) K00022) +['K14157', 'K14085', 'K00825', '(K15791+K00658+K00382)', 'K00252', '(K07514,(K07515,K07511)_K00022)'] +== +['K14157'] +['K14085'] +['K00825'] +['K15791+K00658+K00382'] +['K00252'] +['K07514', 'K07515,K07511_K00022'] +++++++++++++++++++ +M00028 +(K00618,K00619,K14681,K14682,K00620,K22477,K22478) ((K00930,K22478) K00145,K12659) 
(K00818,K00821) (K01438,K14677,K00620) +['(K00618,K00619,K14681,K14682,K00620,K22477,K22478)', '((K00930,K22478)_K00145,K12659)', '(K00818,K00821)', '(K01438,K14677,K00620)'] +== +['K00618', 'K00619', 'K14681', 'K14682', 'K00620', 'K22477', 'K22478'] +['K12659', 'K00930,K22478_K00145'] +['K00818', 'K00821'] +['K01438', 'K14677', 'K00620'] +++++++++++++++++++ +M00844 +K00611 K01940 (K01755,K14681) +['K00611', 'K01940', '(K01755,K14681)'] +== +['K00611'] +['K01940'] +['K01755', 'K14681'] +++++++++++++++++++ +M00845 +K22478 K00145 K00821 K09065 K01438 K01940 K01755 +['K22478', 'K00145', 'K00821', 'K09065', 'K01438', 'K01940', 'K01755'] +== +['K22478'] +['K00145'] +['K00821'] +['K09065'] +['K01438'] +['K01940'] +['K01755'] +++++++++++++++++++ +M00029 +K01948 K00611 K01940 (K01755,K14681) K01476 +['K01948', 'K00611', 'K01940', '(K01755,K14681)', 'K01476'] +== +['K01948'] +['K00611'] +['K01940'] +['K01755', 'K14681'] +['K01476'] +++++++++++++++++++ +M00015 +((K00931 K00147),K12657) K00286 +['((K00931_K00147),K12657)', 'K00286'] +== +['K12657', 'K00931_K00147'] +['K00286'] +++++++++++++++++++ +M00047 +K00613 K00542 K00933 +['K00613', 'K00542', 'K00933'] +== +['K00613'] +['K00542'] +['K00933'] +++++++++++++++++++ +M00879 +K00673 K01484 K00840 K06447 K05526 +['K00673', 'K01484', 'K00840', 'K06447', 'K05526'] +== +['K00673'] +['K01484'] +['K00840'] +['K06447'] +['K05526'] +++++++++++++++++++ +M00134 +K01476 K01581 +['K01476', 'K01581'] +== +['K01476'] +['K01581'] +++++++++++++++++++ +M00135 +K00657 K00274 (K00128,K14085,K00149) -- +['K00657', 'K00274', '(K00128,K14085,K00149)'] +== +['K00657'] +['K00274'] +['K00128', 'K14085', 'K00149'] +++++++++++++++++++ +M00136 +K09470 K09471 K09472 K09473 +['K09470', 'K09471', 'K09472', 'K09473'] +== +['K09470'] +['K09471'] +['K09472'] +['K09473'] +++++++++++++++++++ +M00026 +(K00765-K02502) (K01523 K01496,K11755,K14152) (K01814,K24017) (K02501+K02500,K01663) ((K01693 K00817 (K04486,K05602,K18649)),(K01089 K00817)) (K00013,K14152) +['(K00765-K02502)', '(K01523_K01496,K11755,K14152)', '(K01814,K24017)', '(K02501+K02500,K01663)', '((K01693_K00817_(K04486,K05602,K18649)),(K01089_K00817))', '(K00013,K14152)'] +== +['K00765-K02502'] +['K11755', 'K14152', 'K01523_K01496'] +['K01814', 'K24017'] +['K01663', 'K02501+K02500'] +['K01089_K00817', 'K01693_K00817_K04486,K05602,K18649'] +['K00013', 'K14152'] +++++++++++++++++++ +M00045 +K01745 K01712 K01468 (K01479,K00603,K13990,(K05603 K01458)) +['K01745', 'K01712', 'K01468', '(K01479,K00603,K13990,(K05603_K01458))'] +== +['K01745'] +['K01712'] +['K01468'] +['K01479', 'K00603', 'K13990', 'K05603_K01458'] +++++++++++++++++++ +M00022 +(K01626,K03856,K13853) (((K01735,K13829) ((K03785,K03786) K00014,K13832)),K13830) ((K00891,K13829) (K00800,K24018),K13830) K01736 +['(K01626,K03856,K13853)', '(((K01735,K13829)_((K03785,K03786)_K00014,K13832)),K13830)', '((K00891,K13829)_(K00800,K24018),K13830)', 'K01736'] +== +['K01626', 'K03856', 'K13853'] +['K13830', 'K01735,K13829_K03785,K03786_K00014,K13832'] +['K13830', 'K00891,K13829_K00800,K24018'] +['K01736'] +++++++++++++++++++ +M00023 +(((K01657+K01658,K13503,K13501,K01656) K00766),K13497) (((K01817,K24017) (K01656,K01609)),K13498,K13501) (K01695+(K01696,K06001),K01694) +['(((K01657+K01658,K13503,K13501,K01656)_K00766),K13497)', '(((K01817,K24017)_(K01656,K01609)),K13498,K13501)', '(K01695+(K01696,K06001),K01694)'] +== +['K13497', 'K01657+K01658,K13503,K13501,K01656_K00766'] +['K13498', 'K13501', 'K01817,K24017_K01656,K01609'] +['K01694', 'K01695+K01696,K06001'] +++++++++++++++++++ 
+M00024 +((K01850,K04092,K14187,K04093,K04516,K06208,K06209,K13853) (K01713,K04518,K05359),K14170) (K00832,K00838) +['((K01850,K04092,K14187,K04093,K04516,K06208,K06209,K13853)_(K01713,K04518,K05359),K14170)', '(K00832,K00838)'] +== +['K14170', 'K01850,K04092,K14187,K04093,K04516,K06208,K06209,K13853_K01713,K04518,K05359'] +['K00832', 'K00838'] +++++++++++++++++++ +M00025 +(((K01850,K04092,K14170,K04093,K04516,K06208,K06209,K13853) K04517),K14187) (K00815,K00832,K00838) +['(((K01850,K04092,K14170,K04093,K04516,K06208,K06209,K13853)_K04517),K14187)', '(K00815,K00832,K00838)'] +== +['K14187', 'K01850,K04092,K14170,K04093,K04516,K06208,K06209,K13853_K04517'] +['K00815', 'K00832', 'K00838'] +++++++++++++++++++ +M00040 +(K00832,K15849) (K00220,K24018,K15226,K15227) +['(K00832,K15849)', '(K00220,K24018,K15226,K15227)'] +== +['K00832', 'K15849'] +['K00220', 'K24018', 'K15226', 'K15227'] +++++++++++++++++++ +M00042 +(K00505,K00501) (K01592,K01593) K00503 K00553 +['(K00505,K00501)', '(K01592,K01593)', 'K00503', 'K00553'] +== +['K00505', 'K00501'] +['K01592', 'K01593'] +['K00503'] +['K00553'] +++++++++++++++++++ +M00043 +K00431 +['K00431'] +== +['K00431'] +++++++++++++++++++ +M00044 +(K00815,K00838,K03334) K00457 K00451 K01800 (K01555,K16171) +['(K00815,K00838,K03334)', 'K00457', 'K00451', 'K01800', '(K01555,K16171)'] +== +['K00815', 'K00838', 'K03334'] +['K00457'] +['K00451'] +['K01800'] +['K01555', 'K16171'] +++++++++++++++++++ +M00533 +K00455 K00151 K01826 K05921 +['K00455', 'K00151', 'K01826', 'K05921'] +== +['K00455'] +['K00151'] +['K01826'] +['K05921'] +++++++++++++++++++ +M00545 +(((K05708+K05709+K05710+K00529) K05711),K05712) K05713 K05714 K02554 K01666 K04073 +['(((K05708+K05709+K05710+K00529)_K05711),K05712)', 'K05713', 'K05714', 'K02554', 'K01666', 'K04073'] +== +['K05712', 'K05708+K05709+K05710+K00529_K05711'] +['K05713'] +['K05714'] +['K02554'] +['K01666'] +['K04073'] +++++++++++++++++++ +M00037 +K00502 K01593 K00669 K00543 +['K00502', 'K01593', 'K00669', 'K00543'] +== +['K00502'] +['K01593'] +['K00669'] +['K00543'] +++++++++++++++++++ +M00038 +(K00453,K00463) (K01432,K14263,K07130) K00486 K01556 K00452 K03392 (K10217,K23234) +['(K00453,K00463)', '(K01432,K14263,K07130)', 'K00486', 'K01556', 'K00452', 'K03392', '(K10217,K23234)'] +== +['K00453', 'K00463'] +['K01432', 'K14263', 'K07130'] +['K00486'] +['K01556'] +['K00452'] +['K03392'] +['K10217', 'K23234'] +++++++++++++++++++ +M00027 +K01580 (K13524,K07250,K00823,K16871) (K00135,K00139,K17761) +['K01580', '(K13524,K07250,K00823,K16871)', '(K00135,K00139,K17761)'] +== +['K01580'] +['K13524', 'K07250', 'K00823', 'K16871'] +['K00135', 'K00139', 'K17761'] +++++++++++++++++++ +M00369 +K13027 K13029 K13030 +['K13027', 'K13029', 'K13030'] +== +['K13027'] +['K13029'] +['K13030'] +++++++++++++++++++ +M00118 +(K11204+K11205,K01919) (K21456,K01920) +['(K11204+K11205,K01919)', '(K21456,K01920)'] +== +['K01919', 'K11204+K11205'] +['K21456', 'K01920'] +++++++++++++++++++ +M00055 +K01001 (K07432+K07441) K03842 K03843 K03844 K03845 K03846 K03847 K03846 K00729 K03848 K03849 K03850 +['K01001', '(K07432+K07441)', 'K03842', 'K03843', 'K03844', 'K03845', 'K03846', 'K03847', 'K03846', 'K00729', 'K03848', 'K03849', 'K03850'] +== +['K01001'] +['K07432+K07441'] +['K03842'] +['K03843'] +['K03844'] +['K03845'] +['K03846'] +['K03847'] +['K03846'] +['K00729'] +['K03848'] +['K03849'] +['K03850'] +++++++++++++++++++ +M00072 +K07151+K12666+K12667+K12668+K12669+K12670-K00730-K12691 +['K07151+K12666+K12667+K12668+K12669+K12670-K00730-K12691'] +== 
+['K07151+K12666+K12667+K12668+K12669+K12670-K00730-K12691'] +++++++++++++++++++ +M00073 +K01228 K05546 K23741 K01230 +['K01228', 'K05546', 'K23741', 'K01230'] +== +['K01228'] +['K05546'] +['K23741'] +['K01230'] +++++++++++++++++++ +M00074 +K05546 K23741 K01230 K05528 K05529+K05530 K05529+K05531+K05532+K05533+K05534 K05535 +['K05546', 'K23741', 'K01230', 'K05528', 'K05529+K05530', 'K05529+K05531+K05532+K05533+K05534', 'K05535'] +== +['K05546'] +['K23741'] +['K01230'] +['K05528'] +['K05529+K05530'] +['K05529+K05531+K05532+K05533+K05534'] +['K05535'] +++++++++++++++++++ +M00056 +K00710 (K00731,K09653) (K00727,K09662,K09663) K00739 +['K00710', '(K00731,K09653)', '(K00727,K09662,K09663)', 'K00739'] +== +['K00710'] +['K00731', 'K09653'] +['K00727', 'K09662', 'K09663'] +['K00739'] +++++++++++++++++++ +M00065 +(K03857+K03859+K03858+K03861+K03860+(K11001,K11002)-K09658) K03434 K05283 (K05284+K07541) K07542 K05285 K05286 (K05288+K05287) +['(K03857+K03859+K03858+K03861+K03860+(K11001,K11002)-K09658)', 'K03434', 'K05283', '(K05284+K07541)', 'K07542', 'K05285', 'K05286', '(K05288+K05287)'] +== +['K03857+K03859+K03858+K03861+K03860+K11001,K11002-K09658'] +['K03434'] +['K05283'] +['K05284+K07541'] +['K07542'] +['K05285'] +['K05286'] +['K05288+K05287'] +++++++++++++++++++ +M00070 +K03766 (K07819,K07820,K03877) +['K03766', '(K07819,K07820,K03877)'] +== +['K03766'] +['K07819', 'K07820', 'K03877'] +++++++++++++++++++ +M00071 +K03766 (K07966,K07967,K07968,K07969) +['K03766', '(K07966,K07967,K07968,K07969)'] +== +['K03766'] +['K07966', 'K07967', 'K07968', 'K07969'] +++++++++++++++++++ +M00068 +K01988 K00719 +['K01988', 'K00719'] +== +['K01988'] +['K00719'] +++++++++++++++++++ +M00069 +K03370 (K03371,K03369) +['K03370', '(K03371,K03369)'] +== +['K03370'] +['K03371', 'K03369'] +++++++++++++++++++ +M00057 +K00771 K00733 K00734 K10158 +['K00771', 'K00733', 'K00734', 'K10158'] +== +['K00771'] +['K00733'] +['K00734'] +['K10158'] +++++++++++++++++++ +M00058 +K00746 (K13499,K00747,K03419) +['K00746', '(K13499,K00747,K03419)'] +== +['K00746'] +['K13499', 'K00747', 'K03419'] +++++++++++++++++++ +M00059 +(K02369,K02370) (K02366,K02367) (K02368,K02370) (K02576,K02577,K02578,K02579) K01793 +['(K02369,K02370)', '(K02366,K02367)', '(K02368,K02370)', '(K02576,K02577,K02578,K02579)', 'K01793'] +== +['K02369', 'K02370'] +['K02366', 'K02367'] +['K02368', 'K02370'] +['K02576', 'K02577', 'K02578', 'K02579'] +['K01793'] +++++++++++++++++++ +M00076 +-- K01136 K01217 K01135 K01197 K01195 +['K01136', 'K01217', 'K01135', 'K01197', 'K01195'] +== +['K01136'] +['K01217'] +['K01135'] +['K01197'] +['K01195'] +++++++++++++++++++ +M00077 +K01135 K01197 K01195 K01132 +['K01135', 'K01197', 'K01195', 'K01132'] +== +['K01135'] +['K01197'] +['K01195'] +['K01132'] +++++++++++++++++++ +M00078 +(K07964,K07965) K01136 K01217 K01565 K10532 K01205 -- K01195 K01137 +['(K07964,K07965)', 'K01136', 'K01217', 'K01565', 'K10532', 'K01205', 'K01195', 'K01137'] +== +['K07964', 'K07965'] +['K01136'] +['K01217'] +['K01565'] +['K10532'] +['K01205'] +['K01195'] +['K01137'] +++++++++++++++++++ +M00079 +-- K01132 K12309 K01137 K12373 +['K01132', 'K12309', 'K01137', 'K12373'] +== +['K01132'] +['K12309'] +['K01137'] +['K12373'] +++++++++++++++++++ +M00060 +K00677 K02535 K02536 K03269 K00748 K00912 K02527 K02517 K02560 +['K00677', 'K02535', 'K02536', 'K03269', 'K00748', 'K00912', 'K02527', 'K02517', 'K02560'] +== +['K00677'] +['K02535'] +['K02536'] +['K03269'] +['K00748'] +['K00912'] +['K02527'] +['K02517'] +['K02560'] +++++++++++++++++++ +M00866 +K00677 
(K02535,K16363) K02536 K03269 K00748 K00912 K02527 K02517 K09778 +['K00677', '(K02535,K16363)', 'K02536', 'K03269', 'K00748', 'K00912', 'K02527', 'K02517', 'K09778'] +== +['K00677'] +['K02535', 'K16363'] +['K02536'] +['K03269'] +['K00748'] +['K00912'] +['K02527'] +['K02517'] +['K09778'] +++++++++++++++++++ +M00867 +K12977 K03760 K23082+K23083 K23159 K09953 +['K12977', 'K03760', 'K23082+K23083', 'K23159', 'K09953'] +== +['K12977'] +['K03760'] +['K23082+K23083'] +['K23159'] +['K09953'] +++++++++++++++++++ +M00063 +K06041 K01627 K03270 K00979 +['K06041', 'K01627', 'K03270', 'K00979'] +== +['K06041'] +['K01627'] +['K03270'] +['K00979'] +++++++++++++++++++ +M00064 +K03271 (K03272,K21344) K03273 (K03272,K21345) K03274 +['K03271', '(K03272,K21344)', 'K03273', '(K03272,K21345)', 'K03274'] +== +['K03271'] +['K03272', 'K21344'] +['K03273'] +['K03272', 'K21345'] +['K03274'] +++++++++++++++++++ +M00127 +K03147 (K00877,K00941,K14153)(K00878,K14154)(K00788,K14153,K14154) K00946 +['K03147', '(K00877,K00941,K14153)(K00878,K14154)(K00788,K14153,K14154)', 'K00946'] +== +['K03147'] +['K00877', 'K00941', 'K14153', 'K14154', 'K14153K00878', 'K14154K00788'] +['K00946'] +++++++++++++++++++ +M00124 +K03472 K03473 K00831 K00097 K03474 (K00275,K23998) +['K03472', 'K03473', 'K00831', 'K00097', 'K03474', '(K00275,K23998)'] +== +['K03472'] +['K03473'] +['K00831'] +['K00097'] +['K03474'] +['K00275', 'K23998'] +++++++++++++++++++ +M00115 +K00278 K03517 K00767 (K00969,K06210) (K01916,K01950) +['K00278', 'K03517', 'K00767', '(K00969,K06210)', '(K01916,K01950)'] +== +['K00278'] +['K03517'] +['K00767'] +['K00969', 'K06210'] +['K01916', 'K01950'] +++++++++++++++++++ +M00810 +K19818+K19819+K19820 (K19826,K19890) K19185+K19186+K19187 K19188 -K20155 +['K19818+K19819+K19820', '(K19826,K19890)', 'K19185+K19186+K19187', 'K19188', '-K20155'] +== +['K19818+K19819+K19820'] +['K19826', 'K19890'] +['K19185+K19186+K19187'] +['K19188'] +['-K20155'] +++++++++++++++++++ +M00811 +K20170,K20169 (K20170,(K20158 K19700)) K20171-K20172 K15359,K18276 +['K20170,K20169', '(K20170,(K20158_K19700))', 'K20171-K20172', 'K15359,K18276'] +== +['K20170', 'K20169'] +['K20170', 'K20158_K19700'] +['K20171-K20172'] +['K15359', 'K18276'] +++++++++++++++++++ +M00622 +K18029+K18030 K14974 K18028 K15357 K13995 K01799 +['K18029+K18030', 'K14974', 'K18028', 'K15357', 'K13995', 'K01799'] +== +['K18029+K18030'] +['K14974'] +['K18028'] +['K15357'] +['K13995'] +['K01799'] +++++++++++++++++++ +M00120 +(K00867,K03525,K09680,K01947) ((K01922,K21977) K01598,K13038) (K02318,(K00954,K02201) K00859) +['(K00867,K03525,K09680,K01947)', '((K01922,K21977)_K01598,K13038)', '(K02318,(K00954,K02201)_K00859)'] +== +['K00867', 'K03525', 'K09680', 'K01947'] +['K13038', 'K01922,K21977_K01598'] +['K02318', 'K00954,K02201_K00859'] +++++++++++++++++++ +M00572 +K02169 (K00647,K09458) K00059 K02372 K00208 (K02170,K09789,K19560,K19561) +['K02169', '(K00647,K09458)', 'K00059', 'K02372', 'K00208', '(K02170,K09789,K19560,K19561)'] +== +['K02169'] +['K00647', 'K09458'] +['K00059'] +['K02372'] +['K00208'] +['K02170', 'K09789', 'K19560', 'K19561'] +++++++++++++++++++ +M00123 +K00652 ((K00833,K19563) K01935,K19562) K01012 +['K00652', '((K00833,K19563)_K01935,K19562)', 'K01012'] +== +['K00652'] +['K19562', 'K00833,K19563_K01935'] +['K01012'] +++++++++++++++++++ +M00573 +K16593 K00652 K19563 K01935 K01012 +['K16593', 'K00652', 'K19563', 'K01935', 'K01012'] +== +['K16593'] +['K00652'] +['K19563'] +['K01935'] +['K01012'] +++++++++++++++++++ +M00577 +K01906 K00652 (K00833,K19563) K01935 K01012 
+['K01906', 'K00652', '(K00833,K19563)', 'K01935', 'K01012'] +== +['K01906'] +['K00652'] +['K00833', 'K19563'] +['K01935'] +['K01012'] +++++++++++++++++++ +M00126 +(K01495,K09007,K22391) (K01077,K01113,(K08310,K19965)) ((K13939,(K13940,K01633 K00950) K00796),(K01633 K13941)) (K11754,K20457) (K00287,K13998) +['(K01495,K09007,K22391)', '(K01077,K01113,(K08310,K19965))', '((K13939,(K13940,K01633_K00950)_K00796),(K01633_K13941))', '(K11754,K20457)', '(K00287,K13998)'] +== +['K01495', 'K09007', 'K22391'] +['K01077', 'K01113', 'K08310,K19965'] +['K01633_K13941', 'K13939,K13940,K01633_K00950_K00796'] +['K11754', 'K20457'] +['K00287', 'K13998'] +++++++++++++++++++ +M00840 +K14652 K22100 -- K01633 K13941 K22099 K00287 +['K14652', 'K22100', 'K01633', 'K13941', 'K22099', 'K00287'] +== +['K14652'] +['K22100'] +['K01633'] +['K13941'] +['K22099'] +['K00287'] +++++++++++++++++++ +M00841 +K01495 K22101 K00950 K00796 K11754 K13998 +['K01495', 'K22101', 'K00950', 'K00796', 'K11754', 'K13998'] +== +['K01495'] +['K22101'] +['K00950'] +['K00796'] +['K11754'] +['K13998'] +++++++++++++++++++ +M00842 +K01495 K01737 K00072 +['K01495', 'K01737', 'K00072'] +== +['K01495'] +['K01737'] +['K00072'] +++++++++++++++++++ +M00843 +K01495 K01737 K17745 +['K01495', 'K01737', 'K17745'] +== +['K01495'] +['K01737'] +['K17745'] +++++++++++++++++++ +M00880 +((K03639 K03637),K20967) (K03635,K21142) (((K03831,K03638) K03750),K15376) +['((K03639_K03637),K20967)', '(K03635,K21142)', '(((K03831,K03638)_K03750),K15376)'] +== +['K20967', 'K03639_K03637'] +['K03635', 'K21142'] +['K15376', 'K03831,K03638_K03750'] +++++++++++++++++++ +M00140 +K00600 (K01491,(K00300 K01500)) K01938 +['K00600', '(K01491,(K00300_K01500))', 'K01938'] +== +['K00600'] +['K01491', 'K00300_K01500'] +['K01938'] +++++++++++++++++++ +M00141 +K00600 (K00288,(K13403 K13402)) +['K00600', '(K00288,(K13403_K13402))'] +== +['K00600'] +['K00288', 'K13403_K13402'] +++++++++++++++++++ +M00868 +K00643 K01698 K01749 K01719 K01599 K00228 K00231 K01772 +['K00643', 'K01698', 'K01749', 'K01719', 'K01599', 'K00228', 'K00231', 'K01772'] +== +['K00643'] +['K01698'] +['K01749'] +['K01719'] +['K01599'] +['K00228'] +['K00231'] +['K01772'] +++++++++++++++++++ +M00121 +(K01885,K14163) K02492 K01845 K01698 K01749 (K01719,K13542,K13543) K01599 (K00228,K02495) (K00230,K00231) K01772 +['(K01885,K14163)', 'K02492', 'K01845', 'K01698', 'K01749', '(K01719,K13542,K13543)', 'K01599', '(K00228,K02495)', '(K00230,K00231)', 'K01772'] +== +['K01885', 'K14163'] +['K02492'] +['K01845'] +['K01698'] +['K01749'] +['K01719', 'K13542', 'K13543'] +['K01599'] +['K00228', 'K02495'] +['K00230', 'K00231'] +['K01772'] +++++++++++++++++++ +M00846 +(K01885,K14163) K02492 K01845 K01698 K01749 (K01719,K13542,K13543) (K02302,(K00589,K02303,K02496,K13542,K13543)+K02304-K03794) +['(K01885,K14163)', 'K02492', 'K01845', 'K01698', 'K01749', '(K01719,K13542,K13543)', '(K02302,(K00589,K02303,K02496,K13542,K13543)+K02304-K03794)'] +== +['K01885', 'K14163'] +['K02492'] +['K01845'] +['K01698'] +['K01749'] +['K01719', 'K13542', 'K13543'] +['K02302', 'K00589,K02303,K02496,K13542,K13543+K02304-K03794'] +++++++++++++++++++ +M00847 +K22225 K22226 K22227 +['K22225', 'K22226', 'K22227'] +== +['K22225'] +['K22226'] +['K22227'] +++++++++++++++++++ +M00836 +K22011 K22012 (K21610+K21611) K21612 +['K22011', 'K22012', '(K21610+K21611)', 'K21612'] +== +['K22011'] +['K22012'] +['K21610+K21611'] +['K21612'] +++++++++++++++++++ +M00117 +(K03181,K18240) K03179 (K03182+K03186) K18800 K00568 K03185 K03183 K03184 K00568 +['(K03181,K18240)', 'K03179', 
'(K03182+K03186)', 'K18800', 'K00568', 'K03185', 'K03183', 'K03184', 'K00568'] +== +['K03181', 'K18240'] +['K03179'] +['K03182+K03186'] +['K18800'] +['K00568'] +['K03185'] +['K03183'] +['K03184'] +['K00568'] +++++++++++++++++++ +M00128 +K06125 K00591 K06126 K06127 K06134 K00591 +['K06125', 'K00591', 'K06126', 'K06127', 'K06134', 'K00591'] +== +['K06125'] +['K00591'] +['K06126'] +['K06127'] +['K06134'] +['K00591'] +++++++++++++++++++ +M00116 +K02552 K02551 K08680 K02549 K01911 K01661 K19222 K02548 K03183 +['K02552', 'K02551', 'K08680', 'K02549', 'K01911', 'K01661', 'K19222', 'K02548', 'K03183'] +== +['K02552'] +['K02551'] +['K08680'] +['K02549'] +['K01911'] +['K01661'] +['K19222'] +['K02548'] +['K03183'] +++++++++++++++++++ +M00112 +K09833 (K12502,K18534) K09834 K05928 +['K09833', '(K12502,K18534)', 'K09834', 'K05928'] +== +['K09833'] +['K12502', 'K18534'] +['K09834'] +['K05928'] +++++++++++++++++++ +M00095 +K00626 K01641 K00021 K00869 (K00938,K13273) K01597 K01823 +['K00626', 'K01641', 'K00021', 'K00869', '(K00938,K13273)', 'K01597', 'K01823'] +== +['K00626'] +['K01641'] +['K00021'] +['K00869'] +['K00938', 'K13273'] +['K01597'] +['K01823'] +++++++++++++++++++ +M00849 +K00626 K01641 (K00021,K00054) ((K00869 K17942),(K18689 K18690 K22813)) K06981 K01823 +['K00626', 'K01641', '(K00021,K00054)', '((K00869_K17942),(K18689_K18690_K22813))', 'K06981', 'K01823'] +== +['K00626'] +['K01641'] +['K00021', 'K00054'] +['K00869_K17942', 'K18689_K18690_K22813'] +['K06981'] +['K01823'] +++++++++++++++++++ +M00096 +K01662 K00099 (K00991,K12506) K00919 (K01770,K12506) K03526 K03527 K01823 +['K01662', 'K00099', '(K00991,K12506)', 'K00919', '(K01770,K12506)', 'K03526', 'K03527', 'K01823'] +== +['K01662'] +['K00099'] +['K00991', 'K12506'] +['K00919'] +['K01770', 'K12506'] +['K03526'] +['K03527'] +['K01823'] +++++++++++++++++++ +M00364 +K01823 (K00795,K13789,K13787) +['K01823', '(K00795,K13789,K13787)'] +== +['K01823'] +['K00795', 'K13789', 'K13787'] +++++++++++++++++++ +M00365 +K01823 K13787 +['K01823', 'K13787'] +== +['K01823'] +['K13787'] +++++++++++++++++++ +M00366 +K01823 K14066 K00787 K13789 +['K01823', 'K14066', 'K00787', 'K13789'] +== +['K01823'] +['K14066'] +['K00787'] +['K13789'] +++++++++++++++++++ +M00367 +K01823 K00787 K00804 +['K01823', 'K00787', 'K00804'] +== +['K01823'] +['K00787'] +['K00804'] +++++++++++++++++++ +M00097 +K02291 K02293 K15744 K00514 K09835 K06443 +['K02291', 'K02293', 'K15744', 'K00514', 'K09835', 'K06443'] +== +['K02291'] +['K02293'] +['K15744'] +['K00514'] +['K09835'] +['K06443'] +++++++++++++++++++ +M00372 +(K15746,K15747) K09838 -K14594 K09840 K09841 K09842 +['(K15746,K15747)', 'K09838', '-K14594', 'K09840', 'K09841', 'K09842'] +== +['K15746', 'K15747'] +['K09838'] +['-K14594'] +['K09840'] +['K09841'] +['K09842'] +++++++++++++++++++ +M00371 +(K09587,K12639) K09588 K09591 (K12637,K12638) K20623 (K09590,K12640) +['(K09587,K12639)', 'K09588', 'K09591', '(K12637,K12638)', 'K20623', '(K09590,K12640)'] +== +['K09587', 'K12639'] +['K09588'] +['K09591'] +['K12637', 'K12638'] +['K20623'] +['K09590', 'K12640'] +++++++++++++++++++ +M00773 +K15988 K15989+K15990 K15992 K15991 K15993 K15994 K15995 K15996 +['K15988', 'K15989+K15990', 'K15992', 'K15991', 'K15993', 'K15994', 'K15995', 'K15996'] +== +['K15988'] +['K15989+K15990'] +['K15992'] +['K15991'] +['K15993'] +['K15994'] +['K15995'] +['K15996'] +++++++++++++++++++ +M00774 +K10817 K14366 K14367 K14368+K15997 K14370 K14369 +['K10817', 'K14366', 'K14367', 'K14368+K15997', 'K14370', 'K14369'] +== +['K10817'] +['K14366'] +['K14367'] 
+['K14368+K15997'] +['K14370'] +['K14369'] +++++++++++++++++++ +M00775 +K16007 K16008 K16009 K13320 K16010 +['K16007', 'K16008', 'K16009', 'K13320', 'K16010'] +== +['K16007'] +['K16008'] +['K16009'] +['K13320'] +['K16010'] +++++++++++++++++++ +M00776 +K16000+K16001+K16002-K16003 K16004 K16005 K16006 +['K16000+K16001+K16002-K16003', 'K16004', 'K16005', 'K16006'] +== +['K16000+K16001+K16002-K16003'] +['K16004'] +['K16005'] +['K16006'] +++++++++++++++++++ +M00777 +K14371 K14372 K14373 K14374 K14375 +['K14371', 'K14372', 'K14373', 'K14374', 'K14375'] +== +['K14371'] +['K14372'] +['K14373'] +['K14374'] +['K14375'] +++++++++++++++++++ +M00824 +K15314 K21160+K21161+K21162+K21163+K21164+K21165+K21166+K21167 +['K15314', 'K21160+K21161+K21162+K21163+K21164+K21165+K21166+K21167'] +== +['K15314'] +['K21160+K21161+K21162+K21163+K21164+K21165+K21166+K21167'] +++++++++++++++++++ +M00825 +K15314 K21168+K21169+K21170+K21171+K21172+K21173+K21174 +['K15314', 'K21168+K21169+K21170+K21171+K21172+K21173+K21174'] +== +['K15314'] +['K21168+K21169+K21170+K21171+K21172+K21173+K21174'] +++++++++++++++++++ +M00826 +K20159+K21175 K20156 K21176 K21177 K21178 K21179 +['K20159+K21175', 'K20156', 'K21176', 'K21177', 'K21178', 'K21179'] +== +['K20159+K21175'] +['K20156'] +['K21176'] +['K21177'] +['K21178'] +['K21179'] +++++++++++++++++++ +M00829 +K15320 K21191 K21192 +['K15320', 'K21191', 'K21192'] +== +['K15320'] +['K21191'] +['K21192'] +++++++++++++++++++ +M00830 +K20422 K20420 K20421 K20423 +['K20422', 'K20420', 'K20421', 'K20423'] +== +['K20422'] +['K20420'] +['K20421'] +['K20423'] +++++++++++++++++++ +M00831 +K21221 K21222 K21223 K21224 K21225 +['K21221', 'K21222', 'K21223', 'K21224', 'K21225'] +== +['K21221'] +['K21222'] +['K21223'] +['K21224'] +['K21225'] +++++++++++++++++++ +M00834 +K21254 K21255 K21256 K21257 K21258 +['K21254', 'K21255', 'K21256', 'K21257', 'K21258'] +== +['K21254'] +['K21255'] +['K21256'] +['K21257'] +['K21258'] +++++++++++++++++++ +M00778 +K05551+K05552+K05553 -K12420 ((K05554,K14249,K15884,K15885) (K05555,K14250),K15886) +['K05551+K05552+K05553', '-K12420', '((K05554,K14249,K15884,K15885)_(K05555,K14250),K15886)'] +== +['K05551+K05552+K05553'] +['-K12420'] +['K15886', 'K05554,K14249,K15884,K15885_K05555,K14250'] +++++++++++++++++++ +M00779 +K05556 (K14626,K14627) (K14628,K14629) (K14630+K14631,K14632) +['K05556', '(K14626,K14627)', '(K14628,K14629)', '(K14630+K14631,K14632)'] +== +['K05556'] +['K14626', 'K14627'] +['K14628', 'K14629'] +['K14632', 'K14630+K14631'] +++++++++++++++++++ +M00780 +K14251 K14252 K14253 K14254 K14255 K14256 K21301 +['K14251', 'K14252', 'K14253', 'K14254', 'K14255', 'K14256', 'K21301'] +== +['K14251'] +['K14252'] +['K14253'] +['K14254'] +['K14255'] +['K14256'] +['K21301'] +++++++++++++++++++ +M00823 +K14251 K14252 K14253 K14254 K14255 K14256 K21301 K14257+K21297 +['K14251', 'K14252', 'K14253', 'K14254', 'K14255', 'K14256', 'K21301', 'K14257+K21297'] +== +['K14251'] +['K14252'] +['K14253'] +['K14254'] +['K14255'] +['K14256'] +['K21301'] +['K14257+K21297'] +++++++++++++++++++ +M00781 +K15941 K15942 K15943 K15944 +['K15941', 'K15942', 'K15943', 'K15944'] +== +['K15941'] +['K15942'] +['K15943'] +['K15944'] +++++++++++++++++++ +M00782 +K15959 K15960 K15961 K15963 K15964 K15965 K15966 K15967 +['K15959', 'K15960', 'K15961', 'K15963', 'K15964', 'K15965', 'K15966', 'K15967'] +== +['K15959'] +['K15960'] +['K15961'] +['K15963'] +['K15964'] +['K15965'] +['K15966'] +['K15967'] +++++++++++++++++++ +M00783 +K15968 K15969 K15886 -K15970 K15971 K15972 +['K15968', 'K15969', 'K15886', 
'-K15970', 'K15971', 'K15972'] +== +['K15968'] +['K15969'] +['K15886'] +['-K15970'] +['K15971'] +['K15972'] +++++++++++++++++++ +M00784 +K19566 K19567 K19568 K19569 K19570 +['K19566', 'K19567', 'K19568', 'K19569', 'K19570'] +== +['K19566'] +['K19567'] +['K19568'] +['K19569'] +['K19570'] +++++++++++++++++++ +M00793 +K00973 K01710 (K01790 K00067,K23987) +['K00973', 'K01710', '(K01790_K00067,K23987)'] +== +['K00973'] +['K01710'] +['K23987', 'K01790_K00067'] +++++++++++++++++++ +M00794 +K13312 K13313 +['K13312', 'K13313'] +== +['K13312'] +['K13313'] +++++++++++++++++++ +M00795 +K19855 K12710 K17625 +['K19855', 'K12710', 'K17625'] +== +['K19855'] +['K12710'] +['K17625'] +++++++++++++++++++ +M00796 +K19853 K19854 K13307 +['K19853', 'K19854', 'K13307'] +== +['K19853'] +['K19854'] +['K13307'] +++++++++++++++++++ +M00797 +K13308 K13309 (K13310,K16436) (K13311,K13326) +['K13308', 'K13309', '(K13310,K16436)', '(K13311,K13326)'] +== +['K13308'] +['K13309'] +['K13310', 'K16436'] +['K13311', 'K13326'] +++++++++++++++++++ +M00798 +K16435 K13315 K13317 (K13316,K16438) K13318 +['K16435', 'K13315', 'K13317', '(K13316,K16438)', 'K13318'] +== +['K16435'] +['K13315'] +['K13317'] +['K13316', 'K16438'] +['K13318'] +++++++++++++++++++ +M00799 +K16435 K13315 K16438 K19856 K19857 +['K16435', 'K13315', 'K16438', 'K19856', 'K19857'] +== +['K16435'] +['K13315'] +['K16438'] +['K19856'] +['K19857'] +++++++++++++++++++ +M00800 +K16435 K16436 K13326 K16438 K13322 +['K16435', 'K16436', 'K13326', 'K16438', 'K13322'] +== +['K16435'] +['K16436'] +['K13326'] +['K16438'] +['K13322'] +++++++++++++++++++ +M00801 +K16435 K13327 K19858 K13319 +['K16435', 'K13327', 'K19858', 'K13319'] +== +['K16435'] +['K13327'] +['K19858'] +['K13319'] +++++++++++++++++++ +M00802 +K16435 K13327 K13328 K13329 K13330 +['K16435', 'K13327', 'K13328', 'K13329', 'K13330'] +== +['K16435'] +['K13327'] +['K13328'] +['K13329'] +['K13330'] +++++++++++++++++++ +M00803 +K16435 K19859 K16436 K13332 +['K16435', 'K19859', 'K16436', 'K13332'] +== +['K16435'] +['K19859'] +['K16436'] +['K13332'] +++++++++++++++++++ +M00672 +K12743 K04126 K10852 +['K12743', 'K04126', 'K10852'] +== +['K12743'] +['K04126'] +['K10852'] +++++++++++++++++++ +M00673 +K12743 K04126 K04127 K12744 K12745 K04128 K18062 K18063 +['K12743', 'K04126', 'K04127', 'K12744', 'K12745', 'K04128', 'K18062', 'K18063'] +== +['K12743'] +['K04126'] +['K04127'] +['K12744'] +['K12745'] +['K04128'] +['K18062'] +['K18063'] +++++++++++++++++++ +M00675 +K18317 K18316 K18315 +['K18317', 'K18316', 'K18315'] +== +['K18317'] +['K18316'] +['K18315'] +++++++++++++++++++ +M00736 +K19102+K19103+K05375 K19104 K19105 K19106 +['K19102+K19103+K05375', 'K19104', 'K19105', 'K19106'] +== +['K19102+K19103+K05375'] +['K19104'] +['K19105'] +['K19106'] +++++++++++++++++++ +M00674 +K12673 K12674 K12675 K12676 +['K12673', 'K12674', 'K12675', 'K12676'] +== +['K12673'] +['K12674'] +['K12675'] +['K12676'] +++++++++++++++++++ +M00039 +(K10775,K13064) K00487 K01904 K13065 K09754 K00588 K09753 K09755 K13066 (K00083,K22395) +['(K10775,K13064)', 'K00487', 'K01904', 'K13065', 'K09754', 'K00588', 'K09753', 'K09755', 'K13066', '(K00083,K22395)'] +== +['K10775', 'K13064'] +['K00487'] +['K01904'] +['K13065'] +['K09754'] +['K00588'] +['K09753'] +['K09755'] +['K13066'] +['K00083', 'K22395'] +++++++++++++++++++ +M00137 +K10775 K00487 K01904 K00660 K01859 +['K10775', 'K00487', 'K01904', 'K00660', 'K01859'] +== +['K10775'] +['K00487'] +['K01904'] +['K00660'] +['K01859'] +++++++++++++++++++ +M00138 +K00475 K13082 K05277 +['K00475', 'K13082', 'K05277'] +== 
+['K00475'] +['K13082'] +['K05277'] +++++++++++++++++++ +M00661 +K18385 K18386 K18387 +['K18385', 'K18386', 'K18387'] +== +['K18385'] +['K18386'] +['K18387'] +++++++++++++++++++ +M00370 +(K11812,K11813) K11818 K11819 K11820 K11821 +['(K11812,K11813)', 'K11818', 'K11819', 'K11820', 'K11821'] +== +['K11812', 'K11813'] +['K11818'] +['K11819'] +['K11820'] +['K11821'] +++++++++++++++++++ +M00814 +K19969 K19979 K19974 K20424 K20425 K20426 K20427 K20430 -- -- +['K19969', 'K19979', 'K19974', 'K20424', 'K20425', 'K20426', 'K20427', 'K20430'] +== +['K19969'] +['K19979'] +['K19974'] +['K20424'] +['K20425'] +['K20426'] +['K20427'] +['K20430'] +++++++++++++++++++ +M00815 +K19969 K20431 K20432 K20433 K20434 K20435 K20436 K20437 K20438 +['K19969', 'K20431', 'K20432', 'K20433', 'K20434', 'K20435', 'K20436', 'K20437', 'K20438'] +== +['K19969'] +['K20431'] +['K20432'] +['K20433'] +['K20434'] +['K20435'] +['K20436'] +['K20437'] +['K20438'] +++++++++++++++++++ +M00786 +K18281 K14132 K17475 K18280 K17827 K17826 K14134 K17825 K18279 +['K18281', 'K14132', 'K17475', 'K18280', 'K17827', 'K17826', 'K14134', 'K17825', 'K18279'] +== +['K18281'] +['K14132'] +['K17475'] +['K18280'] +['K17827'] +['K17826'] +['K14134'] +['K17825'] +['K18279'] +++++++++++++++++++ +M00789 +K14266 K19884 K19885 K19886+K19887 K19888 K19889 +['K14266', 'K19884', 'K19885', 'K19886+K19887', 'K19888', 'K19889'] +== +['K14266'] +['K19884'] +['K19885'] +['K19886+K19887'] +['K19888'] +['K19889'] +++++++++++++++++++ +M00790 +K14266 K19981 K14257 K19982 +['K14266', 'K19981', 'K14257', 'K19982'] +== +['K14266'] +['K19981'] +['K14257'] +['K19982'] +++++++++++++++++++ +M00805 +K20075 K20076 K20077+K20078 K20079 K20080 K20081 K20082 +['K20075', 'K20076', 'K20077+K20078', 'K20079', 'K20080', 'K20081', 'K20082'] +== +['K20075'] +['K20076'] +['K20077+K20078'] +['K20079'] +['K20080'] +['K20081'] +['K20082'] +++++++++++++++++++ +M00808 +K20086 K20087+K20088 K20089 K20090 +['K20086', 'K20087+K20088', 'K20089', 'K20090'] +== +['K20086'] +['K20087+K20088'] +['K20089'] +['K20090'] +++++++++++++++++++ +M00835 +K13063 K20261 K06998 K20260 K20262 K21103 K20940 +['K13063', 'K20261', 'K06998', 'K20260', 'K20262', 'K21103', 'K20940'] +== +['K13063'] +['K20261'] +['K06998'] +['K20260'] +['K20262'] +['K21103'] +['K20940'] +++++++++++++++++++ +M00877 +K18652 K18653 K18654 +['K18652', 'K18653', 'K18654'] +== +['K18652'] +['K18653'] +['K18654'] +++++++++++++++++++ +M00787 +K19546 K19547 K19550 K19549 K19548 K13037 +['K19546', 'K19547', 'K19550', 'K19549', 'K19548', 'K13037'] +== +['K19546'] +['K19547'] +['K19550'] +['K19549'] +['K19548'] +['K13037'] +++++++++++++++++++ +M00848 +K09460 K02078+K14245+K14246+K22798 K22799 K22800 K21272 K21271 K22801 K22802 +['K09460', 'K02078+K14245+K14246+K22798', 'K22799', 'K22800', 'K21272', 'K21271', 'K22801', 'K22802'] +== +['K09460'] +['K02078+K14245+K14246+K22798'] +['K22799'] +['K22800'] +['K21272'] +['K21271'] +['K22801'] +['K22802'] +++++++++++++++++++ +M00788 +K19835 K19834 +['K19835', 'K19834'] +== +['K19835'] +['K19834'] +++++++++++++++++++ +M00819 +K12250 K15907 -- K18056 K17747 K18091 K18057 K17476 +['K12250', 'K15907', 'K18056', 'K17747', 'K18091', 'K18057', 'K17476'] +== +['K12250'] +['K15907'] +['K18056'] +['K17747'] +['K18091'] +['K18057'] +['K17476'] +++++++++++++++++++ +M00876 +K21898 K23446 K23447 +['K21898', 'K23446', 'K23447'] +== +['K21898'] +['K23446'] +['K23447'] +++++++++++++++++++ +M00875 +K23371 K21949 K21721 K23372 K23373 K23374 K23375 +['K23371', 'K21949', 'K21721', 'K23372', 'K23373', 'K23374', 'K23375'] +== 
+['K23371'] +['K21949'] +['K21721'] +['K23372'] +['K23373'] +['K23374'] +['K23375'] +++++++++++++++++++ +M00538 +K15760+K15761-K15762+K15763+K15764-K15765 K00055 K00141 +['K15760+K15761-K15762+K15763+K15764-K15765', 'K00055', 'K00141'] +== +['K15760+K15761-K15762+K15763+K15764-K15765'] +['K00055'] +['K00141'] +++++++++++++++++++ +M00537 +K15757+K15758 K00055 K00141 +['K15757+K15758', 'K00055', 'K00141'] +== +['K15757+K15758'] +['K00055'] +['K00141'] +++++++++++++++++++ +M00419 +K10616+K18293 K10617 K10618 +['K10616+K18293', 'K10617', 'K10618'] +== +['K10616+K18293'] +['K10617'] +['K10618'] +++++++++++++++++++ +M00547 +K03268+K16268+K18089+K18090 K16269 +['K03268+K16268+K18089+K18090', 'K16269'] +== +['K03268+K16268+K18089+K18090'] +['K16269'] +++++++++++++++++++ +M00548 +K16249+K16243+K16244+K16242+K16245+K16246 +['K16249+K16243+K16244+K16242+K16245+K16246'] +== +['K16249+K16243+K16244+K16242+K16245+K16246'] +++++++++++++++++++ +M00551 +K05549+K05550+K05784 K05783 +['K05549+K05550+K05784', 'K05783'] +== +['K05549+K05550+K05784'] +['K05783'] +++++++++++++++++++ +M00637 +(K05599+K05600+K11311,K16319+K16320+K18248+K18249) +['(K05599+K05600+K11311,K16319+K16320+K18248+K18249)'] +== +['K05599+K05600+K11311', 'K16319+K16320+K18248+K18249'] +++++++++++++++++++ +M00568 +K03381 K01856 K03464 (K01055,K14727) +['K03381', 'K01856', 'K03464', '(K01055,K14727)'] +== +['K03381'] +['K01856'] +['K03464'] +['K01055', 'K14727'] +++++++++++++++++++ +M00569 +(K00446,K07104) ((K10217 K01821 K01617),K10216) (K18364,K02554) (K18365,K01666) (K18366,K04073) +['(K00446,K07104)', '((K10217_K01821_K01617),K10216)', '(K18364,K02554)', '(K18365,K01666)', '(K18366,K04073)'] +== +['K00446', 'K07104'] +['K10216', 'K10217_K01821_K01617'] +['K18364', 'K02554'] +['K18365', 'K01666'] +['K18366', 'K04073'] +++++++++++++++++++ +M00539 +K10619+K16303+K16304+K18227 K10620 K10621 K10622 K10623 +['K10619+K16303+K16304+K18227', 'K10620', 'K10621', 'K10622', 'K10623'] +== +['K10619+K16303+K16304+K18227'] +['K10620'] +['K10621'] +['K10622'] +['K10623'] +++++++++++++++++++ +M00543 +K08689+K15750+K18087+K18088 K08690 K00462 K10222 +['K08689+K15750+K18087+K18088', 'K08690', 'K00462', 'K10222'] +== +['K08689+K15750+K18087+K18088'] +['K08690'] +['K00462'] +['K10222'] +++++++++++++++++++ +M00544 +K15751-K15752-K15753 K15754+K15755 K15756 +['K15751-K15752-K15753', 'K15754+K15755', 'K15756'] +== +['K15751-K15752-K15753'] +['K15754+K15755'] +['K15756'] +++++++++++++++++++ +M00418 +K07540 K07543+K07544 K07545 K07546 K07547+K07548 K07549+K07550 +['K07540', 'K07543+K07544', 'K07545', 'K07546', 'K07547+K07548', 'K07549+K07550'] +== +['K07540'] +['K07543+K07544'] +['K07545'] +['K07546'] +['K07547+K07548'] +['K07549+K07550'] +++++++++++++++++++ +M00541 +(K04112+K04113+K04114+K04115,K19515+K19516) K07537 K07538 K07539 +['(K04112+K04113+K04114+K04115,K19515+K19516)', 'K07537', 'K07538', 'K07539'] +== +['K19515+K19516', 'K04112+K04113+K04114+K04115'] +['K07537'] +['K07538'] +['K07539'] +++++++++++++++++++ +M00540 +K04116 K04117 K07534 K07535 K07536 +['K04116', 'K04117', 'K07534', 'K07535', 'K07536'] +== +['K04116'] +['K04117'] +['K07534'] +['K07535'] +['K07536'] +++++++++++++++++++ +M00534 +K14579+K14580+K14578+K14581 K14582 K14583 K14584 K14585 K00152 +['K14579+K14580+K14578+K14581', 'K14582', 'K14583', 'K14584', 'K14585', 'K00152'] +== +['K14579+K14580+K14578+K14581'] +['K14582'] +['K14583'] +['K14584'] +['K14585'] +['K00152'] +++++++++++++++++++ +M00638 +K18242+K18243+K14578+K14581 +['K18242+K18243+K14578+K14581'] +== +['K18242+K18243+K14578+K14581'] 
+++++++++++++++++++ +M00624 +K18074+K18075+K18077 K18076 +['K18074+K18075+K18077', 'K18076'] +== +['K18074+K18075+K18077'] +['K18076'] +++++++++++++++++++ +M00623 +K18068+K18069 K18067 K04102 +['K18068+K18069', 'K18067', 'K04102'] +== +['K18068+K18069'] +['K18067'] +['K04102'] +++++++++++++++++++ +M00636 +K18251+K18252-K18253-K18254 K18255 K18256 +['K18251+K18252-K18253-K18254', 'K18255', 'K18256'] +== +['K18251+K18252-K18253-K18254'] +['K18255'] +['K18256'] +++++++++++++++++++ +M00878 +K01912 K02609+K02610+K02611+K02612+K02613 K15866 K02618 K02615 K01692 K00074 +['K01912', 'K02609+K02610+K02611+K02612+K02613', 'K15866', 'K02618', 'K02615', 'K01692', 'K00074'] +== +['K01912'] +['K02609+K02610+K02611+K02612+K02613'] +['K15866'] +['K02618'] +['K02615'] +['K01692'] +['K00074'] +++++++++++++++++++ +M00852 +K10961 K10920 K10919 K10930 K10931 K10962 K10932 K10963 K10933 K10964 K10965 K10934 K10935 K10966 +['K10961', 'K10920', 'K10919', 'K10930', 'K10931', 'K10962', 'K10932', 'K10963', 'K10933', 'K10964', 'K10965', 'K10934', 'K10935', 'K10966'] +== +['K10961'] +['K10920'] +['K10919'] +['K10930'] +['K10931'] +['K10962'] +['K10932'] +['K10963'] +['K10933'] +['K10964'] +['K10965'] +['K10934'] +['K10935'] +['K10966'] +++++++++++++++++++ +M00850 +(K10928+K10929) K10954 K10952 K10953 K10948 K11018 +['(K10928+K10929)', 'K10954', 'K10952', 'K10953', 'K10948', 'K11018'] +== +['K10928+K10929'] +['K10954'] +['K10952'] +['K10953'] +['K10948'] +['K11018'] +++++++++++++++++++ +M00542 +K03221+K03219+K03222+K03226+K03227+K03228+K03229+K03230+K03224+K03225 K12784 K12787 K12785 K12786 K12788 K16041 K16042 +['K03221+K03219+K03222+K03226+K03227+K03228+K03229+K03230+K03224+K03225', 'K12784', 'K12787', 'K12785', 'K12786', 'K12788', 'K16041', 'K16042'] +== +['K03221+K03219+K03222+K03226+K03227+K03228+K03229+K03230+K03224+K03225'] +['K12784'] +['K12787'] +['K12785'] +['K12786'] +['K12788'] +['K16041'] +['K16042'] +++++++++++++++++++ +M00363 +K11006 K11007 +['K11006', 'K11007'] +== +['K11006'] +['K11007'] +++++++++++++++++++ +M00853 +K22850 -K22851 K22852 K22853 K22854 +['K22850', '-K22851', 'K22852', 'K22853', 'K22854'] +== +['K22850'] +['-K22851'] +['K22852'] +['K22853'] +['K22854'] +++++++++++++++++++ +M00576 +(K10928+K10929) (K16883,K16884) +['(K10928+K10929)', '(K16883,K16884)'] +== +['K10928+K10929'] +['K16883', 'K16884'] +++++++++++++++++++ +M00856 +K11014 K11023 K19298 K22918 +['K11014', 'K11023', 'K19298', 'K22918'] +== +['K11014'] +['K11023'] +['K19298'] +['K22918'] +++++++++++++++++++ +M00857 +K22914 K22926 K22925 K22915 K22916 K22917 (K22924+K22921+K22923+K22922) +['K22914', 'K22926', 'K22925', 'K22915', 'K22916', 'K22917', '(K22924+K22921+K22923+K22922)'] +== +['K22914'] +['K22926'] +['K22925'] +['K22915'] +['K22916'] +['K22917'] +['K22924+K22921+K22923+K22922'] +++++++++++++++++++ +M00575 +K22944 K11004 K07389 K11003 K12340 +['K22944', 'K11004', 'K07389', 'K11003', 'K12340'] +== +['K22944'] +['K11004'] +['K07389'] +['K11003'] +['K12340'] +++++++++++++++++++ +M00574 +K11023 -K11024 K11025 K11026 K11027 +['K11023', '-K11024', 'K11025', 'K11026', 'K11027'] +== +['K11023'] +['-K11024'] +['K11025'] +['K11026'] +['K11027'] +++++++++++++++++++ +M00564 +K15842 K12086 -K12087 K12088 K12089 K12090 K03196 K12091 -K12092 K12093 K12094 K12095 K12096 K12097 K12098 -K12099 -K12100 K12101 K12102 K12103 K12104 K12105 K12106 K12107 K12108 K12109 K12110 +['K15842', 'K12086', '-K12087', 'K12088', 'K12089', 'K12090', 'K03196', 'K12091', '-K12092', 'K12093', 'K12094', 'K12095', 'K12096', 'K12097', 'K12098', '-K12099', '-K12100', 
'K12101', 'K12102', 'K12103', 'K12104', 'K12105', 'K12106', 'K12107', 'K12108', 'K12109', 'K12110'] +== +['K15842'] +['K12086'] +['-K12087'] +['K12088'] +['K12089'] +['K12090'] +['K03196'] +['K12091'] +['-K12092'] +['K12093'] +['K12094'] +['K12095'] +['K12096'] +['K12097'] +['K12098'] +['-K12099'] +['-K12100'] +['K12101'] +['K12102'] +['K12103'] +['K12104'] +['K12105'] +['K12106'] +['K12107'] +['K12108'] +['K12109'] +['K12110'] +++++++++++++++++++ +M00859 +K11030 K08645 K11029 +['K11030', 'K08645', 'K11029'] +== +['K11030'] +['K08645'] +['K11029'] +++++++++++++++++++ +M00860 +K22976 K22977 K22980 K07282 K22116 K01932 K22981 +['K22976', 'K22977', 'K22980', 'K07282', 'K22116', 'K01932', 'K22981'] +== +['K22976'] +['K22977'] +['K22980'] +['K07282'] +['K22116'] +['K01932'] +['K22981'] +++++++++++++++++++ +M00851 +(K18768,K18970,K19316,K22346,K18794,K19318,K18971,K18793,K19319,K19320,K19321,K19322,K18972,K19211,K18976,K21277,K18782,K18781,K18780,K19099,K19216) +['(K18768,K18970,K19316,K22346,K18794,K19318,K18971,K18793,K19319,K19320,K19321,K19322,K18972,K19211,K18976,K21277,K18782,K18781,K18780,K19099,K19216)'] +== +['K18768', 'K18970', 'K19316', 'K22346', 'K18794', 'K19318', 'K18971', 'K18793', 'K19319', 'K19320', 'K19321', 'K19322', 'K18972', 'K19211', 'K18976', 'K21277', 'K18782', 'K18781', 'K18780', 'K19099', 'K19216'] +++++++++++++++++++ +M00625 +K02547 K02546 K02545 +['K02547', 'K02546', 'K02545'] +== +['K02547'] +['K02546'] +['K02545'] +++++++++++++++++++ +M00627 +K02172 K02171 (K18766,K17836) +['K02172', 'K02171', '(K18766,K17836)'] +== +['K02172'] +['K02171'] +['K18766', 'K17836'] +++++++++++++++++++ +M00745 +(K18072 K18073),(K07644 K07665),K18297 K18093 +['(K18072_K18073),(K07644_K07665),K18297', 'K18093'] +== +['K18297', 'K18072_K18073', 'K07644_K07665'] +['K18093'] +++++++++++++++++++ +M00651 +(K18345 K18344 K07260 K18346),(K18351 K18352 K18354 K18353) (K18347 K15739 K08641) +['(K18345_K18344_K07260_K18346),(K18351_K18352_K18354_K18353)', '(K18347_K15739_K08641)'] +== +['K18345_K18344_K07260_K18346', 'K18351_K18352_K18354_K18353'] +['K18347_K15739_K08641'] +++++++++++++++++++ +M00652 +K18350 K18349 K18348 K18856 K18866 +['K18350', 'K18349', 'K18348', 'K18856', 'K18866'] +== +['K18350'] +['K18349'] +['K18348'] +['K18856'] +['K18866'] +++++++++++++++++++ +M00704 +K18906 K08168 +['K18906', 'K08168'] +== +['K18906'] +['K08168'] +++++++++++++++++++ +M00725 +K19077 K19078 K03367+K03739+K14188+K03740 +['K19077', 'K19078', 'K03367+K03739+K14188+K03740'] +== +['K19077'] +['K19078'] +['K03367+K03739+K14188+K03740'] +++++++++++++++++++ +M00726 +K19077 K19078 K14205 +['K19077', 'K19078', 'K14205'] +== +['K19077'] +['K19078'] +['K14205'] +++++++++++++++++++ +M00730 +K19077 K19078 K19079+K19080 +['K19077', 'K19078', 'K19079+K19080'] +== +['K19077'] +['K19078'] +['K19079+K19080'] +++++++++++++++++++ +M00744 +K07637 K07660 K08477 +['K07637', 'K07660', 'K08477'] +== +['K07637'] +['K07660'] +['K08477'] +++++++++++++++++++ +M00718 +K18131 K03585+K18138+K18139 +['K18131', 'K03585+K18138+K18139'] +== +['K18131'] +['K03585+K18138+K18139'] +++++++++++++++++++ +M00639 +K18294 K18295+K18296-K08721 +['K18294', 'K18295+K18296-K08721'] +== +['K18294'] +['K18295+K18296-K08721'] +++++++++++++++++++ +M00641 +K18297 K18298+K18299-K18300 +['K18297', 'K18298+K18299-K18300'] +== +['K18297'] +['K18298+K18299-K18300'] +++++++++++++++++++ +M00642 +K18301 K18302+K18303-K18139 +['K18301', 'K18302+K18303-K18139'] +== +['K18301'] +['K18302+K18303-K18139'] +++++++++++++++++++ +M00643 +K18129 K18094+K18095+K18139 +['K18129', 
'K18094+K18095+K18139'] +== +['K18129'] +['K18094+K18095+K18139'] +++++++++++++++++++ +M00769 +K18304 K19591 K19595+K19594+K19593 +['K18304', 'K19591', 'K19595+K19594+K19593'] +== +['K18304'] +['K19591'] +['K19595+K19594+K19593'] +++++++++++++++++++ +M00649 +K18143 K18144 K18145+K18146-K18147 +['K18143', 'K18144', 'K18145+K18146-K18147'] +== +['K18143'] +['K18144'] +['K18145+K18146-K18147'] +++++++++++++++++++ +M00696 +K18140 K18141+K18142+K12340 +['K18140', 'K18141+K18142+K12340'] +== +['K18140'] +['K18141+K18142+K12340'] +++++++++++++++++++ +M00697 +K07690 K18898+K18899+K12340 +['K07690', 'K18898+K18899+K12340'] +== +['K07690'] +['K18898+K18899+K12340'] +++++++++++++++++++ +M00698 +K18900 K18901+K18902+K18903 +['K18900', 'K18901+K18902+K18903'] +== +['K18900'] +['K18901+K18902+K18903'] +++++++++++++++++++ +M00700 +(K18906,K18907) K18104 +['(K18906,K18907)', 'K18104'] +== +['K18906', 'K18907'] +['K18104'] +++++++++++++++++++ +M00702 +(K18906,K18907) K08170 +['(K18906,K18907)', 'K08170'] +== +['K18906', 'K18907'] +['K08170'] +++++++++++++++++++ +M00714 +K18938 K08167 +['K18938', 'K08167'] +== +['K18938'] +['K08167'] +++++++++++++++++++ +M00705 +K18909 K18908 +['K18909', 'K18908'] +== +['K18909'] +['K18908'] +++++++++++++++++++ +M00746 +K13632 K18513 K09476 +['K13632', 'K18513', 'K09476'] +== +['K13632'] +['K18513'] +['K09476'] +++++++++++++++++++ +M00660 +K03222+K03226+K03227+K03228+K03229+K03230+K03224+K03225+K03223+K18374+K18376 K18373 K18375 K18377 K18378 K18379 K18380 K18381 +['K03222+K03226+K03227+K03228+K03229+K03230+K03224+K03225+K03223+K18374+K18376', 'K18373', 'K18375', 'K18377', 'K18378', 'K18379', 'K18380', 'K18381'] +== +['K03222+K03226+K03227+K03228+K03229+K03230+K03224+K03225+K03223+K18374+K18376'] +['K18373'] +['K18375'] +['K18377'] +['K18378'] +['K18379'] +['K18380'] +['K18381'] +++++++++++++++++++ +M00664 +K14658 K14659 K14666 K14657 +['K14658', 'K14659', 'K14666', 'K14657'] +== +['K14658'] +['K14659'] +['K14666'] +['K14657'] +++++++++++++++++++ diff --git a/data/MicrobeAnnotator_KEGG/01.KEGG_DB/06.Module_Groups.txt b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/06.Module_Groups.txt new file mode 100644 index 0000000..9c7c0c4 --- /dev/null +++ b/data/MicrobeAnnotator_KEGG/01.KEGG_DB/06.Module_Groups.txt @@ -0,0 +1,394 @@ +M00015 Arginine and proline metabolism #8a3222 +M00028 Arginine and proline metabolism #8a3222 +M00029 Arginine and proline metabolism #8a3222 +M00047 Arginine and proline metabolism #8a3222 +M00763 Arginine and proline metabolism #8a3222 +M00844 Arginine and proline metabolism #8a3222 +M00845 Arginine and proline metabolism #8a3222 +M00879 Arginine and proline metabolism #8a3222 +M00022 Aromatic amino acid metabolism #8641b6 +M00023 Aromatic amino acid metabolism #8641b6 +M00024 Aromatic amino acid metabolism #8641b6 +M00025 Aromatic amino acid metabolism #8641b6 +M00037 Aromatic amino acid metabolism #8641b6 +M00038 Aromatic amino acid metabolism #8641b6 +M00040 Aromatic amino acid metabolism #8641b6 +M00042 Aromatic amino acid metabolism #8641b6 +M00043 Aromatic amino acid metabolism #8641b6 +M00044 Aromatic amino acid metabolism #8641b6 +M00533 Aromatic amino acid metabolism #8641b6 +M00545 Aromatic amino acid metabolism #8641b6 +M00418 Aromatics degradation #76d25b +M00419 Aromatics degradation #76d25b +M00534 Aromatics degradation #76d25b +M00537 Aromatics degradation #76d25b +M00538 Aromatics degradation #76d25b +M00539 Aromatics degradation #76d25b +M00540 Aromatics degradation #76d25b +M00541 Aromatics degradation #76d25b +M00543 Aromatics degradation 
#76d25b +M00544 Aromatics degradation #76d25b +M00547 Aromatics degradation #76d25b +M00548 Aromatics degradation #76d25b +M00551 Aromatics degradation #76d25b +M00568 Aromatics degradation #76d25b +M00569 Aromatics degradation #76d25b +M00623 Aromatics degradation #76d25b +M00624 Aromatics degradation #76d25b +M00636 Aromatics degradation #76d25b +M00637 Aromatics degradation #76d25b +M00638 Aromatics degradation #76d25b +M00878 Aromatics degradation #76d25b +M00142 ATP synthesis #cdd346 +M00143 ATP synthesis #cdd346 +M00144 ATP synthesis #cdd346 +M00145 ATP synthesis #cdd346 +M00146 ATP synthesis #cdd346 +M00147 ATP synthesis #cdd346 +M00148 ATP synthesis #cdd346 +M00149 ATP synthesis #cdd346 +M00150 ATP synthesis #cdd346 +M00151 ATP synthesis #cdd346 +M00152 ATP synthesis #cdd346 +M00153 ATP synthesis #cdd346 +M00154 ATP synthesis #cdd346 +M00155 ATP synthesis #cdd346 +M00156 ATP synthesis #cdd346 +M00157 ATP synthesis #cdd346 +M00158 ATP synthesis #cdd346 +M00159 ATP synthesis #cdd346 +M00160 ATP synthesis #cdd346 +M00162 ATP synthesis #cdd346 +M00416 ATP synthesis #cdd346 +M00417 ATP synthesis #cdd346 +M00672 Beta-Lactam biosynthesis #3b2882 +M00673 Beta-Lactam biosynthesis #3b2882 +M00674 Beta-Lactam biosynthesis #3b2882 +M00675 Beta-Lactam biosynthesis #3b2882 +M00736 Beta-Lactam biosynthesis #3b2882 +M00039 Biosynthesis of other secondary metabolites #cbde82 +M00137 Biosynthesis of other secondary metabolites #cbde82 +M00138 Biosynthesis of other secondary metabolites #cbde82 +M00370 Biosynthesis of other secondary metabolites #cbde82 +M00661 Biosynthesis of other secondary metabolites #cbde82 +M00785 Biosynthesis of other secondary metabolites #cbde82 +M00786 Biosynthesis of other secondary metabolites #cbde82 +M00787 Biosynthesis of other secondary metabolites #cbde82 +M00788 Biosynthesis of other secondary metabolites #cbde82 +M00789 Biosynthesis of other secondary metabolites #cbde82 +M00790 Biosynthesis of other secondary metabolites #cbde82 +M00805 Biosynthesis of other secondary metabolites #cbde82 +M00808 Biosynthesis of other secondary metabolites #cbde82 +M00814 Biosynthesis of other secondary metabolites #cbde82 +M00815 Biosynthesis of other secondary metabolites #cbde82 +M00819 Biosynthesis of other secondary metabolites #cbde82 +M00835 Biosynthesis of other secondary metabolites #cbde82 +M00837 Biosynthesis of other secondary metabolites #cbde82 +M00838 Biosynthesis of other secondary metabolites #cbde82 +M00848 Biosynthesis of other secondary metabolites #cbde82 +M00875 Biosynthesis of other secondary metabolites #cbde82 +M00876 Biosynthesis of other secondary metabolites #cbde82 +M00877 Biosynthesis of other secondary metabolites #cbde82 +M00019 Branched-chain amino acid metabolism #656cdb +M00036 Branched-chain amino acid metabolism #656cdb +M00432 Branched-chain amino acid metabolism #656cdb +M00535 Branched-chain amino acid metabolism #656cdb +M00570 Branched-chain amino acid metabolism #656cdb +M00165 Carbon fixation #408937 +M00166 Carbon fixation #408937 +M00167 Carbon fixation #408937 +M00168 Carbon fixation #408937 +M00169 Carbon fixation #408937 +M00170 Carbon fixation #408937 +M00171 Carbon fixation #408937 +M00172 Carbon fixation #408937 +M00173 Carbon fixation #408937 +M00374 Carbon fixation #408937 +M00375 Carbon fixation #408937 +M00376 Carbon fixation #408937 +M00377 Carbon fixation #408937 +M00579 Carbon fixation #408937 +M00620 Carbon fixation #408937 +M00001 Central carbohydrate metabolism #c644a5 +M00002 Central carbohydrate metabolism #c644a5 
+M00003 Central carbohydrate metabolism #c644a5 +M00004 Central carbohydrate metabolism #c644a5 +M00005 Central carbohydrate metabolism #c644a5 +M00006 Central carbohydrate metabolism #c644a5 +M00007 Central carbohydrate metabolism #c644a5 +M00008 Central carbohydrate metabolism #c644a5 +M00009 Central carbohydrate metabolism #c644a5 +M00010 Central carbohydrate metabolism #c644a5 +M00011 Central carbohydrate metabolism #c644a5 +M00307 Central carbohydrate metabolism #c644a5 +M00308 Central carbohydrate metabolism #c644a5 +M00309 Central carbohydrate metabolism #c644a5 +M00580 Central carbohydrate metabolism #c644a5 +M00633 Central carbohydrate metabolism #c644a5 +M00112 Cofactor and vitamin metabolism #5fda98 +M00115 Cofactor and vitamin metabolism #5fda98 +M00116 Cofactor and vitamin metabolism #5fda98 +M00117 Cofactor and vitamin metabolism #5fda98 +M00119 Cofactor and vitamin metabolism #5fda98 +M00120 Cofactor and vitamin metabolism #5fda98 +M00121 Cofactor and vitamin metabolism #5fda98 +M00122 Cofactor and vitamin metabolism #5fda98 +M00123 Cofactor and vitamin metabolism #5fda98 +M00124 Cofactor and vitamin metabolism #5fda98 +M00125 Cofactor and vitamin metabolism #5fda98 +M00126 Cofactor and vitamin metabolism #5fda98 +M00127 Cofactor and vitamin metabolism #5fda98 +M00128 Cofactor and vitamin metabolism #5fda98 +M00140 Cofactor and vitamin metabolism #5fda98 +M00141 Cofactor and vitamin metabolism #5fda98 +M00572 Cofactor and vitamin metabolism #5fda98 +M00573 Cofactor and vitamin metabolism #5fda98 +M00577 Cofactor and vitamin metabolism #5fda98 +M00622 Cofactor and vitamin metabolism #5fda98 +M00810 Cofactor and vitamin metabolism #5fda98 +M00811 Cofactor and vitamin metabolism #5fda98 +M00836 Cofactor and vitamin metabolism #5fda98 +M00840 Cofactor and vitamin metabolism #5fda98 +M00841 Cofactor and vitamin metabolism #5fda98 +M00842 Cofactor and vitamin metabolism #5fda98 +M00843 Cofactor and vitamin metabolism #5fda98 +M00846 Cofactor and vitamin metabolism #5fda98 +M00847 Cofactor and vitamin metabolism #5fda98 +M00868 Cofactor and vitamin metabolism #5fda98 +M00880 Cofactor and vitamin metabolism #5fda98 +M00017 Cysteine and methionine metabolism #782975 +M00021 Cysteine and methionine metabolism #782975 +M00034 Cysteine and methionine metabolism #782975 +M00035 Cysteine and methionine metabolism #782975 +M00338 Cysteine and methionine metabolism #782975 +M00368 Cysteine and methionine metabolism #782975 +M00609 Cysteine and methionine metabolism #782975 +M00625 Drug resistance #869534 +M00627 Drug resistance #869534 +M00639 Drug resistance #869534 +M00641 Drug resistance #869534 +M00642 Drug resistance #869534 +M00643 Drug resistance #869534 +M00649 Drug resistance #869534 +M00651 Drug resistance #869534 +M00652 Drug resistance #869534 +M00696 Drug resistance #869534 +M00697 Drug resistance #869534 +M00698 Drug resistance #869534 +M00700 Drug resistance #869534 +M00702 Drug resistance #869534 +M00704 Drug resistance #869534 +M00705 Drug resistance #869534 +M00714 Drug resistance #869534 +M00718 Drug resistance #869534 +M00725 Drug resistance #869534 +M00726 Drug resistance #869534 +M00730 Drug resistance #869534 +M00744 Drug resistance #869534 +M00745 Drug resistance #869534 +M00746 Drug resistance #869534 +M00769 Drug resistance #869534 +M00851 Drug resistance #869534 +M00824 Enediyne biosynthesis #d27bde +M00825 Enediyne biosynthesis #d27bde +M00826 Enediyne biosynthesis #d27bde +M00827 Enediyne biosynthesis #d27bde +M00828 Enediyne biosynthesis #d27bde +M00829 
Enediyne biosynthesis #d27bde +M00830 Enediyne biosynthesis #d27bde +M00831 Enediyne biosynthesis #d27bde +M00832 Enediyne biosynthesis #d27bde +M00833 Enediyne biosynthesis #d27bde +M00834 Enediyne biosynthesis #d27bde +M00082 Fatty acid metabolism #d9a344 +M00083 Fatty acid metabolism #d9a344 +M00085 Fatty acid metabolism #d9a344 +M00086 Fatty acid metabolism #d9a344 +M00087 Fatty acid metabolism #d9a344 +M00415 Fatty acid metabolism #d9a344 +M00861 Fatty acid metabolism #d9a344 +M00873 Fatty acid metabolism #d9a344 +M00874 Fatty acid metabolism #d9a344 +M00055 Glycan biosynthesis #588cd6 +M00056 Glycan biosynthesis #588cd6 +M00065 Glycan biosynthesis #588cd6 +M00068 Glycan biosynthesis #588cd6 +M00069 Glycan biosynthesis #588cd6 +M00070 Glycan biosynthesis #588cd6 +M00071 Glycan biosynthesis #588cd6 +M00072 Glycan biosynthesis #588cd6 +M00073 Glycan biosynthesis #588cd6 +M00074 Glycan biosynthesis #588cd6 +M00075 Glycan biosynthesis #588cd6 +M00872 Glycan biosynthesis #588cd6 +M00057 Glycosaminoglycan metabolism #d66432 +M00058 Glycosaminoglycan metabolism #d66432 +M00059 Glycosaminoglycan metabolism #d66432 +M00076 Glycosaminoglycan metabolism #d66432 +M00077 Glycosaminoglycan metabolism #d66432 +M00078 Glycosaminoglycan metabolism #d66432 +M00079 Glycosaminoglycan metabolism #d66432 +M00026 Histidine metabolism #66d7bf +M00045 Histidine metabolism #66d7bf +M00066 Lipid metabolism #d53e55 +M00067 Lipid metabolism #d53e55 +M00088 Lipid metabolism #d53e55 +M00089 Lipid metabolism #d53e55 +M00090 Lipid metabolism #d53e55 +M00091 Lipid metabolism #d53e55 +M00092 Lipid metabolism #d53e55 +M00093 Lipid metabolism #d53e55 +M00094 Lipid metabolism #d53e55 +M00098 Lipid metabolism #d53e55 +M00099 Lipid metabolism #d53e55 +M00100 Lipid metabolism #d53e55 +M00113 Lipid metabolism #d53e55 +M00060 Lipopolysaccharide metabolism #83d2de +M00063 Lipopolysaccharide metabolism #83d2de +M00064 Lipopolysaccharide metabolism #83d2de +M00866 Lipopolysaccharide metabolism #83d2de +M00867 Lipopolysaccharide metabolism #83d2de +M00016 Lysine metabolism #d84e8b +M00030 Lysine metabolism #d84e8b +M00031 Lysine metabolism #d84e8b +M00032 Lysine metabolism #d84e8b +M00433 Lysine metabolism #d84e8b +M00525 Lysine metabolism #d84e8b +M00526 Lysine metabolism #d84e8b +M00527 Lysine metabolism #d84e8b +M00773 Macrolide biosynthesis #2e4b26 +M00774 Macrolide biosynthesis #2e4b26 +M00775 Macrolide biosynthesis #2e4b26 +M00776 Macrolide biosynthesis #2e4b26 +M00777 Macrolide biosynthesis #2e4b26 +M00611 Metabolic capacity #9378c3 +M00612 Metabolic capacity #9378c3 +M00613 Metabolic capacity #9378c3 +M00614 Metabolic capacity #9378c3 +M00615 Metabolic capacity #9378c3 +M00616 Metabolic capacity #9378c3 +M00617 Metabolic capacity #9378c3 +M00618 Metabolic capacity #9378c3 +M00174 Methane metabolism #9e7336 +M00344 Methane metabolism #9e7336 +M00345 Methane metabolism #9e7336 +M00346 Methane metabolism #9e7336 +M00356 Methane metabolism #9e7336 +M00357 Methane metabolism #9e7336 +M00358 Methane metabolism #9e7336 +M00378 Methane metabolism #9e7336 +M00422 Methane metabolism #9e7336 +M00563 Methane metabolism #9e7336 +M00567 Methane metabolism #9e7336 +M00608 Methane metabolism #9e7336 +M00175 Nitrogen metabolism #2c2351 +M00528 Nitrogen metabolism #2c2351 +M00529 Nitrogen metabolism #2c2351 +M00530 Nitrogen metabolism #2c2351 +M00531 Nitrogen metabolism #2c2351 +M00804 Nitrogen metabolism #2c2351 +M00027 Other amino acid metabolism #c5d7a9 +M00118 Other amino acid metabolism #c5d7a9 +M00369 Other amino acid metabolism 
#c5d7a9 +M00012 Other carbohydrate metabolism #872b4e +M00013 Other carbohydrate metabolism #872b4e +M00014 Other carbohydrate metabolism #872b4e +M00061 Other carbohydrate metabolism #872b4e +M00081 Other carbohydrate metabolism #872b4e +M00114 Other carbohydrate metabolism #872b4e +M00129 Other carbohydrate metabolism #872b4e +M00130 Other carbohydrate metabolism #872b4e +M00131 Other carbohydrate metabolism #872b4e +M00132 Other carbohydrate metabolism #872b4e +M00373 Other carbohydrate metabolism #872b4e +M00532 Other carbohydrate metabolism #872b4e +M00549 Other carbohydrate metabolism #872b4e +M00550 Other carbohydrate metabolism #872b4e +M00552 Other carbohydrate metabolism #872b4e +M00554 Other carbohydrate metabolism #872b4e +M00565 Other carbohydrate metabolism #872b4e +M00630 Other carbohydrate metabolism #872b4e +M00631 Other carbohydrate metabolism #872b4e +M00632 Other carbohydrate metabolism #872b4e +M00740 Other carbohydrate metabolism #872b4e +M00741 Other carbohydrate metabolism #872b4e +M00761 Other carbohydrate metabolism #872b4e +M00854 Other carbohydrate metabolism #872b4e +M00855 Other carbohydrate metabolism #872b4e +M00097 Other terpenoid biosynthesis #6e9368 +M00371 Other terpenoid biosynthesis #6e9368 +M00372 Other terpenoid biosynthesis #6e9368 +M00363 Pathogenicity #66406d +M00542 Pathogenicity #66406d +M00564 Pathogenicity #66406d +M00574 Pathogenicity #66406d +M00575 Pathogenicity #66406d +M00576 Pathogenicity #66406d +M00850 Pathogenicity #66406d +M00852 Pathogenicity #66406d +M00853 Pathogenicity #66406d +M00856 Pathogenicity #66406d +M00857 Pathogenicity #66406d +M00859 Pathogenicity #66406d +M00860 Pathogenicity #66406d +M00161 Photosynthesis #cfa68a +M00163 Photosynthesis #cfa68a +M00597 Photosynthesis #cfa68a +M00598 Photosynthesis #cfa68a +M00660 Plant pathogenicity #461d27 +M00133 Polyamine biosynthesis #a5b3da +M00134 Polyamine biosynthesis #a5b3da +M00135 Polyamine biosynthesis #a5b3da +M00136 Polyamine biosynthesis #a5b3da +M00793 Polyketide sugar unit biosynthesis #5c4f24 +M00794 Polyketide sugar unit biosynthesis #5c4f24 +M00795 Polyketide sugar unit biosynthesis #5c4f24 +M00796 Polyketide sugar unit biosynthesis #5c4f24 +M00797 Polyketide sugar unit biosynthesis #5c4f24 +M00798 Polyketide sugar unit biosynthesis #5c4f24 +M00799 Polyketide sugar unit biosynthesis #5c4f24 +M00800 Polyketide sugar unit biosynthesis #5c4f24 +M00801 Polyketide sugar unit biosynthesis #5c4f24 +M00802 Polyketide sugar unit biosynthesis #5c4f24 +M00803 Polyketide sugar unit biosynthesis #5c4f24 +M00048 Purine metabolism #e0a7d2 +M00049 Purine metabolism #e0a7d2 +M00050 Purine metabolism #e0a7d2 +M00546 Purine metabolism #e0a7d2 +M00046 Pyrimidine metabolism #25585e +M00051 Pyrimidine metabolism #25585e +M00052 Pyrimidine metabolism #25585e +M00053 Pyrimidine metabolism #25585e +M00018 Serine and threonine metabolism #de7d78 +M00020 Serine and threonine metabolism #de7d78 +M00033 Serine and threonine metabolism #de7d78 +M00555 Serine and threonine metabolism #de7d78 +M00101 Sterol biosynthesis #4e96a2 +M00102 Sterol biosynthesis #4e96a2 +M00103 Sterol biosynthesis #4e96a2 +M00104 Sterol biosynthesis #4e96a2 +M00106 Sterol biosynthesis #4e96a2 +M00107 Sterol biosynthesis #4e96a2 +M00108 Sterol biosynthesis #4e96a2 +M00109 Sterol biosynthesis #4e96a2 +M00110 Sterol biosynthesis #4e96a2 +M00862 Sterol biosynthesis #4e96a2 +M00176 Sulfur metabolism #4e96a2 +M00595 Sulfur metabolism #4e96a2 +M00596 Sulfur metabolism #4e96a2 +M00664 Symbiosis #88574e +M00095 Terpenoid backbone 
biosynthesis #4e6089 +M00096 Terpenoid backbone biosynthesis #4e6089 +M00364 Terpenoid backbone biosynthesis #4e6089 +M00365 Terpenoid backbone biosynthesis #4e6089 +M00366 Terpenoid backbone biosynthesis #4e6089 +M00367 Terpenoid backbone biosynthesis #4e6089 +M00849 Terpenoid backbone biosynthesis #4e6089 +M00778 Type II polyketide biosynthesis #af7194 +M00779 Type II polyketide biosynthesis #af7194 +M00780 Type II polyketide biosynthesis #af7194 +M00781 Type II polyketide biosynthesis #af7194 +M00782 Type II polyketide biosynthesis #af7194 +M00783 Type II polyketide biosynthesis #af7194 +M00784 Type II polyketide biosynthesis #af7194 +M00823 Type II polyketide biosynthesis #af7194 diff --git a/data/MicrobeAnnotator_KEGG/KEGG_Bifurcating_Module_Information.pkl b/data/MicrobeAnnotator_KEGG/KEGG_Bifurcating_Module_Information.pkl new file mode 100644 index 0000000..7535b86 Binary files /dev/null and b/data/MicrobeAnnotator_KEGG/KEGG_Bifurcating_Module_Information.pkl differ diff --git a/data/MicrobeAnnotator_KEGG/KEGG_Module-KOs.pkl b/data/MicrobeAnnotator_KEGG/KEGG_Module-KOs.pkl new file mode 100644 index 0000000..cba82d5 Binary files /dev/null and b/data/MicrobeAnnotator_KEGG/KEGG_Module-KOs.pkl differ diff --git a/data/MicrobeAnnotator_KEGG/KEGG_Module_Information.txt b/data/MicrobeAnnotator_KEGG/KEGG_Module_Information.txt new file mode 100644 index 0000000..db9ec87 --- /dev/null +++ b/data/MicrobeAnnotator_KEGG/KEGG_Module_Information.txt @@ -0,0 +1,394 @@ +M00015 Proline biosynthesis, glutamate => proline Arginine and proline metabolism #8a3222 +M00028 Ornithine biosynthesis, glutamate => ornithine Arginine and proline metabolism #8a3222 +M00029 Urea cycle Arginine and proline metabolism #8a3222 +M00047 Creatine pathway Arginine and proline metabolism #8a3222 +M00763 Ornithine biosynthesis, mediated by LysW, glutamate => ornithine Arginine and proline metabolism #8a3222 +M00844 Arginine biosynthesis, ornithine => arginine Arginine and proline metabolism #8a3222 +M00845 Arginine biosynthesis, glutamate => acetylcitrulline => arginine Arginine and proline metabolism #8a3222 +M00879 Arginine succinyltransferase pathway, arginine => glutamate Arginine and proline metabolism #8a3222 +M00022 Shikimate pathway, phosphoenolpyruvate + erythrose-4P => chorismate Aromatic amino acid metabolism #8641b6 +M00023 Tryptophan biosynthesis, chorismate => tryptophan Aromatic amino acid metabolism #8641b6 +M00024 Phenylalanine biosynthesis, chorismate => phenylalanine Aromatic amino acid metabolism #8641b6 +M00025 Tyrosine biosynthesis, chorismate => tyrosine Aromatic amino acid metabolism #8641b6 +M00037 Melatonin biosynthesis, tryptophan => serotonin => melatonin Aromatic amino acid metabolism #8641b6 +M00038 Tryptophan metabolism, tryptophan => kynurenine => 2-aminomuconate Aromatic amino acid metabolism #8641b6 +M00040 Tyrosine biosynthesis, prephenate => pretyrosine => tyrosine Aromatic amino acid metabolism #8641b6 +M00042 Catecholamine biosynthesis, tyrosine => dopamine => noradrenaline => adrenaline Aromatic amino acid metabolism #8641b6 +M00043 Thyroid hormone biosynthesis, tyrosine => triiodothyronine--thyroxine Aromatic amino acid metabolism #8641b6 +M00044 Tyrosine degradation, tyrosine => homogentisate Aromatic amino acid metabolism #8641b6 +M00533 Homoprotocatechuate degradation, homoprotocatechuate => 2-oxohept-3-enedioate Aromatic amino acid metabolism #8641b6 +M00545 Trans-cinnamate degradation, trans-cinnamate => acetyl-CoA Aromatic amino acid metabolism #8641b6 +M00418 Toluene
degradation, anaerobic, toluene => benzoyl-CoA Aromatics degradation #76d25b +M00419 Cymene degradation, p-cymene => p-cumate Aromatics degradation #76d25b +M00534 Naphthalene degradation, naphthalene => salicylate Aromatics degradation #76d25b +M00537 Xylene degradation, xylene => methylbenzoate Aromatics degradation #76d25b +M00538 Toluene degradation, toluene => benzoate Aromatics degradation #76d25b +M00539 Cumate degradation, p-cumate => 2-oxopent-4-enoate + 2-methylpropanoate Aromatics degradation #76d25b +M00540 Benzoate degradation, cyclohexanecarboxylic acid => pimeloyl-CoA Aromatics degradation #76d25b +M00541 Benzoyl-CoA degradation, benzoyl-CoA => 3-hydroxypimeloyl-CoA Aromatics degradation #76d25b +M00543 Biphenyl degradation, biphenyl => 2-oxopent-4-enoate + benzoate Aromatics degradation #76d25b +M00544 Carbazole degradation, carbazole => 2-oxopent-4-enoate + anthranilate Aromatics degradation #76d25b +M00547 Benzene--toluene degradation, benzene => catechol -- toluene => 3-methylcatechol Aromatics degradation #76d25b +M00548 Benzene degradation, benzene => catechol Aromatics degradation #76d25b +M00551 Benzoate degradation, benzoate => catechol -- methylbenzoate => methylcatechol Aromatics degradation #76d25b +M00568 Catechol ortho-cleavage, catechol => 3-oxoadipate Aromatics degradation #76d25b +M00569 Catechol meta-cleavage, catechol => acetyl-CoA -- 4-methylcatechol => propanoyl-CoA Aromatics degradation #76d25b +M00623 Phthalate degradation 1, phthalate => protocatechuate Aromatics degradation #76d25b +M00624 Terephthalate degradation, terephthalate => 3,4-dihydroxybenzoate Aromatics degradation #76d25b +M00636 Phthalate degradation 2, phthalate => protocatechuate Aromatics degradation #76d25b +M00637 Anthranilate degradation, anthranilate => catechol Aromatics degradation #76d25b +M00638 Salicylate degradation, salicylate => gentisate Aromatics degradation #76d25b +M00878 Phenylacetate degradation, phenylacetate => acetyl-CoA--succinyl-CoA Aromatics degradation #76d25b +M00142 NADH:ubiquinone oxidoreductase, mitochondria ATP synthesis #cdd346 +M00143 NADH dehydrogenase (ubiquinone) Fe-S protein--flavoprotein complex, mitochondria ATP synthesis #cdd346 +M00144 NADH:quinone oxidoreductase, prokaryotes ATP synthesis #cdd346 +M00145 NAD(P)H:quinone oxidoreductase, chloroplasts and cyanobacteria ATP synthesis #cdd346 +M00146 NADH dehydrogenase (ubiquinone) 1 alpha subcomplex ATP synthesis #cdd346 +M00147 NADH dehydrogenase (ubiquinone) 1 beta subcomplex ATP synthesis #cdd346 +M00148 Succinate dehydrogenase (ubiquinone) ATP synthesis #cdd346 +M00149 Succinate dehydrogenase, prokaryotes ATP synthesis #cdd346 +M00150 Fumarate reductase, prokaryotes ATP synthesis #cdd346 +M00151 Cytochrome bc1 complex respiratory unit ATP synthesis #cdd346 +M00152 Cytochrome bc1 complex ATP synthesis #cdd346 +M00153 Cytochrome bd ubiquinol oxidase ATP synthesis #cdd346 +M00154 Cytochrome c oxidase ATP synthesis #cdd346 +M00155 Cytochrome c oxidase, prokaryotes ATP synthesis #cdd346 +M00156 Cytochrome c oxidase, cbb3-type ATP synthesis #cdd346 +M00157 F-type ATPase, prokaryotes and chloroplasts ATP synthesis #cdd346 +M00158 F-type ATPase, eukaryotes ATP synthesis #cdd346 +M00159 V-type ATPase, prokaryotes ATP synthesis #cdd346 +M00160 V-type ATPase, eukaryotes ATP synthesis #cdd346 +M00162 Cytochrome b6f complex ATP synthesis #cdd346 +M00416 Cytochrome aa3-600 menaquinol oxidase ATP synthesis #cdd346 +M00417 Cytochrome o ubiquinol oxidase ATP synthesis #cdd346 +M00672 Penicillin biosynthesis,
aminoadipate + cysteine + valine => penicillin Beta-Lactam biosynthesis #3b2882 +M00673 Cephamycin C biosynthesis, aminoadipate + cysteine + valine => cephamycin C Beta-Lactam biosynthesis #3b2882 +M00674 Clavaminate biosynthesis, arginine + glyceraldehyde-3P => clavaminate Beta-Lactam biosynthesis #3b2882 +M00675 Carbapenem-3-carboxylate biosynthesis, pyrroline-5-carboxylate + malonyl-CoA => carbapenem-3-carboxylate Beta-Lactam biosynthesis #3b2882 +M00736 Nocardicin A biosynthesis, L-pHPG + arginine + serine => nocardicin A Beta-Lactam biosynthesis #3b2882 +M00039 Monolignol biosynthesis, phenylalanine--tyrosine => monolignol Biosynthesis of other secondary metabolites #cbde82 +M00137 Flavanone biosynthesis, phenylalanine => naringenin Biosynthesis of other secondary metabolites #cbde82 +M00138 Flavonoid biosynthesis, naringenin => pelargonidin Biosynthesis of other secondary metabolites #cbde82 +M00370 Glucosinolate biosynthesis, tryptophan => glucobrassicin Biosynthesis of other secondary metabolites #cbde82 +M00661 Paspaline biosynthesis, geranylgeranyl-PP + indoleglycerol phosphate => paspaline Biosynthesis of other secondary metabolites #cbde82 +M00785 Cycloserine biosynthesis, arginine--serine => cycloserine Biosynthesis of other secondary metabolites #cbde82 +M00786 Fumitremorgin alkaloid biosynthesis, tryptophan + proline => fumitremorgin C--A Biosynthesis of other secondary metabolites #cbde82 +M00787 Bacilysin biosynthesis, prephenate => bacilysin Biosynthesis of other secondary metabolites #cbde82 +M00788 Terpentecin biosynthesis, GGAP => terpentecin Biosynthesis of other secondary metabolites #cbde82 +M00789 Rebeccamycin biosynthesis, tryptophan => rebeccamycin Biosynthesis of other secondary metabolites #cbde82 +M00790 Pyrrolnitrin biosynthesis, tryptophan => pyrrolnitrin Biosynthesis of other secondary metabolites #cbde82 +M00805 Staurosporine biosynthesis, tryptophan => staurosporine Biosynthesis of other secondary metabolites #cbde82 +M00808 Violacein biosynthesis, tryptophan => violacein Biosynthesis of other secondary metabolites #cbde82 +M00814 Acarbose biosynthesis, sedoheptulopyranose-7P => acarbose Biosynthesis of other secondary metabolites #cbde82 +M00815 Validamycin A biosynthesis, sedoheptulopyranose-7P => validamycin A Biosynthesis of other secondary metabolites #cbde82 +M00819 Pentalenolactone biosynthesis, farnesyl-PP => pentalenolactone Biosynthesis of other secondary metabolites #cbde82 +M00835 Pyocyanine biosynthesis, chorismate => pyocyanine Biosynthesis of other secondary metabolites #cbde82 +M00837 Prodigiosin biosynthesis, L-proline => prodigiosin Biosynthesis of other secondary metabolites #cbde82 +M00838 Undecylprodigiosin biosynthesis, L-proline => undecylprodigiosin Biosynthesis of other secondary metabolites #cbde82 +M00848 Aurachin biosynthesis, anthranilate => aurachin A Biosynthesis of other secondary metabolites #cbde82 +M00875 Staphyloferrin B biosynthesis, L-serine => staphyloferrin B Biosynthesis of other secondary metabolites #cbde82 +M00876 Staphyloferrin A biosynthesis, L-ornithine => staphyloferrin A Biosynthesis of other secondary metabolites #cbde82 +M00877 Kanosamine biosynthesis, glucose 6-phosphate => kanosamine Biosynthesis of other secondary metabolites #cbde82 +M00019 Valine--isoleucine biosynthesis, pyruvate => valine -- 2-oxobutanoate => isoleucine Branched-chain amino acid metabolism #656cdb +M00036 Leucine degradation, leucine => acetoacetate + acetyl-CoA Branched-chain amino acid metabolism #656cdb +M00432 Leucine
biosynthesis, 2-oxoisovalerate => 2-oxoisocaproate Branched-chain amino acid metabolism #656cdb +M00535 Isoleucine biosynthesis, pyruvate => 2-oxobutanoate Branched-chain amino acid metabolism #656cdb +M00570 Isoleucine biosynthesis, threonine => 2-oxobutanoate => isoleucine Branched-chain amino acid metabolism #656cdb +M00165 Reductive pentose phosphate cycle (Calvin cycle) Carbon fixation #408937 +M00166 Reductive pentose phosphate cycle, ribulose-5P => glyceraldehyde-3P Carbon fixation #408937 +M00167 Reductive pentose phosphate cycle, glyceraldehyde-3P => ribulose-5P Carbon fixation #408937 +M00168 CAM (Crassulacean acid metabolism), dark Carbon fixation #408937 +M00169 CAM (Crassulacean acid metabolism), light Carbon fixation #408937 +M00170 C4-dicarboxylic acid cycle, phosphoenolpyruvate carboxykinase type Carbon fixation #408937 +M00171 C4-dicarboxylic acid cycle, NAD - malic enzyme type Carbon fixation #408937 +M00172 C4-dicarboxylic acid cycle, NADP - malic enzyme type Carbon fixation #408937 +M00173 Reductive citrate cycle (Arnon-Buchanan cycle) Carbon fixation #408937 +M00374 Dicarboxylate-hydroxybutyrate cycle Carbon fixation #408937 +M00375 Hydroxypropionate-hydroxybutylate cycle Carbon fixation #408937 +M00376 3-Hydroxypropionate bi-cycle Carbon fixation #408937 +M00377 Reductive acetyl-CoA pathway (Wood-Ljungdahl pathway) Carbon fixation #408937 +M00579 Phosphate acetyltransferase-acetate kinase pathway, acetyl-CoA => acetate Carbon fixation #408937 +M00620 Incomplete reductive citrate cycle, acetyl-CoA => oxoglutarate Carbon fixation #408937 +M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate Central carbohydrate metabolism #c644a5 +M00002 Glycolysis, core module involving three-carbon compounds Central carbohydrate metabolism #c644a5 +M00003 Gluconeogenesis, oxaloacetate => fructose-6P Central carbohydrate metabolism #c644a5 +M00004 Pentose phosphate pathway (Pentose phosphate cycle) Central carbohydrate metabolism #c644a5 +M00005 PRPP biosynthesis, ribose 5P => PRPP Central carbohydrate metabolism #c644a5 +M00006 Pentose phosphate pathway, oxidative phase, glucose 6P => ribulose 5P Central carbohydrate metabolism #c644a5 +M00007 Pentose phosphate pathway, non-oxidative phase, fructose 6P => ribose 5P Central carbohydrate metabolism #c644a5 +M00008 Entner-Doudoroff pathway, glucose-6P => glyceraldehyde-3P + pyruvate Central carbohydrate metabolism #c644a5 +M00009 Citrate cycle (TCA cycle, Krebs cycle) Central carbohydrate metabolism #c644a5 +M00010 Citrate cycle, first carbon oxidation, oxaloacetate => 2-oxoglutarate Central carbohydrate metabolism #c644a5 +M00011 Citrate cycle, second carbon oxidation, 2-oxoglutarate => oxaloacetate Central carbohydrate metabolism #c644a5 +M00307 Pyruvate oxidation, pyruvate => acetyl-CoA Central carbohydrate metabolism #c644a5 +M00308 Semi-phosphorylative Entner-Doudoroff pathway, gluconate => glycerate-3P Central carbohydrate metabolism #c644a5 +M00309 Non-phosphorylative Entner-Doudoroff pathway, gluconate--galactonate => glycerate Central carbohydrate metabolism #c644a5 +M00580 Pentose phosphate pathway, archaea, fructose 6P => ribose 5P Central carbohydrate metabolism #c644a5 +M00633 Semi-phosphorylative Entner-Doudoroff pathway, gluconate--galactonate => glycerate-3P Central carbohydrate metabolism #c644a5 +M00112 Tocopherol--tocotrienol biosynthesis Cofactor and vitamin metabolism #5fda98 +M00115 NAD biosynthesis, aspartate => NAD Cofactor and vitamin metabolism #5fda98 +M00116 Menaquinone biosynthesis, chorismate =>
menaquinol Cofactor and vitamin metabolism #5fda98 +M00117 Ubiquinone biosynthesis, prokaryotes, chorismate => ubiquinone Cofactor and vitamin metabolism #5fda98 +M00119 Pantothenate biosynthesis, valine--L-aspartate => pantothenate Cofactor and vitamin metabolism #5fda98 +M00120 Coenzyme A biosynthesis, pantothenate => CoA Cofactor and vitamin metabolism #5fda98 +M00121 Heme biosynthesis, plants and bacteria, glutamate => heme Cofactor and vitamin metabolism #5fda98 +M00122 Cobalamin biosynthesis, cobinamide => cobalamin Cofactor and vitamin metabolism #5fda98 +M00123 Biotin biosynthesis, pimeloyl-ACP--CoA => biotin Cofactor and vitamin metabolism #5fda98 +M00124 Pyridoxal biosynthesis, erythrose-4P => pyridoxal-5P Cofactor and vitamin metabolism #5fda98 +M00125 Riboflavin biosynthesis, GTP => riboflavin--FMN--FAD Cofactor and vitamin metabolism #5fda98 +M00126 Tetrahydrofolate biosynthesis, GTP => THF Cofactor and vitamin metabolism #5fda98 +M00127 Thiamine biosynthesis, AIR => thiamine-P--thiamine-2P Cofactor and vitamin metabolism #5fda98 +M00128 Ubiquinone biosynthesis, eukaryotes, 4-hydroxybenzoate => ubiquinone Cofactor and vitamin metabolism #5fda98 +M00140 C1-unit interconversion, prokaryotes Cofactor and vitamin metabolism #5fda98 +M00141 C1-unit interconversion, eukaryotes Cofactor and vitamin metabolism #5fda98 +M00572 Pimeloyl-ACP biosynthesis, BioC-BioH pathway, malonyl-ACP => pimeloyl-ACP Cofactor and vitamin metabolism #5fda98 +M00573 Biotin biosynthesis, BioI pathway, long-chain-acyl-ACP => pimeloyl-ACP => biotin Cofactor and vitamin metabolism #5fda98 +M00577 Biotin biosynthesis, BioW pathway, pimelate => pimeloyl-CoA => biotin Cofactor and vitamin metabolism #5fda98 +M00622 Nicotinate degradation, nicotinate => fumarate Cofactor and vitamin metabolism #5fda98 +M00810 Nicotine degradation, pyridine pathway, nicotine => 2,6-dihydroxypyridine--succinate semialdehyde Cofactor and vitamin metabolism #5fda98 +M00811 Nicotine degradation, pyrrolidine pathway, nicotine => succinate semialdehyde Cofactor and vitamin metabolism #5fda98 +M00836 Coenzyme F430 biosynthesis, sirohydrochlorin => coenzyme F430 Cofactor and vitamin metabolism #5fda98 +M00840 Tetrahydrofolate biosynthesis, mediated by ribA and trpF, GTP => THF Cofactor and vitamin metabolism #5fda98 +M00841 Tetrahydrofolate biosynthesis, mediated by PTPS, GTP => THF Cofactor and vitamin metabolism #5fda98 +M00842 Tetrahydrobiopterin biosynthesis, GTP => BH4 Cofactor and vitamin metabolism #5fda98 +M00843 L-threo-Tetrahydrobiopterin biosynthesis, GTP => L-threo-BH4 Cofactor and vitamin metabolism #5fda98 +M00846 Siroheme biosynthesis, glutamate => siroheme Cofactor and vitamin metabolism #5fda98 +M00847 Heme biosynthesis, archaea, siroheme => heme Cofactor and vitamin metabolism #5fda98 +M00868 Heme biosynthesis, animals and fungi, glycine => heme Cofactor and vitamin metabolism #5fda98 +M00880 Molybdenum cofactor biosynthesis, GTP => molybdenum cofactor Cofactor and vitamin metabolism #5fda98 +M00017 Methionine biosynthesis, apartate => homoserine => methionine Cysteine and methionine metabolism #782975 +M00021 Cysteine biosynthesis, serine => cysteine Cysteine and methionine metabolism #782975 +M00034 Methionine salvage pathway Cysteine and methionine metabolism #782975 +M00035 Methionine degradation Cysteine and methionine metabolism #782975 +M00338 Cysteine biosynthesis, homocysteine + serine => cysteine Cysteine and methionine metabolism #782975 +M00368 Ethylene biosynthesis, methionine => ethylene Cysteine and 
methionine metabolism #782975 +M00609 Cysteine biosynthesis, methionine => cysteine Cysteine and methionine metabolism #782975 +M00625 Methicillin resistance Drug resistance #869534 +M00627 beta-Lactam resistance, Bla system Drug resistance #869534 +M00639 Multidrug resistance, efflux pump MexCD-OprJ Drug resistance #869534 +M00641 Multidrug resistance, efflux pump MexEF-OprN Drug resistance #869534 +M00642 Multidrug resistance, efflux pump MexJK-OprM Drug resistance #869534 +M00643 Multidrug resistance, efflux pump MexXY-OprM Drug resistance #869534 +M00649 Multidrug resistance, efflux pump AdeABC Drug resistance #869534 +M00651 Vancomycin resistance, D-Ala-D-Lac type Drug resistance #869534 +M00652 Vancomycin resistance, D-Ala-D-Ser type Drug resistance #869534 +M00696 Multidrug resistance, efflux pump AcrEF-TolC Drug resistance #869534 +M00697 Multidrug resistance, efflux pump MdtEF-TolC Drug resistance #869534 +M00698 Multidrug resistance, efflux pump BpeEF-OprC Drug resistance #869534 +M00700 Multidrug resistance, efflux pump AbcA Drug resistance #869534 +M00702 Multidrug resistance, efflux pump NorB Drug resistance #869534 +M00704 Tetracycline resistance, efflux pump Tet38 Drug resistance #869534 +M00705 Multidrug resistance, efflux pump MepA Drug resistance #869534 +M00714 Multidrug resistance, efflux pump QacA Drug resistance #869534 +M00718 Multidrug resistance, efflux pump MexAB-OprM Drug resistance #869534 +M00725 Cationic antimicrobial peptide (CAMP) resistance, dltABCD operon Drug resistance #869534 +M00726 Cationic antimicrobial peptide (CAMP) resistance, lysyl-phosphatidylglycerol (L-PG) synthase MprF Drug resistance #869534 +M00730 Cationic antimicrobial peptide (CAMP) resistance, VraFG transporter Drug resistance #869534 +M00744 Cationic antimicrobial peptide (CAMP) resistance, protease PgtE Drug resistance #869534 +M00745 Imipenem resistance, repression of porin OprD Drug resistance #869534 +M00746 Multidrug resistance, repression of porin OmpF Drug resistance #869534 +M00769 Multidrug resistance, efflux pump MexPQ-OpmE Drug resistance #869534 +M00851 Carbapenem resistance Drug resistance #869534 +M00824 9-membered enediyne core biosynthesis, malonyl-CoA => 3-hydroxyhexadeca-4,6,8,10,12,14-hexaenoyl-ACP => 9-membered enediyne core Enediyne biosynthesis #d27bde +M00825 10-membered enediyne core biosynthesis, malonyl-CoA => 3-hydroxyhexadeca-4,6,8,10,12,14-hexaenoyl-ACP => 10-membered enediyne core Enediyne biosynthesis #d27bde +M00826 C-1027 benzoxazolinate moiety biosynthesis, chorismate => benzoxazolinyl-CoA Enediyne biosynthesis #d27bde +M00827 C-1027 beta-amino acid moiety biosynthesis, tyrosine => 3-chloro-4,5-dihydroxy-beta-phenylalanyl-PCP Enediyne biosynthesis #d27bde +M00828 Maduropeptin beta-hydroxy acid moiety biosynthesis, tyrosine => 3-(4-hydroxyphenyl)-3-oxopropanoyl-PCP Enediyne biosynthesis #d27bde +M00829 3,6-Dimethylsalicylyl-CoA biosynthesis, malonyl-CoA => 6-methylsalicylate => 3,6-dimethylsalicylyl-CoA Enediyne biosynthesis #d27bde +M00830 Neocarzinostatin naphthoate moiety biosynthesis, malonyl-CoA => 2-hydroxy-5-methyl-1-naphthoate => 2-hydroxy-7-methoxy-5-methyl-1-naphthoyl-CoA Enediyne biosynthesis #d27bde +M00831 Kedarcidin 2-hydroxynaphthoate moiety biosynthesis, malonyl-CoA => 3,6,8-trihydroxy-2-naphthoate => 3-hydroxy-7,8-dimethoxy-6-isopropoxy-2-naphthoyl-CoA Enediyne biosynthesis #d27bde +M00832 Kedarcidin 2-aza-3-chloro-beta-tyrosine moiety biosynthesis, azatyrosine => 2-aza-3-chloro-beta-tyrosyl-PCP Enediyne biosynthesis #d27bde +M00833 
Calicheamicin biosynthesis, calicheamicinone => calicheamicin Enediyne biosynthesis #d27bde +M00834 Calicheamicin orsellinate moiety biosynthesis, malonyl-CoA => orsellinate-ACP => 5-iodo-2,3-dimethoxyorsellinate-ACP Enediyne biosynthesis #d27bde +M00082 Fatty acid biosynthesis, initiation Fatty acid metabolism #d9a344 +M00083 Fatty acid biosynthesis, elongation Fatty acid metabolism #d9a344 +M00085 Fatty acid elongation in mitochondria Fatty acid metabolism #d9a344 +M00086 beta-Oxidation, acyl-CoA synthesis Fatty acid metabolism #d9a344 +M00087 beta-Oxidation Fatty acid metabolism #d9a344 +M00415 Fatty acid elongation in endoplasmic reticulum Fatty acid metabolism #d9a344 +M00861 beta-Oxidation, peroxisome, VLCFA Fatty acid metabolism #d9a344 +M00873 Fatty acid biosynthesis in mitochondria, animals Fatty acid metabolism #d9a344 +M00874 Fatty acid biosynthesis in mitochondria, fungi Fatty acid metabolism #d9a344 +M00055 N-glycan precursor biosynthesis Glycan biosynthesis #588cd6 +M00056 O-glycan biosynthesis, mucin type core Glycan biosynthesis #588cd6 +M00065 GPI-anchor biosynthesis, core oligosaccharide Glycan biosynthesis #588cd6 +M00068 Glycosphingolipid biosynthesis, globo-series, LacCer => Gb4Cer Glycan biosynthesis #588cd6 +M00069 Glycosphingolipid biosynthesis, ganglio series, LacCer => GT3 Glycan biosynthesis #588cd6 +M00070 Glycosphingolipid biosynthesis, lacto-series, LacCer => Lc4Cer Glycan biosynthesis #588cd6 +M00071 Glycosphingolipid biosynthesis, neolacto-series, LacCer => nLc4Cer Glycan biosynthesis #588cd6 +M00072 N-glycosylation by oligosaccharyltransferase Glycan biosynthesis #588cd6 +M00073 N-glycan precursor trimming Glycan biosynthesis #588cd6 +M00074 N-glycan biosynthesis, high-mannose type Glycan biosynthesis #588cd6 +M00075 N-glycan biosynthesis, complex type Glycan biosynthesis #588cd6 +M00872 O-glycan biosynthesis, mannose type (core M3) Glycan biosynthesis #588cd6 +M00057 Glycosaminoglycan biosynthesis, linkage tetrasaccharide Glycosaminoglycan metabolism #d66432 +M00058 Glycosaminoglycan biosynthesis, chondroitin sulfate backbone Glycosaminoglycan metabolism #d66432 +M00059 Glycosaminoglycan biosynthesis, heparan sulfate backbone Glycosaminoglycan metabolism #d66432 +M00076 Dermatan sulfate degradation Glycosaminoglycan metabolism #d66432 +M00077 Chondroitin sulfate degradation Glycosaminoglycan metabolism #d66432 +M00078 Heparan sulfate degradation Glycosaminoglycan metabolism #d66432 +M00079 Keratan sulfate degradation Glycosaminoglycan metabolism #d66432 +M00026 Histidine biosynthesis, PRPP => histidine Histidine metabolism #66d7bf +M00045 Histidine degradation, histidine => N-formiminoglutamate => glutamate Histidine metabolism #66d7bf +M00066 Lactosylceramide biosynthesis Lipid metabolism #d53e55 +M00067 Sulfoglycolipids biosynthesis, ceramide--1-alkyl-2-acylglycerol => sulfatide--seminolipid Lipid metabolism #d53e55 +M00088 Ketone body biosynthesis, acetyl-CoA => acetoacetate--3-hydroxybutyrate--acetone Lipid metabolism #d53e55 +M00089 Triacylglycerol biosynthesis Lipid metabolism #d53e55 +M00090 Phosphatidylcholine (PC) biosynthesis, choline => PC Lipid metabolism #d53e55 +M00091 Phosphatidylcholine (PC) biosynthesis, PE => PC Lipid metabolism #d53e55 +M00092 Phosphatidylethanolamine (PE) biosynthesis, ethanolamine => PE Lipid metabolism #d53e55 +M00093 Phosphatidylethanolamine (PE) biosynthesis, PA => PS => PE Lipid metabolism #d53e55 +M00094 Ceramide biosynthesis Lipid metabolism #d53e55 +M00098 Acylglycerol degradation Lipid metabolism #d53e55 
+M00099 Sphingosine biosynthesis Lipid metabolism #d53e55 +M00100 Sphingosine degradation Lipid metabolism #d53e55 +M00113 Jasmonic acid biosynthesis Lipid metabolism #d53e55 +M00060 KDO2-lipid A biosynthesis, Raetz pathway, LpxL-LpxM type Lipopolysaccharide metabolism #83d2de +M00063 CMP-KDO biosynthesis Lipopolysaccharide metabolism #83d2de +M00064 ADP-L-glycero-D-manno-heptose biosynthesis Lipopolysaccharide metabolism #83d2de +M00866 KDO2-lipid A biosynthesis, Raetz pathway, non-LpxL-LpxM type Lipopolysaccharide metabolism #83d2de +M00867 KDO2-lipid A modification pathway Lipopolysaccharide metabolism #83d2de +M00016 Lysine biosynthesis, succinyl-DAP pathway, aspartate => lysine Lysine metabolism #d84e8b +M00030 Lysine biosynthesis, AAA pathway, 2-oxoglutarate => 2-aminoadipate => lysine Lysine metabolism #d84e8b +M00031 Lysine biosynthesis, mediated by LysW, 2-aminoadipate => lysine Lysine metabolism #d84e8b +M00032 Lysine degradation, lysine => saccharopine => acetoacetyl-CoA Lysine metabolism #d84e8b +M00433 Lysine biosynthesis, 2-oxoglutarate => 2-oxoadipate Lysine metabolism #d84e8b +M00525 Lysine biosynthesis, acetyl-DAP pathway, aspartate => lysine Lysine metabolism #d84e8b +M00526 Lysine biosynthesis, DAP dehydrogenase pathway, aspartate => lysine Lysine metabolism #d84e8b +M00527 Lysine biosynthesis, DAP aminotransferase pathway, aspartate => lysine Lysine metabolism #d84e8b +M00773 Tylosin biosynthesis, methylmalonyl-CoA + malonyl-CoA => tylactone => tylosin Macrolide biosynthesis #2e4b26 +M00774 Erythromycin biosynthesis, propanoyl-CoA + methylmalonyl-CoA => deoxyerythronolide B => erythromycin A--B Macrolide biosynthesis #2e4b26 +M00775 Oleandomycin biosynthesis, malonyl-CoA + methylmalonyl-CoA => 8,8a-deoxyoleandolide => oleandomycin Macrolide biosynthesis #2e4b26 +M00776 Pikromycin--methymycin biosynthesis, methylmalonyl-CoA + malonyl-CoA => narbonolide--10-deoxymethynolide => pikromycin--methymycin Macrolide biosynthesis #2e4b26 +M00777 Avermectin biosynthesis, 2-methylbutanoyl-CoA--isobutyryl-CoA => 6,8a-Seco-6,8a-deoxy-5-oxoavermectin 1a--1b aglycone => avermectin A1a--B1a--A1b--B1b Macrolide biosynthesis #2e4b26 +M00611 Oxygenic photosynthesis in plants and cyanobacteria Metabolic capacity #9378c3 +M00612 Anoxygenic photosynthesis in purple bacteria Metabolic capacity #9378c3 +M00613 Anoxygenic photosynthesis in green nonsulfur bacteria Metabolic capacity #9378c3 +M00614 Anoxygenic photosynthesis in green sulfur bacteria Metabolic capacity #9378c3 +M00615 Nitrate assimilation Metabolic capacity #9378c3 +M00616 Sulfate-sulfur assimilation Metabolic capacity #9378c3 +M00617 Methanogen Metabolic capacity #9378c3 +M00618 Acetogen Metabolic capacity #9378c3 +M00174 Methane oxidation, methanotroph, methane => formaldehyde Methane metabolism #9e7336 +M00344 Formaldehyde assimilation, xylulose monophosphate pathway Methane metabolism #9e7336 +M00345 Formaldehyde assimilation, ribulose monophosphate pathway Methane metabolism #9e7336 +M00346 Formaldehyde assimilation, serine pathway Methane metabolism #9e7336 +M00356 Methanogenesis, methanol => methane Methane metabolism #9e7336 +M00357 Methanogenesis, acetate => methane Methane metabolism #9e7336 +M00358 Coenzyme M biosynthesis Methane metabolism #9e7336 +M00378 F420 biosynthesis Methane metabolism #9e7336 +M00422 Acetyl-CoA pathway, CO2 => acetyl-CoA Methane metabolism #9e7336 +M00563 Methanogenesis, methylamine--dimethylamine--trimethylamine => methane Methane metabolism #9e7336 +M00567 Methanogenesis, CO2 => methane 
Methane metabolism #9e7336 +M00608 2-Oxocarboxylic acid chain extension, 2-oxoglutarate => 2-oxoadipate => 2-oxopimelate => 2-oxosuberate Methane metabolism #9e7336 +M00175 Nitrogen fixation, nitrogen => ammonia Nitrogen metabolism #2c2351 +M00528 Nitrification, ammonia => nitrite Nitrogen metabolism #2c2351 +M00529 Denitrification, nitrate => nitrogen Nitrogen metabolism #2c2351 +M00530 Dissimilatory nitrate reduction, nitrate => ammonia Nitrogen metabolism #2c2351 +M00531 Assimilatory nitrate reduction, nitrate => ammonia Nitrogen metabolism #2c2351 +M00804 Complete nitrification, comammox, ammonia => nitrite => nitrate Nitrogen metabolism #2c2351 +M00027 GABA (gamma-Aminobutyrate) shunt Other amino acid metabolism #c5d7a9 +M00118 Glutathione biosynthesis, glutamate => glutathione Other amino acid metabolism #c5d7a9 +M00369 Cyanogenic glycoside biosynthesis, tyrosine => dhurrin Other amino acid metabolism #c5d7a9 +M00012 Glyoxylate cycle Other carbohydrate metabolism #872b4e +M00013 Malonate semialdehyde pathway, propanoyl-CoA => acetyl-CoA Other carbohydrate metabolism #872b4e +M00014 Glucuronate pathway (uronate pathway) Other carbohydrate metabolism #872b4e +M00061 D-Glucuronate degradation, D-glucuronate => pyruvate + D-glyceraldehyde 3P Other carbohydrate metabolism #872b4e +M00081 Pectin degradation Other carbohydrate metabolism #872b4e +M00114 Ascorbate biosynthesis, plants, glucose-6P => ascorbate Other carbohydrate metabolism #872b4e +M00129 Ascorbate biosynthesis, animals, glucose-1P => ascorbate Other carbohydrate metabolism #872b4e +M00130 Inositol phosphate metabolism, PI=> PIP2 => Ins(1,4,5)P3 => Ins(1,3,4,5)P4 Other carbohydrate metabolism #872b4e +M00131 Inositol phosphate metabolism, Ins(1,3,4,5)P4 => Ins(1,3,4)P3 => myo-inositol Other carbohydrate metabolism #872b4e +M00132 Inositol phosphate metabolism, Ins(1,3,4)P3 => phytate Other carbohydrate metabolism #872b4e +M00373 Ethylmalonyl pathway Other carbohydrate metabolism #872b4e +M00532 Photorespiration Other carbohydrate metabolism #872b4e +M00549 Nucleotide sugar biosynthesis, glucose => UDP-glucose Other carbohydrate metabolism #872b4e +M00550 Ascorbate degradation, ascorbate => D-xylulose-5P Other carbohydrate metabolism #872b4e +M00552 D-galactonate degradation, De Ley-Doudoroff pathway, D-galactonate => glycerate-3P Other carbohydrate metabolism #872b4e +M00554 Nucleotide sugar biosynthesis, galactose => UDP-galactose Other carbohydrate metabolism #872b4e +M00565 Trehalose biosynthesis, D-glucose 1P => trehalose Other carbohydrate metabolism #872b4e +M00630 D-Galacturonate degradation (fungi), D-galacturonate => glycerol Other carbohydrate metabolism #872b4e +M00631 D-Galacturonate degradation (bacteria), D-galacturonate => pyruvate + D-glyceraldehyde 3P Other carbohydrate metabolism #872b4e +M00632 Galactose degradation, Leloir pathway, galactose => alpha-D-glucose-1P Other carbohydrate metabolism #872b4e +M00740 Methylaspartate cycle Other carbohydrate metabolism #872b4e +M00741 Propanoyl-CoA metabolism, propanoyl-CoA => succinyl-CoA Other carbohydrate metabolism #872b4e +M00761 Undecaprenylphosphate alpha-L-Ara4N biosynthesis, UDP-GlcA => undecaprenyl phosphate alpha-L-Ara4N Other carbohydrate metabolism #872b4e +M00854 Glycogen biosynthesis, glucose-1P => glycogen--starch Other carbohydrate metabolism #872b4e +M00855 Glycogen degradation, glycogen => glucose-6P Other carbohydrate metabolism #872b4e +M00097 beta-Carotene biosynthesis, GGAP => beta-carotene Other terpenoid biosynthesis #6e9368 +M00371 
Castasterone biosynthesis, campesterol => castasterone Other terpenoid biosynthesis #6e9368 +M00372 Abscisic acid biosynthesis, beta-carotene => abscisic acid Other terpenoid biosynthesis #6e9368 +M00363 EHEC pathogenicity signature, Shiga toxin Pathogenicity #66406d +M00542 EHEC--EPEC pathogenicity signature, T3SS and effectors Pathogenicity #66406d +M00564 Helicobacter pylori pathogenicity signature, cagA pathogenicity island Pathogenicity #66406d +M00574 Pertussis pathogenicity signature, pertussis toxin Pathogenicity #66406d +M00575 Pertussis pathogenicity signature, T1SS Pathogenicity #66406d +M00576 ETEC pathogenicity signature, heat-labile and heat-stable enterotoxins Pathogenicity #66406d +M00850 Vibrio cholerae pathogenicity signature, cholera toxins Pathogenicity #66406d +M00852 Vibrio cholerae pathogenicity signature, toxin coregulated pilus Pathogenicity #66406d +M00853 ETEC pathogenicity signature, colonization factors Pathogenicity #66406d +M00856 Salmonella enterica pathogenicity signature, typhoid toxin Pathogenicity #66406d +M00857 Salmonella enterica pathogenicity signature, Vi antigen Pathogenicity #66406d +M00859 Bacillus anthracis pathogenicity signature, anthrax toxin Pathogenicity #66406d +M00860 Bacillus anthracis pathogenicity signature, polyglutamic acid capsule biosynthesis Pathogenicity #66406d +M00161 Photosystem II Photosynthesis #cfa68a +M00163 Photosystem I Photosynthesis #cfa68a +M00597 Anoxygenic photosystem II [BR:ko00194] Photosynthesis #cfa68a +M00598 Anoxygenic photosystem I [BR:ko00194] Photosynthesis #cfa68a +M00660 Xanthomonas spp. pathogenicity signature, T3SS and effectors Plant pathogenicity #461d27 +M00133 Polyamine biosynthesis, arginine => agmatine => putrescine => spermidine Polyamine biosynthesis #a5b3da +M00134 Polyamine biosynthesis, arginine => ornithine => putrescine Polyamine biosynthesis #a5b3da +M00135 GABA biosynthesis, eukaryotes, putrescine => GABA Polyamine biosynthesis #a5b3da +M00136 GABA biosynthesis, prokaryotes, putrescine => GABA Polyamine biosynthesis #a5b3da +M00793 dTDP-L-rhamnose biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00794 dTDP-6-deoxy-D-allose biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00795 dTDP-beta-L-noviose biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00796 dTDP-D-mycaminose biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00797 dTDP-D-desosamine biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00798 dTDP-L-mycarose biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00799 dTDP-L-oleandrose biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00800 dTDP-L-megosamine biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00801 dTDP-L-olivose biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00802 dTDP-D-forosamine biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00803 dTDP-D-angolosamine biosynthesis Polyketide sugar unit biosynthesis #5c4f24 +M00048 Inosine monophosphate biosynthesis, PRPP + glutamine => IMP Purine metabolism #e0a7d2 +M00049 Adenine ribonucleotide biosynthesis, IMP => ADP,ATP Purine metabolism #e0a7d2 +M00050 Guanine ribonucleotide biosynthesis IMP => GDP,GTP Purine metabolism #e0a7d2 +M00546 Purine degradation, xanthine => urea Purine metabolism #e0a7d2 +M00046 Pyrimidine degradation, uracil => beta-alanine, thymine => 3-aminoisobutanoate Pyrimidine metabolism #25585e +M00051 Uridine monophosphate biosynthesis, glutamine (+ PRPP) => UMP Pyrimidine metabolism #25585e +M00052 Pyrimidine ribonucleotide 
biosynthesis, UMP => UDP--UTP,CDP--CTP Pyrimidine metabolism #25585e +M00053 Pyrimidine deoxyribonuleotide biosynthesis, CDP--CTP => dCDP--dCTP,dTDP--dTTP Pyrimidine metabolism #25585e +M00018 Threonine biosynthesis, aspartate => homoserine => threonine Serine and threonine metabolism #de7d78 +M00020 Serine biosynthesis, glycerate-3P => serine Serine and threonine metabolism #de7d78 +M00033 Ectoine biosynthesis, aspartate => ectoine Serine and threonine metabolism #de7d78 +M00555 Betaine biosynthesis, choline => betaine Serine and threonine metabolism #de7d78 +M00101 Cholesterol biosynthesis, squalene 2,3-epoxide => cholesterol Sterol biosynthesis #4e96a2 +M00102 Ergocalciferol biosynthesis Sterol biosynthesis #4e96a2 +M00103 Cholecalciferol biosynthesis Sterol biosynthesis #4e96a2 +M00104 Bile acid biosynthesis, cholesterol => cholate--chenodeoxycholate Sterol biosynthesis #4e96a2 +M00106 Conjugated bile acid biosynthesis, cholate => taurocholate--glycocholate Sterol biosynthesis #4e96a2 +M00107 Steroid hormone biosynthesis, cholesterol => prognenolone => progesterone Sterol biosynthesis #4e96a2 +M00108 C21-Steroid hormone biosynthesis, progesterone => corticosterone--aldosterone Sterol biosynthesis #4e96a2 +M00109 C21-Steroid hormone biosynthesis, progesterone => cortisol--cortisone Sterol biosynthesis #4e96a2 +M00110 C19--C18-Steroid hormone biosynthesis, pregnenolone => androstenedione => estrone Sterol biosynthesis #4e96a2 +M00862 beta-Oxidation, peroxisome, tri--dihydroxycholestanoyl-CoA => choloyl--chenodeoxycholoyl-CoA Sterol biosynthesis #4e96a2 +M00176 Assimilatory sulfate reduction, sulfate => H2S Sulfur metabolism #4e96a2 +M00595 Thiosulfate oxidation by SOX complex, thiosulfate => sulfate Sulfur metabolism #4e96a2 +M00596 Dissimilatory sulfate reduction, sulfate => H2S Sulfur metabolism #4e96a2 +M00664 Nodulation Symbiosis #88574e +M00095 C5 isoprenoid biosynthesis, mevalonate pathway Terpenoid backbone biosynthesis #4e6089 +M00096 C5 isoprenoid biosynthesis, non-mevalonate pathway Terpenoid backbone biosynthesis #4e6089 +M00364 C10-C20 isoprenoid biosynthesis, bacteria Terpenoid backbone biosynthesis #4e6089 +M00365 C10-C20 isoprenoid biosynthesis, archaea Terpenoid backbone biosynthesis #4e6089 +M00366 C10-C20 isoprenoid biosynthesis, plants Terpenoid backbone biosynthesis #4e6089 +M00367 C10-C20 isoprenoid biosynthesis, non-plant eukaryotes Terpenoid backbone biosynthesis #4e6089 +M00849 C5 isoprenoid biosynthesis, mevalonate pathway, archaea Terpenoid backbone biosynthesis #4e6089 +M00778 Type II polyketide backbone biosynthesis, acyl-CoA + malonyl-CoA => polyketide Type II polyketide biosynthesis #af7194 +M00779 Dihydrokalafungin biosynthesis, octaketide => dihydrokalafungin Type II polyketide biosynthesis #af7194 +M00780 Tetracycline--oxytetracycline biosynthesis, pretetramide => tetracycline--oxytetracycline Type II polyketide biosynthesis #af7194 +M00781 Nogalavinone--aklavinone biosynthesis, deoxynogalonate--deoxyaklanonate => nogalavinone--aklavinone Type II polyketide biosynthesis #af7194 +M00782 Mithramycin biosynthesis, 4-demethylpremithramycinone => mithramycin Type II polyketide biosynthesis #af7194 +M00783 Tetracenomycin C--8-demethyltetracenomycin C biosynthesis, tetracenomycin F2 => tetracenomycin C--8-demethyltetracenomycin C Type II polyketide biosynthesis #af7194 +M00784 Elloramycin biosynthesis, 8-demethyltetracenomycin C => elloramycin A Type II polyketide biosynthesis #af7194 +M00823 Chlortetracycline biosynthesis, pretetramide => chlortetracycline Type 
\ No newline at end of file
diff --git a/data/MicrobeAnnotator_KEGG/KEGG_Regular_Module_Information.pkl b/data/MicrobeAnnotator_KEGG/KEGG_Regular_Module_Information.pkl
new file mode 100644
index 0000000..c2ff119
Binary files /dev/null and b/data/MicrobeAnnotator_KEGG/KEGG_Regular_Module_Information.pkl differ
diff --git a/data/MicrobeAnnotator_KEGG/KEGG_Structural_Module_Information.pkl b/data/MicrobeAnnotator_KEGG/KEGG_Structural_Module_Information.pkl
new file mode 100644
index 0000000..ba85377
Binary files /dev/null and b/data/MicrobeAnnotator_KEGG/KEGG_Structural_Module_Information.pkl differ
diff --git a/data/MicrobeAnnotator_KEGG/MicrobeAnnotator-KEGG.tar.gz b/data/MicrobeAnnotator_KEGG/MicrobeAnnotator-KEGG.tar.gz
new file mode 100644
index 0000000..8c3f1d8
Binary files /dev/null and b/data/MicrobeAnnotator_KEGG/MicrobeAnnotator-KEGG.tar.gz differ
diff --git a/data/MicrobeAnnotator_KEGG/MicrobeAnnotator-KEGG.tar.gz.md5 b/data/MicrobeAnnotator_KEGG/MicrobeAnnotator-KEGG.tar.gz.md5
new file mode 100644
index 0000000..12fdf2c
--- /dev/null
+++ b/data/MicrobeAnnotator_KEGG/MicrobeAnnotator-KEGG.tar.gz.md5
@@ -0,0 +1 @@
+7207b9efe0124c6e9781cf4cf4fa24de MicrobeAnnotator-KEGG.tar.gz
diff --git a/data/MicrobeAnnotator_KEGG/README.md b/data/MicrobeAnnotator_KEGG/README.md
new file mode 100644
index 0000000..3c1a62d
--- /dev/null
+++ b/data/MicrobeAnnotator_KEGG/README.md
@@ -0,0 +1,69 @@
+# MicrobeAnnotator-KEGG
+
+**If this is used in any way, please cite the source publication:**
+
+Ruiz-Perez, C.A., Conrad, R.E. & Konstantinidis, K.T. MicrobeAnnotator: a user-friendly, comprehensive functional annotation pipeline for microbial genomes. BMC Bioinformatics 22, 11 (2021). https://doi.org/10.1186/s12859-020-03940-5
+
+**This data has been incorporated from the following source:**
+
+https://github.com/cruizperez/MicrobeAnnotator/tree/master/microbeannotator/data
+
+**File Descriptions:**
+
+* `KEGG_Regular_Module_Information.pkl` - Python dictionary of regular modules from `MicrobeAnnotator` of `{id_module:structured_kegg_orthologs}`
+* `KEGG_Bifurcating_Module_Information.pkl` - Python dictionary of bifurcating modules from `MicrobeAnnotator` of `{id_module:structured_kegg_orthologs}`
+* `KEGG_Structural_Module_Information.pkl` - Python dictionary of structural modules from `MicrobeAnnotator` of `{id_module:structured_kegg_orthologs}`
+* `KEGG_Module_Information.txt` - Table containing each KEGG module, its higher-level categories, and its module color
+* `KEGG_Module-KOs.pkl` - Flattened dictionary which includes `{id_module: {KO_1, KO_2, ..., KO_M}}`. Note: This is not structured and should be used cautiously as KEGG modules and completion calculations are complex. Generated with the Python code below:
+
+```python
+import pickle, glob, os
+from collections import defaultdict
+
+kegg_directory = "{}/MicrobeAnnotator_KEGG/".format(os.environ["VEBA_DATABASE"])
+
+delimiters = [",","_","-","+"]
+
+# Load MicrobeAnnotator KEGG dictionaries
+module_to_kos__unprocessed = defaultdict(set)
+for fp in glob.glob(os.path.join(kegg_directory, "*.pkl")):
+    with open(fp, "rb") as f:
+        d = pickle.load(f)
+
+    for id_module, v1 in d.items():
+        if isinstance(v1, list):
+            try:
+                module_to_kos__unprocessed[id_module].update(v1)
+            except TypeError:
+                for v2 in v1:
+                    module_to_kos__unprocessed[id_module].update(v2)
+        else:
+            for k2, v2 in v1.items():
+                if isinstance(v2, list):
+                    try:
+                        module_to_kos__unprocessed[id_module].update(v2)
+                    except TypeError:
+                        for v3 in v2:
+                            module_to_kos__unprocessed[id_module].update(v3)
+
+# Flatten the KEGG orthologs (split composite identifiers on the delimiters above)
+module_to_kos__processed = dict()
+for id_module, kos_unprocessed in module_to_kos__unprocessed.items():
+    kos_processed = set()
+    for id_ko in kos_unprocessed:
+        composite = False
+        for sep in delimiters:
+            if sep in id_ko:
+                id_ko = id_ko.replace(sep, ";")
+                composite = True
+        if composite:
+            kos_composite = set(map(str.strip, filter(bool, id_ko.split(";"))))
+            kos_processed.update(kos_composite)
+        else:
+            kos_processed.add(id_ko)
+    module_to_kos__processed[id_module] = kos_processed
+
+# Write
+with open(os.path.join(kegg_directory, "KEGG_Module-KOs.pkl"), "wb") as f:
+    pickle.dump(module_to_kos__processed, f)
+```
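+
+As a quick sanity check, the flattened dictionary can be loaded back and intersected with a set of detected KEGG orthologs. This is only a naive coverage sketch (the KO identifiers below are a hypothetical example set); true module completion depends on each module's internal structure, not just KO overlap:
+
+```python
+import os, pickle
+
+kegg_directory = "{}/MicrobeAnnotator_KEGG/".format(os.environ["VEBA_DATABASE"])
+
+with open(os.path.join(kegg_directory, "KEGG_Module-KOs.pkl"), "rb") as f:
+    module_to_kos = pickle.load(f)
+
+# Hypothetical set of KO identifiers detected in your annotations
+detected_kos = {"K00018", "K00058", "K00831"}
+
+for id_module, kos in sorted(module_to_kos.items()):
+    if not kos:
+        continue
+    fraction_detected = len(kos & detected_kos) / len(kos)
+    if fraction_detected > 0:
+        print(id_module, "{:.1%}".format(fraction_detected))
+```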
\ No newline at end of file
diff --git a/data/README.md b/data/README.md
index 10b3b4b..6525e90 100644
--- a/data/README.md
+++ b/data/README.md
@@ -9,4 +9,13 @@ The following fastq files are subsets of the original SRA sequences designed for
| S3 | SRR17458630 | FASTQ | DNA | 2389989 | 75 | 150.4 | 151 | 56.38 |
| S4 | SRR17458638 | FASTQ | DNA | 3142566 | 75 | 150.5 | 151 | 46.34 |
-[**Download**](https://zenodo.org/record/7946802#.ZGVSpuzMKDU)
\ No newline at end of file
+Also includes the following:
+
+* Metagenomic assemblies using metaSPAdes with sorted BAM files from Bowtie2
+* Genomes, gene models, etc.
+* Taxonomy classifications at the genome and genome cluster level
+* Annotations for genes and protein clusters
+* Biosynthetic gene clusters
+* Clusters for genomes and proteins
+
+[**Download**](https://zenodo.org/records/10094990)
\ No newline at end of file
diff --git a/install/README.md b/install/README.md
index d7d56b0..f92e986 100644
--- a/install/README.md
+++ b/install/README.md
@@ -3,16 +3,18 @@ ____________________________________________________________
#### Software installation
One issue with having large-scale pipeline suites with open-source software is the issue of dependencies. One solution for this is to have a modular software structure where each module has its own `conda` environment. This allows for minimizing dependency constraints as this software suite uses an array of diverse packages from different developers.
-The basis for these environments is creating a separate environment for each module with the `VEBA-` prefix and `_env` as the suffix. For example `VEBA-assembly_env` or `VEBA-binning-prokaryotic_env`. Because of this, `VEBA` is currently not available as a `conda` package but each module will be in the near future. In the meantime, please use the `veba/install/install_veba.sh` script which installs each environment from the yaml files in `veba/install/environments/`. After installing the environments, use the `veba/install/download_databases.sh` script to download and configure the databases while also adding the environment variables to the activate/deactivate scripts in each environment. To install anything manually, just read the scripts as they are well documented and refer to different URL and paths for specific installation options.
+The basis for these environments is creating a separate environment for each module with the `VEBA-` prefix and `_env` as the suffix. For example, `VEBA-assembly_env` or `VEBA-binning-prokaryotic_env`. Because of this, `VEBA` is currently not available as a `conda` package, but each module will be in the near future. In the meantime, please use the `veba/install/install.sh` script, which installs each environment from the yaml files in `veba/install/environments/`. After installing the environments, use the `veba/install/download_databases.sh` script to download and configure the databases while also adding the environment variables to the activate/deactivate scripts in each environment. To install anything manually, just read the scripts; they are well documented and refer to the different URLs and paths for specific installation options.
-The majority of the time taken to build database is downloading/decompressing large archives, `Diamond` database creation of `UniRef`, and `MMSEQS2` database creation of microeukaryotic protein database.
+The majority of the time taken to build the database is spent downloading/decompressing large archives (e.g., `UniRef` & `GTDB`), creating the `Diamond` database of `UniRef`, and creating the `MMSEQS2` databases of the `MicroEuk` database.
Total size is `243 GB` but if you have certain databases installed already then you can just symlink them so the `VEBA_DATABASE` path has the correct structure. Note, the exact size may vary as Pfam and UniRef are updated regularly.
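+
+For example, a minimal sketch of symlinking a pre-existing database into the expected structure (the source path is a placeholder; the subdirectory names must match what `download_databases.sh` creates, e.g., `Classify/NCBITaxonomy`):
+
+```bash
+# Hypothetical example: reuse an existing NCBI Taxonomy download instead of re-fetching it
+mkdir -p ${VEBA_DATABASE}/Classify
+ln -s /path/to/existing/NCBITaxonomy ${VEBA_DATABASE}/Classify/NCBITaxonomy
+```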
Each major version will be packaged as a [release](https://github.com/jolespin/veba/releases) which will include a log of module and script versions.

-**Download Anaconda:**
-[https://www.anaconda.com/products/distribution](https://www.anaconda.com/products/distribution)
+**Download Miniconda (or Anaconda):**
+
+* [https://docs.conda.io/projects/miniconda/en/latest/](https://docs.conda.io/projects/miniconda/en/latest/) (Recommended)
+* [https://www.anaconda.com/products/distribution](https://www.anaconda.com/products/distribution)

____________________________________________________________

@@ -33,7 +35,7 @@ Currently, **Conda environments for VEBA are ONLY configured for Linux** and, du
* Download/configure databases

-**0. Clean up your conda installation [Optional, but recommended]**
+**0. Clean up your conda installation [Optional, but highly recommended]**

The `VEBA` installation is going to configure some `conda` environments for you and some of them have quite a few packages. To minimize the likelihood of [weird errors](https://forum.qiime2.org/t/valueerror-unsupported-format-character-t-0x54-at-index-3312-when-creating-environment-from-environment-file/25237), it's recommended to do the following:
@@ -83,7 +85,7 @@ The `VEBA` installation is going to configure some `conda` environments for you
```
# For stable version, download and decompress the tarball:
-VERSION="1.3.0"
+VERSION="1.4.0"

wget https://github.com/jolespin/veba/archive/refs/tags/v${VERSION}.tar.gz
tar -xvf v${VERSION}.tar.gz && mv veba-${VERSION} veba
@@ -106,14 +108,16 @@ cd veba/install
The update from `CheckM1` -> `CheckM2` and installation of `antiSMASH` require more memory and may require grid access if the head node is limited.

```
-bash install_veba.sh
+bash install.sh
```

**3. Activate the database conda environment, download, and configure databases**

**Recommended resource allocation:** 48 GB memory (time is dependent on I/O of database repositories)

-⚠️ **This step should use ~48 GB memory** and should be run using a compute grid via SLURM or SunGridEngine. If this command is run on the head node it will likely fail or timeout if a connection is interrupted. The most computationally intensive steps are creating a `Diamond` database of `UniRef` and a `MMSEQS2` database of the microeukaryotic protein database. Note the duration will depend on several factors including your internet connection speed and the I/O of public repositories.
+⚠️ **This step should use ~48 GB memory** and should be run using a compute grid via `SLURM` or `SunGridEngine`. **If this command is run on the head node it will likely fail or timeout if a connection is interrupted.** The most computationally intensive steps are creating a `Diamond` database of `UniRef` and a `MMSEQS2` database of the `MicroEuk100/90/50`.
+
+Note the duration will depend on several factors including your internet connection speed and the I/O of public repositories.

**Future releases will split the downloading and configuration to better make use of resources.**
@@ -163,7 +167,7 @@ qsub -o logs/${N}.o -e logs/${N}.e -cwd -N ${N} -j y -pe threaded ${N_JOBS} "${CMD}"
PARTITION=[partition name]
ACCOUNT=[account name]

-sbatch -A ${ACCOUNT} -p ${PARTITION} -J ${N} -N 1 -c ${N_JOBS} --ntasks-per-node=1 -o logs/${N}.o -e logs/${N}.e --export=ALL -t 12:00:00 --mem=64G --wrap="${CMD}"
+sbatch -A ${ACCOUNT} -p ${PARTITION} -J ${N} -N 1 -c ${N_JOBS} --ntasks-per-node=1 -o logs/${N}.o -e logs/${N}.e --export=ALL -t 16:00:00 --mem=24G --wrap="${CMD}"
```

Now, you should have the following environments:
@@ -183,6 +187,7 @@ VEBA-phylogeny_env
VEBA-preprocess_env
VEBA-profile_env
```
+
All the environments should have the `VEBA_DATABASE` environment variable set. If not, then add it manually to ~/.bash_profile: `export VEBA_DATABASE=/path/to/veba_database`.

You can check to make sure the `conda` environments were created and all of the environment variables were created using the following command:
@@ -218,7 +223,7 @@ ____________________________________________________________
```
# Remove conda environments
-bash uninstall_veba.sh
+bash uninstall.sh

# Remove VEBA database
rm -rfv /path/to/veba_database
@@ -230,6 +235,6 @@ ____________________________________________________________
There are currently 2 ways to update veba:

1. Basic uninstall reinstall - You can uninstall and reinstall using the scripts in the `veba/install/` directory. It's recommended to do a fresh reinstall when updating from `v1.0.x` → `v1.2.x`.
-2. Patching existing installation - Complete reinstalls of *VEBA* environments and databases is time consuming so [we've detailed how to do specific patches **for advanced users**](PATCHES.md). If you don't feel comfortable running these commands, then just do a fresh install if you would like to update.
+2. Patching existing installation - TBD: a guide for updating specific modules in an existing installation.
diff --git a/install/PATCHES.md b/install/deprecated/PATCHES.md
similarity index 100%
rename from install/PATCHES.md
rename to install/deprecated/PATCHES.md
diff --git a/install/download_databases.sh b/install/download_databases.sh
index 12833fd..06c4d48 100644
--- a/install/download_databases.sh
+++ b/install/download_databases.sh
@@ -1,11 +1,12 @@
#!/bin/bash
-# __version__ = "2023.10.23"
-# VEBA_DATABASE_VERSION = "VDB_v5.2"
-# MICROEUKAYROTIC_DATABASE_VERSION = "VDB-Microeukaryotic_v2.1"
+# __version__ = "2023.12.11"
+# VEBA_DATABASE_VERSION = "VDB_v6"
+# MICROEUKARYOTIC_DATABASE_VERSION = "MicroEuk_v3"

# Create database
DATABASE_DIRECTORY=${1:-"."}
REALPATH_DATABASE_DIRECTORY=$(realpath $DATABASE_DIRECTORY)
+SCRIPT_DIRECTORY=$(dirname "$0")

# N_JOBS=$(2:-"1")
@@ -28,7 +29,7 @@ echo ". .. ... ..... ........ ............."
echo "i * Processing NCBITaxonomy"
echo ". .. ... ..... ........ ............."
mkdir -v -p ${DATABASE_DIRECTORY}/Classify/NCBITaxonomy
-wget -v -P ${DATABASE_DIRECTORY}/Classify/NCBITaxonomy https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.FULL.gz
+# wget -v -P ${DATABASE_DIRECTORY}/Classify/NCBITaxonomy https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.FULL.gz
wget -v -P ${DATABASE_DIRECTORY} https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
# python -c 'import sys; from ete3 import NCBITaxa; NCBITaxa(taxdump_file="%s/taxdump.tar.gz"%(sys.argv[1]), dbfile="%s/Classify/NCBITaxonomy/taxa.sqlite"%(sys.argv[1]))' $DATABASE_DIRECTORY
tar xzfv ${DATABASE_DIRECTORY}/taxdump.tar.gz -C ${DATABASE_DIRECTORY}/Classify/NCBITaxonomy/
@@ -86,18 +87,56 @@ echo ". .. ... ..... ........ ............."
echo "v * Processing Microeukaryotic MMSEQS2 database"
echo ". .. ... ..... ........ ............."
-# Download v2.1 from Zenodo
-wget -v -O ${DATABASE_DIRECTORY}/Microeukaryotic.tar.gz https://zenodo.org/record/7485114/files/VDB-Microeukaryotic_v2.tar.gz?download=1
-mkdir -p ${DATABASE_DIRECTORY}/Classify/Microeukaryotic && tar -xvzf ${DATABASE_DIRECTORY}/Microeukaryotic.tar.gz -C ${DATABASE_DIRECTORY}/Classify/Microeukaryotic --strip-components=1
-mmseqs createdb ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.faa.gz ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/microeukaryotic
-rm -rf ${DATABASE_DIRECTORY}/Microeukaryotic.tar.gz
+## Download v2.1 from Zenodo
+# wget -v -O ${DATABASE_DIRECTORY}/Microeukaryotic.tar.gz https://zenodo.org/record/7485114/files/VDB-Microeukaryotic_v2.tar.gz?download=1
+# mkdir -p ${DATABASE_DIRECTORY}/Classify/Microeukaryotic && tar -xvzf ${DATABASE_DIRECTORY}/Microeukaryotic.tar.gz -C ${DATABASE_DIRECTORY}/Classify/Microeukaryotic --strip-components=1
+# mmseqs createdb --compressed 1 ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.faa.gz ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/microeukaryotic
+# rm -rf ${DATABASE_DIRECTORY}/Microeukaryotic.tar.gz

-# eukaryota_odb10 subset of Microeukaryotic Protein Database
-wget -v -O ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.eukaryota_odb10.list https://zenodo.org/record/7485114/files/reference.eukaryota_odb10.list?download=1
-seqkit grep -f ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.eukaryota_odb10.list ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.faa.gz > ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.eukaryota_odb10.faa
-mmseqs createdb ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.eukaryota_odb10.faa ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/microeukaryotic.eukaryota_odb10
-rm -rf ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.eukaryota_odb10.faa
-rm -rf ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.faa.gz # Comment this out if you want to keep the actual protein sequences
+# # eukaryota_odb10 subset of Microeukaryotic Protein Database
+# wget -v -O ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.eukaryota_odb10.list https://zenodo.org/record/7485114/files/reference.eukaryota_odb10.list?download=1
+# seqkit grep -f ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.eukaryota_odb10.list ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.faa.gz > ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.eukaryota_odb10.faa
+# mmseqs createdb --compressed 1 ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.eukaryota_odb10.faa ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/microeukaryotic.eukaryota_odb10
+# rm -rf ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.eukaryota_odb10.faa
+# rm -rf ${DATABASE_DIRECTORY}/Classify/Microeukaryotic/reference.faa.gz # Comment this out if you want to keep the actual protein sequences
+
+# Download MicroEuk_v3 from Zenodo
+wget -v -O ${DATABASE_DIRECTORY}/MicroEuk_v3.tar.gz https://zenodo.org/records/10139451/files/MicroEuk_v3.tar.gz?download=1
+tar xvzf ${DATABASE_DIRECTORY}/MicroEuk_v3.tar.gz -C ${DATABASE_DIRECTORY}
+mkdir -p ${DATABASE_DIRECTORY}/Classify/MicroEuk
+
+# Source Taxonomy
+cp -rf ${DATABASE_DIRECTORY}/MicroEuk_v3/source_taxonomy.tsv.gz ${DATABASE_DIRECTORY}/Classify/MicroEuk
+
+# MicroEuk100
+gzip -d ${DATABASE_DIRECTORY}/MicroEuk_v3/MicroEuk100.faa.gz
+mmseqs createdb --compressed 1 ${DATABASE_DIRECTORY}/MicroEuk_v3/MicroEuk100.faa ${DATABASE_DIRECTORY}/Classify/MicroEuk/MicroEuk100
+
+# MicroEuk100.eukaryota_odb10
+gzip -d ${DATABASE_DIRECTORY}/MicroEuk_v3/MicroEuk100.eukaryota_odb10.list.gz
+seqkit grep -f ${DATABASE_DIRECTORY}/MicroEuk_v3/MicroEuk100.eukaryota_odb10.list ${DATABASE_DIRECTORY}/MicroEuk_v3/MicroEuk100.faa | mmseqs createdb --compressed 1 stdin ${DATABASE_DIRECTORY}/Classify/MicroEuk/MicroEuk100.eukaryota_odb10
+
+# MicroEuk90
+gzip -d -c ${DATABASE_DIRECTORY}/MicroEuk_v3/MicroEuk90_clusters.tsv.gz | cut -f1 | sort -u > ${DATABASE_DIRECTORY}/MicroEuk_v3/MicroEuk90.list
+seqkit grep -f ${DATABASE_DIRECTORY}/MicroEuk_v3/MicroEuk90.list ${DATABASE_DIRECTORY}/MicroEuk_v3/MicroEuk100.faa | mmseqs createdb --compressed 1 stdin ${DATABASE_DIRECTORY}/Classify/MicroEuk/MicroEuk90
+
+# MicroEuk50
+gzip -d -c ${DATABASE_DIRECTORY}/MicroEuk_v3/MicroEuk50_clusters.tsv.gz | cut -f1 | sort -u > ${DATABASE_DIRECTORY}/MicroEuk_v3/MicroEuk50.list
+seqkit grep -f ${DATABASE_DIRECTORY}/MicroEuk_v3/MicroEuk50.list ${DATABASE_DIRECTORY}/MicroEuk_v3/MicroEuk100.faa | mmseqs createdb --compressed 1 stdin ${DATABASE_DIRECTORY}/Classify/MicroEuk/MicroEuk50
+
+# source_to_lineage.dict.pkl.gz
+build_source_to_lineage_dictionary.py -i ${DATABASE_DIRECTORY}/MicroEuk_v3/source_taxonomy.tsv.gz -o ${DATABASE_DIRECTORY}/Classify/MicroEuk/source_to_lineage.dict.pkl.gz
+
+# target_to_source.dict.pkl.gz
+build_target_to_source_dictionary.py -i ${DATABASE_DIRECTORY}/MicroEuk_v3/identifier_mapping.proteins.tsv.gz -o ${DATABASE_DIRECTORY}/Classify/MicroEuk/target_to_source.dict.pkl.gz
+
+# Remove intermediate files
+rm -rf ${DATABASE_DIRECTORY}/MicroEuk_v3/
+rm -rf ${DATABASE_DIRECTORY}/MicroEuk_v3.tar.gz

# MarkerSets
echo ". .. ... ..... ........ ............."
@@ -213,11 +252,17 @@ rm -rf ${DATABASE_DIRECTORY}/Contamination/AntiFam/*.seed
mkdir -v -p ${DATABASE_DIRECTORY}/Contamination/kmers
wget -v -O ${DATABASE_DIRECTORY}/Contamination/kmers/ribokmers.fa.gz https://figshare.com/ndownloader/files/36220587

-# Replacing GRCh38 with CHM13v2.0 in v2022.10.18
+# T2T-CHM13v2.0
+# Bowtie2 Index
wget -v -P ${DATABASE_DIRECTORY} https://genome-idx.s3.amazonaws.com/bt/chm13v2.0.zip
unzip -d ${DATABASE_DIRECTORY}/Contamination/ ${DATABASE_DIRECTORY}/chm13v2.0.zip
rm -rf ${DATABASE_DIRECTORY}/chm13v2.0.zip
+
+# # MiniMap2 Index (Uncomment if you plan on using long reads (7.1 GB))
+# wget -v -P ${DATABASE_DIRECTORY} https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/latest_assembly_versions/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.fna.gz
+# minimap2 -d ${DATABASE_DIRECTORY}/Contamination/chm13v2.0/chm13v2.0.mmi ${DATABASE_DIRECTORY}/GCF_009914755.1_T2T-CHM13v2.0_genomic.fna.gz
+# rm -rf ${DATABASE_DIRECTORY}/GCF_009914755.1_T2T-CHM13v2.0_genomic.fna.gz
+
echo ". .. ... ..... ........ ............."
echo "xii * Adding the following environment variable to VEBA environments: export VEBA_DATABASE=${REALPATH_DATABASE_DIRECTORY}" # CONDA_BASE=$(which conda | python -c "import sys; print('/'.join(sys.stdin.read().split('/')[:-2]))") diff --git a/install/environments/VEBA-assembly_env.yml b/install/environments/VEBA-assembly_env.yml index 692c79e..6d5a013 100644 --- a/install/environments/VEBA-assembly_env.yml +++ b/install/environments/VEBA-assembly_env.yml @@ -1,4 +1,4 @@ -name: VEBA-assembly_env__2023.5.15 +name: VEBA-assembly_env__2023.11.30 channels: - conda-forge - bioconda @@ -16,15 +16,16 @@ dependencies: - bz2file=0.98=py_0 - bzip2=1.0.8=h7f98852_4 - c-ares=1.18.1=h7f98852_0 - - ca-certificates=2022.12.7=ha878542_0 + - ca-certificates=2023.11.17=hbcca054_0 - cairo=1.16.0=ha61ee94_1014 - - certifi=2022.12.7=pyhd8ed1ab_0 + - certifi=2023.11.17=pyhd8ed1ab_0 - cffi=1.15.1=py39he91dace_2 - charset-normalizer=2.1.1=pyhd8ed1ab_0 - colorama=0.4.6=pyhd8ed1ab_0 - coreutils=9.3=h0b41bf4_0 - - cryptography=38.0.4=py39hd97740a_0 + - cryptography=41.0.7=py39hd4f0224_0 - expat=2.5.0=h27087fc_0 + - flye=2.9.3=py39hd65a603_0 - font-ttf-dejavu-sans-mono=2.37=hab24e00_0 - font-ttf-inconsolata=3.000=h77eed37_0 - font-ttf-source-code-pro=2.038=h77eed37_0 @@ -38,21 +39,22 @@ dependencies: - giflib=5.2.1=h36c2ea0_2 - graphite2=1.3.13=h58526e2_1001 - harfbuzz=5.3.0=h418a68e_0 - - htslib=1.16=h6bc39ce_0 + - htslib=1.18=h81da01d_0 - icu=70.1=h27087fc_0 - idna=3.4=pyhd8ed1ab_0 - jpeg=9e=h166bdaf_2 + - k8=0.2.5=hdcf5f25_4 - kernel-headers_linux-64=3.10.0=h4a8ded7_13 - keyutils=1.6.1=h166bdaf_0 - - krb5=1.19.3=h3790be6_0 - - lcms2=2.14=h6ed2654_0 + - krb5=1.21.2=h659d440_0 + - lcms2=2.12=hddcbb42_0 - ld_impl_linux-64=2.39=hcc3a1bd_1 - lerc=4.0.0=h27087fc_0 - libblas=3.9.0=16_linux64_openblas - libcblas=3.9.0=16_linux64_openblas - - libcups=2.3.3=h3e49a29_2 - - libcurl=7.86.0=h7bff187_1 - - libdeflate=1.13=h166bdaf_0 + - libcups=2.3.3=h4637d8d_4 + - libcurl=8.2.1=hca28451_0 + - libdeflate=1.19=hd590300_0 - libedit=3.1.20191231=he28a2e2_2 - libev=4.33=h516909a_1 - libffi=3.4.2=h7f98852_5 @@ -64,14 +66,14 @@ dependencies: - libhwloc=2.8.0=h32351e8_1 - libiconv=1.17=h166bdaf_0 - liblapack=3.9.0=16_linux64_openblas - - libnghttp2=1.47.0=hdcd2b5c_1 + - libnghttp2=1.52.0=h61bc06f_0 - libnsl=2.0.0=h7f98852_0 - libopenblas=0.3.21=pthreads_h78a6416_3 - libpng=1.6.39=h753d276_0 - libsqlite=3.40.0=h753d276_0 - - libssh2=1.10.0=haa6b8db_3 + - libssh2=1.11.0=h0841786_0 - libstdcxx-ng=12.2.0=h46fd767_19 - - libtiff=4.4.0=h0e0dad5_3 + - libtiff=4.2.0=hf544144_3 - libuuid=2.32.1=h7f98852_1000 - libwebp-base=1.2.4=h166bdaf_0 - libxcb=1.13=h7f98852_1004 @@ -79,11 +81,12 @@ dependencies: - libzlib=1.2.13=h166bdaf_4 - llvm-openmp=8.0.1=hc9558a2_0 - megahit=1.2.9=h2e03b76_1 + - minimap2=2.26=he4a0461_2 - ncurses=6.3=h27087fc_1 - numpy=1.23.5=py39h3d75532_0 - - openjdk=17.0.3=hafdced1_4 + - openjdk=11.0.1=h516909a_1016 - openmp=8.0.1=0 - - openssl=1.1.1t=h0b41bf4_0 + - openssl=3.2.0=hd590300_1 - pandas=1.5.2=py39h4661b88_0 - pathlib2=2.3.7.post1=py39hf3d152e_2 - pbzip2=1.1.13=0 @@ -93,9 +96,9 @@ dependencies: - pixman=0.40.0=h36c2ea0_0 - pthread-stubs=0.4=h36c2ea0_1001 - pycparser=2.21=pyhd8ed1ab_0 - - pyopenssl=22.1.0=pyhd8ed1ab_0 + - pyopenssl=23.3.0=pyhd8ed1ab_0 - pysocks=1.7.1=pyha2e5f31_6 - - python=3.9.15=h47a2c10_0_cpython + - python=3.9.16=h2782a2a_0_cpython - python-dateutil=2.8.2=pyhd8ed1ab_0 - python-tzdata=2022.7=pyhd8ed1ab_0 - python_abi=3.9=3_cp39 diff --git a/install/environments/VEBA-cluster_env.yml 
b/install/environments/VEBA-cluster_env.yml index 2f2d189..b9ff294 100644 --- a/install/environments/VEBA-cluster_env.yml +++ b/install/environments/VEBA-cluster_env.yml @@ -1,4 +1,4 @@ -name: VEBA-cluster_env__v2023.5.15 +name: VEBA-cluster_env__v2023.12.8 channels: - conda-forge - bioconda @@ -9,27 +9,36 @@ dependencies: - _openmp_mutex=4.5=2_gnu - aria2=1.36.0=h1e4e653_3 - biopython=1.80=py311hd4cff14_0 + - blast=2.14.1=pl5321h6f7f691_0 - brotlipy=0.7.0=py311hd4cff14_1005 - bz2file=0.98=py_0 - bzip2=1.0.8=h7f98852_4 - c-ares=1.18.1=h7f98852_0 - - ca-certificates=2022.12.7=ha878542_0 - - certifi=2022.12.7=pyhd8ed1ab_0 + - ca-certificates=2023.11.17=hbcca054_0 + - certifi=2023.11.17=pyhd8ed1ab_0 - cffi=1.15.1=py311h409f033_3 - charset-normalizer=2.1.1=pyhd8ed1ab_0 - colorama=0.4.6=pyhd8ed1ab_0 - coreutils=9.3=h0b41bf4_0 - cryptography=39.0.0=py311h9b4c7bb_0 - - fastani=1.33=h0fdf51a_1 + - curl=8.1.2=h409715c_0 + - diamond=2.1.8=h43eeafb_0 + - entrez-direct=16.2=he881be0_1 + - fastani=1.34=h4dfc31f_1 - gawk=5.1.0=h7f98852_0 - genopype=2023.5.15=py_0 - gettext=0.21.1=h27087fc_0 - gsl=2.7=he838d99_0 - icu=70.1=h27087fc_0 - idna=3.4=pyhd8ed1ab_0 + - keyutils=1.6.1=h166bdaf_0 + - krb5=1.20.1=h81ceb04_0 - ld_impl_linux-64=2.40=h41732ed_0 - libblas=3.9.0=16_linux64_openblas - libcblas=3.9.0=16_linux64_openblas + - libcurl=8.1.2=h409715c_0 + - libedit=3.1.20191231=he28a2e2_2 + - libev=4.33=h516909a_1 - libffi=3.4.2=h7f98852_5 - libgcc-ng=12.2.0=h65d4601_19 - libgfortran-ng=12.2.0=h69a702a_19 @@ -38,6 +47,7 @@ dependencies: - libiconv=1.17=h166bdaf_0 - libidn2=2.3.4=h166bdaf_0 - liblapack=3.9.0=16_linux64_openblas + - libnghttp2=1.52.0=h61bc06f_0 - libnsl=2.0.0=h7f98852_0 - libopenblas=0.3.21=pthreads_h78a6416_3 - libsqlite=3.40.0=h753d276_0 @@ -48,13 +58,34 @@ dependencies: - libxml2=2.10.3=h7463322_0 - libzlib=1.2.13=h166bdaf_4 - mmseqs2=14.7e284=pl5321hf1761c0_0 + - ncbi-vdb=3.0.0=pl5321h87f3376_0 - ncurses=6.3=h27087fc_1 - networkx=3.0=pyhd8ed1ab_0 - numpy=1.24.1=py311h8e6699e_0 - - openssl=3.0.8=h0b41bf4_0 + - openssl=3.2.0=hd590300_1 - pandas=1.5.3=py311h2872171_0 - pathlib2=2.3.7.post1=py311h38be061_2 + - pcre=8.45=h9c3ff4c_0 - perl=5.32.1=2_h7f98852_perl5 + - perl-archive-tar=2.40=pl5321hdfd78af_0 + - perl-carp=1.38=pl5321hdfd78af_4 + - perl-common-sense=3.75=pl5321hdfd78af_0 + - perl-compress-raw-bzip2=2.201=pl5321h87f3376_1 + - perl-compress-raw-zlib=2.105=pl5321h87f3376_0 + - perl-encode=3.19=pl5321hec16e2b_1 + - perl-exporter=5.72=pl5321hdfd78af_2 + - perl-exporter-tiny=1.002002=pl5321hdfd78af_0 + - perl-extutils-makemaker=7.70=pl5321hd8ed1ab_0 + - perl-io-compress=2.201=pl5321hdbdd923_2 + - perl-io-zlib=1.14=pl5321hdfd78af_0 + - perl-json=4.10=pl5321hdfd78af_0 + - perl-json-xs=2.34=pl5321h4ac6f70_6 + - perl-list-moreutils=0.430=pl5321hdfd78af_0 + - perl-list-moreutils-xs=0.430=pl5321h031d066_2 + - perl-parent=0.236=pl5321hdfd78af_2 + - perl-pathtools=3.75=pl5321hec16e2b_3 + - perl-scalar-list-utils=1.62=pl5321hec16e2b_1 + - perl-types-serialiser=1.01=pl5321hdfd78af_0 - pip=23.0=pyhd8ed1ab_0 - pycparser=2.21=pyhd8ed1ab_0 - pyopenssl=23.0.0=pyhd8ed1ab_0 @@ -71,6 +102,7 @@ dependencies: - seqkit=2.3.1=h9ee0642_0 - setuptools=66.1.1=pyhd8ed1ab_0 - six=1.16.0=pyh6c4a22f_0 + - skani=0.2.1=h4ac6f70_0 - soothsayer_utils=2022.6.24=py_0 - tk=8.6.12=h27826a3_0 - tqdm=4.64.1=pyhd8ed1ab_0 @@ -80,4 +112,5 @@ dependencies: - wget=1.20.3=ha35d2d1_1 - wheel=0.38.4=pyhd8ed1ab_0 - xz=5.2.6=h166bdaf_0 - - zlib=1.2.13=h166bdaf_4 \ No newline at end of file + - zlib=1.2.13=h166bdaf_4 + - 
zstd=1.5.5=hfc55251_0 \ No newline at end of file diff --git a/install/environments/VEBA-database_env.yml b/install/environments/VEBA-database_env.yml index 8e56e1c..f78e9c4 100644 --- a/install/environments/VEBA-database_env.yml +++ b/install/environments/VEBA-database_env.yml @@ -1,4 +1,4 @@ -name: VEBA-database_env__v2023.6.20 +name: VEBA-database_env__v2023.11.30 channels: - conda-forge - bioconda @@ -14,8 +14,8 @@ dependencies: - bz2file=0.98=py_0 - bzip2=1.0.8=h7f98852_4 - c-ares=1.19.1=hd590300_0 - - ca-certificates=2023.5.7=hbcca054_0 - - certifi=2023.5.7=pyhd8ed1ab_0 + - ca-certificates=2023.11.17=hbcca054_0 + - certifi=2023.11.17=pyhd8ed1ab_0 - charset-normalizer=3.1.0=pyhd8ed1ab_0 - colorama=0.4.6=pyhd8ed1ab_0 - coreutils=9.3=h0b41bf4_0 @@ -27,6 +27,7 @@ dependencies: - gettext=0.21.1=h27087fc_0 - icu=72.1=hcb278e6_0 - idna=3.4=pyhd8ed1ab_0 + - k8=0.2.5=hdcf5f25_4 - keyutils=1.6.1=h166bdaf_0 - krb5=1.20.1=h81ceb04_0 - ld_impl_linux-64=2.40=h41732ed_0 @@ -57,10 +58,11 @@ dependencies: - libuuid=2.38.1=h0b41bf4_0 - libxml2=2.11.4=h0d562d8_0 - libzlib=1.2.13=hd590300_5 + - minimap2=2.26=he4a0461_2 - mmseqs2=14.7e284=pl5321h6a68c12_2 - ncurses=6.4=hcb278e6_0 - numpy=1.25.0=py311h64a7726_0 - - openssl=3.1.1=hd590300_1 + - openssl=3.2.0=hd590300_1 - pandas=2.0.2=py311h320fe9a_0 - pathlib2=2.3.7.post1=py311h38be061_2 - pcre=8.45=h9c3ff4c_0 diff --git a/install/environments/VEBA-mapping_env.yml b/install/environments/VEBA-mapping_env.yml index 5af32f1..feb6918 100644 --- a/install/environments/VEBA-mapping_env.yml +++ b/install/environments/VEBA-mapping_env.yml @@ -1,106 +1,84 @@ -name: VEBA-mapping_env__v2023.7.25 +name: VEBA-mapping_env__v2023.11.17 channels: - conda-forge - bioconda - jolespin - defaults + - qiime2 dependencies: - _libgcc_mutex=0.1=conda_forge - - _openmp_mutex=4.5=1_gnu - - anndata=0.9.0=pyhd8ed1ab_0 - - bbmap=38.95=h5c4e2a8_1 - - biom-format=2.1.14=py39h72bdee0_2 - - biopython=1.79=py39h3811e60_1 - - bowtie2=2.5.1=py39h6fed5c7_2 - - brotlipy=0.7.0=py39h3811e60_1003 + - _openmp_mutex=4.5=2_gnu + - biopython=1.81=py310h2372a71_1 + - bowtie2=2.5.2=py310ha0a81b8_0 + - brotli-python=1.1.0=py310hc6cd4ac_1 - bz2file=0.98=py_0 - - bzip2=1.0.8=h7f98852_4 - - c-ares=1.18.1=h7f98852_0 + - bzip2=1.0.8=hd590300_5 + - c-ares=1.21.0=hd590300_0 - ca-certificates=2023.7.22=hbcca054_0 - - cached-property=1.5.2=hd8ed1ab_1 - - cached_property=1.5.2=pyha770c72_1 - certifi=2023.7.22=pyhd8ed1ab_0 - - cffi=1.15.0=py39h4bc2ebd_0 - - charset-normalizer=2.0.12=pyhd8ed1ab_0 - - click=8.1.3=unix_pyhd8ed1ab_2 - - colorama=0.4.4=pyh9f0ad1d_0 - - coreutils=9.3=h0b41bf4_0 - - cryptography=41.0.2=py39hd4f0224_0 + - charset-normalizer=3.3.2=pyhd8ed1ab_0 + - colorama=0.4.6=pyhd8ed1ab_0 + - coreutils=9.4=hd590300_0 - genopype=2023.5.15=py_0 - - h5py=3.7.0=nompi_py39h63b1161_100 - - hdf5=1.12.1=nompi_h4df4325_104 - - htslib=1.17=h81da01d_2 - - icu=72.1=hcb278e6_0 - - idna=3.3=pyhd8ed1ab_0 - - importlib-metadata=6.3.0=pyha770c72_0 - - importlib_metadata=6.3.0=hd8ed1ab_0 + - htslib=1.18=h81da01d_0 + - icu=73.2=h59595ed_0 + - idna=3.4=pyhd8ed1ab_0 - keyutils=1.6.1=h166bdaf_0 - - krb5=1.21.1=h659d440_0 - - ld_impl_linux-64=2.36.1=hea4e1c9_2 - - libblas=3.9.0=13_linux64_openblas - - libcblas=3.9.0=13_linux64_openblas - - libcurl=8.2.0=hca28451_0 - - libdeflate=1.18=h0b41bf4_0 + - krb5=1.21.2=h659d440_0 + - ld_impl_linux-64=2.40=h41732ed_0 + - libblas=3.9.0=19_linux64_openblas + - libcblas=3.9.0=19_linux64_openblas + - libcurl=8.4.0=hca28451_0 + - libdeflate=1.19=hd590300_0 - libedit=3.1.20191231=he28a2e2_2 
- libev=4.33=h516909a_1 - libffi=3.4.2=h7f98852_5 - - libgcc-ng=12.2.0=h65d4601_19 - - libgfortran-ng=11.2.0=h69a702a_12 - - libgfortran5=11.2.0=h5c6108e_12 - - libgomp=12.2.0=h65d4601_19 - - libhwloc=2.9.1=nocuda_h7313eea_6 + - libgcc-ng=13.2.0=h807b86a_3 + - libgfortran-ng=13.2.0=h69a702a_3 + - libgfortran5=13.2.0=ha4646dd_3 + - libgomp=13.2.0=h807b86a_3 + - libhwloc=2.9.3=default_h554bfaf_1009 - libiconv=1.17=h166bdaf_0 - - liblapack=3.9.0=13_linux64_openblas - - libnghttp2=1.52.0=h61bc06f_0 - - libnsl=2.0.0=h7f98852_0 - - libopenblas=0.3.18=pthreads_h8fe5266_0 - - libsqlite=3.42.0=h2797004_0 + - liblapack=3.9.0=19_linux64_openblas + - libnghttp2=1.58.0=h47da74e_0 + - libnsl=2.0.1=hd590300_0 + - libopenblas=0.3.24=pthreads_h413a1c8_0 + - libsqlite=3.44.0=h2797004_0 - libssh2=1.11.0=h0841786_0 - - libstdcxx-ng=12.2.0=h46fd767_19 - - libuuid=2.32.1=h7f98852_1000 - - libxml2=2.11.4=h0d562d8_0 - - libzlib=1.2.13=h166bdaf_4 - - lz4-c=1.9.3=h9c3ff4c_1 - - natsort=8.3.1=pyhd8ed1ab_0 - - ncurses=6.3=h9c3ff4c_0 - - numpy=1.24.2=py39h7360e5f_0 - - openjdk=8.0.312=h7f98852_0 - - openssl=3.1.1=hd590300_1 - - packaging=23.0=pyhd8ed1ab_0 - - pandas=1.4.1=py39hde0f152_0 - - pathlib2=2.3.7.post1=py39hf3d152e_0 - - pbzip2=1.1.13=0 - - perl=5.32.1=2_h7f98852_perl5 - - pip=22.0.3=pyhd8ed1ab_0 - - pycparser=2.21=pyhd8ed1ab_0 - - pyopenssl=23.2.0=pyhd8ed1ab_1 - - pysocks=1.7.1=py39hf3d152e_4 - - python=3.9.16=h2782a2a_0_cpython + - libstdcxx-ng=13.2.0=h7e041cc_3 + - libuuid=2.38.1=h0b41bf4_0 + - libxml2=2.11.5=h232c23b_1 + - libzlib=1.2.13=hd590300_5 + - ncurses=6.4=h59595ed_2 + - numpy=1.26.0=py310hb13e2d6_0 + - openssl=3.1.4=hd590300_0 + - pandas=2.1.3=py310hcc13569_0 + - pathlib2=2.3.7.post1=py310hff52083_3 + - perl=5.32.1=4_hd590300_perl5 + - pip=23.3.1=pyhd8ed1ab_0 + - pysocks=1.7.1=pyha2e5f31_6 + - python=3.10.13=hd12c33a_0_cpython - python-dateutil=2.8.2=pyhd8ed1ab_0 - - python-tzdata=2021.5=pyhd8ed1ab_0 - - python_abi=3.9=2_cp39 - - pytz=2021.3=pyhd8ed1ab_0 - - pytz-deprecation-shim=0.1.0.post0=py39hf3d152e_1 + - python-tzdata=2023.3=pyhd8ed1ab_0 + - python_abi=3.10=4_cp310 + - pytz=2023.3.post1=pyhd8ed1ab_0 - readline=8.2=h8228510_1 - - requests=2.27.1=pyhd8ed1ab_0 - - samtools=1.17=hd87286a_1 - - scandir=1.10.0=py39h3811e60_4 - - scipy=1.9.3=py39hddc5342_2 - - setuptools=60.9.3=py39hf3d152e_0 + - requests=2.31.0=pyhd8ed1ab_0 + - salmon=0.8.1=0 + - samtools=1.18=h50ea8bc_1 + - scandir=1.10.0=py310h2372a71_7 + - seqkit=2.6.0=h9ee0642_0 + - setuptools=68.2.2=pyhd8ed1ab_0 - six=1.16.0=pyh6c4a22f_0 - soothsayer_utils=2022.6.24=py_0 - - sqlite=3.37.0=h9cd32fc_0 - - star=2.7.10a=h9ee0642_0 - - subread=2.0.3=h7132678_1 - - tbb=2021.9.0=hf52228f_0 - - tk=8.6.12=h27826a3_0 - - tqdm=4.62.3=pyhd8ed1ab_0 - - typing_extensions=4.5.0=pyha770c72_0 - - tzdata=2021e=he74cb21_0 - - tzlocal=4.1=py39hf3d152e_1 - - urllib3=1.26.8=pyhd8ed1ab_1 - - wheel=0.37.1=pyhd8ed1ab_0 + - subread=2.0.6=he4a0461_0 + - tbb=2021.10.0=h00ab1b0_2 + - tk=8.6.13=noxft_h4845f30_101 + - tqdm=4.66.1=pyhd8ed1ab_0 + - tzdata=2023c=h71feb2d_0 + - tzlocal=5.2=py310hff52083_0 + - urllib3=2.1.0=pyhd8ed1ab_0 + - wheel=0.41.3=pyhd8ed1ab_0 - xz=5.2.6=h166bdaf_0 - - zipp=3.15.0=pyhd8ed1ab_0 - - zlib=1.2.13=h166bdaf_4 - - zstd=1.5.2=ha95c52a_0 + - zlib=1.2.13=hd590300_5 + - zstd=1.5.5=hfc55251_0 \ No newline at end of file diff --git a/install/environments/VEBA-preprocess_env.yml b/install/environments/VEBA-preprocess_env.yml index d2f59b2..a7f174b 100644 --- a/install/environments/VEBA-preprocess_env.yml +++ 
b/install/environments/VEBA-preprocess_env.yml @@ -1,4 +1,4 @@ -name: VEBA-preprocess_env__v2023.8.21 +name: VEBA-preprocess_env__v2023.12.12 channels: - conda-forge - bioconda @@ -7,46 +7,50 @@ channels: dependencies: - _libgcc_mutex=0.1=conda_forge - _openmp_mutex=4.5=2_gnu - - alsa-lib=1.2.8=h166bdaf_0 + - alsa-lib=1.2.7.2=h166bdaf_0 - argparse-manpage-birdtools=1.7.0=pyhd8ed1ab_0 - - aria2=1.36.0=h8b6cd97_3 - - arrow-cpp=10.0.1=h3e2b116_1_cpu - - aws-c-auth=0.6.21=h3cb7b9d_0 - - aws-c-cal=0.5.20=hd3b2fe5_3 - - aws-c-common=0.8.5=h166bdaf_0 - - aws-c-compression=0.2.16=hf5f93bc_0 - - aws-c-event-stream=0.2.15=h2c1f3d0_11 - - aws-c-http=0.6.27=hb11a807_3 - - aws-c-io=0.13.11=hf1b0a34_1 - - aws-c-mqtt=0.7.13=h93e60df_9 - - aws-c-s3=0.1.51=h1222a00_14 - - aws-c-sdkutils=0.1.7=hf5f93bc_0 - - aws-checksums=0.1.13=hf5f93bc_5 - - aws-crt-cpp=0.18.16=hb1454fd_1 - - aws-sdk-cpp=1.9.379=hdc6349a_5 + - aria2=1.36.0=h1e4e653_3 + - arrow-cpp=12.0.0=ha770c72_1_cpu + - aws-c-auth=0.6.26=h2c7c9e7_6 + - aws-c-cal=0.5.26=h71eb795_0 + - aws-c-common=0.8.17=hd590300_0 + - aws-c-compression=0.2.16=h4f47f36_6 + - aws-c-event-stream=0.2.20=h69ce273_6 + - aws-c-http=0.7.7=h7b8353a_3 + - aws-c-io=0.13.21=h2c99d58_4 + - aws-c-mqtt=0.8.6=h3a1964a_15 + - aws-c-s3=0.2.8=h0933b68_4 + - aws-c-sdkutils=0.1.9=h4f47f36_1 + - aws-checksums=0.1.14=h4f47f36_6 + - aws-crt-cpp=0.19.9=h85076f6_5 + - aws-sdk-cpp=1.10.57=hf40e4db_10 - awscli=1.27.23=py39hf3d152e_0 - bbmap=39.01=h5c4e2a8_0 + - binutils_impl_linux-64=2.39=he00db2b_1 - bird_tool_utils_python=0.4.1=pyhdfd78af_0 - botocore=1.29.23=pyhd8ed1ab_0 - bowtie2=2.5.1=py39h3321a2d_0 - brotlipy=0.7.0=py39hb9d737c_1005 - bz2file=0.98=py_0 - bzip2=1.0.8=h7f98852_4 - - c-ares=1.18.1=h7f98852_0 - - ca-certificates=2023.7.22=hbcca054_0 + - c-ares=1.22.1=hd590300_0 + - ca-certificates=2023.11.17=hbcca054_0 - cairo=1.16.0=ha61ee94_1014 - - certifi=2023.7.22=pyhd8ed1ab_0 + - certifi=2023.11.17=pyhd8ed1ab_0 - cffi=1.15.1=py39he91dace_2 - charset-normalizer=2.1.1=pyhd8ed1ab_0 + - chopper=0.7.0=hdcf5f25_0 + - clang=15.0.3=ha770c72_0 + - clang-15=15.0.3=default_h2e3cab8_0 - colorama=0.4.4=pyh9f0ad1d_0 - coreutils=9.3=h0b41bf4_0 - - cryptography=38.0.4=py39hd97740a_0 - - curl=7.86.0=h7bff187_1 + - cryptography=41.0.7=py39hd4f0224_0 + - curl=8.4.0=hca28451_0 - docutils=0.16=py39hf3d152e_3 - expat=2.5.0=h27087fc_0 - extern=0.4.1=py_0 - fastp=0.23.4=h5f740d0_0 - - fastq_preprocessor=2023.7.24=py_0 + - fastq_preprocessor=2023.12.12=py_0 - font-ttf-dejavu-sans-mono=2.37=hab24e00_0 - font-ttf-inconsolata=3.000=h77eed37_0 - font-ttf-source-code-pro=2.038=h77eed37_0 @@ -55,6 +59,7 @@ dependencies: - fonts-conda-ecosystem=1=0 - fonts-conda-forge=1=0 - freetype=2.12.1=hca18f0e_1 + - gcc_impl_linux-64=12.2.0=hcc96c02_19 - genopype=2023.5.15=py_0 - gettext=0.21.1=h27087fc_0 - gflags=2.2.2=he1b5a44_1004 @@ -62,54 +67,62 @@ dependencies: - glog=0.6.0=h6f12383_0 - graphite2=1.3.13=h58526e2_1001 - harfbuzz=5.3.0=h418a68e_0 - - hdf5=1.12.1=nompi_h2386368_104 - - htslib=1.16=h6bc39ce_0 + - hdf5=1.14.2=nompi_h4f84152_100 + - htslib=1.18=h81da01d_0 - icu=70.1=h27087fc_0 - idna=3.4=pyhd8ed1ab_0 - isa-l=2.30.0=ha770c72_4 - jmespath=1.0.1=pyhd8ed1ab_0 - jpeg=9e=h166bdaf_2 + - k8=0.2.5=hdcf5f25_4 + - kernel-headers_linux-64=2.6.32=he073ed8_16 - keyutils=1.6.1=h166bdaf_0 - kingfisher=0.1.0=pyh7cba7a3_1 - - krb5=1.19.3=h3790be6_0 - - lcms2=2.14=h6ed2654_0 + - krb5=1.21.2=h659d440_0 + - lcms2=2.12=hddcbb42_0 - ld_impl_linux-64=2.39=hcc3a1bd_1 - lerc=4.0.0=h27087fc_0 - - libabseil=20220623.0=cxx17_h48a1fff_5 - - 
libarrow=10.0.1=hcf5dfb8_1_cpu + - libabseil=20230125.0=cxx17_hcb278e6_1 + - libaec=1.1.2=h59595ed_1 + - libarrow=12.0.0=h1cdf7b0_1_cpu - libblas=3.9.0=16_linux64_openblas - libbrotlicommon=1.0.9=h166bdaf_8 - libbrotlidec=1.0.9=h166bdaf_8 - libbrotlienc=1.0.9=h166bdaf_8 - libcblas=3.9.0=16_linux64_openblas + - libclang-cpp15=15.0.3=default_h2e3cab8_0 - libcrc32c=1.1.2=h9c3ff4c_0 - - libcups=2.3.3=h3e49a29_2 - - libcurl=7.86.0=h7bff187_1 - - libdeflate=1.13=h166bdaf_0 + - libcups=2.3.3=h4637d8d_4 + - libcurl=8.4.0=hca28451_0 + - libdeflate=1.19=hd590300_0 - libedit=3.1.20191231=he28a2e2_2 - libev=4.33=h516909a_1 - - libevent=2.1.10=h9b69904_4 + - libevent=2.1.12=hf998b51_1 - libffi=3.4.2=h7f98852_5 + - libgcc-devel_linux-64=12.2.0=h3b97bd3_19 - libgcc-ng=12.2.0=h65d4601_19 - - libgfortran-ng=12.2.0=h69a702a_19 - - libgfortran5=12.2.0=h337968e_19 + - libgfortran-ng=13.2.0=h69a702a_0 + - libgfortran5=13.2.0=ha4646dd_0 - libglib=2.74.1=h606061b_1 - libgomp=12.2.0=h65d4601_19 - - libgoogle-cloud=2.5.0=hcb5eced_0 - - libgrpc=1.49.1=h05bd8bd_1 + - libgoogle-cloud=2.10.0=hac9eb74_0 + - libgrpc=1.54.2=hcf146ea_0 - libhwloc=2.8.0=h32351e8_1 - libiconv=1.17=h166bdaf_0 - liblapack=3.9.0=16_linux64_openblas - - libnghttp2=1.47.0=hdcd2b5c_1 + - libllvm15=15.0.3=h503ea73_0 + - libnghttp2=1.58.0=h47da74e_0 - libnsl=2.0.0=h7f98852_0 + - libnuma=2.0.16=h0b41bf4_1 - libopenblas=0.3.21=pthreads_h78a6416_3 - libpng=1.6.39=h753d276_0 - - libprotobuf=3.21.10=h6239696_0 + - libprotobuf=3.21.12=hfc55251_2 + - libsanitizer=12.2.0=h46fd767_19 - libsqlite=3.40.0=h753d276_0 - - libssh2=1.10.0=haa6b8db_3 + - libssh2=1.11.0=h0841786_0 - libstdcxx-ng=12.2.0=h46fd767_19 - - libthrift=0.16.0=h491838f_2 - - libtiff=4.4.0=h0e0dad5_3 + - libthrift=0.18.1=h8fd135c_2 + - libtiff=4.2.0=hf544144_3 - libutf8proc=2.8.0=h166bdaf_0 - libuuid=2.32.1=h7f98852_1000 - libwebp-base=1.2.4=h166bdaf_0 @@ -117,12 +130,14 @@ dependencies: - libxml2=2.9.14=h22db469_4 - libzlib=1.2.13=h166bdaf_4 - lz4-c=1.9.3=h9c3ff4c_1 + - minimap2=2.26=he4a0461_2 - ncbi-ngs-sdk=2.9.0=0 + - ncbi-vdb=3.0.9=hdbdd923_0 - ncurses=6.3=h27087fc_1 - numpy=1.23.5=py39h3d75532_0 - - openjdk=17.0.3=hafdced1_4 - - openssl=1.1.1u=hd590300_0 - - orc=1.8.0=h09e0d61_0 + - openjdk=17.0.3=hea3dc9f_3 + - openssl=3.2.0=hd590300_1 + - orc=1.8.3=h2f23424_1 - ossuuid=1.6.2=hf484d3e_1000 - pandas=1.5.2=py39h4661b88_0 - parquet-cpp=1.5.1=2 @@ -164,38 +179,41 @@ dependencies: - pip=22.3.1=pyhd8ed1ab_0 - pixman=0.40.0=h36c2ea0_0 - pthread-stubs=0.4=h36c2ea0_1001 - - pyarrow=10.0.1=py39h33d4778_1_cpu + - pyarrow=12.0.0=py39he4327e9_1_cpu - pyasn1=0.4.8=py_0 - pycparser=2.21=pyhd8ed1ab_0 - - pyopenssl=22.1.0=pyhd8ed1ab_0 + - pyopenssl=23.3.0=pyhd8ed1ab_0 - pysocks=1.7.1=pyha2e5f31_6 - - python=3.9.15=h47a2c10_0_cpython + - python=3.9.16=h2782a2a_0_cpython - python-dateutil=2.8.2=pyhd8ed1ab_0 - python-tzdata=2022.7=pyhd8ed1ab_0 - python_abi=3.9=3_cp39 - pytz=2022.6=pyhd8ed1ab_0 - pytz-deprecation-shim=0.1.0.post0=py39hf3d152e_3 - pyyaml=5.4.1=py39hb9d737c_4 - - re2=2022.06.01=h27087fc_1 + - rdma-core=28.9=h59595ed_1 + - re2=2023.02.02=hcb278e6_0 - readline=8.1.2=h0f457ee_0 - requests=2.28.1=pyhd8ed1ab_1 - rsa=4.7.2=pyh44b312d_0 - - s2n=1.3.28=h8d01263_0 + - s2n=1.3.44=h06160fa_0 - s3transfer=0.6.0=pyhd8ed1ab_0 - samtools=1.16.1=h6899075_1 - scandir=1.10.0=py39hb9d737c_6 - seqkit=2.3.1=h9ee0642_0 - setuptools=65.5.1=pyhd8ed1ab_0 - six=1.16.0=pyh6c4a22f_0 - - snappy=1.1.9=hbd366e4_2 + - snappy=1.1.10=h9fff704_0 - soothsayer_utils=2022.6.24=py_0 - - sra-tools=3.0.0=pl5321hd0d85c6_1 + - 
sra-tools=3.0.9=h9f5acd7_0 + - sracat=0.2=h9f5acd7_1 + - sysroot_linux-64=2.12=he073ed8_16 - tbb=2021.7.0=h924138e_1 - tk=8.6.12=h27826a3_0 - tqdm=4.64.1=pyhd8ed1ab_0 - tzdata=2022g=h191b570_0 - tzlocal=4.2=py39hf3d152e_2 + - ucx=1.14.1=h64cca9d_5 - urllib3=1.26.13=pyhd8ed1ab_0 - wheel=0.38.4=pyhd8ed1ab_0 - xorg-fixesproto=5.0=h7f98852_1002 @@ -218,4 +236,4 @@ dependencies: - xz=5.2.6=h166bdaf_0 - yaml=0.2.5=h7f98852_2 - zlib=1.2.13=h166bdaf_4 - - zstd=1.5.2=h6239696_4 \ No newline at end of file + - zstd=1.5.5=hfc55251_0 \ No newline at end of file diff --git a/install/environments/VEBA-profile_env.yml b/install/environments/VEBA-profile_env.yml index bccbda2..f6f3fab 100644 --- a/install/environments/VEBA-profile_env.yml +++ b/install/environments/VEBA-profile_env.yml @@ -1,4 +1,4 @@ -name: VEBA-profile_env__v2023.10.16 +name: VEBA-profile_env__v2023.12.14 channels: - conda-forge - bioconda @@ -21,12 +21,12 @@ dependencies: - bz2file=0.98=py_0 - bzip2=1.0.8=h7f98852_4 - c-ares=1.20.1=hd590300_0 - - ca-certificates=2023.7.22=hbcca054_0 + - ca-certificates=2023.11.17=hbcca054_0 - cached-property=1.5.2=hd8ed1ab_1 - cached_property=1.5.2=pyha770c72_1 - cairo=1.16.0=hb05425b_5 - capnproto=0.9.1=ha19adfc_4 - - certifi=2023.7.22=pyhd8ed1ab_0 + - certifi=2023.11.17=pyhd8ed1ab_0 - charset-normalizer=3.3.0=pyhd8ed1ab_0 - click=8.1.7=unix_pyh707e725_0 - cmseq=1.0.4=pyhb7b1952_0 @@ -119,7 +119,7 @@ dependencies: - numpy=1.26.0=py310hb13e2d6_0 - openjdk=17.0.3=h4335b31_6 - openjpeg=2.5.0=h488ebb8_3 - - openssl=3.1.3=hd590300_0 + - openssl=3.2.0=hd590300_1 - ossuuid=1.6.2=hf484d3e_1000 - packaging=23.2=pyhd8ed1ab_0 - pandas=2.1.1=py310hcc13569_1 @@ -198,6 +198,7 @@ dependencies: - six=1.16.0=pyh6c4a22f_0 - soothsayer_utils=2022.6.24=py_0 - statsmodels=0.14.0=py310h1f7b6fc_2 + - sylph=0.4.1=h4ac6f70_0 - tbb=2021.7.0=h924138e_1 - tk=8.6.13=h2797004_0 - tqdm=4.66.1=pyhd8ed1ab_0 diff --git a/install/install_veba.sh b/install/install.sh similarity index 57% rename from install/install_veba.sh rename to install/install.sh index 8c8fa6d..ac81638 100644 --- a/install/install_veba.sh +++ b/install/install.sh @@ -1,12 +1,14 @@ #!/bin/bash -# __version__ = "2023.3.27" +# __version__ = "2023.12.19" SCRIPT_PATH=$(realpath $0) PREFIX=$(echo $SCRIPT_PATH | python -c "import sys; print('/'.join(sys.stdin.read().split('/')[:-1]))") -CONDA_BASE=$(conda run -n base bash -c "echo \${CONDA_PREFIX}") +# CONDA_BASE=$(conda run -n base bash -c "echo \${CONDA_PREFIX}") +CONDA_BASE=$(conda info --base) # Update permissions echo "Updating permissions for scripts in ${PREFIX}/../src" +chmod 755 ${PREFIX}/../src/veba chmod 755 ${PREFIX}/../src/*.py chmod 755 ${PREFIX}/../src/scripts/* @@ -15,12 +17,34 @@ conda install -c conda-forge mamba -y # conda update mamba -y # Recommended # Environments +# Main environment +echo "Creating VEBA main environment" + +ENV_NAME="VEBA" +mamba create -y -n $ENV_NAME -c conda-forge -c bioconda -c jolespin seqkit genopype networkx biopython biom-format anndata || (echo "Error when creating main VEBA environment" ; exit 1) &> ${PREFIX}/environments/VEBA.log + +# Copy main executable +echo -e "\t*Copying main VEBA executable into ${ENV_NAME} environment path" +cp -r ${PREFIX}/../src/veba ${CONDA_BASE}/envs/${ENV_NAME}/bin/ +# Copy over files to environment bin/ +echo -e "\t*Copying VEBA modules into ${ENV_NAME} environment path" +cp -r ${PREFIX}/../src/*.py ${CONDA_BASE}/envs/${ENV_NAME}/bin/ +echo -e "\t*Copying VEBA utility scripts into ${ENV_NAME} environment path" +cp -r 
${PREFIX}/../src/scripts/ ${CONDA_BASE}/envs/${ENV_NAME}/bin/ +# Symlink the utility scripts to bin/ +echo -e "\t*Symlinking VEBA utility scripts into ${ENV_NAME} environment path" +ln -sf ${CONDA_BASE}/envs/${ENV_NAME}/bin/scripts/* ${CONDA_BASE}/envs/${ENV_NAME}/bin/ + +# Version +cp -rf ${PREFIX}/../VERSION ${CONDA_BASE}/envs/${ENV_NAME}/bin/VEBA_VERSION + +# Module environments for ENV_YAML in ${PREFIX}/environments/VEBA*.yml; do # Get environment name ENV_NAME=$(basename $ENV_YAML .yml) # Create conda environment - echo "Creating ${ENV_NAME} environment" + echo "Creating ${ENV_NAME} module environment" mamba env create -n $ENV_NAME -f $ENV_YAML || (echo "Error when creating VEBA environment: ${ENV_YAML}" ; exit 1) &> ${ENV_YAML}.log # Copy over files to environment bin/ @@ -32,6 +56,9 @@ for ENV_YAML in ${PREFIX}/environments/VEBA*.yml; do echo -e "\t*Symlinking VEBA utility scripts into ${ENV_NAME} environment path" ln -sf ${CONDA_BASE}/envs/${ENV_NAME}/bin/scripts/* ${CONDA_BASE}/envs/${ENV_NAME}/bin/ + # Version + cp -rf ${PREFIX}/../VERSION ${CONDA_BASE}/envs/${ENV_NAME}/bin/VEBA_VERSION + done echo -e " _ _ _______ ______ _______\n \ / |______ |_____] |_____|\n \/ |______ |_____] | |" diff --git a/install/uninstall_veba.sh b/install/uninstall.sh similarity index 100% rename from install/uninstall_veba.sh rename to install/uninstall.sh diff --git a/install/update_environment_scripts.sh b/install/update_environment_scripts.sh index 2c98bc7..59a98b0 100644 --- a/install/update_environment_scripts.sh +++ b/install/update_environment_scripts.sh @@ -1,5 +1,5 @@ #!/usr/bin/env bash -# __version__ = "2023.01.05" +# __version__ = "2023.12.18" # Usage: git clone https://github.com/jolespin/veba && update_environment_scripts.sh /path/to/veba_repository echo "-----------------------------------------------------------------------------------------------------" @@ -17,13 +17,14 @@ if [ $# -eq 0 ]; then chmod 775 ${VEBA_REPOSITORY_DIRECTORY}/src/* chmod 775 ${VEBA_REPOSITORY_DIRECTORY}/src/scripts/* - else + else VEBA_REPOSITORY_DIRECTORY=$1 fi -CONDA_BASE=$(conda run -n base bash -c "echo \${CONDA_PREFIX}") +# CONDA_BASE=$(conda run -n base bash -c "echo \${CONDA_PREFIX}") +CONDA_BASE=$(conda info --base) echo "-----------------------------------------------------------------------------------------------------" echo " * Source VEBA: ${VEBA_REPOSITORY_DIRECTORY}" @@ -31,9 +32,10 @@ echo " * Destination VEBA environments CONDA_BASE: ${CONDA_BASE}" echo "-----------------------------------------------------------------------------------------------------" # Environments -for ENV_PREFIX in ${CONDA_BASE}/envs/VEBA-*; do +for ENV_PREFIX in ${CONDA_BASE}/envs/VEBA ${CONDA_BASE}/envs/VEBA-*; +do echo $ENV_PREFIX cp ${VEBA_REPOSITORY_DIRECTORY}/src/*.py ${ENV_PREFIX}/bin/ cp -r ${VEBA_REPOSITORY_DIRECTORY}/src/scripts/ ${ENV_PREFIX}/bin/ ln -sf ${ENV_PREFIX}/bin/scripts/* ${ENV_PREFIX}/bin/ - done +done diff --git a/src/MODULE_RESOURCES b/src/MODULE_RESOURCES deleted file mode 100644 index a30553e..0000000 --- a/src/MODULE_RESOURCES +++ /dev/null @@ -1,18 +0,0 @@ -Status Environment Module Resources Recommended Threads Description -Stable VEBA-preprocess_env preprocess.py 4GB-16GB 4 Fastq quality trimming, adapter removal, decontamination, and read statistics calculations -Stable VEBA-assembly_env assembly.py 32GB-128GB+ 16 Assemble reads, align reads to assembly, and count mapped reads -Stable VEBA-assembly_env coverage.py 24GB 16 Align reads to (concatenated) reference and counts mapped reads 
-Stable VEBA-binning-prokaryotic_env binning-prokaryotic.py 16GB 4 Iterative consensus binning for recovering prokaryotic genomes with lineage-specific quality assessment -Stable VEBA-binning-eukaryotic_env binning-eukaryotic.py 128GB 4 Binning for recovering eukaryotic genomes with exon-aware gene modeling and lineage-specific quality assessment -Stable VEBA-binning-viral_env binning-viral.py 16GB 4 Detection of viral genomes and quality assessment -Stable VEBA-classify_env classify-prokaryotic.py 64GB 32 Taxonomic classification of prokaryotic genomes  -Stable VEBA-classify_env classify-eukaryotic.py 32GB 1 Taxonomic classification of eukaryotic genomes -Stable VEBA-classify_env classify-viral.py 16GB 4 Taxonomic classification of viral genomes -Stable VEBA-cluster_env cluster.py 32GB+ 32 Species-level clustering of genomes and lineage-specific orthogroup detection -Stable VEBA-annotate_env annotate.py 64GB 32 Annotates translated gene calls against NR, Pfam, and KOFAM -Stable VEBA-phylogeny_env phylogeny.py 16GB+ 32 Constructs phylogenetic trees given a marker set -Stable VEBA-mapping_env index.py 16GB 4 Builds local or global index for alignment to genomes -Stable VEBA-mapping_env mapping.py 16GB 4 Aligns reads to local or global index of genomes -Stable VEBA-biosynthetic_env biosynthetic.py 16GB 16 Identify biosynthetic gene clusters in prokaryotes and fungi -Developmental VEBA-assembly_env assembly-sequential.py 32GB-128GB+ 16 Assemble metagenomes sequentially -Developmental VEBA-amplicon_env amplicon.py 96GB 16 Automated read trim position detection, DADA2 ASV detection, taxonomic classification, and file conversion \ No newline at end of file diff --git a/src/README.md b/src/README.md index 574149f..790091e 100755 --- a/src/README.md +++ b/src/README.md @@ -3,25 +3,29 @@ # Modules [![Schematic](../images/Schematic.png)](../images/Schematic.pdf) -| Status | Environment | Module | Resources | Recommended Threads | Description | -|---------------|------------------------------|-------------------------|-------------|---------------------|-----------------------------------------------------------------------------------------------------------------| -| Stable | [VEBA-preprocess_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-preprocess_env.yml) | [preprocess.py](https://github.com/jolespin/veba/tree/main/src#preprocesspy) | 4GB-16GB | 4 | Fastq quality trimming, adapter removal, decontamination, and read statistics calculations | -| Stable | [VEBA-assembly_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-assembly_env.yml) | [assembly.py](https://github.com/jolespin/veba/tree/main/src#assemblypy) | 32GB-128GB+ | 4-16 | Assemble reads, align reads to assembly, and count mapped reads | -| Stable | [VEBA-assembly_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-assembly_env.yml) | [coverage.py](https://github.com/jolespin/veba/tree/main/src#coveragepy) | 24GB | 16 | Align reads to (concatenated) reference and counts mapped reads | -| Stable | [VEBA-binning-prokaryotic_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-binning-prokaryotic_env.yml) | [binning-prokaryotic.py](https://github.com/jolespin/veba/tree/main/src#binning-prokaryoticpy) | 16GB | 4 | Iterative consensus binning for recovering prokaryotic genomes with lineage-specific quality assessment | -| Stable | 
[VEBA-binning-eukaryotic_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-binning-eukaryotic_env.yml) | [binning-eukaryotic.py](https://github.com/jolespin/veba/tree/main/src#binning-eukaryoticpy) | 128GB | 4 | Binning for recovering eukaryotic genomes with exon-aware gene modeling and lineage-specific quality assessment | -| Stable | [VEBA-binning-viral_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-binning-viral_env.yml) | [binning-viral.py](https://github.com/jolespin/veba/tree/main/src#binning-viralpy) | 16GB | 4 | Detection of viral genomes and quality assessment | -| Stable | [VEBA-classify_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-classify_env.yml) | [classify-prokaryotic.py](https://github.com/jolespin/veba/tree/main/src#classify-prokaryoticpy) | 72GB | 32 | Taxonomic classification of prokaryotic genomes | -| Stable | [VEBA-classify_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-classify_env.yml) | [classify-eukaryotic.py](https://github.com/jolespin/veba/tree/main/src#classify-eukaryoticpy) | 32GB | 1 | Taxonomic classification of eukaryotic genomes | -| Stable | [VEBA-classify_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-classify_env.yml) | [classify-viral.py](https://github.com/jolespin/veba/tree/main/src#classify-viralpy) | 16GB | 4 | Taxonomic classification of viral genomes | -| Stable | [VEBA-cluster_env](https://github.com/jolespin/veba/blob/main/install/environments/[VEBA-cluster_env.yml) | [cluster.py](https://github.com/jolespin/veba/tree/main/src#clusterpy) | 32GB+ | 32 | Species-level clustering of genomes and lineage-specific orthogroup detection | -| Stable | [VEBA-annotate_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-annotate_env.yml) | [annotate.py](https://github.com/jolespin/veba/tree/main/src#annotatepy) | 64GB | 32 | Annotates translated gene calls against UniRef, MiBIG, VFDB, Pfam, AntiFam, and KOFAM | -| Stable | [VEBA-phylogeny_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-phylogeny_env.yml) | [phylogeny.py](https://github.com/jolespin/veba/tree/main/src#phylogenypy) | 16GB+ | 32 | Constructs phylogenetic trees given a marker set | -| Stable | [VEBA-mapping_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-mapping_env.yml) | [index.py](https://github.com/jolespin/veba/tree/main/src#indexpy) | 16GB | 4 | Builds local or global index for alignment to genomes | -| Stable | [VEBA-mapping_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-mapping_env.yml) | [mapping.py](https://github.com/jolespin/veba/tree/main/src#mappingpy) | 16GB | 4 | Aligns reads to local or global index of genomes | -| Stable | [VEBA-biosynthetic_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-biosynthetic_env.yml) | [biosynthetic.py](https://github.com/jolespin/veba/tree/main/src#biosyntheticpy) | 16GB | 16 | Identify biosynthetic gene clusters in prokaryotes and fungi | -| Developmental | [VEBA-assembly_env](https://github.com/jolespin/veba/blob/main/install/environments/VEBA-assembly_env.yml) | [assembly-sequential.py](https://github.com/jolespin/veba/tree/main/src#assembly-sequentialpy) | 32GB-128GB+ | 16 | Assemble metagenomes sequentially | -| Developmental | [VEBA-amplicon_env](https://github.com/jolespin/veba/blob/main/install/environments/devel/VEBA-amplicon_env.yml) | 
[amplicon.py](https://github.com/jolespin/veba/tree/main/src#ampliconpy) | 96GB | 16 | Automated read trim position detection, DADA2 ASV detection, taxonomic classification, and file conversion | +| Status | Module | Environment | Executable | Resources | Recommended Threads | Description | +|---------------|----------------------|------------------------------|-------------------------|-------------|---------------------|-------------------------------------------------------------------------------------------------------------------| +| Stable | preprocess | VEBA-preprocess_env | preprocess.py | 4GB-16GB | 4 | Fastq quality trimming, adapter removal, decontamination, and read statistics calculations (Short Reads) | +| Stable | preprocess-long | VEBA-preprocess_env | preprocess-long.py | 4GB-16GB | 4 | Fastq quality trimming, adapter removal, decontamination, and read statistics calculations (Long Reads) | +| Stable | assembly | VEBA-assembly_env | assembly.py | 32GB-128GB+ | 16 | Assemble short reads, align reads to assembly, and count mapped reads | +| Stable | assembly-long | VEBA-assembly_env | assembly-long.py | 32GB-128GB+ | 16 | Assemble long reads, align reads to assembly, and count mapped reads | +| Stable | coverage | VEBA-assembly_env | coverage.py | 24GB | 16 | Align short reads to (concatenated) reference and counts mapped reads | +| Stable | coverage-long | VEBA-assembly_env | coverage-long.py | 24GB | 16 | Align long reads to (concatenated) reference and counts mapped reads | +| Stable | binning-prokaryotic | VEBA-binning-prokaryotic_env | binning-prokaryotic.py | 16GB | 4 | Iterative consensus binning for recovering prokaryotic genomes with lineage-specific quality assessment | +| Stable | binning-eukaryotic | VEBA-binning-eukaryotic_env | binning-eukaryotic.py | 128GB | 4 | Binning for recovering eukaryotic genomes with exon-aware gene modeling and lineage-specific quality assessment | +| Stable | binning-viral | VEBA-binning-viral_env | binning-viral.py | 16GB | 4 | Detection of viral genomes and quality assessment | +| Stable | classify-prokaryotic | VEBA-classify_env | classify-prokaryotic.py | 64GB | 32 | Taxonomic classification of prokaryotic genomes | +| Stable | classify-eukaryotic | VEBA-classify_env | classify-eukaryotic.py | 32GB | 1 | Taxonomic classification of eukaryotic genomes | +| Stable | classify-viral | VEBA-classify_env | classify-viral.py | 16GB | 4 | Taxonomic classification of viral genomes | +| Stable | cluster | VEBA-cluster_env | cluster.py | 32GB+ | 32 | Species-level clustering of genomes and lineage-specific orthogroup detection | +| Stable | annotate | VEBA-annotate_env | annotate.py | 64GB | 32 | Annotates translated gene calls against NR, Pfam, and KOFAM | +| Stable | phylogeny | VEBA-phylogeny_env | phylogeny.py | 16GB+ | 32 | Constructs phylogenetic trees given a marker set | +| Stable | index | VEBA-mapping_env | index.py | 16GB | 4 | Builds local or global index for alignment to genomes | +| Stable | mapping | VEBA-mapping_env | mapping.py | 16GB | 4 | Aligns reads to local or global index of genomes | +| Stable | biosynthetic | VEBA-biosynthetic_env | biosynthetic.py | 16GB | 16 | Identify biosynthetic gene clusters in prokaryotes and fungi | +| Stable | profile-pathway | VEBA-profile_env | profile-pathway.py | 16GB | 4 | Pathway profiling of de novo genomes | +| Deprecated | assembly-sequential | VEBA-assembly_env | assembly-sequential.py | 32GB-128GB+ | 16 | Assemble metagenomes sequentially | +| Developmental | amplicon | 
VEBA-amplicon_env | amplicon.py | 96GB | 16 | Automated read trim position detection, DADA2 ASV detection, taxonomic classification, and file conversion |
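The table above pairs every module with its Conda environment, executable, and resource footprint. To make the new long-read path concrete, below is a minimal invocation sketch for `assembly-long.py` built only from the flags defined in its argument parser further down in this diff; the read file, sample name, and thread count are hypothetical placeholders, not values from the repository.

    # Hedged sketch: assemble Nanopore reads with metaFlye via the long-read assembly module.
    # Assumes the module was installed into VEBA-assembly_env; all paths are placeholders.
    conda activate VEBA-assembly_env
    assembly-long.py \
        -i sample_1.nanopore.fq.gz \
        -n sample_1 \
        -o veba_output/assembly \
        -P metaflye \
        -t nano-hq \
        -m 1000 \
        -p 16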


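Several modules below shell out to the same minimap2/samtools chain; `get_alignment_cmd` in the new `src/assembly-long.py` further down is the clearest instance. As a reading aid, this is roughly the standalone equivalent of that step, assuming a finished `assembly.fasta` and long reads in `reads.fq.gz` (both placeholders):

    # Rough standalone equivalent of get_alignment_cmd in src/assembly-long.py (paths are placeholders)
    THREADS=16
    minimap2 -t ${THREADS} -d assembly.fasta.mmi assembly.fasta   # build the minimap2 index
    minimap2 -a -t ${THREADS} -x map-ont assembly.fasta.mmi reads.fq.gz \
        | samtools view -b -h -F 4 \
        | samtools sort --threads ${THREADS} --reference assembly.fasta -T tmp/samtools_sort \
        > mapped.sorted.bam   # drop unmapped reads (-F 4), then coordinate-sort
    samtools index -@ ${THREADS} mapped.sorted.bam   # create the .bai index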
diff --git a/src/amplicon.py b/src/amplicon.py index c673ee7..f66abf9 100755 --- a/src/amplicon.py +++ b/src/amplicon.py @@ -14,7 +14,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" +__version__ = "2023.11.30" # Reads archive def get_reads_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): @@ -626,6 +626,7 @@ def main(args=None): print("Script version:", __version__, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/annotate.py b/src/annotate.py index c050e86..eda3cf5 100755 --- a/src/annotate.py +++ b/src/annotate.py @@ -15,7 +15,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.25" +__version__ = "2023.11.30" def get_preprocess_cmd( input_filepaths, output_filepaths, output_directory, directories, opts, program): cmd = [ @@ -880,6 +880,7 @@ def main(args=None): print("VEBA Database:", opts.veba_database, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) diff --git a/src/assembly-long.py b/src/assembly-long.py new file mode 100755 index 0000000..0c35cc2 --- /dev/null +++ b/src/assembly-long.py @@ -0,0 +1,627 @@ +#!/usr/bin/env python +from __future__ import print_function, division +import sys, os, argparse, glob +from collections import OrderedDict, defaultdict + +import pandas as pd + +# Soothsayer Ecosystem +from genopype import * +from genopype import __version__ as genopype_version +from soothsayer_utils import * + +pd.options.display.max_colwidth = 100 +# from tqdm import tqdm +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.12.14" + +# Assembly +def get_assembly_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): + # Command + cmd = [ + os.environ["flye"], + "--{} {}".format(opts.reads_type, input_filepaths[0]), + "-g {}".format(opts.estimated_assembly_size) if opts.estimated_assembly_size else "", + "-o {}".format(output_directory), + "-t {}".format(opts.n_jobs), + "--deterministic" if not opts.no_deterministic else "", + "--meta" if opts.program == "metaflye" else "", + opts.assembler_options, + + # Get failed length cutoff fasta + "&&", + + "mv", + os.path.join(output_directory, "assembly.fasta"), + os.path.join(output_directory, "assembly_original.fasta"), + + "&&", + + "cat", + os.path.join(output_directory, "assembly_original.fasta"), + "|", + os.environ["seqkit"], + "seq", + "-M {}".format(max(opts.minimum_contig_length - 1, 1)), + "|", + "gzip", + ">", + os.path.join(output_directory, "assembly_failed_length_cutoff.fasta.gz"), + + # Filter out small scaffolds and add prefix if applicable + "&&", + + "cat", + os.path.join(output_directory, "assembly_original.fasta"), + "|", + os.environ["seqkit"], + "seq", + "-m {}".format(opts.minimum_contig_length), + "|", + os.environ["seqkit"], + "replace", + "-r {}".format(opts.scaffold_prefix), + "-p '^'", + ">", + os.path.join(output_directory, 
"assembly.fasta"), + + "&&", + + "rm -rf", + os.path.join(output_directory, "assembly_original.fasta"), + + "&&", + + os.environ["fasta_to_saf.py"], + "-i", + os.path.join(output_directory, "assembly.fasta"), + ">", + os.path.join(output_directory, "assembly.fasta.saf"), + ] + + + + # files_to_remove = [ + # ] + + # for fn in files_to_remove: + # cmd += [ + # "&&", + # "rm -rf {}".format(os.path.join(output_directory, fn)), + # ] + return cmd + +# Bowtie2 +def get_alignment_cmd(input_filepaths, output_filepaths, output_directory, directories, opts): + + # Command + cmd = [ + # Clear temporary directory just in case + "rm -rf {}".format(os.path.join(directories["tmp"], "*")), + "&&", + + # MiniMap2 Index + "(", + os.environ["minimap2"], + "-t {}".format(opts.n_jobs), + "-d {}".format(output_filepaths[0]), # Index + opts.minimap2_index_options, + input_filepaths[1], # Reference + ")", + + "&&", + + # MiniMap2 + "(", + os.environ["minimap2"], + "-a", + "-t {}".format(opts.n_jobs), + "-x {}".format(opts.minimap2_preset), + opts.minimap2_options, + output_filepaths[0], + input_filepaths[0], + + + + # Convert to sorted BAM + "|", + + os.environ["samtools"], + "view", + "-b", + "-h", + "-F 4", + + "|", + + os.environ["samtools"], + "sort", + "--threads {}".format(opts.n_jobs), + "--reference {}".format(input_filepaths[1]), + "-T {}".format(os.path.join(directories["tmp"], "samtools_sort")), + ">", + output_filepaths[1], + ")", + + "&&", + + "(", + os.environ["samtools"], + "index", + "-@ {}".format(opts.n_jobs), + output_filepaths[1], + ")", + ] + + return cmd + + +# featureCounts +def get_featurecounts_cmd(input_filepaths, output_filepaths, output_directory, directories, opts): + + # Command + + # ORF-Level Counts + cmd = [ + "mkdir -p {}".format(os.path.join(directories["tmp"], "featurecounts")), + "&&", + "(", + os.environ["featureCounts"], + # "-G {}".format(input_filepaths[0]), + "-a {}".format(input_filepaths[1]), + "-o {}".format(os.path.join(output_directory, "featurecounts.tsv")), + "-F SAF", + "-L", + "--tmpDir {}".format(os.path.join(directories["tmp"], "featurecounts")), + "-T {}".format(opts.n_jobs), + opts.featurecounts_options, + input_filepaths[2], + ")", + "&&", + "gzip -f {}".format(os.path.join(output_directory, "featurecounts.tsv")), + ] + return cmd + +# seqkit +def get_seqkit_cmd(input_filepaths, output_filepaths, output_directory, directories, opts): + + # Command + + # ORF-Level Counts + cmd = [ + + os.environ["seqkit"], + "stats", + "-a", + "-j {}".format(opts.n_jobs), + "-T", + "-b", + os.path.join(directories[("intermediate","1__assembly")], "*.fasta"), + "|", + "gzip", + ">", + output_filepaths[0], + ] + return cmd + +# Symlink +def get_symlink_cmd(input_filepaths, output_filepaths, output_directory, directories, opts): + # Command + cmd = [ + "DST={}; (for SRC in {}; do SRC=$(realpath --relative-to $DST $SRC); ln -sf $SRC $DST; done)".format( + output_directory, + " ".join(input_filepaths), + ) + ] + return cmd + +# ============ +# Run Pipeline +# ============ +# Set environment variables +def add_executables_to_environment(opts): + """ + Adapted from Soothsayer: https://github.com/jolespin/soothsayer + """ + accessory_scripts = { + "fasta_to_saf.py", + } + + required_executables={ + "flye", + "minimap2", + "samtools", + "featureCounts", + "seqkit", + } | accessory_scripts + + if opts.path_config == "CONDA_PREFIX": + executables = dict() + for name in required_executables: + executables[name] = os.path.join(os.environ["CONDA_PREFIX"], "bin", name) + else: + if 
opts.path_config is None: + opts.path_config = os.path.join(opts.script_directory, "veba_config.tsv") + opts.path_config = format_path(opts.path_config) + assert os.path.exists(opts.path_config), "config file does not exist. Have you created one in the following directory?\n{}\nIf not, either create one, check this filepath:{}, or give the path to a proper config file using --path_config".format(opts.script_directory, opts.path_config) + assert os.stat(opts.path_config).st_size > 1, "config file seems to be empty. Please add 'name' and 'executable' columns for the following program names: {}".format(required_executables) + df_config = pd.read_csv(opts.path_config, sep="\t") + assert {"name", "executable"} <= set(df_config.columns), "config must have `name` and `executable` columns. Please adjust file: {}".format(opts.path_config) + df_config = df_config.loc[:,["name", "executable"]].dropna(how="any", axis=0).applymap(str) + # Get executable paths + executables = OrderedDict(zip(df_config["name"], df_config["executable"])) + assert required_executables <= set(list(executables.keys())), "config must have the required executables for this run. Please adjust file: {}\nIn particular, add info for the following: {}".format(opts.path_config, required_executables - set(list(executables.keys()))) + + # Display + for name in sorted(accessory_scripts): + executables[name] = "'{}'".format(os.path.join(opts.script_directory, "scripts", name)) # Can handle spaces in path + + print(format_header( "Adding executables to path from the following source: {}".format(opts.path_config), "-"), file=sys.stdout) + for name, executable in executables.items(): + if name in required_executables: + print(name, executable, sep = " --> ", file=sys.stdout) + os.environ[name] = executable.strip() + print("", file=sys.stdout) + + +# Pipeline +def create_pipeline(opts, directories, f_cmds): + + # ................................................................. + # Primordial + # ................................................................. 
+ # Commands file + pipeline = ExecutablePipeline(name=__program__, description=opts.name, f_cmds=f_cmds, checkpoint_directory=directories["checkpoints"], log_directory=directories["log"]) + + # ========== + # Assembly + # ========== + + step = 1 + + # Info + program = "assembly" + program_label = "{}__{}".format(step, program) + description = "Assembling long reads via {}".format(opts.program.capitalize()) + + # Add to directories + output_directory = directories[("intermediate", program_label)] = create_directory(os.path.join(directories["intermediate"], program_label)) + + + # i/o + input_filepaths = [opts.reads] + output_filenames = ["assembly.fasta", "assembly.fasta.saf"] + output_filepaths = list(map(lambda filename: os.path.join(output_directory, filename), output_filenames)) + + params = { + "input_filepaths":input_filepaths, + "output_filepaths":output_filepaths, + "output_directory":output_directory, + "opts":opts, + "directories":directories, + } + + cmd = get_assembly_cmd(**params) + pipeline.add_step( + id=program_label, + description = description, + step=step, + cmd=cmd, + input_filepaths = input_filepaths, + output_filepaths = output_filepaths, + validate_inputs=True, + validate_outputs=True, + log_prefix=program_label, + + ) + + # ========== + # Alignment + # ========== + + step = 2 + + # Info + program = "alignment" + program_label = "{}__{}".format(step, program) + description = "Aligning reads to assembly" + + # Add to directories + output_directory = directories[("intermediate", program_label)] = create_directory(os.path.join(directories["intermediate"], program_label)) + + + # i/o + input_filepaths = [ + opts.reads, + os.path.join(directories[("intermediate", "1__assembly")], "assembly.fasta"), + ] + + output_filepaths = [ + os.path.join(directories[("intermediate", "1__assembly")], "assembly.fasta.mmi"), + os.path.join(output_directory, "mapped.sorted.bam"), + ] + + params = { + "input_filepaths":input_filepaths, + "output_filepaths":output_filepaths, + "output_directory":output_directory, + "opts":opts, + "directories":directories, + } + + cmd = get_alignment_cmd(**params) + pipeline.add_step( + id=program_label, + description = description, + step=step, + cmd=cmd, + input_filepaths = input_filepaths, + output_filepaths = output_filepaths, + validate_inputs=True, + validate_outputs=True, + log_prefix=program_label, + + ) + + + + # ========== + # featureCounts + # ========== + step = 3 + + # Info + program = "featurecounts" + program_label = "{}__{}".format(step, program) + description = "Counting reads" + + # Add to directories + output_directory = directories[("intermediate", program_label)] = create_directory(os.path.join(directories["intermediate"], program_label)) + + # i/o + + input_filepaths = [ + os.path.join(directories[("intermediate", "1__assembly")], "assembly.fasta"), + os.path.join(directories[("intermediate", "1__assembly")], "assembly.fasta.saf"), + os.path.join(directories[("intermediate", "2__alignment")], "mapped.sorted.bam"), + ] + + output_filenames = ["featurecounts.tsv.gz"] + output_filepaths = list(map(lambda filename: os.path.join(output_directory, filename), output_filenames)) + + params = { + "input_filepaths":input_filepaths, + "output_filepaths":output_filepaths, + "output_directory":output_directory, + "opts":opts, + "directories":directories, + } + + cmd = get_featurecounts_cmd(**params) + pipeline.add_step( + id=program_label, + description = description, + step=step, + cmd=cmd, + input_filepaths = input_filepaths, + output_filepaths = 
output_filepaths, + validate_inputs=True, + validate_outputs=True, + log_prefix=program_label, + + ) + + # ========== + # stats + # ========== + + step = 4 + + # Info + program = "seqkit" + program_label = "{}__{}".format(step, program) + description = "Assembly statistics" + + # Add to directories + output_directory = directories[("intermediate", program_label)] = create_directory(os.path.join(directories["intermediate"], program_label)) + + + # i/o + input_filepaths = [ + os.path.join(directories[("intermediate", "1__assembly")], "*.fasta"), + + ] + + output_filenames = ["seqkit_stats.tsv.gz"] + output_filepaths = list(map(lambda filename: os.path.join(output_directory, filename), output_filenames)) + + params = { + "input_filepaths":input_filepaths, + "output_filepaths":output_filepaths, + "output_directory":output_directory, + "opts":opts, + "directories":directories, + } + + cmd = get_seqkit_cmd(**params) + pipeline.add_step( + id=program_label, + description = description, + step=step, + cmd=cmd, + input_filepaths = input_filepaths, + output_filepaths = output_filepaths, + validate_inputs=True, + validate_outputs=True, + log_prefix=program_label, + + ) + + + # ============= + # Symlink + # ============= + step = 5 + + # Info + program = "symlink" + program_label = "{}__{}".format(step, program) + description = "Symlinking relevant output files" + + # Add to directories + output_directory = directories["output"] + + # i/o + + input_filepaths = [ + os.path.join(directories[("intermediate", "1__assembly")], "assembly.fasta"), + os.path.join(directories[("intermediate", "1__assembly")], "assembly.fasta.mmi"), + os.path.join(directories[("intermediate", "2__alignment")], "mapped.sorted.bam"), + os.path.join(directories[("intermediate", "2__alignment")], "mapped.sorted.bam.bai"), + os.path.join(directories[("intermediate", "3__featurecounts")], "featurecounts.tsv.gz"), + os.path.join(directories[("intermediate", "4__seqkit")], "seqkit_stats.tsv.gz"), + ] + + output_filenames = map(lambda fp: fp.split("/")[-1], input_filepaths) + output_filepaths = list(map(lambda fn:os.path.join(directories["output"], fn), output_filenames)) + + params = { + "input_filepaths":input_filepaths, + "output_filepaths":output_filepaths, + "output_directory":output_directory, + "opts":opts, + "directories":directories, + } + + cmd = get_symlink_cmd(**params) + pipeline.add_step( + id=program_label, + description = description, + step=step, + cmd=cmd, + input_filepaths = input_filepaths, + output_filepaths = output_filepaths, + validate_inputs=True, + validate_outputs=True, + log_prefix=program_label, + + ) + + return pipeline + +# Configure parameters +def configure_parameters(opts, directories): + # os.environ[] + + # Scaffold prefix + if opts.scaffold_prefix == "NONE": + opts.scaffold_prefix = "" + else: + if "NAME" in opts.scaffold_prefix: + opts.scaffold_prefix = opts.scaffold_prefix.replace("NAME", opts.name) + print("Using the following prefix for all {} scaffolds: {}".format(opts.program, opts.scaffold_prefix), file=sys.stdout) + + # Set environment variables + add_executables_to_environment(opts=opts) + +def main(args=None): + # Path info + script_directory = os.path.dirname(os.path.abspath( __file__ )) + script_filename = __program__ + # Path info + description = """ + Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -i <reads.fq> -n <name> -g <estimated_assembly_size> -o <output_directory>".format(__program__) + epilog = "Copyright 2021 Josh L. 
Espinoza (jespinoz@jcvi.org)" + + # Parser + parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) + # Pipeline + parser_io = parser.add_argument_group('Required I/O arguments') + parser_io.add_argument("-i","--reads", type=str, required=True, help = "path/to/reads.fq[.gz]") + parser_io.add_argument("-n", "--name", type=str, required=True, help="Name of sample") + parser_io.add_argument("-o","--project_directory", type=str, default="veba_output/assembly", help = "path/to/project_directory [Default: veba_output/assembly]") + + # Utility + parser_utility = parser.add_argument_group('Utility arguments') + parser_utility.add_argument("--path_config", type=str, default="CONDA_PREFIX", help="path/to/config.tsv [Default: CONDA_PREFIX]") #site-packages in future + parser_utility.add_argument("-p", "--n_jobs", type=int, default=1, help = "Number of threads [Default: 1]") + parser_utility.add_argument("--random_state", type=int, default=0, help = "Random state [Default: 0]") + parser_utility.add_argument("--restart_from_checkpoint", type=str, default=None, help = "Restart from a particular checkpoint [Default: None]") + parser_utility.add_argument("-v", "--version", action='version', version="{} v{}".format(__program__, __version__)) + parser_utility.add_argument("--tmpdir", type=str, help="Set temporary directory") #site-packages in future + + # Assembler + parser_assembler = parser.add_argument_group('Assembler arguments') + parser_assembler.add_argument("-P", "--program", type=str, default="flye", choices={"flye", "metaflye"}, help="Assembler | {flye, metaflye} [Default: 'flye']") + parser_assembler.add_argument("-s", "--scaffold_prefix", type=str, default="NAME__", help="Assembler | Special options: Use NAME to use --name. Use NONE to not include a prefix. [Default: 'NAME__']") + parser_assembler.add_argument("-m", "--minimum_contig_length", type=int, default=1, help="Minimum contig length. Should be lenient here because longer thresholds can be used for binning downstream. Recommended for metagenomes to use 1000 here. [Default: 1] ") + parser_assembler.add_argument("-t", "--reads_type", type=str, default="nano-hq", choices={"nano-hq", "nano-corr", "nano-raw", "pacbio-hifi", "pacbio-corr", "pacbio-raw"}, help="Reads type for (meta)flye. {nano-hq, nano-corr, nano-raw, pacbio-hifi, pacbio-corr, pacbio-raw} [Default: nano-hq] ") + parser_assembler.add_argument("-g", "--estimated_assembly_size", type=str, help="Estimated assembly size (e.g., 5m, 2.6g)") + parser_assembler.add_argument("--no_deterministic", action="store_true", help="Do not use deterministic mode. This will result in a faster assembly since it will be threaded but can get different assemblies upon rerunning") + parser_assembler.add_argument("--assembler_options", type=str, default="", help="Assembler options for Flye-based programs (e.g. --arg 1 ) [Default: '']") + + # Aligner + parser_aligner = parser.add_argument_group('MiniMap2 arguments') + parser_aligner.add_argument("--minimap2_preset", type=str, default="map-ont", help="MiniMap2 | MiniMap2 preset {map-pb, map-ont, map-hifi} [Default: map-ont]") + # parser_aligner.add_argument("--no_create_index", action="store_true", help="Do not create a MiniMap2 index") + parser_aligner.add_argument("--minimap2_index_options", type=str, default="", help="MiniMap2 | More options (e.g. 
--arg 1 ) [Default: '']\nhttps://github.com/lh3/minimap2") + parser_aligner.add_argument("--minimap2_options", type=str, default="", help="MiniMap2 | More options (e.g. --arg 1 ) [Default: '']\nhttps://github.com/lh3/minimap2") + + # featureCounts + parser_featurecounts = parser.add_argument_group('featureCounts arguments') + parser_featurecounts.add_argument("--featurecounts_options", type=str, default="", help="featureCounts | More options (e.g. --arg 1 ) [Default: ''] | http://bioinf.wehi.edu.au/featureCounts/") + + + # Options + opts = parser.parse_args() + opts.script_directory = script_directory + opts.script_filename = script_filename + + # Threads + if opts.n_jobs == -1: + from multiprocessing import cpu_count + opts.n_jobs = cpu_count() + assert opts.n_jobs >= 1, "--n_jobs must be ≥ 1. To select all available threads, use -1." + + + # Directories + directories = dict() + directories["project"] = create_directory(opts.project_directory) + directories["sample"] = create_directory(os.path.join(directories["project"], opts.name)) + directories["output"] = create_directory(os.path.join(directories["sample"], "output")) + directories["log"] = create_directory(os.path.join(directories["sample"], "log")) + if not opts.tmpdir: + opts.tmpdir = os.path.join(directories["sample"], "tmp") + directories["tmp"] = create_directory(opts.tmpdir) + directories["checkpoints"] = create_directory(os.path.join(directories["sample"], "checkpoints")) + directories["intermediate"] = create_directory(os.path.join(directories["sample"], "intermediate")) + # os.environ["TMPDIR"] = directories["tmp"] + + # Info + print(format_header(__program__, "="), file=sys.stdout) + print(format_header("Configuration:", "-"), file=sys.stdout) + print(format_header("Name: {}".format(opts.name), "."), file=sys.stdout) + print("Python version:", sys.version.replace("\n"," "), file=sys.stdout) + print("Python path:", sys.executable, file=sys.stdout) #sys.path[2] + print("GenoPype version:", genopype_version, file=sys.stdout) #sys.path[2] + print("Script version:", __version__, file=sys.stdout) + print("Moment:", get_timestamp(), file=sys.stdout) + print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) + print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) + configure_parameters(opts, directories) + sys.stdout.flush() + + # Run pipeline + with open(os.path.join(directories["sample"], "commands.sh"), "w") as f_cmds: + pipeline = create_pipeline( + opts=opts, + directories=directories, + f_cmds=f_cmds, + ) + pipeline.compile() + pipeline.execute(restart_from_checkpoint=opts.restart_from_checkpoint) + +if __name__ == "__main__": + main() diff --git a/src/assembly.py b/src/assembly.py index 5156eff..32fc4fd 100755 --- a/src/assembly.py +++ b/src/assembly.py @@ -13,7 +13,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" +__version__ = "2023.11.30" # Assembly def get_assembly_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): @@ -683,8 +683,8 @@ def main(args=None): parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) # Pipeline parser_io = parser.add_argument_group('Required I/O arguments') - parser_io.add_argument("-1","--forward_reads", type=str, help = "path/to/forward_reads.fq") - parser_io.add_argument("-2","--reverse_reads", type=str, help = 
"path/to/reverse_reads.fq") + parser_io.add_argument("-1","--forward_reads", type=str, help = "path/to/forward_reads.fq[.gz]") + parser_io.add_argument("-2","--reverse_reads", type=str, help = "path/to/reverse_reads.fq[.gz]") parser_io.add_argument("-n", "--name", type=str, help="Name of sample", required=True) parser_io.add_argument("-o","--project_directory", type=str, default="veba_output/assembly", help = "path/to/project_directory [Default: veba_output/assembly]") @@ -758,6 +758,7 @@ def main(args=None): print("Script version:", __version__, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/binning-eukaryotic.py b/src/binning-eukaryotic.py index f8cfaf2..9fdc054 100755 --- a/src/binning-eukaryotic.py +++ b/src/binning-eukaryotic.py @@ -14,7 +14,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" +__version__ = "2023.12.2" # DATABASE_METAEUK="/usr/local/scratch/CORE/jespinoz/db/veba/v1.0/Classify/Eukaryotic/eukaryotic" @@ -310,11 +310,13 @@ def get_eukaryotic_gene_modeling_cmd(input_filepaths, output_filepaths, output_d # Run Eukaryotic Gene Modeling "&&", + os.environ["eukaryotic_gene_modeling_wrapper.py"], "--fasta {}".format(os.path.join(directories["tmp"], "scaffolds.binned.eukaryotic.fasta")), "--scaffolds_to_bins {}".format(input_filepaths[1]), "--tiara_results {}".format(input_filepaths[2]), "--metaeuk_database {}".format(opts.metaeuk_database), + "--metaeuk_split_memory_limit {}".format(opts.metaeuk_split_memory_limit), "-o {}".format(output_directory), "-p {}".format(opts.n_jobs), @@ -1016,8 +1018,10 @@ def main(args=None): # MetaEuk parser_metaeuk = parser.add_argument_group('MetaEuk arguments') + parser_metaeuk.add_argument("-M", "--microeuk_database", type=str, choices={"MicroEuk100", "MicroEuk90", "MicroEuk50"}, default="MicroEuk50", help="MicroEuk database {MicroEuk100, MicroEuk90, MicroEuk50} [Default: MicroEuk50]") parser_metaeuk.add_argument("--metaeuk_sensitivity", type=float, default=4.0, help="MetaEuk | Sensitivity: 1.0 faster; 4.0 fast; 7.5 sensitive [Default: 4.0]") parser_metaeuk.add_argument("--metaeuk_evalue", type=float, default=0.01, help="MetaEuk | List matches below this E-value (range 0.0-inf) [Default: 0.01]") + parser_metaeuk.add_argument("--metaeuk_split_memory_limit", type=str, default="36G", help="MetaEuk | Set max memory per split. E.g. 800B, 5K, 10M, 1G. Use 0 to use all available system memory. (Default value is experimental) [Default: 36G]") parser_metaeuk.add_argument("--metaeuk_options", type=str, default="", help="MetaEuk | More options (e.g. 
--arg 1 ) [Default: ''] https://github.com/soedinglab/metaeuk") # --split-memory-limit 70G: https://github.com/soedinglab/metaeuk/issues/59 @@ -1071,7 +1075,7 @@ def main(args=None): if opts.veba_database is None: assert "VEBA_DATABASE" in os.environ, "Please set the following environment variable 'export VEBA_DATABASE=/path/to/veba_database' or provide path to --veba_database" opts.veba_database = os.environ["VEBA_DATABASE"] - opts.metaeuk_database = os.path.join(opts.veba_database, "Classify", "Microeukaryotic", "microeukaryotic") + opts.metaeuk_database = os.path.join(opts.veba_database, "Classify", "MicroEuk", opts.microeuk_database) # Directories @@ -1097,6 +1101,7 @@ def main(args=None): print("VEBA Database:", opts.veba_database, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/binning-prokaryotic.py b/src/binning-prokaryotic.py index 29f80c9..a52eb54 100755 --- a/src/binning-prokaryotic.py +++ b/src/binning-prokaryotic.py @@ -13,7 +13,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" +__version__ = "2023.11.30" # Assembly def get_coverage_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): @@ -1683,6 +1683,7 @@ def main(args=None): print("VEBA Database:", opts.veba_database, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/binning-viral.py b/src/binning-viral.py index f109b01..55f299e 100755 --- a/src/binning-viral.py +++ b/src/binning-viral.py @@ -14,7 +14,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" +__version__ = "2023.11.30" # geNomad def get_genomad_cmd(input_filepaths, output_filepaths, output_directory, directories, opts): @@ -953,6 +953,7 @@ def main(args=None): print("VEBA Database:", opts.veba_database, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/biosynthetic.py b/src/biosynthetic.py index 5c1cb77..9996c68 100755 --- a/src/biosynthetic.py +++ b/src/biosynthetic.py @@ -13,7 +13,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" +__version__ = "2023.12.18" # antiSMASH def get_antismash_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): @@ -336,7 +336,7 @@ def get_mmseqs2_protein_cmd(input_filepaths, output_filepaths, output_directory, "&&", - os.environ["mmseqs2_wrapper.py"], + os.environ["clustering_wrapper.py"], "--fasta {}".format(os.path.join(directories["tmp"], "components.concatenated.faa")), "--output_directory {}".format(output_directory), "--no_singletons" if 
bool(opts.no_singletons) else "", @@ -415,7 +415,7 @@ def get_mmseqs2_nucleotide_cmd(input_filepaths, output_filepaths, output_directo "&&", - os.environ["mmseqs2_wrapper.py"], + os.environ["clustering_wrapper.py"], "--fasta {}".format(os.path.join(directories["tmp"], "bgcs.concatenated.fasta")), "--output_directory {}".format(output_directory), "--no_singletons" if bool(opts.no_singletons) else "", @@ -483,7 +483,7 @@ def add_executables_to_environment(opts): "concatenate_dataframes.py", "bgc_novelty_scorer.py", "compile_krona.py", - "mmseqs2_wrapper.py", + "clustering_wrapper.py", "compile_protein_cluster_prevalence_table.py", } @@ -860,7 +860,7 @@ def main(args=None): # antiSMASH parser_antismash = parser.add_argument_group('antiSMASH arguments') parser_antismash.add_argument("-t", "--taxon", type=str, default="bacteria", help="Taxonomic classification of input sequence {bacteria,fungi} [Default: bacteria]") - parser_antismash.add_argument("--minimum_contig_length", type=int, default=1500, help="Minimum contig length. [Default: 1500] ") + parser_antismash.add_argument("--minimum_contig_length", type=int, default=1, help="Minimum contig length. [Default: 1] ") parser_antismash.add_argument("-d", "--antismash_database", type=str, default=os.path.join(site.getsitepackages()[0], "antismash", "databases"), help="antiSMASH | Database directory path [Default: {}]".format(os.path.join(site.getsitepackages()[0], "antismash", "databases"))) parser_antismash.add_argument("-s", "--hmmdetection_strictness", type=str, default="relaxed", help="antiSMASH | Defines which level of strictness to use for HMM-based cluster detection {strict,relaxed,loose} [Default: relaxed] ") parser_antismash.add_argument("--tta_threshold", type=float, default=0.65, help="antiSMASH | Lowest GC content to annotate TTA codons at [Default: 0.65]") @@ -881,7 +881,7 @@ def main(args=None): # MMSEQS2 parser_mmseqs2 = parser.add_argument_group('MMSEQS2 arguments') - parser_mmseqs2.add_argument("-a", "--algorithm", type=str, default="easy-cluster", help="MMSEQS2 | {easy-cluster, easy-linclust} [Default: easy-cluster]") + parser_mmseqs2.add_argument("-a", "--algorithm", type=str, default="mmseqs-cluster", choices={"mmseqs-cluster", "mmseqs-linclust"}, help="MMSEQS2 | {mmseqs-cluster, mmseqs-linclust} [Default: mmseqs-cluster]") parser_mmseqs2.add_argument("-f","--representative_output_format", type=str, default="fasta", help = "Format of output for representative sequences: {table, fasta} [Default: fasta]") # Should fasta be the new default? 
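The biosynthetic.py hunks above swap `mmseqs2_wrapper.py` for `clustering_wrapper.py` and rename the MMSEQS2 algorithms to `mmseqs-cluster`/`mmseqs-linclust`. For orientation, here is a hedged sketch of the wrapper call as it is composed in `get_mmseqs2_protein_cmd` above, restricted to the flags actually visible in this diff; the input fasta and output directory are placeholders:

    # Sketch of the clustering_wrapper.py invocation composed in get_mmseqs2_protein_cmd;
    # only flags shown in this diff are used, and both paths are placeholders.
    clustering_wrapper.py \
        --fasta components.concatenated.faa \
        --output_directory veba_output/cluster \
        --no_singletons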
@@ -943,6 +943,7 @@ def main(args=None): print("Script version:", __version__, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/classify-eukaryotic.py b/src/classify-eukaryotic.py index 216c26c..a9bb93d 100755 --- a/src/classify-eukaryotic.py +++ b/src/classify-eukaryotic.py @@ -14,7 +14,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" +__version__ = "2023.11.30" # Assembly def get_concatenate_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): @@ -160,7 +160,7 @@ def get_compile_cmd( input_filepaths, output_filepaths, output_directory, direct return cmd -def get_consensus_genome_classification_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): +def get_consensus_genome_classification_ranked_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): # Command cmd = [ @@ -172,7 +172,7 @@ def get_consensus_genome_classification_cmd( input_filepaths, output_filepaths, "|", "tail -n +2", "|", - os.environ["consensus_genome_classification.py"], + os.environ["consensus_genome_classification_ranked.py"], "--leniency {}".format(opts.leniency), "-o {}".format(output_filepaths[0]), "-r c__,o__,f__,g__,s__", @@ -224,7 +224,7 @@ def get_consensus_cluster_classification_cmd( input_filepaths, output_filepaths, "-n id_genome_cluster", "-i 0", "|", - os.environ["consensus_genome_classification.py"], + os.environ["consensus_genome_classification_ranked.py"], "--leniency {}".format(opts.leniency), "-o {}".format(output_filepaths[0]), "-r c__,o__,f__,g__,s__", @@ -252,7 +252,7 @@ def add_executables_to_environment(opts): "filter_hmmsearch_results.py", "subset_table.py", "compile_eukaryotic_classifications.py", - "consensus_genome_classification.py", + "consensus_genome_classification_ranked.py", "insert_column_to_table.py", "metaeuk_wrapper.py", "scaffolds_to_bins.py", @@ -481,7 +481,7 @@ def create_pipeline(opts, directories, f_cmds): # ========== step += 1 - program = "consensus_genome_classification" + program = "consensus_genome_classification_ranked" program_label = "{}__{}".format(step, program) # Add to directories output_directory = directories["output"]# = create_directory(os.path.join(directories["intermediate"], program_label)) @@ -504,7 +504,7 @@ def create_pipeline(opts, directories, f_cmds): "directories":directories, } - cmd = get_consensus_genome_classification_cmd(**params) + cmd = get_consensus_genome_classification_ranked_cmd(**params) pipeline.add_step( id=program, @@ -698,6 +698,7 @@ def main(args=None): print("VEBA Database:", opts.veba_database, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/classify-prokaryotic.py b/src/classify-prokaryotic.py index b5abb15..e6f1d47 100755 --- a/src/classify-prokaryotic.py +++ b/src/classify-prokaryotic.py @@ -15,7 +15,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = 
os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" +__version__ = "2023.11.30" # GTDB-Tk def get_gtdbtk_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): @@ -138,7 +138,7 @@ def get_consensus_cluster_classification_cmd( input_filepaths, output_filepaths, "-i {}".format(input_filepaths[0]), "-c {}".format(input_filepaths[1]), "|", - os.environ["consensus_genome_classification.py"], + os.environ["consensus_genome_classification_ranked.py"], "--leniency {}".format(opts.leniency), "-o {}".format(output_filepaths[0]), "-u 'Unclassified prokaryote'", @@ -158,7 +158,7 @@ def add_executables_to_environment(opts): "compile_prokaryotic_genome_cluster_classification_scores_table.py", # "cut_table_by_column_labels.py", "concatenate_dataframes.py", - "consensus_genome_classification.py", + "consensus_genome_classification_ranked.py", # "insert_column_to_table.py", "compile_krona.py", @@ -443,6 +443,7 @@ def main(args=None): print("VEBA Database:", opts.veba_database, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/classify-viral.py b/src/classify-viral.py index ed0da0f..50cca6b 100755 --- a/src/classify-viral.py +++ b/src/classify-viral.py @@ -14,7 +14,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" +__version__ = "2023.11.30" def get_concatenate_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): @@ -359,6 +359,7 @@ def main(args=None): print("VEBA Database:", opts.veba_database, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/cluster.py b/src/cluster.py index 320ef00..e13263f 100755 --- a/src/cluster.py +++ b/src/cluster.py @@ -1,6 +1,6 @@ #!/usr/bin/env python from __future__ import print_function, division -import sys, os, argparse, glob +import sys, os, argparse, glob, warnings from collections import OrderedDict, defaultdict import pandas as pd @@ -13,7 +13,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.24" +__version__ = "2023.12.11" # Global clustering def get_global_clustering_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): @@ -26,18 +26,35 @@ def get_global_clustering_cmd( input_filepaths, output_filepaths, output_directo # "--no_singletons" if bool(opts.no_singletons) else "", "-p {}".format(opts.n_jobs), + "--genome_clustering_algorithm {}".format(opts.genome_clustering_algorithm), "--ani_threshold {}".format(opts.ani_threshold), "--genome_cluster_prefix {}".format(opts.genome_cluster_prefix) if bool(opts.genome_cluster_prefix) else "", "--genome_cluster_suffix {}".format(opts.genome_cluster_suffix) if bool(opts.genome_cluster_suffix) else "", "--genome_cluster_prefix_zfill {}".format(opts.genome_cluster_prefix_zfill) if bool(opts.genome_cluster_prefix_zfill) else "", + "--skani_target_ani {}".format(opts.skani_target_ani), + 
"--skani_minimum_af {}".format(opts.skani_minimum_af), + "--skani_no_confidence_interval" if opts.skani_no_confidence_interval else "", + + "--skani_nonviral_preset {}".format(opts.skani_nonviral_preset), + "--skani_nonviral_compression_factor {}".format(opts.skani_nonviral_compression_factor), + "--skani_nonviral_marker_kmer_compression_factor {}".format(opts.skani_nonviral_marker_kmer_compression_factor), + "--skani_nonviral_options {}".format(opts.skani_nonviral_options) if bool(opts.skani_nonviral_options) else "", + + "--skani_viral_preset {}".format(opts.skani_viral_preset), + "--skani_viral_compression_factor {}".format(opts.skani_viral_compression_factor), + "--skani_viral_marker_kmer_compression_factor {}".format(opts.skani_viral_marker_kmer_compression_factor), + "--skani_viral_options {}".format(opts.skani_viral_options) if bool(opts.skani_viral_options) else "", + "--fastani_options {}".format(opts.fastani_options) if bool(opts.fastani_options) else "", - "--algorithm {}".format(opts.algorithm), + + "--protein_clustering_algorithm {}".format(opts.protein_clustering_algorithm), "--minimum_identity_threshold {}".format(opts.minimum_identity_threshold), "--minimum_coverage_threshold {}".format(opts.minimum_coverage_threshold), "--protein_cluster_prefix {}".format(opts.protein_cluster_prefix) if bool(opts.protein_cluster_prefix) else "", "--protein_cluster_suffix {}".format(opts.protein_cluster_suffix) if bool(opts.protein_cluster_suffix) else "", "--protein_cluster_prefix_zfill {}".format(opts.protein_cluster_prefix_zfill) if bool(opts.protein_cluster_prefix_zfill) else "", "--mmseqs2_options {}".format(opts.mmseqs2_options) if bool(opts.mmseqs2_options) else "", + "--diamond_options {}".format(opts.diamond_options) if bool(opts.diamond_options) else "", "--minimum_core_prevalence {}".format(opts.minimum_core_prevalence), "&&", @@ -60,18 +77,36 @@ def get_local_clustering_cmd( input_filepaths, output_filepaths, output_director "-o {}".format(output_directory), # "--no_singletons" if bool(opts.no_singletons) else "", "-p {}".format(opts.n_jobs), + + "--genome_clustering_algorithm {}".format(opts.genome_clustering_algorithm), "--ani_threshold {}".format(opts.ani_threshold), "--genome_cluster_prefix {}".format(opts.genome_cluster_prefix) if bool(opts.genome_cluster_prefix) else "", "--genome_cluster_suffix {}".format(opts.genome_cluster_suffix) if bool(opts.genome_cluster_suffix) else "", "--genome_cluster_prefix_zfill {}".format(opts.genome_cluster_prefix_zfill) if bool(opts.genome_cluster_prefix_zfill) else "", + "--skani_target_ani {}".format(opts.skani_target_ani), + "--skani_minimum_af {}".format(opts.skani_minimum_af), + "--skani_no_confidence_interval" if opts.skani_no_confidence_interval else "", + + "--skani_nonviral_preset {}".format(opts.skani_nonviral_preset), + "--skani_nonviral_compression_factor {}".format(opts.skani_nonviral_compression_factor), + "--skani_nonviral_marker_kmer_compression_factor {}".format(opts.skani_nonviral_marker_kmer_compression_factor), + "--skani_nonviral_options {}".format(opts.skani_nonviral_options) if bool(opts.skani_nonviral_options) else "", + + "--skani_viral_preset {}".format(opts.skani_viral_preset), + "--skani_viral_compression_factor {}".format(opts.skani_viral_compression_factor), + "--skani_viral_marker_kmer_compression_factor {}".format(opts.skani_viral_marker_kmer_compression_factor), + "--skani_viral_options {}".format(opts.skani_viral_options) if bool(opts.skani_viral_options) else "", + "--fastani_options 
{}".format(opts.fastani_options) if bool(opts.fastani_options) else "", - "--algorithm {}".format(opts.algorithm), + + "--protein_clustering_algorithm {}".format(opts.protein_clustering_algorithm), "--minimum_identity_threshold {}".format(opts.minimum_identity_threshold), "--minimum_coverage_threshold {}".format(opts.minimum_coverage_threshold), "--protein_cluster_prefix {}".format(opts.protein_cluster_prefix) if bool(opts.protein_cluster_prefix) else "", "--protein_cluster_suffix {}".format(opts.protein_cluster_suffix) if bool(opts.protein_cluster_suffix) else "", "--protein_cluster_prefix_zfill {}".format(opts.protein_cluster_prefix_zfill) if bool(opts.protein_cluster_prefix_zfill) else "", "--mmseqs2_options {}".format(opts.mmseqs2_options) if bool(opts.mmseqs2_options) else "", + "--diamond_options {}".format(opts.diamond_options) if bool(opts.diamond_options) else "", "--minimum_core_prevalence {}".format(opts.minimum_core_prevalence), "&&", @@ -107,8 +142,10 @@ def add_executables_to_environment(opts): required_executables={ # 1 + "skani", "fastANI", "mmseqs", + "diamond", } | accessory_scripts if opts.path_config == "CONDA_PREFIX": @@ -142,6 +179,21 @@ def add_executables_to_environment(opts): # Pipeline def create_pipeline(opts, directories, f_cmds): + + # Genome clustering algorithm + GENOME_CLUSTERING_ALGORITHM = opts.genome_clustering_algorithm.lower() + if GENOME_CLUSTERING_ALGORITHM == "fastani": + GENOME_CLUSTERING_ALGORITHM = "FastANI" + if GENOME_CLUSTERING_ALGORITHM == "skani": + GENOME_CLUSTERING_ALGORITHM = "skani" + + # Protein clustering algorithm + PROTEIN_CLUSTERING_ALGORITHM = opts.protein_clustering_algorithm.split("-")[0].lower() + if PROTEIN_CLUSTERING_ALGORITHM == "mmseqs": + PROTEIN_CLUSTERING_ALGORITHM = PROTEIN_CLUSTERING_ALGORITHM.upper() + if PROTEIN_CLUSTERING_ALGORITHM == "diamond": + PROTEIN_CLUSTERING_ALGORITHM = PROTEIN_CLUSTERING_ALGORITHM.capitalize() + # ................................................................. # Primordial # ................................................................. 
@@ -159,7 +211,7 @@ def create_pipeline(opts, directories, f_cmds): output_directory = directories[("intermediate", program_label)] = create_directory(os.path.join(directories["intermediate"], program_label)) # Info - description = "Global clustering of genomes (FastANI) and proteins (MMSEQS2)" + description = "Global clustering of genomes ({}) and proteins ({})".format(GENOME_CLUSTERING_ALGORITHM, PROTEIN_CLUSTERING_ALGORITHM) # i/o input_filepaths = [opts.genomes_table] @@ -206,7 +258,7 @@ def create_pipeline(opts, directories, f_cmds): output_directory = directories[("intermediate", program_label)] = create_directory(os.path.join(directories["intermediate"], program_label)) # Info - description = "Local clustering of genomes (FastANI) and proteins (MMSEQS2)" + description = "Local clustering of genomes ({}) and proteins ({})".format(GENOME_CLUSTERING_ALGORITHM, PROTEIN_CLUSTERING_ALGORITHM) # i/o input_filepaths = [opts.genomes_table] @@ -245,8 +297,20 @@ def create_pipeline(opts, directories, f_cmds): # Configure parameters def configure_parameters(opts, directories): - assert_acceptable_arguments(opts.algorithm, {"easy-cluster", "easy-linclust"}) + + assert_acceptable_arguments(opts.protein_clustering_algorithm, {"easy-cluster", "easy-linclust", "mmseqs-cluster", "mmseqs-linclust", "diamond-cluster", "diamond-linclust"}) + if opts.protein_clustering_algorithm in {"easy-cluster", "easy-linclust"}: + d = {"easy-cluster":"mmseqs-cluster", "easy-linclust":"mmseqs-linclust"} + warnings.warn("\n\nPlease use `{}` instead of `{}` for MMSEQS2 clustering.".format(d[opts.protein_clustering_algorithm], opts.protein_clustering_algorithm)) + opts.protein_clustering_algorithm = d[opts.protein_clustering_algorithm] + if opts.skani_nonviral_preset.lower() == "none": + opts.skani_nonviral_preset = None + + if opts.skani_viral_preset.lower() == "none": + opts.skani_viral_preset = None + + assert 0 < opts.minimum_core_prevalence <= 1.0, "--minimum_core_prevalence must be a float between (0.0,1.0])" # Set environment variables add_executables_to_environment(opts=opts) @@ -257,7 +321,7 @@ def main(args=None): # Path info description = """ Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) - usage = "{} -i -o -A 95 -a easy-cluster".format(__program__) + usage = "{} -i -o -A 95 -a mmseqs-cluster".format(__program__) epilog = "Copyright 2021 Josh L. Espinoza (jespinoz@jcvi.org)" # Parser @@ -276,24 +340,45 @@ def main(args=None): parser_utility.add_argument("--restart_from_checkpoint", type=str, default=None, help = "Restart from a particular checkpoint [Default: None]") parser_utility.add_argument("-v", "--version", action='version', version="{} v{}".format(__program__, __version__)) - # FastANI + # ANI + parser_genome_clustering = parser.add_argument_group('Genome clustering arguments') + parser_genome_clustering.add_argument("-G", "--genome_clustering_algorithm", type=str, choices={"fastani", "skani"}, default="skani", help="Program to use for ANI calculations. `skani` is faster and more memory efficient. For v1.0.0 - v1.3.x behavior, use `fastani`. 
[Default: skani]") + parser_genome_clustering.add_argument("-A", "--ani_threshold", type=float, default=95.0, help="Species-level cluster (SLC) ANI threshold (Range (0.0, 100.0]) [Default: 95.0]") + parser_genome_clustering.add_argument("--genome_cluster_prefix", type=str, default="SLC-", help="Cluster prefix [Default: 'SLC-") + parser_genome_clustering.add_argument("--genome_cluster_suffix", type=str, default="", help="Cluster suffix [Default: '") + parser_genome_clustering.add_argument("--genome_cluster_prefix_zfill", type=int, default=0, help="Cluster prefix zfill. Use 7 to match identifiers from OrthoFinder. Use 0 to add no zfill. [Default: 0]") #7 + + parser_skani = parser.add_argument_group('Skani triangle arguments') + parser_skani.add_argument("--skani_target_ani", type=float, default=80, help="skani | If you set --skani_target_ani to --ani_threshold, you may screen out genomes ANI ≥ --ani_threshold [Default: 80]") + parser_skani.add_argument("--skani_minimum_af", type=float, default=15, help="skani | Minimum aligned fraction greater than this value [Default: 15]") + parser_skani.add_argument("--skani_no_confidence_interval", action="store_true", help="skani | Output [5,95] ANI confidence intervals using percentile bootstrap on the putative ANI distribution") + # parser_skani.add_argument("--skani_low_memory", action="store_true", help="Skani | More options (e.g. --arg 1 ) https://github.com/bluenote-1577/skani [Default: '']") + + parser_skani = parser.add_argument_group('[Prokaryotic & Eukaryotic] Skani triangle arguments') + parser_skani.add_argument("--skani_nonviral_preset", type=str, default="medium", choices={"fast", "medium", "slow", "none"}, help="skani [Prokaryotic & Eukaryotic]| Use `none` if you are setting skani -c (compression factor) {fast, medium, slow, none} [Default: medium]") + parser_skani.add_argument("--skani_nonviral_compression_factor", type=int, default=125, help="skani [Prokaryotic & Eukaryotic]| Compression factor (k-mer subsampling rate). [Default: 125]") + parser_skani.add_argument("--skani_nonviral_marker_kmer_compression_factor", type=int, default=1000, help="skani [Prokaryotic & Eukaryotic] | Marker k-mer compression factor. Markers are used for filtering. [Default: 1000]") + parser_skani.add_argument("--skani_nonviral_options", type=str, default="", help="skani [Prokaryotic & Eukaryotic] | More options for `skani triangle` (e.g. --arg 1 ) [Default: '']") + + parser_skani = parser.add_argument_group('[Viral] Skani triangle arguments') + parser_skani.add_argument("--skani_viral_preset", type=str, default="slow", choices={"fast", "medium", "slow", "none"}, help="skani | Use `none` if you are setting skani -c (compression factor) {fast, medium, slow, none} [Default: slow]") + parser_skani.add_argument("--skani_viral_compression_factor", type=int, default=30, help="skani [Viral] | Compression factor (k-mer subsampling rate). [Default: 30]") + parser_skani.add_argument("--skani_viral_marker_kmer_compression_factor", type=int, default=200, help="skani [Viral] | Marker k-mer compression factor. Markers are used for filtering. Consider decreasing to ~200-300 if working with small genomes (e.g. plasmids or viruses). [Default: 200]") + parser_skani.add_argument("--skani_viral_options", type=str, default="", help="skani [Viral] | More options for `skani triangle` (e.g. 
--arg 1 ) [Default: '']") + parser_fastani = parser.add_argument_group('FastANI arguments') - parser_fastani.add_argument("-A", "--ani_threshold", type=float, default=95.0, help="FastANI | Species-level cluster (SLC) ANI threshold (Range (0.0, 100.0]) [Default: 95.0]") - parser_fastani.add_argument("--genome_cluster_prefix", type=str, default="SLC-", help="Cluster prefix [Default: 'SLC-") - parser_fastani.add_argument("--genome_cluster_suffix", type=str, default="", help="Cluster suffix [Default: '") - parser_fastani.add_argument("--genome_cluster_prefix_zfill", type=int, default=0, help="Cluster prefix zfill. Use 7 to match identifiers from OrthoFinder. Use 0 to add no zfill. [Default: 0]") #7 parser_fastani.add_argument("--fastani_options", type=str, default="", help="FastANI | More options (e.g. --arg 1 ) [Default: '']") - - # MMSEQS2 - parser_mmseqs2 = parser.add_argument_group('MMSEQS2 arguments') - parser_mmseqs2.add_argument("-a", "--algorithm", type=str, default="easy-cluster", help="MMSEQS2 | {easy-cluster, easy-linclust} [Default: easy-cluster]") - parser_mmseqs2.add_argument("-t", "--minimum_identity_threshold", type=float, default=50.0, help="MMSEQS2 | SLC-Specific Protein Cluster (SSPC, previously referred to as SSO) percent identity threshold (Range (0.0, 100.0]) [Default: 50.0]") - parser_mmseqs2.add_argument("-c", "--minimum_coverage_threshold", type=float, default=0.8, help="MMSEQS2 | SSPC coverage threshold (Range (0.0, 1.0]) [Default: 0.8]") - parser_mmseqs2.add_argument("--protein_cluster_prefix", type=str, default="SSPC-", help="Cluster prefix [Default: 'SSPC-") - parser_mmseqs2.add_argument("--protein_cluster_suffix", type=str, default="", help="Cluster suffix [Default: '") - parser_mmseqs2.add_argument("--protein_cluster_prefix_zfill", type=int, default=0, help="Cluster prefix zfill. Use 7 to match identifiers from OrthoFinder. Use 0 to add no zfill. [Default: 0]") #7 - parser_mmseqs2.add_argument("--mmseqs2_options", type=str, default="", help="MMSEQS2 | More options (e.g. --arg 1 ) [Default: '']") + # Clustering + parser_protein_clustering = parser.add_argument_group('Protein clustering arguments') + parser_protein_clustering.add_argument("-P", "--protein_clustering_algorithm", type=str, choices={"mmseqs-cluster", "mmseqs-linclust", "diamond-cluster", "diamond-linclust"}, default="mmseqs-cluster", help="Clustering algorithm | Diamond can only be used for clustering proteins {mmseqs-cluster, mmseqs-linclust, diamond-cluster, mmseqs-linclust} [Default: mmseqs-cluster]") + parser_protein_clustering.add_argument("-t", "--minimum_identity_threshold", type=float, default=50.0, help="Clustering | Percent identity threshold (Range (0.0, 100.0]) [Default: 50.0]") + parser_protein_clustering.add_argument("-c", "--minimum_coverage_threshold", type=float, default=0.8, help="Clustering | Coverage threshold (Range (0.0, 1.0]) [Default: 0.8]") + parser_protein_clustering.add_argument("--protein_cluster_prefix", type=str, default="SSPC-", help="Cluster prefix [Default: 'SSPC-") + parser_protein_clustering.add_argument("--protein_cluster_suffix", type=str, default="", help="Cluster suffix [Default: '") + parser_protein_clustering.add_argument("--protein_cluster_prefix_zfill", type=int, default=0, help="Cluster prefix zfill. Use 7 to match identifiers from OrthoFinder. Use 0 to add no zfill. [Default: 0]") #7 + parser_protein_clustering.add_argument("--mmseqs2_options", type=str, default="", help="MMSEQS2 | More options (e.g. 
--arg 1 ) [Default: '']") + parser_protein_clustering.add_argument("--diamond_options", type=str, default="", help="Diamond | More options (e.g. --arg 1 ) [Default: '']") # Pangenome parser_pangenome = parser.add_argument_group('Pangenome arguments') @@ -329,6 +414,7 @@ def main(args=None): print("Script version:", __version__, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/coverage-long.py b/src/coverage-long.py new file mode 100755 index 0000000..d282754 --- /dev/null +++ b/src/coverage-long.py @@ -0,0 +1,587 @@ +#!/usr/bin/env python +from __future__ import print_function, division +import sys, os, argparse, glob +from collections import OrderedDict, defaultdict + +import pandas as pd + +# Soothsayer Ecosystem +from genopype import * +from genopype import __version__ as genopype_version +from soothsayer_utils import * + +pd.options.display.max_colwidth = 100 +# from tqdm import tqdm +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.12.18" + +# Assembly +def get_index_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): + + # Command + cmd = [ + # Filtering out small contigs + "cat", + opts.fasta, + "|", + os.environ["seqkit"], + "seq", + "-m {}".format(opts.minimum_contig_length), + "-j {}".format(opts.n_jobs), + opts.seqkit_seq_options, + ">", + output_filepaths[0], + + # Create SAF file + "&&", + os.environ["fasta_to_saf.py"], + "-i {}".format(output_filepaths[0]), + ">", + output_filepaths[1], + + "&&", + + # Minimap2 Index + os.environ["minimap2"], + "-t {}".format(opts.n_jobs), + # "--seed {}".format(opts.random_state), + opts.minimap2_index_options, + "-d {}".format(output_filepaths[3]), # Index + output_filepaths[0], # Reference + + # Get stats for reference + "&&", + os.environ["seqkit"], + "stats", + "-a", + "-j {}".format(opts.n_jobs), + "-T", + "-b", + output_filepaths[0], + ">", + output_filepaths[2], + ] + + return cmd + + +# # Bowtie2 +# def get_alignment_gnuparallel_cmd(input_filepaths, output_filepaths, output_directory, directories, opts): + +# # Command +# cmd = [ + +# # MAKE THIS A FOR LOOP WITH MAX THREADS FOR EACH ONE. 
THE REASON FOR THIS IS THAT IF THERE IS A SMALL SAMPLE IT WILL BE DONE QUICK BUT THE LARGER SAMPLES ARE GOING TO BE STUCK WITH ONE THREAD STILL +# """ +# # Clear temporary directory just in case + +# rm -rf %s + +# # Minimap2 +# %s --jobs %d -a %s -C "\t" "mkdir -p %s && %s -x %s -1 {2} -2 {3} --threads 1 --seed %d --no-unal %s | %s sort --threads 1 --reference %s -T %s > %s && %s index -@ 1 %s" + +# """%( +# os.path.join(directories["tmp"], "*"), + +# # Parallel +# os.environ["parallel"], +# opts.n_jobs, +# input_filepaths[0], + +# # Make directory +# os.path.join(output_directory, "{1}"), + +# # Bowtie2 +# os.environ["minimap2"], +# input_filepaths[1], +# opts.random_state, +# opts.bowtie2_options, + +# # Samtools sort +# os.environ["samtools"], +# input_filepaths[0], +# os.path.join(directories["tmp"], "samtools_sort_{1}"), +# os.path.join(output_directory, "{1}", "mapped.sorted.bam"), + +# # Samtools index +# os.environ["samtools"], +# os.path.join(output_directory, "{1}", "mapped.sorted.bam"), + +# ), + + +# ] + +# return cmd + +def get_alignment_cmd(input_filepaths, output_filepaths, output_directory, directories, opts): + + # Command + cmd = [ + +""" + # Clear temporary directory just in case +rm -rf %s + +# Read lines +READ_TABLE=%s + +while IFS= read -r LINE +do echo $LINE + # Split fields + ID_SAMPLE=$(echo $LINE | cut -f1 -d " ") + READS=$(echo $LINE | cut -f2 -d " ") + + # Create subdirectory + mkdir -p %s + + OUTPUT_BAM="%s" + + # Minimap2 + if [[ -e "$OUTPUT_BAM" && -s "$OUTPUT_BAM" ]]; then + echo "[Skipping (Exists)] [Minimap2] [$ID_SAMPLE]" + else + echo "[Running] [Minimap2] [$ID_SAMPLE]" + %s -a -x %s -t %d %s %s $READS | %s view -h -b -F 4 | %s sort -@ %d --reference %s -T %s > $OUTPUT_BAM && %s index -@ %d $OUTPUT_BAM + fi +done < $READ_TABLE + +"""%( + # Clear temporary directory just in case + os.path.join(directories["tmp"], "*"), + + # Read lines + input_filepaths[0], + + # Make directory + os.path.join(output_directory, "${ID_SAMPLE}"), + + # Output BAM + os.path.join(output_directory, "${ID_SAMPLE}", "mapped.sorted.bam"), + + + # Bowtie2 + os.environ["minimap2"], + opts.minimap2_preset, + opts.n_jobs, + opts.minimap2_options, + input_filepaths[2], + + + # Samtools view + os.environ["samtools"], + + + # Samtools sort + os.environ["samtools"], + opts.n_jobs, + input_filepaths[1], + os.path.join(directories["tmp"], "samtools_sort_${ID_SAMPLE}"), + # os.path.join(output_directory, "${ID_SAMPLE}", "mapped.sorted.bam"), + + # Samtools index + os.environ["samtools"], + opts.n_jobs, + # os.path.join(output_directory, "${ID_SAMPLE}", "mapped.sorted.bam"), + ), + + ] + + return cmd + + +# featureCounts +def get_featurecounts_cmd(input_filepaths, output_filepaths, output_directory, directories, opts): + + # Command + + # ORF-Level Counts + cmd = [ + "mkdir -p {}".format(os.path.join(directories["tmp"], "featurecounts")), + "&&", + "(", + os.environ["featureCounts"], + # "-G {}".format(input_filepaths[0]), + "-a {}".format(input_filepaths[0]), + "-o {}".format(os.path.join(output_directory, "featurecounts.tsv")), + "-F SAF", + "--tmpDir {}".format(os.path.join(directories["tmp"], "featurecounts")), + "-T {}".format(opts.n_jobs), + "-L", + opts.featurecounts_options, + *input_filepaths[1:], + ")", + "&&", + "gzip -f {}".format(os.path.join(output_directory, "featurecounts.tsv")), + ] + return cmd + + + +# Symlink +def get_symlink_cmd(input_filepaths, output_filepaths, output_directory, directories, opts): + # Command + cmd = [ + "DST={}; (for SRC in {}; do SRC=$(realpath 
--relative-to $DST $SRC); ln -sf $SRC $DST; done)".format( + output_directory, + " ".join(input_filepaths), + ) + ] + return cmd + +# ============ +# Run Pipeline +# ============ +# Set environment variables +def add_executables_to_environment(opts): + """ + Adapted from Soothsayer: https://github.com/jolespin/soothsayer + """ + accessory_scripts = { + "fasta_to_saf.py" + } + + required_executables={ + "minimap2", + "samtools", + "featureCounts", + "seqkit", + # "parallel", + } | accessory_scripts + + if opts.path_config == "CONDA_PREFIX": + executables = dict() + for name in required_executables: + executables[name] = os.path.join(os.environ["CONDA_PREFIX"], "bin", name) + else: + if opts.path_config is None: + opts.path_config = os.path.join(opts.script_directory, "veba_config.tsv") + opts.path_config = format_path(opts.path_config) + assert os.path.exists(opts.path_config), "config file does not exist. Have you created one in the following directory?\n{}\nIf not, either create one, check this filepath:{}, or give the path to a proper config file using --path_config".format(opts.script_directory, opts.path_config) + assert os.stat(opts.path_config).st_size > 1, "config file seems to be empty. Please add 'name' and 'executable' columns for the following program names: {}".format(required_executables) + df_config = pd.read_csv(opts.path_config, sep="\t") + assert {"name", "executable"} <= set(df_config.columns), "config must have `name` and `executable` columns. Please adjust file: {}".format(opts.path_config) + df_config = df_config.loc[:,["name", "executable"]].dropna(how="any", axis=0).applymap(str) + # Get executable paths + executables = OrderedDict(zip(df_config["name"], df_config["executable"])) + assert required_executables <= set(list(executables.keys())), "config must have the required executables for this run. Please adjust file: {}\nIn particular, add info for the following: {}".format(opts.path_config, required_executables - set(list(executables.keys()))) + + # Display + for name in sorted(accessory_scripts): + executables[name] = "'{}'".format(os.path.join(opts.script_directory, "scripts", name)) # Can handle spaces in path + print(format_header( "Adding executables to path from the following source: {}".format(opts.path_config), "-"), file=sys.stdout) + for name, executable in executables.items(): + if name in required_executables: + print(name, executable, sep = " --> ", file=sys.stdout) + os.environ[name] = executable.strip() + print("", file=sys.stdout) + +# Pipeline +def create_pipeline(opts, directories, f_cmds): + + # ................................................................. + # Primordial + # ................................................................. 
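The `get_symlink_cmd` above shells out to `realpath --relative-to` so that the links placed in the output directory stay valid if the project directory is moved. A pure-Python sketch of the equivalent logic, with hypothetical paths:

```python
# Pure-Python sketch of the relative-symlink step (the module itself emits
# `realpath --relative-to` shell commands instead). Paths are hypothetical.
import os

def symlink_relative(source_paths, output_directory):
    os.makedirs(output_directory, exist_ok=True)
    for src in source_paths:
        relative_src = os.path.relpath(src, start=output_directory)
        dst = os.path.join(output_directory, os.path.basename(src))
        if os.path.lexists(dst):
            os.remove(dst)
        os.symlink(relative_src, dst)  # relative link survives moving the project root

symlink_relative(
    ["veba_output/coverage/intermediate/3__featurecounts/featurecounts.tsv.gz"],
    "veba_output/coverage/output",
)
```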
+ # Commands file + pipeline = ExecutablePipeline(name=__program__, description="Coverage", f_cmds=f_cmds, checkpoint_directory=directories["checkpoints"], log_directory=directories["log"]) + + # ========== + # Assembly + # ========== + + step = 1 + + # Info + program = "index" + program_label = "{}__{}".format(step, program) + description = "Preprocess fasta file and build Bowtie2 index" + + # Add to directories + output_directory = directories[("intermediate", program_label)] = create_directory(os.path.join(directories["intermediate"], program_label)) + + + # i/o + input_filepaths = [opts.fasta] + output_filenames = ["reference.fasta", "reference.fasta.saf", "seqkit_stats.tsv", "reference.mmi"] + + + output_filepaths = list(map(lambda filename: os.path.join(output_directory, filename), output_filenames)) + + params = { + "input_filepaths":input_filepaths, + "output_filepaths":output_filepaths, + "output_directory":output_directory, + "opts":opts, + "directories":directories, + } + + cmd = get_index_cmd(**params) + pipeline.add_step( + id=program_label, + description = description, + step=step, + cmd=cmd, + input_filepaths = input_filepaths, + output_filepaths = output_filepaths, + validate_inputs=True, + validate_outputs=True, + log_prefix=program_label, + + ) + + # ========== + # Alignment + # ========== + + step = 2 + + # Info + program = "alignment" + program_label = "{}__{}".format(step, program) + description = "Aligning reads to reference" + + # Add to directories + output_directory = directories[("intermediate", program_label)] = create_directory(os.path.join(directories["intermediate"], program_label)) + + + # i/o + input_filepaths = [ + opts.reads, + os.path.join(directories[("intermediate", "1__index")], "reference.fasta"), + os.path.join(directories[("intermediate", "1__index")], "reference.mmi"), + ] + + + + output_filenames = ["*/mapped.sorted.bam"] + output_filepaths = list(map(lambda filename: os.path.join(output_directory, filename), output_filenames)) + + + params = { + "input_filepaths":input_filepaths, + "output_filepaths":output_filepaths, + "output_directory":output_directory, + "opts":opts, + "directories":directories, + } + + # if not opts.one_task_per_cpu: + cmd = get_alignment_cmd(**params) + # else: + # cmd = get_alignment_gnuparallel_cmd(**params) + pipeline.add_step( + id=program_label, + description = description, + step=step, + cmd=cmd, + input_filepaths = input_filepaths, + output_filepaths = output_filepaths, + validate_inputs=True, + validate_outputs=True, + log_prefix=program_label, + + ) + + # ========== + # featureCounts + # ========== + step = 3 + + # Info + program = "featurecounts" + program_label = "{}__{}".format(step, program) + description = "Counting reads" + + # Add to directories + output_directory = directories[("intermediate", program_label)] = create_directory(os.path.join(directories["intermediate"], program_label)) + + # i/o + + input_filepaths = [ + os.path.join(directories[("intermediate", "1__index")], "reference.fasta.saf"), + os.path.join(directories[("intermediate", "2__alignment")], "*", "mapped.sorted.bam"), + ] + + output_filenames = ["featurecounts.tsv.gz"] + output_filepaths = list(map(lambda filename: os.path.join(output_directory, filename), output_filenames)) + + params = { + "input_filepaths":input_filepaths, + "output_filepaths":output_filepaths, + "output_directory":output_directory, + "opts":opts, + "directories":directories, + } + + cmd = get_featurecounts_cmd(**params) + pipeline.add_step( + id=program_label, + 
description = description, + step=step, + cmd=cmd, + input_filepaths = input_filepaths, + output_filepaths = output_filepaths, + validate_inputs=True, + validate_outputs=True, + log_prefix=program_label, + + ) + + + + + # ============= + # Symlink + # ============= + step = 4 + + # Info + program = "symlink" + program_label = "{}__{}".format(step, program) + description = "Symlinking relevant output files" + + # Add to directories + output_directory = directories["output"] + + # i/o + + input_filepaths = [ + os.path.join(directories[("intermediate", "1__index")], "reference.fasta"), + os.path.join(directories[("intermediate", "1__index")], "reference.fasta.saf"), + os.path.join(directories[("intermediate", "1__index")], "seqkit_stats.tsv"), + os.path.join(directories[("intermediate", "2__alignment")], "*"), + os.path.join(directories[("intermediate", "3__featurecounts")], "featurecounts.tsv.gz"), + ] + + output_filenames = map(lambda fp: fp.split("/")[-1], input_filepaths) + output_filepaths = list(map(lambda fn:os.path.join(directories["output"], fn), output_filenames)) + + params = { + "input_filepaths":input_filepaths, + "output_filepaths":output_filepaths, + "output_directory":output_directory, + "opts":opts, + "directories":directories, + } + + cmd = get_symlink_cmd(**params) + pipeline.add_step( + id=program_label, + description = description, + step=step, + cmd=cmd, + input_filepaths = input_filepaths, + output_filepaths = output_filepaths, + validate_inputs=True, + validate_outputs=True, + log_prefix=program_label, + + ) + + return pipeline + +# Configure parameters +def configure_parameters(opts, directories): + # os.environ[] + + # assert not bool(opts.unpaired_reads), "Cannot have --unpaired_reads if --forward_reads. Note, this behavior may be changed in the future but it's an adaptation of interleaved reads." + df = pd.read_csv(opts.reads, sep="\t", header=None) + n, m = df.shape + assert m == 2, "--reads must be a 2 column table seperated by tabs and no header. Currently there are {} columns".format(m) + # Set environment variables + add_executables_to_environment(opts=opts) + +def main(args=None): + # Path info + script_directory = os.path.dirname(os.path.abspath( __file__ )) + script_filename = __program__ + # Path info + description = """ + Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -f -r -o ".format(__program__) + epilog = "Copyright 2021 Josh L. Espinoza (jespinoz@jcvi.org)" + + # Parser + parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) + # Pipeline + parser_io = parser.add_argument_group('Required I/O arguments') + parser_io.add_argument("-f","--fasta", type=str, required=True, help = "path/to/reference.fasta. Recommended usage is for merging unbinned contigs. 
[Required]") + parser_io.add_argument("-r","--reads", type=str, required = True, help = "path/to/reads_table.tsv with the following format: [id_sample][path/to/reads.fastq.gz], No header") + parser_io.add_argument("-o","--output_directory", type=str, default="veba_output/assembly/multisample", help = "path/to/project_directory [Default: veba_output/assembly/multisample]") + + # Utility + parser_utility = parser.add_argument_group('Utility arguments') + parser_utility.add_argument("--path_config", type=str, default="CONDA_PREFIX", help="path/to/config.tsv [Default: CONDA_PREFIX]") #site-packges in future + parser_utility.add_argument("-p", "--n_jobs", type=int, default=1, help = "Number of threads [Default: 1]") + parser_utility.add_argument("--random_state", type=int, default=0, help = "Random state [Default: 0]") + parser_utility.add_argument("--restart_from_checkpoint", type=str, default=None, help = "Restart from a particular checkpoint [Default: None]") + parser_utility.add_argument("-v", "--version", action='version', version="{} v{}".format(__program__, __version__)) + parser_utility.add_argument("--tmpdir", type=str, help="Set temporary directory") #site-packges in future + + # Aligner + parser_seqkit = parser.add_argument_group('SeqKit seq arguments') + parser_seqkit.add_argument("-m", "--minimum_contig_length", type=int, default=1, help="seqkit seq | Minimum contig length [Default: 1]") + parser_seqkit.add_argument("--seqkit_seq_options", type=str, default="", help="seqkit seq | More options (e.g. --arg 1 ) [Default: '']") + + + # Aligner + parser_aligner = parser.add_argument_group('Minmap2 arguments') + parser_aligner.add_argument("--minimap2_preset", type=str, default="map-ont", help="MiniMap2 | MiniMap2 preset {map-pb, map-ont, map-hifi} [Default: map-ont]") + parser_aligner.add_argument("--minimap2_index_options", type=str, default="", help="Minimap2 | More options (e.g. --arg 1 ) [Default: '']") + # parser_aligner.add_argument("--one_task_per_cpu", action="store_true", help="Use GNU parallel to run GNU parallel with 1 task per CPU. Useful if all samples are roughly the same size but inefficient if depth varies.") + parser_aligner.add_argument("--minimap2_options", type=str, default="", help="Minimap2 | More options (e.g. --arg 1 ) [Default: '']") + + # featureCounts + parser_featurecounts = parser.add_argument_group('featureCounts arguments') + parser_featurecounts.add_argument("--featurecounts_options", type=str, default="", help="featureCounts | More options (e.g. --arg 1 ) [Default: ''] | http://bioinf.wehi.edu.au/featureCounts/") + + + # Options + opts = parser.parse_args() + opts.script_directory = script_directory + opts.script_filename = script_filename + + # Threads + if opts.n_jobs == -1: + from multiprocessing import cpu_count + opts.n_jobs = cpu_count() + assert opts.n_jobs >= 1, "--n_jobs must be ≥ 1. To select all available threads, use -1." 
+ + # Directories + directories = dict() + directories["project"] = create_directory(opts.output_directory) + directories["output"] = create_directory(os.path.join(directories["project"], "output")) + directories["log"] = create_directory(os.path.join(directories["project"], "log")) + if not opts.tmpdir: + opts.tmpdir = os.path.join(directories["project"], "tmp") + directories["tmp"] = create_directory(opts.tmpdir) + directories["checkpoints"] = create_directory(os.path.join(directories["project"], "checkpoints")) + directories["intermediate"] = create_directory(os.path.join(directories["project"], "intermediate")) + os.environ["TMPDIR"] = directories["tmp"] + + # Info + print(format_header(__program__, "="), file=sys.stdout) + print(format_header("Configuration:", "-"), file=sys.stdout) + print("Python version:", sys.version.replace("\n"," "), file=sys.stdout) + print("Python path:", sys.executable, file=sys.stdout) #sys.path[2] + print("GenoPype version:", genopype_version, file=sys.stdout) #sys.path[2] + print("Script version:", __version__, file=sys.stdout) + print("Moment:", get_timestamp(), file=sys.stdout) + print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) + print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) + configure_parameters(opts, directories) + sys.stdout.flush() + + # Run pipeline + with open(os.path.join(directories["project"], "commands.sh"), "w") as f_cmds: + pipeline = create_pipeline( + opts=opts, + directories=directories, + f_cmds=f_cmds, + ) + pipeline.compile() + pipeline.execute(restart_from_checkpoint=opts.restart_from_checkpoint) + +if __name__ == "__main__": + main() diff --git a/src/coverage.py b/src/coverage.py index 77c0131..b7b331f 100755 --- a/src/coverage.py +++ b/src/coverage.py @@ -13,7 +13,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" +__version__ = "2023.11.30" # ............................................................................. # Notes @@ -525,7 +525,7 @@ def main(args=None): # Aligner parser_seqkit = parser.add_argument_group('SeqKit seq arguments') - parser_seqkit.add_argument("-m", "--minimum_contig_length", type=int, default=1500, help="seqkit seq | Minimum contig length [Default: 1500]") + parser_seqkit.add_argument("-m", "--minimum_contig_length", type=int, default=1, help="seqkit seq | Minimum contig length [Default: 1]") parser_seqkit.add_argument("--seqkit_seq_options", type=str, default="", help="seqkit seq | More options (e.g. 
--arg 1 ) [Default: '']") @@ -572,6 +572,7 @@ def main(args=None): print("Script version:", __version__, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/deprecated/preprocess.py b/src/deprecated/preprocess.py new file mode 100755 index 0000000..73adac7 --- /dev/null +++ b/src/deprecated/preprocess.py @@ -0,0 +1,151 @@ +#!/usr/bin/env python +from __future__ import print_function, division +import sys, os, argparse, glob +from collections import OrderedDict + +import pandas as pd + +# Soothsayer Ecosystem +from genopype import * +from genopype import __version__ as genopype_version + +from soothsayer_utils import * +import fastq_preprocessor + + +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.11.28" + +# ============ +# Run Pipeline +# ============ +# Set environment variables +def add_executables_to_environment(opts): + """ + Adapted from Soothsayer: https://github.com/jolespin/soothsayer + """ + accessory_scripts = set([]) + + required_executables={ + "repair.sh", + "bbduk.sh", + "bowtie2", + "fastp", + "seqkit", + "fastq_preprocessor", + "minimap2", + "pigz", + "chopper", + } | accessory_scripts + + if opts.path_config == "CONDA_PREFIX": + executables = dict() + for name in required_executables: + executables[name] = os.path.join(os.environ["CONDA_PREFIX"], "bin", name) + else: + opts.path_config = format_path(opts.path_config) + assert os.path.exists(opts.path_config), "config file does not exist. Have you created one in the following directory?\n{}\nIf not, either create one, check this filepath:{}, or give the path to a proper config file using --path_config".format(opts.script_directory, opts.path_config) + assert os.stat(opts.path_config).st_size > 1, "config file seems to be empty. Please add 'name' and 'executable' columns for the following program names: {}".format(required_executables) + df_config = pd.read_csv(opts.path_config, sep="\t") + assert {"name", "executable"} <= set(df_config.columns), "config must have `name` and `executable` columns. Please adjust file: {}".format(opts.path_config) + df_config = df_config.loc[:,["name", "executable"]].dropna(how="any", axis=0).applymap(str) + # Get executable paths + executables = OrderedDict(zip(df_config["name"], df_config["executable"])) + assert required_executables <= set(list(executables.keys())), "config must have the required executables for this run. 
Please adjust file: {}\nIn particular, add info for the following: {}".format(opts.path_config, required_executables - set(list(executables.keys()))) + + # Display + for name in sorted(accessory_scripts): + executables[name] = "'{}'".format(os.path.join(opts.script_directory, "scripts", name)) # Can handle spaces in path + print(format_header( "Adding executables to path from the following source: {}".format(opts.path_config), "-"), file=sys.stdout) + for name, executable in executables.items(): + if name in required_executables: + print(name, executable, sep = " --> ", file=sys.stdout) + os.environ[name] = executable.strip() + print("", file=sys.stdout) + + +# Configure parameters +def configure_parameters(opts, directories): + + assert opts.forward_reads != opts.reverse_reads, "You probably mislabeled the input files because `r1` should not be the same as `r2`: {}".format(opts.forward_reads) + assert_acceptable_arguments(opts.retain_trimmed_reads, {0,1}) + assert_acceptable_arguments(opts.retain_decontaminated_reads, {0,1}) + + # Set environment variables + add_executables_to_environment(opts=opts) + +def main(args=None): + # Path info + script_directory = os.path.dirname(os.path.abspath( __file__ )) + script_filename = __program__ + # Path info + description = """ + Wrapper around github.com/jolespin/fastq_preprocessor + Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -1 -2 -n -o |Optional| -x -k ".format(__program__) + epilog = "Copyright 2022 Josh L. Espinoza (jespinoz@jcvi.org)" + + # Parser + parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) + # Pipeline + parser_io = parser.add_argument_group('Required I/O arguments') + parser_io.add_argument("-1","--forward_reads", type=str, help = "path/to/reads_1.fastq") + parser_io.add_argument("-2","--reverse_reads", type=str, help = "path/to/reads_2.fastq") + parser_io.add_argument("-n", "--name", type=str, help="Name of sample", required=True) + parser_io.add_argument("-o","--project_directory", type=str, default="veba_output/preprocess", help = "path/to/project_directory [Default: veba_output/preprocess]") + + # Utility + parser_utility = parser.add_argument_group('Utility arguments') + parser_utility.add_argument("--path_config", type=str, default="CONDA_PREFIX", help="path/to/config.tsv. Must have at least 2 columns [name, executable] [Default: CONDA_PREFIX]") #site-packges in future + parser_utility.add_argument("-p", "--n_jobs", type=int, default=1, help = "Number of threads [Default: 1]") + parser_utility.add_argument("--random_state", type=int, default=0, help = "Random state [Default: 0]") + parser_utility.add_argument("--restart_from_checkpoint", type=int, help = "Restart from a particular checkpoint") + parser_utility.add_argument("-v", "--version", action='version', version="{} v{}".format(__program__, __version__)) + + # Fastp + parser_fastp = parser.add_argument_group('Fastp arguments') + parser_fastp.add_argument("-m", "--minimum_read_length", type=int, default=75, help="Fastp | Minimum read length [Default: 75]") + parser_fastp.add_argument("-a", "--adapters", type=str, default="detect", help="Fastp | path/to/adapters.fasta [Default: detect]") + parser_fastp.add_argument("--fastp_options", type=str, default="", help="Fastp | More options (e.g. 
--arg 1 ) [Default: '']") + + # Bowtie + parser_bowtie2 = parser.add_argument_group('Bowtie2 arguments') + parser_bowtie2.add_argument("-x", "--contamination_index", type=str, help="Bowtie2 | path/to/contamination_index\n(e.g., Human T2T CHM13 v2 in $VEBA_DATABASE/Contamination/chm13v2.0/chm13v2.0)") + parser_bowtie2.add_argument("--retain_trimmed_reads", default=0, type=int, help = "Retain fastp trimmed fastq after decontamination. 0=No, 1=yes [Default: 0]") + parser_bowtie2.add_argument("--retain_contaminated_reads", default=0, type=int, help = "Retain contaminated fastq after decontamination. 0=No, 1=yes [Default: 0]") + parser_bowtie2.add_argument("--bowtie2_options", type=str, default="", help="Bowtie2 | More options (e.g. --arg 1 ) [Default: '']\nhttp://bowtie-bio.sourceforge.net/bowtie2/manual.shtml") + + # BBDuk + parser_bbduk = parser.add_argument_group('BBDuk arguments') + parser_bbduk.add_argument("-k","--kmer_database", type=str, help="BBDuk | path/to/kmer_database\n(e.g., Ribokmers in $VEBA_DATABASE/Contamination/kmers/ribokmers.fa.gz)") + parser_bbduk.add_argument("--kmer_size", type=int, default=31, help="BBDuk | k-mer size [Default: 31]") + parser_bbduk.add_argument("--retain_kmer_hits", default=0, type=int, help = "Retain reads that map to k-mer database. 0=No, 1=yes [Default: 0]") + parser_bbduk.add_argument("--retain_non_kmer_hits", default=0, type=int, help = "Retain reads that do not map to k-mer database. 0=No, 1=yes [Default: 0]") + parser_bbduk.add_argument("--bbduk_options", type=str, default="", help="BBDuk | More options (e.g., --arg 1) [Default: '']") + + # Options + opts = parser.parse_args() + # opts.script_directory = script_directory + # opts.script_filename = script_filename + + # Threads + if opts.n_jobs == -1: + from multiprocessing import cpu_count + opts.n_jobs = cpu_count() + assert opts.n_jobs >= 1, "--n_jobs must be ≥ 1. To select all available threads, use -1." 
+ + #Get arguments + args = list() + for k,v in opts.__dict__.items(): + if v is not None: + args += ["--{}".format(k), str(v)] + # args = flatten(map(lambda item: ("--{}".format(item[0]), item[1]), opts.__dict__.items())) + sys.argv = [sys.argv[0]] + args + + # Wrapper + fastq_preprocessor.main(args) + + + +if __name__ == "__main__": + main() diff --git a/src/index.py b/src/index.py index 8f532d4..5c10154 100755 --- a/src/index.py +++ b/src/index.py @@ -7,7 +7,7 @@ from soothsayer_utils import * __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" +__version__ = "2023.12.12" # ============== # Agostic commands @@ -22,11 +22,22 @@ def get_concatenate_fasta_cmd( input_filepaths, output_filepaths, output_directo "-i {}".format(input_filepaths[0]), "-o {}".format(output_directory), "-m {}".format(opts.minimum_contig_length), - "-x {}".format("fa.gz"), + "-x {}".format("fa.gz" if opts.reference_gzipped else "fa"), "-b reference", "-M {}".format(opts.mode), - + "&&", + + "cat", + os.path.join(output_directory, "reference.fa.gz" if opts.reference_gzipped else "reference.fa"), + "|", + os.environ["seqkit"], + "fx2tab", + "-i", + "-s", + "-n", + ">", + os.path.join(output_directory, "reference.id_to_hash.tsv"), ] return cmd @@ -51,22 +62,25 @@ def get_concatenate_gff_cmd( input_filepaths, output_filepaths, output_directory def get_bowtie2_local_cmd(input_filepaths, output_filepaths, output_directory, directories, opts): os.environ["TMPDIR"] = directories["tmp"] # Command + cmd = [ """ - +OUTPUT_DIRECTORY=%s +FASTA_FILENAME=%s for ID_SAMPLE in $(cut -f1 %s); - do %s --threads %d --seed %d %s/${ID_SAMPLE}/reference.fa.gz %s/${ID_SAMPLE}/reference.fa.gz + do %s --threads %d --seed %d ${OUTPUT_DIRECTORY}/${ID_SAMPLE}/${FASTA_FILENAME} ${OUTPUT_DIRECTORY}/${ID_SAMPLE}/${FASTA_FILENAME} done """%( + output_directory, + "reference.fa.gz" if opts.reference_gzipped else "reference.fa", opts.references, os.environ["bowtie2-build"], opts.n_jobs, opts.random_state, - output_directory, - output_directory, ), ] + return cmd # ============== @@ -115,10 +129,10 @@ def create_local_pipeline(opts, directories, f_cmds): ] output_filenames = [ - "*/reference.fa.gz", + "*/reference.fa.gz" if opts.reference_gzipped else "*/reference.fa", "*/reference.saf", - - ] + "*/reference.id_to_hash.tsv", + ] output_filepaths = list(map(lambda filename: os.path.join(output_directory, filename), output_filenames)) params = { @@ -207,8 +221,9 @@ def create_local_pipeline(opts, directories, f_cmds): # Info description = "Build mapping index" # i/o + input_filepaths = list( - map(lambda id_sample: os.path.join(directories["output"], id_sample, "reference.fa.gz"), + map(lambda id_sample: os.path.join(directories["output"], id_sample, "reference.fa.gz" if opts.reference_gzipped else "reference.fa"), opts.samples, ), ) @@ -273,8 +288,10 @@ def create_global_pipeline(opts, directories, f_cmds): ] output_filenames = [ - "reference.fa.gz", + "reference.fa.gz" if opts.reference_gzipped else "reference.fa", "reference.saf", + "reference.id_to_hash.tsv", + ] output_filepaths = list(map(lambda filename: os.path.join(output_directory, filename), output_filenames)) @@ -365,13 +382,22 @@ def create_global_pipeline(opts, directories, f_cmds): # Info description = "Build mapping index" # i/o - input_filepaths = [ - os.path.join(directories["output"], "reference.fa.gz"), - ] + if opts.reference_gzipped: + input_filepaths = [ + os.path.join(directories["output"], "reference.fa.gz"), + ] + + output_filenames = [ + 
"reference.fa.gz.*.bt2", + ] + else: + input_filepaths = [ + os.path.join(directories["output"], "reference.fa"), + ] - output_filenames = [ - "reference.fa.gz.*.bt2", - ] + output_filenames = [ + "reference.fa.*.bt2", + ] output_filepaths = list(map(lambda filename: os.path.join(output_directory, filename), output_filenames)) params = { @@ -417,7 +443,8 @@ def add_executables_to_environment(opts): required_executables = set([ - "bowtie2-build", + "seqkit", + "bowtie2-build", ])| accessory_scripts if opts.path_config == "CONDA_PREFIX": @@ -509,8 +536,9 @@ def main(args=None): parser_io.add_argument("-r","--references", type=str, required=True, help = "local mode: [id_sample][path/to/reference.fa] and global mode: [path/to/reference.fa]") parser_io.add_argument("-g","--gene_models", type=str, required=True, help = "local mode: [id_sample][path/to/reference.gff] and global mode: [path/to/reference.gff]") parser_io.add_argument("-o","--output_directory", type=str, default="veba_output/index", help = "path/to/project_directory [Default: veba_output/index]") - parser_io.add_argument("-m", "--minimum_contig_length", type=int, default=1500, help="Minimum contig length [Default: 1500]") + parser_io.add_argument("-m", "--minimum_contig_length", type=int, default=1, help="Minimum contig length [Default: 1]") parser_io.add_argument("-M", "--mode", type=str, default="infer", help="Concatenate all references with global and build index or build index for each reference {global, local, infer}") + parser_io.add_argument("-z", "--reference_gzipped",action="store_true", help="Gzip the reference to generate `reference.fa.gz` instead of `reference.fa`") # parser_io.add_argument("-c", "--copy_files", action="store_true", help="Copy files instead of symlinking. Only applies to global.") # Utility @@ -559,6 +587,7 @@ def main(args=None): print("Script version:", __version__, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/mapping.py b/src/mapping.py index 8db61cc..b06175c 100755 --- a/src/mapping.py +++ b/src/mapping.py @@ -13,7 +13,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" +__version__ = "2023.12.12" # Bowtie2 @@ -451,9 +451,12 @@ def configure_parameters(opts, directories): assert os.path.isdir(opts.reference_index), "If --reference_saf is not provided, then --reference_index must be provided as a directory containing a file 'reference.saf'" opts.reference_saf = os.path.join(opts.reference_index, "reference.saf") - # Check if --reference_index is a directory, if it is then set reference.fa.gz as the directory + # Check if --reference_index is a directory, if it is then set reference.fa as the directory if os.path.isdir(opts.reference_index): - opts.reference_index = os.path.join(opts.reference_index, "reference.fa.gz") + if opts.reference_gzipped: + opts.reference_index = os.path.join(opts.reference_index, "reference.fa.gz") + else: + opts.reference_index = os.path.join(opts.reference_index, "reference.fa") # If --reference_fasta isn't provided then set it to the --reference_index if opts.reference_fasta is None: @@ -491,10 +494,11 @@ def main(args=None): parser_io.add_argument("-o","--project_directory", type=str, 
default="veba_output/mapping", help = "path/to/project_directory [Default: veba_output/mapping]") parser_reference = parser.add_argument_group('Reference arguments') - parser_reference.add_argument("-x", "--reference_index",type=str, required=True, help="path/to/bowtie2_index. Either a file or directory. If directory, then it assumes the index is named `reference.fa.gz`") + parser_reference.add_argument("-x", "--reference_index",type=str, required=True, help="path/to/bowtie2_index. Either a file or directory. If directory, then it assumes the index is named `reference.fa`") parser_reference.add_argument("-r", "--reference_fasta", type=str, required=False, help = "path/to/reference.fasta. If not provided then it is set to the --reference_index" ) # ; or (2) a directory of fasta files [Must all have the same extension. Use `query_ext` argument] parser_reference.add_argument("-a", "--reference_gff",type=str, required=False, help="path/to/reference.gff. If not provided then --reference_index must be a directory that contains the file: 'reference.gff'") parser_reference.add_argument("-s", "--reference_saf",type=str, required=False, help="path/to/reference.saf. If not provided then --reference_index must be a directory that contains the file: 'reference.saf'") + parser_reference.add_argument("-z", "--reference_gzipped",action="store_true", help="If --reference_index directory, then it assumes the index is named `reference.fa.gz` instead of `reference.fa`") # parser_io.add_argument("-S","--scaffold_identifier_mapping", type=str, required=False, help = "path/to/scaffold_identifiers.tsv, Format: [id_scaffold][id_mag][id_cluster], No header") # parser_io.add_argument("-O","--orf_identifier_mapping", type=str, required=False, help = "path/to/scaffold_identifiers.tsv, Format: [id_scaffold][id_mag][id_cluster], No header") @@ -558,6 +562,7 @@ def main(args=None): print("Script version:", __version__, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/phylogeny.py b/src/phylogeny.py index 002cce6..0730b4e 100755 --- a/src/phylogeny.py +++ b/src/phylogeny.py @@ -14,7 +14,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.27" +__version__ = "2023.11.30" # Assembly def preprocess( input_filepaths, output_filepaths, output_directory, directories, opts): @@ -650,6 +650,7 @@ def main(args=None): print("Script version:", __version__, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/preprocess-long.py b/src/preprocess-long.py new file mode 100755 index 0000000..fe1e58a --- /dev/null +++ b/src/preprocess-long.py @@ -0,0 +1,21 @@ +#!/usr/bin/env python +from __future__ import print_function, division +import sys, os, argparse +from soothsayer_utils import format_header, read_script_as_module + +script_directory = os.path.dirname(os.path.abspath( __file__ )) + +try: + from fastq_preprocessor import fastq_preprocessor_long +except ImportError: + 
fastq_preprocessor_long = read_script_as_module("fastq_preprocessor_long", os.path.join(script_directory, "fastq_preprocessor_long.py")) + +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.11.29" + +if __name__ == "__main__": + print(format_header("VEBA Preprocessing Wrapper (fastq_preprocessor v{})".format(fastq_preprocessor_long.__version__)), file=sys.stderr) + label = "Mode: Long Nanopore and PacBio reads" + print(label, file=sys.stderr) + print(len(label)*"-", file=sys.stderr) + fastq_preprocessor_long.main(sys.argv[1:]) diff --git a/src/preprocess.py b/src/preprocess.py index d28ccc2..146b03c 100755 --- a/src/preprocess.py +++ b/src/preprocess.py @@ -1,148 +1,21 @@ #!/usr/bin/env python from __future__ import print_function, division -import sys, os, argparse, glob -from collections import OrderedDict - -import pandas as pd - -# Soothsayer Ecosystem -from genopype import * -from genopype import __version__ as genopype_version - -from soothsayer_utils import * -import fastq_preprocessor +import sys, os, argparse +from soothsayer_utils import format_header, read_script_as_module +script_directory = os.path.dirname(os.path.abspath( __file__ )) +try: + from fastq_preprocessor import fastq_preprocessor_short +except ImportError: + fastq_preprocessor_short = read_script_as_module("fastq_preprocessor_short", os.path.join(script_directory, "fastq_preprocessor_short.py")) + __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" - -# ============ -# Run Pipeline -# ============ -# Set environment variables -def add_executables_to_environment(opts): - """ - Adapted from Soothsayer: https://github.com/jolespin/soothsayer - """ - accessory_scripts = set([]) - - required_executables={ - "repair.sh", - "bbduk.sh", - "bowtie2", - "fastp", - "seqkit", - "fastq_preprocessor", - } | accessory_scripts - - if opts.path_config == "CONDA_PREFIX": - executables = dict() - for name in required_executables: - executables[name] = os.path.join(os.environ["CONDA_PREFIX"], "bin", name) - else: - opts.path_config = format_path(opts.path_config) - assert os.path.exists(opts.path_config), "config file does not exist. Have you created one in the following directory?\n{}\nIf not, either create one, check this filepath:{}, or give the path to a proper config file using --path_config".format(opts.script_directory, opts.path_config) - assert os.stat(opts.path_config).st_size > 1, "config file seems to be empty. Please add 'name' and 'executable' columns for the following program names: {}".format(required_executables) - df_config = pd.read_csv(opts.path_config, sep="\t") - assert {"name", "executable"} <= set(df_config.columns), "config must have `name` and `executable` columns. Please adjust file: {}".format(opts.path_config) - df_config = df_config.loc[:,["name", "executable"]].dropna(how="any", axis=0).applymap(str) - # Get executable paths - executables = OrderedDict(zip(df_config["name"], df_config["executable"])) - assert required_executables <= set(list(executables.keys())), "config must have the required executables for this run. 
Please adjust file: {}\nIn particular, add info for the following: {}".format(opts.path_config, required_executables - set(list(executables.keys()))) - - # Display - for name in sorted(accessory_scripts): - executables[name] = "'{}'".format(os.path.join(opts.script_directory, "scripts", name)) # Can handle spaces in path - print(format_header( "Adding executables to path from the following source: {}".format(opts.path_config), "-"), file=sys.stdout) - for name, executable in executables.items(): - if name in required_executables: - print(name, executable, sep = " --> ", file=sys.stdout) - os.environ[name] = executable.strip() - print("", file=sys.stdout) - - -# Configure parameters -def configure_parameters(opts, directories): - - assert opts.forward_reads != opts.reverse_reads, "You probably mislabeled the input files because `r1` should not be the same as `r2`: {}".format(opts.forward_reads) - assert_acceptable_arguments(opts.retain_trimmed_reads, {0,1}) - assert_acceptable_arguments(opts.retain_decontaminated_reads, {0,1}) - - # Set environment variables - add_executables_to_environment(opts=opts) - -def main(args=None): - # Path info - script_directory = os.path.dirname(os.path.abspath( __file__ )) - script_filename = __program__ - # Path info - description = """ - Wrapper around github.com/jolespin/fastq_preprocessor - Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) - usage = "{} -1 -2 -n -o |Optional| -x -k ".format(__program__) - epilog = "Copyright 2022 Josh L. Espinoza (jespinoz@jcvi.org)" - - # Parser - parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) - # Pipeline - parser_io = parser.add_argument_group('Required I/O arguments') - parser_io.add_argument("-1","--forward_reads", type=str, help = "path/to/reads_1.fastq") - parser_io.add_argument("-2","--reverse_reads", type=str, help = "path/to/reads_2.fastq") - parser_io.add_argument("-n", "--name", type=str, help="Name of sample", required=True) - parser_io.add_argument("-o","--project_directory", type=str, default="veba_output/preprocess", help = "path/to/project_directory [Default: veba_output/preprocess]") - - # Utility - parser_utility = parser.add_argument_group('Utility arguments') - parser_utility.add_argument("--path_config", type=str, default="CONDA_PREFIX", help="path/to/config.tsv. Must have at least 2 columns [name, executable] [Default: CONDA_PREFIX]") #site-packges in future - parser_utility.add_argument("-p", "--n_jobs", type=int, default=1, help = "Number of threads [Default: 1]") - parser_utility.add_argument("--random_state", type=int, default=0, help = "Random state [Default: 0]") - parser_utility.add_argument("--restart_from_checkpoint", type=int, help = "Restart from a particular checkpoint") - parser_utility.add_argument("-v", "--version", action='version', version="{} v{}".format(__program__, __version__)) - - # Fastp - parser_fastp = parser.add_argument_group('Fastp arguments') - parser_fastp.add_argument("-m", "--minimum_read_length", type=int, default=75, help="Fastp | Minimum read length [Default: 75]") - parser_fastp.add_argument("-a", "--adapters", type=str, default="detect", help="Fastp | path/to/adapters.fasta [Default: detect]") - parser_fastp.add_argument("--fastp_options", type=str, default="", help="Fastp | More options (e.g. 
--arg 1 ) [Default: '']") - - # Bowtie - parser_bowtie2 = parser.add_argument_group('Bowtie2 arguments') - parser_bowtie2.add_argument("-x", "--contamination_index", type=str, help="Bowtie2 | path/to/contamination_index\n(e.g., Human T2T CHM13 v2 in $VEBA_DATABASE/Contamination/chm13v2.0/chm13v2.0)") - parser_bowtie2.add_argument("--retain_trimmed_reads", default=0, type=int, help = "Retain fastp trimmed fastq after decontamination. 0=No, 1=yes [Default: 0]") - parser_bowtie2.add_argument("--retain_contaminated_reads", default=0, type=int, help = "Retain contaminated fastq after decontamination. 0=No, 1=yes [Default: 0]") - parser_bowtie2.add_argument("--bowtie2_options", type=str, default="", help="Bowtie2 | More options (e.g. --arg 1 ) [Default: '']\nhttp://bowtie-bio.sourceforge.net/bowtie2/manual.shtml") - - # BBDuk - parser_bbduk = parser.add_argument_group('BBDuk arguments') - parser_bbduk.add_argument("-k","--kmer_database", type=str, help="BBDuk | path/to/kmer_database\n(e.g., Ribokmers in $VEBA_DATABASE/Contamination/kmers/ribokmers.fa.gz)") - parser_bbduk.add_argument("--kmer_size", type=int, default=31, help="BBDuk | k-mer size [Default: 31]") - parser_bbduk.add_argument("--retain_kmer_hits", default=0, type=int, help = "Retain reads that map to k-mer database. 0=No, 1=yes [Default: 0]") - parser_bbduk.add_argument("--retain_non_kmer_hits", default=0, type=int, help = "Retain reads that do not map to k-mer database. 0=No, 1=yes [Default: 0]") - parser_bbduk.add_argument("--bbduk_options", type=str, default="", help="BBDuk | More options (e.g., --arg 1) [Default: '']") - - # Options - opts = parser.parse_args() - # opts.script_directory = script_directory - # opts.script_filename = script_filename - - # Threads - if opts.n_jobs == -1: - from multiprocessing import cpu_count - opts.n_jobs = cpu_count() - assert opts.n_jobs >= 1, "--n_jobs must be ≥ 1. To select all available threads, use -1." 
- - #Get arguments - args = list() - for k,v in opts.__dict__.items(): - if v is not None: - args += ["--{}".format(k), str(v)] - # args = flatten(map(lambda item: ("--{}".format(item[0]), item[1]), opts.__dict__.items())) - sys.argv = [sys.argv[0]] + args - - # Wrapper - fastq_preprocessor.main(args) - - +__version__ = "2023.11.29" if __name__ == "__main__": - main() + print(format_header("VEBA Preprocessing Wrapper (fastq_preprocessor v{})".format(fastq_preprocessor_short.__version__)), file=sys.stderr) + label = "Mode: Paired Illumina Reads" + print(label, file=sys.stderr) + print(len(label)*"-", file=sys.stderr) + fastq_preprocessor_short.main(sys.argv[1:]) \ No newline at end of file diff --git a/src/profile-pathway.py b/src/profile-pathway.py index 3f674f4..d84738a 100755 --- a/src/profile-pathway.py +++ b/src/profile-pathway.py @@ -13,7 +13,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" +__version__ = "2023.11.30" DIAMOND_DATABASE_SUFFIX = "_v201901b.dmnd" @@ -625,6 +625,7 @@ def main(args=None): print("Script version:", __version__, file=sys.stdout) print("Moment:", get_timestamp(), file=sys.stdout) print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) configure_parameters(opts, directories) sys.stdout.flush() diff --git a/src/profile-taxonomy.py b/src/profile-taxonomy.py new file mode 100755 index 0000000..2aa4db0 --- /dev/null +++ b/src/profile-taxonomy.py @@ -0,0 +1,357 @@ +#!/usr/bin/env python +from __future__ import print_function, division +import sys, os, argparse, glob, gzip +from collections import OrderedDict, defaultdict + +import pandas as pd + +# Soothsayer Ecosystem +from genopype import * +from genopype import __version__ as genopype_version +from soothsayer_utils import * + +pd.options.display.max_colwidth = 100 +# from tqdm import tqdm +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.12.19" + +# Preprocess reads +def get_sylph_sketch_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): + cmd = [ + os.environ["sylph"], + "sketch", + "-t {}".format(opts.n_jobs), + "-c {}".format(opts.sylph_sketch_subsampling_rate), + "-k {}".format(opts.sylph_sketch_k), + "--min-spacing {}".format(opts.sylph_sketch_minimum_spacing), + "-1 {}".format(opts.forward_reads), + "-2 {}".format(opts.reverse_reads), + "-d {}".format(output_directory), + + "&&", + + "mv", + "-v", + os.path.join(output_directory, "{}.paired.sylsp".format(os.path.split(opts.forward_reads)[1])), + os.path.join(output_directory, "reads.sylsp"), + ] + + return cmd + +def get_sylph_profile_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): + # Command + cmd = [ + os.environ["sylph"], + "profile", + "-t {}".format(opts.n_jobs), + "--minimum-ani {}".format(opts.sylph_profile_minimum_ani), + "--min-number-kmers {}".format(opts.sylph_profile_minimum_number_kmers), + "--min-count-correct {}".format(opts.sylph_profile_minimum_count_correct), + opts.sylph_profile_options, + " ".join(input_filepaths), + "|", + "gzip", + ">", + os.path.join(output_directory, "sylph_profile.tsv.gz"), + + "&&", + + os.environ["reformat_sylph_profile_single_sample_output.py"], + "-i {}".format(os.path.join(output_directory, "sylph_profile.tsv.gz")), + "-o {}".format(output_directory), + "-c {}".format(opts.genome_clusters) if 
opts.genome_clusters else "", + "-f Taxonomic_abundance", + "-x {}".format(opts.extension), + "--header" if opts.header else "", + ] + + return cmd + + + +# ============ +# Run Pipeline +# ============ +# Set environment variables +def add_executables_to_environment(opts): + """ + Adapted from Soothsayer: https://github.com/jolespin/soothsayer + """ + accessory_scripts = set([ + "reformat_sylph_profile_single_sample_output.py", + ] + ) + + required_executables={ + "sylph", + # "seqkit", + + } | accessory_scripts + + if opts.path_config == "CONDA_PREFIX": + executables = dict() + for name in required_executables: + executables[name] = os.path.join(os.environ["CONDA_PREFIX"], "bin", name) + else: + if opts.path_config is None: + opts.path_config = os.path.join(opts.script_directory, "veba_config.tsv") + opts.path_config = format_path(opts.path_config) + assert os.path.exists(opts.path_config), "config file does not exist. Have you created one in the following directory?\n{}\nIf not, either create one, check this filepath:{}, or give the path to a proper config file using --path_config".format(opts.script_directory, opts.path_config) + assert os.stat(opts.path_config).st_size > 1, "config file seems to be empty. Please add 'name' and 'executable' columns for the following program names: {}".format(required_executables) + df_config = pd.read_csv(opts.path_config, sep="\t") + assert {"name", "executable"} <= set(df_config.columns), "config must have `name` and `executable` columns. Please adjust file: {}".format(opts.path_config) + df_config = df_config.loc[:,["name", "executable"]].dropna(how="any", axis=0).applymap(str) + # Get executable paths + executables = OrderedDict(zip(df_config["name"], df_config["executable"])) + assert required_executables <= set(list(executables.keys())), "config must have the required executables for this run. Please adjust file: {}\nIn particular, add info for the following: {}".format(opts.path_config, required_executables - set(list(executables.keys()))) + + # Display + for name in sorted(accessory_scripts): + executables[name] = "'{}'".format(os.path.join(opts.script_directory, "scripts", name)) # Can handle spaces in path + + print(format_header( "Adding executables to path from the following source: {}".format(opts.path_config), "-"), file=sys.stdout) + for name, executable in executables.items(): + if name in required_executables: + print(name, executable, sep = " --> ", file=sys.stdout) + os.environ[name] = executable.strip() + print("", file=sys.stdout) + + +# Pipeline +def create_pipeline(opts, directories, f_cmds): + + # ................................................................. + # Primordial + # ................................................................. 
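For orientation before the pipeline is assembled below: `sylph profile` emits a TSV with one row per detected genome, and the accessory script wired into the command above reshapes it into a per-genome abundance table. Here is a minimal pandas sketch of that reshaping, assuming sylph's documented `Genome_file` and `Taxonomic_abundance` columns; the function is illustrative, not the actual `reformat_sylph_profile_single_sample_output.py`.

```python
import os
import pandas as pd

def reformat_profile(profile_tsv, extension="fa"):
    # sylph writes one row per detected genome
    df = pd.read_csv(profile_tsv, sep="\t")
    # Recover the genome identifier by stripping the directory and ".<extension>" suffix
    genome_ids = df["Genome_file"].map(
        lambda fp: os.path.basename(fp)[: -(len(extension) + 1)]
    )
    return pd.Series(df["Taxonomic_abundance"].values, index=genome_ids)

# abundances = reformat_profile("sylph_profile.tsv.gz", extension="fa")
# abundances.to_csv("taxonomic_abundance.tsv.gz", sep="\t", header=False)
```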
+ # Commands file + pipeline = ExecutablePipeline(name=__program__, description=opts.name, f_cmds=f_cmds, checkpoint_directory=directories["checkpoints"], log_directory=directories["log"]) + + # ========== + # Preprocess reads + # ========== + + if opts.input_reads_format == "paired": + + step = 0 + + # Info + program = "sylph_sketch" + program_label = "{}__{}".format(step, program) + description = "Sketch input reads" + + # Add to directories + output_directory = directories["output"] + # i/o + input_filepaths = [opts.forward_reads, opts.reverse_reads] + output_filepaths = [ + os.path.join(output_directory, "reads.sylsp"), + ] + + params = { + "input_filepaths":input_filepaths, + "output_filepaths":output_filepaths, + "output_directory":output_directory, + "opts":opts, + "directories":directories, + } + + cmd = get_sylph_sketch_cmd(**params) + pipeline.add_step( + id=program_label, + description = description, + step=step, + cmd=cmd, + input_filepaths = input_filepaths, + output_filepaths = output_filepaths, + validate_inputs=True, + validate_outputs=True, + log_prefix=program_label, + ) + else: + output_filepaths = [opts.reads_sketch] + + + # ========== + # Profile + # ========== + + step = 1 + + # Info + program = "sylph_profile" + program_label = "{}__{}".format(step, program) + description = "Profile genome databases" + + # Add to directories + output_directory = directories["output"] + + # i/o + input_filepaths = output_filepaths + opts.sylph_databases + + + output_filepaths = [ + os.path.join(output_directory, "sylph_profile.tsv.gz"), + os.path.join(output_directory, "taxonomic_abundance.tsv.gz"), + ] + if opts.genome_clusters: + input_filepaths += [ + opts.genome_clusters, + ] + output_filepaths += [ + os.path.join(output_directory, "taxonomic_abundance.clusters.tsv.gz"), + ] + + params = { + "input_filepaths":input_filepaths, + "output_filepaths":output_filepaths, + "output_directory":output_directory, + "opts":opts, + "directories":directories, + } + + cmd = get_sylph_profile_cmd(**params) + pipeline.add_step( + id=program_label, + description = description, + step=step, + cmd=cmd, + input_filepaths = input_filepaths, + output_filepaths = output_filepaths, + validate_inputs=True, + validate_outputs=True, + log_prefix=program_label, + + ) + + + + return pipeline + +# Configure parameters +def configure_parameters(opts, directories): + + for db in opts.sylph_databases: + assert db.endswith(".syldb"), "{} must have .syldb file extension".format(db) + + # --input_reads_format + assert_acceptable_arguments(opts.input_reads_format, {"paired", "sketch", "auto"}) + if opts.input_reads_format == "auto": + if any([opts.forward_reads, opts.reverse_reads]): + assert opts.forward_reads != opts.reverse_reads, "You probably mislabeled the input files because `forward_reads` should not be the same as `reverse_reads`: {}".format(opts.forward_reads) + assert opts.forward_reads is not None, "If running in --input_reads_format paired mode, --forward_reads and --reverse_reads are needed." + assert opts.reverse_reads is not None, "If running in --input_reads_format paired mode, --forward_reads and --reverse_reads are needed." 
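The branch that follows completes the auto-detection: raw paired fastq files and a prebuilt `.sylsp` sketch are mutually exclusive inputs. The same logic, distilled into a standalone function for clarity (a restatement, not code from this module):

```python
def detect_reads_format(forward_reads, reverse_reads, reads_sketch):
    """Infer whether the input is raw paired fastq or a prebuilt sylph sketch."""
    if reads_sketch is not None:
        if forward_reads or reverse_reads:
            raise ValueError("Provide either a sketch or paired reads, not both")
        return "sketch"
    if not (forward_reads and reverse_reads):
        raise ValueError("Paired mode requires both --forward_reads and --reverse_reads")
    if forward_reads == reverse_reads:
        raise ValueError("--forward_reads and --reverse_reads must be different files")
    return "paired"

# detect_reads_format("r1.fq.gz", "r2.fq.gz", None)  # -> "paired"
# detect_reads_format(None, None, "reads.sylsp")     # -> "sketch"
```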
+ opts.input_reads_format = "paired"
+ if opts.reads_sketch is not None:
+ assert opts.forward_reads is None, "If running in --input_reads_format sketch mode, you cannot provide --forward_reads, --reverse_reads"
+ assert opts.reverse_reads is None, "If running in --input_reads_format sketch mode, you cannot provide --forward_reads, --reverse_reads"
+ opts.input_reads_format = "sketch"
+
+ print("Auto detecting reads format: {}".format(opts.input_reads_format), file=sys.stdout)
+ assert_acceptable_arguments(opts.input_reads_format, {"paired", "sketch"})
+
+ # Set environment variables
+ add_executables_to_environment(opts=opts)
+
+def main(args=None):
+ # Path info
+ script_directory = os.path.dirname(os.path.abspath( __file__ ))
+ script_filename = __program__
+ # Path info
+ description = """
+ Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable)
+ usage = "{} -1 -2 |-s -n -o -d ".format(__program__)
+ epilog = "Copyright 2021 Josh L. Espinoza (jespinoz@jcvi.org)"
+
+
+ # Parser
+ parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter)
+
+ # Pipeline
+ parser_io = parser.add_argument_group('Required I/O arguments')
+ parser_io.add_argument("-1","--forward_reads", type=str, help = "path/to/forward_reads.fq[.gz]")
+ parser_io.add_argument("-2","--reverse_reads", type=str, help = "path/to/reverse_reads.fq[.gz]")
+ parser_io.add_argument("-s","--reads_sketch", type=str, help = "path/to/reads_sketch.sylsp (e.g., sylph sketch output) (Cannot be used with --forward_reads and --reverse_reads)")
+ parser_io.add_argument("-n", "--name", type=str, required=True, help="Name of sample")
+ parser_io.add_argument("-d","--sylph_databases", type=str, nargs="+", required=True, help = "Sylph database(s) with all genomes. Can be multiple databases delimited by spaces. Use compile_custom_sylph_sketch_database_from_genomes.py to build database.")
+ parser_io.add_argument("-o","--project_directory", type=str, default="veba_output/profiling/taxonomy", help = "path/to/project_directory [Default: veba_output/profiling/taxonomy]")
+ parser_io.add_argument("-c","--genome_clusters", type=str, help = "path/to/mags_to_slcs.tsv. [id_genome][id_genome-cluster], No header. Aggregates counts for genome clusters.")
+ parser_io.add_argument("-F", "--input_reads_format", choices={"paired", "sketch"}, type=str, default="auto", help = "Input reads format {paired, sketch} [Default: auto]")
+ parser_io.add_argument("-x","--extension", type=str, default="fa", help = "Fasta file extension for bins. Assumes all genomes have the same file extension. [Default: fa]")
+
+
+ # Utility
+ parser_utility = parser.add_argument_group('Utility arguments')
+ parser_utility.add_argument("--path_config", type=str, default="CONDA_PREFIX", help="path/to/config.tsv [Default: CONDA_PREFIX]") #site-packages in future
+ parser_utility.add_argument("-p", "--n_jobs", type=int, default=1, help = "Number of threads [Default: 1]")
+ parser_utility.add_argument("--random_state", type=int, default=0, help = "Random state [Default: 0]")
+ parser_utility.add_argument("--restart_from_checkpoint", type=str, default=None, help = "Restart from a particular checkpoint [Default: None]")
+ parser_utility.add_argument("-v", "--version", action='version', version="{} v{}".format(__program__, __version__))
+ parser_utility.add_argument("--tmpdir", type=str, help="Set temporary directory")
+
+ # Sylph
+ parser_sylph_sketch = parser.add_argument_group('Sylph sketch arguments (Fastq)')
+ parser_sylph_sketch.add_argument("--sylph_sketch_k", type=int, choices={21,31}, default=31, help="Sylph sketch [Fastq] | Value of k. Only k = 21, 31 are currently supported. [Default: 31]")
+ parser_sylph_sketch.add_argument("--sylph_sketch_minimum_spacing", type=int, default=30, help="Sylph sketch [Fastq] | Minimum spacing between selected k-mers on the genomes [Default: 30]")
+ parser_sylph_sketch.add_argument("--sylph_sketch_subsampling_rate", type=int, default=100, help="Sylph sketch [Fastq] | Subsampling rate. sylph runs without issues if the -c for all genomes is ≥ the -c for reads. [Default: 100]")
+ parser_sylph_sketch.add_argument("--sylph_sketch_options", type=str, default="", help="Sylph sketch [Fastq] | More options for `sylph sketch` (e.g. --arg 1 ) [Default: '']")
+
+ parser_sylph_profile = parser.add_argument_group('Sylph profile arguments')
+ parser_sylph_profile.add_argument("--sylph_profile_minimum_ani", type=float, default=95, help="Sylph profile | Minimum adjusted ANI to consider (0-100). [Default: 95]")
+ parser_sylph_profile.add_argument("--sylph_profile_minimum_number_kmers", type=int, default=20, help="Sylph profile | Exclude genomes with fewer than this number of sampled k-mers. Default is 50 in Sylph but lowering to 20 accounts for viruses and small CPR genomes. [Default: 20]")
+ parser_sylph_profile.add_argument("--sylph_profile_minimum_count_correct", type=int, default=3, help="Sylph profile | Minimum k-mer multiplicity needed for coverage correction. Higher values give more precision but lower sensitivity [Default: 3]")
+ parser_sylph_profile.add_argument("--sylph_profile_options", type=str, default="", help="Sylph profile | More options for `sylph profile` (e.g. --arg 1 ) [Default: '']")
+ parser_sylph_profile.add_argument("--header", action="store_true", help = "Include header in taxonomic abundance tables")
+
+ # Options
+ opts = parser.parse_args()
+ opts.script_directory = script_directory
+ opts.script_filename = script_filename
+
+ # Threads
+ if opts.n_jobs == -1:
+ from multiprocessing import cpu_count
+ opts.n_jobs = cpu_count()
+ assert opts.n_jobs >= 1, "--n_jobs must be ≥ 1. To select all available threads, use -1."
+ + + # Directories + directories = dict() + directories["project"] = create_directory(opts.project_directory) + directories["sample"] = create_directory(os.path.join(directories["project"], opts.name)) + directories["output"] = create_directory(os.path.join(directories["sample"], "output")) + + directories["log"] = create_directory(os.path.join(directories["sample"], "log")) + if not opts.tmpdir: + opts.tmpdir = os.path.join(directories["sample"], "tmp") + directories["tmp"] = create_directory(opts.tmpdir) + directories["checkpoints"] = create_directory(os.path.join(directories["sample"], "checkpoints")) + directories["intermediate"] = create_directory(os.path.join(directories["sample"], "intermediate")) + os.environ["TMPDIR"] = directories["tmp"] + + # Info + print(format_header(__program__, "="), file=sys.stdout) + print(format_header("Configuration:", "-"), file=sys.stdout) + print(format_header("Name: {}".format(opts.name), "."), file=sys.stdout) + print("Python version:", sys.version.replace("\n"," "), file=sys.stdout) + print("Python path:", sys.executable, file=sys.stdout) #sys.path[2] + print("GenoPype version:", genopype_version, file=sys.stdout) #sys.path[2] + print("Script version:", __version__, file=sys.stdout) + print("Moment:", get_timestamp(), file=sys.stdout) + print("Directory:", os.getcwd(), file=sys.stdout) + if "TMPDIR" in os.environ: print(os.environ["TMPDIR"], file=sys.stdout) + print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) + configure_parameters(opts, directories) + sys.stdout.flush() + + # Run pipeline + with open(os.path.join(directories["sample"], "commands.sh"), "w") as f_cmds: + pipeline = create_pipeline( + opts=opts, + directories=directories, + f_cmds=f_cmds, + ) + pipeline.compile() + pipeline.execute(restart_from_checkpoint=opts.restart_from_checkpoint) + +if __name__ == "__main__": + main() diff --git a/src/scripts/binning_wrapper.py b/src/scripts/binning_wrapper.py index cfd1c0e..a9904e6 100755 --- a/src/scripts/binning_wrapper.py +++ b/src/scripts/binning_wrapper.py @@ -12,7 +12,7 @@ # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.5.8" +__version__ = "2023.12.4" def get_maxbin2_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): # Create dummy scaffolds_to_bins.tsv to overwrite later. This makes DAS_Tool easier to run @@ -740,6 +740,9 @@ def add_executables_to_environment(opts): "merge_cutup_clustering.py", "extract_fasta_bins.py", } + # if opts.algorithm == "vrhyme": + # required_executables |= {"vRhyme"} + # if opts.algorithm == "metacoag": # required_executables |= {"MetaCoAG"} @@ -845,7 +848,7 @@ def main(argv=None): # Binning parser_binning = parser.add_argument_group('Binning arguments') - parser_binning.add_argument("-a", "--algorithm", type=str, default="metabat2", help="Binning algorithm: {concoct, metabat2, maxbin2} Future: {metacoag, vamb} [Default: metabat2] ") + parser_binning.add_argument("-a", "--algorithm", type=str, default="metabat2", help="Binning algorithm: {concoct, metabat2, maxbin2} Future: {vrhyme} [Default: metabat2] ") parser_binning.add_argument("-m", "--minimum_contig_length", type=int, default=1500, help="Minimum contig length. [Default: 1500] ") parser_binning.add_argument("-s", "--minimum_genome_length", type=int, default=150000, help="Minimum genome length. [Default: 150000] ") parser_binning.add_argument("-P","--bin_prefix", type=str, default="DEFAULT", help = "Prefix for bin names. 
Special strings include: 1) --bin_prefix NONE which does not include a bin prefix; and 2) --bin_prefix DEFAULT then prefix is [ALGORITHM_UPPERCASE]__") @@ -870,8 +873,8 @@ def main(argv=None): # parser_metacoag = parser.add_argument_group('MetaCoAG arguments') # parser_metacoag.add_argument("--metacoag_options", type=str, default="", help="MetaCoAG | More options (e.g. --arg 1 ) [Default: '']") - # parser_vamb = parser.add_argument_group('VAMB arguments') - # parser_vamb.add_argument("--vamb_options", type=str, default="", help="VAMB | More options (e.g. --arg 1 ) [Default: '']") + # parser_vrhyme = parser.add_argument_group('vRhyme arguments') + # parser_vrhyme.add_argument("--vrhyme_options", type=str, default="", help="vRhyme | More options (e.g. --arg 1 ) [Default: '']") # Options opts = parser.parse_args(argv) diff --git a/src/scripts/build_source_to_lineage_dictionary.py b/src/scripts/build_source_to_lineage_dictionary.py new file mode 100755 index 0000000..e593068 --- /dev/null +++ b/src/scripts/build_source_to_lineage_dictionary.py @@ -0,0 +1,69 @@ +#!/usr/bin/env python +from __future__ import print_function, division +import sys, os, argparse, gzip, pickle +from tqdm import tqdm +import pandas as pd + +# from tqdm import tqdm +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.11.13" + +def main(args=None): + # Path info + script_directory = os.path.dirname(os.path.abspath( __file__ )) + script_filename = __program__ + + # Path info + description = """ + Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -i -o ".format(__program__) + epilog = "Copyright 2021 Josh L. Espinoza (jespinoz@jcvi.org)" + + # Parser + parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) + # Pipeline + parser.add_argument("-i","--input", default="stdin", type=str, help = "Path to table [id_source][class][order][family][genus][species], with header. Can include more columns but the first column must be `id_source`. [Default: stdin]") + parser.add_argument("-o","--output", required=True, type=str, help = "Path to dictionary pickle object. Can be gzipped. 
(Recommended name: source_to_lineage.dict.pkl.gz)") + parser.add_argument("--separator", default=";", type=str, help = "Separator field for taxonomy [Default: ; ]") + + # Options + opts = parser.parse_args() + opts.script_directory = script_directory + opts.script_filename = script_filename + + # Input + if opts.input == "stdin": + opts.input = sys.stdin + + + print(" * Reading identifier mappings from the following file: {}".format(opts.input), file=sys.stderr) + source_to_lineage = dict() + df_input = pd.read_csv(opts.input, sep="\t", index_col=0) + for id_source, row in tqdm(df_input.loc[:,["class", "order", "family", "genus", "species"]].iterrows(), total=df_input.shape[0]): + lineage = list() + for level, taxon in row.items(): + v = level[0] + "__" + if pd.notnull(taxon): + v += taxon + lineage.append(v) + source_to_lineage[id_source] = opts.separator.join(lineage) + + + + print(" * Writing Python dictionary: {}".format(opts.output), file=sys.stderr) + f_out = None + if opts.output.endswith((".gz", ".pgz")): + f_out = gzip.open(opts.output, "wb") + else: + f_out = open(opts.output, "wb") + assert f_out is not None, "Unrecognized file format: {}".format(opts.output) + pickle.dump(source_to_lineage, f_out) + + + + + + + +if __name__ == "__main__": + main() diff --git a/src/scripts/build_target_to_source_dictionary.py b/src/scripts/build_target_to_source_dictionary.py new file mode 100755 index 0000000..367bc94 --- /dev/null +++ b/src/scripts/build_target_to_source_dictionary.py @@ -0,0 +1,77 @@ +#!/usr/bin/env python +from __future__ import print_function, division +import sys, os, argparse, gzip, pickle + +# from tqdm import tqdm +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.11.15" + +def main(args=None): + # Path info + script_directory = os.path.dirname(os.path.abspath( __file__ )) + script_filename = __program__ + + # Path info + description = """ + Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -i -o ".format(__program__) + epilog = "Copyright 2021 Josh L. Espinoza (jespinoz@jcvi.org)" + + # Parser + parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) + # Pipeline + parser.add_argument("-i","--input", default="stdin", type=str, help = "Path to identifier mapping table [id_database][id_source][id_protein][id_hash], No header. [Default: stdin]") + parser.add_argument("-o","--output", required=True, type=str, help = "Path to dictionary pickle object. Can be gzipped. (Recommended name: target_to_source.dict.pkl.gz)") + parser.add_argument("-n","--number_of_sequences", type=int, help = "Number of sequences. 
If used, the tqdm is required.") + + # Options + opts = parser.parse_args() + opts.script_directory = script_directory + opts.script_filename = script_filename + + # Input + f_in = None + if opts.input == "stdin": + f_in = sys.stdin + else: + if opts.input.endswith(".gz"): + f_in = gzip.open(opts.input, "rt") + else: + f_in = open(opts.input, "r") + assert f_in is not None, "Unrecognized file format: {}".format(opts.input) + + if opts.number_of_sequences is not None: + from tqdm import tqdm + input_iterable = tqdm(f_in, total=opts.number_of_sequences, unit=" sequences") + else: + input_iterable = f_in + + print(" * Reading identifier mappings from the following file: {}".format(f_in), file=sys.stderr) + target_to_source = dict() + for line in input_iterable: + line = line.strip() + if line: + fields = line.split("\t") + id_hash = fields[3] + id_source = fields[1] + target_to_source[id_hash] = id_source + if f_in != sys.stdin: + f_in.close() + + print(" * Writing Python dictionary: {}".format(opts.output), file=sys.stderr) + f_out = None + if opts.output.endswith((".gz", ".pgz")): + f_out = gzip.open(opts.output, "wb") + else: + f_out = open(opts.output, "wb") + assert f_out is not None, "Unrecognized file format: {}".format(opts.output) + pickle.dump(target_to_source, f_out) + + + + + + + +if __name__ == "__main__": + main() diff --git a/src/scripts/check_fasta_duplicates.py b/src/scripts/check_fasta_duplicates.py index b4ca8dd..527b508 100755 --- a/src/scripts/check_fasta_duplicates.py +++ b/src/scripts/check_fasta_duplicates.py @@ -3,7 +3,7 @@ from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.4.17" +__version__ = "2023.11.10" def main(args=None): # Path info @@ -30,13 +30,15 @@ def main(args=None): if not opts.input: identifiers = set() duplicates = set() - for line in tqdm(sys.stdin, "stdin"): + for i, line in tqdm(enumerate(sys.stdin), "stdin"): if line.startswith(">"): id = line[1:].split(" ")[0].strip() if id not in identifiers: identifiers.add(id) else: duplicates.add(id) + else: + assert ">" not in line, "Line={} has a '>' character in the sequence which will cause an error. This can arise from concatenating fasta files where a record is missing a final linebreak".format(i+1) if duplicates: print("# Duplicates:", *sorted(duplicates), file=sys.stdout, sep="\n", end=None) sys.exit(1) @@ -48,13 +50,16 @@ def main(args=None): identifiers = set() duplicates = set() f = {True:gzip.open(fp, "rt"), False:open(fp, "r")}[fp.endswith(".gz")] - for line in tqdm(f, fp): + for i,line in tqdm(enumerate(f), fp): if line.startswith(">"): id = line[1:].split(" ")[0] if id not in identifiers: identifiers.add(id) else: duplicates.add(id) + else: + assert ">" not in line, "Line={} has a '>' character in the sequence which will cause an error. 
This can arise from concatenating fasta files where a record is missing a final linebreak".format(i+1) + if duplicates: files_with_duplicates.add(fp) print(f"[Fail] {fp}", file=sys.stdout) diff --git a/src/scripts/clean_fasta.py b/src/scripts/clean_fasta.py new file mode 100755 index 0000000..92192c6 --- /dev/null +++ b/src/scripts/clean_fasta.py @@ -0,0 +1,124 @@ +#!/usr/bin/env python +import sys, os, argparse, gzip +from Bio.SeqIO.FastaIO import SimpleFastaParser +from tqdm import tqdm + +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.11.10" + +def main(args=None): + # Path info + script_directory = os.path.dirname(os.path.abspath( __file__ )) + script_filename = __program__ + + # Path info + description = """ + Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -i -o )".format(__program__) + epilog = "Copyright 2021 Josh L. Espinoza (jespinoz@jcvi.org)" + + # Parser + parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) + + # Pipeline + parser.add_argument("-i","--input", default="stdin", type=str, help = "Input fasta file") + parser.add_argument("-o","--output", default="stdout", type=str, help = "Output fasta file") + parser.add_argument("-r","--retain_description", action="store_true", help = "Retain description") + parser.add_argument("-s","--retain_stop_codon", action="store_true", help = "Retain stop codon character (if one exists)") + parser.add_argument("-m","--minimum_sequence_length", default=1, type=int, help = "Minimum sequence length accepted [Default: 1]") + parser.add_argument("--stop_codon_character", default="*", type=str, help = "Stop codon character [Default: *] ") + # parser.add_argument("-t","--molecule_type", help = "Comma-separated list of names for the --scaffolds_to_bins") + + + # Options + opts = parser.parse_args() + opts.script_directory = script_directory + opts.script_filename = script_filename + + assert opts.minimum_sequence_length > 0 + + # Input + f_in = None + if opts.input == "stdin": + f_in = sys.stdin + else: + if opts.input.endswith(".gz"): + f_in = gzip.open(opts.input, "rt") + else: + f_in = open(opts.input, "r") + assert f_in is not None + + # Output + f_out = None + if opts.output == "stdout": + f_out = sys.stdout + else: + if opts.output.endswith(".gz"): + f_out = gzip.open(opts.output, "wt") + else: + f_out = open(opts.output, "w") + assert f_out is not None + + # retain_description=True + # retain_stop_codon=True + if all([ + opts.retain_description, + opts.retain_stop_codon, + ]): + for header, seq in tqdm(SimpleFastaParser(f_in), "Reading fasta input"): + header = header.strip() + if len(seq) >= opts.minimum_sequence_length: + assert ">" not in seq, "`{}` has a '>' character in the sequence which will cause an error. This can arise from concatenating fasta files where a record is missing a final linebreak".format(header) + print(">{}\n{}".format(header,seq), file=f_out) + + # retain_description=False + # retain_stop_codon=True + if all([ + not opts.retain_description, + opts.retain_stop_codon, + ]): + for header, seq in tqdm(SimpleFastaParser(f_in), "Reading fasta input"): + id = header.split(" ")[0].strip() + if len(seq) >= opts.minimum_sequence_length: + assert ">" not in seq, "`{}` has a '>' character in the sequence which will cause an error. 
This can arise from concatenating fasta files where a record is missing a final linebreak".format(header) + print(">{}\n{}".format(id,seq), file=f_out) + + # retain_description=True + # retain_stop_codon=False + if all([ + opts.retain_description, + not opts.retain_stop_codon, + ]): + for header, seq in tqdm(SimpleFastaParser(f_in), "Reading fasta input"): + header = header.strip() + if seq.endswith(opts.stop_codon_character): + seq = seq[:-1] + if len(seq) >= opts.minimum_sequence_length: + assert ">" not in seq, "`{}` has a '>' character in the sequence which will cause an error. This can arise from concatenating fasta files where a record is missing a final linebreak".format(header) + print(">{}\n{}".format(header,seq), file=f_out) + + # retain_description=False + # retain_stop_codon=False + if all([ + not opts.retain_description, + not opts.retain_stop_codon, + ]): + for header, seq in tqdm(SimpleFastaParser(f_in), "Reading fasta input"): + id = header.split(" ")[0].strip() + if seq.endswith(opts.stop_codon_character): + seq = seq[:-1] + if len(seq) >= opts.minimum_sequence_length: + assert ">" not in seq, "`{}` has a '>' character in the sequence which will cause an error. This can arise from concatenating fasta files where a record is missing a final linebreak".format(header) + print(">{}\n{}".format(id,seq), file=f_out) + + # Close + if f_in != sys.stdin: + f_in.close() + if f_out != sys.stdout: + f_out.close() + +if __name__ == "__main__": + main() + + + diff --git a/src/scripts/clustering_wrapper.py b/src/scripts/clustering_wrapper.py new file mode 100755 index 0000000..b8eddbb --- /dev/null +++ b/src/scripts/clustering_wrapper.py @@ -0,0 +1,439 @@ +#!/usr/bin/env python +from __future__ import print_function, division +import sys, os, argparse, glob, shutil, time, warnings +from multiprocessing import cpu_count +from collections import OrderedDict, defaultdict + +import pandas as pd + +# Soothsayer Ecosystem +from genopype import * +from soothsayer_utils import * + +# from tqdm import tqdm +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.11.10" + +# Check +def get_check_cmd(input_filepaths, output_filepaths, output_directory, directories, opts): + # Command + + # Command + cmd = [ + os.environ["check_fasta_duplicates.py"], + opts.fasta, + ] + + return cmd + +def get_mmseqs2_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): + + # Command + cmd = [ + os.environ["mmseqs"], + "easy-{}".format(opts.algorithm.split("-")[1]), + opts.fasta, + os.path.join(output_directory, "mmseqs2"), + directories["tmp"], + "--threads {}".format(opts.n_jobs), + "--min-seq-id {}".format(opts.minimum_identity_threshold/100), + "-c {}".format(opts.minimum_coverage_threshold), + "--cov-mode 1", + opts.mmseqs2_options, + + "&&", + + "mv", + os.path.join(output_directory, "mmseqs2_cluster.tsv"), + os.path.join(output_directory, "clusters.tsv"), + + "&&", + + "mv", + os.path.join(output_directory, "mmseqs2_rep_seq.fasta"), + os.path.join(output_directory, "representatives.fasta"), + + "&&", + + "gzip", + os.path.join(output_directory, "representatives.fasta"), + + "&&", + + "rm -rf", + os.path.join(output_directory, "mmseqs2_all_seqs.fasta"), + os.path.join(directories["tmp"], "*"), + ] + + return cmd + +def get_diamond_cmd( input_filepaths, output_filepaths, output_directory, directories, opts): + + # Command + cmd = [ + os.environ["diamond"], + {"diamond-cluster":"cluster", "diamond-linclust":"linclust"}[opts.algorithm], + "--db", + opts.fasta, + "--out", + 
os.path.join(output_directory, "clusters.tsv"), + "--tmpdir", + directories["tmp"], + "--threads {}".format(opts.n_jobs), + "--approx-id {}".format(opts.minimum_identity_threshold), + "--member-cover {}".format(opts.minimum_coverage_threshold*100), + opts.diamond_options, + + "&&", + + "cut -f1", + os.path.join(output_directory, "clusters.tsv"), + "|", + "sort -u", + ">", + os.path.join(output_directory, "representatives.list"), + + "&&", + + os.environ["seqkit"], + "grep", + "-w 0", + "-f", + os.path.join(output_directory, "representatives.list"), + opts.fasta, + "|", + "gzip", + ">", + os.path.join(output_directory, "representatives.fasta.gz"), + + "&&", + + "rm -rf", + os.path.join(directories["tmp"], "*"), + ] + + return cmd + +# Compile +def get_compile_cmd(input_filepaths, output_filepaths, output_directory, directories, opts): + + # Command + cmd = [ + + os.environ["edgelist_to_clusters.py"], + "-i {}".format(input_filepaths[0]), + "--no_singletons" if bool(opts.no_singletons) else "", + "--cluster_prefix {}".format(opts.cluster_prefix) if bool(opts.cluster_prefix) else "", + "--cluster_suffix {}".format(opts.cluster_suffix) if bool(opts.cluster_suffix) else "", + "--cluster_prefix_zfill {}".format(opts.cluster_prefix_zfill), + "-o {}".format(os.path.join(output_directory, "{}.tsv".format(opts.basename))), + # "-g {}".format(os.path.join(output_directory, "{}.networkx_graph.pkl".format(opts.basename))), + # "-d {}".format(os.path.join(output_directory, "{}.dict.pkl".format(opts.basename))), + "--identifiers {}".format(opts.identifiers) if bool(opts.identifiers) else "", + + "&&", + + os.environ["reformat_representative_sequences.py"], + "-c {}".format(os.path.join(output_directory, "{}.tsv".format(opts.basename))), + "-i {}".format(input_filepaths[1]), + "-f {}".format(opts.representative_output_format), + "-o {}".format(output_filepaths[1]), + ] + + if opts.no_sequences_and_header: + cmd += [ + "--no_sequences", + "--no_header", + ] + + return cmd + +# ============ +# Run Pipeline +# ============ +# Set environment variables +def add_executables_to_environment(opts): + """ + Adapted from Soothsayer: https://github.com/jolespin/soothsayer + """ + accessory_scripts = set([ + "check_fasta_duplicates.py", + "edgelist_to_clusters.py", + "reformat_representative_sequences.py", + ]) + + required_executables={ + "mmseqs", + "diamond", + "seqkit", + + } | accessory_scripts + + if opts.path_config == "CONDA_PREFIX": + executables = dict() + for name in required_executables: + executables[name] = os.path.join(os.environ["CONDA_PREFIX"], "bin", name) + else: + if opts.path_config is None: + opts.path_config = os.path.join(opts.script_directory, "veba_config.tsv") + opts.path_config = format_path(opts.path_config) + assert os.path.exists(opts.path_config), "config file does not exist. Have you created one in the following directory?\n{}\nIf not, either create one, check this filepath:{}, or give the path to a proper config file using --path_config".format(opts.script_directory, opts.path_config) + assert os.stat(opts.path_config).st_size > 1, "config file seems to be empty. Please add 'name' and 'executable' columns for the following program names: {}".format(required_executables) + df_config = pd.read_csv(opts.path_config, sep="\t") + assert {"name", "executable"} <= set(df_config.columns), "config must have `name` and `executable` columns. 
Please adjust file: {}".format(opts.path_config) + df_config = df_config.loc[:,["name", "executable"]].dropna(how="any", axis=0).applymap(str) + # Get executable paths + executables = OrderedDict(zip(df_config["name"], df_config["executable"])) + assert required_executables <= set(list(executables.keys())), "config must have the required executables for this run. Please adjust file: {}\nIn particular, add info for the following: {}".format(opts.path_config, required_executables - set(list(executables.keys()))) + + # Display + for name in sorted(accessory_scripts): + executables[name] = "'{}'".format(os.path.join(opts.script_directory, name)) # Can handle spaces in path + + print(format_header( "Adding executables to path from the following source: {}".format(opts.path_config), "-"), file=sys.stdout) + for name, executable in executables.items(): + if name in required_executables: + print(name, executable, sep = " --> ", file=sys.stdout) + os.environ[name] = executable.strip() + print("", file=sys.stdout) + +# Pipeline +def create_pipeline(opts, directories, f_cmds): + + # ................................................................. + # Primordial + # ................................................................. + # Commands file + pipeline = ExecutablePipeline(name=__program__, f_cmds=f_cmds, checkpoint_directory=directories["checkpoints"], log_directory=directories["log"]) + + + # ========== + # Preprocessing + # ========== + + program = "check" + # Add to directories + output_directory = directories["tmp"] + + # Info + step = 0 + description = "Check sequences for duplicates" + + # i/o + input_filepaths = [opts.fasta] + output_filepaths = [ + ] + + params = { + "input_filepaths":input_filepaths, + "output_filepaths":output_filepaths, + "output_directory":output_directory, + "opts":opts, + "directories":directories, + } + + cmd = get_check_cmd(**params) + + pipeline.add_step( + id=program, + description = description, + step=step, + cmd=cmd, + input_filepaths = input_filepaths, + output_filepaths = output_filepaths, + validate_inputs=True, + validate_outputs=False, + ) + + # ========== + # Clustering + # ========== + step = 1 + + # i/o + output_directory = directories["intermediate"] + + input_filepaths = [opts.fasta] + output_filenames = [ + "clusters.tsv", + "representatives.fasta.gz", + ] + output_filepaths = list(map(lambda filename: os.path.join(output_directory, filename), output_filenames)) + + params = { + "input_filepaths":input_filepaths, + "output_filepaths":output_filepaths, + "output_directory":output_directory, + "opts":opts, + "directories":directories, + } + + if opts.algorithm.split("-")[0] == "mmseqs": + program = "mmseqs2" + # Info + description = "Cluster sequences via MMSEQS2" + cmd = get_mmseqs2_cmd(**params) + + if opts.algorithm.split("-")[0] == "diamond": + program = "diamond" + description = "Cluster sequences via Diamond" + cmd = get_diamond_cmd(**params) + + pipeline.add_step( + id=program, + description = description, + step=step, + cmd=cmd, + input_filepaths = input_filepaths, + output_filepaths = output_filepaths, + validate_inputs=True, + validate_outputs=True, + ) + + # ========== + # Compile + # ========== + + program = "compile" + # Add to directories + output_directory = directories["output"] + + # Info + step = 2 + description = "Compile clustering results" + + # i/o + input_filepaths = output_filepaths + output_filenames = [ + "{}.tsv".format(opts.basename), + ] + if opts.representative_output_format == "table": + output_filenames += 
["representative_sequences.tsv.gz"] + if opts.representative_output_format == "fasta": + output_filenames += ["representative_sequences.fasta.gz"] + output_filepaths = list(map(lambda filename: os.path.join(output_directory, filename), output_filenames)) + + params = { + "input_filepaths":input_filepaths, + "output_filepaths":output_filepaths, + "output_directory":output_directory, + "opts":opts, + "directories":directories, + } + + cmd = get_compile_cmd(**params) + + pipeline.add_step( + id=program, + description = description, + step=step, + cmd=cmd, + input_filepaths = input_filepaths, + output_filepaths = output_filepaths, + validate_inputs=True, + validate_outputs=True, + ) + + return pipeline + +# Configure parameters +def configure_parameters(opts, directories): + + assert_acceptable_arguments(opts.algorithm, {"easy-cluster", "easy-linclust", "mmseqs-cluster", "mmseqs-linclust", "diamond-cluster", "diamond-linclust"}) + if opts.algorithm in {"easy-cluster", "easy-linclust"}: + d = {"easy-cluster":"mmseqs-cluster", "easy-linclust":"mmseqs-linclust"} + warnings.warn("\n\nPlease use `{}` instead of `{}` for MMSEQS2 clustering.".format(d[opts.algorithm], opts.algorithm)) + opts.algorithm = d[opts.algorithm] + assert_acceptable_arguments(opts.representative_output_format, {"table", "fasta"}) + # Set environment variables + add_executables_to_environment(opts=opts) + +def main(args=None): + # Path info + script_directory = os.path.dirname(os.path.abspath( __file__ )) + script_filename = __program__ + # Path info + description = """ + Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -i -o ".format(__program__) + + epilog = "Copyright 2021 Josh L. Espinoza (jespinoz@jcvi.org)" + + # Parser + parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) + # Pipeline + parser_io = parser.add_argument_group('Required I/O arguments') + parser_io.add_argument("-i","--fasta", type=str, help = "Fasta file") + parser_io.add_argument("-o","--output_directory", type=str, default="clustering_output", help = "path/to/project_directory [Default: clustering_output]") + parser_io.add_argument("-e", "--no_singletons", action="store_true", help="Exclude singletons") + parser_io.add_argument("-b", "--basename", type=str, default="clusters", help="Basename for clustering files [Default: clusters]") + + # Utility + parser_utility = parser.add_argument_group('Utility arguments') + parser_utility.add_argument("--path_config", type=str, default="CONDA_PREFIX", help="path/to/config.tsv [Default: CONDA_PREFIX]") #site-packges in future + parser_utility.add_argument("-p", "--n_jobs", type=int, default=1, help = "Number of threads [Default: 1]") + parser_utility.add_argument("--restart_from_checkpoint", type=str, default=None, help = "Restart from a particular checkpoint [Default: None]") + parser_utility.add_argument("-v", "--version", action='version', version="{} v{}".format(__program__, __version__)) + # parser_utility.add_argument("--verbose", action='store_true') + + # Clustering + parser_clustering = parser.add_argument_group('Clustering arguments') + parser_clustering.add_argument("-a", "--algorithm", type=str, default="mmseqs-cluster", help="Clustering algorithm | Diamond can only be used for clustering proteins {mmseqs-cluster, mmseqs-linclust, diamond-cluster, mmseqs-linclust} [Default: mmseqs-cluster]") + parser_clustering.add_argument("-t", 
"--minimum_identity_threshold", type=float, default=50.0, help="Clustering | Percent identity threshold (Range (0.0, 100.0]) [Default: 50.0]") + parser_clustering.add_argument("-c", "--minimum_coverage_threshold", type=float, default=0.8, help="Clustering | Coverage threshold (Range (0.0, 1.0]) [Default: 0.8]") + parser_clustering.add_argument("--cluster_prefix", type=str, default="SC-", help="Sequence cluster prefix [Default: 'SC-]") + parser_clustering.add_argument("--cluster_suffix", type=str, default="", help="Sequence cluster suffix [Default: '']") + parser_clustering.add_argument("--cluster_prefix_zfill", type=int, default=0, help="Sequence cluster prefix zfill. Use 7 to match identifiers from OrthoFinder. Use 0 to add no zfill. [Default: 0]") #7 + parser_clustering.add_argument("--mmseqs2_options", type=str, default="", help="MMSEQS2 | More options (e.g. --arg 1 ) [Default: '']") + parser_clustering.add_argument("--diamond_options", type=str, default="", help="Diamond | More options (e.g. --arg 1 ) [Default: '']") + parser_clustering.add_argument("--identifiers", type=str, help = "Identifiers to include for `edgelist_to_clusters.py`. If missing identifiers and singletons are allowed, then they will be included as singleton clusters with weight of np.inf") + parser_clustering.add_argument("--no_sequences_and_header", action="store_true", help = "Don't include sequences or header in table. Useful for concatenation and reduced redundancy of sequences") + parser_clustering.add_argument("-f","--representative_output_format", type=str, default="fasta", help = "Format of output for representative sequences: {table, fasta} [Default: fasta]") # Should fasta be the new default? + + # Options + opts = parser.parse_args() + + opts.script_directory = script_directory + opts.script_filename = script_filename + + # Threads + if opts.n_jobs == -1: + opts.n_jobs = cpu_count() + assert opts.n_jobs >= 1, "--n_jobs must be ≥ 1 (or -1 to use all available threads)" + + # Directories + directories = dict() + directories["project"] = create_directory(opts.output_directory) + directories["output"] = create_directory(os.path.join(directories["project"], "output")) + directories["log"] = create_directory(os.path.join(directories["project"], "log")) + directories["tmp"] = create_directory(os.path.join(directories["project"], "tmp")) + directories["checkpoints"] = create_directory(os.path.join(directories["project"], "checkpoints")) + directories["intermediate"] = create_directory(os.path.join(directories["project"], "intermediate")) + os.environ["TMPDIR"] = directories["tmp"] + + # Info + print(format_header(__program__, "="), file=sys.stdout) + print(format_header("Configuration:", "-"), file=sys.stdout) + print("Python version:", sys.version.replace("\n"," "), file=sys.stdout) + print("Python path:", sys.executable, file=sys.stdout) #sys.path[2] + print("Script version:", __version__, file=sys.stdout) + print("Moment:", get_timestamp(), file=sys.stdout) + print("Directory:", os.getcwd(), file=sys.stdout) + print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) + configure_parameters(opts, directories) + sys.stdout.flush() + + # Run pipeline + with open(os.path.join(directories["project"], "commands.sh"), "w") as f_cmds: + pipeline = create_pipeline( + opts=opts, + directories=directories, + f_cmds=f_cmds, + ) + pipeline.compile() + pipeline.execute(restart_from_checkpoint=opts.restart_from_checkpoint) + +if __name__ == "__main__": + main(sys.argv[1:]) + + diff --git 
a/src/scripts/compile_custom_humann_database_from_annotations.py b/src/scripts/compile_custom_humann_database_from_annotations.py index a644bb5..6604413 100755 --- a/src/scripts/compile_custom_humann_database_from_annotations.py +++ b/src/scripts/compile_custom_humann_database_from_annotations.py @@ -11,7 +11,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.11" +__version__ = "2023.12.15" def main(args=None): @@ -31,7 +31,7 @@ def main(args=None): parser.add_argument("-a","--annotations", type=str, required=True, help = "path/to/annotations.tsv[.gz] Output from annotations.py. Multi-level header that contains (UniRef, sseqid)") parser.add_argument("-t","--taxonomy", type=str, required=True, help = "path/to/taxonomy.tsv[.gz] [id_genome][classification] (No header). Use output from `merge_taxonomy_classifications.py` with --no_header and --no_domain") parser.add_argument("-s","--sequences", type=str, required=True, help = "path/to/proteins.fasta[.gz]") - parser.add_argument("-o","--output", type=str, default="stdout", help = "path/to/humann_uniref_annotations.tsv[.gz] [Default: stdout]") + parser.add_argument("-o","--output", type=str, default="stdout", help = "path/to/humann_uniref_annotations.tsv[.gz] (veba_output/profiling/databases/) is recommended [Default: stdout]") parser.add_argument("--sep", default=";", help = "Separator for taxonomic levels [Default: ;]") # parser.add_argument("--mandatory_taxonomy_prefixes", help = "Comma-separated values for mandatory prefix levels. (e.g., 'c__,f__,g__,s__')") # parser.add_argument("--discarded_file", help = "Proteins that have been discarded due to incomplete lineage") diff --git a/src/scripts/compile_custom_sylph_sketch_database_from_genomes.py b/src/scripts/compile_custom_sylph_sketch_database_from_genomes.py new file mode 100755 index 0000000..9c25424 --- /dev/null +++ b/src/scripts/compile_custom_sylph_sketch_database_from_genomes.py @@ -0,0 +1,239 @@ +#!/usr/bin/env python +from __future__ import print_function, division +import sys, os, argparse, glob, shutil, time, warnings +from multiprocessing import cpu_count +from collections import OrderedDict, defaultdict + +import pandas as pd + +# Soothsayer Ecosystem +from genopype import * +from genopype import __version__ as genopype_version +from soothsayer_utils import * + +# from tqdm import tqdm +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.12.15" + +# ============ +# Run Pipeline +# ============ +# Set environment variables +def add_executables_to_environment(opts): + """ + Adapted from Soothsayer: https://github.com/jolespin/soothsayer + """ + accessory_scripts = set([ + + ]) + + required_executables={ + "sylph", + + } | accessory_scripts + + if opts.path_config == "CONDA_PREFIX": + executables = dict() + for name in required_executables: + executables[name] = os.path.join(os.environ["CONDA_PREFIX"], "bin", name) + else: + if opts.path_config is None: + opts.path_config = os.path.join(opts.script_directory, "veba_config.tsv") + opts.path_config = format_path(opts.path_config) + assert os.path.exists(opts.path_config), "config file does not exist. Have you created one in the following directory?\n{}\nIf not, either create one, check this filepath:{}, or give the path to a proper config file using --path_config".format(opts.script_directory, opts.path_config) + assert os.stat(opts.path_config).st_size > 1, "config file seems to be empty. 
Please add 'name' and 'executable' columns for the following program names: {}".format(required_executables) + df_config = pd.read_csv(opts.path_config, sep="\t") + assert {"name", "executable"} <= set(df_config.columns), "config must have `name` and `executable` columns. Please adjust file: {}".format(opts.path_config) + df_config = df_config.loc[:,["name", "executable"]].dropna(how="any", axis=0).applymap(str) + # Get executable paths + executables = OrderedDict(zip(df_config["name"], df_config["executable"])) + assert required_executables <= set(list(executables.keys())), "config must have the required executables for this run. Please adjust file: {}\nIn particular, add info for the following: {}".format(opts.path_config, required_executables - set(list(executables.keys()))) + + # Display + for name in sorted(accessory_scripts): + executables[name] = "'{}'".format(os.path.join(opts.script_directory, name)) # Can handle spaces in path + + print(format_header( "Adding executables to path from the following source: {}".format(opts.path_config), "-"), file=sys.stdout) + for name, executable in executables.items(): + if name in required_executables: + print(name, executable, sep = " --> ", file=sys.stdout) + os.environ[name] = executable.strip() + print("", file=sys.stdout) + + +# Configure parameters +def configure_parameters(opts, directories): + + + # Set environment variables + add_executables_to_environment(opts=opts) + +def main(args=None): + # Path info + script_directory = os.path.dirname(os.path.abspath( __file__ )) + script_filename = __program__ + # Path info + description = """ + Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -i -o ".format(__program__) + + epilog = "Copyright 2021 Josh L. Espinoza (jespinoz@jcvi.org)" + + # Parser + parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) + # Pipeline + parser_io = parser.add_argument_group('Required I/O arguments') + parser_io.add_argument("-i","--input", type=str, default="stdin", help = "path/to/input.tsv. Format: Must include the following columns (No header)[organism_type][path/to/genome.fa]. You can get this from `cut -f1,4 veba_output/misc/genomes_table.tsv` [Default: stdin]") + parser_io.add_argument("-o","--output_directory", type=str, default="veba_output/profiling/databases", help = "path/to/output_directory for databases [Default: veba_output/profiling/databases]") + parser_io.add_argument("--viral_tag", type=str, default="viral", help = "[Not case sensitive] Tag/Label of viral organisms in first column of --input (e.g., viral, virus, viron) [Default: viral]") + + + # Utility + parser_utility = parser.add_argument_group('Utility arguments') + parser_utility.add_argument("--path_config", type=str, default="CONDA_PREFIX", help="path/to/config.tsv [Default: CONDA_PREFIX]") #site-packges in future + parser_utility.add_argument("-p", "--n_jobs", type=int, default=1, help = "Number of threads [Default: 1]") + parser_utility.add_argument("-v", "--version", action='version', version="{} v{}".format(__program__, __version__)) + # parser_utility.add_argument("--verbose", action='store_true') + + # Sylph + parser_sylph = parser.add_argument_group('Sylph sketch arguments') + parser_sylph.add_argument("-k", "--sylph_k", type=int, choices={21,31}, default=31, help="Sylph | Value of k. Only k = 21, 31 are currently supported. 
[Default: 31]") + parser_sylph.add_argument("-s", "--sylph_minimum_spacing", type=int, default=30, help="Sylph | Minimum spacing between selected k-mers on the genomes [Default: 30]") + + parser_sylph_nonviral = parser.add_argument_group('[Prokaryotic & Eukaryotic] Sylph sketch arguments') + parser_sylph_nonviral.add_argument("--sylph_nonviral_subsampling_rate", type=int, default=200, help="Sylph [Prokaryotic & Eukaryotic]| Subsampling rate. [Default: 200]") + parser_sylph_nonviral.add_argument("--sylph_nonviral_options", type=str, default="", help="Sylph [Prokaryotic & Eukaryotic] | More options for `sylph sketch` (e.g. --arg 1 ) [Default: '']") + + parser_sylph_viral = parser.add_argument_group('[Viral] Sylph sketch arguments') + parser_sylph_viral.add_argument("--sylph_viral_subsampling_rate", type=int, default=100, help="Sylph [Viral]| Subsampling rate. [Default: 100]") + parser_sylph_viral.add_argument("--sylph_viral_options", type=str, default="", help="Sylph [Viral] | More options for `sylph sketch` (e.g. --arg 1 ) [Default: '']") + + # Options + opts = parser.parse_args() + + opts.script_directory = script_directory + opts.script_filename = script_filename + + # Threads + if opts.n_jobs == -1: + opts.n_jobs = cpu_count() + assert opts.n_jobs >= 1, "--n_jobs must be ≥ 1 (or -1 to use all available threads)" + + # Directories + directories = dict() + directories["output"] = create_directory(opts.output_directory) + directories["intermediate"] = create_directory(os.path.join(directories["output"], "intermediate")) + directories["log"] = create_directory(os.path.join(directories["intermediate"], "log")) + directories["checkpoints"] = create_directory(os.path.join(directories["intermediate"], "checkpoints")) + + # Info + print(format_header(__program__, "="), file=sys.stdout) + print(format_header("Configuration:", "-"), file=sys.stdout) + print("Python version:", sys.version.replace("\n"," "), file=sys.stdout) + print("Python path:", sys.executable, file=sys.stdout) #sys.path[2] + print("Script version:", __version__, file=sys.stdout) + print("GenoPype version:", genopype_version, file=sys.stdout) #sys.path[2] + print("Moment:", get_timestamp(), file=sys.stdout) + print("Directory:", os.getcwd(), file=sys.stdout) + print("Commands:", list(filter(bool,sys.argv)), sep="\n", file=sys.stdout) + configure_parameters(opts, directories) + sys.stdout.flush() + + # Make directories + t0 = time.time() + # print(format_header("* ({}) Creating directories:".format(format_duration(t0)), opts.output_directory), file=sys.stdout) + # os.makedirs(opts.output_directory, exist_ok=True) + + # Load input + if opts.input == "stdin": + opts.input = sys.stdin + df_genomes = pd.read_csv(opts.input, sep="\t", header=None) + assert df_genomes.shape[1] == 2, "Must include the follow columns (No header) [organism_type][genome]). Suggested input is from `compile_genomes_table.py` script using `cut -f1,4` to get the necessary columns." 
+ df_genomes.columns = ["organism_type", "genome"] + + opts.viral_tag = opts.viral_tag.lower() + + print(format_header("* ({}) Organizing genomes by organism_type".format(format_duration(t0))), file=sys.stdout) + organism_to_genomes = defaultdict(set) + for i, (organism_type, genome_filepath) in pv(df_genomes.iterrows(), unit="genomes ", total=df_genomes.shape[0]): + organism_type = organism_type.lower() + if organism_type == opts.viral_tag: + organism_to_genomes["viral"].add(genome_filepath) + else: + organism_to_genomes["nonviral"].add(genome_filepath) + # del df_genomes + + # Commands + f_cmds = open(os.path.join(directories["intermediate"], "commands.sh"), "w") + + for organism_type, filepaths in organism_to_genomes.items(): + # Write genomes to file + print(format_header("* ({}) Creating genome database: (N={}) for organism_type='{}'".format(format_duration(t0),len(filepaths), organism_type)), file=sys.stdout) + + genome_filepaths_list = os.path.join(directories["intermediate"], "{}_genomes.list".format(organism_type)) + with open(genome_filepaths_list, "w") as f: + for fp in sorted(filepaths): + print(fp, file=f) + + name = "sylph__{}".format(organism_type) + description = "[Program = sylph sketch] [Organism_Type = {}]".format(organism_type) + + arguments = [ + os.environ["sylph"], + "sketch", + "-t {}".format(opts.n_jobs), + "--gl {}".format(genome_filepaths_list), + "-o {}".format(os.path.join(opts.output_directory, "genome_database-{}".format(organism_type))), + "-k {}".format(opts.sylph_k), + "--min-spacing {}".format(opts.sylph_minimum_spacing), + ] + + if organism_type == "nonviral": + arguments += [ + "-c {}".format(opts.sylph_nonviral_subsampling_rate), + opts.sylph_nonviral_options, + ] + + else: + arguments += [ + "-c {}".format(opts.sylph_viral_subsampling_rate), + opts.sylph_viral_options, + ] + print(arguments, file=sys.stdout) + cmd = Command( + arguments, + name=name, + f_cmds=f_cmds, + ) + + + # Run command + cmd.run( + checkpoint_message_notexists="[Running ({})] | {}".format(format_duration(t0), description), + checkpoint_message_exists="[Loading Checkpoint ({})] | {}".format(format_duration(t0), description), + write_stdout=os.path.join(directories["log"], "{}.o".format(name)), + write_stderr=os.path.join(directories["log"], "{}.e".format(name)), + write_returncode=os.path.join(directories["log"], "{}.returncode".format(name)), + checkpoint=os.path.join(directories["checkpoints"], name), + ) + + if hasattr(cmd, "returncode_"): + if cmd.returncode_ != 0: + print("[Error] | {}".format(description), file=sys.stdout) + print("Check the following files:\ncat {}".format(os.path.join(directories["log"], "{}.*".format(name))), file=sys.stdout) + sys.exit(cmd.returncode_) + else: + output_filepath = os.path.join(opts.output_directory, "genome_database-{}.syldb".format(organism_type)) + size_bytes = os.path.getsize(output_filepath) + size_mb = size_bytes >> 20 + if size_mb < 1: + print("Output Database:", output_filepath, "({} bytes)".format(size_bytes), file=sys.stdout) + else: + print("Output Database:", output_filepath, "({} MB)".format(size_mb), file=sys.stdout) + + f_cmds.close() + +if __name__ == "__main__": + main(sys.argv[1:]) + + diff --git a/src/scripts/compile_eukaryotic_classifications.py b/src/scripts/compile_eukaryotic_classifications.py index 609526e..4841d85 100755 --- a/src/scripts/compile_eukaryotic_classifications.py +++ b/src/scripts/compile_eukaryotic_classifications.py @@ -6,7 +6,7 @@ from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] 
-__version__ = "2023.3.20" +__version__ = "2023.12.14" def main(args=None): @@ -16,20 +16,23 @@ def main(args=None): # Path info description = """ Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) - usage = "{} -i -s -c -o ".format(__program__) + usage = "{} -i -s -c -o ".format(__program__) epilog = "Copyright 2021 Josh L. Espinoza (jespinoz@jcvi.org)" # Parser parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) # Pipeline parser.add_argument("-i","--metaeuk_identifier_mapping", type=str, required=True, help = "path/to/identifier_mapping.metaeuk.tsv") - parser.add_argument("-s","--scaffolds_to_bins", type=str, required=True, help = "path/to/scaffolds_to_bins.tsv") - parser.add_argument("-c","--clusters", type=str, help = "path/to/clusters.tsv, Format: [id_mag][id_cluster], No header [Optional]") - parser.add_argument("-o","--output", type=str, default="stdout", help = "path/to/output.tsv [Default: stdout]") - parser.add_argument("--eukaryotic_database", type=str, default=None, required=True, help="path/to/eukaryotic_database (e.g. --arg 1 )") + parser.add_argument("-s","--scaffolds_to_bins", type=str, required=False, help = "path/to/scaffolds_to_bins.tsv") + # parser.add_argument("-g","--genes_to_contigs", type=str, required=False, help = "path/to/genes_to_contigs.tsv cannot be used with --scaffolds_to_bins") + parser.add_argument("-c","--clusters", type=str, help = "path/to/clusters.tsv, Format: [id_genome][id_cluster], No header [Optional]") + parser.add_argument("-o","--output", type=str, default="stdout", help = "path/to/gene-source_lineage.tsv [Default: stdout]") + parser.add_argument("-d", "--eukaryotic_database", type=str, default=None, required=True, help="path/to/eukaryotic_database directory (e.g. --arg 1 )") # parser.add_argument("--veba_database", type=str, default=None, help=f"VEBA database location. 
[Default: $VEBA_DATABASE environment variable]") parser.add_argument("--header", type=int, default=1, help="Include header in output {0=No, 1=Yes) [Default: 1]") parser.add_argument("--debug", action="store_true") + parser.add_argument("--remove_genes_with_missing_values", action="store_true") + parser.add_argument("--use_original_metaeuk_gene_identifiers", action="store_true") # Options opts = parser.parse_args() @@ -44,20 +47,15 @@ def main(args=None): # opts.eukaryotic_database = os.path.join(opts.veba_database, "Classify", "Microeukaryotic") # I/O - # Scaffolds -> Bins - fp = opts.scaffolds_to_bins - print("* Reading scaffolds to bins table {}".format(fp), file=sys.stderr) - scaffold_to_bin = pd.read_csv(fp, sep="\t", index_col=0, header=None).iloc[:,0] - if opts.debug: - print(fp, file=sys.stderr) - scaffold_to_bin.head().to_csv(sys.stderr, sep="\t", header=None) - print("\n", file=sys.stderr) + # SourceID -> Taxonomy fp = os.path.join(opts.eukaryotic_database,"source_taxonomy.tsv.gz") print("* Reading source taxonomy table {}".format(fp), file=sys.stderr) df_source_taxonomy = pd.read_csv(fp, sep="\t", index_col=0) df_source_taxonomy.index = df_source_taxonomy.index.map(str) + df_source_taxonomy = pd.DataFrame(df_source_taxonomy.to_dict()) # Hack for duplicate entries that will be resolved in MicroEuk_v3.1 + if opts.debug: print(fp, file=sys.stderr) df_source_taxonomy.head().to_csv(sys.stderr, sep="\t") @@ -65,7 +63,7 @@ def main(args=None): # VEBA -> SourceID fp = os.path.join(opts.eukaryotic_database,"target_to_source.dict.pkl.gz") - print("* Reading target to source mapping {}".format(fp), file=sys.stderr) + print("* Reading target to source mapping {} (Note: This one takes a little longer to load...)".format(fp), file=sys.stderr) with gzip.open(fp, "rb") as f: target_to_source = pickle.load(f) #target_to_source = pd.read_csv(fp, sep="\t", index_col=0, dtype=str, usecols=["id_veba", "id_source"], squeeze=True)#.iloc[:,0] @@ -83,32 +81,44 @@ def main(args=None): df_metaeuk.head().to_csv(sys.stderr, sep="\t") print("\n", file=sys.stderr) - orf_to_bitscore = df_metaeuk["bitscore"].map(float) - orf_to_scaffold = df_metaeuk["C_acc"].map(str) - orf_to_mag = orf_to_scaffold.map(lambda id_scaffold: scaffold_to_bin[id_scaffold]) - - orf_to_target = df_metaeuk["T_acc"] - orf_to_source = orf_to_target.map(lambda id_target: target_to_source.get(id_target,np.nan)) - if np.any(pd.isnull(orf_to_source)): + gene_to_bitscore = df_metaeuk["bitscore"].map(float) + gene_to_scaffold = df_metaeuk["C_acc"].map(str) + gene_to_genome = pd.Series([np.nan]*df_metaeuk.shape[0], index=df_metaeuk.index) + gene_to_target = df_metaeuk["T_acc"] + gene_to_source = gene_to_target.map(lambda id_target: target_to_source.get(id_target,np.nan)) + + if opts.scaffolds_to_bins: + # Scaffolds -> Bins + fp = opts.scaffolds_to_bins + print("* Reading scaffolds to bins table {}".format(fp), file=sys.stderr) + scaffold_to_bin = pd.read_csv(fp, sep="\t", index_col=0, header=None).iloc[:,0] + if opts.debug: + print(fp, file=sys.stderr) + scaffold_to_bin.head().to_csv(sys.stderr, sep="\t", header=None) + print("\n", file=sys.stderr) + gene_to_genome = gene_to_scaffold.map(lambda id_scaffold: scaffold_to_bin[id_scaffold]) + + if np.any(pd.isnull(gene_to_source)): warnings.warn("The following gene - target identifiers are not in the database file: {}".format( os.path.join(opts.eukaryotic_database,"target_to_source.dict.pkl.gz"), ), ) - orf_to_target[orf_to_source[orf_to_source.isnull()].index].to_frame().to_csv(sys.stderr, sep="\t", 
header=None) - orf_to_source = orf_to_source.dropna() + gene_to_target[gene_to_source[gene_to_source.isnull()].index].to_frame().to_csv(sys.stderr, sep="\t", header=None) + gene_to_source = gene_to_source.dropna() # Lineage - orf_to_lineage = OrderedDict() + gene_to_lineage = OrderedDict() missing_lineage = list() - for id_orf, id_source in tqdm(orf_to_source.items(), desc="Retrieving lineage", unit = " ORFs"): + for id_gene, id_source in tqdm(gene_to_source.items(), desc="Retrieving lineage", unit = " genes"): if id_source in df_source_taxonomy.index: lineage = df_source_taxonomy.loc[id_source, ["class", "order", "family", "genus", "species"]] # class order family genus species + lineage = lineage.fillna("") lineage = ";".join(map(lambda items: "".join(items), zip(["c__", "o__", "f__", "g__", "s__"], lineage))) - orf_to_lineage[id_orf] = lineage + gene_to_lineage[id_gene] = lineage else: missing_lineage.append(id_source) - orf_to_lineage = pd.Series(orf_to_lineage) + gene_to_lineage = pd.Series(gene_to_lineage) if len(missing_lineage): warnings.warn("The following source identifiers are not in the database file: {}\n{}`".format( @@ -118,31 +128,47 @@ def main(args=None): ) # Output - # ["id_orf", "id_mag", "bitscore", "lineage"] - df_orf_classifications = pd.concat([ - orf_to_scaffold.to_frame("id_scaffold"), - orf_to_mag.to_frame("id_mag"), - orf_to_target.to_frame("id_target"), - orf_to_source.to_frame("id_source"), - orf_to_lineage.to_frame("lineage"), - orf_to_bitscore.to_frame("bitscore"), - ], - axis=1) - df_orf_classifications.index.name = "id_gene" + df_gene_classifications = pd.DataFrame({ + "id_scaffold":gene_to_scaffold, + "id_genome":gene_to_genome, + "id_target":gene_to_target, + "id_source":gene_to_source, + "lineage":gene_to_lineage, + "bitscore":gene_to_bitscore, + }) + df_gene_classifications.index.name = "id_gene" + + + # df_gene_classifications = pd.concat([ + # gene_to_scaffold.to_frame("id_scaffold"), + # gene_to_genome.to_frame("id_genome"), + # gene_to_target.to_frame("id_target"), + # gene_to_source.to_frame("id_source"), + # gene_to_lineage.to_frame("lineage"), + # gene_to_bitscore.to_frame("bitscore"), + # ], + # axis=1) + # df_gene_classifications.index.name = "id_gene" # Add clusters if provided if opts.clusters: if opts.clusters != "None": # Hack for when called internally - mag_to_cluster = pd.read_csv(opts.clusters, sep="\t", index_col=0, header=None).iloc[:,0] - orf_to_cluster = orf_to_mag.map(lambda id_orf: mag_to_cluster[id_orf]) - df_orf_classifications.insert(loc=2, column="id_cluster", value=orf_to_cluster) + genome_to_cluster = pd.read_csv(opts.clusters, sep="\t", index_col=0, header=None).iloc[:,0] + gene_to_cluster = gene_to_genome.map(lambda id_gene: genome_to_cluster[id_gene]) + df_gene_classifications.insert(loc=2, column="id_cluster", value=gene_to_cluster) # Output if opts.output == "stdout": opts.output = sys.stdout - df_orf_classifications = df_orf_classifications.dropna(how="any", axis=0) - df_orf_classifications.to_csv(opts.output, sep="\t", header=bool(opts.header)) + if opts.remove_genes_with_missing_values: + df_gene_classifications = df_gene_classifications.dropna(how="any", axis=0) + + if not opts.use_original_metaeuk_gene_identifiers: + metaeuk_to_gene = df_metaeuk["gene_id"].to_dict() + df_gene_classifications.index = df_gene_classifications.index.map(lambda x: metaeuk_to_gene[x]) + + df_gene_classifications.to_csv(opts.output, sep="\t", header=bool(opts.header)) diff --git 
a/src/scripts/compile_prokaryotic_genome_cluster_classification_scores_table.py b/src/scripts/compile_prokaryotic_genome_cluster_classification_scores_table.py index 1e4a031..3ee1147 100755 --- a/src/scripts/compile_prokaryotic_genome_cluster_classification_scores_table.py +++ b/src/scripts/compile_prokaryotic_genome_cluster_classification_scores_table.py @@ -30,7 +30,6 @@ def main(argv=None): parser_io.add_argument("--fill_missing_weight", type=float, help = "Fill missing weight between [0, 100.0]. [Default is to throw error if value is missing]") parser_io.add_argument("--header", action="store_true", help = "Include header") - # Options opts = parser.parse_args() opts.script_directory = script_directory diff --git a/src/scripts/compile_reads_table.py b/src/scripts/compile_reads_table.py index 3b3edd8..8075113 100755 --- a/src/scripts/compile_reads_table.py +++ b/src/scripts/compile_reads_table.py @@ -7,7 +7,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.8.28" +__version__ = "2023.12.18" def parse_basename(query: str, naming_scheme: str): """ @@ -43,6 +43,7 @@ def main(args=None): parser_preprocess_directory = parser.add_argument_group('[Mode 1] Preprocess Directory arguments') parser_preprocess_directory.add_argument("-i","--preprocess_directory", type=str, help = "path/to/preprocess directory (e.g., veba_output/preprocess) [Cannot be used with --fastq_directory]") parser_preprocess_directory.add_argument("-b","--basename", default="cleaned", type=str, help = "File basename to search VEBA preprocess directory [preprocess_directory]/[id_sample]/[output]/[basename]_1/2.fastq.gz [Default: cleaned]") + parser_preprocess_directory.add_argument("-L","--long", action="store_true", help = "Use if reads are ONT or PacBio") parser_fastq_directory = parser.add_argument_group('[Mode 2] Fastq Directory arguments') parser_fastq_directory.add_argument("-f","--fastq_directory", type=str, help = "path/to/fastq_directory [Cannot be used with --preprocess_directory]") @@ -55,6 +56,7 @@ def main(args=None): parser_output.add_argument("-0", "--sample_label", default="sample-id", type=str, help = "Sample ID column label [Reverse: sample-id]") parser_output.add_argument("-1", "--forward_label", default="forward-absolute-filepath", type=str, help = "Forward filepath column label [Default: forward-absolute-filepath]") parser_output.add_argument("-2", "--reverse_label", default="reverse-absolute-filepath", type=str, help = "Reverse filepath column label [Default: reverse-absolute-filepath]") + parser_output.add_argument("-3", "--long_label", default="reads-filepath", type=str, help = "Long reads filepath column label [Default: reads-filepath]") parser_output.add_argument("--header", action="store_true", help = "Write header") parser_output.add_argument("--volume_prefix", type=str, help = "Docker container prefix to volume path") @@ -69,27 +71,41 @@ def main(args=None): output = defaultdict(dict) # Build table from preprocess directory if opts.preprocess_directory: - for fp in glob.glob(os.path.join(opts.preprocess_directory, "*", "output", "{}_1.fastq.gz".format(opts.basename))): - id_sample = fp.split("/")[-3] - output[id_sample][opts.forward_label] = fp - for fp in glob.glob(os.path.join(opts.preprocess_directory, "*", "output", "{}_2.fastq.gz".format(opts.basename))): - id_sample = fp.split("/")[-3] - output[id_sample][opts.reverse_label] = fp - # Build table from fastq directory - if opts.fastq_directory: - for fp in 
glob.glob(os.path.join(opts.fastq_directory, "*.{}".format(opts.extension))): - basename = fp.split("/")[-1] - id_sample, direction = parse_basename(basename, naming_scheme=opts.naming_scheme) - # id_sample = "_R".join(basename.split("_R")[:-1]) - if direction == "1": + if not opts.long: + for fp in glob.glob(os.path.join(opts.preprocess_directory, "*", "output", "{}_1.fastq.gz".format(opts.basename))): + id_sample = fp.split("/")[-3] output[id_sample][opts.forward_label] = fp - if direction == "2": + for fp in glob.glob(os.path.join(opts.preprocess_directory, "*", "output", "{}_2.fastq.gz".format(opts.basename))): + id_sample = fp.split("/")[-3] output[id_sample][opts.reverse_label] = fp - df_output = pd.DataFrame(output).T.sort_index().loc[:,[opts.forward_label, opts.reverse_label]] + else: + for fp in glob.glob(os.path.join(opts.preprocess_directory, "*", "output", "{}.fastq.gz".format(opts.basename))): + id_sample = fp.split("/")[-3] + output[id_sample][opts.long_label] = fp + + # Build table from fastq directory + if opts.fastq_directory: + if not opts.long: + for fp in glob.glob(os.path.join(opts.fastq_directory, "*.{}".format(opts.extension))): + basename = fp.split("/")[-1] + id_sample, direction = parse_basename(basename, naming_scheme=opts.naming_scheme) + # id_sample = "_R".join(basename.split("_R")[:-1]) + if direction == "1": + output[id_sample][opts.forward_label] = fp + if direction == "2": + output[id_sample][opts.reverse_label] = fp + else: + print("Long reads support with -L is currently only available with --preprocess_directory and not --fastq_directory", file=sys.stderr) + sys.exit(1) + + if not opts.long: + df_output = pd.DataFrame(output).T.sort_index().loc[:,[opts.forward_label, opts.reverse_label]] + else: + df_output = pd.DataFrame(output).T.sort_index().loc[:,[opts.long_label]] df_output.index.name = opts.sample_label # Check missing values - missing_values = df_output.notnull().sum(axis=1)[lambda x: x < 2].index + missing_values = df_output.notnull().sum(axis=1)[lambda x: x < df_output.shape[1]].index assert missing_values.size == 0, "Missing fastq for the following samples: {}".format(list(missing_values)) # Absolute paths @@ -97,10 +113,14 @@ def main(args=None): df_output = df_output.applymap(lambda fp: os.path.abspath(fp)) else: if opts.header: - if "absolute" in opts.forward_label.lower(): - print("You've selected --relative and may want to either not use a header or remove 'absolute' from the --forward_label: {}".format(opts.forward_label), file=sys.stderr) - if "absolute" in opts.reverse_label.lower(): - print("You've selected --relative and may want to either not use a header or remove 'absolute' from the --reverse_label: {}".format(opts.reverse_label), file=sys.stderr) + if not opts.long: + if "absolute" in opts.forward_label.lower(): + print("You've selected --relative and may want to either not use a header or remove 'absolute' from the --forward_label: {}".format(opts.forward_label), file=sys.stderr) + if "absolute" in opts.reverse_label.lower(): + print("You've selected --relative and may want to either not use a header or remove 'absolute' from the --reverse_label: {}".format(opts.reverse_label), file=sys.stderr) + else: + if "absolute" in opts.long_label.lower(): + print("You've selected --relative and may want to either not use a header or remove 'absolute' from the --long_label: {}".format(opts.long_label), file=sys.stderr) # Docker volume prefix if opts.volume_prefix: diff --git a/src/scripts/concatenate_assembly.py 
b/src/scripts/concatenate_assembly.py new file mode 100755 index 0000000..bcec4ff --- /dev/null +++ b/src/scripts/concatenate_assembly.py @@ -0,0 +1,99 @@ +#!/usr/bin/env python +import sys, os, argparse, gzip +from Bio.SeqIO.FastaIO import SimpleFastaParser +from tqdm import tqdm + +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.12.18" + +def main(args=None): + # Path info + script_directory = os.path.dirname(os.path.abspath( __file__ )) + script_filename = __program__ + + # Path info + description = """ + Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -i -o )".format(__program__) + epilog = "Copyright 2021 Josh L. Espinoza (jespinoz@jcvi.org)" + + # Parser + parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) + + # Pipeline + parser.add_argument("-i","--input", default="stdin", type=str, help = "Input fasta file") + parser.add_argument("-o","--output", default="stdout", type=str, help = "Output fasta file") + parser.add_argument("-n", "--name", type=str, required=True, help = "Name to use for pseudo-scaffold") + parser.add_argument("-N", "--pad", type=int, default=100, help = "Number of N to use for joining contigs") + parser.add_argument("-d", "--description", type=str, help = "Description to use [Default: Input filepath]") + parser.add_argument("-m","--minimum_sequence_length", default=1, type=int, help = "Minimum sequence length accepted [Default: 1]") + parser.add_argument("-w","--wrap", default=1000, type=int, help = "Wrap fasta. Use 0 for no wrapping [Default: 1000]") + + # Options + opts = parser.parse_args() + opts.script_directory = script_directory + opts.script_filename = script_filename + + assert opts.minimum_sequence_length > 0 + assert opts.pad >= 0 + + # Input + f_in = None + if opts.input == "stdin": + f_in = sys.stdin + else: + if opts.input.endswith(".gz"): + f_in = gzip.open(opts.input, "rt") + else: + f_in = open(opts.input, "r") + assert f_in is not None + + # Output + f_out = None + if opts.output == "stdout": + f_out = sys.stdout + else: + if opts.output.endswith(".gz"): + f_out = gzip.open(opts.output, "wt") + else: + f_out = open(opts.output, "w") + assert f_out is not None + + # Concatenated assembly + + if not opts.description: + opts.description = "assembly_filepath: {}".format(opts.input) + else: + if opts.description == "NONE": + opts.description = "" + pseudoscaffold_header = "{} {}".format(opts.name, opts.description) + + print(">{}".format(pseudoscaffold_header), file=f_out) + sequences = list() + for header, seq in tqdm(SimpleFastaParser(f_in), "Reading fasta input"): + if len(seq) >= opts.minimum_sequence_length: + sequences.append(seq) + number_of_sequences = len(sequences) + sequences = ("N"*opts.pad).join(sequences) + + # Open output file + if opts.wrap > 0: + for i in range(0, len(sequences), opts.wrap): + wrapped_sequence = sequences[i:i+opts.wrap] + # Write header and wrapped sequence + print(wrapped_sequence, file=f_out) + else: + print(sequences, file=f_out) + + + # Close + if f_in != sys.stdin: + f_in.close() + if f_out != sys.stdout: + f_out.close() + +if __name__ == "__main__": + main() + + + diff --git a/src/scripts/concatenate_fasta.py b/src/scripts/concatenate_fasta.py index 0977942..38d12ae 100755 --- a/src/scripts/concatenate_fasta.py +++ b/src/scripts/concatenate_fasta.py @@ -1,6 +1,6 @@ #!/usr/bin/env python from __future__ import print_function, division 
-import sys, os, argparse +import sys, os, argparse, hashlib import pandas as pd from Bio.SeqIO.FastaIO import SimpleFastaParser @@ -12,45 +12,7 @@ pd.options.display.max_colwidth = 100 # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2022.02.17" - - - -def fasta_to_saf(path, compression="infer"): - """ - # GeneID Chr Start End Strand - # http://bioinf.wehi.edu.au/featureCounts/ - - # Useful: - import re - record_id = "lcl|NC_018632.1_cds_WP_039228897.1_1 [gene=dnaA] [locus_tag=MASE_RS00005] [protein=chromosomal replication initiator protein DnaA] [protein_id=WP_039228897.1] [location=410..2065] [gbkey=CDS]" - re.search("\[locus_tag=(\w+)\]", record_id).group(1) - # 'MASE_RS00005' - - """ - - - saf_data = list() - - if path == "stdin": - f = sys.stdin - else: - f = get_file_object(path, mode="read", compression=compression, verbose=False) - - for id_record, seq in pv(SimpleFastaParser(f), "Reading sequences [{}]".format(path)): - id_record = id_record.split(" ")[0] - fields = [ - id_record, - id_record, - 1, - len(seq), - "+", - ] - saf_data.append(fields) - if f is not sys.stdin: - f.close() - return pd.DataFrame(saf_data, columns=["GeneID", "Chr", "Start", "End", "Strand"]) - +__version__ = "2023.12.13" def main(args=None): # Path info @@ -120,7 +82,15 @@ def main(args=None): safe_mode=False, verbose=False, ) + saf_filepath = os.path.join(opts.output_directory, "{}.saf".format(id_sample)) + + f_duplicates = get_file_object( + path=os.path.join(opts.output_directory, "{}.duplicates_removed.list".format(id_sample)), + mode="write", + safe_mode=False, + verbose=False, + ) else: os.makedirs(os.path.join(opts.output_directory, id_sample), exist_ok=True) @@ -130,29 +100,43 @@ def main(args=None): safe_mode=False, verbose=False, ) + saf_filepath = os.path.join(opts.output_directory, id_sample, "{}.saf".format(opts.basename)) + f_duplicates = get_file_object( + path=os.path.join(opts.output_directory, id_sample, "{}.duplicates_removed.list".format(opts.basename)), + mode="write", + safe_mode=False, + verbose=False, + ) # Read input fasta, filter out short sequences, and write to concatenated file + sequence_hashes = set() saf_data = list() for fp in pv(filepaths, description=id_sample, unit= " files"): f_query = get_file_object(fp, mode="read", verbose=False) for id, seq in SimpleFastaParser(f_query): if len(seq) >= opts.minimum_contig_length: - print(">{}\n{}".format(id, seq), file=f_out) + id_hash = hashlib.md5(seq.upper().encode()).hexdigest() id_record = id.split(" ")[0] - fields = [ - id_record, - id_record, - 1, - len(seq), - "+", - ] - saf_data.append(fields) + if id_hash not in sequence_hashes: + print(">{}\n{}".format(id, seq), file=f_out) + fields = [ + id_record, + id_record, + 1, + len(seq), + "+", + ] + saf_data.append(fields) + sequence_hashes.add(id_hash) + else: + print(id_record, file=f_duplicates) f_query.close() f_out.close() + f_duplicates.close() df_saf = pd.DataFrame(saf_data, columns=["GeneID", "Chr", "Start", "End", "Strand"]) df_saf.to_csv(saf_filepath, sep="\t", index=None) @@ -173,26 +157,39 @@ def main(args=None): saf_filepath = os.path.join(opts.output_directory, "{}.saf".format(opts.basename)) + f_duplicates = get_file_object( + path=os.path.join(opts.output_directory, "{}.duplicates_removed.list".format(opts.basename)), + mode="write", + safe_mode=False, + verbose=False, + ) + # Read input fasta, filter out short sequences, and write to concatenated file + sequence_hashes = set() saf_data = list() for fp in pv(filepaths, unit= " 
files"): f_query = get_file_object(fp, mode="read", verbose=False) for id, seq in SimpleFastaParser(f_query): if len(seq) >= opts.minimum_contig_length: - print(">{}\n{}".format(id, seq), file=f_out) + id_hash = hashlib.md5(seq.upper().encode()).hexdigest() id_record = id.split(" ")[0] - fields = [ - id_record, - id_record, - 1, - len(seq), - "+", - ] - saf_data.append(fields) - + if id_hash not in sequence_hashes: + print(">{}\n{}".format(id, seq), file=f_out) + fields = [ + id_record, + id_record, + 1, + len(seq), + "+", + ] + saf_data.append(fields) + else: + print(id_record, file=f_duplicates) f_query.close() f_out.close() + f_duplicates.close() + df_saf = pd.DataFrame(saf_data, columns=["GeneID", "Chr", "Start", "End", "Strand"]) df_saf.to_csv(saf_filepath, sep="\t", index=None) diff --git a/src/scripts/consensus_genome_classification_ranked.py b/src/scripts/consensus_genome_classification_ranked.py new file mode 100755 index 0000000..2c190fa --- /dev/null +++ b/src/scripts/consensus_genome_classification_ranked.py @@ -0,0 +1,222 @@ +#!/usr/bin/env python +from __future__ import print_function, division +import sys, os, argparse +from collections import OrderedDict, defaultdict +import pandas as pd +import numpy as np + + +pd.options.display.max_colwidth = 100 +# from tqdm import tqdm +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.11.3" + +# RANK_TO_PREFIX="superkingdom:d__,phylum:p__,class:c__,order:o__,family:f__,genus:g__,species:s__" + +RANK_PREFIXES="d__,p__,c__,o__,f__,g__,s__" + +# Fill empty taxonomic levels for consensus classification +def fill_lower_taxonomy_levels( + classifications:pd.Series, + rank_prefixes:list, + delimiter:str=";", + ): + + rank_prefixes = list(rank_prefixes) + number_of_taxonomic_levels = len(rank_prefixes) + classifications_ = dict() + for id_genome, classification in pd.Series(classifications).items(): + taxonomy = classification.split(delimiter) + classifications_[id_genome] = delimiter.join(taxonomy + rank_prefixes[len(taxonomy):]) + return pd.Series(classifications_)[classifications.index] + +# Get consensus classification +def get_consensus_classification( + classification:pd.Series, + classification_weights:pd.Series, + genome_to_genomecluster:pd.Series, + rank_prefixes:list, + number_of_taxonomic_levels="infer", + delimiter=";", + leniency:float=1.382, + ): + # Assertions + assert np.all(classification.notnull()) + assert np.all(classification_weights.notnull()) + assert np.all(genome_to_genomecluster.notnull()) + + # Set and index overlap + a = set(classification.index) + b = set(classification_weights.index) + c = set(genome_to_genomecluster.index) + assert a == b, "`classification` and `classification_weights` must have the same keys in the index" + assert a <= c, "`classification` and `classification_weights` must be a subset (or equal) to the keys in `genome_to_genomecluster` index" + index_genomes = pd.Index(sorted(a & b & c )) + classification = classification[index_genomes] + classification_weights = classification_weights[index_genomes] + genome_to_genomecluster = genome_to_genomecluster[index_genomes] + + # Taxonomic levels + taxonomic_levels = classification.map(lambda x: x.count(delimiter)).unique() + if len(taxonomic_levels): + assert len(taxonomic_levels) == 1, "Taxonomic levels in `classification` should all have the same number of delimiters" #! 
Might need to change this to allow for missing taxonomic levels + else: + number_of_taxonomic_levels = 1 + + if number_of_taxonomic_levels == "infer": + number_of_taxonomic_levels = taxonomic_levels[0] + 1 + + # Scaling factors + scaling_factors = np.arange(1, number_of_taxonomic_levels + 1) # d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Actinomycetales;f__Dermabacteraceae;g__Brachybacterium + scaling_factors = np.power(scaling_factors, leniency) + + # Get container for scores [SLC -> Taxonomy -> Score] + # + # For example the following MAG: + # CLASSIFICATION=d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium;s__Corynebacterium aurimucosum_E + # MSA_PERCENT=80.0 + # + # Would be stored and appended for its corresponding SLC: + # d__Bacteria += 80.0 + # d__Bacteria;p__Actinobacteriota += 80.0 + # d__Bacteria;p__Actinobacteriota;c__Actinomycetia += 80.0 + # d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Mycobacteriales += 80.0 + # d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Mycobacteriales;f__Mycobacteriaceae += 80.0 + # d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium += 80.0 + # d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Mycobacteriales;f__Mycobacteriaceae;g__Corynebacterium;s__Corynebacterium aurimucosum_E += 80.0 + genomecluster_taxa_scores = defaultdict(lambda: defaultdict(float)) + + # Iterate through MAG, classification, and score + df = pd.concat([genome_to_genomecluster.to_frame("id"), classification.to_frame("classification"), classification_weights.to_frame("weight")], axis=1) + genomecluster_to_genomes = defaultdict(list) + for id_genome, (id_genome_cluster, classification, w) in df.iterrows(): + genomecluster_to_genomes[id_genome_cluster].append(id_genome) + # Split the taxonomy classification by levels + levels = classification.split(delimiter) + # Remove the empty taxonomy levels (e.g., g__Corynebacterium;s__ --> g__Corynebacterium) + # levels = list(filter(lambda x:x not in rank_prefixes, levels)) + number_of_query_levels = len(levels) + # Iterate through each level, scale score by the leniency weights, and add to running sum + for i in range(1, number_of_query_levels + 1): + taxon_at_level = levels[i-1] + taxon_level_is_missing = taxon_at_level in rank_prefixes + if taxon_level_is_missing: + weighted_score = 0.0 + print("`{}` is missing taxonomic level `{}`".format(id_genome, taxon_at_level), file=sys.stderr) + + else: + weighted_score = float(w) * scaling_factors[i-1] + genomecluster_taxa_scores[id_genome_cluster][tuple(levels[:i])] += weighted_score + genomecluster_to_genomes = pd.Series(genomecluster_to_genomes)
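+ # Worked example of the scaling above (hypothetical weights): with the default leniency
+ # of 1.382, scaling_factors for 7 ranks are [1**1.382, 2**1.382, ..., 7**1.382], i.e.,
+ # approximately [1.0, 2.6, 4.6, 6.8, 9.2, 11.9, 14.7]. A genome classified to species with
+ # weight 80.0 therefore adds 80.0*1.0 to its domain path, 80.0*2.6 to its domain;phylum
+ # path, and so on, so deeper paths shared across a cluster accumulate disproportionately
+ # higher scores before the selection in the dataframe below.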
df_consensus_classification["classifications"] = df["classification"].groupby(genome_to_genomecluster).apply(lambda x: list(x)) + df_consensus_classification["weights"] = df["weight"].groupby(genome_to_genomecluster).apply(lambda x: list(x)) + df_consensus_classification.index.name = "id" + + # Homogeneity + slc_taxa_homogeneity = defaultdict(lambda: defaultdict(float)) + for id_genome_cluster, (classifications, weights) in df_consensus_classification[["classifications", "weights"]].iterrows(): + for (c, w) in zip(classifications, weights): + slc_taxa_homogeneity[id_genome_cluster][c] += w + df_consensus_classification["homogeneity"] = pd.DataFrame(slc_taxa_homogeneity).T.apply(lambda x: np.nanmax(x)/np.nansum(x), axis=1) + + fields = [ + "consensus_classification", + "homogeneity", + "number_of_unique_classifications", + "number_of_components", + "components", + "classifications", + "weights", + "score", + ] + return df_consensus_classification.loc[:,fields] + + + +def main(args=None): + # Path info + script_directory = os.path.dirname(os.path.abspath( __file__ )) + script_filename = __program__ + # Path info + description = """ + Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -i -o ".format(__program__) + epilog = "Copyright 2022 Josh L. Espinoza (jespinoz@jcvi.org)" + + # Parser + parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) + # Pipeline + parser.add_argument("-i","--input", default="stdin", type=str, help = "path/to/genome_to_classification.tsv [id_genome][id_genome_cluster][classification][weight]; No header. [Default: stdin]") + parser.add_argument("-o","--output", type=str, default="stdout", help = "Output table with consensus classification [Default: stdout]") + parser.add_argument("-l","--leniency", default=1.382, type=float, help = "Leniency parameter. Lower value means more conservative weighting. A value of 1 indiciates no weight bias. A value greater than 1 puts higher weight on higher level taxonomic assignments. A value less than 1 puts lower weights on higher level taxonomic assignments. [Default: 1.382]") + parser.add_argument("-r", "--rank_prefixes", type=str, default=RANK_PREFIXES, help = "Rank prefixes separated by , delimiter'\n[Default: {}]".format(RANK_PREFIXES)) + parser.add_argument("-d", "--delimiter", type=str, default=";", help = "Taxonomic delimiter [Default: ; ]") + parser.add_argument("-s", "--simple", action="store_true", help = "Simple classification that does not use lineage information from --rank_prefixes") + # parser.add_argument("--assert_resolved_taxonomy", action="store_true", help = "Do not allow missing taxonomic levels. (e.g., d__Eukaryota;p__;c__Pelagophyceae;o__Pelagomonadales;f__;g__Aureococcus;s__Aureococcus anophagefferens is missing phylum)") + parser.add_argument("--remove_missing_classifications", action="store_true", help = "Remove all classifications and weights that are null. 
+ +def main(args=None): + # Path info + script_directory = os.path.dirname(os.path.abspath( __file__ )) + script_filename = __program__ + # Path info + description = """ + Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -i <input.tsv> -o <output.tsv>".format(__program__) + epilog = "Copyright 2022 Josh L. Espinoza (jespinoz@jcvi.org)" + + # Parser + parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) + # Pipeline + parser.add_argument("-i","--input", default="stdin", type=str, help = "path/to/genome_to_classification.tsv [id_genome][id_genome_cluster][classification][weight]; No header. [Default: stdin]") + parser.add_argument("-o","--output", type=str, default="stdout", help = "Output table with consensus classification [Default: stdout]") + parser.add_argument("-l","--leniency", default=1.382, type=float, help = "Leniency parameter. Lower value means more conservative weighting. A value of 1 indicates no weight bias. A value greater than 1 puts higher weight on higher level taxonomic assignments. A value less than 1 puts lower weights on higher level taxonomic assignments. [Default: 1.382]") + parser.add_argument("-r", "--rank_prefixes", type=str, default=RANK_PREFIXES, help = "Rank prefixes separated by ',' delimiter\n[Default: {}]".format(RANK_PREFIXES)) + parser.add_argument("-d", "--delimiter", type=str, default=";", help = "Taxonomic delimiter [Default: ; ]") + parser.add_argument("-s", "--simple", action="store_true", help = "Simple classification that does not use lineage information from --rank_prefixes") + # parser.add_argument("--assert_resolved_taxonomy", action="store_true", help = "Do not allow missing taxonomic levels. (e.g., d__Eukaryota;p__;c__Pelagophyceae;o__Pelagomonadales;f__;g__Aureococcus;s__Aureococcus anophagefferens is missing phylum)") + parser.add_argument("--remove_missing_classifications", action="store_true", help = "Remove all classifications and weights that are null. For viruses this could cause an error if this isn't selected.") + parser.add_argument("-u", "--unclassified_label", default="Unclassified", type=str, help = "Unclassified label [Default: Unclassified]") + parser.add_argument("-w", "--unclassified_weight", default=100.0, type=float, help = "Unclassified label weight [Default: 100.0]") + + + # Options + opts = parser.parse_args() + opts.script_directory = script_directory + opts.script_filename = script_filename + + # I/O + if opts.input == "stdin": + opts.input = sys.stdin + + if opts.output == "stdout": + opts.output = sys.stdout + + # Leniency + assert opts.leniency > 0, "--leniency must be > 0" + # Format rank to lineage + opts.rank_prefixes = opts.rank_prefixes.strip().split(",") + + # Classifications + df_input = pd.read_csv(opts.input, sep="\t", index_col=0, header=None) + genome_to_genomecluster = df_input.iloc[:,0] + genome_to_classification = df_input.iloc[:,1].reindex(genome_to_genomecluster.index) + genome_to_weights = df_input.iloc[:,2].reindex(genome_to_genomecluster.index) + if opts.remove_missing_classifications: + genome_to_weights = genome_to_weights.dropna() + genome_to_classification = genome_to_classification[genome_to_weights.index] + else: + mask = genome_to_weights.isnull() + genome_to_classification[mask] = ";".join(map(lambda x: f"{x}{opts.unclassified_label}", opts.rank_prefixes)) # Rank prefixes already end with "__" + genome_to_weights[mask] = opts.unclassified_weight + + + # Consensus classification + df_consensus_classification = get_consensus_classification( + classification=genome_to_classification, + classification_weights=genome_to_weights, + genome_to_genomecluster=genome_to_genomecluster, + rank_prefixes=opts.rank_prefixes, + number_of_taxonomic_levels="infer", + delimiter=opts.delimiter, + leniency=opts.leniency, + ) + + if not opts.simple: + # Fill empty taxonomy levels + df_consensus_classification["consensus_classification"] = fill_lower_taxonomy_levels( + classifications=df_consensus_classification["consensus_classification"], + rank_prefixes=opts.rank_prefixes, + delimiter=opts.delimiter, + ) + + df_consensus_classification.to_csv(opts.output, sep="\t") + +if __name__ == "__main__": + main()
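A minimal sketch of driving the script above (hypothetical filenames; the input is the four-column table described for --input and the flags are the ones defined in the parser):

consensus_genome_classification_ranked.py -i genome_to_classification.tsv -o consensus_classification.tsv -l 1.382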
{}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -i -l genome -o ".format(__program__) + epilog = "Copyright 2021 Josh L. Espinoza (jespinoz@jcvi.org)" + + # Parser + parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) + # Pipeline + parser.add_argument("-i","--annotation_results", type=str, default = "stdin", help = "path/to/annotation.tsv from annotate.py [Default: stdin]") + parser.add_argument("-X","--counts", type=str, required = True, help = "path/to/X_orfs.tsv[.gz] from mapping.py at the ORF/gene/protein level. Rows=Samples, Columns=Genes") + parser.add_argument("-g","--genes", type=str, help = "path/to/genes.ffn[.gz] fasta used for scaling-factors") + parser.add_argument("-o","--output_directory", type=str, default="phylogenomic_functional_categories", help = "path/to/output_directory [Default: phylogenomic_functional_categories]") + parser.add_argument("-l","--level", type=str, default="genome_cluster", help = "level {genome, genome_cluster} [Default: genome_cluster]") + parser.add_argument("--minimum_count", type=float, default=1.0, help = "Minimum count to include gene [Default: 1 ]") + parser.add_argument("--veba_database", type=str, help = "VEBA Database [Default: $VEBA_DATABASE environment variable]") + + # parser.add_argument("-p", "--include_protein_identifiers", action="store_true", help = "Write protein identifiers") + # parser.add_argument("--header", action="store_true", help = "Write header") + + # Options + opts = parser.parse_args() + opts.script_directory = script_directory + opts.script_filename = script_filename + + assert opts.level in {"genome", "genome_cluster"}, "--level must be either {genome, genome_cluster}" + + if opts.level == "genome": + level_field = ("Identifiers", "id_genome") + if opts.level == "genome_cluster": + level_field = ("Identifiers", "id_genome_cluster") + + if not opts.veba_database: + opts.veba_database = os.environ["VEBA_DATABASE"] + + os.makedirs(opts.output_directory, exist_ok=True) + # os.makedirs(os.path.join(opts.output_directory, opts.level), exist_ok=True) + + # Read annotations + if opts.annotation_results == "stdin": + opts.annotation_results = sys.stdin + df_annotations = pd.read_csv(opts.annotation_results, sep="\t", index_col=0, header=[0,1]) + protein_to_organism = df_annotations[level_field] + + # KEGG Database + delimiters = [",","_","-","+"] + + # Load MicrobeAnnotator KEGG dictionaries + module_to_kos__unprocessed = defaultdict(set) + for fp in glob.glob(os.path.join(opts.veba_database, , "*.pkl")): + with open(fp, "rb") as f: + d = pickle.load(f) + + for id_module, v1 in d.items(): + if isinstance(v1, list): + try: + module_to_kos__unprocessed[id_module].update(v1) + except TypeError: + for v2 in v1: + module_to_kos__unprocessed[id_module].update(v2) + else: + for k2, v2 in v1.items(): + if isinstance(v2, list): + try: + module_to_kos__unprocessed[id_module].update(v2) + except TypeError: + for v3 in v2: + module_to_kos__unprocessed[id_module].update(v3) + + # Flatten the KEGG orthologs + module_to_kos = dict() + for id_module, kos_unprocessed in module_to_kos__unprocessed.items(): + kos_processed = set() + for id_ko in kos: + composite=False + for sep in delimiters: + if sep in id_ko: + id_ko = id_ko.replace(sep,";") + composite = True + if composite: + kos_composite = set(map(str.strip, filter(bool, id_ko.split(";")))) + kos_processed.update(kos_composite) + else: + kos_processed.add(id_ko) + 
module_to_kos[id_module] = kos_processed + + # Read counts + X_counts = pd.read_csv(opts.counts, sep="\t", index_col=0) + + # Organisms + organisms = df_annotations[level_field].unique() + + # Organizing KOs + organism_to_kos = defaultdict(set) + protein_to_kos = dict() + kos_global = list() + for id_protein, (id_organism, ko_ids) in tqdm(df_annotations.loc[:,[level_field, ("KOFAM", "ids")]].iterrows(), "Compiling KO identifiers", total=df_annotations.shape[0]): + ko_ids = eval(ko_ids) + if len(ko_ids): + ko_ids = set(ko_ids) + protein_to_kos[id_protein] = ko_ids + organism_to_kos[id_organism].update(ko_ids) + for id_ko in ko_ids: + kos_global.append([id_protein, id_organism, id_ko]) + df_kos_global = pd.DataFrame(kos_global, columns=["id_protein", level_field[1], "id_kegg-ortholog"]) + del kos_global + df_kos_global.to_csv(os.path.join(opts.output_directory, "kos.{}s.tsv".format(opts.level)), sep="\t", index=False) + + # Sample -> Organisms -> KOs + sample_to_organism_to_kos = defaultdict(lambda: defaultdict(set)) + for id_sample, row in X_counts.iterrows(): + for id_protein, count in tqdm(row.items(), total=X_counts.shape[1]): + if id_protein in protein_to_kos: + if count >= opts.minimum_count: + id_organism = protein_to_organism[id_protein] + kos = protein_to_kos[id_protein] + sample_to_organism_to_kos[id_sample][id_organism].update(kos) + + + + + + + + + +if __name__ == "__main__": + main() diff --git a/src/scripts/devel/representative_genome_from_networkx_graph.py b/src/scripts/devel/representative_genome_from_networkx_graph.py new file mode 100755 index 0000000..4a82f64 --- /dev/null +++ b/src/scripts/devel/representative_genome_from_networkx_graph.py @@ -0,0 +1,52 @@ +#!/usr/bin/env python +import sys, os, argparse, gzip +from Bio.SeqIO.FastaIO import SimpleFastaParser +from tqdm import tqdm + +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.11.10" + +def main(args=None): + # Path info + script_directory = os.path.dirname(os.path.abspath( __file__ )) + script_filename = __program__ + + # Path info + description = """ + Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -i -o )".format(__program__) + epilog = "Copyright 2021 Josh L. 
Espinoza (jespinoz@jcvi.org)" + + # Parser + parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) + + # Pipeline + parser.add_argument("-c","--genome_to_cluster", type=str, help = "Input fasta file") + parser.add_argument("-g","--graph", type=str, help = "Input fasta file") + parser.add_argument("-o","--output", default="stdout", type=str, help = "Output fasta file") + parser.add_argument("-m","--maximum_weight", default=100, type=str, help = "Output fasta file") + parser.add_argument("--genome_statistics", type=str, help = "Output fasta file") + parser.add_argument("--sort_by", type=str, help = "Output fasta file") + + # Options + opts = parser.parse_args() + opts.script_directory = script_directory + opts.script_filename = script_filename + + + + +G = nx.path_graph(4) # or DiGraph, MultiGraph, MultiDiGraph, etc +H = G.subgraph([0, 1, 2]) +list(H.edges) +[(0, 1), (1, 2)] + + + + + +if __name__ == "__main__": + main() + + + diff --git a/src/scripts/edgelist_to_clusters.py b/src/scripts/edgelist_to_clusters.py index 061b60e..30650b8 100755 --- a/src/scripts/edgelist_to_clusters.py +++ b/src/scripts/edgelist_to_clusters.py @@ -8,7 +8,7 @@ from Bio.SeqIO.FastaIO import SimpleFastaParser __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.4.17" +__version__ = "2023.12.11" def main(args=None): # Path info @@ -26,7 +26,7 @@ def main(args=None): parser.add_argument("-i","--input", type=str, default="stdin", help = "path/to/edgelist.tsv, No header. [id_1][id_2] or [id_1][id_2][weight] [Default: stdin]") # parser.add_argument("-o","--output", type=str, default="stdout", help = "path/to/clusters.tsv [Default: stdout]") - parser.add_argument("-t","--threshold", type=float, default=0.5, help = "Minimum weight threshold. [Default: 0.5]") + parser.add_argument("-t","--threshold", type=float, default=0.0, help = "Minimum weight threshold. [Default: 0.0]") parser.add_argument("-n", "--no_singletons", action="store_true", help = "Don't include self-interactions. Self-interactions will ensure unclustered genomes make it into the output") parser.add_argument("-b", "--basename", action="store_true", help = "Removes filepath prefix and extension. Support for gzipped filepaths.") parser.add_argument("--identifiers", type=str, help = "Identifiers to include. If missing identifiers and singletons are allowed, then they will be included as singleton clusters with weight of np.inf") @@ -53,6 +53,7 @@ def main(args=None): opts.script_directory = script_directory opts.script_filename = script_filename + # Input if opts.input == "stdin": opts.input = sys.stdin @@ -62,7 +63,11 @@ def main(args=None): opts.output = sys.stdout # Edge list - df_edgelist = pd.read_csv(opts.input, sep="\t", header=None) + try: + df_edgelist = pd.read_csv(opts.input, sep="\t", header=None) + except pd.errors.EmptyDataError: + df_edgelist = pd.DataFrame(columns=["query", "reference"]) + assert df_edgelist.shape[1] in {2,3}, "Must have 2 or 3 columns. 
{} provided.".format(df_edgelist.shape[1]) if opts.basename: def get_basename(x): @@ -72,9 +77,13 @@ def get_basename(x): return ".".join(fn.split(".")[:-1]) df_edgelist.iloc[:,:2] = df_edgelist.iloc[:,:2].applymap(get_basename) - edgelist = df_edgelist.iloc[:,:2].values.tolist() - - identifiers = set.union(*map(set, edgelist)) + # Identifiers from edgelist + if not df_edgelist.empty: + edgelist = df_edgelist.iloc[:,:2].values.tolist() + identifiers = set.union(*map(set, edgelist)) + else: + edgelist = list() + identifiers = set() all_identifiers = identifiers if opts.identifiers: @@ -84,6 +93,7 @@ def get_basename(x): id = line.strip() all_identifiers.add(id) + # Read in fasta if opts.fasta: id_to_sequence = dict() if opts.fasta.endswith(".gz"): diff --git a/src/scripts/eukaryotic_gene_modeling_wrapper.py b/src/scripts/eukaryotic_gene_modeling_wrapper.py index 83591e7..cd15774 100755 --- a/src/scripts/eukaryotic_gene_modeling_wrapper.py +++ b/src/scripts/eukaryotic_gene_modeling_wrapper.py @@ -13,7 +13,7 @@ # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.16" +__version__ = "2023.11.13" # Tiara def get_tiara_cmd(input_filepaths, output_filepaths, output_directory, directories, opts): @@ -131,6 +131,7 @@ def get_metaeuk_cmd(input_filepaths, output_filepaths, output_directory, directo "--threads {}".format(opts.n_jobs), "-s {}".format(opts.metaeuk_sensitivity), "-e {}".format(opts.metaeuk_evalue), + "--split-memory-limit {}".format(opts.metaeuk_split_memory_limit), opts.metaeuk_options, os.path.join(directories["tmp"], "tmp.fasta"), opts.metaeuk_database, # db @@ -1380,6 +1381,7 @@ def main(args=None): parser_metaeuk = parser.add_argument_group('MetaEuk arguments') parser_metaeuk.add_argument("--metaeuk_sensitivity", type=float, default=4.0, help="MetaEuk | Sensitivity: 1.0 faster; 4.0 fast; 7.5 sensitive [Default: 4.0]") parser_metaeuk.add_argument("--metaeuk_evalue", type=float, default=0.01, help="MetaEuk | List matches below this E-value (range 0.0-inf) [Default: 0.01]") + parser_metaeuk.add_argument("--metaeuk_split_memory_limit", type=str, default="36G", help="MetaEuk | Set max memory per split. E.g. 800B, 5K, 10M, 1G. Use 0 to use all available system memory. (Default value is experimental) [Default: 36G]") parser_metaeuk.add_argument("--metaeuk_options", type=str, default="", help="MetaEuk | More options (e.g. --arg 1 ) [Default: ''] https://github.com/soedinglab/metaeuk") # Pyrodigal diff --git a/src/scripts/filter_spades_assembly.py b/src/scripts/filter_spades_assembly.py new file mode 100755 index 0000000..08351a0 --- /dev/null +++ b/src/scripts/filter_spades_assembly.py @@ -0,0 +1,100 @@ +#!/usr/bin/env python +import sys, os, argparse, gzip +from Bio.SeqIO.FastaIO import SimpleFastaParser +from tqdm import tqdm + +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.12.5" + +def main(args=None): + # Path info + script_directory = os.path.dirname(os.path.abspath( __file__ )) + script_filename = __program__ + + # Path info + description = """ + Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -i -o )".format(__program__) + epilog = "Copyright 2021 Josh L. 
Espinoza (jespinoz@jcvi.org)"
+
+    # Parser
+    parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter)
+
+    # Pipeline
+    parser.add_argument("-i","--input", default="stdin", type=str, help = "Input fasta file")
+    parser.add_argument("-o","--output", default="stdout", type=str, help = "Output fasta file")
+    parser.add_argument("-r","--retain_description", action="store_true", help = "Retain description")
+    parser.add_argument("-c","--minimum_coverage", default=0.0, type=float, help = "Minimum coverage accepted [Default: 0.0]")
+    parser.add_argument("-m","--minimum_sequence_length", default=1, type=int, help = "Minimum sequence length accepted [Default: 1]")
+
+    # Options
+    opts = parser.parse_args()
+    opts.script_directory = script_directory
+    opts.script_filename = script_filename
+
+    assert opts.minimum_sequence_length > 0
+
+    # Input
+    f_in = None
+    if opts.input == "stdin":
+        f_in = sys.stdin
+    else:
+        if opts.input.endswith(".gz"):
+            f_in = gzip.open(opts.input, "rt")
+        else:
+            f_in = open(opts.input, "r")
+    assert f_in is not None
+
+    # Output
+    f_out = None
+    if opts.output == "stdout":
+        f_out = sys.stdout
+    else:
+        if opts.output.endswith(".gz"):
+            f_out = gzip.open(opts.output, "wt")
+        else:
+            f_out = open(opts.output, "w")
+    assert f_out is not None
+
+    if opts.retain_description:
+        for header, seq in tqdm(SimpleFastaParser(f_in), "Reading fasta input"):
+            id = header.split(" ")[0].strip()
+            fields = id.split("_")
+            try:
+                length_index = fields.index("length")
+                coverage_index = fields.index("cov")
+            except ValueError:
+                raise ValueError("Your fasta identifiers do not look like they are from SPAdes: {}".format(id))
+            assert ">" not in seq, "`{}` has a '>' character in the sequence which will cause an error. This can arise from concatenating fasta files where a record is missing a final linebreak".format(header)
+            coverage = float(fields[coverage_index + 1])
+            length = int(fields[length_index + 1])
+            if all([coverage >= opts.minimum_coverage, length >= opts.minimum_sequence_length]):
+                print(">{}\n{}".format(header,seq), file=f_out)
+    else:
+        for header, seq in tqdm(SimpleFastaParser(f_in), "Reading fasta input"):
+            id = header.split(" ")[0].strip()
+            fields = id.split("_")
+            try:
+                length_index = fields.index("length")
+                coverage_index = fields.index("cov")
+            except ValueError:
+                raise ValueError("Your fasta identifiers do not look like they are from SPAdes: {}".format(id))
+            assert ">" not in seq, "`{}` has a '>' character in the sequence which will cause an error. This can arise from concatenating fasta files where a record is missing a final linebreak".format(header)
+            coverage = float(fields[coverage_index + 1])
+            length = int(fields[length_index + 1])
+            if all([coverage >= opts.minimum_coverage, length >= opts.minimum_sequence_length]):
+                print(">{}\n{}".format(id,seq), file=f_out)
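+
+    # --- Editor's note: hedged sketch of the header convention assumed above ---
+    # SPAdes names contigs like "NODE_1_length_114_cov_7.5"; splitting on "_"
+    # puts each value one field after its "length"/"cov" token, e.g.:
+    #   fields = "NODE_1_length_114_cov_7.5".split("_")
+    #   length = int(fields[fields.index("length") + 1])     # 114
+    #   coverage = float(fields[fields.index("cov") + 1])    # 7.5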
+
+    # Close
+    if f_in != sys.stdin:
+        f_in.close()
+    if f_out != sys.stdout:
+        f_out.close()
+
+if __name__ == "__main__":
+    main()
diff --git a/src/scripts/global_clustering.py b/src/scripts/global_clustering.py
index 4783794..8e8b6f3 100755
--- a/src/scripts/global_clustering.py
+++ b/src/scripts/global_clustering.py
@@ -15,7 +15,7 @@
# from tqdm import tqdm

__program__ = os.path.split(sys.argv[0])[-1]
-__version__ = "2023.10.24"
+__version__ = "2023.12.8"

def get_basename(x):
    _, fn = os.path.split(x)
@@ -50,13 +50,15 @@
    """
    accessory_scripts = {
        "edgelist_to_clusters.py",
-        "mmseqs2_wrapper.py",
+        "clustering_wrapper.py",
        # "table_to_fasta.py",
    }

    required_executables={
+        "skani",
        "fastANI",
        "mmseqs",
+        "diamond",
    }

@@ -97,7 +99,18 @@
# Configure parameters
def configure_parameters(opts, directories):
-    assert_acceptable_arguments(opts.algorithm, {"easy-cluster", "easy-linclust"})
+    assert_acceptable_arguments(opts.protein_clustering_algorithm, {"easy-cluster", "easy-linclust", "mmseqs-cluster", "mmseqs-linclust", "diamond-cluster", "diamond-linclust"})
+    if opts.protein_clustering_algorithm in {"easy-cluster", "easy-linclust"}:
+        d = {"easy-cluster":"mmseqs-cluster", "easy-linclust":"mmseqs-linclust"}
+        warnings.warn("\n\nPlease use `{}` instead of `{}` for MMSEQS2 clustering.".format(d[opts.protein_clustering_algorithm], opts.protein_clustering_algorithm))
+        opts.protein_clustering_algorithm = d[opts.protein_clustering_algorithm]
+
+    if opts.skani_nonviral_preset.lower() == "none":
+        opts.skani_nonviral_preset = None
+
+    if opts.skani_viral_preset.lower() == "none":
+        opts.skani_viral_preset = None
+
    assert 0 < opts.minimum_core_prevalence <= 1.0, "--minimum_core_prevalence must be a float between (0.0,1.0])"
    # Set environment variables
    add_executables_to_environment(opts=opts)
@@ -119,10 +132,10 @@
    parser_io = parser.add_argument_group('Required I/O arguments')
    parser_io.add_argument("-i", "--genomes_table", type=str, default="stdin", help = "path/to/genomes_table.tsv, Format: Must include the following columns (No header) [organism_type][id_sample][id_mag][genome][proteins][cds] but can include additional columns to the right (e.g., [gene_models]). Suggested input is from `compile_genomes_table.py` script.
[Default: stdin]") parser_io.add_argument("-o","--output_directory", type=str, default="global_clustering_output", help = "path/to/project_directory [Default: global_clustering_output]") - parser_io.add_argument("-e", "--no_singletons", action="store_true", help="Exclude singletons") #isPSLC-1_SSPC-3345__SRR178126 - parser_io.add_argument("-R", "--no_representative_sequences", action="store_true", help="Do not write representative sequences to fasta") #isPSLC-1_SSPC-3345__SRR178126 - parser_io.add_argument("-C", "--no_core_sequences", action="store_true", help="Do not write core pagenome sequences to fasta") #isPSLC-1_SSPC-3345__SRR178126 - # parser_io.add_argument("-M", "--no_marker_sequences", action="store_true", help="Do not write core pagenome sequences to fasta") #isPSLC-1_SSPC-3345__SRR178126 + parser_io.add_argument("-e", "--no_singletons", action="store_true", help="Exclude singletons") + parser_io.add_argument("-R", "--no_representative_sequences", action="store_true", help="Do not write representative sequences to fasta") + parser_io.add_argument("-C", "--no_core_sequences", action="store_true", help="Do not write core pagenome sequences to fasta") + # parser_io.add_argument("-M", "--no_marker_sequences", action="store_true", help="Do not write core pagenome sequences to fasta") # Utility parser_utility = parser.add_argument_group('Utility arguments') @@ -132,29 +145,50 @@ def main(args=None): parser_utility.add_argument("-v", "--version", action='version', version="{} v{}".format(__program__, __version__)) # parser_utility.add_argument("--verbose", action='store_true') - # FastANI + # ANI + parser_genome_clustering = parser.add_argument_group('Genome clustering arguments') + parser_genome_clustering.add_argument("-G", "--genome_clustering_algorithm", type=str, choices={"fastani", "skani"}, default="skani", help="Program to use for ANI calculations. `skani` is faster and more memory efficient. For v1.0.0 - v1.3.x behavior, use `fastani`. [Default: skani]") + parser_genome_clustering.add_argument("-A", "--ani_threshold", type=float, default=95.0, help="Species-level cluster (SLC) ANI threshold (Range (0.0, 100.0]) [Default: 95.0]") + parser_genome_clustering.add_argument("--genome_cluster_prefix", type=str, default="SLC-", help="Cluster prefix [Default: 'SLC-") + parser_genome_clustering.add_argument("--genome_cluster_suffix", type=str, default="", help="Cluster suffix [Default: '") + parser_genome_clustering.add_argument("--genome_cluster_prefix_zfill", type=int, default=0, help="Cluster prefix zfill. Use 7 to match identifiers from OrthoFinder. Use 0 to add no zfill. [Default: 0]") #7 + + parser_skani = parser.add_argument_group('Skani triangle arguments') + parser_skani.add_argument("--skani_target_ani", type=float, default=80, help="skani | If you set --skani_target_ani to --ani_threshold, you may screen out genomes ANI ≥ --ani_threshold [Default: 80]") + parser_skani.add_argument("--skani_minimum_af", type=float, default=15, help="skani | Minimum aligned fraction greater than this value [Default: 15]") + parser_skani.add_argument("--skani_no_confidence_interval", action="store_true", help="skani | Output [5,95] ANI confidence intervals using percentile bootstrap on the putative ANI distribution") + # parser_skani.add_argument("--skani_low_memory", action="store_true", help="Skani | More options (e.g. 
--arg 1 ) https://github.com/bluenote-1577/skani [Default: '']") + + parser_skani = parser.add_argument_group('[Prokaryotic & Eukaryotic] Skani triangle arguments') + parser_skani.add_argument("--skani_nonviral_preset", type=str, default="medium", choices={"fast", "medium", "slow", "none"}, help="skani [Prokaryotic & Eukaryotic]| Use `none` if you are setting skani -c (compression factor) {fast, medium, slow, none} [Default: medium]") + parser_skani.add_argument("--skani_nonviral_compression_factor", type=int, default=125, help="skani [Prokaryotic & Eukaryotic]| Compression factor (k-mer subsampling rate). [Default: 125]") + parser_skani.add_argument("--skani_nonviral_marker_kmer_compression_factor", type=int, default=1000, help="skani [Prokaryotic & Eukaryotic] | Marker k-mer compression factor. Markers are used for filtering. [Default: 1000]") + parser_skani.add_argument("--skani_nonviral_options", type=str, default="", help="skani [Prokaryotic & Eukaryotic] | More options for `skani triangle` (e.g. --arg 1 ) [Default: '']") + + parser_skani = parser.add_argument_group('[Viral] Skani triangle arguments') + parser_skani.add_argument("--skani_viral_preset", type=str, default="slow", choices={"fast", "medium", "slow", "none"}, help="skani | Use `none` if you are setting skani -c (compression factor) {fast, medium, slow, none} [Default: slow]") + parser_skani.add_argument("--skani_viral_compression_factor", type=int, default=30, help="skani [Viral] | Compression factor (k-mer subsampling rate). [Default: 30]") + parser_skani.add_argument("--skani_viral_marker_kmer_compression_factor", type=int, default=200, help="skani [Viral] | Marker k-mer compression factor. Markers are used for filtering. Consider decreasing to ~200-300 if working with small genomes (e.g. plasmids or viruses). [Default: 200]") + parser_skani.add_argument("--skani_viral_options", type=str, default="", help="skani [Viral] | More options for `skani triangle` (e.g. --arg 1 ) [Default: '']") + parser_fastani = parser.add_argument_group('FastANI arguments') - parser_fastani.add_argument("-A", "--ani_threshold", type=float, default=95.0, help="FastANI | Species-level cluster (SLC) ANI threshold (Range (0.0, 100.0]) [Default: 95.0]") - parser_fastani.add_argument("--genome_cluster_prefix", type=str, default="SLC-", help="Cluster prefix [Default: 'SLC-") - parser_fastani.add_argument("--genome_cluster_suffix", type=str, default="", help="Cluster suffix [Default: '") - parser_fastani.add_argument("--genome_cluster_prefix_zfill", type=int, default=0, help="Cluster prefix zfill. Use 7 to match identifiers from OrthoFinder. Use 0 to add no zfill. [Default: 0]") #7 parser_fastani.add_argument("--fastani_options", type=str, default="", help="FastANI | More options (e.g. 
--arg 1 ) [Default: '']")

-    # MMSEQS2
-    parser_mmseqs2 = parser.add_argument_group('MMSEQS2 arguments')
-    parser_mmseqs2.add_argument("-a", "--algorithm", type=str, default="easy-cluster", help="MMSEQS2 | {easy-cluster, easy-linclust} [Default: easy-cluster]")
-    parser_mmseqs2.add_argument("-t", "--minimum_identity_threshold", type=float, default=50.0, help="MMSEQS2 | SLC-Specific Protein Cluster (SSPC, previously referred to as SSO) percent identity threshold (Range (0.0, 100.0]) [Default: 50.0]")
-    parser_mmseqs2.add_argument("-c", "--minimum_coverage_threshold", type=float, default=0.8, help="MMSEQS2 | SSPC coverage threshold (Range (0.0, 1.0]) [Default: 0.8]")
-    parser_mmseqs2.add_argument("--protein_cluster_prefix", type=str, default="SSPC-", help="Cluster prefix [Default: 'SSPC-")
-    parser_mmseqs2.add_argument("--protein_cluster_suffix", type=str, default="", help="Cluster suffix [Default: '")
-    parser_mmseqs2.add_argument("--protein_cluster_prefix_zfill", type=int, default=0, help="Cluster prefix zfill. Use 7 to match identifiers from OrthoFinder. Use 0 to add no zfill. [Default: 0]") #7
-    parser_mmseqs2.add_argument("--mmseqs2_options", type=str, default="", help="MMSEQS2 | More options (e.g. --arg 1 ) [Default: '']")
+    # Clustering
+    parser_protein_clustering = parser.add_argument_group('Protein clustering arguments')
+    parser_protein_clustering.add_argument("-P", "--protein_clustering_algorithm", type=str, choices={"mmseqs-cluster", "mmseqs-linclust", "diamond-cluster", "diamond-linclust"}, default="mmseqs-cluster", help="Clustering algorithm | Diamond can only be used for clustering proteins {mmseqs-cluster, mmseqs-linclust, diamond-cluster, diamond-linclust} [Default: mmseqs-cluster]")
+    parser_protein_clustering.add_argument("-t", "--minimum_identity_threshold", type=float, default=50.0, help="Clustering | Percent identity threshold (Range (0.0, 100.0]) [Default: 50.0]")
+    parser_protein_clustering.add_argument("-c", "--minimum_coverage_threshold", type=float, default=0.8, help="Clustering | Coverage threshold (Range (0.0, 1.0]) [Default: 0.8]")
+    parser_protein_clustering.add_argument("--protein_cluster_prefix", type=str, default="SSPC-", help="Cluster prefix [Default: 'SSPC-']")
+    parser_protein_clustering.add_argument("--protein_cluster_suffix", type=str, default="", help="Cluster suffix [Default: '']")
+    parser_protein_clustering.add_argument("--protein_cluster_prefix_zfill", type=int, default=0, help="Cluster prefix zfill. Use 7 to match identifiers from OrthoFinder. Use 0 to add no zfill. [Default: 0]") #7
+    parser_protein_clustering.add_argument("--mmseqs2_options", type=str, default="", help="MMSEQS2 | More options (e.g. --arg 1 ) [Default: '']")
+    parser_protein_clustering.add_argument("--diamond_options", type=str, default="", help="Diamond | More options (e.g.
--arg 1 ) [Default: '']") # Pangenome parser_pangenome = parser.add_argument_group('Pangenome arguments') parser_pangenome.add_argument("--minimum_core_prevalence", type=float, default=1.0, help="Minimum ratio of genomes detected in a SLC for a SSPC to be considered core (Range (0.0, 1.0]) [Default: 1.0]") - # Options opts = parser.parse_args() @@ -196,6 +230,19 @@ def main(args=None): configure_parameters(opts, directories) sys.stdout.flush() + # Genome clustering algorithm + GENOME_CLUSTERING_ALGORITHM = opts.genome_clustering_algorithm.lower() + if GENOME_CLUSTERING_ALGORITHM == "fastani": + GENOME_CLUSTERING_ALGORITHM = "FastANI" + if GENOME_CLUSTERING_ALGORITHM == "skani": + GENOME_CLUSTERING_ALGORITHM = "skani" + + # Protein clustering algorithm + PROTEIN_CLUSTERING_ALGORITHM = opts.protein_clustering_algorithm.split("-")[0].lower() + if PROTEIN_CLUSTERING_ALGORITHM == "mmseqs": + PROTEIN_CLUSTERING_ALGORITHM = PROTEIN_CLUSTERING_ALGORITHM.upper() + if PROTEIN_CLUSTERING_ALGORITHM == "diamond": + PROTEIN_CLUSTERING_ALGORITHM = PROTEIN_CLUSTERING_ALGORITHM.capitalize() # Make directories t0 = time.time() @@ -279,50 +326,125 @@ def main(args=None): # Commands f_cmds = open(os.path.join(opts.output_directory, "commands.sh"), "w") - # FastANI - print(format_header("* ({}) Running FastANI:".format(format_duration(t0))), file=sys.stdout) - for fp in pv(glob.glob(os.path.join(directories["intermediate"], "*", "genomes.list")), "Running FastANI"): + # Pairwise ANI + print(format_header("* ({}) Running {}:".format(format_duration(t0), GENOME_CLUSTERING_ALGORITHM)), file=sys.stdout) + for fp in pv(glob.glob(os.path.join(directories["intermediate"], "*", "genomes.list")), "Running pairwise ANI"): fields = fp.split("/") organism_type = fields[-2] output_directory = os.path.split(fp)[0] + + if opts.genome_clustering_algorithm == "skani": + name = "skani__{}".format(organism_type) + description = "[Program = skani] [Organism_Type = {}]".format(organism_type) + + arguments = list() + + if organism_type.lower() in {"viral", "virus", "virion"}: + arguments += [ + os.environ["skani"], + "triangle", + "--sparse", + "-t {}".format(opts.n_jobs), + "-l {}".format(fp), + "-o {}".format(os.path.join(output_directory, "skani_output.tsv")), + "--ci" if not opts.skani_no_confidence_interval else "", + "--min-af {}".format(opts.skani_minimum_af), + "-s {}".format(opts.skani_target_ani), + "-c {}".format(opts.skani_viral_compression_factor), + "-m {}".format(opts.skani_viral_marker_kmer_compression_factor), + "--{}".format(opts.skani_viral_preset) if opts.skani_viral_preset else "", + opts.skani_viral_options, + ] + + else: + arguments += [ + os.environ["skani"], + "triangle", + "--sparse", + "-t {}".format(opts.n_jobs), + "-l {}".format(fp), + "-o {}".format(os.path.join(output_directory, "skani_output.tsv")), + "--ci" if not opts.skani_no_confidence_interval else "", + "--min-af {}".format(opts.skani_minimum_af), + "-s {}".format(opts.skani_target_ani), + "-c {}".format(opts.skani_nonviral_compression_factor), + "-m {}".format(opts.skani_nonviral_marker_kmer_compression_factor), + "--{}".format(opts.skani_nonviral_preset) if opts.skani_nonviral_preset else "", + opts.skani_nonviral_options, + ] + + arguments += [ + "&&", + + "cat", + os.path.join(output_directory, "skani_output.tsv"), + "|", + "cut -f1-3", + "|", + "tail -n +2", + "|", + os.environ["edgelist_to_clusters.py"], + "--basename", + "-t {}".format(opts.ani_threshold), + "--no_singletons" if bool(opts.no_singletons) else "", + "--cluster_prefix 
{}{}".format(organism_type[0].upper(), opts.genome_cluster_prefix), + "--cluster_suffix {}".format(opts.genome_cluster_suffix) if bool(opts.genome_cluster_suffix) else "", + "--cluster_prefix_zfill {}".format(opts.genome_cluster_prefix_zfill), + "-o {}".format(os.path.join(output_directory, "genome_clusters.tsv")), + "--identifiers {}".format(os.path.join(directories["intermediate"], organism_type, "genome_identifiers.list")), + "--export_graph {}".format(os.path.join(directories["serialization"], f"{organism_type}.networkx_graph.pkl")), + "--export_dict {}".format(os.path.join(directories["serialization"], f"{organism_type}.dict.pkl")), + + "&&", + + "rm -rf {}".format(os.path.join(directories["tmp"], "*")), + + ] + + cmd = Command( + arguments, + name=name, + f_cmds=f_cmds, + ) - name = "fastani__{}".format(organism_type) - description = "[Program = FastANI] [Organism_Type = {}]".format(organism_type) - cmd = Command([ - os.environ["fastANI"], - "-t {}".format(opts.n_jobs), - "--rl {}".format(fp), - "--ql {}".format(fp), - "-o {}".format(os.path.join(output_directory, "fastani_output.tsv")), - opts.fastani_options, - - "&&", - - "cat", - os.path.join(output_directory, "fastani_output.tsv"), - "|", - "cut -f1-3", - "|", - os.environ["edgelist_to_clusters.py"], - "--basename", - "-t {}".format(opts.ani_threshold), - "--no_singletons" if bool(opts.no_singletons) else "", - "--cluster_prefix {}{}".format(organism_type[0].upper(), opts.genome_cluster_prefix), - "--cluster_suffix {}".format(opts.genome_cluster_suffix) if bool(opts.genome_cluster_suffix) else "", - "--cluster_prefix_zfill {}".format(opts.genome_cluster_prefix_zfill), - "-o {}".format(os.path.join(output_directory, "genome_clusters.tsv")), - "--identifiers {}".format(os.path.join(directories["intermediate"], organism_type, "genome_identifiers.list")), - "--export_graph {}".format(os.path.join(directories["serialization"], f"{organism_type}.networkx_graph.pkl")), - "--export_dict {}".format(os.path.join(directories["serialization"], f"{organism_type}.dict.pkl")), - - "&&", - - "rm -rf {}".format(os.path.join(directories["tmp"], "*")), - - ], - name=name, - f_cmds=f_cmds, - ) + if opts.genome_clustering_algorithm == "fastani": + name = "fastani__{}".format(organism_type) + description = "[Program = FastANI] [Organism_Type = {}]".format(organism_type) + cmd = Command([ + os.environ["fastANI"], + "-t {}".format(opts.n_jobs), + "--rl {}".format(fp), + "--ql {}".format(fp), + "-o {}".format(os.path.join(output_directory, "fastani_output.tsv")), + opts.fastani_options, + + "&&", + + "cat", + os.path.join(output_directory, "fastani_output.tsv"), + "|", + "cut -f1-3", + "|", + os.environ["edgelist_to_clusters.py"], + "--basename", + "-t {}".format(opts.ani_threshold), + "--no_singletons" if bool(opts.no_singletons) else "", + "--cluster_prefix {}{}".format(organism_type[0].upper(), opts.genome_cluster_prefix), + "--cluster_suffix {}".format(opts.genome_cluster_suffix) if bool(opts.genome_cluster_suffix) else "", + "--cluster_prefix_zfill {}".format(opts.genome_cluster_prefix_zfill), + "-o {}".format(os.path.join(output_directory, "genome_clusters.tsv")), + "--identifiers {}".format(os.path.join(directories["intermediate"], organism_type, "genome_identifiers.list")), + "--export_graph {}".format(os.path.join(directories["serialization"], f"{organism_type}.networkx_graph.pkl")), + "--export_dict {}".format(os.path.join(directories["serialization"], f"{organism_type}.dict.pkl")), + + "&&", + + "rm -rf 
{}".format(os.path.join(directories["tmp"], "*")), + + ], + name=name, + f_cmds=f_cmds, + ) # Run command cmd.run( @@ -339,11 +461,11 @@ def main(args=None): print("Check the following files:\ncat {}".format(os.path.join(directories["log"], "{}.*".format(name))), file=sys.stdout) sys.exit(cmd.returncode_) - # MMSEQS2 - print(format_header(" * ({}) Running MMSEQS2:".format(format_duration(t0))), file=sys.stdout) + # Protein Clustering + print(format_header(" * ({}) Running {}:".format(format_duration(t0), PROTEIN_CLUSTERING_ALGORITHM)), file=sys.stdout) mag_to_genomecluster = dict() protein_to_proteincluster = dict() - for fp in pv(glob.glob(os.path.join(directories["intermediate"], "*", "genome_clusters.tsv")), "Running MMSEQS2"): + for fp in pv(glob.glob(os.path.join(directories["intermediate"], "*", "genome_clusters.tsv")), "Running {}".format(PROTEIN_CLUSTERING_ALGORITHM)): fields = fp.split("/") organism_type = fields[-2] @@ -363,20 +485,21 @@ def main(args=None): print(*proteins, sep="\n", file=f) write_fasta(protein_to_sequence[proteins], os.path.join(genomecluster_directory, "proteins.faa" )) - # Run MMSEQS2 - name = "mmseqs2__{}__{}".format(organism_type, id_genomecluster) - description = "[Program = MMSEQS2] [Organism_Type = {}] [Genome_Cluster = {}]".format(organism_type, id_genomecluster) + # Run Clustering + name = "{}__{}__{}".format(PROTEIN_CLUSTERING_ALGORITHM.lower(), organism_type, id_genomecluster) + description = "[Program = {}] [Organism_Type = {}] [Genome_Cluster = {}]".format(PROTEIN_CLUSTERING_ALGORITHM, organism_type, id_genomecluster) cmd = Command([ - os.environ["mmseqs2_wrapper.py"], + os.environ["clustering_wrapper.py"], "--fasta {}".format(os.path.join(genomecluster_directory, "proteins.faa" )), "--output_directory {}".format(genomecluster_directory), "--no_singletons" if bool(opts.no_singletons) else "", - "--algorithm {}".format(opts.algorithm), + "--algorithm {}".format(opts.protein_clustering_algorithm), "--n_jobs {}".format(opts.n_jobs), "--minimum_identity_threshold {}".format(opts.minimum_identity_threshold), "--minimum_coverage_threshold {}".format(opts.minimum_coverage_threshold), "--mmseqs2_options='{}'" if bool(opts.mmseqs2_options) else "", + "--diamond_options='{}'" if bool(opts.diamond_options) else "", "--cluster_prefix {}_{}".format(id_genomecluster, opts.protein_cluster_prefix), "--cluster_suffix {}".format(opts.protein_cluster_suffix) if bool(opts.protein_cluster_suffix) else "", "--cluster_prefix_zfill {}".format(opts.protein_cluster_prefix_zfill), diff --git a/src/scripts/local_clustering.py b/src/scripts/local_clustering.py index 198f8c9..e35c27f 100755 --- a/src/scripts/local_clustering.py +++ b/src/scripts/local_clustering.py @@ -15,7 +15,7 @@ # from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.24" +__version__ = "2023.12.11" def get_basename(x): _, fn = os.path.split(x) @@ -50,12 +50,14 @@ def add_executables_to_environment(opts): """ accessory_scripts = { "edgelist_to_clusters.py", - "mmseqs2_wrapper.py", + "clustering_wrapper.py", } required_executables={ + "skani", "fastANI", "mmseqs", + "diamond", } required_executables |= accessory_scripts @@ -84,7 +86,6 @@ def add_executables_to_environment(opts): executables[name] = "'{}'".format(os.path.join(opts.script_directory, name)) # Can handle spaces in path - print(format_header( "Adding executables to path from the following source: {}".format(opts.path_config), "-"), file=sys.stdout) for name, executable in executables.items(): if name in 
diff --git a/src/scripts/local_clustering.py b/src/scripts/local_clustering.py
index 198f8c9..e35c27f 100755
--- a/src/scripts/local_clustering.py
+++ b/src/scripts/local_clustering.py
@@ -15,7 +15,7 @@
# from tqdm import tqdm

__program__ = os.path.split(sys.argv[0])[-1]
-__version__ = "2023.10.24"
+__version__ = "2023.12.11"

def get_basename(x):
    _, fn = os.path.split(x)
@@ -50,12 +50,14 @@
    """
    accessory_scripts = {
        "edgelist_to_clusters.py",
-        "mmseqs2_wrapper.py",
+        "clustering_wrapper.py",
    }

    required_executables={
+        "skani",
        "fastANI",
        "mmseqs",
+        "diamond",
    }

    required_executables |= accessory_scripts
@@ -84,7 +86,6 @@
        executables[name] = "'{}'".format(os.path.join(opts.script_directory, name)) # Can handle spaces in path
-    print(format_header( "Adding executables to path from the following source: {}".format(opts.path_config), "-"), file=sys.stdout)
    for name, executable in executables.items():
        if name in required_executables:
@@ -95,9 +96,20 @@
# Configure parameters
def configure_parameters(opts, directories):
-    assert_acceptable_arguments(opts.algorithm, {"easy-cluster", "easy-linclust"})
-    assert 0 < opts.minimum_core_prevalence <= 1.0, "--minimum_core_prevalence must be a float between (0.0,1.0])"
+
+    assert_acceptable_arguments(opts.protein_clustering_algorithm, {"easy-cluster", "easy-linclust", "mmseqs-cluster", "mmseqs-linclust", "diamond-cluster", "diamond-linclust"})
+    if opts.protein_clustering_algorithm in {"easy-cluster", "easy-linclust"}:
+        d = {"easy-cluster":"mmseqs-cluster", "easy-linclust":"mmseqs-linclust"}
+        warnings.warn("\n\nPlease use `{}` instead of `{}` for MMSEQS2 clustering.".format(d[opts.protein_clustering_algorithm], opts.protein_clustering_algorithm))
+        opts.protein_clustering_algorithm = d[opts.protein_clustering_algorithm]
+
+    if opts.skani_nonviral_preset.lower() == "none":
+        opts.skani_nonviral_preset = None
+
+    if opts.skani_viral_preset.lower() == "none":
+        opts.skani_viral_preset = None
+
+    assert 0 < opts.minimum_core_prevalence <= 1.0, "--minimum_core_prevalence must be a float between (0.0, 1.0]"
    # Set environment variables
    add_executables_to_environment(opts=opts)
@@ -130,23 +142,45 @@
    parser_utility.add_argument("-v", "--version", action='version', version="{} v{}".format(__program__, __version__))
    # parser_utility.add_argument("--verbose", action='store_true')

-    # FastANI
+    # ANI
+    parser_genome_clustering = parser.add_argument_group('Genome clustering arguments')
+    parser_genome_clustering.add_argument("-G", "--genome_clustering_algorithm", type=str, choices={"fastani", "skani"}, default="skani", help="Program to use for ANI calculations. `skani` is faster and more memory efficient. For v1.0.0 - v1.3.x behavior, use `fastani`. [Default: skani]")
+    parser_genome_clustering.add_argument("-A", "--ani_threshold", type=float, default=95.0, help="Species-level cluster (SLC) ANI threshold (Range (0.0, 100.0]) [Default: 95.0]")
+    parser_genome_clustering.add_argument("--genome_cluster_prefix", type=str, default="SLC-", help="Cluster prefix [Default: 'SLC-']")
+    parser_genome_clustering.add_argument("--genome_cluster_suffix", type=str, default="", help="Cluster suffix [Default: '']")
+    parser_genome_clustering.add_argument("--genome_cluster_prefix_zfill", type=int, default=0, help="Cluster prefix zfill. Use 7 to match identifiers from OrthoFinder. Use 0 to add no zfill. [Default: 0]") #7
+
+    parser_skani = parser.add_argument_group('Skani triangle arguments')
+    parser_skani.add_argument("--skani_target_ani", type=float, default=80, help="skani | If you set --skani_target_ani to --ani_threshold, you may screen out genomes ANI ≥ --ani_threshold [Default: 80]")
+    parser_skani.add_argument("--skani_minimum_af", type=float, default=15, help="skani | Minimum aligned fraction greater than this value [Default: 15]")
+    parser_skani.add_argument("--skani_no_confidence_interval", action="store_true", help="skani | Output [5,95] ANI confidence intervals using percentile bootstrap on the putative ANI distribution")
+    # parser_skani.add_argument("--skani_low_memory", action="store_true", help="Skani | More options (e.g.
--arg 1 ) https://github.com/bluenote-1577/skani [Default: '']") + + parser_skani = parser.add_argument_group('[Prokaryotic & Eukaryotic] Skani triangle arguments') + parser_skani.add_argument("--skani_nonviral_preset", type=str, default="medium", choices={"fast", "medium", "slow", "none"}, help="skani [Prokaryotic & Eukaryotic]| Use `none` if you are setting skani -c (compression factor) {fast, medium, slow, none} [Default: medium]") + parser_skani.add_argument("--skani_nonviral_compression_factor", type=int, default=125, help="skani [Prokaryotic & Eukaryotic]| Compression factor (k-mer subsampling rate). [Default: 125]") + parser_skani.add_argument("--skani_nonviral_marker_kmer_compression_factor", type=int, default=1000, help="skani [Prokaryotic & Eukaryotic] | Marker k-mer compression factor. Markers are used for filtering. [Default: 1000]") + parser_skani.add_argument("--skani_nonviral_options", type=str, default="", help="skani [Prokaryotic & Eukaryotic] | More options for `skani triangle` (e.g. --arg 1 ) [Default: '']") + + parser_skani = parser.add_argument_group('[Viral] Skani triangle arguments') + parser_skani.add_argument("--skani_viral_preset", type=str, default="slow", choices={"fast", "medium", "slow", "none"}, help="skani | Use `none` if you are setting skani -c (compression factor) {fast, medium, slow, none} [Default: slow]") + parser_skani.add_argument("--skani_viral_compression_factor", type=int, default=30, help="skani [Viral] | Compression factor (k-mer subsampling rate). [Default: 30]") + parser_skani.add_argument("--skani_viral_marker_kmer_compression_factor", type=int, default=200, help="skani [Viral] | Marker k-mer compression factor. Markers are used for filtering. Consider decreasing to ~200-300 if working with small genomes (e.g. plasmids or viruses). [Default: 200]") + parser_skani.add_argument("--skani_viral_options", type=str, default="", help="skani [Viral] | More options for `skani triangle` (e.g. --arg 1 ) [Default: '']") + parser_fastani = parser.add_argument_group('FastANI arguments') - parser_fastani.add_argument("-A", "--ani_threshold", type=float, default=95.0, help="FastANI | Species-level cluster (SLC) ANI threshold (Range (0.0, 100.0]) [Default: 95.0]") - parser_fastani.add_argument("--genome_cluster_prefix", type=str, default="SLC-", help="Cluster prefix [Default: 'SLC-") - parser_fastani.add_argument("--genome_cluster_suffix", type=str, default="", help="Cluster suffix [Default: '") - parser_fastani.add_argument("--genome_cluster_prefix_zfill", type=int, default=0, help="Cluster prefix zfill. Use 7 to match identifiers from OrthoFinder. Use 0 to add no zfill. [Default: 0]") #7 parser_fastani.add_argument("--fastani_options", type=str, default="", help="FastANI | More options (e.g. 
--arg 1 ) [Default: '']")

-    # MMSEQS2
-    parser_mmseqs2 = parser.add_argument_group('MMSEQS2 arguments')
-    parser_mmseqs2.add_argument("-a", "--algorithm", type=str, default="easy-cluster", help="MMSEQS2 | {easy-cluster, easy-linclust} [Default: easy-cluster]")
-    parser_mmseqs2.add_argument("-t", "--minimum_identity_threshold", type=float, default=50.0, help="MMSEQS2 | SLC-Specific Protein Cluster (SSPC, previously referred to as SSO) percent identity threshold (Range (0.0, 100.0]) [Default: 50.0]")
-    parser_mmseqs2.add_argument("-c", "--minimum_coverage_threshold", type=float, default=0.8, help="MMSEQS2 | SSPC coverage threshold (Range (0.0, 1.0]) [Default: 0.8]")
-    parser_mmseqs2.add_argument("--protein_cluster_prefix", type=str, default="SSPC-", help="Cluster prefix [Default: 'SSPC-")
-    parser_mmseqs2.add_argument("--protein_cluster_suffix", type=str, default="", help="Cluster suffix [Default: '")
-    parser_mmseqs2.add_argument("--protein_cluster_prefix_zfill", type=int, default=0, help="Cluster prefix zfill. Use 7 to match identifiers from OrthoFinder. Use 0 to add no zfill. [Default: 0]") #7
-    parser_mmseqs2.add_argument("--mmseqs2_options", type=str, default="", help="MMSEQS2 | More options (e.g. --arg 1 ) [Default: '']")
+    # Clustering
+    parser_protein_clustering = parser.add_argument_group('Protein clustering arguments')
+    parser_protein_clustering.add_argument("-P", "--protein_clustering_algorithm", type=str, choices={"mmseqs-cluster", "mmseqs-linclust", "diamond-cluster", "diamond-linclust"}, default="mmseqs-cluster", help="Clustering algorithm | Diamond can only be used for clustering proteins {mmseqs-cluster, mmseqs-linclust, diamond-cluster, diamond-linclust} [Default: mmseqs-cluster]")
+    parser_protein_clustering.add_argument("-t", "--minimum_identity_threshold", type=float, default=50.0, help="Clustering | Percent identity threshold (Range (0.0, 100.0]) [Default: 50.0]")
+    parser_protein_clustering.add_argument("-c", "--minimum_coverage_threshold", type=float, default=0.8, help="Clustering | Coverage threshold (Range (0.0, 1.0]) [Default: 0.8]")
+    parser_protein_clustering.add_argument("--protein_cluster_prefix", type=str, default="SSPC-", help="Cluster prefix [Default: 'SSPC-']")
+    parser_protein_clustering.add_argument("--protein_cluster_suffix", type=str, default="", help="Cluster suffix [Default: '']")
+    parser_protein_clustering.add_argument("--protein_cluster_prefix_zfill", type=int, default=0, help="Cluster prefix zfill. Use 7 to match identifiers from OrthoFinder. Use 0 to add no zfill. [Default: 0]") #7
+    parser_protein_clustering.add_argument("--mmseqs2_options", type=str, default="", help="MMSEQS2 | More options (e.g. --arg 1 ) [Default: '']")
+    parser_protein_clustering.add_argument("--diamond_options", type=str, default="", help="Diamond | More options (e.g.
--arg 1 ) [Default: '']") # Pangenome parser_pangenome = parser.add_argument_group('Pangenome arguments') @@ -191,6 +225,20 @@ def main(args=None): configure_parameters(opts, directories) sys.stdout.flush() + # Genome clustering algorithm + GENOME_CLUSTERING_ALGORITHM = opts.genome_clustering_algorithm.lower() + if GENOME_CLUSTERING_ALGORITHM == "fastani": + GENOME_CLUSTERING_ALGORITHM = "FastANI" + if GENOME_CLUSTERING_ALGORITHM == "skani": + GENOME_CLUSTERING_ALGORITHM = "skani" + + # Protein clustering algorithm + PROTEIN_CLUSTERING_ALGORITHM = opts.protein_clustering_algorithm.split("-")[0].lower() + if PROTEIN_CLUSTERING_ALGORITHM == "mmseqs": + PROTEIN_CLUSTERING_ALGORITHM = PROTEIN_CLUSTERING_ALGORITHM.upper() + if PROTEIN_CLUSTERING_ALGORITHM == "diamond": + PROTEIN_CLUSTERING_ALGORITHM = PROTEIN_CLUSTERING_ALGORITHM.capitalize() + # Make directories t0 = time.time() print(format_header(" " .join(["* ({}) Creating directories:".format(format_duration(t0)), directories["intermediate"]])), file=sys.stdout) @@ -278,50 +326,127 @@ def main(args=None): # Commands f_cmds = open(os.path.join(opts.output_directory, "commands.sh"), "w") - # FastANI - print(format_header("* ({}) Running FastANI:".format(format_duration(t0))), file=sys.stdout) - for fp in pv(glob.glob(os.path.join(directories["intermediate"], "*", "*", "genomes.list")), "Running FastANI"): + # Pairwise ANI + print(format_header("* ({}) Running {}:".format(format_duration(t0), GENOME_CLUSTERING_ALGORITHM)), file=sys.stdout) + for fp in pv(glob.glob(os.path.join(directories["intermediate"], "*", "*", "genomes.list")), "Running pairwise ANI"): fields = fp.split("/") organism_type = fields[-3] id_sample = fields[-2] + + os.makedirs(os.path.join(directories["serialization"], id_sample), exist_ok=True) + + if opts.genome_clustering_algorithm == "skani": + name = "skani__{}__{}".format(organism_type, id_sample) + description = "[Program = skani] [Organism_Type = {}] [Sample_ID = {}]".format(organism_type, id_sample) + + arguments = list() + + if organism_type.lower() in {"viral", "virus", "virion"}: + arguments += [ + os.environ["skani"], + "triangle", + "--sparse", + "-t {}".format(opts.n_jobs), + "-l {}".format(fp), + "-o {}".format(os.path.join(os.path.split(fp)[0], "skani_output.tsv")), + "--ci" if not opts.skani_no_confidence_interval else "", + "--min-af {}".format(opts.skani_minimum_af), + "-s {}".format(opts.skani_target_ani), + "-c {}".format(opts.skani_viral_compression_factor), + "-m {}".format(opts.skani_viral_marker_kmer_compression_factor), + "--{}".format(opts.skani_viral_preset) if opts.skani_viral_preset else "", + opts.skani_viral_options, + ] + + else: + arguments += [ + os.environ["skani"], + "triangle", + "--sparse", + "-t {}".format(opts.n_jobs), + "-l {}".format(fp), + "-o {}".format(os.path.join(os.path.split(fp)[0], "skani_output.tsv")), + "--ci" if not opts.skani_no_confidence_interval else "", + "--min-af {}".format(opts.skani_minimum_af), + "-s {}".format(opts.skani_target_ani), + "-c {}".format(opts.skani_nonviral_compression_factor), + "-m {}".format(opts.skani_nonviral_marker_kmer_compression_factor), + "--{}".format(opts.skani_nonviral_preset) if opts.skani_nonviral_preset else "", + opts.skani_nonviral_options, + ] + + arguments += [ + "&&", + + "cat", + os.path.join(os.path.split(fp)[0], "skani_output.tsv"), + "|", + "cut -f1-3", + "|", + "tail -n +2", + "|", + os.environ["edgelist_to_clusters.py"], + "--basename", + "-t {}".format(opts.ani_threshold), + "--no_singletons" if bool(opts.no_singletons) 
else "", + "--cluster_prefix {}__{}{}".format(id_sample, organism_type[0].upper(), opts.genome_cluster_prefix), + "--cluster_suffix {}".format(opts.genome_cluster_suffix) if bool(opts.genome_cluster_suffix) else "", + "--cluster_prefix_zfill {}".format(opts.genome_cluster_prefix_zfill), + "-o {}".format(os.path.join(os.path.split(fp)[0], "genome_clusters.tsv")), + "--identifiers {}".format(os.path.join(directories["intermediate"], organism_type, id_sample, "genome_identifiers.list")), + "--export_graph {}".format(os.path.join(directories["serialization"], id_sample, f"{organism_type}.networkx_graph.pkl")), + "--export_dict {}".format(os.path.join(directories["serialization"], id_sample, f"{organism_type}.dict.pkl")), + + "&&", + + "rm -rf {}".format(os.path.join(directories["tmp"], "*")), + + ] + + cmd = Command( + arguments, + name=name, + f_cmds=f_cmds, + ) - name = "fastani__{}__{}".format(organism_type, id_sample) - description = "[Program = FastANI] [Organism_Type = {}] [Sample_ID = {}]".format(organism_type, id_sample) - cmd = Command([ - os.environ["fastANI"], - "-t {}".format(opts.n_jobs), - "--rl {}".format(fp), - "--ql {}".format(fp), - "-o {}".format(os.path.join(os.path.split(fp)[0], "fastani_output.tsv")), - opts.fastani_options, - - "&&", - - "cat", - os.path.join(os.path.split(fp)[0], "fastani_output.tsv"), - "|", - "cut -f1-3", - "|", - os.environ["edgelist_to_clusters.py"], - "--basename", - "-t {}".format(opts.ani_threshold), - "--no_singletons" if bool(opts.no_singletons) else "", - "--cluster_prefix {}__{}{}".format(id_sample, organism_type[0].upper(), opts.genome_cluster_prefix), - "--cluster_suffix {}".format(opts.genome_cluster_suffix) if bool(opts.genome_cluster_suffix) else "", - "--cluster_prefix_zfill {}".format(opts.genome_cluster_prefix_zfill), - "-o {}".format(os.path.join(os.path.split(fp)[0], "genome_clusters.tsv")), - "--identifiers {}".format(os.path.join(directories["intermediate"], organism_type, id_sample, "genome_identifiers.list")), - "--export_graph {}".format(os.path.join(directories["serialization"], f"{organism_type}.networkx_graph.pkl")), - "--export_dict {}".format(os.path.join(directories["serialization"], f"{organism_type}.dict.pkl")), - - "&&", - - "rm -rf {}".format(os.path.join(directories["tmp"], "*")), - - ], - name=name, - f_cmds=f_cmds, - ) + if opts.genome_clustering_algorithm == "fastani": + name = "fastani__{}__{}".format(organism_type, id_sample) + description = "[Program = FastANI] [Organism_Type = {}] [Sample_ID = {}]".format(organism_type, id_sample) + cmd = Command([ + os.environ["fastANI"], + "-t {}".format(opts.n_jobs), + "--rl {}".format(fp), + "--ql {}".format(fp), + "-o {}".format(os.path.join(os.path.split(fp)[0], "fastani_output.tsv")), + opts.fastani_options, + + "&&", + + "cat", + os.path.join(os.path.split(fp)[0], "fastani_output.tsv"), + "|", + "cut -f1-3", + "|", + os.environ["edgelist_to_clusters.py"], + "--basename", + "-t {}".format(opts.ani_threshold), + "--no_singletons" if bool(opts.no_singletons) else "", + "--cluster_prefix {}__{}{}".format(id_sample, organism_type[0].upper(), opts.genome_cluster_prefix), + "--cluster_suffix {}".format(opts.genome_cluster_suffix) if bool(opts.genome_cluster_suffix) else "", + "--cluster_prefix_zfill {}".format(opts.genome_cluster_prefix_zfill), + "-o {}".format(os.path.join(os.path.split(fp)[0], "genome_clusters.tsv")), + "--identifiers {}".format(os.path.join(directories["intermediate"], organism_type, id_sample, "genome_identifiers.list")), + "--export_graph 
{}".format(os.path.join(directories["serialization"], id_sample, f"{organism_type}.networkx_graph.pkl")), + "--export_dict {}".format(os.path.join(directories["serialization"], id_sample, f"{organism_type}.dict.pkl")), + + "&&", + + "rm -rf {}".format(os.path.join(directories["tmp"], "*")), + + ], + name=name, + f_cmds=f_cmds, + ) # Run command cmd.run( @@ -338,11 +463,11 @@ def main(args=None): print("Check the following files:\ncat {}".format(os.path.join(directories["log"], "{}.*".format(name))), file=sys.stdout) sys.exit(cmd.returncode_) - # MMSEQS2 - print(format_header(" * ({}) Running MMSEQS2:".format(format_duration(t0))), file=sys.stdout) + # Clustering + print(format_header(" * ({}) Running {}:".format(format_duration(t0), PROTEIN_CLUSTERING_ALGORITHM)), file=sys.stdout) mag_to_genomecluster = dict() protein_to_proteincluster = dict() - for fp in pv(glob.glob(os.path.join(directories["intermediate"], "*", "*", "genome_clusters.tsv")), "Running MMSEQS2"): + for fp in pv(glob.glob(os.path.join(directories["intermediate"], "*", "*", "genome_clusters.tsv")), "Running {}".format(PROTEIN_CLUSTERING_ALGORITHM)): fields = fp.split("/") organism_type = fields[-3] id_sample = fields[-2] @@ -364,20 +489,21 @@ def main(args=None): print(*proteins, sep="\n", file=f) write_fasta(protein_to_sequence[proteins], os.path.join(genomecluster_directory, "proteins.faa" )) - # Run MMSEQS2 - name = "mmseqs2__{}__{}".format(organism_type, id_genomecluster) - description = "[Program = MMSEQS2] [Organism_Type = {}] [Sample_ID = {}] [Genome_Cluster = {}]".format(organism_type, id_sample, id_genomecluster) + # Run Clustering + name = "{}__{}__{}".format(PROTEIN_CLUSTERING_ALGORITHM.lower(), organism_type, id_genomecluster) + description = "[Program = {}] [Organism_Type = {}] [Sample_ID = {}] [Genome_Cluster = {}]".format(PROTEIN_CLUSTERING_ALGORITHM, organism_type, id_sample, id_genomecluster) cmd = Command([ - os.environ["mmseqs2_wrapper.py"], + os.environ["clustering_wrapper.py"], "--fasta {}".format(os.path.join(genomecluster_directory, "proteins.faa" )), "--output_directory {}".format(genomecluster_directory), "--no_singletons" if bool(opts.no_singletons) else "", - "--algorithm {}".format(opts.algorithm), + "--algorithm {}".format(opts.protein_clustering_algorithm), "--n_jobs {}".format(opts.n_jobs), "--minimum_identity_threshold {}".format(opts.minimum_identity_threshold), "--minimum_coverage_threshold {}".format(opts.minimum_coverage_threshold), "--mmseqs2_options='{}'" if bool(opts.mmseqs2_options) else "", + "--diamond_options='{}'" if bool(opts.diamond_options) else "", "--cluster_prefix {}_{}".format(id_genomecluster, opts.protein_cluster_prefix), "--cluster_suffix {}".format(opts.protein_cluster_suffix) if bool(opts.protein_cluster_suffix) else "", "--cluster_prefix_zfill {}".format(opts.protein_cluster_prefix_zfill), @@ -599,6 +725,5 @@ def main(args=None): df_proteins["id_protein_cluster"].to_frame().dropna(how="any", axis=0).to_csv(os.path.join(directories["output"], "proteins_to_orthogroups.tsv"), sep="\t", header=None) # Change labels? 
print(*map(lambda fp: " * {}".format(fp), glob.glob(os.path.join(directories["output"],"*.tsv")) + glob.glob(os.path.join(directories["output"],"*.faa"))), sep="\n", file=sys.stdout ) - if __name__ == "__main__": main(sys.argv[1:]) diff --git a/src/scripts/merge_annotations.py b/src/scripts/merge_annotations.py index 50515fd..821c534 100755 --- a/src/scripts/merge_annotations.py +++ b/src/scripts/merge_annotations.py @@ -1,12 +1,12 @@ #!/usr/bin/env python -import sys, os, argparse, re +import sys, os, argparse, re, gzip from collections import defaultdict, OrderedDict import pandas as pd import numpy as np from soothsayer_utils import read_hmmer, pv, get_file_object, assert_acceptable_arguments, format_header, flatten __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2023.10.26" +__version__ = "2023.11.24" # disclaimer = format_header("DISCLAIMER: Lineage predictions are NOT robust and DO NOT USE CORE MARKERS. Please only use for exploratory suggestions.") @@ -72,72 +72,78 @@ def compile_identifiers(df, id_protein_cluster): if len(organism_types) == 1: organism_types = list(organism_types)[0] - # Genomes - genomes = set(df["id_genome"]) + # # Genomes + # genomes = set(df["id_genome"]) - # Samples - samples = set(df["sample_of_origin"]) + # # Samples + # samples = set(df["sample_of_origin"]) # Genome clusters genome_clusters = set(df["id_genome_cluster"]) if len(genome_clusters) == 1: genome_clusters = list(genome_clusters)[0] - data = OrderedDict([ - ("id_genome_cluster", genome_clusters), - ("organism_type", organism_types), - ("genomes", genomes), - ("samples_of_origin", samples), - ], - ) - data = pd.Series(data, name=id_protein_cluster) - data.index = data.index.map(lambda x: ("Identifiers", x)) + # data = OrderedDict([ + # ("id_genome_cluster", genome_clusters), + # ("organism_type", organism_types), + # ("genomes", genomes), + # ("samples_of_origin", samples), + # ], + # ) + # data = pd.Series(data, name=id_protein_cluster) + # data.index = data.index.map(lambda x: ("Identifiers", x)) + data = [genome_clusters, organism_types]#, genomes, samples] return data def compile_uniref(df, id_protein_cluster): df = df.dropna(how="all", axis=0) - unique_identifiers = set(df["sseqid"].unique()) - data = OrderedDict([ - ("number_of_proteins", df.shape[0]), - ("number_of_unique_hits", len(unique_identifiers)), - ("ids", unique_identifiers), - ("names", set(df["product"].unique())), - ], - ) - data = pd.Series(data, name=id_protein_cluster) - data.index = data.index.map(lambda x: ("UniRef", x)) + unique_identifiers = list(df["sseqid"].unique()) + # data = OrderedDict([ + # ("number_of_proteins", df.shape[0]), + # ("number_of_unique_hits", len(unique_identifiers)), + # ("ids", unique_identifiers), + # ("names", set(df["product"].unique())), + # ], + # ) + # data = pd.Series(data, name=id_protein_cluster) + # data.index = data.index.map(lambda x: ("UniRef", x)) + + data = [df.shape[0], len(unique_identifiers), unique_identifiers, list(df["product"].unique())] return data def compile_nonuniref_diamond(df, id_protein_cluster, label): df = df.dropna(how="all", axis=0) unique_identifiers = set(df["sseqid"].unique()) - data = OrderedDict( - [ - ("number_of_proteins", df.shape[0]), - ("number_of_unique_hits", len(unique_identifiers)), - ("ids", unique_identifiers), - ("names", np.nan), - ], - ) - data = pd.Series(data, name=id_protein_cluster) - data.index = data.index.map(lambda x: (label, x)) + # data = OrderedDict( + # [ + # ("number_of_proteins", df.shape[0]), + # ("number_of_unique_hits", 
len(unique_identifiers)), + # ("ids", unique_identifiers), + # ("names", np.nan), + # ], + # ) + # data = pd.Series(data, name=id_protein_cluster) + # data.index = data.index.map(lambda x: (label, x)) + data = [df.shape[0], len(unique_identifiers), list(unique_identifiers)] + return data def compile_hmmsearch(df, id_protein_cluster, label): df = df.dropna(how="all", axis=0).query("number_of_hits > 0") - unique_identifiers = flatten(df["ids"], into=set) - unique_names = flatten(df["names"], into=set) + unique_identifiers = flatten(df["ids"], into=list, unique=True) + unique_names = flatten(df["names"], unique=True) - data = OrderedDict( - [ - ("number_of_proteins", df.shape[0]), - ("number_of_unique_hits", len(unique_identifiers)), - ("ids", unique_identifiers), - ("names", unique_names), - ], - ) - data = pd.Series(data, name=id_protein_cluster) - data.index = data.index.map(lambda x: (label, x)) + # data = OrderedDict( + # [ + # ("number_of_proteins", df.shape[0]), + # ("number_of_unique_hits", len(unique_identifiers)), + # ("ids", unique_identifiers), + # ("names", unique_names), + # ], + # ) + # data = pd.Series(data, name=id_protein_cluster) + # data.index = data.index.map(lambda x: (label, x)) + data = [df.shape[0], len(unique_identifiers), unique_identifiers, unique_names] return data @@ -487,63 +493,112 @@ def main(args=None): df_annotations.to_csv(os.path.join(opts.output_directory, "annotations.proteins.tsv.gz"), sep="\t") if opts.identifier_mapping: - # Protein clusters - protein_to_proteincluster = df_annotations[("Identifiers", "id_protein_cluster")] - protein_cluster_annotations = list() - for id_protein_cluster, df in pv(df_annotations.groupby(protein_to_proteincluster), description="Compiling consensus annotations for protein clusters", total=protein_to_proteincluster.nunique(), unit=" Protein Clusters"): - # Identifiers - data_identifiers = compile_identifiers(df["Identifiers"], id_protein_cluster) - - # UniRef - data_uniref = compile_uniref(df["UniRef"], id_protein_cluster) - - # MIBiG - data_mibig = compile_nonuniref_diamond(df["MIBiG"], id_protein_cluster, "MIBiG") - - # VFDB - data_vfdb = compile_nonuniref_diamond(df["VFDB"], id_protein_cluster, "VFDB") - - # CAZy - data_cazy = compile_nonuniref_diamond(df["CAZy"], id_protein_cluster, "CAZy") - - # Pfam - data_pfam = compile_hmmsearch(df["Pfam"], id_protein_cluster, "Pfam") - - # NCBIfam-AMR - data_amr = compile_hmmsearch(df["NCBIfam-AMR"], id_protein_cluster, "NCBIfam-AMR") - - # KOFAM - data_kofam = compile_hmmsearch(df["KOFAM"], id_protein_cluster, "KOFAM") - - # AntiFam - data_antifam = compile_hmmsearch(df["AntiFam"], id_protein_cluster, "AntiFam") - - # Composite name - composite_name = list() - composite_name += list(data_uniref[("UniRef","names")]) - composite_name += list(data_kofam[("KOFAM", "names")]) - composite_name += list(data_pfam[("Pfam","names")]) - composite_name = opts.composite_name_joiner.join(composite_name) - data_consensus = pd.Series(composite_name, index=[("Consensus", "composite_name")]) - - # Concatenate - data_concatenated = pd.concat([ - data_identifiers, - data_consensus, - data_uniref, - data_mibig, - data_vfdb, - data_cazy, - data_pfam, - data_amr, - data_kofam, - data_antifam, - ]) - data_concatenated.name = id_protein_cluster - protein_cluster_annotations.append(data_concatenated) - - df_annotations_proteinclusters = pd.DataFrame(protein_cluster_annotations) - df_annotations_proteinclusters.to_csv(os.path.join(opts.output_directory, "annotations.protein_clusters.tsv.gz"), sep="\t") + 
with gzip.open(os.path.join(opts.output_directory, "annotations.protein_clusters.tsv.gz"), "wt") as f:
+            print("",
+                *["Identifiers"]*2,
+                *["Consensus"]*1,
+
+                *["UniRef"]*4,
+                *["MIBiG"]*3,
+                *["VFDB"]*3,
+                *["CAZy"]*3,
+                *["Pfam"]*4,
+                *["NCBIfam-AMR"]*4,
+                *["KOFAM"]*4,
+                *["AntiFam"]*4,
+                sep="\t", file=f)
+
+            print(
+                "id_protein_cluster",
+                *["id_genome_cluster", "organism_type"], #, "genomes", "samples_of_origin"], # Identifiers
+                *["composite_name"], # Consensus
+                *["number_of_proteins", "number_of_unique_hits", "ids","names"], # UniRef
+                *["number_of_proteins", "number_of_unique_hits", "ids"], # MIBiG
+                *["number_of_proteins", "number_of_unique_hits", "ids"], # VFDB
+                *["number_of_proteins", "number_of_unique_hits", "ids"], # CAZy
+                *["number_of_proteins", "number_of_unique_hits", "ids","names"], # Pfam
+                *["number_of_proteins", "number_of_unique_hits", "ids","names"], # NCBIfam-AMR
+                *["number_of_proteins", "number_of_unique_hits", "ids","names"], # KOFAM
+                *["number_of_proteins", "number_of_unique_hits", "ids","names"], # AntiFam
+                sep="\t",
+                file=f,
+            )
+
+            # Protein clusters
+            protein_to_proteincluster = df_annotations[("Identifiers", "id_protein_cluster")]
+            protein_cluster_annotations = list()
+            for id_protein_cluster, df in pv(df_annotations.groupby(protein_to_proteincluster), description="Compiling consensus annotations for protein clusters", total=protein_to_proteincluster.nunique(), unit=" Protein Clusters"):
+                # Identifiers
+                data_identifiers = compile_identifiers(df["Identifiers"], id_protein_cluster)
+
+                # UniRef
+                data_uniref = compile_uniref(df["UniRef"], id_protein_cluster)
+
+                # MIBiG
+                data_mibig = compile_nonuniref_diamond(df["MIBiG"], id_protein_cluster, "MIBiG")
+
+                # VFDB
+                data_vfdb = compile_nonuniref_diamond(df["VFDB"], id_protein_cluster, "VFDB")
+
+                # CAZy
+                data_cazy = compile_nonuniref_diamond(df["CAZy"], id_protein_cluster, "CAZy")
+
+                # Pfam
+                data_pfam = compile_hmmsearch(df["Pfam"], id_protein_cluster, "Pfam")
+
+                # NCBIfam-AMR
+                data_amr = compile_hmmsearch(df["NCBIfam-AMR"], id_protein_cluster, "NCBIfam-AMR")
+
+                # KOFAM
+                data_kofam = compile_hmmsearch(df["KOFAM"], id_protein_cluster, "KOFAM")
+
+                # AntiFam
+                data_antifam = compile_hmmsearch(df["AntiFam"], id_protein_cluster, "AntiFam")
+
+                # Composite name
+                composite_name = list()
+                composite_name += list(data_uniref[-1])
+                composite_name += list(data_kofam[-1])
+                composite_name += list(data_pfam[-1])
+                composite_name = list(filter(lambda x: isinstance(x, str), composite_name))
+                if len(composite_name) > 0:
+                    composite_name = opts.composite_name_joiner.join(composite_name)
+                else:
+                    composite_name = np.nan
+
+                print(
+                    id_protein_cluster,
+                    *data_identifiers,
+                    composite_name,
+                    *data_uniref,
+                    *data_mibig,
+                    *data_vfdb,
+                    *data_cazy,
+                    *data_pfam,
+                    *data_amr,
+                    *data_kofam,
+                    *data_antifam,
+                    sep="\t",
+                    file=f,
+                )
+
+                # data_consensus = pd.Series(composite_name, index=[("Consensus", "composite_name")])
+                # # Concatenate
+                # data_concatenated = pd.concat([
+                #     data_identifiers,
+                #     data_consensus,
+                #     data_uniref,
+                #     data_mibig,
+                #     data_vfdb,
+                #     data_cazy,
+                #     data_pfam,
+                #     data_amr,
+                #     data_kofam,
+                #     data_antifam,
+                # ])
+                # data_concatenated.name = id_protein_cluster
+                # protein_cluster_annotations.append(data_concatenated)
+            # df_annotations_proteinclusters = pd.DataFrame(protein_cluster_annotations)
+            # df_annotations_proteinclusters.to_csv(os.path.join(opts.output_directory, "annotations.protein_clusters.tsv.gz"), sep="\t")
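For context, the replacement block above streams one row per protein cluster directly into a gzipped TSV instead of accumulating per-cluster `pd.Series` objects and concatenating them into a DataFrame at the end (the commented-out code), which keeps memory flat regardless of cluster count. A minimal sketch of the pattern, with illustrative column names and rows:

```python
import gzip

# Stream rows as they are computed; nothing accumulates in memory.
with gzip.open("annotations.protein_clusters.tsv.gz", "wt") as f:
    print("id_protein_cluster", "number_of_proteins", sep="\t", file=f)
    for id_cluster, n_proteins in [("PSLC-0_SSPC-1", 12), ("PSLC-0_SSPC-2", 3)]:
        print(id_cluster, n_proteins, sep="\t", file=f)
```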
a/src/scripts/merge_genome_quality_assessments.py b/src/scripts/merge_genome_quality_assessments.py index a9a8be7..e20a4dd 100755 --- a/src/scripts/merge_genome_quality_assessments.py +++ b/src/scripts/merge_genome_quality_assessments.py @@ -4,7 +4,7 @@ import pandas as pd __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2021.10.16" +__version__ = "2021.11.9" def get_prokaryotic_description(x, fields=["Completeness_Model_Used", "Additional_Notes"]): output = list() @@ -93,7 +93,7 @@ def main(args=None): print("Could not find any prokaryotic genome assessment tables from CheckM2 in the following directory: {}".format(opts.binning_directory), file=sys.stdout) # Viral - viral_genome_quality_files = glob.glob(os.path.join(opts.binning_directory, opts.viral_subdirectory_name, "*", "output", "checkmv_results.filtered.tsv")) + viral_genome_quality_files = glob.glob(os.path.join(opts.binning_directory, opts.viral_subdirectory_name, "*", "output", "checkv_results.filtered.tsv")) if viral_genome_quality_files: print("* Compiling viral genome quality from following files:", *viral_genome_quality_files, sep="\n ", file=sys.stdout) diff --git a/src/scripts/merge_taxonomy_classifications.py b/src/scripts/merge_taxonomy_classifications.py index 46a2c9d..cb0f784 100755 --- a/src/scripts/merge_taxonomy_classifications.py +++ b/src/scripts/merge_taxonomy_classifications.py @@ -5,7 +5,7 @@ from tqdm import tqdm __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2021.10.11" +__version__ = "2021.12.11" def main(args=None): # Path info @@ -62,7 +62,7 @@ def main(args=None): for fp in genome_taxonomy: id_domain = fp.split("/")[-3] df = pd.read_csv(fp, sep="\t", index_col=0) - if id_domain.lower() in {"viral", "virus"}: + if id_domain.lower() in {"viral", "virus", "virion"}: for id_genome, taxonomy in df["lineage"].items(): genome_to_data[id_genome] = {"domain":id_domain, "taxonomy_classification":taxonomy} if id_domain.lower() in {"prokaryotic", "prokaryotes", "prokarya", "bacteria", "archaea", "bacterial","archael", "prok", "proks"}: diff --git a/src/scripts/module_completion_ratios.py b/src/scripts/module_completion_ratios.py index a209a57..7ba04d3 100755 --- a/src/scripts/module_completion_ratios.py +++ b/src/scripts/module_completion_ratios.py @@ -29,7 +29,7 @@ from collections import OrderedDict, defaultdict import pandas as pd -__version__ = "2023.10.23" +__version__ = "2023.12.1" __program__ = os.path.split(sys.argv[0])[-1] ################################################################################ @@ -469,7 +469,7 @@ def main(): parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) - parser.add_argument('-i', '--ko_table', help='path/to/ko_table.tsv following [id_genome][id_ko], No header. Cannot be used with --ko_lists') + parser.add_argument('-i', '--ko_table', default="stdin", help='path/to/ko_table.tsv following [id_genome][id_ko], No header. Cannot be used with --ko_lists [Default: stdin]') parser.add_argument('-k', '--ko_lists', nargs='+', help='Space-delimited list of filepaths where each file represents a genome and each line in the file is a KO id. 
Cannot be used with --ko_table') parser.add_argument('-o', '--output', default="stdout", help='Output file for module completion ratios [Default: stdout]') parser.add_argument("-d", '--database_directory', required=True, help='path/to/database_directory with pickle files') @@ -482,6 +482,9 @@ def main(): opts = parser.parse_args() + if opts.ko_lists is not None: + if opts.ko_table == "stdin": + opts.ko_table = None assert bool(opts.ko_table) != bool(opts.ko_lists), "Must provide KOs as either a tsv table (--ko_table) or a list of KO ids in different files (--ko_lists)" if opts.ko_table == "stdin": opts.ko_table = sys.stdin diff --git a/src/scripts/partition_unbinned.py b/src/scripts/partition_unbinned.py index d938507..bf4a2eb 100755 --- a/src/scripts/partition_unbinned.py +++ b/src/scripts/partition_unbinned.py @@ -4,7 +4,7 @@ from Bio.SeqIO.FastaIO import SimpleFastaParser __program__ = os.path.split(sys.argv[0])[-1] -__version__ = "2021.08.05" +__version__ = "2023.12.18" def main(args=None): # Path info @@ -24,7 +24,7 @@ def main(args=None): parser.add_argument("-b","--bins", type=str, required=True, help = "path/to/bins.list, No header") parser.add_argument("-f","--fasta", type=str, required=True, help = "path/to/fasta") parser.add_argument("-o","--output", type=str, default="stdout", help = "Output fasta file [Default: stdout]") - parser.add_argument("-m", "--minimum_contig_length", type=int, default=1000, help="Minimum contig length. [Default: 1000] ") + parser.add_argument("-m", "--minimum_contig_length", type=int, default=1, help="Minimum contig length. [Default: 1] ") parser.add_argument("--mode", type=str, default="unbinned", help="Get 'unbinned' or 'binned' contigs [Default: 'unbinned'] ") diff --git a/src/scripts/reformat_sylph_profile_single_sample_output.py b/src/scripts/reformat_sylph_profile_single_sample_output.py new file mode 100755 index 0000000..be7a531 --- /dev/null +++ b/src/scripts/reformat_sylph_profile_single_sample_output.py @@ -0,0 +1,68 @@ +#!/usr/bin/env python +import sys, os, argparse, gzip +import pandas as pd +from tqdm import tqdm + +__program__ = os.path.split(sys.argv[0])[-1] +__version__ = "2023.11.10" +def filepath_to_genome(fp, extension): + assert fp.endswith(extension) + fn = os.path.split(fp)[1] + return fn[:-(len(extension) + 1)] + +def main(args=None): + # Path info + script_directory = os.path.dirname(os.path.abspath( __file__ )) + script_filename = __program__ + + # Path info + description = """ + Running: {} v{} via Python v{} | {}""".format(__program__, __version__, sys.version.split(" ")[0], sys.executable) + usage = "{} -i <input.tsv> -o <output_directory>".format(__program__) + epilog = "Copyright 2021 Josh L. Espinoza (jespinoz@jcvi.org)" + + # Parser + parser = argparse.ArgumentParser(description=description, usage=usage, epilog=epilog, formatter_class=argparse.RawTextHelpFormatter) + + # Pipeline + parser.add_argument("-i","--input", default="stdin", type=str, help = "Input sylph profile table [Default: stdin]") + parser.add_argument("-o","--output_directory", required=True, type=str, help = "Output directory to write output files") + # parser.add_argument("-n", "--name", type=str, required=False, help="Name of sample") + parser.add_argument("-c","--genome_clusters", type=str, help = "path/to/mags_to_slcs.tsv. 
[id_genome][id_genome-cluster], No header.") + parser.add_argument("-f","--field", type=str, default="Taxonomic_abundance", help = "Field to use for reformatting [Default: Taxonomic_abundance]") + parser.add_argument("-x","--extension", type=str, default="fa", help = "Fasta file extension for bins [Default: fa]") + parser.add_argument("--header", action="store_true", help = "Include header in output. Doesn't apply to unstacked dataframe.") + + # Options + opts = parser.parse_args() + opts.script_directory = script_directory + opts.script_filename = script_filename + + # Input + if opts.input == "stdin": + opts.input = sys.stdin + + # Output + os.makedirs(opts.output_directory, exist_ok=True) + + # Process + df_sylph = pd.read_csv(opts.input, sep="\t") + assert opts.field in df_sylph.columns, "--field {} not in --input columns: {}".format(opts.field, ", ".join(df_sylph.columns)) + + genome_to_value = df_sylph.set_index("Genome_file")[opts.field] + genome_to_value.index = genome_to_value.index.map(lambda fp: filepath_to_genome(fp, opts.extension)) + + # Output genome values + genome_to_value.to_frame(opts.field.lower()).to_csv(os.path.join(opts.output_directory, "{}.tsv.gz".format(opts.field.lower())), sep="\t", header=bool(opts.header)) + + if opts.genome_clusters: + genome_to_slc = pd.read_csv(opts.genome_clusters, sep="\t", index_col=0).iloc[:,0] + slc_to_value = genome_to_value.groupby(genome_to_slc).sum() + slc_to_value.to_frame(opts.field.lower()).to_csv(os.path.join(opts.output_directory, "{}.clusters.tsv.gz".format(opts.field.lower())), sep="\t", header=bool(opts.header)) + +if __name__ == "__main__": + main() + + + diff --git a/src/veba b/src/veba new file mode 100755 index 0000000..132127a --- /dev/null +++ b/src/veba @@ -0,0 +1,194 @@ +#!/bin/bash +# v2023.12.18 + +# Define available modules +AVAILABLE_MODULES=( +"annotate" +"assembly-long" +"assembly" +"binning-eukaryotic" +"binning-prokaryotic" +"binning-viral" +"biosynthetic" +"classify-eukaryotic" +"classify-prokaryotic" +"classify-viral" +"cluster" +"coverage-long" +"coverage" +"index" +"mapping" +"phylogeny" +"preprocess-long" +"preprocess" +"profile-pathway" +"profile-taxonomy" +) + +# Conda base +CONDA_BASE=$(conda info --base) + +# Script directory +SCRIPT_DIRECTORY=$(dirname $0) + +# Function to display script usage +show_help() { + echo -e "-------------------------------" + echo " " + echo -e " _ _ _______ ______ _______\n \ / |______ |_____] |_____|\n \/ |______ |_____] | |" + echo " " + echo -e "-------------------------------" + echo "Usage: $0 [-m <module>] [-p <params>] [-v|--version] [-h|--help]" + echo -e "Example: veba --module preprocess --params \"-1 S1_1.fq.gz -2 S1_2.fq.gz -n S1 -o veba_output/preprocess\"" + echo -e "GitHub: https://github.com/jolespin" + echo -e "Developer: Josh L. Espinoza, PhD (ORCiD: 0000-0003-3447-3845)" + echo " " + echo "Options:" + echo " -m, --module Specify the module. Available modules: ${AVAILABLE_MODULES[*]}" + echo " -p, --params Specify parameters to give to each module" + echo " -v, --version Display the version information" + echo " -h, --help Display this help message" + exit 0 +} + +# Parse command-line arguments +ARGS=$(getopt -o m:p:vh --long module:,params:,version,help -n "$0" -- "$@") + +# Exit if getopt encounters an error +if [ $?
-ne 0 ]; then + exit 1 +fi + +eval set -- "$ARGS" + +# Default values +MODULE="" +PARAMS="-h" + +# Process command-line options +while true; do + case "$1" in + -m|--module) + MODULE="$2" + shift 2 + ;; + -p|--params) + PARAMS="$2" + shift 2 + ;; + -v|--version) + echo "VEBA Version:" + cat "${SCRIPT_DIRECTORY}/VEBA_VERSION" + exit 0 + ;; + -h|--help) + show_help + ;; + --) + shift + break + ;; + *) + echo "Unknown option: $1" + exit 1 + ;; + esac +done + +# Validate required arguments +if [ -z "$MODULE" ]; then + echo "Module is required. Use --module." + exit 1 +fi + + +# Check if the specified module is valid +if [[ ! " ${AVAILABLE_MODULES[@]} " =~ " $MODULE " ]]; then + echo "Invalid module. Must be one of: ${AVAILABLE_MODULES[*]}" + exit 1 +fi + +# Perform tasks based on the specified module +case $MODULE in + "annotate") + source "${CONDA_BASE}/bin/activate" VEBA-annotate_env + annotate.py $PARAMS + ;; + "assembly-long") + source "${CONDA_BASE}/bin/activate" VEBA-assembly_env + assembly-long.py $PARAMS + ;; + "assembly") + source "${CONDA_BASE}/bin/activate" VEBA-assembly_env + assembly.py $PARAMS + ;; + "binning-eukaryotic") + source "${CONDA_BASE}/bin/activate" VEBA-binning-eukaryotic_env + binning-eukaryotic.py $PARAMS + ;; + "binning-prokaryotic") + source "${CONDA_BASE}/bin/activate" VEBA-binning-prokaryotic_env + binning-prokaryotic.py $PARAMS + ;; + "binning-viral") + source "${CONDA_BASE}/bin/activate" VEBA-binning-viral_env + binning-viral.py $PARAMS + ;; + "biosynthetic") + source "${CONDA_BASE}/bin/activate" VEBA-biosynthetic_env + biosynthetic.py $PARAMS + ;; + "classify-eukaryotic") + source "${CONDA_BASE}/bin/activate" VEBA-classify_env + classify-eukaryotic.py $PARAMS + ;; + "classify-prokaryotic") + source "${CONDA_BASE}/bin/activate" VEBA-classify_env + classify-prokaryotic.py $PARAMS + ;; + "classify-viral") + source "${CONDA_BASE}/bin/activate" VEBA-classify_env + classify-viral.py $PARAMS + ;; + "cluster") + source "${CONDA_BASE}/bin/activate" VEBA-cluster_env + cluster.py $PARAMS + ;; + "coverage-long") + source "${CONDA_BASE}/bin/activate" VEBA-assembly_env + coverage-long.py $PARAMS + ;; + "coverage") + source "${CONDA_BASE}/bin/activate" VEBA-assembly_env + coverage.py $PARAMS + ;; + "index") + source "${CONDA_BASE}/bin/activate" VEBA-mapping_env + index.py $PARAMS + ;; + "mapping") + source "${CONDA_BASE}/bin/activate" VEBA-mapping_env + mapping.py $PARAMS + ;; + "phylogeny") + source "${CONDA_BASE}/bin/activate" VEBA-phylogeny_env + phylogeny.py $PARAMS + ;; + "preprocess-long") + source "${CONDA_BASE}/bin/activate" VEBA-preprocess_env + preprocess-long.py $PARAMS + ;; + "preprocess") + source "${CONDA_BASE}/bin/activate" VEBA-preprocess_env + preprocess.py $PARAMS + ;; + "profile-pathway") + source "${CONDA_BASE}/bin/activate" VEBA-profile_env + profile-pathway.py $PARAMS + ;; + "profile-taxonomy") + source "${CONDA_BASE}/bin/activate" VEBA-profile_env + profile-taxonomy.py $PARAMS + ;; +esac \ No newline at end of file diff --git a/src/get_script_versions.sh b/src/veba_versions.sh similarity index 100% rename from src/get_script_versions.sh rename to src/veba_versions.sh diff --git a/walkthroughs/README.md b/walkthroughs/README.md index 3586d83..2aecec6 100644 --- a/walkthroughs/README.md +++ b/walkthroughs/README.md @@ -31,29 +31,43 @@ sbatch -J ${N} -N 1 -c ${N_JOBS} --ntasks-per-node=1 -o logs/${N}.o -e logs/${N} #### Available walkthroughs: +##### Accessing SRA: + * **[Downloading and preprocessing fastq files](download_and_preprocess_reads.md)** - 
Explains how to download reads from NCBI and run *VEBA's* `preprocess.py` module to decontaminate metagenomic and/or metatranscriptomic reads. + +##### End-to-end workflows: + * **[Complete end-to-end metagenomics analysis](end-to-end_metagenomics.md)** - Goes through assembling metagenomic reads, binning, clustering, classification, and annotation. We also show how to use the unbinned contigs in a pseudo-coassembly with guidelines on when it's a good idea to go this route. * **[Recovering viruses from metatranscriptomics](recovering_viruses_from_metatranscriptomics.md)** - Goes through assembling metatranscriptomic reads, viral binning, clustering, and classification. -* **[Read mapping and counts tables](read_mapping_and_counts_tables.md)** - Read mapping and generating counts tables at the contig, MAG, SLC, ORF, and SSO levels. -* **[Phylogenetic inference](phylogenetic_inference.md)** - Phylogenetic inference of eukaryotic diatoms. * **[Setting up *bona fide* coassemblies for metagenomics or metatranscriptomics](setting_up_coassemblies.md)** - In the case where all samples are of low depth, it may be useful to use coassembly instead of sample-specific approaches. This walkthrough goes through concatenating reads, creating a reads table, coassembly of concatenated reads, aligning sample-specific reads to the coassembly for multiple sorted BAM files, and mapping reads for scaffold/transcript-level counts. Please note that a coassembly differs from the pseudo-coassembly concept introduced in the VEBA publication. For more information regarding the differences between *bona fide* coassembly and pseudo-coassembly, please refer to [*23. What's the difference between a coassembly and a pseudo-coassembly?*](https://github.com/jolespin/veba/blob/main/FAQ.md#23-whats-the-difference-between-a-coassembly-and-a-pseudo-coassembly). + +##### Phylogenetics: + +* **[Phylogenetic inference](phylogenetic_inference.md)** - Phylogenetic inference of eukaryotic diatoms. + +##### Bioprospecting: + +* **[Bioprospecting for biosynthetic gene clusters](bioprospecting_for_biosynthetic_gene_clusters.md)** - Detecting biosynthetic gene clusters (BGCs) with `antiSMASH` and scoring the novelty of BGCs. + +##### Mapping reads and rapid profiling: + +* **[Read mapping and counts tables](read_mapping_and_counts_tables.md)** - Read mapping and generating counts tables at the contig, MAG, SLC, ORF, and SSO levels. +* **[Taxonomic profiling of *de novo* genomes](taxonomic_profiling_de-novo_genomes.md)** - Explains how to build custom `Sylph` databases from *de novo* genomes and profile reads against them. +* **[Pathway profiling of *de novo* genomes](pathway_profiling_de-novo_genomes.md)** - Explains how to build and align reads to custom `HUMAnN` databases from *de novo* genomes and annotations. * **[Converting counts tables](converting_counts_tables.md)** - Convert your counts table (with or without metadata) to [anndata](https://anndata.readthedocs.io/en/latest/index.html) or [biom](https://biom-format.org/) format. Also supports [Pandas pickle](https://pandas.pydata.org/docs/reference/api/pandas.read_pickle.html) format. + +##### Containerization and AWS: + +* **[Adapting commands for Docker](adapting_commands_for_docker.md)** - Explains how to download and use Docker for running VEBA. * **[Adapting commands for AWS](adapting_commands_for_aws.md)** - Explains how to download and use Docker for running VEBA specifically on AWS. 
-* **[Metabolic Profiling *de novo* genomes](metabolic_profiling_de-novo_genomes.md)** - Explains how to build and align reads to custom `HUMAnN` databases from *de novo* genomes and annotations. - ___________________________________________ **Coming Soon:** * Workflow for low-depth samples with no bins -* Workflow for ASV detection from short-read amplicons -* Workflows for integrating 3rd party software with *VEBA*: - * Using [EukHeist](https://github.com/AlexanderLabWHOI/EukHeist) for eukaryotic binning followed by *VEBA* for mapping and annotation. - * Using [EukMetaSanity](https://github.com/cjneely10/EukMetaSanity) for modeling genes for eukaryotic genomes recovered with *VEBA*. - +* Assigning eukaryotic taxonomy to unbinned contigs +* Bioprospecting using [`PlasticDB` database](https://plasticdb.org/) ___________________________________________ ##### Notes: diff --git a/walkthroughs/adapting_commands_for_aws.md b/walkthroughs/adapting_commands_for_aws.md index bc6acb8..091fe36 100644 --- a/walkthroughs/adapting_commands_for_aws.md +++ b/walkthroughs/adapting_commands_for_aws.md @@ -38,7 +38,7 @@ This job definition pulls the [jolespin/veba_preprocess](https://hub.docker.com/ "jobDefinitionName": "preprocess__S1", "type": "container", "containerProperties": { - "image": "jolespin/veba_preprocess:1.3.0", + "image": "jolespin/veba_preprocess:1.4.0", "command": [ "preprocess.py", "-1", diff --git a/walkthroughs/adapting_commands_for_docker.md b/walkthroughs/adapting_commands_for_docker.md index 939780a..b6d4899 100644 --- a/walkthroughs/adapting_commands_for_docker.md +++ b/walkthroughs/adapting_commands_for_docker.md @@ -24,7 +24,7 @@ Refer to the [Docker documentation](https://docs.docker.com/engine/install/). Let's say you wanted to use the `preprocess` module. Download the Docker image as so: ``` -VERSION=1.3.0 +VERSION=1.4.0 docker image pull jolespin/veba_preprocess:${VERSION} ``` @@ -36,7 +36,7 @@ For example, here's how we would run the `preprocess.py` module. First let's ju ```bash # Version -VERSION=1.2.0 +VERSION=1.4.0 # Image DOCKER_IMAGE="jolespin/veba_preprocess:${VERSION}" @@ -90,7 +90,7 @@ CMD="preprocess.py -1 ${CONTAINER_INPUT_DIRECTORY}/${R1} -2 ${CONTAINER_INPUT_DI # Docker # Version -VERSION=1.2.0 +VERSION=1.4.0 # Image DOCKER_IMAGE="jolespin/veba_preprocess:${VERSION}" diff --git a/walkthroughs/bioprospecting_for_biosynthetic_gene_clusters.md b/walkthroughs/bioprospecting_for_biosynthetic_gene_clusters.md index 23be35e..54f6187 100644 --- a/walkthroughs/bioprospecting_for_biosynthetic_gene_clusters.md +++ b/walkthroughs/bioprospecting_for_biosynthetic_gene_clusters.md @@ -12,6 +12,8 @@ _____________________________________________________ 1. Compile table of genomes and gene models 2. Identify biosynthetic gene clusters and score novelty +**Conda Environment:** `conda activate VEBA`. Use this for intermediate scripts. + _____________________________________________________ @@ -37,8 +39,6 @@ We only need the `[id_genome] [path/to/genome.fasta] [path/to/gene_models.gff]` Now that we have our genome table formatted so it is `[id_genome] [path/to/genome.fasta] [path/to/gene_models.gff]` without headers, we can run the `biosynthetic.py` module to identify biosynthetic gene clusters via `antiSMASH` and detect homology of components to the `MIBiG` database. 
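If you need to assemble that three-column table manually instead, a minimal sketch follows (the `veba_output/binning/prokaryotic/*/output/genomes/` layout and the `.fa`/`.gff` extensions are assumptions based on VEBA's default output structure; adjust the glob to your own paths):

```
# Pair each genome fasta with the GFF of the same basename to build the
# headerless [id_genome][genome.fasta][gene_models.gff] table (paths assumed)
mkdir -p veba_output/misc
rm -f veba_output/misc/genomes_gene-models.tsv
for FASTA in veba_output/binning/prokaryotic/*/output/genomes/*.fa; do
    ID=$(basename ${FASTA} .fa)
    GFF=$(dirname ${FASTA})/${ID}.gff
    echo -e "${ID}\t${FASTA}\t${GFF}" >> veba_output/misc/genomes_gene-models.tsv
done
```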
-**Conda Environment:** `conda activate VEBA-biosynthetic_env` - ``` # Set the number of threads @@ -55,7 +55,7 @@ GENOMES=veba_output/misc/genomes_gene-models.tsv OUT_DIR=veba_output/biosynthetic/prokaryotic # Directory -CMD="source activate VEBA-biosynthetic_env && biosynthetic.py -i ${GENOMES} -o ${OUT_DIR} -p ${N_JOBS} -t bacteria" +CMD="source activate VEBA && veba --module biosynthetic --params \"-i ${GENOMES} -o ${OUT_DIR} -p ${N_JOBS} -t bacteria\"" # Either run this command or use SunGridEnginge/SLURM ``` diff --git a/walkthroughs/converting_counts_tables.md b/walkthroughs/converting_counts_tables.md index 13c2721..1e8a7b6 100644 --- a/walkthroughs/converting_counts_tables.md +++ b/walkthroughs/converting_counts_tables.md @@ -15,9 +15,9 @@ _____________________________________________________ 2. Provide a counts table and sample metadata 3. Provide a counts table, sample metadata, and -_____________________________________________________ +**Conda Environment:** `conda activate VEBA`. Use this for intermediate scripts. -**Conda Environment:** `conda activate VEBA-mapping_env` +_____________________________________________________ #### 1. Let's convert to a Python pickle object without any metadata diff --git a/walkthroughs/download_and_preprocess_reads.md b/walkthroughs/download_and_preprocess_reads.md index 2da95a5..6219160 100644 --- a/walkthroughs/download_and_preprocess_reads.md +++ b/walkthroughs/download_and_preprocess_reads.md @@ -11,7 +11,7 @@ If you want to either remove human contamination or count ribosomal reads then m ``` echo $VEBA_DATABASE -/expanse/projects/jcl110/db/veba/VDB_v4 +/expanse/projects/jcl110/db/veba/VDB_v6 # ^_^ Yours will be different obviously # ``` @@ -111,28 +111,12 @@ Here we are going to count the reads for the human contamination and ribosomal r * ⚠️ If your host is not human then you will need to use a different contamination reference. See item #22 in the [FAQ](https://github.com/jolespin/veba/blob/main/FAQ.md). -* ⚠️ As of 2022.10.18 *VEBA* has switched from using the "GRCh38 no alt analysis set" to the "CHM13v2.0 telomore-to-telomere" build for human. If you've installed *VEBA* before this date or are using `v1.0.0` release from [Espinoza et al. 
2022](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04973-8) then you can update with the following code: - -``` -conda activate VEBA-database_env -wget -v -P ${VEBA_DATABASE} https://genome-idx.s3.amazonaws.com/bt/chm13v2.0.zip -unzip -d ${VEBA_DATABASE}/Contamination/ ${VEBA_DATABASE}/chm13v2.0.zip -rm -rf ${VEBA_DATABASE}/chm13v2.0.zip - -# Use this if you want to remove the previous GRCh38 index -rm -rf ${VEBA_DATABASE}/Contamination/grch38/ -``` - -Continuing with the tutorial...just make note of the human index here and swap out GRCh38 for CHM13v2.0 if you decided to update: ``` N_JOBS=4 -# Human Bowtie2 index -HUMAN_INDEX=${VEBA_DATABASE}/Contamination/grch38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bowtie_index - -# or use this if you have updated from GRCh38 to CHM13v2.0 -# HUMAN_INDEX=${VEBA_DATABASE}/Contamination/chm13v2.0/chm13v2.0 +# CHM13v2.0 +HUMAN_INDEX=${VEBA_DATABASE}/Contamination/chm13v2.0/chm13v2.0 # Ribosomal k-mer fasta RIBOSOMAL_KMERS=${VEBA_DATABASE}/Contamination/kmers/ribokmers.fa.gz @@ -151,7 +135,7 @@ for ID in $(cat identifiers.list); do rm -f logs/${N}.* # Set up the command (use source from base environment instead of conda because of the `init` issues) - CMD="source activate VEBA-preprocess_env && preprocess.py -n ${ID} -1 ${R1} -2 ${R2} -p ${N_JOBS} -x ${HUMAN_INDEX} -k ${RIBOSOMAL_KMERS} --retain_contaminated_reads 0 --retain_kmer_hits 0 --retain_non_kmer_hits 0 -o veba_output/preprocess" + CMD="source activate VEBA && veba --module preprocess --params \"-n ${ID} -1 ${R1} -2 ${R2} -p ${N_JOBS} -x ${HUMAN_INDEX} -k ${RIBOSOMAL_KMERS} --retain_contaminated_reads 0 --retain_kmer_hits 0 --retain_non_kmer_hits 0 -o veba_output/preprocess\"" # If you have SunGrid engine, do something like this: # qsub -o logs/${N}.o -e logs/${N}.e -cwd -N ${N} -j y -pe threaded ${N_JOBS} "${CMD}" @@ -161,7 +145,7 @@ for ID in $(cat identifiers.list); do done ``` -Note: `preprocess.py` is a wrapper around `fastq_preprocessor` which takes in 0 and 1 as False and True, respectively. The reasoning for this is that I was able to keep the prefix `retain` while setting defaults easier. +Note: `preprocess` is a wrapper around `fastq_preprocessor`. It creates the following directory structure where each sample is its own subdirectory, which makes globbing much easier: diff --git a/walkthroughs/end-to-end_metagenomics.md b/walkthroughs/end-to-end_metagenomics.md index baa22b8..c2592c8 100644 --- a/walkthroughs/end-to-end_metagenomics.md +++ b/walkthroughs/end-to-end_metagenomics.md @@ -23,12 +23,12 @@ _____________________________________________________ 12. Classify eukaryotic genomes 13. Annotate proteins +**Conda Environment:** `conda activate VEBA`. Use this for intermediate scripts. + _____________________________________________________ #### 1. Preprocess reads and get directory set up -**Conda Environment:** `conda activate VEBA-preprocess_env` - Refer to the [downloading and preprocessing reads walkthrough](download_and_preprocess_reads.md). At this point, it's assumed you have the following: * A file with each of your identifiers on a separate line (e.g., `identifiers.list` but you can call it whatever you want) @@ -41,8 +41,6 @@ Here we are going to assemble all of the reads using `metaSPAdes`. If you have **Recommended memory request:** For this *Plastisphere* dataset, I requested `64GB` of memory from my HPC, though this will change depending on how deeply your samples are sequenced. 
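Before launching the assembly loop below, it can be worth confirming that every identifier has its pair of cleaned read files; a minimal sketch, assuming the default `veba_output/preprocess/${ID}/output/` layout described above:

```
# Report any missing or empty cleaned read files before submitting jobs
for ID in $(cat identifiers.list); do
    for FILE in veba_output/preprocess/${ID}/output/cleaned_1.fastq.gz veba_output/preprocess/${ID}/output/cleaned_2.fastq.gz; do
        if [ ! -s ${FILE} ]; then
            echo "Missing or empty: ${FILE}"
        fi
    done
done
```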
-**Conda Environment:** `conda activate VEBA-assembly_env` - ``` # Set the number of threads to use for each sample. Let's use 4 N_JOBS=4 @@ -65,7 +63,7 @@ for ID in $(cat identifiers.list); do R2=veba_output/preprocess/${ID}/output/cleaned_2.fastq.gz # Set up command - CMD="source activate VEBA-assembly_env && assembly.py -1 ${R1} -2 ${R2} -n ${ID} -o ${OUT_DIR} -p ${N_JOBS} -P metaspades.py" + CMD="source activate VEBA && veba --module assembly --params \"-1 ${R1} -2 ${R2} -n ${ID} -o ${OUT_DIR} -p ${N_JOBS} -P metaspades.py\"" # Either run this command or use SunGridEngine/SLURM @@ -98,7 +96,6 @@ Let's start the binning with viruses since this is performed on a per-contig bas **Recommended memory request:** `16 GB` -**Conda Environment:** `conda activate VEBA-binning-viral_env` ``` N_JOBS=4 @@ -108,7 +105,7 @@ for ID in $(cat identifiers.list); rm -f logs/${N}.* FASTA=veba_output/assembly/${ID}/output/scaffolds.fasta BAM=veba_output/assembly/${ID}/output/mapped.sorted.bam - CMD="source activate VEBA-binning-viral_env && binning-viral.py -f ${FASTA} -b ${BAM} -n ${ID} -p ${N_JOBS} -m 1500 -o veba_output/binning/viral" + CMD="source activate VEBA && veba --module binning-viral --params \"-f ${FASTA} -b ${BAM} -n ${ID} -p ${N_JOBS} -m 1500 -o veba_output/binning/viral\"" # Either run this command or use SunGridEngine/SLURM done @@ -140,11 +137,8 @@ Here we are going to perform iterative prokaryotic binning. It's difficult to s If you have a lot of samples and a lot of contigs then use the `--skip_maxbin2` flag because it takes MUCH longer to run. For the *Plastisphere* it was going to take 40 hours per `MaxBin2` run (there are 2 `MaxBin2` runs) per iteration. `Metabat2` and `CONCOCT` can do the heavy lifting much faster and often with better results so it's recommended to skip `MaxBin2` for larger datasets. -**Recommended memory request:** `10GB` - -*Versions prior to `v1.1.0` were reliant on `GTDB-Tk` which needed at least `60GB`. `GTDB-Tk` is no longer required with the update of `CheckM` to `CheckM2`.* +**Recommended memory request:** `16GB` -**Conda Environment:** `conda activate VEBA-binning-prokaryotic_env` ``` N_JOBS=4 @@ -161,7 +155,7 @@ for ID in $(cat identifiers.list); do FASTA=veba_output/binning/viral/${ID}/output/unbinned.fasta BAM=veba_output/assembly/${ID}/output/mapped.sorted.bam - CMD="source activate VEBA-binning-prokaryotic_env && binning-prokaryotic.py -f ${FASTA} -b ${BAM} -n ${ID} -p ${N_JOBS} -o ${OUT_DIR} -m 1500 -I ${N_ITER}" + CMD="source activate VEBA && veba --module binning-prokaryotic --params \"-f ${FASTA} -b ${BAM} -n ${ID} -p ${N_JOBS} -o ${OUT_DIR} -m 1500 -I ${N_ITER}\"" # Either run this command or use SunGridEngine/SLURM @@ -194,9 +188,7 @@ for ID in $(cat identifiers.list); do #### 5. Recover eukaryotes from metagenomic assemblies Let's take the unbinned contigs from the prokaryotic binning and recover eukaryotic genomes. Unfortunately, we aren't going to do iterative binning here because there aren't any tools that can handle consensus genome binning as there are with prokaryotes (e.g., *DAS Tool*). We have the option to use either *Metabat2* or *CONCOCT*. In our experience, *Metabat2* works better for recovering eukaryotic genomes from metagenomes and it's also faster. 
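As a quick sanity check before eukaryotic binning, you can count how many contigs are left unbinned per sample (a sketch; assumes the default `unbinned.fasta` paths used below and standard `>` fasta headers):

```
# Count unbinned contigs per sample after prokaryotic binning (sketch)
for ID in $(cat identifiers.list); do
    N_UNBINNED=$(grep -c "^>" veba_output/binning/prokaryotic/${ID}/output/unbinned.fasta)
    echo -e "${ID}\t${N_UNBINNED}"
done
```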
-**Recommended memory request:** `128GB` - -**Conda Environment:** `conda activate VEBA-binning-eukaryotic_env` +**Recommended memory request:** `48GB` ``` N_JOBS=4 @@ -209,7 +201,7 @@ for ID in $(cat identifiers.list); do rm -f logs/${N}.* FASTA=veba_output/binning/prokaryotic/${ID}/output/unbinned.fasta BAM=veba_output/assembly/${ID}/output/mapped.sorted.bam - CMD="source activate VEBA-binning-eukaryotic_env && binning-eukaryotic.py -f ${FASTA} -b ${BAM} -n ${ID} -p ${N_JOBS} -m 1500 -a metabat2 -o ${OUT_DIR}" + CMD="source activate VEBA && veba --module binning-eukaryotic --params \"-f ${FASTA} -b ${BAM} -n ${ID} -p ${N_JOBS} -m 1500 -a metabat2 -o ${OUT_DIR}\"" # Either run this command or use SunGridEnginge/SLURM @@ -257,8 +249,6 @@ That said, if you decide to move forward with the multi-sample approach then the **Recommended memory request:** `24 GB` -**Conda Environment:** `conda activate VEBA-assembly_env` - ``` @@ -270,8 +260,6 @@ mkdir -p veba_output/misc # I recommend having this light-weight program in base environment # if you do a lot of fasta manipulation. -conda activate VEBA-preprocess_env - # -------------------------------------------------------------------- # Method 1) Shortcut @@ -302,11 +290,11 @@ compile_reads_table.py -i veba_output/preprocess/ -r > veba_output/misc/reads_ta # Now let's map all the reads to the pseudo-coassembly (i.e., all_sample_specific_mags.unbinned_contigs.gt1500.fasta) -N=pseudo-coassembly +N=Multisample N_JOBS=16 # Let's use more threads here because we are going to be handling multiple samples at once -CMD="source activate VEBA-assembly_env && coverage.py -f veba_output/misc/all_sample_specific_mags.unbinned_contigs.gt1500.fasta -r veba_output/misc/reads_table.tsv -p ${N_JOBS} -o veba_output/assembly/pseudo-coassembly -m 1500" +CMD="source activate VEBA && veba --module coverage --params \"-f veba_output/misc/all_sample_specific_mags.unbinned_contigs.gt1500.fasta -r veba_output/misc/reads_table.tsv -p ${N_JOBS} -o veba_output/assembly/Multisample -m 1500\"" # Either run this command or use SunGridEnginge/SLURM ``` @@ -327,8 +315,6 @@ Let's try to recover some prokaryotes using the concatenated unbinned contigs. **Recommended memory request:** `10 - 24GB` -**Conda Environment:** `conda activate VEBA-binning-prokaryotic_env` - ``` # Setting more threads since we are only running this once N_JOBS=32 @@ -337,14 +323,14 @@ N_JOBS=32 N_ITER=5 # Set up filepaths and names -NAME="pseudo-coassembly" +NAME="Multisample" N="binning-prokaryotic__${NAME}" rm -f logs/${N}.* -FASTA=veba_output/assembly/pseudo-coassembly/output/reference.fasta -BAMS=veba_output/assembly/pseudo-coassembly/output/*/mapped.sorted.bam +FASTA=veba_output/assembly/${NAME}/output/reference.fasta +BAMS=veba_output/assembly/${NAME}/output/*/mapped.sorted.bam # Set up command -CMD="source activate VEBA-binning-prokaryotic_env && binning-prokaryotic.py -f ${FASTA} -b ${BAMS} -n ${NAME} -p ${N_JOBS} -m 1500 -I ${N_ITER} --skip_maxbin2" +CMD="source activate VEBA && veba --module binning-prokaryotic --params \"-f ${FASTA} -b ${BAMS} -n ${NAME} -p ${N_JOBS} -m 1500 -I ${N_ITER} --skip_maxbin2\"" # Either run this command or use SunGridEnginge/SLURM @@ -356,9 +342,8 @@ Check Step 4 for the output file descriptions. #### ⚠️ 8. Recover eukaryotes from pseudo-coassembly [Optional] Let's try to recover some eukaryotes using the updated concatenated unbinned contigs. 
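Since this step passes a glob of sample-specific BAM files, a quick check that the glob expands to one BAM per sample can save a failed submission (a sketch; the `Multisample` path matches the naming used below):

```
# These two counts should match (sketch)
ls veba_output/assembly/Multisample/output/*/mapped.sorted.bam | wc -l
wc -l < identifiers.list
```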
-**Recommended memory request:** `128GB` +**Recommended memory request:** `48 GB` -**Conda Environment:** `conda activate VEBA-binning-eukaryotic_env` ``` @@ -366,14 +351,14 @@ Let's try to recover some eukaryotes using the updated concatenated unbinned con N_JOBS=32 # Set up filepaths and names -NAME="pseudo-coassembly" +NAME="Multisample" N="binning-eukaryotic__${NAME}" rm -f logs/${N}.* FASTA=veba_output/binning/prokaryotic/${NAME}/output/unbinned.fasta BAMS=veba_output/assembly/${NAME}/output/*/mapped.sorted.bam # Set up command -CMD="source activate VEBA-binning-eukaryotic_env && binning-eukaryotic.py -f ${FASTA} -b ${BAMS} -n ${NAME} -p ${N_JOBS} -m 1500 -a metabat2 -o veba_output/binning/eukaryotic" +CMD="source activate VEBA && veba --module binning-eukaryotic --params \"-f ${FASTA} -b ${BAMS} -n ${NAME} -p ${N_JOBS} -m 1500 -a metabat2 -o veba_output/binning/eukaryotic\"" # Either run this command or use SunGridEnginge/SLURM @@ -389,8 +374,6 @@ To analyze these data, we are going to generate some counts tables and we want a **Recommended memory request:** `24 GB` should work for most datasets but you may need to increase for much larger datasets. -**Conda Environment:** `conda activate VEBA-cluster_env` - ``` # We need to generate a table with the following fields: @@ -403,7 +386,7 @@ compile_genomes_table.py -i veba_output/binning/ > veba_output/misc/genomes_tabl N_JOBS=12 # Set up command -CMD="source activate VEBA-cluster_env && cluster.py -i veba_output/misc/genomes_table.tsv -o veba_output/cluster -p ${N_JOBS}" +CMD="source activate VEBA && veba --module cluster --params \"-i veba_output/misc/genomes_table.tsv -o veba_output/cluster -p ${N_JOBS}\"" # Either run this command or use SunGridEnginge/SLURM @@ -433,15 +416,13 @@ CMD="source activate VEBA-cluster_env && cluster.py -i veba_output/misc/genomes_ * global/pangenome_tables/*.tsv.gz - Pangenome tables for each SLC with prevalence values * global/serialization/*.dict.pkl - Python dictionaries for clusters * global/serialization/*.networkx_graph.pkl - NetworkX graphs for clusters -* local/* - If `--no_local_clustering` is not selected then all of the files are generated for local clustering +* local/* - If `--local_clustering` is selected then all of the files are generated for local clustering #### 10. Classify viral genomes Viral classification is performed using `geNomad`. Classification can be performed using the intermediate binning results which is much quicker. Alternatively, if you have viruses identified elsewhere you can still classify using the `--genomes` argument instead. -**Recommended memory request:** `1 GB` will work if you've performed viral binning via *VEBA*. If not, these use `16 GB` for external genomes. - -**Conda Environment:** `conda activate VEBA-classify_env` +**Recommended memory request:** `1 GB` should work if you've performed viral binning via *VEBA*. If not, these use `16 GB` for external genomes. ``` N=classify-viral @@ -458,7 +439,7 @@ CLUSTERS=veba_output/cluster/output/global/mags_to_slcs.tsv rm -rf logs/${N}.* # Set up the command -CMD="source activate VEBA-classify_env && classify-viral.py -i ${BINNING_DIRECTORY} -c ${CLUSTERS} -o veba_output/classify/viral -p ${N_JOBS}" +CMD="source activate VEBA && veba --module classify-viral --params \"-i ${BINNING_DIRECTORY} -c ${CLUSTERS} -o veba_output/classify/viral -p ${N_JOBS}\"" # Either run this command or use SunGridEnginge/SLURM @@ -474,7 +455,6 @@ Prokaryotic classification is performed using `GTDB-Tk`. 
Classification can be **Recommended memory request:** `72 GB` -**Conda Environment:** `conda activate VEBA-classify_env` ``` N_JOBS=16 @@ -490,7 +470,7 @@ BINNING_DIRECTORY=veba_output/binning/prokaryotic CLUSTERS=veba_output/cluster/output/global/mags_to_slcs.tsv # Set up the command -CMD="source activate VEBA-classify_env && classify-prokaryotic.py -i ${BINNING_DIRECTORY} -c ${CLUSTERS} -p ${N_JOBS} -o veba_output/classify/prokaryotic" +CMD="source activate VEBA && veba --module classify-prokaryotic --params \"-i ${BINNING_DIRECTORY} -c ${CLUSTERS} -p ${N_JOBS} -o veba_output/classify/prokaryotic\"" # Either run this command or use SunGridEnginge/SLURM @@ -502,11 +482,10 @@ The following output files will produced: * taxonomy.clusters.tsv - Prokaryotic cluster classification (If --clusters are provided) #### 12. Classify eukaryotic genomes -*VEBA* is going to use the *MetaEuk/MMSEQS2* protein alignments based on [*VEBA's* microeukaryotic protein database](https://doi.org/10.6084/m9.figshare.19668855.v1). The default is to use [BUSCO's eukaryota_odb10](https://busco-data.ezlab.org/v5/data/lineages/eukaryota_odb10.2020-09-10.tar.gz) marker set but you can use the annotations from all proteins if you want by providing the `--include_all_genes` flag. The former will take a little bit longer since it needs to run *hmmsearch* but it's more robust and doesn't take that much longer. +*VEBA* is going to use the *MetaEuk/MMSEQS2* protein alignments based on [*VEBA's* MicroEuk100](https://zenodo.org/records/10139451). The default is to use [BUSCO's eukaryota_odb10](https://busco-data.ezlab.org/v5/data/lineages/eukaryota_odb10.2020-09-10.tar.gz) marker set but you can use the annotations from all proteins if you want by providing the `--include_all_genes` flag but that's not recommended for classification. **Recommended memory request:** `12 GB` -**Conda Environment:** `conda activate VEBA-classify_env` ``` # This is threaded if you use the default (i.e., core marker detection) @@ -522,7 +501,7 @@ BINNING_DIRECTORY=veba_output/binning/eukaryotic CLUSTERS=veba_output/cluster/output/global/mags_to_slcs.tsv # Set up the command -CMD="source activate VEBA-classify_env && classify-eukaryotic.py -i ${BINNING_DIRECTORY} -c ${CLUSTERS} -o veba_output/classify/eukaryotic -p ${N_JOBS}" +CMD="source activate VEBA && veba --module classify-eukaryotic --params \"-i ${BINNING_DIRECTORY} -c ${CLUSTERS} -o veba_output/classify/eukaryotic -p ${N_JOBS}\"" # Either run this command or use SunGridEnginge/SLURM @@ -539,9 +518,6 @@ Instead of having 3 separate classification tables, it would be much more useful **Recommended memory request:** `1 GB` - -**Conda Environment:** `conda activate VEBA-classify_env` - ``` merge_taxonomy_classifications.py -i veba_output/classify -o veba_output/classify ``` @@ -554,8 +530,6 @@ The following output files will produced: #### 14. Annotate proteins Now that all of the MAGs are recovered and classified, let's annotate the proteins using best-hit against UniRef,MiBIG,VFDB,CAZy Pfam, AntiFam, AMRFinder, and KOFAM. HMMSearch will fail with sequences ≥ 100k so we need to remove any that are that long (there probably aren't but just to be safe). -**Conda Environment:** `conda activate VEBA-annotate_env` - ``` # Let's merge all of the proteins. 
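# A hedged sketch of the >= 100k length filter described above, assuming seqkit
# is installed in your environment (VEBA may provide its own helper for this);
# the .faa glob below is also an assumption based on the default output layout:
# cat veba_output/binning/*/*/output/genomes/*.faa > veba_output/misc/all_genomes.all_proteins.faa
# seqkit seq -M 99999 veba_output/misc/all_genomes.all_proteins.faa > veba_output/misc/all_genomes.all_proteins.lt100k.faa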
@@ -582,7 +556,7 @@ PROTEINS=veba_output/misc/all_genomes.all_proteins.lt100k.faa IDENTIFIER_MAPPING=veba_output/cluster/output/global/identifier_mapping.proteins.tsv.gz # Command -CMD="source activate VEBA-annotate_env && annotate.py -a ${PROTEINS} -i ${IDENTIFIER_MAPPING} -o veba_output/annotation -p ${N_JOBS} -u uniref50" +CMD="source activate VEBA && veba --module annotate --params \"-a ${PROTEINS} -i ${IDENTIFIER_MAPPING} -o veba_output/annotation -p ${N_JOBS} -u uniref50\"" # Either run this command or use SunGridEnginge/SLURM @@ -607,7 +581,7 @@ If you are restricted by resources or time you may want to do just annotate the PROTEINS=veba_output/cluster/output/global/representative_sequences.faa # Command -CMD="source activate VEBA-annotate_env && annotate.py -a ${PROTEINS} -o veba_output/annotation -p ${N_JOBS} -u uniref50" +CMD="source activate VEBA && veba --module annotate --params \"-a ${PROTEINS} -o veba_output/annotation -p ${N_JOBS} -u uniref50\"" ``` @@ -641,7 +615,7 @@ for i in $(seq -f "%03g" 1 ${N_PARTITIONS}); do N="annotate-${i}" rm -f logs/${N}.* FAA=${PARTITION_DIRECTORY}/stdin.part_${i}.fasta - CMD="source activate VEBA-annotate_env && annotate.py -a ${FAA} -o ${OUT_DIR}/${i} -p ${N_JOBS} -u uniref50" + CMD="source activate VEBA && veba --module annotate --params \"-a ${FAA} -o ${OUT_DIR}/${i} -p ${N_JOBS} -u uniref50\"" # Either run this command or use SunGridEnginge/SLURM diff --git a/walkthroughs/metabolic_profiling_de-novo_genomes.md b/walkthroughs/pathway_profiling_de-novo_genomes.md similarity index 91% rename from walkthroughs/metabolic_profiling_de-novo_genomes.md rename to walkthroughs/pathway_profiling_de-novo_genomes.md index fc10d50..52ce9c0 100644 --- a/walkthroughs/metabolic_profiling_de-novo_genomes.md +++ b/walkthroughs/pathway_profiling_de-novo_genomes.md @@ -1,4 +1,4 @@ -### Metabolic profiling of *de novo* genomes +### Pathway profiling of *de novo* genomes If you build a comprehensive database, you may want to use a read-based approach to functionally profile a large set of samples. This tutorial will show you how to build a custom HUMAnN database from your annotations and how to profile your samples where there is full accounting of reads and your genomes. What you'll end up with at the end of this is a merged taxonomy table, a custom HUMAnN annotation table, and HUMAnN profiles. @@ -14,8 +14,8 @@ _____________________________________________________ 3. Functional profiling using `HUMAnN` of custom database 4. Merge the tables -**Conda Environment:** `conda activate VEBA-profile_env` - +**Conda Environment:** `conda activate VEBA`. Use this for intermediate scripts. +_______________________________________________________ #### 1. Merge taxonomy from all domains @@ -64,7 +64,7 @@ do rm -f logs/${N}.* R1=veba_output/preprocess/${ID}/output/cleaned_1.fastq.gz R2=veba_output/preprocess/${ID}/output/cleaned_2.fastq.gz - CMD="source activate VEBA-profile_env && profile-pathway.py -1 ${R1} -2 ${R2} -n ${ID} -o ${OUT_DIR} -p ${N_JOBS} -i ${UNIREF_ANNOTATIONS} -f ${FASTA}" + CMD="source activate VEBA && veba --module profile-pathway --params \"-1 ${R1} -2 ${R2} -n ${ID} -o ${OUT_DIR} -p ${N_JOBS} -i ${UNIREF_ANNOTATIONS} -f ${FASTA}\"" # Either run this command or use SunGridEnginge/SLURM @@ -86,7 +86,7 @@ The following output files will produced for each sample: #### 4. Merge the tables ``` -merge_generalized_mapping.py -o veba_output/profiling/pathways/merged. 
humann_pathcoverage.tsv veba_output/profiling/pathways/*/output/humann_pathcoverage.tsv +merge_generalized_mapping.py -o veba_output/profiling/pathways/merged.humann_pathcoverage.tsv veba_output/profiling/pathways/*/output/humann_pathcoverage.tsv merge_generalized_mapping.py -o veba_output/profiling/pathways/merged.humann_pathabundance.tsv veba_output/profiling/pathways/*/output/humann_pathabundance.tsv diff --git a/walkthroughs/phylogenetic_inference.md b/walkthroughs/phylogenetic_inference.md index 6e7f4d7..7fbbe5b 100644 --- a/walkthroughs/phylogenetic_inference.md +++ b/walkthroughs/phylogenetic_inference.md @@ -12,6 +12,8 @@ _____________________________________________________ 1. Download the proteomes of similar organisms 2. Perform phylogenetic inference on proteomes +**Conda Environment:** `conda activate VEBA`. Use this for intermediate scripts. + _____________________________________________________ @@ -235,8 +237,6 @@ diatoms/SRR17458638__METABAT2__E.1__bin.3.faa Now that we have all of the files we need, we can perform phylogenetic inference using BUSCO's eukaryota_odb10 markers and score cutoffs. For eukaryotes, it's advised that you use the eukaryota_odb10 marker set because this is the core marker set used for classification. This isn't the case for prokaryotes and viruses. If you don't have enough resources to run maximum likelihood trees via *IQTREE2* then use `--no_iqtree`. -**Conda Environment:** `conda activate VEBA-phylogeny_env` - ``` # Set the number of threads @@ -260,7 +260,7 @@ MINIMUM_GENOMES_ALIGNED_RATIO=0.95 OUT_DIR=veba_output/phylogeny/diatoms # Directory -CMD="source activate VEBA-phylogeny_env && phylogeny.py -a ${PROTEINS} -o ${OUT_DIR} -p ${N_JOBS} -f name --no_iqtree -d ${HMM} -s ${SCORES} --minimum_genomes_aligned_ratio ${MINIMUM_GENOMES_ALIGNED_RATIO} +CMD="source activate VEBA && veba --module phylogeny --params \"-a ${PROTEINS} -o ${OUT_DIR} -p ${N_JOBS} -f name --no_iqtree -d ${HMM} -s ${SCORES} --minimum_genomes_aligned_ratio ${MINIMUM_GENOMES_ALIGNED_RATIO}\"" # Either run this command or use SunGridEnginge/SLURM ``` diff --git a/walkthroughs/read_mapping_and_counts_tables.md b/walkthroughs/read_mapping_and_counts_tables.md index 29ab766..a17f404 100644 --- a/walkthroughs/read_mapping_and_counts_tables.md +++ b/walkthroughs/read_mapping_and_counts_tables.md @@ -15,6 +15,8 @@ _____________________________________________________ 2. Map reads to global reference and create base counts tables 3. Merge the counts tables for all the samples +**Conda Environment:** `conda activate VEBA`. Use this for intermediate scripts. + _____________________________________________________ @@ -22,7 +24,6 @@ _____________________________________________________ Here we are going to concatenate all of the binned contigs (i.e., MAGs) and their respective gene models (i.e., GFF files) then index using `Bowtie2`. 
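Before indexing, one quick check is that the number of genome fastas matches the number of gene-model GFFs (a sketch; the `.fa` extension is an assumption, so adjust if your bins use a different extension):

```
# These two counts should match before building the global index (sketch)
ls veba_output/binning/*/*/output/genomes/*.fa | wc -l
ls veba_output/binning/*/*/output/genomes/*.gff | wc -l
```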
-**Conda Environment:** `conda activate VEBA-mapping_env` ``` @@ -44,7 +45,7 @@ ls veba_output/binning/*/*/output/genomes/*.gff > veba_output/misc/gene_models.list GENE_MODELS=veba_output/misc/gene_models.list # Set up command -CMD="source activate VEBA-mapping_env && index.py -r ${GENOMES} -g ${GENE_MODELS} -o veba_output/index/global/ -p ${N_JOBS}" +CMD="source activate VEBA && veba --module index --params \"-r ${GENOMES} -g ${GENE_MODELS} -o veba_output/index/global/ -p ${N_JOBS}\"" # Either run this command or use SunGridEngine/SLURM ``` @@ -63,9 +64,18 @@ Here we are going to map all of the reads to the global reference and create base counts **Note:** Versions prior to v1.1.2 require the output directory to include the sample name (e.g., `-o veba_output/mapping/global/${ID}` where `-n` is not used). In v1.1.2+, the output directory is automatic (e.g., `veba_output/mapping/global/` and `-n ${ID}` are used) -**Conda Environment:** `conda activate VEBA-mapping_env` ``` +# If you have run the cluster.py module you can use this: +SCAFFOLDS_TO_MAGS=veba_output/cluster/output/global/scaffolds_to_mags.tsv +SCAFFOLDS_TO_SLCS=veba_output/cluster/output/global/scaffolds_to_slcs.tsv +PROTEINS_TO_ORTHOGROUPS=veba_output/cluster/output/global/proteins_to_orthogroups.tsv +MAGS_TO_SLCS=veba_output/cluster/output/global/mags_to_slcs.tsv + +# If you skipped the clustering, you can concatenate all of the scaffolds to bins from all of the domains +cat veba_output/binning/*/*/output/scaffolds_to_bins.tsv > veba_output/misc/all_genomes.scaffolds_to_mags.tsv +SCAFFOLDS_TO_MAGS=veba_output/misc/all_genomes.scaffolds_to_mags.tsv + # Set a lower number of threads since we are running for each sample N_JOBS=2 @@ -84,7 +94,7 @@ for ID in $(cat identifiers.list); do OUT_DIR=veba_output/mapping/global # Set up command - CMD="source activate VEBA-mapping_env && mapping.py -1 ${R1} -2 ${R2} -n ${ID} -o ${OUT_DIR} -p ${N_JOBS} -x ${INDEX_DIRECTORY}" + CMD="source activate VEBA && veba --module mapping --params \"-1 ${R1} -2 ${R2} -n ${ID} -o ${OUT_DIR} -p ${N_JOBS} -x ${INDEX_DIRECTORY} --scaffolds_to_bins ${SCAFFOLDS_TO_MAGS}\"" #--scaffolds_to_clusters ${SCAFFOLDS_TO_SLCS} --proteins_to_orthogroups ${PROTEINS_TO_ORTHOGROUPS} # Either run this command or use SunGridEngine/SLURM @@ -111,16 +121,6 @@ MAPPING_DIRECTORY=veba_output/mapping/global # Set output directory (this is default) OUT_DIR=veba_output/counts -# If you have run the cluster.py module you can use this: -SCAFFOLDS_TO_MAGS=veba_output/cluster/output/global/scaffolds_to_mags.tsv -SCAFFOLDS_TO_SLCS=veba_output/cluster/output/global/scaffolds_to_slcs.tsv -#MAGS_TO_SLCS=veba_output/cluster/output/global/mags_to_slcs.tsv -PROTEINS_TO_ORTHOGROUPS=veba_output/cluster/output/global/proteins_to_orthogroups.tsv - -# If you skipped the clustering, you can oncatenate all of the scaffolds to bins from all of the domains -cat veba_output/binning/*/*/output/scaffolds_to_bins.tsv > veba_output/misc/all_genomes.scaffolds_to_mags.tsv -SCAFFOLDS_TO_MAGS=veba_output/misc/all_genomes.scaffolds_to_mags.tsv - # Merge contig-level counts (excu merge_contig_mapping.py -m ${MAPPING_DIRECTORY} -c ${MAGS_TO_SLCS} -i ${SCAFFOLDS_TO_MAGS} -o ${OUT_DIR} diff --git a/walkthroughs/recovering_viruses_from_metatranscriptomics.md b/walkthroughs/recovering_viruses_from_metatranscriptomics.md index 9890813..81a2035 100644 --- a/walkthroughs/recovering_viruses_from_metatranscriptomics.md +++ b/walkthroughs/recovering_viruses_from_metatranscriptomics.md @@ -15,9 +15,9 @@ 
_____________________________________________________ 4. Cluster genomes and proteins 5. Classify viral genomes -#### 1. Preprocess reads and get directory set up +**Conda Environment:** `conda activate VEBA`. Use this for intermediate scripts. -**Conda Environment:** `conda activate VEBA-preprocess_env` +#### 1. Preprocess reads and get directory set up Refer to the [downloading and preprocessing reads workflow](download_and_preprocess_reads.md). At this point, it's assumed you have the following: @@ -29,8 +29,6 @@ Refer to the [downloading and preprocessing reads workflow](download_and_preproc Here we are going to assemble all of the reads using `rnaSPAdes`. -**Conda Environment:** `conda activate VEBA-assembly_env` - ``` # Set the number of threads to use for each sample. Let's use 4 N_JOBS=4 @@ -53,7 +51,7 @@ for ID in $(cat identifiers.list); do R2=veba_output/preprocess/${ID}/output/cleaned_2.fastq.gz # Set up command - CMD="source activate VEBA-assembly_env && assembly.py -1 ${R1} -2 ${R2} -n ${ID} -o ${OUT_DIR} -p ${N_JOBS} -P rnaspades.py" + CMD="source activate VEBA && veba --module assembly --params \"-1 ${R1} -2 ${R2} -n ${ID} -o ${OUT_DIR} -p ${N_JOBS} -P rnaspades.py\"" # Either run this command or use SunGridEnginge/SLURM @@ -83,8 +81,6 @@ Where `g0` refers to the predicted gene and `i0` refers to the isoform transcrip #### 3. Recover viruses from metatranscriptomic assemblies We use a similar approach to the metagenomics with *geNomad* and *CheckV* but using the assembled transcripts instead. Again, the criteria for high-quality viral genomes are described by the [*CheckV* author](https://scholar.google.com/citations?user=gmKnjNQAAAAJ&hl=en) [here in this Bitbucket Issue (#38)](https://bitbucket.org/berkeleylab/checkv/issues/38/recommended-cutoffs-for-analyzing-checkv). -**Conda Environment:** `conda activate VEBA-binning-viral_env` - ``` N_JOBS=4 @@ -93,7 +89,7 @@ for ID in $(cat identifiers.list); rm -f logs/${N}.* FASTA=veba_output/transcript_assembly/${ID}/output/transcripts.fasta BAM=veba_output/transcript_assembly/${ID}/output/mapped.sorted.bam - CMD="source activate VEBA-binning-viral_env && binning-viral.py -f ${FASTA} -b ${BAM} -n ${ID} -p ${N_JOBS} -m 1500 -o veba_output/binning/viral -a genomad" + CMD="source activate VEBA && veba --module binning-viral --params \"-f ${FASTA} -b ${BAM} -n ${ID} -p ${N_JOBS} -m 1500 -o veba_output/binning/viral -a genomad\"" # Either run this command or use SunGridEnginge/SLURM @@ -123,8 +119,6 @@ for ID in $(cat identifiers.list); #### 4. Cluster genomes and proteins To analyze these data, we are going to generate some counts tables and we want a single set of features to compare across all samples. To achieve this, we are going to cluster the genomes into species-level clusters (SLC) and the proteins into SLC-specific protein clusters (SSPC). Further, this clustering is dual purpose as it alleviates some of the bias from [the curse(s) of dimensionality](https://www.nature.com/articles/s41592-018-0019-x) with dimensionality reduction via feature compression - [a type of feature engineering](https://towardsdatascience.com/what-is-feature-engineering-importance-tools-and-techniques-for-machine-learning-2080b0269f10). 
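If you want to eyeball the genomes table before clustering, a minimal sketch follows (the exact columns depend on your `compile_genomes_table.py` version, so treat the layout as an assumption):

```
# Preview the first two rows of the headerless genomes table (sketch)
compile_genomes_table.py -i veba_output/binning/ | head -n 2
```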
-**Conda Environment:** `conda activate VEBA-cluster_env` - ``` # We need to generate a table with the following fields: @@ -137,7 +131,7 @@ compile_genomes_table.py -i veba_output/binning/ > veba_output/misc/genomes_tabl N_JOBS=12 # Set up command -CMD="source activate VEBA-cluster_env && cluster.py -i veba_output/misc/genomes_table.tsv -o veba_output/cluster -p ${N_JOBS}" +CMD="source activate VEBA && veba --module cluster --params \"-i veba_output/misc/genomes_table.tsv -o veba_output/cluster -p ${N_JOBS}\"" # Either run this command or use SunGridEnginge/SLURM ``` @@ -173,9 +167,6 @@ CMD="source activate VEBA-cluster_env && cluster.py -i veba_output/misc/genomes_ #### 5. Classify viral genomes Viral classification is performed using `geNomad`. Classification can be performed using the intermediate binning results which is much quicker. Alternatively, if you have viruses identified elsewhere you can still classify using the `--genomes` argument instead. -**Conda Environment:** `conda activate VEBA-classify_env` - - ``` N=classify-viral @@ -188,7 +179,7 @@ CLUSTERS=veba_output/cluster/viral/output/clusters.tsv rm -rf logs/${N}.* # Set up the command -CMD="source activate VEBA-classify_env && classify-viral.py -i ${BINNING_DIRECTORY} -c ${CLUSTERS} -o veba_output/classify/viral" +CMD="source activate VEBA && veba --module classify-viral --params \"-i ${BINNING_DIRECTORY} -c ${CLUSTERS} -o veba_output/classify/viral\"" # Either run this command or use SunGridEnginge/SLURM diff --git a/walkthroughs/setting_up_coassemblies.md b/walkthroughs/setting_up_coassemblies.md index ebf7a8b..b922e9c 100644 --- a/walkthroughs/setting_up_coassemblies.md +++ b/walkthroughs/setting_up_coassemblies.md @@ -14,6 +14,9 @@ _____________________________________________________ 3. Coassembly using assembly.py 4. Align reads from each sample to the coassembly to create sorted BAM files that will be used for binning and counts tables. +**Conda Environment:** `conda activate VEBA`. Use this for intermediate scripts. +______________________________________________________ + #### 1. Concatenate forward and reverse reads separately Refer to the [downloading and preprocessing reads workflow](download_and_preprocess_reads.md). At this point, it's assumed you have the following: @@ -43,8 +46,6 @@ cat veba_output/preprocess/*/output/cleaned_2.fastq.gz > veba_output/misc/concat Here we are going to coassemble all of the reads using `metaSPAdes` which is default but if you are using metatranscriptomics then use `-P rnaSPAdes.py`. -**Conda Environment:** `conda activate VEBA-assembly_env` - ``` # Set the number of threads to use for each sample. Let's use 4 N_JOBS=4 @@ -67,10 +68,10 @@ R1=veba_output/misc/concatenated_1.fastq.gz R2=veba_output/misc/concatenated_2.fastq.gz # Set up command -CMD="source activate VEBA-assembly_env && assembly.py -1 ${R1} -2 ${R2} -n ${ID} -o ${OUT_DIR} -p ${N_JOBS}" +CMD="source activate VEBA && veba --module assembly --params \"-1 ${R1} -2 ${R2} -n ${ID} -o ${OUT_DIR} -p ${N_JOBS}\"" # Use this for metatranscriptomics -# CMD="source activate VEBA-assembly_env && assembly.py -1 ${R1} -2 ${R2} -n ${ID} -o ${OUT_DIR} -p ${N_JOBS} -P rnaspades.py" +# CMD="source activate VEBA && veba --module assembly --params \"-1 ${R1} -2 ${R2} -n ${ID} -o ${OUT_DIR} -p ${N_JOBS} -P rnaspades.py\"" # Either run this command or use SunGridEnginge/SLURM @@ -91,10 +92,6 @@ The main one we need is `scaffolds.fasta` which we will use for binning. Note #### 3. 
Align sample-specific reads to the coassembly - - -**Conda Environment:** `conda activate VEBA-assembly_env` - ``` N_JOBS=4 @@ -103,7 +100,7 @@ N="coverage__${ID}"; rm -f logs/${N}.* FASTA=veba_output/assembly/${ID}/output/scaffolds.fasta READS=veba_output/misc/reads_table.tsv -CMD="source activate VEBA-assembly_env && coverage.py -f ${FASTA} -r ${READS} -p ${N_JOBS} -m 1500 -o veba_output/coverage/${ID}" +CMD="source activate VEBA && veba --module coverage --params \"-f ${FASTA} -r ${READS} -p ${N_JOBS} -m 1500 -o veba_output/coverage/${ID}\"" # Either run this command or use SunGridEngine/SLURM @@ -125,7 +122,7 @@ _____________________________________________________ Now that you have a coassembly and multiple sorted BAM files, it's time for binning. Start at step 3 of the [end-to-end metagenomics](end-to-end_metagenomics.md) or [recovering viruses from metatranscriptomics](recovering_viruses_from_metatranscriptomics.md) workflows depending on whether you have metagenomic or metatranscriptomic data, respectively. -**Please do not forget to adapt the BAM argument in the `binning-prokaryotic.py` command to include all the sample-specific sorted BAM files and not the concatenated sorted BAM.** +**Please do not forget to adapt the BAM argument in the `binning-prokaryotic` command to include all the sample-specific sorted BAM files and not the concatenated sorted BAM.** More specifically, use `BAM="veba_output/coverage/coassembly/output/*/mapped.sorted.bam"` and not `BAM="veba_output/assembly/coassembly/output/mapped.sorted.bam"`. diff --git a/walkthroughs/taxonomic_profiling_de-novo_genomes.md b/walkthroughs/taxonomic_profiling_de-novo_genomes.md new file mode 100644 index 0000000..a2ae3d3 --- /dev/null +++ b/walkthroughs/taxonomic_profiling_de-novo_genomes.md @@ -0,0 +1,90 @@ +### Taxonomic profiling of *de novo* genomes +If you build a comprehensive database, you may want to use a read-based approach to taxonomically profile a large set of samples. This tutorial will show you how to build a custom `Sylph` database from your genomes and how to profile your samples for taxonomic abundance. + +What you'll end up with at the end of this is a `Sylph` database and taxonomic abundance profiles. + +Please refer to the [end-to-end metagenomics](end-to-end_metagenomics.md) or [recovering viruses from metatranscriptomics](recovering_viruses_from_metatranscriptomics.md) workflows for details on binning, clustering, and annotation. + +_____________________________________________________ + +#### Steps: +1. Compile custom `Sylph` database from *de novo* genomes +2. Taxonomic profiling using `Sylph` of custom database +3. Merge the tables + +**Conda Environment:** `conda activate VEBA`. Use this for intermediate scripts. +_______________________________________________________ + +#### 1. Compile custom `Sylph` database from *de novo* genomes + +At this point, it's assumed you have the following: + +* Clustering results from the `cluster.py` module +* A directory of preprocessed reads: `veba_output/preprocess/${ID}/output/cleaned_1.fastq.gz` and `veba_output/preprocess/${ID}/output/cleaned_2.fastq.gz` where `${ID}` represents the identifiers in `identifiers.list`. +* Genome assemblies. These can be MAGs binned with VEBA, genomes binned elsewhere, or even reference genomes you downloaded. + + +Here we are going to build 2 databases, one for viral genomes and one for non-viral genomes (i.e., prokaryotes and eukaryotes). 
The reason for 2 separate databases is that the presets used for small genomes differ from those used for medium-to-large genomes. We need a table that has `[organism_type][path/to/genome.fa]` with no headers. We already have some version of this with the `veba_output/misc/genomes_table.tsv` we made for clustering. We can pipe this into stdin for the database build script: + + +``` +cat veba_output/misc/genomes_table.tsv | cut -f1,4 | compile_custom_sylph_sketch_database_from_genomes.py -o veba_output/profiling/databases +``` + +This generates 2 `Sylph` databases (assuming you have viruses and non-viruses): + +* `veba_output/profiling/databases/genome_database-nonviral.syldb` +* `veba_output/profiling/databases/genome_database-viral.syldb` + + +#### 2. Taxonomic profiling using `Sylph` of custom database + +Now it's time to profile the reads against the `Sylph` databases. Since `Sylph` takes in a sketch of reads, we can either use a precomputed reads sketch with `-s` or provide paired-end reads (`-1` and `-2`) to compute the sketch in the backend. + +``` +N_JOBS=4 +OUT_DIR=veba_output/profiling/taxonomy +DATABASES=veba_output/profiling/databases/*.syldb +MAGS_TO_SLCS=veba_output/cluster/output/global/mags_to_slcs.tsv # Assuming you have clustering results + +mkdir -p logs + +for ID in $(cat identifiers.list); +do + N="profile-taxonomy__${ID}"; + rm -f logs/${N}.* + R1=veba_output/preprocess/${ID}/output/cleaned_1.fastq.gz + R2=veba_output/preprocess/${ID}/output/cleaned_2.fastq.gz + CMD="source activate VEBA && veba --module profile-taxonomy --params \"-1 ${R1} -2 ${R2} -n ${ID} -o ${OUT_DIR} -p ${N_JOBS} -d ${DATABASES} -c ${MAGS_TO_SLCS}\"" + + # Either run this command or use SunGridEngine/SLURM + +done + +``` + +The following output files will be produced for each sample: + +* reads.sylsp - Reads sketch if paired-end reads were provided +* sylph\_profile.tsv.gz - Output of `sylph profile` +* taxonomic_abundance.tsv.gz - Genome-level taxonomic abundance (No header) +* taxonomic_abundance.clusters.tsv.gz - SLC-level taxonomic abundance (No header) + +#### 3. Merge the tables + +``` +merge_generalized_mapping.py -o veba_output/profiling/taxonomy/merged.taxonomic_abundance.tsv.gz veba_output/profiling/taxonomy/*/output/taxonomic_abundance.tsv.gz + +merge_generalized_mapping.py -o veba_output/profiling/taxonomy/merged.taxonomic_abundance.clusters.tsv.gz veba_output/profiling/taxonomy/*/output/taxonomic_abundance.clusters.tsv.gz +``` + +The following output files will be produced: + +* merged.taxonomic\_abundance.tsv.gz - Merged genome-level taxonomic abundance matrix +* merged.taxonomic\_abundance.clusters.tsv.gz - Merged SLC-level taxonomic abundance matrix + +_____________________________________________________ + +#### Next steps: + +Subset stratified tables by their respective levels.
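For a quick look at the merged matrices before any downstream subsetting, something like the following works (a sketch; assumes the merged gzipped TSVs produced above):

```
# Peek at the first rows/columns of the merged genome-level abundance matrix (sketch)
zcat veba_output/profiling/taxonomy/merged.taxonomic_abundance.tsv.gz | head -n 5 | cut -f 1-5
```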