Merge pull request #220 from nextstrain/yellow-fever-dataset
Add yellow fever virus dataset
Showing 18 changed files with 27,854 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
## Unreleased | ||
|
||
Initial release of yellow fever virus (prM-E region only) dataset. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
# Yellow fever virus (prM-E region only) dataset | ||
|
||
| Key | Value | | ||
| ----------------- | -----------------------------------------------------------------| | ||
| name | Yellow fever virus (YFV) prM-E region | | ||
| authors | [Nextstrain](https://nextstrain.org) | | ||
| reference | AY640589.1 | | ||
| workflow | <https://github.com/nextstrain/yellow-fever/tree/main/nextclade> | | ||
| path | `nextstrain/yellow-fever/prM-E` | | ||
|
||
## Scope of this dataset | ||
|
||
This dataset assigns clades to yellow fever virus samples based on | ||
strain and genotype information from [Mutebi et al.][] (J Virol. 2001 | ||
Aug;75(15):6999-7008) and [Bryant et al.][] (PLoS Pathog. 2007 May 18;3(5):e75). | ||
|
||
Together, these two papers define seven distinct yellow fever virus | ||
genotypes based on a 670-nucleotide region of the yellow fever virus | ||
genome (bases 641-1310), called the prM-E region. This region | ||
comprises the 3' end of the pre-membrane protein (prM) gene, the | ||
entire membrane protein (M) gene, and the 5' end of the envelope | ||
protein (E) gene. | ||
|
||
The clades we annotate (Clade I-VII) are roughly equivalent to the | ||
following genotypes as described in the aforementioned two papers: | ||
|
||
| Clade | Genotype | | ||
|-----------|---------------------| | ||
| Clade I | Angola | | ||
| Clade II | East Africa | | ||
| Clade III | East/Central Africa | | ||
| Clade IV | West Africa I | | ||
| Clade V | West Africa II | | ||
| Clade VI | South America I | | ||
| Clade VII | South America II | | ||
|
||
(N.B.: the reference sequence used in this dataset is actually 672 nt | ||
long, spanning bases 641-1312 of the genome reference. The 2 extra bases | ||
make the reference a complete open reading frame.) | ||
|
||
This dataset can be used to assign genotypes to any sequence that | ||
includes at least 500 bp of the prM-E region, including whole genome | ||
sequences. Sequence data beyond the prM-E region will be reported as an | ||
insertion in the Nextclade output. | ||
|
||
## Features | ||
|
||
This dataset supports: | ||
|
||
- Assignment of genotypes | ||
- Phylogenetic placement | ||
- Sequence quality control (QC) | ||
|
||
## What are Nextclade datasets | ||
|
||
Read more about Nextclade datasets in the Nextclade documentation: | ||
<https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html> | ||
|
||
[Mutebi et al.]: https://pubmed.ncbi.nlm.nih.gov/11435580/ | ||
[Bryant et al.]: https://pubmed.ncbi.nlm.nih.gov/17511518/ |
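
The coordinate arithmetic in the README above can be checked with a short sketch. It only uses the numbers quoted in the README (bases 641-1310, bases 641-1312, and the 500 bp minimum). The helper names and the illustrative query spans are hypothetical, and the overlap check is a rough pre-filter idea, not how Nextclade itself decides whether a sequence can be aligned.

```python
# Genome coordinates quoted in the README (1-based, inclusive).
PAPER_REGION = (641, 1310)    # 670 nt region used by the two papers
DATASET_REGION = (641, 1312)  # 672 nt reference used by this dataset
MIN_OVERLAP = 500             # minimum prM-E coverage stated in the README


def region_length(region):
    """Length of a 1-based, inclusive (start, end) interval."""
    start, end = region
    return end - start + 1


def overlap_length(a, b):
    """Number of positions shared by two 1-based, inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def covers_enough(query_region, min_overlap=MIN_OVERLAP):
    """Hypothetical pre-check: does a query span enough of prM-E?"""
    return overlap_length(query_region, DATASET_REGION) >= min_overlap


if __name__ == "__main__":
    for label, region in (("paper", PAPER_REGION), ("dataset", DATASET_REGION)):
        n = region_length(region)
        print(label, n, "nt, remainder mod 3 =", n % 3)
    # Illustrative spans only: a long genome-scale span and a short fragment.
    print(covers_enough((1, 10000)))
    print(covers_enough((900, 1200)))
```

The 670 nt region leaves a remainder of 1 when divided by 3, while the 672 nt region divides evenly, which is consistent with the README's remark that the two extra bases complete the reading frame.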
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
##sequence-region prM-E 1 672 | ||
NC_002031.1 feature source 1 672 . + . gene=nuc | ||
NC_002031.1 feature gene 1 333 . + . gene_name=prM | ||
NC_002031.1 feature gene 109 333 . + . gene_name=M | ||
NC_002031.1 feature gene 334 672 . + . gene_name=E |
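
As a reading aid for the annotation above, the following sketch parses the four feature lines and maps each gene back to genome coordinates. It assumes the file is tab-separated, as GFF3 requires (the diff view shows spaces), and that position 1 of the reference corresponds to genome base 641, as the README states. The function names and the `GENOME_OFFSET` constant are hypothetical.

```python
# Assumption from the README: reference position 1 is genome position 641.
GENOME_OFFSET = 640

# The annotation from the diff, re-typed with tab separators.
GFF_TEXT = """\
##sequence-region prM-E 1 672
NC_002031.1\tfeature\tsource\t1\t672\t.\t+\t.\tgene=nuc
NC_002031.1\tfeature\tgene\t1\t333\t.\t+\t.\tgene_name=prM
NC_002031.1\tfeature\tgene\t109\t333\t.\t+\t.\tgene_name=M
NC_002031.1\tfeature\tgene\t334\t672\t.\t+\t.\tgene_name=E
"""


def parse_features(gff_text):
    """Return a list of feature dicts from tab-separated GFF3 text."""
    features = []
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        features.append(
            {"type": cols[2], "start": int(cols[3]), "end": int(cols[4]), "attrs": attrs}
        )
    return features


def gene_spans(features, offset=GENOME_OFFSET):
    """Map gene name -> (genome_start, genome_end, length_in_nt)."""
    spans = {}
    for feat in features:
        name = feat["attrs"].get("gene_name")
        if feat["type"] == "gene" and name:
            length = feat["end"] - feat["start"] + 1
            spans[name] = (feat["start"] + offset, feat["end"] + offset, length)
    return spans


if __name__ == "__main__":
    for name, (start, end, length) in gene_spans(parse_features(GFF_TEXT)).items():
        print(name, start, end, length, "in frame" if length % 3 == 0 else "out of frame")
```

Under these assumptions, all three gene features have lengths divisible by 3, and the M feature lies entirely inside the prM feature.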
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
{ | ||
"files": { | ||
"reference": "reference.fasta", | ||
"pathogenJson": "pathogen.json", | ||
"genomeAnnotation": "genome_annotation.gff3", | ||
"treeJson": "tree.json", | ||
"examples": "sequences.fasta", | ||
"readme": "README.md", | ||
"changelog": "CHANGELOG.md" | ||
}, | ||
"attributes": { | ||
"name": "Yellow fever virus (YFV) prM-E region", | ||
"reference name": "Asibi", | ||
"reference accession": "AY640589.1" | ||
}, | ||
"schemaVersion": "3.0.0", | ||
"alignmentParams": { | ||
"minSeedCover": 0.01 | ||
}, | ||
"qc": { | ||
"missingData": { | ||
"enabled": true, | ||
"missingDataThreshold": 20, | ||
"scoreBias": 4 | ||
}, | ||
"mixedSites": { | ||
"enabled": true, | ||
"mixedSitesThreshold": 4 | ||
}, | ||
"frameShifts": { | ||
"enabled": true | ||
}, | ||
"stopCodons": { | ||
"enabled": true | ||
}, | ||
"privateMutations": { | ||
"enabled": true, | ||
"cutoff": 12, | ||
"typical": 4, | ||
"weightLabeledSubstitutions": 1, | ||
"weightReversionSubstitutions": 1, | ||
"weightUnlabeledSubstitutions": 1 | ||
}, | ||
"snpClusters": { | ||
"enabled": true, | ||
"clusterCutOff": 3, | ||
"scoreWeight": 50, | ||
"windowSize": 50 | ||
} | ||
} | ||
} |
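
To summarize which QC rules the configuration above switches on, a small sketch can load the JSON and list each enabled rule with its parameters. The embedded JSON is an abridged copy of the `qc` section from the diff (the `privateMutations` weights are omitted for brevity), and `enabled_rules` is a hypothetical helper, not part of Nextclade.

```python
import json

# Abridged copy of the "qc" section shown in the diff above.
PATHOGEN_JSON = """
{
  "qc": {
    "missingData": {"enabled": true, "missingDataThreshold": 20, "scoreBias": 4},
    "mixedSites": {"enabled": true, "mixedSitesThreshold": 4},
    "frameShifts": {"enabled": true},
    "stopCodons": {"enabled": true},
    "privateMutations": {"enabled": true, "cutoff": 12, "typical": 4},
    "snpClusters": {"enabled": true, "clusterCutOff": 3, "scoreWeight": 50, "windowSize": 50}
  }
}
"""


def enabled_rules(config):
    """Map each enabled QC rule name to its parameters (minus the 'enabled' flag)."""
    rules = {}
    for name, params in config.get("qc", {}).items():
        if params.get("enabled"):
            rules[name] = {k: v for k, v in params.items() if k != "enabled"}
    return rules


if __name__ == "__main__":
    for name, params in enabled_rules(json.loads(PATHOGEN_JSON)).items():
        print(name, params)
```

To inspect the real file instead, the same function could be applied to the result of `json.load` on the dataset's `pathogen.json`.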
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
> prM-E region (genome 641-1312, 672 nt) | ||
CCAAGAGAGGAGCCAGATGACATTGATTGCTGGTGCTATGGGGTGGAAAACGTTAGAGTC | ||
GCATATGGTAAGTGTGACTCAGCAGGCAGGTCTAGGAGGTCAAGAAGGGCCATTGACTTG | ||
CCTACGCATGAAAACCATGGTTTGAAGACCCGGCAAGAAAAATGGATGACTGGAAGAATG | ||
GGTGAAAGGCAACTCCAAAAGATTGAGAGATGGCTCGTGAGGAACCCCTTTTTTGCAGTG | ||
ACAGCTCTGACCATTGCCTACCTTGTGGGAAGCAACATGACGCAACGAGTCGTGATTGCC | ||
CTACTGGTCTTGGCTGTTGGTCCGGCCTACTCAGCTCACTGCATTGGAATTACTGACAGG | ||
GATTTCATTGAGGGGGTGCATGGAGGAACTTGGGTTTCAGCTACCCTGGAGCAAGACAAG | ||
TGTGTCACTGTTATGGCCCCTGACAAGCCTTCATTGGACATCTCACTAGAGACAGTAGCC | ||
ATTGATGGACCTGCTGAGGCGAGGAAAGTGTGTTACAATGCAGTTCTCACTCATGTGAAG | ||
ATTAATGACAAGTGCCCCAGCACTGGAGAGGCCCACCTAGCTGAAGAGAACGAAGGGGAC | ||
AATGCGTGCAAGCGCACTTATTCTGATAGAGGCTGGGGCAATGGCTGTGGCCTATTTGGG | ||
AAAGGGAGCATT |
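
A quick sanity check on a reference like the one above is to confirm its length, that the length is a multiple of 3, and that it contains only unambiguous bases. The sketch below shows one way to do that. The function names are hypothetical, the expected length of 672 comes from the FASTA header, and the demo uses a toy record rather than the real sequence.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def check_reference(seq, expected_length=672):
    """Return simple pass/fail checks for a reference sequence."""
    return {
        "length_ok": len(seq) == expected_length,
        "in_frame": len(seq) % 3 == 0,
        "unambiguous": set(seq) <= set("ACGT"),
    }


if __name__ == "__main__":
    # Toy record for illustration only; it is not the real reference.
    toy = ">toy record\nATGGCC\nTTTAAA\n"
    seq = read_fasta(toy)["toy record"]
    print(check_reference(seq, expected_length=12))
```

To check the dataset itself, the contents of `reference.fasta` (the file name given in the `files` section of the configuration) could be passed to `read_fasta` with the default expected length.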