Update README.md
jim-schwoebel authored Nov 11, 2020
1 parent ed696a9 commit 47c708d
Showing 1 changed file with 1 addition and 1 deletion.
@@ -42,7 +42,7 @@ There are two main types of audio datasets: speech datasets and audio event/music datasets.
* [Spoken Wikipedia Corpora](https://nats.gitlab.io/swc/) - 38 GB in size, available in formats both with and without audio.
* [Tatoeba](https://tatoeba.org/eng/downloads) - Tatoeba is a large database of sentences, translations, and spoken audio for use in language learning. This download contains spoken English recorded by their community.
* [Ted-LIUM](https://www.openslr.org/51/) - The TED-LIUM corpus was made from audio talks and their transcriptions available on the TED website (noncommercial).
- * [Thorston dataset](https://github.com/thorstenMueller/deep-learning-german-tts/) - German language dataset, 22,668 recorded phrases, 23 hours of audio, phrase length 52 characters on average.
+ * [Thorsten dataset](https://github.com/thorstenMueller/deep-learning-german-tts/) - German language dataset, 22,668 recorded phrases, 23 hours of audio, phrase length 52 characters on average.
* [TIMIT dataset](https://catalog.ldc.upenn.edu/LDC93S1) - TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. It includes time-aligned orthographic, phonetic, and word transcriptions as well as a 16-bit, 16 kHz speech waveform file for each utterance (paid license required).
* [VCTK dataset](https://datashare.is.ed.ac.uk/handle/10283/3443) - 110 English speakers with various accents; each speaker reads out about 400 sentences. Samples are mostly 2–6 s long, at 48 kHz 16 bits, for a total dataset size of ~10 GiB.
* [VCTK-2Mix](https://github.com/JorisCos/VCTK-2Mix) - VCTK-2Mix is an open-source dataset for source separation in noisy environments, derived from VCTK signals and WHAM noise. It is intended as a test set and also enables cross-dataset experiments.
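Several entries above list concrete WAV properties (e.g. TIMIT's 16-bit, 16 kHz waveforms; VCTK's 48 kHz 16-bit clips). As a minimal sketch of how to verify such claims after downloading, the snippet below uses only Python's standard `wave` module to write a short synthetic clip in TIMIT's format and read its header back; the filename `tone.wav` and helper names are illustrative, not part of any dataset's tooling.

```python
import math
import struct
import wave

def write_tone(path, rate=16000, seconds=0.1, freq=440.0):
    """Write a short synthetic mono sine tone as a 16-bit PCM WAV."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)     # mono
        w.setsampwidth(2)     # 2 bytes per sample = 16-bit
        w.setframerate(rate)  # samples per second, e.g. 16 kHz
        n = int(rate * seconds)
        frames = b"".join(
            struct.pack("<h", int(16000 * math.sin(2 * math.pi * freq * i / rate)))
            for i in range(n)
        )
        w.writeframes(frames)

def describe(path):
    """Read back the header fields a dataset listing typically quotes."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "bit_depth": 8 * w.getsampwidth(),
            "sample_rate": w.getframerate(),
            "duration_s": w.getnframes() / w.getframerate(),
        }

write_tone("tone.wav")
print(describe("tone.wav"))
```

The same `describe` helper works unchanged on a downloaded utterance, so you can spot-check, say, whether a file really is 16 kHz/16-bit before feeding it to a pipeline that assumes that format.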

0 comments on commit 47c708d
