Fixed Peter Norvig's last name. :-)
dmuth authored Jul 11, 2016
1 parent c4017a3 commit a5c43a5
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -9,7 +9,7 @@ According to the [Google Machine Translation Team](http://googleresearch.blogspo
>
>We believe that the entire research community can benefit from access to such massive amounts of data. It will advance the state of the art, it will focus research in the promising direction of large-scale, data-driven approaches, and it will allow all research groups, no matter how large or small their computing resources, to play together. That's why we decided to share this enormous dataset with everyone. We processed 1,024,908,267,229 words of running text and are publishing the counts for all 1,176,470,663 five-word sequences that appear at least 40 times. There are 13,588,391 unique words, after discarding words that appear less than 200 times.
-This repo is derived from [Peter Novig's](http://norvig.com/ngrams/) compilation of the [1/3 million most frequent English words](http://norvig.com/ngrams/count_1w.txt). I limited this file to the 10,000 most common words, then removed the appended frequency counts by running this sed command in my text editor:
+This repo is derived from [Peter Norvig's](http://norvig.com/ngrams/) compilation of the [1/3 million most frequent English words](http://norvig.com/ngrams/count_1w.txt). I limited this file to the 10,000 most common words, then removed the appended frequency counts by running this sed command in my text editor:

sed 's/[0-9]*//g'

@@ -28,4 +28,4 @@ To use this list as a training corpus in [Amphetype](http://code.google.com/p/am

In the "Sources" tab, you should see **google-10000-english** available for training. Set WPM at 10 more than your current average, set accuracy to 98%, and you're set to train.

-Enjoy!
+Enjoy!
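
The README text in this diff describes two steps: limiting the word list to the 10,000 most common entries, then stripping the frequency counts. A rough command-line sketch of both steps follows. It assumes the source file has one tab-separated `word<TAB>count` pair per line, sorted from most to least frequent. The helper name `top_words` and the file `sample_counts.txt` are hypothetical, and the counts in the sample are made up for illustration. This is not the author's exact procedure, which used `sed` inside a text editor.

```shell
# Hypothetical helper: keep the first N lines of a "word<TAB>count" file
# and print only the word column.
#   $1 = input file
#   $2 = number of words to keep (defaults to 10000)
top_words() {
  head -n "${2:-10000}" "$1" | cut -f1
}

# Tiny stand-in for the real word list; the counts are illustrative only.
printf 'the\t300\nof\t200\nand\t100\n' > sample_counts.txt

# Print the two most frequent words, one per line.
top_words sample_counts.txt 2
```

Unlike `sed 's/[0-9]*//g'`, which deletes every digit on the line and leaves the separator behind, `cut -f1` keeps the first field untouched, so any digits that belong to a word would survive.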
