a-vs-an

Find the English-language indefinite article ("a" or "an") for a word. The choice is based on real usage patterns extracted from the Wikipedia text dump, so it can handle tricky edge cases such as acronyms (FIAT vs. FAA, NASA vs. NSA) and odd symbols.

The implementations (C# and JavaScript) in this project determine whether "a" or "an" should precede a word. They are efficient and accurate (using the method described in this Stack Overflow answer).

You can try the JavaScript implementation of this library online: A-vs-An.

The dataset is based on the Wikipedia article-text dump of July 2014. Additional preprocessing with regular expressions removed as much wiki markup as possible and extracted only text vaguely resembling sentences. If the word following "a" or "an" started with a quote or parenthesis, that initial quote or parenthesis was ignored. The resulting prefix list, together with the code to query it, is less than 10KB in size; excluding the actual counts would reduce the size further.
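
As a rough illustration of the idea described above, the sketch below counts article usage per word prefix and answers queries with the longest prefix it has seen. It is not the project's actual code or data format: the function names (`extractPairs`, `buildPrefixCounts`, `chooseArticle`), the ASCII-only regular expression, the prefix length cap of 4, and the tie-breaking rule are all assumptions made for the example.

```javascript
// Hypothetical sketch, not the library's real implementation or API.

// Collect (article, following word) pairs from plain text.
// One leading quote or parenthesis before the word is skipped.
function extractPairs(text) {
  const pairs = [];
  const pattern = /\b(an?) ["'(]?([A-Za-z0-9]+)/gi;
  let match;
  while ((match = pattern.exec(text)) !== null) {
    pairs.push({ article: match[1].toLowerCase(), word: match[2] });
  }
  return pairs;
}

// Count how often "a" and "an" precede each prefix (case-sensitive,
// because acronyms such as FAA and FIAT depend on capitalization).
// maxPrefixLength is an assumed cap to keep the table small.
function buildPrefixCounts(pairs, maxPrefixLength = 4) {
  const counts = new Map();
  for (const { article, word } of pairs) {
    const limit = Math.min(word.length, maxPrefixLength);
    for (let len = 0; len <= limit; len++) {
      const prefix = word.slice(0, len);
      const entry = counts.get(prefix) || { a: 0, an: 0 };
      entry[article] += 1;
      counts.set(prefix, entry);
    }
  }
  return counts;
}

// Pick the article favored by the longest known prefix of the word.
// Ties and unknown words fall back to "a" (an assumption of this sketch).
function chooseArticle(counts, word) {
  const cleaned = word.replace(/^["'(]/, "");
  for (let len = cleaned.length; len >= 0; len--) {
    const entry = counts.get(cleaned.slice(0, len));
    if (entry) {
      return entry.an > entry.a ? "an" : "a";
    }
  }
  return "a";
}

// Tiny made-up corpus standing in for the Wikipedia dump.
const sampleText =
  'She filed an FAA report. He drove a FIAT. It was an hour. ' +
  'He built a house. "an (NSA) memo" and a NASA probe.';
const sampleCounts = buildPrefixCounts(extractPairs(sampleText));

console.log(chooseArticle(sampleCounts, "FAA"));  // an
console.log(chooseArticle(sampleCounts, "FIAT")); // a
console.log(chooseArticle(sampleCounts, "(NSA")); // an
```

A real table would be built from far more text and pruned so that only prefixes that change the answer are kept, which is presumably how the shipped prefix list stays so small.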

The implementations are efficient: on a single thread of a 3.6GHz i7-4770k, a benchmark classifying all words of an English dictionary achieves about 37 million words per second, which is roughly 100 clock cycles per word. The JavaScript implementations were benchmarked on Chrome 35, Firefox 32.0a1 (2014-05-22), IE 11, and Opera (12 and 21), and are all about 7-10 times slower, at approximately 4-5 million classifications per second.
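
The figures above can be cross-checked with a few lines of arithmetic. The snippet below only reuses the numbers quoted in this README; the variable and function names are illustrative.

```javascript
// Numbers taken from the benchmark description above.
const clockHz = 3.6e9;             // 3.6GHz i7-4770k
const nativeWordsPerSecond = 37e6; // native benchmark, single thread
const jsWordsPerSecond = [4e6, 5e6]; // reported JavaScript range

// Clock cycles spent per classified word in the native benchmark.
function cyclesPerWord(hz, wordsPerSecond) {
  return hz / wordsPerSecond;
}

// How many times slower a given throughput is than the native one.
function slowdown(nativeRate, otherRate) {
  return nativeRate / otherRate;
}

console.log(cyclesPerWord(clockHz, nativeWordsPerSecond).toFixed(1)); // 97.3
console.log(jsWordsPerSecond.map(
  (rate) => slowdown(nativeWordsPerSecond, rate).toFixed(2)
)); // [ '9.25', '7.40' ]
```

So "about 100 cycles per word" and "7-10 times slower" are both consistent with the quoted throughput numbers.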
