-
Notifications
You must be signed in to change notification settings - Fork 161
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* fix python 2 division * README docs * add automated tests
- Loading branch information
Showing
8 changed files
with
187 additions
and
6 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
language: python | ||
python: | ||
- "2.7" | ||
- "3.4" | ||
- "3.5" | ||
- "3.6" | ||
|
||
install: | ||
- pip install -r requirements/requirements-dev.txt | ||
|
||
script: | ||
- coverage run --source=spellchecker setup.py test | ||
|
||
# commands to run after the tests successfully complete | ||
after_success: | ||
- coveralls |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# pyspellchecker | ||
|
||
## Version 0.1.0 | ||
* Move word frequency to its own class | ||
* Add basic tests | ||
* Readme documentation | ||
|
||
## Version 0.0.1 | ||
* Initial release using code from Peter Norvig | ||
* Initial release to pypi |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,92 @@ | ||
# pyspellchecker | ||
Pure Python Spell Checking based on https://norvig.com/spell-correct.html | ||
|
||
Pure Python Spell Checking based on | ||
[Peter Norvig's](https://norvig.com/spell-correct.html) blog post on setting up | ||
a simple spell checking algorithm. | ||
|
||
It uses a [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) | ||
algorithm to find permutations within an edit distance of 2 from the | ||
original word. It then compares all permutations (insertions, deletions, | ||
replacements, and transpositions) to known words in a word frequency list. | ||
Those words that are found more often in the frequency list are `more likely` | ||
the correct results. | ||
|
||
|
||
## Installation | ||
|
||
The easiest method to install is using pip: | ||
|
||
``` bash | ||
pip install pyspellchecker | ||
``` | ||
|
||
To install from source: | ||
``` bash | ||
git clone https://github.com/barrust/pyspellchecker.git | ||
cd pyspellchecker | ||
python setup.py install | ||
``` | ||
|
||
As always, I highly recommend using the [Pipenv](https://github.com/pypa/pipenv) | ||
package to help manage dependencies! | ||
|
||
## Quickstart | ||
|
||
After installation, using pyspellchecker should be fairly straight forward: | ||
|
||
``` python | ||
from spellchecker import SpellChecker | ||
|
||
|
||
spell = SpellChecker() | ||
|
||
# find those words that may be misspelled | ||
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here']) | ||
|
||
for word in misspelled: | ||
# Get the one `most likely` answer | ||
print(spell.correction(word)) | ||
|
||
# Get a list of `likely` options | ||
print(spell.candidates(word)) | ||
``` | ||
|
||
If the Word Frequency list is not to your liking, you can add additional text | ||
to generate a more appropriate list for your use case. | ||
|
||
``` python | ||
from spellChecker import SpellChecker | ||
|
||
spell = SpellChecker() # loads default word frequency list | ||
spell.word_frequency.load_text_file('./my_free_text_doc.txt') | ||
|
||
# if I just want to make sure some words are not flagged as misspelled | ||
spell.word_frequency.load_words(['microsoft', 'apple', 'google']) | ||
spell.known(['microsoft', 'google']) # will return both now! | ||
``` | ||
|
||
More work in storing and loading word frequency lists is planned; stay tuned. | ||
|
||
## Additional Methods | ||
On-line documentation is in the future; until then you can find SpellChecker | ||
here: | ||
|
||
`correction(word)`: Returns the most probable result for the misspelled word | ||
|
||
`candidates(word)`: Returns a set of possible candidates for the misspelled | ||
word | ||
|
||
`known([words])`: Returns those words that are in the word frequency list | ||
|
||
`unknown([words])`: Returns those words that are not in the frequency list | ||
|
||
`word_probability(word)`: The frequency of the given word out of all words in | ||
the frequency list | ||
|
||
#### The following are less likely to be needed by the user but are available: | ||
|
||
`edit_distance_1(word)`: Returns a set of all strings at a Levenshtein Distance | ||
of one | ||
|
||
`edit_distance_2(word)`: Returns a set of all strings at a Levenshtein Distance | ||
of two |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# needed for testing purposes | ||
pycodestyle | ||
isort | ||
astroid | ||
pylint | ||
coveralls |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
''' SpellChecker Module ''' | ||
from . spellchecker import SpellChecker | ||
from . spellchecker import SpellChecker, WordFrequency | ||
from . info import (__author__, __maintainer__, __email__, __license__, | ||
__version__, __credits__, __url__, __bugtrack_url__) | ||
|
||
|
||
__all__ = ['SpellChecker'] | ||
__all__ = ['SpellChecker', 'WordFrequency'] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,7 +5,7 @@ | |
__maintainer__ = 'Tyler Barrus' | ||
__email__ = '[email protected]' | ||
__license__ = 'MIT' | ||
__version__ = '0.0.1' | ||
__version__ = '0.1.0' | ||
__credits__ = ['Peter Norvig'] | ||
__url__ = 'https://github.com/barrust/pyspellchecker' | ||
__bugtrack_url__ = '{0}/issues'.format(__url__) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
# -*- coding: utf-8 -*- | ||
''' | ||
Unittest class | ||
''' | ||
import unittest | ||
|
||
from spellchecker import SpellChecker | ||
|
||
|
||
class TestSpellChecker(unittest.TestCase): | ||
def test_correction(self): | ||
''' test spell checker corrections ''' | ||
spell = SpellChecker() | ||
self.assertEqual(spell.correction('ths'), 'the') | ||
self.assertEqual(spell.correction('ergo'), 'ergot') | ||
self.assertEqual(spell.correction('this'), 'this') | ||
|
||
def test_candidates(self): | ||
''' test spell checker candidates ''' | ||
spell = SpellChecker() | ||
self.assertEqual(spell.candidates('ths'), {'tis', 'tss', 'th', 'thus', 'the', 'this', 'thy'}) | ||
self.assertEqual(spell.candidates('the'), {'the'}) | ||
|
||
def test_words(self): | ||
spell = SpellChecker() | ||
self.assertEqual(spell.words('This is a test of this'), ['this', 'is', 'a', 'test', 'of', 'this']) | ||
|
||
def test_word_frequency(self): | ||
spell = SpellChecker() | ||
# if the default load changes so will this... | ||
self.assertEqual(spell.word_frequency.dictionary['the'], 79809) | ||
|
||
def test_word_probability(self): | ||
spell = SpellChecker() | ||
# if the default load changes so will this... | ||
self.assertEqual(spell.word_probability('the'), 0.07154004401278254) | ||
|
||
def test_word_known(self): | ||
''' test if the word is a `known` word or not ''' | ||
spell = SpellChecker() | ||
self.assertEqual(spell.known(['this']), {'this'}) | ||
self.assertEqual(spell.known(['sherlock']), {'sherlock'}) | ||
self.assertEqual(spell.known(['holmes']), {'holmes'}) | ||
self.assertEqual(spell.known(['known']), {'known'}) | ||
|
||
self.assertEqual(spell.known(['foobar']), set()) | ||
self.assertEqual(spell.known(['ths']), set()) | ||
self.assertEqual(spell.known(['ergo']), set()) | ||
|
||
def test_unknown_words(self): | ||
spell = SpellChecker() | ||
self.assertEqual(spell.unknown(['this']), set()) | ||
self.assertEqual(spell.unknown(['sherlock']), set()) | ||
self.assertEqual(spell.unknown(['holmes']), set()) | ||
self.assertEqual(spell.unknown(['known']), set()) | ||
|
||
self.assertEqual(spell.unknown(['foobar']), {'foobar'}) | ||
self.assertEqual(spell.unknown(['ths']), {'ths'}) | ||
self.assertEqual(spell.unknown(['ergo']), {'ergo'}) |