- Please consider giving a ⭐ to the repository if you find this useful.
The AI Autocorrect System is a Python-based application that offers real-time autocorrection suggestions to users while they type. It is designed to help users avoid spelling errors and improve the overall accuracy of their text input. The system uses statistical methods to analyze the user's input and suggest the most likely correct word based on a provided corpus.
- Autocorrection Suggestions: As the user types, the application continuously monitors the input and suggests corrections for misspelled words. The autocorrection logic is based on edit distance and probability scores.
- Edit Distance: The system measures the similarity between the typed word and the words in the vocabulary using edit distance. It generates candidate corrections from single-character edits: insertions, deletions, substitutions, and transpositions.
- Probability-based Scoring: Each candidate correction is assigned a probability score based on how frequently the word occurs in the provided text file. Words with higher probabilities are treated as better suggestions.
- Graphical User Interface: The application provides a GUI built with the Tkinter library. Users type in the input box and receive autocorrection suggestions in real time.
- Selecting Suggestions: After 5 seconds of inactivity, the system displays suggestions in a list box so the user can select the appropriate correction for the misspelled word. Suggestions are ordered by the length of the prefix they share with the typed word, so candidates with longer common prefixes appear first in the list.
- Automatic Word Replacement: If the user does not select a suggestion and the word is still misspelled when the space bar is pressed, the application automatically replaces it with the correction that has the highest probability in the provided text corpus. This keeps typing uninterrupted without requiring an explicit selection.
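The edit-distance and probability steps described above can be illustrated with a minimal sketch. It is not the code in `main_script.py`: the function names `single_edits`, `word_probabilities`, and `suggest` are hypothetical, and the sketch assumes a lowercase ASCII vocabulary and uses only the standard library rather than `editdistance`.

```python
from collections import Counter
import string


def single_edits(word):
    """Return every string one insertion, deletion, substitution,
    or transposition away from `word` (lowercase ASCII assumed)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    substitutes = [
        left + c + right[1:] for left, right in splits if right for c in letters
    ]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + substitutes + inserts)


def word_probabilities(words):
    """Map each word to its relative frequency in the corpus."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def suggest(word, probs, limit=5):
    """Return up to `limit` in-vocabulary candidates, most probable first."""
    if word in probs:
        return [word]  # already a known word, nothing to correct
    candidates = [c for c in single_edits(word) if c in probs]
    # Sort by descending probability; ties are broken alphabetically (an assumption).
    return sorted(candidates, key=lambda c: (-probs[c], c))[:limit]


# Toy corpus for illustration only.
corpus_words = "the quick brown fox jumps over the lazy dog the end".split()
probs = word_probabilities(corpus_words)
print(suggest("teh", probs))
```

With a real corpus, `corpus_words` would come from the cleaned contents of the corpus file rather than a hard-coded sentence.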
The success of the AI Autocorrect System depends heavily on the quality and diversity of the corpus. Your corpus should include a large number of unique words so that the system can correct a wide range of misspellings.
To create my `corpus.txt`, I explored multiple text-based datasets covering a wide range of subjects, including literature, healthcare, dictionaries, and various other domains. Each dataset contributed a unique set of words, enriching the vocabulary of the corpus. If the corpus does not have cleaned text, it is important to process it, ensuring that it only contains valid English words while removing any irrelevant characters, numbers, special symbols, and non-English words.
In short, you can make your corpus as big as possible, but it is crucial to ensure that it contains valid words.
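As a rough illustration of the cleaning step, the sketch below lowercases the text, strips surrounding punctuation, and drops any token that contains digits, symbols, or non-ASCII characters. The function name `clean_corpus` is hypothetical. Note that this filter only checks characters: confirming that a token is a valid English word would additionally require a dictionary lookup, which is not shown here.

```python
import string


def clean_corpus(text):
    """Return lowercase, purely alphabetic ASCII tokens from `text`."""
    words = []
    for token in text.lower().split():
        # Remove punctuation attached to either end, e.g. "hello," -> "hello".
        token = token.strip(string.punctuation)
        # Keep only tokens made entirely of ASCII letters.
        if token and token.isascii() and token.isalpha():
            words.append(token)
    return words


sample = "Hello, world! 42 times; café x9z"
print(clean_corpus(sample))
```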
- Run the application by executing the `main_script.py` file.
- The GUI window will open, displaying an input box.
- Start typing in the input box, and the application will provide real-time autocorrection suggestions.
- If you encounter a misspelled word, the system will suggest corrections in the list box below the input box.
- To accept a suggestion, either click on the suggested word in the list box or press the space bar.
- The application will automatically replace the misspelled word with the selected suggestion.
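The ordering of suggestions and the replace-on-space behavior in the steps above could be sketched as follows. This is an assumption-laden illustration, not the application's actual code: the helper names (`common_prefix_length`, `order_by_prefix`, `replace_last_word`, `autocorrect_on_space`) are hypothetical, the alphabetical tie-break is my own choice, and the GUI wiring to the Tkinter widgets is left out.

```python
def common_prefix_length(a, b):
    """Count how many leading characters `a` and `b` share."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def order_by_prefix(typed, candidates):
    """Put candidates sharing a longer prefix with `typed` first."""
    return sorted(candidates, key=lambda c: (-common_prefix_length(typed, c), c))


def replace_last_word(text, replacement):
    """Swap the last space-delimited word of `text` for `replacement`."""
    head, sep, _last = text.rpartition(" ")
    return head + sep + replacement


def autocorrect_on_space(text, candidates, probs):
    """Replace the last word with the most probable candidate, if any."""
    if not candidates:
        return text
    best = max(candidates, key=lambda w: probs.get(w, 0.0))
    return replace_last_word(text, best)


print(order_by_prefix("helo", ["help", "hello", "halo"]))
print(autocorrect_on_space("I love teh", ["the", "ten"], {"the": 0.5, "ten": 0.1}))
```

In the GUI, a function like `autocorrect_on_space` would presumably be called from the handler bound to the space key, and `order_by_prefix` would be applied before filling the list box.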
Besides Python, you will need to install the following libraries to run this application:
pip install numpy editdistance
The current implementation of the system does not involve comprehensive linguistic analysis to understand the context or grammar of the input. Instead, it relies on statistical methods to generate suggestions based on edit distance and word probabilities.