Levenshtein Algorithm (Edit distance algorithm) is used to find the similarity between two strings. It is also referred to as Edit Distance Algorithm. The Levenshtein distance is a number that tells you how different two strings are. The higher the number, the more different the two strings are. More specifically, Levenshtein Distance tells about the total number of edits (insertions, deletions and substitutions) needed to make in one string to convert it to another.
For example, the Levenshtein distance between “kitten” and “sitting” is 3 since, at a minimum, 3 edits are
required to change one into the other.
- kitten → sitten (substitution of “s” for “k”)
- sitten → sittin (substitution of “i” for “e”)
- sittin → sitting (insertion of “g” at the end).
An “edit” is defined by either an insertion of a character, a deletion of a character, or a replacement of a character.
We should all thank Vladimir Levenshtein, who came up with his algorithm in 1965. The algorithm hasn’t been improved in over 50 years and for good reason. According to MIT, it may very well be that Levenshtein’s algorithm is the best that we’ll ever get in terms of efficiency. (Levenshtein_distance Wikipedia)
- To Implementation and Algorithm Analysis of Levenshtein Algorithm.
- To Modify the Levenshtein Algorithm.
- To Match Human DNA sequences and find their similarities.
- To Understanding Linguistic Differences using the normal and modified Levenshtein Algorithm.
- To compare Images and determine their similarity using the normal and modified Levenshtein Algorithm.
- Add your file in data folder.
- The file format should be like (check data folder)
-
5 (no. of languages) 6 (no of comparing strings) Hindi Marathi Bengali Punjabi Bhojpuri (languages) Somvaar Somavar Shombar Somvaar Somaar Mangalvaar Mangalavaar Mongolbar Mangalvaar Mangar Budhvaar Budhavar Budhbar Budhvaar Budh Shukravaar Sukravar Shukrobar Shukarvaar Shukkar Shanivaar Shanivar Shonibar Shanivaar Sanichar Ravivaar Ravivar Robibar Aitvaar Etvaar
- To check the output, compile and run the cpp code by giving the text file as input (go to file and).
- On ubuntu use
g++ M_levenshtein.cpp && ./a.out < ../data/misc.txt
- On ubuntu use
- Levenshtein Distance- From Wikipedia, link
- Hussain A Chowdhury and Dhruba K Bhattacharyya, Plagiarism: Taxonomy, Tools and Detection Techniques, link (Plagiarism detection)
- Ethan Nam, Understanding the Levenshtein Distance Equation for Beginners link (levenshtein-distance)
- The Levenshtein Algorithm, by Cuelogic Insights link
- Lailil Muflikhah, Identifying Cancer Disease through Deoxyribonucleic Acid (DNA) Sequential Pattern Mining (DNA example)
- C#: Calculating Percentage Similarity of 2 strings link (similarity of 2 strings)
- The MNIST dataset provided in a easy-to-use CSV format MNIST in CSV (MNIST Dataset)
- Sequence Alignment problem, GFG, Sequence Alignment problem (algorithm intuition)