Lingustic-differences

Levenshtein Algorithm (Edit distance algorithm) is used to find the similarity between two strings. It is also referred to as Edit Distance Algorithm. The Levenshtein distance is a number that tells you how different two strings are. The higher the number, the more different the two strings are. More specifically, Levenshtein Distance tells about the total number of edits (insertions, deletions and substitutions) needed to make in one string to convert it to another.

For example, the Levenshtein distance between “kitten” and “sitting” is 3 since, at a minimum, 3 edits are required to change one into the other.

kitten → sitten (substitution of “s” for “k”)
sitten → sittin (substitution of “i” for “e”)
sittin → sitting (insertion of “g” at the end).
An “edit” is defined by either an insertion of a character, a deletion of a character, or a replacement of a character.
We should all thank Vladimir Levenshtein, who came up with his algorithm in 1965. The algorithm hasn’t been improved in over 50 years and for good reason. According to MIT, it may very well be that Levenshtein’s algorithm is the best that we’ll ever get in terms of efficiency. (Levenshtein_distance Wikipedia)

Expected Goals

To Implementation and Algorithm Analysis of Levenshtein Algorithm.
To Modify the Levenshtein Algorithm.
To Match Human DNA sequences and find their similarities.
To Understanding Linguistic Differences using the normal and modified Levenshtein Algorithm.
To compare Images and determine their similarity using the normal and modified Levenshtein Algorithm.

How to compare two strings?

Add your file in data folder.
The file format should be like (check data folder)

 5 (no. of languages)
 6 (no of comparing strings)
 Hindi Marathi Bengali Punjabi Bhojpuri (languages)
 Somvaar Somavar Shombar Somvaar Somaar
 Mangalvaar Mangalavaar Mongolbar Mangalvaar Mangar
 Budhvaar Budhavar Budhbar Budhvaar Budh
 Shukravaar Sukravar Shukrobar Shukarvaar Shukkar
 Shanivaar Shanivar Shonibar Shanivaar Sanichar
 Ravivaar Ravivar Robibar Aitvaar Etvaar

To check the output, compile and run the cpp code by giving the text file as input (go to file and).
- On ubuntu use
  
  g++ M_levenshtein.cpp && ./a.out < ../data/misc.txt

Image similarity (check report)

REFERENCES

Levenshtein Distance- From Wikipedia, link
Hussain A Chowdhury and Dhruba K Bhattacharyya, Plagiarism: Taxonomy, Tools and Detection Techniques, link (Plagiarism detection)
Ethan Nam, Understanding the Levenshtein Distance Equation for Beginners link (levenshtein-distance)
The Levenshtein Algorithm, by Cuelogic Insights link
Lailil Muflikhah, Identifying Cancer Disease through Deoxyribonucleic Acid (DNA) Sequential Pattern Mining (DNA example)
C#: Calculating Percentage Similarity of 2 strings link (similarity of 2 strings)
The MNIST dataset provided in a easy-to-use CSV format MNIST in CSV (MNIST Dataset)
Sequence Alignment problem, GFG, Sequence Alignment problem (algorithm intuition)

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
Levenshtein_a		Levenshtein_a
Levenshtein_algo_mod		Levenshtein_algo_mod
Levenshtein_algo_mod2		Levenshtein_algo_mod2
data		data
Project Report (1) (2).pdf		Project Report (1) (2).pdf
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lingustic-differences

Expected Goals

How to compare two strings?

Image similarity (check report)

REFERENCES

About

Releases

Packages

Languages

Amagnum/Lingustic-differences

Folders and files

Latest commit

History

Repository files navigation

Lingustic-differences

Expected Goals

How to compare two strings?

Image similarity (check report)

REFERENCES

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages