Can somebody explain to me what the `_preprocess()` function is doing? #33

samarth12 · 2020-10-20T23:12:37Z

I ran the standard example and walked through it in the debugger.

from jiwer import wer

ground_truth = "hello world"
hypothesis = "hello duck"

error = wer(ground_truth, hypothesis)

While stepping through, I saw that it uses the `_preprocess()` function to convert the words/tokens into integer representations.

`hello world` is converted to `\x00\x02` and `hello duck` is converted to `\x00\x01`. The Levenshtein distance is then calculated on these strings rather than on the original words. I am not sure how that maps to the original definition of Word Error Rate.

Can somebody explain to me what is happening?

nikvaessen · 2020-10-21T08:32:04Z

Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
(https://en.wikipedia.org/wiki/Levenshtein_distance)

Because we convert each word in a sentence to a unique integer such as `\x00`, the ground truth and hypothesis sentences become two "words" of integer tokens. We can then use the Levenshtein distance to calculate the insertions (I), deletions (D) and substitutions (S) required to change the hypothesis "integer word" into the ground truth "integer word". We can then simply plug the S, D, and I values we found into the WER formula.
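To make the idea concrete, here is a minimal sketch in plain Python. It is not the actual jiwer code: `words_to_chars`, `levenshtein` and `word_error_rate` are hypothetical helpers written for illustration, and the specific IDs they assign may differ from what `_preprocess()` produces. Only the fact that each distinct word gets its own ID matters.

```python
def words_to_chars(reference, hypothesis):
    """Map every distinct word to a unique character (hypothetical helper)."""
    vocab = {}

    def encode(sentence):
        # setdefault assigns the next free integer ID the first time a word is seen
        return "".join(chr(vocab.setdefault(w, len(vocab))) for w in sentence.split())

    return encode(reference), encode(hypothesis)


def levenshtein(a, b):
    """Textbook dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / number of words in the reference."""
    ref_chars, hyp_chars = words_to_chars(reference, hypothesis)
    return levenshtein(ref_chars, hyp_chars) / len(ref_chars)


ref_chars, hyp_chars = words_to_chars("hello world", "hello duck")
print(repr(ref_chars), repr(hyp_chars))
print(word_error_rate("hello world", "hello duck"))
```

Each character in the encoded strings stands for a whole word, so one character edit corresponds to one word edit. In the example, "world" and "duck" get different IDs, which gives one substitution over two reference words.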

samarth12 · 2020-10-21T23:53:11Z

Oh, I realized this a little late: the Levenshtein distance in WER is calculated on words rather than characters. So each word is converted into a unique ID to make that possible. Makes sense now. Thank you!
