Can somebody explain to me what the _preprocess() function is doing?
#33
Comments
Due to the fact that we convert each word in a sentence to a unique integer such as
Oh, I realized this a little late: the Levenshtein distance in WER is calculated on words rather than characters. So the preprocessing is there to convert each word into a unique ID. That makes sense now. Thank you!
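For anyone landing here later, a minimal sketch of why this works. It is not the library's actual code: `edit_distance` is a hypothetical helper written for illustration. It computes the Levenshtein distance once on the word lists and once on the encoded strings quoted in the issue, and both give the same result because each character stands for exactly one word.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


reference = "hello world".split()
hypothesis = "hello duck".split()

# Distance computed directly on words.
word_distance = edit_distance(reference, hypothesis)

# Distance computed on the one-character-per-word strings from the issue.
encoded_distance = edit_distance("\x00\x02", "\x00\x01")

# WER is the word-level edit distance divided by the number of reference words.
wer = word_distance / len(reference)
```

Here one word is substituted out of two reference words, so the WER comes out as 0.5 either way.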
I ran the standard example and walked through it in the debugger.
Along the way, it uses the `_preprocess()` function to convert the words/tokens into integer representations: `hello world` is converted to `\x00\x02` and `hello duck` is converted to `\x00\x01`. Then the Levenshtein distance is calculated on these strings rather than on the original words. I am not sure how that maps to the original definition of Word Error Rate.
Can somebody explain to me what is happening?
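The conversion described above can be reproduced with a small sketch. This is an assumption about the general idea, not the library's implementation: `encode_words` is a hypothetical name, and it assigns IDs in order of first appearance, so the exact characters differ from the ones seen in the debugger. The specific integers do not matter as long as the same word always gets the same character in both sentences.

```python
def encode_words(reference, hypothesis):
    """Map each distinct word to one character so both sentences become short strings."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()

    # Assign IDs in order of first appearance (an arbitrary but deterministic choice).
    vocab = {}
    for word in ref_words + hyp_words:
        vocab.setdefault(word, len(vocab))

    # One character per word, shared vocabulary across both sentences.
    ref_encoded = "".join(chr(vocab[w]) for w in ref_words)
    hyp_encoded = "".join(chr(vocab[w]) for w in hyp_words)
    return ref_encoded, hyp_encoded


ref, hyp = encode_words("hello world", "hello duck")
print(repr(ref), repr(hyp))
```

Both encoded strings have one character per word, and shared words (`hello`) map to the same character, so a character-level Levenshtein distance on them counts word-level edits.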