Hi,
I am using the punctuator2 library to train a model for Albanian, an Indo-European language written with a Latin-derived alphabet.
I am using 206,000 articles from an Albanian magazine, so my corpus is large enough to train the model.
I have successfully trained the model and I am satisfied with the results. However, when I test the model on arbitrary text, it converts every standalone lowercase "i" into an uppercase "I". In Albanian, a standalone "i" within a sentence is a conjunction and should be written in lowercase. This made me think that the model is somehow using something pre-trained or hardcoded for English (which I am not aware of).
I checked the code (data.py, models.py and main.py) but could not find anything hardcoded for this, except the "We.pcl" file referenced in the code, which does not exist on my path since I do not use it.
Do you have any suggestion or idea why this is happening?
preniqivjosa changed the title to "The model trained for a non-English language is converting the single lowercase "i" into uppercase "I"" (Jul 17, 2020)
Are you using the convert_to_readable.py or demo_play_with_model.py scripts? These two convert the first letter of the first word in each sentence to uppercase ("Title" case, i.e. .title() in Python).
Hi @ottokart,
Thank you for the reply!
I was using a different script written for testing the model; the problem is solved when using demo_play_with_model.py.
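A minimal sketch of the kind of capitalization step described in the reply above, assuming whitespace-separated tokens and ".", "?", "!" as the only sentence-ending tokens. The function names and sample tokens are hypothetical and this is not punctuator2's actual code. It contrasts applying .title() only to the first word of each sentence with applying it to every token, which would also turn a standalone mid-sentence "i" into "I".

```python
# Hypothetical sketch, not punctuator2's code: assumes tokens are already split
# and that ".", "?" and "!" are the only sentence-ending tokens.
SENTENCE_END = {".", "?", "!"}


def capitalize_sentence_starts(tokens):
    """Apply .title() only to the first word of each sentence; leave other words as-is."""
    result = []
    capitalize_next = True  # the very first word starts a sentence
    for token in tokens:
        if token in SENTENCE_END:
            result.append(token)
            capitalize_next = True  # the next word begins a new sentence
        elif capitalize_next:
            result.append(token.title())
            capitalize_next = False
        else:
            result.append(token)  # mid-sentence words, including "i", are untouched
    return result


def title_every_word(tokens):
    """Naive variant: .title() on every token, which also turns a standalone "i" into "I"."""
    return [token.title() for token in tokens]


tokens = ["libri", "i", "ri", ".", "ai", "është", "i", "mirë", "."]
print(" ".join(capitalize_sentence_starts(tokens)))  # Libri i ri . Ai është i mirë .
print(" ".join(title_every_word(tokens)))            # Libri I Ri . Ai Është I Mirë .
```

Only the first function matches the sentence-initial behavior described in the reply; a custom test script that capitalizes more broadly (as in the second function) would produce exactly the uppercase "I" reported in the issue.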