Training on Russian #20
Comments
I don't know about Russian BERTs, but what you want to care about is tokenization. In particular, the preprocessing stage needs to normalize punctuation in a tokenization-neutral manner.
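As a rough illustration of that preprocessing step, here is a minimal sketch of punctuation normalization for Russian text. It is one possible reading of "tokenization-neutral", not the code used by recasepunc: the mapping table and the name `normalize_punctuation` are hypothetical, and the idea is only to map typographic variants to plain ASCII forms and remove stray spaces before punctuation, so that the same sentence looks the same regardless of how the crawl formatted it.

```python
import re
import unicodedata

# Hypothetical mapping (an assumption, not taken from recasepunc):
# typographic punctuation variants mapped to plain ASCII equivalents.
PUNCT_MAP = {
    "\u00ab": '"',   # left guillemet
    "\u00bb": '"',   # right guillemet
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u201e": '"',   # low double quote
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u2014": "-",   # em dash
    "\u2013": "-",   # en dash
    "\u2026": "...", # horizontal ellipsis
}
_TABLE = str.maketrans(PUNCT_MAP)


def normalize_punctuation(text: str) -> str:
    """Return text with punctuation variants unified and spacing cleaned up."""
    # Compose characters so that visually identical strings compare equal.
    text = unicodedata.normalize("NFC", text)
    # Replace typographic variants with their ASCII forms.
    text = text.translate(_TABLE)
    # Drop whitespace that precedes sentence punctuation ("word ," -> "word,").
    text = re.sub(r"\s+([,.!?;:])", r"\1", text)
    # Collapse runs of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(normalize_punctuation("Привет ,   мир !"))
```

Whether quotes and dashes should be kept, mapped, or dropped depends on which punctuation marks the model is meant to predict, so the table would need adjusting to the label set.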
There are already trained models: https://alphacephei.com/vosk/models/vosk-recasepunc-ru-0.22.zip
@nshmyrev thanks for your reply. I noticed that running the model you mentioned requires more dependencies than the ones reported in this repository. Am I correct?
I'm getting this error when trying to run prediction with this Russian model:
I'm using an environment with all the requirements listed here: https://github.com/benob/recasepunc
Hi. I replied to you on alphacep/vosk-api#1459: it needs transformers==4.25.0
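A quick way to confirm that the environment has that exact version is sketched below. The helper names `matches_pin` and `installed_transformers_version` are hypothetical; the only call relied on is `importlib.metadata.version` from the standard library (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version

PINNED = "4.25.0"  # version quoted in the reply above


def matches_pin(installed, pinned=PINNED):
    """Return True when the installed version string equals the pinned one."""
    return installed is not None and installed.strip() == pinned


def installed_transformers_version():
    """Return the installed transformers version, or None if it is missing."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if matches_pin(found):
        print("transformers matches the pinned version")
    else:
        print(f"found {found!r}; try: pip install transformers=={PINNED}")
```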
To train a model on Russian data from Web Crawl, do you suggest a specific pre-trained BERT model?