Punctuation and Capitalization on non-english text #2297

iry47 · 2021-06-02T09:44:40Z

Hi, I'm looking to train the punctuation and capitalization model for the French language.
Has anyone worked with this model on non-English text?

Answered by ekmb

ekmb · 2021-06-03T16:51:20Z

ekmb
Jun 3, 2021
Collaborator

We haven't run any experiments with this model on non-English data, but the model should work for other languages out of the box.
The quickest way to try this model with French data would be to use a pre-trained BERT-like model, for example, model.language_model.pretrained_model_name=bert-base-multilingual-cased or amine/bert-base-5lang-cased. To prepare data for the punctuation and capitalization tasks, please see this tutorial. The Tatoeba dataset contains French sentences as well; you would need to modify this line to get the French data.
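For readers preparing their own French data, here is a minimal sketch of turning raw sentences into a text line and a labels line. It assumes the two-character label scheme described in the NeMo tutorial (first character: the punctuation mark that follows the word, or O for none; second character: U if the word is capitalized, O otherwise). The function name make_example and the set of marks are hypothetical choices for illustration, not part of NeMo.

```python
def make_example(sentence, marks=",.?"):
    """Return (text_line, labels_line) for one raw sentence.

    Assumed label scheme: <punctuation or O><U or O>, one label per word.
    """
    words, labels = [], []
    for token in sentence.split():
        core = token.rstrip(marks)          # word without trailing punctuation
        trailing = token[len(core):]        # the punctuation that was stripped
        punct = trailing[0] if trailing else "O"
        if not core:
            # French typography puts a space before "?", so the mark arrives
            # as its own token; attach it to the previous word's label.
            if labels:
                labels[-1] = punct + labels[-1][1]
            continue
        cap = "U" if core[0].isupper() else "O"
        words.append(core.lower())
        labels.append(punct + cap)
    return " ".join(words), " ".join(labels)


if __name__ == "__main__":
    text, labels = make_example("Bonjour, comment allez-vous ?")
    print(text)    # bonjour comment allez-vous
    print(labels)  # ,U OO ?O
```

Writing one text line and one labels line per sentence to two parallel files should give data in the shape the tutorial expects, but check the tutorial for the exact file names and supported marks.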

5 replies

iry47 Jun 7, 2021
Author

Amazing, thank you so much for the information!
I went straight into completing the tutorial and the results of my trained model are pretty good, although I'm getting the PyTorch 1.1.0 error, which I think might have prevented my model from being saved... Is it possible this bug has been fixed and I'm using an old version? If not, where are these functions being called?
Here's the full error:

Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at

iry47 Jun 8, 2021
Author

Also, I'm having trouble converting the ckpt to a model that I can use and apply in my application. I was looking at the transformers CLI for converting TensorFlow checkpoints, but I'm not sure if I'm using the right files...
What's the best way to produce a usable model?

By the way, it seems like the model is doing pretty well with the French data!
@ekmb

ekmb Jun 8, 2021
Collaborator

This is a warning message and shouldn't prevent the model from being saved. By default, the .pt checkpoints and the .nemo file are stored under nemo_experiments/Punctuation_and_Capitalization/<DATE_TIME>/checkpoints. (A .nemo file is just a .tar file containing the .pt checkpoints and model artifacts, for example, the model config.)
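If a file browser does not show the output folder, the saved files can be listed from code. The sketch below assumes the default directory layout quoted above. The helper name list_saved_models is hypothetical, and .ckpt is included only because the thread mentions ckpt files.

```python
from pathlib import Path


def list_saved_models(exp_dir="nemo_experiments/Punctuation_and_Capitalization"):
    """Return checkpoint-like files under <exp_dir>/<DATE_TIME>/checkpoints."""
    suffixes = {".nemo", ".pt", ".ckpt"}
    root = Path(exp_dir)
    # One sub-folder per run (named by date and time), each with a checkpoints folder.
    found = [p for p in root.glob("*/checkpoints/*") if p.suffix in suffixes]
    return sorted(found)


if __name__ == "__main__":
    for path in list_saved_models():
        print(path)
```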

iry47 Jun 9, 2021
Author

You're right @ekmb, the files were there, but I had to ls the folder to see them (the Colab GUI doesn't display them).
I'm still really confused about how to convert a .nemo model to a py file that I can use... Could you please shed some light on how to do this?

ekmb Jun 15, 2021
Collaborator

@iry47 A .nemo file is a tar.gz file; you can run tar -xf your_model.nemo to get the .pt PyTorch checkpoint.
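The same extraction can be done from Python, which may be handier inside a notebook. This is a minimal sketch using only the standard library; unpack_nemo is a hypothetical helper name, and the file names inside a real archive depend on the NeMo version that wrote it.

```python
import tarfile
from pathlib import Path


def unpack_nemo(nemo_path, out_dir):
    """Extract a .nemo archive into out_dir and return the extracted file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect whether the archive is gzip-compressed or plain.
    with tarfile.open(nemo_path, mode="r:*") as archive:
        archive.extractall(out)
    return sorted(p for p in out.rglob("*") if p.is_file())
```

For example, unpack_nemo("your_model.nemo", "unpacked") should leave the checkpoint and the model config in the unpacked folder. Only extract archives you trust, since tar files can contain arbitrary paths.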
