I think the `teacher-big` task configuration used here (translations/pipeline/train/configs/model/teacher.yml, line 12 in e714bb7) and here can be optimized.
Regarding the speed of training:

- Setting `beam-size` to 4 should be enough for transformer-big models.
- `valid-mini-batch` at 8 is a bit low; it could be set to 32 or 64.
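
For reference, this is roughly how those two settings would look in the YAML config. This is a sketch only: the option names follow Marian's CLI, and the exact keys or current values in teacher.yml may differ.

```yaml
# Sketch of the suggested speed-related values (assumed Marian option names)
beam-size: 4          # 4 is typically enough for transformer-big decoding
valid-mini-batch: 32  # up from 8; 64 would also be reasonable
```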
Regarding quality:

- `max-length` is set to 100, which is pretty low in my opinion (I typically use 400), especially if we are including sentences from certain EU corpora on OPUS that have very long lines, and segments from HPLT in backtranslation (remember that HPLT does not have its sentences split; they appear in the corpus just as they appear on the original website). The `valid-max-length` of 300 is fine, but `max-length` of 100 causes every training sentence over 100 tokens to be omitted, so the model is not learning from them (unless I'm missing a third configuration file in the pipeline).
- I've always used `swish` with no issues, but maybe there's no difference in using `relu`.
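
Again as a sketch only, with the same caveat about assumed Marian option names (`transformer-ffn-activation` is the Marian flag that selects between `swish` and `relu`):

```yaml
# Sketch of the suggested quality-related values (assumed Marian option names)
max-length: 400                    # up from 100, so long OPUS/HPLT segments are kept
valid-max-length: 300              # already fine as-is
transformer-ffn-activation: swish  # relu may perform equivalently
```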