I think the `teacher-big` task configuration used here (translations/pipeline/train/configs/model/teacher.yml, line 12 in e714bb7) and here can be optimized.
Regarding the speed of training:

- Setting `beam-size` to 4 should be enough for transformer-big models.
- `valid-mini-batch` at 8 is a bit low; it could be set to 32 or 64.
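
For reference, this is roughly how those two settings would look in the YAML config. This is a sketch only: the option names follow Marian's CLI, and the exact keys or current values in teacher.yml may differ.

```yaml
# Sketch of the suggested speed-related values (assumed Marian option names)
beam-size: 4          # 4 is typically enough for transformer-big decoding
valid-mini-batch: 32  # up from 8; 64 would also be reasonable
```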
Regarding quality:

- `max-length` is set to 100, which is pretty low in my opinion (I typically use 400), especially if we are including sentences from certain EU corpora on OPUS that have very long lines, and segments from HPLT in backtranslation (remember that HPLT does not have its sentences split; they appear in the corpus just as they appear on the original website). The `valid-max-length` of 300 is fine, but `max-length` of 100 causes every training sentence over 100 tokens to be omitted, so the model is not learning from them (unless I'm missing a third configuration file in the pipeline).
- I've always used `swish` with no issues, but maybe there's no difference in using `relu`.
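
Again as a sketch only, with the same caveat about assumed Marian option names (`transformer-ffn-activation` is the Marian flag that selects between `swish` and `relu`):

```yaml
# Sketch of the suggested quality-related values (assumed Marian option names)
max-length: 400                    # up from 100, so long OPUS/HPLT segments are kept
valid-max-length: 300              # already fine as-is
transformer-ffn-activation: swish  # relu may perform equivalently
```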