Teacher training configuration improvements #987

Open
Tracked by #216
ZJaume opened this issue Jan 13, 2025 · 1 comment
Labels
quality Improving robustness and translation quality

Comments

@ZJaume
Collaborator

ZJaume commented Jan 13, 2025

I think the teacher-big task configuration used here and here can be optimized.

Regarding the speed of training:

  • Setting beam-size to 4 should be enough for transformer-big models.
  • valid-mini-batch at 8 is a bit low; it could be set to 32 or 64. A sketch of both settings follows this list.

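As a rough illustration, and assuming the teacher-big task passes these options straight through to Marian (the surrounding file layout here is hypothetical, only the option names are standard Marian flags), the speed-related changes could look like this:

```yaml
# Hypothetical excerpt of the teacher-big Marian training config;
# values are the ones suggested above.
beam-size: 4          # beam width used when translating the validation set; 4 is enough for transformer-big
valid-mini-batch: 64  # validation batch size; 8 is unnecessarily small, 32 would also work
```
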
Regarding quality:

  • max-length is set to 100, which is pretty low in my opinion (I typically use 400), especially if we are including sentences from certain EU corpora on OPUS that have very long lines, plus segments from HPLT in the backtranslations (remember that HPLT is not sentence-split; segments appear in the corpus just as they do on the original website). valid-max-length is set to 300, which is fine, but max-length at 100 causes every training sentence over 100 tokens to be dropped, so the model never learns from them (unless I'm missing a third configuration file in the pipeline).
  • I've always used swish with no issues, but maybe there's no real difference compared to relu. Both the length and activation settings are sketched below.
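
Again as a hedged sketch (same assumption as above about how the config reaches Marian), the quality-related options could be set like this:

```yaml
# Hypothetical excerpt covering the data-length and activation settings discussed above.
max-length: 400                    # stop dropping long EU/HPLT segments from training
valid-max-length: 300              # already reasonable in the current config
transformer-ffn-activation: swish  # Marian accepts swish or relu here; either may be fine
```
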
@eu9ene eu9ene added the quality Improving robustness and translation quality label Jan 13, 2025
@eu9ene eu9ene marked this as a duplicate of #986 Jan 13, 2025
@eu9ene
Collaborator

eu9ene commented Jan 15, 2025

Related to #89
