Missing key tokenizer in NeMo NMT multilingual fine-tuning #210


Open
syedhamza671 opened this issue Apr 17, 2025 · 0 comments

@syedhamza671

Hello there,

I followed the tutorial for fine-tuning multilingual NMT models, but in the last step, where we run megatron_nmt_training.py as instructed, it fails with an error about a key (tokenizer) not being found in the pretrained model's config file.

I ran this command:

```shell
HYDRA_FULL_ERROR=1 \
python /opt/NeMo/examples/nlp/machine_translation/megatron_nmt_training.py \
  trainer.precision=32 \
  trainer.devices=1 \
  trainer.max_epochs=5 \
  trainer.max_steps=200000 \
  trainer.val_check_interval=5000 \
  trainer.log_every_n_steps=5000 \
  model.multilingual=True \
  model.pretrained_model_path=workspace/model/pretrained_ckpt/megatronnmt_any_en_500m.nemo \
  model.micro_batch_size=1 \
  model.global_batch_size=2 \
  model.encoder_tokenizer.library=sentencepiece \
  model.decoder_tokenizer.library=sentencepiece \
  model.encoder_tokenizer.model=workspace/tokenizer/spm_64k_all_32_langs_plus_en_nomoses.model \
  model.decoder_tokenizer.model=workspace/tokenizer/spm_64k_all_32_langs_plus_en_nomoses.model \
  model.src_language=['es, pt'] \
  model.tgt_language=en \
  model.train_ds.src_file_name=workspace/data/train_src_files \
  model.train_ds.tgt_file_name=workspace/data/train_tgt_files \
  model.test_ds.src_file_name=workspace/data/en_es_final_es_test_filepath \
  model.test_ds.tgt_file_name=workspace/data/en_es_final_en_test_filepath \
  model.validation_ds.src_file_name=workspace/data/val_src_files \
  model.validation_ds.tgt_file_name=workspace/data/val_tgt_files \
  model.optim.lr=0.00001 \
  model.train_ds.concat_sampling_probabilities=['0.1, 0.1'] \
  ++model.pretrained_language_list=None \
  +model.optim.sched.warmup_steps=500 \
  ~model.optim.sched.warmup_ratio \
  exp_manager.resume_if_exists=True \
  exp_manager.resume_ignore_no_checkpoint=True \
  exp_manager.create_checkpoint_callback=True \
  exp_manager.checkpoint_callback_params.monitor=val_sacreBLEU_avg \
  exp_manager.checkpoint_callback_params.mode=max \
  exp_manager.checkpoint_callback_params.save_top_k=5 \
  +exp_manager.checkpoint_callback_params.save_best_model=true
```
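Before launching training, it can help to check whether the pretrained checkpoint's config actually has a top-level tokenizer key, since that is the key the script reads. A .nemo file is an ordinary tar archive containing the model config. The archive and YAML contents below are a self-contained toy stand-in, not the real checkpoint; with an actual .nemo file you would point tar at the downloaded checkpoint instead:

```shell
# Build a toy .nemo-style archive so the commands below run anywhere.
# Real NeMo checkpoints bundle the config as model_config.yaml.
mkdir -p demo
printf 'encoder_tokenizer:\n  library: sentencepiece\n' > demo/model_config.yaml
tar -cf demo/toy.nemo -C demo model_config.yaml

# List the archive members, extract the config, and look for a
# top-level 'tokenizer' key.
tar -tf demo/toy.nemo
tar -xf demo/toy.nemo -C demo model_config.yaml
grep '^tokenizer:' demo/model_config.yaml || echo "no top-level tokenizer key"
```

If the grep finds nothing, the config only defines per-side keys such as encoder_tokenizer, which is consistent with the error below.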

and it fails with this error:

```
Traceback (most recent call last):
  File "/opt/NeMo/examples/nlp/machine_translation/megatron_nmt_training.py", line 113, in main
    pretrained_cfg.encoder_tokenizer = pretrained_cfg.tokenizer
omegaconf.errors.ConfigAttributeError: Missing key tokenizer
    full_key: tokenizer
    object_type=dict
```
