Missing key tokenizer in NeMo NMT multilingual fine-tuning #210


Open
syedhamza671 opened this issue Apr 17, 2025 · 0 comments

@syedhamza671

Hello there,

I followed the tutorial for fine-tuning multilingual NMT models, but in the last step, where we run megatron_nmt_training.py as instructed, it fails with an error about a key (tokenizer) not being found in the pretrained model's config file.

I ran this command:

```shell
HYDRA_FULL_ERROR=1 \
python /opt/NeMo/examples/nlp/machine_translation/megatron_nmt_training.py \
  trainer.precision=32 \
  trainer.devices=1 \
  trainer.max_epochs=5 \
  trainer.max_steps=200000 \
  trainer.val_check_interval=5000 \
  trainer.log_every_n_steps=5000 \
  model.multilingual=True \
  model.pretrained_model_path=workspace/model/pretrained_ckpt/megatronnmt_any_en_500m.nemo \
  model.micro_batch_size=1 \
  model.global_batch_size=2 \
  model.encoder_tokenizer.library=sentencepiece \
  model.decoder_tokenizer.library=sentencepiece \
  model.encoder_tokenizer.model=workspace/tokenizer/spm_64k_all_32_langs_plus_en_nomoses.model \
  model.decoder_tokenizer.model=workspace/tokenizer/spm_64k_all_32_langs_plus_en_nomoses.model \
  model.src_language=['es, pt'] \
  model.tgt_language=en \
  model.train_ds.src_file_name=workspace/data/train_src_files \
  model.train_ds.tgt_file_name=workspace/data/train_tgt_files \
  model.test_ds.src_file_name=workspace/data/en_es_final_es_test_filepath \
  model.test_ds.tgt_file_name=workspace/data/en_es_final_en_test_filepath \
  model.validation_ds.src_file_name=workspace/data/val_src_files \
  model.validation_ds.tgt_file_name=workspace/data/val_tgt_files \
  model.optim.lr=0.00001 \
  model.train_ds.concat_sampling_probabilities=['0.1, 0.1'] \
  ++model.pretrained_language_list=None \
  +model.optim.sched.warmup_steps=500 \
  ~model.optim.sched.warmup_ratio \
  exp_manager.resume_if_exists=True \
  exp_manager.resume_ignore_no_checkpoint=True \
  exp_manager.create_checkpoint_callback=True \
  exp_manager.checkpoint_callback_params.monitor=val_sacreBLEU_avg \
  exp_manager.checkpoint_callback_params.mode=max \
  exp_manager.checkpoint_callback_params.save_top_k=5 \
  +exp_manager.checkpoint_callback_params.save_best_model=true
```
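Before launching training, it can help to check whether the pretrained checkpoint's config actually has a top-level tokenizer key, since that is the key the script reads. A .nemo file is an ordinary tar archive containing the model config. The archive and YAML contents below are a self-contained toy stand-in, not the real checkpoint; with an actual .nemo file you would point tar at the downloaded checkpoint instead:

```shell
# Build a toy .nemo-style archive so the commands below run anywhere.
# Real NeMo checkpoints bundle the config as model_config.yaml.
mkdir -p demo
printf 'encoder_tokenizer:\n  library: sentencepiece\n' > demo/model_config.yaml
tar -cf demo/toy.nemo -C demo model_config.yaml

# List the archive members, extract the config, and look for a
# top-level 'tokenizer' key.
tar -tf demo/toy.nemo
tar -xf demo/toy.nemo -C demo model_config.yaml
grep '^tokenizer:' demo/model_config.yaml || echo "no top-level tokenizer key"
```

If the grep finds nothing, the config only defines per-side keys such as encoder_tokenizer, which is consistent with the error below.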

and it fails with this error:

```
Traceback (most recent call last):
  File "/opt/NeMo/examples/nlp/machine_translation/megatron_nmt_training.py", line 113, in main
    pretrained_cfg.encoder_tokenizer = pretrained_cfg.tokenizer
omegaconf.errors.ConfigAttributeError: Missing key tokenizer
    full_key: tokenizer
    object_type=dict
```
