**Describe the bug**
I am pre-training a Mistral model from scratch with NeMo and have a checkpoint that was saved automatically by the trainer.
I want to convert it to the Hugging Face format, using the official export API:
```python
from pathlib import Path

from nemo.collections.llm import export_ckpt

if __name__ == "__main__":
    export_ckpt(
        path=Path("/workspace/model"),
        target="hf",
        output_path=Path("/workspace/model_hf"),
    )
```
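For anyone reproducing this, a quick sketch like the following (same path as in the script above) can confirm what the trainer actually saved before attempting the export; the listing itself is not reproduced here:

```python
from pathlib import Path

# Print the trainer-saved checkpoint layout, relative to the checkpoint
# root (same path as passed to export_ckpt above).
ckpt = Path("/workspace/model")
for p in sorted(ckpt.rglob("*")):
    print(p.relative_to(ckpt))
```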
I get the following error. This looks like a bug in NeMo's official code, since the checkpoint was saved automatically by the trainer.
```
[rank0]: Traceback (most recent call last):
[rank0]:   File "/workspace/export_ckpt.py", line 6, in <module>
[rank0]:     export_ckpt(
[rank0]:   File "/opt/NeMo/nemo/collections/llm/api.py", line 663, in export_ckpt
[rank0]:     output = io.export_ckpt(path, target, output_path, overwrite, load_connector)
[rank0]:   File "/opt/NeMo/nemo/lightning/io/api.py", line 229, in export_ckpt
[rank0]:     return exporter(overwrite=overwrite, output_path=_output_path)
[rank0]:   File "/opt/NeMo/nemo/lightning/io/connector.py", line 99, in __call__
[rank0]:     to_return = self.apply(_output_path)
[rank0]:   File "/opt/NeMo/nemo/collections/llm/gpt/model/mistral.py", line 202, in apply
[rank0]:     target = self.convert_state(source, target)
[rank0]:   File "/opt/NeMo/nemo/collections/llm/gpt/model/mistral.py", line 220, in convert_state
[rank0]:     return io.apply_transforms(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/opt/NeMo/nemo/lightning/io/state.py", line 180, in apply_transforms
[rank0]:     assert target_orig_dtypes == extract_dtypes(_target.named_parameters()), (
[rank0]: AssertionError: dtype mismatch between source and target state dicts. Left side is {}, Right side is {'model.embed_tokens.weight': torch.float32, 'model.layers.0.self_attn.q_proj.weight': torch.float32, 'model.layers.0.self_attn.k_proj.weight': torch.float32, 'model.layers.0.self_attn.v_proj.weight': torch.float32, 'model.layers.0.self_attn.o_proj.weight': torch.float32, .......
```
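For context, the failing assertion in `nemo/lightning/io/state.py` compares two dicts mapping parameter names to dtypes, and per the message above the left (source-side) snapshot is empty while the right (HF target) side is populated. A minimal sketch of the shape of that check, inferred from the traceback rather than NeMo's exact implementation:

```python
import torch.nn as nn


def extract_dtypes(named_parameters):
    """Map parameter name -> dtype, mirroring the assertion message above."""
    return {name: p.dtype for name, p in named_parameters}


if __name__ == "__main__":
    # Stand-in for the HF target model; the real right-hand side holds
    # entries such as 'model.embed_tokens.weight': torch.float32.
    target = nn.Linear(8, 8)
    left = {}  # what the source-side snapshot degenerates to in this bug
    right = extract_dtypes(target.named_parameters())
    print("match:", left == right)  # False -> the AssertionError above
```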
**Expected behavior**
The trained model is converted to the Hugging Face format without errors.
**Environment overview (please complete the following information)**
Official NeMo container (`nvcr.io/nvidia/nemo:24.12`).
+1, I am facing this as well.