Generated several bin files instead of only one bin file using zero_to_fp32.py #155

Open
skx6 opened this issue Nov 1, 2024 · 1 comment

skx6 commented Nov 1, 2024

Great work! I trained a 13B model. However, when I run the following command, an error occurs:

python zero_to_fp32.py . ../pytorch_model.bin

The error is:

RuntimeError: Parent directory ../pytorch_model.bin does not exist.

Is the output of this step a file or a directory? Is this a problem with the DeepSpeed version or with the config?

Hishamew commented Nov 3, 2024

That's because the DeepSpeed version must be 0.15.2 or lower. A change made to zero_to_fp32.py in DeepSpeed 0.15.3 causes this problem.
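One way to check this before running the conversion is to compare the installed DeepSpeed version against the cutoff mentioned above. The sketch below is only an illustration. The 0.15.2 threshold is taken from this comment and has not been independently verified, and the helper names `parse_version` and `writes_single_file` are hypothetical, not part of DeepSpeed.

```python
import re

# Cutoff taken from the comment above (assumption, not verified against
# the DeepSpeed changelog): releases up to this one write a single file.
LAST_SINGLE_FILE_VERSION = (0, 15, 2)


def parse_version(version: str) -> tuple:
    """Turn '0.15.3' or '0.15.3+cu121' into (0, 15, 3), ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def writes_single_file(version: str) -> bool:
    """Return True if this version is at or below the assumed cutoff."""
    return parse_version(version) <= LAST_SINGLE_FILE_VERSION


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("deepspeed")
    except PackageNotFoundError:
        print("deepspeed is not installed in this environment")
    else:
        if writes_single_file(installed):
            print(f"deepspeed {installed}: at or below the 0.15.2 cutoff")
        else:
            print(f"deepspeed {installed}: newer than the 0.15.2 cutoff")
```

If the check reports a newer version, pinning with `pip install "deepspeed<=0.15.2"` should restore the older behavior described here.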
