
Expose FSDP2 MixedPrecisionPolicy params #2267

Open
EugenHotaj opened this issue Jan 14, 2025 · 1 comment
Assignees: joecummings
Labels: enhancement (New feature or request), triaged (This issue has been assigned an owner and appropriate label)

Comments

@EugenHotaj
Contributor

It would be a good user-experience improvement to expose the FSDP2 MixedPrecisionPolicy so it can be set through the config, at least for param_dtype and reduce_dtype. These are important parameters when training in low precision (e.g. bf16), and right now they can only be changed by hardcoding them in training.fully_shard. See #2254 for why these parameters matter.

As a suggestion, we may want to default reduce_dtype=torch.float32. I don't think it reduces training speed at all (though we should verify), and it helps with convergence/stability.
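
For concreteness, here is a minimal sketch (not torchtune's actual recipe code) of what plumbing these two dtypes from the config into FSDP2 could look like. The config keys `fsdp_param_dtype` / `fsdp_reduce_dtype` and the `shard_model` helper are hypothetical names for illustration only:

```python
# Hedged sketch: shows how the two dtypes would flow from a recipe config into
# FSDP2's MixedPrecisionPolicy. Config key names below are made up.
import torch
from torch.distributed._composable.fsdp import MixedPrecisionPolicy, fully_shard
# (newer PyTorch releases also expose these under torch.distributed.fsdp)

DTYPES = {"bf16": torch.bfloat16, "fp32": torch.float32}

def shard_model(model, cfg):
    # param_dtype follows the training dtype (e.g. bf16) so forward/backward run
    # in low precision, while reduce_dtype defaults to fp32 as suggested above so
    # gradient all-reduce/reduce-scatter stays in full precision for stability.
    mp_policy = MixedPrecisionPolicy(
        param_dtype=DTYPES[cfg.get("fsdp_param_dtype", "bf16")],
        reduce_dtype=DTYPES[cfg.get("fsdp_reduce_dtype", "fp32")],
    )
    # Shard each transformer layer, then the root module, as is typical for FSDP2.
    for layer in model.layers:
        fully_shard(layer, mp_policy=mp_policy)
    fully_shard(model, mp_policy=mp_policy)
    return model
```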

@joecummings joecummings added the triaged This issue has been assigned an owner and appropriate label label Jan 14, 2025
@joecummings joecummings self-assigned this Jan 14, 2025
@joecummings joecummings added the enhancement New feature or request label Jan 14, 2025
@joecummings
Contributor

Agreed! Our usual approach is to limit the scope of APIs until there is reasonable demand. Based on your experiments, I'd say this request is very reasonable :)
