It would be a good user experience improvement to expose FSDP2's `MixedPrecisionPolicy` so it can be set through the config, at least for `param_dtype` and `reduce_dtype`. These are important parameters when training in low precision (e.g. `bf16`), and right now they can only be changed by hardcoding them in `training.fully_shard`. See #2254 for why these parameters matter.
As a suggestion, we may want to hardcode `reduce_dtype=torch.float32` as the default. I don't think it reduces training speed (though we should check), and it helps with convergence / stability.
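A rough sketch of what this could look like on the recipe side (the `cfg.param_dtype` / `cfg.reduce_dtype` fields and the per-layer loop are just illustrative, not an existing API; in recent PyTorch releases `fully_shard` and `MixedPrecisionPolicy` are exposed under `torch.distributed.fsdp`, older ones under `torch.distributed._composable.fsdp`):

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import MixedPrecisionPolicy, fully_shard


def shard_model(model: nn.Module, cfg) -> nn.Module:
    """Apply FSDP2 sharding with a mixed-precision policy built from config.

    `cfg.param_dtype` / `cfg.reduce_dtype` are hypothetical config fields
    illustrating what this issue asks to expose.
    """
    mp_policy = MixedPrecisionPolicy(
        param_dtype=getattr(cfg, "param_dtype", torch.bfloat16),
        # fp32 gradient reductions by default, per the suggestion above
        reduce_dtype=getattr(cfg, "reduce_dtype", torch.float32),
    )
    # Shard each transformer block, then the root module
    # (assumes the model keeps its blocks in `model.layers`).
    for layer in model.layers:
        fully_shard(layer, mp_policy=mp_policy)
    fully_shard(model, mp_policy=mp_policy)
    return model
```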
Agreed! Our usual approach is to limit the scope of APIs until such time as there is reasonable demand. Based on your experiments, I'd say this is very reasonable :)