Enable direct MX4→BF16 dequantization to reduce memory #5206

armandsauzay · 2025-12-09T19:54:37Z

Summary:
X-link: meta-pytorch/torchrec#3602

X-link: https://github.com/facebookresearch/FBGEMM/pull/2202

Add an `output_dtype` parameter to the MX4 dequantization stack to support
direct conversion to BF16/FP16, avoiding the expensive FP32 intermediate step.
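
To make the motivation concrete, here is a minimal pure-Python sketch, not the FBGEMM implementation. It assumes an OCP MX-style layout (4-bit E2M1 elements with one shared 8-bit exponent per group, bias 127). All function names are hypothetical. The memory model further assumes that the FP32 tensor and its BF16 cast are both alive at the peak. The sketch decodes one group and rounds the result to BF16 bit patterns. It then compares the peak output footprint of the two routes.

```python
import struct

# Assumed element layout (OCP MX-style FP4 E2M1): bit 3 is the sign,
# bits 0-2 index one of eight magnitudes.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
EXPONENT_BIAS = 127  # assumed bias of the shared 8-bit group exponent


def decode_mx4_group(codes, shared_exponent):
    """Decode 4-bit codes that share one biased exponent into Python floats."""
    scale = 2.0 ** (shared_exponent - EXPONENT_BIAS)
    values = []
    for code in codes:
        sign = -1.0 if code & 0x8 else 1.0
        values.append(sign * E2M1_MAGNITUDES[code & 0x7] * scale)
    return values


def float_to_bf16_bits(value):
    """Round a finite float to BF16 (nearest-even); return the 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Add 0x7FFF plus the lowest kept bit so that ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(bits16):
    """Widen a BF16 bit pattern back to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]


def peak_output_bytes(num_elements, via_fp32):
    """Bytes of output buffers alive at the peak (assumed model, see lead-in)."""
    bf16_bytes = 2 * num_elements
    fp32_bytes = 4 * num_elements if via_fp32 else 0
    return bf16_bytes + fp32_bytes


# Hypothetical group: four codes with shared exponent 128, i.e. a scale of 2.
group = decode_mx4_group([0x1, 0x7, 0xA, 0x0], shared_exponent=128)
direct = [bf16_bits_to_float(float_to_bf16_bits(v)) for v in group]
```

Under these assumptions the decoded values survive BF16 rounding unchanged in this example, because each one is a one-mantissa-bit magnitude times a power of two. The FP32 route holds 6 bytes per element at its peak versus 2 bytes for the direct route.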

Differential Revision: D87826479

meta-codesync · 2025-12-09T19:54:45Z

@armandsauzay has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87826479.

meta-cla bot added the cla signed label Dec 9, 2025

meta-codesync bot added fb-exported meta-exported labels Dec 9, 2025
