
Conversation

@nastya236 (Contributor) commented on Jan 20, 2026

Add a per-tensor fp32 scale for nvfp4 quantization.
Python API changes (a usage sketch follows this list):

  • quantize also returns the tensor_scale when mode == nvfp4
  • dequantize takes the tensor_scale as an input when mode == nvfp4
  • qqmm takes 4 arrays when the weights are provided already quantized and mode == nvfp4: the quantized weights, the scales, and the tensor_scale
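
A rough usage sketch of the API described above. The exact argument order, keyword names, and whether qqmm is exposed directly under mx are assumptions for illustration, not the final signatures:

```python
import mlx.core as mx

x = mx.random.normal((128, 256))
w = mx.random.normal((256, 512))

# quantize also returns the per-tensor fp32 scale in nvfp4 mode (assumed signature)
w_q, scales, tensor_scale = mx.quantize(w, mode="nvfp4")

# dequantize takes tensor_scale back as an input in nvfp4 mode (assumed signature)
w_hat = mx.dequantize(w_q, scales, tensor_scale, mode="nvfp4")

# qqmm with pre-quantized weights: quantized weights, scales, and tensor_scale (assumed signature)
y = mx.qqmm(x, w_q, scales, tensor_scale, mode="nvfp4")
```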

Changes to the quantization (see the sketch after this list):

  • Quantize::eval_gpu (quantize): two kernels are launched: 1) an all_reduce with an AbsMax reduction, 2) fp_quantize, which uses the computed absmax
  • Quantize::eval_gpu (dequantize): takes the absmax as an input and uses it in fp_dequantize
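
For intuition, a NumPy-level sketch of what the absmax pass produces. Only the two-pass structure (a global AbsMax reduction, then fp_quantize consuming it) comes from the description above; the constants (448, 6) and the exact scale formula follow the usual NVFP4 recipe and are an assumption here:

```python
import numpy as np

FP8_E4M3_MAX = 448.0   # assumed max of the fp8 block-scale format
FP4_E2M1_MAX = 6.0     # assumed max of the fp4 element format

def nvfp4_tensor_scale(w):
    # Pass 1: the all_reduce / AbsMax kernel computes the global absolute max.
    absmax = np.max(np.abs(w))
    # Per-tensor fp32 scale derived from it (assumed NVFP4 recipe).
    return absmax / (FP8_E4M3_MAX * FP4_E2M1_MAX)

w = np.random.randn(256, 512).astype(np.float32)
tensor_scale = nvfp4_tensor_scale(w)
# Pass 2: fp_quantize would use this absmax-derived scale when computing
# the per-block fp8 scales and the fp4 values (not reproduced here).
```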

Changes to qqmm (see the sketch after this list):

  • the inputs are quantized as described above
  • tensor_amax_x and tensor_amax_w are used to compute alpha for the BLAS operation. beta = 0 is also allocated on device, because cublasLt requires the alpha and beta pointers to both be on device or both on host
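
A sketch of the alpha computation, assuming alpha is the product of the two per-tensor dequant scales derived from tensor_amax_x and tensor_amax_w. Only the use of the two amax values and the device-side beta = 0 come from the description above; the formula and constants are assumptions:

```python
import numpy as np

FP8_E4M3_MAX = 448.0
FP4_E2M1_MAX = 6.0

def qqmm_alpha(tensor_amax_x, tensor_amax_w):
    # Per-tensor dequant scale for each operand (assumed NVFP4 recipe).
    scale_x = tensor_amax_x / (FP8_E4M3_MAX * FP4_E2M1_MAX)
    scale_w = tensor_amax_w / (FP8_E4M3_MAX * FP4_E2M1_MAX)
    # alpha passed to the cublasLt matmul; beta is a separate zero scalar that
    # must live in the same memory space (device or host) as alpha.
    return scale_x * scale_w

alpha = qqmm_alpha(np.float32(3.2), np.float32(1.7))
beta = np.float32(0.0)  # stand-in for the device-allocated zero described in the PR
```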

It still looks ugly.

@nastya236 closed this on Jan 20, 2026
@nastya236 reopened this on Jan 20, 2026
@nastya236 changed the title from "Tensor scale nvfp4" to "[WIP] Tensor scale nvfp4" on Jan 20, 2026
* w_q (array): The quantized version of ``w``
* scales (array): The quantization scales
* tensor_scale (array): The per-tensor float32 absolute max
A Member commented on the docstring quoted above:
This is a bit of a nit, but I don't really love the name tensor_scale. Partly because we call them array in MLX. What do you think about global_scale, or maybe array_scale?
