MXFP4 Workgroup Reordering Based on SP3 Kernel #914

Open
suryajasper wants to merge 4 commits into iree-org:main from suryajasper:mxfp4-workgroup-reordering

@suryajasper suryajasper commented Feb 19, 2026

MXFP4 Workgroup Reordering

Remaps the workgroup grid into groups of 32 tiles along the N dimension. The first 32 workgroups in launch order are assigned along N on M row 0, the next 32 on M row 1, and so on through the last M row. The next 32 workgroups then wrap around to row 0 and start the next group of 32 tiles along N.

With this grouping, consecutive workgroups reuse the same row slices of A instead of spreading over a larger, sparse footprint. This improves L2 cache reuse, since workgroups that run close together in launch order work on the same row of output tiles.

The index remapping distinguishes two cases:

  • Main case: forms full groups of 32 consecutive tiles along the N dimension.
  • Tail case: if the number of tiles along N is not a multiple of the group size (32), forming as many full groups as possible leaves a small rectangle in the launch grid, which is filled with the trailing workgroups in row-major order.
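
To make the two cases concrete, here is a minimal Python sketch of the mapping as described above. It is only an illustration of the description, not the code in this PR: the function name `remap_workgroup`, its parameters, and the assumption that the input is a linear workgroup id in launch order are all hypothetical.

```python
def remap_workgroup(linear_id, num_m_tiles, num_n_tiles, group_size=32):
    """Map a launch-order workgroup id to an (m_tile, n_tile) pair.

    Hypothetical helper: names and signature are assumptions made for
    illustration, not taken from the PR.
    """
    total = num_m_tiles * num_n_tiles
    if not 0 <= linear_id < total:
        raise ValueError("linear_id is outside the launch grid")

    full_groups = num_n_tiles // group_size
    # Number of workgroups covered by the full groups (main case).
    main_count = full_groups * group_size * num_m_tiles

    if linear_id < main_count:
        # Main case: each group spans group_size N tiles across all M rows.
        group, within = divmod(linear_id, group_size * num_m_tiles)
        m_tile, n_in_group = divmod(within, group_size)
        return m_tile, group * group_size + n_in_group

    # Tail case: the leftover rectangle is filled in row-major order.
    tail_width = num_n_tiles - full_groups * group_size
    m_tile, n_in_tail = divmod(linear_id - main_count, tail_width)
    return m_tile, full_groups * group_size + n_in_tail
```

For example, with 3 tiles along M and 70 along N, ids 0 to 31 land on row 0 of the first group, id 32 moves to row 1 of the same group, id 96 starts the second group at row 0, and ids from 192 onward fill the 6-tile-wide tail rectangle row by row. The mapping stays a bijection, so every output tile is still assigned exactly once.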

@suryajasper suryajasper marked this pull request as ready for review February 21, 2026 00:30
Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>
Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>
Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>
Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>
@suryajasper suryajasper force-pushed the mxfp4-workgroup-reordering branch from f3953d5 to 7b5735b Compare February 21, 2026 03:26