Expand 7.2 example to have double buffer scheduling and scale preshuffle #912

Open
willghatch wants to merge 1 commit into main from users/willghatch/mxfp4-preshuffle-with-schedule

@willghatch
Contributor

Previously, 7.2 only did scale preshuffle and didn't build on the optimizations of 7.1. This adds a scheduled preshuffle kernel to 7.2, like the one in 7.1. However, the schedule is not identical: 7.2 does not use LDS memory for the scales, because the optimization that merges loads into vector loads does not yet work for LDS. So this includes a new schedule. It also adds a `--benchmark` flag that benchmarks and reports results for four kernels: vanilla mxfp4, scheduled, preshuffled (non-scheduled), and preshuffled scheduled.
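
For readers who want a picture of how a `--benchmark` switch like this could be wired up, here is a minimal sketch. It is not the code in this PR: the `run_*` functions, the `VARIANTS` table, and the choice of default variant are hypothetical stand-ins, and only the flag name and the four variant names come from the description above.

```python
import argparse

# Hypothetical stand-ins for the four kernel variants named in the description.
# The real kernels live in the 7.1 and 7.2 examples and are not reproduced here.
def run_vanilla():
    return "ran vanilla mxfp4"

def run_scheduled():
    return "ran scheduled"

def run_preshuffled():
    return "ran preshuffled (non-scheduled)"

def run_preshuffled_scheduled():
    return "ran preshuffled scheduled"

# Insertion order is preserved, so the variants run in this order.
VARIANTS = {
    "vanilla": run_vanilla,
    "scheduled": run_scheduled,
    "preshuffled": run_preshuffled,
    "preshuffled_scheduled": run_preshuffled_scheduled,
}

def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a benchmark switch")
    parser.add_argument(
        "--benchmark",
        action="store_true",
        help="run all four kernel variants instead of only the default one",
    )
    return parser

def main(argv=None):
    """Run the selected variants and return their names in execution order."""
    args = build_parser().parse_args(argv)
    # Assumption: without the flag, only the new kernel is run.
    names = list(VARIANTS) if args.benchmark else ["preshuffled_scheduled"]
    for name in names:
        VARIANTS[name]()
    return names
```

Calling `main(["--benchmark"])` runs all four stand-ins, while `main([])` runs only the default one.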

Signed-off-by: William G Hatch <william@hatch.uno>
@willghatch
Contributor Author

This is mostly just an LLM merging things. It imports kernels from 7.1 for the benchmarks to try to keep the comparison fair. Over a few benchmark runs on a shared machine, the scheduled preshuffle kernel typically wins, but there is enough jitter that any kernel's runtime can spike high enough to lose.
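
One common way to reduce the effect of this kind of jitter is to time each kernel several times after a warmup and compare medians rather than single runs. The sketch below illustrates that idea only; `time_kernel`, `rank_by_median`, and their parameters are hypothetical names, not part of this PR's benchmark.

```python
import statistics
import time

def time_kernel(fn, warmup=3, repeats=20):
    """Time fn over several runs and return min, median, and max in seconds.

    For GPU kernels, fn must block until the device has finished
    (for example by synchronizing); otherwise only launch time is measured.
    """
    for _ in range(warmup):
        fn()  # warm caches and trigger any lazy compilation before timing
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }

def rank_by_median(kernels, warmup=3, repeats=20):
    """Return kernel names sorted by median runtime, plus the raw stats."""
    stats = {
        name: time_kernel(fn, warmup=warmup, repeats=repeats)
        for name, fn in kernels.items()
    }
    order = sorted(stats, key=lambda name: stats[name]["median"])
    return order, stats
```

The median is less sensitive to an occasional spike than the mean, and the min/max spread gives a rough sense of how noisy the shared machine was during the run.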

@xintin
Contributor

xintin commented Feb 18, 2026

Just a thought: I think we should keep 7.2 as just scale preshuffle. The benchmarking script could be a separate one.
This would align with the current file structure.

@willghatch
Contributor Author

I'm fine with moving out the benchmark, it's just been convenient when working on it. But do you mean that you want the preshuffle scale + scheduling in a different file, too? Or leave it as it is here except to remove the benchmarking?

@xintin
Contributor

xintin commented Feb 18, 2026

I think moving the benchmark to a different file should be fine.
Right now, FWIU, each schedule x.y can have multiple tests.

req @panditsa to comment.

@willghatch
Contributor Author

@panditsa since you also made a similar PR to this one that merges the preshuffle and the scheduling, would you drop a link to yours here for reference? Then I'm happy to close this one.

2 participants