support blockwise fp8 matmul kernel #3267

Open · wants to merge 1 commit into main from support-fp8-blockwise
Conversation

yizhang2077 (Collaborator)

Motivation

Support an FP8 blockwise GEMM kernel (currently only scale_a block shape 1x128 and scale_b block shape 128x128, as used by DeepSeek V3), adapted mainly from vLLM; a reference sketch of the blockwise scaling semantics follows below.
Correctness:
python3 tests/test_fp8_blockwise_gemm.py
Benchmark:
python3 benchmark/bench_fp8_blockwise_gemm.py --models meta-llama/Llama-3.1-8B-Instruct
TODO: update benchmark results.
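
For context, here is a minimal PyTorch sketch of the blockwise scaling semantics: activations carry one scale per 1x128 block and weights one scale per 128x128 block, and the result is equivalent to dequantizing both operands and doing a plain matmul. This is illustrative only (not the CUTLASS kernel this PR adds), and the function and argument names are hypothetical.

```python
import torch

def ref_blockwise_fp8_gemm(a_fp8, a_scale, b_fp8, b_scale, block_n=128, block_k=128):
    """Dequantize-then-matmul reference for blockwise-scaled FP8 GEMM.

    a_fp8:   [M, K] float8_e4m3fn activations
    a_scale: [M, K // block_k]            one scale per 1x128 block of A
    b_fp8:   [N, K] float8_e4m3fn weights
    b_scale: [N // block_n, K // block_k] one scale per 128x128 block of B
    """
    # Expand each block scale over its block, then compute C = (A * s_a) @ (B * s_b)^T in fp32.
    a = a_fp8.to(torch.float32) * a_scale.repeat_interleave(block_k, dim=1)
    b = b_fp8.to(torch.float32) * b_scale.repeat_interleave(block_n, dim=0).repeat_interleave(block_k, dim=1)
    return (a @ b.t()).to(torch.bfloat16)
```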

Modifications

Checklist

Member


@yizhang2077 Delete all of these; we should use the version directly from the 3rdparty CUTLASS instead.
BTW, @BBuf will integrate a higher-performance version than this baseline, so this PR won't be used directly.

yizhang2077 force-pushed the support-fp8-blockwise branch from 0cee1ef to 7b9ee19 on February 3, 2025 at 10:40.
# Clamp to the int8 range, round to nearest, and cast.
return torch.round(tensor.clamp(min=-128, max=127)).to(dtype=torch.int8)


WEIGHT_SHAPES = {
Member


I think DeepSeek V3 is sufficient; other models use per-tensor FP8, so we don't need a benchmark in that form.
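
To make the distinction concrete: per-tensor FP8 uses a single scalar scale for the whole tensor, while the blockwise scheme used by DeepSeek V3 keeps one scale per 1x128 block. A rough PyTorch sketch, with names that are illustrative only and not part of this PR:

```python
import torch

FP8 = torch.float8_e4m3fn
FP8_MAX = torch.finfo(FP8).max

def per_tensor_fp8_quant(x: torch.Tensor):
    # One scale for the whole tensor.
    scale = x.abs().max().clamp(min=1e-12) / FP8_MAX
    return (x / scale).clamp(-FP8_MAX, FP8_MAX).to(FP8), scale

def blockwise_fp8_quant(x: torch.Tensor, block: int = 128):
    # One scale per 1 x `block` tile along the last dimension (assumes K divisible by `block`).
    M, K = x.shape
    xb = x.view(M, K // block, block)
    scale = xb.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / FP8_MAX
    q = (xb / scale).clamp(-FP8_MAX, FP8_MAX).to(FP8)
    return q.view(M, K), scale.squeeze(-1)  # scale shape: [M, K // block]
```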
