Grouped GEMM with split k #1536

masahi · 2024-05-12T19:39:22Z

Is grouped GEMM with split-K supported? I'm interested in both the 2.x and 3.x APIs.

The motivation is Meta's blog post https://pytorch.org/blog/accelerating-moe-model/#30-work-decomposition---splitk, which found that split-K works really well for the small-M problems typical of LLM inference.
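For readers unfamiliar with the term, here is a minimal CPU reference sketch of what "grouped GEMM with split-K" computes. It is plain Python, not CUTLASS code, and the names `gemm_slice` and `grouped_gemm_split_k` are hypothetical. Each problem in the group has its own shape, and its K dimension is cut into `split_k` slices whose partial products are summed at the end. On a GPU each slice would typically go to a separate threadblock, which is where the extra parallelism for small M comes from.

```python
def gemm_slice(A, B, k_begin, k_end):
    """Partial product of A (M x K) and B (K x N) over the K range [k_begin, k_end)."""
    M, N = len(A), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for k in range(k_begin, k_end):
            a = A[i][k]
            for j in range(N):
                C[i][j] += a * B[k][j]
    return C


def grouped_gemm_split_k(problems, split_k):
    """Compute C = A @ B for every (A, B) pair in `problems`.

    Shapes may differ between problems (the "grouped" part). Each problem's
    K dimension is divided into `split_k` slices (the "split-K" part).
    """
    outputs = []
    for A, B in problems:
        M, K, N = len(A), len(B), len(B[0])
        chunk = (K + split_k - 1) // split_k  # ceil(K / split_k)
        # One partial result per K slice; a slice past the end of K is empty.
        partials = [
            gemm_slice(A, B, s * chunk, min((s + 1) * chunk, K))
            for s in range(split_k)
        ]
        # Reduction step: sum the partial results elementwise.
        C = [[sum(p[i][j] for p in partials) for j in range(N)] for i in range(M)]
        outputs.append(C)
    return outputs


# Two problems with different shapes: (1x4)·(4x2) and (2x2)·(2x1).
problems = [
    ([[1, 2, 3, 4]], [[1, 0], [0, 1], [1, 0], [0, 1]]),
    ([[1, 2], [3, 4]], [[5], [6]]),
]
result = grouped_gemm_split_k(problems, split_k=2)
```

With floating-point inputs, splitting K changes the summation order, so results can differ slightly from a non-split GEMM.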

@hwu36

masahi · 2024-05-12T19:41:19Z

Sorry, I just found #692. It would still be good to hear an update on this issue, given the recent interest in small-M problems from the LLM domain. Mixtral and LoRA might benefit from this.

masahi · 2024-07-09T06:05:53Z

I just learned that TRT-LLM has implemented split-K grouped GEMM in https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/splitk_gemm_grouped.h
