Replies: 2 comments
-
Sorry, just found #692. But it may still be good to hear an update on this issue, given the recent interest in small
-
I just learned that TRT-LLM has implemented split-K grouped GEMM in https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/splitk_gemm_grouped.h
-
Is grouped GEMM with split-K supported? I'm interested in both the 2.x and 3.x APIs.
The motivation is the blog post by Meta, https://pytorch.org/blog/accelerating-moe-model/#30-work-decomposition---splitk, which found that for the small M problems typical in LLM inference, split-K works really well. @hwu36
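To make the question concrete, here is a minimal reference sketch in plain Python of what split-K on a grouped GEMM computes. It is not CUTLASS or TRT-LLM code: the names `partial_gemm`, `splitk_gemm`, `grouped_splitk_gemm` and the `split_k` parameter are hypothetical, and it assumes row-major nested lists with K >= 1. Each problem in the group has its K dimension cut into `split_k` slices; every slice produces a partial M x N result, and the partials are then summed.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def partial_gemm(a: Matrix, b: Matrix, k_begin: int, k_end: int) -> Matrix:
    """Compute A[:, k_begin:k_end] @ B[k_begin:k_end, :] as an M x N matrix."""
    m, n = len(a), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for k in range(k_begin, k_end):
            a_ik = a[i][k]
            for j in range(n):
                out[i][j] += a_ik * b[k][j]
    return out


def splitk_gemm(a: Matrix, b: Matrix, split_k: int) -> Matrix:
    """Split the K dimension into `split_k` slices, then reduce the partials."""
    m, k, n = len(a), len(b), len(b[0])
    chunk = (k + split_k - 1) // split_k  # ceil(K / split_k)
    partials = []
    for s in range(split_k):
        lo = min(s * chunk, k)
        hi = min(lo + chunk, k)  # trailing slices may be empty
        # On a GPU, each slice would be an independent unit of work.
        partials.append(partial_gemm(a, b, lo, hi))
    # Reduction step: sum the partial results element-wise.
    out = [[0.0] * n for _ in range(m)]
    for p in partials:
        for i in range(m):
            for j in range(n):
                out[i][j] += p[i][j]
    return out


def grouped_splitk_gemm(
    problems: List[Tuple[Matrix, Matrix]], split_k: int
) -> List[Matrix]:
    """Apply split-K to every (A, B) pair; shapes may differ per problem."""
    return [splitk_gemm(a, b, split_k) for a, b in problems]
```

With small M there are few output tiles per problem, so slicing K is what exposes extra parallel work; the cost is the additional reduction over the partials.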