Device-side grouped GEMM #1620
-
Hello! Is there a way to do a device-side grouped (or batched) GEMM? That is, my goal is to perform a set of GEMMs with non-uniform sizes (a flexible constraint, since we can pad with zeros) inside a kernel. I could serially invoke N separate GEMMs, but I'd prefer not to. Any suggestions for realizing the grouped GEMM, preferably for compute capability >= 70? Thanks.
-
By "device-side," do you mean that you'd like to perform the grouped-GEMM from within an existing kernel, rather than launching two separate kernels? |
Yes, that example ultimately uses this collective: https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized.hpp, which you can use directly in your code on the device side.
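For reference, here is a minimal sketch of how that collective is typically composed into a grouped-GEMM kernel, assuming the example in question follows the pattern of `examples/57_hopper_grouped_gemm`. The element types, tile/cluster shapes, alignments, and schedule tags below are illustrative assumptions and may need adjusting to your CUTLASS version:

```cpp
// Sketch only: SM90 grouped-GEMM type composition via the CollectiveBuilder,
// following the pattern of examples/57_hopper_grouped_gemm. Schedule tag names
// may differ slightly between CUTLASS 3.x releases.
#include "cutlass/cutlass.h"
#include "cutlass/gemm/group_array_problem_shape.hpp"
#include "cutlass/gemm/collective/collective_builder.hpp"
#include "cutlass/epilogue/collective/collective_builder.hpp"
#include "cutlass/gemm/kernel/gemm_universal.hpp"
#include "cutlass/gemm/device/gemm_universal_adapter.h"

// One (M, N, K) problem shape per group.
using ProblemShape = cutlass::gemm::GroupProblemShape<cute::Shape<int, int, int>>;

using ElementA = cutlass::half_t;  using LayoutA = cutlass::layout::RowMajor;
using ElementB = cutlass::half_t;  using LayoutB = cutlass::layout::ColumnMajor;
using ElementC = float;            using LayoutC = cutlass::layout::ColumnMajor;
using ElementAccumulator = float;
constexpr int AlignA = 8, AlignB = 8, AlignC = 4;   // in elements

using TileShape    = cute::Shape<cute::_128, cute::_128, cute::_64>;
using ClusterShape = cute::Shape<cute::_1, cute::_1, cute::_1>;

// Epilogue first, so its shared-storage size can be carved out of the mainloop stages.
using CollectiveEpilogue = typename cutlass::epilogue::collective::CollectiveBuilder<
    cutlass::arch::Sm90, cutlass::arch::OpClassTensorOp,
    TileShape, ClusterShape,
    cutlass::epilogue::collective::EpilogueTileAuto,
    ElementAccumulator, ElementAccumulator,
    ElementC, LayoutC *, AlignC,
    ElementC, LayoutC *, AlignC,
    cutlass::epilogue::PtrArrayNoSmemWarpSpecialized   // assumed ptr-array epilogue schedule
  >::CollectiveOp;

// This builder ultimately instantiates the
// sm90_mma_array_tma_gmma_ss_warpspecialized collective linked above.
using CollectiveMainloop = typename cutlass::gemm::collective::CollectiveBuilder<
    cutlass::arch::Sm90, cutlass::arch::OpClassTensorOp,
    ElementA, LayoutA *, AlignA,                       // pointer-to-layout => per-group strides
    ElementB, LayoutB *, AlignB,
    ElementAccumulator,
    TileShape, ClusterShape,
    cutlass::gemm::collective::StageCountAutoCarveout<
        static_cast<int>(sizeof(typename CollectiveEpilogue::SharedStorage))>,
    cutlass::gemm::KernelPtrArrayTmaWarpSpecializedCooperative
  >::CollectiveOp;

using GemmKernel = cutlass::gemm::kernel::GemmUniversal<
    ProblemShape, CollectiveMainloop, CollectiveEpilogue>;
using Gemm = cutlass::gemm::device::GemmUniversalAdapter<GemmKernel>;
```

Using the collective from within your own kernel would amount to reusing the `CollectiveMainloop` type above and driving it yourself (TMA descriptors, pipeline setup, and tile scheduling), along the lines of what the kernel layer under `cutlass/gemm/kernel` does.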