How to modify threadblock shape when A/B is transposed in GEMM? #1281

imoneoi · 2023-12-26T16:10:39Z


I wrote a custom kernel for grouped GEMM using CUTLASS. Should I transpose the threadblock, warp, and instruction shapes accordingly when A/B is transposed?

Also, how should I tune these shapes for optimal performance?


jackkosaian · 2023-12-27T12:46:36Z


Do you mean when there is an "internal transposition" due to the output being column major? If so, you should not need to change those template parameters.
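To illustrate what an "internal transposition" can look like, here is a minimal plain C++ sketch, not CUTLASS code. It relies on two facts: a column-major M x N buffer has the same memory layout as a row-major N x M buffer holding the transpose, and C^T = B^T * A^T. A column-major GEMM can therefore be served by a row-major routine with the operands swapped. The function names are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference row-major GEMM: C (M x N) = A (M x K) * B (K x N).
std::vector<float> gemm_row_major(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  int M, int N, int K) {
  assert(A.size() == static_cast<std::size_t>(M) * K);
  assert(B.size() == static_cast<std::size_t>(K) * N);
  std::vector<float> C(static_cast<std::size_t>(M) * N, 0.0f);
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < K; ++k)
        C[i * N + j] += A[i * K + k] * B[k * N + j];
  return C;
}

// Reference column-major GEMM: same math, every buffer stored column-major.
std::vector<float> gemm_col_major(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  int M, int N, int K) {
  std::vector<float> C(static_cast<std::size_t>(M) * N, 0.0f);
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < K; ++k)
        C[i + j * M] += A[i + k * M] * B[k + j * K];
  return C;
}

// Column-major GEMM expressed through the row-major routine.
// The column-major buffers are reinterpreted as row-major transposes, the
// operands are swapped, and M and N trade places. No data is moved and the
// inner loop structure (the analogue of the tile shape) is untouched.
std::vector<float> gemm_col_major_via_swap(const std::vector<float>& A_col,
                                           const std::vector<float>& B_col,
                                           int M, int N, int K) {
  return gemm_row_major(B_col, A_col, /*M=*/N, /*N=*/M, K);
}
```

Only the operand order and the problem dimensions change here, which is consistent with the advice above that the threadblock, warp, and instruction shape parameters can stay as they are.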

To tune for performance, try many valid combinations of threadblock shape, warp shape, and stage count. You may find it useful to follow some of the combinations listed by the CUTLASS kernel generator. For example, the generator lists a subset of valid combinations of threadblock shape, stage count, and warp count for an SM80 FP16 GEMM.
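As a rough sketch of how such a sweep could be set up, the following plain C++ enumerates candidate combinations and drops those that are clearly unusable. Everything here is an assumption made for illustration: `TileConfig`, `smem_bytes`, `is_plausible`, and `enumerate_candidates` are hypothetical helpers and not part of CUTLASS, the candidate values are arbitrary, and the shared-memory estimate counts only the A and B tiles of a multistage mainloop. The real validity rules are enforced by the CUTLASS templates themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one tuning candidate (not a CUTLASS type).
struct TileConfig {
  int tb_m, tb_n, tb_k;  // threadblock tile; the warp tile reuses tb_k
  int warp_m, warp_n;    // warp tile in M and N
  int stages;            // software pipeline depth
};

// Approximate shared memory needed to hold `stages` copies of the
// (tb_m x tb_k) A tile and the (tb_k x tb_n) B tile.
std::size_t smem_bytes(const TileConfig& c, std::size_t element_bytes) {
  assert(c.stages > 0);
  std::size_t per_stage = static_cast<std::size_t>(c.tb_m + c.tb_n) * c.tb_k;
  return per_stage * c.stages * element_bytes;
}

// Coarse filter: the warp tile must evenly divide the threadblock tile, the
// block must not exceed 1024 threads, and the tiles must fit in shared memory.
bool is_plausible(const TileConfig& c, std::size_t element_bytes,
                  std::size_t smem_limit) {
  if (c.tb_m % c.warp_m != 0 || c.tb_n % c.warp_n != 0) return false;
  int warps = (c.tb_m / c.warp_m) * (c.tb_n / c.warp_n);
  if (warps * 32 > 1024) return false;
  return smem_bytes(c, element_bytes) <= smem_limit;
}

// Build the list of candidates to instantiate and benchmark.
std::vector<TileConfig> enumerate_candidates(std::size_t element_bytes,
                                             std::size_t smem_limit) {
  const int tb_mn[] = {64, 128, 256};
  const int tb_ks[] = {32, 64};
  const int warp_mn[] = {32, 64};
  const int stage_counts[] = {3, 4, 5};
  std::vector<TileConfig> out;
  for (int tb_m : tb_mn)
    for (int tb_n : tb_mn)
      for (int tb_k : tb_ks)
        for (int warp_m : warp_mn)
          for (int warp_n : warp_mn)
            for (int stages : stage_counts) {
              TileConfig c{tb_m, tb_n, tb_k, warp_m, warp_n, stages};
              if (is_plausible(c, element_bytes, smem_limit)) out.push_back(c);
            }
  return out;
}
```

For FP16 inputs, `element_bytes` would be 2, and `smem_limit` could be taken from the device properties (for example, the `sharedMemPerBlockOptin` field of `cudaDeviceProp`). Each surviving candidate would then be instantiated as a kernel and timed on the problem sizes of interest.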

2 replies

imoneoi Dec 28, 2023
Author

Thank you for your response! What I mean is the case where input matrix A or B is transposed (column major) while C is always row major. Also, if an input becomes column major, does the best-performing threadblock shape, warp shape, or stage count change?

jackkosaian Dec 28, 2023

Thanks for clarifying. A column-major input is unlikely to alter the best-performing combination of these parameters.

Answer selected by imoneoi
