The CUTLASS GemmDevice Operator contains compile-time attributes (both functional and performance attributes). The GemmDevice Operator is consumed by GemmOperation[3xBase]. In the past, I have found some of the values in this data structure to be incorrect, and sometimes missing entirely.
For example, consider the kernel whose full procedural name is cutlass3x_sm90_tensorop_s64x128x32gemm_e4m3_e4m3_f32_bf16_bf16_128x128x128_1x2x1_0_tnn_align16_warpspecialized_cooperative_epi_tma
This kernel has the following functional and performance attributes:
Mainloop Kind : warpspecialized_cooperative // This is missing from GemmDescription
Epilogue Kind : epi_tma // This is missing from GemmDescription
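Both attributes are already encoded in the tail of the procedural name, so a CPU-only check could cross-reference the name against whatever GemmDescription ends up reporting. The following is a minimal sketch, not CUTLASS code: the helper `ends_with` and the constants `kKernelName` and `kExpectedTail` are hypothetical names introduced here for illustration, and it assumes C++17.

```cpp
#include <string_view>

// Hypothetical helper (not part of CUTLASS): true if `s` ends with `suffix`.
constexpr bool ends_with(std::string_view s, std::string_view suffix) {
  return s.size() >= suffix.size() &&
         s.substr(s.size() - suffix.size()) == suffix;
}

// The procedural name quoted in this issue.
constexpr std::string_view kKernelName =
    "cutlass3x_sm90_tensorop_s64x128x32gemm_e4m3_e4m3_f32_bf16_bf16_"
    "128x128x128_1x2x1_0_tnn_align16_warpspecialized_cooperative_epi_tma";

// Assumed layout: the name ends with "_<mainloop kind>_<epilogue kind>".
constexpr std::string_view kExpectedTail =
    "_warpspecialized_cooperative_epi_tma";

// Compile-time check; no GPU is needed.
static_assert(ends_with(kKernelName, kExpectedTail),
              "procedural name should end with mainloop and epilogue kinds");
```

A real test would derive the expected tail from the values stored in GemmDescription rather than from a hard-coded string.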
I have added a test so that someone at NVIDIA can start on this. Could you please uncomment the two lines and add whatever is needed to fix this?
As a model, you can follow any other enum that is lifted up to the GemmDevice Operator from the internal templates and used to set the data members of the GemmDescription class.
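The lifting pattern described above could look roughly like the sketch below. Every name in it (`MainloopKind`, `EpilogueKind`, `ExampleKernel`, `ExampleDeviceOperator`, `ExampleGemmDescription`, `make_description`) is a hypothetical stand-in chosen for illustration; the actual CUTLASS enums, members, and class layouts may differ.

```cpp
// Hypothetical enums for the two attributes missing from the description.
enum class MainloopKind { kUnknown, kWarpSpecializedCooperative };
enum class EpilogueKind { kUnknown, kTma };

// Stand-in for an internal kernel template that knows its own schedules.
struct ExampleKernel {
  static constexpr MainloopKind kMainloopKind =
      MainloopKind::kWarpSpecializedCooperative;
  static constexpr EpilogueKind kEpilogueKind = EpilogueKind::kTma;
};

// Stand-in for the device-level operator: it lifts the kernel's
// compile-time attributes so that consumers need not inspect the kernel.
template <class Kernel>
struct ExampleDeviceOperator {
  static constexpr MainloopKind kMainloopKind = Kernel::kMainloopKind;
  static constexpr EpilogueKind kEpilogueKind = Kernel::kEpilogueKind;
};

// Stand-in for the description object; fields default to "unknown",
// which is what a consumer sees when nothing populates them.
struct ExampleGemmDescription {
  MainloopKind mainloop_kind = MainloopKind::kUnknown;
  EpilogueKind epilogue_kind = EpilogueKind::kUnknown;
};

// Stand-in for the consumer that fills the description from the operator.
template <class Operator>
constexpr ExampleGemmDescription make_description() {
  return ExampleGemmDescription{Operator::kMainloopKind,
                                Operator::kEpilogueKind};
}

// CPU-only, compile-time check that the lifted values reach the description.
constexpr ExampleGemmDescription kDesc =
    make_description<ExampleDeviceOperator<ExampleKernel>>();
static_assert(kDesc.mainloop_kind == MainloopKind::kWarpSpecializedCooperative,
              "mainloop kind must be propagated");
static_assert(kDesc.epilogue_kind == EpilogueKind::kTma,
              "epilogue kind must be propagated");
```

Because everything is resolved at compile time, a check of this shape fails the build rather than a GPU run when an attribute is dropped along the way.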
We can then commit this test, add more tests for Hopper, and make sure this class is also covered for Blackwell. The tests are CPU-only and should not take much time in CI, so they will let us catch bugs like this one.