I noticed this:
"Note: The NVIDIA Blackwell SM100 architecture used in the datacenter products has a different compute capability than the one underpinning NVIDIA Blackwell GeForce RTX 50 series GPUs. As a result, kernels compiled for Blackwell SM100 architecture with arch conditional features (using sm100a) are not compatible with RTX 50 series GPUs."
When building the CUTLASS examples, I tried both -DCUTLASS_NVCC_ARCHS="100a" and -DCUTLASS_NVCC_ARCHS="100".
When setting it to "100", examples such as 70_blackwell_gemm disappeared from the Makefile.
Does this mean that non-datacenter sm_100 Blackwell GPUs do not have the new TensorCore features? If so, do they fall back to Hopper? Can I use the Hopper TensorCore examples to get max TFLOPS on sm_100 GPUs? Or does this mean CUTLASS currently only supports sm_100a TensorCore operations?
Thank you so much!
All kernels we have released in 3.8 can be run on 100 or 101. Just compile with the right CUTLASS_NVCC_ARCHS set. SM120a support will come in a future release.
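For anyone scripting their builds, here is a rough sketch of what "compile with the right arch set" could look like. It is not part of CUTLASS: `ARCH_BY_CAPABILITY`, `cutlass_arch_for`, and `cmake_configure_args` are hypothetical names, the source directory `..` is a placeholder, and the "101a" entry is my assumption by analogy with "100a". The table only encodes what this thread says about 3.8.

```python
# Hypothetical helper, not part of CUTLASS: picks a CUTLASS_NVCC_ARCHS value
# from a CUDA compute capability, based only on what this thread states for 3.8.
ARCH_BY_CAPABILITY = {
    (10, 0): "100a",  # datacenter Blackwell; "a" enables arch conditional features
    (10, 1): "101a",  # assumption: same "a" suffix convention as 100a
}


def cutlass_arch_for(major, minor):
    """Return the arch string, or None if the thread lists no 3.8 support."""
    return ARCH_BY_CAPABILITY.get((major, minor))


def cmake_configure_args(major, minor, source_dir=".."):
    """Build the cmake configure command as an argument list."""
    arch = cutlass_arch_for(major, minor)
    if arch is None:
        # e.g. compute capability 12.0 (SM120a), which the reply says is
        # planned for a future release
        raise ValueError(f"no arch listed in this thread for {major}.{minor}")
    return ["cmake", source_dir, f"-DCUTLASS_NVCC_ARCHS={arch}"]


if __name__ == "__main__":
    print(" ".join(cmake_configure_args(10, 0)))
```

Passing the compute capability in by hand keeps the sketch free of any GPU dependency. In a real script you would get it from the device you are targeting.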