Fix arch guards in a few examples #2567
base: main
Some examples (FMHA, DistGEMM) build their own kernel layer, so they must guard it according to the supported architecture features. Issue: NVIDIA#2559
The CMake config already guards these targets by checking whether we're building for SM100A (although I'm unsure whether that check requires an exact match). For safety, it's best to have the arch-specific MMA guards in place as well.
@@ -507,6 +507,9 @@ struct Sm100FmhaMlaKernelTmaWarpspecialized {
  CUTLASS_DEVICE void operator()(Params const& params, char* smem_raw) {
#if ! defined(__CUDA_ARCH_FEAT_SM100_ALL)
@alihassanijr, this only covers 100a, not 100f. You could take a look at the launch control header file for the 100f macro.
Wouldn't using the family macros break builds with CTK 12.8 and earlier?
Both CUTLASS_ARCH_MMA_SM100F_SUPPORTED and CUTLASS_ARCH_MMA_SM100F_ENABLED are conditioned on CTK >= 12.9, so Sm100 users with CTK 12.8 will wind up with empty kernels.
Yes. So if CTK < 12.9, guard on 100a; otherwise guard on 100a || 100f.
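The rule proposed in this comment can be written out as a small truth table. The sketch below is plain host C++ and only models the decision; `example_toolkit_at_least` and `example_guard_passes` are hypothetical names invented for illustration, not CUTLASS or CUDA APIs, and the boolean inputs stand in for whatever the compiler would report about the target.

```cpp
// Hypothetical model of the rule "if CTK < 12.9, 100a; else 100a || 100f".
// Nothing here is a CUTLASS or CUDA API; all names are illustrative only.

// True when the toolkit version (major.minor) is at least want_major.want_minor.
constexpr bool example_toolkit_at_least(int major, int minor,
                                        int want_major, int want_minor) {
  return major > want_major || (major == want_major && minor >= want_minor);
}

// True when the kernel body should be compiled in for this toolkit and target.
constexpr bool example_guard_passes(int ctk_major, int ctk_minor,
                                    bool is_100a, bool is_100f) {
  return example_toolkit_at_least(ctk_major, ctk_minor, 12, 9)
             ? (is_100a || is_100f)  // family targets are considered from 12.9 on
             : is_100a;              // older toolkits: only the 100a check applies
}

// CTK 12.8 building for 100a keeps its kernel body.
static_assert(example_guard_passes(12, 8, true, false), "12.8 + 100a");
// CTK 12.9 building for 100f also gets the kernel body.
static_assert(example_guard_passes(12, 9, false, true), "12.9 + 100f");
```

Under this model a 12.8 build is never asked about the family macro at all, which is the property the comment is after.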
Or is CUDA_ARCH_FAMILY(1000) just false in 12.8?
You're right; defined(CUTLASS_ARCH_MMA_SM100A_ENABLED) || defined(CUTLASS_ARCH_MMA_SM100F_ENABLED) should work -- they're already conditioned on the correct CTK compiler version.
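For reference, the guard agreed on here might look roughly like the sketch below. It is host-compilable C++ so it can be tried without nvcc. `EXAMPLE_SM100_KERNEL_ENABLED` and `example_kernel_body` are hypothetical names standing in for the example kernel's `operator()`; only the two `CUTLASS_ARCH_MMA_SM100*_ENABLED` macros come from the thread, and in a real build they would be defined (or not) by CUTLASS's headers rather than by this file.

```cpp
#include <cassert>

// Collapse the two feature macros from the thread into one local flag.
// EXAMPLE_SM100_KERNEL_ENABLED is a hypothetical name used only in this sketch.
#if defined(CUTLASS_ARCH_MMA_SM100A_ENABLED) || defined(CUTLASS_ARCH_MMA_SM100F_ENABLED)
#  define EXAMPLE_SM100_KERNEL_ENABLED 1
#else
#  define EXAMPLE_SM100_KERNEL_ENABLED 0
#endif

// Stand-in for the kernel entry point. Returns true if the real body was
// compiled in, false if the build produced the empty fallback.
inline bool example_kernel_body(int* out) {
  assert(out != nullptr);
#if EXAMPLE_SM100_KERNEL_ENABLED
  *out = 42;   // placeholder for the arch-specific work
  return true;
#else
  (void)out;   // unsupported target: leave the output untouched
  return false;
#endif
}
```

Compiled with neither macro defined, `example_kernel_body` leaves its argument untouched and returns false, which mirrors the "empty kernel" outcome discussed above. Passing `-DCUTLASS_ARCH_MMA_SM100A_ENABLED=1` on the command line switches the body on.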
Will close: #2559 #2558
CC @hwu36