Fix arch guards in a few examples #2567

alihassanijr · 2025-08-13T15:13:22Z

Some examples build their own kernel layer (FMHA, DistGEMM), and they must guard them according to the supported features.

Issue: #2559

Will close: #2559 #2558

CC @hwu36

Flamefire · 2025-08-14T09:21:15Z

I can confirm that applying this to 4.1.0 solves it as well as #2558 and fixes #2559

hwu36 · 2025-08-18T01:16:41Z

examples/77_blackwell_fmha/kernel/sm100_fmha_mla_tma_warpspecialized.hpp

@@ -507,6 +507,9 @@ struct Sm100FmhaMlaKernelTmaWarpspecialized {

   CUTLASS_DEVICE void operator()(Params const& params, char* smem_raw) {
+#if ! defined(__CUDA_ARCH_FEAT_SM100_ALL)

@alihassanijr, this only covers 100a but not 100f. You could take a look at the launch control header file for the 100f macro.

alihassanijr · 2025-08-18

Wouldn't using the family macros break builds with CTK 12.8 and earlier?

alihassanijr · 2025-08-18

Both CUTLASS_ARCH_MMA_SM100F_SUPPORTED and CUTLASS_ARCH_MMA_SM100F_ENABLED are conditioned on CTK >= 12.9, so Sm100 users on CTK 12.8 would end up with empty kernels.
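The concern above can be modeled as a small truth table. The following is a minimal host-only sketch, not CUTLASS code: `Toolkit`, `at_least`, `family_guard_enabled`, `arch_guard_enabled`, and `kernel_body_compiled` are hypothetical names introduced for illustration, and the assumption that the 100a guard is available on every toolkit that can target sm_100a is mine, not something stated in the thread.

```cpp
// Hypothetical model of the toolkit-version gating discussed in this thread.
// None of these names exist in CUTLASS.
struct Toolkit {
  int ver_major;
  int ver_minor;
};

// True when the toolkit version is at least ver_major.ver_minor.
constexpr bool at_least(Toolkit t, int ver_major, int ver_minor) {
  return t.ver_major > ver_major ||
         (t.ver_major == ver_major && t.ver_minor >= ver_minor);
}

// Family-specific (100f) guard: per the thread, only enabled with CTK >= 12.9.
constexpr bool family_guard_enabled(Toolkit t, bool targets_sm100f) {
  return at_least(t, 12, 9) && targets_sm100f;
}

// Arch-specific (100a) guard: assumed here to need no extra version check.
constexpr bool arch_guard_enabled(bool targets_sm100a) {
  return targets_sm100a;
}

// Combined guard: the kernel body is compiled if either guard is enabled.
constexpr bool kernel_body_compiled(Toolkit t, bool targets_sm100a,
                                    bool targets_sm100f) {
  return arch_guard_enabled(targets_sm100a) ||
         family_guard_enabled(t, targets_sm100f);
}
```

Under this model, guarding on the family macro alone leaves a CTK 12.8 build targeting sm_100a with an empty kernel, while the combined condition keeps the kernel body in place.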

hwu36 · 2025-08-20

Yes. So if CTK < 12.9, guard on 100a only; otherwise guard on 100a || 100f.

hwu36 · 2025-08-20

Or does CUDA_ARCH_FAMILY(1000) simply evaluate to false on 12.8?

alihassanijr · 2025-08-20

You're right; defined(CUTLASS_ARCH_MMA_SM100A_ENABLED) || defined(CUTLASS_ARCH_MMA_SM100F_ENABLED) should work, since both macros are already conditioned on the correct CTK compiler version.
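The shape of the guard agreed on here could look roughly like the sketch below. It is a host-only illustration rather than the code in this PR: `SKETCH_SM100_KERNEL_BODY_ENABLED`, `Status`, and `GuardedKernelSketch` are hypothetical names, and what the disabled branch should do (trap, print an error, or return silently) is an assumption. The two `CUTLASS_ARCH_MMA_*_ENABLED` macros are the ones named in the thread; the sketch only tests them and never defines them, so compiled on its own it takes the disabled branch.

```cpp
// Collapse the two CUTLASS macros from the thread into one local switch.
// In a real build they would be provided by CUTLASS; here they are only tested.
#if defined(CUTLASS_ARCH_MMA_SM100A_ENABLED) || defined(CUTLASS_ARCH_MMA_SM100F_ENABLED)
#define SKETCH_SM100_KERNEL_BODY_ENABLED 1
#else
#define SKETCH_SM100_KERNEL_BODY_ENABLED 0
#endif

// Hypothetical result type so the two branches are observable on the host.
enum class Status { kRan, kUnsupportedArch };

// Hypothetical stand-in for a kernel struct such as the one in the diff above.
struct GuardedKernelSketch {
  constexpr Status operator()(int /*params*/) const {
#if SKETCH_SM100_KERNEL_BODY_ENABLED
    // The arch-specific kernel body would go here.
    return Status::kRan;
#else
    // Assumption: report the unsupported target instead of compiling
    // instructions the target architecture does not provide.
    return Status::kUnsupportedArch;
#endif
  }
};
```

Keeping the condition in one place means the kernel body and any fallback stay in sync if the set of supported targets changes later.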

alihassanijr added 2 commits August 13, 2025 08:08

Fix arch guards in a few examples

546dddb

Some examples build their own kernel layer (FMHA, DistGEMM), and they must guard them according to the supported features. Issue: NVIDIA#2559

Add guards to example 77

4ff8179

The cmake config already guards targets by checking if we're building for SM100A (although unsure whether it's an exact match or not). For safety, it's best to just have the arch-specific MMA guards in place.

This was referenced Aug 13, 2025

[BUG] 88_hopper_fmha_fp8 example fails to compile on some CUDA archs #2559

Open

Add missing CUDA_ARCH guard for __nanosleep in example #2558

Open

hwu36 reviewed Aug 18, 2025

Use cutlass macros

34f8e21
