-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[Bug Fix]Bypass launch grids for SM120 Kernel with SM90 Mainloop & SM100 TileScheduler #2865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@Junkai-Wu I believe the current solution is only temporary, and in the future we should add |
include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp
Outdated
Show resolved
Hide resolved
include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp
Outdated
Show resolved
Hide resolved
Agreed. @HydraQYH Do you think if you can do the corresponding changes in this PR? I think just an empty const function with false return should be OK. |
… SM100 TileScheduler.
@Junkai-Wu Thank you for your suggestion, I will do it as soon as possible. |
42d17b3 to
4ece099
Compare
|
@Junkai-Wu I rebase code and just add There are no compilation errors anymore. And this PR is ready for review. |
|
@HydraQYH The changes look much cleaner now. Thanks for the quick action. Leave minor comments above. |
|
@HydraQYH I ran the internal pipeline with these changes and got a timeout issue of this unit test: https://github.com/NVIDIA/cutlass/blob/main/test/unit/gemm/device/sm120_tensorop_gemm/CMakeLists.txt#L51 The issue disappeared when I added the macro back on the |
|
@Junkai-Wu Sorry, I don't have an SM120 device, so all I can do is compile. However, I noticed that removing this macro in Pingpong causes the program to return immediately sometime: cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp Lines 803 to 814 in d4e16f5
I think this radical approach may be risky, and may result in some semaphores not being released yet. In contrast, other changes will not affect the program's execution. I've reverted the changes to the early stop part. Could you try this? |
Follow: #2719. Some SM120 kernels will use SM90 Cooperative/Pingpong Mainloop with SM100 TileScheduler. SM100 TileScheduler do not have
is_last_tilefunction. This can cause compilation errors when we remove unnecessary conditional compilation.In this PR, If SM120 blockscaled is detected at compile time, then bypass
launch_dependent_grids.