Commit
Limit unrolling of all circular buffered loops to depth equal to prefetch (#3627)

Currently, for dynamic shapes with circular buffered loops, we unroll the following loops to different depths:
- epilogue: supposedly `stages - 1`, but often emitted as a bare `#pragma unroll`, probably because the indexing pass uses `ensureStaticIndexing` and this loop always has a constant extent.
- main loop: unrolled as `#pragma unroll stages`.
- prologue: fully unrolled with `#pragma unroll`, similar to the epilogue.

This PR unrolls each of these loops explicitly with `#pragma unroll prefetch`, where `prefetch` is the circular buffering prefetch distance, which is usually set to `stages - 1`.

### Motivation

When using static shapes, as in the Fusions we receive from Thunder, I noticed that our matmul main loops are being fully unrolled (at least, full unrolling is requested, although the compiler likely does not fully unroll). For example, I have seen this:

```c++
#pragma unroll
for(nvfuser_index_t i68 = 0; i68 < 160; ++i68)
```

This particular kernel took 35 _seconds_ to compile. After this change, we will instead emit the following:

```c++
#pragma unroll 3
for(nvfuser_index_t i68 = 0; i68 < 160; ++i68)
```

and the compile time is under 400 ms with no change to kernel runtime.
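To illustrate the loop structure described above, here is a minimal host-side sketch of a circular buffered pipeline with a prologue, a main loop, and an epilogue, each annotated with the same unroll depth. It is a simplified model, not the code nvFuser generates: the names `circularBufferedSum`, `kStages`, `kPrefetch`, and the `UNROLL_BY_PREFETCH` macro are hypothetical, and the choice of 4 stages (so a prefetch distance of 3) is an assumption made to match the `#pragma unroll 3` example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: emit `#pragma unroll 3` only for compilers known to
// accept it (nvcc, clang); expand to nothing elsewhere so the sketch builds.
#if defined(__CUDACC__) || defined(__clang__)
#define UNROLL_BY_PREFETCH _Pragma("unroll 3")
#else
#define UNROLL_BY_PREFETCH
#endif

constexpr int kStages = 4;              // assumed number of circular buffer stages
constexpr int kPrefetch = kStages - 1;  // prefetch distance, usually stages - 1
static_assert(kPrefetch == 3, "the literal in UNROLL_BY_PREFETCH must match kPrefetch");

// Sums all tiles while staging them through a circular buffer of kStages slots.
// "Loading" a tile is modeled as a copy into the buffer; "computing" is the add.
long circularBufferedSum(const std::vector<int>& tiles) {
  const int n = static_cast<int>(tiles.size());
  assert(n >= kPrefetch);  // the sketch assumes at least kPrefetch tiles
  int buffer[kStages] = {};
  long sum = 0;

  // Prologue: load the first kPrefetch tiles before any compute starts.
  UNROLL_BY_PREFETCH
  for (int i = 0; i < kPrefetch; ++i) {
    buffer[i % kStages] = tiles[i];
  }

  // Main loop: load tile i + kPrefetch, then consume tile i.
  // The trip count can be large, so only a bounded unroll depth is requested.
  UNROLL_BY_PREFETCH
  for (int i = 0; i < n - kPrefetch; ++i) {
    buffer[(i + kPrefetch) % kStages] = tiles[i + kPrefetch];
    sum += buffer[i % kStages];
  }

  // Epilogue: consume the last kPrefetch tiles, which are already loaded.
  UNROLL_BY_PREFETCH
  for (int i = n - kPrefetch; i < n; ++i) {
    sum += buffer[i % kStages];
  }
  return sum;
}
```

Calling `circularBufferedSum` on a vector of 160 tiles mirrors the trip count in the example above: the main loop runs 157 iterations, while the prologue and epilogue each run 3.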
1 parent: 6143a6b, commit: e214d37
Showing 3 changed files with 26 additions and 19 deletions.