[QST] Differences between papers and cutlass implementations about streamk algorithm #1923

Closed
CalebDu opened this issue Nov 6, 2024 · 1 comment

CalebDu commented Nov 6, 2024

Image

In the Stream-K paper, a CTA for which ¬tile_started holds stores its partial sums to the global memory workspace, and a CTA for which ¬tile_ended holds accumulates partial sums from that workspace. The paper states: "Stream-K is better able to hide the latency of inter-CTA synchronization due to the temporal skew between writers and readers when sharing partial sums."

But then I read the Stream-K implementation in CUTLASS: https://github.com/NVIDIA/cutlass/blob/19f51596e8be9fe87d583616466581ab5740c19d/include/cutlass/gemm/kernel/gemm_universal_streamk.h#L968C1-L982C8
That section of code shows that a CTA with ¬tile_ended stores its partial sums to the global memory workspace, and a CTA with ¬tile_started accumulates partial sums from it. This is the exact opposite of the logic in the paper.
Image

In the CUTLASS implementation, CTA1 would stall on tile1 because it has to wait for CTA0 to finish its partial sum for tile1.
How can this discrepancy be explained?

@CalebDu CalebDu changed the title [QST] question about streamk implementation in cutlass [QST] Differences between papers and cutlass implementations about streamk algorithm Nov 6, 2024
CalebDu commented Nov 7, 2024

https://github.com/NVIDIA/cutlass/blob/d656afbd2a01112c0e4d90aafe0f8f78145c6585/include/cutlass/gemm/kernel/gemm_universal_streamk.h#L1063C1-L1063C75
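I figured it out: in the CUTLASS implementation, an SK block processes its iterations in reverse order, from iter_end to iter_begin. The CTA with ¬tile_started therefore reaches that tile last, so it is the one that accumulates the partial sums from the global memory workspace.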
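
To make the timing argument concrete, here is a small Python sketch of a toy model. It is not CUTLASS code. It assumes that all CTAs start together, that each MAC iteration takes one time step, and that the work divides evenly across CTAs. The names `Segment`, `split_work`, `finish_time`, `is_reader` and `min_reader_slack` are made up for this illustration. "Slack" is how many steps earlier the last writer finishes compared with the reader that needs its partial sums; a negative slack means the reader stalls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    cta: int      # CTA that owns this piece of a tile
    tile: int     # output tile index
    k_begin: int  # first k-iteration inside the tile (inclusive)
    k_end: int    # last k-iteration inside the tile (exclusive)
    g_begin: int  # same range in the global iteration space
    g_end: int


def split_work(num_tiles, iters_per_tile, num_ctas):
    """Give each CTA one contiguous, equal-sized range of global iterations."""
    total = num_tiles * iters_per_tile
    assert total % num_ctas == 0, "toy model: work must divide evenly"
    per_cta = total // num_ctas
    segs = []
    for cta in range(num_ctas):
        it, end = cta * per_cta, (cta + 1) * per_cta
        while it < end:
            tile = it // iters_per_tile
            base = tile * iters_per_tile
            seg_end = min(end, base + iters_per_tile)
            segs.append(Segment(cta, tile, it - base, seg_end - base, it, seg_end))
            it = seg_end
    return segs, per_cta


def finish_time(seg, per_cta, order):
    """Time step at which the owning CTA is done with this segment."""
    cta_begin = seg.cta * per_cta
    if order == "forward":                    # iter_begin -> iter_end (paper)
        return seg.g_end - cta_begin
    return cta_begin + per_cta - seg.g_begin  # iter_end -> iter_begin (reverse)


def is_reader(seg, iters_per_tile, rule):
    """True if this segment's CTA accumulates the peers' partial sums."""
    started = seg.k_begin == 0
    ended = seg.k_end == iters_per_tile
    if started and ended:
        return False                          # whole tile on one CTA, no sharing
    # "paper":   the CTA that started the tile reads, the others write.
    # "cutlass": the CTA that ends the tile reads, the others write.
    return started if rule == "paper" else ended


def min_reader_slack(num_tiles, iters_per_tile, num_ctas, order, rule):
    """Smallest (reader finish - writer finish) over all split tiles."""
    segs, per_cta = split_work(num_tiles, iters_per_tile, num_ctas)
    slack = None
    for r in segs:
        if not is_reader(r, iters_per_tile, rule):
            continue
        t_reader = finish_time(r, per_cta, order)
        for w in segs:
            if w.tile == r.tile and w.cta != r.cta:
                s = t_reader - finish_time(w, per_cta, order)
                slack = s if slack is None else min(slack, s)
    return slack


if __name__ == "__main__":
    for order, rule in [("forward", "paper"), ("reverse", "cutlass"), ("forward", "cutlass")]:
        print(order, rule, min_reader_slack(3, 4, 2, order, rule))
```

With 3 tiles, 4 iterations per tile and 2 CTAs, tile1 is split between CTA0 and CTA1. In this model the paper's rule with forward order and the CUTLASS rule with reverse order both leave the reader the same positive slack, while the CUTLASS rule combined with forward order (what I wrongly assumed at first) gives a negative slack, i.e. the stall described above.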

@CalebDu CalebDu closed this as completed Nov 7, 2024