[QST] Copy Accumulator to GMEM directly? #1920
Comments
I have tried something like the below but it fails to compile with // ...
copy(gmem_tiled_copy_C, accum, tDgC);
Please see #1905.
@thakkarV Thanks for responding! My mistake for not giving enough information. My use case is actually different from that, as there is no vectorization. Here is the changed tiled copy.
auto gmem_tiled_copy_C = cute::make_tiled_copy(
    cute::Copy_Atom<cute::UniversalCopy<float>, float>{},
    cute::Layout<cute::Shape<cute::_16, cute::_8>>{},
    cute::Layout<cute::Shape<cute::_1, cute::_1>>{}); // 1x1 per thread, is this the problem?
Below are the layouts.
((_2,_2),_4,_4):((_1,_2),_4,_16) // accum
--------------------------------------------
TiledCopy // gmem_tiled_copy_C
Tiler_MN: (_16,_8)
TiledLayout_TV: (_128,_1):(_1,_0)
Copy_Atom
ThrID: _1:_0
ValLayoutSrc: (_1,_1):(_0,_1)
ValLayoutDst: (_1,_1):(_0,_1)
ValLayoutRef: (_1,_1):(_0,_1)
ValueType: 32b
--------------------------------------------
gmem_ptr[32b](0x420001800) o ((_1,_1),_8,_8):((_0,_0),16,1024) // tDgC
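To make the two printed layouts easier to compare, here is a minimal host-side sketch in plain C++ (not CuTe) that evaluates them by hand. `Layout4`, `accum_layout`, and `tDgC_layout` are hypothetical names introduced only for this illustration; the shapes and strides are copied from the dumps above, and the offset is the usual inner product of coordinate and stride.

```cpp
#include <array>
#include <cassert>

// A rank-4 flattening of a CuTe-style layout ((v0,v1),m,n):((s0,s1),sm,sn).
// Hypothetical helper for illustration only; it is not part of CuTe.
struct Layout4 {
    std::array<int, 4> shape;   // extents of (v0, v1, m, n)
    std::array<int, 4> stride;  // stride of each mode, in elements

    // Total number of coordinates the layout addresses.
    constexpr int size() const {
        return shape[0] * shape[1] * shape[2] * shape[3];
    }
    // Size of the first (value) mode: elements handled per (m, n) step.
    constexpr int values_per_step() const { return shape[0] * shape[1]; }
    // Inner product of a coordinate with the strides.
    constexpr int offset(int v0, int v1, int m, int n) const {
        return v0 * stride[0] + v1 * stride[1] + m * stride[2] + n * stride[3];
    }
};

// ((_2,_2),_4,_4):((_1,_2),_4,_16), the accum layout printed above.
constexpr Layout4 accum_layout{{2, 2, 4, 4}, {1, 2, 4, 16}};
// ((_1,_1),_8,_8):((_0,_0),16,1024), the tDgC layout printed above.
constexpr Layout4 tDgC_layout{{1, 1, 8, 8}, {0, 0, 16, 1024}};
```

Both layouts address 64 elements per thread, but accum carries 4 values per step over a 4x4 grid while tDgC carries 1 value per step over an 8x8 grid, so the two tensors do not line up mode by mode. Whether that mismatch is the exact cause of the compile error is not confirmed in this thread.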
If you don't care about vectorization, just drop the tiled copy. Partition the gmem tensor with the tiled MMA, then call copy from the partitioned rmem tensor (the accumulator) to the partitioned gmem tensor.
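The idea behind this advice can be sketched on the host in plain C++ (this is an emulation, not CuTe code). Partitioning the global tile with the MMA's own thread-to-value mapping gives each thread exactly the elements its accumulator registers hold, so the copy reduces to an element-wise store. The sketch assumes the m16n8 f32 accumulator fragment mapping from the PTX `mma` documentation; the thread does not name the MMA atom, so that choice is an assumption, as are all names below (`accum_coord`, `store_accum_direct`, `covers_tile_once`, `ldc`). In CuTe this would roughly correspond to calling `partition_C` on the thread's MMA slice and then `copy(accum, tCgC)`, with `tCgC` a hypothetical name for the partitioned gmem tensor.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int kTileM = 16;         // rows of one accumulator tile
constexpr int kTileN = 8;          // columns of one accumulator tile
constexpr int kWarpSize = 32;      // threads cooperating on the tile
constexpr int kValsPerThread = 4;  // accumulator registers per thread

struct Coord { int row; int col; };

// (row, col) inside the 16x8 tile owned by register v of a given lane.
// Assumed mapping: PTX m16n8 f32 accumulator fragment
// (row = lane/4, plus 8 for registers 2 and 3; col = 2*(lane%4) + (v&1)).
constexpr Coord accum_coord(int lane, int v) {
    return Coord{lane / 4 + 8 * (v / 2), 2 * (lane % 4) + (v % 2)};
}

using Fragments = std::vector<std::array<float, kValsPerThread>>;

// Emulates every lane storing its registers straight to a row-major
// "global" tile gC with leading dimension ldc, with no shared-memory stage.
inline void store_accum_direct(const Fragments& accum,
                               std::vector<float>& gC, int ldc) {
    for (int lane = 0; lane < kWarpSize; ++lane) {
        for (int v = 0; v < kValsPerThread; ++v) {
            const Coord c = accum_coord(lane, v);
            gC[c.row * ldc + c.col] = accum[lane][v];
        }
    }
}

// Checks that the mapping writes each tile element exactly once.
inline bool covers_tile_once() {
    std::vector<int> hits(kTileM * kTileN, 0);
    for (int lane = 0; lane < kWarpSize; ++lane) {
        for (int v = 0; v < kValsPerThread; ++v) {
            const Coord c = accum_coord(lane, v);
            ++hits[c.row * kTileN + c.col];
        }
    }
    for (int h : hits) {
        if (h != 1) return false;
    }
    return true;
}
```

Because every element of the tile is owned by exactly one (lane, register) pair, no extra tiled copy is needed to decide who writes what. The cost is that each thread writes only two adjacent floats per row, which is why the vectorized route discussed in the linked issue may still be preferable.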
@thakkarV Life saver, thanks a ton! It compiles now! Honestly, I would rather use vectorization, but I am following I know I can vectorize that layout by changing
For the vectorization "how to", you can follow the other issue I linked.
What is your question?
Hello! How do you copy the accumulator registers to global memory directly?
For example, in ampere_conv_kernel.h, how would we copy accum directly to gC, skipping the copy to sC? Thanks!