-
There are a few places where the term "Pipeline" occurs in CUTLASS, all of them software techniques. One is the global memory -> shared memory software pipelining strategy used to hide memory latency; another is the Asynchronous Pipeline classes that help implement that strategy (this is the link shared in the original question); and finally there are the multi-stage pipelined kernels, which refers to how the kernel's MMA operations are pipelined. The overarching goal of all of these techniques is to maximize MMA throughput.

As for the warp scheduler, that is a hardware-level latency-hiding mechanism, and CUTLASS very much exploits it as well by keeping multiple threads ready to issue loads, stores, or MMA operations. That said, warp scheduling on its own is often not sufficient to maximize throughput, which is why the software pipelining is layered on top. Hope that helps.
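To make the first of those techniques concrete, here is a minimal double-buffering sketch in plain CUDA. This is not CUTLASS code: the kernel name, the `TILE` size, and the trivial reduction standing in for the "math" stage are illustrative placeholders. The point is only that the loads for the next tile are issued before the math on the current tile, so their latency overlaps with compute; real CUTLASS multi-stage kernels implement the same pattern with more stages, `cp.async`/TMA copies, and the Asynchronous Pipeline classes instead of a plain `__syncthreads()`.

```cuda
// Minimal sketch of global memory -> shared memory software pipelining:
// two shared-memory stages are double-buffered so the loads for tile k+1
// are in flight while tile k is being consumed. Placeholder names, not
// CUTLASS identifiers.
#define TILE 128

__global__ void pipelined_tile_sum(const float* __restrict__ gmem,
                                   float* __restrict__ out,
                                   int num_tiles)
{
    // Two shared-memory stages: fill one while computing on the other.
    __shared__ float smem[2][TILE];
    int tid = threadIdx.x;              // assumes blockDim.x == TILE, gridDim.x == 1

    // Prologue: bring tile 0 into stage 0 before the main loop starts.
    smem[0][tid] = gmem[tid];
    __syncthreads();

    float acc = 0.0f;
    for (int k = 0; k < num_tiles; ++k) {
        int cur = k & 1;                // stage holding the resident tile k
        int nxt = (k + 1) & 1;          // stage being filled for tile k+1

        // "Producer" work: issue the loads for the next tile first, so their
        // latency overlaps the math below instead of serializing with it.
        if (k + 1 < num_tiles) {
            smem[nxt][tid] = gmem[(k + 1) * TILE + tid];
        }

        // "Consumer" work: math on the tile that is already resident. In a
        // real GEMM this is the MMA stage and reads elements loaded by other
        // threads, which is why the barrier below matters.
        acc += smem[cur][tid];

        // This barrier plays the role that cp.async wait-groups or the
        // mbarrier-based Asynchronous Pipeline classes play in CUTLASS.
        __syncthreads();
    }

    out[tid] = acc;
}
```

With only two stages, the overlap is bounded by one tile's worth of load latency; the multi-stage kernels extend the same ring of shared-memory buffers to N stages so that several tiles' loads can be in flight at once.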
-
The moment I saw this doc, https://github.com/NVIDIA/cutlass/blob/main/media/docs/pipeline.md, I had this question: why do we need to implement a pipeline in software when the warp scheduler already keeps the compute resources busy for us? Can you list some aspects of pipelining that the warp scheduler cannot handle well, making a software pipeline necessary?
Also, in the example at the end of the above document, are the two producer threads and the one consumer thread in the same warp, or do they refer to three threads in three different warps?