-
There are a few places where the term "Pipeline" occurs in CUTLASS, all of them software techniques. One is the global memory -> shared memory software pipelining strategy used to hide memory latency; another is the Asynchronous Pipeline classes that help implement that strategy (this is the link shared in the original question); and finally there are the multi-stage pipelined kernels, which refers to how the kernel's MMA operations are pipelined. The overarching goal of all of these techniques is to maximize MMA throughput.

As for the warp scheduler, that is a hardware-level latency-hiding mechanism, and CUTLASS very much exploits it as well by keeping multiple threads ready to issue loads, stores, or MMA operations. That said, warp scheduling on its own is often not sufficient to maximize throughput, which is why the software pipelining is layered on top. Hope that helps.
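To make the first of those techniques concrete, here is a minimal double-buffering sketch in plain CUDA. This is not CUTLASS code: the kernel name, the `TILE` size, and the trivial reduction standing in for the "math" stage are illustrative placeholders. The point is only that the loads for the next tile are issued before the math on the current tile, so their latency overlaps with compute; real CUTLASS multi-stage kernels implement the same pattern with more stages, `cp.async`/TMA copies, and the Asynchronous Pipeline classes instead of a plain `__syncthreads()`.

```cuda
// Minimal sketch of global memory -> shared memory software pipelining:
// two shared-memory stages are double-buffered so the loads for tile k+1
// are in flight while tile k is being consumed. Placeholder names, not
// CUTLASS identifiers.
#define TILE 128

__global__ void pipelined_tile_sum(const float* __restrict__ gmem,
                                   float* __restrict__ out,
                                   int num_tiles)
{
    // Two shared-memory stages: fill one while computing on the other.
    __shared__ float smem[2][TILE];
    int tid = threadIdx.x;              // assumes blockDim.x == TILE, gridDim.x == 1

    // Prologue: bring tile 0 into stage 0 before the main loop starts.
    smem[0][tid] = gmem[tid];
    __syncthreads();

    float acc = 0.0f;
    for (int k = 0; k < num_tiles; ++k) {
        int cur = k & 1;                // stage holding the resident tile k
        int nxt = (k + 1) & 1;          // stage being filled for tile k+1

        // "Producer" work: issue the loads for the next tile first, so their
        // latency overlaps the math below instead of serializing with it.
        if (k + 1 < num_tiles) {
            smem[nxt][tid] = gmem[(k + 1) * TILE + tid];
        }

        // "Consumer" work: math on the tile that is already resident. In a
        // real GEMM this is the MMA stage and reads elements loaded by other
        // threads, which is why the barrier below matters.
        acc += smem[cur][tid];

        // This barrier plays the role that cp.async wait-groups or the
        // mbarrier-based Asynchronous Pipeline classes play in CUTLASS.
        __syncthreads();
    }

    out[tid] = acc;
}
```

With only two stages, the overlap is bounded by one tile's worth of load latency; the multi-stage kernels extend the same ring of shared-memory buffers to N stages so that several tiles' loads can be in flight at once.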
-
The moment I saw this doc, https://github.com/NVIDIA/cutlass/blob/main/media/docs/pipeline.md, I had this question: why do we need to implement a pipeline in software when the warp scheduler already keeps the compute resources busy for us? Can you list some aspects of pipelining that the warp scheduler cannot handle well, making a software pipeline necessary?
Also, in the example at the end of the above document, are the two producer threads and the one consumer thread in the same warp, or do they refer to three threads in three different warps?