What are the advantages of flashinfer.sampling over the torch version? #1266
Closed
CSEEduanyu
started this conversation in
General
Replies: 1 comment
-
FlashInfer implements the sampling algorithms in a way that does not require sorting, which avoids writing to global memory. The fused kernel also reduces kernel launch overhead: the original torch implementation launches dozens of CUDA kernels, which is inefficient.
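To make the "no sorting" point concrete, here is a minimal CPU sketch in plain Python of one sort-free way to do top-p sampling, using rejection sampling against a rising pivot. This is an illustration under assumptions, not FlashInfer's actual kernel: the function names (`top_p_sorted`, `sample_above`, `top_p_sort_free`) are hypothetical, and on a GPU the sums would be fused parallel reductions rather than Python loops.

```python
import random


def top_p_sorted(probs, top_p):
    """Reference: find the top-p (nucleus) token set by sorting."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return keep


def sample_above(probs, pivot, rng):
    """Inverse-transform sample among tokens whose probability exceeds pivot."""
    total = sum(p for p in probs if p > pivot)
    u = rng.random() * total
    acc, last = 0.0, None
    for i, p in enumerate(probs):
        if p > pivot:
            acc += p
            last = i
            if u < acc:
                return i
    return last  # guard against floating-point rounding


def top_p_sort_free(probs, top_p, rng):
    """Hypothetical sort-free top-p sampler (rejection sampling).

    A token belongs to the nucleus exactly when the total mass of strictly
    more probable tokens is below top_p, so membership can be checked with
    a sum instead of a sort. Assumes no ties in probs.
    """
    pivot = 0.0
    while True:
        i = sample_above(probs, pivot, rng)
        mass_above = sum(p for p in probs if p > probs[i])
        if mass_above < top_p:
            return i           # accepted: token is inside the nucleus
        pivot = probs[i]       # rejected: exclude it and everything below


probs = [0.5, 0.3, 0.15, 0.05]
rng = random.Random(0)
draws = [top_p_sort_free(probs, 0.7, rng) for _ in range(20000)]
```

Every step here is a reduction (a sum) plus a comparison, which is the kind of work a single fused kernel can keep on-chip. A sort-based version instead has to materialize a sorted copy of the whole vocabulary distribution first.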