What are the advantages of flashinfer.sampling over the torch version? #1266

CSEEduanyu · 2024-08-30T06:34:18Z

What are the advantages of flashinfer.sampling over the torch version?

yzh119 · 2024-09-06T09:42:57Z

FlashInfer implements the sampling algorithms in a way that does not require sorting, which avoids writing intermediate results to global memory.

The fused kernel also reduces kernel launch overhead: the original torch implementation launches dozens of CUDA kernels, which is inefficient.
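To make the "no sorting" point concrete, here is a minimal CPU sketch in plain Python of one sorting-free way to do top-p (nucleus) sampling: rejection sampling against a pivot. It is an illustration under stated assumptions, not FlashInfer's kernel or API. The function names `top_p_sample_no_sort` and `top_p_set_sorted` are hypothetical, and whether FlashInfer's CUDA kernels follow exactly these steps is an assumption. The sorted version is included only as a reference for what the usual approach computes.

```python
import random


def top_p_sample_no_sort(probs, top_p, rng=random):
    """Sample a token id from the top-p set without sorting.

    Hypothetical sketch of pivot-based rejection sampling (not FlashInfer's API).
    A token is in the top-p set when the total mass of strictly more likely
    tokens is below top_p, which only needs reductions over probs, not a sort.
    """
    pivot = 0.0  # tokens with prob <= pivot are excluded from the next draw
    while True:
        # Inverse-transform sampling restricted to tokens with prob > pivot.
        mass = sum(p for p in probs if p > pivot)
        u = rng.random() * mass
        acc = 0.0
        token = None
        for i, p in enumerate(probs):
            if p > pivot:
                acc += p
                token = i
                if acc > u:
                    break
        # Mass of tokens strictly more likely than the sampled one.
        higher = sum(p for p in probs if p > probs[token])
        if higher < top_p:
            return token  # sampled token lies inside the top-p set
        # Rejected: this token and everything no more likely than it
        # are outside the top-p set, so raise the pivot and retry.
        pivot = probs[token]


def top_p_set_sorted(probs, top_p):
    """Reference: the top-p set computed the usual way, with a sort.

    In an eager tensor library, the sort, cumulative sum, and masking
    are typically separate kernel launches.
    """
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = set(), 0.0
    for i in order:
        if cum >= top_p:
            break
        kept.add(i)
        cum += probs[i]
    return kept
```

Each rejection strictly shrinks the candidate set, and the most likely token is always accepted, so the loop terminates. On a GPU, the sums and the restricted draw would map to block-level reductions and scans inside one kernel, which is where the savings described above would come from.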
