Enable `-tritonintelgpu-optimize-reduction-locality` by default #2748
Once the pending issues listed below are resolved, this pass can be enabled by default:
- `triton_gen.simdblockread` and `triton_gen.simdblockwrite` type arguments #2750
- `-tritonintelgpu-optimize-reduction-locality` #2752

Speedups after these changes in `victor/perf-test` (attn benchmark):

Min Speedup: 0.997086
Quartile 1: 1.000324
Median: 1.026954
Quartile 2: 1.121962
Max Speedup: 1.181823
Average: 1.064171
Average if improved (>=1.05): 1.131925
#Improved (>=1.05): 12/26
#Worse (>=1.05): 0
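
For readers who want to reproduce a summary like the one above from their own runs, here is a minimal sketch of how such figures could be computed from a list of per-configuration speedups. The function name `summarize_speedups`, the input values, and the quartile method are assumptions for illustration; they are not taken from the benchmark scripts used for this issue. The "worse" criterion is also an assumption: a configuration counts as worse when it slows down by at least the same factor as the improvement threshold.

```python
import statistics


def summarize_speedups(speedups, threshold=1.05):
    """Summarize per-configuration speedups (baseline time / new time).

    Hypothetical helper: the thresholds and quartile method are assumptions,
    not the ones used to produce the figures in this issue.
    """
    # Inclusive quartiles treat the data as the whole population of runs.
    q1, median, q3 = statistics.quantiles(speedups, n=4, method="inclusive")
    improved = [s for s in speedups if s >= threshold]
    # Assumed symmetric criterion: slowdown by at least the same factor.
    worse = [s for s in speedups if s <= 1.0 / threshold]
    return {
        "min": min(speedups),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(speedups),
        "average": statistics.fmean(speedups),
        "average_if_improved": statistics.fmean(improved) if improved else None,
        "num_improved": len(improved),
        "num_worse": len(worse),
        "total": len(speedups),
    }


# Placeholder values for illustration only; not the measured results.
example = [0.99, 1.00, 1.02, 1.06, 1.12, 1.18]
summary = summarize_speedups(example)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Reporting the average over improved configurations separately, as the summary above does, keeps a large group of unchanged configurations from hiding the gain on the ones the pass actually affects.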