Enable -tritonintelgpu-optimize-reduction-locality by default #2748

Open
victor-eds opened this issue Nov 19, 2024 · 1 comment · May be fixed by #2846
victor-eds (Contributor) commented Nov 19, 2024

This pass can be enabled by default once the pending issues, to be listed below, are merged:

Speedups after these changes in victor/perf-test (attn benchmark):
Min Speedup: 0.997086
Quartile 1: 1.000324
Median: 1.026954
Quartile 3: 1.121962
Max Speedup: 1.181823
Average: 1.064171
Average if improved (>=1.05): 1.131925
#Improved (>=1.05): 12/26
#Worse (>=1.05): 0
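
For readers who want to reproduce a summary of this shape from their own benchmark runs, here is a minimal sketch. It assumes each speedup is `baseline_time / new_time`, that "Average" is the arithmetic mean, and that "worse" means a slowdown of at least the same factor (speedup <= 1/1.05). None of that is confirmed by the issue, and the function name `summarize_speedups` is hypothetical.

```python
import statistics


def summarize_speedups(speedups, threshold=1.05):
    """Summarize per-configuration speedups (assumed: baseline_time / new_time).

    Assumptions, not taken from the issue: arithmetic mean for the averages,
    inclusive quartiles, and "worse" defined as speedup <= 1 / threshold.
    """
    # Three cut points splitting the data into four equal-sized groups.
    q1, median, q3 = statistics.quantiles(speedups, n=4, method="inclusive")
    improved = [s for s in speedups if s >= threshold]
    worse = [s for s in speedups if s <= 1 / threshold]
    return {
        "min": min(speedups),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(speedups),
        "average": statistics.mean(speedups),
        # None when no configuration crosses the improvement threshold.
        "average_if_improved": statistics.mean(improved) if improved else None,
        "improved": len(improved),
        "worse": len(worse),
        "total": len(speedups),
    }


if __name__ == "__main__":
    # Made-up sample values for illustration only, not the benchmark data.
    for name, value in summarize_speedups([0.9, 1.0, 1.0, 1.1, 1.2]).items():
        print(f"{name}: {value}")
```

Using a symmetric threshold for "worse" keeps the two counts comparable; a different definition only requires changing the one filter.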

@victor-eds victor-eds self-assigned this Nov 19, 2024
@vlad-penkin vlad-penkin added this to the 4.0 [Performance] Core milestone Nov 21, 2024
@vlad-penkin vlad-penkin added the enhancement New feature or request label Nov 21, 2024
@victor-eds victor-eds linked a pull request Nov 27, 2024 that will close this issue
@victor-eds victor-eds linked a pull request Nov 28, 2024 that will close this issue
victor-eds (Contributor, Author) commented

Blocked: There are some performance regressions after the latest optimizations were pushed. Trying alternative approaches.
