Faster kernels for quantized matmul on cuda #2060

Merged

LaurentMazare merged 7 commits into main from quantized-mm-cuda

Apr 15, 2024

Commits on Apr 14, 2024

Hook the quantized matmul cuda kernels.

LaurentMazare committed Apr 14, 2024
f2dae85

Add a (currently broken) test.

LaurentMazare committed Apr 14, 2024
28d6b4b

Kernel fixes.

LaurentMazare committed Apr 14, 2024
b23eed8

Fix by transposing the rhs matrix.

LaurentMazare committed Apr 14, 2024
c81ad77

Add the q4-1 kernels.

LaurentMazare committed Apr 14, 2024
4c387a7

Proper block sizes.

LaurentMazare committed Apr 14, 2024
7a70207

More details in the tests.

LaurentMazare committed Apr 14, 2024
e609c07
