AMDGPU JIT compiler #453

Merged
merged 2 commits into from
Aug 5, 2024


mratsim
Owner

@mratsim mratsim commented Aug 4, 2024

🔥 🔥 🔥

This adds an end-to-end LLVM IR -> AMD GPU JIT compiler.
[Image: ctt_amdgpu]

The good news is that AMD GPUs support vectorized add-with-carry. The bad news is that, unlike on Nvidia GPUs, you cannot use inline assembly to guarantee it, so you need to cajole the compiler into producing those instructions.
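To illustrate what "cajoling the compiler" can look like, here is a minimal sketch in plain C rather than the code from this PR. It writes a multi-limb addition as a widening add per limb, a pattern that optimizing backends can often, but are not guaranteed to, lower to an add / add-with-carry chain. The function name `bigint_add` and the 32-bit little-endian limb layout are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the PR): r = a + b over n 32-bit limbs,
 * least-significant limb first. Returns the final carry-out (0 or 1).
 * No inline assembly: the carry is expressed through a 64-bit widening
 * add, leaving it to the compiler to select add-with-carry instructions. */
static uint32_t bigint_add(uint32_t *r, const uint32_t *a,
                           const uint32_t *b, size_t n) {
    uint32_t carry = 0;
    for (size_t i = 0; i < n; ++i) {
        /* At most (2^32 - 1) + (2^32 - 1) + 1, which fits in 64 bits. */
        uint64_t t = (uint64_t)a[i] + (uint64_t)b[i] + carry;
        r[i] = (uint32_t)t;          /* low 32 bits are the result limb */
        carry = (uint32_t)(t >> 32); /* bit 32 is the carry into the next limb */
    }
    return carry;
}
```

Whether a given compiler actually emits a carry chain for this pattern depends on the target and optimization level, so it is worth checking the generated assembly.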

More good news: the device function is properly vectorized without needing tricks like __forceinline__ or "Scalable Vector" types in LLVM.

@mratsim mratsim added the enhancement :shipit: New feature or request label Aug 4, 2024
@mratsim mratsim merged commit 1e34ec2 into master Aug 5, 2024
24 checks passed
@mratsim mratsim deleted the amdgpu branch August 5, 2024 05:23