Some HIP and CUDA tests were disabled during the pulldown. #8796

Closed
KornevNikita opened this issue Mar 7, 2023 · 3 comments
Labels
bug Something isn't working cuda CUDA back-end hip Issues related to execution on HIP backend.

Comments

@KornevNikita
Contributor

See: intel/llvm-test-suite#1633

These tests fail (time out) with #8412, so we decided to disable them temporarily to unblock the pulldown.
The CUDA tests may be fine, but the CUDA testing workflow is currently broken, so we can't verify them.
"The AMD failures are fairly limited as they only happen at -O0 so we can fix them in the sycl branch after the merge."

@KornevNikita KornevNikita added cuda CUDA back-end hip Issues related to execution on HIP backend. labels Mar 7, 2023
@JackAKirk
Contributor

JackAKirk commented Mar 17, 2023

Update: The offending commit that caused the hangs at -O0 is llvm/llvm-project@97ba3c2

We could simply switch off this mapping from -O0 to LTO -O0 in SYCL mode. That fixes the failures, but another upstream commit that interacts with it hasn't been pulled down yet: https://reviews.llvm.org/D144505
I plan to investigate these issues further next week. Note, however, that currently all tests seem to be failing at -O0 for AMD. There are also existing hip-amd xfails at the default opt level that are hanging. It would be good to investigate these in the same context.

@aelovikov-intel aelovikov-intel transferred this issue from intel/llvm-test-suite Mar 28, 2023
@bader bader added the bug Something isn't working label Mar 28, 2023
@JackAKirk
Contributor

The bug from this pulldown was fixed upstream. However, there is currently another -O0 bug, which has a proposed fix: ROCm/clr#13

At the moment you can only compile at -O0 in the HIP backend when using ROCm 5.7. Earlier versions lead to segfaults even for an empty kernel.

@JackAKirk
Contributor

Closing this, since the issue hasn't recurred on the latest ROCm versions for a while. The fix above was for old ROCm versions, and it is up to AMD to merge it.
