Fix support for Intel Compute Runtime with VectorSize > 1 #15
Current status (updated):
Please don't use this PR in production unless it passes the self-test on your system! TODO:
Force-pushed from e5cb251 to 4e5709f.
Update:
Force-pushed from 4e5709f to fcd5f84.
Update:
Confirmed that 64-bit shifts are faster than 32-bit shifts on Intel with |
src/common.cl
Outdated
// generic_bitalign emulates amd_bitalign using shifts. generic_bitalign can be
// used instead of amd_bitalign if benchmarks show that it's faster.
#ifdef cl_intel_subgroups
// Workaround for Intel NEO that miscompiles shifts on uint vectors - use ulong instead
Would it be worth letting Intel know about this problem? Maybe they would like to fix it (independently of our workaround here).
Absolutely, I plan to do it.
Reported the issue to Intel: intel/compute-runtime#790
I'm glad I could put together a simple demo.
The breakage must have happened between versions 22.14.22890 and 23.43.27642. The precompiled binaries are for Ubuntu and I only have Fedora now. Compiling intel-opencl is very time-consuming, but I might give it another try later.
I was able to use Ubuntu in WSL. It turns out that 23.17.26241.22 is the last good release and 23.22.26516.18 is the first bad release. Also, the latest release, 24.48.31907.7, fixes the issue, but it's too new and most users don't have it yet.
Not WIP anymore - ready for review.
Fix support for Intel Compute Runtime with VectorSize > 1
The fallback implementation of amd_bitalign() triggers a bug with Intel Compute
Runtime (NEO) versions from 23.22.26516.18 to 24.45.31740.9 inclusive.
intel/intel-graphics-compiler#358
The bug affects all but the first component of the vectors, so the self-tests
would pass with VectorSize=1. For higher values of VectorSize, including the
default VectorSize=2, approximately half of the self-tests fail, all in
barrett32 kernels.
Add generic_bitalign() that is always implemented using shifts. Use it in all
cases when the destination is the same as one of the sources.
If Intel Compute Runtime is detected, use 64-bit shifts in generic_bitalign().
For other platforms, keep using 32-bit shifts.
Make amd_bitalign() an alias to generic_bitalign() on systems where
amd_bitalign() is not available. That way, it would also expand to 64-bit
shifts for Intel Compute Runtime.
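For readers unfamiliar with bitalign, the two shift widths mentioned in the commit message can be sketched in plain host C. This is only an illustration, not the code from this PR: the function names bitalign_shift32 and bitalign_shift64 are hypothetical, the sketch is scalar (the kernels operate on uint vectors), and the explicit check for a zero shift is an assumption needed because shifting a 32-bit value by 32 is undefined in standard C. The semantics follow amd_bitalign from the cl_amd_media_ops extension: treat hi:lo as one 64-bit value and return the 32 bits starting at bit (shift & 31).

```c
#include <stdint.h>

/* Hypothetical name: 64-bit variant, the approach the commit message
   describes for Intel Compute Runtime. Combine both words into one
   64-bit value and shift once. */
static uint32_t bitalign_shift64(uint32_t hi, uint32_t lo, uint32_t shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> (shift & 31u));
}

/* Hypothetical name: 32-bit variant, the approach kept for other
   platforms. Two 32-bit shifts whose results are OR-ed together. */
static uint32_t bitalign_shift32(uint32_t hi, uint32_t lo, uint32_t shift)
{
    uint32_t s = shift & 31u;
    if (s == 0)
        return lo; /* assumption: guard against a shift by 32 in host C */
    return (hi << (32u - s)) | (lo >> s);
}
```

Both variants return the same value for every shift amount, so the choice between them is a matter of which one the target compiler handles correctly and quickly.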