Fix support for Intel Compute Runtime with VectorSize > 1
The fallback implementation of amd_bitalign() triggers a bug with Intel Compute Runtime (NEO) versions from 23.22.26516.18 to 24.45.31740.9 inclusive. See intel/compute-runtime#790.

The bug exhibits itself as a failure to find factors in approximately half of the self-tests using barrett32 kernels. The bug affects all but the first component of the vectors, so using VectorSize=1 would fix the self-tests.

Add generic_bitalign() that is always implemented using shifts. Use 64-bit shifts for Intel Compute Runtime, 32-bit shifts for other platforms.

Use generic_bitalign() instead of the equivalent shifts in all cases when the destination is the same as one of the sources.

Make amd_bitalign() an alias to generic_bitalign() on systems where amd_bitalign() is not available.
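For illustration only, the two shift-based variants described above can be sketched in plain C on scalars. This is not the code from this commit: the names bitalign_shift64(), bitalign_shift32() and check_bitalign_variants_agree() are hypothetical, and the sketch assumes the bitalign semantics documented for the cl_amd_media_ops extension (concatenate the two 32-bit sources, shift right by the low 5 bits of the third argument, keep the low 32 bits). The actual kernels operate on OpenCL vector types.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit variant (hypothetical name): build the 64-bit value hi:lo,
 * shift it right, and keep the low 32 bits. */
static uint32_t bitalign_shift64(uint32_t hi, uint32_t lo, uint32_t shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | (uint64_t)lo;
    return (uint32_t)(wide >> (shift & 31u));
}

/* 32-bit variant (hypothetical name): combine the two halves by hand.
 * A shift of 0 is handled separately because shifting a 32-bit value
 * by 32 is undefined in C. */
static uint32_t bitalign_shift32(uint32_t hi, uint32_t lo, uint32_t shift)
{
    shift &= 31u;
    if (shift == 0u)
        return lo;
    return (hi << (32u - shift)) | (lo >> shift);
}

/* Check that both variants agree for every shift amount from 0 to 31. */
static void check_bitalign_variants_agree(void)
{
    const uint32_t hi = 0x12345678u, lo = 0x9ABCDEF0u;
    for (uint32_t s = 0; s < 32u; ++s)
        assert(bitalign_shift64(hi, lo, s) == bitalign_shift32(hi, lo, s));
}
```

Both variants should return the same value for any shift; which one a given platform compiles correctly is the concern this commit addresses.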
Showing 2 changed files with 46 additions and 28 deletions.