barrett15_74: add workaround for MacOS bug #16

Open
wants to merge 1 commit into
base: master

@proski proski commented Dec 29, 2024

barrett15_74: add workaround for MacOS bug

This fixes failures in barrett15_74 on MacOS 14 with Radeon Pro 560.

Keep 2 extra bits for a.d4 before the loop, just as is done later inside
the loop. This may not be necessary mathematically, but it prevents an
optimization bug that turns a.d4 into a random number on MacOS.

Recognize "Radeon Pro 560", the GPU that exhibits the issue, as GCN4.

proski commented Dec 29, 2024

Apparently, & 0xFFFF triggers an incorrect optimization. OpenCL is buggy on MacOS because it's considered deprecated there (even printf is flaky). Fortunately, the fix is simple and makes the implementation more consistent: the same function already has a "keep 2 extra bits" step further down, inside the loop.

Note that there is only one other occurrence of 0xFFFF in mfakto OpenCL code, and it's not used as a mask.
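
The diff itself is not shown in this thread, so the following is only a minimal sketch of the idea in plain C. It assumes 15-bit limbs and that the workaround widens the mask applied to a.d4 from 0xFFFF (1 extra bit) to 0x1FFFF (2 extra bits), matching the mask mentioned in the later test comment. The function names are hypothetical and do not come from mfakto.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two ways of masking a.d4.
   The real kernel is OpenCL C and is not reproduced here. */
static inline uint32_t keep_1_extra_bit(uint32_t d4)
{
    return d4 & 0xFFFFu;   /* 15-bit limb + 1 extra bit (assumed old form) */
}

static inline uint32_t keep_2_extra_bits(uint32_t d4)
{
    return d4 & 0x1FFFFu;  /* 15-bit limb + 2 extra bits (assumed new form) */
}

/* The two maskings give the same value exactly when bit 16 (0x10000) is clear. */
static inline int masks_agree(uint32_t d4)
{
    return keep_1_extra_bit(d4) == keep_2_extra_bits(d4);
}

/* Small self-check illustrating the claim above. */
static inline void masks_self_check(void)
{
    assert(masks_agree(0x4016u));    /* bit 16 clear: same result */
    assert(!masks_agree(0x10000u));  /* bit 16 set: results differ */
}
```

As long as bit 16 of a.d4 is zero at that point, both forms produce the same value, so the wider mask changes nothing mathematically and only alters what the compiler sees.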

proski commented Dec 30, 2024

I don't always have access to the Mac that exhibits the problem, but I tested this PR on an Intel GPU. The self-test with -st2 shows no failures, so the math should be good. The performance of the cl_barrett15_74_gs kernel increases by about 0.34% when averaged over all tested exponents. I'm not sure that measurement is reliable, but it's encouraging to see no degradation.

proski changed the title from "barrett15_74: add workaround for compile bug on MacOS" to "barrett15_74: add workaround for MacOS bug" on Jan 1, 2025
proski commented Jan 1, 2025

Added a small change to recognize the GPU that exhibits the problem as GCN4. Reference: https://www.techpowerup.com/gpu-specs/amd-polaris-21.g812
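
For illustration only, recognizing a GPU by name can be sketched as a substring match on the device name. The enum and function below are hypothetical and are not mfakto's actual detection code. In a real OpenCL host program the name string would typically come from clGetDeviceInfo with CL_DEVICE_NAME.

```c
#include <string.h>

/* Hypothetical GPU family tags, for this sketch only. */
typedef enum { GPU_FAMILY_UNKNOWN = 0, GPU_FAMILY_GCN4 } gpu_family;

/* Classify a device by its reported name.
   Assumption: the name contains "Radeon Pro 560" on the affected Mac. */
static inline gpu_family classify_device(const char *device_name)
{
    if (device_name != NULL && strstr(device_name, "Radeon Pro 560") != NULL)
        return GPU_FAMILY_GCN4;
    return GPU_FAMILY_UNKNOWN;
}
```

A plain substring match would also accept longer names that contain the same text, such as a "560X" variant. Whether that is desirable depends on the hardware in question.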

proski commented Jan 4, 2025

I ran some tests to see whether keeping one more extra bit would affect any calculations. I disabled all kernels other than barrett15_74 and ran the self-tests. It turns out that the bit in question (0x10000) is always zero. In fact, the greatest value of a.d4 & 0x1ffff I observed was 0x4016. For the extra bit to affect calculations, a.d4 & 0x1ffff would have to be 0x10000 or greater, i.e. almost 4 times as large (0x10000 / 0x4016 ≈ 3.99). That's a good margin of safety.
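
The kind of measurement described above can be sketched in plain C as a running maximum over the masked values. This is not the instrumentation actually used for the test, which is not shown in this thread. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical statistics collected over many observed a.d4 values. */
typedef struct {
    uint32_t max_seen;       /* largest value of (d4 & 0x1FFFF) so far */
    int      extra_bit_seen; /* nonzero if bit 16 (0x10000) was ever set */
} d4_stats;

/* Record one observed value of a.d4. */
static inline void d4_stats_update(d4_stats *s, uint32_t d4)
{
    uint32_t kept = d4 & 0x1FFFFu;
    if (kept > s->max_seen)
        s->max_seen = kept;
    if (kept & 0x10000u)
        s->extra_bit_seen = 1;
}

/* Factor by which the observed maximum would have to grow to reach 0x10000.
   Returns 0.0 if nothing nonzero has been recorded yet. */
static inline double d4_headroom(const d4_stats *s)
{
    if (s->max_seen == 0)
        return 0.0;
    return 65536.0 / (double)s->max_seen;
}
```

With a recorded maximum of 0x4016 (16406 decimal), d4_headroom gives 65536 / 16406, or about 3.99, which is the "almost 4 times" margin quoted above.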
