-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement CRC32_u64, V128_Low64, and V128_Extract64 for x86 builds #1654
base: master
Are you sure you want to change the base?
Conversation
Can this be changed to draft? This PR isn't ready, there is also |
I converted it to a draft for you. Reply here when it is ready for me to look at. |
This comment was marked as outdated.
This comment was marked as outdated.
this is sample program that verify correctness. If you run it in 64-bit with asserts enabled you'll see that alternative code is equivalent:
|
@derekmauro PR is ready. |
For some reason Release builds in x86 will hung or throw SEH errors in |
`_mm_cvtsi128_si64`, `_mm_crc32_u64`, and `_mm_extract_epi64` intrinsics are available only when building for x64. In order not to disable crc32 optimizations for 32-bit builds, equivalent code is implemented using intrinsics that are available when targeting 32-bit builds.
I'm convinced that MS compiler has some sort of bug: https://developercommunity.visualstudio.com/t/Compiler-produces-bad-code-in-x86-releas/10642810 |
It looks like MSVC confirmed this is a compiler bug and have a fix pending. The good news is that you have made MSVC better for everyone, so congrats. The bad news is that I don't think we should accept this if the compiler is going to get is wrong. We probably need some preprocessor guarding of the code for the buggy compiler versions. |
Yep, quite shocking. Nothing unusual about the code, seems like some intrinsic isn't handled properly. I think the bug was introduced years ago.
yep, compiler check is needed, or the code could be modified to avoid the bug.
should perhaps be changed to
because it does |
IMO, it should be taken, no point to exclude all x86 32-bit builds because MS compiler gets it wrong. Should perhaps be guarded by |
Here are the changes to
in short, if I replace The other change that should be made there in the function:
with
Or, the rest of the function could be moved back to regular registers (this doesn't fix msvc miscompile):
|
I believe with your fix for |
are you referring to add(a, a) vs a << 1? No, the change has no effect on the miscompile issue. I think this PR can be merged, but should be disabled for ms compiler: Separately, the code could be amended to avoid compilation issue with ms compiler. |
It seem that this change is needed to fix miscompile (at least it fixes the test I reported to MS):
With the change ms compiler actually generates much better code (compare it to the result that comes with _mm_set_epi64x). However, it was still not enough. Looks like crc code has too many issues. After I applied the fix above, tests still failed. When I tried to debug, it appears that the CPU detection code is... not sure how to say, but it's very questionable. It tests for some old CPUs to enable optimized code, but even my a couple of generations old CPU gets skipped and tests as |
_mm_cvtsi128_si64
,_mm_crc32_u64
, and_mm_extract_epi64
intrinsics are available only when building for x64. In order not to disable crc32 optimizations for 32-bit builds, equivalent code is implemented using intrinsics that are available when targeting 32-bit builds.