Weird znver4 performance hit compared to x86-64-v4 #359
Comments
Hi, thanks for benchmarking this. I will also check this locally. Do you use the default config provided by Cachy? I can only retest on a 9950X currently. I also checked the compiled binary with bin-cpuflags-x86:
znver4:
v4:
Well, what matters for us is whether LTO really introduces a regression with our php PKGBUILD. The znver4 vs. v4 difference may be within the margin of error.
Sure, LTO is the real regression, and the margin between znver4 and v4 is small, but it is still significant (and reproducible).
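One way to put a number on "small but significant" is a Welch t-statistic over the per-run timings of the two builds. The following is a minimal sketch, not the method used for the benchmarks in this thread; the function name `welch_t` and the idea of feeding it raw per-run timings are assumptions for illustration.

```python
import math
import statistics


def welch_t(runs_a, runs_b):
    """Welch's t-statistic for two independent samples of benchmark timings.

    Each argument is a list of per-run measurements (at least two values).
    A value near 0 means the difference in means is small relative to the
    run-to-run noise; a large absolute value means it stands out from it.
    """
    mean_a = statistics.mean(runs_a)
    mean_b = statistics.mean(runs_b)
    # Sample variances (n - 1 in the denominator).
    var_a = statistics.variance(runs_a)
    var_b = statistics.variance(runs_b)
    # Standard error of the difference between the two means.
    std_err = math.sqrt(var_a / len(runs_a) + var_b / len(runs_b))
    return (mean_a - mean_b) / std_err
```

Calling `welch_t(znver4_runs, v4_runs)` with the measured timings of each build gives a single figure that can be compared across repeated benchmark sessions to check reproducibility.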
I've also verified the LTO regression and disabled LTO for now, as Arch Linux does.
@ptr1337 I've published the set of scripts used to make the benchmarks: https://github.com/nicelocal/microarch-benchmarks
Thanks! I'll try to reproduce them during my vacation :)
Awesome! I've been thinking about it for a bit, and while I didn't look too much into it, the fact that |
Currently I pass the output of https://github.com/hartwork/resolve-march-native as makepkg flags, which gives performance higher than both znver4 and x86-64-v4 (also self-built, not the repo versions).
Actually, we are passing -march=native to the znver4 repository. To the v4 repository we pass x86-64-v4. The main reason is that -march=native on a Zen 4 CPU enables shstk (shadow stack), while -march=znver4 does not.
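To see exactly which target options differ between two `-march` values (for example `native` and `znver4`), the reports printed by `gcc -Q --help=target -march=...` can be diffed. This is a rough sketch; the helper names are made up for illustration, and the parser assumes the usual two-column layout of that report.

```python
import subprocess


def query_gcc_target_flags(march, gcc="gcc"):
    """Return the raw text of `gcc -Q --help=target -march=<march>`."""
    result = subprocess.run(
        [gcc, "-Q", "--help=target", f"-march={march}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_target_flags(report):
    """Map each `-m...` option in the report to its value column."""
    flags = {}
    for line in report.splitlines():
        parts = line.split()
        # Keep only "option value" rows such as "-mshstk [enabled]".
        if len(parts) == 2 and parts[0].startswith("-m"):
            flags[parts[0]] = parts[1]
    return flags


def diff_target_flags(flags_a, flags_b):
    """Return {option: (value_a, value_b)} for options whose values differ."""
    names = sorted(set(flags_a) | set(flags_b))
    return {
        name: (flags_a.get(name), flags_b.get(name))
        for name in names
        if flags_a.get(name) != flags_b.get(name)
    }
```

On a Zen 4 machine, `diff_target_flags(parse_target_flags(query_gcc_target_flags("native")), parse_target_flags(query_gcc_target_flags("znver4")))` would list every option that the two settings treat differently, which is where a difference such as shstk should show up.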
Aha! That might be the issue then... |
From https://gitlab.archlinux.org/archlinux/packaging/packages/php/-/merge_requests/3: as can be seen by the benchmarks, the new znver4 repos actually have worse performance than the x86-64-v4 repos (both OOTB with packages from the repo, and when self-building php with or without LTO).
This seems quite strange to me, as I've looked through GCC's source code, specifically the flag selection logic for the various arches, and I've verified znver4 is a strict superset of x86-64-v4:
x86-64-v4:
znver4:
And same goes for the processor info flags:
{"x86-64-v4", PROCESSOR_K8, CPU_GENERIC, PTA_X86_64_V4 | PTA_NO_TUNE, 0, P_NONE}
{"znver4", PROCESSOR_ZNVER4, CPU_ZNVER4, PTA_ZNVER4, M_CPU_SUBTYPE (AMDFAM19H_ZNVER4), P_PROC_AVX512F}
So I can't explain the weird performance hit of znver4...
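The superset claim can also be cross-checked empirically rather than from the source alone, by comparing the macros GCC predefines for each `-march` value. This is a sketch under assumptions: the helper names are hypothetical, and the comparison only covers what is visible as predefined macros. It says nothing about tuning, where the two table entries above appear to differ (`PTA_NO_TUNE` with `CPU_GENERIC` versus `CPU_ZNVER4`).

```python
import subprocess


def query_predefined_macros(march, gcc="gcc"):
    """Return the output of `gcc -march=<march> -dM -E -x c /dev/null`."""
    result = subprocess.run(
        [gcc, f"-march={march}", "-dM", "-E", "-x", "c", "/dev/null"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_macro_names(dump):
    """Collect the macro names from `#define NAME VALUE` lines."""
    names = set()
    for line in dump.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) >= 2 and parts[0] == "#define":
            names.add(parts[1])
    return names


def missing_macros(base, candidate):
    """Macros defined for `base` but absent from `candidate`, sorted."""
    return sorted(base - candidate)
```

If `missing_macros(parse_macro_names(query_predefined_macros("x86-64-v4")), parse_macro_names(query_predefined_macros("znver4")))` contains no ISA feature macro, the instruction-set superset holds as far as the preprocessor can tell. Macros that identify a CPU model rather than a feature may still appear in the result and need to be judged by hand.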
Note that all tests were fully automated using Docker. The exact same Dockerfile was used for every run, switching out just the architecture in makepkg.conf and in the repos (and re-installing all packages after doing that).