segmentation fault with numpy on POWER9 (only) when using FlexiBLAS #17
Comments
Can you provide the backtrace with debug information? What does it look like in valgrind?
Backtrace with debug info:
I'll look into valgrind too.
@grisuthedragon No segmentation fault when running via Valgrind it seems (though a bunch of unrelated " |
That's weird. I'll try to compile FlexiBLAS + NumPy on my POWER system asap.
To quickly trigger the segfault, you can use |
I tried this too on a real ppc machine, and the minimal reproducer for "issues" I got is:
I also see messages in stderr:
Those are from the numpy xerbla error handler, and I guess they are a good hint to the real problem.
More minimal reproducer:
I suspect a stack overflow caused by GCC misoptimizing OpenBLAS. It becomes apparent with FlexiBLAS because FlexiBLAS uses the stack to save a register, which then gets overwritten by the bug. I reported this as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799
@Flamefire |
The IBM compiler guys are looking into this. It does indeed seem to be a compiler issue, present since GCC 7. So I'd say this can be closed, as nothing can be done here short of providing a better error message.
@Flamefire Any updates on this?
Small update here from our side: we've side-stepped this problem by compiling OpenBLAS with |
The GCC developers determined this to be a bug in the usage, related to the Fortran calling convention:
@Flamefire For FlexiBLAS I will do some tests and, if successful, integrate it in the next release.
I'm seeing a `Segmentation fault` when running the `numpy` 1.20.3 tests when using FlexiBLAS 3.0.4 with OpenBLAS 0.3.15, but not when linking to OpenBLAS 0.3.15 directly, which tells me FlexiBLAS is somehow causing the segmentation fault...

I'm not seeing this problem on Intel (Haswell, Skylake X), AMD (Rome), or Arm (AWS Graviton2).

Here's a partial backtrace I obtained when running the numpy tests via `gdb`:

This only happens when `numpy` is linked with `FlexiBLAS`:

Any ideas on what may be causing this segmentation fault?

I tried using `ulimit -s unlimited` (default is `8192` on that system), no change.

After `export FLEXIBLAS=netlib` to make FlexiBLAS use the fallback `netlib` backend, the segmentation fault doesn't happen either...
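
For anyone who wants to repeat the backend comparison described above, here is a minimal sketch in Python. It runs a command once per backend with the `FLEXIBLAS` environment variable set, and reports whether the child process was killed by SIGSEGV. The helper names (`run_with_backend`, `stack_limit_kib`), the backend name `openblas`, and the workload command are assumptions made for illustration; the actual reproducer from this thread is not shown here, so substitute your own.

```python
import os
import resource
import signal
import subprocess
import sys


def stack_limit_kib():
    """Return the soft stack limit in KiB (the unit `ulimit -s` prints), or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft // 1024


def run_with_backend(backend, cmd):
    """Run cmd with FLEXIBLAS=backend; return (returncode, crashed)."""
    env = dict(os.environ, FLEXIBLAS=backend)
    proc = subprocess.run(cmd, env=env, capture_output=True, text=True)
    # On POSIX, a child terminated by a signal gets a negative return code.
    crashed = proc.returncode == -signal.SIGSEGV
    return proc.returncode, crashed


if __name__ == "__main__":
    # Hypothetical placeholder workload, NOT the reproducer from this issue.
    workload = [sys.executable, "-c",
                "import numpy; numpy.dot(numpy.ones((4, 4)), numpy.ones((4, 4)))"]
    print("soft stack limit (KiB):", stack_limit_kib())
    # "netlib" is the fallback backend named above; "openblas" is an assumed name.
    for backend in ("openblas", "netlib"):
        rc, crashed = run_with_backend(backend, workload)
        print(f"FLEXIBLAS={backend}: returncode={rc}, segfault={crashed}")
```

Running each backend in a separate process keeps a crash in one from taking down the comparison script itself.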