Segfault on linux inside libnvidia-glcore.so #122

Closed
c42f opened this issue Aug 27, 2016 · 10 comments · May be fixed by #126

@c42f
Owner

c42f commented Aug 27, 2016

As reported by Hans De Visser, loading a pair of files (too large to attach here), then pressing the refresh button causes an intermittent segfault.

Release build backtrace:

(gdb) bt
#0  0x0000000041429eff in ?? ()
#1  0x00007fac9e8fb912 in ?? ()
   from /usr/lib/nvidia-346/libnvidia-glcore.so.346.82
#2  0x00007fac9e514958 in ?? ()
   from /usr/lib/nvidia-346/libnvidia-glcore.so.346.82
#3  0x00000000004d1abf in PointArray::drawPoints(QGLShaderProgram&, TransformState const&, double, bool) const ()
#4  0x00000000004da63f in View3D::drawPoints(TransformState const&, std::vector<Geometry const*, std::allocator<Geometry const*> > const&, double, bool) ()
#5  0x00000000004dade2 in View3D::paintGL() ()
#6  0x00007faca370c6e4 in QGLWidget::glDraw() ()
   from /usr/lib/x86_64-linux-gnu/libQt5OpenGL.so.5
#7  0x00007faca37096f9 in QGLWidget::paintEvent(QPaintEvent*) ()
   from /usr/lib/x86_64-linux-gnu/libQt5OpenGL.so.5
#8  0x00007faca2cf5302 in QWidget::event(QEvent*) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#9  0x00007faca2cb9c8c in QApplicationPrivate::notify_helper(QObject*, QEvent*)
    () from /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#10 0x00007faca2cbee56 in QApplication::notify(QObject*, QEvent*) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#11 0x00007faca4204c2d in QCoreApplication::notifyInternal(QObject*, QEvent*)
    () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#12 0x00007faca2cefbea in QWidgetPrivate::drawWidget(QPaintDevice*, QRegion const&, QPoint const&, int, QPainter*, QWidgetBackingStore*) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#13 0x00007faca2cc64c1 in QWidgetPrivate::repaint_sys(QRegion const&) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#14 0x00007faca2ce5e4f in QWidgetPrivate::syncBackingStore() ()
   from /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#15 0x00007faca2cf5112 in QWidget::event(QEvent*) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#16 0x00007faca2cb9c8c in QApplicationPrivate::notify_helper(QObject*, QEvent*)
    () from /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#17 0x00007faca2cbee56 in QApplication::notify(QObject*, QEvent*) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#18 0x00007faca4204c2d in QCoreApplication::notifyInternal(QObject*, QEvent*)
    () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#19 0x00007faca4206e07 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#20 0x00007faca4251cd3 in ?? () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#21 0x00007faca0bc7e04 in g_main_context_dispatch ()
   from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#22 0x00007faca0bc8048 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#23 0x00007faca0bc80ec in g_main_context_iteration ()
   from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#24 0x00007faca425198c in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#25 0x00007faca420396b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#26 0x00007faca420a0e1 in QCoreApplication::exec() ()
   from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#27 0x0000000000489032 in main ()

Debug and RelWithDebInfo build backtraces:



(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x000000004112feff in ?? ()
(gdb) bt
#0  0x000000004112feff in ?? ()
#1  0x00007f427b795912 in ?? ()
#2  0x0000000000000000 in ?? ()
(gdb) quit
A debugging session is active.

    Inferior 1 [process 6462] will be detached.

Quit anyway? (y or n) EOF [assumed Y]
Detaching from program: /usr/local/bin/displaz-gui, process 6462
hans@hans-pc:~$ ls
anaconda  Dev               Installs                   model.png  point_cloud.png  s2t70.png                           temp       tmp_line1.txt   _tmp_locations.txt             towers_test_NG4TQ_1_0.tree.gz
Code      Documents         matlab_crash_dump.1078-1   Music      Programs         Scripts                             Templates  tmp_line2.ewkt  toplog.txt                     Videos
Data      Downloads         model_fitted_close-up.png  out        Public           takilberan_tower0.png               test7.txt  tmp_line2.txt   towers_test_NG4TQ_1_0.json.gz
Desktop   examples.desktop  model_fitted.png           Pictures   ROAMES_git       takilberan_transposition_tower.png  test.txt   tmp_line3.txt   towers_test_NG4TQ_1_0.rdf
hans@hans-pc:~$ sudo gdb displaz-gui $(ps -A | grep displaz-gui | awk '{print $1}')
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from displaz-gui...(no debugging symbols found)...done.
Attaching to program: /usr/local/bin/displaz-gui, process 7378
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.19.so...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00007f41b8ef512d in ?? ()
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000000041693eff in ?? ()
(gdb) bt
#0  0x0000000041693eff in ?? ()
#1  0x00007f41b5ac1912 in ?? ()
#2  0x0000000000000000 in ?? ()
(gdb) 

A bit odd: the stack looks smashed in the last two backtraces, but not in the Release build.

Could be due to more leaking OpenGL state, as partly fixed in #113.

@c42f c42f added the bug label Aug 27, 2016
@nigels-com
Collaborator

346.82 is a fairly old driver branch. It would be worth trying a newer one in case it's a driver bug.

@c42f
Owner Author

c42f commented Aug 30, 2016

Could be, though I noticed there's a bit of a horror show in the framebuffer allocation code (left over from the Qt5 port) which surely needs cleanup.

@c42f
Owner Author

c42f commented Aug 31, 2016

Ah, GitHub has a new, more reasonable attachment system; here are the files.

envir_220501_points_0_9.txt
envir_220501_points_1_7.txt

@c42f
Owner Author

c42f commented Sep 1, 2016

Something super screwy is going on here. If I comment out the glBufferSubData inside drawPoints(), some position data is still getting into OpenGL. How can this be possible!?

@c42f
Owner Author

c42f commented Sep 1, 2016

Commenting out all the code in initializeBboxGL() seems to fix the problem (after commenting out nearly everything else!), so the stale state is coming in from there.

c42f added a commit that referenced this issue Sep 4, 2016
There was a segfault with Nvidia driver 346.82, inside
PointArray::drawPoints() after reloading a pair of files with the
keyboard shortcut F5 (see #122).  Commenting out the code associated
with initializing the bounding box vertex array inside Geometry.cpp
works around the segfault.  However, it's not clear that this is the
root cause and there was actually a bug in this part of the code.

For now, refactor things to send the bounding box vertices to the GPU
every frame in the hope that this will fix things.  I suppose this is a
bit inefficient, but the main point cloud data is already sent this way.
@c42f
Owner Author

c42f commented Sep 4, 2016

Or not. Try as I might, I can't see a problem with initializeBboxGL() - everything there seems to be valid operations on the vertex array object, and it's unbound at the end of the function too. Unless it's somehow a problem with the context not being current when the qt signal arrives, leading to wrongly calculated "position" shader location, or something equally odd.
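
For what it's worth, one way to make the "unbound at the end of the function" property hold on every exit path is a small RAII guard. The sketch below is an illustration only: `ScopedBinding` and `demoBindingOrder` are hypothetical names, not displaz code, and the GL calls are replaced by log entries so that it runs without a context. It says nothing about the other possibility, the context not being current.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of displaz): runs `bind` immediately and
// `unbind` when the guard leaves scope, including on an early return.
class ScopedBinding
{
public:
    ScopedBinding(const std::function<void()>& bind,
                  std::function<void()> unbind)
        : m_unbind(std::move(unbind))
    {
        assert(bind);      // both callables must be non-empty
        assert(m_unbind);
        bind();
    }
    ~ScopedBinding() { m_unbind(); }

    ScopedBinding(const ScopedBinding&) = delete;
    ScopedBinding& operator=(const ScopedBinding&) = delete;

private:
    std::function<void()> m_unbind;
};

// Stand-in for the GL calls: each step is recorded in a log instead of
// touching OpenGL, so the call order can be inspected.
std::vector<std::string> demoBindingOrder(bool failEarly)
{
    std::vector<std::string> calls;
    auto setup = [&]() {
        ScopedBinding vao([&] { calls.push_back("bind"); },
                          [&] { calls.push_back("unbind"); });
        if (failEarly)
            return;                  // the guard still unbinds here
        calls.push_back("upload");
    };
    setup();
    return calls;
}
```

In real code the two callables would presumably wrap `glBindVertexArray(vao)` and `glBindVertexArray(0)`, and a Qt slot would call `makeCurrent()` on the widget before constructing the guard.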

@c42f
Owner Author

c42f commented Sep 5, 2016

#126 has mysteriously cured a subset of problems (and cleaned a lot of things up), but a crash in the same vein as #113 is still happening, probably haven't got the root cause yet.

c42f added a commit that referenced this issue Sep 6, 2016
c42f added a commit that referenced this issue Sep 6, 2016
@c42f
Owner Author

c42f commented Sep 9, 2016

Oh gosh, I just discovered apitrace, why didn't I know about this before!?! OpenGL debugging just got about a thousand times easier!

@c42f
Owner Author

c42f commented Sep 10, 2016

Should be fixed in #132

@c42f c42f added this to the 0.4 milestone Sep 10, 2016
@c42f
Owner Author

c42f commented Sep 12, 2016

Fix confirmed, closing.

@c42f c42f closed this as completed Sep 12, 2016