Segfault with dash::Array using CSR-Pattern and odd number of units #256

Closed
BenjaProg opened this issue Jan 27, 2017 · 12 comments

Comments

@BenjaProg
Member

Hey @dash-project/developers,
I am seeing some strange behaviour in my test scenario (main.cpp).
CSRBug.zip

I get a segmentation fault if I run the test with an odd number of units, three or more.
The segmentation fault happens in the loop at lines 42-45.
Unfortunately, @ddiefenthaler and I have not been able to track it down any further so far.
(Note: this fault hasn't occurred on ddiefenthaler's machine.)

If I enable the count variable and(!) the cout in the loop, the fault does not occur (on my machine...).
So if there is a segfault, the line DASH_LOG_DEBUG( "Message: ", "made it." );
is no longer reached.

The variant_x.txt files hold some of the possible outputs on my machine.
With variant_1 and variant_3, not all processes returned and I had to send an interrupt signal.

Thanks for your help in advance!

@devreal
Member

devreal commented Jan 27, 2017

Can you please provide us with a stack trace of the Segfault? If you run locally you can do something like:

mpirun -n 3 xterm -e gdb --args ./a.out <parameter>

@BenjaProg
Member Author

BenjaProg commented Jan 27, 2017

Okay, I will provide it this afternoon; unfortunately, I am on my way to university at the moment.

@rkowalewski

And please provide your build settings: Which compiler, which libraries, etc.

@fuchsto
Member

fuchsto commented Jan 27, 2017

> So If there is a segfault, the line DASH_LOG_DEBUG( "Message: ", "made it." );
> isn't reached anymore.

You can't know that, at least not from missing output. The log operation might have completed, but a subsequent segfault could have prevented std::clog from being flushed.
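To illustrate the buffering point, here is a minimal sketch in plain C++, independent of DASH. `log_flushed` and `report_progress` are hypothetical names introduced only for this example; the sketch makes no assumption about whether DASH_LOG_DEBUG flushes on its own. The idea is simply that an explicit flush pushes the text out of the stream buffer before any later crash can discard it.

```cpp
#include <cassert>
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of DASH): write one log line and flush
// right away, so the text has left the stream buffer before a later crash.
inline void log_flushed(std::ostream& out, const std::string& msg) {
    out << msg << '\n' << std::flush;
    assert(out.good() && "log stream is in a failed state");
}

// Hypothetical usage: std::clog is not required to flush after each write,
// whereas std::cerr has unitbuf set and flushes after every output operation.
inline void report_progress() {
    log_flushed(std::clog, "made it.");  // explicit flush on the buffered stream
    std::cerr << "made it (cerr).\n";    // flushed automatically via unitbuf
}
```

With a flush after every message, a missing line is much stronger evidence that the statement was never reached. The trade-off is slower logging, so this is better suited to debugging runs.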

@BenjaProg
Member Author

Sorry for the delay. I will gladly provide a stack trace, but I first have to get more familiar with gdb.
@rkowalewski: I hope the attached info from the build script contains everything relevant ;-)
additional Information_1.txt

@BenjaProg
Member Author

BenjaProg commented Jan 31, 2017

That's the backtrace I got.
Sorry for the screenshot; copy/paste wasn't possible.
Is there a way to write the output of gdb to a file?
gdb overview

@devreal
Member

devreal commented Feb 2, 2017

I just gave it a try with the current development branch and cannot reproduce the issue locally. I used 1.10.2, GCC 5.4.1 and 3 processes. I also tried with 5 processes, and it still succeeded. Am I missing something?

@BenjaProg
Member Author

@devreal Probably not, it seems the error only appears on my local VM :/
What is "1.10.2" the version number of?
I will have a closer look this weekend.

@devreal
Member

devreal commented Feb 3, 2017

Ahh sorry, yes. I meant OpenMPI 1.10.2, which is the version that comes packaged with my system.

@fuchsto
Member

fuchsto commented Feb 3, 2017

@BenjaProg You could test local sizes before assigning values. Are values in local_sizes[myid] identical to array.lend() - array.lbegin()?
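A minimal sketch of such a check, written so it compiles without DASH or MPI. `local_size_matches` and `FakeLocalArray` are hypothetical names used only for illustration; the only assumption about the real container is that it exposes `lbegin()` and `lend()` as pointers or random-access iterators, as used in the expression above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if the local range [lbegin, lend) of `arr`
// holds exactly `expected` elements.
template <typename ArrayT>
bool local_size_matches(const ArrayT& arr, std::size_t expected) {
    const std::ptrdiff_t n = arr.lend() - arr.lbegin();
    return n >= 0 && static_cast<std::size_t>(n) == expected;
}

// Stand-in for a distributed array's local block, backed by std::vector,
// so the check can be exercised on a single process.
struct FakeLocalArray {
    std::vector<int> data;
    const int* lbegin() const { return data.data(); }
    const int* lend() const { return data.data() + data.size(); }
};
```

In the test program, this would be called on every unit before the assignment loop, for example as `assert(local_size_matches(array, local_sizes[myid]));`, so a mismatch is reported before any out-of-bounds write can happen.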

@BenjaProg
Member Author

There is now a dedicated branch for this bug. A CSRPatternTest is already there and array.lend() - array.lbegin() is being checked as well, but hasn't been a problem so far.

@fuchsto
Member

fuchsto commented Feb 13, 2017

The actual defect is in the allocation in DART; see #280, and CSRPatternTest in #279.

fuchsto closed this as completed Feb 13, 2017
fuchsto added this to the dash-0.3.0 milestone Feb 28, 2017