Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate valgrind in CircleCI build target #75

Open
fuchsto opened this issue Nov 8, 2016 · 28 comments
Open

Integrate valgrind in CircleCI build target #75

fuchsto opened this issue Nov 8, 2016 · 28 comments

Comments

@fuchsto
Copy link
Member

fuchsto commented Nov 8, 2016

Valgrind should be added in at least a minimal CI target to check for memory leaks and consistency defects.

@devreal
Copy link
Member

devreal commented Nov 8, 2016

I would prefer someone already familiar with CircleCI to do the integration as I cannot give this a high priority right now. It will also be much quicker to do for the maintainers of the CI scripts. Maybe @fmoessbauer can take over this task?

@fuchsto
Copy link
Member Author

fuchsto commented Nov 8, 2016

@fmoessbauer is sick for the next days, actually it's the same situation here, it's not like CI is a core task for us either.
Also, I would prefer if the whole team, or at least more than 1-2 members, understand CI, otherwise the team in Munich degenerates to second level support. We have been new to CI two weeks ago as well.

And valgrind integration is not a mission critical feature of course, so I didn't expect a quick implementation of this issue request, we just should have it in our tasks.

@fuchsto
Copy link
Member Author

fuchsto commented Nov 8, 2016

@devreal But, speaking of it: we should provide a documentation of our CI procedure to make contributions easier for other team members.

Documentation of CI is definitely a task for @fmoessbauer and me (@fuchsto).

See support issue #76

Once documentation is available, we should discuss the current CI procedure and define all required integration test scenarios in a brief conference call.

@fuchsto fuchsto added support and removed feature labels Nov 8, 2016
@devreal
Copy link
Member

devreal commented Nov 9, 2016

I agree that there should be more than one person understanding the CI. I had a look at some of the scripts yesterday and it was not clear to me where to start with the intergration. Once we have documentation from #76 I'm happy to look into that.

@fuchsto
Copy link
Member Author

fuchsto commented Nov 9, 2016

Perfect, thanks!

Yes, the CI scripts need some cleanup, shot myself in the foot quite frequently myself.
We will consolidate deployment first to make things less cryptic.

@devreal
Copy link
Member

devreal commented Nov 11, 2016

I have been playing around with this today, mainly trying to get the CI and docker working locally (I still cannot build a docker container locally due to network issues). My first shot is at branch sup-75-valgrindci, please have a look at it. I am created a new docker configuration that installs valgrind and sets EXEC_WRAP, which is then used as a prefix to the test executable in dash-test.sh. Comments are welcome.

One thing that is still missing: I think it only makes sense to run valgrind under the debug configuration so there is no need to also run the other configurations. I haven't figured out how to pass this information to the scripts so far.

@fmoessbauer
Copy link
Member

fmoessbauer commented Nov 11, 2016

Nice Job, looks great! I implemented the described behavior by just skipping the main loop if container is openmpi2_vg and target is not Debug. I also extended the documentation according to this.

I also added the container to dockerhub, as otherwise it cannot be pulled. Have to re-run CI after the build of the container has finished.

@fuchsto
Copy link
Member Author

fuchsto commented Nov 11, 2016

@devreal Awesome!

@fmoessbauer Remember to use the valgrind MPI wrappers, otherwise we will get lots of false positives from valgrind in the MPI runtime.

@fmoessbauer
Copy link
Member

Don't be happy to early. There are still some todos.

  • Currently the command is executed the following way (from CI output). Is that correct?
    mpirun --allow-run-as-root --map-by core -n 1 valgrind --track-origins=yes --leak-check=full /opt/dash/build-ci/20161111-180312/Debug/bin/dash/test/mpi/dash-test-mpi
  • We need a detection of leak messages in the logs (can be easily added in dash-ci.sh)
  • guarantee that not to much output is produces. See this build for details.

@fuchsto
Copy link
Member Author

fuchsto commented Nov 11, 2016

Yes, the command looks okay.
The overall jango is:

$ mpirun <mpirun flags> valgrind <valgrind flags> the_actual_application <app flags>

@fuchsto
Copy link
Member Author

fuchsto commented Nov 11, 2016

... and: I added valgrind-mpi to the pre-install packages in the openmpi2_vg config.

@fmoessbauer
Copy link
Member

fmoessbauer commented Nov 11, 2016

Ok, but why the hell are there so many memory errors?

And please restart the CI after the new version of the container is available on dockerhub. The build takes ~ 30 minutes...

@fuchsto
Copy link
Member Author

fuchsto commented Nov 11, 2016

Mostly false positives, especially those

Conditional jump or move depends on uninitialised value(s)

Also, lots of reports from OpenMP and (still) MPI runtimes. Those are not related to DASH code.

Ah, apparently we should disable logging for valgrind, the sprintf in the LOG macros seem to confuse it.
Or there might be unclean code in the LOG_MESSAGE macro?

@fuchsto
Copy link
Member Author

fuchsto commented Nov 11, 2016

According to the valgrind docs
(http://valgrind.org/docs/manual/mc-manual.html#mc-manual.mpiwrap), we need:

LD_PRELOAD=$prefix/lib/valgrind/libmpiwrap-<platform>.so   \
           mpirun [args] $prefix/bin/valgrind ./my_application

@fuchsto
Copy link
Member Author

fuchsto commented Nov 11, 2016

@fmoessbauer Awwkay ... sigh ... so, Ubuntu Xenial does not provide the valgrind MPI wrappers as a package (valgrind-mpi), so we will have to build them manually.

That sucks.

The process is described in the valgrind docs:
http://valgrind.org/docs/manual/mc-manual.html#mc-manual.mpiwrap

Still, this won't eliminate all false positives. From the docs:

The wrappers should reduce Memcheck's false-error rate on MPI applications.
Because the wrapping is done at the MPI interface, there will still potentially be a
large number of errors reported in the MPI implementation below the interface.
The best you can do is try to suppress them.

@fuchsto fuchsto removed this from the dash-0.3.0 milestone Nov 12, 2016
@fuchsto
Copy link
Member Author

fuchsto commented Nov 12, 2016

@devreal @fmoessbauer
Testing with from-scratch builds of valgrind and openmpi2 now.
OpenMPI has to be build with valgrind support configured.

Documenting the build steps in the DASH wiki:
http://doc.dash-project.org/internal/tools/valgrind

For valgrind support in OpenMPI2:
https://www.open-mpi.org/faq/?category=debugging#memchecker_what

@fmoessbauer
Copy link
Member

fmoessbauer commented Nov 12, 2016

@fuchsto : After disabling the hanging tests (see #63) the CI finishes. However it fails as there are many leaks reported. It is not clear to me in which order valgrind and openmpi should be build. Also the commands in the dash wiki are contradicting. I cannot build valgrind with mpi support and mpi with valgrind support at the same time as this is a circular statement.

@fuchsto
Copy link
Member Author

fuchsto commented Nov 12, 2016

@fmoessbauer I agree, it's not like I invented it. Currently I build OpenMPI twice: first to obtain mpicc used to build valgrind, then again to link with the valgrind installation. Yup.

@fuchsto
Copy link
Member Author

fuchsto commented Nov 12, 2016

... but even then I still get lots of false positives reported from valgrind. Let's not waste any more time into this, I don't see how we could make it useful in CI at the moment.

@fmoessbauer
Copy link
Member

Agree. We keep the container, but do not run them in CI. I will make a new branch where I pick everything useful and make a PR.

@fuchsto
Copy link
Member Author

fuchsto commented Nov 14, 2016

Nice, we can consider this done then.

@fuchsto fuchsto closed this as completed Nov 14, 2016
@devreal
Copy link
Member

devreal commented Nov 14, 2016

Mhh, seems like I cannot follow-up with your discussions over the weekend (too much private life here I guess) and coming back on Monday finding the issue closed already...

I think we can still make good use of Valgrind to detect the use of uninitialized values inside DASH/DART in the CI tests. Using the filter file as described in https://www.open-mpi.org/faq/?category=debugging#valgrind_clean I do not see false positive reports on use of uninitialized values from OpenMPI anymore. There are still many warnings on leaked memory but we can disable the leak checks. I will also contact the OpenMPI devs to ask for the leaking window memory that is reported by Valgrind.

Just my 2 cents.

@fuchsto fuchsto reopened this Nov 14, 2016
@fuchsto
Copy link
Member Author

fuchsto commented Nov 14, 2016

@devreal That would be great. If you can provide us with a full valgrind command that yields useful output, I will gladly integrate it in CI.

@devreal
Copy link
Member

devreal commented Nov 14, 2016

I sent a report to the OpenMPI developer and am waiting for a reply. Will get back here as soon as the issue of unfiltered false positives is resolved.

@devreal
Copy link
Member

devreal commented Nov 22, 2016

In case anyone is interested, there is an ongoing discussion in a PR for OpenMPI (open-mpi/ompi#2175) to plug all memory leaks, eventually.

@fuchsto
Copy link
Member Author

fuchsto commented Nov 22, 2016

Nice! So at least it's not an issue on our side.

@fmoessbauer
Copy link
Member

There is some progress and it might be fixed in 2.0.2 according to the release notes

Fix a problem with Valgrind false positives when using Open MPI's internal
memchecker.

@devreal
Copy link
Member

devreal commented Feb 13, 2017

I will try to rerun with 2.0.2 this week to see whether we get a clean Valgrind run by now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants