-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate valgrind in CircleCI build target #75
Comments
I would prefer someone already familiar with CircleCI to do the integration as I cannot give this a high priority right now. It will also be much quicker to do for the maintainers of the CI scripts. Maybe @fmoessbauer can take over this task? |
@fmoessbauer is sick for the next days, actually it's the same situation here, it's not like CI is a core task for us either. And valgrind integration is not a mission critical feature of course, so I didn't expect a quick implementation of this issue request, we just should have it in our tasks. |
@devreal But, speaking of it: we should provide a documentation of our CI procedure to make contributions easier for other team members. Documentation of CI is definitely a task for @fmoessbauer and me (@fuchsto). See support issue #76 Once documentation is available, we should discuss the current CI procedure and define all required integration test scenarios in a brief conference call. |
I agree that there should be more than one person understanding the CI. I had a look at some of the scripts yesterday and it was not clear to me where to start with the intergration. Once we have documentation from #76 I'm happy to look into that. |
Perfect, thanks! Yes, the CI scripts need some cleanup, shot myself in the foot quite frequently myself. |
I have been playing around with this today, mainly trying to get the CI and docker working locally (I still cannot build a docker container locally due to network issues). My first shot is at branch sup-75-valgrindci, please have a look at it. I am created a new docker configuration that installs valgrind and sets EXEC_WRAP, which is then used as a prefix to the test executable in dash-test.sh. Comments are welcome. One thing that is still missing: I think it only makes sense to run valgrind under the debug configuration so there is no need to also run the other configurations. I haven't figured out how to pass this information to the scripts so far. |
Nice Job, looks great! I implemented the described behavior by just skipping the main loop if container is I also added the container to dockerhub, as otherwise it cannot be pulled. Have to re-run CI after the build of the container has finished. |
@devreal Awesome! @fmoessbauer Remember to use the valgrind MPI wrappers, otherwise we will get lots of false positives from valgrind in the MPI runtime. |
Don't be happy to early. There are still some todos.
|
Yes, the command looks okay. $ mpirun <mpirun flags> valgrind <valgrind flags> the_actual_application <app flags> |
... and: I added |
Ok, but why the hell are there so many memory errors? And please restart the CI after the new version of the container is available on dockerhub. The build takes ~ 30 minutes... |
Mostly false positives, especially those
Also, lots of reports from OpenMP and (still) MPI runtimes. Those are not related to DASH code. Ah, apparently we should disable logging for valgrind, the |
According to the valgrind docs LD_PRELOAD=$prefix/lib/valgrind/libmpiwrap-<platform>.so \
mpirun [args] $prefix/bin/valgrind ./my_application |
@fmoessbauer Awwkay ... sigh ... so, Ubuntu Xenial does not provide the valgrind MPI wrappers as a package (valgrind-mpi), so we will have to build them manually. That sucks. The process is described in the valgrind docs: Still, this won't eliminate all false positives. From the docs:
|
@devreal @fmoessbauer Documenting the build steps in the DASH wiki: For valgrind support in OpenMPI2: |
@fuchsto : After disabling the hanging tests (see #63) the CI finishes. However it fails as there are many leaks reported. It is not clear to me in which order valgrind and openmpi should be build. Also the commands in the dash wiki are contradicting. I cannot build valgrind with mpi support and mpi with valgrind support at the same time as this is a circular statement. |
@fmoessbauer I agree, it's not like I invented it. Currently I build OpenMPI twice: first to obtain |
... but even then I still get lots of false positives reported from valgrind. Let's not waste any more time into this, I don't see how we could make it useful in CI at the moment. |
Agree. We keep the container, but do not run them in CI. I will make a new branch where I pick everything useful and make a PR. |
Nice, we can consider this done then. |
Mhh, seems like I cannot follow-up with your discussions over the weekend (too much private life here I guess) and coming back on Monday finding the issue closed already... I think we can still make good use of Valgrind to detect the use of uninitialized values inside DASH/DART in the CI tests. Using the filter file as described in https://www.open-mpi.org/faq/?category=debugging#valgrind_clean I do not see false positive reports on use of uninitialized values from OpenMPI anymore. There are still many warnings on leaked memory but we can disable the leak checks. I will also contact the OpenMPI devs to ask for the leaking window memory that is reported by Valgrind. Just my 2 cents. |
@devreal That would be great. If you can provide us with a full valgrind command that yields useful output, I will gladly integrate it in CI. |
I sent a report to the OpenMPI developer and am waiting for a reply. Will get back here as soon as the issue of unfiltered false positives is resolved. |
In case anyone is interested, there is an ongoing discussion in a PR for OpenMPI (open-mpi/ompi#2175) to plug all memory leaks, eventually. |
Nice! So at least it's not an issue on our side. |
There is some progress and it might be fixed in 2.0.2 according to the release notes
|
I will try to rerun with 2.0.2 this week to see whether we get a clean Valgrind run by now. |
Valgrind should be added in at least a minimal CI target to check for memory leaks and consistency defects.
The text was updated successfully, but these errors were encountered: