Improve thread-support of DART/DASH #118
I will take care of the global memory allocation and team management aspects. There is a lightweight, convenient unit test framework for thread-parallel test cases called partest from the siemens/embb ecosystem that I really like. There is also Divine for explicit-state model checking, but I found it practicable only for very minimal, stripped-down use cases. Even then, verification usually takes the better part of a day. (Then again, we have lots of CPU hours to spare on SuperMUC.)
[UPDATE: Added third point] I created a branch (
Fortunately, these issues are anything but exotic; they are classic parallel memory management problems. We don't have mandatory library dependencies and we should keep it that way, but I recommend having a look at EMBB's concepts and methods before working on thread-safe allocation:
@devreal Ah, I remember now: I already sang the EMBB song to you when we discussed task models.
I started working on making the local and global memory allocation in DART thread-safe. So far, I have used mutexes ( I also added With this setup, we rely on the user not to use DART from within a threaded environment if the underlying MPI library does not provide threading support, i.e., there is no explicit mutual exclusion inside DART communication routines. Please have a look at https://github.com/dash-project/dash/compare/bug-118-threadsafety. I have not created a pull request yet because the documentation does not reflect the changes.
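To illustrate the mutex approach described above, here is a minimal sketch of a bump allocator whose bookkeeping is guarded by a single `pthread_mutex_t`. The names `pool_t`, `pool_alloc`, and `run_demo` are hypothetical and do not reflect DART's actual allocator; the sketch only shows that serializing the bookkeeping update prevents lost updates when several threads allocate at once.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a local memory pool; not the DART API. */
typedef struct {
  pthread_mutex_t mutex;    /* serializes access to the fields below */
  size_t          capacity; /* total bytes managed by the pool */
  size_t          offset;   /* next free byte (bump allocation) */
} pool_t;

static void pool_init(pool_t *pool, size_t capacity) {
  pthread_mutex_init(&pool->mutex, NULL);
  pool->capacity = capacity;
  pool->offset   = 0;
}

/* Returns the offset of the new block, or (size_t)-1 if the pool is exhausted. */
static size_t pool_alloc(pool_t *pool, size_t nbytes) {
  size_t result = (size_t)-1;
  pthread_mutex_lock(&pool->mutex);
  if (pool->capacity - pool->offset >= nbytes) {
    result        = pool->offset;
    pool->offset += nbytes;
  }
  pthread_mutex_unlock(&pool->mutex);
  return result;
}

#define NTHREADS 4
#define NALLOCS  1000
#define BLOCK    8

static pool_t pool;

/* Each worker thread performs NALLOCS small allocations. */
static void *worker(void *arg) {
  (void)arg;
  for (int i = 0; i < NALLOCS; ++i) {
    pool_alloc(&pool, BLOCK);
  }
  return NULL;
}

/* Lets NTHREADS threads allocate concurrently and returns the final offset. */
static size_t run_demo(void) {
  pthread_t threads[NTHREADS];
  pool_init(&pool, (size_t)NTHREADS * NALLOCS * BLOCK);
  for (int t = 0; t < NTHREADS; ++t) {
    int rc = pthread_create(&threads[t], NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;
  }
  for (int t = 0; t < NTHREADS; ++t) {
    pthread_join(threads[t], NULL);
  }
  size_t final_offset = pool.offset;
  pthread_mutex_destroy(&pool.mutex);
  return final_offset;
}
```

Calling `run_demo()` returns `NTHREADS * NALLOCS * BLOCK` (32000). Without the lock, concurrent read-modify-write updates of `offset` could be lost or two threads could receive the same block.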
I added documentation on the updated thread-safety restrictions for the DART part and fixed some issues. In the branch, I also added documentation for the methods in From my point of view, this is all that is needed for DART (unless I missed a critical point, that is). As it stands, any DASH/DART operation that involves only communication is thread-safe. Local and global allocation in DART are thread-safe as well. I haven't had a closer look at the roadblocks in DASH with respect to thread-safe global memory allocation, but I suspect that the bookkeeping in the teams might be unsafe. @fuchsto, do you want to look at that or should I take a closer look?
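If the team bookkeeping does turn out to be unsafe, one option (an assumption, not what DASH currently does) is a read-write lock: team lookups are frequent and can proceed in parallel, while registering or removing a team takes the lock exclusively. The table layout and the functions `team_register` and `team_size` below are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* expose pthread_rwlock_t under strict C modes */
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical team registry; names do not correspond to DART/DASH. */
#define MAX_TEAMS 16

typedef struct {
  int id;     /* team identifier */
  int nunits; /* number of units in the team */
} team_entry_t;

static team_entry_t     teams[MAX_TEAMS];
static int              nteams     = 0;
static pthread_rwlock_t teams_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Writer: holds the lock exclusively while the table is modified.
 * Returns 0 on success, -1 if the table is full. */
static int team_register(int id, int nunits) {
  assert(nunits > 0);
  int rc = -1;
  pthread_rwlock_wrlock(&teams_lock);
  if (nteams < MAX_TEAMS) {
    teams[nteams].id     = id;
    teams[nteams].nunits = nunits;
    ++nteams;
    rc = 0;
  }
  pthread_rwlock_unlock(&teams_lock);
  return rc;
}

/* Reader: shared lock, so concurrent lookups do not block each other.
 * Returns the team size, or -1 if the team is unknown. */
static int team_size(int id) {
  int result = -1;
  pthread_rwlock_rdlock(&teams_lock);
  for (int i = 0; i < nteams; ++i) {
    if (teams[i].id == id) {
      result = teams[i].nunits;
      break;
    }
  }
  pthread_rwlock_unlock(&teams_lock);
  return result;
}
```

This only protects the local bookkeeping; it does nothing about the collective nature of team creation itself.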
Some more thoughts I had today on thread-safe global memory allocation (using this issue to document them):

The issue is more severe than I initially thought. While we can make all access to global data structures involved in global memory allocation thread-safe, team-aligned global memory allocation on the same team is inherently not thread-safe. The reason is that global memory allocation is a collective operation, which in turn is not thread-safe. At first glance, the MPI standard mandates that collective operations on the same communicator must not be issued by multiple threads in parallel because communication always happens between two processes, not threads. However, the problem exists even on a higher level: assuming two threads allocate aligned global memory on the same team, the runtime system has no way of deciding which thread contributes to which allocation. So while we can ensure mutual exclusion to prevent data races on the lower levels, we cannot guarantee correctness. It will be up to the user of DASH to ensure that no two allocations on the same team happen in parallel. Similar restrictions apply to Team management and the majority of the In order to resolve this, we would need some form of asynchronous allocation process, in which allocations carry a globally unique identifier (most likely provided by the user) that is used to collect matching contributions on all units and perform the allocations sequentially. However, this can only be done reliably for allocations that are triggered by the user directly, not for temporary allocations in Maybe someone has a better idea on this :)
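The identifier-based matching proposed above can be sketched as a table of pending allocations keyed by a user-supplied ID. Each unit's contribution is recorded under that ID, and an allocation completes only once all units have contributed, regardless of the order in which a unit's threads issued their requests. Everything here (the names, the fixed `NUNITS`, the in-process table) is hypothetical; in a real runtime, contributions from other units would arrive via communication rather than local function calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: allocations are matched by ID, not by call order. */
#define MAX_PENDING 8
#define NUNITS      3

typedef struct {
  unsigned long id;             /* user-supplied, globally unique allocation ID */
  int           in_use;         /* slot currently holds a pending allocation */
  int           ncontrib;       /* number of units that have contributed */
  size_t        nbytes[NUNITS]; /* local size contributed by each unit */
} pending_alloc_t;

static pending_alloc_t pending[MAX_PENDING];
static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;

/* Finds the slot for `id` or claims a free one; caller must hold pending_lock. */
static pending_alloc_t *find_or_claim(unsigned long id) {
  for (int i = 0; i < MAX_PENDING; ++i) {
    if (pending[i].in_use && pending[i].id == id) return &pending[i];
  }
  for (int i = 0; i < MAX_PENDING; ++i) {
    if (!pending[i].in_use) {
      memset(&pending[i], 0, sizeof pending[i]);
      pending[i].id     = id;
      pending[i].in_use = 1;
      return &pending[i];
    }
  }
  return NULL;
}

/* Records the contribution of `unit` to allocation `id`.
 * Returns 1 and stores the summed size in *total once all NUNITS units have
 * contributed, 0 while contributions are missing, -1 if the table is full. */
static int contribute(unsigned long id, int unit, size_t nbytes, size_t *total) {
  assert(unit >= 0 && unit < NUNITS);
  int status = -1;
  pthread_mutex_lock(&pending_lock);
  pending_alloc_t *slot = find_or_claim(id);
  if (slot != NULL) {
    slot->nbytes[unit] = nbytes;
    slot->ncontrib++;
    status = 0;
    if (slot->ncontrib == NUNITS) {
      size_t sum = 0;
      for (int u = 0; u < NUNITS; ++u) sum += slot->nbytes[u];
      *total       = sum;
      slot->in_use = 0; /* complete: the actual allocation would happen here */
      status       = 1;
    }
  }
  pthread_mutex_unlock(&pending_lock);
  return status;
}
```

For example, if unit 0 submits ID 7 before ID 9 while unit 1 submits them in the opposite order, both allocations still complete with the correct per-unit sizes. The sketch does not guard against a unit contributing twice to the same ID, and, as noted above, it only helps for allocations where the user can supply such an ID.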
Hm, actually, multi-threaded global allocation is intentionally not supported because allocation and thread-safety are conceptually unrelated (as in: orthogonal semantics).
Mhh, I think the semantics here are different. According to https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_concurrency.html, it reads:
In particular, this means that you can call for example The text at the link above goes on to read:
We do not pass the team to the DASH algorithms, so the user is not aware of the underlying race condition. But yes, let's schedule a telco on this. I'm in my office all week. Just shoot me an email with the times that work for you.
Related: Execution policies as discussed in #104
The following features in DART and DASH should be improved to support thread-parallel access:
See also #109.