Thread-safety of DART functions #109
Comments
Thread-safe by default without any restriction on safe operations whatsoever is a bad idea, that's why the STL doesn't do it.
... does not specify if this refers to function calls on shared resources. I think that is what this spec is about. There must not exist any hidden shared states (global vars etc.) This also would correspond to the STL's understanding of thread safety in containers. Here's the meat:
Note that the second part
... refers to the responsibility of the programmer, not the STL. |
@fuerlinger @knuedd @colinglass @jgraciahlrs |
I'm not particularly worried about locks on group manipulation routines. Team creation will be slow anyhow, and if you are doing it too often, then you are doing something wrong. However, data access operations should certainly be as fast as possible, and we should (and can) avoid locks there. Similar to the STL, DASH data structures are not concurrent data structures. That's a whole different story that we can look into eventually.
I have not looked into this too closely, but it looks like the STL allows concurrent access to container elements as long as different threads access different data elements. http://en.cppreference.com/w/cpp/container I think a similar approach would be great (and natural) for DASH.
However, even this approach requires that we use MPI_Init_thread(THREAD_MULTIPLE), since several threads might concurrently issue MPI_Put and MPI_Get operations. As far as I know, THREAD_MULTIPLE can be significantly slower than THREAD_SINGLE for many MPI implementations, so that might be a bit of a problem. So maybe the cleanest solution would be to give the user the choice of threading model, i.e., dash_init(THREAD_MULTIPLE) vs. dash_init(THREAD_SINGLE). |
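To make the idea of a user-selected threading model concrete, here is a minimal sketch of how a requested level could be reconciled with what the backend supports. Everything in it is an assumption for illustration: `ThreadLevel`, `init_threads`, and `concurrent_calls_allowed` are hypothetical names and not part of DART or DASH, and `backend_max` merely stands in for the `provided` value that `MPI_Init_thread` reports.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical concurrency levels, ordered from least to most permissive.
enum class ThreadLevel { Single = 0, Multiple = 1 };

// Hypothetical init helper: grants the lower of what the caller requested
// and what the communication backend reports as supported.
ThreadLevel init_threads(ThreadLevel requested, ThreadLevel backend_max) {
  return static_cast<int>(requested) <= static_cast<int>(backend_max)
             ? requested
             : backend_max;
}

// Concurrent communication calls are only allowed if Multiple was granted.
bool concurrent_calls_allowed(ThreadLevel granted) {
  return granted == ThreadLevel::Multiple;
}

}  // namespace sketch
```

The point of returning the granted level is that the caller can check it and fall back to single-threaded communication when the backend cannot provide more.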
Yes, concurrent writes to independent container elements are valid, but that's the same thing: P.S.: @fuerlinger Fixed a funny typo in your link. |
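For illustration, the STL guarantee mentioned here can be sketched with a plain `std::vector`: each thread writes only to its own block of indices, so no element is shared and no lock is needed. The helper name `fill_disjoint` is hypothetical, and the sketch assumes `num_threads` is greater than zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

namespace sketch {

// Fills a vector of n elements using num_threads threads (num_threads > 0).
// Each thread writes only to its own contiguous block of indices.
std::vector<int> fill_disjoint(std::size_t n, unsigned num_threads) {
  std::vector<int> data(n, 0);  // sized up front; no reallocation later
  std::vector<std::thread> workers;
  const std::size_t chunk = (n + num_threads - 1) / num_threads;
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t begin = std::min(n, static_cast<std::size_t>(t) * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    workers.emplace_back([&data, begin, end] {
      for (std::size_t i = begin; i < end; ++i) {
        data[i] = static_cast<int>(i) * 2;  // distinct element per write
      }
    });
  }
  for (auto& w : workers) w.join();  // join before reading the results
  return data;
}

}  // namespace sketch
```

Note that the container is never resized while the threads run; operations that modify the container itself would still need external synchronization.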
Hm, it's a bit of a shame, really. This use case is a prime example for STL execution policies: |
This is currently the way I'm doing it on my tasking branch, except that the function is called From what I can see in the code, most functions are re-entrant on the DART level and can be called by multiple threads. However, at the moment there is some global state that is maintained for teams and segments so any function that creates a new team or allocates a segment is not thread-safe. Since these shared objects are not visible to the programmer they cannot be regarded thread-safe with respect to the definition given by @fuchsto. I also don't see a way to get rid of this hidden state without major redesign of the API. If we want to keep the promise of thread-safety for the upcoming release I can guard these shared objects through locks, which should be easily doable. |
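The lock-based guard proposed here could look roughly like the following sketch. It is not the DART implementation: `team_registry`, `team_create`, and `create_teams_concurrently` are hypothetical stand-ins for the hidden team and segment bookkeeping, chosen only to show one mutex serializing updates to shared state.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

namespace sketch {

// Hypothetical stand-in for hidden global team/segment bookkeeping.
std::mutex registry_mutex;
std::vector<int> team_registry;
int next_team_id = 0;

// Hypothetical team-creation call; the lock serializes registry updates.
int team_create() {
  std::lock_guard<std::mutex> lock(registry_mutex);
  const int id = next_team_id++;
  team_registry.push_back(id);
  return id;
}

// Calls team_create() per_thread times from each of num_threads threads and
// returns how many entries were added to the registry.
std::size_t create_teams_concurrently(unsigned num_threads, int per_thread) {
  std::size_t before = 0;
  {
    std::lock_guard<std::mutex> lock(registry_mutex);
    before = team_registry.size();
  }
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([per_thread] {
      for (int i = 0; i < per_thread; ++i) team_create();
    });
  }
  for (auto& w : workers) w.join();
  std::lock_guard<std::mutex> lock(registry_mutex);
  return team_registry.size() - before;
}

}  // namespace sketch
```

A single coarse lock like this is the simplest option and matches the observation above that team creation is slow anyway; it would not be appropriate for data access paths.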
I don't even see a way to make the state of teams a non-global state even with drastic redesigns. Global allocation in DART must be refactored for thread safety, but it should be refactored anyways. |
I agree that team management operations are not required to be thread-safe, although one might think about initializing different parts of the code in parallel using threads (we all know this kind of user, right?). We need to specify which parts are meant to be thread-safe and which are not. In the future, global allocations need to be thread-safe, since they can easily be hidden in some DASH functionality (e.g., dash::algorithm) that is called from within a thread-parallel region. However, we should postpone that to the next release, as it will be invasive. For this release, though, my main issue is the underlying MPI layer and, with that, the communication operations. Since we currently have no way of signaling whether the underlying MPI supports thread-multiple, I see three options for the upcoming release:
Personally I am slightly in favor of 2a but I have code in place to implement 2b as well. |
Yes, totally agree. For the upcoming release, I'd just add documentation of thread-safe/-unsafe functions and not change anything in the implementation for now. |
Agreed. RFC:
I will also open a ticket to track the implementation of thread-safe global allocation after this release. |
That would work, but we can elaborate some more already.
Example:
|
@devreal Off-topic remark: we use the "compact" format in Doxygen instead of Example: /**
* Whether the communication sub-system is self-aware.
*
* Falsification test if the communication sub-system reached singularity.
*
* \returns \c 1 if the sub-system is self-aware and open about it or
* \c 0 if the sub-system is either not self-aware or decided to lie about it
*/
int dart_is_self_aware(); |
Thanks for the hint, will fix it within branch sup-109-dart-threadsafety. |
@devreal No need for a fix, |
Having given it more thought, I think it does not make sense to distinguish between read and write access in DART. You either call a DART function from multiple threads or you don't. The STL definition of thread-safety does not apply to the DART API, although it does apply to DASH of course. For DART, I think we should specify the following three:
Note that the MPI standard mandates that a normal Also note, that we might have to apply this scheme to groups of functions, i.e., even with |
@devreal sounds good to me. In the cautionary note on thread safety we should also note that local-view access to DASH data structures does not involve DART calls and can thus safely be performed in a thread-parallel way.
dash::Array<int> arr(...);
#pragma omp parallel for // OK to parallelize since we're working on .local
for( auto i=0; i<arr.local.size(); i++ ) {
  arr.local[i]=foo(i);
}
@fuchsto The above is true for |
@fuerlinger Yes, we could even specify this as a general constraint:
However,
|
Correct, |
With the documentation updates by @devreal we can safely consider this issue resolved. |
The documentation states the following:
This is problematic for two reasons:
MPI_Init_thread
to check whether this is supported) neither do we enforce multi-threaded initialization at the moment. I propose to remove this sentence for now until we have support for tasking/multi-threading in DART in a later version. I would favor a relaxed policy, e.g., all communication operations should be thread-safe.
Related to #106