-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segment IDs and dynamic global data structures #221
Comments
Yes, this definitely has to be fixed. Bugs resulting from this limit would be obscure to a cruel extent, it's adequate to call it sabotage. I don't know of a reason for this, and if there is, it can't be a good one. The problem with re-using segment IDs is their unambiguousness: In conclusion, to reuse a segment ID, it must first be ensured that no unit references a memory space involving this segment ID. This requirement can be reduced to safe resource reclamation and fortunately there are known good strategies to resolve this is kind of problem:
|
Memory is a bit blurry here but I think there is no particular reason why segment IDs are not re-used in the current implementation - it was just the most simple way to get a new unique ID to use a counter. I would not worry too much about global pointers or global references that might still exist and refer to memory that is now invalid or belongs to a new and different data structure. That situation is not different from standard C/C++. But we need to make sure that all units involved agree on the same segment ID and the current counting scheme guarantees that because there is a counter per team (not just a single one per unit), right? Keeping a list of used segment ids (again per team) and choosing the smallest integer not in the list or something like that should work. |
With the current implementation, units neither agree on the same segment ID during an allocation nor is there one segment ID counter for each team (the team ID is not part of the Leaving the aspect of directly copying For reusing segment IDs, I propose to push used segment IDs on a stack during deallocation and try to pop a free ID during an allocation. If the stack is empty we use the local segment ID counter. It might also be worth investigating the algorithms for whether using a |
@devreal @fuerlinger You both are right, I mixed up the concerns there. |
Some thoughts on how to solve this: the easiest solution would be to (re-)integrate the team ID into the To make this happen, the following changes would be required:
Without binding segment IDs to teams explicitly, we'd have to brute-force through O(2^16) segment IDs to find a free segment ID upon allocation. Maybe I'm missing another obvious solution, though... |
Including the team ID is probably a good idea as it makes everything else easier. An alternative would be to keep track of the max. currently used ID on each unit and then to do a max reduction across all units in the team. Although one can probably come up with scenarios where that scheme runs out of IDs quickly. For flags, I don't think we have a very compelling use case yet (correct me if I'm wrong, @rkowalewski ), just the idea that it could be useful in certain situations (maybe in the context of NVRAM). |
The DART interface provides two different routines for collective memory allocation:
As far as I can remember the original purpose of the aligned variant was to commit on a common segment ID among all units in a team, while the non-aligned version does not require this. Considering the suggested solution above raises the question if there is any use case for the non-aligned version, since all allocation will be "aligned" (common segment ID) anyway. Regarding the suggestion to move out the flags and store it internally in the segment is also more a design decision than a technical issue. From the implementation side it works for my use case as well. However, the purpose of the flags is to store additional information about the global pointer. As an example, consider permission flags (e.g. read only), the underlying memory type (RAM, HBM, NVRAM, fault-resilient memory regions, ...) and so forth. Similar to POSIX file descriptors. With the flags we get such information efficiently in DASH with bit mask operations. One could argue that this metadata refers to the whole segment and not a specific global pointer which can theoretically point to any address in the global memory space. So like in malloc we can think about a segment header in DART to store this information. Then, an appropriate DART interface to query such information based on a global pointer is required. This in turn enables to replace the flags with the teamId in our global pointer structure. Another possibility I have discussed with @devreal is to increase the global pointer to, let's say, 256 bits. Then we would neither have the limit for only 2**16 teams (which is easy to reach in large HPC clusters) nor would we need to move out the flags. It would also simplify other use cases if we can reserve 128 bits for the offset / address. However, there was probably a reason to restrict the size of a global pointer to 128 bits? @fuerlinger |
The two forms of memory allocation are
The first one is not collective, the second one is collective,
|
As for the size of global pointers I would try to keep it small just for practical reasons. I would suggest staying at 128 bits if possible. I agree that there is potential value in having flags for the global pointers and that the flags should be encoded in the global pointer itself. How about we take 8 bits from the |
As said yesterday during the phone call with @rkowalewski, most of the flags I can imagine are not specific to a If we agree to shave off 8 bit from the unit ID (2^24 ~ 16M units should be enough for now), I would prefer to use them to extent the team ID to also have 16M teams (esp. since they are not reused). This would give us enough headroom for full-machine runs and the safe use of |
I don't really see the need for huge numbers of teams (per unit), maybe I'm overlooking something. Say we have a program with 1024 units and we do a recursive binary split all the way down to teams of size 1. Then each unit is a member of only 11 teams. We also don't need a huge number of unique team IDs because team IDs don't need to be globally unique. Back during the design phase for DART we came up with a scheme for locally unique team IDs - not sure if this is what is implemented right now but the idea was that team IDs are unique only with respect to their parent team. |
@fuerlinger I just checked the implementation: yes team IDs are globally unique (there is an allreduce to find the highest available ID, lower available IDs skipped by a unit). And team IDs are not reused. If team IDs were not globally unique, the proposed implementation would suffer from the same issue as we have with segment IDs right now. A combination of team ID and segment ID would not be unique. My main concern is the locality split that as far as I understand splits a team based on a locality level. If this level was the socket level, we would hit the 64K limit pretty soon (given team IDs are unique, see above). Maybe I am missing something, @fuchsto ? |
But the allreduce is not over all units, just over the parent team, I assume? Here is an example of what should be allowed in the model that I mentioned: Say I have |
You're right, my mistake. The allreduce is only on the parent team, not globally. |
I've been massaging the team and segment related code for a while now and think I got it working, please see PR #257. As it stands right now, global pointer can now be shared among the members of the team that created the allocation (or globally in case of local allocations). So in fact, a global pointer is not really global but I think we can live with this limitation. I noticed that the Any strong feelings on:
|
Taking the discussion on flags over here again: The flags field in the However, that brings me to my actual question: how are we gonna manage the flag namespace? Is the flags field only intended for internal use? Are users meant to change flags on the fly? Do we have an enum that describes the positions of the flags including the highest available bit to avoid clashes? Since @rkowalewski is the only user of flags right now, I throw this question at you ;) How is it done atm? |
@devreal @fuerlinger @rkowalewski
I lost track: global pointers contain a team ID in your variant, right? |
That was my suggestion, yes. So far we had global unit IDs so changing to team-relative unit IDs would be a major change in the semantics of the interface since they can be accessed directly by user code. I just wanted to confirm that this is what we want. |
I'm with you on this. |
@fuchsto Never mind, since we only have 24 bit for the unit ID we cannot use typed unit IDs in |
Hm, why is that? A bitfield int : 1 ; int : 23 should be plenty for team-relazive unit ids? |
My idea was to have |
While investigating #118 I noticed that segment IDs in DART are not reused, limiting the total number of team-aligned global memory allocations to 2^15 (32k). At the same time, I noticed that some DASH algorithms allocate short-lived global data structures, see dash::accumulate as an example. I haven't checked where else this is the case.
Consider long-running real-world applications with time-stepping loop iterations beyond 64k, each loop iteration using the convenience of the dash algorithms. We will easily run into issues there, not only because the current implementation in DART does not check for overflows of segment IDs.
This might not be relevant for the 0.3.0 release but we need to track this. Is there any reason why segment IDs are not reused at the moment?
The text was updated successfully, but these errors were encountered: