Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segment IDs and dynamic global data structures #221

Closed
devreal opened this issue Dec 22, 2016 · 22 comments
Closed

Segment IDs and dynamic global data structures #221

devreal opened this issue Dec 22, 2016 · 22 comments
Assignees
Milestone

Comments

@devreal
Copy link
Member

devreal commented Dec 22, 2016

While investigating #118 I noticed that segment IDs in DART are not reused, limiting the total number of team-aligned global memory allocations to 2^15 (32k). At the same time, I noticed that some DASH algorithms allocate short-lived global data structures, see dash::accumulate as an example. I haven't checked where else this is the case.

Consider long-running real-world applications with time-stepping loop iterations beyond 64k, each loop iteration using the convenience of the dash algorithms. We will easily run into issues there, not only because the current implementation in DART does not check for overflows of segment IDs.

This might not be relevant for the 0.3.0 release but we need to track this. Is there any reason why segment IDs are not reused at the moment?

@fuchsto
Copy link
Member

fuchsto commented Dec 22, 2016

Yes, this definitely has to be fixed. Bugs resulting from this limit would be obscure to a cruel extent, it's adequate to call it sabotage.

I don't know of a reason for this, and if there is, it can't be a good one.
The only reason I can think of is that it's not trivial with respect to consistency of the global address space.

The problem with re-using segment IDs is their unambiguousness:
Reusing a segment ID finally means that the same address would reference different memory spaces in a program's runtime.
This is not a problem if there is a global consensus on the respectively used memory space, that is: if all units "aggree" that global pointers on the reused segment now refer to a different memory space.

In conclusion, to reuse a segment ID, it must first be ensured that no unit references a memory space involving this segment ID.

This requirement can be reduced to safe resource reclamation and fortunately there are known good strategies to resolve this is kind of problem:

  • If deallocation is synchronized: can use reference counting
  • If deallocation is asymmetric: hazard pointers

@fuerlinger
Copy link
Contributor

Memory is a bit blurry here but I think there is no particular reason why segment IDs are not re-used in the current implementation - it was just the most simple way to get a new unique ID to use a counter.

I would not worry too much about global pointers or global references that might still exist and refer to memory that is now invalid or belongs to a new and different data structure. That situation is not different from standard C/C++.

But we need to make sure that all units involved agree on the same segment ID and the current counting scheme guarantees that because there is a counter per team (not just a single one per unit), right? Keeping a list of used segment ids (again per team) and choosing the smallest integer not in the list or something like that should work.

@devreal
Copy link
Member Author

devreal commented Dec 23, 2016

With the current implementation, units neither agree on the same segment ID during an allocation nor is there one segment ID counter for each team (the team ID is not part of the gptr, so that would not work). In fact, the segment ID is used locally to identify all metadata of a global allocation. Having the same segment ID on all units for a single global memory allocation would make copying gptr to other units trivial. However, with allocations on different teams, finding a common segment ID can become challenging or might even turn out to be impossible if enough global allocations are in flight.

Leaving the aspect of directly copying gptr to other units aside, there is no need for common segment IDs on all units. And I also agree with @fuerlinger that dangling global references are not our problem, just as dangling references to free'd memory is not a problem of the libc.

For reusing segment IDs, I propose to push used segment IDs on a stack during deallocation and try to pop a free ID during an allocation. If the stack is empty we use the local segment ID counter.

It might also be worth investigating the algorithms for whether using a dash::Array is always necessary or whether it's more efficient to use local data and DART collective operations where possible. We should keep temporary global memory allocations in DASH functionality to a minimum.

@fuchsto
Copy link
Member

fuchsto commented Dec 26, 2016

@devreal @fuerlinger You both are right, I mixed up the concerns there.
Allocation does not care about consistency and consensus, that the containers' or algorithms' job.
So: yes, we do need the mechanisms I mentioned after all, but that's an unrelated problem.

@devreal
Copy link
Member Author

devreal commented Jan 23, 2017

Some thoughts on how to solve this: the easiest solution would be to (re-)integrate the team ID into the gptr struct. This would make it easy for a team to find consesus on a segment ID upon allocation. Re-use of segment IDs could be feasible through a team-specific stack containing segment IDs previously allocated and free'd. The combination of a team ID and a segment ID will uniquely identify the segment.

To make this happen, the following changes would be required:

  1. Replace the flags field in the gptr by the team ID. Flags can be stored internally in DART and queried through the DART API. @rkowalewski would this work for your use-case?
  2. We have to restrict the number of teams to 2^16. Not sure whether this could be a show-stopper as constructing a potentially large number of teams is quite easily done through the team split operations.

Without binding segment IDs to teams explicitly, we'd have to brute-force through O(2^16) segment IDs to find a free segment ID upon allocation. Maybe I'm missing another obvious solution, though...

@fuerlinger
Copy link
Contributor

Including the team ID is probably a good idea as it makes everything else easier. An alternative would be to keep track of the max. currently used ID on each unit and then to do a max reduction across all units in the team. Although one can probably come up with scenarios where that scheme runs out of IDs quickly.

For flags, I don't think we have a very compelling use case yet (correct me if I'm wrong, @rkowalewski ), just the idea that it could be useful in certain situations (maybe in the context of NVRAM).

@rkowalewski
Copy link

rkowalewski commented Jan 23, 2017

The DART interface provides two different routines for collective memory allocation:

  • dart_team_memalloc
  • dart_team_memalloc_aligned

As far as I can remember the original purpose of the aligned variant was to commit on a common segment ID among all units in a team, while the non-aligned version does not require this. Considering the suggested solution above raises the question if there is any use case for the non-aligned version, since all allocation will be "aligned" (common segment ID) anyway.

Regarding the suggestion to move out the flags and store it internally in the segment is also more a design decision than a technical issue. From the implementation side it works for my use case as well. However, the purpose of the flags is to store additional information about the global pointer. As an example, consider permission flags (e.g. read only), the underlying memory type (RAM, HBM, NVRAM, fault-resilient memory regions, ...) and so forth. Similar to POSIX file descriptors. With the flags we get such information efficiently in DASH with bit mask operations. One could argue that this metadata refers to the whole segment and not a specific global pointer which can theoretically point to any address in the global memory space. So like in malloc we can think about a segment header in DART to store this information. Then, an appropriate DART interface to query such information based on a global pointer is required. This in turn enables to replace the flags with the teamId in our global pointer structure.

Another possibility I have discussed with @devreal is to increase the global pointer to, let's say, 256 bits. Then we would neither have the limit for only 2**16 teams (which is easy to reach in large HPC clusters) nor would we need to move out the flags. It would also simplify other use cases if we can reserve 128 bits for the offset / address. However, there was probably a reason to restrict the size of a global pointer to 128 bits? @fuerlinger

@fuerlinger
Copy link
Contributor

The two forms of memory allocation are

  • dart_memalloc()
  • dart_team_memalloc_aligned()

The first one is not collective, the second one is collective, aligned was meant to signify that each unit contributes the same amount of memory to the overall allocation. aligned might be a misnomer but that was what we came up with back when this was first designed. The advantage of an aligned allocation is that one can compute the address of any location in the global allocation (I.e., a global pointer) by simple arithmetic. If the allocation is not aligned (same size) then each unit needs to know the size of each other's contribution and it gets more complex to compute a global pointer. dart_team_memalloc_aligned() fits well with the use case for dash::Array and dash::Matrix with a (mostly) balanced distribution of data elements.

dart_team_memalloc (without aligned) doesn't exist yet but there might be a use case for it for unbalanced containers.

@fuerlinger
Copy link
Contributor

As for the size of global pointers I would try to keep it small just for practical reasons. I would suggest staying at 128 bits if possible. I agree that there is potential value in having flags for the global pointers and that the flags should be encoded in the global pointer itself.

How about we take 8 bits from the Unit ID field and dedicate it to flags? That leaves 24 bits for unit IDs which should be plenty for every practical usage scenario at least in the nearer future.

@devreal
Copy link
Member Author

devreal commented Jan 24, 2017

As said yesterday during the phone call with @rkowalewski, most of the flags I can imagine are not specific to a gptr but specific to an allocation, i.e., a segment. Hence, there is no need to store them in a gptr. What if the user changes the flags manually? The allocation will still have the semantics of the original flags... Do we have any use-case for purely pointer-specific (allocation-independent) flags?

If we agree to shave off 8 bit from the unit ID (2^24 ~ 16M units should be enough for now), I would prefer to use them to extent the team ID to also have 16M teams (esp. since they are not reused). This would give us enough headroom for full-machine runs and the safe use of locality split operations. I'd like to avoid running into limitations in the near future because of size limitations. Full machine runs should be considered a possibility at least.

@fuerlinger
Copy link
Contributor

I don't really see the need for huge numbers of teams (per unit), maybe I'm overlooking something. Say we have a program with 1024 units and we do a recursive binary split all the way down to teams of size 1. Then each unit is a member of only 11 teams.

We also don't need a huge number of unique team IDs because team IDs don't need to be globally unique. Back during the design phase for DART we came up with a scheme for locally unique team IDs - not sure if this is what is implemented right now but the idea was that team IDs are unique only with respect to their parent team.

@devreal
Copy link
Member Author

devreal commented Jan 24, 2017

@fuerlinger I just checked the implementation: yes team IDs are globally unique (there is an allreduce to find the highest available ID, lower available IDs skipped by a unit). And team IDs are not reused.

If team IDs were not globally unique, the proposed implementation would suffer from the same issue as we have with segment IDs right now. A combination of team ID and segment ID would not be unique.

My main concern is the locality split that as far as I understand splits a team based on a locality level. If this level was the socket level, we would hit the 64K limit pretty soon (given team IDs are unique, see above). Maybe I am missing something, @fuchsto ?

@fuerlinger
Copy link
Contributor

But the allreduce is not over all units, just over the parent team, I assume?

Here is an example of what should be allowed in the model that I mentioned:

snip

Say I have dash::Team::All() and then I have a team per node in a large machine. If I then want to further split one particular node team, then that split operation should not involve the whole machine (to find the team ID), that would defeat the purpose of the hierarchical organization.

@devreal
Copy link
Member Author

devreal commented Jan 24, 2017

You're right, my mistake. The allreduce is only on the parent team, not globally.

@devreal
Copy link
Member Author

devreal commented Jan 30, 2017

I've been massaging the team and segment related code for a while now and think I got it working, please see PR #257.

As it stands right now, global pointer can now be shared among the members of the team that created the allocation (or globally in case of local allocations). So in fact, a global pointer is not really global but I think we can live with this limitation. I noticed that the gptr is probably the last place where we use untyped unit IDs and, by definition, unit IDs in gptr are global so far. Now that we have the team ID in the gptr we could think about using team-relative unit IDs in gptr and type them as such.

Any strong feelings on:

  1. Typed unit IDs in dart_gptr_t?
  2. Using team-relative unit IDs in dart_gptr_t?

@devreal
Copy link
Member Author

devreal commented Jan 31, 2017

Taking the discussion on flags over here again: The flags field in the gptr is now 8 bit wide, the unit ID has 24 bit left (as proposed by @fuerlinger). You can still set 16 bit flags but only the lower 8 bit will be stored in the gptr for quick access. Setting the flags field should be done through a call to dart_gptr_setflags. Retrieving the full 16 bit flags is done through dart_gptr_getflags. Not sure if this is intuitive but I felt that 8 bit (aka 8 flags) is a bit short.

However, that brings me to my actual question: how are we gonna manage the flag namespace? Is the flags field only intended for internal use? Are users meant to change flags on the fly? Do we have an enum that describes the positions of the flags including the highest available bit to avoid clashes? Since @rkowalewski is the only user of flags right now, I throw this question at you ;) How is it done atm?

@fuchsto
Copy link
Member

fuchsto commented Feb 3, 2017

@devreal @fuerlinger @rkowalewski

Using team-relative unit IDs in dart_gptr_t?

I lost track: global pointers contain a team ID in your variant, right?
If a unit ID is provided along with a team ID, I would expect the unit ID to be relative to the team.

@devreal
Copy link
Member Author

devreal commented Feb 6, 2017

Using team-relative unit IDs in dart_gptr_t?

I lost track: global pointers contain a team ID in your variant, right?
If a unit ID is provided along with a team ID, I would expect the unit ID to be relative to the team.

That was my suggestion, yes. So far we had global unit IDs so changing to team-relative unit IDs would be a major change in the semantics of the interface since they can be accessed directly by user code. I just wanted to confirm that this is what we want.

@fuchsto
Copy link
Member

fuchsto commented Feb 6, 2017

I'm with you on this.
I will conduct a quick code survey (read: grep --use-brain) to check which conversion is more frequent in our code right now. I assume that team-relative IDs are more prevalent than global unit IDs, so switching to (team,unit) might simplify the common case.

@devreal
Copy link
Member Author

devreal commented Feb 28, 2017

@fuchsto Never mind, since we only have 24 bit for the unit ID we cannot use typed unit IDs in gptr. I will implement the re-use of segment IDs after which we can close this issue.

@fuchsto
Copy link
Member

fuchsto commented Feb 28, 2017

Hm, why is that? A bitfield int : 1 ; int : 23 should be plenty for team-relazive unit ids?

@devreal
Copy link
Member Author

devreal commented Feb 28, 2017

My idea was to have dart_team_unit_t as a field in dart_gptr_t holding the unit ID. That won't work now that unit IDs in global pointer are 24 bit. They are always relative to the team ID stored in the pointer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants