Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dash::Array does not support teams properly #125

Closed
fmoessbauer opened this issue Nov 19, 2016 · 15 comments
Closed

dash::Array does not support teams properly #125

fmoessbauer opened this issue Nov 19, 2016 · 15 comments

Comments

@fmoessbauer
Copy link
Member

fmoessbauer commented Nov 19, 2016

If using the array constructor with a team other than dash::Team::All(), the array is only allocated for the first (aka team.position() == 0) team. In all other teams, the array is not allocated.

See this unit test in bug-125-array-team as showcase.

@fuerlinger
Copy link
Contributor

It looks like the problem is the absolute dash::myid() in GlobMem.h used to construct the local pointers. When working with global-view indices it actually seems to work alright.

@fuchsto
Copy link
Member

fuchsto commented Nov 19, 2016

@dash-project/developers

Just some thoughts:
I'm wondering if dash::myid() and dash::barrier() etc. should really behave the way they do at the moment.

If they have global reference, then they are actually not that useful. No algorithm or data structure could use them because they refer to a global, possibly disjoint, set of units.
When would an algorithm / container / whatever actually be interested in the global team hierarchy?

For example, all containers use dash::Team::All() as default initializer for the associated team.
But this means that users have to change their usage of the container interface (specifying a team explicitly or not) depending on the state of the active team (split or not).
Whe should at least provide something like dash::Team::Local() so default teams for containers
have stable semantics.

The interface design I know from several places is: make interfaces agnostic from unrelated states
and maintain semantics if possible.

So when a team engages in a collective algorithm, the library interface provided to the algorithm
should always be the same. To the algorithm (or container, or whatever), there should be no difference
between the Team::All() case and the child team case. When a team is split, it creates its own universe.

Requesting the global state (dash::Team::All() or team.parent() for example) should be an explicit operation.
Currently it is an explicit operation to request the default, the "typically useful and less surprising" behavior.

@fuerlinger
Copy link
Contributor

dash::barrier() is just shorthand for the more long-winded dash::Team::All().barrier(), same for dash::myid(). It is the right and natural thing to do for applications that don't construct teams explicitly, which might be the majority.

For applications that use explicit teams, things are more complex of course. A unit can be part of an arbitrary number of teams and therefore there is no alternative sensible default team a collective operation should refer to (there is no notion of a single active team, there can me multiple active teams in the sense that the Team object has been successfully constructed). There is no way around explicitly specifying the team in those cases. DASH containers are always allocated over a team and therefore container related operations can always choose the right team (as e.g., array.barrier() already does).

For DASH algorithms things might be more tricky of course because they work with global iterators and the team behind a global iterator might not be obvious (to the developer).

@fmoessbauer
Copy link
Member Author

I agree that the current behavior is unclear and misleading. However for me this is a matter of documentation and not implementation / behavior.

There are several good reasons why the containers belong to team all if not specified. For example we currently do not provide any operation to join teams. Furthermore (as @fuerlinger stated) each unit can be part of an arbitrary number of teams. They might even be on the same tree level.

IMO the user is in charge of specifying which team a container should belong to. However all algorithms provided by dash should work on the team, the memory belongs to. For me this is the "natural" behavior.

@fuchsto
Copy link
Member

fuchsto commented Nov 19, 2016

A unit may be part of an arbitrary number of teams, right, but these are hierarchical.
I would just like to have a counterpart to Team::All() (the top level team) like Team::Local() (child team at current lowest level).

@fuchsto
Copy link
Member

fuchsto commented Nov 19, 2016

@fmoessbauer Part of two teams at the same level? How so? A split is a set operation.

@fmoessbauer
Copy link
Member Author

I did not verify that, but I guess the following creates such case:

auto team1 = team_all.split(2);
auto team2 = team_all.split(3);

For me it is not clear, how cases like this can be mapped to a single tree.

@fuchsto
Copy link
Member

fuchsto commented Nov 19, 2016

I don't know, this use case just doesn't look right to me:
(edit: it looks right, but it shouldn't)

dash::Array<int> arr_outer(1024);

auto my_child_team = dash::Team::All().split_somehow();

if (my_child_team.position() == 0) {
   // calls dash::Team::All()::barrier() at lower level in global allocation
   // that is not reached by units in other child team
   dash::Array<int> arr_inner(512);
} else {
   // ...
   my_child_team.barrier(); // good luck with that
}

@fuchsto
Copy link
Member

fuchsto commented Nov 19, 2016

@fmoessbauer
Ah, sure, that's legal.
Yup, you are right, a "local" team is not well-defined. Okay, Then I'm okay with the way things are.

Perhaps there is another way to make it harder to shoot yourself in the foot like in my example above.

@fuerlinger
Copy link
Contributor

I'd say overall the details of team hierarchies are still up for discussion and we can see what makes the most sense. I think in the work of Amir Kamil they actually had a concept of lexically scoped teams for Titanium and then other things are possible / make sense than in our case with C++ but we can investigate further.

In terms of multiple child-teams I'm not sure what is currently implemented but at some point the idea was as follows:

Each team can only have one single sub-team, but teams can be cloned (same underlying DART team id, resources etc.) just a cloned object on the C++ side. So to support two sub-teams of parent, parent has to be cloned first.

The main motivation for this was that it leads to simple stacks for team hierarchies and navigating up/down the team hierarchy is always well defined and trivial. So for a given team it would then make sense to ask "what's the team at the most refined / most local level?" As well as "what's the team two levels down from me?", etc.

@fuchsto
Copy link
Member

fuchsto commented Nov 19, 2016

@fuerlinger I will need some minutes with you and a whiteboard for this, but if there is a so-called containment hierarchy (as in DART locality domains: a unit can be mapped to any domain, but exactly one at any time), then I would define dash::Team::Local() (or Leaf() ...) and use it as default in containers.

Didn't you define the HView concept for this already?

Application developers that do not explicitly construt teams would not notice any difference and we still could fix some otherwise counter-intuitive use cases.

@devreal
Copy link
Member

devreal commented Nov 19, 2016

I think it is dangerous to assume that there is a containment hierarchy. As was already stated, a unit can be part of multiple teams and there does not have to be a single Leaf team to rely on. Assuming an arbitrary team as a leaf team and working with this in DASH algorithms counter intuitive. DASH container accept a team as argument on construction and DASH algorithms should work on the teams of the data they are called on, not assuming anything else. Just my 2 cents :)

@fuchsto
Copy link
Member

fuchsto commented Nov 20, 2016

Sure, but that is unrelated. I was talking about default teams for containers.

@fmoessbauer
Copy link
Member Author

fmoessbauer commented Nov 20, 2016

Unfortunately this is still not fixed. For unit tests see PR #135. In fact the old implementation in GlobMem.h was correct, but the bug was in Array.h, because there a relative team id is explicitly passed to GlobMem.lbegin. For clarification see issue #136.

In branch bug-125-arary-team the unit test passes, but it seems that currently a unit cannot be part of multiple teams. Hence only a single split test can be run per execution. As we currently do not have something like team join, split features cannot be unit tested (see issue #137).

@fmoessbauer
Copy link
Member Author

Bugs are fixed now. The dash::Team discussion can be continued in #137.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants