Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to dash::accumulate #216

Open
devreal opened this issue Dec 21, 2016 · 12 comments
Open

Improvements to dash::accumulate #216

devreal opened this issue Dec 21, 2016 · 12 comments

Comments

@devreal
Copy link
Member

devreal commented Dec 21, 2016

The current implementation of dash::accumulate has a flaw: only unit 0 (randomly chosen?) actually receives a valid result. This is neither documented nor does the user have a chance to influence this. I seem to remember that we had a similar discussion at our last F2F. IMO, either the user can choose which unit receives the result or it should be broadcast to all units. At the very least, it has to be documented...

Also, dash::accumulate should use dart_accumulate whenever possible.

Discovered while working on #212.

@fuchsto
Copy link
Member

fuchsto commented Dec 21, 2016

The current realization of dash::accumulate is somewhat irritating, yes. Oh yes.

First things first:

  • dart_accumulate semantically corresponds to MPI_Accumulate: C[i] = reduce(A[i], B[i])
  • dash::accumulate semantically corresponds to std::accumulate: Acc = reduce([AccInit] v A[s...e])

Consequently, dart_accumulate is not a semantically suitable backend for dash::accumulate.
C++ and MPI differ in their definitions of "accumulate" (C++ uses the term correctly, however std::transform should be named std::map, but can't, because of the container. It's a bloody mess).

Semantics of dart_accumulate correspond to dash::transform (~ std::transform), and it is in fact used as its communication backend:

https://github.com/dash-project/dash/blob/development/dash/include/dash/algorithm/Transform.h#L252

The best I could do for this clash of specs was to document it:

Standardese aside, the current implementation is broken.
I found another defect: the initial value is accumulated in every local partial result, but should only be accumulated once, at unit 0. Currently implemented behavior is:

// (a, b] = { 1,2,3, 4,5,6 }
// --> accumulate(a,b,+,10) = 31

unit 0:                          | unit 1:
=================================+==================================
dash::accumulate(a, b, +, 10);   | dash::accumulate(a, b, +, 10);
--> local result: 10 + 1 + 2 + 3 | --> local result: 10 + 4 + 5 + 6
    = 16                         |     = 25
----------------------------- barrier ------------------------------
result = 16 + 25 = 41 != 31

@devreal
Copy link
Member Author

devreal commented Dec 22, 2016

Thanks for the enlightenment! Then maybe accumulate_impl should not reside in Accumulate.h since the semantics are different? Who owns the code?

@fuchsto
Copy link
Member

fuchsto commented Dec 22, 2016

Emm. You moved it there? Or @fmoessbauer did and you reviewed it:
71d57b7

I implemented the algorithm stuff and placed accumulate_impl in Transform.h
... but we all own the code!

@devreal
Copy link
Member Author

devreal commented Dec 22, 2016

I know. Let me rephrase: Who will fix it? ;)

@fuchsto
Copy link
Member

fuchsto commented Dec 22, 2016

Oh :D
I will, my original implementation is broken anyways.

@fuchsto
Copy link
Member

fuchsto commented Dec 22, 2016

... aaaand also: as it is defined in DASH, accumulate_impl should be named transform_impl, because that's the DASH semantics.

@fuchsto
Copy link
Member

fuchsto commented Jan 25, 2017

Resolved in PR #237

@fuchsto fuchsto closed this as completed Jan 25, 2017
@devreal
Copy link
Member Author

devreal commented Feb 2, 2017

Reopening this issue after accidentally looking at the code for dash::transform: The way I understand the semantics of dash::transform is that it performs pair-wise operations on two input ranges into one output range. This is done using dart_accumulate, a wrapper around MPI_Accumulate. The latter does exactly what is says: it accumulates a single input range into the single output value. So, this is exactly what dash::accumulate does but is far from the semantics of dash::transform. Am I missing something here?

@devreal devreal reopened this Feb 2, 2017
@fuchsto
Copy link
Member

fuchsto commented Feb 3, 2017

@devreal
MPI_Accumulate does not reduce a sequence to a single value, if I get the docs right:

https://www.mpich.org/static/docs/v3.1/www3/MPI_Accumulate.html

It performs an element-wise atomic update operation, similar to a put, but reduces origin and target data into the target buffer. This is why there is a parameter target_count. If the input sequence was reduced to a single value, it would always be 1 and therefore irrelevant.

So that's why dash::accumulate (DASH -> STL semantics) follows std::accumulate and dart_accumulate (DART -> MPI semantics) implements dash::transform / std::transform.

The synopsis described in these lecture slides might be more helpful than the vague definition in the MPI standard docs:

http://wgropp.cs.illinois.edu/courses/cs598-s16/lectures/lecture34.pdf

@devreal
Copy link
Member Author

devreal commented Feb 6, 2017

Right, got the docs wrong. Sorry for that.

@devreal devreal closed this as completed Feb 6, 2017
@devreal
Copy link
Member Author

devreal commented Feb 13, 2017

I have to reopen this once again because the implementation of dash::accumulate is still broken, which was the actual reason for this issue. Please see my first paragraph in the initial description:

The current implementation of dash::accumulate has a flaw: only unit 0 (randomly chosen?) actually receives a valid result. This is neither documented nor does the user have a chance to influence this. I seem to remember that we had a similar discussion at our last F2F. IMO, either the user can choose which unit receives the result or it should be broadcast to all units. At the very least, it has to be documented...

I guess we got completely side-tracked by the discussion of the semantics of MPI_Accumulate etc but the original problem still persists.

EDIT: as a side-node, there is another problem: the definition of result as auto will lead to really interesting results when used with floating point values...

EDIT2: why not have dash::accumulate get the result to all units in the team and introduce dash::accumulate_at (or an additional unit paramter to dash::accumulate) which accumulates to a single unit. Also, dash::accumulate can make use of dart_allreduce and dart_reduce for basic types.

@devreal devreal reopened this Feb 13, 2017
@fuchsto
Copy link
Member

fuchsto commented Feb 24, 2017

Will be fixed along with introduction of execution- and launch policies, see PR #300

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants