Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Copy of C Array to Dash Array across multiple units #347

Closed
wants to merge 5 commits into from

Conversation

rkowalewski
Copy link

Addresses #346

@rkowalewski rkowalewski self-assigned this Mar 27, 2017
@rkowalewski
Copy link
Author

rkowalewski commented Mar 27, 2017

It was not intended to submit the clang-format in this pull request. But now we have one.

@rkowalewski
Copy link
Author

rkowalewski commented Mar 27, 2017

@fuchsto My changes assume that local memory is always contiguous. So it will probably not work with patterns where local memory is fragmented. Please consider for review this special case. I am not sure if any of our copy algorithms can handle this.

Copy link
Member

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments inline.

auto out_last = out_first + num_elements;
if (num_elements < 1) return out_first;

auto nremaining = static_cast<dash::default_size_t>(num_elements);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dangerous! dash::default_size_t may be 32 bit wide while num_elements is of type size_t.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::distance cannot return size_t because it may be negative. And then I check explicitly for a non-negative number.
Apart from that I would expect that dash::default_size_t is the same as size_t if available on the underlying platform. Is there any argument against this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it is ssize_t. The case I had in mind is the following: the user might decides to compile with the flag DASH_ENABLE_DEFAULT_INDEX_TYPE_INT (for whatever reason) and then copy more than 4GB into the array, which causes num_elements to be truncated into nremaining. He might even pass size_t as template parameter IndexType to the global data structure so it's a valid operation. I think using the pattern's index-type is the safest bet. But that is certainly a highly pathologic case :)

auto local_size = pattern.local_extents(team_unit_t{local_pos.unit});

auto ncopy = std::min(
std::initializer_list<dash::default_size_t>{local_size[0],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

dart_storage_t ds = dash::dart_storage<ValueType>(ncopy);

DASH_ASSERT_RETURNS(
dart_put_blocking(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use dart_put and a final dart_flush after the loop?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course you are right, thanks. I will do this. Did not think too much about this, I simply extended the already existing code.

Copy link
Author

@rkowalewski rkowalewski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devreal I have considered your comment and modified it appropriately.

After having a more detailed look into it I noticed that copy_async was not modified according to the changes in the blocking variant. The approach is now a bit more elegant in my opinion. dash::copy on local ranges is now dash::copy_async on local ranges followed by an immediate wait on the returned future.

As a consequence we do temporarily not support dash::GlobPtr as input to dash::copy because we need the pattern information in GlobIter. This was a discussion in #246. However it is supported again after merging #345 which enables pointer arithmetics on dash::GlobPtr across units.

auto out_last = out_first + num_elements;
if (num_elements < 1) return out_first;

auto nremaining = static_cast<dash::default_size_t>(num_elements);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::distance cannot return size_t because it may be negative. And then I check explicitly for a non-negative number.
Apart from that I would expect that dash::default_size_t is the same as size_t if available on the underlying platform. Is there any argument against this?

dart_storage_t ds = dash::dart_storage<ValueType>(ncopy);

DASH_ASSERT_RETURNS(
dart_put_blocking(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course you are right, thanks. I will do this. Did not think too much about this, I simply extended the already existing code.

@fuchsto
Copy link
Member

fuchsto commented Mar 29, 2017

Support for strided / chunked local ranges has been requested in issue #116.

@fuchsto
Copy link
Member

fuchsto commented Mar 29, 2017

Ah, now I see your changes.

No, the implementation of dash::copy, especially in this respect, is not by accident. Isn't it exactly these kind of cheesy shortcuts we hate about some MPI implementations?

Some time ago I gave a talk on a DASH project meeting on dash::copy and why the blocking variant is not implemented as dash::async_copy + flush. The overhead is rather significant. The non-blocking variant creates dash::Future objects with lambdas, request handles are stored in a std::vector which leads to additional heap allocation, etc.
The non-blocking dash::copy_async (soon to be provided via launch policy) is only beneficial for reasonably high work loads for latency hiding. The dash::copy function interface has a dedicated benchmark suite that has been used to evaluate these tradeoffs.

In addition, the blocking algorithms in DASH should rely on specialized blocking operations provided by DART because these again can exploit the communication routines for this specific purpose - even if these are currently nearly identical.

@fuchsto fuchsto closed this Mar 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants