Directly specify the halo width #680
What should a direct halo width specification look like? |
One possibility is a stencil spec generator for typical stencil patterns. |
Hi dhinf, Let's say you want a two-element halo width in each direction: does that include the diagonal elements or not, etc.? The thing is, you need the stencil spec later on to access the halo elements. --> Yes, it includes the diagonal elements. The general idea is to separate the halo spec from the stencil operator. For a case with its own stencil implementation (such as our ArrayUDF), DASH can also work as a storage layer. Anyway, I guess it is just an API effort. I assume that DASH currently infers the width from the stencil spec (but I might be wrong), so it could simply expose this to the user directly. Our ArrayUDF on HDF5 files provides the below API for a 2D array:
One possibility is a stencil spec generator for typical stencil patterns. --> This could be a possible solution, but as said above, the stencil width can be exposed to the user. |
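To make the "stencil spec generator" idea concrete, here is a minimal sketch in plain C++. It does not use the DASH API: `Offset2D`, `make_full_stencil` and `halo_width` are hypothetical names chosen for illustration. It assumes a 2D stencil that covers every neighbour within the given width, diagonals included, and shows how a width can be recovered from a set of stencil points.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical type (not part of DASH): one stencil point as a (row, col) offset.
using Offset2D = std::array<int, 2>;

// Generate every offset in [-width, width] x [-width, width] except the
// center (0, 0), so diagonal neighbours are included.
std::vector<Offset2D> make_full_stencil(int width) {
  assert(width >= 0);
  std::vector<Offset2D> offsets;
  const int side = 2 * width + 1;
  offsets.reserve(static_cast<std::size_t>(side * side - 1));
  for (int di = -width; di <= width; ++di) {
    for (int dj = -width; dj <= width; ++dj) {
      if (di != 0 || dj != 0) {
        offsets.push_back(Offset2D{di, dj});
      }
    }
  }
  return offsets;
}

// Recover the halo width implied by a set of offsets: the largest absolute
// offset in any dimension.
int halo_width(const std::vector<Offset2D>& offsets) {
  int width = 0;
  for (const Offset2D& o : offsets) {
    for (int d : o) {
      width = std::max(width, std::abs(d));
    }
  }
  return width;
}
```

A generator like this grows quadratically with the width in 2D, which is one reason a directly specified width may be preferable for very wide halos.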
Hi Goon83, The halo width is based on the stencil spec, but I think it should be possible to also provide the width. I'm interested in your use case. The halo wrapper takes the local part of a dash::Matrix or NArray (the naming is a little bit misleading). Then it adds the halo part, which is a separate chunk of memory. How does ArrayUDF handle the halo memory? Like I said, I'm really interested in how you want to use DASH with the halo wrapper. |
Hi dhinf, I came up with a simple and naive example (below) to simulate the request of ArrayUDF. It creates a 2D array on 4 MPI processes, with a total size of 8 by 8 and a tile size of 4 by 4. Each process writes a subset of the 2D array (note: without halo). Then each process reads a subset of the array with the halo layer. This example code works so far. But I think the read with halo might be better served by an array created with a halo layer. Otherwise, each access to the halo layer may trigger a communication and thus bad performance. Any thoughts on this? Thanks.
Bests, |
Indeed, each access to non-local elements triggers a blocking communication call, so the performance is really bad. The halo wrapper takes the local part of the distributed array and the stencil spec. Then it calculates the memory size required for the halo elements and requests new memory. At the moment it is not possible to use the subscript operator to access the halo elements. That's why you need the stencil iterator. But maybe a subscript-operator-based access layer for the halos can be added in the future. The idea behind the stencil operator is to distinguish between inner and boundary calculation, so that the halo elements can be loaded asynchronously while the inner part is being calculated. Also, we wanted to use the NArray both with and without halo support (therefore we decided to use separate memory for the halo elements). Best |
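The inner/boundary distinction described above can be sketched in plain C++. This is only an illustration and not DASH code: `Coord`, `Regions` and `split_regions` are hypothetical names, and the sketch assumes a stencil that reaches `width` elements in every direction. Inner elements need no halo data and could be processed while a halo exchange is still in flight; boundary elements have to wait for it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types (not part of DASH): a local (row, col) coordinate and
// the two groups of center elements.
using Coord = std::pair<std::size_t, std::size_t>;

struct Regions {
  std::vector<Coord> inner;     // every stencil point stays inside the local block
  std::vector<Coord> boundary;  // at least one stencil point falls into the halo
};

// Classify each element of a rows x cols local block for a stencil that
// reaches `width` elements in every direction.
Regions split_regions(std::size_t rows, std::size_t cols, std::size_t width) {
  assert(rows > 0 && cols > 0);
  Regions regions;
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      const bool is_inner = i >= width && i + width < rows &&
                            j >= width && j + width < cols;
      if (is_inner) {
        regions.inner.emplace_back(i, j);
      } else {
        regions.boundary.emplace_back(i, j);
      }
    }
  }
  return regions;
}
```

For a 4 by 4 block and width 1, this yields 4 inner and 12 boundary elements; with width 2 every element is a boundary element.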
Hi dhinf, Thanks. |
Hi Bin, here is an example: Within the iteration loop you can see how to use the iterator. If you use the feat-halo branch the comment in line 145 (slow version) isn't true anymore. The performance is more competitive in this branch. |
@dhinf Can we merge this branch into development? |
Hi dhinf, Following the example code, it is a 2D matrix. The stencil is defined as
Then, the following two lines create an iterator/operator based on the stencil_spec again.
So far, I think the stencil should be 2D on 2D data, i.e., accessing the points to the left/right/up/down of each point and doing some calculation. The part that confuses me is the code block below.
|
I need to clean up some code. I think at the end of this week or next week we can merge the branch into development. |
This is the position of the stencil point within the stencil spec. You can also use value_at(StencilT(-1,0)). In that case the iterator tries to find the position of the stencil point within the given stencil spec and then calls value_at(position). So you have an indirection, which results in slightly lower performance.
Depending on the stencil width, the iterator for the inner part iterates only over elements where all stencil points access non-halo elements. The boundary iterator iterates over all center elements with at least one halo element access. In this case the method value_at tests whether the stencil point is located in the halo or in the inner part. |
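The difference between the two value_at variants can be illustrated with a small stand-alone sketch. It is not the DASH iterator: `StencilCursor` and `Offset2D` are hypothetical, the grid is a plain row-major vector, and no halo handling is shown. The point is only that access by position is a direct lookup, while access by stencil point first searches the spec and then delegates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type (not part of DASH): one stencil point as a (row, col) offset.
using Offset2D = std::array<int, 2>;

// Hypothetical stand-in for an iterator positioned at one center element
// of a row-major grid.
struct StencilCursor {
  const std::vector<int>* data;       // row-major grid values
  std::size_t cols;                   // number of columns in the grid
  std::size_t row;                    // center row
  std::size_t col;                    // center column
  const std::vector<Offset2D>* spec;  // stencil points relative to the center

  // Direct access: `pos` is the index of the stencil point within the spec.
  int value_at(std::size_t pos) const {
    assert(pos < spec->size());
    const Offset2D& o = (*spec)[pos];
    const std::ptrdiff_t r = static_cast<std::ptrdiff_t>(row) + o[0];
    const std::ptrdiff_t c = static_cast<std::ptrdiff_t>(col) + o[1];
    assert(r >= 0 && c >= 0);
    return (*data)[static_cast<std::size_t>(r) * cols +
                   static_cast<std::size_t>(c)];
  }

  // Indirect access: search the spec for the point, then delegate.
  // The linear search is the extra indirection mentioned above.
  int value_at(const Offset2D& point) const {
    for (std::size_t pos = 0; pos < spec->size(); ++pos) {
      if ((*spec)[pos] == point) {
        return value_at(pos);
      }
    }
    assert(false && "stencil point is not part of the spec");
    return 0;
  }
};
```

Both overloads return the same value for the same stencil point; only the cost of locating the point differs.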
Hi dhinf, 1 2 3 4 5 BTW,
2) As previously discussed, the stencil spec generator may help in the first case. Any idea (sample code) how to do that in DASH? Some of our use cases can have hundreds of halo layers. A stencil spec generator may be a temporary solution. Bests, |
Hi Bin,
works on A
works on all 0 elements I will look into your other questions next week. Best |
Hi Denis, Let me rephrase my question with the code below. As the code shows, it accesses halo elements like normal (inner) elements through the [] operator or value_at. Having a halo layer can boost performance. However, the code must be rewritten to use HaloMatrixWrapper and the boundary/inner iterators. I think it is fine to use the HaloMatrixWrapper, but the boundary/inner iterators become too complex. Could we just add a value_at operator to HaloMatrixWrapper? This operator would access both boundary and inner elements with global coordinates, and value_at would decide which boundary/inner iterator to use to access the data. Of course, users could still explicitly control the update of the boundary elements. Thanks. Bests,
|
Hi Bin, best |
Hi Denis, Bests, |
Hi @dhinf and @devreal Bests, |
Hi Bin,
for(int i = 0; i < extent_first_dim; ++i) {
  for(int j = 0; j < extent_second_dim; ++j) {
    dst[i][j] = src[i][j] + src[i-1][j] .....
  }
}
I will try to implement this, but you need to allocate a full stencil (or at least all necessary stencil points), otherwise you will get wrong results. I will do it alongside my current tasks, so it will take a while. Hopefully not longer than the end of next week. |
Thanks dhinf, HaloMatrixWrapper m (4 by 4) with a 1-layer halo, making a 5 by 5 matrix; m[0][0] accesses the 1st element (halo). Bests, |
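The requested access pattern, where index (0, 0) refers to the first halo element and local data follows after the halo offset, can be sketched as follows. This is a hypothetical stand-alone class, not the DASH HaloMatrixWrapper. It assumes a halo layer on all four sides (so a 4 by 4 block spans 6 by 6 coordinates; a block with neighbours on only two sides would span 5 by 5) and keeps halo values in separate memory from the local block, as described earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; this is not the DASH HaloMatrixWrapper.
class HaloBlock {
public:
  HaloBlock(std::size_t rows, std::size_t cols, std::size_t width)
    : rows_(rows), cols_(cols), width_(width),
      local_(rows * cols, 0),
      // For simplicity the halo buffer is padded to the full extent;
      // a real implementation would store only the halo ring.
      halo_((rows + 2 * width) * (cols + 2 * width), 0) {}

  // Extent including the halo layers on both sides.
  std::size_t extent(std::size_t dim) const {
    assert(dim < 2);
    return (dim == 0 ? rows_ : cols_) + 2 * width_;
  }

  // True if the halo-inclusive coordinate (i, j) lies in the halo.
  bool is_halo(std::size_t i, std::size_t j) const {
    return i < width_ || i >= rows_ + width_ ||
           j < width_ || j >= cols_ + width_;
  }

  // Unified access: decides internally which memory holds the element.
  int& at(std::size_t i, std::size_t j) {
    assert(i < extent(0) && j < extent(1));
    if (is_halo(i, j)) {
      return halo_[i * extent(1) + j];
    }
    return local_[(i - width_) * cols_ + (j - width_)];
  }

  // Access to the local block only, using local coordinates.
  int& local(std::size_t r, std::size_t c) {
    assert(r < rows_ && c < cols_);
    return local_[r * cols_ + c];
  }

private:
  std::size_t rows_;
  std::size_t cols_;
  std::size_t width_;
  std::vector<int> local_;  // local block, row-major
  std::vector<int> halo_;   // separate memory for halo values
};
```

With `HaloBlock b(4, 4, 1)`, `b.at(0, 0)` refers to a halo corner and `b.at(1, 1)` to the first local element, i.e. the same storage as `b.local(0, 0)`.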
Hi Bin, I adapted your code to the new functionality:
#include <unistd.h>
#include <iostream>
#include <cstddef>
#include <iomanip>
#include <libdash.h>
using Matrix_t = dash::Matrix<int, 2>;
using HaloMatrixWrapper_t = dash::halo::HaloMatrixWrapper<Matrix_t>;
using namespace std;
using std::cout;
int main(int argc, char *argv[])
{
  dash::init(&argc, &argv);
  size_t team_size = dash::Team::All().size();
  dash::TeamSpec<2> teamspec;
  teamspec.balance_extents();
  dash::global_unit_t myid = dash::myid();
  size_t num_units = dash::Team::All().size();
  if (num_units != 4)
  {
    cout << "Please run with mpirun -n 4" << endl;
    exit(-1);
  }
  size_t tilesize_x = 4;
  size_t tilesize_y = 4;
  size_t rows = tilesize_x * (num_units / 2);
  size_t cols = tilesize_y * (num_units / 2);
  Matrix_t matrix(
    dash::SizeSpec<2>(
      rows,
      cols),
    dash::DistributionSpec<2>(dash::BLOCKED, dash::BLOCKED),
    dash::Team::All(),
    teamspec);
  size_t matrix_size = rows * cols;
  DASH_ASSERT(matrix_size == matrix.size());
  DASH_ASSERT(rows == matrix.extent(0));
  DASH_ASSERT(cols == matrix.extent(1));
  HaloMatrixWrapper_t wrapper(matrix, 1);
  if (0 == myid)
  {
    cout << "Matrix size: " << rows
         << " x " << cols
         << " == " << matrix_size
         << endl;
  }
  if (!myid) {
    cout << "Assigning matrix values" << endl;
  }
  auto access = wrapper.coordinate_access();
  auto lranges = access.ranges_local();
  cout << myid
       << "'s start = (" << lranges[0].begin << ", " << lranges[1].begin
       << "), my_end = " << lranges[0].end << ", " << lranges[1].end << " )"
       << endl << std::flush;
  for (size_t i = lranges[0].begin; i < lranges[0].end; ++i) {
    for (size_t k = lranges[1].begin; k < lranges[1].end; ++k) {
      access[i][k] = myid;
    }
  }
  // Units waiting for value initialization
  // Read and assert values in matrix
  for (size_t i = lranges[0].begin; i < lranges[0].end; ++i) {
    for (size_t k = lranges[1].begin; k < lranges[1].end; ++k) {
      auto value = access[i][k];
      auto expected = myid;
      DASH_ASSERT(expected == value);
    }
  }
  matrix.barrier();
  if (!myid)
  {
    cout << "Print matrix values at rank 0: " << endl;
    for (size_t i = 0; i < matrix.extent(0); ++i) {
      for (size_t k = 0; k < matrix.extent(1); ++k) {
        int value = matrix[i][k];
        cout << value;
        cout << " ";
      }
      cout << endl;
    }
  }
  wrapper.update();
  dash::Team::All().barrier();
  sleep(myid*2);
  auto hranges = access.ranges_halo();
  cout << myid
       << "'s start = (" << hranges[0].begin << ", " << hranges[1].begin
       << "), my_end = " << hranges[0].end << ", " << hranges[1].end << " )"
       << endl << std::flush;
  // Read and assert values in matrix
  for (auto i = hranges[0].begin; i < hranges[0].end; ++i) {
    for (auto k = hranges[1].begin; k < hranges[1].end; ++k) {
      cout << access[i][k] << " ";
    }
    cout << endl;
  }
  dash::Team::All().barrier();
  dash::finalize();
}
Best |
Hi Denis, Bests, |
Hi, @dhinf, PS: I also tried to install the bug-dash-halo/bug-halo-wrapper branches for testing, but none of them work on Mac. Bests,
|
Hi Bin, Please try it with this branch. best |
@dhinf thanks for the information. I checked out the feat-halo branch, but it still reports the errors below:
|
Hi Bin, I fixed it. After I sent you the example, I changed the internal structure and forgot to change that alias in stencil.h. Best |
Hi @dhinf, Will you merge it into the dev branch? Bests, |
Hi Bin, best |
@dhinf Could you kindly help to merge this into development? Thanks and happy new year! Bests, |
Hi Goon83, Happy new year and bests |
Hi Goon83, best |
It is merged now. |
I am following some examples to learn the halo functionality in DASH. One question is how to specify the width of the halo layer without a StencilSpec. I found that most example codes use the steps below to declare an array with a halo.
After studying the API of HaloWrapper_t, it seems that the stencil_spec is the only way to ask for a halo layer.
https://codedocs.xyz/dash-project/dash/a01194.html
Thanks.
Bin