dash::copy broken on team-allocations #352

Closed
devreal opened this issue Mar 31, 2017 · 1 comment


devreal commented Mar 31, 2017

Consider the following example code:

TEST_F(CopyTest, BlockingGlobalToLocalTeam)
{
  // Copy all elements of global array into local vector:
  const int num_elem_per_unit = 20;
  size_t num_elem_total       = _dash_size * num_elem_per_unit;

  auto& team = dash::Team::All().split(2);

  dash::Array<int> array(num_elem_total, dash::BLOCKED, team);

  // Assign initial values: [ 1000, 1001, 1002, ... 2000, 2001, ... ]
  for (auto l = 0; l < num_elem_per_unit; ++l) {
    array.local[l] = ((dash::myid() + 1) * 1000) + l;
  }
  array.barrier();

  // Local vector to store copy of global array;
  int *local_vector = new int[num_elem_per_unit * team.size()];

  // Copy values from global range to local memory.
  // All units copy first block, so unit 0 tests local-to-local copying.
  dash::copy(array.begin(), array.end(), local_vector);

  for (size_t i = 0; i < array.size(); ++i) {
    EXPECT_EQ_U(static_cast<int>(array[i]), local_vector[i]);
  }

  delete[] local_vector;
}

Executing the test on 4 units, Valgrind reports the following invalid write:

==1559== Invalid write of size 2
==1559==    at 0x4C32723: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1559==    by 0x6B06205: opal_convertor_unpack (in /home/joseph/opt/openmpi-2.1.0/lib/libopen-pal.so.20.10.0)
==1559==    by 0xCD399C5: mca_pml_ob1_recv_frag_callback_match (in /home/joseph/opt/openmpi-2.1.0/lib/openmpi/mca_pml_ob1.so)
==1559==    by 0xB8CDD09: mca_btl_vader_component_progress (in /home/joseph/opt/openmpi-2.1.0/lib/openmpi/mca_btl_vader.so)
==1559==    by 0x6AF574B: opal_progress (in /home/joseph/opt/openmpi-2.1.0/lib/libopen-pal.so.20.10.0)
==1559==    by 0x6AFA824: sync_wait_mt (in /home/joseph/opt/openmpi-2.1.0/lib/libopen-pal.so.20.10.0)
==1559==    by 0x52A5D4B: ompi_request_default_wait (in /home/joseph/opt/openmpi-2.1.0/lib/libmpi.so.20.10.0)
==1559==    by 0x52DF46C: PMPI_Wait (in /home/joseph/opt/openmpi-2.1.0/lib/libmpi.so.20.10.0)
==1559==    by 0xA35F20: dart_get_blocking (dart_communication.c:970)
==1559==    by 0x7D47DC: int* dash::internal::copy_impl<int, dash::GlobIter<int, dash::BlockPattern<1, (dash::MemArrange)1, long>, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> >, dash::GlobPtr<int, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> > >, dash::GlobRef<int> > >(dash::GlobIter<int, dash::BlockPattern<1, (dash::MemArrange)1, long>, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> >, dash::GlobPtr<int, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> > >, dash::GlobRef<int> >, dash::GlobIter<int, dash::BlockPattern<1, (dash::MemArrange)1, long>, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> >, dash::GlobPtr<int, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> > >, dash::GlobRef<int> >, int*) (Copy.h:178)
==1559==    by 0x7CE128: int* dash::copy<int, dash::GlobIter<int, dash::BlockPattern<1, (dash::MemArrange)1, long>, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> >, dash::GlobPtr<int, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> > >, dash::GlobRef<int> > >(dash::GlobIter<int, dash::BlockPattern<1, (dash::MemArrange)1, long>, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> >, dash::GlobPtr<int, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> > >, dash::GlobRef<int> >, dash::GlobIter<int, dash::BlockPattern<1, (dash::MemArrange)1, long>, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> >, dash::GlobPtr<int, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> > >, dash::GlobRef<int> >, int*) (Copy.h:1002)
==1559==    by 0x7C860D: CopyTest_BlockingGlobalToLocalTeam_Test::TestBody() (CopyTest.cc:858)
==1559==  Address 0x1890fb48 is 0 bytes after a block of size 56 alloc'd
==1559==    at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1559==    by 0xA1A79B: dash::util::LocalityDomain::init(dart_domain_locality_s*) (LocalityDomain.cc:603)
==1559==    by 0xA16946: dash::util::LocalityDomain::LocalityDomain(dart_domain_locality_s*) (LocalityDomain.cc:67)
==1559==    by 0x790260: dash::util::UnitLocality::UnitLocality(dash::Team const&, dash::unit_id<(dash::unit_scope)0, dart_local_unit>) (UnitLocality.h:62)
==1559==    by 0x7CD8BD: int* dash::copy<int, dash::GlobIter<int, dash::BlockPattern<1, (dash::MemArrange)1, long>, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> >, dash::GlobPtr<int, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> > >, dash::GlobRef<int> > >(dash::GlobIter<int, dash::BlockPattern<1, (dash::MemArrange)1, long>, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> >, dash::GlobPtr<int, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> > >, dash::GlobRef<int> >, dash::GlobIter<int, dash::BlockPattern<1, (dash::MemArrange)1, long>, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> >, dash::GlobPtr<int, dash::GlobStaticMem<int, dash::allocator::SymmetricAllocator<int> > >, dash::GlobRef<int> >, int*) (Copy.h:847)
==1559==    by 0x7C860D: CopyTest_BlockingGlobalToLocalTeam_Test::TestBody() (CopyTest.cc:858)
==1559==    by 0xA0967E: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/joseph/src/dash/dash/build/dash/dash-test-mpi)
==1559==    by 0xA0371E: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/joseph/src/dash/dash/build/dash/dash-test-mpi)
==1559==    by 0x9E7DCD: testing::Test::Run() (in /home/joseph/src/dash/dash/build/dash/dash-test-mpi)
==1559==    by 0x9E8765: testing::TestInfo::Run() (in /home/joseph/src/dash/dash/build/dash/dash-test-mpi)
==1559==    by 0x9E8E58: testing::TestCase::Run() (in /home/joseph/src/dash/dash/build/dash/dash-test-mpi)
==1559==    by 0x9EFF9F: testing::internal::UnitTestImpl::RunAllTests() (in /home/joseph/src/dash/dash/build/dash/dash-test-mpi)

Eventually the process crashes with a segmentation fault.

Interestingly, leaving out the team split (.split(2)) leads to correct results, so I assume that dash::copy does not honor the team the array is allocated on.

Related: The failing ThreadsafetyTest.ConcurrentAlgorithm mentioned by @ddiefenthaler in #292


devreal commented Mar 31, 2017

Never mind, the test case was faulty: it allocated _dash_size * num_elem_per_unit elements instead of team.size() * num_elem_per_unit.

devreal closed this as completed Mar 31, 2017