Multiple memory leaks in DART #140

Closed
devreal opened this issue Nov 21, 2016 · 11 comments

@devreal
Member

devreal commented Nov 21, 2016

On multiple occasions, memory does not seem to be freed properly in DART. Most of the leaks seem to stem from the use of MPI groups and a missing MPI_Group_free, for example:

==1725== 80 (72 direct, 8 indirect) bytes in 1 blocks are definitely lost in loss record 39 of 84
==1725==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1725==    by 0x52BDEAF: ompi_group_allocate_plist_w_procs (in /home/joseph/opt/openmpi-master/lib/libmpi.so.0.0.0)
==1725==    by 0x52BDFC7: ompi_group_allocate (in /home/joseph/opt/openmpi-master/lib/libmpi.so.0.0.0)
==1725==    by 0x52BE7E8: ompi_group_union (in /home/joseph/opt/openmpi-master/lib/libmpi.so.0.0.0)
==1725==    by 0x52EC8EA: PMPI_Group_union (in /home/joseph/opt/openmpi-master/lib/libmpi.so.0.0.0)
==1725==    by 0x42B958: dart_group_union (dart_team_group.c:69)
==1725==    by 0x42BC4A: dart_group_addmember (dart_team_group.c:156)
==1725==    by 0x434471: dart__base__host_topology__update_module_locations (host_topology.c:197)
==1725==    by 0x4377CF: dart__base__host_topology__create (host_topology.c:678)
==1725==    by 0x438C1A: dart__base__locality__create (locality.c:174)
==1725==    by 0x43851B: dart__base__locality__init (locality.c:77)
==1725==    by 0x41F830: dart__mpi__locality_init (dart_locality_priv.c:21)

However, there are other leaks that are not related to MPI groups, for example:

==1725== 30 bytes in 1 blocks are definitely lost in loss record 20 of 84
==1725==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1725==    by 0x436CFC: dart__base__host_topology__create (host_topology.c:547)
==1725==    by 0x438C1A: dart__base__locality__create (locality.c:174)
==1725==    by 0x43851B: dart__base__locality__init (locality.c:77)
==1725==    by 0x41F830: dart__mpi__locality_init (dart_locality_priv.c:21)
==1725==    by 0x41F112: dart_init (dart_initialization.c:262)
==1725==    by 0x40EDAF: dash::init(int*, char***) (Init.cc:33)
==1725==    by 0x405A3A: main (in /home/joseph/src/dash/workspace_2dheat/2dheat/dash_array_simple)

I'm working on a fix to resolve these leaks.
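
The group leaks above follow a common pattern: a call such as MPI_Group_union hands back a new handle, and nothing releases it with MPI_Group_free. One way to make the release hard to forget is a small scope guard. The sketch below is only an illustration and not the actual fix in DART: to stay compilable without MPI it uses hypothetical stand-ins (`FakeGroup`, `fake_group_union`, `fake_group_free`, `live_groups`) in place of MPI_Group, MPI_Group_union and MPI_Group_free, and `ScopedGroup` and `add_member_mask` are invented names.

```cpp
#include <cassert>

// Hypothetical stand-ins so the sketch compiles without MPI. In real code
// these would be MPI_Group, MPI_Group_union and MPI_Group_free.
using FakeGroup = int;

// Counts handles that have been created but not yet freed.
inline int& live_groups() {
  static int count = 0;
  return count;
}

inline FakeGroup fake_group_union(FakeGroup a, FakeGroup b) {
  ++live_groups();  // every union creates a new handle
  return a | b;
}

inline void fake_group_free(FakeGroup* /*group*/) {
  assert(live_groups() > 0);
  --live_groups();
}

// Owns one group handle and frees it exactly once when it goes out of scope.
class ScopedGroup {
public:
  explicit ScopedGroup(FakeGroup g) : group_(g), owned_(true) {}
  ~ScopedGroup() {
    if (owned_) fake_group_free(&group_);
  }
  ScopedGroup(const ScopedGroup&) = delete;
  ScopedGroup& operator=(const ScopedGroup&) = delete;
  ScopedGroup(ScopedGroup&& other) noexcept
      : group_(other.group_), owned_(other.owned_) {
    other.owned_ = false;  // the moved-from guard must not free the handle
  }
  FakeGroup get() const { return group_; }

private:
  FakeGroup group_;
  bool owned_;
};

// Has the shape of an "add member" operation: the union yields a temporary
// handle, and the guard releases it on every exit path.
inline int add_member_mask(FakeGroup group, FakeGroup member) {
  ScopedGroup merged(fake_group_union(group, member));
  return merged.get();
}
```

After any call to `add_member_mask`, `live_groups()` is back to its previous value, which is the property the Valgrind report shows to be violated for the real handles.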

@devreal devreal added the bug label Nov 21, 2016
@devreal devreal added this to the dash-0.3.0 milestone Nov 21, 2016
@devreal devreal self-assigned this Nov 21, 2016
@fuchsto
Member

fuchsto commented Nov 22, 2016

The most interesting and most pressing one is actually #46.

@devreal
Member Author

devreal commented Nov 23, 2016

I already fixed a host of issues in bug-140-dart-memleaks, including leaking MPI_Group handles. However, the last CI run failed in CopyTest.BlockingGlobalToLocalBarrierUnaligned when running with 1 unit only. Valgrind reports the following issue:

[  RUN     ] CopyTest.BlockingGlobalToLocalBarrierUnaligned
[=   0  LOG =]               CopyTest.h :  18 | >>> Test suite: CopyTest 
[=   0  LOG =]               CopyTest.h :  29 | ===> Running test case with 1 units ... 
[=   0  LOG =]              CopyTest.cc : 328 | Elements per unit: 20 
[=   0  LOG =]              CopyTest.cc : 329 | Start index:       7 
[=   0  LOG =]              CopyTest.cc : 330 | Elements to copy:  20 
[=   0  LOG =]              CopyTest.cc : 331 | Array size:        20 
==11220== Invalid read of size 2
==11220==    at 0x4C32720: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11220==    by 0x841C8E: dart_get_blocking (dart_communication.c:873)
==11220==    by 0x6DF9A6: int* dash::internal::copy_impl<int, dash::GlobIter<int, dash::Pattern<1, (dash::MemArrange)1, long>, dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >, dash::GlobPtr<int, dash::Pattern<1, (dash::MemArrange)1, long> >, dash::GlobRef<int> > >(dash::GlobIter<int, dash::Pattern<1, (dash::MemArrange)1, long>, dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >, dash::GlobPtr<int, dash::Pattern<1, (dash::MemArrange)1, long> >, dash::GlobRef<int> >, dash::GlobIter<int, dash::Pattern<1, (dash::MemArrange)1, long>, dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >, dash::GlobPtr<int, dash::Pattern<1, (dash::MemArrange)1, long> >, dash::GlobRef<int> >, int*) (Copy.h:174)
==11220==    by 0x6DA2D3: int* dash::copy<int, dash::GlobIter<int, dash::Pattern<1, (dash::MemArrange)1, long>, dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >, dash::GlobPtr<int, dash::Pattern<1, (dash::MemArrange)1, long> >, dash::GlobRef<int> > >(dash::GlobIter<int, dash::Pattern<1, (dash::MemArrange)1, long>, dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >, dash::GlobPtr<int, dash::Pattern<1, (dash::MemArrange)1, long> >, dash::GlobRef<int> >, dash::GlobIter<int, dash::Pattern<1, (dash::MemArrange)1, long>, dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >, dash::GlobPtr<int, dash::Pattern<1, (dash::MemArrange)1, long> >, dash::GlobRef<int> >, int*) (Copy.h:972)
==11220==    by 0x6D1B5F: CopyTest_BlockingGlobalToLocalBarrierUnaligned_Test::TestBody() (CopyTest.cc:339)
==11220==    by 0x81035A: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x80A338: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7EE9CF: testing::Test::Run() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7EF367: testing::TestInfo::Run() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7EFA5A: testing::TestCase::Run() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7F6BA1: testing::internal::UnitTestImpl::RunAllTests() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x811802: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==  Address 0x112e98b0 is 0 bytes after a block of size 80 alloc'd
==11220==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11220==    by 0xDE33200: component_select (in /home/joseph/opt/openmpi-master/lib/openmpi/mca_osc_sm.so)
==11220==    by 0x52AD69A: ompi_win_allocate_shared (in /home/joseph/opt/openmpi-master/lib/libmpi.so.0.0.0)
==11220==    by 0x52E4EE1: PMPI_Win_allocate_shared (in /home/joseph/opt/openmpi-master/lib/libmpi.so.0.0.0)
==11220==    by 0x83A428: dart_team_memalloc_aligned (dart_globmem.c:232)
==11220==    by 0x648A6F: dash::allocator::CollectiveAllocator<int>::allocate(unsigned long) (CollectiveAllocator.h:183)
==11220==    by 0x6472F1: dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >::GlobMem(unsigned long, dash::Team&) (GlobMem.h:125)
==11220==    by 0x645695: dash::Array<int, long, dash::Pattern<1, (dash::MemArrange)1, long> >::allocate(dash::Pattern<1, (dash::MemArrange)1, long> const&) (Array.h:1142)
==11220==    by 0x644990: dash::Array<int, long, dash::Pattern<1, (dash::MemArrange)1, long> >::Array(unsigned long, dash::DistributionSpec<1> const&, dash::Team&) (Array.h:701)
==11220==    by 0x644036: dash::Array<int, long, dash::Pattern<1, (dash::MemArrange)1, long> >::Array(unsigned long, dash::Team&) (Array.h:711)
==11220==    by 0x6D174B: CopyTest_BlockingGlobalToLocalBarrierUnaligned_Test::TestBody() (CopyTest.cc:326)
==11220==    by 0x81035A: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220== 
==11220== Invalid read of size 2
==11220==    at 0x4C3272E: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11220==    by 0x841C8E: dart_get_blocking (dart_communication.c:873)
==11220==    by 0x6DF9A6: int* dash::internal::copy_impl<int, dash::GlobIter<int, dash::Pattern<1, (dash::MemArrange)1, long>, dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >, dash::GlobPtr<int, dash::Pattern<1, (dash::MemArrange)1, long> >, dash::GlobRef<int> > >(dash::GlobIter<int, dash::Pattern<1, (dash::MemArrange)1, long>, dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >, dash::GlobPtr<int, dash::Pattern<1, (dash::MemArrange)1, long> >, dash::GlobRef<int> >, dash::GlobIter<int, dash::Pattern<1, (dash::MemArrange)1, long>, dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >, dash::GlobPtr<int, dash::Pattern<1, (dash::MemArrange)1, long> >, dash::GlobRef<int> >, int*) (Copy.h:174)
==11220==    by 0x6DA2D3: int* dash::copy<int, dash::GlobIter<int, dash::Pattern<1, (dash::MemArrange)1, long>, dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >, dash::GlobPtr<int, dash::Pattern<1, (dash::MemArrange)1, long> >, dash::GlobRef<int> > >(dash::GlobIter<int, dash::Pattern<1, (dash::MemArrange)1, long>, dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >, dash::GlobPtr<int, dash::Pattern<1, (dash::MemArrange)1, long> >, dash::GlobRef<int> >, dash::GlobIter<int, dash::Pattern<1, (dash::MemArrange)1, long>, dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >, dash::GlobPtr<int, dash::Pattern<1, (dash::MemArrange)1, long> >, dash::GlobRef<int> >, int*) (Copy.h:972)
==11220==    by 0x6D1B5F: CopyTest_BlockingGlobalToLocalBarrierUnaligned_Test::TestBody() (CopyTest.cc:339)
==11220==    by 0x81035A: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x80A338: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7EE9CF: testing::Test::Run() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7EF367: testing::TestInfo::Run() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7EFA5A: testing::TestCase::Run() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7F6BA1: testing::internal::UnitTestImpl::RunAllTests() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x811802: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==  Address 0x112e98b4 is 4 bytes after a block of size 80 alloc'd
==11220==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11220==    by 0xDE33200: component_select (in /home/joseph/opt/openmpi-master/lib/openmpi/mca_osc_sm.so)
==11220==    by 0x52AD69A: ompi_win_allocate_shared (in /home/joseph/opt/openmpi-master/lib/libmpi.so.0.0.0)
==11220==    by 0x52E4EE1: PMPI_Win_allocate_shared (in /home/joseph/opt/openmpi-master/lib/libmpi.so.0.0.0)
==11220==    by 0x83A428: dart_team_memalloc_aligned (dart_globmem.c:232)
==11220==    by 0x648A6F: dash::allocator::CollectiveAllocator<int>::allocate(unsigned long) (CollectiveAllocator.h:183)
==11220==    by 0x6472F1: dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >::GlobMem(unsigned long, dash::Team&) (GlobMem.h:125)
==11220==    by 0x645695: dash::Array<int, long, dash::Pattern<1, (dash::MemArrange)1, long> >::allocate(dash::Pattern<1, (dash::MemArrange)1, long> const&) (Array.h:1142)
==11220==    by 0x644990: dash::Array<int, long, dash::Pattern<1, (dash::MemArrange)1, long> >::Array(unsigned long, dash::DistributionSpec<1> const&, dash::Team&) (Array.h:701)
==11220==    by 0x644036: dash::Array<int, long, dash::Pattern<1, (dash::MemArrange)1, long> >::Array(unsigned long, dash::Team&) (Array.h:711)
==11220==    by 0x6D174B: CopyTest_BlockingGlobalToLocalBarrierUnaligned_Test::TestBody() (CopyTest.cc:326)
==11220==    by 0x81035A: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220== 
==11220== Invalid read of size 2
==11220==    at 0x4C32720: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11220==    by 0x841C8E: dart_get_blocking (dart_communication.c:873)
==11220==    by 0x644363: dash::GlobRef<int>::operator int() const (GlobRef.h:156)
==11220==    by 0x6D1BCA: CopyTest_BlockingGlobalToLocalBarrierUnaligned_Test::TestBody() (CopyTest.cc:348)
==11220==    by 0x81035A: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x80A338: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7EE9CF: testing::Test::Run() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7EF367: testing::TestInfo::Run() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7EFA5A: testing::TestCase::Run() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7F6BA1: testing::internal::UnitTestImpl::RunAllTests() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x811802: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x80AFEA: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==  Address 0x112e98b0 is 0 bytes after a block of size 80 alloc'd
==11220==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11220==    by 0xDE33200: component_select (in /home/joseph/opt/openmpi-master/lib/openmpi/mca_osc_sm.so)
==11220==    by 0x52AD69A: ompi_win_allocate_shared (in /home/joseph/opt/openmpi-master/lib/libmpi.so.0.0.0)
==11220==    by 0x52E4EE1: PMPI_Win_allocate_shared (in /home/joseph/opt/openmpi-master/lib/libmpi.so.0.0.0)
==11220==    by 0x83A428: dart_team_memalloc_aligned (dart_globmem.c:232)
==11220==    by 0x648A6F: dash::allocator::CollectiveAllocator<int>::allocate(unsigned long) (CollectiveAllocator.h:183)
==11220==    by 0x6472F1: dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >::GlobMem(unsigned long, dash::Team&) (GlobMem.h:125)
==11220==    by 0x645695: dash::Array<int, long, dash::Pattern<1, (dash::MemArrange)1, long> >::allocate(dash::Pattern<1, (dash::MemArrange)1, long> const&) (Array.h:1142)
==11220==    by 0x644990: dash::Array<int, long, dash::Pattern<1, (dash::MemArrange)1, long> >::Array(unsigned long, dash::DistributionSpec<1> const&, dash::Team&) (Array.h:701)
==11220==    by 0x644036: dash::Array<int, long, dash::Pattern<1, (dash::MemArrange)1, long> >::Array(unsigned long, dash::Team&) (Array.h:711)
==11220==    by 0x6D174B: CopyTest_BlockingGlobalToLocalBarrierUnaligned_Test::TestBody() (CopyTest.cc:326)
==11220==    by 0x81035A: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220== 
==11220== Invalid read of size 1
==11220==    at 0x4C32691: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11220==    by 0x841C8E: dart_get_blocking (dart_communication.c:873)
==11220==    by 0x644363: dash::GlobRef<int>::operator int() const (GlobRef.h:156)
==11220==    by 0x6D1BCA: CopyTest_BlockingGlobalToLocalBarrierUnaligned_Test::TestBody() (CopyTest.cc:348)
==11220==    by 0x81035A: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x80A338: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7EE9CF: testing::Test::Run() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7EF367: testing::TestInfo::Run() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7EFA5A: testing::TestCase::Run() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x7F6BA1: testing::internal::UnitTestImpl::RunAllTests() (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x811802: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==    by 0x80AFEA: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220==  Address 0x112e98b4 is 4 bytes after a block of size 80 alloc'd
==11220==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11220==    by 0xDE33200: component_select (in /home/joseph/opt/openmpi-master/lib/openmpi/mca_osc_sm.so)
==11220==    by 0x52AD69A: ompi_win_allocate_shared (in /home/joseph/opt/openmpi-master/lib/libmpi.so.0.0.0)
==11220==    by 0x52E4EE1: PMPI_Win_allocate_shared (in /home/joseph/opt/openmpi-master/lib/libmpi.so.0.0.0)
==11220==    by 0x83A428: dart_team_memalloc_aligned (dart_globmem.c:232)
==11220==    by 0x648A6F: dash::allocator::CollectiveAllocator<int>::allocate(unsigned long) (CollectiveAllocator.h:183)
==11220==    by 0x6472F1: dash::GlobMem<int, dash::allocator::CollectiveAllocator<int> >::GlobMem(unsigned long, dash::Team&) (GlobMem.h:125)
==11220==    by 0x645695: dash::Array<int, long, dash::Pattern<1, (dash::MemArrange)1, long> >::allocate(dash::Pattern<1, (dash::MemArrange)1, long> const&) (Array.h:1142)
==11220==    by 0x644990: dash::Array<int, long, dash::Pattern<1, (dash::MemArrange)1, long> >::Array(unsigned long, dash::DistributionSpec<1> const&, dash::Team&) (Array.h:701)
==11220==    by 0x644036: dash::Array<int, long, dash::Pattern<1, (dash::MemArrange)1, long> >::Array(unsigned long, dash::Team&) (Array.h:711)
==11220==    by 0x6D174B: CopyTest_BlockingGlobalToLocalBarrierUnaligned_Test::TestBody() (CopyTest.cc:326)
==11220==    by 0x81035A: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/joseph/src/dash/dash/build/bin/dash-test-mpi)
==11220== 
[=   0  LOG =]               CopyTest.h :  35 | <=== Finished test case with 1 units 
[=   0  LOG =]               CopyTest.h :  22 | <<< Closing test suite: CopyTest 

@fuchsto Could you please look at this test? The test seems suspicious in the following line:

  dash::copy(array.begin() + start_index,
             array.begin() + start_index + num_elems_copy,
             local_array);

Isn't that reading beyond the bounds of the array if running on only one unit?
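
With the values from the log above (array size 20, start index 7, 20 elements to copy), the requested range is [7, 27), which runs 7 elements past the end of a 20-element array when everything lives on one unit. That matches Valgrind reporting reads just after an 80-byte block, assuming 4-byte ints. A minimal sketch of clamping the range, using std::vector as a stand-in for the DASH array; `copy_clamped` is a hypothetical helper and not necessarily how the test should be fixed:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copies at most `count` elements of `src`, starting at index `start`, into
// `dst`. The range is clamped to the end of `src`, so nothing is read out of
// bounds. Returns the number of elements actually copied.
inline std::size_t copy_clamped(const std::vector<int>& src,
                                std::size_t start,
                                std::size_t count,
                                std::vector<int>& dst) {
  if (start >= src.size()) {
    return 0;  // nothing readable from this start index
  }
  const std::size_t available = src.size() - start;
  const std::size_t n = std::min(count, available);
  assert(dst.size() >= n);  // the destination must be able to hold the result
  const auto first = src.begin() + static_cast<std::ptrdiff_t>(start);
  std::copy(first, first + static_cast<std::ptrdiff_t>(n), dst.begin());
  return n;
}
```

For the numbers in the log, `copy_clamped(src, 7, 20, dst)` on a 20-element `src` copies 13 elements instead of attempting 20.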

@fuchsto fuchsto self-assigned this Nov 23, 2016
@fuchsto
Member

fuchsto commented Nov 23, 2016

@devreal Awesome, looks way better now.

Yes, you are probably right with the out-of-bounds; checking this now.

@fuchsto
Member

fuchsto commented Nov 23, 2016

@devreal You can retry now, the out-of-bounds fix is merged.

@devreal
Member Author

devreal commented Nov 23, 2016

Looks good, seems fixed!

Another issue I found: the implementation of Team::split(n) seems to leak memory at multiple points:

  1. The memory allocated for sub_groups is never freed, nor is the memory for sub_groups[i]. AFAICS, both can safely be freed at the end of the function.
  2. The result variable is overwritten without deleting the memory allocated in an earlier iteration of the enclosing for loop. I'm not sure what the right fix is there. Maybe a break is missing?
  3. Is the memory allocated for result ever freed again? I must admit I haven't invested much time in checking this, but since a reference is returned, the caller might not know that they should free it.

@fuchsto
Member

fuchsto commented Nov 23, 2016

Yes, this is a refactoring ToDo, we should use std::vector for temporary arrays and nothing else.
Will fix this later today.
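
As a rough illustration of the std::vector suggestion: if the temporary sub-group arrays are vectors, both the outer array and each sub_groups[i] are released automatically when the function returns, on every exit path. The sketch below is not the Team::split code. `split_units` is a hypothetical function, and the round-robin assignment of unit IDs is an assumption made only to have something to store.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Distributes unit IDs 0..num_units-1 over `num_parts` sub-groups.
// No manual allocation: the vectors own their memory and free it themselves.
inline std::vector<std::vector<int>> split_units(int num_units,
                                                 std::size_t num_parts) {
  assert(num_parts > 0);
  std::vector<std::vector<int>> sub_groups(num_parts);
  for (int unit = 0; unit < num_units; ++unit) {
    // Round-robin placement (an assumption for this sketch).
    sub_groups[static_cast<std::size_t>(unit) % num_parts].push_back(unit);
  }
  return sub_groups;  // returned by value, so ownership is unambiguous
}
```

Returning by value also sidesteps the third point above, since the caller owns the result and has nothing to free explicitly.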

@devreal
Member Author

devreal commented Nov 23, 2016

One more thing: is dart_team_destroy ever called on the team created in the split? I can't seem to find such a call. dart_team_locality_finalize is called, but that is something different.

@fuchsto
Member

fuchsto commented Nov 23, 2016

No, but that's actually not a bug I think: #137

@devreal
Member Author

devreal commented Nov 23, 2016

OK, I see. Not calling dart_team_destroy will likely leak the communicator handle but that's fine with me for now.

@fuchsto
Member

fuchsto commented Nov 23, 2016

Yes, doesn't hurt too much for now, but we should free all team objects in dart_finalize.
Destruction of DART teams is addressed in #137.

I think we can close this issue for now.

@devreal, just create a pull request for branch bug-140-dart-memleaks when you're done.
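
The idea of freeing all team objects at finalization could look roughly like the registry below: every created team is recorded, explicit destruction removes the entry, and whatever is still registered at shutdown is cleaned up in one place. This is only a sketch under stated assumptions. `TeamRegistry` and its members are hypothetical and not part of the DART API, and plain integer IDs stand in for real team and communicator handles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical registry: tracks live team IDs so that none outlive shutdown.
class TeamRegistry {
public:
  // Registers a new team and returns its ID.
  int create() {
    const int id = next_id_++;
    live_.push_back(id);
    return id;
  }

  // Destroys one team. Returns false if the ID is unknown or already gone.
  bool destroy(int id) {
    const auto it = std::find(live_.begin(), live_.end(), id);
    if (it == live_.end()) return false;
    live_.erase(it);
    return true;
  }

  // Releases every team that is still registered and returns how many
  // had to be cleaned up here.
  std::size_t finalize() {
    const std::size_t leaked = live_.size();
    live_.clear();  // real code would free each underlying handle here
    assert(live_.empty());
    return leaked;
  }

  std::size_t live() const { return live_.size(); }

private:
  std::vector<int> live_;
  int next_id_ = 0;
};
```

The return value of `finalize()` doubles as a diagnostic: a nonzero count means some caller never destroyed its team explicitly.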

@fuchsto
Member

fuchsto commented Nov 24, 2016

Cleanup of dash::Team in separate issue: #159
