We've been using VMA extensively for our memory management on projects with heavy memory load. However, upon integrating the defragmentation utility, we started getting frequent memory corruption on internal VMA data. We narrowed down the source of this corruption to concurrent accesses on our VmaAllocator and (apparently) succeeded in preventing it by blocking allocations for the duration of the defragmentation. More context follows, with a couple questions at the end.
We use a single VmaAllocator from two separate threads. Most of our allocations are done from the rendering thread, which is also the one that runs the defragmentation. We also allocate from the main thread, although less frequently.
Our defragmentation integration looks something like this (it has the same structure as the example in the online documentation):
vmaBeginDefragmentation(mainAllocator, ...);
for (;;)
{
    // VK_SUCCESS from vmaBeginDefragmentationPass means there is nothing left to move.
    if (vmaBeginDefragmentationPass(mainAllocator, ...) == VK_SUCCESS)
        break;
    CreateResourcesAndCopy();
    WaitForCompletion();
    if (vmaEndDefragmentationPass(mainAllocator, ...) == VK_SUCCESS)
        break;
}
vmaEndDefragmentation(mainAllocator, ...);
Running this bit of code on heavy workloads from our rendering thread, while our main thread is free to use mainAllocator for other allocations, we reliably trigger memory corruption, specifically in the first 4 bytes of random VmaAllocations from the VmaDefragmentationPassMoveInfo::pMoves array we are working with.
Setting that particular symptom aside, we managed to prevent the corruption with a mutex that guards both the region between vmaBeginDefragmentation() and vmaEndDefragmentation() and the other VMA calls on mainAllocator, specifically those that could run concurrently with defragmentation.
As a side note, we first prevented memory corruption by enabling VMA_DEBUG_GLOBAL_MUTEX, changing VmaMutex::m_Mutex type to std::recursive_mutex, and locking it from our side right after vmaBeginDefragmentation() and releasing it right before vmaEndDefragmentation().
Does it make sense to treat the region from vmaBeginDefragmentation() to vmaEndDefragmentation() as a critical section with regard to the VmaAllocator in use? If so, could that be enforced from within the library?
Thank you for reporting this bug and for the good description. It sounds very serious. We will investigate it. Any help in debugging it is welcome.
Blocking any usage of the allocator between Begin and EndDefragmentation may be a good workaround for the bug, but it shouldn't be needed. Defragmentation was developed with the assumption that using the allocator while defragmentation is in progress is allowed.
Forgive me if I'm wrong, but I've been looking at the defrag code recently and the map transferring code seems really sketchy to me.
There's a window after the allocation data swap but before maps are transferred to the new block during which an allocation believes it holds references to a block that it doesn't; unmapping during this window can crash or make the map reference count underflow.
Copies to the new memory can occur while the same memory is being written elsewhere; the results vary by memory type.
New maps can be opened on memory that has already been copied to the new location but before the allocation swap, resulting in writes that will never actually be used. It's also possible for the block to be fully unmapped or freed by the defragmenter while this is happening.
From a logical standpoint, it just seems that a memory block being moved needs to be unwritable from the moment the copy begins until the end of the pass. I'm not saying VMA needs to enforce this; it probably makes more sense for us users to take care of it. More generally, the VMA documentation says that individual allocations are not considered thread-safe, and defragmentation manipulates them.