Fix for memory leak in JVMObjectTracker#801
Conversation
… to release JVM objects during shutdown.
...scala/microsoft-spark-2-3/src/test/scala/org/apache/spark/api/dotnet/DotnetBackendTest.scala
@imback82 I think the fix this PR addresses and the fix for issue #792 / PR #793 are similar. This PR takes the first approach, and #793 does the second, but I think we should try to be consistent with both fixes. I'm partial to the 2nd approach.
Just to let you know: the fix I've created solves the memory leak issue, but causes a job isolation issue, where all tracked objects for one job may be cleaned up during the shutdown of another job. The only sensible way to fix that is to make
Sounds like a good plan. Looking forward to the updated PR.
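As a hypothetical illustration of the non-static direction discussed above (the real `JVMObjectTracker` in this repo has a different surface; the class and method names below are assumptions for this sketch), a per-backend tracker instance would keep each job's object handles isolated, so shutting down one backend cannot clear another job's objects:

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// Hypothetical per-instance tracker: each backend owns its own map,
// so clearing one cannot drop another job's tracked objects.
class PerBackendObjectTracker {
  private val objects = new ConcurrentHashMap[String, Object]()
  private val nextId = new AtomicLong(0)

  // Track an object and return its handle id.
  def put(obj: Object): String = {
    val id = nextId.incrementAndGet().toString
    objects.put(id, obj)
    id
  }

  def get(id: String): Option[Object] = Option(objects.get(id))

  def size: Int = objects.size()

  // Called on this backend's shutdown; other backends are unaffected.
  def clear(): Unit = objects.clear()
}
```

With one tracker per backend instance, shutdown of one job's backend releases only that job's references, avoiding the cross-job cleanup problem mentioned above.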
…aru doesn't contain such method.
Hi @suhsteve, the PR has been updated. Looking forward to any feedback.
FYI, I've done a little regression testing on my QA environment using the latest fixes from the current PR. All batch jobs that previously suffered from the memory leak are now running correctly without any memory issues. It would be great to start reviewing the changes and move toward integrating them into master.
Thanks @spzSource for the update. I will get to this PR this week, sorry for the delay. |
Looking forward to the PR going to master.
imback82
left a comment
A few minor/nit comments, but generally looks great to me. Thanks @spzSource for working on this!
src/scala/microsoft-spark-2-3/src/main/scala/org/apache/spark/api/dotnet/SerDe.scala
src/scala/microsoft-spark-2-3/src/test/scala/org/apache/spark/api/dotnet/SerDeTest.scala
src/scala/microsoft-spark-2-3/src/test/scala/org/apache/spark/api/dotnet/Extensions.scala
...scala/microsoft-spark-2-3/src/test/scala/org/apache/spark/api/dotnet/DotnetBackendTest.scala
src/scala/microsoft-spark-2-3/src/main/scala/org/apache/spark/api/dotnet/DotnetBackend.scala
…to jvmObjectTracker-memory-leak
Seems that the latest build failed due to a NuGet feed issue:
Does anybody have an idea what happened with this feed?
Looks like the feed got deprecated recently. I will create a PR to fix this. Thanks! |
The build issue is fixed in #807. I pushed the changes to your branch. |
imback82
left a comment
LGTM, thanks @spzSource!
Fixes #799
As per the explanation in the linked issue, there is a memory leak in `JVMObjectTracker`. Because `JVMObjectTracker` is a singleton by nature, it retains references to JVM objects throughout the entire life cycle of the Spark driver node. This in turn causes a massive memory leak when running `DotnetRunner` multiple times on the same driver node. For instance, this happens when using the SetJar approach in Databricks.

To fix the memory leak, `JVMObjectTracker` is cleaned up right before `DotnetBackend` shutdown, so it releases its references to tracked JVM objects, which allows these objects to be garbage collected successfully.

Heap statistics from the `jmap` tool after the fix was applied: usage of the Old Gen space is 12%, compared with 99.9% before the fix.

Point to discuss:

My first attempt was to make `JVMObjectTracker` non-static, which could potentially provide better protection from leaks in the future. But making it non-static would require modifying multiple files, and it looks like it would be pretty hard to make `DotnetForeachBatchHelper` work with a non-static version of `JVMObjectTracker`.

Given all of the above, I decided to implement the simplest possible solution, but I absolutely do not mind discussing other approaches.
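A minimal sketch of the approach described above, under stated assumptions: the actual classes in the repo differ, and the `clear()` method and the `close()` hook shown here are illustrative names, not the project's real API. The singleton tracker gets a cleanup method that is invoked right before the backend shuts down, releasing all tracked references so they become eligible for garbage collection:

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// Simplified stand-in for the singleton tracker; the real
// org.apache.spark.api.dotnet.JVMObjectTracker has a different API.
object TrackerSketch {
  private val objects = new ConcurrentHashMap[String, Object]()
  private val nextId = new AtomicLong(0)

  // Track an object and return its handle id.
  def put(obj: Object): String = {
    val id = nextId.incrementAndGet().toString
    objects.put(id, obj)
    id
  }

  def size: Int = objects.size()

  // The fix: drop every tracked reference so the GC can reclaim them.
  def clear(): Unit = objects.clear()
}

// Hypothetical backend whose close() clears the tracker first,
// mirroring "cleaned up right before DotnetBackend shutdown".
class BackendSketch {
  def close(): Unit = {
    TrackerSketch.clear() // release JVM object references before shutdown
    // ... then release sockets, threads, etc.
  }
}
```

Because the tracker is static, this keeps the single-file change small while still breaking the reference chain that kept objects alive across repeated `DotnetRunner` invocations on the same driver.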