Memory usage optimization: discard compute closure after evaluation in Lazy #214
This PR implements a memory usage optimization related to `Lazy`: once we have evaluated a `Lazy` value, we no longer need to retain the closure that performed the evaluation. Discarding closures (and, transitively, their dependencies) after lazy evaluation saves memory.

Motivation and discovery of this issue
We have a single Jsonnet file that has very high peak memory requirements during evaluation. I captured a heap dump and noticed significant memory usage due to `sjsonnet.Lazy[]`, with over half the shallow size consumed by `Lazy[]` arrays. From the merged paths view, we see that most of these are retained from anonymous classes.
For example, here `sjsonnet.Evaluator$$anonfun$visitAsLazy$2` corresponds to the `() => visitExpr(e)` in `sjsonnet/src/sjsonnet/Evaluator.scala` (lines 81 to 84 at commit 759cea7).
That defines an anonymous implementation of `sjsonnet.Lazy` using Scala's single abstract method (SAM) syntax.
Here, `visitExpr` takes an implicit `ValScope` parameter. `ValScope` is a value class wrapping a `bindings: Array[Lazy]`. Thus, our `sjsonnet.Evaluator$$anonfun$visitAsLazy$2` anonymous class captures the values needed to evaluate the `visitExpr(e)` closure, including the `bindings` array, and thereby contributes to the high count of retained `Array[Lazy]`.
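To make the capture concrete, here is a minimal self-contained sketch of the pattern. The `Val`, `Lazy`, and `ValScope` definitions are simplified stand-ins inferred from the description above, not sjsonnet's actual code:

```scala
trait Val // stand-in for sjsonnet.Val

// Assumed shape of the base class: a single abstract method (compute), so
// lambdas of type () => Val SAM-convert into anonymous Lazy subclasses.
abstract class Lazy {
  protected[this] var cached: Val = null
  def compute(): Val
  final def force: Val = {
    if (cached == null) cached = compute()
    cached
  }
}

// Value class wrapping the bindings array, as described above.
class ValScope(val bindings: Array[Lazy]) extends AnyVal

object CaptureSketch {
  def visitExpr(e: String)(implicit scope: ValScope): Val = new Val {}

  def visitAsLazy(e: String)(implicit scope: ValScope): Lazy =
    // SAM conversion: this lambda compiles to an anonymous Lazy subclass
    // whose fields capture both `e` and the implicit `scope` (i.e. the
    // underlying Array[Lazy], since ValScope is a value class), keeping the
    // whole bindings array reachable for as long as the Lazy is retained.
    () => visitExpr(e)
}
```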
We can also see this from the object inspection in the heap dump.

Insight behind the optimization: we don't need the closure after evaluation
In the heap dump that I looked at, most of these anonymous `Lazy` instances had non-null `cached` fields, meaning that their lazy values had already been computed. At that point the value will never be recomputed, so we can discard the closure and, transitively, discard its heavyweight bindings, which in turn reference more closures and bindings, and so on.

I also drew inspiration (but not implementation) from a neat behavior of Scala `lazy val`s, where class constructor parameters that are used exclusively in a `lazy val` are discarded after successful lazy evaluation: decompiling such a class (via CFR) shows the captured parameter being nulled out once the lazy value has been computed, demonstrating how the reference is discarded after lazy evaluation. A sketch follows.
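Here is a minimal sketch of that behavior; the class is hypothetical, and the decompiled output in the comments is a paraphrase of the Scala 2.12 encoding rather than verbatim CFR output:

```scala
// Hypothetical example: `big` is referenced only inside the lazy val.
class Example(big: Array[Byte]) {
  lazy val size: Int = big.length
}

// CFR-decompiled Java for the Scala 2.12 output looks roughly like this:
// the compiler keeps `big` in a non-final field and assigns null to it
// once `size` has been successfully computed.
//
//   private int size$lzycompute() {
//       synchronized (this) {
//           if (!this.bitmap$0) {
//               this.size = this.big.length;
//               this.bitmap$0 = true;
//           }
//       }
//       this.big = null;  // constructor parameter discarded after evaluation
//       return this.size;
//   }
//
//   public int size() {
//       return this.bitmap$0 ? this.size : this.size$lzycompute();
//   }
```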
Implementation
This PR implements a similar optimization. I introduce a new class, `LazyWithComputeFunc(@volatile private[this] var computeFunc: () => Val) extends Lazy`, which can be used in place of the anonymous classes and which discards the closure after evaluation.

The underlying implementation is slightly tricky because of a few performance considerations: `force` was monomorphic, so I couldn't change that.

Here, I have chosen to make `computeFunc` a volatile field and to check it inside of `compute()`. In ordinary cases, `compute()` will only be called once, because `force` checks whether `cached` has already been computed. In the rare case of concurrent calls to `compute()`, we first check whether `computeFunc` has been nulled: if it is null, then some other thread computed and cached a value (assigned from within `compute()` itself), and that other thread's write is guaranteed to be visible to the race-condition-losing thread, because the volatile read of `computeFunc` provides piggybacked visibility of writes from the other racing thread (see https://stackoverflow.com/a/8769692).
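Here is a self-contained sketch of the scheme just described. The `Val` and `Lazy` stand-ins repeat the assumed shapes from the earlier capture sketch; the `LazyWithComputeFunc` signature is verbatim from this PR, but the `compute()` body follows the description above rather than the actual diff:

```scala
trait Val // stand-in for sjsonnet.Val

// Assumed base-class shape: force computes at most once in the common case,
// caching the result in `cached`.
abstract class Lazy {
  protected[this] var cached: Val = null
  def compute(): Val
  final def force: Val = {
    if (cached == null) cached = compute()
    cached
  }
}

// Discards the compute closure once a value has been cached, so the closure
// and everything it captures (e.g. a ValScope's bindings array) can be GC'd.
final class LazyWithComputeFunc(@volatile private[this] var computeFunc: () => Val)
    extends Lazy {
  def compute(): Val = {
    val f = computeFunc // volatile read
    if (f != null) {
      // Common case: run the closure, publish the result, then null out the
      // field so the closure becomes unreachable.
      val result = f()
      cached = result
      computeFunc = null // volatile write, ordered after the write to cached
      result
    } else {
      // Rare race: another thread already ran the closure. Its write to
      // `cached` happened before its volatile write nulling computeFunc, so
      // the volatile read above guarantees we see the cached value here.
      cached
    }
  }
}
```

A call site that previously produced an anonymous `Lazy` via SAM conversion, such as `() => visitExpr(e)`, would instead construct `new LazyWithComputeFunc(() => visitExpr(e))`.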
Testing
This passes the existing unit tests. I did some manual heap dump tests to validate that this cuts memory usage on small toy examples. I have not yet run this end-to-end on the real workload that generated the original heap dumps.