Memory usage optimization: discard compute closure after evaluation in Lazy #214
This PR implements a memory usage optimization related to `Lazy`: once we have evaluated a `Lazy` value, we no longer need to retain the closure that performed the evaluation. Discarding closures (and, transitively, their dependencies) after lazy evaluation saves memory.

Motivation and discovery of this issue
We have a single Jsonnet file that has very high peak memory requirements during evaluation. I captured a heap dump and noticed significant memory usage due to `sjsonnet.Lazy[]`, with over half the shallow size consumed by `Lazy[]` arrays. From the merged paths view, we see that most of these are retained from anonymous classes.
For example, here `sjsonnet.Evaluator$$anonfun$visitAsLazy$2` corresponds to the `() => visitExpr(e)` in `sjsonnet/src/sjsonnet/Evaluator.scala` (lines 81 to 84 at commit 759cea7).
That defines an anonymous implementation of `sjsonnet.Lazy` using Scala's single abstract method (SAM) syntax.
Here, `visitExpr` takes an implicit `ValScope` parameter. `ValScope` is a value class wrapping a `bindings: Array[Lazy]`. Thus, our `sjsonnet.Evaluator$$anonfun$visitAsLazy$2` anonymous class captures the values needed to evaluate the `visitExpr(e)` closure, including the `bindings` array, and thereby contributes to the high count of retained `Array[Lazy]`.
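To make the capture concrete, here is a minimal self-contained sketch of the pattern. The `Val`, `Lazy`, and `ValScope` definitions are simplified stand-ins inferred from the description above, not sjsonnet's actual code:

```scala
trait Val // stand-in for sjsonnet.Val

// Assumed shape of the base class: a single abstract method (compute), so
// lambdas of type () => Val SAM-convert into anonymous Lazy subclasses.
abstract class Lazy {
  protected[this] var cached: Val = null
  def compute(): Val
  final def force: Val = {
    if (cached == null) cached = compute()
    cached
  }
}

// Value class wrapping the bindings array, as described above.
class ValScope(val bindings: Array[Lazy]) extends AnyVal

object CaptureSketch {
  def visitExpr(e: String)(implicit scope: ValScope): Val = new Val {}

  def visitAsLazy(e: String)(implicit scope: ValScope): Lazy =
    // SAM conversion: this lambda compiles to an anonymous Lazy subclass
    // whose fields capture both `e` and the implicit `scope` (i.e. the
    // underlying Array[Lazy], since ValScope is a value class), keeping the
    // whole bindings array reachable for as long as the Lazy is retained.
    () => visitExpr(e)
}
```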
We can also see this from the object inspection in the heap dump.

Insight behind the optimization: we don't need the closure after evaluation
In the heap dump that I looked at, most of these anonymous `Lazy` instances had non-null `cached` fields, meaning that their lazy values had already been computed. At that point the value will never be recomputed, so we can discard the closure and, transitively, discard its heavyweight bindings, which in turn reference more closures and bindings, and so on.

I also drew inspiration (but not implementation) from a neat behavior of Scala `lazy val`s, where class constructor parameters that are used exclusively in a `lazy val` are discarded after successful lazy evaluation: decompiling such a class (via CFR) shows the captured parameter being nulled out once the lazy value has been computed, demonstrating how the reference is discarded after lazy evaluation. A sketch follows.
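Here is a minimal sketch of that behavior; the class is hypothetical, and the decompiled output in the comments is a paraphrase of the Scala 2.12 encoding rather than verbatim CFR output:

```scala
// Hypothetical example: `big` is referenced only inside the lazy val.
class Example(big: Array[Byte]) {
  lazy val size: Int = big.length
}

// CFR-decompiled Java for the Scala 2.12 output looks roughly like this:
// the compiler keeps `big` in a non-final field and assigns null to it
// once `size` has been successfully computed.
//
//   private int size$lzycompute() {
//       synchronized (this) {
//           if (!this.bitmap$0) {
//               this.size = this.big.length;
//               this.bitmap$0 = true;
//           }
//       }
//       this.big = null;  // constructor parameter discarded after evaluation
//       return this.size;
//   }
//
//   public int size() {
//       return this.bitmap$0 ? this.size : this.size$lzycompute();
//   }
```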
Implementation
This PR implements a similar optimization. I introduce a new class, `LazyWithComputeFunc(@volatile private[this] var computeFunc: () => Val) extends Lazy`, which can be used in place of the anonymous classes and which discards the closure after evaluation.

The underlying implementation is slightly tricky because of a few performance considerations: `force` was monomorphic, so I couldn't change that.

Here, I have chosen to make `computeFunc` a volatile field and to check it inside of `compute()`. In ordinary cases, `compute()` will only be called once, because `force` checks whether `cached` has already been computed. In the rare case of concurrent calls to `compute()`, we first check whether `computeFunc` has been nulled: if it is null, then some other thread computed and cached a value (assigned from within `compute()` itself), and that other thread's write is guaranteed to be visible to the race-condition-losing thread, because the volatile read of `computeFunc` provides piggybacked visibility of writes from the other racing thread (see https://stackoverflow.com/a/8769692).
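Here is a self-contained sketch of the scheme just described. The `Val` and `Lazy` stand-ins repeat the assumed shapes from the earlier capture sketch; the `LazyWithComputeFunc` signature is verbatim from this PR, but the `compute()` body follows the description above rather than the actual diff:

```scala
trait Val // stand-in for sjsonnet.Val

// Assumed base-class shape: force computes at most once in the common case,
// caching the result in `cached`.
abstract class Lazy {
  protected[this] var cached: Val = null
  def compute(): Val
  final def force: Val = {
    if (cached == null) cached = compute()
    cached
  }
}

// Discards the compute closure once a value has been cached, so the closure
// and everything it captures (e.g. a ValScope's bindings array) can be GC'd.
final class LazyWithComputeFunc(@volatile private[this] var computeFunc: () => Val)
    extends Lazy {
  def compute(): Val = {
    val f = computeFunc // volatile read
    if (f != null) {
      // Common case: run the closure, publish the result, then null out the
      // field so the closure becomes unreachable.
      val result = f()
      cached = result
      computeFunc = null // volatile write, ordered after the write to cached
      result
    } else {
      // Rare race: another thread already ran the closure. Its write to
      // `cached` happened before its volatile write nulling computeFunc, so
      // the volatile read above guarantees we see the cached value here.
      cached
    }
  }
}
```

A call site that previously produced an anonymous `Lazy` via SAM conversion, such as `() => visitExpr(e)`, would instead construct `new LazyWithComputeFunc(() => visitExpr(e))`.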
Testing
This passes the existing unit tests. I did some manual heap dump tests to validate that this cuts memory usage on small toy examples. I have not yet run this end-to-end on the real workload that generated the original heap dumps.