Memory usage optimization: discard compute closure after evaluation in Lazy #214

Open

wants to merge 3 commits into master
Conversation

@JoshRosen (Contributor) commented Nov 21, 2024

This PR implements a memory usage optimization related to Lazy: once we have evaluated a Lazy value, we no longer need to retain the closure that performed the evaluation. Discarding closures (and, transitively, their dependencies) after lazy evaluation saves memory.

Motivation and discovery of this issue

We have a single jsonnet file which has very high peak memory requirements during evaluation. I captured a heap dump and noticed significant memory usage due to sjsonnet.Lazy[], with over half the shallow size consumed by Lazy[] arrays:

[screenshot: heap dump histogram with sjsonnet.Lazy[] arrays accounting for over half of shallow size]

From the merged paths view, we see that most of these are retained from anonymous classes:

[screenshot: merged paths view showing Lazy[] arrays retained by anonymous classes]

For example, here sjsonnet.Evaluator$$anonfun$visitAsLazy$2 corresponds to the () => visitExpr(e) in

def visitAsLazy(e: Expr)(implicit scope: ValScope): Lazy = e match {
  case v: Val => v
  case e => () => visitExpr(e)
}

That lambda defines an anonymous implementation of sjsonnet.Lazy via single abstract method (SAM) syntax.

Here, visitExpr takes an implicit ValScope parameter. ValScope is a value class wrapping a bindings: Array[Lazy] field.

Thus, the sjsonnet.Evaluator$$anonfun$visitAsLazy$2 anonymous class captures the values needed to evaluate visitExpr(e), including the implicit scope's bindings array, thereby contributing to the high count of retained Array[Lazy]. We can also see this from the object inspection in the heap dump:

[screenshot: object inspection of an anonymous Lazy instance showing the captured ValScope bindings array]
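
To make the capture concrete, the generated lambda class behaves roughly like the following hand-written equivalent (a hypothetical sketch; the real compiler-generated class name and field layout differ):

// Hypothetical desugaring of `() => visitExpr(e)` under SAM conversion:
// the lambda captures the enclosing Evaluator (needed to call visitExpr),
// the expression, and the implicit ValScope, whose bindings array it
// therefore keeps reachable.
class AnonVisitAsLazy(evaluator: Evaluator, e: Expr, scope: ValScope) extends Lazy {
  def compute(): Val = evaluator.visitExpr(e)(scope) // retains scope.bindings: Array[Lazy]
}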

Insight behind the optimization: we don't need the closure after evaluation

In the heap dump that I looked at, most of these anonymous Lazy instances had non-null cached fields, meaning that their lazy values had already been computed. At that point the value will never be recomputed, so we can discard the closure and, transitively, its heavyweight bindings, which in turn reference more closures and bindings, and so on.
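
For reference, the Lazy base class behaves roughly like this (a simplified sketch inferred from the cached/force behavior described in this PR, not a verbatim copy):

// Simplified sketch of sjsonnet.Lazy: force runs compute() at most once
// on the happy path and serves the cached value thereafter, so once
// cached is non-null the closure behind compute() is dead weight.
abstract class Lazy {
  protected var cached: Val = null
  def compute(): Val
  final def force: Val = {
    if (cached == null) cached = compute()
    cached
  }
}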

I also drew inspiration (but not implementation) from a neat behavior of Scala lazy vals: class constructor parameters that are used exclusively in a lazy val's initializer are nulled out after successful lazy evaluation. For instance, the code

class Lazy(f: () => String) {
  lazy val value = f()
}

decompiles (via cfr) into:

public class Lazy {
    private String value;
    private Function0<String> f;
    private volatile boolean bitmap$0;

    private String value$lzycompute() {
        Lazy lazy = this;
        synchronized (lazy) {
            if (!this.bitmap$0) {
                this.value = (String)this.f.apply();
                this.bitmap$0 = true;
            }
        }
        this.f = null;
        return this.value;
    }

    public String value() {
        if (!this.bitmap$0) {
            return this.value$lzycompute();
        }
        return this.value;
    }

    public Lazy(Function0<String> f) {
        this.f = f;
    }
}

demonstrating how the closure is discarded after lazy evaluation.

Implementation

This PR implements a similar optimization. I introduce a new class LazyWithComputeFunc(@volatile private[this] var computeFunc: () => Val) extends Lazy, which can be used in place of the anonymous classes and which discards the closure after evaluation.

The underlying implementation is slightly tricky because of a few performance considerations:

  • It's really important that we optimize performance for the case where a value is computed once and read many times. Commit d26e9db previously took pains to ensure that force is monomorphic, so I couldn't change that.
  • Similarly, I didn't want to introduce any locks or synchronization, nor any volatile accesses, on hot paths.
  • Finally, I must ensure that the code is thread-safe: it's acceptable to redundantly compute a value under concurrent initialization, but we must not NPE or otherwise crash.

Here, I have chosen to make computeFunc a volatile field and check it inside compute(). In ordinary cases, compute() will only be called once, because force checks whether cached has already been computed. In the rare case of concurrent calls to compute(), we first check whether computeFunc has been nulled out: if it is null, then some other thread already computed and cached a value (assigned from within compute() itself), and that thread's write to cached is guaranteed to be visible to the race-losing thread because the volatile read of computeFunc piggybacks visibility of the racing thread's prior writes (see https://stackoverflow.com/a/8769692).
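
Putting the pieces together, the new class looks roughly like the following (a sketch reconstructed from the description above, not the verbatim PR code):

// Sketch of LazyWithComputeFunc: computeFunc is volatile so that a thread
// losing a compute() race observes the winner's writes; it is nulled out
// after a successful computation so the closure (and its captured
// ValScope bindings) become garbage-collectable.
final class LazyWithComputeFunc(@volatile private[this] var computeFunc: () => Val) extends Lazy {
  def compute(): Val = {
    val f = computeFunc // volatile read
    if (f != null) {    // first (or racing) computation
      val result = f()
      cached = result     // publish the value for force's fast path
      computeFunc = null  // volatile write: discard the closure
      result
    } else {
      // Another thread already computed and cached the value; the volatile
      // read of computeFunc above guarantees we see its write to cached.
      cached
    }
  }
}

Note that the hot path is untouched: force on an already-cached value performs no volatile access at all, since the volatile read of computeFunc happens only inside compute(), which force reaches only while cached is still null.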

Testing

This passes existing unit tests. I did some manual heap dump tests to validate that this cuts memory usage on small toy examples. I have not yet run this end-to-end on the real workload which generated the original heap dumps.

-abstract class Lazy {
+protected abstract class Lazy {
@JoshRosen (Contributor, Author) commented:
I did this in order to guarantee that I replaced all anonymous classes with the new implementation. Happy to back this part out if there's some API compatibility reason why we can't change this, but I'm hoping this is okay. Note that Val is sealed, so I think it's unlikely there are external implementors of Lazy.

@JoshRosen changed the title from "Discard compute closure after evaluation in Lazy" to "Memory usage optimization: discard compute closure after evaluation in Lazy" on Nov 21, 2024
@jordanmoxon-db left a comment:

Thanks for this change! We'll need to test carefully as we bump the version in universe; I'll be interested to see how the memory usage changes.
