Generalize target invalidation logic #3706

lefou · 2024-10-10T08:47:15Z

Mill target invalidation works reasonably well under the assumption that all outputs are cached in the Mill cache (e.g. the out/ dir). Sometimes it's necessary to point to things outside of Mill's cache. Some use cases include:

Third-party caches like coursier
Results of external tools, like docker

Whenever we want to refer to something outside of Mill's cache, we are forced to use either commands, which are not cached, or cached tasks, which risk becoming stale. To prevent such situations, we already added the PathRef.revalidate flag, which lets cached tasks returning PathRefs detect that their cached value is invalid and invalidate themselves. We use this to properly detect cache evictions of coursier.

To make Mill even more versatile, we should have a general concept for invalidating cached targets with user-provided logic. E.g. to avoid building an upstream docker container each time, we want to use a cached task. But since we can't point to exact file locations, we have no idea if the container is still valid. If we had some check logic to run, it could just query the docker registry and give a quick response.

In PR #3617 we introduced a new way to tag tasks.

We could introduce some new validate tag, which either contains a closure to check or points to a companion task that does the check for us. I think we need to experiment and tinker a bit before we find the best approach.
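To make the "closure to check" idea concrete, here is a minimal toy model in plain Scala. It is not Mill's API: `ValidatedCache`, `isStillValid`, and the in-memory `registry` are hypothetical names invented for illustration. It only shows the intended behavior, namely that the cheap check runs on every access and the expensive computation runs only when the check fails.

```scala
// Toy model only, not Mill's API. All names here are hypothetical.
final class ValidatedCache[T](isStillValid: T => Boolean)(compute: () => T) {
  private var cached: Option[T] = None
  var computeCount = 0 // exposed only to make the behavior observable

  def get(): T = cached match {
    // Cheap path: a cached value exists and the user-provided check accepts it
    case Some(value) if isStillValid(value) => value
    // Expensive path: nothing cached yet, or the check rejected the cached value
    case _ =>
      computeCount += 1
      val fresh = compute()
      cached = Some(fresh)
      fresh
  }
}

object ValidateDemo {
  // Stand-in for an external docker registry
  val registry = scala.collection.mutable.Set.empty[String]

  val image = new ValidatedCache[String](tag => registry.contains(tag))(() => {
    val tag = "example/image:1.0"
    registry += tag // pretend to build and push the image
    tag
  })
}
```

In this sketch, clearing `registry` plays the role of an external cache eviction: the next `get()` notices the missing image and rebuilds it.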


lihaoyi · 2024-10-10T11:30:25Z

Would a Task.Input suffice for these? The task body always runs, can decide whether to re-use the previous external output or force a new external computation, and then returns a result that will only invalidate downstream tasks if it has changed from before.

lihaoyi · 2024-10-11T07:42:15Z

Maybe we need a Task.Input(persistent = true) in a more general case, so the input task is also able to re-use previous results if desired?

lefou · 2024-10-12T08:55:37Z

@lihaoyi Do you mean to decide inside the input task whether the heavy work needs to be done and do it conditionally? This might be doable when we make Inputs persistent (to access the previous result), but I assume handling such logic will turn out to be more complex than the original proposal. Also, each use-site has to come up with its own correct logic instead of just relying on a Mill concept and providing some validation criteria.

lefou · 2024-10-12T09:03:25Z

The charm of the current PathRef validation is that it only runs if the task was cached before. It just throws a specific exception when loading the cached result from the cache, and the Evaluator handles it. So we could implement general invalidation by just returning a type that does the validation in its JSON deserializer. But having some concrete API would make it easier to use and more discoverable.
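A rough sketch of that "validate while loading" mechanism, as a self-contained toy model. It does not use Mill or a real JSON library; `StaleCacheEntry`, `load`, and `evaluate` are hypothetical stand-ins for the specific exception, the deserializer, and the Evaluator's handling of it.

```scala
// Toy model only. All names are hypothetical stand-ins, not Mill's API.
object RevalidateOnLoad {
  final class StaleCacheEntry(what: String)
      extends RuntimeException(s"stale cache entry: $what")

  // Stand-in for a deserializer that validates the value it just read
  def load(serialized: String, exists: String => Boolean): String = {
    val value = serialized.trim
    if (!exists(value)) throw new StaleCacheEntry(value)
    value
  }

  // Stand-in for the evaluator: validation only happens when something was cached
  def evaluate(cachedOnDisk: Option[String], exists: String => Boolean)(
      recompute: () => String
  ): String =
    cachedOnDisk match {
      case None => recompute() // never cached: no validation needed
      case Some(raw) =>
        try load(raw, exists)
        catch { case _: StaleCacheEntry => recompute() } // stale: treat as a cache miss
    }
}
```

The point of the design is that the check costs nothing for tasks that were never cached, and the task body itself stays free of validation logic.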

lihaoyi · 2024-10-14T12:08:50Z

Yes, that's what I meant. I don't mind adding standardized helpers, just trying to understand what the underlying primitives are.

Just like Task.Source/Task.Sources are just a thin layer on top of Task.Input, this "check upstream cache key and re-evaluate as necessary" could be a thin layer on top of Task.Input(persistent = true)

lefou · 2024-10-14T17:24:50Z

The persistent solution still has to handle the persistence of the result itself to have it available for the validation step. Although Mill knows all cached values, we can't access them in a target and need to implement it ourselves. Besides being tedious and error-prone, it's also redundant. It would be nice if we could access the previously cached result somehow, e.g. via some Task.ctx().cached: Option[T]. Then we could simply return it if we find it valid, and start the heavy computation only if not.
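From the task author's side, the proposed accessor might be used roughly like this. The sketch is purely illustrative: `Ctx` and its `cached` field model the proposed (not existing) Task.ctx().cached, and `existsInRegistry` and `build` are hypothetical parameters.

```scala
// Toy model of the proposed accessor. Nothing here exists in Mill today.
object CachedCtxDemo {
  // Hypothetical context exposing the previously cached result, if any
  final case class Ctx[T](cached: Option[T])

  def dockerImage(
      ctx: Ctx[String],
      existsInRegistry: String => Boolean,
      build: () => String
  ): String =
    // Re-use the previous result when it is still valid;
    // getOrElse takes its argument by name, so build() runs only otherwise
    ctx.cached.filter(existsInRegistry).getOrElse(build())
}
```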

lihaoyi · 2024-10-15T01:49:28Z

That's possible. The way Task.Input works is by override def sideHash: Int = util.Random.nextInt(). Presumably we could have another kind of task where sideHash is customizable by the user, and then all the caching and invalidation should work automatically.
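To illustrate how a user-controlled sideHash could drive invalidation, here is a simplified toy model. The `Task` trait and `Evaluator` class below are hypothetical and far simpler than Mill's real ones; the only assumption carried over from the comment above is that a changed sideHash forces re-evaluation.

```scala
// Toy model only. These types are simplified stand-ins, not Mill's classes.
object SideHashDemo {
  trait Task[T] {
    // Extra cache-key component; a random value would mean "always re-run"
    def sideHash: Int = 0
    def compute(): T
  }

  final class Evaluator {
    private val cache = scala.collection.mutable.Map.empty[AnyRef, (Int, Any)]

    def eval[T](task: Task[T]): T = {
      val key = task.sideHash
      cache.get(task) match {
        // sideHash unchanged: serve the cached value
        case Some((h, value)) if h == key => value.asInstanceOf[T]
        // first run, or sideHash changed: recompute and store under the new hash
        case _ =>
          val fresh = task.compute()
          cache(task) = (key, fresh)
          fresh
      }
    }
  }
}
```

A task could, for instance, derive its sideHash from a cheap query against the external system (a registry digest, a file's modification time), so the expensive body re-runs only when that external state changes.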
