Refactoring / renaming and caching? #1250
Comments
@mristin thanks for the question!
:) Feedback, issues, and contributions appreciated!
Yes, that's a good question. The current design assumes that when you change code or change functions, you want that portion of the graph to be recomputed. That said, old results are still around - see the note on the cache_key here.
@zilto might have a better idea, but for me the idea would be to:
Does that make sense? It's possible you can do (3) before (2), but in any case you'd need to know which cached results you want to port over.
Hi @skrawcz! I ended up writing my own workflow library (https://github.com/mristin/fsdag).
@mristin no worries. Hamilton allows many patterns. Just to mention, the lighter-weight way is:
You could also write a Python decorator that does the above, or use Hamilton's simple caching adapter approach, which is simpler than the new built-in one with cache_keys, etc. If you want to handle reading/writing more systematically, you can read up on materialization.
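The snippet referred to above is not shown in this thread, so the following is only a rough sketch of what such a decorator might look like, in plain Python and independent of Hamilton. The names `disk_cached`, `key`, and `cache_dir` are hypothetical. The assumption is that the result is stored under an explicit, hand-chosen key, so renaming the function or adding a default argument does not invalidate it.

```python
import functools
import pickle
import tempfile
from pathlib import Path


def disk_cached(key: str, cache_dir: Path):
    """Hypothetical decorator: cache a function's result on disk under a fixed key.

    The key is chosen by hand and does NOT depend on the function's name,
    source code, or arguments, so the cached value survives refactoring.
    """

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            path = cache_dir / f"{key}.pkl"
            if path.exists():
                # Cache hit: load the stored result instead of recomputing.
                with path.open("rb") as f:
                    return pickle.load(f)
            result = fn(*args, **kwargs)
            cache_dir.mkdir(parents=True, exist_ok=True)
            with path.open("wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator


CACHE_DIR = Path(tempfile.mkdtemp())
calls = []  # records which function bodies actually ran


@disk_cached(key="expensive_step_v1", cache_dir=CACHE_DIR)
def expensive_step(x: int) -> int:
    calls.append("expensive_step")
    return x * 2


first = expensive_step(21)


# After a refactor: new name and a new default argument, same cache key.
@disk_cached(key="expensive_step_v1", cache_dir=CACHE_DIR)
def renamed_step(x: int, offset: int = 0) -> int:
    calls.append("renamed_step")
    return x * 2 + offset


second = renamed_step(21)  # served from disk; the body does not run
```

The trade-off is that nothing detects stale results: because the key ignores both the code and the arguments, you have to bump the key yourself (for example to `expensive_step_v2`) whenever a change really does affect the output.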
Hi @mristin! I think Hamilton is a great fit here! I myself used it during my Master's thesis. This recording from our community meetup gives more context.
Manually digging into the cache keys is not a common pattern. Although relevant, @skrawcz's initial suggestions were more "power user" features. Cached results are based on "the input data + the code of a given node". If you rename a function or add a parameter, you're changing the code, so the caching algorithm needs to re-execute the node (otherwise it can't know whether the code change affects the output).
Looking at the library you shared, the main feature of Hamilton in comparison is that it automatically wires the DAG from the function definitions. This is a unique and powerful feature that enables, for instance, iterative development in notebooks (tutorial here) with the ability to save your DAG to a
Issues / future work
@skrawcz regarding caching, we could add a mechanism to specify a constant cache key via the
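To illustrate the rule quoted in this comment (cached results depend on the input data plus the code of a node), here is a minimal sketch. It is not Hamilton's actual algorithm; `cache_key` and the sample sources are hypothetical, and the node's code is passed in as a plain string to keep the example self-contained.

```python
import hashlib
import json


def cache_key(node_source: str, inputs: dict) -> str:
    """Hypothetical cache key: hash of the node's source code and its inputs."""
    code_hash = hashlib.sha256(node_source.encode("utf-8")).hexdigest()
    # sort_keys makes the serialization independent of dict ordering.
    data_hash = hashlib.sha256(
        json.dumps(inputs, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return hashlib.sha256(f"{code_hash}:{data_hash}".encode("utf-8")).hexdigest()


original = "def total(prices):\n    return sum(prices)\n"
renamed = "def total_price(prices):\n    return sum(prices)\n"
with_default = "def total(prices, tax=0.0):\n    return sum(prices) * (1 + tax)\n"

inputs = {"prices": [1.0, 2.5]}

key_original = cache_key(original, inputs)
key_renamed = cache_key(renamed, inputs)            # only the name changed
key_with_default = cache_key(with_default, inputs)  # a default argument was added
key_other_data = cache_key(original, {"prices": [3.0]})
```

All four keys differ, even though the rename does not change the result. A scheme like this cannot tell a harmless refactor from a behavioral change, which is why the node is re-executed.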
First of all, thanks a lot for such a great tool! I'm just starting out with it and reading the documentation.
I missed one feature in the documentation which is crucial for workflows under development with longer-running tasks. Inevitably, during development, we will need to refactor the tasks: rename them, introduce default arguments, etc. How does Hamilton's caching deal with refactoring?
For example, assume we want to rename a function. Is there any way we can keep the cache? Or can we introduce an argument with a default value, so that previous computations do not get invalidated?
So far, I haven't seen a recipe in the documentation for how to deal with refactoring.
If you lack time, feel free to outline the recipe here, and I'll add it to the documentation as a pull request.