This issue came out of a discussion with a user about the backwards compatibility of a dataflow.
Currently, @config offers 4 options:
@config.when(key="foo"): selects this implementation when the config value for key equals "foo"
@config.when_not(key="foo"): selects this implementation when the config value for key does not equal "foo"
@config.when_in(key=["foo", "bar"]): selects this implementation when the config value for key is in the list
@config.when_not_in(key=["foo", "bar"]): selects this implementation when the config value for key is not in the list
This covers a lot of cases, but there's no way to specify a default.
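To make the four conditions concrete, here is a minimal pure-Python sketch of how they could be modelled as predicates over a config dict. It is a simplified illustration, not Hamilton's implementation; the assumption that a missing key behaves like `None` (so `when_not` and `when_not_in` match an empty config) is inferred from the behavior described in this issue.

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]
Predicate = Callable[[Config], bool]


def when(**kv: Any) -> Predicate:
    """Match when every key is present and equal to the given value."""
    return lambda cfg: all(cfg.get(k) == v for k, v in kv.items())


def when_not(**kv: Any) -> Predicate:
    """Match when no key equals the given value (a missing key also matches)."""
    return lambda cfg: all(cfg.get(k) != v for k, v in kv.items())


def when_in(**kv: List[Any]) -> Predicate:
    """Match when every key's value is one of the listed values."""
    return lambda cfg: all(cfg.get(k) in vs for k, vs in kv.items())


def when_not_in(**kv: List[Any]) -> Predicate:
    """Match when no key's value is in the list (a missing key also matches)."""
    return lambda cfg: all(cfg.get(k) not in vs for k, vs in kv.items())


# Evaluate all four predicates against an empty, a matching, and a non-matching config.
for cfg in ({}, {"key": "foo"}, {"key": "baz"}):
    print(
        cfg,
        when(key="foo")(cfg),
        when_not(key="foo")(cfg),
        when_in(key=["foo", "bar"])(cfg),
        when_not_in(key=["foo", "bar"])(cfg),
    )
```

In this model, only the negated forms match an empty config, which is why they end up being used as a stand-in for a default.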
Example 1
Here's a simple illustration of the limitations for backwards compatibility. This is version1.
Example 2
Now I'm adding a version2 and I want to have version1 as my default.
Problem
If you use @config.when(version="1") and @config.when(version="2"), this can break downstream drivers because there will be no node foo if .with_config() is not set.
```python
# dataflow.py
from hamilton.function_modifiers import config


@config.when(version="1")
def foo__v1() -> int:
    return 1


@config.when(version="2")
def foo__v2() -> int:
    return 2
```

```python
# run.py
import dataflow
from hamilton import driver

# breaks because `.with_config()` didn't set `version="1"` or `version="2"`
dr = driver.Builder().with_modules(dataflow).build()
dr.execute(["foo"])
```
Solution
The best solution is to annotate foo__v1() with when_not(version="2") so that it catches all other configurations (including empty ones, i.e., when .with_config() is not called).
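The effect of this annotation can be sketched without Hamilton. The snippet below is a simplified model under the assumption that a missing key never equals "2"; the lambdas stand in for the decorators, and `matching` is a hypothetical helper, not a Hamilton function.

```python
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]

# (function name, predicate standing in for the decorator condition)
IMPLEMENTATIONS: List[Tuple[str, Callable[[Config], bool]]] = [
    ("foo__v1", lambda cfg: cfg.get("version") != "2"),  # models when_not(version="2")
    ("foo__v2", lambda cfg: cfg.get("version") == "2"),  # models when(version="2")
]


def matching(config: Config) -> List[str]:
    """Return the names of all implementations whose condition holds."""
    return [name for name, predicate in IMPLEMENTATIONS if predicate(config)]


print(matching({}))                # ['foo__v1']
print(matching({"version": "1"}))  # ['foo__v1']
print(matching({"version": "2"}))  # ['foo__v2']
```

Every config, including the empty one, now resolves to exactly one implementation, so existing drivers keep working.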
Example 3
Now, I'm adding an implementation version3.
Problem
If I keep my previous code and add @config.when(version="3"), it will never be hit, because the already existing when_not(version="2") also catches this configuration.
```python
# dataflow.py
from hamilton.function_modifiers import config


@config.when_not(version="2")
def foo__v1() -> int:
    return 1


@config.when(version="2")
def foo__v2() -> int:
    return 2


@config.when(version="3")
def foo__v3() -> int:
    return 3
```

```python
# run.py
import dataflow
from hamilton import driver

# there will be no errors, but `v1` will actually be used
dr = driver.Builder().with_config({"version": "3"}).with_modules(dataflow).build()
dr.execute(["foo"])
```
Solution
The user has to modify the decorator for foo__v1() and set it to when_not_in(version=["2", "3"]) to catch all configurations.
The next problem is that whenever an implementation is added, you need to remember to add it to this list; otherwise, foo__v1() will silently catch the new version="4".
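One way to guard against a stale list is a small consistency check in the test suite. The sketch below is a pure-Python model, not Hamilton code: `build` and `ambiguous_configs` are hypothetical helpers, and the lambdas stand in for the decorator conditions.

```python
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]
Implementation = Tuple[str, Callable[[Config], bool]]


def build(excluded_from_v1: List[str]) -> List[Implementation]:
    """Model four implementations; v1 uses when_not_in(version=excluded_from_v1)."""
    return [
        ("foo__v1", lambda cfg: cfg.get("version") not in excluded_from_v1),
        ("foo__v2", lambda cfg: cfg.get("version") == "2"),
        ("foo__v3", lambda cfg: cfg.get("version") == "3"),
        ("foo__v4", lambda cfg: cfg.get("version") == "4"),
    ]


def ambiguous_configs(
    implementations: List[Implementation], configs: List[Config]
) -> List[Tuple[Config, List[str]]]:
    """Return (config, matches) pairs where the match count is not exactly one."""
    problems = []
    for cfg in configs:
        matches = [name for name, predicate in implementations if predicate(cfg)]
        if len(matches) != 1:
            problems.append((cfg, matches))
    return problems


CONFIGS: List[Config] = [{}, {"version": "2"}, {"version": "3"}, {"version": "4"}]

stale = build(["2", "3"])         # "4" was forgotten in the list
updated = build(["2", "3", "4"])  # list kept in sync

print(ambiguous_configs(stale, CONFIGS))    # [({'version': '4'}, ['foo__v1', 'foo__v4'])]
print(ambiguous_configs(updated, CONFIGS))  # []
```

The check only moves the bookkeeping into a test; the list still has to be maintained by hand.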
Consequences
The main issue is backwards compatibility. When refactoring from a single implementation to two implementations, users have to carefully combine .when() and .when_not(); otherwise, they will break any Driver that doesn't set a config. Then, when moving from 2 to 3+ implementations, they have to use when_not_in() and manually manage a list. It is also not obvious from the code that when_not_in() means "default implementation".
Currently, using .when(version="1") and .when(version="2") implicitly creates a pattern of raising an error on invalid configurations (e.g., version=-1), because the node foo would be missing, which will likely break a key path. If the missing node didn't raise an error, then it didn't matter whether the config was correct in the first place.
This relates to a broader task of defining the space of valid configurations.
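As a rough illustration of what "defining the space of valid configurations" could mean, the sketch below validates a config explicitly instead of relying on a missing node to surface the error. `validate_version` is a hypothetical helper written for this example; it is not part of Hamilton, and treating a missing key as valid is an assumption.

```python
from typing import Any, Dict, Iterable


def validate_version(
    config: Dict[str, Any], allowed: Iterable[str], key: str = "version"
) -> None:
    """Raise early if `key` is set to a value that no implementation handles."""
    allowed_values = set(allowed)
    # Assumption: a missing key is valid, since a default implementation may cover it.
    if key in config and config[key] not in allowed_values:
        raise ValueError(
            f"Invalid {key}={config[key]!r}; expected one of {sorted(allowed_values)}"
        )


validate_version({}, ["1", "2", "3"])                # passes: no version set
validate_version({"version": "2"}, ["1", "2", "3"])  # passes: known version

try:
    validate_version({"version": "-1"}, ["1", "2", "3"])
except ValueError as err:
    print(err)
```

With an explicit check like this, an invalid value such as version=-1 fails loudly even when a default implementation would otherwise absorb it.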
Solution
We should have a @config.default to ensure a node foo is always present in the DAG. Its name is also easy to understand. When you're moving from 1 implementation to 2+, you get a clear design decision: do I want a config.when with v1 and v2 or a default and v2?
Using @config.default would mean "select this implementation if no other config is resolved". This condition needs to be resolved last, and you can't have two functions for the same node name that both use @config.default.
```python
# dataflow.py
from hamilton.function_modifiers import config


@config.default
def foo__v1() -> int:
    return 1


@config.when(version="2")
def foo__v2() -> int:
    return 2


@config.when(version="3")
def foo__v3() -> int:
    return 3
```

```python
# run.py
import dataflow
from hamilton import driver

# passing no config means `default` was used
dr = driver.Builder().with_modules(dataflow).build()
dr.execute(["foo"])
```
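The intended semantics of the proposed default can be sketched in plain Python. This is an assumption-laden model, not an implementation proposal for Hamilton: a predicate of `None` stands in for @config.default, `resolve` is a hypothetical helper, and raising on multiple conditional matches is a design choice made only for this example.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Config = Dict[str, Any]
Predicate = Callable[[Config], bool]
# (function name, predicate); a predicate of None models the proposed @config.default
Candidate = Tuple[str, Optional[Predicate]]


def resolve(candidates: List[Candidate], config: Config) -> str:
    """Pick the implementation for one node name, falling back to the default."""
    defaults = [name for name, pred in candidates if pred is None]
    if len(defaults) > 1:
        raise ValueError(f"More than one default for the same node: {defaults}")

    # Conditional implementations are checked first; the default is resolved last.
    matches = [name for name, pred in candidates if pred is not None and pred(config)]
    if len(matches) > 1:
        raise ValueError(f"Ambiguous config {config}: {matches}")
    if matches:
        return matches[0]
    if defaults:
        return defaults[0]
    raise KeyError(f"No implementation resolved for config {config}")


FOO: List[Candidate] = [
    ("foo__v1", None),
    ("foo__v2", lambda cfg: cfg.get("version") == "2"),
    ("foo__v3", lambda cfg: cfg.get("version") == "3"),
]

print(resolve(FOO, {}))                # foo__v1
print(resolve(FOO, {"version": "3"}))  # foo__v3
print(resolve(FOO, {"version": "4"}))  # foo__v1
```

Adding a foo__v4 later requires no change to foo__v1, which is the maintenance benefit argued for above.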
This is a really good feature (thought we had an issue a while back?), the hard part is that we need to store global state, largely due to the internal way we manage decorators.
Each decorator creates 0+ nodes from a function (this is currently how it works)
The first decorator (the default) wouldn't know whether to create a node or not, because it depends on the state of the others
We'd have to store some state -- e.g., do a fallback-type thing where we know whether it was hit already
This would have to be run in a second pass (unless we're hacking around here) -- e.g., we don't know when we've hit the last one...
So it might be possible to hack in, but it's a fundamental limitation. If we knew it was last, it would be easy enough, or if we just add a second pass at some point. Alternatively, we might be able to do something like this:
```python
# dataflow.py
from hamilton.function_modifiers import config


def foo__v1() -> int:
    return 1


@config.when(version="2")
def foo__v2() -> int:
    return 2


@config.when(version="3", otherwise=foo__v1)  # we know it's last -- if this doesn't evaluate we evaluate to foo__v1.
def foo__v3() -> int:
    return 3
```

```python
# run.py
import dataflow
from hamilton import driver

# passing no config means `default` was used
dr = driver.Builder().with_modules(dataflow).build()
dr.execute(["foo"])
```
I think we process in the right order so this should work, but we'd still need to keep some state...
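A rough sketch of why the otherwise idea still needs state: the functions are processed in definition order, and the last one can only fall back if it knows that nothing before it was hit. The code below is a plain-Python model under that assumption; `resolve_in_order` and the entry tuples are hypothetical and do not reflect Hamilton internals.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Config = Dict[str, Any]
# (function name, condition, name of the fallback function or None)
Entry = Tuple[str, Callable[[Config], bool], Optional[str]]


def resolve_in_order(entries: List[Entry], config: Config) -> Optional[str]:
    """Walk entries in definition order, carrying 'was anything hit?' as state."""
    hit: Optional[str] = None  # state shared across the decorated functions
    for fn_name, condition, otherwise in entries:
        if condition(config):
            hit = fn_name
        elif otherwise is not None and hit is None:
            # Only the entry carrying `otherwise` falls back, and only if nothing matched so far.
            hit = otherwise
    return hit


FOO: List[Entry] = [
    ("foo__v2", lambda cfg: cfg.get("version") == "2", None),
    ("foo__v3", lambda cfg: cfg.get("version") == "3", "foo__v1"),  # last, with fallback
]

print(resolve_in_order(FOO, {}))                # foo__v1
print(resolve_in_order(FOO, {"version": "2"}))  # foo__v2
print(resolve_in_order(FOO, {"version": "3"}))  # foo__v3
```

The fallback only behaves correctly because the entry carrying otherwise is processed last, which matches the ordering caveat raised in the comment.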