Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine single-value column to treat it as that single value #12120

Open
wants to merge 30 commits into
base: develop
Choose a base branch
from

Conversation

radeusgd
Copy link
Member

Pull Request Description

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.
  • If meaningful changes were made to logic or tests affecting Enso Cloud integration in the libraries,
    or the Snowflake database integration, a run of the Extra Tests has been scheduled.
    • If applicable, it is suggested to paste a link to a successful run of the Extra Tests.

@radeusgd radeusgd added the CI: No changelog needed Do not require a changelog entry for this PR. label Jan 23, 2025
@radeusgd radeusgd self-assigned this Jan 23, 2025
Copy link
Member

@jdunkerley jdunkerley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a great start.

@radeusgd radeusgd marked this pull request as draft January 23, 2025 11:35
Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to be getting failures in some of the tests. I guess I am not the only one, right? Are there bugs you'd like me to fix, @radeusgd?

c1.value_type . should_equal Value_Type.Integer
c1.at 0 . should_equal 23

c2 = c1 + 100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume 100 + c1 would also work now yielding an Integer...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, worth adding a test, will do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 4820070

100 + c1 won't work directly as it needs a cast, so we need 100 + c1:Integer. I think this is consistent with other examples, like my_integer_fn. Or do you think that binary operators should cope without casts? If yes, let's file a ticket to fix it :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the type is Column & Any, then there will be no need for a cast to Integer. The Integer will be available. To verify remove -> Column check in from_vector calls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I thought that for now we wanted to see how it all works with -> Column, not Column & Any?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess for now we agreed to try -> Column.
It is a bit more 'conservative' and should more easily allow 'loosening' the requirement to be Column & Any and removing the need for casts - so the eventual change in this direction is doable but the reverse direction could break user's workflows (removing casts is a mostly compatible change, whereas starting to requiring them is breaking).

If we try out with -> Column I think we can more easily migrate to -> Column & Any later. Doing a migration in the reverse direction will be a breaking change.

@@ -136,8 +137,9 @@ type Column
## PRIVATE
Creates a new column given a Java Column object.
from_java_column : Java_Column -> Column
from_java_column java_column =
Column.Value java_column
from_java_column java_column:Java_Column -> Column =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a PRIVATE function. It doesn't really need signature. E.g. the -> Column can be dropped without any impact. Some publicly visible function needs a signature in order for the static analysis to use it. What's that function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the question is: what should be the signature of such a function? @radeusgd has proposed Column & Any. What should it mean if a multi value gets into such a cast?

  • if one of the intersection types of such a multi value is Column, then move it first and unhide all other hidden types
  • if no type of Column is among the intersection types of such a multi value, but there is a conversion from one of its types, then just perform conversion to Column and return the result

Is that how you want Column & Any to behave, @radeusgd?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a PRIVATE function. It doesn't really need signature. E.g. the -> Column can be dropped without any impact. Some publicly visible function needs a signature in order for the static analysis to use it. What's that function?

The idea was that adding the return type check here, affects semantics of all functions that rely on it. So even if I forget to update the signature in some case, all methods that return a Column will have consistent semantics regarding 'hiding' of the intersection type. Thus my plan was to first only have this method updated (to get the desired semantics and experiment with it) and do a refactor of all type signatures in a separate PR - that will be a pretty big and boring change, so I thought it will be easier to review if separated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • if one of the intersection types of such a multi value is Column, then move it first and unhide all other hidden types

I was initially thinking to avoid hiding any additional types that were intersected with Column
. I was not thinking about additionally 'unhiding' any other types. I think it may be acceptable to unhide all types if X & Any is encountered, although I feel that the instanceof metaphor works better if hidden types are not unhidden, just adding & Any guarantees we don't hide any currently visible ones.

I'm happy for either.

But overall yes, that was my initial suggestion for how I think Column & Any should work. I'd be very happy if we can get it working that way :)

Even if we don't end up using Column & Any here and only have Column - requiring the casts to be inserted, I still think that the semantics of X & Any (if it's ever encountered in any code) should indeed be as you suggest above.

@enso-bot enso-bot bot mentioned this pull request Jan 24, 2025
@JaroslavTulach JaroslavTulach force-pushed the wip/radeusgd/12095-column-as-single-value branch from c5b7067 to 9bb73ae Compare January 24, 2025 05:12
@@ -120,10 +121,12 @@ type Column
case needs_polyglot_conversion of
True -> Java_Column.fromItems name (enso_to_java_maybe items) expected_storage_type java_problem_aggregator
False -> Java_Column.fromItemsNoDateConversion name items expected_storage_type java_problem_aggregator
result = Column.from_java_column java_column . throw_on_warning Conversion_Failure
Copy link
Member

@JaroslavTulach JaroslavTulach Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling Any.throw_on_warning extracts the first value of an intersection type of a multi value that has such a method - e.g. Column and invokes the method on such a simple value while loosing the additional types. Related issues:

The workaround is to avoid calling Any instance methods. Done in 9bb73ae. The only "proper solution" I can think of: if dispatching instance method of Any, send the whole multi value as self.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only "proper solution" I can think of: if dispatching instance method of Any, send the whole multi value as self.

I think that is the behaviour we need for the intersection type solution to make any sense. We can't make it so easy to 'loose' the intersected types. While in libraries we can try to rely on workarounds, our users will get confused if the column stops being an Integer_Column after removing warnings. throw_on_warning and the related functions are user facing methods so they cannot be breaking it.

@radeusgd radeusgd marked this pull request as ready for review January 24, 2025 12:33
Comment on lines -177 to +179
_ : Float ->
key_as_float : Float ->
if no_warning then new_dict else
Warning.attach (Floating_Point_Equality.Used_As_Dictionary_Key key) new_dict
Warning.attach (Floating_Point_Equality.Used_As_Dictionary_Key key_as_float) new_dict
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With intersection types, with

case a of
   b : T -> ...

The a and b are no longer interchangeable. The T part could be hidden in a (so you cannot pass it into functions that expect T) and it can be un-hidden by the b : T check at which point b has it 'visible' and b can be passed to f (t : T) whereas a cannot.

So we need to keep in mind that whenever we rely on case of, we should not do _ : T but name it and use the named component. At least anywhere where we may expect the intersection types to come up.

cc: @jdunkerley @GregoryTravis

This is actually an important change to Enso semantics that we kind of knew about in #11600 but I'm not sure we have appreciate its implications enough yet.

Comment on lines 2474 to -2477
resolved = case value of
_ : Column -> value
_ : Text -> self.make_constant_column value
_ : Expression -> self.evaluate_expression value on_problems
_ : Column -> value
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, intersection types have changed something fundamental about Enso semantics.

case of branches that used to be completely disjoint - a value was able to match only one of them - are now no longer disjoint - a multi-value can match multiple branches. So now we need to be more careful about ordering of the branches. If a value can match more than one, which branch is the one that should be preferred?

E.g. here single-text-value column would match both _ : Text and _ : Column branch (because it is Column & Text). Now, we want it to still go to the _ : Column branch, if it went to _ : Text branch that was leading to errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Column Intersection Type (part 1)
5 participants