Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort output of S3_File.list and Enso_File.list #11929

Open
wants to merge 15 commits into
base: develop
Choose a base branch
from

Conversation

GregoryTravis
Copy link
Contributor

@GregoryTravis GregoryTravis commented Dec 19, 2024

Also: Comparable for Vector.
Closes #11899

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.
  • If meaningful changes were made to logic or tests affecting Enso Cloud integration in the libraries,
    or the Snowflake database integration, a run of the Extra Tests has been scheduled.
    • If applicable, it is suggested to paste a link to a successful run of the Extra Tests.

@GregoryTravis GregoryTravis added the CI: No changelog needed Do not require a changelog entry for this PR. label Dec 19, 2024
@GregoryTravis GregoryTravis marked this pull request as ready for review December 19, 2024 21:25
@GregoryTravis GregoryTravis marked this pull request as draft December 19, 2024 21:25
@@ -187,7 +187,7 @@ type S3_File
S3_File.Value (S3_Path.Value bucket key) self.credentials
files = pair.second . map key->
S3_File.Value (S3_Path.Value bucket key) self.credentials
sub_folders + files
sub_folders + files . sort on=.s3_path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this just sort files but not sub_folders?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we could just sort on .path and then we wouldn't need to implement the custom comparator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to not use a comparator? I was thinking of adding a comparator for S3_File as well -- if a user might have a vector of them, then they might want to sort it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it sorted the files and sub-folders properly:

s3://enso-data-samples/Store Data.xlsx
s3://enso-data-samples/data_2500_rows.csv
s3://enso-data-samples/examples/
s3://enso-data-samples/locations.json
s3://enso-data-samples/products.csv
s3://enso-data-samples/sample.xml
s3://enso-data-samples/spreadsheet.xls
s3://enso-data-samples/spreadsheet.xlsx
s3://enso-data-samples/tableau/
s3://enso-data-samples/transactions.csv

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still add parentheses, the precedence of + vs . is not something that everyone knows immediately, so I think making it clear with parentheses will make it better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to not use a comparator? I was thinking of adding a comparator for S3_File as well -- if a user might have a vector of them, then they might want to sort it.

Hmm, I guess we could do that yeah. But still I'd rely on comparing the path as Text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 1569 to 1588
type Vector_Comparator
compare x y =
min_length = x.length.min y.length
when_prefixes_equal =
## At this point, if the vectors are the same length, then they are
identical; otherwise, the shorter one is lesser.
if x.length == y.length then Ordering.Equal else
Ordering.compare x.length y.length
go i =
if i >= min_length then when_prefixes_equal else
x_elem = x.at i
y_elem = y.at i
Ordering.compare x_elem y_elem . and_then <|
@Tail_Call go (i + 1)
k = go 0
k

hash x = Default_Comparator.hash_builtin x

Comparable.from (that : Vector) = Comparable.new that Vector_Comparator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this just re-implements Vector_Lexicographic_Order.compare?

I think we did not register it into Comparable on purpose - comparing vectors is not unambiguously defined operation - there's multiple orderings that may seem right depending on context. So we did not want to provide any default one to avoid user confusion and require to choose one explicitly (we implement only lexicographic one but for example an (incomplate) ordering could be the point-wise order.

Of course we can decide that it is beneficial to include a default < for vectors. But I think the original decision was not wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside to making the comparable is that if a user accidentally nests vectors, and sorts them, they won't know they made a mistake. But it will be easy to see that the data is nested. Without a comparator, they'll get an error.

The upside is that they'll be able to sort vectors containing more things.

@JaroslavTulach You asked if modern languages allow this; Haskell does:

 main = do
   msp $ [1, 2] < [3, 4]
=> True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the Vector_Comparator since it's redundant and it's a separate question.

Comment on lines 47 to 48
r . should_equal (r.sort on=.path)
r . should_equal (r.sort on=.enso_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enso_path is private, I don't think the second check can work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Comment on lines 264 to 268
group_builder.specify "list should sort its output" <|
r = root.list
r.should_be_a Vector
r . should_equal (r.sort on=.s3_path)
r . should_equal (r.sort on=(x-> x.s3_path.to_text))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, .s3_path is private and implementation detail that should be a black box in the tests, use path that returns Text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to .path

@GregoryTravis
Copy link
Contributor Author

I removed the changes to vector, and added Comparable for Enso_File and S3_File, under the assumption that it's always good to have more comparators. If not, I can take those out too.

@GregoryTravis GregoryTravis marked this pull request as ready for review December 20, 2024 19:29
@@ -616,3 +616,11 @@ translate_file_errors related_file result =
s3_path = S3_Path.Value error.bucket error.key
s3_file = S3_File.Value s3_path related_file.credentials
Error.throw (File_Error.Not_Found s3_file)

type S3_File_Comparator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type should probably be marked as PRIVATE as we don't want it to show up in CB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@GregoryTravis GregoryTravis requested a review from radeusgd January 6, 2025 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sort output of list for S3_File and Enso_File
2 participants