Sort output of S3_File.list and Enso_File.list #11929
base: develop
@@ -187,7 +187,7 @@ type S3_File
  S3_File.Value (S3_Path.Value bucket key) self.credentials
  files = pair.second . map key->
  S3_File.Value (S3_Path.Value bucket key) self.credentials
- sub_folders + files
+ sub_folders + files . sort on=.s3_path
Won't this just sort files but not sub_folders?
Also, we could just sort on .path and then we wouldn't need to implement the custom comparator.
Is it better to not use a comparator? I was thinking of adding a comparator for S3_File as well -- if a user might have a vector of them, then they might want to sort it.
It looks like it sorted the files and sub-folders properly:
s3://enso-data-samples/Store Data.xlsx
s3://enso-data-samples/data_2500_rows.csv
s3://enso-data-samples/examples/
s3://enso-data-samples/locations.json
s3://enso-data-samples/products.csv
s3://enso-data-samples/sample.xml
s3://enso-data-samples/spreadsheet.xls
s3://enso-data-samples/spreadsheet.xlsx
s3://enso-data-samples/tableau/
s3://enso-data-samples/transactions.csv
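The listing above is consistent with a plain, case-sensitive comparison of the full path text: `Store Data.xlsx` comes before the lowercase names, and folders are interleaved with files rather than grouped. A minimal sketch in Python (an illustration outside Enso, assuming ordering by Unicode code point, which is how Python's `sorted` orders strings) reproduces the same order from a shuffled input:

```python
# Paths copied from the listing above, deliberately shuffled.
prefix = "s3://enso-data-samples/"
names = [
    "tableau/",
    "sample.xml",
    "Store Data.xlsx",
    "transactions.csv",
    "examples/",
    "spreadsheet.xlsx",
    "data_2500_rows.csv",
    "products.csv",
    "spreadsheet.xls",
    "locations.json",
]
paths = [prefix + name for name in names]

# Sorting on the path text alone orders files and folders together.
ordered = sorted(paths)
for path in ordered:
    print(path)
```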
I'd still add parentheses: the precedence of + vs . is not something that everyone knows immediately, so making the grouping explicit with parentheses will make the code clearer.
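To make the two possible readings concrete, here is a small Python analogue. Python has no counterpart to Enso's operator spacing rules, so each grouping is spelled out explicitly, and the sample names are hypothetical:

```python
# Hypothetical sample data; the names are illustrative only.
sub_folders = ["tableau/", "examples/"]
files = ["sample.xml", "locations.json"]

# Reading 1: concatenate first, then sort the combined list.
sorted_all = sorted(sub_folders + files)

# Reading 2: sort only the files, then append them to the unsorted folders.
sorted_files_only = sub_folders + sorted(files)

print(sorted_all)
print(sorted_files_only)
```

The two readings give different results, which is why explicit parentheses around the concatenation remove any doubt about which one is meant.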
> Is it better to not use a comparator? I was thinking of adding a comparator for S3_File as well -- if a user might have a vector of them, then they might want to sort it.

Hmm, I guess we could do that, yeah. But I'd still rely on comparing the path as Text.
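As a rough illustration of that suggestion, the sketch below gives a file type an ordering that delegates entirely to its path text. It is written in Python rather than Enso, and the `S3File` class, its fields, and the `path` property are hypothetical stand-ins, not the library's API:

```python
from dataclasses import dataclass
from functools import total_ordering


@total_ordering
@dataclass(frozen=True)
class S3File:
    """Hypothetical stand-in for a file handle; not the Enso type."""
    bucket: str
    key: str

    @property
    def path(self) -> str:
        # The public, text form of the location.
        return f"s3://{self.bucket}/{self.key}"

    def __lt__(self, other):
        # Ordering delegates to plain text comparison of the path.
        if not isinstance(other, S3File):
            return NotImplemented
        return self.path < other.path


files = [
    S3File("enso-data-samples", "sample.xml"),
    S3File("enso-data-samples", "Store Data.xlsx"),
    S3File("enso-data-samples", "examples/"),
]
in_order = sorted(files)
```

With the ordering defined on the type, a plain `sorted(files)` works without the caller supplying a key, which is the convenience the comparator is meant to provide.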
Done
type Vector_Comparator
    compare x y =
        min_length = x.length.min y.length
        when_prefixes_equal =
            ## At this point, if the vectors are the same length, then they are
               identical; otherwise, the shorter one is lesser.
            if x.length == y.length then Ordering.Equal else
                Ordering.compare x.length y.length
        go i =
            if i >= min_length then when_prefixes_equal else
                x_elem = x.at i
                y_elem = y.at i
                Ordering.compare x_elem y_elem . and_then <|
                    @Tail_Call go (i + 1)
        k = go 0
        k

    hash x = Default_Comparator.hash_builtin x

Comparable.from (that : Vector) = Comparable.new that Vector_Comparator
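The comparator in this hunk walks the shared prefix element by element and falls back to comparing lengths when the prefixes are equal. The following Python restatement is a sketch of the same logic, not the Enso code: a loop replaces the tail-recursive `go`, and the integers -1, 0, and 1 are an assumed stand-in for the `Ordering` values.

```python
def compare_lexicographic(x, y):
    """Return -1, 0, or 1: compare element by element, then by length."""
    for a, b in zip(x, y):
        if a != b:
            # The first differing position decides the result.
            return -1 if a < b else 1
    # All shared positions are equal: the shorter sequence is the lesser one.
    return (len(x) > len(y)) - (len(x) < len(y))


print(compare_lexicographic([1, 2], [3, 4]))
print(compare_lexicographic([1, 2], [1, 2, 0]))
```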
I'm pretty sure this just re-implements Vector_Lexicographic_Order.compare?
I think we did not register it into Comparable on purpose: comparing vectors is not an unambiguously defined operation, and there are multiple orderings that may seem right depending on context. So we did not want to provide a default one, to avoid user confusion, and instead require the user to choose one explicitly. We implement only the lexicographic one, but another (incomplete) ordering could be, for example, the point-wise order.
Of course we can decide that it is beneficial to include a default < for vectors. But I think the original decision was not wrong.
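The ambiguity can be shown with a pair of vectors on which the two orderings disagree. This Python sketch relies on the built-in list comparison (which is lexicographic) and a hypothetical `pointwise_le` helper for the point-wise partial order:

```python
def pointwise_le(x, y):
    """Partial order: x <= y only if lengths match and every element is <=."""
    return len(x) == len(y) and all(a <= b for a, b in zip(x, y))


a, b = [1, 5], [2, 3]

# Lexicographic: decided by the first position, 1 < 2.
lexicographic_says_less = a < b

# Point-wise: 1 <= 2 holds but 5 <= 3 does not, and the reverse fails too,
# so the two vectors are incomparable under this order.
pointwise_comparable = pointwise_le(a, b) or pointwise_le(b, a)

print(lexicographic_says_less, pointwise_comparable)
```

A default comparator would silently pick one of these answers, which is the confusion the comment describes.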
The downside to making vectors comparable is that if a user accidentally nests vectors and sorts them, they won't know they made a mistake. But it will be easy to see that the data is nested. Without a comparator, they'll get an error.
The upside is that they'll be able to sort vectors containing more things.
@JaroslavTulach You asked if modern languages allow this; Haskell does:
main = do
msp $ [1, 2] < [3, 4]
=> True
I've removed the Vector_Comparator since it's redundant and it's a separate question.
r . should_equal (r.sort on=.path)
r . should_equal (r.sort on=.enso_path)
enso_path is private, so I don't think the second check can work?
Removed.
test/AWS_Tests/src/S3_Spec.enso
Outdated
group_builder.specify "list should sort its output" <|
    r = root.list
    r.should_be_a Vector
    r . should_equal (r.sort on=.s3_path)
    r . should_equal (r.sort on=(x-> x.s3_path.to_text))
Same here: .s3_path is private and an implementation detail that should be a black box in the tests. Use path, which returns Text.
Changed to .path
I removed the changes to vector, and added |
@@ -616,3 +616,11 @@ translate_file_errors related_file result =
s3_path = S3_Path.Value error.bucket error.key
s3_file = S3_File.Value s3_path related_file.credentials
Error.throw (File_Error.Not_Found s3_file)

type S3_File_Comparator
This type should probably be marked as PRIVATE, as we don't want it to show up in CB.
Done
Also: Comparable for Vector.
Closes #11899