-
Notifications
You must be signed in to change notification settings - Fork 2k
test: Extend Spark Array functions: array_repeat , shuffle and slice test coverage
#20420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -70,7 +70,9 @@ DataFusion's SQL implementation is tested using [sqllogictest](https://github.co | |
| cargo test --profile=ci --test sqllogictests | ||
| # Run a specific test file | ||
| cargo test --profile=ci --test sqllogictests -- aggregate.slt | ||
| # Run and update expected outputs | ||
| # Run a specific test file and update expected outputs | ||
| cargo test --profile=ci --test sqllogictests -- aggregate.slt --complete | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this looks unrelated to the PR
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems a nice improvement regardless
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have covered this doc update here because of a minor change |
||
| # Run and update expected outputs for all test files | ||
| cargo test --profile=ci --test sqllogictests -- --complete | ||
| ``` | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to the PR, but how it comes
shufflewhich is non deterministic https://spark.apache.org/docs/latest/api/sql/#shuffle has stable output 🤔There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're providing a seed for deterministic output
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general,
shufflefunction isnon-deterministic. However, it can also returndeterministicresult by passingseedfor specially this kind of cases (e.g: test verification).For reference:
Spark
shufflefunction acceptsoptional randomSeedargument to :https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L1264
On the other hand, current DF-Spark
shufflefunc implementation support following both cases by accepting both arguments for the alignment with Spark: