test: Extend Spark Array functions: array_repeat , shuffle and slice test coverage#20420
Conversation
9d73003 to
674f724
Compare
674f724 to
693ba61
Compare
693ba61 to
c19386f
Compare
c19386f to
44292e3
Compare
| cargo test --profile=ci --test sqllogictests -- aggregate.slt | ||
| # Run and update expected outputs | ||
| # Run a specific test file and update expected outputs | ||
| cargo test --profile=ci --test sqllogictests -- aggregate.slt --complete |
There was a problem hiding this comment.
this looks unrelated to the PR
There was a problem hiding this comment.
It seems a nice improvement regardless
There was a problem hiding this comment.
I have covered this doc update here because of a minor change
| NULL | ||
|
|
||
| query ? | ||
| SELECT shuffle(['2001-09-28T01:00:00'::timestamp, '2001-08-28T01:00:00'::timestamp, '2001-07-28T01:00:00'::timestamp, '2001-06-28T01:00:00'::timestamp, '2001-05-28T01:00:00'::timestamp], 1); |
There was a problem hiding this comment.
not related to the PR, but how it comes shuffle which is non deterministic https://spark.apache.org/docs/latest/api/sql/#shuffle has stable output 🤔
There was a problem hiding this comment.
We're providing a seed for deterministic output
There was a problem hiding this comment.
In general, shuffle function is non-deterministic. However, it can also return deterministic result by passing seed for specially this kind of cases (e.g: test verification).
For reference:
-
Spark
shufflefunction acceptsoptional randomSeedargument to :
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L1264 -
On the other hand, current DF-Spark
shufflefunc implementation support following both cases by accepting both arguments for the alignment with Spark:
shuffle(array)
shuffle(array, seed)
| NULL | ||
|
|
||
| query ? | ||
| SELECT shuffle(['2001-09-28T01:00:00'::timestamp, '2001-08-28T01:00:00'::timestamp, '2001-07-28T01:00:00'::timestamp, '2001-06-28T01:00:00'::timestamp, '2001-05-28T01:00:00'::timestamp], 1); |
There was a problem hiding this comment.
We're providing a seed for deterministic output
| cargo test --profile=ci --test sqllogictests -- aggregate.slt | ||
| # Run and update expected outputs | ||
| # Run a specific test file and update expected outputs | ||
| cargo test --profile=ci --test sqllogictests -- aggregate.slt --complete |
There was a problem hiding this comment.
It seems a nice improvement regardless
|
Thanks everyone |
Which issue does this PR close?
array_repeat,shuffleandslicetest coverage #20419.Rationale for this change
This PR adds new positive test cases for
datafusion-sparkarray functions:array_repeat,shuffle,slicefor the following use-cases:Also, being updated contributor-guide testing documentation with minor addition.
What changes are included in this PR?
Being added new positive test cases to
datafusion-sparkarray functions:array_repeat,shuffle,slice.Are these changes tested?
Yes, adding new positive test cases.
Are there any user-facing changes?
No