test: Extend Spark Array functions: array_repeat, shuffle and slice test coverage #20420

Merged
comphead merged 1 commit into apache:main from erenavsarogullari:datafusion_spark_array_functions_test_coverage
Feb 20, 2026

@erenavsarogullari (Member) commented Feb 18, 2026

Which issue does this PR close?

Rationale for this change

This PR adds new positive test cases for the datafusion-spark array functions array_repeat, shuffle, and slice, covering the following use cases:

- nested function execution
- different data types, such as timestamp
- casting before function execution

It also makes a minor addition to the contributor-guide testing documentation.

What changes are included in this PR?

New positive test cases for the datafusion-spark array functions array_repeat, shuffle, and slice.

Are these changes tested?

Yes. The PR itself consists of new positive test cases.

Are there any user-facing changes?

No

@github-actions bot added the documentation, sqllogictest, and spark labels Feb 18, 2026
@erenavsarogullari force-pushed the datafusion_spark_array_functions_test_coverage branch 2 times, most recently from 9d73003 to 674f724, February 18, 2026 05:10
@github-actions bot removed the spark label Feb 18, 2026
@erenavsarogullari force-pushed the datafusion_spark_array_functions_test_coverage branch from 674f724 to 693ba61, February 18, 2026 05:11
@getChan (Contributor) left a comment

LGTM.

@erenavsarogullari force-pushed the datafusion_spark_array_functions_test_coverage branch from 693ba61 to c19386f, February 19, 2026 03:31
@erenavsarogullari force-pushed the datafusion_spark_array_functions_test_coverage branch from c19386f to 44292e3, February 19, 2026 03:40
cargo test --profile=ci --test sqllogictests -- aggregate.slt
# Run and update expected outputs
# Run a specific test file and update expected outputs
cargo test --profile=ci --test sqllogictests -- aggregate.slt --complete
Contributor

this looks unrelated to the PR

Contributor

It seems a nice improvement regardless

Member Author

I included this doc update here because it is a minor change.

NULL

query ?
SELECT shuffle(['2001-09-28T01:00:00'::timestamp, '2001-08-28T01:00:00'::timestamp, '2001-07-28T01:00:00'::timestamp, '2001-06-28T01:00:00'::timestamp, '2001-05-28T01:00:00'::timestamp], 1);
Contributor

Not related to the PR, but how does shuffle, which is non-deterministic (https://spark.apache.org/docs/latest/api/sql/#shuffle), produce stable output here? 🤔

Contributor

We're providing a seed for deterministic output

@erenavsarogullari (Member Author) commented Feb 20, 2026

In general, the shuffle function is non-deterministic. However, it returns a deterministic result when a seed is passed, which is useful for cases such as test verification.

For reference:

shuffle(array)
shuffle(array, seed)
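Why a seed makes the output stable can be sketched outside of SQL. The Rust snippet below is a minimal illustration under stated assumptions: `SplitMix64` and `seeded_shuffle` are hypothetical helpers written only for this example, and the generator is not the one datafusion-spark or Spark uses, so it will not reproduce their actual orderings. It only shows that a shuffle driven by a seeded generator is a pure function of (input, seed), which is what lets a test pin an expected output.

```rust
/// Hypothetical helper: a tiny SplitMix64 PRNG. This is NOT the generator
/// DataFusion or Spark uses; it only illustrates seeded determinism.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical `seeded_shuffle`: a Fisher-Yates shuffle driven by the PRNG above.
fn seeded_shuffle<T: Clone>(input: &[T], seed: u64) -> Vec<T> {
    let mut out = input.to_vec();
    let mut rng = SplitMix64::new(seed);
    // Walk from the last index down, swapping each slot with a random slot at or
    // before it. Modulo reduction is slightly biased; acceptable for a sketch.
    for i in (1..out.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        out.swap(i, j);
    }
    out
}

fn main() {
    let dates = vec!["2001-09-28", "2001-08-28", "2001-07-28", "2001-06-28", "2001-05-28"];
    let a = seeded_shuffle(&dates, 1);
    let b = seeded_shuffle(&dates, 1);
    // Same input and same seed give the same permutation on every run.
    assert_eq!(a, b);
    println!("{:?}", a);
}
```

Because the permutation depends only on the input and the seed, re-running with seed 1 yields the same order each time; without a fixed seed, each call would draw fresh randomness and the expected output could not be pinned.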

@comphead
Contributor

Thanks everyone

@comphead added this pull request to the merge queue Feb 20, 2026
Merged via the queue into apache:main with commit a936d0d Feb 20, 2026
31 checks passed
@erenavsarogullari deleted the datafusion_spark_array_functions_test_coverage branch February 20, 2026 19:16

Labels

documentation, sqllogictest

Development

Successfully merging this pull request may close these issues.

Extend Spark Array functions: array_repeat, shuffle and slice test coverage

4 participants