Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement grouping function using grouping id #12704

Merged

Conversation

eejbyfeldt
Copy link
Contributor

@eejbyfeldt eejbyfeldt commented Oct 1, 2024

Which issue does this PR close?

Closes #5647 .
Closes #10208

Rationale for this change

Grouping function is supported by postgres and needed inorder to be able to run TPC-DS (#4763).

What changes are included in this PR?

This PR adds an analyzer rule that will transform the grouping aggregation function into a projection on top of grouping id (added in #12571). Since the grouping aggregation function is so specific and special (and not really a aggregation), I don't think it make sense to generalize the UDAF api to support it.

This PR currently leaves the non-functional grouping UDAF. But since it non functional and will remain so, we might want to replace it with some built in Expr::Grouping instead. Would love some more feedback if that should be done in this PR or some other alternative.

Are these changes tested?

New slt, mostly copied from #10208 with some additional test of wrong arguments.

Are there any user-facing changes?

Yes, after this grouping function will be usable.

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) substrait functions labels Oct 1, 2024
@eejbyfeldt eejbyfeldt force-pushed the implement-grouping-function-using-grouping-id branch from 6b6bf52 to b67aee9 Compare October 1, 2024 20:27
@eejbyfeldt eejbyfeldt force-pushed the implement-grouping-function-using-grouping-id branch from b67aee9 to be38977 Compare October 7, 2024 18:08
@github-actions github-actions bot removed logical-expr Logical plan and expressions physical-expr Physical Expressions core Core DataFusion crate substrait labels Oct 7, 2024
@eejbyfeldt eejbyfeldt force-pushed the implement-grouping-function-using-grouping-id branch from be38977 to bb53649 Compare October 7, 2024 18:19
@eejbyfeldt eejbyfeldt marked this pull request as ready for review October 8, 2024 18:20
@eejbyfeldt
Copy link
Contributor Author

Are @comphead (because of your review here #12565) or @thinkharderdev (becaues of your review on #12571) be willing to do a review of this one?

// Postgres allows grouping function for group by without grouping sets, the result is then
// always 0
if !is_grouping_set {
return Ok(Expr::Literal(ScalarValue::from(0u32)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be u32, or can be u8?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u32 was incorrect. Currently the return type of the aggregate function is i32 so changed it to that.

The return value of the grouping function should always be the same type. The u8, u16 etc. is only an optimization for the internal grouping id. But we always cast it to i32 here, when computing the grouping column.

fn is_grouping_function(expr: &Expr) -> bool {
// TODO: Do something better than name here should grouping be a built
// in expression?
matches!(expr, Expr::AggregateFunction(AggregateFunction { ref func, .. }) if func.name() == "grouping")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we can extend AggregateFunction 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking maybe adding Expr::Grouping but we match on Expr exhaustively in quite a few places so it becomes a semi big change.

The thing with the grouping function does not look that much like a aggregate function except for that where it can appear in sql. But it does not support extra AggregateFunction stuff like order by/null treatment/filter/distinct.

This patch adds a Analyzer rule to transform the grouping aggreation
function into computation ontop of the grouping id that is used
internally for grouping sets.
@eejbyfeldt eejbyfeldt force-pushed the implement-grouping-function-using-grouping-id branch from bb53649 to 8acd5cb Compare October 15, 2024 17:42
@eejbyfeldt eejbyfeldt requested a review from comphead October 15, 2024 17:54
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its lgtm, thanks @eejbyfeldt

@alamb
Copy link
Contributor

alamb commented Oct 16, 2024

Thanks @eejbyfeldt and @comphead

@alamb alamb merged commit 8802f63 into apache:main Oct 16, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Grouping functions with Group By CUBE/ROLLUP/GROUPING SETS
3 participants