Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Unparse LogicalPlans with grouping aggregate functions #66

Closed
wants to merge 1 commit into from

Conversation

phillipleblanc
Copy link

Which issue does this PR close?

Rationale for this change

DataFusion v43 added a new analyzer rule ResolveGroupingFunction that supports grouping aggregate functions in select expressions with grouping set aggregations.

i.e.

SELECT 
  id, 
  first_name, 
  grouping(id) -- <- This!
FROM person 
GROUP BY ROLLUP(id, first_name)

As mentioned in the implementing PR (apache#12704) this is necessary to run several queries in the TPC-DS benchmark suite.

Unfortunately, logical plans that contain a grouping function that are run through this analyzer cannot be unparsed anymore, running into the error: Internal error: Tried to unproject column referring to internal grouping id.

This was because the plan was being transformed from this:

  Projection: lineitem.l_orderkey, grouping(lineitem.l_orderkey), avg(lineitem.l_quantity) AS avg_quantity
      Aggregate: groupBy=[[ROLLUP (lineitem.l_orderkey)]], aggr=[[avg(lineitem.l_quantity)]]
        TableScan: lineitem

to this:

  Projection: lineitem.l_orderkey, grouping(lineitem.l_orderkey), avg(lineitem.l_quantity) AS avg_quantity
    Projection: lineitem.l_orderkey, CAST(__grouping_id AS Int32) AS grouping(lineitem.l_orderkey), avg(lineitem.l_quantity)
      Aggregate: groupBy=[[ROLLUP (lineitem.l_orderkey)]], aggr=[[avg(lineitem.l_quantity)]]
        TableScan: lineitem

With this change, the unparser correctly supports unparsing back to the original SQL by taking the original SQL that was preserved in the column alias, and returning it directly in a Placeholder expression.

What changes are included in this PR?

When the unparser encounters the internal grouping id column, it returns a placeholder expression with the original SQL representing the grouping aggregate function.

Are these changes tested?

Yes

Are there any user-facing changes?

No

@phillipleblanc phillipleblanc self-assigned this Nov 14, 2024
@phillipleblanc phillipleblanc added the enhancement New feature or request label Nov 14, 2024
@phillipleblanc
Copy link
Author

This doesn't completely fix the issue, investigating

@phillipleblanc phillipleblanc marked this pull request as draft November 14, 2024 23:57
@phillipleblanc
Copy link
Author

I've decided on a different approach to fix this in Spice that doesn't require changes in DataFusion, closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants