[EPIC] Support TPC-DS benchmarks #4763
I just tried running TPC-DS with DataFusion 39 and these are the failing queries currently:
This patch makes it so that rules that configure an `apply_order` will also include subqueries in their traversal. This is a step towards being able to run TPC-DS q41 (apache#4763), which has an expression that needs simplification before we can decorrelate the subquery. This closes apache#3770 and maybe apache#2480
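As a rough illustration of the idea in the patch description (this is not DataFusion's actual implementation; every type and function name below is hypothetical), a plan traversal that also descends into subqueries embedded in expressions could look like the following toy sketch. It assumes a tiny expression language where the "simplification" is just folding `true AND x` to `x`:

```python
from dataclasses import dataclass, field

# Toy expression and plan types. These are hypothetical stand-ins,
# not DataFusion's real Expr / LogicalPlan types.
@dataclass
class Lit:
    value: object

@dataclass
class Col:
    name: str

@dataclass
class And:
    left: object
    right: object

@dataclass
class Exists:
    subquery: "Plan"  # an expression that carries a nested plan

@dataclass
class Plan:
    name: str
    exprs: list = field(default_factory=list)
    inputs: list = field(default_factory=list)


def is_true(expr):
    return isinstance(expr, Lit) and expr.value is True


def simplify(expr):
    """Toy rule: fold `true AND x` (or `x AND true`) to `x`."""
    if isinstance(expr, And):
        left, right = simplify(expr.left), simplify(expr.right)
        if is_true(left):
            return right
        if is_true(right):
            return left
        return And(left, right)
    return expr


def rewrite_expr(expr, rule):
    """Apply `rule` to an expression, descending into nested subquery plans."""
    if isinstance(expr, Exists):
        return Exists(rewrite_plan(expr.subquery, rule))
    if isinstance(expr, And):
        expr = And(rewrite_expr(expr.left, rule), rewrite_expr(expr.right, rule))
    return rule(expr)


def rewrite_plan(plan, rule):
    """Rewrite a plan's expressions, its inputs, and any subqueries in its expressions."""
    return Plan(
        plan.name,
        [rewrite_expr(e, rule) for e in plan.exprs],
        [rewrite_plan(p, rule) for p in plan.inputs],
    )


# The expression that needs simplifying lives inside the subquery plan.
inner = Plan("Filter", [And(Lit(True), Col("i_color"))])
outer = Plan("Filter", [Exists(inner)])
result = rewrite_plan(outer, simplify)
```

A traversal that only followed `inputs` would never reach `inner`, so the expression inside the subquery would stay unsimplified; that is the gap the patch description refers to.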
I think q41 is the last one that still has to be supported, is that right @eejbyfeldt?
Actually, some of those errors seem to be runtime errors, so we need to check against some dataset as well.
If I try to run the version of the queries we have in the DataFusion repo against scale factor 1, only q41 produces a hard error. But there are likely issues with the results for some of them.
Sounds great! In the long run it would be nice to have it as part of CI like we do for TPC-H.
Cool, that sounds promising @eejbyfeldt, thanks for the hard work on supporting them!
@andygrove @eejbyfeldt I updated the epic with tasks to support / test / verify the queries.
This problem with q35, taken from the description above:
seems to be some sort of bug in query generation for TPC-DS. For me the provided
This has multiple columns that would resolve to the same name, like I believe this is not intentional, as the query template goes like this:
The latest spec mentions this about q35:
So all aggregates are different from each other. If the template is used with the substitutions listed in the spec, no duplicates should occur, and this query can be executed successfully with the main branch of DataFusion as of now.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I would like to be able to run all TPC-DS queries with DataFusion, but some are not yet supported.
simplify_expressions
#3770
Old description:
Describe the solution you'd like
Support all the queries.
Describe alternatives you've considered
N/A
Additional context
N/A