perf: Collapse cross-joins to faster joins #18633
If you want, I can write a small fuzzer to further verify the correctness of this.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #18633 +/- ##
==========================================
+ Coverage 79.84% 79.88% +0.03%
==========================================
Files 1518 1519 +1
Lines 205577 205925 +348
Branches 2892 2892
==========================================
+ Hits 164144 164497 +353
+ Misses 40885 40880 -5
Partials 548 548
Currently it doesn't account for suffixed names. If we do support that, I can remove much of the logic in the There is some logic there that deals with suffixed names. It is a bit of bookkeeping, as duplicate names get suffixed post-join but don't have that suffixed name pre-join. (They could already have a name that contains the suffix, though. 🙈)
This is now taken into account. I also added a very broad test that verifies correctness to a certain extent.
I think this will close #18619, no? (I know you opened it, don't want it to remain orphaned).
I don’t think it will. It is less about pushing predicates down through a join and more about putting predicates into a join.
This PR adds the `collapse_joins` optimization pass. This collapses a join and filters into a faster join. For example, `a.join(b, how='cross').filter(pl.col.l == pl.col.r)` can be collapsed into `a.join(b, how='inner', left_on=pl.col.l, right_on=pl.col.r)` if `l` is a column of `a` and `r` is a column of `b`. This currently only collapses `cross` joins into `inner` or `iejoin`, but theoretically other joins could be simplified as well.
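To illustrate why this rewrite preserves results, here is a minimal pure-Python sketch. It is not the Polars implementation: the helper names `cross_then_filter` and `inner_hash_join` and the list-of-dicts table representation are assumptions made for this example only. It compares a cross join followed by an equality filter against a hash-based inner join on the same keys.

```python
from collections import defaultdict
from itertools import product


def cross_then_filter(left, right, l_key, r_key):
    """Naive plan: build every (left, right) pair, then keep the matching ones."""
    return [
        {**lrow, **rrow}
        for lrow, rrow in product(left, right)
        if lrow[l_key] == rrow[r_key]
    ]


def inner_hash_join(left, right, l_key, r_key):
    """Collapsed plan: index the right side by key, then probe once per left row."""
    index = defaultdict(list)
    for rrow in right:
        index[rrow[r_key]].append(rrow)
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in index.get(lrow[l_key], [])
    ]


# Hypothetical sample tables; column names are distinct, so no suffixing is needed.
a = [{"l": 1, "x": "a1"}, {"l": 2, "x": "a2"}, {"l": 2, "x": "a3"}]
b = [{"r": 2, "y": "b1"}, {"r": 3, "y": "b2"}, {"r": 2, "y": "b3"}]

naive = cross_then_filter(a, b, "l", "r")
collapsed = inner_hash_join(a, b, "l", "r")

# Both plans produce the same rows in the same order.
assert naive == collapsed
```

The naive plan evaluates the predicate on all `len(a) * len(b)` pairs, while the hash-based plan only touches pairs whose keys match. The sketch keeps both key columns in the output and sidesteps duplicate column names entirely, which is the suffix bookkeeping discussed earlier in this thread.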
closes #18753