

Contributor

@zblz commented Aug 26, 2025

Description of changes

In the Postgres ASOF JOIN implementation from #11024, we found a couple of bugs through real-world usage:

  1. predicates other than the inequality condition were applied on the join instead of in the subquery. Because the subquery only returns one row, those predicates were either irrelevant or removed the only match.
  2. queries with no predicates beyond the ASOF inequality condition produced on=None, which generates invalid SQL for Postgres.
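The following minimal pure-Python model shows why bug 1 drops matches. It assumes a backward ASOF match (the latest right row whose time is less than or equal to the left row's time). The helper names and toy rows are hypothetical and are not part of ibis.

```python
from typing import Optional


def nearest(left_row, right_rows, predicate) -> Optional[tuple]:
    """Latest right row with time <= left time that also satisfies predicate."""
    candidates = [
        r for r in right_rows if r[0] <= left_row[0] and predicate(left_row, r)
    ]
    return max(candidates, key=lambda r: r[0], default=None)


def always_true(left_row, right_row) -> bool:
    return True


def asof_fixed(left_rows, right_rows, predicate):
    # Fixed behaviour: the extra predicate filters candidates inside the subquery.
    return [(row, nearest(row, right_rows, predicate)) for row in left_rows]


def asof_buggy(left_rows, right_rows, predicate):
    # Buggy behaviour: pick the single nearest row first, then apply the predicate
    # as the join condition, so a mismatch discards the only candidate.
    out = []
    for row in left_rows:
        match = nearest(row, right_rows, always_true)
        out.append((row, match if match is not None and predicate(row, match) else None))
    return out


def same_key(left_row, right_row) -> bool:
    return left_row[1] == right_row[1]


left_rows = [(10, "a")]                     # (time, key)
right_rows = [(5, "a", 1.0), (8, "b", 2.0)]  # (time, key, value)

print(asof_fixed(left_rows, right_rows, same_key))  # [((10, 'a'), (5, 'a', 1.0))]
print(asof_buggy(left_rows, right_rows, same_key))  # [((10, 'a'), None)]
```

In the buggy version, the nearest row (time 8) has the wrong key, so the join condition discards it. The valid match at time 5 is never considered.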

This PR moves all the predicates into the subquery and sets the join's ON clause to TRUE, because the join is always one-to-one or one-to-zero.
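The sketch below shows the fixed query shape. It assumes a LEFT JOIN LATERAL subquery ordered by the ASOF column and limited to one row. The PR text does not show the exact SQL ibis emits, so this layout is an assumption. The build_asof_sql helper is hypothetical. It puts the inequality and every extra predicate in the subquery's WHERE clause and always emits ON TRUE, so an empty predicate list still yields valid SQL.

```python
def build_asof_sql(
    left: str,
    right: str,
    on_left: str,
    on_right: str,
    predicates: tuple[str, ...] = (),
) -> str:
    """Hypothetical helper: render a Postgres-style ASOF join with LEFT JOIN LATERAL."""
    # The ASOF inequality and every extra predicate go inside the subquery,
    # so they filter candidates *before* LIMIT 1 picks the nearest row.
    where = " AND ".join([f"r.{on_right} <= l.{on_left}", *predicates])
    return (
        f"SELECT * FROM {left} AS l\n"
        "LEFT JOIN LATERAL (\n"
        f"  SELECT * FROM {right} AS r\n"
        f"  WHERE {where}\n"
        f"  ORDER BY r.{on_right} DESC\n"
        "  LIMIT 1\n"
        # The lateral subquery yields at most one row per left row, so the join
        # itself needs no condition; ON TRUE is valid even with no predicates.
        ") AS m ON TRUE"
    )


# Unkeyed ASOF join: no extra predicates, still valid SQL.
print(build_asof_sql("trades", "quotes", "ts", "ts"))

# Keyed ASOF join: the key equality is filtered inside the subquery.
print(build_asof_sql("trades", "quotes", "ts", "ts", ("r.symbol = l.symbol",)))
```

Keeping the ON clause constant means the join never depends on whether any predicates exist. That design choice addresses the on=None case directly.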

Neither bug was caught by the existing ASOF join tests. I therefore modified test_asof_join so that it has no predicates besides the inequality condition. I also added test_keyed_asof_join, which has a key but no tolerance, because the tolerance was masking the second bug.

@github-actions (bot) added the tests and sql labels Aug 26, 2025
@zblz force-pushed the fix-postgres-asof-join branch from f42e29a to 2d3bde0 August 26, 2025 15:53
@zblz force-pushed the fix-postgres-asof-join branch from 2d3bde0 to 8bbbb38 August 26, 2025 15:54
@zblz force-pushed the fix-postgres-asof-join branch from 9eeb5d0 to 592b423 August 27, 2025 10:00
@github-actions (bot) added the polars label Aug 27, 2025
Contributor

@NickCrews left a comment


I'm not that familiar with the details of how asof joins work, but I don't see any issues here.

@@ -84,6 +84,7 @@ def time_keyed_right(time_keyed_df2):
)
@pytest.mark.notyet(
[
"clickhouse",
Contributor


Is this because it also truncates to seconds, as in the test below? If so, does this PR regress behavior for ClickHouse (e.g., a user could be relying on some behavior that we break here)? Did you consider whether there is a way to avoid this regression? If that is too hard, I could accept the regression, but it is at least worth discussing how to avoid it.

Contributor Author


Yes, this happens because of second truncation, and there is no functional regression for ClickHouse: we were simply not testing this case before.

Contributor Author


Apologies, that explanation is not correct. This test fails on ClickHouse because ClickHouse does not support ASOF joins that lack an additional equality predicate. The restriction seems somewhat arbitrary, but it may help performance in their internal implementation. The case with a no-op equality predicate (where all rows have the same key value) is covered by test_noop_keyed_asof_join.
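The no-op key idea can be illustrated with a small pure-Python model, again assuming backward matching. Adding a constant key column to both sides turns an unkeyed ASOF join into a keyed one, the shape ClickHouse requires, without changing which rows match. The asof_join and with_noop_key helpers and the noop_key column are hypothetical.

```python
def asof_join(left, right, eq_key=None):
    """Backward ASOF join on dict rows with a 'time' field and an optional equality key."""
    out = []
    for row in left:
        candidates = [
            r for r in right
            if r["time"] <= row["time"] and (eq_key is None or r[eq_key] == row[eq_key])
        ]
        best = max(candidates, key=lambda r: r["time"], default=None)
        out.append((row["time"], best["value"] if best is not None else None))
    return out


def with_noop_key(rows):
    # Every row gets the same key value, so the equality predicate is always true.
    return [{**row, "noop_key": 0} for row in rows]


left = [{"time": 3}, {"time": 7}]
right = [{"time": 1, "value": "x"}, {"time": 5, "value": "y"}]

unkeyed = asof_join(left, right)
keyed = asof_join(with_noop_key(left), with_noop_key(right), eq_key="noop_key")
assert unkeyed == keyed == [(3, "x"), (7, "y")]
```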

Labels: polars (The polars backend), sql (Backends that generate SQL), tests (Issues or PRs related to tests)
2 participants