-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid duplicating complex expression in comparisons #34172
base: main
Are you sure you want to change the base?
Conversation
This change can already take care of most of the worst offenders found in #34048 🥳 |
I'll add some tests that check this transformation specifically |
8a4e1bf
to
144b7e0
Compare
body = _sqlExpressionFactory.OrElse( | ||
_sqlExpressionFactory.AndAlso(body, _sqlExpressionFactory.AndAlso(leftIsNotNull, rightIsNotNull)), | ||
_sqlExpressionFactory.AndAlso(leftIsNull, rightIsNull)); | ||
if (leftNullable && rightNullable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ranma42 can you please add a comment here explaining the logic, i.e. that duplication is bad except for columns, plus columns may make usage of indexes which arbitrary expressions (usually) won't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added 4c3a542
(#34172) which is aimed at addressing this
@@ -1003,7 +1003,10 @@ public override async Task Json_collection_index_in_predicate_using_constant(boo | |||
""" | |||
SELECT [j].[Id] | |||
FROM [JsonEntitiesBasic] AS [j] | |||
WHERE JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') <> N'Foo' OR JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') IS NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a place where I'm a bit hesitant about this change... The SQL Server docs specifically document using an indexed computed column over JSON_VALUE as a way to speed up queries filtering inside a JSON document; unless I'm mistaken, these queries would likely stop using such an index if we switch to the CASE translation (maybe in this specific test it doesn't matter because of the inequality, but you get whar I'm saying).
In a perfect world, we'd vary our translation based on knowledge that an indexed computed column exists for this expression, but we're pretty far away from doing that at the moment.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe it is very likely that the CASE
translation will not take advantage of indexes, but I would expect the same to be true for the original version as well, as it is performing a <>
comparison (maybe it would use the index to include all of the NULL
values 🤔, but then it would still have to scan all of the non-null values and filter each of them).
For equality in predicates the translation should already be
WHERE JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') = N'Foo'
which should effectively take advantage of indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
So I'm trying to understand whether there are cases - and which ones - in which this PR causes a perf regression because the switch to CASE doesn't use an index. If there are such cases (and after all, we do avoid the CASE translation for columns because of this), we should think carefully - I'm not sure whether the optimization to remove double evaluation for some cases outweighs the (potentially severe) regression triggered by not using an index. A conservative approach would wait until we could know more reliably whether an index would be used on an expression (e.g. because we're aware of expression indexes/indexed computed columns).
I know I'm being very cautious here, I'm thinking about the perf regressions brought about by the switch from IN+constants to OPENJSON in 8.0 - that change improved general perf for many queries, but also caused severe regressions for others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are cases in which the translation could cause a regression; the main one I can think of (which is the one currently avoided by the column handling) is the following (and similar ones):
.Where(e => !e.BoolA != e.NullableBoolB)
This is
SELECT "e"."Id"
FROM "Entities1" AS "e"
WHERE "e"."BoolA" = "e"."NullableBoolB" OR "e"."NullableBoolB" IS NULL
Sqlite (and litely other SQL providers) would take advantage of an index on NullableBoolB
(assuming BoolA and NullableBoolB are actually columns from different tables).
When using the CASE
, this becomes
SELECT "e"."Id"
FROM "Entities1" AS "e"
WHERE CASE
WHEN "e"."BoolA" <> "e"."NullableBoolB" THEN 0
ELSE 1
END
and the index cannot be used anymore.
I pushed ranma42@ecdd12e to show what happens when the CASE
transformation is used whenever it is valid.
With #34166 this could possibly affect a few more tests, but if I am not mistaken, this boolean comparison (negated-different-from) is the only case in which a "good" WHERE
would regress (at least according to optimizations rules similar to those of sqlite).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, obviously you could also do the same on json values:
.Where(e => !e.MyJsonColumn.BoolA != e.MyOtherJsonColumn.NullableBoolB)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead of checking for a simple column, the right check would be whether the emitted operand is =
vs !=
? (aka if the WHERE
predicate has some chances of being optimized)
When comparing a nullable expression to a non-nullable one, a `NULL` result always represent a difference. This makes it possible to avoid duplicating the nullable expression by mapping the `NULL` result to a `FALSE` (when comparing for equality). Fixes dotnet#34165.
4c3a542
to
4a9993e
Compare
4a9993e
to
aeca728
Compare
I pushed a new version of the branch to solve the merge conflicts. |
WHEN [c].[Region] = N'ASK' AND [c].[Region] IS NOT NULL THEN CAST(1 AS bit) | ||
WHEN [c].[Region] = N'ASK' THEN CAST(1 AS bit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a nice side-effect, but we might want to ensure that this kind of optimization happens regardless of this PR (and possibly not only on comparisons 🤔 )
When comparing a nullable expression to a non-nullable one, a
NULL
result alwaysrepresent a difference.
This makes it possible to avoid duplicating the nullable expression by mapping
the
NULL
result to aFALSE
(when comparing for equality).Fixes #34165.