Avoid duplicating complex expression in comparisons #34172

ranma42 · 2024-07-05T19:45:12Z

When comparing a nullable expression to a non-nullable one, a NULL result always
represents a difference.

This makes it possible to avoid duplicating the nullable expression by mapping
the NULL result to a FALSE (when comparing for equality).

Fixes #34165.
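As a quick sanity check of the claim above, the sketch below evaluates both shapes of a nullable-vs-non-nullable equality on an in-memory SQLite database: the expansion that mentions the nullable operand twice, and the `CASE` form that mentions it once. The table `t`, the column `x`, and the constant `'Foo'` are made up for illustration; this is not the SQL emitted by EF Core, only the same logical pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT)")  # x is nullable; the constant 'Foo' is not
conn.executemany("INSERT INTO t VALUES (?)", [("Foo",), ("Bar",), (None,)])

rows = conn.execute(
    """
    SELECT x,
           x = 'Foo',                              -- raw SQL equality: NULL when x is NULL
           x = 'Foo' AND x IS NOT NULL,            -- expansion: x appears twice
           CASE WHEN x = 'Foo' THEN 1 ELSE 0 END   -- CASE form: x appears once
    FROM t
    ORDER BY rowid
    """
).fetchall()

for x, raw, duplicated, single in rows:
    print(x, raw, duplicated, single)
```

The last two columns agree on every row, including the one where `x` is NULL, which is the case the `CASE` form maps to FALSE.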

ranma42 · 2024-07-05T19:45:57Z

This change can already take care of most of the worst offenders found in #34048 🥳

ranma42 · 2024-07-05T22:57:53Z

I'll add some tests that check this transformation specifically
EDIT: done 👍

roji · 2024-07-26T16:04:02Z

src/EFCore.Relational/Query/SqlNullabilityProcessor.cs

-        body = _sqlExpressionFactory.OrElse(
-            _sqlExpressionFactory.AndAlso(body, _sqlExpressionFactory.AndAlso(leftIsNotNull, rightIsNotNull)),
-            _sqlExpressionFactory.AndAlso(leftIsNull, rightIsNull));
+        if (leftNullable && rightNullable


@ranma42 can you please add a comment here explaining the logic, i.e. that duplication is bad except for columns, and that columns may make use of indexes, which arbitrary expressions (usually) won't?

ranma42 · 2024-07-27

I added 4c3a542 (#34172), which is aimed at addressing this.

roji · 2024-07-27T12:49:30Z

test/EFCore.SqlServer.FunctionalTests/Query/JsonQuerySqlServerTest.cs

@@ -1003,7 +1003,10 @@ public override async Task Json_collection_index_in_predicate_using_constant(boo
            """
 SELECT [j].[Id]
 FROM [JsonEntitiesBasic] AS [j]
-WHERE JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') <> N'Foo' OR JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') IS NULL


This is a place where I'm a bit hesitant about this change... The SQL Server docs specifically document using an indexed computed column over JSON_VALUE as a way to speed up queries filtering inside a JSON document; unless I'm mistaken, these queries would likely stop using such an index if we switch to the CASE translation (maybe in this specific test it doesn't matter because of the inequality, but you get what I'm saying).

In a perfect world, we'd vary our translation based on knowledge that an indexed computed column exists for this expression, but we're pretty far away from doing that at the moment.

Thoughts?

ranma42 · 2024-07-27

Yes, I believe it is very likely that the CASE translation will not take advantage of indexes, but I would expect the same to be true for the original version as well, as it is performing a <> comparison (maybe it would use the index to include all of the NULL values 🤔, but then it would still have to scan all of the non-null values and filter each of them).

For equality in predicates the translation should already be
WHERE JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') = N'Foo'
which should effectively take advantage of indexes.
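To make the index argument concrete, here is a small SQLite experiment. It is only an analogy: the thread is about SQL Server and `JSON_VALUE`, and other engines may plan differently. The table `docs`, the index `ix_lower_name`, and the use of `lower(name)` as a stand-in for an indexed expression are all assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, name TEXT)")
# Index on an expression, standing in for an indexed computed column.
conn.execute("CREATE INDEX ix_lower_name ON docs (lower(name))")


def plan(where_clause):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a filter on docs."""
    sql = "EXPLAIN QUERY PLAN SELECT id FROM docs WHERE " + where_clause
    return [row[3] for row in conn.execute(sql)]


# Bare equality on the indexed expression.
equality_plan = plan("lower(name) = 'foo'")
# The same expression wrapped in a CASE, as in the CASE translation.
case_plan = plan("CASE WHEN lower(name) = 'foo' THEN 0 ELSE 1 END")

print(equality_plan)  # expected: a SEARCH that uses ix_lower_name
print(case_plan)      # expected: a SCAN over all rows
```

The exact plan text varies between SQLite versions, so the printed output is best read as an indication rather than a guarantee: the bare equality can be answered by an index lookup, while the `CASE` wrapper hides the comparison from the planner.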

roji · 2024-07-27

OK.

So I'm trying to understand whether there are cases - and which ones - in which this PR causes a perf regression because the switch to CASE doesn't use an index. If there are such cases (and after all, we do avoid the CASE translation for columns because of this), we should think carefully - I'm not sure whether the optimization to remove double evaluation for some cases outweighs the (potentially severe) regression triggered by not using an index. A conservative approach would wait until we could know more reliably whether an index would be used on an expression (e.g. because we're aware of expression indexes/indexed computed columns).

I know I'm being very cautious here; I'm thinking about the perf regressions brought about by the switch from IN+constants to OPENJSON in 8.0. That change improved general perf for many queries, but also caused severe regressions for others.

ranma42 · 2024-07-27

Yes, there are cases in which the translation could cause a regression; the main one I can think of (which is the one currently avoided by the column handling) is the following (and similar ones):

.Where(e => !e.BoolA != e.NullableBoolB)

This is translated to

SELECT "e"."Id" FROM "Entities1" AS "e" WHERE "e"."BoolA" = "e"."NullableBoolB" OR "e"."NullableBoolB" IS NULL

SQLite (and likely other SQL providers) would take advantage of an index on NullableBoolB (assuming BoolA and NullableBoolB are actually columns from different tables).

When using the CASE, this becomes

SELECT "e"."Id" FROM "Entities1" AS "e" WHERE CASE WHEN "e"."BoolA" <> "e"."NullableBoolB" THEN 0 ELSE 1 END

and the index cannot be used anymore.
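Both queries select the same rows; only the shape seen by the query planner differs. The sketch below checks this on SQLite by running the two statements from this comment (with an `ORDER BY` added for a stable order) against every combination of values, and compares them with the LINQ predicate evaluated in Python. The table definition and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Entities1" ('
    '"Id" INTEGER PRIMARY KEY, "BoolA" INTEGER NOT NULL, "NullableBoolB" INTEGER)'
)
# Every combination of a non-nullable bool and a nullable bool.
data = [(a, b) for a in (0, 1) for b in (0, 1, None)]
conn.executemany(
    'INSERT INTO "Entities1" ("BoolA", "NullableBoolB") VALUES (?, ?)', data
)

or_form = (
    'SELECT "e"."Id" FROM "Entities1" AS "e" '
    'WHERE "e"."BoolA" = "e"."NullableBoolB" OR "e"."NullableBoolB" IS NULL '
    'ORDER BY "e"."Id"'
)
case_form = (
    'SELECT "e"."Id" FROM "Entities1" AS "e" '
    'WHERE CASE WHEN "e"."BoolA" <> "e"."NullableBoolB" THEN 0 ELSE 1 END '
    'ORDER BY "e"."Id"'
)

or_ids = [row[0] for row in conn.execute(or_form)]
case_ids = [row[0] for row in conn.execute(case_form)]

# The LINQ predicate !e.BoolA != e.NullableBoolB, with None playing the role of null.
expected_ids = [
    i
    for i, (a, b) in enumerate(data, start=1)
    if (not a) != (None if b is None else bool(b))
]

print(or_ids, case_ids, expected_ids)
```

All three lists contain the same ids, so the concern here is purely about index usage, not about correctness.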

I pushed ranma42@ecdd12e to show what happens when the CASE transformation is used whenever it is valid.

With #34166 this could possibly affect a few more tests, but if I am not mistaken, this boolean comparison (negated-different-from) is the only case in which a "good" WHERE would regress (at least according to optimization rules similar to those of SQLite).

ah, obviously you could also do the same on json values:

.Where(e => !e.MyJsonColumn.BoolA != e.MyOtherJsonColumn.NullableBoolB)

Maybe instead of checking for a simple column, the right check would be whether the emitted operator is = vs. !=? (i.e. whether the WHERE predicate has some chance of being optimized)

ranma42 · 2024-07-29T21:13:52Z

I pushed a new version of the branch to solve the merge conflicts.
While I was at it, I also changed the logic behind the activation of the CASE transformation: it now activates only if it is valid (not on nullable vs nullable) and causes no de-optimization (i.e. in predicates it is only allowed when the comparison is an inequality).
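The "not on nullable vs nullable" condition can be illustrated with a small counterexample on SQLite. The sketch compares the full null-safe equality (the `OrElse`/`AndAlso` shape from the diff earlier in this thread) with the `CASE` shortcut when both operands may be NULL. The table `pairs` and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (a INTEGER, b INTEGER)")  # both sides nullable
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [(1, 1), (1, 2), (1, None), (None, 1), (None, None)],
)

rows = conn.execute(
    """
    SELECT a, b,
           (a = b AND a IS NOT NULL AND b IS NOT NULL)
               OR (a IS NULL AND b IS NULL),       -- full null-safe equality
           CASE WHEN a = b THEN 1 ELSE 0 END       -- CASE shortcut
    FROM pairs
    """
).fetchall()

# Rows on which the shortcut disagrees with the full expansion.
mismatches = [(a, b) for a, b, full, shortcut in rows if full != shortcut]
print(mismatches)
```

The only disagreement is the NULL/NULL row, which the full expansion treats as equal and the shortcut treats as different. That row cannot occur when one side is non-nullable, which is why the transformation is valid in that case.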

ranma42 · 2024-07-29T21:16:56Z

test/EFCore.SqlServer.FunctionalTests/Query/NorthwindMiscellaneousQuerySqlServerTest.cs

-    WHEN [c].[Region] = N'ASK' AND [c].[Region] IS NOT NULL THEN CAST(1 AS bit)
+    WHEN [c].[Region] = N'ASK' THEN CAST(1 AS bit)


this is a nice side-effect, but we might want to ensure that this kind of optimization happens regardless of this PR (and possibly not only on comparisons 🤔 )

ranma42 force-pushed the avoid-equal-duplication-34165 branch 2 times, most recently from 8a4e1bf to 144b7e0 on July 13, 2024 06:55

ranma42 mentioned this pull request Jul 28, 2024: Propagate allowOptimizedExpansion to CASE results #34304 (Merged)

ranma42 added 2 commits July 29, 2024 21:42

Add test to check for (non-)duplication of equality subtrees

b5661ae

ranma42 force-pushed the avoid-equal-duplication-34165 branch from 4c3a542 to 4a9993e on July 29, 2024 20:16

Update baselines

aeca728

ranma42 force-pushed the avoid-equal-duplication-34165 branch from 4a9993e to aeca728 on July 29, 2024 20:40

maumar assigned roji Sep 25, 2024
