ADBDEV-3539: Pass required distribution through trivial filter #514

Closed
wants to merge 24 commits into from

@RekGRpth RekGRpth commented Apr 26, 2023

The condition that checks the requested distribution does not work when the outer CTE is inlined by a filter, because the filter replaces the requested distribution. For example:

CREATE TABLE d (a int, b int, c int) DISTRIBUTED BY (b);
CREATE TABLE r (a int, b int, c char(255)) DISTRIBUTED REPLICATED;

insert into d select 1,generate_series(1,1),1;
insert into r select 1,2,generate_series(1,1);

EXPLAIN (ANALYZE off, COSTS off, VERBOSE off)
with s as (WITH e AS (
    SELECT b FROM d LIMIT 2
), h AS (
    SELECT a FROM d JOIN e f USING (b) JOIN e USING (b)
) SELECT * FROM r JOIN h USING (a) JOIN h i USING (a)) select * from s;
                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice7; segments: 1)
   ->  Sequence
         ->  Shared Scan (share slice:id 7:1)
               ->  Materialize
                     ->  Redistribute Motion 1:1  (slice6)
                           ->  Limit
                                 ->  Gather Motion 3:1  (slice5; segments: 3)
                                       ->  Seq Scan on d d_1
         ->  Sequence
               ->  Shared Scan (share slice:id 7:2)
                     ->  Materialize
                           ->  Hash Join
                                 Hash Cond: (d.b = share1_ref2.b)
                                 ->  Hash Join
                                       Hash Cond: (d.b = share1_ref3.b)
                                       ->  Seq Scan on d
                                       ->  Hash
                                             ->  Broadcast Motion 3:1  (slice3; segments: 3)
                                                   ->  Shared Scan (share slice:id 3:1)
                                 ->  Hash
                                       ->  Broadcast Motion 3:1  (slice4; segments: 3)
                                             ->  Shared Scan (share slice:id 4:1)
               ->  Hash Join
                     Hash Cond: ((r.a = share2_ref3.a) AND (r.a = share2_ref2.a))
                     ->  Seq Scan on r
                     ->  Hash
                           ->  Broadcast Motion 3:1  (slice2; segments: 3)
                                 ->  Hash Join
                                       Hash Cond: (share2_ref3.a = share2_ref2.a)
                                       ->  Shared Scan (share slice:id 2:2)
                                       ->  Hash
                                             ->  Broadcast Motion 3:3  (slice1; segments: 3)
                                                   ->  Shared Scan (share slice:id 1:2)
 Optimizer: Pivotal Optimizer (GPORCA)
(34 rows)

EXPLAIN (ANALYZE on, COSTS off, VERBOSE off)
with s as (WITH e AS (
    SELECT b FROM d LIMIT 2
), h AS (
    SELECT a FROM d JOIN e f USING (b) JOIN e USING (b)
) SELECT * FROM r JOIN h USING (a) JOIN h i USING (a)) select * from s;
ERROR:  SendTupleChunkToAMS: targetRoute is 2, must be between 0 and 1 . (ic_common.c:328)  (entry db 172.19.0.2:6432 pid=352094) (ic_common.c:328)
HINT:  Process 352094 will wait for gp_debug_linger=120 seconds before termination.
Note that its locks and other resources will not be released until then.
Physical plan: 
+--CPhysicalMotionGather(master)   rows:1   width:305  rebinds:1   cost:3017.007210   origin: [Grp:1, GrpExpr:3]
   +--CPhysicalFilter   rows:1   width:305  rebinds:1   cost:3017.004225   origin: [Grp:1, GrpExpr:2]
      |--CPhysicalSequence   rows:1   width:305  rebinds:1   cost:3017.004225   origin: [Grp:0, GrpExpr:3]
      |  |--CPhysicalCTEProducer (1), Columns: ["b" (1)]   rows:1   width:42  rebinds:1   cost:431.000040   origin: [Grp:42, GrpExpr:1]
      |  |  +--CPhysicalMotionRandom   rows:1   width:42  rebinds:1   cost:431.000039   origin: [Grp:41, GrpExpr:4]
      |  |     +--CPhysicalLimit <empty> global   rows:1   width:42  rebinds:1   cost:431.000029   origin: [Grp:41, GrpExpr:3]
      |  |        |--CPhysicalMotionGather(master)   rows:1   width:42  rebinds:1   cost:431.000025   origin: [Grp:38, GrpExpr:2]
      |  |        |  +--CPhysicalTableScan "d" ("d")   rows:1   width:42  rebinds:1   cost:431.000008   origin: [Grp:38, GrpExpr:1]
      |  |        |--CScalarConst (0)   origin: [Grp:39, GrpExpr:0]
      |  |        +--CScalarConst (2)   origin: [Grp:40, GrpExpr:0]
      |  +--CPhysicalSequence   rows:1   width:397  rebinds:1   cost:2586.003918   origin: [Grp:13, GrpExpr:2]
      |     |--CPhysicalCTEProducer (2), Columns: ["a" (10)]   rows:1   width:126  rebinds:1   cost:1293.000822   origin: [Grp:31, GrpExpr:1]
      |     |  +--CPhysicalInnerHashJoin   rows:1   width:126  rebinds:1   cost:1293.000821   origin: [Grp:30, GrpExpr:16]
      |     |     |--CPhysicalInnerHashJoin   rows:1   width:84  rebinds:1   cost:862.000492   origin: [Grp:32, GrpExpr:3]
      |     |     |  |--CPhysicalTableScan "d" ("d")   rows:1   width:42  rebinds:1   cost:431.000023   origin: [Grp:20, GrpExpr:1]
      |     |     |  |--CPhysicalMotionBroadcast    rows:1   width:42  rebinds:1   cost:431.000075   origin: [Grp:21, GrpExpr:2]
      |     |     |  |  +--CPhysicalCTEConsumer (1), Columns: ["b" (20)]   rows:1   width:42  rebinds:1   cost:431.000003   origin: [Grp:21, GrpExpr:1]
      |     |     |  +--CScalarCmp (=)   origin: [Grp:25, GrpExpr:0]
      |     |     |     |--CScalarIdent "b" (11)   origin: [Grp:23, GrpExpr:0]
      |     |     |     +--CScalarIdent "b" (20)   origin: [Grp:24, GrpExpr:0]
      |     |     |--CPhysicalMotionBroadcast    rows:1   width:42  rebinds:1   cost:431.000075   origin: [Grp:22, GrpExpr:2]
      |     |     |  +--CPhysicalCTEConsumer (1), Columns: ["b" (30)]   rows:1   width:42  rebinds:1   cost:431.000003   origin: [Grp:22, GrpExpr:1]
      |     |     +--CScalarCmp (=)   origin: [Grp:27, GrpExpr:0]
      |     |        |--CScalarIdent "b" (11)   origin: [Grp:23, GrpExpr:0]
      |     |        +--CScalarIdent "b" (30)   origin: [Grp:26, GrpExpr:0]
      |     +--CPhysicalInnerHashJoin   rows:1   width:397  rebinds:1   cost:1293.002829   origin: [Grp:12, GrpExpr:14]
      |        |--CPhysicalTableScan "r" ("r")   rows:1   width:297  rebinds:1   cost:431.000163   origin: [Grp:2, GrpExpr:1]
      |        |--CPhysicalMotionBroadcast    rows:1   width:100  rebinds:1   cost:862.000480   origin: [Grp:17, GrpExpr:5]
      |        |  +--CPhysicalInnerHashJoin   rows:1   width:100  rebinds:1   cost:862.000337   origin: [Grp:17, GrpExpr:3]
      |        |     |--CPhysicalCTEConsumer (2), Columns: ["a" (120)]   rows:1   width:50  rebinds:1   cost:431.000003   origin: [Grp:3, GrpExpr:1]
      |        |     |--CPhysicalMotionBroadcast    rows:1   width:50  rebinds:1   cost:431.000075   origin: [Grp:4, GrpExpr:2]
      |        |     |  +--CPhysicalCTEConsumer (2), Columns: ["a" (150)]   rows:1   width:50  rebinds:1   cost:431.000003   origin: [Grp:4, GrpExpr:1]
      |        |     +--CScalarCmp (=)   origin: [Grp:10, GrpExpr:0]
      |        |        |--CScalarIdent "a" (120)   origin: [Grp:6, GrpExpr:0]
      |        |        +--CScalarIdent "a" (150)   origin: [Grp:8, GrpExpr:0]
      |        +--CScalarBoolOp (EboolopAnd)   origin: [Grp:18, GrpExpr:0]
      |           |--CScalarCmp (=)   origin: [Grp:7, GrpExpr:0]
      |           |  |--CScalarIdent "a" (110)   origin: [Grp:5, GrpExpr:0]
      |           |  +--CScalarIdent "a" (120)   origin: [Grp:6, GrpExpr:0]
      |           +--CScalarCmp (=)   origin: [Grp:9, GrpExpr:0]
      |              |--CScalarIdent "a" (110)   origin: [Grp:5, GrpExpr:0]
      |              +--CScalarIdent "a" (150)   origin: [Grp:8, GrpExpr:0]
      +--CScalarConst (1)   origin: [Grp:44, GrpExpr:0]

Therefore, for the case of CTE inlining by a filter, I added a flag that records it. When this flag is set, the filter passes the requested distribution through instead of replacing it.
With this change the plan is correct:

EXPLAIN (ANALYZE off, COSTS off, VERBOSE off)
with s as (WITH e AS (
    SELECT b FROM d LIMIT 2
), h AS (
    SELECT a FROM d JOIN e f USING (b) JOIN e USING (b)
) SELECT * FROM r JOIN h USING (a) JOIN h i USING (a)) select * from s;
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Sequence
   ->  Shared Scan (share slice:id 0:1)
         ->  Materialize
               ->  Limit
                     ->  Gather Motion 3:1  (slice3; segments: 3)
                           ->  Seq Scan on d d_1
   ->  Sequence
         ->  Shared Scan (share slice:id 0:2)
               ->  Materialize
                     ->  Hash Join
                           Hash Cond: (d.b = share1_ref2.b)
                           ->  Hash Join
                                 Hash Cond: (d.b = share1_ref3.b)
                                 ->  Gather Motion 3:1  (slice2; segments: 3)
                                       ->  Seq Scan on d
                                 ->  Hash
                                       ->  Shared Scan (share slice:id 0:1)
                           ->  Hash
                                 ->  Shared Scan (share slice:id 0:1)
         ->  Hash Join
               Hash Cond: ((r.a = share2_ref3.a) AND (r.a = share2_ref2.a))
               ->  Gather Motion 1:1  (slice1; segments: 1)
                     ->  Seq Scan on r
               ->  Hash
                     ->  Hash Join
                           Hash Cond: (share2_ref3.a = share2_ref2.a)
                           ->  Shared Scan (share slice:id 0:2)
                           ->  Hash
                                 ->  Shared Scan (share slice:id 0:2)
 Optimizer: Pivotal Optimizer (GPORCA)
(30 rows)
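
The flag-based pass-through described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the ORCA code from the patch: `DistKind`, `DistRequest`, `ToyFilter`, `RequiredForChild` and `DemoPassThrough` are hypothetical names, and the choice of `Any` as the request an ordinary filter substitutes is an assumption made for the example.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for a distribution request. These names are hypothetical
// and are not ORCA's CDistributionSpec classes.
enum class DistKind { Any, Singleton, NonSingleton };

struct DistRequest {
    DistKind kind;
    std::string requested_by;  // which operator produced this request
};

// A filter operator reduced to the single decision discussed in this PR.
class ToyFilter {
public:
    explicit ToyFilter(bool from_cte_inlining)
        : from_cte_inlining_(from_cte_inlining) {}

    // Compute what the filter asks of its child, given the parent's request.
    DistRequest RequiredForChild(const DistRequest &from_parent) const {
        if (from_cte_inlining_) {
            // Flag is set: forward the parent's request unchanged.
            return from_parent;
        }
        // Flag is not set: substitute the filter's own request
        // (assumed to be "Any" in this sketch).
        return DistRequest{DistKind::Any, "filter"};
    }

private:
    bool from_cte_inlining_;  // remembers that a CTE was inlined by this filter
};

// Usage: a singleton request survives only the flagged filter.
inline void DemoPassThrough() {
    const DistRequest parent{DistKind::Singleton, "sequence"};
    assert(ToyFilter(true).RequiredForChild(parent).kind == DistKind::Singleton);
    assert(ToyFilter(false).RequiredForChild(parent).kind == DistKind::Any);
}
```

In this model the operators below a flagged filter still see the request made above it, which is the property the description relies on.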

@RekGRpth RekGRpth changed the base branch from adb-6.x-dev to ADBDEV-3539-tmp April 27, 2023 05:37
@RekGRpth RekGRpth changed the title ADBDEV-3539-11: ORCA produces bogus plan for queries with CTE during handling distribution for Sequence children ADBDEV-3539: ORCA produces bogus plan for queries with CTE during handling distribution for Sequence children May 12, 2023
@RekGRpth RekGRpth marked this pull request as ready for review May 12, 2023 05:36
@RekGRpth
Member Author

If this description is clear, I'll update the commit description to match.

@RekGRpth RekGRpth requested a review from a team May 12, 2023 05:37
@RekGRpth
Member Author

After a branch is merged, this branch will need to be rebased onto the dev branch.

@RekGRpth RekGRpth changed the base branch from ADBDEV-3539-tmp to adb-6.x-dev June 14, 2023 10:09
@RekGRpth RekGRpth changed the base branch from adb-6.x-dev to ADBDEV-3539-tmp June 14, 2023 12:13
@RekGRpth RekGRpth changed the base branch from ADBDEV-3539-tmp to adb-6.x-dev June 14, 2023 12:13
@RekGRpth
Member Author

RekGRpth commented Aug 3, 2023

There is a simplified version of the patch. It gives the same results but requires removing two asserts.

@HustonMmmavr

Should this patch fix the problem of the query falling back to the Postgres optimizer, which you found during the review of ASBDEV-3888?

@RekGRpth
Member Author

RekGRpth commented Aug 3, 2023

Should this patch fix the problem of the query falling back to the Postgres optimizer, which you found during the review of ASBDEV-3888?

No, none of the CTEs are inlined by the filter in that query.

if (CDistributionSpec::EdtNonSingleton == pdsRequired->Edt() &&
!CDistributionSpecNonSingleton::PdsConvert(pdsRequired)
->FAllowReplicated())
if (FTrivial())

@HustonMmmavr HustonMmmavr Aug 4, 2023

Is this condition enough to decide that we should return the required distribution?

Member Author

Is this condition enough to decide that we should return the required distribution?

This condition is sufficient to make a decision to pass the requested distribution through the filter.
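
To make the difference between the two checks in the snippet concrete, here is a reduced sketch. It assumes that the `FTrivial()` line stands in place of the three-line `EdtNonSingleton` / `FAllowReplicated()` condition; which line is old and which is new is an assumption here, since the diff markers are not shown. `Kind`, `Request`, `RequestBasedCheck`, `Filter`, `FilterBasedCheck` and `DemoChecks` are hypothetical names and not ORCA's API.

```cpp
#include <cassert>

// Hypothetical, reduced model of a required-distribution request.
enum class Kind { Any, Singleton, NonSingleton };

struct Request {
    Kind kind;
    bool allow_replicated;  // only meaningful for NonSingleton requests
};

// Check that depends on the shape of the incoming request
// (modelled on the three-line condition in the snippet).
inline bool RequestBasedCheck(const Request &req) {
    return req.kind == Kind::NonSingleton && !req.allow_replicated;
}

// Check that depends only on the filter itself (modelled on FTrivial()).
struct Filter {
    bool trivial;
    bool FilterBasedCheck() const { return trivial; }
};

// Usage: once the request has been replaced on its way down, the
// request-based check no longer fires, while the filter-based one
// gives the same answer whatever the request looks like.
inline void DemoChecks() {
    const Request original{Kind::NonSingleton, false};
    const Request replaced{Kind::Any, false};  // what a replacing filter hands down (assumed)
    assert(RequestBasedCheck(original));
    assert(!RequestBasedCheck(replaced));
    assert(Filter{true}.FilterBasedCheck());
}
```

The point of the sketch is only that the second check does not inspect the request at all, so it cannot be defeated by the request having been rewritten earlier.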

@RekGRpth RekGRpth changed the title ADBDEV-3539: ORCA produces bogus plan for queries with CTE during handling distribution for Sequence children ADBDEV-3539: Pass required distribution through trivial filter Aug 8, 2023
@HustonMmmavr

LGTM, but m_trivial seems to need some explanation: either rename the variable or add comments in the code.

@BenderArenadata

Allure report

@BenderArenadata

Allure report https://allure-ee.adsw.io/launch/55880

@BenderArenadata

Failed job Behave tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/677351

@BenderArenadata

Allure report https://allure-ee.adsw.io/launch/57890

@BenderArenadata

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/760167

@BenderArenadata

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/760168

@BenderArenadata

Allure report https://allure-ee.adsw.io/launch/61010

@BenderArenadata

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/924552

@BenderArenadata

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/924553

@BenderArenadata

Allure report https://allure-ee.adsw.io/launch/62823

@BenderArenadata

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1011553

@BenderArenadata

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1011554

@BenderArenadata

Allure report https://allure.adsw.io/launch/69736

@RekGRpth
Member Author

RekGRpth commented Jul 1, 2024

#980

@RekGRpth RekGRpth closed this Jul 1, 2024
@RekGRpth RekGRpth deleted the ADBDEV-3539-11 branch July 1, 2024 06:42