Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADBDEV-3951: Backport of "Implemented InPlaceUpdate to be used for updates made on non-distribution columns" #609

Open
wants to merge 2 commits into
base: adb-6.x-dev
Choose a base branch
from

Conversation

KnightMurloc
Copy link

@KnightMurloc KnightMurloc commented Sep 1, 2023

Implemented InPlaceUpdate to be used for updates made on non-distribution columns.
Currently, ORCA uses Split-Update for updates on both distribution
and non-distribution columns. With this commit,
ORCA uses an InPlaceUpdate whenever updates are made to
non-distribution columns or non-partition keys, and Split Update
if any of modified columns are either distribution or partition keys.
Consider below setup where we are updating
non-distibution column, b in the table foo.

create table foo(a int, b int); explain update foo set b=4;
ORCA produces plan with Split and Update nodes

 Update
   Output: foo.a, foo.b, (DMLAction), foo.ctid
   ->  Split
         Output: foo.a, foo.b, foo.ctid, foo.gp_segment_id, DMLAction
         ->  Result
               Output: foo.a, foo.b, 4, foo.ctid, foo.gp_segment_id
               ->  Seq Scan on public.foo
                     Output: foo.a, foo.b, foo.ctid, foo.gp_segment_id

There is no point in using a Split and Update for this as we are updating a
non-distribution column which do not require any redistribution. This
commit uses an InPlace Update to perform updates on non-distribution
columns like Planner. Below is the new plan produced with this commit.

New Plan

 Update
   Output: foo.a, "outer".b, foo.ctid
   ->  Result
         Output: foo.a, 4, foo.ctid, foo.gp_segment_id
         ->  Seq Scan on public.foo
               Output: foo.a, foo.ctid, foo.gp_segment_id

greenplum 6 specific changes:

  1. Some constructors have been changed because the list of arguments in 6X and 7X
    are different.
  2. fixed a bug in CParseHandlerPhysicalDML::startElement where preserve_oids_xml
    was used instead of fSplit, which could lead to SIGSEGV during DML node parsing.
  3. Changed create_index_hot test. Removed disabling the optimizer before updating
    since ORCA no longer uses split update in this case.
  4. Implement In-Place Update for DML node. Added the exec Update call in the DML
    node. Added partitioning selection in case of In-Place update by sorted table.
    Added a call to reconstructMatchingTupleSlot in execUpdate to update plans
    generated by ORCA to handle cases of dropped attributes.

original commit: https://github.com/greenplum-db/gpdb/commit/3ced85b65732728edff66fd9a2ce6d7485d65a06

@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/53331

@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/53341

@KnightMurloc KnightMurloc force-pushed the ADBDEV-3951 branch 3 times, most recently from 6d7cfac to ddc5dde Compare September 4, 2023 07:18
@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/53456

@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/53506

@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/53536

@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/53552

@RekGRpth
Copy link
Member

RekGRpth commented Sep 6, 2023

If it's a backport, then maybe you should have cherrypicked it to keep the authorship.

@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/60486

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/890515

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/890516

@BenderArenadata
Copy link

Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/890509

@BenderArenadata
Copy link

Failed job Regression tests with ORCA on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/890510

@KnightMurloc KnightMurloc force-pushed the ADBDEV-3951 branch 3 times, most recently from 129e453 to a02d221 Compare December 20, 2023 08:17
@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/60723

@BenderArenadata
Copy link

Failed job Orca unittests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/905014

@BenderArenadata
Copy link

Failed job ORCA code linter on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/905019

@BenderArenadata
Copy link

Failed job Orca unittests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/905013

@RekGRpth
Copy link
Member

RekGRpth commented Dec 20, 2023

implement execUpdate call for DML node
not compute actionColIdx for InPlace Update

If it doesn't apply to the specified backport, then might it be better to make it a separate commit(s) (perhaps a backport(s) of some other commit(s))?

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/905017

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/905018

@KnightMurloc
Copy link
Author

implement execUpdate call for DML node
not compute actionColIdx for InPlace Update

If it doesn't apply to the specified backport, then might it be better to make it a separate commit(s) (perhaps a backport(s) of some other commit(s))?

These changes are necessary for the patch to work. GP7 uses the Modify Table node instead of DML. I'm not sure if we want to port these changes.

@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/61589

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/958078

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/958079

@BenderArenadata
Copy link

Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/958072

@RekGRpth

This comment was marked as resolved.

@BenderArenadata
Copy link

Failed job Regression tests with ORCA on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/958073

@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/61705

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/962859

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/962861

KnightMurloc and others added 2 commits January 15, 2024 12:28
Added the exec Update call in the DML node. Added partitioning selection in case
of In-Place update by sorted table. added a call to reconstructMatchingTupleSlot
in execUpdate to update plans generated by ORCA to handle cases of dropped
attributes.
…tion columns.

Currently, ORCA uses Split-Update for updates on both distribution
and non-distribution columns. With this commit,
ORCA uses an InPlaceUpdate whenever updates are made to
non-distribution columns or non-partition keys, and Split Update
if any of modified columns are either distribution or partition keys.
Consider below setup where we are updating
non-distibution column, b in the table foo.

`
create table foo(a int, b int);
explain update foo set b=4;
`
ORCA produces plan with Split and Update nodes

```
Update on public.foo
   ->  Result
         Output: foo_1.a, foo_1.b, (DMLAction), foo_1.ctid, foo_1.gp_segment_id
         ->  Split
               Output: foo_1.a, foo_1.b, foo_1.ctid, foo_1.gp_segment_id, DMLAction
               ->  Seq Scan on public.foo foo_1
                     Output: foo_1.a, foo_1.b, 4, foo_1.ctid, foo_1.gp_segment_id
```

There is no point in using a Split and Update for this as we are updating a
non-distribution column which do not require any redistribution. This
commit uses an InPlace Update to perform updates on non-distribution
columns like Planner. Below is the new plan produced with this commit.

New Plan
```
Update on public.foo
   ->  Seq Scan on public.foo foo_1
         Output: foo_1.a, 4, foo_1.ctid, foo_1.gp_segment_id
 Optimizer: Pivotal Optimizer (GPORCA)
```

greenplum 6 specific changes:
1. Some constructors have been changed because the list of arguments in 6X and 7X
   are different.
2. fixed a bug in CParseHandlerPhysicalDML::startElement where preserve_oids_xml
   was used instead of fSplit, which could lead to SIGSEGV during DML node parsing.
3. Changed create_index_hot test. Removed disabling the optimizer before updating
   since ORCA no longer uses split update in this case.

(cherry picked from commit 3ced85b)
@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/61708

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/962989

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/962990

@BenderArenadata
Copy link

Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/962981

/*
* Perform partition selection for in place update
*/
if (isUpdate && !AttributeNumberIsValid(plannode->actionColIdx))
Copy link
Member

@bimboterminator1 bimboterminator1 Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to perform partition selection in case of updating leaf partition. Do we?

* Perform partition selection for in place update
*/
if (isUpdate && !AttributeNumberIsValid(plannode->actionColIdx))
node->ps.state->es_result_relation_info =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the tests check this code branch?

checkPartitionUpdate(estate, slot, resultRelInfo);

if (planGen == PLANGEN_OPTIMIZER)
slot = reconstructMatchingTupleSlot(slot, resultRelInfo);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the tests check this code branch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants