ADBDEV-6599: Make gprestore --resize-cluster use --jobs for parallel restoration #110

RekGRpth · 2024-10-29T09:42:29Z

Make gprestore --resize-cluster use --jobs for parallel restoration

Users consider the current behavior incorrect: the --jobs parameter is honored
when a backup is created, but it is not actually used during restoration.
Restoration runs in a single instance of the gpbackup_helper agent (in
--restore-agent mode).

In short, when gprestore is run with the --resize-cluster option on a backup
that was created in parallel mode (--jobs), it does not restore in parallel.
Restoration goes through gpbackup_helper, which is launched as a single process
on each segment. Each helper therefore processes one table at a time, following
the list of table OIDs passed to it at launch.

As a solution to the problem, this patch implements the launch of several
gpbackup_helper instances based on the value passed in the --jobs argument.

To do this:

1. The general list of table OIDs for restoration is distributed among several
   gpbackup_helper instances according to the value of the --jobs parameter.
   The list is distributed among the goroutines of the main process in the same
   way.
2. Since the general list of table OIDs for restoration is sorted by size, the
   OIDs are dealt out sequentially, one table OID to each instance and
   goroutine in turn. The load is thus expected to be more uniform. All batches
   corresponding to the same table OID go to the same gpbackup_helper instance
   and the same goroutine.
3. An integer "instance number" is added to the name of the file that holds the
   list of table OIDs for each gpbackup_helper instance. They are not logically
   related in any way; the main thing is that each instance receives its own
   list as an argument when gpbackup_helper starts.
4. The instance number is also added to the name of the script file
   (scriptFile).
5. The instance number is also added to the names of the pipe file (pipeFile),
   the error file, and the skip file.
6. To create the initial list of pipes, one pipe is created for each
   gpbackup_helper instance, for the first table OID (in order of decreasing
   table size) that the instance will process. Thus, by the time each instance
   starts, the file system already has the pipe for that instance's first
   table OID.
7. When starting gpbackup_helper, the value of the --jobs parameter (in fact,
   the number of connections in the pool) is taken into account: the number of
   launched instances corresponds to that value. Each instance receives as
   arguments the path of its first pipe (pipeFile), the path to the agent
   startup script (scriptFile), and the path to its own list of table OIDs.
8. When deleting auxiliary files in DoCleanup, the files with the lists of
   table OIDs and the startup script files (scriptFile) are deleted.
9. For --single-data-file and --copy-queue-size modes, the current behavior
   remains unchanged.

For more convenient management, all auxiliary files have been renamed. The
process pid is placed before the suffix. The word pipe has been removed from
the names of error and skip files.

New tests have been added and old ones have been adapted.

It is easier to view the changes with the "Hide whitespace" option enabled.

restore/data.go

silent-observer · 2024-10-30T05:53:10Z

The new test has failed, please look into this

filepath/filepath.go

helper/helper.go

filepath/filepath.go

RekGRpth · 2024-12-06T07:55:35Z

Shouldn't this error file name be updated as well?

yes, added

RekGRpth added 15 commits October 1, 2024 11:50

Implement MVP to paralellize gprestore

ca1c1db

fix test

e8b6c1a

fix backup and tests

fffe562

optimize

3c0a096

test

ee14bef

fix

733e6a6

fix

4f8a3ac

simplify

971d5ea

simplify

873b43b

rename

f1e5832

rename

6b56776

rename

70940ca

format

6f575f6

Merge branch 'master' into ADBDEV-6338

b164a27

Merge branch 'master' into ADBDEV-6599

b6b1705

RekGRpth mentioned this pull request Oct 29, 2024

ADBDEV-6338: Implement MVP to paralellize gprestore #106

Closed

RekGRpth changed the title ~~ADBDEV-6599: Parallelize of resize restore~~ ADBDEV-6599: Make gprestore --resize-cluster use --jobs for parallel restoration Oct 29, 2024

test

2476483

RekGRpth marked this pull request as ready for review October 29, 2024 10:13

silent-observer reviewed Oct 29, 2024

restore/data.go (resolved)

comment

39cf460

silent-observer previously approved these changes Oct 29, 2024

whitehawk reviewed Oct 30, 2024

restore/data.go (outdated, resolved)

optimize

defe56a

RekGRpth dismissed silent-observer’s stale review via defe56a October 30, 2024 04:23

whitehawk previously approved these changes Oct 30, 2024

stabelize tests

163d2b4

RekGRpth dismissed whitehawk’s stale review via 163d2b4 October 30, 2024 11:39

RekGRpth added 15 commits December 4, 2024 15:51

fix

0ac6864

rename

3988ff5

fix tests

0a18bad

error

ba750c4

fix

1caa858

fix test

d8a1ebe

fix test

8de1a68

fix test

39a7a3a

optimize

c91309b

simplify

91484d2

fix test

f519803

skip

4a11b18

skip

f590f04

error

62a0bd6

test

b9543d8

RekGRpth marked this pull request as ready for review December 5, 2024 05:30

RekGRpth added 3 commits December 5, 2024 13:44

Merge branch 'master' into ADBDEV-6599

5e4a44c

max helpers

8dd19ca

fix

f94440f

whitehawk reviewed Dec 6, 2024

filepath/filepath.go (resolved)

helper/helper.go (outdated, resolved)

filepath/filepath.go (outdated, resolved)

RekGRpth added 3 commits December 6, 2024 10:26

fatal when ivalid

6956a93

optimize

fa6222e

replace last only

04d77da

fix

9a4a45c

silent-observer approved these changes Dec 6, 2024

whitehawk approved these changes Dec 6, 2024

RekGRpth merged commit 43624bc into master Dec 6, 2024
2 checks passed

RekGRpth deleted the ADBDEV-6599 branch December 6, 2024 14:25
