diff --git a/.abi-check/.images/download-report-from-gh-action.png b/.abi-check/.images/download-report-from-gh-action.png new file mode 100644 index 000000000000..7c89c8af1729 Binary files /dev/null and b/.abi-check/.images/download-report-from-gh-action.png differ diff --git a/.abi-check/6.25.3/postgres.symbols.ignore b/.abi-check/6.25.3/postgres.symbols.ignore new file mode 100644 index 000000000000..de1b4294eed9 --- /dev/null +++ b/.abi-check/6.25.3/postgres.symbols.ignore @@ -0,0 +1,2 @@ +DummySymbol +ConfigureNamesInt_gp diff --git a/.abi-check/6.25.3/postgres.types.ignore b/.abi-check/6.25.3/postgres.types.ignore new file mode 100644 index 000000000000..7dd4f899ba78 --- /dev/null +++ b/.abi-check/6.25.3/postgres.types.ignore @@ -0,0 +1 @@ +DummyType diff --git a/.abi-check/README.md b/.abi-check/README.md new file mode 100644 index 000000000000..eb82783c7e1d --- /dev/null +++ b/.abi-check/README.md @@ -0,0 +1,74 @@ +# Check the compatibility of the Greenplum ABI + +## Introduction + +We use [`abi-dumper`](https://github.com/lvc/abi-dumper) and [`abi-compliance-checker`](https://github.com/lvc/abi-compliance-checker/) to check Greenplum's ABI. A [GitHub Actions workflow](../.github/workflows/greenplum-abi-tests.yml) automates this job. + +## Requirements + +`abi-dumper` requires the binary to be compiled with `-Og -g3`, so the `CFLAGS` passed to `configure` look like: + +```bash +## GCC's maybe-uninitialized checker may produce false positives at different +## optimization levels. To prevent build failures, we also append '-Wno-maybe-uninitialized' +## to $CFLAGS. +CFLAGS='-Og -g3 -Wno-maybe-uninitialized' ./configure --with-xxx --with-yyy --with-zzz +``` + +## Check the ABI's compatibility + +Greenplum ships several binaries, e.g., `$GPHOME/bin/postgres`, `$GPHOME/lib/libpq.so`, etc. Since the `postgres` binary is referenced by many extensions, its ABI compatibility is the most important.
The following steps illustrate how to check the ABI compatibility of the `postgres` binary. + +1. Dump the ABI information of one `postgres` binary. + ``` + abi-dumper $GPHOME/bin/postgres -lver -o + ``` + - ``: The version of the binary. Any meaningful name works, e.g., `6.25.3` to indicate that the binary is built from the `6.25.3` tag. + - ``: The file path for dumping the ABI information, e.g., `greenplum-6.25.3.dump`. + +2. Dump the ABI information of another `postgres` binary (same as step 1). + +3. Compare the ABI between these two binaries with `abi-compliance-checker`. + ``` + abi-compliance-checker \ + -lib \ + -old \ + -new + ``` + - ``: The name of the library, e.g., `postgres`. + +4. By default, `abi-compliance-checker` produces an HTML report with detailed information about the ABI changes. + +## Ignore "safe ABI breaking changes" + +Some ABI breaking changes are safe, e.g., a symbol that is removed but not referenced by any extension or program. To suppress such errors: + +1. Add the symbols to ignore to `gpdb_src/.abi-check//postgres.symbols.ignore` (one symbol per line). + - ``: The baseline version of Greenplum. For example, to ensure the ABI isn't broken between the `6.25.3` release and the latest `6X_STABLE`, the baseline version is `6.25.3`. See: [./6.25.3/postgres.symbols.ignore](./6.25.3/postgres.symbols.ignore) + +2. Add the types to ignore to `gpdb_src/.abi-check//postgres.types.ignore` (one type per line). + - ``: The baseline version of Greenplum. For example, to ensure the ABI isn't broken between the `6.25.3` release and the latest `6X_STABLE`, the baseline version is `6.25.3`. See: [./6.25.3/postgres.types.ignore](./6.25.3/postgres.types.ignore) + +3. Pass these two files to `abi-compliance-checker`, which produces an HTML report.
+ ``` + abi-compliance-checker -skip-symbols gpdb_src/.abi-check//postgres.symbols.ignore \ + -skip-types gpdb_src/.abi-check//postgres.types.ignore \ + -lib postgres \ + -old greenplum-.dump \ + -new greenplum-new.dump + ``` + This produces an ABI report in `./compat_reports/postgres/X_to_Y/compat_report.html`. + +## View the ABI compatibility report + +### View the report locally + +You can either open the HTML report in your browser or dump it to stdout using `lynx -dump compat_reports/postgres/X_to_Y/compat_report.html`. + +### View the report from GitHub Actions + +1. Navigate to the "Summary" page of the test. +2. Click the report and download it. +3. View the report as described above. + +![./.images/download-report-from-gh-action.png](./.images/download-report-from-gh-action.png) diff --git a/.github/workflows/greenplum-abi-tests.yml b/.github/workflows/greenplum-abi-tests.yml new file mode 100644 index 000000000000..7bde532b21a5 --- /dev/null +++ b/.github/workflows/greenplum-abi-tests.yml @@ -0,0 +1,178 @@ +name: Greenplum ABI Tests + +on: + workflow_dispatch: + pull_request: + paths: + - 'concourse/scripts/**' + - 'src/**' + - '.github/workflows/**' + - '.github/scripts/**' + - '.abi-check/**' + + push: + branches: + - 6X_STABLE + paths: + - 'concourse/scripts/**' + - 'src/**' + - '.github/workflows/**' + - '.github/scripts/**' + - '.abi-check/**' + +jobs: + abi-dump-setup: + runs-on: ubuntu-latest + outputs: + BASELINE_REF: ${{ steps.vars.outputs.BASELINE_REF }} + BASELINE_VERSION: ${{ steps.vars.outputs.BASELINE_VERSION }} + ABI_LIBS: ${{ steps.vars.outputs.ABI_LIBS }} + ABI_HEADERS: ${{ steps.vars.outputs.ABI_HEADERS }} + steps: + - name: Fetch source + uses: actions/checkout@v3 + + - name: Get Greenplum version variables + id: vars + run: | + remote_repo='https://github.com/greenplum-db/gpdb.git' + git ls-remote --tags --refs --sort='v:refname' $remote_repo '6.*' | tail -n 1 > baseline_version_ref + baseline_ref=$(cat baseline_version_ref | awk '{print $1}') +
baseline_version=$(cat baseline_version_ref | awk '{print $2}') + echo "BASELINE_REF=${baseline_ref}" | tee -a $GITHUB_OUTPUT + echo "BASELINE_VERSION=${baseline_version#'refs/tags/'}" | tee -a $GITHUB_OUTPUT + echo "ABI_LIBS=postgres" | tee -a $GITHUB_OUTPUT + echo "ABI_HEADERS=." | tee -a $GITHUB_OUTPUT + + - name: Upload symbol/type checking exception list + uses: actions/upload-artifact@v3 + with: + name: exception_lists + path: '.abi-check/${{ steps.vars.outputs.BASELINE_VERSION }}/' + + abi-dump: + needs: abi-dump-setup + runs-on: ubuntu-latest + container: gcr.io/data-gpdb-public-images/gpdb6-rocky8-build + strategy: + matrix: + name: + - build-baseline + - build-latest + include: + - name: build-baseline + repo: greenplum-db/gpdb + ref: ${{ needs.abi-dump-setup.outputs.BASELINE_VERSION }} + - name: build-latest + repo: ${{ github.repository }} + ref: ${{ github.sha }} + + steps: + ## FIXME: abi-dumper requires 'Universal Ctags' but the package manager only provides + ## 'Exuberant Ctags'. + - name: Install universal-ctags. + run: | + wget 'https://github.com/universal-ctags/ctags-nightly-build/releases/download/2023.07.05%2Bafdae39c0c2e508d113cbc570f4635b96159840c/uctags-2023.07.05-linux-x86_64.tar.xz' + tar -xf uctags-2023.07.05-linux-x86_64.tar.xz + cp uctags-2023.07.05-linux-x86_64/bin/* /usr/bin/ + which ctags + + - name: Download Greenplum source code + uses: actions/checkout@v3 + with: + repository: ${{ matrix.repo }} + ref: ${{ matrix.ref }} + submodules: recursive + fetch-depth: 0 # Specify '0' to fetch all history for all branches and tags. + path: gpdb_src + + - name: Install abi-dumper + run: | + yum install -y epel-release + yum install -y abi-dumper + + - name: Build Greenplum + run: | + ## TODO: Since abi-dumper requires debug info and it's hard to inject CFLAGS via the script for + ## releasing Greenplum, we have to manually configure it here. Probably we can improve it in future. 
+ export PATH=/opt/python-3.9.13/bin:/opt/python-2.7.18/bin:$PATH + pushd gpdb_src + CC='gcc -m64' \ + CFLAGS='-Og -g3 -Wno-maybe-uninitialized' LDFLAGS='-Wl,--enable-new-dtags -Wl,--export-dynamic' \ + ./configure --with-quicklz --disable-gpperfmon --with-gssapi --enable-mapreduce --enable-orafce --enable-ic-proxy \ + --enable-orca --with-libxml --with-pythonsrc-ext --with-uuid=e2fs --with-pgport=5432 --enable-tap-tests \ + --enable-debug-extensions --with-perl --with-python --with-openssl --with-pam --with-ldap --with-includes="" \ + --with-libraries="" --disable-rpath \ + --prefix=/usr/local/greenplum-db-devel \ + --mandir=/usr/local/greenplum-db-devel/man + make -j`nproc` && make install + + - name: Dump ABI + run: | + abi-dumper -lver ${{ matrix.ref }} -skip-cxx -public-headers /usr/local/greenplum-db-devel/include/${{ needs.abi-dump-setup.outputs.ABI_HEADERS }} -o postgres-${{ matrix.ref }}.abi /usr/local/greenplum-db-devel/bin/postgres + + - name: Upload ABI files + uses: actions/upload-artifact@v3 + with: + name: ${{ matrix.name }} + path: '*${{ matrix.ref }}.abi' + + abi-compare: + needs: + - abi-dump-setup + - abi-dump + runs-on: ubuntu-latest + container: gcr.io/data-gpdb-public-images/gpdb6-rocky8-build + steps: + - name: Download baseline + uses: actions/download-artifact@v3 + with: + name: build-baseline + path: build-baseline/ + - name: Download latest + uses: actions/download-artifact@v3 + with: + name: build-latest + path: build-latest/ + + - name: Download exception lists + uses: actions/download-artifact@v3 + with: + name: exception_lists + path: exception_lists/ + + - name: Install abi-compliance-checker and report viewer (lynx) + run: | + yum install -y epel-release + yum install -y abi-compliance-checker + yum install -y --enablerepo=powertools lynx + + - name: Compare ABI + run: | + SKIP_POSTGRES_SYMBOLS_LIST="exception_lists/postgres.symbols.ignore" + SKIP_POSTGRES_SYMBOLS_OPTION="" + if [[ -f "$SKIP_POSTGRES_SYMBOLS_LIST" ]]; then + 
SKIP_POSTGRES_SYMBOLS_OPTION="-skip-symbols ${SKIP_POSTGRES_SYMBOLS_LIST}" + fi + SKIP_POSTGRES_TYPES_LIST="exception_lists/postgres.types.ignore" + SKIP_POSTGRES_TYPES_OPTION="" + if [[ -f "$SKIP_POSTGRES_TYPES_LIST" ]]; then + SKIP_POSTGRES_TYPES_OPTION="-skip-types ${SKIP_POSTGRES_TYPES_LIST}" + fi + abi-compliance-checker ${SKIP_POSTGRES_SYMBOLS_OPTION} \ + ${SKIP_POSTGRES_TYPES_OPTION} \ + -lib postgres \ + -old build-baseline/postgres*.abi \ + -new build-latest/postgres*.abi + + - name: Print out ABI report + if: always() + run: | + lynx -dump $(find compat_reports/ | grep html) + + - name: Upload ABI Comparison + if: always() + uses: actions/upload-artifact@v3 + with: + name: compat-report-${{ github.sha }} + path: compat_reports/ diff --git a/arenadata/Dockerfile b/arenadata/Dockerfile index ee0fb40d6a8c..eab478296971 100644 --- a/arenadata/Dockerfile +++ b/arenadata/Dockerfile @@ -57,7 +57,8 @@ RUN yum -y install centos-release-scl && \ echo -e 'source /opt/rh/devtoolset-7/enable' >> /opt/gcc_env.sh && \ echo -e '#!/bin/sh' >> /etc/profile.d/jdk_home.sh && \ echo -e 'export JAVA_HOME=/etc/alternatives/java_sdk' >> /etc/profile.d/jdk_home.sh && \ - echo -e 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile.d/jdk_home.sh + echo -e 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile.d/jdk_home.sh && \ + echo -e 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf RUN rpm -i $sigar && rpm -i $sigar_headers diff --git a/concourse/pipelines/gpdb_6X_STABLE-generated.yml b/concourse/pipelines/gpdb_6X_STABLE-generated.yml index ba6ddeedd009..125c53216214 100644 --- a/concourse/pipelines/gpdb_6X_STABLE-generated.yml +++ b/concourse/pipelines/gpdb_6X_STABLE-generated.yml @@ -12,7 +12,7 @@ ## file (example: templates/gpdb-tpl.yml) and regenerate the pipeline ## using appropriate tool (example: gen_pipeline.py -t prod). 
## ---------------------------------------------------------------------- -## Generated by gen_pipeline.py at: 2023-09-06 12:50:28.297086 +## Generated by gen_pipeline.py at: 2023-10-31 10:46:55.243925 ## Template file: gpdb-tpl.yml ## OS Types: ## Test Sections: ['icw', 'cli', 'aa', 'release'] @@ -1348,7 +1348,7 @@ jobs: <<: *ccp_default_params vars: <<: *ccp_default_vars - instance_type: n1-standard-4 + instance_type: n1-highmem-4 number_of_nodes: 2 - task: gen_cluster file: ccp_src/ci/tasks/gen_cluster.yml @@ -1436,7 +1436,6 @@ jobs: - unit_tests_gporca_rocky8 - gpdb_pitr_rocky8 - interconnect_rocky8 - - icw_extensions_gpcloud_rocky8 - gpexpand_rocky8 - pg_upgrade_rocky8 - get: gpdb_src @@ -1454,7 +1453,6 @@ jobs: - unit_tests_gporca_rocky8 - gpdb_pitr_rocky8 - interconnect_rocky8 - - icw_extensions_gpcloud_rocky8 - gpexpand_rocky8 - pg_upgrade_rocky8 trigger: true diff --git a/concourse/pipelines/templates/gpdb-tpl.yml b/concourse/pipelines/templates/gpdb-tpl.yml index 85e769d28c5c..d0eabee5f4b4 100644 --- a/concourse/pipelines/templates/gpdb-tpl.yml +++ b/concourse/pipelines/templates/gpdb-tpl.yml @@ -849,8 +849,8 @@ resources: type: time source: location: America/Los_Angeles - start: ((reduced-frequency-trigger-start-[[ os_type ]])) - stop: ((reduced-frequency-trigger-stop-[[ os_type ]])) + start: (("reduced-frequency-trigger-start-[[ os_type ]]")) + stop: (("reduced-frequency-trigger-stop-[[ os_type ]]")) {% if os_type != "centos7" %} days: [Monday] {% else %} @@ -1718,7 +1718,7 @@ jobs: <<: *ccp_default_params vars: <<: *ccp_default_vars - instance_type: n1-standard-4 + instance_type: n1-highmem-4 number_of_nodes: 2 - task: gen_cluster file: ccp_src/ci/tasks/gen_cluster.yml @@ -1811,7 +1811,6 @@ jobs: - unit_tests_gporca_[[ os_type ]] - gpdb_pitr_[[ os_type ]] - interconnect_[[ os_type ]] - - icw_extensions_gpcloud_[[ os_type ]] - gpexpand_[[ os_type ]] - pg_upgrade_[[ os_type ]] - get: gpdb_src @@ -1835,7 +1834,6 @@ jobs: - unit_tests_gporca_[[ os_type ]] 
- gpdb_pitr_[[ os_type ]] - interconnect_[[ os_type ]] - - icw_extensions_gpcloud_[[ os_type ]] - gpexpand_[[ os_type ]] - pg_upgrade_[[ os_type ]] trigger: true diff --git a/concourse/scripts/verify_gpdb_versions.bash b/concourse/scripts/verify_gpdb_versions.bash index a79b579f94b4..5955f2b8adf5 100755 --- a/concourse/scripts/verify_gpdb_versions.bash +++ b/concourse/scripts/verify_gpdb_versions.bash @@ -20,8 +20,6 @@ assert_postgres_version_matches() { fi } -yum -d0 -y install git - GREENPLUM_INSTALL_DIR=/usr/local/greenplum-db-devel GPDB_SRC_SHA=$(cd gpdb_src && git rev-parse HEAD) diff --git a/concourse/tasks/verify_gpdb_versions.yml b/concourse/tasks/verify_gpdb_versions.yml index e5b013e8e1f1..11016bea0053 100644 --- a/concourse/tasks/verify_gpdb_versions.yml +++ b/concourse/tasks/verify_gpdb_versions.yml @@ -4,8 +4,8 @@ platform: linux image_resource: type: registry-image source: - repository: centos - tag: 7 + repository: gcr.io/data-gpdb-public-images/gpdb6-rocky8-build + tag: latest inputs: - name: gpdb_src diff --git a/gpAux/Makefile b/gpAux/Makefile index 350007f47e11..7cacd61a1d87 100644 --- a/gpAux/Makefile +++ b/gpAux/Makefile @@ -604,8 +604,8 @@ copylibs : echo "INFO: Python not found on this platform, $(BLD_ARCH), not copying it into the GPDB package."; \ fi # Create the python3.9 directory to flag to build scripts that python has been handled - mkdir -p $(INSTLOC)/ext/python3.9 @if [ ! -z "$(PYTHONHOME39)" ]; then \ + mkdir -p $(INSTLOC)/ext/python3.9; \ echo "Copying python3.9, ., from $(PYTHONHOME39) into $(INSTLOC)/ext/python3.9..."; \ (cd $(PYTHONHOME39) && tar cf - .) 
| (cd $(INSTLOC)/ext/python3.9/ && tar xpf -); \ echo "...DONE"; \ diff --git a/gpAux/extensions/pgbouncer/source b/gpAux/extensions/pgbouncer/source index 331c06ed27a8..cbbdde1aa631 160000 --- a/gpAux/extensions/pgbouncer/source +++ b/gpAux/extensions/pgbouncer/source @@ -1 +1 @@ -Subproject commit 331c06ed27a89fd0d460552713b852b8b6bc9d3d +Subproject commit cbbdde1aa631256294336da5a05f4c8519b1c964 diff --git a/gpMgmt/bin/Makefile b/gpMgmt/bin/Makefile index 5540bd6efe46..ced907b69490 100644 --- a/gpMgmt/bin/Makefile +++ b/gpMgmt/bin/Makefile @@ -7,7 +7,7 @@ ifneq "$(wildcard $(top_builddir)/src/Makefile.global)" "" include $(top_builddir)/src/Makefile.global endif -SUBDIRS = stream gpcheckcat_modules gpconfig_modules gpssh_modules gppylib lib +SUBDIRS = stream gpcheckcat_modules gpconfig_modules gpssh_modules gppylib lib el8_migrate_locale SUBDIRS += ifaddrs $(recurse) diff --git a/gpMgmt/bin/analyzedb b/gpMgmt/bin/analyzedb index 800d79e4d3f9..ac110ea6035e 100755 --- a/gpMgmt/bin/analyzedb +++ b/gpMgmt/bin/analyzedb @@ -58,7 +58,7 @@ WHERE pp.paristemplate = false AND pp.parrelid = cl.oid AND pr1.paroid = pp.oid GET_ALL_DATA_TABLES_SQL = """ select n.nspname as schemaname, c.relname as tablename from pg_class c, pg_namespace n where -c.relnamespace = n.oid and c.relkind='r'::char and (c.relnamespace >= 16384 or n.nspname = 'public' or n.nspname = 'pg_catalog') and c.oid not in (select reloid from pg_exttable) +c.relnamespace = n.oid and (c.relkind='r'::char or c.relkind = 'm'::char) and (c.relnamespace >= 16384 or n.nspname = 'public' or n.nspname = 'pg_catalog') and c.oid not in (select reloid from pg_exttable) EXCEPT select distinct schemaname, tablename from (%s) AS pps1 EXCEPT @@ -67,7 +67,7 @@ select distinct partitionschemaname, parentpartitiontablename from (%s) AS pps2 GET_VALID_DATA_TABLES_SQL = """ select n.nspname as schemaname, c.relname as tablename from pg_class c, pg_namespace n where -c.relnamespace = n.oid and c.oid in (%s) and 
c.relkind='r'::char and (c.relnamespace >= 16384 or n.nspname = 'public' or n.nspname = 'pg_catalog') and c.oid not in (select reloid from pg_exttable) +c.relnamespace = n.oid and c.oid in (%s) and (c.relkind='r'::char or c.relkind = 'm'::char) and (c.relnamespace >= 16384 or n.nspname = 'public' or n.nspname = 'pg_catalog') and c.oid not in (select reloid from pg_exttable) """ GET_REQUESTED_AO_DATA_TABLE_INFO_SQL = """ @@ -91,7 +91,7 @@ GET_REQUESTED_LAST_OP_INFO_SQL = """ GET_ALL_DATA_TABLES_IN_SCHEMA_SQL = """ select n.nspname as schemaname, c.relname as tablename from pg_class c, pg_namespace n where -c.relnamespace = n.oid and c.relkind='r'::char and (c.relnamespace >= 16384 or n.nspname = 'public' or n.nspname = 'pg_catalog') and c.oid not in (select reloid from pg_exttable) +c.relnamespace = n.oid and (c.relkind='r'::char or c.relkind = 'm'::char) and (c.relnamespace >= 16384 or n.nspname = 'public' or n.nspname = 'pg_catalog') and c.oid not in (select reloid from pg_exttable) and n.nspname = '%s' EXCEPT select distinct schemaname, tablename from (%s) AS pps1 @@ -112,7 +112,7 @@ select distinct partitionschemaname, parentpartitiontablename from (%s) AS pps1 GET_REQUESTED_NON_AO_TABLES_SQL = """ select n.nspname as schemaname, c.relname as tablename from pg_class c, pg_namespace n where -c.relnamespace = n.oid and c.relkind='r'::char and (c.relnamespace >= 16384 or n.nspname = 'public' or n.nspname = 'pg_catalog') +c.relnamespace = n.oid and (c.relkind='r'::char or c.relkind = 'm'::char) and (c.relnamespace >= 16384 or n.nspname = 'public' or n.nspname = 'pg_catalog') and c.oid not in (select relid from pg_appendonly) and c.oid in (%s) and c.oid not in (select reloid from pg_exttable) EXCEPT select distinct schemaname, tablename from (%s) AS pps1 @@ -565,7 +565,7 @@ class AnalyzeDb(Operation): At the same time, parse the requested columns and populate the col_dict. If a requested table is partitioned, expand all the leaf partitions. 
""" - logger.info("Getting and verifying input tables...") + logger.info("Getting and verifying input tables and materialized views...") if self.single_table: # Check that the table name given on the command line is schema-qualified. diff --git a/gpMgmt/bin/el8_migrate_locale/Makefile b/gpMgmt/bin/el8_migrate_locale/Makefile new file mode 100644 index 000000000000..a5136f05c68c --- /dev/null +++ b/gpMgmt/bin/el8_migrate_locale/Makefile @@ -0,0 +1,17 @@ +# gpMgmt/bin/el8_migrate_locale/Makefile + +top_builddir = ../../.. +include $(top_builddir)/src/Makefile.global + +installdirs: + $(MKDIR_P) '$(DESTDIR)$(bindir)/el8_migrate_locale' + +install: installdirs + $(INSTALL_SCRIPT) el8_migrate_locale.py '$(DESTDIR)$(bindir)/el8_migrate_locale/'; + $(INSTALL_SCRIPT) README.md '$(DESTDIR)$(bindir)/el8_migrate_locale/'; + +uninstall: + rm -rf '$(DESTDIR)$(bindir)/el8_migrate_locale/'; + +clean distclean: + rm -f *.pyc diff --git a/gpMgmt/bin/el8_migrate_locale/README.md b/gpMgmt/bin/el8_migrate_locale/README.md new file mode 100644 index 000000000000..844203bf6baa --- /dev/null +++ b/gpMgmt/bin/el8_migrate_locale/README.md @@ -0,0 +1,213 @@ +1. use `python el8_migrate_locale.py precheck-index` to list affected indexes. +2. use `python el8_migrate_locale.py precheck-table` to list affected partitioned tables. +3. use `python el8_migrate_locale.py migrate` to run the reindex and alter partition table commands. + +(Note: For easier reading, some example output is omitted with ellipses.) + +``` +$ python el8_migrate_locale.py --help +usage: el8_migrate_locale [-h] [--host HOST] [--port PORT] + [--dbname DBNAME] [--user USER] + {precheck-index,precheck-table,migrate} ... 
+ +positional arguments: + {precheck-index,precheck-table,migrate} + sub-command help + precheck-index list affected index + precheck-table list affected tables + migrate run the reindex and the rebuild partition commands + +optional arguments: + -h, --help show this help message and exit + --host HOST Greenplum Database hostname + --port PORT Greenplum Database port + --dbname DBNAME Greenplum Database database name + --user USER Greenplum Database user name +``` +``` +$ python el8_migrate_locale.py precheck-index --help +usage: el8_migrate_locale precheck-index [-h] --out OUT + +optional arguments: + -h, --help show this help message and exit + +required arguments: + --out OUT outfile path for the reindex commands + +Example usage: + +$ python el8_migrate_locale.py precheck-index --out index.out +2023-10-18 11:04:13,944 - INFO - There are 2 catalog indexes that needs reindex when doing OS upgrade from EL7->EL8. +2023-10-18 11:04:14,001 - INFO - There are 7 user indexes in database test that needs reindex when doing OS upgrade from EL7->EL8. + +$ cat index.out +\c postgres +-- catalog indexrelid: 3597 | index name: pg_seclabel_object_index | table name: pg_seclabel | collname: default | indexdef: CREATE UNIQUE INDEX pg_seclabel_object_index ON pg_catalog.pg_seclabel USING btree (objoid, classoid, objsubid, provider) +reindex index pg_seclabel_object_index; + +-- catalog indexrelid: 3593 | index name: pg_shseclabel_object_index | table name: pg_shseclabel | collname: default | indexdef: CREATE UNIQUE INDEX pg_shseclabel_object_index ON pg_catalog.pg_shseclabel USING btree (objoid, classoid, provider) +reindex index pg_shseclabel_object_index; + +\c test +-- indexrelid: 16512 | index name: testupgrade.hash_idx1 | table name: testupgrade.hash_test1 | collname: default | indexdef: CREATE INDEX hash_idx1 ON testupgrade.hash_test1 USING btree (content) +reindex index testupgrade.hash_idx1; +... 
+``` +``` +$ python el8_migrate_locale.py precheck-table --help +usage: el8_migrate_locale precheck-table [-h] --out OUT [--pre_upgrade] + [--order_size_ascend] + [--nthread NTHREAD] + +optional arguments: + -h, --help show this help message and exit + --pre_upgrade check tables before os upgrade to EL8 + --order_size_ascend sort the tables by size in ascending order + --nthread NTHREAD the concurrent threads to check partition tables + +Note: the `--pre_upgrade` option is used in step 1, before the OS upgrade; it prints all the potentially affected partitioned tables. + +Example usage (check before the OS upgrade): +$ python el8_migrate_locale.py precheck-table --pre_upgrade --out table_pre_upgrade.out +2023-10-18 08:04:06,907 - INFO - There are 6 partitioned tables in database testupgrade that should be checked when doing OS upgrade from EL7->EL8. +2023-10-18 08:04:06,947 - WARNING - no default partition for testupgrade.partition_range_test_3 +2023-10-18 08:04:06,984 - WARNING - no default partition for testupgrade.partition_range_test_ao +2023-10-18 08:04:07,021 - WARNING - no default partition for testupgrade.partition_range_test_2 +2023-10-18 08:04:07,100 - WARNING - no default partition for testupgrade.root +--------------------------------------------- +total partition tables size : 416 KB +total partition tables : 6 +total leaf partitions : 19 +--------------------------------------------- + +Example usage (check after the OS upgrade): +$ python el8_migrate_locale.py precheck-table --out table.out +2023-10-16 04:12:19,064 - WARNING - There are 2 tables in database test that the distribution key is using custom operator class, should be checked when doing OS upgrade from EL7->EL8.
+--------------------------------------------- +tablename | distclass +('testdiskey', 16397) +('testupgrade.test_citext', 16454) +--------------------------------------------- +2023-10-16 04:12:19,064 - INFO - There are 6 partitioned tables in database testupgrade that should be checked when doing OS upgrade from EL7->EL8. +2023-10-16 04:12:19,066 - INFO - worker[0]: begin: +2023-10-16 04:12:19,066 - INFO - worker[0]: connect to ... +2023-10-16 04:12:19,110 - INFO - start checking table testupgrade.partition_range_test_3_1_prt_mar ... +2023-10-16 04:12:19,162 - INFO - check table testupgrade.partition_range_test_3_1_prt_mar OK. +2023-10-16 04:12:19,162 - INFO - start checking table testupgrade.partition_range_test_3_1_prt_feb ... +2023-10-16 04:12:19,574 - INFO - check table testupgrade.partition_range_test_3_1_prt_feb error out: ERROR: trying to insert row into wrong partition (seg1 10.0.138.96:20001 pid=3975) +DETAIL: Expected partition: partition_range_test_3_1_prt_mar, provided partition: partition_range_test_3_1_prt_feb. + +2023-10-16 04:12:19,575 - INFO - start checking table testupgrade.partition_range_test_3_1_prt_jan ... +2023-10-16 04:12:19,762 - INFO - check table testupgrade.partition_range_test_3_1_prt_jan error out: ERROR: trying to insert row into wrong partition (seg1 10.0.138.96:20001 pid=3975) +DETAIL: Expected partition: partition_range_test_3_1_prt_feb, provided partition: partition_range_test_3_1_prt_jan. + +2023-10-16 04:12:19,804 - WARNING - no default partition for testupgrade.partition_range_test_3 +... +2023-10-16 04:12:22,058 - INFO - Current progress: have 0 remaining, 2.77 seconds passed. +2023-10-16 04:12:22,058 - INFO - worker[0]: finish. 
+--------------------------------------------- +total partition tables size : 416 KB +total partition tables : 6 +total leaf partitions : 19 +--------------------------------------------- + +Example Usage for using nthreads (check passed example): +$ python el8_migrate_locale.py precheck-table --out table.out --nthread 3 +2023-10-18 11:19:11,717 - INFO - There are 4 partitioned tables in database test that should be checked when doing OS upgrade from EL7->EL8. +2023-10-18 11:19:11,718 - INFO - worker[0]: begin: +2023-10-18 11:19:11,718 - INFO - worker[0]: connect to ... +2023-10-18 11:19:11,718 - INFO - worker[1]: begin: +2023-10-18 11:19:11,719 - INFO - worker[1]: connect to ... +2023-10-18 11:19:11,718 - INFO - worker[2]: begin: +2023-10-18 11:19:11,719 - INFO - worker[2]: connect to ... +2023-10-18 11:19:11,744 - INFO - start checking table testupgrade.partition_range_test_1_1_prt_mar ... +2023-10-18 11:19:11,745 - INFO - start checking table testupgrade.partition_range_test_ao_1_prt_mar ... +2023-10-18 11:19:11,746 - INFO - start checking table testupgrade.partition_range_test_2_1_prt_mar ... +2023-10-18 11:19:11,749 - INFO - check table testupgrade.partition_range_test_1_1_prt_mar OK. +2023-10-18 11:19:11,749 - INFO - start checking table testupgrade.partition_range_test_1_1_prt_feb ... +2023-10-18 11:19:11,751 - INFO - check table testupgrade.partition_range_test_ao_1_prt_mar OK. +2023-10-18 11:19:11,751 - INFO - start checking table testupgrade.partition_range_test_ao_1_prt_feb ... +2023-10-18 11:19:11,751 - INFO - check table testupgrade.partition_range_test_2_1_prt_mar OK. +2023-10-18 11:19:11,751 - INFO - start checking table testupgrade.partition_range_test_2_1_prt_feb ... +2023-10-18 11:19:11,752 - INFO - check table testupgrade.partition_range_test_1_1_prt_feb OK. +2023-10-18 11:19:11,752 - INFO - start checking table testupgrade.partition_range_test_1_1_prt_others ... 
+2023-10-18 11:19:11,754 - INFO - check table testupgrade.partition_range_test_2_1_prt_feb OK. +2023-10-18 11:19:11,754 - INFO - start checking table testupgrade.partition_range_test_2_1_prt_jan ... +2023-10-18 11:19:11,755 - INFO - check table testupgrade.partition_range_test_1_1_prt_others OK. +2023-10-18 11:19:11,755 - INFO - check table testupgrade.partition_range_test_ao_1_prt_feb OK. +2023-10-18 11:19:11,755 - INFO - start checking table testupgrade.partition_range_test_ao_1_prt_jan ... +2023-10-18 11:19:11,756 - INFO - Current progress: have 1 remaining, 0.97 seconds passed. +2023-10-18 11:19:11,757 - INFO - check table testupgrade.partition_range_test_2_1_prt_jan OK. +2023-10-18 11:19:11,758 - INFO - Current progress: have 0 remaining, 0.99 seconds passed. +2023-10-18 11:19:11,758 - INFO - worker[2]: finish. +2023-10-18 11:19:11,761 - INFO - check table testupgrade.partition_range_test_ao_1_prt_jan OK. +2023-10-18 11:19:11,761 - INFO - Current progress: have 0 remaining, 1.07 seconds passed. +2023-10-18 11:19:11,761 - INFO - worker[1]: finish. +2023-10-18 11:19:11,763 - INFO - start checking table testupgrade.root_1_prt_mar ... +2023-10-18 11:19:11,766 - INFO - check table testupgrade.root_1_prt_mar OK. +2023-10-18 11:19:11,767 - INFO - start checking table testupgrade.root_1_prt_feb ... +2023-10-18 11:19:11,769 - INFO - check table testupgrade.root_1_prt_feb OK. +2023-10-18 11:19:11,770 - INFO - start checking table testupgrade.root_1_prt_jan ... +2023-10-18 11:19:11,772 - INFO - check table testupgrade.root_1_prt_jan OK. +2023-10-18 11:19:11,773 - INFO - Current progress: have 0 remaining, 1.4 seconds passed. +2023-10-18 11:19:11,773 - INFO - worker[0]: finish. 
+--------------------------------------------- +total partition tables size : 0 Bytes +total partition tables : 0 +total leaf partitions : 0 +--------------------------------------------- + +$ cat table.out +-- order table by size in descending order +\c testupgrade + +-- parrelid: 16649 | coll: 100 | attname: date | msg: partition table, 3 leafs, size 98304 +begin; create temp table "testupgrade.partition_range_test_3_bak" as select * from testupgrade.partition_range_test_3; truncate testupgrade.partition_range_test_3; insert into testupgrade.partition_range_test_3 select * from "testupgrade.partition_range_test_3_bak"; commit; +... + +``` +``` +$ python el8_migrate_locale.py migrate --help +usage: el8_migrate_locale migrate [-h] --input INPUT + +optional arguments: + -h, --help show this help message and exit + +required arguments: + --input INPUT the file contains reindex or rebuild partition commands + +Example usage for migrate index: +$ python el8_migrate_locale.py migrate --input index.out +2023-10-16 04:12:02,461 - INFO - db: testupgrade, total have 7 commands to execute +2023-10-16 04:12:02,467 - INFO - db: testupgrade, executing command: reindex index testupgrade.test_id1; +2023-10-16 04:12:02,541 - INFO - db: testupgrade, executing command: reindex index testupgrade.test_id2; +2023-10-16 04:12:02,566 - INFO - db: testupgrade, executing command: reindex index testupgrade.test_id3; +2023-10-16 04:12:02,592 - INFO - db: testupgrade, executing command: reindex index testupgrade.test_citext_pkey; +2023-10-16 04:12:02,623 - INFO - db: testupgrade, executing command: reindex index testupgrade.test_idx_citext; +2023-10-16 04:12:02,647 - INFO - db: testupgrade, executing command: reindex index testupgrade.hash_idx1; +2023-10-16 04:12:02,673 - INFO - db: testupgrade, executing command: reindex index testupgrade.idx_projecttag; +2023-10-16 04:12:02,692 - INFO - db: postgres, total have 2 commands to execute +2023-10-16 04:12:02,698 - INFO - db: postgres, executing 
command: reindex index pg_seclabel_object_index; +2023-10-16 04:12:02,730 - INFO - db: postgres, executing command: reindex index pg_shseclabel_object_index; +2023-10-16 04:12:02,754 - INFO - All done + +Example usage for migrate tables: +$ python el8_migrate_locale.py migrate --input table.out +2023-10-16 04:14:17,003 - INFO - db: testupgrade, total have 6 commands to execute +2023-10-16 04:14:17,009 - INFO - db: testupgrade, executing command: begin; create temp table "testupgrade.partition_range_test_3_bak" as select * from testupgrade.partition_range_test_3; truncate testupgrade.partition_range_test_3; insert into testupgrade.partition_range_test_3 select * from "testupgrade.partition_range_test_3_bak"; commit; +2023-10-16 04:14:17,175 - INFO - db: testupgrade, executing analyze command: analyze testupgrade.partition_range_test_3;; +2023-10-16 04:14:17,201 - INFO - db: testupgrade, executing command: begin; create temp table "testupgrade.partition_range_test_2_bak" as select * from testupgrade.partition_range_test_2; truncate testupgrade.partition_range_test_2; insert into testupgrade.partition_range_test_2 select * from "testupgrade.partition_range_test_2_bak"; commit; +2023-10-16 04:14:17,490 - ERROR - ERROR: no partition for partitioning key (seg1 10.0.138.96:20001 pid=4028) + +2023-10-16 04:14:17,497 - INFO - db: testupgrade, executing command: begin; create temp table "testupgrade.partition_range_test_4_bak" as select * from testupgrade.partition_range_test_4; truncate testupgrade.partition_range_test_4; insert into testupgrade.partition_range_test_4 select * from "testupgrade.partition_range_test_4_bak"; commit; +2023-10-16 04:14:17,628 - INFO - db: testupgrade, executing analyze command: analyze testupgrade.partition_range_test_4;; +2023-10-16 04:14:17,660 - INFO - db: testupgrade, executing command: begin; create temp table "testupgrade.partition_range_test_1_bak" as select * from testupgrade.partition_range_test_1; truncate 
testupgrade.partition_range_test_1; insert into testupgrade.partition_range_test_1 select * from "testupgrade.partition_range_test_1_bak"; commit; +2023-10-16 04:14:17,784 - INFO - db: testupgrade, executing analyze command: analyze testupgrade.partition_range_test_1;; +2023-10-16 04:14:17,808 - INFO - db: testupgrade, executing command: begin; create temp table "testupgrade.root_bak" as select * from testupgrade.root; truncate testupgrade.root; insert into testupgrade.root select * from "testupgrade.root_bak"; commit; +2023-10-16 04:14:17,928 - INFO - db: testupgrade, executing analyze command: analyze testupgrade.root;; +2023-10-16 04:14:17,952 - INFO - db: testupgrade, executing command: begin; create temp table "testupgrade.partition_range_test_ao_bak" as select * from testupgrade.partition_range_test_ao; truncate testupgrade.partition_range_test_ao; insert into testupgrade.partition_range_test_ao select * from "testupgrade.partition_range_test_ao_bak"; commit; +2023-10-16 04:14:18,276 - ERROR - ERROR: no partition for partitioning key (seg1 10.0.138.96:20001 pid=4060) + +2023-10-16 04:14:18,277 - INFO - All done +``` + diff --git a/gpMgmt/bin/el8_migrate_locale/el8_migrate_locale.py b/gpMgmt/bin/el8_migrate_locale/el8_migrate_locale.py new file mode 100644 index 000000000000..f52a762a8246 --- /dev/null +++ b/gpMgmt/bin/el8_migrate_locale/el8_migrate_locale.py @@ -0,0 +1,517 @@ +#!/usr/bin/env python +#!-*- coding: utf-8 -*- +import argparse +import sys +from pygresql.pg import DB +import logging +import signal +from multiprocessing import Queue +from threading import Thread, Lock +import time +import string +from collections import defaultdict +import os +import re +try: + from pygresql import pg +except ImportError, e: + sys.exit('ERROR: Cannot import modules. Please check that you have sourced greenplum_path.sh. 
Detail: ' + str(e)) + +class connection(object): + def __init__(self, host, port, dbname, user): + self.host = host + self.port = port + self.dbname = dbname + self.user = user + + def _get_pg_port(self, port): + if port is not None: + return port + try: + port = os.environ.get('PGPORT') + if not port: + port = self.get_port_from_conf() + return int(port) + except: + sys.exit("No port has been set, please set env PGPORT or MASTER_DATA_DIRECTORY or specify the port in the command line") + + def get_port_from_conf(self): + datadir = os.environ.get('MASTER_DATA_DIRECTORY') + if datadir: + file = datadir +'/postgresql.conf' + if os.path.isfile(file): + with open(file) as f: + for line in f.xreadlines(): + match = re.search('port=\d+',line) + if match: + match1 = re.search('\d+', match.group()) + if match1: + return match1.group() + + def get_default_db_conn(self): + db = DB(dbname=self.dbname, + host=self.host, + port=self._get_pg_port(self.port), + user=self.user) + return db + + def get_db_conn(self, dbname): + db = DB(dbname=dbname, + host=self.host, + port=self._get_pg_port(self.port), + user=self.user) + return db + + def get_db_list(self): + db = self.get_default_db_conn() + sql = "select datname from pg_database where datname not in ('template0');" + dbs = [datname for datname, in db.query(sql).getresult()] + db.close() + return dbs + +class CheckIndexes(connection): + def get_affected_user_indexes(self, dbname): + db = self.get_db_conn(dbname) + # The built-in collatable data types are text, varchar, and char, and indcollation contains the OID of the collation + # to use for the index, or zero if the column is not of a collatable data type.
+ sql = """ + SELECT distinct(indexrelid), indexrelid::regclass::text as indexname, indrelid::regclass::text as tablename, collname, pg_get_indexdef(indexrelid) +FROM (SELECT indexrelid, indrelid, indcollation[i] coll FROM pg_index, generate_subscripts(indcollation, 1) g(i)) s +JOIN pg_collation c ON coll=c.oid +WHERE collname != 'C' and collname != 'POSIX' and indexrelid >= 16384; + """ + index = db.query(sql).getresult() + if index: + logger.info("There are {} user indexes in database {} that needs reindex when doing OS upgrade from EL7->EL8.".format(len(index), dbname)) + db.close() + return index + + def get_affected_catalog_indexes(self): + db = self.get_default_db_conn() + sql = """ + SELECT distinct(indexrelid), indexrelid::regclass::text as indexname, indrelid::regclass::text as tablename, collname, pg_get_indexdef(indexrelid) +FROM (SELECT indexrelid, indrelid, indcollation[i] coll FROM pg_index, generate_subscripts(indcollation, 1) g(i)) s +JOIN pg_collation c ON coll=c.oid +WHERE collname != 'C' and collname != 'POSIX' and indexrelid < 16384; + """ + index = db.query(sql).getresult() + if index: + logger.info("There are {} catalog indexes that needs reindex when doing OS upgrade from EL7->EL8.".format(len(index))) + db.close() + return index + + def handle_one_index(self, name): + # no need to escape special characters here, because the name already includes double quotes if it contains special characters. + sql = """ + reindex index {}; + """.format(name) + return sql.strip() + + def dump_index_info(self, fn): + dblist = self.get_db_list() + f = open(fn, "w") + + # print all catalog indexes that might be affected.
+ cindex = self.get_affected_catalog_indexes() + if cindex: + print>>f, "\c ", self.dbname + for indexrelid, indexname, tablename, collname, indexdef in cindex: + print>>f, "-- catalog indexrelid:", indexrelid, "| index name:", indexname, "| table name:", tablename, "| collname:", collname, "| indexdef: ", indexdef + print>>f, self.handle_one_index(indexname) + print>>f + + # print all user indexes in all databases that might be affected. + for dbname in dblist: + index = self.get_affected_user_indexes(dbname) + if index: + print>>f, "\c ", dbname + for indexrelid, indexname, tablename, collname, indexdef in index: + print>>f, "-- indexrelid:", indexrelid, "| index name:", indexname, "| table name:", tablename, "| collname:", collname, "| indexdef: ", indexdef + print>>f, self.handle_one_index(indexname) + print>>f + + f.close() + +class CheckTables(connection): + def __init__(self, host, port, dbname, user, order_size_ascend, nthread, pre_upgrade): + self.host = host + self.port = port + self.dbname = dbname + self.user = user + self.order_size_ascend = order_size_ascend + self.nthread = nthread + self.filtertabs = [] + self.filtertabslock = Lock() + self.total_leafs = 0 + self.total_roots = 0 + self.total_root_size = 0 + self.lock = Lock() + self.qlist = Queue() + self.pre_upgrade = pre_upgrade + signal.signal(signal.SIGTERM, self.sig_handler) + signal.signal(signal.SIGINT, self.sig_handler) + + def get_affected_partitioned_tables(self, dbname): + db = self.get_db_conn(dbname) + # The built-in collatable data types are text, varchar, and char; attcollation is the defined collation of the column, or zero if the column is not of a collatable data type. + # Skip list partitions (only parkind = 'r' is selected), because only range partitions might be affected.
+ sql = """ + WITH might_affected_tables AS ( + SELECT + prelid, + coll, + attname, + attnum, + parisdefault + FROM + ( + select + p.oid as poid, + p.parrelid as prelid, + t.attcollation coll, + t.attname as attname, + t.attnum as attnum + from + pg_partition p + join pg_attribute t on p.parrelid = t.attrelid + and t.attnum = ANY(p.paratts :: smallint[]) + and p.parkind = 'r' + ) s + JOIN pg_collation c ON coll = c.oid + JOIN pg_partition_rule r ON poid = r.paroid + WHERE + collname != 'C' and collname != 'POSIX' + ), + par_has_default AS ( + SELECT + prelid, + coll, + attname, + parisdefault + FROM + might_affected_tables group by (prelid, coll, attname, parisdefault) + ) + select prelid, prelid::regclass::text as partitionname, coll, attname, bool_or(parisdefault) as parhasdefault from par_has_default group by (prelid, coll, attname) ; + """ + tabs = db.query(sql).getresult() + db.close() + return tabs + + # get the tables which distribution column is using custom operator class, it may be affected by the OS upgrade, so give a warning. + def get_custom_opclass_as_distribute_keys_tables(self, dbname): + db = self.get_db_conn(dbname) + sql = """ + select table_oid::regclass::text as tablename, max(distclass) from (select localoid , unnest(distclass::int[]) distclass from gp_distribution_policy) x(table_oid, distclass) group by table_oid having max(distclass) > 16384; + """ + tables = db.query(sql).getresult() + if tables: + logger.warning("There are {} tables in database {} that the distribution key is using custom operator class, should be checked when doing OS upgrade from EL7->EL8.".format(len(tables), dbname)) + print "---------------------------------------------" + print "tablename | distclass" + for t in tables: + print t + print "---------------------------------------------" + db.close() + + # Escape double-quotes in a string, so that the resulting string is suitable for + # embedding as in SQL. 
Analogous to libpq's PQescapeIdentifier + def escape_identifier(self, str): + # Does the string need quoting? Simple strings with all-lowercase ASCII + # letters don't. + SAFE_RE = re.compile('[a-z][a-z0-9_]*$') + + if SAFE_RE.match(str): + return str + + # Otherwise we have to quote it. Any double-quotes in the string need to be escaped + # by doubling them. + return '"' + str.replace('"', '""') + '"' + + def handle_one_table(self, name): + bakname = "{}".format(self.escape_identifier(name + "_bak")) + sql = """ + begin; create temp table {1} as select * from {0}; truncate {0}; insert into {0} select * from {1}; commit; + """.format(name, bakname) + return sql.strip() + + def get_table_size_info(self, dbname, parrelid): + db = self.get_db_conn(dbname) + sql_size = """ + with recursive cte(nlevel, table_oid) as ( + select 0, {}::regclass::oid + union all + select nlevel+1, pi.inhrelid + from cte, pg_inherits pi + where cte.table_oid = pi.inhparent + ) + select sum(pg_relation_size(table_oid)) as size, count(1) as nleafs + from cte where nlevel = (select max(nlevel) from cte); + """ + r = db.query(sql_size.format(parrelid)) + size = r.getresult()[0][0] + nleafs = r.getresult()[0][1] + self.lock.acquire() + self.total_root_size += size + self.total_leafs += nleafs + self.total_roots += 1 + self.lock.release() + db.close() + return "partition table, %s leafs, size %s" % (nleafs, size), size + + def dump_tables(self, fn): + dblist = self.get_db_list() + f = open(fn, "w") + + for dbname in dblist: + table_info = [] + # check tables whose distribution columns use a custom operator class + self.get_custom_opclass_as_distribute_keys_tables(dbname) + + # get all partitioned tables that might be affected + tables = self.get_affected_partitioned_tables(dbname) + + if tables: + logger.info("There are {} partitioned tables in database {} that should be checked when doing OS upgrade from EL7->EL8.".format(len(tables), dbname)) + # if check before os upgrade, it will print
the SQL results and doesn't do the GUC check. + if self.pre_upgrade: + for parrelid, tablename, coll, attname, has_default_partition in tables: + # get the partition table size info to estimate the time + msg, size = self.get_table_size_info(dbname, parrelid) + table_info.append((parrelid, tablename, coll, attname, msg, size)) + # if there is no default partition, give a warning, because the migration may fail + if has_default_partition == 'f': + logger.warning("no default partition for {}".format(tablename)) + else: + # start multiple threads to check whether the rows are still in the correct partitions after the OS upgrade; tables that fail the check are added to filtertabs + for t in tables: + # qlist is used by multiple threads + self.qlist.put(t) + self.concurrent_check(dbname) + table_info = self.filtertabs[:] + self.filtertabs = [] + + # dump the table info to the specified output file + if table_info: + print>>f, "-- order table by size in %s order" % ('ascending' if self.order_size_ascend else 'descending') + print>>f, "\c ", dbname + print>>f + + # sort the tables by size (the last element of each tuple) + if self.order_size_ascend: + table_info.sort(key=lambda x: x[-1], reverse=False) + else: + table_info.sort(key=lambda x: x[-1], reverse=True) + + for result in table_info: + parrelid = result[0] + name = result[1] + coll = result[2] + attname = result[3] + msg = result[4] + print>>f, "-- parrelid:", parrelid, "| coll:", coll, "| attname:", attname, "| msg:", msg + print>>f, self.handle_one_table(name) + print>>f + + # print the total partition table size + self.print_size_summary_info() + + f.close() + + def print_size_summary_info(self): + print "---------------------------------------------" + KB = float(1024) + MB = float(KB ** 2) + GB = float(KB ** 3) + if self.total_root_size < KB: + print("total partition tables size : {} Bytes".format(int(float(self.total_root_size)))) + elif KB <= self.total_root_size < MB: + print("total partition tables size : {}
KB".format(int(float(self.total_root_size) / KB))) + elif MB <= self.total_root_size < GB: + print("total partition tables size : {} MB".format(int(float(self.total_root_size) / MB))) + else: + print("total partition tables size : {} GB".format(int(float(self.total_root_size) / GB))) + + print("total partition tables : {}".format(self.total_roots)) + print("total leaf partitions : {}".format(self.total_leafs)) + print "---------------------------------------------" + + # start multiple threads to do the check + def concurrent_check(self, dbname): + threads = [] + for i in range(self.nthread): + t = Thread(target=CheckTables.check_partitiontables_by_guc, + args=[self, i, dbname]) + threads.append(t) + for t in threads: + t.start() + for t in threads: + t.join() + + def sig_handler(self, sig, arg): + sys.stderr.write("terminated by signal %s\n" % sig) + sys.exit(127) + + @staticmethod + # check these tables by using GUC gp_detect_data_correctness, dump the error tables to the output file + def check_partitiontables_by_guc(self, idx, dbname): + logger.info("worker[{}]: begin: ".format(idx)) + logger.info("worker[{}]: connect to <{}> ...".format(idx, dbname)) + start = time.time() + db = self.get_db_conn(dbname) + has_error = False + + while not self.qlist.empty(): + result = self.qlist.get() + parrelid = result[0] + tablename = result[1] + coll = result[2] + attname = result[3] + has_default_partition = result[4] + + try: + db.query("set gp_detect_data_correctness = 1;") + except Exception as e: + logger.warning("missing GUC gp_detect_data_correctness") + db.close() + + # get the leaf partition names + get_partitionname_sql = """ + with recursive cte(root_oid, table_oid, nlevel) as ( + select parrelid, parrelid, 0 from pg_partition where not paristemplate and parlevel = 0 + union all + select root_oid, pi.inhrelid, nlevel+1 + from cte, pg_inherits pi + where cte.table_oid = pi.inhparent + ) + select root_oid::regclass::text as tablename, table_oid::regclass::text as 
partitioname + from cte where nlevel = (select max(nlevel) from cte) and root_oid = {}; + """ + partitiontablenames = db.query(get_partitionname_sql.format(parrelid)).getresult() + for tablename, partitioname in partitiontablenames: + sql = "insert into {tab} select * from {tab}".format(tab=partitioname) + try: + logger.info("start checking table {tab} ...".format(tab=partitioname)) + db.query(sql) + logger.info("check table {tab} OK.".format(tab=partitioname)) + except Exception as e: + logger.info("check table {tab} error out: {err_msg}".format(tab=partitioname, err_msg=str(e))) + has_error = True + + # if check failed, dump the table to the specified out file. + if has_error: + # get the partition table size info to estimate the time + msg, size = self.get_table_size_info(dbname, parrelid) + self.filtertabslock.acquire() + self.filtertabs.append((parrelid, tablename, coll, attname, msg, size)) + self.filtertabslock.release() + has_error = False + if has_default_partition == 'f': + logger.warning("no default partition for {}".format(tablename)) + + db.query("set gp_detect_data_correctness = 0;") + + end = time.time() + total_time = end - start + logger.info("Current progress: have {} remaining, {} seconds passed.".format(self.qlist.qsize(), round(total_time, 2))) + + db.close() + logger.info("worker[{}]: finish.".format(idx)) + +class migrate(connection): + def __init__(self, dbname, port, host, user, script_file): + self.dbname = dbname + self.port = self._get_pg_port(port) + self.host = host + self.user = user + self.script_file = script_file + self.dbdict = defaultdict(list) + + self.parse_inputfile() + + def parse_inputfile(self): + with open(self.script_file) as f: + for line in f: + sql = line.strip() + if sql.startswith("\c"): + db_name = sql.split("\c")[1].strip() + if (sql.startswith("reindex") and sql.endswith(";") and sql.count(";") == 1): + self.dbdict[db_name].append(sql) + if (sql.startswith("begin;") and sql.endswith("commit;")): + 
self.dbdict[db_name].append(sql) + + def run(self): + try: + for db_name, commands in self.dbdict.items(): + total_counts = len(commands) + logger.info("db: {}, total have {} commands to execute".format(db_name, total_counts)) + for command in commands: + self.run_alter_command(db_name, command) + except KeyboardInterrupt: + sys.exit('\nUser Interrupted') + + logger.info("All done") + + def run_alter_command(self, db_name, command): + try: + db = self.get_db_conn(db_name) + logger.info("db: {}, executing command: {}".format(db_name, command)) + db.query(command) + + if (command.startswith("begin")): + pieces = [p for p in re.split("( |\\\".*?\\\"|'.*?')", command) if p.strip()] + index = pieces.index("truncate") + if 0 < index < len(pieces) - 1: + table_name = pieces[index+1] + analyze_sql = "analyze {};".format(table_name) + logger.info("db: {}, executing analyze command: {}".format(db_name, analyze_sql)) + db.query(analyze_sql) + + db.close() + except Exception, e: + logger.error("{}".format(str(e))) + +def parseargs(): + parser = argparse.ArgumentParser(prog='el8_migrate_locale') + parser.add_argument('--host', type=str, help='Greenplum Database hostname') + parser.add_argument('--port', type=int, help='Greenplum Database port') + parser.add_argument('--dbname', type=str, default='postgres', help='Greenplum Database database name') + parser.add_argument('--user', type=str, help='Greenplum Database user name') + + subparsers = parser.add_subparsers(help='sub-command help', dest='cmd') + parser_precheck_index = subparsers.add_parser('precheck-index', help='list affected index') + required = parser_precheck_index.add_argument_group('required arguments') + required.add_argument('--out', type=str, help='outfile path for the reindex commands', required=True) + + parser_precheck_table = subparsers.add_parser('precheck-table', help='list affected tables') + required = parser_precheck_table.add_argument_group('required arguments') + required.add_argument('--out', 
type=str, help='outfile path for the rebuild partition commands', required=True) + parser_precheck_table.add_argument('--pre_upgrade', action='store_true', help='check tables before os upgrade to EL8') + parser_precheck_table.add_argument('--order_size_ascend', action='store_true', help='sort the tables by size in ascending order') + parser_precheck_table.set_defaults(order_size_ascend=False) + parser_precheck_table.add_argument('--nthread', type=int, default=1, help='the concurrent threads to check partition tables') + + parser_run = subparsers.add_parser('migrate', help='run the reindex and the rebuild partition commands') + required = parser_run.add_argument_group('required arguments') + required.add_argument('--input', type=str, help='the file contains reindex or rebuild partition commands', required=True) + + args = parser.parse_args() + return args + +if __name__ == "__main__": + args = parseargs() + # initialize logger + logging.basicConfig(level=logging.DEBUG, stream=sys.stdout, format="%(asctime)s - %(levelname)s - %(message)s") + logger = logging.getLogger() + + if args.cmd == 'precheck-index': + ci = CheckIndexes(args.host, args.port, args.dbname, args.user) + ci.dump_index_info(args.out) + elif args.cmd == 'precheck-table': + ct = CheckTables(args.host, args.port, args.dbname, args.user, args.order_size_ascend, args.nthread, args.pre_upgrade) + ct.dump_tables(args.out) + elif args.cmd == 'migrate': + cr = migrate(args.dbname, args.port, args.host, args.user, args.input) + cr.run() + else: + sys.stderr.write("unknown subcommand!") + sys.exit(127) diff --git a/gpMgmt/bin/el8_migrate_locale/test.sql b/gpMgmt/bin/el8_migrate_locale/test.sql new file mode 100644 index 000000000000..99b4ce3044c5 --- /dev/null +++ b/gpMgmt/bin/el8_migrate_locale/test.sql @@ -0,0 +1,261 @@ +-- case1 test basic table and index with char/varchar/text type +CREATE TABLE test_character_type +( + char_1 CHAR(1), + varchar_10 VARCHAR(10), + txt TEXT +); + +INSERT INTO 
test_character_type (char_1) +VALUES ('Y ') RETURNING *; + +INSERT INTO test_character_type (varchar_10) +VALUES ('HelloWorld ') RETURNING *; + +INSERT INTO test_character_type (txt) +VALUES ('TEXT column can store a string of any length') RETURNING txt; + +create index "test_id1 's " on test_character_type (char_1); +create index "test_id2 \ $ \\" on test_character_type (varchar_10); +create index " test_id "" 3 " on test_character_type (txt); + +-- case2 test type citext; +create extension citext; +CREATE TABLE test_citext +( + nick CITEXT PRIMARY KEY, + pass TEXT NOT NULL +); + +INSERT INTO test_citext VALUES ('larry', random()::text); +INSERT INTO test_citext VALUES ('Tom', random()::text); +INSERT INTO test_citext VALUES ('Damian', random()::text); +INSERT INTO test_citext VALUES ('NEAL', random()::text); +INSERT INTO test_citext VALUES ('Bjørn', random()::text); + +create index test_idx_citext on test_citext (nick); + +----- case 3 test special case with $ +create table test1 +( + content varchar +) DISTRIBUTED by (content); +insert into test1 (content) +values ('a'), + ('$a'), + ('a$'), + ('b'), + ('$b'), + ('b$'), + ('A'), + ('B'); +create index id1 on test1 (content); + +---- case4 test special case with '""' +CREATE TABLE hash_test +( + id int, + date text +) DISTRIBUTED BY (date); +insert into hash_test values (1, '01'); +insert into hash_test values (1, '"01"'); +insert into hash_test values (2, '"02"'); +insert into hash_test values (3, '02'); +insert into hash_test values (4, '03'); + +---- case5 test special case with 1-1 vs 11 +CREATE TABLE test2 +( + id int, + date text +) DISTRIBUTED BY (id) +PARTITION BY RANGE (date) +( START (text '01-01') INCLUSIVE + END (text '11-01') EXCLUSIVE + ); + +insert into test2 +values (2, '02-1'), + (2, '03-1'), + (2, '08-1'), + (2, '09-01'), + (1, '11'), + (1, '1-1'); + +--- case6 test range partition with special character '“”' +CREATE TABLE partition_range_test +( + id int, + date text +) DISTRIBUTED BY (id)
+PARTITION BY RANGE (date) + (PARTITION Jan START ( '01') INCLUSIVE , + PARTITION Feb START ( '02') INCLUSIVE , + PARTITION Mar START ( '03') INCLUSIVE + END ( '04') EXCLUSIVE); + +insert into partition_range_test values (1, '01'); +insert into partition_range_test values (1, '"01"'); +insert into partition_range_test values (2, '"02"'); +insert into partition_range_test values (2, '02'); +insert into partition_range_test values (3, '03'); +insert into partition_range_test values (3, '"03"'); + +-- case7 test range partition with default partition. +CREATE TABLE partition_range_test_default (id int, date text) DISTRIBUTED BY (id) +PARTITION BY RANGE (date) + (PARTITION feb START ( '02') INCLUSIVE , + PARTITION Mar START ( '03') INCLUSIVE, + Default partition others); + +insert into partition_range_test_default values (1, '01'), (1, '"01"'), (2, '"02"'), (2, '02'), (3, '03'), (3, '"03"'), (4, '04'), (4, '"04"'); + +-- case8 for testing insert into root select * from partition_range_test where date > '"02"'; +create table root +( + id int, + date text +) DISTRIBUTED BY (id) +PARTITION BY RANGE (date) +(PARTITION Jan START ( '01') INCLUSIVE , +PARTITION Feb START ( '02') INCLUSIVE , +PARTITION Mar START ( '03') INCLUSIVE +END ( '04') EXCLUSIVE); + +insert into root +select * +from partition_range_test +where date > '"02"'; + +--- case9 test range partition with special character '“”' with ao +CREATE TABLE partition_range_test_ao +( + id int, + date text +) + WITH (appendonly = true) + DISTRIBUTED BY (id) + PARTITION BY RANGE (date) + (PARTITION Jan START ('01') INCLUSIVE , + PARTITION Feb START ('02') INCLUSIVE , + PARTITION Mar START ('03') INCLUSIVE + END ('04') EXCLUSIVE); + +insert into partition_range_test_ao values (1, '01'); +insert into partition_range_test_ao values (1, '"01"'); +insert into partition_range_test_ao values (1, '"01-1"'); +insert into partition_range_test_ao values (2, '"02-1"'); +insert into partition_range_test_ao values (2, '"02"'); +insert 
into partition_range_test_ao values (2, '02'); + +--- case10 for index constraint violation +CREATE TABLE repository +( + id integer, + slug character varying(100), + name character varying(100), + project_id character varying(100) +) DISTRIBUTED BY (slug, project_id); + +insert into repository values (793, 'text-rnn', 'text-rnn', 146); +insert into repository values (812, 'ink_data', 'ink_data', 146); + +-- case11 for index unique constraint violation +create table gitrefresh +( + projecttag text, + state character(1), + analysis_started timestamp without time zone, + analysis_ended timestamp without time zone, + counter_requested integer, + customer_id integer, + id int, + constraint idx_projecttag unique (projecttag) +); +create index pk_gitrefresh on gitrefresh (id); +INSERT INTO gitrefresh(projecttag, state, analysis_started, counter_requested, customer_id) +VALUES ('npm@randombytes', 'Q', NOW(), 1, 0); + +-- case12 for partition range list and special characters +CREATE TABLE rank +( + id int, + gender char(1) +) DISTRIBUTED BY (id) +PARTITION BY LIST (gender) +( PARTITION girls VALUES ('F'), + PARTITION boys VALUES ('M'), + DEFAULT PARTITION other ); + +CREATE TABLE "rank $ % &" +( + id int, + gender char(1) +) DISTRIBUTED BY (id) +PARTITION BY LIST (gender) +( PARTITION girls VALUES ('F'), + PARTITION boys VALUES ('M'), + DEFAULT PARTITION other ); + +CREATE TABLE "rank $ % & ! 
*" +( + id int, + gender char(1) +) DISTRIBUTED BY (id) +PARTITION BY LIST (gender) +( PARTITION girls VALUES ('F'), + PARTITION boys VALUES ('M'), + DEFAULT PARTITION other ); + +CREATE TABLE "rank 's " +( + id int, + gender char(1) +) DISTRIBUTED BY (id) +PARTITION BY LIST (gender) +( PARTITION girls VALUES ('F'), + PARTITION boys VALUES ('M'), + DEFAULT PARTITION other ); + +CREATE TABLE "rank 's' " +( + id int, + gender char(1) +) DISTRIBUTED BY (id) +PARTITION BY LIST (gender) +( PARTITION girls VALUES ('F'), + PARTITION boys VALUES ('M'), + DEFAULT PARTITION other ); + +CREATE TABLE "rank b c" +( + id int, + gender char(1) +) DISTRIBUTED BY (id) +PARTITION BY LIST (gender) +( PARTITION girls VALUES ('F'), + PARTITION boys VALUES ('M'), + DEFAULT PARTITION other ); + +-- case13 for testing partition key is type date +CREATE TABLE sales (id int, time date, amt decimal(10,2)) +DISTRIBUTED BY (id) +PARTITION BY RANGE (time) +( START (date '2022-01-01') INCLUSIVE + END (date '2023-01-01') EXCLUSIVE + EVERY (INTERVAL '1 month') ); + +-- case14 for testing partition range with special characters in name +CREATE TABLE "partition_range_ 's " (id int, date text) +DISTRIBUTED BY (id) +PARTITION BY RANGE (date) + (PARTITION feb START ( '02') INCLUSIVE , + PARTITION Mar START ( '03') INCLUSIVE, + Default partition others); + +CREATE TABLE "partition_range_ 's' " (id int, date text) +DISTRIBUTED BY (id) +PARTITION BY RANGE (date) + (PARTITION feb START ( '02') INCLUSIVE , + PARTITION Mar START ( '03') INCLUSIVE, + Default partition others); diff --git a/gpMgmt/bin/gpcheckperf b/gpMgmt/bin/gpcheckperf index 520b94272481..82e89703a0c0 100755 --- a/gpMgmt/bin/gpcheckperf +++ b/gpMgmt/bin/gpcheckperf @@ -149,7 +149,8 @@ def gpsync(src, dst): -P : print information showing the progress of the transfer """ proc = [] - for peer in GV.opt['-h']: + host_list = getHostList() + for peer in host_list: cmd = 'rsync -P -a -c -e "ssh -o BatchMode=yes -o StrictHostKeyChecking=no" {0} 
{1}:{2}' \ .format(src, unix.canonicalize(peer), dst) if GV.opt['-v']: @@ -665,10 +666,11 @@ def setupNetPerfTest(): print '-------------------' hostlist = ssh_utils.HostList() - for h in GV.opt['-h']: - hostlist.add(h) if GV.opt['-f']: hostlist.parseFile(GV.opt['-f']) + else: + for h in GV.opt['-h']: + hostlist.add(h) h = hostlist.get() if len(h) == 0: @@ -999,20 +1001,21 @@ def getHostList(): :return: returns a list of hosts """ hostlist = ssh_utils.HostList() - for h in GV.opt['-h']: - hostlist.add(h) if GV.opt['-f']: hostlist.parseFile(GV.opt['-f']) + else: + for h in GV.opt['-h']: + hostlist.add(h) try: hostlist.checkSSH() except ssh_utils.SSHError, e: sys.exit('[Error] {0}' .format(str(e))) - GV.opt['-h'] = hostlist.filterMultiHomedHosts() - if len(GV.opt['-h']) == 0: + host_list = hostlist.filterMultiHomedHosts() + if len(host_list) == 0: usage('Error: missing hosts in -h and/or -f arguments') - return GV.opt['-h'] + return host_list def main(): diff --git a/gpMgmt/bin/gpexpand b/gpMgmt/bin/gpexpand index 7e9e5be6dc52..0a1f5466a2e3 100755 --- a/gpMgmt/bin/gpexpand +++ b/gpMgmt/bin/gpexpand @@ -1258,6 +1258,20 @@ class gpexpand: tablespace_inputfile = self.options.filename + ".ts" + """ + Check if the tablespace input file exists or not + In cases where the user manually creates an input file, the file + will not be present. In such cases create the file and exit giving the + user a chance to review it and re-run gpexpand. + """ + if not os.path.exists(tablespace_inputfile): + self.generate_tablespace_inputfile(tablespace_inputfile) + self.logger.warning("Could not locate tablespace input configuration file '{0}'. A new tablespace input configuration file is written " \ + "to '{0}'. 
Please review the file and re-run with: gpexpand -i {1}".format(tablespace_inputfile, self.options.filename)) + + logger.info("Exiting...") + sys.exit(1) + new_tblspc_info = {} with open(tablespace_inputfile) as f: @@ -2573,10 +2587,10 @@ def main(options, args, parser): _gp_expand.validate_heap_checksums() newSegList = _gp_expand.read_input_files() _gp_expand.addNewSegments(newSegList) + newTableSpaceInfo = _gp_expand.read_tablespace_file() _gp_expand.sync_packages() _gp_expand.start_prepare() _gp_expand.lock_catalog() - newTableSpaceInfo = _gp_expand.read_tablespace_file() _gp_expand.add_segments(newTableSpaceInfo) _gp_expand.update_original_segments() _gp_expand.cleanup_new_segments() diff --git a/gpMgmt/bin/gppylib/commands/gp.py b/gpMgmt/bin/gppylib/commands/gp.py index 97f288e5c528..1b26ac04905d 100644 --- a/gpMgmt/bin/gppylib/commands/gp.py +++ b/gpMgmt/bin/gppylib/commands/gp.py @@ -1177,12 +1177,27 @@ def get_gphome(): raise GpError('Environment Variable GPHOME not set') return gphome +''' +gprecoverseg, gpstart, gpstate, gpstop, and gpaddmirror accept a -d option giving the master data directory, +but that value was not used throughout the utilities. To fix this, the utility records the -d value +with set_masterdatadir(), and get_masterdatadir() returns it when it is set. +''' +option_master_datadir = None +def set_masterdatadir(master_datadir=None): + global option_master_datadir + option_master_datadir = master_datadir ###### +# If -d is provided to the utility, it takes priority over MASTER_DATA_DIRECTORY.
def get_masterdatadir(): - master_datadir = os.environ.get('MASTER_DATA_DIRECTORY') + if option_master_datadir is not None: + master_datadir = option_master_datadir + else: + master_datadir = os.environ.get('MASTER_DATA_DIRECTORY') + if not master_datadir: raise GpError("Environment Variable MASTER_DATA_DIRECTORY not set!") + return master_datadir ###### diff --git a/gpMgmt/bin/gppylib/commands/test/unit/test_unit_unix.py b/gpMgmt/bin/gppylib/commands/test/unit/test_unit_unix.py index c436e5186113..4f8a1ba5dd1c 100644 --- a/gpMgmt/bin/gppylib/commands/test/unit/test_unit_unix.py +++ b/gpMgmt/bin/gppylib/commands/test/unit/test_unit_unix.py @@ -65,5 +65,21 @@ def test_kill_9_segment_processes_kill_error(self): self.subject.logger.info.assert_called_once_with('Terminating processes for segment /data/primary/gpseg0') self.subject.logger.error.assert_called_once_with('Failed to kill process 789 for segment /data/primary/gpseg0: Kill Error') + + @patch('gppylib.commands.unix.get_rsync_version', return_value='rsync version 3.2.7') + @patch('gppylib.commands.unix.LooseVersion', side_effect=['3.2.7', '3.1.0']) + def test_compare_rsync_version(self, mock_parse_version, mock_get_cmd_version): + + result = self.subject.validate_rsync_version("3.2.7") + self.assertTrue(result) + + + @patch('gppylib.commands.unix.get_rsync_version', return_value='rsync version 2.6.9') + @patch('gppylib.commands.unix.LooseVersion', side_effect=['2.6.9', '3.1.0']) + def test_validate_rsync_version_false(self, mock_parse_version, mock_get_cmd_version): + + result =self.subject.validate_rsync_version("2.6.9") + self.assertFalse(result) + if __name__ == '__main__': run_tests() diff --git a/gpMgmt/bin/gppylib/commands/unix.py b/gpMgmt/bin/gppylib/commands/unix.py index 8bb1b0dca71d..92f45e1ae710 100644 --- a/gpMgmt/bin/gppylib/commands/unix.py +++ b/gpMgmt/bin/gppylib/commands/unix.py @@ -13,6 +13,8 @@ import signal import uuid import pipes +import re +from distutils.version import LooseVersion from 
gppylib.gplog import get_default_logger from gppylib.commands.base import * @@ -173,6 +175,20 @@ def kill_sequence(pid): logandkill(pid, signal.SIGABRT) +def get_remote_link_path(path, host): + """ + Function to get symlink target path for a given path on given host. + :param path: path for which symlink has to be found + :param host: host on which the given path is available + :return: returns symlink target path + """ + + cmdStr = """python -c 'import os; print(os.readlink("%s"))'""" % path + cmd = Command('get remote link path', cmdStr=cmdStr, ctxt=REMOTE, + remoteHost=host) + cmd.run(validateAfter=True) + return cmd.get_stdout() + # ---------------Platform Framework-------------------- """ The following platform framework is used to handle any differences between @@ -520,8 +536,10 @@ def __init__(self, name, srcFile, dstFile, srcHost=None, dstHost=None, recursive if checksum: cmd_tokens.append('-c') + # Show the progress of the whole transfer (progress2) without per-file names (name0). + # Note: --info=progress2 is only supported by rsync 3.1.0 or above. if progress: - cmd_tokens.append('--progress') + cmd_tokens.append('--info=progress2,name0') # To show file transfer stats if stats: @@ -554,11 +572,14 @@ def __init__(self, name, srcFile, dstFile, srcHost=None, dstHost=None, recursive cmd_tokens.extend(exclude_str) + # Merge stderr into stdout, translate carriage returns into newlines, use 'sed' to append ' :' followed by the + # command name to every line containing a percentage (i.e. each progress line), and redirect the result to progress_file if progress_file: - cmd_tokens.append('> %s 2>&1' % pipes.quote(progress_file)) + cmd_tokens.append( + '2>&1 | tr "\\r" "\\n" |sed -E "/[0-9]+%/ s/$/ :{0}/" > {1}'.format(name, pipes.quote(progress_file))) cmdStr = ' '.join(cmd_tokens) - + cmdStr = "set -o pipefail; {}".format(cmdStr) self.command_tokens = cmd_tokens Command.__init__(self, name, cmdStr, ctxt, remoteHost) @@ -798,3 +819,25 @@ def isScpEnabled(hostlist): return False return True + + + +def validate_rsync_version(min_ver): + """ + Checks the version of the 'rsync' command and
compares it with a required version. + Returns True if the installed version is greater than or equal to min_ver, otherwise False. + Raises an exception if the version cannot be parsed from the 'rsync --version' output. + """ + rsync_version_info = get_rsync_version() + # Tolerate extra whitespace and an optional 'v' prefix before the version number + pattern = r"version\s+v?(\d+\.\d+\.\d+)" + match = re.search(pattern, rsync_version_info) + if match is None: + raise Exception("Unable to parse the rsync version from: {}".format(rsync_version_info)) + current_rsync_version = match.group(1) + if LooseVersion(current_rsync_version) < LooseVersion(min_ver): + return False + return True + +def get_rsync_version(): + """ Return the output of 'rsync --version' """ + cmdStr = findCmdInPath("rsync") + " --version" + cmd = Command("get rsync version", cmdStr=cmdStr) + cmd.run(validateAfter=True) + return cmd.get_stdout() diff --git a/gpMgmt/bin/gppylib/mainUtils.py b/gpMgmt/bin/gppylib/mainUtils.py index 17ab20afde93..c48972c92c59 100644 --- a/gpMgmt/bin/gppylib/mainUtils.py +++ b/gpMgmt/bin/gppylib/mainUtils.py @@ -174,7 +174,7 @@ def acquire(self): # If the process is already killed, remove the lock directory. if not unix.check_pid(self.pidfilepid): shutil.rmtree(self.ppath) - + # try and acquire the lock try: self.pidlockfile.acquire() @@ -264,6 +264,18 @@ def simple_main(createOptionParserFn, createCommandFn, mainOptions=None): def simple_main_internal(createOptionParserFn, createCommandFn, mainOptions): + + """ + Parse the options before acquiring the lock: the -d option overrides the master data + directory in which the pid lock file is managed, so it must be handled first.
+ """ + parser = createOptionParserFn() + (parserOptions, parserArgs) = parser.parse_args() + + if parserOptions.ensure_value("masterDataDirectory", None) is not None: + parserOptions.master_data_directory = os.path.abspath(parserOptions.masterDataDirectory) + gp.set_masterdatadir(parserOptions.master_data_directory) + """ If caller specifies 'pidlockpath' in mainOptions then we manage the specified pid file within the MASTER_DATA_DIRECTORY before proceeding @@ -282,13 +294,13 @@ def simple_main_internal(createOptionParserFn, createCommandFn, mainOptions): # at this point we have whatever lock we require try: - simple_main_locked(createOptionParserFn, createCommandFn, mainOptions) + simple_main_locked(parser, parserOptions, parserArgs, createCommandFn, mainOptions) finally: if sml is not None: sml.release() -def simple_main_locked(createOptionParserFn, createCommandFn, mainOptions): +def simple_main_locked(parser, parserOptions, parserArgs, createCommandFn, mainOptions): """ Not to be called externally -- use simple_main instead """ @@ -301,10 +313,8 @@ def simple_main_locked(createOptionParserFn, createCommandFn, mainOptions): faultProberInterface.registerFaultProber(faultProberImplGpdb.GpFaultProberImplGpdb()) commandObject = None - parser = None forceQuiet = mainOptions is not None and mainOptions.get("forceQuietOutput") - options = None if mainOptions is not None and mainOptions.get("programNameOverride"): global gProgramName @@ -320,30 +330,24 @@ def simple_main_locked(createOptionParserFn, createCommandFn, mainOptions): hostname = unix.getLocalHostname() username = unix.getUserName() - parser = createOptionParserFn() - (options, args) = parser.parse_args() - if useHelperToolLogging: gplog.setup_helper_tool_logging(execname, hostname, username) else: gplog.setup_tool_logging(execname, hostname, username, - logdir=options.ensure_value("logfileDirectory", None), nonuser=nonuser) + logdir=parserOptions.ensure_value("logfileDirectory", None), nonuser=nonuser) if 
forceQuiet: gplog.quiet_stdout_logging() else: - if options.ensure_value("verbose", False): + if parserOptions.ensure_value("verbose", False): gplog.enable_verbose_logging() - if options.ensure_value("quiet", False): + if parserOptions.ensure_value("quiet", False): gplog.quiet_stdout_logging() - if options.ensure_value("masterDataDirectory", None) is not None: - options.master_data_directory = os.path.abspath(options.masterDataDirectory) - if not suppressStartupLogMessage: logger.info("Starting %s with args: %s" % (gProgramName, ' '.join(sys.argv[1:]))) - commandObject = createCommandFn(options, args) + commandObject = createCommandFn(parserOptions, parserArgs) exitCode = commandObject.run() exit_status = exitCode @@ -365,10 +369,10 @@ def simple_main_locked(createOptionParserFn, createCommandFn, mainOptions): e.cmd.results.stderr)) exit_status = 2 except Exception, e: - if options is None: + if parserOptions is None: logger.exception("%s failed. exiting...", gProgramName) else: - if options.ensure_value("verbose", False): + if parserOptions.ensure_value("verbose", False): logger.exception("%s failed. exiting...", gProgramName) else: logger.fatal("%s failed. (Reason='%s') exiting..." 
% (gProgramName, e)) diff --git a/gpMgmt/bin/gppylib/operations/buildMirrorSegments.py b/gpMgmt/bin/gppylib/operations/buildMirrorSegments.py index 92705a8e1cb8..a3197223eb28 100644 --- a/gpMgmt/bin/gppylib/operations/buildMirrorSegments.py +++ b/gpMgmt/bin/gppylib/operations/buildMirrorSegments.py @@ -70,7 +70,7 @@ def get_recovery_progress_pattern(recovery_type='incremental'): progress of rsync looks like: "1,036,923,510 99% 39.90MB/s 0:00:24" """ if recovery_type == 'differential': - return r" +\d+%\ +\d+.\d+(kB|mB)\/s" + return r" +\d+%\ +\d+.\d+(kB|MB)\/s" return r"\d+\/\d+ (kB|mB) \(\d+\%\)" @@ -459,18 +459,66 @@ def print_progress(): os.remove(combined_progress_filepath) - def _get_progress_cmd(self, progressFile, targetSegmentDbId, targetHostname): + def _get_progress_cmd(self, progressFile, targetSegmentDbId, targetHostname, isDifferentialRecovery): """ # There is race between when the recovery process creates the progressFile # when this progress cmd is run. Thus, the progress command touches # the file to ensure its presence before tailing. """ if self.__progressMode != GpMirrorListToBuild.Progress.NONE: - return GpMirrorListToBuild.ProgressCommand("tail the last line of the file", - "set -o pipefail; touch -a {0}; tail -1 {0} | tr '\\r' '\\n' |" - " tail -1".format(pipes.quote(progressFile)), - targetSegmentDbId, progressFile, ctxt=base.REMOTE, - remoteHost=targetHostname) + cmd_desc = "tail the last line of the file" + if isDifferentialRecovery: + # For differential recovery, use sed to keep only lines matching specific patterns, to avoid a race condition.
+ + # Set the option to make the pipeline fail if any command within it fails; + # Example: set -o pipefail; + + # Create the file named by {0} if it does not exist (otherwise touch -a only updates its access time); + # Example: touch -a 'rsync.20230926_145006.dbid2.out'; + + # Display the last 3 lines of the file specified in {0} and pass them to the next command; + # Example: If {0} contains: + # receiving incremental file list + # + # 0 0% 0.00kB/s 0:00:00 :Syncing pg_control file of dbid 5 + # 8,192 100% 7.81MB/s 0:00:00 (xfr#1, to-chk=0/1) :Syncing pg_control file of dbid 5 + # 8,192 100% 7.81MB/s 0:00:00 (xfr#1, to-chk=0/1) :Syncing pg_control file of dbid 5 + # + # this command passes only the last three of these lines to the next command. + + # Process the output using sed (stream editor), printing only lines that match certain patterns; + # Example: If the output is " 8,192 100% 7.81MB/s 0:00:00 (xfr#1, to-chk=0/1) :Syncing pg_control file of dbid 5", + # this command will print: + # 8,192 100% 7.81MB/s 0:00:00 (xfr#1, to-chk=0/1) :Syncing pg_control file of dbid 5 + # + # Only lines matching ":Syncing.*dbid", "error:", or "total" are printed; all other lines are dropped. + + # Translate carriage return characters to newline characters; + # Example: If the output contains '\r' characters, they will be replaced with '\n'. + + # Display only the last line of the processed output. + # Example: If the output after the previous command is: + # 8,192 100% 7.81MB/s 0:00:00 (xfr#1, to-chk=0/1) :Syncing pg_control file of dbid 5 + # This command will output the same line. + + cmd_str = ( + "set -o pipefail; touch -a {0}; tail -3 {0} | sed -n -e '/:Syncing.*dbid/p; /error:/p; /total/p' | tr '\\r' '\\n' | tail -1" + .format(pipes.quote(progressFile)) + ) + else: + # For full and incremental recovery, simply tail the last line.
+ cmd_str = ( + "set -o pipefail; touch -a {0}; tail -1 {0} | tr '\\r' '\\n' | tail -1" + .format(pipes.quote(progressFile)) + ) + + progress_command = GpMirrorListToBuild.ProgressCommand( + cmd_desc, cmd_str, + targetSegmentDbId, progressFile, ctxt=base.REMOTE, + remoteHost=targetHostname + ) + + return progress_command return None def _get_remove_cmd(self, remove_file, target_host): @@ -533,7 +581,7 @@ def _do_recovery(self, recovery_info_by_host, gpEnv): era = read_era(gpEnv.getMasterDataDir(), logger=self.__logger) for hostName, recovery_info_list in recovery_info_by_host.items(): for ri in recovery_info_list: - progressCmd = self._get_progress_cmd(ri.progress_file, ri.target_segment_dbid, hostName) + progressCmd = self._get_progress_cmd(ri.progress_file, ri.target_segment_dbid, hostName, ri.is_differential_recovery) if progressCmd: progress_cmds.append(progressCmd) diff --git a/gpMgmt/bin/gppylib/operations/rebalanceSegments.py b/gpMgmt/bin/gppylib/operations/rebalanceSegments.py index 5f2470a477c5..0ab305c57438 100644 --- a/gpMgmt/bin/gppylib/operations/rebalanceSegments.py +++ b/gpMgmt/bin/gppylib/operations/rebalanceSegments.py @@ -1,5 +1,6 @@ import sys import signal +from contextlib import closing from gppylib.gparray import GpArray from gppylib.db import dbconn from gppylib.commands.gp import GpSegStopCmd @@ -8,7 +9,32 @@ from gppylib.operations.segment_reconfigurer import SegmentReconfigurer -MIRROR_PROMOTION_TIMEOUT=600 +MIRROR_PROMOTION_TIMEOUT = 600 + +logger = gplog.get_default_logger() + + +def replay_lag(primary_db): + """ + Returns the replay lag on the mirror of the given primary segment, i.e. the difference between flush_location and + replay_location reported in pg_stat_replication. If the mirror has a lot of WAL left to replay, the rebalance operation should be aborted and the user warned.
+ :param primary_db: primary segment whose mirror is checked + :return: replay lag in bytes, i.e. flush_location - replay_location of the mirror + """ + port = primary_db.getSegmentPort() + host = primary_db.getSegmentHostName() + logger.debug('Get replay lag on mirror of primary segment with host:{}, port:{}'.format(host, port)) + sql = "select pg_xlog_location_diff(flush_location, replay_location) from pg_stat_replication;" + + try: + dburl = dbconn.DbURL(hostname=host, port=port) + with closing(dbconn.connect(dburl, utility=True, encoding='UTF8')) as conn: + replay_lag = dbconn.execSQLForSingleton(conn, sql) + except Exception as ex: + raise Exception("Failed to query pg_stat_replication for host:{}, port:{}, error: {}". + format(host, port, str(ex))) + return replay_lag class ReconfigDetectionSQLQueryCommand(base.SQLCommand): @@ -26,11 +52,12 @@ def run(self): class GpSegmentRebalanceOperation: - def __init__(self, gpEnv, gpArray, batch_size, segment_batch_size): + def __init__(self, gpEnv, gpArray, batch_size, segment_batch_size, replay_lag): self.gpEnv = gpEnv self.gpArray = gpArray self.batch_size = batch_size self.segment_batch_size = segment_batch_size + self.replay_lag = replay_lag self.logger = gplog.get_default_logger() def rebalance(self): @@ -45,10 +72,19 @@ def rebalance(self): continue if segmentPair.up() and segmentPair.reachable() and segmentPair.synchronized(): + if self.replay_lag is not None: + self.logger.info("Allowed replay lag during rebalance is {} GB".format(self.replay_lag)) + replay_lag_in_bytes = replay_lag(segmentPair.primaryDB) + if float(replay_lag_in_bytes) >= (self.replay_lag * 1024 * 1024 * 1024): + raise Exception("{} bytes of xlog are still to be replayed on mirror with dbid {}. Let the " + "mirror catch up on replay, then trigger the rebalance. Use --replay-lag to " + "configure the allowed replay lag limit."
+ .format(replay_lag_in_bytes, segmentPair.mirrorDB.getSegmentDbId())) unbalanced_primary_segs.append(segmentPair.primaryDB) else: self.logger.warning( - "Not rebalancing primary segment dbid %d with its mirror dbid %d because one is either down, unreachable, or not synchronized" \ + "Not rebalancing primary segment dbid %d with its mirror dbid %d because one is either down, " + "unreachable, or not synchronized" \ % (segmentPair.primaryDB.dbid, segmentPair.mirrorDB.dbid)) if not len(unbalanced_primary_segs): @@ -76,7 +112,7 @@ def rebalance(self): pool.addCommand(cmd) base.join_and_indicate_progress(pool) - + failed_count = 0 completed = pool.getCompletedItems() for res in completed: diff --git a/gpMgmt/bin/gppylib/operations/segment_tablespace_locations.py b/gpMgmt/bin/gppylib/operations/segment_tablespace_locations.py index 06ee46e39d21..135d4dd410d9 100644 --- a/gpMgmt/bin/gppylib/operations/segment_tablespace_locations.py +++ b/gpMgmt/bin/gppylib/operations/segment_tablespace_locations.py @@ -46,7 +46,7 @@ def get_tablespace_locations(all_hosts, mirror_data_directory): return tablespace_locations -def get_segment_tablespace_locations(primary_hostname, primary_port): +def get_segment_tablespace_oid_locations(primary_hostname, primary_port): """ to get user defined tablespace locations for a specific primary segment. This function is called by gprecoverseg --differential to get the tablespace locations by connecting to primary while mirror is down. @@ -54,9 +54,9 @@ def get_segment_tablespace_locations(primary_hostname, primary_port): as parameter and it is called before mirrors are moved to new location by gpmovemirrors.
:param primary_hostname: string type primary hostname :param primary_port: int type primary segment port - :return: list of tablespace locations + :return: list of tablespace oids and locations """ - sql = "SELECT distinct(tblspc_loc) FROM ( SELECT oid FROM pg_tablespace WHERE spcname NOT IN " \ + sql = "SELECT distinct(oid),tblspc_loc FROM ( SELECT oid FROM pg_tablespace WHERE spcname NOT IN " \ "('pg_default', 'pg_global')) AS q,LATERAL gp_tablespace_location(q.oid);" try: query = RemoteQueryCommand("Get segment tablespace locations", sql, primary_hostname, primary_port) diff --git a/gpMgmt/bin/gppylib/operations/test/unit/test_unit_segment_tablespace_locations.py b/gpMgmt/bin/gppylib/operations/test/unit/test_unit_segment_tablespace_locations.py index 13dbc482b195..2cf90b00e665 100644 --- a/gpMgmt/bin/gppylib/operations/test/unit/test_unit_segment_tablespace_locations.py +++ b/gpMgmt/bin/gppylib/operations/test/unit/test_unit_segment_tablespace_locations.py @@ -1,7 +1,7 @@ #!/usr/bin/env python from mock import Mock, patch, call -from gppylib.operations.segment_tablespace_locations import get_tablespace_locations, get_segment_tablespace_locations +from gppylib.operations.segment_tablespace_locations import get_tablespace_locations, get_segment_tablespace_oid_locations from test.unit.gp_unittest import GpTestCase class GetTablespaceDirTestCase(GpTestCase): @@ -39,9 +39,9 @@ def test_validate_data_with_mirror_data_directory_get_tablespace_locations(self) self.assertEqual(expected, get_tablespace_locations(False, mirror_data_directory)) @patch('gppylib.db.catalog.RemoteQueryCommand.run', side_effect=Exception()) - def test_get_segment_tablespace_locations_exception(self, mock1): + def test_get_segment_tablespace_oid_locations_exception(self, mock1): with self.assertRaises(Exception) as ex: - get_segment_tablespace_locations('sdw1', 40000) + get_segment_tablespace_oid_locations('sdw1', 40000) self.assertEqual(0, self.mock_logger.debug.call_count) 
self.assertTrue('Failed to get segment tablespace locations for segment with host sdw1 and port 40000' in str(ex.exception)) @@ -49,8 +49,8 @@ def test_get_segment_tablespace_locations_exception(self, mock1): @patch('gppylib.db.catalog.RemoteQueryCommand.__init__', return_value=None) @patch('gppylib.db.catalog.RemoteQueryCommand.run') @patch('gppylib.db.catalog.RemoteQueryCommand.get_results') - def test_get_segment_tablespace_locations_success(self, mock1, mock2, mock3): - get_segment_tablespace_locations('sdw1', 40000) + def test_get_segment_tablespace_oid_locations_success(self, mock1, mock2, mock3): + get_segment_tablespace_oid_locations('sdw1', 40000) self.assertEqual(1, self.mock_logger.debug.call_count) self.assertEqual([call('Successfully got tablespace locations for segment with host sdw1, port 40000')], self.mock_logger.debug.call_args_list) diff --git a/gpMgmt/bin/gppylib/programs/clsRecoverSegment.py b/gpMgmt/bin/gppylib/programs/clsRecoverSegment.py index 0ceef210a637..e7088f758c06 100644 --- a/gpMgmt/bin/gppylib/programs/clsRecoverSegment.py +++ b/gpMgmt/bin/gppylib/programs/clsRecoverSegment.py @@ -98,7 +98,7 @@ def outputToFile(self, mirrorBuilder, gpArray, fileName): def getRecoveryActionsBasedOnOptions(self, gpEnv, gpArray): if self.__options.rebalanceSegments: - return GpSegmentRebalanceOperation(gpEnv, gpArray, self.__options.parallelDegree, self.__options.parallelPerHost) + return GpSegmentRebalanceOperation(gpEnv, gpArray, self.__options.parallelDegree, self.__options.parallelPerHost, self.__options.replayLag) else: instance = RecoveryTripletsFactory.instance(gpArray, self.__options.recoveryConfigFile, self.__options.newRecoverHosts, self.__options.parallelDegree) segs = [GpMirrorToBuild(t.failed, t.live, t.failover, self.__options.forceFullResynchronization, self.__options.differentialResynchronization) @@ -253,6 +253,17 @@ def run(self): if self.__options.differentialResynchronization and self.__options.outputSampleConfigFile: raise 
ProgramArgumentValidationException("Invalid -o provided with --differential argument") + if self.__options.replayLag and not self.__options.rebalanceSegments: + raise ProgramArgumentValidationException("--replay-lag should be used only with -r") + + # Check the rsync version before performing a differential recovery: + # the --info=progress2 option, which reports whole-transfer progress, requires rsync 3.1.0 or above. + min_rsync_ver = "3.1.0" + if self.__options.differentialResynchronization and not unix.validate_rsync_version(min_rsync_ver): + raise ProgramArgumentValidationException("To perform a differential recovery, a minimum rsync version " + "of {0} is required. Please ensure that rsync is updated to " + "version {0} or higher.".format(min_rsync_ver)) + faultProberInterface.getFaultProber().initializeProber(gpEnv.getMasterPort()) confProvider = configInterface.getConfigurationProvider().initializeProvider(gpEnv.getMasterPort()) @@ -461,6 +472,9 @@ def createParser(): addTo.add_option("-r", None, default=False, action='store_true', dest='rebalanceSegments', help='Rebalance synchronized segments.') + addTo.add_option("--replay-lag", None, type="float", + dest="replayLag", + metavar="", help='Allowed replay lag on the mirror, in GB (valid only with -r)') addTo.add_option('', '--hba-hostnames', action='store_true', dest='hba_hostnames', help='use hostnames instead of CIDR in pg_hba.conf') diff --git a/gpMgmt/bin/gppylib/programs/clsSystemState.py b/gpMgmt/bin/gppylib/programs/clsSystemState.py index b9d3b49d00c7..7b147a3996d8 100644 --- a/gpMgmt/bin/gppylib/programs/clsSystemState.py +++ b/gpMgmt/bin/gppylib/programs/clsSystemState.py @@ -77,6 +77,7 @@ def __str__(self): return self.__name VALUE_RECOVERY_TOTAL_BYTES = FieldDefinition("Total bytes (kB)", "recovery_total_bytes", "int") VALUE_RECOVERY_PERCENTAGE = FieldDefinition("Percentage completed", "recovery_percentage", "int") VALUE_RECOVERY_TYPE = FieldDefinition("Recovery type",
"recovery_type", "int") +VALUE_RECOVERY_STAGE = FieldDefinition("Stage", "recovery_stage", "text") CATEGORY__STATUS = "Status" VALUE__MASTER_REPORTS_STATUS = FieldDefinition("Configuration reports status as", "status_in_config", "text", "Config status") @@ -165,7 +166,7 @@ def __init__(self ): VALUE__ACTIVE_PID_INT, VALUE__POSTMASTER_PID_VALUE_INT, VALUE__POSTMASTER_PID_FILE, VALUE__POSTMASTER_PID_VALUE, VALUE__LOCK_FILES, VALUE_RECOVERY_COMPLETED_BYTES, VALUE_RECOVERY_TOTAL_BYTES, VALUE_RECOVERY_PERCENTAGE, - VALUE_RECOVERY_TYPE + VALUE_RECOVERY_TYPE, VALUE_RECOVERY_STAGE ]: self.__allValues[k] = True @@ -692,8 +693,14 @@ def logSegments(segments, logAsPairs, additionalFieldsToLog=[]): if segments_under_recovery: logger.info("----------------------------------------------------") logger.info("Segments in recovery") - logSegments(segments_under_recovery, False, [VALUE_RECOVERY_TYPE, VALUE_RECOVERY_COMPLETED_BYTES, VALUE_RECOVERY_TOTAL_BYTES, - VALUE_RECOVERY_PERCENTAGE]) + if data.getStrValue(segments_under_recovery[0], VALUE_RECOVERY_TYPE) == "differential": + logSegments(segments_under_recovery, False, + [VALUE_RECOVERY_TYPE, VALUE_RECOVERY_STAGE, VALUE_RECOVERY_COMPLETED_BYTES, + VALUE_RECOVERY_PERCENTAGE]) + else: + logSegments(segments_under_recovery, False, + [VALUE_RECOVERY_TYPE, VALUE_RECOVERY_COMPLETED_BYTES, VALUE_RECOVERY_TOTAL_BYTES, + VALUE_RECOVERY_PERCENTAGE]) exitCode = 1 # final output -- no errors, then log this message @@ -975,12 +982,26 @@ def _parse_recovery_progress_data(data, recovery_progress_file, gpArray): with open(recovery_progress_file, 'r') as fp: for line in fp: recovery_type, dbid, progress = line.strip().split(':',2) - pattern = re.compile(get_recovery_progress_pattern()) - if re.search(pattern, progress): - bytes, units, precentage_str = progress.strip().split(' ',2) - completed_bytes, total_bytes = bytes.split('/') - percentage = re.search(r'(\d+\%)', precentage_str).group() - recovery_progress_by_dbid[int(dbid)] = [recovery_type, 
completed_bytes, total_bytes, percentage] + # Define patterns for identifying different recovery types + rewind_bb_pattern = re.compile(get_recovery_progress_pattern()) + diff_pattern = re.compile(get_recovery_progress_pattern('differential')) + + # Check whether the progress matches the full/incremental or the differential recovery pattern + if re.search(rewind_bb_pattern, progress) or re.search(diff_pattern, progress): + stage, total_bytes = "", "" + if recovery_type == "differential": + # Differential progress: the stage name follows the last ':'; the byte count and percentage are the first two fields. + progress_parts = progress.strip().split(':') + stage = progress_parts[-1] + completed_bytes, percentage = progress_parts[0].split()[:2] + else: + # Process full or incremental recovery progress. + bytes, units, percentage_str = progress.strip().split(' ', 2) + completed_bytes, total_bytes = bytes.split('/') + percentage = re.search(r'(\d+\%)', percentage_str).group() + + recovery_progress_by_dbid[int(dbid)] = [recovery_type, completed_bytes, total_bytes, percentage, + stage] # Now the catalog update happens before we run recovery, # so now when we query gpArray here, it will have new address/port for the recovering segments @@ -990,12 +1011,20 @@ def _parse_recovery_progress_data(data, recovery_progress_file, gpArray): if dbid in recovery_progress_by_dbid.keys(): data.switchSegment(seg) recovery_progress_segs.append(seg) - recovery_type, completed_bytes, total_bytes, percentage = recovery_progress_by_dbid[dbid] + recovery_type, completed_bytes, total_bytes, percentage, stage = recovery_progress_by_dbid[dbid] + + # Add recovery progress values to GpstateData data.addValue(VALUE_RECOVERY_TYPE, recovery_type) data.addValue(VALUE_RECOVERY_COMPLETED_BYTES, completed_bytes) - data.addValue(VALUE_RECOVERY_TOTAL_BYTES, total_bytes) data.addValue(VALUE_RECOVERY_PERCENTAGE, percentage) + if recovery_type == "differential": + # If differential recovery, add stage information.
+ data.addValue(VALUE_RECOVERY_STAGE, stage) + else: + # For full or incremental recovery, add the total bytes information. + data.addValue(VALUE_RECOVERY_TOTAL_BYTES, total_bytes) + return recovery_progress_segs diff --git a/gpMgmt/bin/gppylib/test/unit/test_unit_gpcheckperf.py b/gpMgmt/bin/gppylib/test/unit/test_unit_gpcheckperf.py index 644790e41634..c7efb3e99081 100644 --- a/gpMgmt/bin/gppylib/test/unit/test_unit_gpcheckperf.py +++ b/gpMgmt/bin/gppylib/test/unit/test_unit_gpcheckperf.py @@ -1,13 +1,17 @@ import imp import os import sys -from mock import patch +from mock import patch, MagicMock from gppylib.test.unit.gp_unittest import GpTestCase,run_tests +from gppylib.util import ssh_utils class GpCheckPerf(GpTestCase): def setUp(self): - gpcheckcat_file = os.path.abspath(os.path.dirname(__file__) + "/../../../gpcheckperf") - self.subject = imp.load_source('gpcheckperf', gpcheckcat_file) + gpcheckperf_file = os.path.abspath(os.path.dirname(__file__) + "/../../../gpcheckperf") + self.subject = imp.load_source('gpcheckperf', gpcheckperf_file) + self.mocked_hostlist = MagicMock() + ssh_utils.HostList = MagicMock(return_value=self.mocked_hostlist) + def tearDown(self): super(GpCheckPerf, self).tearDown() @@ -83,13 +87,45 @@ def test_scp_enabled(self, mock_hostlist, mock_gpscp, mock_isScpEnabled): self.subject.main() mock_gpscp.assert_called_with(src, target) - def test_gpsync_failed_to_copy(self): + @patch('gpcheckperf.getHostList', return_value=['localhost', "invalid_host"]) + def test_gpsync_failed_to_copy(self, mock_hostlist): src = '%s/lib/multidd' % os.path.abspath(os.path.dirname(__file__) + "/../../../") target = '=:tmp/' - self.subject.GV.opt['-h'] = ['localhost', "invalid_host"] with self.assertRaises(SystemExit) as e: self.subject.gpsync(src, target) self.assertIn('[Error] command failed for host:invalid_host', e.exception.code) + + def test_get_host_list_with_host_file(self): + self.subject.GV.opt = {'-f': 'hostfile.txt', '-h': ['host1', 'host2']} +
self.mocked_hostlist.filterMultiHomedHosts.return_value = ['host3', 'host4'] + + result = self.subject.getHostList() + + self.assertEqual(result, ['host3', 'host4']) + self.mocked_hostlist.parseFile.assert_called_with('hostfile.txt') + self.mocked_hostlist.checkSSH.assert_called() + + + def test_get_host_list_without_host_file(self): + self.subject.GV.opt = {'-f': '', '-h': ['host1', 'host2']} + self.mocked_hostlist.filterMultiHomedHosts.return_value = ['host1', 'host2'] + + result = self.subject.getHostList() + + self.assertEqual(result, ['host1', 'host2']) + self.mocked_hostlist.add.assert_any_call('host1') + self.mocked_hostlist.add.assert_any_call('host2') + self.mocked_hostlist.checkSSH.assert_called() + + + def test_get_host_list_with_ssh_error(self): + self.mocked_hostlist.checkSSH.side_effect = ssh_utils.SSHError("Test ssh error") + + with self.assertRaises(SystemExit) as e: + self.subject.getHostList() + + self.assertEqual(e.exception.code, '[Error] Test ssh error') + if __name__ == '__main__': run_tests() diff --git a/gpMgmt/bin/gppylib/test/unit/test_unit_gprecoverseg.py b/gpMgmt/bin/gppylib/test/unit/test_unit_gprecoverseg.py index 17130b791b1a..ebe8f73848b3 100644 --- a/gpMgmt/bin/gppylib/test/unit/test_unit_gprecoverseg.py +++ b/gpMgmt/bin/gppylib/test/unit/test_unit_gprecoverseg.py @@ -24,6 +24,7 @@ def __init__(self): self.recoveryConfigFile = None self.outputSpareDataDirectoryFile = None self.rebalanceSegments = None + self.replayLag = None self.outputSampleConfigFile = None self.parallelDegree = 1 diff --git a/gpMgmt/bin/gppylib/test/unit/test_unit_gpsegrecovery.py b/gpMgmt/bin/gppylib/test/unit/test_unit_gpsegrecovery.py index e1918fa60b1a..56ba305cfa94 100644 --- a/gpMgmt/bin/gppylib/test/unit/test_unit_gpsegrecovery.py +++ b/gpMgmt/bin/gppylib/test/unit/test_unit_gpsegrecovery.py @@ -478,8 +478,12 @@ def test_pg_stop_backup_success(self, mock1, mock2): self.mock_logger.debug.call_args_list) 
@patch('gppylib.db.catalog.RemoteQueryCommand.get_results', - return_value=[['/data/mytblspace1'], ['/data/mytblspace2']]) - def test_sync_tablespaces_outside_data_dir(self, mock): + return_value=[['1111','/data/mytblspace1'], ['2222','/data/mytblspace2']]) + @patch('gpsegrecovery.get_remote_link_path', + return_value='/data/mytblspace1/2') + @patch('os.listdir') + @patch('os.symlink') + def test_sync_tablespaces_outside_data_dir(self, mock1,mock2,mock3,mock4): self.diff_recovery_cmd.sync_tablespaces() self.assertEqual(2, self.mock_rsync_init.call_count) self.assertEqual(2, self.mock_rsync_run.call_count) @@ -488,8 +492,10 @@ def test_sync_tablespaces_outside_data_dir(self, mock): self.mock_logger.debug.call_args_list) @patch('gppylib.db.catalog.RemoteQueryCommand.get_results', - return_value=[['/data/mirror0']]) - def test_sync_tablespaces_within_data_dir(self, mock): + return_value=[['1234','/data/primary0']]) + @patch('os.listdir') + @patch('os.symlink') + def test_sync_tablespaces_within_data_dir(self, mock, mock2,mock3): self.diff_recovery_cmd.sync_tablespaces() self.assertEqual(0, self.mock_rsync_init.call_count) self.assertEqual(0, self.mock_rsync_run.call_count) @@ -497,8 +503,12 @@ def test_sync_tablespaces_within_data_dir(self, mock): self.mock_logger.debug.call_args_list) @patch('gppylib.db.catalog.RemoteQueryCommand.get_results', - return_value=[['/data/mirror0'], ['/data/mytblspace1']]) - def test_sync_tablespaces_mix_data_dir(self, mock): + return_value=[['1111','/data/primary0'], ['2222','/data/mytblspace1']]) + @patch('gpsegrecovery.get_remote_link_path', + return_value='/data/mytblspace1/2') + @patch('os.listdir') + @patch('os.symlink') + def test_sync_tablespaces_mix_data_dir(self, mock1, mock2, mock3,mock4): self.diff_recovery_cmd.sync_tablespaces() self.assertEqual(1, self.mock_rsync_init.call_count) self.assertEqual(1, self.mock_rsync_run.call_count) diff --git a/gpMgmt/bin/gppylib/test/unit/test_unit_gpstate.py 
b/gpMgmt/bin/gppylib/test/unit/test_unit_gpstate.py index bd5b24383464..8be17eb0b020 100644 --- a/gpMgmt/bin/gppylib/test/unit/test_unit_gpstate.py +++ b/gpMgmt/bin/gppylib/test/unit/test_unit_gpstate.py @@ -61,11 +61,14 @@ def setUp(self): self.gpArrayMock = mock.MagicMock(spec=gparray.GpArray) self.gpArrayMock.getSegDbList.return_value = [self.primary1, self.primary2, self.primary3] - def check_recovery_fields(self, segment, type, completed, total, percentage): + def check_recovery_fields(self, segment, type, completed, total, percentage, stage=None): self.assertEqual(type, self.data.getStrValue(segment, VALUE_RECOVERY_TYPE)) self.assertEqual(completed, self.data.getStrValue(segment, VALUE_RECOVERY_COMPLETED_BYTES)) - self.assertEqual(total, self.data.getStrValue(segment, VALUE_RECOVERY_TOTAL_BYTES)) self.assertEqual(percentage, self.data.getStrValue(segment, VALUE_RECOVERY_PERCENTAGE)) + if type == "differential": + self.assertEqual(stage, self.data.getStrValue(segment, VALUE_RECOVERY_STAGE)) + else: + self.assertEqual(total, self.data.getStrValue(segment, VALUE_RECOVERY_TOTAL_BYTES)) def test_parse_recovery_progress_data_returns_empty_when_file_does_not_exist(self): self.assertEqual([], GpSystemStateProgram._parse_recovery_progress_data(self.data, '/file/does/not/exist', self.gpArrayMock)) @@ -88,12 +91,16 @@ def test_parse_recovery_progress_data_adds_recovery_progress_data_during_multipl with tempfile.NamedTemporaryFile() as f: f.write("full:1: 1164848/1371715 kB (0%), 0/1 tablespace (...t1/demoDataDir0/base/16384/40962)\n".encode("utf-8")) f.write("incremental:2: 1171384/1371875 kB (85%)anything can appear here".encode('utf-8')) + f.write("incremental:2: 1171384/1371875 kB (85%)anything can appear here\n".encode('utf-8')) + f.write( + "differential:3: 122,017,543 74% 74.02MB/s 0:00:01 (xfr#1994, to-chk=963/2979) :Syncing pg_data of dbid 1\n".encode( + "utf-8")) f.flush() - self.assertEqual([self.primary1, self.primary2], 
GpSystemStateProgram._parse_recovery_progress_data(self.data, f.name, self.gpArrayMock)) + self.assertEqual([self.primary1, self.primary2, self.primary3], GpSystemStateProgram._parse_recovery_progress_data(self.data, f.name, self.gpArrayMock)) self.check_recovery_fields(self.primary1,'full', '1164848', '1371715', '0%') self.check_recovery_fields(self.primary2, 'incremental', '1171384', '1371875', '85%') - self.check_recovery_fields(self.primary3, '', '', '', '') + self.check_recovery_fields(self.primary3, 'differential', '122,017,543', '', '74%', 'Syncing pg_data of dbid 1') def test_parse_recovery_progress_data_doesnt_adds_recovery_progress_data_only_for_completed_recoveries(self): with tempfile.NamedTemporaryFile() as f: @@ -126,6 +133,30 @@ def test_parse_recovery_progress_data_doesnt_adds_recovery_progress_data_only_fo self.check_recovery_fields(self.primary3, '', '', '', '') + def test_parse_recovery_progress_data_adds_differential_recovery_progress_data_during_single_recovery(self): + with tempfile.NamedTemporaryFile() as f: + f.write("differential:1: 38,861,653 7% 43.45MB/s 0:00:00 (xfr#635, ir-chk=9262/9919) :Syncing pg_data of dbid 1\n".encode("utf-8")) + f.flush() + self.assertEqual([self.primary1], GpSystemStateProgram._parse_recovery_progress_data(self.data, f.name, self.gpArrayMock)) + + self.check_recovery_fields(self.primary1, 'differential', '38,861,653', '', '7%', "Syncing pg_data of dbid 1") + self.check_recovery_fields(self.primary2, '', '', '', '') + self.check_recovery_fields(self.primary3, '', '', '', '') + + + def test_parse_recovery_progress_data_adds_differential_recovery_progress_data_during_multiple_recovery(self): + with tempfile.NamedTemporaryFile() as f: + f.write("differential:1: 38,861,653 7% 43.45MB/s 0:00:00 (xfr#635, ir-chk=9262/9919) :Syncing pg_data of dbid 1\n".encode("utf-8")) + f.write("differential:2: 122,017,543 74% 74.02MB/s 0:00:01 (xfr#1994, to-chk=963/2979) :Syncing tablespace of dbid 2 for oid 17934\n".encode("utf-8")) 
+ f.write("differential:3: 122,017,543 (74%) 74.02MB/s 0:00:01 (xfr#1994, to-chk=963/2979) :Invalid format\n".encode("utf-8")) + f.flush() + self.assertEqual([self.primary1, self.primary2], GpSystemStateProgram._parse_recovery_progress_data(self.data, f.name, self.gpArrayMock)) + + self.check_recovery_fields(self.primary1, 'differential', '38,861,653', '', '7%', "Syncing pg_data of dbid 1") + self.check_recovery_fields(self.primary2, 'differential', '122,017,543', '', '74%', "Syncing tablespace of dbid 2 for oid 17934") + self.check_recovery_fields(self.primary3, '', '', '', '') + + class ReplicationInfoTestCase(unittest.TestCase): """ A test case for GpSystemStateProgram._add_replication_info(). diff --git a/gpMgmt/bin/gppylib/test/unit/test_unit_rebalance_segment.py b/gpMgmt/bin/gppylib/test/unit/test_unit_rebalance_segment.py index 39dafa75d203..2cda90f970ce 100644 --- a/gpMgmt/bin/gppylib/test/unit/test_unit_rebalance_segment.py +++ b/gpMgmt/bin/gppylib/test/unit/test_unit_rebalance_segment.py @@ -4,6 +4,7 @@ from gppylib.gparray import GpArray, Segment from gppylib.commands.base import CommandResult from gppylib.operations.rebalanceSegments import GpSegmentRebalanceOperation +from gppylib.operations.rebalanceSegments import replay_lag class RebalanceSegmentsTestCase(GpTestCase): @@ -11,10 +12,15 @@ def setUp(self): self.pool = Mock() self.pool.getCompletedItems.return_value = [] + mock_logger = Mock(spec=['log', 'warn', 'info', 'debug', 'error', 'warning', 'fatal']) + self.apply_patches([ patch("gppylib.commands.base.WorkerPool.__init__", return_value=None), patch("gppylib.commands.base.WorkerPool", return_value=self.pool), patch('gppylib.programs.clsRecoverSegment.GpRecoverSegmentProgram'), + patch('gppylib.operations.rebalanceSegments.logger', return_value=mock_logger), + patch('gppylib.db.dbconn.connect', autospec=True), + patch('gppylib.db.dbconn.execSQLForSingleton', return_value='5678') ]) self.mock_gp_recover_segment_prog_class = 
self.get_mock_from_apply_patch('GpRecoverSegmentProgram') @@ -32,8 +38,10 @@ def setUp(self): self.success_command_mock.get_results.return_value = CommandResult( 0, "stdout success text", "stderr text", True, False) - self.subject = GpSegmentRebalanceOperation(Mock(), self._create_gparray_with_2_primary_2_mirrors(), 1, 1) - self.subject.logger = Mock() + self.subject = GpSegmentRebalanceOperation(Mock(), self._create_gparray_with_2_primary_2_mirrors(), 1, 1, 10) + self.subject.logger = Mock(spec=['log', 'warn', 'info', 'debug', 'error', 'warning', 'fatal']) + + self.mock_logger = self.get_mock_from_apply_patch('logger') def tearDown(self): super(RebalanceSegmentsTestCase, self).tearDown() @@ -58,6 +66,37 @@ def test_rebalance_returns_failure(self): result = self.subject.rebalance() self.assertFalse(result) + @patch('gppylib.db.dbconn.execSQLForSingleton', return_value='56780000000') + def test_rebalance_returns_warning(self, mock1): + with self.assertRaises(Exception) as ex: + self.subject.rebalance() + self.assertEqual('56780000000 bytes of xlog is still to be replayed on mirror with dbid 2, let mirror catchup ' + 'on replay then trigger rebalance. Use --replay-lag to configure the allowed replay lag limit.' 
+ , str(ex.exception)) + self.assertEqual([call("Get replay lag on mirror of primary segment with host:sdw1, port:40000")], + self.mock_logger.debug.call_args_list) + self.assertEqual([call("Determining primary and mirror segment pairs to rebalance"), + call('Allowed replay lag during rebalance is 10 GB')], + self.subject.logger.info.call_args_list) + + @patch('gppylib.db.dbconn.execSQLForSingleton', return_value='5678000000') + def test_rebalance_does_not_return_warning(self, mock1): + self.subject.rebalance() + self.assertEqual([call("Get replay lag on mirror of primary segment with host:sdw1, port:40000")], + self.mock_logger.debug.call_args_list) + + @patch('gppylib.db.dbconn.connect', side_effect=Exception()) + def test_replay_lag_connect_exception(self, mock1): + with self.assertRaises(Exception) as ex: + replay_lag(self.primary0) + self.assertEqual('Failed to query pg_stat_replication for host:sdw1, port:40000, error: ', str(ex.exception)) + + @patch('gppylib.db.dbconn.execSQLForSingleton', side_effect=Exception()) + def test_replay_lag_query_exception(self, mock1): + with self.assertRaises(Exception) as ex: + replay_lag(self.primary0) + self.assertEqual('Failed to query pg_stat_replication for host:sdw1, port:40000, error: ', str(ex.exception)) + def _create_gparray_with_2_primary_2_mirrors(self): master = Segment.initFromString( "1|-1|p|p|s|u|mdw|mdw|5432|/data/master") diff --git a/gpMgmt/doc/gprecoverseg_help b/gpMgmt/doc/gprecoverseg_help index 2e9fdf9ef5f4..7ac114c0e2f1 100755 --- a/gpMgmt/doc/gprecoverseg_help +++ b/gpMgmt/doc/gprecoverseg_help @@ -14,7 +14,7 @@ gprecoverseg [-p [,...]] [-F] [-a] [-q] [-s] [--no-progress] [-l ] -gprecoverseg -r +gprecoverseg -r [--replay-lag ] gprecoverseg -o @@ -243,6 +243,12 @@ their preferred roles. All segments must be valid and synchronized before running gprecoverseg -r. If there are any in progress queries, they will be cancelled and rolled back. 
+--replay-lag +Replay lag (in GB) allowed on the mirror when rebalancing the segments. If the replay lag +(flush_lsn - replay_lsn) is more than the value provided with this option, the rebalance +will be aborted. + + -s Show pg_rewind/pg_basebackup progress sequentially instead of inplace. Useful diff --git a/gpMgmt/sbin/gpsegrecovery.py b/gpMgmt/sbin/gpsegrecovery.py index 4ed2efa11369..65ccc57c091e 100644 --- a/gpMgmt/sbin/gpsegrecovery.py +++ b/gpMgmt/sbin/gpsegrecovery.py @@ -11,9 +11,11 @@ from gppylib.commands.gp import SegmentStart from gppylib.gparray import Segment from gppylib.commands.gp import ModifyConfSetting +from gppylib.db import dbconn from gppylib.db.catalog import RemoteQueryCommand from gppylib.operations.get_segments_in_recovery import is_seg_in_backup_mode -from gppylib.operations.segment_tablespace_locations import get_segment_tablespace_locations +from gppylib.operations.segment_tablespace_locations import get_segment_tablespace_oid_locations +from gppylib.commands.unix import get_remote_link_path class FullRecovery(Command): @@ -172,20 +174,27 @@ def sync_pg_data(self): "current_logfiles.tmp", "postmaster.pid", "postmaster.opts", - "pg_internal.init", "internal.auto.conf", "pg_dynshmem", + # The tablespace_map file is generated on the primary when pg_start_backup is called; it contains the target + # link of each tablespace, e.g. "17264 /tmp/testtblspc/6". If it is not in the exclude list, the file gets + # copied to the mirror, and when the segment is started after recovery, the presence of tablespace_map in the + # mirror's data directory causes the symlinks listed in that file to be recreated. + # Because the file's content comes from the primary segment, this would create wrong symlinks for the + # tablespaces.
+ "tablespace_map", "pg_notify/*", "pg_replslot/*", "pg_serial/*", "pg_stat_tmp/*", "pg_snapshots/*", "pg_subtrans/*", + "pg_tblspc/*", # excluding as the tablespace is handled in sync_tablespaces() "backups/*", "/db_dumps", # as we exclude during pg_basebackup "gpperfmon/data", # as we exclude during pg_basebackup "gpperfmon/logs", # as we exclude during pg_basebackup - "/promote", # Need to check why do we exclude it during pg_basebackup + "/promote", # as we exclude during pg_basebackup ] """ Rsync options used: @@ -201,7 +210,8 @@ def sync_pg_data(self): # os.path.join(dir, "") will append a '/' at the end of dir. When using "/" at the end of source, # rsync will copy the content of the last directory. When not using "/" at the end of source, rsync # will copy the last directory and the content of the directory. - cmd = Rsync(name="Sync pg data_dir", srcFile=os.path.join(self.recovery_info.source_datadir, ""), + cmd = Rsync(name='Syncing pg_data of dbid {}'.format(self.recovery_info.target_segment_dbid), + srcFile=os.path.join(self.recovery_info.source_datadir, ""), dstFile=self.recovery_info.target_datadir, srcHost=self.recovery_info.source_hostname, exclude_list=rsync_exclude_list, delete=True, checksum=True, progress=True, progress_file=self.recovery_info.progress_file) @@ -250,7 +260,7 @@ def sync_xlog_and_control_file(self): # os.path.join(dir, "") will append a '/' at the end of dir. When using "/" at the end of source, # rsync will copy the content of the last directory. When not using "/" at the end of source, rsync # will copy the last directory and the content of the directory. 
- cmd = Rsync(name="Sync pg_xlog files", srcFile=os.path.join(self.recovery_info.source_datadir, "pg_xlog", ""), + cmd = Rsync(name="Syncing pg_xlog files of dbid {}".format(self.recovery_info.target_segment_dbid), srcFile=os.path.join(self.recovery_info.source_datadir, "pg_xlog", ""), dstFile=os.path.join(self.recovery_info.target_datadir, "pg_xlog", ""), progress=True, checksum=True, srcHost=self.recovery_info.source_hostname, progress_file=self.recovery_info.progress_file) @@ -269,24 +279,46 @@ def sync_tablespaces(self): "Syncing tablespaces of dbid {} which are outside of data_dir".format( self.recovery_info.target_segment_dbid)) - # get the tablespace locations - tablespaces = get_segment_tablespace_locations(self.recovery_info.source_hostname, + # get the oid and tablespace locations + tablespaces = get_segment_tablespace_oid_locations(self.recovery_info.source_hostname, self.recovery_info.source_port) - for tablespace_location in tablespaces: - if tablespace_location[0].startswith(self.recovery_info.target_datadir): - continue - # os.path.join(dir, "") will append a '/' at the end of dir. When using "/" at the end of source, - # rsync will copy the content of the last directory. When not using "/" at the end of source, rsync - # will copy the last directory and the content of the directory. - cmd = Rsync(name="Sync tablespace", - srcFile=os.path.join(tablespace_location[0], ""), - dstFile=tablespace_location[0], - srcHost=self.recovery_info.source_hostname, - progress=True, - checksum=True, - progress_file=self.recovery_info.progress_file) - cmd.run(validateAfter=True) + # Clear all tablespace symlinks for the target. + for file in os.listdir(os.path.join(self.recovery_info.target_datadir,"pg_tblspc")): + file_path = os.path.join(self.recovery_info.target_datadir,"pg_tblspc",file) + try: + if os.path.isfile(file_path) or os.path.islink(file_path): + os.unlink(file_path) + except Exception as e: + raise Exception("Failed to remove link {} for dbid {} : {}".
+ format(file_path,self.recovery_info.target_segment_dbid, str(e))) + + for oid, tablespace_location in tablespaces: + # tablespace_location is the base path of the tablespace. Each segment stores its data files in a + # subdirectory named after its dbid, and $DATADIR/pg_tblspc/{oid} is a symlink to that subdirectory. + targetOidPath = os.path.join(self.recovery_info.target_datadir, "pg_tblspc", str(oid)) + targetPath = os.path.join(tablespace_location, str(self.recovery_info.target_segment_dbid)) + + # If the tablespace is outside the data directory, copy it with rsync. If it is inside the data directory, + # its files have already been copied by the rsync of the data directory. + if not tablespace_location.startswith(self.recovery_info.source_datadir): + srcOidPath = os.path.join(self.recovery_info.source_datadir, "pg_tblspc", str(oid)) + srcPath = get_remote_link_path(srcOidPath,self.recovery_info.source_hostname) + + # os.path.join(dir, "") will append a '/' at the end of dir. When using "/" at the end of source, + # rsync will copy the content of the last directory. When not using "/" at the end of source, rsync + # will copy the last directory and the content of the directory. + cmd = Rsync(name="Syncing tablespace of dbid {0} for oid {1}" .format(self.recovery_info.target_segment_dbid, str(oid)), + srcFile=os.path.join(srcPath, ""), + dstFile=targetPath, + srcHost=self.recovery_info.source_hostname, + progress=True, + checksum=True, + progress_file=self.recovery_info.progress_file) + cmd.run(validateAfter=True) + + # create tablespace symlink for target data directory.
+ os.symlink(targetPath, targetOidPath) def start_segment(recovery_info, logger, era): diff --git a/gpMgmt/test/behave/mgmt_utils/analyzedb.feature b/gpMgmt/test/behave/mgmt_utils/analyzedb.feature index 4165310d6426..a673f88a0897 100644 --- a/gpMgmt/test/behave/mgmt_utils/analyzedb.feature +++ b/gpMgmt/test/behave/mgmt_utils/analyzedb.feature @@ -1777,3 +1777,11 @@ Feature: Incrementally analyze the database And the user runs "dropdb schema_with_temp_table" And the user drops the named connection "default" + Scenario: analyzedb finds materialized views + Given a materialized view "public.mv_test_view" exists on table "pg_class" + And the user runs "analyzedb -a -d incr_analyze" + Then analyzedb should print "-public.mv_test_view" to stdout + And the user runs "analyzedb -a -s public -d incr_analyze" + Then analyzedb should print "-public.mv_test_view" to stdout + And the user runs "analyzedb -a -t public.mv_test_view -d incr_analyze" + Then analyzedb should print "-public.mv_test_view" to stdout diff --git a/gpMgmt/test/behave/mgmt_utils/gpcheckperf.feature b/gpMgmt/test/behave/mgmt_utils/gpcheckperf.feature index 7f1764c25502..a39977350b21 100644 --- a/gpMgmt/test/behave/mgmt_utils/gpcheckperf.feature +++ b/gpMgmt/test/behave/mgmt_utils/gpcheckperf.feature @@ -126,3 +126,16 @@ Feature: Tests for gpcheckperf Then gpcheckperf should return a return code of 0 And gpcheckperf should print "--buffer-size value is not specified or invalid. 
Using default \(32 kilobytes\)" to stdout And gpcheckperf should print "avg = " to stdout + + + @concourse_cluster + Scenario: gpcheckperf runs sequential network test with hostfile + Given the database is running + Given the user runs command "echo -e "cdw\nsdw1" > /tmp/hostfile_gpchecknet" + When the user runs "gpcheckperf -f /tmp/hostfile_gpchecknet -d /data/gpdata/ -r n" + Then gpcheckperf should return a return code of 0 + And gpcheckperf should print the following lines 1 times to stdout + """ + cdw -> sdw1 + sdw1 -> cdw + """ diff --git a/gpMgmt/test/behave/mgmt_utils/gpexpand.feature b/gpMgmt/test/behave/mgmt_utils/gpexpand.feature index 246d70d4a675..60515272ed80 100644 --- a/gpMgmt/test/behave/mgmt_utils/gpexpand.feature +++ b/gpMgmt/test/behave/mgmt_utils/gpexpand.feature @@ -229,6 +229,33 @@ Feature: expand the cluster by adding more segments When the user runs gpexpand to redistribute Then the tablespace is valid after gpexpand + @gpexpand_no_mirrors + Scenario: expand a cluster with tablespace when there is no tablespace configuration file + Given the database is not running + And a working directory of the test as '/data/gpdata/gpexpand' + And the user runs command "rm -rf /data/gpdata/gpexpand/*" + And a temporary directory under "/data/gpdata/gpexpand/expandedData" to expand into + And a cluster is created with no mirrors on "cdw" and "sdw1" + And database "gptest" exists + And a tablespace is created with data + And another tablespace is created with data + And there are no gpexpand_inputfiles + And the cluster is setup for an expansion on hosts "cdw" + And the user runs gpexpand interview to add 1 new segment and 0 new host "ignore.host" + And the number of segments have been saved + And there are no gpexpand tablespace input configuration files + When the user runs gpexpand with the latest gpexpand_inputfile without ret code check + Then gpexpand should return a return code of 1 + And gpexpand should print "[WARNING]:-Could not locate tablespace 
input configuration file" escaped to stdout + And gpexpand should print "A new tablespace input configuration file is written to" escaped to stdout + And gpexpand should print "Please review the file and re-run with: gpexpand -i" escaped to stdout + And verify if a gpexpand tablespace input configuration file is created + When the user runs gpexpand with the latest gpexpand_inputfile with additional parameters "--silent" + And verify that the cluster has 1 new segments + And all the segments are running + When the user runs gpexpand to redistribute + Then the tablespace is valid after gpexpand + @gpexpand_verify_redistribution Scenario: Verify data is correctly redistributed after expansion Given the database is not running diff --git a/gpMgmt/test/behave/mgmt_utils/gprecoverseg.feature b/gpMgmt/test/behave/mgmt_utils/gprecoverseg.feature index 9c1aed722251..d71151b2529d 100644 --- a/gpMgmt/test/behave/mgmt_utils/gprecoverseg.feature +++ b/gpMgmt/test/behave/mgmt_utils/gprecoverseg.feature @@ -1,16 +1,17 @@ @gprecoverseg Feature: gprecoverseg tests - Scenario Outline: recovery works with tablespaces + Scenario Outline: recovery works with tablespaces Given the database is running - And a tablespace is created with data And user stops all primary processes And user can start transactions + And a tablespace is created with data When the user runs "gprecoverseg " Then gprecoverseg should return a return code of 0 And the segments are synchronized And verify replication slot internal_wal_replication_slot is available on all the segments And the tablespace is valid + And the tablespace has valid symlink And the database segments are in execute mode Given another tablespace is created with data @@ -19,6 +20,7 @@ Feature: gprecoverseg tests And the segments are synchronized And verify replication slot internal_wal_replication_slot is available on all the segments And the tablespace is valid + And the tablespace has valid symlink And the other tablespace is valid And the 
database segments are in execute mode Examples: @@ -108,6 +110,51 @@ Feature: gprecoverseg tests And verify replication slot internal_wal_replication_slot is available on all the segments And the cluster is rebalanced + @concourse_cluster + Scenario: gpstate track of differential recovery for single host + Given the database is running + And all files in gpAdminLogs directory are deleted on all hosts in the cluster + And user immediately stops all mirror processes for content 0 + And the user waits until mirror on content 0 is down + And user can start transactions + And sql "DROP TABLE IF EXISTS test_recoverseg; CREATE TABLE test_recoverseg AS SELECT generate_series(1,100000000) AS a;" is executed in "postgres" db + And sql "DROP TABLE IF EXISTS test_recoverseg_1; CREATE TABLE test_recoverseg_1 AS SELECT generate_series(1,100000000) AS a;" is executed in "postgres" db + When the user asynchronously runs "gprecoverseg -a --differential" and the process is saved + Then the user waits until recovery_progress.file is created in gpAdminLogs and verifies that all dbids progress with pg_data are present + When the user runs "gpstate -e" + Then gpstate should print "Segments in recovery" to stdout + And gpstate output contains "differential" entries for mirrors of content 0 + And gpstate output looks like + | Segment | Port | Recovery type | Stage | Completed bytes \(kB\) | Percentage completed | + | \S+ | [0-9]+ | differential | Syncing pg_data of dbid 6 | ([\d,]+)[ \t] | \d+% | + And the user waits until saved async process is completed + And all files in gpAdminLogs directory are deleted on all hosts in the cluster + And sql "DROP TABLE IF EXISTS test_recoverseg;" is executed in "postgres" db + And sql "DROP TABLE IF EXISTS test_recoverseg_1;" is executed in "postgres" db + And the cluster is rebalanced + + + @concourse_cluster + Scenario: check Tablespace Recovery Progress with gpstate + Given the database is running + And all files in gpAdminLogs directory are 
deleted on all hosts in the cluster + And user immediately stops all mirror processes for content 0 + And user can start transactions + And a tablespace is created with data + And insert additional data into the tablespace + When the user asynchronously runs "gprecoverseg -a --differential" and the process is saved + Then the user waits until recovery_progress.file is created in gpAdminLogs and verifies that all dbids progress with tablespace are present + When the user runs "gpstate -e" + Then gpstate should print "Segments in recovery" to stdout + And gpstate output contains "differential" entries for mirrors of content 0 + And gpstate output looks like + | Segment | Port | Recovery type | Stage | Completed bytes \(kB\) | Percentage completed | + | \S+ | [0-9]+ | differential | Syncing tablespace of dbid 6 for oid \d+ | ([\d,]+)[ \t] | \d+% | + And the user waits until saved async process is completed + And all files in gpAdminLogs directory are deleted on all hosts in the cluster + And the cluster is rebalanced + + Scenario: full recovery works with tablespaces Given the database is running And a tablespace is created with data @@ -296,7 +343,33 @@ Feature: gprecoverseg tests And all the segments are running And the segments are synchronized - Scenario: gprecoverseg differential recovery displays rsync progress to the user + Scenario: gprecoverseg runs with given master data directory option + Given the database is running + And all the segments are running + And the segments are synchronized + And user stops all mirror processes + And user can start transactions + And "MASTER_DATA_DIRECTORY" environment variable is not set + Then the user runs utility "gprecoverseg" with master data directory and "-F -a" + And gprecoverseg should return a return code of 0 + And "MASTER_DATA_DIRECTORY" environment variable should be restored + And all the segments are running + And the segments are synchronized + + Scenario: gprecoverseg prioritizes given master data directory
over env option + Given the database is running + And all the segments are running + And the segments are synchronized + And user stops all mirror processes + And user can start transactions + And the environment variable "MASTER_DATA_DIRECTORY" is set to "/tmp/" + Then the user runs utility "gprecoverseg" with master data directory and "-F -a" + And gprecoverseg should return a return code of 0 + And "MASTER_DATA_DIRECTORY" environment variable should be restored + And all the segments are running + And the segments are synchronized + + Scenario: gprecoverseg differential recovery displays rsync progress to the user Given the database is running And all the segments are running And the segments are synchronized @@ -600,6 +673,19 @@ Feature: gprecoverseg tests And gprecoverseg should return a return code of 0 Then the cluster is rebalanced + Scenario: gprecoverseg errors out with restricted options + Given the database is running + And user stops all primary processes + And user can start transactions + When the user runs "gprecoverseg xyz" + Then gprecoverseg should return a return code of 2 + And gprecoverseg should print "Recovers a primary or mirror segment instance" to stdout + And gprecoverseg should print "too many arguments: only options may be specified" to stdout + When the user runs "gprecoverseg -a" + Then gprecoverseg should return a return code of 0 + And the segments are synchronized + And the cluster is rebalanced + Scenario: gprecoverseg keeps segment logs Given the database is running And all the segments are running @@ -659,13 +745,14 @@ Feature: gprecoverseg tests @concourse_cluster Scenario Outline: incremental recovery works with tablespaces on a multi-host environment Given the database is running - And a tablespace is created with data And user stops all primary processes And user can start transactions + And a tablespace is created with data When the user runs "gprecoverseg " Then gprecoverseg should return a return code of 0 And the 
segments are synchronized And the tablespace is valid + And the tablespace has valid symlink And the database segments are in execute mode Given another tablespace is created with data @@ -674,6 +761,7 @@ Feature: gprecoverseg tests And the segments are synchronized And verify replication slot internal_wal_replication_slot is available on all the segments And the tablespace is valid + And the tablespace has valid symlink And the other tablespace is valid And the database segments are in execute mode Examples: @@ -721,6 +809,7 @@ Feature: gprecoverseg tests # verify the data And the tablespace is valid + And the tablespace has valid symlink And the row count from table "public.before_host_is_down" in "gptest" is verified against the saved data And the row count from table "public.after_host_is_down" in "gptest" is verified against the saved data @@ -1422,6 +1511,7 @@ Feature: gprecoverseg tests And the segments are synchronized And the backup pid file is deleted on "primary" segment And the background pid is killed on "primary" segment + Examples: | scenario | args | | differential | -a --differential | @@ -1889,6 +1979,30 @@ Feature: gprecoverseg tests Then gprecoverseg should return a return code of 0 And the cluster is rebalanced + @demo_cluster + @concourse_cluster + Scenario: gprecoverseg rebalance aborts and throws exception if replay lag on mirror is more than or equal to the allowed limit + Given the database is running + And all the segments are running + And the segments are synchronized + And all files in gpAdminLogs directory are deleted on all hosts in the cluster + And user immediately stops all primary processes for content 0 + And user can start transactions + When the user runs "gprecoverseg -av --replay-lag 10" + Then gprecoverseg should return a return code of 2 + And gprecoverseg should print "--replay-lag should be used only with -r" to stdout + When the user runs "gprecoverseg -av" + Then gprecoverseg should return a return code of 0 + When the 
user runs "gprecoverseg -ar --replay-lag 0" + Then gprecoverseg should return a return code of 2 + And gprecoverseg should print "Allowed replay lag during rebalance is 0.0 GB" to stdout + And gprecoverseg should print ".* bytes of xlog is still to be replayed on mirror with dbid.*, let mirror catchup on replay then trigger rebalance" regex to logfile + When the user runs "gprecoverseg -ar" + Then gprecoverseg should return a return code of 0 + And all the segments are running + And user can start transactions + + @remove_rsync_bash @concourse_cluster Scenario: None of the accumulated wal (after running pg_start_backup and before copying the pg_control file) is lost during differential @@ -1909,3 +2023,24 @@ Feature: gprecoverseg tests And user can start transactions Then the row count of table test_recoverseg in "postgres" should be 2000 And the cluster is recovered in full and rebalanced + + + @demo_cluster + @concourse_cluster + Scenario: Cleanup orphaned directory of dropped database after differential recovery + Given the database is running + And all the segments are running + And the segments are synchronized + And the user runs psql with "-c 'CREATE DATABASE test_orphan_dir'" against database "template1" + And save the information of the database "test_orphan_dir" + And the "primary" segment information is saved + And the primary on content 0 is stopped + And user can start transactions + And the user runs psql with "-c 'DROP DATABASE test_orphan_dir'" against database "template1" + When the user runs "gprecoverseg -a --differential" + Then gprecoverseg should return a return code of 0 + And the user runs psql with "-c 'SELECT gp_request_fts_probe_scan()'" against database "template1" + And the status of the primary on content 0 should be "u" + Then verify deletion of orphaned directory of the dropped database + And the cluster is rebalanced + diff --git a/gpMgmt/test/behave/mgmt_utils/gpstart.feature b/gpMgmt/test/behave/mgmt_utils/gpstart.feature index 
ce6787ebac22..4c7c21c74a8d 100644 --- a/gpMgmt/test/behave/mgmt_utils/gpstart.feature +++ b/gpMgmt/test/behave/mgmt_utils/gpstart.feature @@ -27,6 +27,32 @@ Feature: gpstart behave tests And gpstart should return a return code of 0 And all the segments are running + @demo_cluster + Scenario: gpstart runs with given master data directory option + Given the database is running + And running postgres processes are saved in context + And the user runs "gpstop -a" + And gpstop should return a return code of 0 + And verify no postgres process is running on all hosts + And "MASTER_DATA_DIRECTORY" environment variable is not set + Then the user runs utility "gpstart" with master data directory and "-a" + And gpstart should return a return code of 0 + And "MASTER_DATA_DIRECTORY" environment variable should be restored + And all the segments are running + + @demo_cluster + Scenario: gpstart prioritizes given master data directory over env option + Given the database is running + And running postgres processes are saved in context + And the user runs "gpstop -a" + And gpstop should return a return code of 0 + And verify no postgres process is running on all hosts + And the environment variable "MASTER_DATA_DIRECTORY" is set to "/tmp/" + Then the user runs utility "gpstart" with master data directory and "-a" + And gpstart should return a return code of 0 + And "MASTER_DATA_DIRECTORY" environment variable should be restored + And all the segments are running + @concourse_cluster @demo_cluster Scenario: gpstart starts even if a segment host is unreachable diff --git a/gpMgmt/test/behave/mgmt_utils/gpstate.feature b/gpMgmt/test/behave/mgmt_utils/gpstate.feature index d126cf8d191d..869b5c29abc1 100644 --- a/gpMgmt/test/behave/mgmt_utils/gpstate.feature +++ b/gpMgmt/test/behave/mgmt_utils/gpstate.feature @@ -596,6 +596,55 @@ Feature: gpstate tests And the pg_log files on primary segments should not contain "connections to primary segments are not allowed" And the user drops
log_timestamp table + Scenario: gpstate runs with given master data directory option + Given the cluster is generated with "3" primaries only + And "MASTER_DATA_DIRECTORY" environment variable is not set + Then the user runs utility "gpstate" with master data directory and "-a -b" + And gpstate should return a return code of 0 + And gpstate output has rows with keys values + | Master instance = Active | + | Master standby = No master standby configured | + | Total segment instance count from metadata = 3 | + | Primary Segment Status | + | Total primary segments = 3 | + | Total primary segment valid \(at master\) = 3 | + | Total primary segment failures \(at master\) = 0 | + | Total number of postmaster.pid files missing = 0 | + | Total number of postmaster.pid files found = 3 | + | Total number of postmaster.pid PIDs missing = 0 | + | Total number of postmaster.pid PIDs found = 3 | + | Total number of /tmp lock files missing = 0 | + | Total number of /tmp lock files found = 3 | + | Total number postmaster processes missing = 0 | + | Total number postmaster processes found = 3 | + | Mirror Segment Status | + | Mirrors not configured on this array + And "MASTER_DATA_DIRECTORY" environment variable should be restored + + Scenario: gpstate prioritizes given master data directory over env option + Given the cluster is generated with "3" primaries only + And the environment variable "MASTER_DATA_DIRECTORY" is set to "/tmp/" + Then the user runs utility "gpstate" with master data directory and "-a -b" + And gpstate should return a return code of 0 + And gpstate output has rows with keys values + | Master instance = Active | + | Master standby = No master standby configured | + | Total segment instance count from metadata = 3 | + | Primary Segment Status | + | Total primary segments = 3 | + | Total primary segment valid \(at master\) = 3 | + | Total primary segment failures \(at master\) = 0 | + | Total number of postmaster.pid files missing = 0 | + | Total number of
postmaster.pid files found = 3 | + | Total number of postmaster.pid PIDs missing = 0 | + | Total number of postmaster.pid PIDs found = 3 | + | Total number of /tmp lock files missing = 0 | + | Total number of /tmp lock files found = 3 | + | Total number postmaster processes missing = 0 | + | Total number postmaster processes found = 3 | + | Mirror Segment Status | + | Mirrors not configured on this array | + And "MASTER_DATA_DIRECTORY" environment variable should be restored ########################### @concourse_cluster tests ########################### # The @concourse_cluster tag denotes the scenario that requires a remote cluster @@ -607,3 +656,21 @@ Feature: gpstate tests And the user runs command "unset PGDATABASE && $GPHOME/bin/gpstate -e -v" Then command should print "pg_isready -q -h .* -p .* -d postgres" to stdout And command should print "All segments are running normally" to stdout + + + Scenario: gpstate -e shows information about segments with ongoing differential recovery + Given a standard local demo cluster is running + Given all files in gpAdminLogs directory are deleted + And a sample recovery_progress.file is created with ongoing differential recoveries in gpAdminLogs + And we run a sample background script to generate a pid on "master" segment + And a sample gprecoverseg.lock directory is created using the background pid in master_data_directory + When the user runs "gpstate -e" + Then gpstate should print "Segments in recovery" to stdout + And gpstate output contains "differential,differential" entries for mirrors of content 0,1 + And gpstate output looks like + | Segment | Port | Recovery type | Stage | Completed bytes \(kB\) | Percentage completed | + | \S+ | [0-9]+ | differential | Syncing pg_data of dbid 5 | 16,454,866 | 4% | + | \S+ | [0-9]+ | differential | Syncing tablespace of dbid 6 for oid 20516 | 8,192 | 100% | + And all files in gpAdminLogs directory are deleted + And the background pid is killed on "master" segment + And the
gprecoverseg lock directory is removed diff --git a/gpMgmt/test/behave/mgmt_utils/gpstop.feature b/gpMgmt/test/behave/mgmt_utils/gpstop.feature index bc1bae6f029c..3336fd5be3ed 100644 --- a/gpMgmt/test/behave/mgmt_utils/gpstop.feature +++ b/gpMgmt/test/behave/mgmt_utils/gpstop.feature @@ -10,6 +10,26 @@ Feature: gpstop behave tests Then gpstop should return a return code of 0 And verify no postgres process is running on all hosts + @demo_cluster + Scenario: gpstop runs with given master data directory option + Given the database is running + And running postgres processes are saved in context + And "MASTER_DATA_DIRECTORY" environment variable is not set + Then the user runs utility "gpstop" with master data directory and "-a" + And gpstop should return a return code of 0 + And "MASTER_DATA_DIRECTORY" environment variable should be restored + And verify no postgres process is running on all hosts + + @demo_cluster + Scenario: gpstop prioritizes given master data directory over env option + Given the database is running + And running postgres processes are saved in context + And the environment variable "MASTER_DATA_DIRECTORY" is set to "/tmp/" + Then the user runs utility "gpstop" with master data directory and "-a" + And gpstop should return a return code of 0 + And "MASTER_DATA_DIRECTORY" environment variable should be restored + And verify no postgres process is running on all hosts + @concourse_cluster @demo_cluster Scenario: when there are user connections gpstop waits to shutdown until user switches to fast mode diff --git a/gpMgmt/test/behave/mgmt_utils/steps/analyzedb_mgmt_utils.py b/gpMgmt/test/behave/mgmt_utils/steps/analyzedb_mgmt_utils.py index f418cdbd703f..2ce4aacce7d6 100644 --- a/gpMgmt/test/behave/mgmt_utils/steps/analyzedb_mgmt_utils.py +++ b/gpMgmt/test/behave/mgmt_utils/steps/analyzedb_mgmt_utils.py @@ -35,7 +35,6 @@ """ - @given('there is a regular "{storage_type}" table "{tablename}" with column name list "{col_name_list}" and column type list
"{col_type_list}" in schema "{schemaname}"') def impl(context, storage_type, tablename, col_name_list, col_type_list, schemaname): schemaname_no_quote = schemaname @@ -93,6 +92,12 @@ def impl(context, view_name, table_name): create_view_on_table(context.conn, view_name, table_name) +@given('a materialized view "{view_name}" exists on table "{table_name}"') +def impl(context, view_name, table_name): + create_materialized_view_on_table_in_schema(context.conn, viewname=view_name, + tablename=table_name) + + @given('"{qualified_table}" appears in the latest state files') @then('"{qualified_table}" should appear in the latest state files') def impl(context, qualified_table): @@ -448,3 +453,11 @@ def create_view_on_table(conn, viewname, tablename): " AS SELECT * FROM " + tablename dbconn.execSQL(conn, query) conn.commit() + + +def create_materialized_view_on_table_in_schema(conn, tablename, viewname): + query = "DROP MATERIALIZED VIEW IF EXISTS " + viewname + ";" \ + "CREATE MATERIALIZED VIEW " + viewname + \ + " AS SELECT * FROM " + tablename + dbconn.execSQL(conn, query) + conn.commit() diff --git a/gpMgmt/test/behave/mgmt_utils/steps/gpstate_utils.py b/gpMgmt/test/behave/mgmt_utils/steps/gpstate_utils.py index e8778493852c..caefcd36623a 100644 --- a/gpMgmt/test/behave/mgmt_utils/steps/gpstate_utils.py +++ b/gpMgmt/test/behave/mgmt_utils/steps/gpstate_utils.py @@ -66,8 +66,12 @@ def impl(context, recovery_types, contents): for index, seg_to_display in enumerate(segments_to_display): hostname = seg_to_display.getSegmentHostName() port = seg_to_display.getSegmentPort() - expected_msg = "{}[ \t]+{}[ \t]+{}[ \t]+[0-9]+[ \t]+[0-9]+[ \t]+[0-9]+\%".format(hostname, port, - recovery_types[index]) + if recovery_types[index] == "differential": + expected_msg = "{}[ \t]+{}[ \t]+{}[ \t]+(.+?)[ \t]+([\d,]+)[ \t]+[0-9]+\%".format(hostname, port, + recovery_types[index]) + else: + expected_msg = "{}[ \t]+{}[ \t]+{}[ \t]+[0-9]+[ \t]+[0-9]+[ \t]+[0-9]+\%".format(hostname, port, + 
recovery_types[index]) check_stdout_msg(context, expected_msg) #TODO assert that only segments_to_display are printed to the console @@ -125,3 +129,14 @@ def check_stdout_msg_in_order(context, msg): context.stdout_position = match.end() + +@given('a sample recovery_progress.file is created with ongoing differential recoveries in gpAdminLogs') +def impl(context): + with open('{}/gpAdminLogs/recovery_progress.file'.format(os.path.expanduser("~")), 'w+') as fp: + fp.write( + "differential:5: 16,454,866 4% 16.52MB/s 0:00:00 (xfr#216, ir-chk=9669/9907) :Syncing pg_data " + "of dbid 5\n") + fp.write("differential:6: 8,192 100% 7.81MB/s 0:00:00 (xfr#1, to-chk=0/1) :Syncing tablespace of " + "dbid 6 for oid 20516") + + diff --git a/gpMgmt/test/behave/mgmt_utils/steps/mgmt_utils.py b/gpMgmt/test/behave/mgmt_utils/steps/mgmt_utils.py index f4d70f534702..e0dfab003c68 100644 --- a/gpMgmt/test/behave/mgmt_utils/steps/mgmt_utils.py +++ b/gpMgmt/test/behave/mgmt_utils/steps/mgmt_utils.py @@ -362,7 +362,7 @@ def impl(context, dbname): drop_database(context, dbname) -@given('{env_var} environment variable is not set') +@given('"{env_var}" environment variable is not set') def impl(context, env_var): if not hasattr(context, 'orig_env'): context.orig_env = dict() @@ -459,7 +459,7 @@ def impl(context, logdir): with open(recovery_progress_file, 'r') as fp: context.recovery_lines = fp.readlines() for line in context.recovery_lines: - recovery_type, dbid, progress = line.strip().split(':', 2) + recovery_type, dbid, progress = line.strip().split(':')[:3] progress_pattern = re.compile(get_recovery_progress_pattern(recovery_type)) # TODO: assert progress line in the actual hosts bb/rewind progress file if re.search(progress_pattern, progress) and dbid.isdigit() and recovery_type in ['full', 'differential', 'incremental']: @@ -1200,6 +1200,15 @@ def impl(context, options): context.execute_steps(u'''Then the user runs command "gpactivatestandby -a %s" from standby master''' % options) 
context.standby_was_activated = True + +@given('the user runs utility "{utility}" with master data directory and "{options}"') +@when('the user runs utility "{utility}" with master data directory and "{options}"') +@then('the user runs utility "{utility}" with master data directory and "{options}"') +def impl(context, utility, options): + cmd = "{} -d {} {}".format(utility, master_data_dir, options) + context.execute_steps(u'''then the user runs command "%s"''' % cmd ) + + @then('gpintsystem logs should {contain} lines about running backout script') def impl(context, contain): string_to_find = 'Run command bash .*backout_gpinitsystem.* on master to remove these changes$' @@ -2874,6 +2883,20 @@ def impl(context, command, target): if target not in contents: raise Exception("cannot find %s in %s" % (target, filename)) + +@then('{command} should print "{target}" regex to logfile') +def impl(context, command, target): + log_dir = _get_gpAdminLogs_directory() + filename = glob.glob('%s/%s_*.log' % (log_dir, command))[0] + contents = '' + with open(filename) as fr: + for line in fr: + contents += line + + pat = re.compile(target) + if not pat.search(contents): + raise Exception("cannot find %s in %s" % (target, filename)) + @given('verify that a role "{role_name}" exists in database "{dbname}"') @then('verify that a role "{role_name}" exists in database "{dbname}"') def impl(context, role_name, dbname): @@ -3041,6 +3064,17 @@ def impl(context, num_of_segments, num_of_hosts, hostnames): def impl(context): map(os.remove, glob.glob("gpexpand_inputfile*")) +@given('there are no gpexpand tablespace input configuration files') +def impl(context): + list(map(os.remove, glob.glob("{}/*.ts".format(context.working_directory)))) + if len(glob.glob('{}/*.ts'.format(context.working_directory))) != 0: + raise Exception("expected no gpexpand tablespace input configuration files") + +@then('verify if a gpexpand tablespace input configuration file is created') +def impl(context): + if 
len(glob.glob('{}/*.ts'.format(context.working_directory))) != 1: + raise Exception("expected gpexpand tablespace input configuration file to be created") + @when('the user runs gpexpand with the latest gpexpand_inputfile with additional parameters {additional_params}') def impl(context, additional_params=''): gpexpand = Gpexpand(context, working_directory=context.working_directory) @@ -4281,6 +4315,69 @@ def impl(context, table, dbname, count): raise Exception( "%s table in %s has %d rows, expected %d rows." % (table, dbname, sum(current_row_count), int(count))) +@then('{command} should print the following lines {num} times to stdout') +def impl(context, command, num): + """ + Verify that each pattern occurs a specific number of times in the output. + """ + expected_lines = context.text.strip().split('\n') + for expected_pattern in expected_lines: + match_count = len(re.findall(re.escape(expected_pattern), context.stdout_message)) + if match_count != int(num): + raise Exception( + "Expected %s to occur %s times but found %d times" % (expected_pattern, num, match_count)) + + + + +@given('save the information of the database "{dbname}"') +def impl(context, dbname): + with dbconn.connect(dbconn.DbURL(dbname='template1'), unsetSearchPath=False) as conn: + query = """SELECT datname,oid FROM pg_database WHERE datname='{0}';""" .format(dbname) + datname, oid = dbconn.execSQLForSingletonRow(conn, query) + context.db_name = datname + context.db_oid = oid + + + + +@then('the user waits until recovery_progress.file is created in {logdir} and verifies that all dbids progress with {stage} are present') +def impl(context, logdir, stage): + all_segments = GpArray.initFromCatalog(dbconn.DbURL()).getDbList() + failed_segments = filter(lambda seg: seg.getSegmentStatus() == 'd', all_segments) + stage_patterns = [] + for seg in failed_segments: + dbid = seg.getSegmentDbId() + if stage == "tablespace": + pat = "Syncing tablespace of dbid {} for oid".format(dbid) + else: + pat =
"differential:{}" .format(dbid) + stage_patterns.append(pat) + if len(stage_patterns) == 0: + raise Exception('Failed to get the details of down segment') + attempt = 0 + num_retries = 9000 + log_dir = _get_gpAdminLogs_directory() if logdir == 'gpAdminLogs' else logdir + recovery_progress_file = '{}/recovery_progress.file'.format(log_dir) + while attempt < num_retries: + attempt += 1 + if os.path.exists(recovery_progress_file): + if verify_elements_in_file(recovery_progress_file, stage_patterns): + return + time.sleep(0.1) + if attempt == num_retries: + raise Exception('Timed out after {} retries'.format(num_retries)) + + +def verify_elements_in_file(filename, elements): + with open(filename, 'r') as file: + content = file.read() + for element in elements: + if element not in content: + return False + + return True + @given('"LC_ALL" is different from English') def step_impl(context): default_locale = os.environ.get('LC_ALL') diff --git a/gpMgmt/test/behave/mgmt_utils/steps/recoverseg_mgmt_utils.py b/gpMgmt/test/behave/mgmt_utils/steps/recoverseg_mgmt_utils.py index 229b2002c5f9..5c0c1588fc52 100644 --- a/gpMgmt/test/behave/mgmt_utils/steps/recoverseg_mgmt_utils.py +++ b/gpMgmt/test/behave/mgmt_utils/steps/recoverseg_mgmt_utils.py @@ -805,3 +805,15 @@ def get_host_address(hostname): return host_address[0] +@then('verify deletion of orphaned directory of the dropped database') +def impl(context): + hostname = context.pseg_hostname + db_data_dir = "{0}/base/{1}".format(context.pseg_data_dir, context.db_oid) + cmd = Command("list directory", cmdStr="test -d {}".format(db_data_dir), ctxt=REMOTE, remoteHost=hostname) + cmd.run() + rc = cmd.get_return_code() + if rc == 0: + raise Exception('Orphaned directory:"{0}" of dropped database:"{1}" exists on host:"{2}"' .format(db_data_dir, + context.db_name, hostname)) + + diff --git a/gpMgmt/test/behave/mgmt_utils/steps/tablespace_mgmt_utils.py b/gpMgmt/test/behave/mgmt_utils/steps/tablespace_mgmt_utils.py index 
fc05cb3ce8c4..a5b4a9d35443 100644 --- a/gpMgmt/test/behave/mgmt_utils/steps/tablespace_mgmt_utils.py +++ b/gpMgmt/test/behave/mgmt_utils/steps/tablespace_mgmt_utils.py @@ -1,6 +1,6 @@ import pipes import tempfile -import time +import os from behave import given, then from pygresql import pg @@ -9,6 +9,8 @@ from gppylib.gparray import GpArray from test.behave_utils.utils import run_cmd,wait_for_database_dropped from gppylib.commands.base import Command, REMOTE +from gppylib.commands.unix import get_remote_link_path +from contextlib import closing class Tablespace: def __init__(self, name): @@ -72,6 +74,32 @@ def verify(self, hostname=None, port=0): raise Exception("Tablespace data is not identically distributed. Expected:\n%r\n but found:\n%r" % ( sorted(self.initial_data), sorted(data))) + def verify_symlink(self, hostname=None, port=0): + url = dbconn.DbURL(hostname=hostname, port=port, dbname=self.dbname) + gparray = GpArray.initFromCatalog(url) + all_segments = gparray.getDbList() + + # fetching oid of available user created tablespaces + with closing(dbconn.connect(url, unsetSearchPath=False)) as conn: + tblspc_oids = dbconn.execSQL(conn, "SELECT oid FROM pg_tablespace WHERE spcname NOT IN ('pg_default', 'pg_global')").fetchall() + + if not tblspc_oids: + return None # no tablespace is present + + # keeping a list to check if any of the symlinks has a duplicate entry + tblspc = [] + for seg in all_segments: + for tblspc_oid in tblspc_oids: + symlink_path = os.path.join(seg.getSegmentTableSpaceDirectory(), str(tblspc_oid[0])) + target_path = get_remote_link_path(symlink_path, seg.getSegmentHostName()) + segDbId = seg.getSegmentDbId() + #checking for duplicate and wrong symlink target + if target_path in tblspc or os.path.basename(target_path) != str(segDbId): + raise Exception("tablespace has invalid/duplicate symlink for oid {0} in segment dbid {1}".\ + format(str(tblspc_oid[0]),str(segDbId))) + + tblspc.append(target_path) + def verify_for_gpexpand(self,
hostname=None, port=0): """ For gpexpand, we need make sure: @@ -99,6 +127,14 @@ def verify_for_gpexpand(self, hostname=None, port=0): "Expected pre-gpexpand data:\n%\n but found post-gpexpand data:\n%r" % ( sorted(self.initial_data), sorted(data))) + def insert_more_data(self): + with dbconn.connect(dbconn.DbURL(dbname=self.dbname), unsetSearchPath=False) as conn: + db = pg.DB(conn) + db.query("CREATE TABLE tbl_1 (i int) DISTRIBUTED RANDOMLY") + db.query("INSERT INTO tbl_1 VALUES (GENERATE_SERIES(0, 100000000))") + db.query("CREATE TABLE tbl_2 (i int) DISTRIBUTED RANDOMLY") + db.query("INSERT INTO tbl_2 VALUES (GENERATE_SERIES(0, 100000000))") + def _checkpoint_and_wait_for_replication_replay(db): """ @@ -191,6 +227,9 @@ def _create_tablespace_with_data(context, name): def impl(context): context.tablespaces["outerspace"].verify() +@then('the tablespace has valid symlink') +def impl(context): + context.tablespaces["outerspace"].verify_symlink() @then('the tablespace is valid on the standby master') def impl(context): @@ -212,3 +251,8 @@ def impl(context): for tablespace in context.tablespaces.values(): tablespace.cleanup() context.tablespaces = {} + +@given('insert additional data into the tablespace') +def impl(context): + context.tablespaces["outerspace"].insert_more_data() + diff --git a/gpcontrib/Makefile b/gpcontrib/Makefile index 182a210129f7..1ef54b62b414 100644 --- a/gpcontrib/Makefile +++ b/gpcontrib/Makefile @@ -25,6 +25,7 @@ ifeq "$(enable_debug_extensions)" "yes" gp_percentile_agg \ gp_error_handling \ gp_subtransaction_overflow \ + gp_check_functions \ arenadata_toolkit else recurse_targets = gp_sparse_vector \ @@ -37,6 +38,7 @@ else gp_percentile_agg \ gp_error_handling \ gp_subtransaction_overflow \ + gp_check_functions \ arenadata_toolkit endif @@ -101,5 +103,6 @@ installcheck: $(MAKE) -C gp_sparse_vector installcheck $(MAKE) -C gp_percentile_agg installcheck $(MAKE) -C gp_subtransaction_overflow installcheck + $(MAKE) -C gp_check_functions 
installcheck $(MAKE) -C arenadata_toolkit installcheck diff --git a/gpcontrib/arenadata_toolkit/Makefile b/gpcontrib/arenadata_toolkit/Makefile index 97a38ebfdcc2..d5d74bddbdee 100644 --- a/gpcontrib/arenadata_toolkit/Makefile +++ b/gpcontrib/arenadata_toolkit/Makefile @@ -3,11 +3,12 @@ MODULES = arenadata_toolkit EXTENSION = arenadata_toolkit -EXTENSION_VERSION = 1.2 +EXTENSION_VERSION = 1.3 DATA = \ arenadata_toolkit--1.0.sql \ arenadata_toolkit--1.0--1.1.sql \ - arenadata_toolkit--1.1--1.2.sql + arenadata_toolkit--1.1--1.2.sql \ + arenadata_toolkit--1.2--1.3.sql DATA_built = $(EXTENSION)--$(EXTENSION_VERSION).sql @@ -15,7 +16,7 @@ $(DATA_built): $(DATA) cat $(DATA) > $(DATA_built) REGRESS = arenadata_toolkit_test arenadata_toolkit_skew_test adb_get_relfilenodes_test \ - adb_collect_table_stats_test + adb_collect_table_stats_test adb_vacuum_strategy_test adb_relation_storage_size_test REGRESS_OPTS += --init-file=$(top_srcdir)/src/test/regress/init_file ifdef USE_PGXS diff --git a/gpcontrib/arenadata_toolkit/arenadata_toolkit--1.2--1.3.sql b/gpcontrib/arenadata_toolkit/arenadata_toolkit--1.2--1.3.sql new file mode 100644 index 000000000000..bd787bc4bd92 --- /dev/null +++ b/gpcontrib/arenadata_toolkit/arenadata_toolkit--1.2--1.3.sql @@ -0,0 +1,62 @@ +/* gpcontrib/arenadata_toolkit/arenadata_toolkit--1.2--1.3.sql */ + +/* + * Returns columns (table_schema, table_name) ordered by increasing vacuum time. In this + * list, if newest_first is true, then tables that are not yet vacuumed are located first, + * and already vacuumed - at the end, else (newest_first is false) tables that are already + * vacuumed are located first, and tables that are not yet vacuumed are located at the end. 
+ */ +CREATE FUNCTION arenadata_toolkit.adb_vacuum_strategy(actionname TEXT, newest_first BOOLEAN) +RETURNS TABLE (table_schema NAME, table_name NAME) AS +$func$ +BEGIN + RETURN query EXECUTE format($$ + SELECT nspname, relname + FROM pg_catalog.pg_class c + JOIN pg_catalog.pg_namespace n ON relnamespace = n.oid + LEFT JOIN pg_catalog.pg_partition_rule ON parchildrelid = c.oid + LEFT JOIN pg_catalog.pg_stat_last_operation ON staactionname = UPPER(%L) + AND objid = c.oid AND classid = 'pg_catalog.pg_class'::pg_catalog.regclass + WHERE relkind = 'r' AND relstorage != 'x' AND parchildrelid IS NULL + AND nspname NOT IN (SELECT schema_name FROM arenadata_toolkit.operation_exclude) + ORDER BY statime ASC NULLS %s + $$, actionname, CASE WHEN newest_first THEN 'FIRST' ELSE 'LAST' END); +END; +$func$ LANGUAGE plpgsql STABLE EXECUTE ON MASTER; + +/* + * Only for admin usage. + */ +REVOKE ALL ON FUNCTION arenadata_toolkit.adb_vacuum_strategy(TEXT, BOOLEAN) FROM public; + +/* + * Returns columns (table_schema, table_name) ordered by increasing vacuum time. + * In this list, tables that are not yet vacuumed are located first, + * and already vacuumed - at the end (default strategy). + */ +CREATE FUNCTION arenadata_toolkit.adb_vacuum_strategy_newest_first(actionname TEXT) +RETURNS TABLE (table_schema NAME, table_name NAME) AS +$$ + SELECT arenadata_toolkit.adb_vacuum_strategy(actionname, true); +$$ LANGUAGE sql STABLE EXECUTE ON MASTER; + +/* + * Only for admin usage. + */ +REVOKE ALL ON FUNCTION arenadata_toolkit.adb_vacuum_strategy_newest_first(TEXT) FROM public; + +/* + * Returns columns (table_schema, table_name) ordered by increasing vacuum time. + * In this list, tables that are already vacuumed are located first, + * and tables that are not yet vacuumed are located at the end. 
+ */ +CREATE FUNCTION arenadata_toolkit.adb_vacuum_strategy_newest_last(actionname TEXT) +RETURNS TABLE (table_schema NAME, table_name NAME) AS +$$ + SELECT arenadata_toolkit.adb_vacuum_strategy(actionname, false); +$$ LANGUAGE sql STABLE EXECUTE ON MASTER; + +/* + * Only for admin usage. + */ +REVOKE ALL ON FUNCTION arenadata_toolkit.adb_vacuum_strategy_newest_last(TEXT) FROM public; diff --git a/gpcontrib/arenadata_toolkit/arenadata_toolkit.c b/gpcontrib/arenadata_toolkit/arenadata_toolkit.c index db99c45de79c..3625d7add4e5 100644 --- a/gpcontrib/arenadata_toolkit/arenadata_toolkit.c +++ b/gpcontrib/arenadata_toolkit/arenadata_toolkit.c @@ -39,6 +39,7 @@ static int64 get_ao_storage_total_bytes(Relation rel, char *relpath); static bool calculate_ao_storage_perSegFile(const int segno, void *ctx); static void fill_relation_seg_path(char *buf, int bufLen, const char *relpath, int segNo); +static int64 calculate_toast_table_size(Oid toastrelid, ForkNumber forknum); /* * Structure used to accumulate the size of AO/CO relation from callback. @@ -85,6 +86,9 @@ adb_relation_storage_size(PG_FUNCTION_ARGS) size += get_size_from_segDBs(sql); } + if (OidIsValid(rel->rd_rel->reltoastrelid)) + size += calculate_toast_table_size(rel->rd_rel->reltoastrelid, forkNumber); + relation_close(rel, AccessShareLock); PG_RETURN_INT64(size); @@ -147,6 +151,23 @@ calculate_ao_storage_perSegFile(const int segno, void *ctx) return true; } +/* + * Calculate total on-disk size of a TOAST relation. + * Must not be applied to non-TOAST relations. + * + * The code is based on calculate_toast_table_size from dbsize.c, but without + * calculating size of toast's indexes. + */ +static int64 +calculate_toast_table_size(Oid toastrelid, ForkNumber forknum) +{ + Relation toastRel = relation_open(toastrelid, AccessShareLock); + int64 size = calculate_relation_size(toastRel, forknum); + + relation_close(toastRel, AccessShareLock); + return size; +} + /* * Function calculates the size of heap tables. 
* diff --git a/gpcontrib/arenadata_toolkit/arenadata_toolkit.control b/gpcontrib/arenadata_toolkit/arenadata_toolkit.control index 69986410bab2..505f3ad1ad62 100644 --- a/gpcontrib/arenadata_toolkit/arenadata_toolkit.control +++ b/gpcontrib/arenadata_toolkit/arenadata_toolkit.control @@ -1,5 +1,5 @@ # arenadata_toolkit extension comment = 'extension is used for manipulation of objects created by adb-bundle' -default_version = '1.2' +default_version = '1.3' module_pathname = '$libdir/arenadata_toolkit' relocatable = false diff --git a/gpcontrib/arenadata_toolkit/expected/adb_relation_storage_size_test.out b/gpcontrib/arenadata_toolkit/expected/adb_relation_storage_size_test.out new file mode 100644 index 000000000000..a464250ab4f1 --- /dev/null +++ b/gpcontrib/arenadata_toolkit/expected/adb_relation_storage_size_test.out @@ -0,0 +1,95 @@ +CREATE EXTENSION arenadata_toolkit; +CREATE TABLE heap_table_with_toast(a INT, b TEXT) +DISTRIBUTED BY (a); +CREATE TABLE heap_table_without_toast(a INT, b INT) +DISTRIBUTED BY (a); +CREATE TABLE ao_table_with_toast(a INT, b TEXT) +WITH (APPENDOPTIMIZED=true) +DISTRIBUTED BY (a); +CREATE TABLE ao_table_without_toast(a INT, b INT) +WITH (APPENDOPTIMIZED=true) +DISTRIBUTED BY (a); +-- Check that toast exists only for "with_toast" tables +SELECT relname, reltoastrelid != 0 with_toast +FROM pg_class +WHERE relname IN ('heap_table_with_toast', 'heap_table_without_toast', + 'ao_table_with_toast', 'ao_table_without_toast') +ORDER BY 1; + relname | with_toast +--------------------------+------------ + ao_table_with_toast | t + ao_table_without_toast | f + heap_table_with_toast | t + heap_table_without_toast | f +(4 rows) + +-- Insert initial data to tables +INSERT INTO heap_table_with_toast SELECT i, 'short_text' FROM generate_series(1,15) AS i; +INSERT INTO heap_table_without_toast SELECT i, i*10 FROM generate_series(1,15) AS i; +INSERT INTO ao_table_with_toast SELECT i, 'short_text' FROM generate_series(1,15) AS i; +INSERT INTO 
ao_table_without_toast SELECT i, i*10 FROM generate_series(1,15) AS i; +-- Check sizes on segments +SELECT relname, sizes.gp_segment_id, sizes.size +FROM pg_class, arenadata_toolkit.adb_relation_storage_size_on_segments(oid) sizes +WHERE relname IN ('heap_table_with_toast', 'heap_table_without_toast', + 'ao_table_with_toast', 'ao_table_without_toast') +ORDER BY 1, 2; + relname | gp_segment_id | size +--------------------------+---------------+------- + ao_table_with_toast | 0 | 168 + ao_table_with_toast | 1 | 112 + ao_table_with_toast | 2 | 216 + ao_table_without_toast | 0 | 128 + ao_table_without_toast | 1 | 88 + ao_table_without_toast | 2 | 160 + heap_table_with_toast | 0 | 32768 + heap_table_with_toast | 1 | 32768 + heap_table_with_toast | 2 | 32768 + heap_table_without_toast | 0 | 32768 + heap_table_without_toast | 1 | 32768 + heap_table_without_toast | 2 | 32768 +(12 rows) + +-- Add random large data to get non-zero toast table's size +UPDATE heap_table_with_toast SET b = ( + SELECT string_agg( chr(trunc(65+random()*26)::integer), '') + FROM generate_series(1,50000)) +WHERE a = 1; +UPDATE ao_table_with_toast SET b = ( + SELECT string_agg( chr(trunc(65+random()*26)::integer), '') + FROM generate_series(1,50000)) +WHERE a = 1; +SELECT relname, sizes.gp_segment_id, sizes.size +FROM pg_class, arenadata_toolkit.adb_relation_storage_size_on_segments(oid) sizes +WHERE relname IN ('heap_table_with_toast', 'ao_table_with_toast') +ORDER BY 1, 2; + relname | gp_segment_id | size +-----------------------+---------------+------- + ao_table_with_toast | 0 | 168 + ao_table_with_toast | 1 | 65704 + ao_table_with_toast | 2 | 216 + heap_table_with_toast | 0 | 32768 + heap_table_with_toast | 1 | 98304 + heap_table_with_toast | 2 | 32768 +(6 rows) + +-- Check summary size of tables +SELECT relname, adb_relation_storage_size size +FROM pg_class, arenadata_toolkit.adb_relation_storage_size(oid) +WHERE relname IN ('heap_table_with_toast', 'heap_table_without_toast', + 
'ao_table_with_toast', 'ao_table_without_toast') +ORDER BY 1; + relname | size +--------------------------+-------- + ao_table_with_toast | 66088 + ao_table_without_toast | 376 + heap_table_with_toast | 163840 + heap_table_without_toast | 98304 +(4 rows) + +-- Cleanup +DROP TABLE heap_table_with_toast; +DROP TABLE heap_table_without_toast; +DROP TABLE ao_table_with_toast; +DROP TABLE ao_table_without_toast; +DROP EXTENSION arenadata_toolkit; diff --git a/gpcontrib/arenadata_toolkit/expected/adb_vacuum_strategy_test.out b/gpcontrib/arenadata_toolkit/expected/adb_vacuum_strategy_test.out new file mode 100644 index 000000000000..0c577198c042 --- /dev/null +++ b/gpcontrib/arenadata_toolkit/expected/adb_vacuum_strategy_test.out @@ -0,0 +1,56 @@ +CREATE EXTENSION arenadata_toolkit; +SELECT arenadata_toolkit.adb_create_tables(); + adb_create_tables +------------------- + +(1 row) + +CREATE SCHEMA test_vacuum; +CREATE TABLE test_vacuum.vacuumed (a int) DISTRIBUTED BY (a); +CREATE TABLE test_vacuum.not_vacuumed (a int) DISTRIBUTED BY (a); +-- Disable multiple notifications about the creation of multiple subpartitions. 
+SET client_min_messages=WARNING; +CREATE TABLE test_vacuum.part_table (id INT, a INT, b INT, c INT, d INT, str TEXT) +DISTRIBUTED BY (id) +PARTITION BY RANGE (a) + SUBPARTITION BY RANGE (b) + SUBPARTITION TEMPLATE (START (1) END (3) EVERY (1)) + SUBPARTITION BY RANGE (c) + SUBPARTITION TEMPLATE (START (1) END (3) EVERY (1)) + SUBPARTITION BY RANGE (d) + SUBPARTITION TEMPLATE (START (1) END (3) EVERY (1)) + SUBPARTITION BY LIST (str) + SUBPARTITION TEMPLATE ( + SUBPARTITION sub_prt1 VALUES ('sub_prt1'), + SUBPARTITION sub_prt2 VALUES ('sub_prt2')) + (START (1) END (3) EVERY (1)); +RESET client_min_messages; +INSERT INTO test_vacuum.vacuumed SELECT generate_series(1, 10); +INSERT INTO test_vacuum.not_vacuumed SELECT generate_series(1, 10); +DELETE FROM test_vacuum.vacuumed WHERE a >= 5; +DELETE FROM test_vacuum.not_vacuumed WHERE a >= 5; +VACUUM test_vacuum.vacuumed; +-- default strategy +SELECT * FROM arenadata_toolkit.adb_vacuum_strategy_newest_first('VACUUM') WHERE table_schema = 'test_vacuum'; + table_schema | table_name +--------------+-------------- + test_vacuum | part_table + test_vacuum | not_vacuumed + test_vacuum | vacuumed +(3 rows) + +-- reversed strategy +SELECT * FROM arenadata_toolkit.adb_vacuum_strategy_newest_last('VACUUM') WHERE table_schema = 'test_vacuum'; + table_schema | table_name +--------------+-------------- + test_vacuum | vacuumed + test_vacuum | not_vacuumed + test_vacuum | part_table +(3 rows) + +DROP SCHEMA test_vacuum CASCADE; +NOTICE: drop cascades to 3 other objects +DETAIL: drop cascades to table test_vacuum.vacuumed +drop cascades to table test_vacuum.not_vacuumed +drop cascades to table test_vacuum.part_table +DROP EXTENSION arenadata_toolkit; diff --git a/gpcontrib/arenadata_toolkit/expected/arenadata_toolkit_test.out b/gpcontrib/arenadata_toolkit/expected/arenadata_toolkit_test.out index f4e2ad78f3df..4977ab778832 100644 --- a/gpcontrib/arenadata_toolkit/expected/arenadata_toolkit_test.out +++ 
b/gpcontrib/arenadata_toolkit/expected/arenadata_toolkit_test.out @@ -96,6 +96,9 @@ SELECT objname, objtype, objstorage, objacl FROM toolkit_objects_info ORDER BY o adb_relation_storage_size | proc | - | {=X/owner,owner=X/owner} adb_relation_storage_size_on_segments | proc | - | {=X/owner,owner=X/owner} adb_skew_coefficients | table | v | {owner=arwdDxt/owner,=r/owner} + adb_vacuum_strategy | proc | - | {owner=X/owner} + adb_vacuum_strategy_newest_first | proc | - | {owner=X/owner} + adb_vacuum_strategy_newest_last | proc | - | {owner=X/owner} arenadata_toolkit | schema | - | {owner=UC/owner,=U/owner} daily_operation | table | a | db_files_current | table | h | {owner=arwdDxt/owner,=r/owner} @@ -103,7 +106,7 @@ SELECT objname, objtype, objstorage, objacl FROM toolkit_objects_info ORDER BY o db_files_history_1_prt_default_part | table | a | db_files_history_1_prt_pYYYYMM | table | a | operation_exclude | table | a | -(16 rows) +(19 rows) -- check that toolkit objects now depends on extension SELECT objname, objtype, extname, deptype FROM pg_depend d JOIN @@ -121,7 +124,10 @@ WHERE d.deptype = 'e' AND e.extname = 'arenadata_toolkit' ORDER BY objname; adb_relation_storage_size | proc | arenadata_toolkit | e adb_relation_storage_size_on_segments | proc | arenadata_toolkit | e adb_skew_coefficients | table | arenadata_toolkit | e -(9 rows) + adb_vacuum_strategy | proc | arenadata_toolkit | e + adb_vacuum_strategy_newest_first | proc | arenadata_toolkit | e + adb_vacuum_strategy_newest_last | proc | arenadata_toolkit | e +(12 rows) DROP EXTENSION arenadata_toolkit; DROP SCHEMA arenadata_toolkit cascade; @@ -149,6 +155,9 @@ SELECT objname, objtype, objstorage, objacl FROM toolkit_objects_info ORDER BY o adb_relation_storage_size | proc | - | {=X/owner,owner=X/owner} adb_relation_storage_size_on_segments | proc | - | {=X/owner,owner=X/owner} adb_skew_coefficients | table | v | {owner=arwdDxt/owner,=r/owner} + adb_vacuum_strategy | proc | - | {owner=X/owner} + 
adb_vacuum_strategy_newest_first | proc | - | {owner=X/owner} + adb_vacuum_strategy_newest_last | proc | - | {owner=X/owner} arenadata_toolkit | schema | - | {owner=UC/owner,=U/owner} daily_operation | table | a | {owner=arwdDxt/owner} db_files_current | table | h | {owner=arwdDxt/owner,=r/owner} @@ -156,7 +165,7 @@ SELECT objname, objtype, objstorage, objacl FROM toolkit_objects_info ORDER BY o db_files_history_1_prt_default_part | table | a | {owner=arwdDxt/owner} db_files_history_1_prt_pYYYYMM | table | a | {owner=arwdDxt/owner} operation_exclude | table | a | {owner=arwdDxt/owner} -(16 rows) +(19 rows) -- check that toolkit objects now depends on extension SELECT objname, objtype, extname, deptype FROM pg_depend d JOIN @@ -174,7 +183,10 @@ WHERE d.deptype = 'e' AND e.extname = 'arenadata_toolkit' ORDER BY objname; adb_relation_storage_size | proc | arenadata_toolkit | e adb_relation_storage_size_on_segments | proc | arenadata_toolkit | e adb_skew_coefficients | table | arenadata_toolkit | e -(9 rows) + adb_vacuum_strategy | proc | arenadata_toolkit | e + adb_vacuum_strategy_newest_first | proc | arenadata_toolkit | e + adb_vacuum_strategy_newest_last | proc | arenadata_toolkit | e +(12 rows) DROP EXTENSION arenadata_toolkit; DROP SCHEMA arenadata_toolkit cascade; diff --git a/gpcontrib/arenadata_toolkit/sql/adb_relation_storage_size_test.sql b/gpcontrib/arenadata_toolkit/sql/adb_relation_storage_size_test.sql new file mode 100644 index 000000000000..2a1d019a1b0d --- /dev/null +++ b/gpcontrib/arenadata_toolkit/sql/adb_relation_storage_size_test.sql @@ -0,0 +1,66 @@ +CREATE EXTENSION arenadata_toolkit; + +CREATE TABLE heap_table_with_toast(a INT, b TEXT) +DISTRIBUTED BY (a); + +CREATE TABLE heap_table_without_toast(a INT, b INT) +DISTRIBUTED BY (a); + +CREATE TABLE ao_table_with_toast(a INT, b TEXT) +WITH (APPENDOPTIMIZED=true) +DISTRIBUTED BY (a); + +CREATE TABLE ao_table_without_toast(a INT, b INT) +WITH (APPENDOPTIMIZED=true) +DISTRIBUTED BY (a); + +-- Check 
that toast exists only for "with_toast" tables +SELECT relname, reltoastrelid != 0 with_toast +FROM pg_class +WHERE relname IN ('heap_table_with_toast', 'heap_table_without_toast', + 'ao_table_with_toast', 'ao_table_without_toast') +ORDER BY 1; + +-- Insert initial data to tables +INSERT INTO heap_table_with_toast SELECT i, 'short_text' FROM generate_series(1,15) AS i; +INSERT INTO heap_table_without_toast SELECT i, i*10 FROM generate_series(1,15) AS i; +INSERT INTO ao_table_with_toast SELECT i, 'short_text' FROM generate_series(1,15) AS i; +INSERT INTO ao_table_without_toast SELECT i, i*10 FROM generate_series(1,15) AS i; + +-- Check sizes on segments +SELECT relname, sizes.gp_segment_id, sizes.size +FROM pg_class, arenadata_toolkit.adb_relation_storage_size_on_segments(oid) sizes +WHERE relname IN ('heap_table_with_toast', 'heap_table_without_toast', + 'ao_table_with_toast', 'ao_table_without_toast') +ORDER BY 1, 2; + +-- Add random large data to get non-zero toast table's size +UPDATE heap_table_with_toast SET b = ( + SELECT string_agg( chr(trunc(65+random()*26)::integer), '') + FROM generate_series(1,50000)) +WHERE a = 1; + +UPDATE ao_table_with_toast SET b = ( + SELECT string_agg( chr(trunc(65+random()*26)::integer), '') + FROM generate_series(1,50000)) +WHERE a = 1; + +SELECT relname, sizes.gp_segment_id, sizes.size +FROM pg_class, arenadata_toolkit.adb_relation_storage_size_on_segments(oid) sizes +WHERE relname IN ('heap_table_with_toast', 'ao_table_with_toast') +ORDER BY 1, 2; + +-- Check summary size of tables +SELECT relname, adb_relation_storage_size size +FROM pg_class, arenadata_toolkit.adb_relation_storage_size(oid) +WHERE relname IN ('heap_table_with_toast', 'heap_table_without_toast', + 'ao_table_with_toast', 'ao_table_without_toast') +ORDER BY 1; + +-- Cleanup +DROP TABLE heap_table_with_toast; +DROP TABLE heap_table_without_toast; +DROP TABLE ao_table_with_toast; +DROP TABLE ao_table_without_toast; + +DROP EXTENSION arenadata_toolkit; diff --git 
a/gpcontrib/arenadata_toolkit/sql/adb_vacuum_strategy_test.sql b/gpcontrib/arenadata_toolkit/sql/adb_vacuum_strategy_test.sql new file mode 100644 index 000000000000..9d2ccb03a8cd --- /dev/null +++ b/gpcontrib/arenadata_toolkit/sql/adb_vacuum_strategy_test.sql @@ -0,0 +1,40 @@ +CREATE EXTENSION arenadata_toolkit; +SELECT arenadata_toolkit.adb_create_tables(); + +CREATE SCHEMA test_vacuum; + +CREATE TABLE test_vacuum.vacuumed (a int) DISTRIBUTED BY (a); +CREATE TABLE test_vacuum.not_vacuumed (a int) DISTRIBUTED BY (a); +-- Disable multiple notifications about the creation of multiple subpartitions. +SET client_min_messages=WARNING; +CREATE TABLE test_vacuum.part_table (id INT, a INT, b INT, c INT, d INT, str TEXT) +DISTRIBUTED BY (id) +PARTITION BY RANGE (a) + SUBPARTITION BY RANGE (b) + SUBPARTITION TEMPLATE (START (1) END (3) EVERY (1)) + SUBPARTITION BY RANGE (c) + SUBPARTITION TEMPLATE (START (1) END (3) EVERY (1)) + SUBPARTITION BY RANGE (d) + SUBPARTITION TEMPLATE (START (1) END (3) EVERY (1)) + SUBPARTITION BY LIST (str) + SUBPARTITION TEMPLATE ( + SUBPARTITION sub_prt1 VALUES ('sub_prt1'), + SUBPARTITION sub_prt2 VALUES ('sub_prt2')) + (START (1) END (3) EVERY (1)); +RESET client_min_messages; + +INSERT INTO test_vacuum.vacuumed SELECT generate_series(1, 10); +INSERT INTO test_vacuum.not_vacuumed SELECT generate_series(1, 10); + +DELETE FROM test_vacuum.vacuumed WHERE a >= 5; +DELETE FROM test_vacuum.not_vacuumed WHERE a >= 5; + +VACUUM test_vacuum.vacuumed; + +-- default strategy +SELECT * FROM arenadata_toolkit.adb_vacuum_strategy_newest_first('VACUUM') WHERE table_schema = 'test_vacuum'; +-- reversed strategy +SELECT * FROM arenadata_toolkit.adb_vacuum_strategy_newest_last('VACUUM') WHERE table_schema = 'test_vacuum'; + +DROP SCHEMA test_vacuum CASCADE; +DROP EXTENSION arenadata_toolkit; diff --git a/gpcontrib/gp_check_functions/Makefile b/gpcontrib/gp_check_functions/Makefile new file mode 100644 index 000000000000..ec3711801e53 --- /dev/null +++ 
b/gpcontrib/gp_check_functions/Makefile @@ -0,0 +1,15 @@ +EXTENSION = gp_check_functions +DATA = gp_check_functions--1.1.sql gp_check_functions--1.0.0--1.1.sql +MODULES = gp_check_functions +# REGRESS testing is covered by the main suite test 'gp_check_files' as we need the custom tablespace directory support + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = gpcontrib/gp_check_functions +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/gpcontrib/gp_check_functions/gp_check_functions--1.0.0--1.1.sql b/gpcontrib/gp_check_functions/gp_check_functions--1.0.0--1.1.sql new file mode 100644 index 000000000000..03ff8a766987 --- /dev/null +++ b/gpcontrib/gp_check_functions/gp_check_functions--1.0.0--1.1.sql @@ -0,0 +1,179 @@ +/* gpcontrib/gp_check_functions/gp_check_functions--1.0.0--1.1.sql */ + +-- complain if script is sourced in psql, rather than via ALTER EXTENSION +\echo Use "ALTER EXTENSION gp_check_functions UPDATE TO '1.1'" to load this file. \quit + +-- Check orphaned data files on default and user tablespaces. +-- Compared to the previous version, add gp_segment_id to show which segment it is being executed. +CREATE OR REPLACE VIEW __check_orphaned_files AS +SELECT f1.tablespace, f1.filename, f1.filepath, pg_catalog.gp_execution_segment() AS gp_segment_id +from __get_exist_files f1 +LEFT JOIN __get_expect_files f2 +ON f1.tablespace = f2.tablespace AND substring(f1.filename from '[0-9]+') = f2.filename +WHERE f2.tablespace IS NULL + AND f1.filename SIMILAR TO '[0-9]+(\.)?(\_)?%'; + +-- Function to check orphaned files. +-- Compared to the previous version, adjust the SELECT ... FROM __check_orphaned_files since we added new column to it. +-- NOTE: this function does the same lock and checks as gp_check_functions.gp_move_orphaned_files(), and it needs to be that way. 
+CREATE OR REPLACE FUNCTION __gp_check_orphaned_files_func() +RETURNS TABLE ( + gp_segment_id int, + tablespace oid, + filename text, + filepath text +) +LANGUAGE plpgsql AS $$ +BEGIN + BEGIN + -- lock pg_class so that no one will be adding/altering relfilenodes + LOCK TABLE pg_class IN SHARE MODE NOWAIT; + + -- make sure no other active/idle transaction is running + IF EXISTS ( + SELECT 1 + FROM (SELECT * from pg_stat_activity UNION ALL SELECT * FROM gp_dist_random('pg_stat_activity'))q + WHERE + sess_id <> -1 + AND sess_id <> current_setting('gp_session_id')::int -- Exclude the current session + ) THEN + RAISE EXCEPTION 'There is a client session running on one or more segment. Aborting...'; + END IF; + + -- force checkpoint to make sure we do not include files that are normally pending delete + CHECKPOINT; + + RETURN QUERY + SELECT v.gp_segment_id, v.tablespace, v.filename, v.filepath + FROM gp_dist_random('__check_orphaned_files') v + UNION ALL + SELECT -1 AS gp_segment_id, v.tablespace, v.filename, v.filepath + FROM __check_orphaned_files v; + EXCEPTION + WHEN lock_not_available THEN + RAISE EXCEPTION 'cannot obtain SHARE lock on pg_class'; + WHEN OTHERS THEN + RAISE; + END; + + RETURN; +END; +$$; + +-- Function to move orphaned files to a designated location. +-- NOTE: this function does the same lock and checks as gp_move_orphaned_files(), +-- and it needs to be that way. 
+CREATE OR REPLACE FUNCTION __gp_check_orphaned_files_func() +RETURNS TABLE ( + gp_segment_id int, + tablespace oid, + filename text, + filepath text +) +LANGUAGE plpgsql AS $$ +BEGIN + BEGIN + -- lock pg_class so that no one will be adding/altering relfilenodes + LOCK TABLE pg_class IN SHARE MODE NOWAIT; + + -- make sure no other active/idle transaction is running + IF EXISTS ( + SELECT 1 + FROM (SELECT * from pg_stat_activity UNION ALL SELECT * FROM gp_dist_random('pg_stat_activity'))q + WHERE + sess_id <> -1 + AND sess_id <> current_setting('gp_session_id')::int -- Exclude the current session + ) THEN + RAISE EXCEPTION 'There is a client session running on one or more segment. Aborting...'; + END IF; + + -- force checkpoint to make sure we do not include files that are normally pending delete + CHECKPOINT; + + RETURN QUERY + SELECT v.gp_segment_id, v.tablespace, v.filename, v.filepath + FROM gp_dist_random('__check_orphaned_files') v + UNION ALL + SELECT -1 AS gp_segment_id, v.tablespace, v.filename, v.filepath + FROM __check_orphaned_files v; + EXCEPTION + WHEN lock_not_available THEN + RAISE EXCEPTION 'cannot obtain SHARE lock on pg_class'; + WHEN OTHERS THEN + RAISE; + END; + + RETURN; +END; +$$; + +GRANT EXECUTE ON FUNCTION __gp_check_orphaned_files_func() TO public; + +-- UDF to move orphaned files to a designated location +-- NOTE: this function does the same lock and checks as __gp_check_orphaned_files_func(), +-- and it needs to be that way. 
+CREATE FUNCTION gp_move_orphaned_files(target_location TEXT) RETURNS TABLE ( + gp_segment_id INT, + move_success BOOL, + oldpath TEXT, + newpath TEXT +) +LANGUAGE plpgsql AS $$ +BEGIN + -- lock pg_class so that no one will be adding/altering relfilenodes + LOCK TABLE pg_class IN SHARE MODE NOWAIT; + + -- make sure no other active/idle transaction is running + IF EXISTS ( + SELECT 1 + FROM (SELECT * from pg_stat_activity UNION ALL SELECT * FROM gp_dist_random('pg_stat_activity'))q + WHERE + sess_id <> -1 + AND sess_id <> current_setting('gp_session_id')::int -- Exclude the current session + ) THEN + RAISE EXCEPTION 'There is a client session running on one or more segment. Aborting...'; + END IF; + + -- force checkpoint to make sure we do not include files that are normally pending delete + CHECKPOINT; + + RETURN QUERY + SELECT + q.gp_segment_id, + q.move_success, + q.oldpath, + q.newpath + FROM ( + WITH OrphanedFiles AS ( + -- Coordinator + SELECT + o.gp_segment_id, + s.setting || '/' || o.filepath as oldpath, + target_location || '/seg' || o.gp_segment_id::text || '_' || REPLACE(o.filepath, '/', '_') as newpath + FROM __check_orphaned_files o, pg_settings s + WHERE s.name = 'data_directory' + UNION ALL + -- Segments + SELECT + o.gp_segment_id, + s.setting || '/' || o.filepath as oldpath, + target_location || '/seg' || o.gp_segment_id::text || '_' || REPLACE(o.filepath, '/', '_') as newpath + FROM gp_dist_random('__check_orphaned_files') o + JOIN (SELECT gp_execution_segment() as gp_segment_id, * FROM gp_dist_random('pg_settings')) s on o.gp_segment_id = s.gp_segment_id + WHERE s.name = 'data_directory' + ) + SELECT + OrphanedFiles.gp_segment_id, + OrphanedFiles.oldpath, + OrphanedFiles.newpath, + pg_file_rename(OrphanedFiles.oldpath, OrphanedFiles.newpath, NULL) AS move_success + FROM OrphanedFiles + ) q ORDER BY q.gp_segment_id, q.oldpath; +EXCEPTION + WHEN lock_not_available THEN + RAISE EXCEPTION 'cannot obtain SHARE lock on pg_class'; + WHEN OTHERS THEN + 
RAISE; +END; +$$; + diff --git a/gpcontrib/gp_check_functions/gp_check_functions--1.1.sql b/gpcontrib/gp_check_functions/gp_check_functions--1.1.sql new file mode 100644 index 000000000000..dce2ef3171c2 --- /dev/null +++ b/gpcontrib/gp_check_functions/gp_check_functions--1.1.sql @@ -0,0 +1,485 @@ +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION gp_check_functions" to load this file. \quit + +CREATE OR REPLACE FUNCTION get_tablespace_version_directory_name() +RETURNS text +AS '$libdir/gp_check_functions' +LANGUAGE C; + +-------------------------------------------------------------------------------- +-- @function: +-- __get_ao_segno_list +-- +-- @in: +-- +-- @out: +-- oid - relation oid +-- int - segment number +-- eof - eof of the segment file +-- +-- @doc: +-- UDF to retrieve AO segment file numbers for each ao_row table +-- +-------------------------------------------------------------------------------- + +CREATE OR REPLACE FUNCTION __get_ao_segno_list() +RETURNS TABLE (relid oid, segno int, eof bigint) AS +$$ +DECLARE + table_name text; + rec record; + cur refcursor; + row record; +BEGIN + -- iterate over the aoseg relations + FOR rec IN SELECT tc.oid tableoid, tc.relname, ns.nspname + FROM pg_appendonly a + JOIN pg_class tc ON a.relid = tc.oid + JOIN pg_namespace ns ON tc.relnamespace = ns.oid + WHERE tc.relstorage = 'a' + LOOP + table_name := rec.relname; + -- Fetch and return each row from the aoseg table + BEGIN + OPEN cur FOR EXECUTE format('SELECT segno, eof ' + 'FROM gp_toolkit.__gp_aoseg(''%I.%I'') ', + rec.nspname, rec.relname); + SELECT rec.tableoid INTO relid; + LOOP + FETCH cur INTO row; + EXIT WHEN NOT FOUND; + segno := row.segno; + eof := row.eof; + IF segno <> 0 THEN -- there's no '.0' file, it means the file w/o extension + RETURN NEXT; + END IF; + END LOOP; + CLOSE cur; + EXCEPTION + -- If failed to open the aoseg table (e.g. 
the table itself is missing), continue + WHEN OTHERS THEN + RAISE WARNING 'Failed to get aoseg info for %: %', table_name, SQLERRM; + END; + END LOOP; + RETURN; +END; +$$ +LANGUAGE plpgsql; + +GRANT EXECUTE ON FUNCTION __get_ao_segno_list() TO public; + +-------------------------------------------------------------------------------- +-- @function: +-- __get_aoco_segno_list +-- +-- @in: +-- +-- @out: +-- oid - relation oid +-- int - segment number +-- eof - eof of the segment file +-- +-- @doc: +-- UDF to retrieve AOCO segment file numbers for each ao_column table +-- +-------------------------------------------------------------------------------- + +CREATE OR REPLACE FUNCTION __get_aoco_segno_list() +RETURNS TABLE (relid oid, segno int, eof bigint) AS +$$ +DECLARE + table_name text; + rec record; + cur refcursor; + row record; +BEGIN + -- iterate over the aocoseg relations + FOR rec IN SELECT tc.oid tableoid, tc.relname, ns.nspname + FROM pg_appendonly a + JOIN pg_class tc ON a.relid = tc.oid + JOIN pg_namespace ns ON tc.relnamespace = ns.oid + WHERE tc.relstorage = 'c' + LOOP + table_name := rec.relname; + -- Fetch and return each extended segno corresponding to attnum and segno in the aocoseg table + BEGIN + OPEN cur FOR EXECUTE format('SELECT physical_segno as segno, eof ' + 'FROM gp_toolkit.__gp_aocsseg(''%I.%I'') ', + rec.nspname, rec.relname); + SELECT rec.tableoid INTO relid; + LOOP + FETCH cur INTO row; + EXIT WHEN NOT FOUND; + segno := row.segno; + eof := row.eof; + IF segno <> 0 THEN -- there's no '.0' file, it means the file w/o extension + RETURN NEXT; + END IF; + END LOOP; + CLOSE cur; + EXCEPTION + -- If failed to open the aocoseg table (e.g. 
the table itself is missing), continue + WHEN OTHERS THEN + RAISE WARNING 'Failed to get aocsseg info for %: %', table_name, SQLERRM; + END; + END LOOP; + RETURN; +END; +$$ +LANGUAGE plpgsql; + +GRANT EXECUTE ON FUNCTION __get_aoco_segno_list() TO public; + +-------------------------------------------------------------------------------- +-- @view: +-- __get_exist_files +-- +-- @doc: +-- Retrieve a list of all existing data files in the default +-- and user tablespaces. +-- +-------------------------------------------------------------------------------- +-- return the list of existing files in the database +CREATE OR REPLACE VIEW __get_exist_files AS +WITH Tablespaces AS ( +-- 1. The default tablespace + SELECT 0 AS tablespace, 'base/' || d.oid::text AS dirname + FROM pg_database d + WHERE d.datname = current_database() + UNION +-- 2. The global tablespace + SELECT 1664 AS tablespace, 'global/' AS dirname + UNION +-- 3. The user-defined tablespaces + SELECT ts.oid AS tablespace, + 'pg_tblspc/' || ts.oid::text || '/' || get_tablespace_version_directory_name() || '/' || + (SELECT d.oid::text FROM pg_database d WHERE d.datname = current_database()) AS dirname + FROM pg_tablespace ts + WHERE ts.oid > 1664 +) +SELECT tablespace, files.filename, dirname || '/' || files.filename AS filepath +FROM Tablespaces, pg_ls_dir(dirname) AS files(filename); + +GRANT SELECT ON __get_exist_files TO public; + +-------------------------------------------------------------------------------- +-- @view: +-- __get_expect_files +-- +-- @doc: +-- Retrieve a list of expected data files in the database, +-- using the knowledge from catalogs. This does not include +-- any extended data files, nor does it include external, +-- foreign or virtual tables. 
+-- +-------------------------------------------------------------------------------- +CREATE OR REPLACE VIEW __get_expect_files AS +SELECT s.reltablespace AS tablespace, s.relname, s.relstorage, + (CASE WHEN s.relfilenode != 0 THEN s.relfilenode ELSE pg_relation_filenode(s.oid) END)::text AS filename +FROM pg_class s +WHERE s.relstorage NOT IN ('x', 'v', 'f'); + +GRANT SELECT ON __get_expect_files TO public; + +-------------------------------------------------------------------------------- +-- @view: +-- __get_expect_files_ext +-- +-- @doc: +-- Retrieve a list of expected data files in the database, +-- using the knowledge from catalogs. This includes all +-- the extended data files for AO/CO tables, but it does not +-- include external, foreign or virtual tables. +-- Also ignore AO segments w/ eof=0. They might be created just for +-- modcount whereas no data has ever been inserted into the seg. +-- Or, they could be created when a seg has only aborted rows. +-- In both cases, we can ignore these segs, because no matter +-- whether the data files exist or not, the rest of the system +-- can handle them gracefully. +-- +-------------------------------------------------------------------------------- +CREATE OR REPLACE VIEW __get_expect_files_ext AS +SELECT s.reltablespace AS tablespace, s.relname, s.relstorage, + (CASE WHEN s.relfilenode != 0 THEN s.relfilenode ELSE pg_relation_filenode(s.oid) END)::text AS filename +FROM pg_class s +WHERE s.relstorage NOT IN ('x', 'v', 'f') +UNION +-- AO extended files +SELECT c.reltablespace AS tablespace, c.relname, c.relstorage, + format(c.relfilenode::text || '.' || s.segno::text) AS filename +FROM __get_ao_segno_list() s +JOIN pg_class c ON s.relid = c.oid +WHERE s.eof >0 AND c.relstorage NOT IN ('x', 'v', 'f') +UNION +-- CO extended files +SELECT c.reltablespace AS tablespace, c.relname, c.relstorage, + format(c.relfilenode::text || '.'
|| s.segno::text) AS filename +FROM __get_aoco_segno_list() s +JOIN pg_class c ON s.relid = c.oid +WHERE s.eof > 0 AND c.relstorage NOT IN ('x', 'v', 'f'); + +GRANT SELECT ON __get_expect_files_ext TO public; + +-------------------------------------------------------------------------------- +-- @view: +-- __check_orphaned_files +-- +-- @doc: +-- Check orphaned data files on default and user tablespaces. +-- A file is considered orphaned if its main relfilenode is not expected +-- to exist. For example, '12345.1' is an orphaned file if no table +-- has relfilenode=12345, but not otherwise. +-- Therefore, this view accounts for file extensions as well, and we do not +-- need an "_ext" view like the missing-files view. +-- +-------------------------------------------------------------------------------- +CREATE OR REPLACE VIEW __check_orphaned_files AS +SELECT f1.tablespace, f1.filename, f1.filepath, pg_catalog.gp_execution_segment() AS gp_segment_id +from __get_exist_files f1 +LEFT JOIN __get_expect_files f2 +ON f1.tablespace = f2.tablespace AND substring(f1.filename from '[0-9]+') = f2.filename +WHERE f2.tablespace IS NULL + AND f1.filename SIMILAR TO '[0-9]+(\.)?(\_)?%'; + + +GRANT SELECT ON __check_orphaned_files TO public; + +-------------------------------------------------------------------------------- +-- @function: +-- __gp_check_orphaned_files_func +-- +-- @in: +-- +-- @out: +-- gp_segment_id int - segment content ID +-- tablespace oid - tablespace OID +-- filename text - name of the orphaned file +-- filepath text - relative path of the orphaned file in the data directory +-- +-- @doc: +-- (Internal UDF, shouldn't be exposed) +-- UDF to retrieve orphaned files and their paths +-- +-------------------------------------------------------------------------------- + +-- NOTE: this function does the same lock and checks as gp_move_orphaned_files(), +-- and it needs to be that way.
+CREATE OR REPLACE FUNCTION __gp_check_orphaned_files_func() +RETURNS TABLE ( + gp_segment_id int, + tablespace oid, + filename text, + filepath text +) +LANGUAGE plpgsql AS $$ +BEGIN + BEGIN + -- lock pg_class so that no one will be adding/altering relfilenodes + LOCK TABLE pg_class IN SHARE MODE NOWAIT; + + -- make sure no other active/idle transaction is running + IF EXISTS ( + SELECT 1 + FROM (SELECT * from pg_stat_activity UNION ALL SELECT * FROM gp_dist_random('pg_stat_activity'))q + WHERE + sess_id <> -1 + AND sess_id <> current_setting('gp_session_id')::int -- Exclude the current session + ) THEN + RAISE EXCEPTION 'There is a client session running on one or more segment. Aborting...'; + END IF; + + -- force checkpoint to make sure we do not include files that are normally pending delete + CHECKPOINT; + + RETURN QUERY + SELECT v.gp_segment_id, v.tablespace, v.filename, v.filepath + FROM gp_dist_random('__check_orphaned_files') v + UNION ALL + SELECT -1 AS gp_segment_id, v.tablespace, v.filename, v.filepath + FROM __check_orphaned_files v; + EXCEPTION + WHEN lock_not_available THEN + RAISE EXCEPTION 'cannot obtain SHARE lock on pg_class'; + WHEN OTHERS THEN + RAISE; + END; + + RETURN; +END; +$$; + +GRANT EXECUTE ON FUNCTION __gp_check_orphaned_files_func() TO public; + +-------------------------------------------------------------------------------- +-- @function: +-- gp_move_orphaned_files +-- +-- @in: +-- target_location text - directory where we move the orphaned files to +-- +-- @out: +-- gp_segment_id int - segment content ID +-- move_success bool - whether the move attempt succeeded +-- oldpath text - filepath (name included) of the orphaned file before moving +-- newpath text - filepath (name included) of the orphaned file after moving +-- +-- @doc: +-- UDF to move orphaned files to a designated location +-- +-------------------------------------------------------------------------------- + +-- NOTE: this function does the same lock and checks as 
__gp_check_orphaned_files_func(), +-- and it needs to be that way. +CREATE FUNCTION gp_move_orphaned_files(target_location TEXT) RETURNS TABLE ( + gp_segment_id INT, + move_success BOOL, + oldpath TEXT, + newpath TEXT +) +LANGUAGE plpgsql AS $$ +BEGIN + -- lock pg_class so that no one will be adding/altering relfilenodes + LOCK TABLE pg_class IN SHARE MODE NOWAIT; + + -- make sure no other active/idle transaction is running + IF EXISTS ( + SELECT 1 + FROM (SELECT * from pg_stat_activity UNION ALL SELECT * FROM gp_dist_random('pg_stat_activity'))q + WHERE + sess_id <> -1 + AND sess_id <> current_setting('gp_session_id')::int -- Exclude the current session + ) THEN + RAISE EXCEPTION 'There is a client session running on one or more segment. Aborting...'; + END IF; + + -- force checkpoint to make sure we do not include files that are normally pending delete + CHECKPOINT; + + RETURN QUERY + SELECT + q.gp_segment_id, + q.move_success, + q.oldpath, + q.newpath + FROM ( + WITH OrphanedFiles AS ( + -- Coordinator + SELECT + o.gp_segment_id, + s.setting || '/' || o.filepath as oldpath, + target_location || '/seg' || o.gp_segment_id::text || '_' || REPLACE(o.filepath, '/', '_') as newpath + FROM __check_orphaned_files o, pg_settings s + WHERE s.name = 'data_directory' + UNION ALL + -- Segments + SELECT + o.gp_segment_id, + s.setting || '/' || o.filepath as oldpath, + target_location || '/seg' || o.gp_segment_id::text || '_' || REPLACE(o.filepath, '/', '_') as newpath + FROM gp_dist_random('__check_orphaned_files') o + JOIN (SELECT gp_execution_segment() as gp_segment_id, * FROM gp_dist_random('pg_settings')) s on o.gp_segment_id = s.gp_segment_id + WHERE s.name = 'data_directory' + ) + SELECT + OrphanedFiles.gp_segment_id, + OrphanedFiles.oldpath, + OrphanedFiles.newpath, + pg_file_rename(OrphanedFiles.oldpath, OrphanedFiles.newpath, NULL) AS move_success + FROM OrphanedFiles + ) q ORDER BY q.gp_segment_id, q.oldpath; +EXCEPTION + WHEN lock_not_available THEN + RAISE 
EXCEPTION 'cannot obtain SHARE lock on pg_class'; + WHEN OTHERS THEN + RAISE; +END; +$$; + +-------------------------------------------------------------------------------- +-- @view: +-- __check_missing_files +-- +-- @doc: +-- Check missing data files on default and user tablespaces, +-- not including extended files. +-- +-------------------------------------------------------------------------------- +CREATE OR REPLACE VIEW __check_missing_files AS +SELECT f1.tablespace, f1.relname, f1.filename +from __get_expect_files f1 +LEFT JOIN __get_exist_files f2 +ON f1.tablespace = f2.tablespace AND f1.filename = f2.filename +WHERE f2.tablespace IS NULL + AND f1.filename SIMILAR TO '[0-9]+'; + +GRANT SELECT ON __check_missing_files TO public; + +-------------------------------------------------------------------------------- +-- @view: +-- __check_missing_files_ext +-- +-- @doc: +-- Check missing data files on default and user tablespaces, +-- including extended files. +-- +-------------------------------------------------------------------------------- +CREATE OR REPLACE VIEW __check_missing_files_ext AS +SELECT f1.tablespace, f1.relname, f1.filename +FROM __get_expect_files_ext f1 +LEFT JOIN __get_exist_files f2 +ON f1.tablespace = f2.tablespace AND f1.filename = f2.filename +WHERE f2.tablespace IS NULL + AND f1.filename SIMILAR TO '[0-9]+(\.[0-9]+)?'; + +GRANT SELECT ON __check_missing_files_ext TO public; + +-------------------------------------------------------------------------------- +-- @view: +-- gp_check_orphaned_files +-- +-- @doc: +-- User-facing view of __check_orphaned_files. +-- Gather results from coordinator and all segments. 
+-- +-------------------------------------------------------------------------------- +CREATE OR REPLACE VIEW gp_check_orphaned_files AS +SELECT * FROM __gp_check_orphaned_files_func(); + +GRANT SELECT ON gp_check_orphaned_files TO public; + +-------------------------------------------------------------------------------- +-- @view: +-- gp_check_missing_files +-- +-- @doc: +-- User-facing view of __check_missing_files. +-- Gather results from coordinator and all segments. +-- +-------------------------------------------------------------------------------- +CREATE OR REPLACE VIEW gp_check_missing_files AS +SELECT pg_catalog.gp_execution_segment() AS gp_segment_id, * +FROM gp_dist_random('__check_missing_files') +UNION ALL +SELECT -1 AS gp_segment_id, * +FROM __check_missing_files; + +GRANT SELECT ON gp_check_missing_files TO public; + +-------------------------------------------------------------------------------- +-- @view: +-- gp_check_missing_files_ext +-- +-- @doc: +-- User-facing view of __check_missing_files_ext. +-- Gather results from coordinator and all segments. +-- +-------------------------------------------------------------------------------- +CREATE OR REPLACE VIEW gp_check_missing_files_ext AS +SELECT pg_catalog.gp_execution_segment() AS gp_segment_id, * +FROM gp_dist_random('__check_missing_files_ext') +UNION ALL +SELECT -1 AS gp_segment_id, * +FROM __check_missing_files; -- not checking ext on coordinator + +GRANT SELECT ON gp_check_missing_files_ext TO public; + diff --git a/gpcontrib/gp_check_functions/gp_check_functions.c b/gpcontrib/gp_check_functions/gp_check_functions.c new file mode 100644 index 000000000000..6bd72747eeda --- /dev/null +++ b/gpcontrib/gp_check_functions/gp_check_functions.c @@ -0,0 +1,33 @@ +/*------------------------------------------------------------------------- + * + * gp_check_functions.c + * GPDB helper functions for checking various system fact/status. + * + * + * Copyright (c) 2022-Present VMware Software, Inc. 
+ * + * + *------------------------------------------------------------------------- + */ + +#include "postgres.h" + +#include "fmgr.h" +#include "funcapi.h" +#include "catalog/catalog.h" +#include "utils/builtins.h" + +Datum get_tablespace_version_directory_name(PG_FUNCTION_ARGS); + +PG_MODULE_MAGIC; +PG_FUNCTION_INFO_V1(get_tablespace_version_directory_name); + +/* + * get the GPDB-specific directory name for user tablespace + */ +Datum +get_tablespace_version_directory_name(PG_FUNCTION_ARGS) +{ + PG_RETURN_TEXT_P(CStringGetTextDatum(GP_TABLESPACE_VERSION_DIRECTORY)); +} + diff --git a/gpcontrib/gp_check_functions/gp_check_functions.control b/gpcontrib/gp_check_functions/gp_check_functions.control new file mode 100644 index 000000000000..84a55eea4825 --- /dev/null +++ b/gpcontrib/gp_check_functions/gp_check_functions.control @@ -0,0 +1,5 @@ +# gp_check_functions extension + +comment = 'various GPDB helper views/functions' +default_version = '1.1' +relocatable = true diff --git a/gpcontrib/gpcloud/test/s3restful_service_test.cpp b/gpcontrib/gpcloud/test/s3restful_service_test.cpp index f921c43a4359..39932b3323c1 100644 --- a/gpcontrib/gpcloud/test/s3restful_service_test.cpp +++ b/gpcontrib/gpcloud/test/s3restful_service_test.cpp @@ -25,7 +25,7 @@ TEST(S3RESTfulService, GetWithEmptyHeader) { EXPECT_EQ(RESPONSE_OK, resp.getStatus()); EXPECT_EQ("Success", resp.getMessage()); - EXPECT_EQ(true, resp.getRawData().size() > 10000); + EXPECT_EQ(true, resp.getRawData().size() > 1000); } TEST(S3RESTfulService, GetWithoutURL) { diff --git a/gpdb-doc/markdown/admin_guide/external/g-s3-protocol.html.md b/gpdb-doc/markdown/admin_guide/external/g-s3-protocol.html.md index 6eb12429afc7..09049d33c0c0 100644 --- a/gpdb-doc/markdown/admin_guide/external/g-s3-protocol.html.md +++ b/gpdb-doc/markdown/admin_guide/external/g-s3-protocol.html.md @@ -8,7 +8,7 @@ Amazon Simple Storage Service \(Amazon S3\) provides secure, durable, highly-sca You can define read-only external tables that 
use existing data files in the S3 bucket for table data, or writable external tables that store the data from INSERT operations to files in the S3 bucket. Greenplum Database uses the S3 URL and prefix specified in the protocol URL either to select one or more files for a read-only table, or to define the location and filename format to use when uploading S3 files for `INSERT` operations to writable tables. -The `s3` protocol also supports [Dell EMC Elastic Cloud Storage](https://www.emc.com/en-us/storage/ecs/index.htm) \(ECS\), an Amazon S3 compatible service. +The `s3` protocol also supports [Dell Elastic Cloud Storage](https://www.dell.com/en-us/dt/learn/data-storage/ecs.htm) \(ECS\), an Amazon S3 compatible service. > **Note** The `pxf` protocol can access data in S3 and other object store systems such as Azure, Google Cloud Storage, and Minio. The `pxf` protocol can also access data in external Hadoop systems \(HDFS, Hive, HBase\), and SQL databases. See [pxf:// Protocol](g-pxf-protocol.html). diff --git a/gpdb-doc/markdown/admin_guide/parallel_retrieve_cursor.html.md b/gpdb-doc/markdown/admin_guide/parallel_retrieve_cursor.html.md deleted file mode 100644 index 430bd8662f03..000000000000 --- a/gpdb-doc/markdown/admin_guide/parallel_retrieve_cursor.html.md +++ /dev/null @@ -1,361 +0,0 @@ ---- -title: Retrieving Query Results with a Parallel Retrieve Cursor ---- - -A *parallel retrieve cursor* is an enhanced cursor implementation that you can use to create a special kind of cursor on the Greenplum Database coordinator node, and retrieve query results, on demand and in parallel, directly from the Greenplum segments. - -## About Parallel Retrieve Cursors - -You use a cursor to retrieve a smaller number of rows at a time from a larger - query. When you declare a parallel retrieve cursor, the Greenplum - Database Query Dispatcher (QD) dispatches the query plan to each Query Executor - (QE), and creates an *endpoint* on each QE before it executes the query. 
- An endpoint is a query result source for a parallel retrieve cursor on a specific - QE. Instead of returning the query result to the QD, an endpoint retains the - query result for retrieval via a different process: a direct connection to the - endpoint. You open a special retrieve mode connection, called a *retrieve - session*, and use the new `RETRIEVE` SQL command to retrieve - query results from each parallel retrieve cursor endpoint. You can retrieve - from parallel retrieve cursor endpoints on demand and in parallel. - -You can use the following functions and views to examine and manage parallel retrieve cursors and endpoints: - -|Function, View Name|Description| -|-------------------|-----------| -|gp\_get\_endpoints\(\)

[gp\_endpoints](../ref_guide/system_catalogs/catalog_ref-views.html#gp_endpoints)|List the endpoints associated with all active parallel retrieve cursors declared by the current user in the current database. When the Greenplum Database superuser invokes this function, it returns a list of all endpoints for all parallel retrieve cursors declared by all users in the current database.| -|gp\_get\_session\_endpoints\(\)

[gp\_session\_endpoints](../ref_guide/system_catalogs/catalog_ref-views.html#gp_session_endpoints)|List the endpoints associated with all parallel retrieve cursors declared in the current session for the current user.| -|gp\_get\_segment\_endpoints\(\)

[gp\_segment\_endpoints](../ref_guide/system_catalogs/catalog_ref-views.html#gp_segment_endpoints)|List the endpoints created in the QE for all active parallel retrieve cursors declared by the current user. When the Greenplum Database superuser accesses this view, it returns a list of all endpoints on the QE created for all parallel retrieve cursors declared by all users.| -|gp\_wait\_parallel\_retrieve\_cursor\(cursorname text, timeout\_sec int4 \)|Return cursor status or block and wait for results to be retrieved from all endpoints associated with the specified parallel retrieve cursor.| - -
Each of these functions and views is located in the pg_catalog schema, and each RETURNS TABLE.
- -## Using a Parallel Retrieve Cursor - -You will perform the following tasks when you use a Greenplum Database parallel retrieve cursor to read query results in parallel from Greenplum segments: - -1. [Declare the parallel retrieve cursor](#declare_cursor). -1. [List the endpoints of the parallel retrieve cursor](#list_endpoints). -1. [Open a retrieve connection to each endpoint](#open_retrieve_conn). -1. [Retrieve data from each endpoint](#retrieve_data). -1. [Wait for data retrieval to complete](#wait). -1. [Handle data retrieval errors](#error_handling). -1. [Close the parallel retrieve cursor](#close). - -In addition to the above, you may optionally choose to [List all parallel retrieve cursors](#list_all_prc) in the system or [List segment-specific retrieve session information](#utility_endpoints). - -### Declaring a Parallel Retrieve Cursor - -You [DECLARE](../ref_guide/sql_commands/DECLARE.html#topic1) a cursor to retrieve a smaller number of rows at a time from a larger query. When you declare a parallel retrieve cursor, you can retrieve the query results directly from the Greenplum Database segments. - -The syntax for declaring a parallel retrieve cursor is similar to that of declaring a regular cursor; you must additionally include the `PARALLEL RETRIEVE` keywords in the command. You can declare a parallel retrieve cursor only within a transaction, and the cursor name that you specify when you declare the cursor must be unique within the transaction. - -For example, the following commands begin a transaction and declare a parallel retrieve cursor named `prc1` to retrieve the results from a specific query: - -``` sql -BEGIN; -DECLARE prc1 PARALLEL RETRIEVE CURSOR FOR query; -``` - -Greenplum Database creates the endpoint(s) on the QD or QEs, depending on the *query* parameters: - -- Greenplum Database creates an endpoint on the QD when the query results must be gathered by the coordinator. 
For example, this `DECLARE` statement requires that the coordinator gather the query results: - - ``` sql - DECLARE c1 PARALLEL RETRIEVE CURSOR FOR SELECT * FROM t1 ORDER BY a; - ``` -
You may choose to run the EXPLAIN command on the parallel retrieve cursor query to identify when motion is involved. Consider using a regular cursor for such queries.
- -- When the query involves direct dispatch to a segment (the query is filtered on the distribution key), Greenplum Database creates the endpoint(s) on specific segment host(s). For example, this `DECLARE` statement may result in the creation of single endpoint: - - ``` sql - DECLARE c2 PARALLEL RETRIEVE CURSOR FOR SELECT * FROM t1 WHERE a=1; - ``` - -- Greenplum Database creates the endpoints on all segment hosts when all hosts contribute to the query results. This example `DECLARE` statement results in all segments contributing query results: - - ``` sql - DECLARE c3 PARALLEL RETRIEVE CURSOR FOR SELECT * FROM t1; - ``` - -The `DECLARE` command returns when the endpoints are ready and query execution has begun. - -### Listing a Parallel Retrieve Cursor's Endpoints - -You can obtain the information that you need to initiate a retrieve - connection to an endpoint by invoking the `gp_get_endpoints()` - function or examining the `gp_endpoints` view in a session on - the Greenplum Database coordinator host: - -``` sql -SELECT * FROM gp_get_endpoints(); -SELECT * FROM gp_endpoints; -``` - -These commands return the list of endpoints in a table with the following columns: - -|Column Name|Description| -|-----------|-----------| -|gp\_segment\_id|The QE's endpoint `gp_segment_id`.| -|auth\_token|The authentication token for a retrieve session.| -|cursorname|The name of the parallel retrieve cursor.| -|sessionid|The identifier of the session in which the parallel retrieve cursor was created.| -|hostname|The name of the host from which to retrieve the data for the endpoint.| -|port|The port number from which to retrieve the data for the endpoint.| -|username|The name of the current user; *you must initiate the retrieve session as this user*.| -|state|The state of the endpoint; the valid states are:

READY: The endpoint is ready to be retrieved.

ATTACHED: The endpoint is attached to a retrieve connection.

RETRIEVING: A retrieve session is retrieving data from the endpoint at this moment.

FINISHED: The endpoint has been fully retrieved.

RELEASED: Due to an error, the endpoint has been released and the connection closed.| -|endpointname|The endpoint identifier; you provide this identifier to the `RETRIEVE` command.| - -Refer to the [gp_endpoints](../ref_guide/system_catalogs/catalog_ref-views.html#gp_endpoints) view reference page for more information about the endpoint attributes returned by these commands. - -You can similarly invoke the `gp_get_session_endpoints()` function or examine the `gp_session_endpoints` view to list the endpoints created for the parallel retrieve cursors declared in the current session and by the current user. - -### Opening a Retrieve Session - -After you declare a parallel retrieve cursor, you can open a retrieve session to each endpoint. Only a single retrieve session may be open to an endpoint at any given time. - -
A retrieve session is independent of the parallel retrieve cursor itself and the endpoints.
- -Retrieve session authentication does not depend on the `pg_hba.conf` file, but rather on an authentication token (`auth_token`) generated by Greenplum Database. - -
Because Greenplum Database skips pg_hba.conf-controlled authentication for a retrieve session, for security purposes you may invoke only the RETRIEVE command in the session.
- -When you initiate a retrieve session to an endpoint: - -- The user that you specify for the retrieve session must be the user that declared the parallel retrieve cursor (the `username` returned by `gp_endpoints`). This user must have Greenplum Database login privileges. - -- You specify the `hostname` and `port` returned by `gp_endpoints` for the endpoint. - -- You authenticate the retrieve session by specifying the `auth_token` returned for the endpoint via the `PGPASSWORD` environment variable, or when prompted for the retrieve session `Password`. - -- You must specify the [gp_retrieve_conn](../ref_guide/config_params/guc-list.html#gp_retrieve_conn) server configuration parameter on the connection request, and set the value to `true` . - -For example, if you are initiating a retrieve session via `psql`: - -``` shell -PGOPTIONS='-c gp_retrieve_conn=true' psql -h -p -U -d -``` - -To distinguish a retrieve session from other sessions running on a segment host, Greenplum Database includes the `[retrieve]` tag on the `ps` command output display for the process. - -### Retrieving Data From the Endpoint - -Once you establish a retrieve session, you retrieve the tuples associated with a query result on that endpoint using the [RETRIEVE](../ref_guide/sql_commands/RETRIEVE.html#topic1) command. - -You can specify a (positive) number of rows to retrieve, or `ALL` rows: - -``` sql -RETRIEVE 7 FROM ENDPOINT prc10000003300000003; -RETRIEVE ALL FROM ENDPOINT prc10000003300000003; -``` - -Greenplum Database returns an empty set if there are no more rows to retrieve from the endpoint. - -
You can retrieve from multiple parallel retrieve cursors from the same retrieve session only when their auth_tokens match.
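The retrieve-session requirements described here map onto a libpq connection string. The endpoint's `auth_token` becomes the `password` keyword, and `PGOPTIONS='-c gp_retrieve_conn=true'` becomes the `options` keyword. The following C sketch builds such a string from one `gp_endpoints` row. `build_retrieve_conninfo()` and its helpers are hypothetical names that are not part of Greenplum or libpq. The sample values come from the `sdw3`/`6001` row of the example `gp_endpoints` output.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append `value` to `out` as a single-quoted libpq conninfo value,
 * escaping backslashes and single quotes with a backslash.
 * `cap` is the capacity of `out` including the terminating NUL.
 * Returns 0 on success, -1 if the value does not fit.
 */
static int append_quoted(char *out, size_t cap, size_t *len, const char *value)
{
    if (*len + 1 >= cap)
        return -1;
    out[(*len)++] = '\'';
    for (const char *p = value; *p != '\0'; p++) {
        if (*p == '\'' || *p == '\\') {
            if (*len + 1 >= cap)
                return -1;
            out[(*len)++] = '\\';
        }
        if (*len + 1 >= cap)
            return -1;
        out[(*len)++] = *p;
    }
    if (*len + 1 >= cap)
        return -1;
    out[(*len)++] = '\'';
    out[*len] = '\0';
    return 0;
}

/* Append "key='value'", preceded by a space unless it is the first parameter. */
static int append_param(char *out, size_t cap, size_t *len,
                        const char *key, const char *value)
{
    size_t klen = strlen(key);

    if (*len > 0) {
        if (*len + 1 >= cap)
            return -1;
        out[(*len)++] = ' ';
    }
    if (*len + klen + 1 >= cap)
        return -1;
    memcpy(out + *len, key, klen);
    *len += klen;
    out[(*len)++] = '=';
    return append_quoted(out, cap, len, value);
}

/*
 * Hypothetical helper: build a conninfo string for a retrieve session
 * from the hostname, port, username, and auth_token of one gp_endpoints
 * row. Returns 0 on success; on overflow returns -1 and leaves `out`
 * as an empty string.
 */
int build_retrieve_conninfo(char *out, size_t cap,
                            const char *hostname, int port,
                            const char *username, const char *dbname,
                            const char *auth_token)
{
    char portbuf[16];
    size_t len = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';
    snprintf(portbuf, sizeof portbuf, "%d", port);

    if (append_param(out, cap, &len, "host", hostname) != 0 ||
        append_param(out, cap, &len, "port", portbuf) != 0 ||
        append_param(out, cap, &len, "user", username) != 0 ||
        append_param(out, cap, &len, "dbname", dbname) != 0 ||
        /* the endpoint's auth_token is the retrieve-session password */
        append_param(out, cap, &len, "password", auth_token) != 0 ||
        /* conninfo equivalent of PGOPTIONS='-c gp_retrieve_conn=true' */
        append_param(out, cap, &len, "options", "-c gp_retrieve_conn=true") != 0) {
        out[0] = '\0';
        return -1;
    }
    return 0;
}
```

On a client linked against libpq, you could pass the resulting string to `PQconnectdb()`, check `PQstatus()`, and then run `RETRIEVE ALL FROM ENDPOINT prc10000003300000003` with `PQexec()`. The sketch quotes every value, not only values that contain spaces. This keeps the builder simple, because libpq accepts quoted values for every keyword.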
- -### Waiting for Data Retrieval to Complete - -Use the `gp_wait_parallel_retrieve_cursor()` function to display the the status of data retrieval from a parallel retrieve cursor, or to wait for all endpoints to finishing retrieving the data. You invoke this function in the transaction block in which you declared the parallel retrieve cursor. - -`gp_wait_parallel_retrieve_cursor()` returns `true` only when all tuples are fully retrieved from all endpoints. In all other cases, the function returns `false` and may additionally throw an error. - -The function signatures of `gp_wait_parallel_retrieve_cursor()` follow: - -``` sql -gp_wait_parallel_retrieve_cursor( cursorname text ) -gp_wait_parallel_retrieve_cursor( cursorname text, timeout_sec int4 ) -``` - -You must identify the name of the cursor when you invoke this function. The timeout argument is optional: - -- The default timeout is `0` seconds: Greenplum Database checks the retrieval status of all endpoints and returns the result immediately. - -- A timeout value of `-1` seconds instructs Greenplum to block until all data from all endpoints has been retrieved, or block until an error occurs. - -- The function reports the retrieval status after a timeout occurs for any other positive timeout value that you specify. - -`gp_wait_parallel_retrieve_cursor()` returns when it encounters one of the following conditions: - -- All data has been retrieved from all endpoints. -- A timeout has occurred. -- An error has occurred. - -### Handling Data Retrieval Errors - -An error can occur in a retrieve sesson when: - -- You cancel or interrupt the retrieve operation. -- The endpoint is only partially retrieved when the retrieve session quits. - -When an error occurs in a specific retrieve session, Greenplum Database removes the endpoint from the QE. Other retrieve sessions continue to function as normal. 
- -If you close the transaction before fully retrieving from all endpoints, or if `gp_wait_parallel_retrieve_cursor()` returns an error, Greenplum Database terminates all remaining open retrieve sessions. - -### Closing the Cursor - -When you have completed retrieving data from the parallel retrieve cursor, close the cursor and end the transaction: - -``` sql -CLOSE prc1; -END; -``` - -
When you close a parallel retrieve cursor, Greenplum Database terminates any open retrieve sessions associated with the cursor.
- -On closing, Greenplum Database frees all resources associated with the parallel retrieve cursor and its endpoints. - -### Listing All Parallel Retrieve Cursors - -The [pg_cursors](../ref_guide/system_catalogs/catalog_ref-views.html#pg_cursors) view lists all declared cursors that are currently available in the system. You can obtain information about all parallel retrieve cursors by running the following command: - -``` sql -SELECT * FROM pg_cursors WHERE is_parallel = true; -``` - -### Listing Segment-Specific Retrieve Session Information - -You can obtain information about all retrieve sessions to a specific QE endpoint by invoking the `gp_get_segment_endpoints()` function or examining the `gp_segment_endpoints` view: - -``` sql -SELECT * FROM gp_get_segment_endpoints(); -SELECT * FROM gp_segment_endpoints; -``` - -These commands provide information about the retrieve sessions associated with a QE endpoint for all active parallel retrieve cursors declared by the current user. When the Greenplum Database superuser invokes the command, it returns the retrieve session information for all endpoints on the QE created for all parallel retrieve cursors declared by all users. 
- -You can obtain segment-specific retrieve session information in two ways: from the QD, or via a utility-mode connection to the endpoint: - -- QD example: - - ``` sql - SELECT * from gp_dist_random('gp_segment_endpoints'); - ``` - - Display the information filtered to a specific segment: - - ``` sql - SELECT * from gp_dist_random('gp_segment_endpoints') WHERE gp_segment_id = 0; - ``` - -- Example utilizing a utility-mode connection to the endpoint: - - ``` sql - $ PGOPTIONS='-c gp_session_role=utility' psql -h sdw3 -U localuser -p 6001 -d testdb - - testdb=> SELECT * FROM gp_segment_endpoints; - ``` - -The commands return endpoint and retrieve session information in a table with the following columns: - -|Column Name|Description| -|-----------|-----------| -|auth\_token|The authentication token for a the retrieve session.| -|databaseid|The identifier of the database in which the parallel retrieve cursor was created.| -|senderpid|The identifier of the process sending the query results.| -|receiverpid|The process identifier of the retrieve session that is receiving the query results.| -|state|The state of the endpoint; the valid states are:

READY: The endpoint is ready to be retrieved.

ATTACHED: The endpoint is attached to a retrieve connection.

RETRIEVING: A retrieve session is retrieving data from the endpoint at this moment.

FINISHED: The endpoint has been fully retrieved.

RELEASED: Due to an error, the endpoint has been released and the connection closed.| -|gp\_segment\_id|The QE's endpoint `gp_segment_id`.| -|sessionid|The identifier of the session in which the parallel retrieve cursor was created.| -|username|The name of the user that initiated the retrieve session.| -|endpointname|The endpoint identifier.| -|cursorname|The name of the parallel retrieve cursor.| - -Refer to the [gp_segment_endpoints](../ref_guide/system_catalogs/catalog_ref-views.html#gp_segment_endpoints) view reference page for more information about the endpoint attributes returned by these commands. - - -## Limiting the Number of Concurrently Open Cursors - -By default, Greenplum Database does not limit the number of parallel retrieve cursors that are active in the cluster \(up to the maximum value of 1024\). The Greenplum Database superuser can set the [gp\_max\_parallel\_cursors](../ref_guide/config_params/guc-list.html#gp_max_parallel_cursors) server configuration parameter to limit the number of open cursors. - - -## Known Issues and Limitations - -The parallel retrieve cursor implementation has the following limitations: - -- The VMware Greenplum Query Optimizer (GPORCA) does not support queries on a parallel retrieve cursor. -- Greenplum Database ignores the `BINARY` clause when you declare a parallel retrieve cursor. -- Parallel retrieve cursors cannot be declared `WITH HOLD`. -- Parallel retrieve cursors do not support the `FETCH` and `MOVE` cursor operations. -- Parallel retrieve cursors are not supported in SPI; you cannot declare a parallel retrieve cursor in a PL/pgSQL function. - - -## Additional Documentation - -Refer to the [README](https://github.com/greenplum-db/gpdb/tree/main/src/backend/cdb/endpoint/README) in the Greenplum Database `github` repository for additional information about the parallel retrieve cursor implementation. 
You can also find parallel retrieve cursor [programming examples](https://github.com/greenplum-db/gpdb/tree/main/src/test/examples/) in the repository. - - -## Example - -Create a parallel retrieve cursor and use it to pull query results from a Greenplum Database cluster: - -1. Open a `psql` session to the Greenplum Database coordinator host: - - ``` shell - psql -d testdb - ``` - -1. Start the transaction: - - ``` sql - BEGIN; - ``` - -1. Declare a parallel retrieve cursor named `prc1` for a `SELECT *` query on a table: - - ``` sql - DECLARE prc1 PARALLEL RETRIEVE CURSOR FOR SELECT * FROM t1; - ``` - -1. Obtain the endpoints for this parallel retrieve cursor: - - ``` sql - SELECT * FROM gp_endpoints WHERE cursorname='prc1'; - gp_segment_id | auth_token | cursorname | sessionid | hostname | port | username | state | endpointname - ---------------+----------------------------------+------------+-----------+----------+------+----------+-------+---------------------- - 2 | 39a2dc90a82fca668e04d04e0338f105 | prc1 | 51 | sdw1 | 6000 | bill | READY | prc10000003300000003 - 3 | 1a6b29f0f4cad514a8c3936f9239c50d | prc1 | 51 | sdw1 | 6001 | bill | READY | prc10000003300000003 - 4 | 1ae948c8650ebd76bfa1a1a9fa535d93 | prc1 | 51 | sdw2 | 6000 | bill | READY | prc10000003300000003 - 5 | f10f180133acff608275d87966f8c7d9 | prc1 | 51 | sdw2 | 6001 | bill | READY | prc10000003300000003 - 6 | dda0b194f74a89ed87b592b27ddc0e39 | prc1 | 51 | sdw3 | 6000 | bill | READY | prc10000003300000003 - 7 | 037f8c747a5dc1b75fb10524b676b9e8 | prc1 | 51 | sdw3 | 6001 | bill | READY | prc10000003300000003 - 8 | c43ac67030dbc819da9d2fd8b576410c | prc1 | 51 | sdw4 | 6000 | bill | READY | prc10000003300000003 - 9 | e514ee276f6b2863142aa2652cbccd85 | prc1 | 51 | sdw4 | 6001 | bill | READY | prc10000003300000003 - (8 rows) - ``` - -1. Wait until all endpoints are fully retrieved: - - ``` sql - SELECT gp_wait_parallel_retrieve_cursor( 'prc1', -1 ); - ``` - -1. For each endpoint: - - 1. 
Open a retrieve session. For example, to open a retrieve session to the segment instance running on `sdw3`, port number `6001`, run the following command in a *different terminal window*; when prompted for the password, provide the `auth_token` identified in row 7 of the `gp_endpoints` output: - - ``` sql - $ PGOPTIONS='-c gp_retrieve_conn=true' psql -h sdw3 -U localuser -p 6001 -d testdb - Password: - ```` - - 1. Retrieve data from the endpoint: - - ``` sql - -- Retrieve 7 rows of data from this session - RETRIEVE 7 FROM ENDPOINT prc10000003300000003 - -- Retrieve the remaining rows of data from this session - RETRIEVE ALL FROM ENDPOINT prc10000003300000003 - ``` - - 1. Exit the retrieve session: - - ``` sql - \q - ``` - -1. In the original `psql` session (the session in which you declared the parallel retrieve cursor), verify that the `gp_wait_parallel_retrieve_cursor()` function returned `t`. Then close the cursor and complete the transaction: - - ``` sql - CLOSE prc1; - END; - ``` - diff --git a/gpdb-doc/markdown/admin_guide/perf_intro.html.md b/gpdb-doc/markdown/admin_guide/perf_intro.html.md index c2a419aaad52..31c87e9b2de0 100644 --- a/gpdb-doc/markdown/admin_guide/perf_intro.html.md +++ b/gpdb-doc/markdown/admin_guide/perf_intro.html.md @@ -22,7 +22,7 @@ Several key performance factors influence database performance. Understanding th Database performance relies heavily on disk I/O and memory usage. To accurately set performance expectations, you need to know the baseline performance of the hardware on which your DBMS is deployed. Performance of hardware components such as CPUs, hard disks, disk controllers, RAM, and network interfaces will significantly affect how fast your database performs. -> **Caution** Do not install anti-virus software of any type on Greenplum Database hosts. VMware Greenplum is not supported for use with anti-virus software because the additional CPU and IO load interferes with Greenplum Database operations. 
+> **Note** If you use endpoint security software on your Greenplum Database hosts, it may affect your database performance and stability. See [About Endpoint Security Software](../security-guide/topics/preface.html#endpoint_security) for more information. ### Workload diff --git a/gpdb-doc/markdown/admin_guide/workload_mgmt_resgroups.html.md b/gpdb-doc/markdown/admin_guide/workload_mgmt_resgroups.html.md index 04c093b7362f..86c889eaa36d 100644 --- a/gpdb-doc/markdown/admin_guide/workload_mgmt_resgroups.html.md +++ b/gpdb-doc/markdown/admin_guide/workload_mgmt_resgroups.html.md @@ -8,24 +8,6 @@ When you assign a resource group to a role \(a role-based resource group\), the Similarly, when you assign a resource group to an external component, the group limits apply to all running instances of the component. For example, if you create a resource group for a PL/Container external component, the memory limit that you define for the group specifies the maximum memory usage for all running instances of each PL/Container runtime to which you assign the group.
-This topic includes the following subtopics: - -- [Understanding Role and Component Resource Groups](#topic8339intro) -- [Resource Group Attributes and Limits](#topic8339introattrlim) - - [Memory Auditor](#topic8339777) - - [Transaction Concurrency Limit](#topic8339717179) - - [CPU Limits](#topic833971717) - - [Memory Limits](#topic8339717) -- [Using VMware Greenplum Command Center to Manage Resource Groups](#topic999) -- [Configuring and Using Resource Groups](#topic71717999) - - [Enabling Resource Groups](#topic8) - - [Creating Resource Groups](#topic10) - - [Configuring Automatic Query Termination Based on Memory Usage](#topic_jlz_hzg_pkb) - - [Assigning a Resource Group to a Role](#topic17) -- [Monitoring Resource Group Status](#topic22) -- [Moving a Query to a Different Resource Group](#moverg) -- [Resource Group Frequently Asked Questions](#topic777999) - **Parent topic:** [Managing Resources](wlmgmt.html) ## Understanding Role and Component Resource Groups @@ -282,29 +264,17 @@ Refer to the [Greenplum Command Center documentation](http://docs.vmware.com/en/ If you use RedHat 6 and the performance with resource groups is acceptable for your use case, upgrade your kernel to version 2.6.32-696 or higher to benefit from other fixes to the cgroups implementation. -### Prerequisite +### Prerequisites Greenplum Database resource groups use Linux Control Groups \(cgroups\) to manage CPU resources. Greenplum Database also uses cgroups to manage memory for resource groups for external components. With cgroups, Greenplum isolates the CPU and external component memory usage of your Greenplum processes from other processes on the node. This allows Greenplum to support CPU and external component memory usage restrictions on a per-resource-group basis. -> **Note** Redhat 8.x supports two versions of cgroups: cgroup v1 and cgroup v2. Greenplum Database only supports cgroup v1. 
Follow the steps below to make sure that your system is mounting the `cgroups-v1` filesystem at startup. +> **Note** Redhat 8.x/9.x supports two versions of cgroups: cgroup v1 and cgroup v2. Greenplum Database only supports cgroup v1. Follow the steps below to make sure that your system is mounting the `cgroups-v1` filesystem at startup. For detailed information about cgroups, refer to the Control Groups documentation for your Linux distribution. Complete the following tasks on each node in your Greenplum Database cluster to set up cgroups for use with resource groups: -1. If not already installed, install the Control Groups operating system package on each Greenplum Database node. The command that you run to perform this task will differ based on the operating system installed on the node. You must be the superuser or have `sudo` access to run the command: - - Redhat/CentOS 7.x/8.x systems: - - ``` - sudo yum install libcgroup-tools - ``` - - Redhat/CentOS 6.x systems: - - ``` - sudo yum install libcgroup - ``` - -1. If you are using Redhat 8.x, make sure that you configured the system to mount the `cgroups-v1` filesystem by default during system boot by running the following command: +1. If you are using Redhat 8.x/9.x, make sure that you configured the system to mount the `cgroups-v1` filesystem by default during system boot by running the following command: ``` stat -fc %T /sys/fs/cgroup/ @@ -325,14 +295,19 @@ Complete the following tasks on each node in your Greenplum Database cluster to Reboot the system for the changes to take effect. +1. Create the required cgroup hierarchies on each Greenplum Database node. Since the hierarchies are cleaned when the operating system rebooted, a service is applied to recreate them automatically on boot. Follow the below steps based on your operating system version. 
+ +#### Redhat/CentOS 6.x/7.x/8.x + +These operating systems include the `libcgroup-tools` package (for Redhat/CentOS 7.x/8.x) or `libcgroup` (for Redhat/CentOS 6.x) 1. Locate the cgroups configuration file `/etc/cgconfig.conf`. You must be the superuser or have `sudo` access to edit this file: ``` - sudo vi /etc/cgconfig.conf + vi /etc/cgconfig.conf ``` -2. Add the following configuration information to the file: +1. Add the following configuration information to the file: ``` group gpdb { @@ -359,19 +334,38 @@ Complete the following tasks on each node in your Greenplum Database cluster to This content configures CPU, CPU accounting, CPU core set, and memory control groups managed by the `gpadmin` user. Greenplum Database uses the memory control group only for those resource groups created with the `cgroup` `MEMORY_AUDITOR`. -3. Start the cgroups service on each Greenplum Database node. The command that you run to perform this task will differ based on the operating system installed on the node. You must be the superuser or have `sudo` access to run the command: +1. Start the cgroups service on each Greenplum Database node. You must be the superuser or have `sudo` access to run the command: + - Redhat/CentOS 7.x/8.x systems: + + ``` + cgconfigparser -l /etc/cgconfig.conf + ``` + - Redhat/CentOS 6.x systems: + + ``` + service cgconfig start + ``` + +1. To automatically recreate Greenplum Database required cgroup hierarchies and parameters when your system is restarted, configure your system to enable the Linux cgroup service daemon `cgconfig.service` \(Redhat/CentOS 7.x/8.x\) or `cgconfig` \(Redhat/CentOS 6.x\) at node start-up. 
To ensure the configuration is persistent after reboot, run the following commands as user root: + - Redhat/CentOS 7.x/8.x systems: ``` - sudo cgconfigparser -l /etc/cgconfig.conf + systemctl enable cgconfig.service + ``` + + To start the service immediately \(without having to reboot\) enter: + + ``` + systemctl start cgconfig.service ``` - Redhat/CentOS 6.x systems: ``` - sudo service cgconfig start + chkconfig cgconfig on ``` -4. Identify the `cgroup` directory mount point for the node: +1. Identify the `cgroup` directory mount point for the node: ``` grep cgroup /proc/mounts @@ -379,7 +373,7 @@ Complete the following tasks on each node in your Greenplum Database cluster to The first line of output identifies the `cgroup` mount point. -5. Verify that you set up the Greenplum Database cgroups configuration correctly by running the following commands. Replace \ with the mount point that you identified in the previous step: +1. Verify that you set up the Greenplum Database cgroups configuration correctly by running the following commands. Replace \ with the mount point that you identified in the previous step: ``` ls -l /cpu/gpdb @@ -390,26 +384,41 @@ Complete the following tasks on each node in your Greenplum Database cluster to If these directories exist and are owned by `gpadmin:gpadmin`, you have successfully configured cgroups for Greenplum Database CPU resource management. -6. To automatically recreate Greenplum Database required cgroup hierarchies and parameters when your system is restarted, configure your system to enable the Linux cgroup service daemon `cgconfig.service` \(Redhat/CentOS 7.x/8.x\) or `cgconfig` \(Redhat/CentOS 6.x\) at node start-up. 
For example, configure one of the following cgroup service commands in your preferred service auto-start tool: - - Redhat/CentOS 7.x/8.x systems: - - ``` - sudo systemctl enable cgconfig.service - ``` - - To start the service immediately \(without having to reboot\) enter: - - ``` - sudo systemctl start cgconfig.service - ``` - - Redhat/CentOS 6.x systems: - - ``` - sudo chkconfig cgconfig on - ``` - - You may choose a different method to recreate the Greenplum Database resource group cgroup hierarchies. - +#### Redhat 9.x + +If you are using Redhat 9.x, the `libcgroup` and `libcgroup-tools` packages are not available with the operating system. In this scenario, you must manually create a systemd service that recreates the cgroup hierarchies after each system boot; the service runs a short bash script during system startup. Perform the following steps as user root: + +1. Create `greenplum-cgroup-v1-config.service`: + ``` + vim /etc/systemd/system/greenplum-cgroup-v1-config.service + ``` + +2. Write the following content into `greenplum-cgroup-v1-config.service`. If Greenplum Database runs as a user other than `gpadmin`, replace `gpadmin` with that user. + ``` + [Unit] + Description=Greenplum Cgroup v1 Configuration + + [Service] + Type=oneshot + RemainAfterExit=yes + WorkingDirectory=/sys/fs/cgroup + # set up hierarchies only if cgroup v1 is mounted + ExecCondition=bash -c '[ xcgroupfs = x$(stat -fc "%%T" /sys/fs/cgroup/memory) ] || exit 1' + ExecStart=bash -ec '\ + for controller in cpu cpuacct cpuset memory;do \ + [ -e $controller/gpdb ] || mkdir $controller/gpdb; \ + chown -R gpadmin:gpadmin $controller/gpdb; \ + done' + + [Install] + WantedBy=basic.target + ``` + +3.
Reload systemd daemon and enable the service: + ``` + systemctl daemon-reload + systemctl enable greenplum-cgroup-v1-config.service + ``` ### Procedure diff --git a/gpdb-doc/markdown/analytics/madlib.html.md b/gpdb-doc/markdown/analytics/madlib.html.md index 2104d9d0ae3e..42d2c23c9c7c 100644 --- a/gpdb-doc/markdown/analytics/madlib.html.md +++ b/gpdb-doc/markdown/analytics/madlib.html.md @@ -4,14 +4,6 @@ title: Machine Learning and Deep Learning using MADlib Apache MADlib is an open-source library for scalable in-database analytics. The Greenplum MADlib extension provides the ability to run machine learning and deep learning workloads in a Greenplum Database. -This chapter includes the following information: - -- [Installing MADlib](#topic3) -- [Upgrading MADlib](#topic_eqm_klx_hw) -- [Uninstalling MADlib](#topic6) -- [Examples](#topic9) -- [References](#topic10) - You can install it as an extension in a Greenplum Database system you can run data-parallel implementations of mathematical, statistical, graph, machine learning, and deep learning methods on structured and unstructured data. For Greenplum and MADlib version compatibility, refer to [MADlib FAQ](https://cwiki.apache.org/confluence/display/MADLIB/FAQ#FAQ-Q1-2WhatdatabaseplatformsdoesMADlibsupportandwhatistheupgradematrix?). MADlib’s suite of SQL-based algorithms run at scale within a single Greenplum Database engine without needing to transfer data between the database and other tools. @@ -53,9 +45,22 @@ For information about PivotalR, including supported MADlib functionality, see [h The R package for PivotalR can be found at [https://cran.r-project.org/web/packages/PivotalR/index.html](https://cran.r-project.org/web/packages/PivotalR/index.html). +## Prerequisites + +> **Important** Greenplum Database supports MADlib version 2.x for VMware Greenplum 6.x on RHEL8 platforms only. Upgrading from MADlib version 1.x to version 2.x is not supported. 
                                + +MADlib requires the `m4` macro processor version 1.4.13 or later. Ensure that you have access to, or superuser permissions to install, this package on each Greenplum Database host. + +MADlib 2.x requires Python 3. If you are installing version 2.x, you must also set up the Python 3 environment by registering the `plpython3u` extension in all databases that will use MADlib: + +``` +CREATE EXTENSION plpython3u; +``` + +You must register the extension before you install MADlib 2.x. + ## Installing MADlib -> **Note** MADlib requires the `m4` macro processor version 1.4.13 or later. To install MADlib on Greenplum Database, you first install a compatible Greenplum MADlib package and then install the MADlib function libraries on all databases that will use MADlib. @@ -65,23 +70,38 @@ If you have GPUs installed on some or across all hosts in the cluster, then the ### Installing the Greenplum Database MADlib Package -Before you install the MADlib package, make sure that your Greenplum database is running, you have sourced `greenplum_path.sh`, and that the`$MASTER_DATA_DIRECTORY` and `$GPHOME` variables are set. +Before you install the MADlib package, make sure that your Greenplum database is running, you have sourced `greenplum_path.sh`, and that the `$MASTER_DATA_DIRECTORY` and `$GPHOME` environment variables are set. -1. Download the MADlib extension package from [VMware Tanzu Network](https://network.pivotal.io/products/pivotal-gpdb). +1. Download the MADlib extension package from [VMware Tanzu Network](https://network.tanzu.vmware.com/products/vmware-greenplum/). 2. Copy the MADlib package to the Greenplum Database master host. 3. Follow the instructions in [Verifying the Greenplum Database Software Download](../install_guide/verify_sw.html) to verify the integrity of the **Greenplum Advanced Analytics MADlib** software. 4. Unpack the MADlib distribution package.
                                
For example: + + To unpack version 1.21: ``` $ tar xzvf madlib-1.21.0+1-gp6-rhel7-x86_64.tar.gz ``` + To unpack version 2.1.0: + + ``` + $ tar xzvf madlib-2.1.0-gp6-rhel8-x86_64.tar.gz + ``` + 5. Install the software package by running the `gppkg` command. For example: + To install version 1.21: + ``` $ gppkg -i ./madlib-1.21.0+1-gp6-rhel7-x86_64/madlib-1.21.0+1-gp6-rhel7-x86_64.gppkg ``` + To install version 2.1.0: + + ``` + $ gppkg -i ./madlib-2.1.0-gp6-rhel8-x86_64/madlib-2.1.0-gp6-rhel8-x86_64.gppkg + ``` ### Adding MADlib Functions to a Database @@ -107,25 +127,39 @@ $ madpack -s madlib -p greenplum -c gpadmin@mdw:5432/testdb install-check > **Note** The command `madpack -h` displays information for the utility. -## Upgrading MADlib +## Upgrading MADlib -You upgrade an installed MADlib package with the Greenplum Database `gppkg` utility and the MADlib `madpack` command. +> **Important** Greenplum Database does not support directly upgrading from MADlib 1.x to version 2.x. You must back up your MADlib models, uninstall version 1.x, install version 2.x, and reload the models. + +You upgrade an installed MADlib version 1.x or 2.x package with the Greenplum Database `gppkg` utility and the MADlib `madpack` command. For information about the upgrade paths that MADlib supports, see the MADlib support and upgrade matrix in the [MADlib FAQ page](https://cwiki.apache.org/confluence/display/MADLIB/FAQ#FAQ-Q1-2WhatdatabaseplatformsdoesMADlibsupportandwhatistheupgradematrix?). -### Upgrading a MADlib Package +### Upgrading a MADlib 1.x Package + +> **Important** Greenplum Database does not support upgrading from MADlib version 1.x to version 2.x. Use this procedure to upgrade from an older MADlib version 1.x release to a newer version 1.x release. -To upgrade MADlib, run the `gppkg` utility with the `-u` option. This command upgrades an installed MADlib package to MADlib 1.21.0+1. +To upgrade MADlib, run the `gppkg` utility with the `-u` option. 
This command upgrades an installed MADlib 1.x package to MADlib 1.21.0+1. ``` $ gppkg -u madlib-1.21.0+1-gp6-rhel7-x86_64.gppkg ``` +### Upgrading a MADlib 2.x Package + +> **Important** Greenplum Database does not support upgrading from MADlib version 1.x to version 2.x. Use this procedure to upgrade from an older MADlib version 2.x release to a newer version 2.x release. + +To upgrade MADlib, run the `gppkg` utility with the `-u` option. This command upgrades an installed MADlib 2.0.x package to MADlib 2.1.0: + +``` +$ gppkg -u madlib-2.1.0-gp6-rhel8-x86_64.gppkg +``` + ### Upgrading MADlib Functions -After you upgrade the MADlib package from one major version to another, run `madpack upgrade` to upgrade the MADlib functions in a database schema. +After you upgrade the MADlib package from one minor version to another, run `madpack upgrade` to upgrade the MADlib functions in a database schema. -> **Note** Use `madpack upgrade` only if you upgraded a major MADlib package version, for example from 1.19.0 to 1.21.0. You do not need to update the functions within a patch version upgrade, for example from 1.16+1 to 1.16+3. +> **Note** Use `madpack upgrade` only if you upgraded a minor MADlib package version, for example from 1.19.0 to 1.21.0, or from 2.0.0 to 2.1.0. You do not need to update the functions within a patch version upgrade, for example from 1.16+1 to 1.16+3. This example command upgrades the MADlib functions in the schema `madlib` of the Greenplum Database `test`. @@ -150,12 +184,20 @@ $ madpack -s madlib -p greenplum -c gpadmin@mdw:5432/testdb uninstall ### Uninstall the Greenplum Database MADlib Package -If no databases use the MADlib functions, use the Greenplum `gppkg` utility with the `-r` option to uninstall the MADlib package. When removing the package you must specify the package and version. This example uninstalls MADlib package version 1.21. 
+If no databases use the MADlib functions, use the Greenplum `gppkg` utility with the `-r` option to uninstall the MADlib package. When removing the package you must specify the package and version. For example: + +To uninstall MADlib package version 1.21.0: ``` $ gppkg -r madlib-1.21.0+1-gp6-rhel7-x86_64 ``` +To uninstall MADlib package version 2.1.0: + +``` +$ gppkg -r madlib-2.1.0-gp6-rhel8-x86_64 +``` + You can run the `gppkg` utility with the options `-q --all` to list the installed extensions and their versions. After you uninstall the package, restart the database. diff --git a/gpdb-doc/markdown/install_guide/install_modules.html.md b/gpdb-doc/markdown/install_guide/install_modules.html.md index 9a51da51f804..4f20b7702f6d 100644 --- a/gpdb-doc/markdown/install_guide/install_modules.html.md +++ b/gpdb-doc/markdown/install_guide/install_modules.html.md @@ -29,15 +29,16 @@ You can register the following modules in this manner:
  • diskquota
  • fuzzystrmatch
  • gp_array_agg
  • +
  • gp_check_functions
  • gp_parallel_retrieve_cursor
  • gp_percentile_agg
  • gp_sparse_vector
  • greenplum_fdw
  • +
  • hstore
    • -
    • hstore
    • ip4r
    • ltree
    • orafce (VMware Greenplum only)
                                diff --git a/gpdb-doc/markdown/install_guide/platform-requirements-overview.md.hbs b/gpdb-doc/markdown/install_guide/platform-requirements-overview.md.hbs index 2d6b0bbfbc29..576d7bca9406 100644 --- a/gpdb-doc/markdown/install_guide/platform-requirements-overview.md.hbs +++ b/gpdb-doc/markdown/install_guide/platform-requirements-overview.md.hbs @@ -6,9 +6,12 @@ This topic describes the Greenplum Database 6 platform and operating system soft Greenplum Database 6 runs on the following operating system platforms: +- Red Hat Enterprise Linux 64-bit 9.x - Red Hat Enterprise Linux 64-bit 8.7 or later (As of Greenplum Database version 6.20. See the following [Note](#rhel-issues)) - Red Hat Enterprise Linux 64-bit 7.x \(See the following [Note](#rhel-issues).\) - Red Hat Enterprise Linux 64-bit 6.x +- Rocky Linux 9.x +- Rocky Linux 8.7 or later - CentOS 64-bit 7.x - CentOS 64-bit 6.x - Ubuntu 18.04 LTS @@ -16,7 +19,7 @@ Greenplum Database 6 runs on the following operating system platforms: -> **Caution** Do not install anti-virus software of any type on Greenplum Database hosts. VMware Greenplum is not supported for use with anti-virus software because the additional CPU and IO load interferes with Greenplum Database operations. +> **Note** If you use endpoint security software on your Greenplum Database hosts, it may affect your database performance and stability. See [About Endpoint Security Software](../security-guide/topics/preface.html#endpoint_security) for more information. > **Caution** A kernel issue in Red Hat Enterprise Linux 8.5 and 8.6 can cause I/O freezes and synchronization problems with XFS filesystems. This issue is fixed in RHEL 8.7. See [RHEL8: xfs_buf deadlock between inode deletion and block allocation](https://access.redhat.com/solutions/6984334). > Significant Greenplum Database performance degradation has been observed when enabling resource group-based workload management on RedHat 6.x and CentOS 6.x systems.
                                
This issue is caused by a Linux cgroup kernel bug. This kernel bug has been fixed in CentOS 7.x and Red Hat 7.x/8.x systems. @@ -40,58 +43,76 @@ Greenplum Database 6 requires the following software packages on RHEL/CentOS 6/7 - bash - bzip2 - curl -- krb5 +- compat-openssl11 (RHEL/Rocky 9) +- iproute +- krb5-devel - libcgroup (RHEL/CentOS 6) -- libcgroup-tools (RHEL/CentOS 7) +- libcgroup-tools (RHEL/CentOS 7 and RHEL/Rocky 8) - libcurl -- libevent +- libevent (RHEL/CentOS 7 and RHEL/Rocky 8) +- libevent2 (RHEL/CentOS 6) +- libuuid - libxml2 - libyaml -- zlib +- libzstd (RHEL/Rocky 9) +- less +- net-tools (Debian/Fedora) - openldap +- openssh - openssh-client +- openssh-server - openssl -- openssl-libs \(RHEL7/Centos7\) +- openssl-libs (RHEL/CentOS 7 and RHEL/Rocky 8) - perl +- python3 (RHEL/Rocky 9) - readline - rsync -- R -- sed \(used by `gpinitsystem`\) +- sed - tar +- which - zip +- zlib VMware Greenplum Database 6 client software requires these operating system packages: - apr -- apr-util +- bzip2 +- libedit - libyaml -- libevent +- libevent (RHEL/CentOS 7 and RHEL/Rocky 8) +- libevent2 (RHEL/CentOS 6) +- openssh +- zlib On Ubuntu systems, Greenplum Database 6 requires the following software packages, which are installed automatically as dependencies when you install Greenplum Database with the Debian package installer: -- libapr1 -- libaprutil1 - bash - bzip2 +- iproute2 +- iputils-ping - krb5-multidev +- libapr1 +- libaprutil1 - libcurl3-gnutls - libcurl4 - libevent-2.1-6 +- libldap-2.4-2 +- libreadline7 or libreadline8 +- libuuid1 - libxml2 - libyaml-0-2 -- zlib1g -- libldap-2.4-2 +- less +- locales +- net-tools - openssh-client +- openssh-server - openssl - perl -- readline - rsync - sed - tar - zip -- net-tools -- less -- iproute2 +- zlib1g Greenplum Database 6 uses Python 2.7.18, which is included with the product installation \(and not installed as a package dependency\). 
                                @@ -191,7 +212,7 @@ This table lists the versions of the Greenplum Extensions that are compatible wi MADlib Machine Learning -1.21, 1.20, 1.19, 1.18, 1.17, 1.16 +2.1, 2.0, 1.21, 1.20, 1.19, 1.18, 1.17, 1.16 Support matrix at MADlib FAQ. diff --git a/gpdb-doc/markdown/install_guide/prep_os.html.md b/gpdb-doc/markdown/install_guide/prep_os.html.md index d8577dfbbf5d..69ccfbc175fc 100644 --- a/gpdb-doc/markdown/install_guide/prep_os.html.md +++ b/gpdb-doc/markdown/install_guide/prep_os.html.md @@ -4,8 +4,6 @@ title: Configuring Your Systems Describes how to prepare your operating system environment for Greenplum Database software installation. -> **Caution** Do not install anti-virus software of any type on Greenplum Database hosts. VMware Greenplum is not supported for use with anti-virus software because the additional CPU and IO load interferes with Greenplum Database operations. - Perform the following tasks in order: 1. Make sure your host systems meet the requirements described in [Platform Requirements](platform-requirements-overview.html). @@ -64,7 +62,7 @@ If you choose to enable SELinux in `Enforcing` mode, then Greenplum processes an ## Deactivate or Configure Firewall Software -You should also deactivate firewall software such as `iptables` \(on systems such as RHEL 6.x and CentOS 6.x \), `firewalld` \(on systems such as RHEL 7.x and CentOS 7.x\), or `ufw` \(on Ubuntu systems, deactivated by default\). If firewall software is not deactivated, you must instead configure your software to allow required communication between Greenplum hosts. +You should also deactivate firewall software such as `iptables` \(on systems such as RHEL 6.x and CentOS 6.x\), `firewalld` \(on systems such as RHEL 7.x or later and CentOS 7.x\), or `ufw` \(on Ubuntu systems, deactivated by default\). If firewall software is not deactivated, you must instead configure your software to allow required communication between Greenplum hosts.
                                
                                To deactivate `iptables`: @@ -302,7 +300,7 @@ Set the following parameters in the `/etc/security/limits.conf` file: * hard nproc 131072 ``` -For Red Hat Enterprise Linux \(RHEL\) and CentOS systems, parameter values in the `/etc/security/limits.d/90-nproc.conf` file \(RHEL/CentOS 6\) or `/etc/security/limits.d/20-nproc.conf` file \(RHEL/CentOS 7\) override the values in the `limits.conf` file. Ensure that any parameters in the override file are set to the required value. The Linux module `pam_limits` sets user limits by reading the values from the `limits.conf` file and then from the override file. For information about PAM and user limits, see the documentation on PAM and `pam_limits`. +For Red Hat Enterprise Linux \(RHEL\) and CentOS systems, parameter values in the `/etc/security/limits.d/90-nproc.conf` file \(RHEL/CentOS 6\) or `/etc/security/limits.d/20-nproc.conf` file \(RHEL/CentOS 7 and later\) override the values in the `limits.conf` file. Ensure that any parameters in the override file are set to the required value. The Linux module `pam_limits` sets user limits by reading the values from the `limits.conf` file and then from the override file. For information about PAM and user limits, see the documentation on PAM and `pam_limits`. Run the `ulimit -u` command on each segment host to display the maximum number of processes that are available to each user. Validate that the return value is 131072. @@ -335,7 +333,7 @@ XFS is the preferred data storage file system on Linux platforms. Use the `mount rw,nodev,noatime,nobarrier,inode64 ``` -The `nobarrier` option is not supported on RHEL 8 or Ubuntu systems. Use only the options: +The `nobarrier` option is not supported on RHEL 8 and later or on Ubuntu systems. Use only the options: ``` rw,nodev,noatime,inode64 @@ -414,7 +412,7 @@ The XFS options can also be set in the `/etc/fstab` file. This example entry fro Non-Volatile Memory Express (NVMe) - RHEL 7
                                
      RHEL 8
      Ubuntu + RHEL 7
      RHEL 8
      RHEL 9
      Ubuntu none @@ -423,7 +421,7 @@ The XFS options can also be set in the `/etc/fstab` file. This example entry fro noop - RHEL 8
      Ubuntu + RHEL 8
      RHEL 9
      Ubuntu none @@ -432,7 +430,7 @@ The XFS options can also be set in the `/etc/fstab` file. This example entry fro deadline - RHEL 8
      Ubuntu + RHEL 8
      RHEL 9
                                      Ubuntu mq-deadline @@ -452,7 +450,7 @@ The XFS options can also be set in the `/etc/fstab` file. This example entry fro > **Note** Using the `echo` command to set the disk I/O scheduler policy is not persistent; you must ensure that you run the command whenever the system reboots. How to run the command will vary based on your system. - To specify the I/O scheduler at boot time on systems that use `grub2` such as RHEL 7.x or CentOS 7.x, use the system utility `grubby`. This command adds the parameter when run as `root`: + To specify the I/O scheduler at boot time on systems that use `grub2` such as RHEL 7.x or later and CentOS 7.x, use the system utility `grubby`. This command adds the parameter when run as `root`: ``` # grubby --update-kernel=ALL --args="elevator=deadline" @@ -466,9 +464,9 @@ The XFS options can also be set in the `/etc/fstab` file. This example entry fro # grubby --info=ALL ``` - Refer to your operating system documentation for more information about the `grubby` utility. If you used the `grubby` command to configure the disk scheduler on a RHEL or CentOS 7.x system and it does not update the kernels, see the [Note](#grubby_note) at the end of the section. + Refer to your operating system documentation for more information about the `grubby` utility. If you used the `grubby` command to configure the disk scheduler on a RHEL 7.x (or later) or CentOS 7.x system and it does not update the kernels, see the [Note](#grubby_note) at the end of the section.
                                
- For additional information about configuring the disk scheduler, refer to the RedHat Enterprise Linux documentation for [RHEL 7](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/performance_tuning_guide/sect-red_hat_enterprise_linux-performance_tuning_guide-storage_and_file_systems-configuration_tools#sect-Red_Hat_Enterprise_Linux-Performance_Tuning_Guide-Configuration_tools-Setting_the_default_IO_scheduler) or [RHEL 8](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/setting-the-disk-scheduler_monitoring-and-managing-system-status-and-performance). The Ubuntu wiki [IOSchedulers](https://wiki.ubuntu.com/Kernel/Reference/IOSchedulers) topic describes the I/O schedulers available on Ubuntu systems. + For additional information about configuring the disk scheduler, refer to the RedHat Enterprise Linux documentation for [RHEL 7](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/performance_tuning_guide/sect-red_hat_enterprise_linux-performance_tuning_guide-storage_and_file_systems-configuration_tools#sect-Red_Hat_Enterprise_Linux-Performance_Tuning_Guide-Configuration_tools-Setting_the_default_IO_scheduler), [RHEL 8](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/setting-the-disk-scheduler_monitoring-and-managing-system-status-and-performance), or [RHEL 9](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/monitoring_and_managing_system_status_and_performance/setting-the-disk-scheduler_monitoring-and-managing-system-status-and-performance). The Ubuntu wiki [IOSchedulers](https://wiki.ubuntu.com/Kernel/Reference/IOSchedulers) topic describes the I/O schedulers available on Ubuntu systems. 
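                                Because the `echo` setting is lost at reboot and the `grubby` setting takes effect only on the next boot, it can help to confirm which scheduler is actually active on each data disk. The Python sketch below assumes the standard Linux sysfs file `/sys/block/<device>/queue/scheduler`, which lists the available schedulers and wraps the active one in square brackets (for example `noop [deadline] cfq`). The helper names `active_scheduler` and `read_active_scheduler` are hypothetical and are not part of Greenplum.

                                ```python
                                from pathlib import Path


                                def active_scheduler(contents: str) -> str:
                                    """Return the active I/O scheduler from the text of a sysfs scheduler file.

                                    The kernel lists every available scheduler and wraps the active one in
                                    square brackets, for example "mq-deadline kyber [bfq] none".
                                    """
                                    for token in contents.split():
                                        if token.startswith("[") and token.endswith("]"):
                                            return token[1:-1]
                                    raise ValueError(f"no active scheduler marked in {contents!r}")


                                def read_active_scheduler(device: str) -> str:
                                    """Hypothetical helper: read the scheduler for a block device such as 'sda'."""
                                    path = Path("/sys/block") / device / "queue" / "scheduler"
                                    return active_scheduler(path.read_text())


                                if __name__ == "__main__":
                                    # Sample lines in the sysfs format; on a real host, call read_active_scheduler().
                                    for sample in ("noop [deadline] cfq", "[none] mq-deadline kyber"):
                                        print(sample, "->", active_scheduler(sample))
                                ```

                                Compare the reported value against the scheduler recommended in the table for your storage device type and operating system.
                                
                                
                                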
                                ### Networking @@ -498,7 +496,7 @@ kernel /vmlinuz-2.6.18-274.3.1.el5 ro root=LABEL=/ initrd /initrd-2.6.18-274.3.1.el5.img ``` -On systems that use `grub2` such as RHEL 7.x or CentOS 7.x, use the system utility `grubby`. This command adds the parameter when run as root. +On systems that use `grub2` such as RHEL 7.x or later and CentOS 7.x, use the system utility `grubby`. This command adds the parameter when run as root. ``` # grubby --update-kernel=ALL --args="transparent_hugepage=never" diff --git a/gpdb-doc/markdown/ref_guide/config_params/guc-list.html.md b/gpdb-doc/markdown/ref_guide/config_params/guc-list.html.md index eed6fe66d375..a70d7e6c14ac 100644 --- a/gpdb-doc/markdown/ref_guide/config_params/guc-list.html.md +++ b/gpdb-doc/markdown/ref_guide/config_params/guc-list.html.md @@ -1093,6 +1093,14 @@ communication. In these cases, you must configure this parameter to use a wildca |-----------|-------|-------------------| |wildcard,unicast|wildcard|local, system, reload| +## gp_interconnect_cursor_ic_table_size + +Specifies the size of the Cursor History Table for UDP interconnect. Although it is not usually necessary, you may need to increase this value if a user-defined function that runs many concurrent cursor queries hangs. The default value is 128. + +|Value Range|Default|Set Classifications| +|-----------|-------|-------------------| +|128-102400|128|master, session, reload| + ## gp_interconnect_debug_retry_interval Specifies the interval, in seconds, to log Greenplum Database interconnect debugging messages when the server configuration parameter [gp\_log\_interconnect](#gp_log_interconnect) is set to `DEBUG`. The default is 10 seconds.
                                
                                @@ -3259,6 +3267,18 @@ The value of [wal\_sender\_timeout](#replication_timeout) controls the time that |-----------|-------|-------------------| |integer 0- INT\_MAX/1000|10 sec|master, system, reload, superuser| +## work_mem + +Sets the maximum amount of memory to be used by a query operation (such as a sort or hash table) before writing to temporary disk files. If this value is specified without units, it is taken as kilobytes. The default value is 32 MB. Note that for a complex query, several sort or hash operations might be running in parallel; each operation will be allowed to use as much memory as this value specifies before it starts to write data into temporary files. In addition, several running sessions may be performing such operations concurrently. Therefore, the total memory used could be many times the value of `work_mem`; keep this fact in mind when choosing the value for this parameter. Sort operations are used for `ORDER BY`, `DISTINCT`, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of `IN` subqueries. Apart from sorting and hashing, bitmap index scans also rely on `work_mem`. Operations relying on tuplestores such as function scans, CTEs, PL/pgSQL and administration UDFs also rely on `work_mem`. + +Apart from assigning memory to specific execution operators, setting `work_mem` also causes the Postgres-based planner, when it is used as the optimizer, to favor certain query plans over others. + +`work_mem` is a distinct memory management concept that does not interact with resource queue or resource group memory controls, which are imposed at the query level. + +|Value Range|Default|Set Classifications| +|-----------|-------|-------------------| +|number of kilobytes|32MB|master, session, reload| + ## writable\_external\_table\_bufsize Size of the buffer that Greenplum Database uses for network communication, such as the `gpfdist` utility and external web tables \(that use http\).
                                
Valid units are `KB` \(as in `128KB`\), `MB`, `GB`, and `TB`. Greenplum Database stores data in the buffer before writing the data out. For information about `gpfdist`, see the *Greenplum Database Utility Guide*. diff --git a/gpdb-doc/markdown/ref_guide/config_params/guc_category-list.html.md b/gpdb-doc/markdown/ref_guide/config_params/guc_category-list.html.md index d667d948fbe7..2cd945dc6557 100644 --- a/gpdb-doc/markdown/ref_guide/config_params/guc_category-list.html.md +++ b/gpdb-doc/markdown/ref_guide/config_params/guc_category-list.html.md @@ -76,6 +76,7 @@ These parameters control system memory usage. - [max_stack_depth](guc-list.html#max_stack_depth) - [shared_buffers](guc-list.html#shared_buffers) - [temp_buffers](guc-list.html#temp_buffers) +- [work_mem](guc-list.html#work_mem) ### OS Resource Parameters @@ -475,6 +476,7 @@ The parameters in this topic control the configuration of the Greenplum Database ### Interconnect Configuration Parameters - [gp_interconnect_address_type](guc-list.html#gp_interconnect_address_type) +- [gp_interconnect_cursor_ic_table_size](guc-list.html#gp_interconnect_cursor_ic_table_size) - [gp_interconnect_fc_method](guc-list.html#gp_interconnect_fc_method) - [gp_interconnect_proxy_addresses](guc-list.html#gp_interconnect_proxy_addresses) - [gp_interconnect_queue_depth](guc-list.html#gp_interconnect_queue_depth) diff --git a/gpdb-doc/markdown/ref_guide/gp_toolkit.html.md b/gpdb-doc/markdown/ref_guide/gp_toolkit.html.md index 61104782fa91..970a691f62a3 100644 --- a/gpdb-doc/markdown/ref_guide/gp_toolkit.html.md +++ b/gpdb-doc/markdown/ref_guide/gp_toolkit.html.md @@ -874,6 +874,58 @@ This external table runs the `df` \(disk free\) command on the active segment ho |dfdevice|The device name| |dfspace|Free disk space in the segment file system in kilobytes| +## Checking for Missing and Orphaned Data Files + +Greenplum Database considers a relation data file that is present in the catalog, but not on disk, to be missing. 
                                Conversely, when Greenplum encounters an unexpected data file on disk that is not referenced in any relation, it considers that file to be orphaned. + +Greenplum Database provides the following views to help identify if missing or orphaned files exist in the current database: + +- [gp_check_orphaned_files](#mf_orphaned) +- [gp_check_missing_files](#mf_missing) +- [gp_check_missing_files_ext](#mf_missing_ext) + +Consider it a best practice to check for these conditions prior to expanding the cluster or before offline maintenance. + +By default, the views identified in this section are available to `PUBLIC`. + +### gp_check_orphaned_files + +The `gp_check_orphaned_files` view scans the default and user-defined tablespaces for orphaned data files. Greenplum Database considers normal data files, files with an underscore (`_`) in the name, and extended numbered files (files that contain a `.` in the name) in this check. `gp_check_orphaned_files` gathers results from the Greenplum Database master and all segments. + +|Column|Description| +|------|-----------| +| gp_segment_id | The Greenplum Database segment identifier. | +| tablespace | The identifier of the tablespace in which the orphaned file resides. | +| filename | The file name of the orphaned data file. | +| filepath | The file system path of the orphaned data file, relative to the data directory of the master or segment. | + +> **Caution** Use this view as one of many data points to identify orphaned data files. Do not delete files based solely on results from querying this view. + + +### gp_check_missing_files + +The `gp_check_missing_files` view scans heap and append-optimized, column-oriented tables for missing data files. Greenplum considers only normal data files (files that do not contain a `.` or an `_` in the name) in this check. `gp_check_missing_files` gathers results from the Greenplum Database master and all segments.
                                
+ +|Column|Description| +|------|-----------| +| gp_segment_id | The Greenplum Database segment identifier. | +| tablespace | The identifier of the tablespace in which the table resides. | +| relname | The name of the table that has a missing data file(s). | +| filename | The file name of the missing data file. | + + +### gp_check_missing_files_ext + +The `gp_check_missing_files_ext` view scans only append-optimized, column-oriented tables for missing extended data files. Greenplum Database considers both normal data files and extended numbered files (files that contain a `.` in the name) in this check. Files that contain an `_` in the name are not considered. `gp_check_missing_files_ext` gathers results from the Greenplum Database segments only. + +|Column|Description| +|------|-----------| +| gp_segment_id | The Greenplum Database segment identifier. | +| tablespace | The identifier of the tablespace in which the table resides. | +| relname | The name of the table that has a missing extended data file(s). | +| filename | The file name of the missing extended data file. | + + ## Checking for Uneven Data Distribution All tables in Greenplum Database are distributed, meaning their data is divided across all of the segments in the system. If the data is not distributed evenly, then query processing performance may decrease. The following views can help diagnose if a table has uneven data distribution: diff --git a/gpdb-doc/markdown/ref_guide/modules/diskquota.html.md b/gpdb-doc/markdown/ref_guide/modules/diskquota.html.md index 6be439dcf729..a1af4cac603a 100644 --- a/gpdb-doc/markdown/ref_guide/modules/diskquota.html.md +++ b/gpdb-doc/markdown/ref_guide/modules/diskquota.html.md @@ -140,6 +140,8 @@ Views available in the `diskquota` module include: - [diskquota.hard\_limit](#hardlimit) - Activates or deactivates the hard limit enforcement of disk usage. 
                                - [diskquota.max\_workers](#maxworkers) - Specifies the maximum number of diskquota worker processes that may be running at any one time. - [diskquota.max\_table\_segments](#maxtableseg) - Specifies the maximum number of *table segments* in the cluster. +- [diskquota.max_quota_probes](#maxquotaprobes) - Specifies the maximum number of quota probes pre-allocated at the cluster level. +- [diskquota.max_monitored_databases](#maxmonitoreddatabases) - Specifies the maximum number of databases that the module can monitor. You use the `gpconfig` command to set these parameters in the same way that you would set any Greenplum Database server configuration parameter. @@ -193,6 +195,23 @@ A Greenplum table \(including a partitioned table’s child tables\) is distribu The runtime value of `diskquota.max_table_segments` equals the maximum number of tables multiplied by \(number\_of\_segments + 1\). The default value is `10 * 1024 * 1024`. +### Specifying the Maximum Number of Quota Probes + +The `diskquota.max_quota_probes` server configuration parameter specifies the number of quota probes allowed at the cluster level. `diskquota` can require thousands of probes to collect quota usage in the cluster, and each quota probe monitors one specific quota usage, such as how much disk space a role uses on a certain tablespace in a certain database. A quota probe runs in the background even if you have not defined the corresponding disk quota rule. For example, if you have 100 roles in a cluster but define disk quota rules for only 10 of them, Greenplum still requires quota probes for all 100 roles.
                                
                                + +You may calculate the maximum number of active probes for a cluster using the following formula: + +``` +role_num * database_num + schema_num + role_num * tablespace_num * database_num + schema_num * tablespace_num +``` + +where `role_num` is the number of roles in the cluster, `database_num` is the number of databases in the cluster, `tablespace_num` is the number of tablespaces in the cluster, and `schema_num` is the total number of schemas in all databases. + +You must set `diskquota.max_quota_probes` to a number greater than the calculated maximum number of active quota probes; the higher the value, the more memory is used. The memory used by the probes, in bytes, is `diskquota.max_quota_probes * 48`. The default value of `diskquota.max_quota_probes` is `1048576`, so by default the probes use `1048576 * 48` bytes, which is approximately 50MB. + +### Specifying the Maximum Number of Databases + +The `diskquota.max_monitored_databases` server configuration parameter specifies the maximum number of databases that can be monitored by `diskquota`. The default value is 50 and the maximum value is 1024. ## Using the diskquota Module @@ -446,7 +465,7 @@ The `diskquota` module has the following limitations and known issues: ## Notes -The `diskquota` module can detect a newly created table inside of an uncommitted transaction. The size of the new table is included in the disk usage calculated for the corresponding schema or role. Hard limit enforcement of disk usage must enabled for a quota-exceeding operation to trigger a `quota exceeded` error in this scenario. +The `diskquota` module can detect a newly created table inside of an uncommitted transaction. The size of the new table is included in the disk usage calculated for its corresponding schema or role. Hard limit enforcement of disk usage must be enabled for a quota-exceeding operation to trigger a `quota exceeded` error in this scenario.
                                
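                                The sizing rules for `diskquota.max_quota_probes` described above reduce to simple arithmetic. The Python sketch below restates the documented probe formula and the 48-bytes-per-probe memory estimate; the function names and the sample cluster sizes are hypothetical, chosen only to illustrate the calculation.

                                ```python
                                BYTES_PER_PROBE = 48  # memory estimate from the text: max_quota_probes * 48 bytes
                                DEFAULT_MAX_QUOTA_PROBES = 1048576


                                def max_active_probes(role_num: int, database_num: int,
                                                      schema_num: int, tablespace_num: int) -> int:
                                    """Maximum number of active quota probes, per the documented formula.

                                    schema_num is the total number of schemas across all databases.
                                    """
                                    return (role_num * database_num
                                            + schema_num
                                            + role_num * tablespace_num * database_num
                                            + schema_num * tablespace_num)


                                def probe_memory_bytes(max_quota_probes: int = DEFAULT_MAX_QUOTA_PROBES) -> int:
                                    """Memory used by the pre-allocated probes, in bytes."""
                                    return max_quota_probes * BYTES_PER_PROBE


                                def setting_is_large_enough(max_quota_probes: int, active_probes: int) -> bool:
                                    # The text requires a value strictly greater than the calculated maximum.
                                    return max_quota_probes > active_probes


                                if __name__ == "__main__":
                                    # Hypothetical cluster: 100 roles, 5 databases, 40 schemas in total, 3 tablespaces.
                                    needed = max_active_probes(100, 5, 40, 3)
                                    print(needed)                                                     # 2160
                                    print(setting_is_large_enough(DEFAULT_MAX_QUOTA_PROBES, needed))  # True
                                    print(probe_memory_bytes() / 2**20, "MiB")                        # 48.0 MiB
                                ```

                                With the default setting, the estimate is 50,331,648 bytes (48 MiB), which corresponds to the "approximately 50MB" figure in the text.
                                
                                
                                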
                                Deleting rows or running `VACUUM` on a table does not release disk space, so these operations cannot alone remove a schema or role from the `diskquota` denylist. The disk space used by a table can be reduced by running `VACUUM FULL` or `TRUNCATE TABLE`. diff --git a/gpdb-doc/markdown/ref_guide/modules/gp_check_functions.html.md b/gpdb-doc/markdown/ref_guide/modules/gp_check_functions.html.md new file mode 100644 index 000000000000..db0c04b409cc --- /dev/null +++ b/gpdb-doc/markdown/ref_guide/modules/gp_check_functions.html.md @@ -0,0 +1,123 @@ +# gp_check_functions + +The `gp_check_functions` module implements views that identify missing and orphaned relation files. The module also exposes a user-defined function that you can use to move orphaned files. + +The `gp_check_functions` module is a Greenplum Database extension. + +## Installing and Registering the Module + +The `gp_check_functions` module is installed when you install Greenplum Database. Before you can use the views defined in the module, you must register the `gp_check_functions` extension in each database in which you want to use the views: + + ``` +CREATE EXTENSION gp_check_functions; +``` + +Refer to [Installing Additional Supplied Modules](../../install_guide/install_modules.html) for more information. + + +## Checking for Missing and Orphaned Data Files + +Greenplum Database considers a relation data file that is present in the catalog, but not on disk, to be missing. Conversely, when Greenplum encounters an unexpected data file on disk that is not referenced in any relation, it considers that file to be orphaned. + +Greenplum Database provides the following views to help identify if missing or orphaned files exist in the current database: + +- [gp_check_orphaned_files](#orphaned) +- [gp_check_missing_files](#missing) +- [gp_check_missing_files_ext](#missing_ext) + +Consider it a best practice to check for these conditions prior to expanding the cluster or before offline maintenance.
                                
+ +By default, the views in this module are available to `PUBLIC`. + +### gp_check_orphaned_files + +The `gp_check_orphaned_files` view scans the default and user-defined tablespaces for orphaned data files. Greenplum Database considers normal data files, files with an underscore (`_`) in the name, and extended numbered files (files that contain a `.` in the name) in this check. `gp_check_orphaned_files` gathers results from the Greenplum Database master and all segments. + +|Column|Description| +|------|-----------| +| gp_segment_id | The Greenplum Database segment identifier. | +| tablespace | The identifier of the tablespace in which the orphaned file resides. | +| filename | The file name of the orphaned data file. | +| filepath | The file system path of the orphaned data file, relative to the data directory of the master or segment. | + +> **Caution** Use this view as one of many data points to identify orphaned data files. Do not delete files based solely on results from querying this view. + + +### gp_check_missing_files + +The `gp_check_missing_files` view scans heap and append-optimized, column-oriented tables for missing data files. Greenplum considers only normal data files (files that do not contain a `.` or an `_` in the name) in this check. `gp_check_missing_files` gathers results from the Greenplum Database master and all segments. + +|Column|Description| +|------|-----------| +| gp_segment_id | The Greenplum Database segment identifier. | +| tablespace | The identifier of the tablespace in which the table resides. | +| relname | The name of the table that has a missing data file(s). | +| filename | The file name of the missing data file. | + + +### gp_check_missing_files_ext + +The `gp_check_missing_files_ext` view scans only append-optimized, column-oriented tables for missing extended data files. Greenplum Database considers both normal data files and extended numbered files (files that contain a `.` in the name) in this check. 
Files that contain an `_` in the name, and `.fsm`, `.vm`, and other supporting files, are not considered. `gp_check_missing_files_ext` gathers results from the Greenplum Database segments only. + +|Column|Description| +|------|-----------| +| gp_segment_id | The Greenplum Database segment identifier. | +| tablespace | The identifier of the tablespace in which the table resides. | +| relname | The name of the table that has a missing extended data file(s). | +| filename | The file name of the missing extended data file. | + + +## Moving Orphaned Data Files + +The `gp_move_orphaned_files()` user-defined function (UDF) moves orphaned files found by the [gp_check_orphaned_files](#orphaned) view into a file system location that you specify. + +The function signature is: `gp_move_orphaned_files( TEXT )`. + +`` must exist on all segment hosts before you move the files, and the specified directory must be accessible by the `gpadmin` user. If you specify a relative path for ``, it is considered relative to the data directory of the master or segment. + +Greenplum Database renames each moved data file to one that reflects the original location of the file in the data directory. 
The file name format differs depending on the tablespace in which the orphaned file resides: + +| Tablespace | Renamed File Format| +|------|-----------| +| default | `seg_base__` | +| global | `seg_global_` | +| user-defined | `seg_pg_tblspc____` | + +For example, if a file named `12345` in the default tablespace is orphaned on primary segment 2, + +``` +SELECT * FROM gp_move_orphaned_files('/home/gpadmin/orphaned'); +``` + +moves and renames the file as follows: + +| Original Location | New Location and File Name | +|------|-----------| +| `/base/13700/12345` | `/home/gpadmin/orphaned/seg2_base_13700_12345` | + +`gp_move_orphaned_files()` returns both the original and the new file system locations for each file that it moves, and also provides an indication of the success or failure of the move operation. + +Once you move the orphaned files, you may choose to remove them or to back them up. + +## Examples + +Check for missing and orphaned non-extended files: + +``` sql +SELECT * FROM gp_check_missing_files; +SELECT * FROM gp_check_orphaned_files; +``` + +Check for missing extended data files for append-optimized, column-oriented tables: + +``` sql +SELECT * FROM gp_check_missing_files_ext; +``` + +Move orphaned files to the `/home/gpadmin/orphaned` directory: + +``` sql +SELECT * FROM gp_move_orphaned_files('/home/gpadmin/orphaned'); +``` + diff --git a/gpdb-doc/markdown/ref_guide/modules/gp_parallel_retrieve_cursor.html.md b/gpdb-doc/markdown/ref_guide/modules/gp_parallel_retrieve_cursor.html.md index f798b83121d1..f6d37c8c2529 100644 --- a/gpdb-doc/markdown/ref_guide/modules/gp_parallel_retrieve_cursor.html.md +++ b/gpdb-doc/markdown/ref_guide/modules/gp_parallel_retrieve_cursor.html.md @@ -34,9 +34,9 @@ The `gp_parallel_retrieve_cursor` module provides the following functions and vi |Function, View Name|Description| |-------------------|-----------| -|gp\_get\_endpoints\(\)

      [gp\_endpoints](../system_catalogs/gp_endpoints.html#topic1)|List the endpoints associated with all active parallel retrieve cursors declared by the current session user in the current database. When the Greenplum Database superuser invokes this function, it returns a list of all endpoints for all parallel retrieve cursors declared by all users in the current database.| -|gp\_get\_session\_endpoints\(\)

      [gp\_session\_endpoints](../system_catalogs/gp_session_endpoints.html#topic1)|List the endpoints associated with all parallel retrieve cursors declared in the current session for the current session user.| -|gp\_get\_segment\_endpoints\(\)

      [gp\_segment\_endpoints](../system_catalogs/gp_segment_endpoints.html#topic1)|List the endpoints created in the QE for all active parallel retrieve cursors declared by the current session user. When the Greenplum Database superuser accesses this view, it returns a list of all endpoints on the QE created for all parallel retrieve cursors declared by all users.| +|gp\_get\_endpoints\(\)

      [gp\_endpoints](../system_catalogs/gp_endpoints.html#topic1)|List the endpoints associated with all active parallel retrieve cursors declared by the current user in the current database. When the Greenplum Database superuser invokes this function, it returns a list of all endpoints for all parallel retrieve cursors declared by all users in the current database.| +|gp\_get\_session\_endpoints\(\)

      [gp\_session\_endpoints](../system_catalogs/gp_session_endpoints.html#topic1)|List the endpoints associated with all parallel retrieve cursors declared in the current session for the current user.| +|gp\_get\_segment\_endpoints\(\)

      [gp\_segment\_endpoints](../system_catalogs/gp_segment_endpoints.html#topic1)|List the endpoints created in the QE for all active parallel retrieve cursors declared by the current user. When the Greenplum Database superuser accesses this view, it returns a list of all endpoints on the QE created for all parallel retrieve cursors declared by all users.| |gp\_wait\_parallel\_retrieve\_cursor\(cursorname text, timeout\_sec int4 \)|Return cursor status or block and wait for results to be retrieved from all endpoints associated with the specified parallel retrieve cursor.| > **Note** Each of these functions and views is located in the `pg_catalog` schema, and each `RETURNS TABLE`. @@ -112,7 +112,7 @@ These commands return the list of endpoints in a table with the following column |sessionid|The identifier of the session in which the parallel retrieve cursor was created.| |hostname|The name of the host from which to retrieve the data for the endpoint.| |port|The port number from which to retrieve the data for the endpoint.| -|username|The name of the session user \(not the current user\); *you must initiate the retrieve session as this user*.| +|username|The name of the current user; *you must initiate the retrieve session as this user*.| |state|The state of the endpoint; the valid states are:

      READY: The endpoint is ready to be retrieved.

      ATTACHED: The endpoint is attached to a retrieve connection.

      RETRIEVING: A retrieve session is retrieving data from the endpoint at this moment.

      FINISHED: The endpoint has been fully retrieved.

      RELEASED: Due to an error, the endpoint has been released and the connection closed.| |endpointname|The endpoint identifier; you provide this identifier to the `RETRIEVE` command.| @@ -132,7 +132,7 @@ Retrieve session authentication does not depend on the `pg_hba.conf` file, but r When you initiate a retrieve session to an endpoint: -- The user that you specify for the retrieve session must be the session user that declared the parallel retrieve cursor \(the `username` returned by `gp_endpoints`\). This user must have Greenplum Database login privileges. +- The user that you specify for the retrieve session must be the user that declared the parallel retrieve cursor \(the `username` returned by `gp_endpoints`\). This user must have Greenplum Database login privileges. - You specify the `hostname` and `port` returned by `gp_endpoints` for the endpoint. - You authenticate the retrieve session by specifying the `auth_token` returned for the endpoint via the `PGPASSWORD` environment variable, or when prompted for the retrieve session `Password`. - You must specify the [gp\_retrieve\_conn](../config_params/guc-list.html#gp_retrieve_conn) server configuration parameter on the connection request, and set the value to `true` . @@ -218,7 +218,7 @@ SELECT * FROM gp_get_segment_endpoints(); SELECT * FROM gp_segment_endpoints; ``` -These commands provide information about the retrieve sessions associated with a QE endpoint for all active parallel retrieve cursors declared by the current session user. When the Greenplum Database superuser invokes the command, it returns the retrieve session information for all endpoints on the QE created for all parallel retrieve cursors declared by all users. +These commands provide information about the retrieve sessions associated with a QE endpoint for all active parallel retrieve cursors declared by the current user. 
When the Greenplum Database superuser invokes the command, it returns the retrieve session information for all endpoints on the QE created for all parallel retrieve cursors declared by all users. You can obtain segment-specific retrieve session information in two ways: from the QD, or via a utility-mode connection to the endpoint: @@ -254,7 +254,7 @@ The commands return endpoint and retrieve session information in a table with th |state|The state of the endpoint; the valid states are:

      READY: The endpoint is ready to be retrieved.

      ATTACHED: The endpoint is attached to a retrieve connection.

      RETRIEVING: A retrieve session is retrieving data from the endpoint at this moment.

      FINISHED: The endpoint has been fully retrieved.

      RELEASED: Due to an error, the endpoint has been released and the connection closed.| |gp\_segment\_id|The QE's endpoint `gp_segment_id`.| |sessionid|The identifier of the session in which the parallel retrieve cursor was created.| -|username|The name of the session user that initiated the retrieve session.| +|username|The name of the user that initiated the retrieve session.| |endpointname|The endpoint identifier.| |cursorname|The name of the parallel retrieve cursor.| diff --git a/gpdb-doc/markdown/ref_guide/modules/greenplum_fdw.html.md b/gpdb-doc/markdown/ref_guide/modules/greenplum_fdw.html.md index 125d010624aa..5ff18f371993 100644 --- a/gpdb-doc/markdown/ref_guide/modules/greenplum_fdw.html.md +++ b/gpdb-doc/markdown/ref_guide/modules/greenplum_fdw.html.md @@ -74,7 +74,7 @@ num_segments option, the default value is the number of segments on the local Greenplum Database cluster. -The following example command creates a server named `gpc1_testdb` that will be used to access tables residing in the database named `testdb` on the remote `8`-segment Greenplum Database cluster whose master is running on the host `gpc1_master`, port `5432`: +The following example command creates a server named `gpc1_testdb` that will be used to access tables residing in the database named `testdb` on the remote 8-segment Greenplum Database cluster whose master is running on the host `gpc1_master`, port `5432`: ``` CREATE SERVER gpc1_testdb FOREIGN DATA WRAPPER greenplum_fdw @@ -164,6 +164,55 @@ Setting this option at the foreign table-level overrides a foreign server-level `greenplum_fdw` manages transactions as described in the [Transaction Management](https://www.postgresql.org/docs/9.4/postgres-fdw.html) topic in the PostgreSQL `postgres_fdw` documentation. +## About Using Resource Groups to Limit Concurrency + +You can create a dedicated user and resource group to manage `greenplum_fdw` concurrency on the remote Greenplum clusters. 
In the following example scenario, local cluster 2 reads data from remote cluster 1. + +Remote cluster (1) configuration: + +1. Create a dedicated Greenplum Database user/role to represent the `greenplum_fdw` users on cluster 2 that initiate queries. For example, to create a role named `gpcluster2_users`: + + ``` + CREATE ROLE gpcluster2_users LOGIN PASSWORD 'changeme'; + ``` + +1. Create a dedicated resource group to manage resources for these users: + + ``` + CREATE RESOURCE GROUP rg_gpcluster2_users with (concurrency=2, cpu_rate_limit=20, memory_limit=10); + ALTER ROLE gpcluster2_users RESOURCE GROUP rg_gpcluster2_users; + ``` + + When you configure the remote cluster as described above, the `rg_gpcluster2_users` resource group manages the resources used by all queries that are initiated by `gpcluster2_users`. + +Local cluster (2) configuration: + +1. Create a `greenplum_fdw` foreign server to access the remote cluster. For example, to create a server named `gpc1_testdb` that accesses the `testdb` database: + + ``` + CREATE SERVER gpc1_testdb FOREIGN DATA WRAPPER greenplum_fdw + OPTIONS (host 'gpc1_master', port '5432', dbname 'testdb', mpp_execute 'all segments', ); + ``` + +1. Map local users of the `greenplum_fdw` foreign server to the remote role. For example, to map specific users of the `gpc1_testdb` server on the local cluster to the `gpcluster2_users` role on the remote cluster: + + ``` + CREATE USER MAPPING FOR greenplum_fdw_user1 SERVER gpc1_testdb + OPTIONS (user 'gpcluster2_users', password 'changeme'); + CREATE USER MAPPING FOR greenplum_fdw_user2 SERVER gpc1_testdb + OPTIONS (user 'gpcluster2_users', password 'changeme'); + ``` + +1. Create a foreign table referencing a table on the remote cluster.
For example, to create a foreign table that references table `t1` on the remote cluster: + + ``` + CREATE FOREIGN TABLE table_on_cluster1 ( tc1 int ) + SERVER gpc1_testdb + OPTIONS (schema_name 'public', table_name 't1', mpp_execute 'all segments'); + ``` + +All local queries on the foreign table `table_on_cluster1` are constrained on the remote cluster by the `rg_gpcluster2_users` resource group limits. + ## Known Issues and Limitations The `greenplum_fdw` module has the following known issues and limitations: diff --git a/gpdb-doc/markdown/ref_guide/modules/intro.html.md b/gpdb-doc/markdown/ref_guide/modules/intro.html.md index a1893a14d54c..68cd5fce7c25 100644 --- a/gpdb-doc/markdown/ref_guide/modules/intro.html.md +++ b/gpdb-doc/markdown/ref_guide/modules/intro.html.md @@ -16,6 +16,7 @@ The following Greenplum Database and PostgreSQL `contrib` modules are installed; - [diskquota](diskquota.html) - Allows administrators to set disk usage quotas for Greenplum Database roles and schemas. - [fuzzystrmatch](fuzzystrmatch.html) - Determines similarities and differences between strings. - [gp\_array\_agg](gp_array_agg.html) - Implements a parallel `array_agg()` aggregate function for Greenplum Database. +- [gp\_check\_functions](gp_check_functions.html) - Provides views to check for orphaned and missing relation files and a user-defined function to move orphaned files. - [gp\_legacy\_string\_agg](gp_legacy_string_agg.html) - Implements a legacy, single-argument `string_agg()` aggregate function that was present in Greenplum Database 5. - [gp\_parallel\_retrieve\_cursor](gp_parallel_retrieve_cursor.html) - Provides extended cursor functionality to retrieve data, in parallel, directly from Greenplum Database segments. - [gp\_percentile\_agg](gp_percentile_agg.html) - Improves GPORCA performance for ordered-set aggregate functions.
diff --git a/gpdb-doc/markdown/ref_guide/modules/timestamp9.html.md b/gpdb-doc/markdown/ref_guide/modules/timestamp9.html.md index 6d7e6a48ee7a..89cbaf587111 100644 --- a/gpdb-doc/markdown/ref_guide/modules/timestamp9.html.md +++ b/gpdb-doc/markdown/ref_guide/modules/timestamp9.html.md @@ -408,6 +408,61 @@ testdb=# SELECT now()::timestamp9; (1 row) ``` +## Support For Date/Time Functions + +The `timestamp9` module defines two server configuration parameters that, when enabled, let you pass `timestamp9` values to the date/time functions that the `pg_catalog` schema defines for the built-in `timestamp` types. Visit the [PostgreSQL Documentation](https://www.postgresql.org/docs/12/functions-datetime.html#:~:text=Table%C2%A09.31.%C2%A0Date/Time%20Functions) for a list of the supported date/time functions. The parameters are: + +- `timestamp9.enable_implicit_cast_timestamp9_ltz_to_timestamptz`: when enabled, casting a `timestamp9_ltz` value to `timestamp with time zone` becomes implicit. +- `timestamp9.enable_implicit_cast_timestamp9_ntz_to_timestamp`: when enabled, casting a `timestamp9_ntz` value to `timestamp without time zone` becomes implicit. + +The default value for both configuration parameters is `off`. For example, if you try to use the `date` function with `timestamp9` and `timestamp9.enable_implicit_cast_timestamp9_ltz_to_timestamptz` is set to `off`: + +``` +postgres=# SELECT date('2022-01-01'::timestamp9_ltz); +ERROR: implicitly cast timestamp9_ltz to timestamptz is not allowed +HINT: either set 'timestamp9.enable_implicit_cast_timestamp9_ltz_to_timestamptz' to 'on' or do it explicitly +``` +Enable the configuration parameter in order to use the `date` function: + +``` +postgres=# SET timestamp9.enable_implicit_cast_timestamp9_ltz_to_timestamptz TO 'ON'; +SET +postgres=# SELECT date('2022-01-01'::timestamp9_ltz); + date +------------ + 01-01-2022 +(1 row) +``` + +Note that enabling these configuration parameters also creates multiple casting paths between `timestamp9` types and built-in `timestamp` types.
You may encounter error messages such as: + +``` +postgres=# select '2019-09-19'::timestamp9_ltz <= '2019-09-20'::timestamptz; +ERROR: operator is not unique: timestamp9_ltz <= timestamp with time zone +LINE 1: select '2019-09-19'::timestamp9_ltz <= '2019-09-20'::timesta... +HINT: Could not choose a best candidate operator. You might need to add explicit type casts. +``` + +In this situation, add an explicit cast: + +``` +postgres=# select '2019-09-19'::timestamp9_ntz <= '2019-09-20'::timestamptz::timestamp9_ntz; +?column? +---------- + t +(1 row) +``` + +Alternatively, cast the `timestamp9_ntz` value to `timestamptz`: + +``` +postgres=# select '2019-09-19'::timestamp9_ntz::timestamptz <= '2019-09-20'::timestamptz; +?column? +---------- + t +(1 row) +``` + ## Examples ### `TIMESTAMP9_LTZ` Examples diff --git a/gpdb-doc/markdown/ref_guide/sql_commands/ALTER_TABLE.html.md b/gpdb-doc/markdown/ref_guide/sql_commands/ALTER_TABLE.html.md index c86c0e209166..c070049b2c26 100644 --- a/gpdb-doc/markdown/ref_guide/sql_commands/ALTER_TABLE.html.md +++ b/gpdb-doc/markdown/ref_guide/sql_commands/ALTER_TABLE.html.md @@ -210,9 +210,12 @@ where storage\_parameter is: - **SET WITHOUT OIDS** — Removes the OID system column from the table. - > **Caution** VMware does not support using `SET WITH OIDS` or `oids=TRUE` to assign an OID system column.On large tables, such as those in a typical Greenplum Database system, using OIDs for table rows can cause wrap-around of the 32-bit OID counter. Once the counter wraps around, OIDs can no longer be assumed to be unique, which not only makes them useless to user applications, but can also cause problems in the Greenplum Database system catalog tables. In addition, excluding OIDs from a table reduces the space required to store the table on disk by 4 bytes per row, slightly improving performance. You cannot create OIDS on a partitioned or column-oriented table \(an error is displayed\).
This syntax is deprecated and will be removed in a future Greenplum release. + You cannot create OIDs on a partitioned or column-oriented table \(an error is displayed\). This syntax is deprecated and will be removed in a future Greenplum release. + + > **Caution** VMware does not support using `SET WITH OIDS` or `oids=TRUE` to assign an OID system column. On large tables, such as those in a typical Greenplum Database system, using OIDs for table rows can cause the 32-bit OID counter to wrap around. After the counter wraps around, OIDs can no longer be assumed to be unique, which not only makes them useless to user applications, but can also cause problems in the Greenplum Database system catalog tables. In addition, excluding OIDs from a table reduces the space required to store the table on disk by 4 bytes per row, slightly improving performance. - **SET \( FILLFACTOR = value\) / RESET \(FILLFACTOR\)** — Changes the fillfactor for the table. The fillfactor for a table is a percentage between 10 and 100. 100 \(complete packing\) is the default. When a smaller fillfactor is specified, `INSERT` operations pack table pages only to the indicated percentage; the remaining space on each page is reserved for updating rows on that page. This gives `UPDATE` a chance to place the updated copy of a row on the same page as the original, which is more efficient than placing it on a different page. For a table whose entries are never updated, complete packing is the best choice, but in heavily updated tables smaller fillfactors are appropriate. Note that the table contents will not be modified immediately by this command. You will need to rewrite the table to get the desired effects. That can be done with [VACUUM](VACUUM.html) or one of the forms of `ALTER TABLE` that forces a table rewrite. For information about the forms of `ALTER TABLE` that perform a table rewrite, see [Notes](#section5). + - **SET DISTRIBUTED** — Changes the distribution policy of a table.
Changing a hash distribution policy, or changing to or from a replicated policy, will cause the table data to be physically redistributed on disk, which can be resource intensive. *Greenplum Database does not permit changing the distribution policy of a writable external table.* - **INHERIT parent\_table / NO INHERIT parent\_table** — Adds or removes the target table as a child of the specified parent table. Queries against the parent will include records of its child table. To be added as a child, the target table must already contain all the same columns as the parent \(it could have additional columns, too\). The columns must have matching data types, and if they have `NOT NULL` constraints in the parent then they must also have `NOT NULL` constraints in the child. There must also be matching child-table constraints for all `CHECK` constraints of the parent, except those marked non-inheritable \(that is, created with `ALTER TABLE ... ADD CONSTRAINT ... NO INHERIT`\) in the parent, which are ignored; all child-table constraints matched must not be marked non-inheritable. Currently `UNIQUE`, `PRIMARY KEY`, and `FOREIGN KEY` constraints are not considered, but this may change in the future. - OF type\_name — This form links the table to a composite type as though `CREATE TABLE OF` had formed it. The table's list of column names and types must precisely match that of the composite type; the presence of an `oid` system column is permitted to differ. The table must not inherit from any other table. These restrictions ensure that `CREATE TABLE OF` would permit an equivalent table definition. @@ -282,6 +285,9 @@ index\_name FILLFACTOR : Set the fillfactor percentage for a table. +: The fillfactor option is valid only for heap tables (`appendoptimized=false`). + + value : The new value for the `FILLFACTOR` parameter, which is a percentage between 10 and 100. 100 is the default. 
diff --git a/gpdb-doc/markdown/ref_guide/sql_commands/CREATE_TABLE.html.md b/gpdb-doc/markdown/ref_guide/sql_commands/CREATE_TABLE.html.md index 4d707b9b3d37..af12821d6953 100644 --- a/gpdb-doc/markdown/ref_guide/sql_commands/CREATE_TABLE.html.md +++ b/gpdb-doc/markdown/ref_guide/sql_commands/CREATE_TABLE.html.md @@ -387,6 +387,8 @@ WITH \( storage\_parameter=value \) : **fillfactor** — The fillfactor for a table is a percentage between 10 and 100. 100 \(complete packing\) is the default. When a smaller fillfactor is specified, `INSERT` operations pack table pages only to the indicated percentage; the remaining space on each page is reserved for updating rows on that page. This gives `UPDATE` a chance to place the updated copy of a row on the same page as the original, which is more efficient than placing it on a different page. For a table whose entries are never updated, complete packing is the best choice, but in heavily updated tables smaller fillfactors are appropriate. This parameter cannot be set for TOAST tables. +: The fillfactor option is valid only for heap tables (`appendoptimized=FALSE`). + : **analyze_hll_non_part_table** — Set this storage parameter to `true` to force collection of HLL statistics even if the table is not part of a partitioned table. This is useful if the table will be exchanged or added to a partitioned table, so that the table does not need to be re-analyzed. The default is `false`. : **oids=FALSE** — This setting is the default, and it ensures that rows do not have object identifiers assigned to them. VMware does not support using `WITH OIDS` or `oids=TRUE` to assign an OID system column.On large tables, such as those in a typical Greenplum Database system, using OIDs for table rows can cause wrap-around of the 32-bit OID counter. 
Once the counter wraps around, OIDs can no longer be assumed to be unique, which not only makes them useless to user applications, but can also cause problems in the Greenplum Database system catalog tables. In addition, excluding OIDs from a table reduces the space required to store the table on disk by 4 bytes per row, slightly improving performance. You cannot create OIDS on a partitioned or column-oriented table \(an error is displayed\). This syntax is deprecated and will be removed in a future Greenplum release. diff --git a/gpdb-doc/markdown/ref_guide/system_catalogs/pg_stat_indexes.html.md b/gpdb-doc/markdown/ref_guide/system_catalogs/pg_stat_indexes.html.md index 742cdfee8f12..99ba30da5b6d 100644 --- a/gpdb-doc/markdown/ref_guide/system_catalogs/pg_stat_indexes.html.md +++ b/gpdb-doc/markdown/ref_guide/system_catalogs/pg_stat_indexes.html.md @@ -56,7 +56,7 @@ FROM SELECT * FROM pg_stat_all_indexes WHERE relid < 16384) m, pg_stat_all_indexes s -WHERE m.relid = s.relid; +WHERE m.relid = s.relid AND m.indexrelid = s.indexrelid; CREATE VIEW pg_stat_sys_indexes_gpdb6 AS diff --git a/gpdb-doc/markdown/ref_guide/toc.md b/gpdb-doc/markdown/ref_guide/toc.md index 6f466d059533..41240f545026 100644 --- a/gpdb-doc/markdown/ref_guide/toc.md +++ b/gpdb-doc/markdown/ref_guide/toc.md @@ -177,6 +177,7 @@ Doc Index - [diskquota](./modules/diskquota.md) - [fuzzystrmatch](./modules/fuzzystrmatch.md) - [gp\_array\_agg](./modules/gp_array_agg.md) + - [gp\_check\_functions](./modules/gp_check_functions.md) - [gp\_legacy\_string\_agg](./modules/gp_legacy_string_agg.md) - [gp\_parallel\_retrieve\_cursor (Beta)](./modules/gp_parallel_retrieve_cursor.md) - [gp\_percentile\_agg](./modules/gp_percentile_agg.md) diff --git a/gpdb-doc/markdown/security-guide/topics/Authenticate.html.md b/gpdb-doc/markdown/security-guide/topics/Authenticate.html.md index 2f00b575ef48..ba2e7e1d5b57 100644 --- a/gpdb-doc/markdown/security-guide/topics/Authenticate.html.md +++ 
b/gpdb-doc/markdown/security-guide/topics/Authenticate.html.md @@ -369,7 +369,6 @@ For more details on how to create your server private key and certificate, refer The following Server settings need to be specified in the `postgresql.conf` configuration file: - `ssl` *boolean*. Enables SSL connections. -- `ssl_renegotiation_limit` *integer*. Specifies the data limit before key renegotiation. - `ssl_ciphers` *string*. Configures the list SSL ciphers that are allowed. `ssl_ciphers` *overrides* any ciphers string specified in `/etc/openssl.cnf`. The default value `ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH` enables all ciphers except for ADH, LOW, EXP, and MD5 ciphers, and prioritizes ciphers by their strength.
      > **Note** With TLS 1.2 some ciphers in MEDIUM and HIGH strength still use NULL encryption \(no encryption for transport\), which the default `ssl_ciphers` string allows. To bypass NULL ciphers with TLS 1.2 use a string such as `TLSv1.2:!eNULL:!aNULL`. diff --git a/gpdb-doc/markdown/security-guide/topics/preface.html.md b/gpdb-doc/markdown/security-guide/topics/preface.html.md index 5dcb79813a48..704e600cf57b 100644 --- a/gpdb-doc/markdown/security-guide/topics/preface.html.md +++ b/gpdb-doc/markdown/security-guide/topics/preface.html.md @@ -25,3 +25,9 @@ Describes how to encrypt data at rest in the database or in transit over the net - **[Security Best Practices](../topics/BestPractices.html)** Describes basic security best practices that you should follow to ensure the highest level of system security.  +## About Endpoint Security Software + +If you install any endpoint security software on your Greenplum Database hosts, such as anti-virus, data protection, network security, or other security related software, the additional CPU, IO, network or memory load can interfere with Greenplum Database operations and may affect database performance and stability. + +Refer to your endpoint security vendor and perform careful testing in a non-production environment to ensure it does not have any negative impact on Greenplum Database operations. + diff --git a/gpdb-doc/markdown/utility_guide/ref/gpcheckperf.html.md b/gpdb-doc/markdown/utility_guide/ref/gpcheckperf.html.md index 8ed4a55ddc9c..9a682b49f1c2 100644 --- a/gpdb-doc/markdown/utility_guide/ref/gpcheckperf.html.md +++ b/gpdb-doc/markdown/utility_guide/ref/gpcheckperf.html.md @@ -7,7 +7,7 @@ Verifies the baseline hardware performance of the specified hosts. ``` gpcheckperf -d [-d ...]     
{-f  | - h [-h hostname ...]} -    [-r ds] [-B ] [-S ] [-D] [-v|-V] +    [-r ds] [-B ] [-S ] [--buffer-size ] [-D] [-v|-V] gpcheckperf -d     {-f  | - h [-h< hostname> ...]} @@ -37,6 +37,9 @@ Before using `gpcheckperf`, you must have a trusted host setup between the hosts -B block\_size : Specifies the block size \(in KB or MB\) to use for disk I/O test. The default is 32KB, which is the same as the Greenplum Database page size. The maximum block size is 1 MB. +--buffer-size buffer_size +: Specifies the size of the send buffer in kilobytes. Default size is 32 kilobytes. + -d test\_directory : For the disk I/O test, specifies the file system directory locations to test. You must have write access to the test directory on all hosts involved in the performance test. You can use the `-d` option multiple times to specify multiple test directories \(for example, to test disk I/O of your primary and mirror data directories\). diff --git a/gpdb-doc/markdown/utility_guide/ref/gpfdist.html.md b/gpdb-doc/markdown/utility_guide/ref/gpfdist.html.md index 934b3a1c5e22..baceff9c2661 100644 --- a/gpdb-doc/markdown/utility_guide/ref/gpfdist.html.md +++ b/gpdb-doc/markdown/utility_guide/ref/gpfdist.html.md @@ -62,7 +62,7 @@ Most likely, you will want to run `gpfdist` on your ETL machines rather than the : Sets the number of seconds that `gpfdist` waits before cleaning up the session when there are no `POST` requests from the segments. Default is 300. Allowed values are 300 to 86400. You may increase its value when experiencing heavy network traffic. -m max\_length -: Sets the maximum allowed data row length in bytes. Default is 32768. Should be used when user data includes very wide rows \(or when `line too long` error message occurs\). Should not be used otherwise as it increases resource allocation. Valid range is 32K to 256MB. \(The upper limit is 1MB on Windows systems.\) +: Sets the maximum allowed data row length in bytes. Default is 32768. 
Should be used when user data includes very wide rows \(or when `line too long` error message occurs\). Should not be used otherwise as it increases resource allocation. Valid range is 32K to 256MB. The upper limit is 1MB on Windows systems. : > **Note** Memory issues might occur if you specify a large maximum row length and run a large number of `gpfdist` concurrent connections. For example, setting this value to the maximum of 256MB with 96 concurrent `gpfdist` processes requires approximately 24GB of memory \(`(96 + 1) x 246MB`\). diff --git a/gpdb-doc/markdown/utility_guide/ref/gpload.html.md b/gpdb-doc/markdown/utility_guide/ref/gpload.html.md index 752182a66aae..5b265eca42e9 100644 --- a/gpdb-doc/markdown/utility_guide/ref/gpload.html.md +++ b/gpdb-doc/markdown/utility_guide/ref/gpload.html.md @@ -231,7 +231,7 @@ GPLOAD : Required when `TRANSFORM` is specified. Specifies the location of the transformation configuration file that is specified in the `TRANSFORM` parameter, above. MAX\_LINE\_LENGTH - : Optional. An integer that specifies the maximum length of a line in the XML transformation data passed to `gpload`. + : Optional. Sets the maximum allowed data row length in bytes. Default is 32768. Should be used when user data includes very wide rows (or when `line too long` error message occurs). Should not be used otherwise as it increases resource allocation. Valid range is 32K to 256MB. The upper limit is 1MB on Windows systems. FORMAT : Optional. Specifies the format of the source data file\(s\) - either plain text \(`TEXT`\) or comma separated values \(`CSV`\) format. Defaults to `TEXT` if not specified. For more information about the format of the source data, see [Loading and Unloading Data](../../admin_guide/load/topics/g-loading-and-unloading-data.html). 
diff --git a/gpdb-doc/markdown/utility_guide/ref/gprecoverseg.html.md b/gpdb-doc/markdown/utility_guide/ref/gprecoverseg.html.md index 067321228c7c..32c3f574aa5b 100644 --- a/gpdb-doc/markdown/utility_guide/ref/gprecoverseg.html.md +++ b/gpdb-doc/markdown/utility_guide/ref/gprecoverseg.html.md @@ -10,7 +10,7 @@ gprecoverseg [[-p [,...]] | -i ] [-d ] [--no-progress] [-l ] -gprecoverseg -r +gprecoverseg -r [--replay-lag ] gprecoverseg -o               [-p [,...]] @@ -168,6 +168,9 @@ The recovery process marks the segment as up again in the Greenplum Database sys -r \(rebalance segments\) : After a segment recovery, segment instances may not be returned to the preferred role that they were given at system initialization time. This can leave the system in a potentially unbalanced state, as some segment hosts may have more active segments than is optimal for top system performance. This option rebalances primary and mirror segments by returning them to their preferred roles. All segments must be valid and resynchronized before running `gprecoverseg -r`. If there are any in progress queries, they will be cancelled and rolled back. +--replay-lag +: Replay lag (in GB) allowed on the mirror when rebalancing the segments. If the replay lag (`flush_lsn - replay_lsn`) exceeds the value provided with this option, the rebalance is aborted. + -s \(sequential progress\) : Show `pg_basebackup` or `pg_rewind` progress sequentially instead of in-place. Useful when writing to a file, or if a tty does not support escape sequences. The default is to show progress in-place. diff --git a/gpdb-doc/markdown/utility_guide/utility-programs.html.md b/gpdb-doc/markdown/utility_guide/utility-programs.html.md index ca829dcfadd2..1d77b730bb6f 100644 --- a/gpdb-doc/markdown/utility_guide/utility-programs.html.md +++ b/gpdb-doc/markdown/utility_guide/utility-programs.html.md @@ -41,7 +41,7 @@ Greenplum Database provides the following utility programs.
Superscripts identif - [gpmovemirrors](ref/gpmovemirrors.html) - [gpmt](ref/gpmt.html) - [gppkg](ref/gppkg.html) -- [gpcr](https://docs.vmware.com/en/VMware-Greenplum-Cluster-Recovery/1.0/greenplum-cluster-recovery/GUID-ref-gpcr.html) +- [gpdr](https://docs.vmware.com/en/VMware-Greenplum-Disaster-Recovery/1.0/greenplum-disaster-recovery/ref-gpdr.html) - [gprecoverseg](ref/gprecoverseg.html) - [gpreload](ref/gpreload.html) - [gprestore](https://docs.vmware.com/en/VMware-Greenplum-Backup-and-Restore/index.html)1 diff --git a/src/backend/access/appendonly/appendonlywriter.c b/src/backend/access/appendonly/appendonlywriter.c index b59b48616d39..9a9486150524 100644 --- a/src/backend/access/appendonly/appendonlywriter.c +++ b/src/backend/access/appendonly/appendonlywriter.c @@ -140,8 +140,6 @@ InitAppendOnlyWriter(void) errmsg("not enough shared memory for append only writer"))); ereport(DEBUG1, (errmsg("initialized append only writer"))); - - return; } /* @@ -731,7 +729,6 @@ DeregisterSegnoForCompactionDrop(Oid relid, List *compactedSegmentFileList) } release_lightweight_lock(); - return; } void @@ -776,7 +773,6 @@ RegisterSegnoForCompactionDrop(Oid relid, List *compactedSegmentFileList) } release_lightweight_lock(); - return; } /* diff --git a/src/backend/access/bitmap/bitmaputil.c b/src/backend/access/bitmap/bitmaputil.c index a0e73bf424d4..7ee30fdcc25f 100644 --- a/src/backend/access/bitmap/bitmaputil.c +++ b/src/backend/access/bitmap/bitmaputil.c @@ -405,32 +405,36 @@ _bitmap_catchup_to_next_tid(BMBatchWords *words, BMIterateResult *result) /* reset next tid to skip all empty words */ if (words->firstTid > result->nextTid) result->nextTid = words->firstTid; + continue; } - else + + if (fillLength > 0) { - while (fillLength > 0 && words->firstTid < result->nextTid) - { - /* update fill word to reflect expansion */ - words->cwords[result->lastScanWordNo]--; - words->firstTid += BM_HRL_WORD_SIZE; - fillLength--; - } + /* update fill word to reflect expansion */ - /* 
comsume all the fill words, try to fetch next words */ - if (fillLength == 0) - { - words->nwords--; - continue; - } + uint64 fillToUse = (result->nextTid - words->firstTid) / BM_HRL_WORD_SIZE + 1; + if (fillToUse > fillLength) + fillToUse = fillLength; - /* - * Catch up the next tid to search, but there still fill words. - * Return current state. - */ - if (words->firstTid >= result->nextTid) - return; + words->cwords[result->lastScanWordNo] -= fillToUse; + words->firstTid += fillToUse * BM_HRL_WORD_SIZE; + fillLength -= fillToUse; } + + /* consume all the fill words, try to fetch next words */ + if (fillLength == 0) + { + words->nwords--; + continue; + } + + /* + * Catch up the next tid to search, but there are still fill words. + * Return current state. + */ + if (words->firstTid >= result->nextTid) + return; } else { diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 7f9782d34ae7..87c579e007fa 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6537,7 +6537,7 @@ StartupXLOG(void) XLogCtlInsert *Insert; CheckPoint checkPoint; bool wasShutdown; - bool reachedStopPoint = false; + bool reachedRecoveryTarget = false; bool haveBackupLabel = false; XLogRecPtr RecPtr, checkPointLoc, @@ -7364,7 +7364,7 @@ StartupXLOG(void) */ if (recoveryStopsBefore(record)) { - reachedStopPoint = true; /* see below */ + reachedRecoveryTarget = true; break; } @@ -7540,7 +7540,7 @@ StartupXLOG(void) /* Exit loop if we reached inclusive recovery target */ if (recoveryStopsAfter(record)) { - reachedStopPoint = true; + reachedRecoveryTarget = true; break; } @@ -7552,7 +7552,7 @@ StartupXLOG(void) * end of main redo apply loop */ - if (reachedStopPoint) + if (reachedRecoveryTarget) { if (!reachedConsistency) ereport(FATAL, @@ -7608,7 +7608,18 @@ StartupXLOG(void) /* there are no WAL records following the checkpoint */ ereport(LOG, (errmsg("redo is not required"))); + } + + /* + * This check is intentionally after the
above log messages that + * indicate how far recovery went. + */ + if (ArchiveRecoveryRequested && + recoveryTarget != RECOVERY_TARGET_UNSET && + !reachedRecoveryTarget) + ereport(FATAL, + (errmsg("recovery ended before configured recovery target was reached"))); } else { diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index 6a3c8b20a4a1..736e379d04ec 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -1580,6 +1580,9 @@ RemoveRoleFromObjectACL(Oid roleid, Oid classid, Oid objid) case ForeignDataWrapperRelationId: istmt.objtype = ACL_OBJECT_FDW; break; + case ExtprotocolRelationId: + istmt.objtype = ACL_OBJECT_EXTPROTOCOL; + break; default: elog(ERROR, "unexpected object class %u", classid); break; diff --git a/src/backend/cdb/cdbhash.c b/src/backend/cdb/cdbhash.c index 114ed0cd1e93..dfd1a2afb57b 100644 --- a/src/backend/cdb/cdbhash.c +++ b/src/backend/cdb/cdbhash.c @@ -131,6 +131,7 @@ makeCdbHash(int numsegs, int natts, Oid *hashfuncs) CdbHash * makeCdbHashForRelation(Relation rel) { + CdbHash *h; GpPolicy *policy = rel->rd_cdbpolicy; Oid *hashfuncs; int i; @@ -149,7 +150,20 @@ makeCdbHashForRelation(Relation rel) hashfuncs[i] = cdb_hashproc_in_opfamily(opfamily, typeoid); } - return makeCdbHash(policy->numsegments, policy->nattrs, hashfuncs); + h = makeCdbHash(policy->numsegments, policy->nattrs, hashfuncs); + pfree(hashfuncs); + return h; +} + +/* release all memory of CdbHash */ +void freeCdbHash(CdbHash *hash) +{ + if (hash) + { + if (hash->hashfuncs) + pfree(hash->hashfuncs); + pfree(hash); + } } /* diff --git a/src/backend/cdb/cdbvars.c b/src/backend/cdb/cdbvars.c index aca9dd71db21..5ec110bc510b 100644 --- a/src/backend/cdb/cdbvars.c +++ b/src/backend/cdb/cdbvars.c @@ -104,6 +104,8 @@ int gp_reject_percent_threshold; /* SREH reject % kicks off only bool gp_select_invisible = false; /* debug mode to allow select to * see "invisible" rows */ +bool gp_detect_data_correctness; /* Detect if the current data 
distribution is correct */ + /* * Configurable timeout for snapshot add: exceptionally busy systems may take * longer than our old hard-coded version -- so here is a tuneable version. @@ -197,6 +199,7 @@ int Gp_interconnect_queue_depth = 4; /* max number of messages * waiting in rx-queue before * we drop. */ int Gp_interconnect_snd_queue_depth = 2; +int Gp_interconnect_cursor_ic_table_size = 128; int Gp_interconnect_timer_period = 5; int Gp_interconnect_timer_checking_period = 20; int Gp_interconnect_default_rtt = 20; @@ -312,6 +315,12 @@ int gp_workfile_limit_per_query = 0; /* Maximum number of workfiles to be created by a query */ int gp_workfile_limit_files_per_query = 0; +/* + * The overhead memory (kB) used by all compressed workfiles of a single + * workfile_set + */ +int gp_workfile_compression_overhead_limit = 0; + /* Gpmon */ bool gp_enable_gpperfmon = false; int gp_gpperfmon_send_interval = 1; diff --git a/src/backend/cdb/endpoint/cdbendpointutils.c b/src/backend/cdb/endpoint/cdbendpointutils.c index c484cc81be67..9ed3bcbbc4c5 100644 --- a/src/backend/cdb/endpoint/cdbendpointutils.c +++ b/src/backend/cdb/endpoint/cdbendpointutils.c @@ -24,6 +24,7 @@ #include "cdbendpoint_private.h" #include "cdb/cdbutil.h" #include "cdb/cdbvars.h" +#include "utils/timeout.h" /* @@ -167,4 +168,51 @@ generate_endpoint_name(char *name, const char *cursorName) len += ENDPOINT_NAME_COMMANDID_LEN; name[len] = '\0'; -} \ No newline at end of file +} + +/* + * Check every parallel retrieve cursor status and cancel QEs if it has error. + * + * Also return true if it has error. 
+ */ +bool +gp_check_parallel_retrieve_cursor_error(void) +{ + List *portals; + ListCell *lc; + bool has_error = false; + EState *estate = NULL; + + portals = GetAllParallelRetrieveCursorPortals(); + + foreach(lc, portals) + { + Portal portal = (Portal)lfirst(lc); + + estate = portal->queryDesc->estate; + + if (estate->dispatcherState->primaryResults->errcode) + has_error = true; + else + has_error = cdbdisp_checkForCancel(estate->dispatcherState); + } + + /* free the list to avoid memory leak */ + list_free(portals); + + return has_error; +} + +/* + * Enable the parallel retrieve cursor check timeout if it is not already active + */ +void +enable_parallel_retrieve_cursor_check_timeout(void) +{ + if (Gp_role == GP_ROLE_DISPATCH && + !get_timeout_active(GP_PARALLEL_RETRIEVE_CURSOR_CHECK_TIMEOUT)) + { + enable_timeout_after(GP_PARALLEL_RETRIEVE_CURSOR_CHECK_TIMEOUT, + GP_PARALLEL_RETRIEVE_CURSOR_CHECK_PERIOD_MS); + } +} diff --git a/src/backend/cdb/motion/ic_proxy_backend.c b/src/backend/cdb/motion/ic_proxy_backend.c index 7269372b4545..40b07d2a35f5 100644 --- a/src/backend/cdb/motion/ic_proxy_backend.c +++ b/src/backend/cdb/motion/ic_proxy_backend.c @@ -34,6 +34,7 @@ #include "cdb/cdbvars.h" #include "cdb/ml_ipc.h" #include "executor/execdesc.h" +#include "storage/shmem.h" #include "ic_proxy.h" #include "ic_proxy_backend.h" diff --git a/src/backend/cdb/motion/ic_proxy_backend.h b/src/backend/cdb/motion/ic_proxy_backend.h index 3006837bdb3c..579441572e11 100644 --- a/src/backend/cdb/motion/ic_proxy_backend.h +++ b/src/backend/cdb/motion/ic_proxy_backend.h @@ -13,7 +13,7 @@ #define IC_PROXY_BACKEND_H #include "postgres.h" - +#include "port/atomics.h" #include "cdb/cdbinterconnect.h" #include @@ -38,6 +38,8 @@ typedef struct ICProxyBackendContext ChunkTransportState *transportState; } ICProxyBackendContext; +extern pg_atomic_uint32 *ic_proxy_peer_listener_failed; + extern void ic_proxy_backend_connect(ICProxyBackendContext *context, ChunkTransportStateEntry *pEntry, MotionConn
*conn, bool isSender); diff --git a/src/backend/cdb/motion/ic_proxy_bgworker.c b/src/backend/cdb/motion/ic_proxy_bgworker.c index 5427e85bc867..6c5de0ba4b43 100644 --- a/src/backend/cdb/motion/ic_proxy_bgworker.c +++ b/src/backend/cdb/motion/ic_proxy_bgworker.c @@ -16,6 +16,7 @@ #include "postgres.h" #include "storage/ipc.h" +#include "storage/shmem.h" #include "cdb/ic_proxy_bgworker.h" #include "ic_proxy_server.h" @@ -35,3 +36,28 @@ ICProxyMain(Datum main_arg) /* main loop */ proc_exit(ic_proxy_server_main()); } + +/* + * the size of ICProxy SHM structure + */ +Size +ICProxyShmemSize(void) +{ + Size size = 0; + size = add_size(size, sizeof(*ic_proxy_peer_listener_failed)); + return size; +} + +/* + * initialize ICProxy's SHM structure: only one flag variable + */ +void +ICProxyShmemInit(void) +{ + bool found; + ic_proxy_peer_listener_failed = ShmemInitStruct("IC_PROXY Listener Failure Flag", + sizeof(*ic_proxy_peer_listener_failed), + &found); + if (!found) + pg_atomic_init_u32(ic_proxy_peer_listener_failed, 0); +} \ No newline at end of file diff --git a/src/backend/cdb/motion/ic_proxy_main.c b/src/backend/cdb/motion/ic_proxy_main.c index 027041897f88..84b1163ebda4 100644 --- a/src/backend/cdb/motion/ic_proxy_main.c +++ b/src/backend/cdb/motion/ic_proxy_main.c @@ -18,6 +18,8 @@ #include "storage/ipc.h" #include "utils/guc.h" #include "utils/memutils.h" +#include "storage/shmem.h" +#include "port/atomics.h" #include "ic_proxy_server.h" #include "ic_proxy_addr.h" @@ -36,6 +38,8 @@ static uv_timer_t ic_proxy_server_timer; static uv_tcp_t ic_proxy_peer_listener; static bool ic_proxy_peer_listening; +/* flag (in SHM) indicating whether the peer listener failed to bind or listen */ +pg_atomic_uint32 *ic_proxy_peer_listener_failed; static uv_pipe_t ic_proxy_client_listener; static bool ic_proxy_client_listening; @@ -144,8 +148,12 @@ ic_proxy_server_peer_listener_init(uv_loop_t *loop) if (ic_proxy_addrs == NIL) return; + Assert(ic_proxy_peer_listener_failed != NULL); if
(ic_proxy_peer_listening) + { + Assert(pg_atomic_read_u32(ic_proxy_peer_listener_failed) == 0); return; + } /* Get the addr from the gp_interconnect_proxy_addresses */ addr = ic_proxy_get_my_addr(); @@ -185,6 +193,7 @@ ic_proxy_server_peer_listener_init(uv_loop_t *loop) { elog(WARNING, "ic-proxy: tcp: fail to bind: %s", uv_strerror(ret)); + pg_atomic_exchange_u32(ic_proxy_peer_listener_failed, 1); return; } @@ -194,6 +203,7 @@ ic_proxy_server_peer_listener_init(uv_loop_t *loop) { elog(WARNING, "ic-proxy: tcp: fail to listen: %s", uv_strerror(ret)); + pg_atomic_exchange_u32(ic_proxy_peer_listener_failed, 1); return; } @@ -201,6 +211,7 @@ ic_proxy_server_peer_listener_init(uv_loop_t *loop) elogif(gp_log_interconnect >= GPVARS_VERBOSITY_VERBOSE, LOG, "ic-proxy: tcp: listening on socket %d", fd); + pg_atomic_exchange_u32(ic_proxy_peer_listener_failed, 0); ic_proxy_peer_listening = true; } @@ -431,10 +442,10 @@ int ic_proxy_server_main(void) { char path[MAXPGPATH]; - elogif(gp_log_interconnect >= GPVARS_VERBOSITY_TERSE, LOG, "ic-proxy: server setting up"); + pg_atomic_exchange_u32(ic_proxy_peer_listener_failed, 0); ic_proxy_pkt_cache_init(IC_PROXY_MAX_PKT_SIZE); uv_loop_init(&ic_proxy_server_loop); diff --git a/src/backend/cdb/motion/ic_tcp.c b/src/backend/cdb/motion/ic_tcp.c index 4f05d79516c6..6d4c5a523146 100644 --- a/src/backend/cdb/motion/ic_tcp.c +++ b/src/backend/cdb/motion/ic_tcp.c @@ -1277,6 +1277,16 @@ SetupTCPInterconnect(EState *estate) interconnect_context->doSendStopMessage = doSendStopMessageTCP; #ifdef ENABLE_IC_PROXY + /* check if current Segment's ICProxy listener failed */ + if (pg_atomic_read_u32(ic_proxy_peer_listener_failed) > 0) + { + ereport(ERROR, + (errcode(ERRCODE_GP_INTERCONNECTION_ERROR), + errmsg("Failed to setup ic_proxy interconnect"), + errdetail("The ic_proxy process failed to bind or listen."), + errhint("Please check the server log for related WARNING messages."))); + } + ic_proxy_backend_init_context(interconnect_context); #endif /* 
ENABLE_IC_PROXY */ diff --git a/src/backend/cdb/motion/ic_udpifc.c b/src/backend/cdb/motion/ic_udpifc.c index f007bb8c94c7..9cce2e356a6b 100644 --- a/src/backend/cdb/motion/ic_udpifc.c +++ b/src/backend/cdb/motion/ic_udpifc.c @@ -186,17 +186,6 @@ struct ConnHashTable (a)->srcPid == (b)->srcPid && \ (a)->dstPid == (b)->dstPid && (a)->icId == (b)->icId)) - -/* - * Cursor IC table definition. - * - * For cursor case, there may be several concurrent interconnect - * instances on QD. The table is used to track the status of the - * instances, which is quite useful for "ACK the past and NAK the future" paradigm. - * - */ -#define CURSOR_IC_TABLE_SIZE (128) - /* * CursorICHistoryEntry * @@ -229,8 +218,9 @@ struct CursorICHistoryEntry typedef struct CursorICHistoryTable CursorICHistoryTable; struct CursorICHistoryTable { + uint32 size; uint32 count; - CursorICHistoryEntry *table[CURSOR_IC_TABLE_SIZE]; + CursorICHistoryEntry **table; }; /* @@ -280,6 +270,13 @@ struct ReceiveControlInfo /* Cursor history table. */ CursorICHistoryTable cursorHistoryTable; + + /* + * Last distributed transaction id when SetupUDPInterconnect is called. + * Coupled with cursorHistoryTable, it is used to handle multiple + * concurrent cursor cases. 
+ */ + DistributedTransactionId lastDXatId; }; /* @@ -914,8 +911,13 @@ dumpTransProtoStats() static void initCursorICHistoryTable(CursorICHistoryTable *t) { + MemoryContext old; t->count = 0; - memset(t->table, 0, sizeof(t->table)); + t->size = Gp_interconnect_cursor_ic_table_size; + + old = MemoryContextSwitchTo(ic_control_info.memContext); + t->table = palloc0(sizeof(struct CursorICHistoryEntry *) * t->size); + MemoryContextSwitchTo(old); } /* @@ -927,7 +929,7 @@ addCursorIcEntry(CursorICHistoryTable *t, uint32 icId, uint32 cid) { MemoryContext old; CursorICHistoryEntry *p; - uint32 index = icId % CURSOR_IC_TABLE_SIZE; + uint32 index = icId % t->size; old = MemoryContextSwitchTo(ic_control_info.memContext); p = palloc0(sizeof(struct CursorICHistoryEntry)); @@ -957,7 +959,7 @@ static void updateCursorIcEntry(CursorICHistoryTable *t, uint32 icId, uint8 status) { struct CursorICHistoryEntry *p; - uint8 index = icId % CURSOR_IC_TABLE_SIZE; + uint8 index = icId % t->size; for (p = t->table[index]; p; p = p->next) { @@ -978,7 +980,7 @@ static CursorICHistoryEntry * getCursorIcEntry(CursorICHistoryTable *t, uint32 icId) { struct CursorICHistoryEntry *p; - uint8 index = icId % CURSOR_IC_TABLE_SIZE; + uint8 index = icId % t->size; for (p = t->table[index]; p; p = p->next) { @@ -1000,7 +1002,7 @@ pruneCursorIcEntry(CursorICHistoryTable *t, uint32 icId) { uint8 index; - for (index = 0; index < CURSOR_IC_TABLE_SIZE; index++) + for (index = 0; index < t->size; index++) { struct CursorICHistoryEntry *p, *q; @@ -1049,7 +1051,7 @@ purgeCursorIcEntry(CursorICHistoryTable *t) { uint8 index; - for (index = 0; index < CURSOR_IC_TABLE_SIZE; index++) + for (index = 0; index < t->size; index++) { struct CursorICHistoryEntry *trash; @@ -1446,6 +1448,7 @@ InitMotionUDPIFC(int *listenerSocketFd, uint16 *listenerPort) /* allocate a buffer for sending disorder messages */ rx_control_info.disorderBuffer = palloc0(MIN_PACKET_SIZE); + rx_control_info.lastDXatId = InvalidTransactionId; 
rx_control_info.lastTornIcId = 0; initCursorICHistoryTable(&rx_control_info.cursorHistoryTable); @@ -3077,34 +3080,61 @@ SetupUDPIFCInterconnect_Internal(SliceTable *sliceTable) set_test_mode(); #endif + /* Prune the QD's history table if it is too large */ if (Gp_role == GP_ROLE_DISPATCH) { - /* - * Prune the history table if it is too large - * - * We only keep history of constant length so that - * - The history table takes only constant amount of memory. - * - It is long enough so that it is almost impossible to receive - * packets from an IC instance that is older than the first one - * in the history. - */ - if (rx_control_info.cursorHistoryTable.count > (2 * CURSOR_IC_TABLE_SIZE)) - { - uint32 prune_id = sliceTable->ic_instance_id - CURSOR_IC_TABLE_SIZE; + CursorICHistoryTable *ich_table = &rx_control_info.cursorHistoryTable; + DistributedTransactionId distTransId = getDistributedTransactionId(); - /* - * Only prune if we didn't underflow -- also we want the prune id - * to be newer than the limit (hysteresis) + if (ich_table->count > (2 * ich_table->size)) + { + /* + * distTransId != lastDXatId + * means the last transaction has finished, so it is safe to prune. */ - if (prune_id < sliceTable->ic_instance_id) + if (distTransId != rx_control_info.lastDXatId) { if (gp_log_interconnect >= GPVARS_VERBOSITY_DEBUG) - elog(DEBUG1, "prune cursor history table (count %d), icid %d", rx_control_info.cursorHistoryTable.count, sliceTable->ic_instance_id); - pruneCursorIcEntry(&rx_control_info.cursorHistoryTable, prune_id); + elog(DEBUG1, "prune cursor history table (count %d), icid %d, prune_id %d", + ich_table->count, sliceTable->ic_instance_id, sliceTable->ic_instance_id); + pruneCursorIcEntry(ich_table, sliceTable->ic_instance_id); + } + } + /*
+ */ + else if (rx_control_info.lastDXatId != InvalidTransactionId) + { + ; + } + /* + * distTransId == lastDXatId and they are InvalidTransactionId(0) + * Means both are Read-Only transactions or the same transaction. + */ + else + { + if (ich_table->count > (2 * ich_table->size)) + { + uint32 prune_id = sliceTable->ic_instance_id - ich_table->size; + + /* + * Only prune if we didn't underflow -- also we want the prune id + * to be newer than the limit (hysteresis) + */ + if (prune_id < sliceTable->ic_instance_id) + { + if (gp_log_interconnect >= GPVARS_VERBOSITY_DEBUG) + elog(DEBUG1, "prune cursor history table (count %d), icid %d, prune_id %d", + ich_table->count, sliceTable->ic_instance_id, prune_id); + pruneCursorIcEntry(ich_table, prune_id); + } } } - addCursorIcEntry(&rx_control_info.cursorHistoryTable, sliceTable->ic_instance_id, gp_command_count); + addCursorIcEntry(ich_table, sliceTable->ic_instance_id, gp_command_count); + /* save the latest transaction id. */ + rx_control_info.lastDXatId = distTransId; } /* now we'll do some setup for each of our Receiving Motion Nodes. */ diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c index 619dcaa7fd7a..355240f8745f 100644 --- a/src/backend/commands/matview.c +++ b/src/backend/commands/matview.c @@ -174,6 +174,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString, Oid save_userid; int save_sec_context; int save_nestlevel; + bool createAoBlockDirectory; RefreshClause *refreshClause; /* MATERIALIZED_VIEW_FIXME: Refresh MatView is not MPP-fied. */ @@ -332,13 +333,16 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString, else tableSpace = matviewRel->rd_rel->reltablespace; + /* If an AO temp table has index, we need to create it. */ + createAoBlockDirectory = matviewRel->rd_rel->relhasindex; + /* * Create the transient table that will receive the regenerated data. 
Lock * it against access by any other process until commit (by which time it * will be gone). */ OIDNewHeap = make_new_heap(matviewOid, tableSpace, concurrent, - ExclusiveLock, false, true); + ExclusiveLock, createAoBlockDirectory, true); LockRelationOid(OIDNewHeap, AccessExclusiveLock); dest = CreateTransientRelDestReceiver(OIDNewHeap, matviewOid, concurrent, stmt->skipData); @@ -496,6 +500,7 @@ transientrel_init(QueryDesc *queryDesc) Oid OIDNewHeap; bool concurrent; LOCKMODE lockmode; + bool createAoBlockDirectory; RefreshClause *refreshClause; refreshClause = queryDesc->plannedstmt->refreshClause; @@ -526,13 +531,17 @@ transientrel_init(QueryDesc *queryDesc) { tableSpace = matviewRel->rd_rel->reltablespace; } + + /* If an AO temp table has index, we need to create it. */ + createAoBlockDirectory = matviewRel->rd_rel->relhasindex; + /* * Create the transient table that will receive the regenerated data. Lock * it against access by any other process until commit (by which time it * will be gone). */ OIDNewHeap = make_new_heap(matviewOid, tableSpace, concurrent, - ExclusiveLock, false, false); + ExclusiveLock, createAoBlockDirectory, false); LockRelationOid(OIDNewHeap, AccessExclusiveLock); queryDesc->dest = CreateTransientRelDestReceiver(OIDNewHeap, matviewOid, concurrent, diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c index 565a803e1475..1dc8ea246e5f 100644 --- a/src/backend/commands/portalcmds.c +++ b/src/backend/commands/portalcmds.c @@ -191,8 +191,13 @@ PerformCursorOpen(PlannedStmt *stmt, ParamListInfo params, Assert(portal->strategy == PORTAL_ONE_SELECT); if (PortalIsParallelRetrieveCursor(portal)) + { WaitEndpointsReady(portal->queryDesc->estate); + /* Enable the check error timer if the alarm is not active */ + enable_parallel_retrieve_cursor_check_timeout(); + } + /* * We're done; the query won't actually be run until PerformPortalFetch is * called. 
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 3dcdbb6f9578..39b30393d4f3 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -4591,6 +4591,7 @@ targetid_get_partition(Oid targetid, EState *estate, bool openIndices) { int natts; Relation resultRelation; + Oid parentRelid = estate->es_result_partitions->part->parrelid; natts = parentInfo->ri_RelationDesc->rd_att->natts; /* in base relation */ @@ -4603,10 +4604,27 @@ targetid_get_partition(Oid targetid, EState *estate, bool openIndices) if (openIndices) ExecOpenIndices(childInfo); - map_part_attrs(parentInfo->ri_RelationDesc, - childInfo->ri_RelationDesc, - &(childInfo->ri_partInsertMap), - TRUE); /* throw on error, so result not needed */ + /* + * es_result_relations does not always represent the parent relation. + * E.g. planner's UPDATE command on parent partition leads to multiple + * subplans and result relations due to preceding inheritance planning. + * In this case es_result_relations points to one of the partitions, not + * to the parent. Thus, the descriptor mapping should be performed only + * when es_result_relations really corresponds to the parent. + * Otherwise, we might reconstruct an already valid tuple and + * get wrong results (e.g. target partition relation descriptor is + * different from parentInfo's, but it's UPDATE (legacy planner) and + * parentInfo represents another partition, which is not the true + * parent). Moreover, if we initially modify a leaf partition, + * i.e. we called a DML command directly on a child partition, or this is + * an inheritance plan execution, the tuple descriptor already matches + * the partition's, and the extra mapping is unnecessary.
+ */ + if (RelationGetRelid(parentInfo->ri_RelationDesc) == parentRelid) + map_part_attrs(parentInfo->ri_RelationDesc, + childInfo->ri_RelationDesc, + &(childInfo->ri_partInsertMap), + TRUE); /* throw on error, so result not needed */ } return childInfo; } @@ -4631,22 +4649,60 @@ values_get_partition(Datum *values, bool *nulls, TupleDesc tupdesc, ResultRelInfo * slot_get_partition(TupleTableSlot *slot, EState *estate) { - ResultRelInfo *resultRelInfo; - AttrNumber max_attr; + ResultRelInfo *resultRelInfo = estate->es_result_relation_info; + TupleDesc tupdesc; Datum *values; bool *nulls; Assert(PointerIsValid(estate->es_result_partitions)); - max_attr = estate->es_partition_state->max_partition_attr; + /* + * If we previously found out that we need to map attribute numbers + * (in case if child part has physically-different attribute numbers from + * parent's), we must extract slot values according to that mapping. + */ + if (resultRelInfo->ri_PartCheckMap != NULL) + { + Datum *slot_values; + bool *slot_nulls; + Relation parentRel = resultRelInfo->ri_PartitionParent; + AttrMap *map; + + Assert(parentRel != NULL); + tupdesc = RelationGetDescr(parentRel); + + slot_getallattrs(slot); + slot_values = slot_get_values(slot); + slot_nulls = slot_get_isnull(slot); + values = palloc(tupdesc->natts * sizeof(Datum)); + nulls = palloc0(tupdesc->natts * sizeof(bool)); + + /* Now we have values/nulls in parent's view. */ + map = resultRelInfo->ri_PartCheckMap; + reconstructTupleValues(map, slot_values, slot_nulls, slot->tts_tupleDescriptor->natts, + values, nulls, tupdesc->natts); + } + else + { + AttrNumber max_attr = estate->es_partition_state->max_partition_attr; - slot_getsomeattrs(slot, max_attr); - values = slot_get_values(slot); - nulls = slot_get_isnull(slot); + slot_getsomeattrs(slot, max_attr); + /* values/nulls pointing to partslot's array. 
*/ + values = slot_get_values(slot); + nulls = slot_get_isnull(slot); + tupdesc = slot->tts_tupleDescriptor; + } - resultRelInfo = get_part(estate, values, nulls, slot->tts_tupleDescriptor, + resultRelInfo = get_part(estate, values, nulls, tupdesc, true); + /* Free up if we allocated mapped attributes. */ + if (values != slot_get_values(slot)) + pfree(values); + + if (nulls != slot_get_isnull(slot)) + pfree(nulls); + return resultRelInfo; } @@ -5024,9 +5080,19 @@ FillSliceGangInfo(Slice *slice, int numsegments) slice->segments = list_make1_int(-1); break; case GANGTYPE_SINGLETON_READER: - slice->gangSize = 1; - slice->segments = list_make1_int(gp_session_id % numsegments); - break; + { + int gp_segment_count = getgpsegmentCount(); + slice->gangSize = 1; + /* + * numsegments might be larger than the number of gpdb actual segments for foreign table. + * For example, for gp2gp, when remote gpdb cluster has more segments than local gpdb, + * numsegments will be larger than getgpsegmentCount(). + * + * So we need to use the minimum of numsegments and getgpsegmentCount() here. + */ + slice->segments = list_make1_int(gp_session_id % Min(numsegments, gp_segment_count)); + break; + } default: elog(ERROR, "unexpected gang type"); } diff --git a/src/backend/executor/nodeDML.c b/src/backend/executor/nodeDML.c index 33b2edf5387f..d5dc775273ad 100644 --- a/src/backend/executor/nodeDML.c +++ b/src/backend/executor/nodeDML.c @@ -85,6 +85,38 @@ ExecDML(DMLState *node) /* remove 'junk' columns from tuple */ node->cleanedUpSlot = ExecFilterJunk(node->junkfilter, projectedSlot); + /* + * If we are modifying a leaf partition we have to ensure that partition + * selection operation will consider leaf partition's attributes as + * coherent with root partition's attribute numbers, because partition + * selection is performed using root's attribute numbers (all partition + * rules are based on the parent relation's tuple descriptor). 
In case + * when child partition has different attribute numbers from root's due to + * dropped columns, the partition selection may go wrong without extra + * validation. + */ + if (node->ps.state->es_result_partitions) + { + ResultRelInfo *relInfo = node->ps.state->es_result_relations; + + /* + * The DML is done on a leaf partition. In order to reuse the map, + * it will be allocated at es_result_relations. + */ + if (RelationGetRelid(relInfo->ri_RelationDesc) != + node->ps.state->es_result_partitions->part->parrelid) + makePartitionCheckMap(node->ps.state, relInfo); + + /* + * DML node always performs partition selection, and if we want to + * reuse the map built in makePartitionCheckMap, we are allowed to + * reassign es_result_relation_info, because ExecInsert, ExecDelete + * changes it with target partition anyway. Moreover, without + * inheritance plan (ORCA never builds such plans) the + * es_result_relations will contain the only relation. + */ + node->ps.state->es_result_relation_info = relInfo; + } /* GPDB_91_MERGE_FIXME: * This kind of node is used by ORCA only. If in the future ORCA still uses * DML node, canSetTag should be saved in DML plan node and init-ed by diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c index ee2e32f9fa60..325af303ad0c 100644 --- a/src/backend/executor/nodeHash.c +++ b/src/backend/executor/nodeHash.c @@ -62,6 +62,8 @@ ExecHashTableExplainBatches(HashJoinTable hashtable, int ibatch_end, const char *title); +static inline void ResetWorkFileSetStatsInfo(HashJoinTable hashtable); + /* ---------------------------------------------------------------- * ExecHash * @@ -340,6 +342,8 @@ ExecHashTableCreate(HashState *hashState, HashJoinState *hjstate, List *hashOper hashtable->hjstate = hjstate; hashtable->first_pass = true; + ResetWorkFileSetStatsInfo(hashtable); + /* * Create temporary memory contexts in which to keep the hashtable working * storage. See notes in executor/hashjoin.h. 
@@ -1502,6 +1506,16 @@ ExecHashTableExplainEnd(PlanState *planstate, struct StringInfoData *buf) hashtable->nbatch_outstart, hashtable->nbatch, "Secondary Overflow"); + + appendStringInfo(buf, + "Work file set: %u files (%u compressed), " + "avg file size %lu, " + "compression buffer size %lu bytes \n", + hashtable->workset_num_files, + hashtable->workset_num_files_compressed, + hashtable->workset_avg_file_size, + hashtable->workset_compression_buf_total); + ResetWorkFileSetStatsInfo(hashtable); } /* Report hash chain statistics. */ @@ -2099,3 +2113,11 @@ ExecHashRemoveNextSkewBucket(HashState *hashState, HashJoinTable hashtable) hashtable->spaceUsedSkew = 0; } } + +static inline void ResetWorkFileSetStatsInfo(HashJoinTable hashtable) +{ + hashtable->workset_num_files = 0; + hashtable->workset_num_files_compressed = 0; + hashtable->workset_avg_file_size = 0; + hashtable->workset_compression_buf_total = 0; +} diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c index d65daee42ba0..bb0451d2805d 100644 --- a/src/backend/executor/nodeHashjoin.c +++ b/src/backend/executor/nodeHashjoin.c @@ -63,6 +63,8 @@ static void SpillCurrentBatch(HashJoinState *node); static bool ExecHashJoinReloadHashTable(HashJoinState *hjstate); static void ExecEagerFreeHashJoin(HashJoinState *node); +static inline void SaveWorkFileSetStatsInfo(HashJoinTable hashtable); + /* ---------------------------------------------------------------- * ExecHashJoin * @@ -287,6 +289,16 @@ ExecHashJoin_guts(HashJoinState *node) } else node->hj_JoinState = HJ_NEED_NEW_BATCH; + + /* + * When all the tuples of the outer table have been read + * and we are ready to process the first batch, it is + * a good time to collect statistics on all temp + * files.
*/ + if (hashtable->curbatch == 0) + SaveWorkFileSetStatsInfo(hashtable); + continue; } @@ -1480,4 +1492,16 @@ ExecHashJoinReloadHashTable(HashJoinState *hjstate) return true; } +static inline void SaveWorkFileSetStatsInfo(HashJoinTable hashtable) +{ + workfile_set *work_set = hashtable->work_set; + if (work_set) + { + hashtable->workset_num_files = work_set->num_files; + hashtable->workset_num_files_compressed = work_set->num_files_compressed; + hashtable->workset_avg_file_size = work_set->num_files > 0 ? work_set->total_bytes / work_set->num_files : 0; + hashtable->workset_compression_buf_total = work_set->compression_buf_total; + } +} + /* EOF */ diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 3cba0ee49db4..db983735af40 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -314,6 +314,55 @@ ExecInsert(TupleTableSlot *parentslot, rel_is_aorows = RelationIsAoRows(resultRelationDesc); rel_is_external = RelationIsExternal(resultRelationDesc); + /* + * If the GUC gp_detect_data_correctness is set to true, we only verify that the data belongs + * to the current partition and segment; the data is not actually inserted, so just return NULL. + * + * Partition correctness has already been checked above, so we only need to check distribution + * correctness.
+ */ + if (gp_detect_data_correctness) + { + /* Initialize hash function and structure */ + CdbHash *hash; + GpPolicy *policy = resultRelationDesc->rd_cdbpolicy; + MemTuple memTuple = ExecFetchSlotMemTuple(parentslot); + + /* Skip randomly distributed and replicated relations */ + if (!GpPolicyIsHashPartitioned(policy)) + return NULL; + + hash = makeCdbHashForRelation(resultRelationDesc); + + cdbhashinit(hash); + + /* Add every attribute in the distribution policy to the hash */ + for (int i = 0; i < policy->nattrs; i++) + { + int attnum = policy->attrs[i]; + bool isNull; + Datum attr; + + attr = memtuple_getattr(memTuple, parentslot->tts_mt_bind, + attnum, &isNull); + + cdbhash(hash, i + 1, attr, isNull); + } + + /* Error out if the tuple does not belong to this segment */ + if (cdbhashreduce(hash) != GpIdentity.segindex) + { + ereport(ERROR, + (errcode(ERRCODE_CHECK_VIOLATION), + errmsg("trying to insert row into wrong segment"))); + } + + freeCdbHash(hash); + + /* Do not actually insert the tuple */ + return NULL; + } + /* * Prepare the right kind of "insert desc". */ @@ -1132,7 +1181,6 @@ checkPartitionUpdate(EState *estate, TupleTableSlot *partslot, Datum *values = NULL; bool *nulls = NULL; TupleDesc tupdesc = NULL; - Oid parentRelid; Oid targetid; Assert(estate->es_partition_state != NULL && @@ -1144,81 +1192,12 @@ checkPartitionUpdate(EState *estate, TupleTableSlot *partslot, Assert(PointerIsValid(estate->es_result_partitions)); /* - * As opposed to INSERT, resultRelation here is the same child part - * as scan origin. However, the partition selection is done with the - * parent partition's attribute numbers, so if this result (child) part - * has physically-different attribute numbers due to dropped columns, - * we should map the child attribute numbers to the parent's attribute - * numbers to perform the partition selection. - * EState doesn't have the parent relation information at the moment, - * so we have to do a hard job here by opening it and compare the - * tuple descriptors.
If we find we need to map attribute numbers, - * max_partition_attr could also be bogus for this child part, - * so we end up materializing the whole columns using slot_getallattrs(). - * The purpose of this code is just to prevent the tuple from - * incorrectly staying in default partition that has no constraint - * (parts with constraint will throw an error if the tuple is changing - * partition keys to out of part value anyway.) It's a bit overkill - * to do this complicated logic just for this purpose, which is necessary - * with our current partitioning design, but I hope some day we can - * change this so that we disallow phyisically-different tuple descriptor - * across partition. + * If we find we need to map attribute numbers (i.e., if the child part has + * physically-different attribute numbers from the parent's; the mapping is + * performed inside the makePartitionCheckMap function), + * max_partition_attr could also be bogus for this child part, so we end + * up materializing all the columns using slot_getallattrs(). */ - parentRelid = estate->es_result_partitions->part->parrelid; - - /* - * I don't believe this is the case currently, but we check the parent relid - * in case the updating partition has changed since the last time we opened it. - */ - if (resultRelInfo->ri_PartitionParent && - parentRelid != RelationGetRelid(resultRelInfo->ri_PartitionParent)) - { - resultRelInfo->ri_PartCheckTupDescMatch = 0; - if (resultRelInfo->ri_PartCheckMap != NULL) - pfree(resultRelInfo->ri_PartCheckMap); - if (resultRelInfo->ri_PartitionParent) - relation_close(resultRelInfo->ri_PartitionParent, AccessShareLock); - } - - /* - * Check this at the first pass only to avoid repeated catalog access.
*/ - if (resultRelInfo->ri_PartCheckTupDescMatch == 0 && - parentRelid != RelationGetRelid(resultRelInfo->ri_RelationDesc)) - { - Relation parentRel; - TupleDesc resultTupdesc, parentTupdesc; - - /* - * We are on a child part, let's see the tuple descriptor looks like - * the parent's one. Probably this won't cause deadlock because - * DML should have opened the parent table with appropriate lock. - */ - parentRel = relation_open(parentRelid, AccessShareLock); - resultTupdesc = RelationGetDescr(resultRelationDesc); - parentTupdesc = RelationGetDescr(parentRel); - if (!equalTupleDescs(resultTupdesc, parentTupdesc, false)) - { - AttrMap *map; - MemoryContext oldcontext; - - /* Tuple looks different. Construct attribute mapping. */ - oldcontext = MemoryContextSwitchTo(estate->es_query_cxt); - map_part_attrs(resultRelationDesc, parentRel, &map, true); - MemoryContextSwitchTo(oldcontext); - - /* And save it for later use. */ - resultRelInfo->ri_PartCheckMap = map; - - resultRelInfo->ri_PartCheckTupDescMatch = -1; - } - else - resultRelInfo->ri_PartCheckTupDescMatch = 1; - - resultRelInfo->ri_PartitionParent = parentRel; - /* parentRel will be closed as part of ResultRelInfo cleanup */ - } - if (resultRelInfo->ri_PartCheckMap != NULL) { Datum *parent_values; @@ -1927,6 +1906,19 @@ ExecModifyTable(ModifyTableState *node) slot = ExecFilterJunk(junkfilter, slot); } + /* + * We have to ensure that partition selection in INSERT or UPDATE will + * treat the leaf partition's attributes as coherent with the root + * partition's attribute numbers, because partition selection is + * performed using the root's attribute numbers (all partition rules are + * based on the parent relation's tuple descriptor). When a + * child partition has different attribute numbers from the parent's + * due to dropped columns, partition selection may go wrong without + * extra validation.
+ */ + if (operation != CMD_DELETE && estate->es_result_partitions) + makePartitionCheckMap(estate, estate->es_result_relation_info); + switch (operation) { case CMD_INSERT: @@ -2506,3 +2498,87 @@ ExecSquelchModifyTable(ModifyTableState *node) break; } } + +/* + * Build an attribute mapping between a child partition and the root partition + * in case the child partition has physically-different attribute numbers + * from the root's due to dropped columns. + */ +void +makePartitionCheckMap(EState *estate, ResultRelInfo *resultRelInfo) +{ + Relation resultRelationDesc = resultRelInfo->ri_RelationDesc; + Oid parentRelid; + + Assert(PointerIsValid(estate->es_result_partitions)); + + /* + * The partition selection operation is done with the parent partition's + * attribute numbers, so if a child partition has physically-different + * attribute numbers due to dropped columns, we should map the child + * attribute numbers to the parent's attribute numbers to perform the + * partition selection. EState may not have the parent relation + * information at the moment, so we have to do the work here by opening + * it and comparing the tuple descriptors. The purpose of this code is to + * prevent the tuple from being incorrectly interpreted during partition + * selection, which can be performed in ExecInsert, ExecDelete and + * checkPartitionUpdate functions when we work with a leaf partition as + * the result relation. + */ + parentRelid = estate->es_result_partitions->part->parrelid; + + /* + * I don't believe this is the case currently, but we check the parent + * relid in case the updating partition has changed since the last time we + * opened it.
+ */ + if (resultRelInfo->ri_PartitionParent && + parentRelid != RelationGetRelid(resultRelInfo->ri_PartitionParent)) + { + resultRelInfo->ri_PartCheckTupDescMatch = 0; + if (resultRelInfo->ri_PartCheckMap != NULL) + pfree(resultRelInfo->ri_PartCheckMap); + if (resultRelInfo->ri_PartitionParent) + relation_close(resultRelInfo->ri_PartitionParent, AccessShareLock); + } + + /* + * Check this at the first pass only to avoid repeated catalog access. + */ + if (resultRelInfo->ri_PartCheckTupDescMatch == 0 && + parentRelid != RelationGetRelid(resultRelInfo->ri_RelationDesc)) + { + Relation parentRel; + TupleDesc resultTupdesc, + parentTupdesc; + + /* + * We are on a child part; check whether its tuple descriptor matches + * the parent's. This probably won't cause a deadlock because DML + * should have opened the parent table with an appropriate lock. + */ + parentRel = relation_open(parentRelid, AccessShareLock); + resultTupdesc = RelationGetDescr(resultRelationDesc); + parentTupdesc = RelationGetDescr(parentRel); + if (!equalTupleDescs(resultTupdesc, parentTupdesc, false)) + { + AttrMap *map; + MemoryContext oldcontext; + + /* Tuple looks different. Construct attribute mapping. */ + oldcontext = MemoryContextSwitchTo(estate->es_query_cxt); + map_part_attrs(resultRelationDesc, parentRel, &map, true); + MemoryContextSwitchTo(oldcontext); + + /* And save it for later use.
*/ + resultRelInfo->ri_PartCheckMap = map; + + resultRelInfo->ri_PartCheckTupDescMatch = -1; + } + else + resultRelInfo->ri_PartCheckTupDescMatch = 1; + + resultRelInfo->ri_PartitionParent = parentRel; + /* parentRel will be closed as part of ResultRelInfo cleanup */ + } +} diff --git a/src/backend/gpopt/gpdbwrappers.cpp b/src/backend/gpopt/gpdbwrappers.cpp index 8f3ed749997b..f09c652aa8a5 100644 --- a/src/backend/gpopt/gpdbwrappers.cpp +++ b/src/backend/gpopt/gpdbwrappers.cpp @@ -31,6 +31,7 @@ extern "C" { #include "catalog/pg_collation.h" #include "utils/memutils.h" +#include "utils/snapmgr.h" } #define GP_WRAP_START \ sigjmp_buf local_sigjmp_buf; \ @@ -2509,6 +2510,13 @@ static bool mdcache_invalidation_counter_registered = false; static int64 mdcache_invalidation_counter = 0; static int64 last_mdcache_invalidation_counter = 0; +// If we have cached a relation without an index because that index cannot +// be used in the current snapshot (for more info see +// src/backend/access/heap/README.HOT), we save TransactionXmin. If +// TransactionXmin changes later, the cache will be reset and the relation will +// be reloaded with that index. +static TransactionId mdcache_transaction_xmin = InvalidTransactionId; + static void mdsyscache_invalidation_counter_callback(Datum arg, int cacheid, uint32 hashvalue) @@ -2590,7 +2598,8 @@ register_mdcache_invalidation_callbacks(void) (Datum) 0); } -// Has there been any catalog changes since last call? +// We reset the cache in case of a catalog change or if TransactionXmin differs +// from the value saved in mdcache_transaction_xmin.
bool gpdb::MDCacheNeedsReset(void) { @@ -2602,7 +2611,11 @@ gpdb::MDCacheNeedsReset(void) mdcache_invalidation_counter_registered = true; } if (last_mdcache_invalidation_counter == mdcache_invalidation_counter) - return false; + { + return TransactionIdIsValid(mdcache_transaction_xmin) && + !TransactionIdEquals(TransactionXmin, + mdcache_transaction_xmin); + } else { last_mdcache_invalidation_counter = mdcache_invalidation_counter; @@ -2614,6 +2627,42 @@ gpdb::MDCacheNeedsReset(void) return true; } +bool +gpdb::MDCacheSetTransientState(Relation index_rel) +{ + GP_WRAP_START; + { + bool result = + index_rel->rd_index->indcheckxmin && + !TransactionIdPrecedes( + HeapTupleHeaderGetXmin(index_rel->rd_indextuple->t_data), + TransactionXmin); + if (result) + mdcache_transaction_xmin = TransactionXmin; + return result; + } + GP_WRAP_END; + // ignore the index if we can't check its visibility for some reason + return true; +} + +void +gpdb::MDCacheResetTransientState(void) +{ + mdcache_transaction_xmin = InvalidTransactionId; +} + +bool +gpdb::MDCacheInTransientState(void) +{ + GP_WRAP_START; + { + return TransactionIdIsValid(mdcache_transaction_xmin); + } + GP_WRAP_END; + return false; +} + // returns true if a query cancel is requested in GPDB bool gpdb::IsAbortRequested(void) diff --git a/src/backend/gpopt/translate/CDXLTranslateContext.cpp b/src/backend/gpopt/translate/CDXLTranslateContext.cpp index 7c870054c343..6ccc12d6abcb 100644 --- a/src/backend/gpopt/translate/CDXLTranslateContext.cpp +++ b/src/backend/gpopt/translate/CDXLTranslateContext.cpp @@ -27,8 +27,9 @@ using namespace gpos; // //--------------------------------------------------------------------------- CDXLTranslateContext::CDXLTranslateContext(CMemoryPool *mp, - BOOL is_child_agg_node) - : m_mp(mp), m_is_child_agg_node(is_child_agg_node) + BOOL is_child_agg_node, + const Query *query) + : m_mp(mp), m_is_child_agg_node(is_child_agg_node), m_query(query) { // initialize hash table m_colid_to_target_entry_map =
GPOS_NEW(m_mp) ULongToTargetEntryMap(m_mp); @@ -46,7 +47,7 @@ CDXLTranslateContext::CDXLTranslateContext(CMemoryPool *mp, CDXLTranslateContext::CDXLTranslateContext(CMemoryPool *mp, BOOL is_child_agg_node, ULongToColParamMap *original) - : m_mp(mp), m_is_child_agg_node(is_child_agg_node) + : m_mp(mp), m_is_child_agg_node(is_child_agg_node), m_query(NULL) { m_colid_to_target_entry_map = GPOS_NEW(m_mp) ULongToTargetEntryMap(m_mp); m_colid_to_paramid_map = GPOS_NEW(m_mp) ULongToColParamMap(m_mp); diff --git a/src/backend/gpopt/translate/CQueryMutators.cpp b/src/backend/gpopt/translate/CQueryMutators.cpp index 37cd1dc34411..689ab6440712 100644 --- a/src/backend/gpopt/translate/CQueryMutators.cpp +++ b/src/backend/gpopt/translate/CQueryMutators.cpp @@ -689,14 +689,12 @@ CQueryMutators::RunExtractAggregatesMutator(Node *node, { if (var->varlevelsup >= context->m_agg_levels_up) { - // We previously started to mutate the Aggref, that references - // the top level query. This Aggref is going to be moved to the - // derived query (see comments in Aggref if-case above). - // Therefore, if we are mutating Vars inside the Aggref, and - // these Vars reference the top level query (varlevelsup = m_current_query_level) - // as well, we must change their varlevelsup field in order to preserve - // correct reference level. i.e these Vars are pulled up as the part of - // the Aggref by the m_agg_levels_up. + // If a Var references the top level query (varlevelsup = m_current_query_level) + // inside an Aggref that also references the top level query, the Aggref is moved + // to the derived query (see comments in Aggref if-case above). + // Therefore, if we are mutating such Vars inside the Aggref, we must + // change their varlevelsup field in order to preserve the correct reference level, + // i.e., these Vars are pulled up as part of the Aggref by m_agg_levels_up.
// e.g: // select (select max((select foo.a))) from foo; // is transformed into @@ -705,7 +703,17 @@ CQueryMutators::RunExtractAggregatesMutator(Node *node, // Here the foo.a inside max referenced top level RTE foo at // varlevelsup = 2 inside the Aggref at agglevelsup 1. Then the // Aggref is brought up to the top-query-level of fnew and foo.a - // inside Aggref is decreased by original Aggref's level. + // inside Aggref is bumped up by original Aggref's level. + // We may visualize that logic with the following diagram: + // Query <------┐ <--------------------┐ + // | | + // | m_agg_levels_up = 1 | + // | | + // Aggref --┘ | varlevelsup = 2 + // | + // | + // | + // Var -------------------------┘ var->varlevelsup -= context->m_agg_levels_up; return (Node *) var; } @@ -817,6 +825,8 @@ CQueryMutators::RunExtractAggregatesMutator(Node *node, if (IsA(node, Query)) { + // Mutate Query tree and ignore rtable subqueries in order to modify + // m_current_query_level properly when mutating them below. 
Query *query = gpdb::MutateQueryTree( (Query *) node, (MutatorWalkerFn) RunExtractAggregatesMutator, context, QTW_IGNORE_RT_SUBQUERIES); diff --git a/src/backend/gpopt/translate/CTranslatorDXLToPlStmt.cpp b/src/backend/gpopt/translate/CTranslatorDXLToPlStmt.cpp index 1cc796434dac..46d35e256c99 100644 --- a/src/backend/gpopt/translate/CTranslatorDXLToPlStmt.cpp +++ b/src/backend/gpopt/translate/CTranslatorDXLToPlStmt.cpp @@ -210,11 +210,12 @@ CTranslatorDXLToPlStmt::InitTranslators() //--------------------------------------------------------------------------- PlannedStmt * CTranslatorDXLToPlStmt::GetPlannedStmtFromDXL(const CDXLNode *dxlnode, + const Query *orig_query, bool can_set_tag) { GPOS_ASSERT(NULL != dxlnode); - CDXLTranslateContext dxl_translate_ctxt(m_mp, false); + CDXLTranslateContext dxl_translate_ctxt(m_mp, false, orig_query); CDXLTranslationContextArray *ctxt_translation_prev_siblings = GPOS_NEW(m_mp) CDXLTranslationContextArray(m_mp); @@ -307,6 +308,8 @@ CTranslatorDXLToPlStmt::GetPlannedStmtFromDXL(const CDXLNode *dxlnode, } } + planned_stmt->transientPlan = gpdb::MDCacheInTransientState(); + return planned_stmt; } @@ -4724,6 +4727,51 @@ CTranslatorDXLToPlStmt::TranslateDXLTblDescrToRangeTblEntry( return rte; } +//--------------------------------------------------------------------------- +// @function: +// update_unknown_locale_walker +// +// @doc: +// Given an expression tree and a TargetEntry pointer context, look for a +// matching target entry in the expression tree and overwrite the given +// TargetEntry context's resname with the original found in the expression +// tree. 
+// +//--------------------------------------------------------------------------- +static bool +update_unknown_locale_walker(Node *node, void *context) +{ + if (node == NULL) + { + return false; + } + + TargetEntry *unknown_target_entry = (TargetEntry *) context; + + if (IsA(node, TargetEntry)) + { + TargetEntry *te = (TargetEntry *) node; + + if (te->resorigtbl == unknown_target_entry->resorigtbl && + te->resno == unknown_target_entry->resno) + { + unknown_target_entry->resname = te->resname; + return false; + } + } + else if (IsA(node, Query)) + { + Query *query = (Query *) node; + + return gpdb::WalkExpressionTree( + (Node *) query->targetList, + (bool (*)()) update_unknown_locale_walker, (void *) context); + } + + return gpdb::WalkExpressionTree( + node, (bool (*)()) update_unknown_locale_walker, (void *) context); +} + //--------------------------------------------------------------------------- // @function: // CTranslatorDXLToPlStmt::TranslateDXLProjList @@ -4828,6 +4876,21 @@ CTranslatorDXLToPlStmt::TranslateDXLProjList( } target_entry->resorigtbl = pteOriginal->resorigtbl; target_entry->resorigcol = pteOriginal->resorigcol; + + // ORCA represents strings using wide characters. That can + // require converting from multibyte characters using + // vswprintf(). However, vswprintf() is dependent on the system + // locale which is set at the database level. When that locale + // cannot interpret the string correctly, it fails. ORCA + // bypasses the failure by using a generic "UNKNOWN" string. + // When that happens, the following code translates it back to + // the original multibyte string. 
+ if (strcmp(target_entry->resname, "UNKNOWN") == 0) + { + update_unknown_locale_walker( + (Node *) output_context->GetQuery(), + (void *) target_entry); + } } } diff --git a/src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp b/src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp index 9e8f7acaff93..3ef6c8ba5d61 100644 --- a/src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp +++ b/src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp @@ -308,7 +308,11 @@ CTranslatorRelcacheToDXL::RetrieveRelIndexInfoForPartTable(CMemoryPool *mp, GPOS_TRY { - if (IsIndexSupported(index_rel)) + // If the index is supported, but cannot yet be used, ignore it; but + // mark the plan we are generating and cache as transient. + // See src/backend/access/heap/README.HOT for discussion. + if (IsIndexSupported(index_rel) && + !gpdb::MDCacheSetTransientState(index_rel)) { CMDIdGPDB *mdid_index = GPOS_NEW(mp) CMDIdGPDB(IMDId::EmdidInd, index_oid); @@ -364,7 +368,11 @@ CTranslatorRelcacheToDXL::RetrieveRelIndexInfoForNonPartTable(CMemoryPool *mp, GPOS_TRY { - if (IsIndexSupported(index_rel)) + // If the index is supported, but cannot yet be used, ignore it; but + // mark the plan we are generating and cache as transient. + // See src/backend/access/heap/README.HOT for discussion. 
+ if (IsIndexSupported(index_rel) && + !gpdb::MDCacheSetTransientState(index_rel)) { CMDIdGPDB *mdid_index = GPOS_NEW(mp) CMDIdGPDB(IMDId::EmdidInd, index_oid); diff --git a/src/backend/gpopt/utils/COptTasks.cpp b/src/backend/gpopt/utils/COptTasks.cpp index f435ece2c273..7b7564172d6c 100644 --- a/src/backend/gpopt/utils/COptTasks.cpp +++ b/src/backend/gpopt/utils/COptTasks.cpp @@ -285,8 +285,9 @@ COptTasks::LogExceptionMessageAndDelete(CHAR *err_buf, ULONG severity_level) //--------------------------------------------------------------------------- PlannedStmt * COptTasks::ConvertToPlanStmtFromDXL( - CMemoryPool *mp, CMDAccessor *md_accessor, const CDXLNode *dxlnode, - bool can_set_tag, DistributionHashOpsKind distribution_hashops) + CMemoryPool *mp, CMDAccessor *md_accessor, const Query *orig_query, + const CDXLNode *dxlnode, bool can_set_tag, + DistributionHashOpsKind distribution_hashops) { GPOS_ASSERT(NULL != md_accessor); GPOS_ASSERT(NULL != dxlnode); @@ -305,8 +306,8 @@ COptTasks::ConvertToPlanStmtFromDXL( // translate DXL -> PlannedStmt CTranslatorDXLToPlStmt dxl_to_plan_stmt_translator( mp, md_accessor, &dxl_to_plan_stmt_ctxt, gpdb::GetGPSegmentCount()); - return dxl_to_plan_stmt_translator.GetPlannedStmtFromDXL(dxlnode, - can_set_tag); + return dxl_to_plan_stmt_translator.GetPlannedStmtFromDXL( + dxlnode, orig_query, can_set_tag); } @@ -497,11 +498,13 @@ COptTasks::OptimizeTask(void *ptr) { CMDCache::Init(); CMDCache::SetCacheQuota(optimizer_mdcache_size * 1024L); + gpdb::MDCacheResetTransientState(); } else if (reset_mdcache) { CMDCache::Reset(); CMDCache::SetCacheQuota(optimizer_mdcache_size * 1024L); + gpdb::MDCacheResetTransientState(); } else if (CMDCache::ULLGetCacheQuota() != (ULLONG) optimizer_mdcache_size * 1024L) @@ -603,7 +606,8 @@ COptTasks::OptimizeTask(void *ptr) // that may not have the correct can_set_tag opt_ctxt->m_plan_stmt = (PlannedStmt *) gpdb::CopyObject(ConvertToPlanStmtFromDXL( - mp, &mda, plan_dxl, opt_ctxt->m_query->canSetTag, + 
mp, &mda, opt_ctxt->m_query, plan_dxl, + opt_ctxt->m_query->canSetTag, query_to_dxl_translator->GetDistributionHashOpsKind())); } diff --git a/src/backend/gporca/data/dxl/minidump/JoinOnReplicatedUniversal.mdp b/src/backend/gporca/data/dxl/minidump/JoinOnReplicatedUniversal.mdp new file mode 100644 index 000000000000..cef2133ef16b --- /dev/null +++ b/src/backend/gporca/data/dxl/minidump/JoinOnReplicatedUniversal.mdp @@ -0,0 +1,294 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/backend/gporca/data/dxl/minidump/SubqueryOuterRefTVF.mdp b/src/backend/gporca/data/dxl/minidump/SubqueryOuterRefTVF.mdp index 3f1c092385db..edabce95f90c 100644 --- a/src/backend/gporca/data/dxl/minidump/SubqueryOuterRefTVF.mdp +++ b/src/backend/gporca/data/dxl/minidump/SubqueryOuterRefTVF.mdp @@ -274,7 +274,7 @@ - + @@ -290,7 +290,7 @@ - + diff --git a/src/backend/gporca/libgpopt/src/operators/CLogicalRowTrigger.cpp b/src/backend/gporca/libgpopt/src/operators/CLogicalRowTrigger.cpp index 01cb44d1a59b..7793a9816527 100644 --- a/src/backend/gporca/libgpopt/src/operators/CLogicalRowTrigger.cpp +++ b/src/backend/gporca/libgpopt/src/operators/CLogicalRowTrigger.cpp @@ -33,7 +33,8 @@ CLogicalRowTrigger::CLogicalRowTrigger(CMemoryPool *mp) m_rel_mdid(NULL), m_type(0), m_pdrgpcrOld(NULL), - m_pdrgpcrNew(NULL) + m_pdrgpcrNew(NULL), + m_efs(IMDFunction::EfsImmutable) { m_fPattern = true; } @@ -53,7 +54,8 @@ CLogicalRowTrigger::CLogicalRowTrigger(CMemoryPool *mp, 
IMDId *rel_mdid, m_rel_mdid(rel_mdid), m_type(type), m_pdrgpcrOld(pdrgpcrOld), - m_pdrgpcrNew(pdrgpcrNew) + m_pdrgpcrNew(pdrgpcrNew), + m_efs(IMDFunction::EfsImmutable) { GPOS_ASSERT(rel_mdid->IsValid()); GPOS_ASSERT(0 != type); diff --git a/src/backend/gporca/libgpopt/src/operators/CPhysicalJoin.cpp b/src/backend/gporca/libgpopt/src/operators/CPhysicalJoin.cpp index b92ce281a8f5..3267296e50aa 100644 --- a/src/backend/gporca/libgpopt/src/operators/CPhysicalJoin.cpp +++ b/src/backend/gporca/libgpopt/src/operators/CPhysicalJoin.cpp @@ -458,11 +458,13 @@ CPhysicalJoin::PdsDerive(CMemoryPool *mp, CExpressionHandle &exprhdl) const CDistributionSpec *pds; - if (CDistributionSpec::EdtStrictReplicated == pdsOuter->Edt() || - CDistributionSpec::EdtTaintedReplicated == pdsOuter->Edt() || - CDistributionSpec::EdtUniversal == pdsOuter->Edt()) + if ((CDistributionSpec::EdtStrictReplicated == pdsOuter->Edt() || + CDistributionSpec::EdtTaintedReplicated == pdsOuter->Edt() || + CDistributionSpec::EdtUniversal == pdsOuter->Edt()) && + CDistributionSpec::EdtUniversal != pdsInner->Edt()) { - // if outer is replicated/universal, return inner distribution + // if outer is replicated/universal and inner is not universal + // then return inner distribution pds = pdsInner; } else diff --git a/src/backend/gporca/libgpopt/src/xforms/CXformJoin2IndexApplyGeneric.cpp b/src/backend/gporca/libgpopt/src/xforms/CXformJoin2IndexApplyGeneric.cpp index 4221f74e0907..83d5094b1d12 100644 --- a/src/backend/gporca/libgpopt/src/xforms/CXformJoin2IndexApplyGeneric.cpp +++ b/src/backend/gporca/libgpopt/src/xforms/CXformJoin2IndexApplyGeneric.cpp @@ -266,9 +266,12 @@ CXformJoin2IndexApplyGeneric::Transform(CXformContext *pxfctxt, pexprGet = pexprCurrInnerChild; if (NULL != groupingColsToCheck.Value() && - !groupingColsToCheck->ContainsAll(distributionCols)) + (!groupingColsToCheck->ContainsAll(distributionCols) || + ptabdescInner->GetRelDistribution() == + IMDRelation::EreldistrRandom)) { - // the grouping 
columns are not a superset of the distribution columns + // the grouping columns are not a superset of the distribution columns, + // or distribution columns are empty when the table is randomly distributed return; } } @@ -281,6 +284,16 @@ CXformJoin2IndexApplyGeneric::Transform(CXformContext *pxfctxt, ptabdescInner = popDynamicGet->Ptabdesc(); distributionCols = popDynamicGet->PcrsDist(); pexprGet = pexprCurrInnerChild; + + if (NULL != groupingColsToCheck.Value() && + (!groupingColsToCheck->ContainsAll(distributionCols) || + ptabdescInner->GetRelDistribution() == + IMDRelation::EreldistrRandom)) + { + // the grouping columns are not a superset of the distribution columns, + // or distribution columns are empty when the table is randomly distributed + return; + } } break; diff --git a/src/backend/gporca/libgpos/include/gpos/error/CException.h b/src/backend/gporca/libgpos/include/gpos/error/CException.h index d616de5415ec..83d2ba9cd3cd 100644 --- a/src/backend/gporca/libgpos/include/gpos/error/CException.h +++ b/src/backend/gporca/libgpos/include/gpos/error/CException.h @@ -135,9 +135,6 @@ class CException // unknown exception ExmiUnhandled, - // illegal byte sequence - ExmiIllegalByteSequence, - ExmiSentinel }; diff --git a/src/backend/gporca/libgpos/server/src/unittest/gpos/string/CWStringTest.cpp b/src/backend/gporca/libgpos/server/src/unittest/gpos/string/CWStringTest.cpp index a216a82160b6..483b8a47e07f 100644 --- a/src/backend/gporca/libgpos/server/src/unittest/gpos/string/CWStringTest.cpp +++ b/src/backend/gporca/libgpos/server/src/unittest/gpos/string/CWStringTest.cpp @@ -178,30 +178,23 @@ CWStringTest::EresUnittest_AppendFormatInvalidLocale() CAutoMemoryPool amp(CAutoMemoryPool::ElcExc); CMemoryPool *mp = amp.Pmp(); + CWStringDynamic *expected = + GPOS_NEW(mp) CWStringDynamic(mp, GPOS_WSZ_LIT("UNKNOWN")); + CHAR *oldLocale = setlocale(LC_CTYPE, NULL); CWStringDynamic *pstr1 = GPOS_NEW(mp) CWStringDynamic(mp); GPOS_RESULT eres = GPOS_OK; setlocale(LC_CTYPE, 
"C"); - GPOS_TRY - { - pstr1->AppendFormat(GPOS_WSZ_LIT("%s"), (CHAR *) "ÃË", 123); - - eres = GPOS_FAILED; - } - GPOS_CATCH_EX(ex) - { - GPOS_ASSERT(GPOS_MATCH_EX(ex, CException::ExmaSystem, - CException::ExmiIllegalByteSequence)); + pstr1->AppendFormat(GPOS_WSZ_LIT("%s"), (CHAR *) "ÃË", 123); - GPOS_RESET_EX; - } - GPOS_CATCH_END; + eres = pstr1->Equals(expected) ? GPOS_OK : GPOS_FAILED; // cleanup setlocale(LC_CTYPE, oldLocale); GPOS_DELETE(pstr1); + GPOS_DELETE(expected); return eres; } diff --git a/src/backend/gporca/libgpos/src/common/clibwrapper.cpp b/src/backend/gporca/libgpos/src/common/clibwrapper.cpp index f5dabb720a68..dc516515e1c5 100644 --- a/src/backend/gporca/libgpos/src/common/clibwrapper.cpp +++ b/src/backend/gporca/libgpos/src/common/clibwrapper.cpp @@ -358,7 +358,11 @@ gpos::clib::Vswprintf(WCHAR *wcstr, SIZE_T max_len, const WCHAR *format, { // Invalid multibyte character encountered. This can happen if the byte sequence does not // match with the server encoding. - GPOS_RAISE(CException::ExmaSystem, CException::ExmiIllegalByteSequence); + // + // Rather than failing here, ORCA substitutes a generic "UNKNOWN" + // string.
During DXL to PlStmt translation this will be translated + // back using the original query tree (see TranslateDXLProjList) + res = swprintf(wcstr, max_len, format, "UNKNOWN"); } return res; diff --git a/src/backend/gporca/libgpos/src/error/CMessage.cpp b/src/backend/gporca/libgpos/src/error/CMessage.cpp index 34d6a2b568d2..d9337dc564c8 100644 --- a/src/backend/gporca/libgpos/src/error/CMessage.cpp +++ b/src/backend/gporca/libgpos/src/error/CMessage.cpp @@ -272,16 +272,6 @@ CMessage::GetMessage(ULONG index) CException(CException::ExmaUnhandled, CException::ExmiUnhandled), CException::ExsevError, GPOS_WSZ_WSZLEN("Unhandled exception"), 0, GPOS_WSZ_WSZLEN("Unhandled exception")), - - CMessage( - CException(CException::ExmaSystem, - CException::ExmiIllegalByteSequence), - CException::ExsevError, - GPOS_WSZ_WSZLEN( - "Invalid multibyte character for locale encountered in metadata name"), - 0, - GPOS_WSZ_WSZLEN( - "Invalid multibyte character for locale encountered in metadata name")), }; return &msg[index]; diff --git a/src/backend/gporca/server/CMakeLists.txt b/src/backend/gporca/server/CMakeLists.txt index 1fb38754c06b..eb9f9677d1e1 100644 --- a/src/backend/gporca/server/CMakeLists.txt +++ b/src/backend/gporca/server/CMakeLists.txt @@ -336,7 +336,8 @@ ReplicatedJoinRandomDistributedTable ReplicatedLOJHashDistributedTable ReplicatedLOJRandomDistributedTable ReplicatedLOJReplicated ReplicatedNLJReplicated ReplicatedTableAggregate ReplicatedTableCTE ReplicatedTableGroupBy ReplicatedJoinPartitionedTable -ReplicatedTableInClause ReplicatedTableSequenceInsert; +ReplicatedTableInClause ReplicatedTableSequenceInsert +JoinOnReplicatedUniversal; CTaintedReplicatedTest: InsertNonSingleton NonSingleton TaintedReplicatedAgg TaintedReplicatedWindowAgg TaintedReplicatedLimit TaintedReplicatedFilter diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 4319d5973827..ab8610d5fe6e 100644 --- a/src/backend/optimizer/path/allpaths.c +++ 
b/src/backend/optimizer/path/allpaths.c @@ -1717,21 +1717,6 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel, /* XXX rel->onerow = ??? */ } - if (rel->subplan->flow->locustype == CdbLocusType_General && - (contain_volatile_functions((Node *) rel->subplan->targetlist) || - contain_volatile_functions(subquery->havingQual))) - { - rel->subplan->flow->locustype = CdbLocusType_SingleQE; - rel->subplan->flow->flotype = FLOW_SINGLETON; - } - - if (rel->subplan->flow->locustype == CdbLocusType_SegmentGeneral && - (contain_volatile_functions((Node *) rel->subplan->targetlist) || - contain_volatile_functions(subquery->havingQual))) - { - rel->subplan = (Plan *) make_motion_gather(subroot, rel->subplan, NIL, CdbLocusType_SingleQE); - } - rel->subroot = subroot; /* Isolate the params needed by this specific subplan */ @@ -2104,7 +2089,7 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) * subplan will not be used by InitPlans, so that they can be shared * if this CTE is referenced multiple times (excluding in InitPlans). */ - if (cteplaninfo->shared_plan == NULL) + if (cteplaninfo->subplan == NULL) { PlannerConfig *config = CopyPlannerConfig(root->config); @@ -2125,15 +2110,44 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) subplan = subquery_planner(cteroot->glob, subquery, cteroot, cte->cterecursive, tuple_fraction, &subroot, config); - cteplaninfo->shared_plan = prepare_plan_for_sharing(cteroot, subplan); + /* + * Sharing General and SegmentGeneral subplan may lead to deadlock + * when executed with 1-gang and joined with N-gang. + */ + if (CdbPathLocus_IsGeneral(*subplan->flow) || + CdbPathLocus_IsSegmentGeneral(*subplan->flow)) + { + cteplaninfo->subplan = subplan; + } + else + { + cteplaninfo->subplan = prepare_plan_for_sharing(cteroot, subplan); + } + cteplaninfo->subroot = subroot; } /* * Create another ShareInputScan to reference the already-created - * subplan. 
+ * subplan if not avoiding sharing for General and SegmentGeneral + * subplans. */ - subplan = share_prepared_plan(cteroot, cteplaninfo->shared_plan); + if (CdbPathLocus_IsGeneral(*cteplaninfo->subplan->flow) || + CdbPathLocus_IsSegmentGeneral(*cteplaninfo->subplan->flow)) + { + /* + * If we are not sharing and subplan was created just now, use it. + * Otherwise, make a copy of it to avoid constructing a DAG + * instead of a tree. + */ + if (subplan == NULL) + subplan = (Plan *) copyObject(cteplaninfo->subplan); + } + else + { + subplan = share_prepared_plan(cteroot, cteplaninfo->subplan); + } + subroot = cteplaninfo->subroot; } diff --git a/src/backend/optimizer/plan/orca.c b/src/backend/optimizer/plan/orca.c index 057f192bc667..49cdcfe76805 100644 --- a/src/backend/optimizer/plan/orca.c +++ b/src/backend/optimizer/plan/orca.c @@ -241,7 +241,6 @@ optimize_query(Query *parse, ParamListInfo boundParams) result->relationOids = glob->relationOids; result->invalItems = glob->invalItems; result->oneoffPlan = glob->oneoffPlan; - result->transientPlan = glob->transientPlan; return result; } diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 215d43f7b60c..e0b998b9671f 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -923,6 +923,24 @@ subquery_planner(PlannerGlobal *glob, Query *parse, SS_finalize_plan(root, plan, true); } + /* + * If the plan contains volatile functions in the target list or HAVING + * qual, then we need to bring it to SingleQE. 
+ */ + if (plan->flow->locustype == CdbLocusType_General && + (contain_volatile_functions((Node *) plan->targetlist) || + contain_volatile_functions(parse->havingQual))) + { + plan->flow->locustype = CdbLocusType_SingleQE; + plan->flow->flotype = FLOW_SINGLETON; + } + else if (plan->flow->locustype == CdbLocusType_SegmentGeneral && + (contain_volatile_functions((Node *) plan->targetlist) || + contain_volatile_functions(parse->havingQual))) + { + plan = (Plan *)
make_motion_gather(root, plan, NIL, CdbLocusType_SingleQE); + } + /* Return internal info if caller wants it */ if (subroot) *subroot = root; diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index ceb711d5e93e..1d7253ec35d5 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -669,21 +669,6 @@ make_subplan(PlannerInfo *root, Query *orig_subquery, SubLinkType subLinkType, &subroot, config); - if (plan->flow->locustype == CdbLocusType_General && - (contain_volatile_functions((Node *) plan->targetlist) || - contain_volatile_functions(subquery->havingQual))) - { - plan->flow->locustype = CdbLocusType_SingleQE; - plan->flow->flotype = FLOW_SINGLETON; - } - - if (plan->flow->locustype == CdbLocusType_SegmentGeneral && - (contain_volatile_functions((Node *) plan->targetlist) || - contain_volatile_functions(subquery->havingQual))) - { - plan = (Plan *) make_motion_gather(subroot, plan, NIL, CdbLocusType_SingleQE); - } - /* Isolate the params needed by this specific subplan */ plan_params = root->plan_params; root->plan_params = NIL; diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 21f7345fc2f4..a5666f901484 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -657,13 +657,34 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt) * separate from the subquery's tlist because we may add columns, * insert datatype coercions, etc.) * - * Const and Param nodes of type UNKNOWN in the SELECT's targetlist - * no longer need special treatment here. They'll be assigned proper - * types later by coerce_type() upon assignment to the target columns. - * Otherwise this fails: INSERT INTO foo SELECT 'bar', ... FROM baz + * HACK: unknown-type constants and params in the SELECT's targetlist + * are copied up as-is rather than being referenced as subquery + * outputs. 
This is to ensure that when we try to coerce them to + * the target column's datatype, the right things happen (see + * special cases in coerce_type). Otherwise, this fails: + * INSERT INTO foo SELECT 'bar', ... FROM baz *---------- */ - expandRTE(rte, rtr->rtindex, 0, -1, false, NULL, &exprList); + exprList = NIL; + foreach(lc, selectQuery->targetList) + { + TargetEntry *tle = (TargetEntry *) lfirst(lc); + Expr *expr; + + if (tle->resjunk) + continue; + if (tle->expr && + (IsA(tle->expr, Const) ||IsA(tle->expr, Param)) && + exprType((Node *) tle->expr) == UNKNOWNOID) + expr = tle->expr; + else + { + Var *var = makeVarFromTargetEntry(rtr->rtindex, tle); + + expr = (Expr *) var; + } + exprList = lappend(exprList, expr); + } /* Prepare row for assignment to target table */ exprList = transformInsertRow(pstate, exprList, diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index f33cf0e07ff9..bb95a798265d 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -421,11 +421,7 @@ static BackgroundWorker PMAuxProcList[MaxPMAuxProc] = #ifdef ENABLE_IC_PROXY {"ic proxy process", -#ifdef FAULT_INJECTOR BGWORKER_SHMEM_ACCESS, -#else - 0, -#endif BgWorkerStart_RecoveryFinished, 0, /* restart immediately if ic proxy process exits with non-zero code */ ICProxyMain, {0}, {0}, 0, 0, diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c index 204bb55ad53b..aaf831603179 100644 --- a/src/backend/storage/file/buffile.c +++ b/src/backend/storage/file/buffile.c @@ -134,7 +134,19 @@ struct BufFile /* This holds holds compressed input, during decompression. */ ZSTD_inBuffer compressed_buffer; bool decompression_finished; + + /* Memory usage by ZSTD compression buffer */ + size_t compressed_buffer_size; #endif + + /* + * workfile_set for the files in current buffile. The workfile_set creator + * should take care of the workfile_set's lifecycle. 
So, no need to call + * workfile_mgr_close_set under the buffile logic. + * If the workfile_set is created in BufFileCreateTemp, the workfile_set + * should get freed once all the files in it are closed in BufFileClose. + */ + workfile_set *work_set; }; /* @@ -177,6 +189,10 @@ makeBufFile(File firstfile) file->maxoffset = 0L; file->buffer = palloc(BLCKSZ); +#ifdef USE_ZSTD + file->compressed_buffer_size = 0; +#endif + return file; } @@ -227,6 +243,7 @@ BufFileCreateTempInSet(workfile_set *work_set, bool interXact) file = makeBufFile(pfile); file->isTemp = true; + file->work_set = work_set; FileSetIsWorkfile(file->file); RegisterFileWithSet(file->file, work_set); @@ -288,6 +305,7 @@ BufFileCreateNamedTemp(const char *fileName, bool interXact, workfile_set *work_ if (work_set) { + file->work_set = work_set; FileSetIsWorkfile(file->file); RegisterFileWithSet(file->file, work_set); } @@ -986,11 +1004,26 @@ bool gp_workfile_compression; /* GUC */ void BufFilePledgeSequential(BufFile *buffile) { + workfile_set *work_set = buffile->work_set; + if (buffile->maxoffset != 0) elog(ERROR, "cannot pledge sequential access to a temporary file after writing it"); - if (gp_workfile_compression) + AssertImply(work_set->compression_buf_total > 0, gp_workfile_compression); + + /* + * If gp_workfile_compression_overhead_limit is 0, it means no limit for + * memory used by compressed work files. Otherwise, compress the work file + * only when the used memory size is under the limit.
+ */ + if (gp_workfile_compression && + (gp_workfile_compression_overhead_limit == 0 || + work_set->compression_buf_total < + gp_workfile_compression_overhead_limit * 1024UL)) + { BufFileStartCompression(buffile); + work_set->num_files_compressed++; + } } /* @@ -1072,6 +1105,7 @@ static void BufFileDumpCompressedBuffer(BufFile *file, const void *buffer, Size nbytes) { ZSTD_inBuffer input; + size_t compressed_buffer_size = 0; file->uncompressed_bytes += nbytes; @@ -1104,6 +1138,32 @@ BufFileDumpCompressedBuffer(BufFile *file, const void *buffer, Size nbytes) file->maxoffset += wrote; } } + + /* + * Calculate the delta of the buffer used by the ZSTD stream and add it to + * work_set->compression_buf_total. + * On GPDB 7X, we call the ZSTD API ZSTD_sizeof_CStream() to get the buffer + * size. However, the API is unavailable on 6X (it is marked as + * ZSTD_STATIC_LINKING_ONLY) due to a different version of the ZSTD lib. + * Experiments have shown that the compression buffer size + * per file is fairly stable (about 1.3MB) regardless of the temp file size, + * so we simply use a hard-coded value here. + * We may use the API ZSTD_sizeof_CStream() in the future if the ZSTD lib + * version is updated on 6X. + */ + + compressed_buffer_size = 1.3 * 1024 * 1024; + + /* + * As the ZSTD comments say, the memory usage can evolve (increase or + * decrease) over time. We update work_set->compression_buf_total only + * when compressed_buffer_size increases. That is, we apply the compression + * buffer limit to the peak memory usage and ignore the case of memory decreasing.
+ */ + if (compressed_buffer_size > file->compressed_buffer_size) + file->work_set->compression_buf_total + += compressed_buffer_size - file->compressed_buffer_size; + file->compressed_buffer_size = compressed_buffer_size; } /* diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index e06441cbbf7d..3c664ccf93eb 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -67,6 +67,7 @@ #include "utils/session_state.h" #include "cdb/cdbendpoint.h" #include "replication/gp_replication.h" +#include "cdb/ic_proxy_bgworker.h" shmem_startup_hook_type shmem_startup_hook = NULL; @@ -185,6 +186,10 @@ CreateSharedMemoryAndSemaphores(int port) size = add_size(size, FaultInjector_ShmemSize()); #endif +#ifdef ENABLE_IC_PROXY + size = add_size(size, ICProxyShmemSize()); +#endif + /* This elog happens before we know the name of the log file we are supposed to use */ elog(DEBUG1, "Size not including the buffer pool %lu", (unsigned long) size); @@ -337,6 +342,10 @@ CreateSharedMemoryAndSemaphores(int port) FaultInjector_ShmemInit(); #endif +#ifdef ENABLE_IC_PROXY + ICProxyShmemInit(); +#endif + /* * Set up other modules that need some shared memory space */ diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index c65c8bf0f922..a3a45c5153a8 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -4010,9 +4010,25 @@ ProcessInterrupts(const char* filename, int lineno) (errcode(ERRCODE_GP_OPERATION_CANCELED), errmsg("canceling MPP operation%s", cancel_msg_str.data))); else - ereport(ERROR, - (errcode(ERRCODE_QUERY_CANCELED), - errmsg("canceling statement due to user request%s", cancel_msg_str.data))); + { + char msec_str[32]; + + switch (check_log_duration(msec_str, false)) + { + case 0: + ereport(ERROR, + (errcode(ERRCODE_QUERY_CANCELED), + errmsg("canceling statement due to user request%s", cancel_msg_str.data))); + break; + case 1: + case 2: + ereport(ERROR, + (errcode(ERRCODE_QUERY_CANCELED), 
+ errmsg("canceling statement due to user request%s, duration:%s", + cancel_msg_str.data, msec_str))); + break; + } + } } } /* If we get here, do nothing (probably, QueryCancelPending was reset) */ diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c index 0994a6157dd1..53ce9a74b301 100644 --- a/src/backend/utils/adt/genfile.c +++ b/src/backend/utils/adt/genfile.c @@ -59,7 +59,7 @@ typedef struct * absolute paths that match DataDir or Log_directory. */ static char * -convert_and_check_filename(text *arg) +convert_and_check_filename(text *arg, bool abs_ok) { char *filename; @@ -68,6 +68,19 @@ convert_and_check_filename(text *arg) if (is_absolute_path(filename)) { + /* + * Allow absolute path if caller indicates so. Only for superuser. + * This is to support utility function gp_move_orphaned_files which + * can move files between absolute paths. So far only pg_file_rename + * requires abs_ok=true. + * + * P.S. in 7X superuser can do the same but it is achieved via a new + * role 'pg_read_server_files' which is not in 6X. So adding the + * superuser() check instead. + */ + if (abs_ok && superuser()) + return filename; + /* Disallow '/a/b/data/..' 
*/ if (path_contains_parent_reference(filename)) ereport(ERROR, @@ -237,7 +250,7 @@ pg_read_file(PG_FUNCTION_ARGS) if (PG_NARGS() >= 4) missing_ok = PG_GETARG_BOOL(3); - filename = convert_and_check_filename(filename_t); + filename = convert_and_check_filename(filename_t, false); result = read_text_file(filename, seek_offset, bytes_to_read, missing_ok); if (result) @@ -278,7 +291,7 @@ pg_read_binary_file(PG_FUNCTION_ARGS) if (PG_NARGS() >= 4) missing_ok = PG_GETARG_BOOL(3); - filename = convert_and_check_filename(filename_t); + filename = convert_and_check_filename(filename_t, false); result = read_binary_file(filename, seek_offset, bytes_to_read, missing_ok); @@ -345,7 +358,7 @@ pg_stat_file(PG_FUNCTION_ARGS) if (PG_NARGS() == 2) missing_ok = PG_GETARG_BOOL(1); - filename = convert_and_check_filename(filename_t); + filename = convert_and_check_filename(filename_t, false); if (stat(filename, &fst) < 0) { @@ -445,7 +458,7 @@ pg_ls_dir(PG_FUNCTION_ARGS) oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); fctx = palloc(sizeof(directory_fctx)); - fctx->location = convert_and_check_filename(PG_GETARG_TEXT_P(0)); + fctx->location = convert_and_check_filename(PG_GETARG_TEXT_P(0), false); fctx->include_dot_dirs = include_dot_dirs; fctx->dirdesc = AllocateDir(fctx->location); @@ -512,7 +525,7 @@ pg_file_write(PG_FUNCTION_ARGS) requireSuperuser(); - filename = convert_and_check_filename(PG_GETARG_TEXT_P(0)); + filename = convert_and_check_filename(PG_GETARG_TEXT_P(0), false); data = PG_GETARG_TEXT_P(1); if (!PG_GETARG_BOOL(2)) @@ -563,12 +576,12 @@ pg_file_rename(PG_FUNCTION_ARGS) if (PG_ARGISNULL(0) || PG_ARGISNULL(1)) PG_RETURN_NULL(); - fn1 = convert_and_check_filename(PG_GETARG_TEXT_P(0)); - fn2 = convert_and_check_filename(PG_GETARG_TEXT_P(1)); + fn1 = convert_and_check_filename(PG_GETARG_TEXT_P(0), true); + fn2 = convert_and_check_filename(PG_GETARG_TEXT_P(1), true); if (PG_ARGISNULL(2)) fn3 = 0; else - fn3 = 
convert_and_check_filename(PG_GETARG_TEXT_P(2)); + fn3 = convert_and_check_filename(PG_GETARG_TEXT_P(2), true); if (access(fn1, W_OK) < 0) { @@ -647,7 +660,7 @@ pg_file_unlink(PG_FUNCTION_ARGS) requireSuperuser(); - filename = convert_and_check_filename(PG_GETARG_TEXT_P(0)); + filename = convert_and_check_filename(PG_GETARG_TEXT_P(0), false); if (access(filename, W_OK) < 0) { @@ -818,7 +831,7 @@ pg_file_length(PG_FUNCTION_ARGS) requireSuperuser(); - filename = convert_and_check_filename(filename_t); + filename = convert_and_check_filename(filename_t, false); if (stat(filename, &fst) < 0) ereport(ERROR, diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index fade51a691f0..f27c779d7c51 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -90,6 +90,7 @@ static void CheckMyDatabase(const char *name, bool am_superuser); static void InitCommunication(void); static void ShutdownPostgres(int code, Datum arg); static void StatementTimeoutHandler(void); +static void GpParallelRetrieveCursorCheckTimeoutHandler(void); static void LockTimeoutHandler(void); static void ClientCheckTimeoutHandler(void); static bool ThereIsAtLeastOneRole(void); @@ -689,6 +690,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, { RegisterTimeout(DEADLOCK_TIMEOUT, CheckDeadLock); RegisterTimeout(STATEMENT_TIMEOUT, StatementTimeoutHandler); + RegisterTimeout(GP_PARALLEL_RETRIEVE_CURSOR_CHECK_TIMEOUT, GpParallelRetrieveCursorCheckTimeoutHandler); RegisterTimeout(LOCK_TIMEOUT, LockTimeoutHandler); RegisterTimeout(GANG_TIMEOUT, IdleGangTimeoutHandler); RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler); @@ -1444,6 +1446,36 @@ StatementTimeoutHandler(void) #endif kill(MyProcPid, SIGINT); } +extern bool DoingCommandRead; +static void +GpParallelRetrieveCursorCheckTimeoutHandler(void) +{ + /* + * issue: https://github.com/greenplum-db/gpdb/issues/15143 + * + * handle errors of parallel 
retrieve cursor's non-root slices + */ + if (DoingCommandRead) + { + Assert(Gp_role == GP_ROLE_DISPATCH); + + /* It calls cdbdisp_checkForCancel(), which doesn't raise error */ + gp_check_parallel_retrieve_cursor_error(); + int num = GetNumOfParallelRetrieveCursors(); + + /* Reset the alarm to check after a timeout */ + if (num > 0) + { + elog(DEBUG1, "There are still %d parallel retrieve cursors alive", num); + enable_parallel_retrieve_cursor_check_timeout(); + } + } + else + { + elog(DEBUG1, "DoingCommandRead is false, check parallel cursor timeout delay"); + enable_parallel_retrieve_cursor_check_timeout(); + } +} /* * LOCK_TIMEOUT handler: trigger a query-cancel interrupt. diff --git a/src/backend/utils/misc/guc_gp.c b/src/backend/utils/misc/guc_gp.c index b49b6a6a27ae..bae10eb794b1 100644 --- a/src/backend/utils/misc/guc_gp.c +++ b/src/backend/utils/misc/guc_gp.c @@ -3298,6 +3298,17 @@ struct config_bool ConfigureNamesBool_gp[] = NULL, NULL, NULL }, + { + {"gp_detect_data_correctness", PGC_USERSET, UNGROUPED, + gettext_noop("Detect if the current partitioning of the table or data distribution is correct."), + NULL, + GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE + }, + &gp_detect_data_correctness, + false, + NULL, NULL, NULL + }, + /* End-of-list marker */ { {NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL @@ -3507,6 +3518,17 @@ struct config_int ConfigureNamesInt_gp[] = NULL, NULL, NULL }, + { + {"gp_workfile_compression_overhead_limit", PGC_USERSET, RESOURCES, + gettext_noop("The overhead memory (kB) limit for all compressed workfiles of a single workfile_set."), + gettext_noop("0 for no limit. 
Once the limit is hit, the following files will not be compressed."), + GUC_UNIT_KB + }, + &gp_workfile_compression_overhead_limit, + 2048 * 1024, 0, INT_MAX, + NULL, NULL, NULL + }, + { {"gp_workfile_limit_per_segment", PGC_POSTMASTER, RESOURCES, gettext_noop("Maximum disk space (in KB) used for workfiles per segment."), @@ -3671,6 +3693,17 @@ struct config_int ConfigureNamesInt_gp[] = NULL, NULL, NULL }, + { + {"gp_interconnect_cursor_ic_table_size", PGC_USERSET, GP_ARRAY_TUNING, + gettext_noop("Sets the size of Cursor Table in the UDP interconnect"), + gettext_noop("You can try to increase it when a UDF which contains many concurrent " + "cursor queries hangs. The default value is 128.") + }, + &Gp_interconnect_cursor_ic_table_size, + 128, 128, 102400, + NULL, NULL, NULL + }, + { {"gp_interconnect_timer_period", PGC_USERSET, GP_ARRAY_TUNING, gettext_noop("Sets the timer period (in ms) for UDP interconnect"), diff --git a/src/backend/utils/misc/timeout.c b/src/backend/utils/misc/timeout.c index 78faeb6a3545..20d82410126d 100644 --- a/src/backend/utils/misc/timeout.c +++ b/src/backend/utils/misc/timeout.c @@ -417,13 +417,23 @@ RegisterTimeout(TimeoutId id, timeout_handler_proc handler) /* There's no need to disable the signal handler here. */ - if (id >= USER_TIMEOUT) + /* + * GP_ABI_BUMP_FIXME + * + * all the GP_PARALLEL_RETRIEVE_CURSOR_CHECK_TIMEOUT here were MAX_TIMEOUTS, + * we did the change to avoid ABI break via putting the + * GP_PARALLEL_RETRIEVE_CURSOR_CHECK_TIMEOUT after the reserved + * USER_TIMEOUTs and before MAX_TIMEOUTS. + * + * restore to the original shape once we are fine to bump the ABI version. 
+ */ + if (id >= USER_TIMEOUT && id < GP_PARALLEL_RETRIEVE_CURSOR_CHECK_TIMEOUT) { /* Allocate a user-defined timeout reason */ - for (id = USER_TIMEOUT; id < MAX_TIMEOUTS; id++) + for (id = USER_TIMEOUT; id < GP_PARALLEL_RETRIEVE_CURSOR_CHECK_TIMEOUT; id++) if (all_timeouts[id].timeout_handler == NULL) break; - if (id >= MAX_TIMEOUTS) + if (id >= GP_PARALLEL_RETRIEVE_CURSOR_CHECK_TIMEOUT) ereport(FATAL, (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED), errmsg("cannot add more timeout reasons"))); diff --git a/src/backend/utils/mmgr/memprot.c b/src/backend/utils/mmgr/memprot.c index 15e3aa01c67b..1c46b15b3718 100644 --- a/src/backend/utils/mmgr/memprot.c +++ b/src/backend/utils/mmgr/memprot.c @@ -315,7 +315,13 @@ static void gp_failed_to_alloc(MemoryAllocationStatus ec, int en, int sz) } else if (ec == MemoryFailure_VmemExhausted) { - elog(LOG, "Logging memory usage for reaching Vmem limit"); + /* + * Memory usage has reached the Vmem limit, so any new allocation here + * would loop through gp_malloc and gp_failed_to_alloc and then error out + * with "ERRORDATA_STACK_SIZE exceeded". We therefore print the + * log message header using write_stderr. + */ + write_stderr("Logging memory usage for reaching Vmem limit"); } else if (ec == MemoryFailure_SystemMemoryExhausted) { @@ -330,7 +336,10 @@ static void gp_failed_to_alloc(MemoryAllocationStatus ec, int en, int sz) } else if (ec == MemoryFailure_ResourceGroupMemoryExhausted) { - elog(LOG, "Logging memory usage for reaching resource group limit"); + /* + * The behavior in resource group mode is the same as for MemoryFailure_VmemExhausted.
+ */ + write_stderr("Logging memory usage for reaching resource group limit"); } else elog(ERROR, "Unknown memory failure error code"); diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c index 620c9bfafb92..08a41f3d0bec 100644 --- a/src/backend/utils/mmgr/portalmem.c +++ b/src/backend/utils/mmgr/portalmem.c @@ -1282,3 +1282,40 @@ ThereAreNoReadyPortals(void) return true; } + +/* Find all Parallel Retrieve cursors and return a list of Portals */ +List * +GetAllParallelRetrieveCursorPortals(void) +{ + List *portals; + PortalHashEnt *hentry; + HASH_SEQ_STATUS status; + + if (PortalHashTable == NULL) + return NULL; + + portals = NULL; + hash_seq_init(&status, PortalHashTable); + while ((hentry = hash_seq_search(&status)) != NULL) + { + if (PortalIsParallelRetrieveCursor(hentry->portal) && + hentry->portal->queryDesc != NULL) + portals = lappend(portals, hentry->portal); + } + + return portals; +} + +/* Return the amount of parallel retrieve cursors */ +int +GetNumOfParallelRetrieveCursors(void) +{ + List *portals; + int sum; + + portals = GetAllParallelRetrieveCursorPortals(); + sum = list_length(portals); + + list_free(portals); + return sum; +} diff --git a/src/backend/utils/workfile_manager/workfile_mgr.c b/src/backend/utils/workfile_manager/workfile_mgr.c index fae74545d7f7..2704c7952a2b 100644 --- a/src/backend/utils/workfile_manager/workfile_mgr.c +++ b/src/backend/utils/workfile_manager/workfile_mgr.c @@ -634,6 +634,8 @@ workfile_mgr_create_set_internal(const char *operator_name, const char *prefix) work_set->total_bytes = 0; work_set->active = true; work_set->pinned = false; + work_set->compression_buf_total = 0; + work_set->num_files_compressed = 0; /* Track all workfile_sets created in current process */ if (!localCtl.initialized) diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index 679c1bcfc5b5..091f28c68c6a 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl 
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -4,6 +4,7 @@ use TestLib; use File::Compare; use File::Path qw(rmtree); +use PostgresNode; use Test::More tests => 48 + 4; program_help_ok('pg_basebackup'); @@ -74,9 +75,19 @@ ok(-f "$tempdir/tarbackup/base.tar", 'backup tar was created'); ########################## Test that the headers are zeroed out in both the primary and mirror WAL files -my $compare_tempdir = "$tempdir/checksum_test"; - -# Ensure that when pg_basebackup is run that the last WAL segment file +my $node_wal_compare_primary = get_new_node('wal_compare_primary'); +# We need to enable archiving for this test because we depend on the backup history +# file created by pg_basebackup to retrieve the "STOP WAL LOCATION". This file only +# gets persisted if archiving is turned on. +$node_wal_compare_primary->init( + has_archiving => 1, + allows_streaming => 1); +$node_wal_compare_primary->start; + +my $node_wal_compare_primary_datadir = $node_wal_compare_primary->data_dir; +my $node_wal_compare_standby_datadir = "$tempdir/wal_compare_standby"; + +# Ensure that when pg_basebackup is run, the last WAL segment file # containing the XLOG_BACKUP_END and XLOG_SWITCH records match on both # the primary and mirror segment. We want to ensure that all pages after # the XLOG_SWITCH record are all zeroed out. Previously, the primary @@ -86,31 +97,41 @@ # and would lead to checksum mismatches for external tools that checked # for that. 
-#Insert data and then run pg_basebackup -psql 'postgres', 'CREATE TABLE zero_header_test as SELECT generate_series(1,1000);'; -command_ok([ 'pg_basebackup', '-D', $compare_tempdir, '--target-gp-dbid', '123' , '-X', 'stream'], +# Insert data and then run pg_basebackup +$node_wal_compare_primary->psql('postgres', 'CREATE TABLE zero_header_test as SELECT generate_series(1,1000);'); +$node_wal_compare_primary->command_ok([ 'pg_basebackup', '-D', $node_wal_compare_standby_datadir, '--target-gp-dbid', '123' , '-X', 'stream'], 'pg_basebackup wal file comparison test'); -ok( -f "$compare_tempdir/PG_VERSION", 'pg_basebackup ran successfully'); - -my $current_wal_file = psql 'postgres', "SELECT pg_xlogfile_name(pg_current_xlog_location());"; -my $primary_wal_file_path = "$tempdir/pgdata/pg_xlog/$current_wal_file"; -my $mirror_wal_file_path = "$compare_tempdir/pg_xlog/$current_wal_file"; - -## Test that primary and mirror WAL file is the same +ok( -f "$node_wal_compare_standby_datadir/PG_VERSION", 'pg_basebackup ran successfully'); + +# We can't rely on `pg_current_xlog_location()` to get the last WAL filename that was +# copied over to the standby. This is because it's possible for newer WAL files +# to get created after pg_basebackup is run. +# So instead, we rely on the backup history file created by pg_basebackup to get +# this information. We can safely assume that there's only one backup history +# file in the primary's xlog dir +my $backup_history_file = "$node_wal_compare_primary_datadir/pg_xlog/*.backup"; +my $stop_wal_file_cmd = 'sed -n "s/STOP WAL LOCATION.*(file //p" ' . $backup_history_file . 
' | sed "s/)//g"'; +my $stop_wal_file = `$stop_wal_file_cmd`; +chomp($stop_wal_file); +my $primary_wal_file_path = "$node_wal_compare_primary_datadir/pg_xlog/$stop_wal_file"; +my $mirror_wal_file_path = "$node_wal_compare_standby_datadir/pg_xlog/$stop_wal_file"; + +# Test that the primary and mirror WAL files are the same ok(compare($primary_wal_file_path, $mirror_wal_file_path) eq 0, "wal file comparison"); -## Test that all the bytes after the last written record in the WAL file are zeroed out -my $total_bytes_cmd = 'pg_controldata ' . $compare_tempdir . ' | grep "Bytes per WAL segment:" | awk \'{print $5}\''; +# Test that all the bytes after the last written record in the WAL file are zeroed out +my $total_bytes_cmd = 'pg_controldata ' . $node_wal_compare_standby_datadir . ' | grep "Bytes per WAL segment:" | awk \'{print $5}\''; my $total_allocated_bytes = `$total_bytes_cmd`; -my $current_lsn_cmd = 'pg_xlogdump -f ' . $primary_wal_file_path . ' | grep "xlog switch" | awk \'{print $10}\' | sed "s/,//"'; +my $current_lsn_cmd = 'pg_xlogdump ' . $primary_wal_file_path .
' | grep "xlog switch" | awk \'{print $10}\' | sed "s/,//"'; my $current_lsn = `$current_lsn_cmd`; chomp($current_lsn); -my $current_byte_offset = psql 'postgres', "SELECT file_offset FROM pg_xlogfile_name_offset('$current_lsn');"; -#Get offset of last written record +my $current_byte_offset = $node_wal_compare_primary->safe_psql('postgres', "SELECT file_offset FROM pg_xlogfile_name_offset('$current_lsn');"); + +# Get offset of last written record open my $fh, '<:raw', $primary_wal_file_path; -#Since pg_xlogfile_name_offset does not account for the xlog switch record, we need to add it ourselves +# Since pg_xlogfile_name_offset does not account for the xlog switch record, we need to add it ourselves my $xlog_switch_record_len = 32; seek $fh, $current_byte_offset + $xlog_switch_record_len, 0; my $bytes_read = ""; @@ -119,6 +140,8 @@ close $fh; ok($bytes_read =~ /\A\x00*+\z/, 'make sure wal segment is zeroed'); +############################## End header test ##################################### + # The following tests test symlinks. Windows doesn't have symlinks, so # skip on Windows. SKIP: { diff --git a/src/include/cdb/cdbendpoint.h b/src/include/cdb/cdbendpoint.h index b9fd6d74353d..8be13ff746af 100644 --- a/src/include/cdb/cdbendpoint.h +++ b/src/include/cdb/cdbendpoint.h @@ -140,6 +140,8 @@ extern enum EndPointExecPosition GetParallelCursorEndpointPosition(PlannedStmt * extern void WaitEndpointsReady(EState *estate); extern void AtAbort_EndpointExecState(void); extern void allocEndpointExecState(void); +extern bool gp_check_parallel_retrieve_cursor_error(void); +extern void enable_parallel_retrieve_cursor_check_timeout(void); /* * Below functions should run on Endpoints(QE/Entry DB). 
diff --git a/src/include/cdb/cdbhash.h b/src/include/cdb/cdbhash.h index 15bd957e15bb..40be9811f334 100644 --- a/src/include/cdb/cdbhash.h +++ b/src/include/cdb/cdbhash.h @@ -50,6 +50,7 @@ typedef struct CdbHash */ extern CdbHash *makeCdbHash(int numsegs, int natts, Oid *typeoids); extern CdbHash *makeCdbHashForRelation(Relation rel); +extern void freeCdbHash(CdbHash *h); /* * Initialize CdbHash for hashing the next tuple values. diff --git a/src/include/cdb/cdbvars.h b/src/include/cdb/cdbvars.h index 88d99280b891..6529ce4c17dc 100644 --- a/src/include/cdb/cdbvars.h +++ b/src/include/cdb/cdbvars.h @@ -183,6 +183,9 @@ extern int gp_reject_percent_threshold; */ extern bool gp_select_invisible; +/* Detect if the current partitioning of the table or data distribution is correct */ +extern bool gp_detect_data_correctness; + /* * Used to set the maximum length of the current query which is displayed * when the user queries pg_stat_activty table. @@ -417,6 +420,16 @@ extern int Gp_interconnect_queue_depth; * */ extern int Gp_interconnect_snd_queue_depth; + +/* + * Cursor IC table size. + * + * For cursor case, there may be several concurrent interconnect + * instances on QD. The table is used to track the status of the + * instances, which is quite useful for "ACK the past and NAK the future" paradigm. 
+ * + */ +extern int Gp_interconnect_cursor_ic_table_size; extern int Gp_interconnect_timer_period; extern int Gp_interconnect_timer_checking_period; extern int Gp_interconnect_default_rtt; @@ -837,6 +850,7 @@ extern int gpperfmon_log_alert_level; extern int gp_workfile_limit_per_segment; extern int gp_workfile_limit_per_query; extern int gp_workfile_limit_files_per_query; +extern int gp_workfile_compression_overhead_limit; extern int gp_workfile_caching_loglevel; extern int gp_sessionstate_loglevel; extern int gp_workfile_bytes_to_checksum; diff --git a/src/include/cdb/ic_proxy_bgworker.h b/src/include/cdb/ic_proxy_bgworker.h index d30a73c285d5..a9a9a4b2d49a 100644 --- a/src/include/cdb/ic_proxy_bgworker.h +++ b/src/include/cdb/ic_proxy_bgworker.h @@ -13,10 +13,14 @@ #ifndef IC_PROXY_BGWORKER_H #define IC_PROXY_BGWORKER_H -#include "postgres.h" +#include "port/atomics.h" +/* flag (in SHM) for indicating if peer listener bind/listen failed */ +extern pg_atomic_uint32 *ic_proxy_peer_listener_failed; extern bool ICProxyStartRule(Datum main_arg); extern void ICProxyMain(Datum main_arg); +extern Size ICProxyShmemSize(void); +extern void ICProxyShmemInit(void); #endif /* IC_PROXY_BGWORKER_H */ diff --git a/src/include/executor/execDML.h b/src/include/executor/execDML.h index 2d0124897bab..ff66c9e4cc4a 100644 --- a/src/include/executor/execDML.h +++ b/src/include/executor/execDML.h @@ -24,6 +24,9 @@ reconstructTupleValues(AttrMap *map, extern TupleTableSlot * reconstructMatchingTupleSlot(TupleTableSlot *slot, ResultRelInfo *resultRelInfo); +extern void +makePartitionCheckMap(EState *estate, ResultRelInfo *resultRelInfo); /* * In PostgreSQL, ExecInsert, ExecDelete and ExecUpdate are static in nodeModifyTable.c. * In GPDB, they're exported.
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h index d4b4d86641ac..ffe5f28eb16b 100644 --- a/src/include/executor/hashjoin.h +++ b/src/include/executor/hashjoin.h @@ -208,6 +208,12 @@ typedef struct HashJoinTableData HashJoinState * hjstate; /* reference to the enclosing HashJoinState */ bool first_pass; /* Is this the first pass (pre-rescan) */ + + /* Statistic info of work file set, copied from work_set */ + uint32 workset_num_files; + uint32 workset_num_files_compressed; + uint64 workset_avg_file_size; + uint64 workset_compression_buf_total; } HashJoinTableData; #endif /* HASHJOIN_H */ diff --git a/src/include/gpopt/gpdbwrappers.h b/src/include/gpopt/gpdbwrappers.h index 12ab3a7b3ac8..4f8fa6b24635 100644 --- a/src/include/gpopt/gpdbwrappers.h +++ b/src/include/gpopt/gpdbwrappers.h @@ -683,9 +683,20 @@ FaultInjectorType_e InjectFaultInOptTasks(const char *fault_name); gpos::ULONG CountLeafPartTables(Oid oidRelation); // Does the metadata cache need to be reset (because of a catalog -// table has been changed?) +// table has been changed or TransactionXmin changed from that we saved)? bool MDCacheNeedsReset(void); +// Check that the index is usable in the current snapshot and if not, save the +// xmin of the current snapshot. Returns true if the index is not usable and +// should be skipped. 
+bool MDCacheSetTransientState(Relation index_rel); + +// reset TransactionXmin value that we saved +void MDCacheResetTransientState(void); + +// returns true if cache is in transient state +bool MDCacheInTransientState(void); + // returns true if a query cancel is requested in GPDB bool IsAbortRequested(void); diff --git a/src/include/gpopt/translate/CDXLTranslateContext.h b/src/include/gpopt/translate/CDXLTranslateContext.h index 31e07e681945..bd9b59ca144f 100644 --- a/src/include/gpopt/translate/CDXLTranslateContext.h +++ b/src/include/gpopt/translate/CDXLTranslateContext.h @@ -17,6 +17,12 @@ #ifndef GPDXL_CDXLTranslateContext_H #define GPDXL_CDXLTranslateContext_H +extern "C" { +#include "postgres.h" + +#include "nodes/plannodes.h" +} + #include "gpos/base.h" #include "gpos/common/CHashMap.h" #include "gpos/common/CHashMapIter.h" @@ -78,12 +84,15 @@ class CDXLTranslateContext // to use OUTER instead of 0 for Var::varno in Agg target lists (MPP-12034) BOOL m_is_child_agg_node; + const Query *m_query; + // copy the params hashmap void CopyParamHashmap(ULongToColParamMap *original); public: // ctor/dtor - CDXLTranslateContext(CMemoryPool *mp, BOOL is_child_agg_node); + CDXLTranslateContext(CMemoryPool *mp, BOOL is_child_agg_node, + const Query *query); CDXLTranslateContext(CMemoryPool *mp, BOOL is_child_agg_node, ULongToColParamMap *original); @@ -100,6 +109,12 @@ class CDXLTranslateContext return m_colid_to_paramid_map; } + const Query * + GetQuery() + { + return m_query; + } + // return the target entry corresponding to the given ColId const TargetEntry *GetTargetEntry(ULONG colid) const; diff --git a/src/include/gpopt/translate/CTranslatorDXLToPlStmt.h b/src/include/gpopt/translate/CTranslatorDXLToPlStmt.h index 809cdae278fa..279ff83c706a 100644 --- a/src/include/gpopt/translate/CTranslatorDXLToPlStmt.h +++ b/src/include/gpopt/translate/CTranslatorDXLToPlStmt.h @@ -178,6 +178,7 @@ class CTranslatorDXLToPlStmt // main translation routine for DXL tree -> 
PlannedStmt PlannedStmt *GetPlannedStmtFromDXL(const CDXLNode *dxlnode, + const Query *orig_query, bool can_set_tag); // translate the join types from its DXL representation to the GPDB one diff --git a/src/include/gpopt/utils/COptTasks.h b/src/include/gpopt/utils/COptTasks.h index ea6d30650e4f..5c70e589db8a 100644 --- a/src/include/gpopt/utils/COptTasks.h +++ b/src/include/gpopt/utils/COptTasks.h @@ -132,8 +132,9 @@ class COptTasks // translate a DXL tree into a planned statement static PlannedStmt *ConvertToPlanStmtFromDXL( - CMemoryPool *mp, CMDAccessor *md_accessor, const CDXLNode *dxlnode, - bool can_set_tag, DistributionHashOpsKind distribution_hashops); + CMemoryPool *mp, CMDAccessor *md_accessor, const Query *orig_query, + const CDXLNode *dxlnode, bool can_set_tag, + DistributionHashOpsKind distribution_hashops); // load search strategy from given path static CSearchStageArray *LoadSearchStrategy(CMemoryPool *mp, char *path); diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 0c6692e9eac9..2ff11d3f587b 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -382,11 +382,11 @@ typedef struct ResultRelInfo uint64 ri_aoprocessed; /* tuples added/deleted for AO */ struct AttrMap *ri_partInsertMap; TupleTableSlot *ri_resultSlot; - /* Parent relation in checkPartitionUpdate */ + /* Parent relation in makePartitionCheckMap */ Relation ri_PartitionParent; - /* tupdesc_match for checkPartitionUpdate */ + /* tupdesc_match for makePartitionCheckMap */ int ri_PartCheckTupDescMatch; - /* Attribute map in checkPartitionUpdate */ + /* Attribute map in makePartitionCheckMap */ struct AttrMap *ri_PartCheckMap; /* diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index b609b3406c25..d5c33a47edad 100644 --- a/src/include/nodes/relation.h +++ b/src/include/nodes/relation.h @@ -322,11 +322,13 @@ typedef struct PlannerInfo typedef struct CtePlanInfo { /* - * A subplan, prepared for sharing among many 
CTE references by - * prepare_plan_for_sharing(), that implements the CTE. NULL if the - * CTE is not shared among references. + * A subplan, that implements the CTE and which is prepared either for + * sharing among many CTE references by prepare_plan_for_sharing() or + * for inlining in cases, when sharing produces invalid plans. NULL if + * the CTE is not shared among references (gp_cte_sharing is off), or to + * be planned or inlined and has not been planned yet. */ - Plan *shared_plan; + Plan *subplan; /* * The subroot corresponding to the subplan. diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h index 64b503996889..ebe7c61d735d 100644 --- a/src/include/utils/portal.h +++ b/src/include/utils/portal.h @@ -253,5 +253,7 @@ extern bool ThereAreNoReadyPortals(void); extern void AtExitCleanup_ResPortals(void); extern void TotalResPortalIncrements(int pid, Oid queueid, Cost *totalIncrements, int *num); +extern List *GetAllParallelRetrieveCursorPortals(void); +extern int GetNumOfParallelRetrieveCursors(void); #endif /* PORTAL_H */ diff --git a/src/include/utils/sync_guc_name.h b/src/include/utils/sync_guc_name.h index 4b9a23665477..a5bd03e67f67 100644 --- a/src/include/utils/sync_guc_name.h +++ b/src/include/utils/sync_guc_name.h @@ -18,6 +18,7 @@ "gp_blockdirectory_minipage_size", "gp_debug_linger", "gp_default_storage_options", + "gp_detect_data_correctness", "gp_disable_tuple_hints", "gp_enable_mk_sort", "gp_enable_motion_mk_sort", @@ -31,6 +32,7 @@ "gp_indexcheck_insert", "gp_indexcheck_vacuum", "gp_initial_bad_row_limit", + "gp_interconnect_cursor_ic_table_size", "gp_interconnect_debug_retry_interval", "gp_interconnect_default_rtt", "gp_interconnect_fc_method", @@ -79,6 +81,7 @@ "gp_vmem_idle_resource_timeout", "gp_workfile_caching_loglevel", "gp_workfile_compression", + "gp_workfile_compression_overhead_limit", "gp_workfile_limit_files_per_query", "gp_workfile_limit_per_query", "IntervalStyle", diff --git a/src/include/utils/timeout.h 
b/src/include/utils/timeout.h index fef38e5539de..cf987128399c 100644 --- a/src/include/utils/timeout.h +++ b/src/include/utils/timeout.h @@ -16,6 +16,9 @@ #include "datatype/timestamp.h" +/* GPDB: the period of parallel retrieve cursor check */ +#define GP_PARALLEL_RETRIEVE_CURSOR_CHECK_PERIOD_MS (10000) + /* * Identifiers for timeout reasons. Note that in case multiple timeouts * trigger at the same time, they are serviced in the order of this enum. @@ -33,8 +36,14 @@ typedef enum TimeoutId CLIENT_CONNECTION_CHECK_TIMEOUT, /* First user-definable timeout reason */ USER_TIMEOUT, - /* Maximum number of timeout reasons */ - MAX_TIMEOUTS = 16 + /* + * GP_ABI_BUMP_FIXME + * To not break ABI, we have to reserve the timeouts from the **original** + * USER_TIMEOUT (included) and the **original** MAX_TIMEOUTS, [9, 16) in + * this case. + */ + GP_PARALLEL_RETRIEVE_CURSOR_CHECK_TIMEOUT = 16, + MAX_TIMEOUTS } TimeoutId; /* callback function signature */ diff --git a/src/include/utils/workfile_mgr.h b/src/include/utils/workfile_mgr.h index c31d7332c49e..2c50d52185e8 100644 --- a/src/include/utils/workfile_mgr.h +++ b/src/include/utils/workfile_mgr.h @@ -87,6 +87,12 @@ typedef struct workfile_set /* Used to track workfile_set created in current process */ dlist_node local_node; + + /* Total memory usage by compression buffer */ + uint64 compression_buf_total; + + /* Number of compressed work files */ + uint32 num_files_compressed; } workfile_set; /* Workfile Set operations */ diff --git a/src/pl/plpython/expected/plpython_test.out b/src/pl/plpython/expected/plpython_test.out index f377e614ed3d..81641de2952a 100755 --- a/src/pl/plpython/expected/plpython_test.out +++ b/src/pl/plpython/expected/plpython_test.out @@ -78,3 +78,33 @@ CONTEXT: Traceback (most recent call last): PL/Python function "elog_test", line 10, in plpy.error('error') PL/Python function "elog_test" +-- Long query Log will be truncated when writing log.csv. 
+-- If a UTF-8 character is truncated in its middle, +-- the encoding ERROR will appear due to the half of a UTF-8 character, like �. +-- PR-11946 can fix it. +-- If you want more detail, please refer to ISSUE-15319 +SET client_encoding TO 'UTF8'; +CREATE FUNCTION elog_test_string_truncate() RETURNS void +AS $$ +plpy.log("1"+("床前明月光疑是地上霜举头望明月低头思故乡\n"+ +"独坐幽篁里弹琴复长啸深林人不知明月来相照\n"+ +"千山鸟飞绝万径人踪灭孤舟蓑笠翁独钓寒江雪\n"+ +"白日依山尽黄河入海流欲穷千里目更上一层楼\n"+ +"好雨知时节当春乃发生随风潜入夜润物细无声\n")*267) +$$ LANGUAGE plpythonu; +SELECT elog_test_string_truncate(); + elog_test_string_truncate +--------------------------- + +(1 row) + +SELECT logseverity FROM gp_toolkit.__gp_log_master_ext order by logtime desc limit 5; + logseverity +------------- + LOG + LOG + LOG + LOG + LOG +(5 rows) + diff --git a/src/pl/plpython/sql/plpython_test.sql b/src/pl/plpython/sql/plpython_test.sql index 3a761047a091..449c211e890c 100644 --- a/src/pl/plpython/sql/plpython_test.sql +++ b/src/pl/plpython/sql/plpython_test.sql @@ -51,3 +51,23 @@ plpy.error('error') $$ LANGUAGE plpythonu; SELECT elog_test(); + +-- Long query Log will be truncated when writing log.csv. +-- If a UTF-8 character is truncated in its middle, +-- the encoding ERROR will appear due to the half of a UTF-8 character, like �. +-- PR-11946 can fix it. 
+-- If you want more detail, please refer to ISSUE-15319 +SET client_encoding TO 'UTF8'; + +CREATE FUNCTION elog_test_string_truncate() RETURNS void +AS $$ +plpy.log("1"+("床前明月光疑是地上霜举头望明月低头思故乡\n"+ +"独坐幽篁里弹琴复长啸深林人不知明月来相照\n"+ +"千山鸟飞绝万径人踪灭孤舟蓑笠翁独钓寒江雪\n"+ +"白日依山尽黄河入海流欲穷千里目更上一层楼\n"+ +"好雨知时节当春乃发生随风潜入夜润物细无声\n")*267) +$$ LANGUAGE plpythonu; + +SELECT elog_test_string_truncate(); + +SELECT logseverity FROM gp_toolkit.__gp_log_master_ext order by logtime desc limit 5; diff --git a/src/test/isolation/expected/create_index_hot.out b/src/test/isolation/expected/create_index_hot.out index 519318e1d9f0..fb37c9a989d7 100644 --- a/src/test/isolation/expected/create_index_hot.out +++ b/src/test/isolation/expected/create_index_hot.out @@ -1,6 +1,6 @@ Parsed test spec with 2 sessions -starting permutation: s2begin s2select s1update s1createindexonc s2select s2forceindexscan s2select +starting permutation: s2begin s2select s1optimizeroff s1update s1createindexonc s2select s2forceindexscan s2select step s2begin: BEGIN ISOLATION LEVEL SERIALIZABLE; step s2select: select '#' as expected, c from hot where c = '#' union all @@ -8,6 +8,7 @@ step s2select: select '#' as expected, c from hot where c = '#' expected c # # +step s1optimizeroff: set optimizer = off; step s1update: update hot set c = '$' where c = '#'; step s1createindexonc: create index idx_c on hot (c); step s2select: select '#' as expected, c from hot where c = '#' diff --git a/src/test/isolation/specs/create_index_hot.spec b/src/test/isolation/specs/create_index_hot.spec index bb80d8e3cdec..fe224c4917d1 100644 --- a/src/test/isolation/specs/create_index_hot.spec +++ b/src/test/isolation/specs/create_index_hot.spec @@ -23,7 +23,9 @@ teardown # Update a row, and create an index on the updated column. This produces # a broken HOT chain. +#FIXME do not turn off the optimizer when ORCA stops always using Split Update. 
session "s1" +step "s1optimizeroff" { set optimizer = off; } step "s1update" { update hot set c = '$' where c = '#'; } step "s1createindexonc" { create index idx_c on hot (c); } @@ -39,6 +41,7 @@ permutation "s2begin" "s2select" + "s1optimizeroff" "s1update" "s1createindexonc" diff --git a/src/test/isolation2/expected/bitmap_index_concurrent.out b/src/test/isolation2/expected/bitmap_index_concurrent.out index 4cccce527a45..281ad02fe627 100644 --- a/src/test/isolation2/expected/bitmap_index_concurrent.out +++ b/src/test/isolation2/expected/bitmap_index_concurrent.out @@ -345,3 +345,54 @@ SELECT count(*) FROM bmupdate WHERE id >= 97 and id <= 99 and gp_segment_id = 0; 6401 (1 row) +-- Regression test, when large amount of inserts concurrent inserts happen, +-- querying the table shouldn't take along time. +-- This test is from https://github.com/greenplum-db/gpdb/issues/15389 +DROP TABLE IF EXISTS bug.let_me_out; +DROP +DROP SCHEMA IF EXISTS bug; +DROP +CREATE SCHEMA bug; +CREATE +CREATE TABLE bug.let_me_out ( date_column date NULL, int_column int4 NULL ) WITH (appendonly = true, orientation = column) distributed randomly; +CREATE + +1&: INSERT INTO bug.let_me_out(date_column, int_column) SELECT ('2017-01-01'::timestamp + random() * ('2023-08-10'::timestamp - '2017-01-01'::timestamp))::date AS date_column, id / 50000 AS int_column -- id % 700 as int_column FROM generate_series(1, 30000000) s(id); + +2&: INSERT INTO bug.let_me_out(date_column, int_column) SELECT ('2017-01-01'::timestamp + random() * ('2023-08-10'::timestamp - '2017-01-01'::timestamp))::date AS date_column, id / 50000 AS int_column -- id % 700 as int_column FROM generate_series(30000000, 50000000) s(id); + +1<: <... completed> +INSERT 30000000 +2<: <... 
completed> +INSERT 20000001 + +CREATE INDEX idx_let_me_out__date_column ON bug.let_me_out USING bitmap (date_column); +CREATE +CREATE INDEX idx_let_me_out__int_column ON bug.let_me_out USING bitmap (int_column); +CREATE +VACUUM FULL ANALYZE bug.let_me_out; +VACUUM + +SET random_page_cost = 1; +SET +-- expected to finish under 250ms, but if we go over 60000, then something really bad happened +SET statement_timeout=60000; +SET +EXPLAIN ANALYZE SELECT date_column, int_column FROM bug.let_me_out WHERE date_column in ('2023-03-19', '2023-03-08', '2023-03-13', '2023-03-29', '2023-03-20', '2023-03-28', '2023-03-23', '2023-03-04', '2023-03-05', '2023-03-18', '2023-03-14', '2023-03-06', '2023-03-15', '2023-03-31', '2023-03-11', '2023-03-21', '2023-03-24', '2023-03-30', '2023-03-26', '2023-03-03', '2023-03-22', '2023-03-01', '2023-03-12', '2023-03-17', '2023-03-27', '2023-03-07', '2023-03-16', '2023-03-10', '2023-03-25', '2023-03-09', '2023-03-02') AND int_column IN (1003,1025,1026,1033,1034,1216,1221,160,161,1780,3049,305,3051,3052,3069,3077,3083,3084,3092,3121,3122,3123,3124,3180,3182,3183,3184,3193,3225,3226,3227,3228,3234,3267,3269,3270,3271,3272,3277,3301,3302,3303,3305,3307,3308,3310,3314,3317,3318,3319,3320,3321,3343,3344,3345,3347,3348,3388,339,341,345,346,347,349,3522,3565,3606,3607,3610,3612,3613,3637,3695,3738,3739,3740,3741,3742,3764,3829,3859,3861,3864,3865,3866,3867,3870,3871,3948,3967,3969,3971,3974,3975,3976,4043,4059,4061,4062,4064,4065,4069,4070,4145,42,423,4269,43,4300,4303,4308,4311,4312,4313,4361,4449,445,446,4475,4476,4479,4480,4483,4485,4486,450,4581,4609,4610,4611,4613,4614,4685,4707,4708,4709,4710,4799,4800,4825,4831,4832,4905,4940,4941,4942,4945,4947,4948,4953,4954,4957,540,572,627,743,762,763,77,787,80,81,84,871,899,901,902,905,906); + QUERY PLAN 
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) (cost=67739.00..108455.94 rows=270079 width=8) (actual time=32.410..123.035 rows=20163 loops=1) + -> Bitmap Heap Scan on let_me_out (cost=67739.00..108455.94 rows=90027 width=8) (actual time=33.937..119.569 rows=6800 loops=1) + Recheck Cond: ((date_column = ANY ('{03-19-2023,03-08-2023,03-13-2023,03-29-2023,03-20-2023,03-28-2023,03-23-2023,03-04-2023,03-05-2023,03-18-2023,03-14-2023,03-06-2023,03-15-2023,03-31-2023,03-11-2023,03-21-2023,03-24-2023,03-30-2023,03-26-2023,03-03-2023,03-22-2023,03-01-2023,03-12-2023,03-17-2023,03-27-2023,03-07-2023,03-16-2023,03-10-2023,03-25-2023,03-09-2023,03-02-2023}'::date[])) AND (int_column = ANY 
('{1003,1025,1026,1033,1034,1216,1221,160,161,1780,3049,305,3051,3052,3069,3077,3083,3084,3092,3121,3122,3123,3124,3180,3182,3183,3184,3193,3225,3226,3227,3228,3234,3267,3269,3270,3271,3272,3277,3301,3302,3303,3305,3307,3308,3310,3314,3317,3318,3319,3320,3321,3343,3344,3345,3347,3348,3388,339,341,345,346,347,349,3522,3565,3606,3607,3610,3612,3613,3637,3695,3738,3739,3740,3741,3742,3764,3829,3859,3861,3864,3865,3866,3867,3870,3871,3948,3967,3969,3971,3974,3975,3976,4043,4059,4061,4062,4064,4065,4069,4070,4145,42,423,4269,43,4300,4303,4308,4311,4312,4313,4361,4449,445,446,4475,4476,4479,4480,4483,4485,4486,450,4581,4609,4610,4611,4613,4614,4685,4707,4708,4709,4710,4799,4800,4825,4831,4832,4905,4940,4941,4942,4945,4947,4948,4953,4954,4957,540,572,627,743,762,763,77,787,80,81,84,871,899,901,902,905,906}'::integer[]))) + -> BitmapAnd (cost=67739.00..67739.00 rows=36530 width=0) (actual time=17.288..17.288 rows=1 loops=1) + -> Bitmap Index Scan on idx_let_me_out__date_column (cost=0.00..5393.04 rows=221868 width=0) (actual time=7.834..7.834 rows=31 loops=1) + Index Cond: (date_column = ANY ('{03-19-2023,03-08-2023,03-13-2023,03-29-2023,03-20-2023,03-28-2023,03-23-2023,03-04-2023,03-05-2023,03-18-2023,03-14-2023,03-06-2023,03-15-2023,03-31-2023,03-11-2023,03-21-2023,03-24-2023,03-30-2023,03-26-2023,03-03-2023,03-22-2023,03-01-2023,03-12-2023,03-17-2023,03-27-2023,03-07-2023,03-16-2023,03-10-2023,03-25-2023,03-09-2023,03-02-2023}'::date[])) + -> Bitmap Index Scan on idx_let_me_out__int_column (cost=0.00..62210.67 rows=2744086 width=0) (actual time=9.449..9.449 rows=169 loops=1) + Index Cond: (int_column = ANY 
('{1003,1025,1026,1033,1034,1216,1221,160,161,1780,3049,305,3051,3052,3069,3077,3083,3084,3092,3121,3122,3123,3124,3180,3182,3183,3184,3193,3225,3226,3227,3228,3234,3267,3269,3270,3271,3272,3277,3301,3302,3303,3305,3307,3308,3310,3314,3317,3318,3319,3320,3321,3343,3344,3345,3347,3348,3388,339,341,345,346,347,349,3522,3565,3606,3607,3610,3612,3613,3637,3695,3738,3739,3740,3741,3742,3764,3829,3859,3861,3864,3865,3866,3867,3870,3871,3948,3967,3969,3971,3974,3975,3976,4043,4059,4061,4062,4064,4065,4069,4070,4145,42,423,4269,43,4300,4303,4308,4311,4312,4313,4361,4449,445,446,4475,4476,4479,4480,4483,4485,4486,450,4581,4609,4610,4611,4613,4614,4685,4707,4708,4709,4710,4799,4800,4825,4831,4832,4905,4940,4941,4942,4945,4947,4948,4953,4954,4957,540,572,627,743,762,763,77,787,80,81,84,871,899,901,902,905,906}'::integer[])) + Planning time: 11.073 ms + (slice0) Executor memory: 119K bytes. + (slice1) Executor memory: 49521K bytes avg x 3 workers, 49521K bytes max (seg0). + Memory used: 128000kB + Optimizer: Postgres query optimizer + Execution time: 126.450 ms +(14 rows) diff --git a/src/test/isolation2/expected/guc_gp.out b/src/test/isolation2/expected/guc_gp.out new file mode 100644 index 000000000000..70b788cd6498 --- /dev/null +++ b/src/test/isolation2/expected/guc_gp.out @@ -0,0 +1,61 @@ +-- case 1: test gp_detect_data_correctness +create table data_correctness_detect(a int, b int); +CREATE +create table data_correctness_detect_randomly(a int, b int) distributed randomly; +CREATE +create table data_correctness_detect_replicated(a int, b int) distributed replicated; +CREATE + +set gp_detect_data_correctness = on; +SET +-- should no data insert +insert into data_correctness_detect select i, i from generate_series(1, 100) i; +INSERT 0 +select count(*) from data_correctness_detect; + count +------- + 0 +(1 row) +insert into data_correctness_detect_randomly select i, i from generate_series(1, 100) i; +INSERT 0 +select count(*) from data_correctness_detect_randomly; + count 
+------- + 0 +(1 row) +insert into data_correctness_detect_replicated select i, i from generate_series(1, 100) i; +INSERT 0 +select count(*) from data_correctness_detect_replicated; + count +------- + 0 +(1 row) +set gp_detect_data_correctness = off; +SET + +-- insert some data that not belongs to it +1U: insert into data_correctness_detect select i, i from generate_series(1, 100) i; +INSERT 100 +1U: insert into data_correctness_detect_randomly select i, i from generate_series(1, 100) i; +INSERT 100 +1U: insert into data_correctness_detect_replicated select i, i from generate_series(1, 100) i; +INSERT 100 +set gp_detect_data_correctness = on; +SET +insert into data_correctness_detect select * from data_correctness_detect; +ERROR: trying to insert row into wrong segment (seg1 127.0.1.1:6003 pid=3027104) +insert into data_correctness_detect select * from data_correctness_detect_randomly; +INSERT 0 +insert into data_correctness_detect select * from data_correctness_detect_replicated; +INSERT 0 + +-- clean up +set gp_detect_data_correctness = off; +SET +drop table data_correctness_detect; +DROP +drop table data_correctness_detect_randomly; +DROP +drop table data_correctness_detect_replicated; +DROP + diff --git a/src/test/isolation2/expected/ic_proxy_listen_failed.out b/src/test/isolation2/expected/ic_proxy_listen_failed.out new file mode 100644 index 000000000000..74b914a4deb0 --- /dev/null +++ b/src/test/isolation2/expected/ic_proxy_listen_failed.out @@ -0,0 +1,64 @@ +-- Test case for the scenario which ic-proxy peer listener port has been occupied + +-- start_matchsubs +-- m/ic_tcp.c:\d+/ +-- s/ic_tcp.c:\d+/ic_tcp.c:LINE/ +-- end_matchsubs + +1:create table PR_16438 (i int); +CREATE +1:insert into PR_16438 select generate_series(1,100); +INSERT 100 +1q: ... 
+ +-- get one port and occupy it (start_py_httpserver.sh), then restart cluster +!\retcode ic_proxy_port=`psql postgres -Atc "show gp_interconnect_proxy_addresses;" | awk -F ',' '{print $1}' | awk -F ':' '{print $4}'` && gpstop -ai > /dev/null && ./script/start_py_httpserver.sh $ic_proxy_port; +-- start_ignore +started a http server + +-- end_ignore +(exited with code 0) +!\retcode sleep 2 && gpstart -a > /dev/null; +-- start_ignore + +-- end_ignore +(exited with code 0) + +-- this output is hard to match, let's ignore it +-- start_ignore +2&:select count(*) from PR_16438; +FAILED: Forked command is not blocking; got output: ERROR: Failed to setup ic_proxy interconnect +DETAIL: The ic_proxy process failed to bind or listen. +HINT: Please check the server log for related WARNING messages. +2<: <... completed> +FAILED: Execution failed +2q: ... +-- end_ignore + +-- execute a query (should failed) +3:select count(*) from PR_16438; +ERROR: Failed to setup ic_proxy interconnect +DETAIL: The ic_proxy process failed to bind or listen. +HINT: Please check the server log for related WARNING messages. 
+ +-- kill the script to release port and execute query again (should successfully) +-- Note: different from 7x here, we have to restart cluster (no need in 7x) +-- because 6x's icproxy code doesn't align with 7x: https://github.com/greenplum-db/gpdb/issues/14485 +!\retcode ps aux | grep SimpleHTTPServer | grep -v grep | awk '{print $2}' | xargs kill; +-- start_ignore + +-- end_ignore +(exited with code 0) +!\retcode sleep 2 && gpstop -ari > /dev/null; +-- start_ignore + +-- end_ignore +(exited with code 0) + +4:select count(*) from PR_16438; + count +------- + 100 +(1 row) +4:drop table PR_16438; +DROP diff --git a/src/test/isolation2/expected/misc.out b/src/test/isolation2/expected/misc.out index 0fc8dff0dee2..e2edac1654d0 100644 --- a/src/test/isolation2/expected/misc.out +++ b/src/test/isolation2/expected/misc.out @@ -51,3 +51,22 @@ CREATE -- 0U: create table utilitymode_pt_lt_tab (col1 int, col2 decimal) distributed by (col1) partition by list(col2) (partition part1 values(1)); ERROR: cannot create partition table in utility mode + +-- +-- gp_check_orphaned_files should not be running with concurrent transaction (even idle) +-- +-- use a different database to do the test, otherwise we might be reporting tons +-- of orphaned files produced by the many intential PANICs/restarts in the isolation2 tests. +create database check_orphaned_db; +CREATE +1:@db_name check_orphaned_db: create extension gp_check_functions; +CREATE +1:@db_name check_orphaned_db: begin; +BEGIN +2:@db_name check_orphaned_db: select * from gp_check_orphaned_files; +ERROR: There is a client session running on one or more segment. Aborting... +1q: ... +2q: ... 
+ +drop database check_orphaned_db; +DROP diff --git a/src/test/isolation2/input/parallel_retrieve_cursor/status_check.source b/src/test/isolation2/input/parallel_retrieve_cursor/status_check.source index 92719a0c9ac9..c9d55e859c25 100644 --- a/src/test/isolation2/input/parallel_retrieve_cursor/status_check.source +++ b/src/test/isolation2/input/parallel_retrieve_cursor/status_check.source @@ -256,3 +256,15 @@ insert into t1 select generate_series(1,100); 2: CLOSE c8; 2: END; +---------- Test9: Test parallel retrieve cursor auto-check +1: drop table if exists t1; +1: create table t1(a int, b int); +1: insert into t1 values (generate_series(1,100000), 1); +1: insert into t1 values (-1, 1); +1: BEGIN; +1: DECLARE c9 PARALLEL RETRIEVE CURSOR FOR select count(*) from t1 group by sqrt(a); select count() from gp_get_endpoints(); +-- GP_PARALLEL_RETRIEVE_CURSOR_CHECK_TIMEOUT is 10s, we sleep 12 to check all QEs are already finished. +1: ! sleep 12; +1: SELECT endpointname,auth_token,hostname,port,state FROM gp_get_endpoints() WHERE cursorname='c9'; +1: rollback; +1q: diff --git a/src/test/isolation2/isolation2_ic_proxy_schedule b/src/test/isolation2/isolation2_ic_proxy_schedule index 4b1255784407..3187f0184566 100644 --- a/src/test/isolation2/isolation2_ic_proxy_schedule +++ b/src/test/isolation2/isolation2_ic_proxy_schedule @@ -7,3 +7,6 @@ test: tcp_ic_teardown # test TCP proxy peer shutdown test: ic_proxy_peer_shutdown + +# test ic-proxy listen failed +test: ic_proxy_listen_failed diff --git a/src/test/isolation2/output/parallel_retrieve_cursor/status_check.source b/src/test/isolation2/output/parallel_retrieve_cursor/status_check.source index 393822f7cf30..49788ae5f841 100644 --- a/src/test/isolation2/output/parallel_retrieve_cursor/status_check.source +++ b/src/test/isolation2/output/parallel_retrieve_cursor/status_check.source @@ -1392,3 +1392,29 @@ CLOSE 2: END; END +---------- Test9: Test parallel retrieve cursor auto-check +1: drop table if exists t1; +DROP +1: 
create table t1(a int, b int); +CREATE +1: insert into t1 values (generate_series(1,100000), 1); +INSERT 100000 +1: insert into t1 values (-1, 1); +INSERT 1 +1: BEGIN; +BEGIN +1: DECLARE c9 PARALLEL RETRIEVE CURSOR FOR select count(*) from t1 group by sqrt(a); select count() from gp_get_endpoints(); + count +------- + 3 +(1 row) +-- GP_PARALLEL_RETRIEVE_CURSOR_CHECK_TIMEOUT is 10s, we sleep 12 to check all QEs are already finished. +1: ! sleep 12; + +1: SELECT endpointname,auth_token,hostname,port,state FROM gp_get_endpoints() WHERE cursorname='c9'; + endpointname | auth_token | hostname | port | state +--------------+------------+----------+------+------- +(0 rows) +1: rollback; +ERROR: cannot take square root of a negative number (seg2 slice1 127.0.1.1:6004 pid=83657) +1q: ... diff --git a/src/test/isolation2/script/start_py_httpserver.sh b/src/test/isolation2/script/start_py_httpserver.sh new file mode 100755 index 000000000000..31457d0ff220 --- /dev/null +++ b/src/test/isolation2/script/start_py_httpserver.sh @@ -0,0 +1,13 @@ +#!/bin/bash +# start a python http server (port is $1) in background + +which python2 > /dev/null +if [ $? -eq 0 ] +then + python2 -m SimpleHTTPServer $1 >/dev/null 2>&1 & + echo "started a http server" + exit 0 +fi + +echo "no python found" +exit 1 diff --git a/src/test/isolation2/sql/bitmap_index_concurrent.sql b/src/test/isolation2/sql/bitmap_index_concurrent.sql index 0d236da5076a..9db6ded7f095 100644 --- a/src/test/isolation2/sql/bitmap_index_concurrent.sql +++ b/src/test/isolation2/sql/bitmap_index_concurrent.sql @@ -184,3 +184,46 @@ SELECT gp_inject_fault('after_read_one_bitmap_idx_page', 'reset', dbid) FROM gp_ -- Let's check the total tuple count after the test. SELECT count(*) FROM bmupdate WHERE id >= 97 and id <= 99 and gp_segment_id = 0; +-- Regression test, when large amount of inserts concurrent inserts happen, +-- querying the table shouldn't take along time. 
+-- This test is from https://github.com/greenplum-db/gpdb/issues/15389 +DROP TABLE IF EXISTS bug.let_me_out; +DROP SCHEMA IF EXISTS bug; +CREATE SCHEMA bug; +CREATE TABLE bug.let_me_out +( + date_column date NULL, + int_column int4 NULL +) +WITH (appendonly = true, orientation = column) +distributed randomly; + +1&: INSERT INTO bug.let_me_out(date_column, int_column) + SELECT ('2017-01-01'::timestamp + random() * ('2023-08-10'::timestamp - '2017-01-01'::timestamp))::date AS date_column, + id / 50000 AS int_column + -- id % 700 as int_column + FROM generate_series(1, 30000000) s(id); + +2&: INSERT INTO bug.let_me_out(date_column, int_column) + SELECT ('2017-01-01'::timestamp + random() * ('2023-08-10'::timestamp - '2017-01-01'::timestamp))::date AS date_column, + id / 50000 AS int_column + -- id % 700 as int_column + FROM generate_series(30000000, 50000000) s(id); + +1<: +2<: + +CREATE INDEX idx_let_me_out__date_column ON bug.let_me_out USING bitmap (date_column); +CREATE INDEX idx_let_me_out__int_column ON bug.let_me_out USING bitmap (int_column); +VACUUM FULL ANALYZE bug.let_me_out; + +SET random_page_cost = 1; +-- expected to finish under 250ms, but if we go over 60000, then something really bad happened +SET statement_timeout=60000; +EXPLAIN ANALYZE +SELECT date_column, + int_column +FROM bug.let_me_out +WHERE date_column in ('2023-03-19', '2023-03-08', '2023-03-13', '2023-03-29', '2023-03-20', '2023-03-28', '2023-03-23', '2023-03-04', '2023-03-05', '2023-03-18', '2023-03-14', '2023-03-06', '2023-03-15', '2023-03-31', '2023-03-11', '2023-03-21', '2023-03-24', '2023-03-30', '2023-03-26', '2023-03-03', '2023-03-22', '2023-03-01', '2023-03-12', '2023-03-17', '2023-03-27', '2023-03-07', '2023-03-16', '2023-03-10', '2023-03-25', '2023-03-09', '2023-03-02') +AND +int_column IN 
(1003,1025,1026,1033,1034,1216,1221,160,161,1780,3049,305,3051,3052,3069,3077,3083,3084,3092,3121,3122,3123,3124,3180,3182,3183,3184,3193,3225,3226,3227,3228,3234,3267,3269,3270,3271,3272,3277,3301,3302,3303,3305,3307,3308,3310,3314,3317,3318,3319,3320,3321,3343,3344,3345,3347,3348,3388,339,341,345,346,347,349,3522,3565,3606,3607,3610,3612,3613,3637,3695,3738,3739,3740,3741,3742,3764,3829,3859,3861,3864,3865,3866,3867,3870,3871,3948,3967,3969,3971,3974,3975,3976,4043,4059,4061,4062,4064,4065,4069,4070,4145,42,423,4269,43,4300,4303,4308,4311,4312,4313,4361,4449,445,446,4475,4476,4479,4480,4483,4485,4486,450,4581,4609,4610,4611,4613,4614,4685,4707,4708,4709,4710,4799,4800,4825,4831,4832,4905,4940,4941,4942,4945,4947,4948,4953,4954,4957,540,572,627,743,762,763,77,787,80,81,84,871,899,901,902,905,906); diff --git a/src/test/isolation2/sql/guc_gp.sql b/src/test/isolation2/sql/guc_gp.sql new file mode 100644 index 000000000000..2e5a560704ae --- /dev/null +++ b/src/test/isolation2/sql/guc_gp.sql @@ -0,0 +1,30 @@ +-- case 1: test gp_detect_data_correctness +create table data_correctness_detect(a int, b int); +create table data_correctness_detect_randomly(a int, b int) distributed randomly; +create table data_correctness_detect_replicated(a int, b int) distributed replicated; + +set gp_detect_data_correctness = on; +-- should no data insert +insert into data_correctness_detect select i, i from generate_series(1, 100) i; +select count(*) from data_correctness_detect; +insert into data_correctness_detect_randomly select i, i from generate_series(1, 100) i; +select count(*) from data_correctness_detect_randomly; +insert into data_correctness_detect_replicated select i, i from generate_series(1, 100) i; +select count(*) from data_correctness_detect_replicated; +set gp_detect_data_correctness = off; + +-- insert some data that not belongs to it +1U: insert into data_correctness_detect select i, i from generate_series(1, 100) i; +1U: insert into data_correctness_detect_randomly 
select i, i from generate_series(1, 100) i; +1U: insert into data_correctness_detect_replicated select i, i from generate_series(1, 100) i; +set gp_detect_data_correctness = on; +insert into data_correctness_detect select * from data_correctness_detect; +insert into data_correctness_detect select * from data_correctness_detect_randomly; +insert into data_correctness_detect select * from data_correctness_detect_replicated; + +-- clean up +set gp_detect_data_correctness = off; +drop table data_correctness_detect; +drop table data_correctness_detect_randomly; +drop table data_correctness_detect_replicated; + diff --git a/src/test/isolation2/sql/ic_proxy_listen_failed.sql b/src/test/isolation2/sql/ic_proxy_listen_failed.sql new file mode 100644 index 000000000000..8ac255879cf3 --- /dev/null +++ b/src/test/isolation2/sql/ic_proxy_listen_failed.sql @@ -0,0 +1,33 @@ +-- Test case for the scenario which ic-proxy peer listener port has been occupied + +-- start_matchsubs +-- m/ic_tcp.c:\d+/ +-- s/ic_tcp.c:\d+/ic_tcp.c:LINE/ +-- end_matchsubs + +1:create table PR_16438 (i int); +1:insert into PR_16438 select generate_series(1,100); +1q: + +-- get one port and occupy it (start_py_httpserver.sh), then restart cluster +!\retcode ic_proxy_port=`psql postgres -Atc "show gp_interconnect_proxy_addresses;" | awk -F ',' '{print $1}' | awk -F ':' '{print $4}'` && gpstop -ai > /dev/null && ./script/start_py_httpserver.sh $ic_proxy_port; +!\retcode sleep 2 && gpstart -a > /dev/null; + +-- this output is hard to match, let's ignore it +-- start_ignore +2&:select count(*) from PR_16438; +2<: +2q: +-- end_ignore + +-- execute a query (should failed) +3:select count(*) from PR_16438; + +-- kill the script to release port and execute query again (should successfully) +-- Note: different from 7x here, we have to restart cluster (no need in 7x) +-- because 6x's icproxy code doesn't align with 7x: https://github.com/greenplum-db/gpdb/issues/14485 +!\retcode ps aux | grep SimpleHTTPServer | grep 
-v grep | awk '{print $2}' | xargs kill; +!\retcode sleep 2 && gpstop -ari > /dev/null; + +4:select count(*) from PR_16438; +4:drop table PR_16438; diff --git a/src/test/isolation2/sql/misc.sql b/src/test/isolation2/sql/misc.sql index 9c02970b35fc..248a47fb5e34 100644 --- a/src/test/isolation2/sql/misc.sql +++ b/src/test/isolation2/sql/misc.sql @@ -38,3 +38,17 @@ -- 0U: create table utilitymode_pt_lt_tab (col1 int, col2 decimal) distributed by (col1) partition by list(col2) (partition part1 values(1)); + +-- +-- gp_check_orphaned_files should not be running with concurrent transaction (even idle) +-- +-- use a different database to do the test, otherwise we might be reporting tons +-- of orphaned files produced by the many intential PANICs/restarts in the isolation2 tests. +create database check_orphaned_db; +1:@db_name check_orphaned_db: create extension gp_check_functions; +1:@db_name check_orphaned_db: begin; +2:@db_name check_orphaned_db: select * from gp_check_orphaned_files; +1q: +2q: + +drop database check_orphaned_db; diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index 7a64a7663267..57f0cf351c24 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -609,6 +609,9 @@ Restoring WAL segments from archives using restore_command can be enabled by passing the keyword parameter has_restoring => 1. This is disabled by default. +If has_restoring is used, standby mode is used by default. To use +recovery mode instead, pass the keyword parameter standby => 0. + The backup is copied, leaving the original unmodified. pg_hba.conf is unconditionally set to enable replication connections. 
@@ -625,6 +628,7 @@ sub init_from_backup $params{has_streaming} = 0 unless defined $params{has_streaming}; $params{has_restoring} = 0 unless defined $params{has_restoring}; + $params{standby} = 1 unless defined $params{standby}; print "# Initializing node \"$node_name\" from backup \"$backup_name\" of node \"$root_name\"\n"; @@ -655,7 +659,7 @@ port = $port "unix_socket_directories = '$host'"); } $self->enable_streaming($root_node) if $params{has_streaming}; - $self->enable_restoring($root_node) if $params{has_restoring}; + $self->enable_restoring($root_node, $params{standby}) if $params{has_restoring}; } =pod @@ -849,7 +853,7 @@ standby_mode=on # Internal routine to enable archive recovery command on a standby node sub enable_restoring { - my ($self, $root_node) = @_; + my ($self, $root_node, $standby) = @_; my $path = TestLib::perl2host($root_node->archive_dir); my $name = $self->name; @@ -870,8 +874,9 @@ sub enable_restoring $self->append_conf( 'recovery.conf', qq( restore_command = '$copy_command' -standby_mode = on +standby_mode = $standby )); + return; } # Internal routine to enable archiving diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl index 2e967b6cbe4f..659781245db8 100644 --- a/src/test/recovery/t/003_recovery_targets.pl +++ b/src/test/recovery/t/003_recovery_targets.pl @@ -3,7 +3,8 @@ use warnings; use PostgresNode; use TestLib; -use Test::More tests =>6; +use Test::More tests =>7; +use Time::HiRes qw(usleep); # Create and test a standby from given backup, with a certain recovery target. 
# Choose $until_lsn later than the transaction commit that causes the row @@ -128,3 +129,24 @@ sub test_recovery_standby "recovery_target_time = '$recovery_time'"); test_recovery_standby('multiple conflicting settings', 'standby_6', $node_master, \@recovery_params, "3000", $lsn3); + +# Check behavior when recovery ends before target is reached + +my $node_standby = get_new_node('standby_8'); +$node_standby->init_from_backup($node_master, 'my_backup', + has_restoring => 1, standby => 0); +$node_standby->append_conf('recovery.conf', + "recovery_target_name = 'does_not_exist'"); +run_log(['pg_ctl', '-w', '-D', $node_standby->data_dir, '-l', + $node_standby->logfile, '-o', "-c gp_role=utility --gp_dbid=$node_standby->{_dbid} --gp_contentid=0", + 'start']); + +# wait up to 180s for postgres to terminate +foreach my $i (0..1800) +{ + last if ! -f $node_standby->data_dir . '/postmaster.pid'; + usleep(100_000); +} +my $logfile = slurp_file($node_standby->logfile()); +ok($logfile =~ qr/FATAL: recovery ended before configured recovery target was reached/, + 'recovery end before target reached is a fatal error'); diff --git a/src/test/regress/expected/.gitignore b/src/test/regress/expected/.gitignore index a180015d21dd..cbfe088dca04 100644 --- a/src/test/regress/expected/.gitignore +++ b/src/test/regress/expected/.gitignore @@ -14,6 +14,7 @@ dispatch.out external_table.out filespace.out gpcopy.out +gp_check_files.out gptokencheck.out /gp_transactions.out /gp_tablespace_with_faults.out diff --git a/src/test/regress/expected/bfv_olap.out b/src/test/regress/expected/bfv_olap.out index cc7cb31136e7..4e4512653e8d 100644 --- a/src/test/regress/expected/bfv_olap.out +++ b/src/test/regress/expected/bfv_olap.out @@ -638,7 +638,6 @@ select * from (select sum(a.salary) over(), count(*) 2100 | 1 (2 rows) --- this query currently falls back, needs to be fixed select (select rn from (select row_number() over () as rn, name from t1_github_issue_10143 where code = a.code diff --git 
a/src/test/regress/expected/bfv_olap_optimizer.out b/src/test/regress/expected/bfv_olap_optimizer.out index bf4134c10e26..068ef921da6f 100644 --- a/src/test/regress/expected/bfv_olap_optimizer.out +++ b/src/test/regress/expected/bfv_olap_optimizer.out @@ -638,7 +638,6 @@ select * from (select sum(a.salary) over(), count(*) 2100 | 1 (2 rows) --- this query currently falls back, needs to be fixed select (select rn from (select row_number() over () as rn, name from t1_github_issue_10143 where code = a.code diff --git a/src/test/regress/expected/bfv_planner.out b/src/test/regress/expected/bfv_planner.out index 6e4f499ae099..d648bcbab34c 100644 --- a/src/test/regress/expected/bfv_planner.out +++ b/src/test/regress/expected/bfv_planner.out @@ -566,6 +566,158 @@ explain (costs off) select * from t_hashdist cross join (select * from generate_ Optimizer: Postgres query optimizer (8 rows) +set gp_cte_sharing = on; +-- ensure that the volatile function is executed on one segment if it is in the CTE target list +explain (costs off, verbose) with cte as ( + select a * random() as a from generate_series(1, 5) a +) +select * from cte join (select * from t_hashdist join cte using(a)) b using(a); + QUERY PLAN +--------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice3; segments: 3) + Output: share0_ref1.a, b.b, b.c + -> Hash Join + Output: share0_ref1.a, b.b, b.c + Hash Cond: (b.a = share0_ref1.a) + -> Subquery Scan on b + Output: b.b, b.c, b.a + -> Hash Join + Output: share0_ref2.a, t_hashdist.b, t_hashdist.c + Hash Cond: ((t_hashdist.a)::double precision = share0_ref2.a) + -> Seq Scan on public.t_hashdist + Output: t_hashdist.b, t_hashdist.c, t_hashdist.a + -> Hash + Output: share0_ref2.a + -> Broadcast Motion 1:3 (slice1; segments: 1) + Output: share0_ref2.a + -> Shared Scan (share slice:id 1:0) + Output: share0_ref2.a + -> Hash + Output: share0_ref1.a + -> Broadcast Motion 1:3 (slice2; segments: 1) + Output: share0_ref1.a 
+ -> Shared Scan (share slice:id 2:0) + Output: share0_ref1.a + -> Materialize + Output: (((a.a)::double precision * random())) + -> Function Scan on pg_catalog.generate_series a + Output: ((a.a)::double precision * random()) + Function Call: generate_series(1, 5) + Optimizer: Postgres query optimizer + Settings: enable_bitmapscan=off, enable_seqscan=off, gp_cte_sharing=on, optimizer=off +(31 rows) + +set gp_cte_sharing = off; +explain (costs off, verbose) with cte as ( + select a, a * random() from generate_series(1, 5) a +) +select * from cte join t_hashdist using(a); + QUERY PLAN +----------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice2; segments: 3) + Output: a.a, (((a.a)::double precision * random())), t_hashdist.b, t_hashdist.c + -> Hash Join + Output: a.a, (((a.a)::double precision * random())), t_hashdist.b, t_hashdist.c + Hash Cond: (t_hashdist.a = a.a) + -> Seq Scan on public.t_hashdist + Output: t_hashdist.b, t_hashdist.c, t_hashdist.a + -> Hash + Output: a.a, (((a.a)::double precision * random())) + -> Redistribute Motion 1:3 (slice1; segments: 1) + Output: a.a, (((a.a)::double precision * random())) + Hash Key: a.a + -> Function Scan on pg_catalog.generate_series a + Output: a.a, ((a.a)::double precision * random()) + Function Call: generate_series(1, 5) + Optimizer: Postgres query optimizer + Settings: enable_bitmapscan=off, enable_seqscan=off, gp_cte_sharing=off, optimizer=off +(17 rows) + +reset gp_cte_sharing; +-- ensure that the volatile function is executed on one segment if it is in the union target list +explain (costs off, verbose) select * from ( + select random() as a from generate_series(1, 5) + union + select random() as a from generate_series(1, 5) +) +a join t_hashdist on a.a = t_hashdist.a; + QUERY PLAN +--------------------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice2; segments: 3) + Output: (random()), 
t_hashdist.a, t_hashdist.b, t_hashdist.c + -> Hash Join + Output: (random()), t_hashdist.a, t_hashdist.b, t_hashdist.c + Hash Cond: ((t_hashdist.a)::double precision = (random())) + -> Seq Scan on public.t_hashdist + Output: t_hashdist.a, t_hashdist.b, t_hashdist.c + -> Hash + Output: (random()) + -> Broadcast Motion 1:3 (slice1; segments: 1) + Output: (random()) + -> HashAggregate + Output: (random()) + Group Key: (random()) + -> Append + -> Function Scan on pg_catalog.generate_series + Output: random() + Function Call: generate_series(1, 5) + -> Function Scan on pg_catalog.generate_series generate_series_1 + Output: random() + Function Call: generate_series(1, 5) + Optimizer: Postgres query optimizer + Settings: enable_bitmapscan=off, enable_seqscan=off, optimizer=off +(23 rows) + +-- ensure that the volatile function is executed on one segment if it is in target list of subplan of multiset function +explain (costs off, verbose) select * from ( + SELECT count(*) as a FROM anytable_out( TABLE( SELECT random()::int from generate_series(1, 5) a ) ) +) a join t_hashdist using(a); + QUERY PLAN +----------------------------------------------------------------------------------- + Gather Motion 3:1 (slice2; segments: 3) + Output: (count(*)), t_hashdist.b, t_hashdist.c + -> Hash Join + Output: (count(*)), t_hashdist.b, t_hashdist.c + Hash Cond: (t_hashdist.a = (count(*))) + -> Seq Scan on public.t_hashdist + Output: t_hashdist.b, t_hashdist.c, t_hashdist.a + -> Hash + Output: (count(*)) + -> Redistribute Motion 1:3 (slice1; segments: 1) + Output: (count(*)) + Hash Key: (count(*)) + -> Aggregate + Output: count(*) + -> Table Function Scan on pg_catalog.anytable_out + Output: anytable_out + -> Function Scan on pg_catalog.generate_series a + Output: (random())::integer + Function Call: generate_series(1, 5) + Optimizer: Postgres query optimizer + Settings: enable_bitmapscan=off, enable_seqscan=off, optimizer=off +(21 rows) + +-- if there is a volatile function in the target 
list of a plan with the locus type +-- General or Segment General, then such a plan should be executed on single +-- segment, since it is assumed that nodes with such locus types will give the same +-- result on all segments, which is impossible for a volatile function. +-- start_ignore +drop table if exists d; +-- end_ignore +create table d (b int, a int default 1) distributed by (b); +insert into d select * from generate_series(0, 20) j; +-- change distribution without reorganize +alter table d set distributed randomly; +with cte as ( + select a as a, a * random() as rand from generate_series(0, 3)a +) +select count(distinct(rand)) from cte join d on cte.a = d.a; + count +------- + 1 +(1 row) + +drop table d; -- CTAS on general locus into replicated table create temp SEQUENCE test_seq; explain (costs off) create table t_rep as select nextval('test_seq') from (select generate_series(1,10)) t1 distributed replicated; diff --git a/src/test/regress/expected/gp_locale.out b/src/test/regress/expected/gp_locale.out new file mode 100644 index 000000000000..0d916a93d70c --- /dev/null +++ b/src/test/regress/expected/gp_locale.out @@ -0,0 +1,90 @@ +-- ORCA uses functions (e.g. vswprintf) to translation to wide character +-- format. But those libraries may fail if the current locale cannot handle the +-- character set. This test checks that even when those libraries fail, ORCA is +-- still able to generate plans. 
+-- +-- Create a database that sets the minimum locale +-- +DROP DATABASE IF EXISTS test_locale; +CREATE DATABASE test_locale WITH LC_COLLATE='C' LC_CTYPE='C' TEMPLATE=template0; +\c test_locale +-- +-- drop/add/remove columns +-- +CREATE TABLE hi_안녕세계 (a int, 안녕세계1 text, 안녕세계2 text, 안녕세계3 text) DISTRIBUTED BY (a); +ALTER TABLE hi_안녕세계 DROP COLUMN 안녕세계2; +ALTER TABLE hi_안녕세계 ADD COLUMN 안녕세계2_ADD_COLUMN text; +ALTER TABLE hi_안녕세계 RENAME COLUMN 안녕세계3 TO こんにちわ3; +INSERT INTO hi_안녕세계 VALUES(1, '안녕세계1 first', '안녕세2 first', '안녕세계3 first'); +INSERT INTO hi_안녕세계 VALUES(42, '안녕세계1 second', '안녕세2 second', '안녕세계3 second'); +-- +-- Try various queries containing multibyte character set and check the column +-- name output +-- +SET optimizer_trace_fallback=on; +-- DELETE +DELETE FROM hi_안녕세계 WHERE a=42; +-- UPDATE +UPDATE hi_안녕세계 SET 안녕세계1='안녕세계1 first UPDATE' WHERE 안녕세계1='안녕세계1 first'; +-- SELECT +SELECT * FROM hi_안녕세계; + a | 안녕세계1 | こんにちわ3 | 안녕세계2_add_column +---+------------------------+---------------+---------------------- + 1 | 안녕세계1 first UPDATE | 안녕세2 first | 안녕세계3 first +(1 row) + +SELECT 안녕세계1 || こんにちわ3 FROM hi_안녕세계; + ?column? 
+------------------------------------- + 안녕세계1 first UPDATE안녕세2 first +(1 row) + +-- SELECT ALIAS +SELECT 안녕세계1 AS 안녕세계1_Alias FROM hi_안녕세계; + 안녕세계1_alias +------------------------ + 안녕세계1 first UPDATE +(1 row) + +-- SUBQUERY +SELECT * FROM (SELECT 안녕세계1 FROM hi_안녕세계) t; + 안녕세계1 +------------------------ + 안녕세계1 first UPDATE +(1 row) + +SELECT (SELECT こんにちわ3 FROM hi_안녕세계) FROM (SELECT 1) AS q; + こんにちわ3 +--------------- + 안녕세2 first +(1 row) + +SELECT (SELECT (SELECT こんにちわ3 FROM hi_안녕세계) FROM hi_안녕세계) FROM (SELECT 1) AS q; + こんにちわ3 +--------------- + 안녕세2 first +(1 row) + +-- CTE +WITH cte AS +(SELECT 안녕세계1, こんにちわ3 FROM hi_안녕세계) SELECT * FROM cte WHERE 안녕세계1 LIKE '안녕세계1%'; + 안녕세계1 | こんにちわ3 +------------------------+--------------- + 안녕세계1 first UPDATE | 안녕세2 first +(1 row) + +WITH cte(안녕세계x, こんにちわx) AS +(SELECT 안녕세계1, こんにちわ3 FROM hi_안녕세계) SELECT * FROM cte WHERE 안녕세계x LIKE '안녕세계1%'; + 안녕세계x | こんにちわx +------------------------+--------------- + 안녕세계1 first UPDATE | 안녕세2 first +(1 row) + +-- JOIN +SELECT * FROM hi_안녕세계 hi_안녕세계1, hi_안녕세계 hi_안녕세계2 WHERE hi_안녕세계1.안녕세계1 LIKE '%UPDATE'; + a | 안녕세계1 | こんにちわ3 | 안녕세계2_add_column | a | 안녕세계1 | こんにちわ3 | 안녕세계2_add_column +---+------------------------+---------------+----------------------+---+------------------------+---------------+---------------------- + 1 | 안녕세계1 first UPDATE | 안녕세2 first | 안녕세계3 first | 1 | 안녕세계1 first UPDATE | 안녕세2 first | 안녕세계3 first +(1 row) + +RESET optimizer_trace_fallback; diff --git a/src/test/regress/expected/matview.out b/src/test/regress/expected/matview.out index 9227a113dc05..f1f67c78434c 100644 --- a/src/test/regress/expected/matview.out +++ b/src/test/regress/expected/matview.out @@ -649,3 +649,22 @@ distributed randomly; refresh materialized view mat_view_github_issue_11956; drop materialized view mat_view_github_issue_11956; drop table t_github_issue_11956; +-- test REFRESH MATERIALIZED VIEW on AO table with index +-- more details could be found at 
https://github.com/greenplum-db/gpdb/issues/16447 +CREATE TABLE base_table (idn character varying(10) NOT NULL); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'idn' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +INSERT INTO base_table select i from generate_series(1, 5000) i; +CREATE MATERIALIZED VIEW base_view WITH (APPENDONLY=true) AS SELECT tt1.idn AS idn_ban FROM base_table tt1; +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'idn_ban' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. +CREATE INDEX test_id1 on base_view using btree(idn_ban); +REFRESH MATERIALIZED VIEW base_view ; +SELECT * FROM base_view where idn_ban = '10'; + idn_ban +--------- + 10 +(1 row) + +DROP MATERIALIZED VIEW base_view; +DROP TABLE base_table; diff --git a/src/test/regress/expected/matview_optimizer.out b/src/test/regress/expected/matview_optimizer.out index af498f2ef712..a4efb1caa6fd 100644 --- a/src/test/regress/expected/matview_optimizer.out +++ b/src/test/regress/expected/matview_optimizer.out @@ -650,3 +650,21 @@ distributed randomly; refresh materialized view mat_view_github_issue_11956; drop materialized view mat_view_github_issue_11956; drop table t_github_issue_11956; +-- test REFRESH MATERIALIZED VIEW on AO table with index +-- more details could be found at https://github.com/greenplum-db/gpdb/issues/16447 +CREATE TABLE base_table (idn character varying(10) NOT NULL); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'idn' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. 
Make sure column(s) chosen are the optimal data distribution key to minimize skew. +INSERT INTO base_table select i from generate_series(1, 5000) i; +CREATE MATERIALIZED VIEW base_view WITH (APPENDONLY=true) AS SELECT tt1.idn AS idn_ban FROM base_table tt1; +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy entry. +CREATE INDEX test_id1 on base_view using btree(idn_ban); +REFRESH MATERIALIZED VIEW base_view ; +SELECT * FROM base_view where idn_ban = '10'; + idn_ban +--------- + 10 +(1 row) + +DROP MATERIALIZED VIEW base_view; +DROP TABLE base_table; diff --git a/src/test/regress/expected/qp_dropped_cols.out b/src/test/regress/expected/qp_dropped_cols.out index c0aa54e30f20..bdfc308ae2db 100644 --- a/src/test/regress/expected/qp_dropped_cols.out +++ b/src/test/regress/expected/qp_dropped_cols.out @@ -16360,3 +16360,302 @@ DELETE FROM dist_key_dropped_pt WHERE b=6; -- the tables, or the pg_upgrade test fails. set client_min_messages='warning'; drop schema qp_dropped_cols cascade; +-- Test modifying DML on leaf partition when parent has dropped columns and +-- the partition has not. Ensure that DML commands pass without execution +-- errors and produce valid results. +RESET search_path; +-- start_ignore +DROP TABLE IF EXISTS t_part_dropped; +-- end_ignore +CREATE TABLE t_part_dropped (c1 int, c2 int, c3 int, c4 int) DISTRIBUTED BY (c1) +PARTITION BY LIST (c3) (PARTITION p0 VALUES (0)); +ALTER TABLE t_part_dropped DROP c2; +ALTER TABLE t_part_dropped ADD PARTITION p2 VALUES (2); +-- Partition selection should go smoothly when inserting into leaf +-- partition with different attribute structure. 
+EXPLAIN (COSTS OFF, VERBOSE) INSERT INTO t_part_dropped VALUES (1, 2, 4); + QUERY PLAN +---------------------------------------- + Insert on public.t_part_dropped + -> Result + Output: 1, NULL::integer, 2, 4 + Optimizer: Postgres query optimizer + Settings: optimizer=off +(5 rows) + +INSERT INTO t_part_dropped VALUES (1, 2, 4); +EXPLAIN (COSTS OFF, VERBOSE) INSERT INTO t_part_dropped_1_prt_p2 VALUES (1, 2, 4); + QUERY PLAN +------------------------------------------ + Insert on public.t_part_dropped_1_prt_p2 + -> Result + Output: 1, 2, 4 + Optimizer: Postgres query optimizer + Settings: optimizer=off +(5 rows) + +INSERT INTO t_part_dropped_1_prt_p2 VALUES (1, 2, 4); +INSERT INTO t_part_dropped_1_prt_p2 VALUES (1, 2, 0); +-- Ensure that split update on leaf and root partitions does not +-- throw partition selection error in both planners. +EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part_dropped_1_prt_p2 SET c1 = 2; + QUERY PLAN +--------------------------------------------------------------------- + Update on public.t_part_dropped_1_prt_p2 + -> Redistribute Motion 3:3 (slice1; segments: 3) + Output: 2, c3, c4, c1, ctid, gp_segment_id, (DMLAction) + Hash Key: c1 + -> Split + Output: 2, c3, c4, c1, ctid, gp_segment_id, DMLAction + -> Seq Scan on public.t_part_dropped_1_prt_p2 + Output: 2, c3, c4, c1, ctid, gp_segment_id + Optimizer: Postgres query optimizer + Settings: optimizer=off +(10 rows) + +UPDATE t_part_dropped_1_prt_p2 SET c1 = 2; +EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part_dropped SET c1 = 3; + QUERY PLAN +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ + Update on public.t_part_dropped_1_prt_p0 + -> Redistribute Motion 3:3 (slice1; segments: 3) + Output: 3, NULL::integer, t_part_dropped_1_prt_p0.c3, t_part_dropped_1_prt_p0.c4, t_part_dropped_1_prt_p0.c1, t_part_dropped_1_prt_p0.ctid, 
t_part_dropped_1_prt_p0.gp_segment_id, (DMLAction) + Hash Key: "outer".c1 + -> Split + Output: 3, NULL::integer, t_part_dropped_1_prt_p0.c3, t_part_dropped_1_prt_p0.c4, t_part_dropped_1_prt_p0.c1, t_part_dropped_1_prt_p0.ctid, t_part_dropped_1_prt_p0.gp_segment_id, DMLAction + -> Seq Scan on public.t_part_dropped_1_prt_p0 + Output: 3, NULL::integer, t_part_dropped_1_prt_p0.c3, t_part_dropped_1_prt_p0.c4, t_part_dropped_1_prt_p0.c1, t_part_dropped_1_prt_p0.ctid, t_part_dropped_1_prt_p0.gp_segment_id + -> Redistribute Motion 3:3 (slice2; segments: 3) + Output: 3, t_part_dropped_1_prt_p2.c3, t_part_dropped_1_prt_p2.c4, t_part_dropped_1_prt_p2.c1, t_part_dropped_1_prt_p2.ctid, t_part_dropped_1_prt_p2.gp_segment_id, (DMLAction) + Hash Key: "outer".c1 + -> Split + Output: 3, t_part_dropped_1_prt_p2.c3, t_part_dropped_1_prt_p2.c4, t_part_dropped_1_prt_p2.c1, t_part_dropped_1_prt_p2.ctid, t_part_dropped_1_prt_p2.gp_segment_id, DMLAction + -> Seq Scan on public.t_part_dropped_1_prt_p2 + Output: 3, t_part_dropped_1_prt_p2.c3, t_part_dropped_1_prt_p2.c4, t_part_dropped_1_prt_p2.c1, t_part_dropped_1_prt_p2.ctid, t_part_dropped_1_prt_p2.gp_segment_id + Optimizer: Postgres query optimizer + Settings: optimizer=off +(17 rows) + +UPDATE t_part_dropped SET c1 = 3; +-- Ensure that split update on leaf partition does not throw constraint error +-- (executor does not choose the wrong partition at insert stage of update). +INSERT INTO t_part_dropped VALUES (1, 2, 0); +UPDATE t_part_dropped_1_prt_p2 SET c1 = 2 WHERE c4 = 0; +SELECT count(*) FROM t_part_dropped_1_prt_p2; + count +------- + 4 +(1 row) + +-- Split update on root relation should choose the correct partition +-- at insert (executor doesn't put the tuple to wrong partition for legacy +-- planner case). 
+EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part_dropped SET c1 = 3 WHERE c4 = 0; + QUERY PLAN +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ + Update on public.t_part_dropped_1_prt_p0 + -> Redistribute Motion 3:3 (slice1; segments: 3) + Output: 3, NULL::integer, t_part_dropped_1_prt_p0.c3, t_part_dropped_1_prt_p0.c4, t_part_dropped_1_prt_p0.c1, t_part_dropped_1_prt_p0.ctid, t_part_dropped_1_prt_p0.gp_segment_id, (DMLAction) + Hash Key: "outer".c1 + -> Split + Output: 3, NULL::integer, t_part_dropped_1_prt_p0.c3, t_part_dropped_1_prt_p0.c4, t_part_dropped_1_prt_p0.c1, t_part_dropped_1_prt_p0.ctid, t_part_dropped_1_prt_p0.gp_segment_id, DMLAction + -> Seq Scan on public.t_part_dropped_1_prt_p0 + Output: 3, NULL::integer, t_part_dropped_1_prt_p0.c3, t_part_dropped_1_prt_p0.c4, t_part_dropped_1_prt_p0.c1, t_part_dropped_1_prt_p0.ctid, t_part_dropped_1_prt_p0.gp_segment_id + Filter: (t_part_dropped_1_prt_p0.c4 = 0) + -> Redistribute Motion 3:3 (slice2; segments: 3) + Output: 3, t_part_dropped_1_prt_p2.c3, t_part_dropped_1_prt_p2.c4, t_part_dropped_1_prt_p2.c1, t_part_dropped_1_prt_p2.ctid, t_part_dropped_1_prt_p2.gp_segment_id, (DMLAction) + Hash Key: "outer".c1 + -> Split + Output: 3, t_part_dropped_1_prt_p2.c3, t_part_dropped_1_prt_p2.c4, t_part_dropped_1_prt_p2.c1, t_part_dropped_1_prt_p2.ctid, t_part_dropped_1_prt_p2.gp_segment_id, DMLAction + -> Seq Scan on public.t_part_dropped_1_prt_p2 + Output: 3, t_part_dropped_1_prt_p2.c3, t_part_dropped_1_prt_p2.c4, t_part_dropped_1_prt_p2.c1, t_part_dropped_1_prt_p2.ctid, t_part_dropped_1_prt_p2.gp_segment_id + Filter: (t_part_dropped_1_prt_p2.c4 = 0) + Optimizer: Postgres query optimizer + Settings: optimizer=off +(19 rows) + +UPDATE t_part_dropped SET c1 = 3 WHERE c4 = 0; +SELECT count(*) FROM t_part_dropped_1_prt_p2; + count +------- + 4 +(1 row) + +SELECT * FROM 
t_part_dropped_1_prt_p0; + c1 | c3 | c4 +----+----+---- +(0 rows) + +-- For ORCA the partition selection error should not occur. +EXPLAIN (COSTS OFF, VERBOSE) DELETE FROM t_part_dropped_1_prt_p2; + QUERY PLAN +-------------------------------------------------- + Delete on public.t_part_dropped_1_prt_p2 + -> Seq Scan on public.t_part_dropped_1_prt_p2 + Output: ctid, gp_segment_id + Optimizer: Postgres query optimizer + Settings: optimizer=off +(5 rows) + +DELETE FROM t_part_dropped_1_prt_p2; +DROP TABLE t_part_dropped; +-- Test modifying DML on leaf partition after it was exchanged with a relation, +-- that contained dropped columns. Ensure that DML commands pass without +-- execution errors and produce valid results. +-- start_ignore +DROP TABLE IF EXISTS t_part; +DROP TABLE IF EXISTS t_new_part; +-- end_ignore +CREATE TABLE t_part (c1 int, c2 int, c3 int, c4 int) DISTRIBUTED BY (c1) +PARTITION BY LIST (c3) (PARTITION p0 VALUES (0)); +ALTER TABLE t_part ADD PARTITION p2 VALUES (2); +CREATE TABLE t_new_part (c1 int, c11 int, c2 int, c3 int, c4 int); +ALTER TABLE t_new_part DROP c11; +ALTER TABLE t_part EXCHANGE PARTITION FOR (2) WITH TABLE t_new_part; +EXPLAIN (COSTS OFF, VERBOSE) INSERT INTO t_part VALUES (1, 5, 2, 5); + QUERY PLAN +------------------------------------- + Insert on public.t_part + -> Result + Output: 1, 5, 2, 5 + Optimizer: Postgres query optimizer + Settings: optimizer=off +(5 rows) + +INSERT INTO t_part VALUES (1, 5, 2, 5); +EXPLAIN (COSTS OFF, VERBOSE) INSERT INTO t_part_1_prt_p2 VALUES (1, 5, 2, 5); + QUERY PLAN +------------------------------------------- + Insert on public.t_part_1_prt_p2 + -> Result + Output: 1, NULL::integer, 5, 2, 5 + Optimizer: Postgres query optimizer + Settings: optimizer=off +(5 rows) + +INSERT INTO t_part_1_prt_p2 VALUES (1, 5, 2, 5); +-- Ensure that split update on leaf and root partitions does not +-- throw partition selection error in both planners. 
+EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part_1_prt_p2 SET c1 = 2; + QUERY PLAN +---------------------------------------------------------------------------------------- + Update on public.t_part_1_prt_p2 + -> Redistribute Motion 3:3 (slice1; segments: 3) + Output: 2, NULL::integer, c2, c3, c4, c1, ctid, gp_segment_id, (DMLAction) + Hash Key: c1 + -> Split + Output: 2, NULL::integer, c2, c3, c4, c1, ctid, gp_segment_id, DMLAction + -> Seq Scan on public.t_part_1_prt_p2 + Output: 2, NULL::integer, c2, c3, c4, c1, ctid, gp_segment_id + Optimizer: Postgres query optimizer + Settings: optimizer=off +(10 rows) + +UPDATE t_part_1_prt_p2 SET c1 = 2; +EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part SET c1 = 3; + QUERY PLAN +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Update on public.t_part_1_prt_p0 + -> Redistribute Motion 3:3 (slice1; segments: 3) + Output: 3, t_part_1_prt_p0.c2, t_part_1_prt_p0.c3, t_part_1_prt_p0.c4, t_part_1_prt_p0.c1, t_part_1_prt_p0.ctid, t_part_1_prt_p0.gp_segment_id, (DMLAction) + Hash Key: "outer".c1 + -> Split + Output: 3, t_part_1_prt_p0.c2, t_part_1_prt_p0.c3, t_part_1_prt_p0.c4, t_part_1_prt_p0.c1, t_part_1_prt_p0.ctid, t_part_1_prt_p0.gp_segment_id, DMLAction + -> Seq Scan on public.t_part_1_prt_p0 + Output: 3, t_part_1_prt_p0.c2, t_part_1_prt_p0.c3, t_part_1_prt_p0.c4, t_part_1_prt_p0.c1, t_part_1_prt_p0.ctid, t_part_1_prt_p0.gp_segment_id + -> Redistribute Motion 3:3 (slice2; segments: 3) + Output: 3, NULL::integer, t_part_1_prt_p2.c2, t_part_1_prt_p2.c3, t_part_1_prt_p2.c4, t_part_1_prt_p2.c1, t_part_1_prt_p2.ctid, t_part_1_prt_p2.gp_segment_id, (DMLAction) + Hash Key: "outer".c1 + -> Split + Output: 3, NULL::integer, t_part_1_prt_p2.c2, t_part_1_prt_p2.c3, t_part_1_prt_p2.c4, t_part_1_prt_p2.c1, t_part_1_prt_p2.ctid, t_part_1_prt_p2.gp_segment_id, DMLAction + -> Seq Scan on public.t_part_1_prt_p2 + 
Output: 3, NULL::integer, t_part_1_prt_p2.c2, t_part_1_prt_p2.c3, t_part_1_prt_p2.c4, t_part_1_prt_p2.c1, t_part_1_prt_p2.ctid, t_part_1_prt_p2.gp_segment_id + Optimizer: Postgres query optimizer + Settings: optimizer=off +(17 rows) + +UPDATE t_part SET c1 = 3; +-- Ensure that split update on leaf partition does not throw constraint error +-- (executor does not choose the wrong partition at insert stage of update). +INSERT INTO t_part VALUES (1, 0, 2, 0); +UPDATE t_part_1_prt_p2 SET c1 = 2 WHERE c4 = 0; +SELECT count(*) FROM t_part_1_prt_p2; + count +------- + 3 +(1 row) + +-- For ORCA the partition selection error should not occur. +EXPLAIN (COSTS OFF, VERBOSE) DELETE FROM t_part_1_prt_p2; + QUERY PLAN +------------------------------------------ + Delete on public.t_part_1_prt_p2 + -> Seq Scan on public.t_part_1_prt_p2 + Output: ctid, gp_segment_id + Optimizer: Postgres query optimizer + Settings: optimizer=off +(5 rows) + +DELETE FROM t_part_1_prt_p2; +DROP TABLE t_part; +DROP TABLE t_new_part; +-- Test split update execution of a plan from legacy planner in case +-- when parent relation has several partitions, and one of them has +-- physically-different attribute structure from parent's due to +-- dropped columns. Ensure that split update does not reconstruct tuple +-- of correct (without dropped attributes) partition. +CREATE TABLE t_part (c1 int, c2 int, c3 int, c4 int) DISTRIBUTED BY (c1) +PARTITION BY LIST (c3) (PARTITION p0 VALUES (0)); +-- Legacy planner UPDATE's plan consists of several subplans (partitioned +-- relations are considered in inheritance planner), and their execution +-- order varies depending on the order the partitions have been added. +-- Therefore, we add each partition through EXCHANGE to get UPDATE's +-- test plan in a form such that the t_new_part0 update comes first, and the +-- t_new_part2 comes second. 
This aspect is crucial because executor's +-- partitions related logic depended on that fact, what led to the +-- issue this test demonstrates. +-- This paritition is not compatible with the parent due to dropped columns +CREATE TABLE t_new_part0 (c1 int, c11 int, c2 int, c3 int, c4 int); +ALTER TABLE t_new_part0 drop c11; +ALTER TABLE t_part EXCHANGE PARTITION FOR (0) WITH TABLE t_new_part0; +-- This partition is compatible with the parent. +ALTER TABLE t_part ADD PARTITION p2 VALUES (2); +CREATE TABLE t_new_part2 (c1 int, c2 int, c3 int, c4 int); +ALTER TABLE t_part EXCHANGE PARTITION FOR (2) WITH TABLE t_new_part2; +-- Insert into correct partition, and perform split update on root, +-- that will execute split update on each subplan in case of inheritance +-- plan (legacy planner). Ensure that split update does not reconstruct the +-- tuple at insert. +INSERT INTO t_part VALUES (1, 4, 2, 2); +EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part SET c1 = 3; + QUERY PLAN +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Update on public.t_part_1_prt_p0 + -> Redistribute Motion 3:3 (slice1; segments: 3) + Output: 3, NULL::integer, t_part_1_prt_p0.c2, t_part_1_prt_p0.c3, t_part_1_prt_p0.c4, t_part_1_prt_p0.c1, t_part_1_prt_p0.ctid, t_part_1_prt_p0.gp_segment_id, (DMLAction) + Hash Key: "outer".c1 + -> Split + Output: 3, NULL::integer, t_part_1_prt_p0.c2, t_part_1_prt_p0.c3, t_part_1_prt_p0.c4, t_part_1_prt_p0.c1, t_part_1_prt_p0.ctid, t_part_1_prt_p0.gp_segment_id, DMLAction + -> Seq Scan on public.t_part_1_prt_p0 + Output: 3, NULL::integer, t_part_1_prt_p0.c2, t_part_1_prt_p0.c3, t_part_1_prt_p0.c4, t_part_1_prt_p0.c1, t_part_1_prt_p0.ctid, t_part_1_prt_p0.gp_segment_id + -> Redistribute Motion 3:3 (slice2; segments: 3) + Output: 3, t_part_1_prt_p2.c2, t_part_1_prt_p2.c3, t_part_1_prt_p2.c4, t_part_1_prt_p2.c1, t_part_1_prt_p2.ctid, 
t_part_1_prt_p2.gp_segment_id, (DMLAction) + Hash Key: "outer".c1 + -> Split + Output: 3, t_part_1_prt_p2.c2, t_part_1_prt_p2.c3, t_part_1_prt_p2.c4, t_part_1_prt_p2.c1, t_part_1_prt_p2.ctid, t_part_1_prt_p2.gp_segment_id, DMLAction + -> Seq Scan on public.t_part_1_prt_p2 + Output: 3, t_part_1_prt_p2.c2, t_part_1_prt_p2.c3, t_part_1_prt_p2.c4, t_part_1_prt_p2.c1, t_part_1_prt_p2.ctid, t_part_1_prt_p2.gp_segment_id + Optimizer: Postgres query optimizer + Settings: optimizer=off +(17 rows) + +UPDATE t_part SET c1 = 3; +SELECT * FROM t_part_1_prt_p2; + c1 | c2 | c3 | c4 +----+----+----+---- + 3 | 4 | 2 | 2 +(1 row) + +DROP TABLE t_part; +DROP TABLE t_new_part0; +DROP TABLE t_new_part2; diff --git a/src/test/regress/expected/qp_dropped_cols_optimizer.out b/src/test/regress/expected/qp_dropped_cols_optimizer.out index 6ee68fc7f4a6..ff72f0edf2b6 100644 --- a/src/test/regress/expected/qp_dropped_cols_optimizer.out +++ b/src/test/regress/expected/qp_dropped_cols_optimizer.out @@ -16273,3 +16273,329 @@ DELETE FROM dist_key_dropped_pt WHERE b=6; -- the tables, or the pg_upgrade test fails. set client_min_messages='warning'; drop schema qp_dropped_cols cascade; +-- Test modifying DML on leaf partition when parent has dropped columns and +-- the partition has not. Ensure that DML commands pass without execution +-- errors and produce valid results. +RESET search_path; +-- start_ignore +DROP TABLE IF EXISTS t_part_dropped; +-- end_ignore +CREATE TABLE t_part_dropped (c1 int, c2 int, c3 int, c4 int) DISTRIBUTED BY (c1) +PARTITION BY LIST (c3) (PARTITION p0 VALUES (0)); +ALTER TABLE t_part_dropped DROP c2; +ALTER TABLE t_part_dropped ADD PARTITION p2 VALUES (2); +-- Partition selection should go smoothly when inserting into leaf +-- partition with different attribute structure. 
+EXPLAIN (COSTS OFF, VERBOSE) INSERT INTO t_part_dropped VALUES (1, 2, 4); + QUERY PLAN +-------------------------------------------------- + Insert + Output: c1, NULL::integer, c3, c4, ColRef_0004 + -> Result + Output: c1, c3, c4, 1 + -> Result + Output: c1, c3, c4 + -> Result + Output: 1, 2, 4 + -> Result + Output: true + Optimizer: Pivotal Optimizer (GPORCA) +(11 rows) + +INSERT INTO t_part_dropped VALUES (1, 2, 4); +EXPLAIN (COSTS OFF, VERBOSE) INSERT INTO t_part_dropped_1_prt_p2 VALUES (1, 2, 4); + QUERY PLAN +---------------------------------------- + Insert + Output: c1, c3, c4, ColRef_0004 + -> Result + Output: c1, c3, c4, 1 + -> Result + Output: c1, c3, c4 + -> Result + Output: 1, 2, 4 + -> Result + Output: true + Optimizer: Pivotal Optimizer (GPORCA) +(11 rows) + +INSERT INTO t_part_dropped_1_prt_p2 VALUES (1, 2, 4); +INSERT INTO t_part_dropped_1_prt_p2 VALUES (1, 2, 0); +-- Ensure that split update on leaf and root partitions does not +-- throw partition selection error in both planners. 
+EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part_dropped_1_prt_p2 SET c1 = 2; + QUERY PLAN +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Update + Output: t_part_dropped_1_prt_p2.c1, t_part_dropped_1_prt_p2.c3, t_part_dropped_1_prt_p2.c4, (DMLAction), t_part_dropped_1_prt_p2.ctid + -> Redistribute Motion 3:3 (slice1; segments: 3) + Output: t_part_dropped_1_prt_p2.c1, t_part_dropped_1_prt_p2.c3, t_part_dropped_1_prt_p2.c4, t_part_dropped_1_prt_p2.ctid, t_part_dropped_1_prt_p2.gp_segment_id, (DMLAction) + Hash Key: t_part_dropped_1_prt_p2.c1 + -> Split + Output: t_part_dropped_1_prt_p2.c1, t_part_dropped_1_prt_p2.c3, t_part_dropped_1_prt_p2.c4, t_part_dropped_1_prt_p2.ctid, t_part_dropped_1_prt_p2.gp_segment_id, DMLAction + -> Result + Output: t_part_dropped_1_prt_p2.c1, t_part_dropped_1_prt_p2.c3, t_part_dropped_1_prt_p2.c4, 2, t_part_dropped_1_prt_p2.ctid, t_part_dropped_1_prt_p2.gp_segment_id + -> Seq Scan on public.t_part_dropped_1_prt_p2 + Output: t_part_dropped_1_prt_p2.c1, t_part_dropped_1_prt_p2.c3, t_part_dropped_1_prt_p2.c4, t_part_dropped_1_prt_p2.ctid, t_part_dropped_1_prt_p2.gp_segment_id + Optimizer: Pivotal Optimizer (GPORCA) +(12 rows) + +UPDATE t_part_dropped_1_prt_p2 SET c1 = 2; +EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part_dropped SET c1 = 3; + QUERY PLAN +---------------------------------------------------------------------------------------------------------------------------------------------------- + Update + Output: t_part_dropped.c1, NULL::integer, t_part_dropped.c3, t_part_dropped.c4, (DMLAction), t_part_dropped.ctid + -> Redistribute Motion 3:3 (slice1; segments: 3) + Output: t_part_dropped.c1, t_part_dropped.c3, t_part_dropped.c4, t_part_dropped.ctid, t_part_dropped.gp_segment_id, (DMLAction) + Hash Key: t_part_dropped.c1 + -> Split + Output: t_part_dropped.c1, t_part_dropped.c3, t_part_dropped.c4, 
t_part_dropped.ctid, t_part_dropped.gp_segment_id, DMLAction + -> Result + Output: t_part_dropped.c1, t_part_dropped.c3, t_part_dropped.c4, 3, t_part_dropped.ctid, t_part_dropped.gp_segment_id + -> Sequence + Output: t_part_dropped.c1, t_part_dropped.c3, t_part_dropped.c4, t_part_dropped.ctid, t_part_dropped.gp_segment_id + -> Partition Selector for t_part_dropped (dynamic scan id: 1) + Partitions selected: 2 (out of 2) + -> Dynamic Seq Scan on public.t_part_dropped (dynamic scan id: 1) + Output: t_part_dropped.c1, t_part_dropped.c3, t_part_dropped.c4, t_part_dropped.ctid, t_part_dropped.gp_segment_id + Optimizer: Pivotal Optimizer (GPORCA) +(16 rows) + +UPDATE t_part_dropped SET c1 = 3; +-- Ensure that split update on leaf partition does not throw constraint error +-- (executor does not choose the wrong partition at insert stage of update). +INSERT INTO t_part_dropped VALUES (1, 2, 0); +UPDATE t_part_dropped_1_prt_p2 SET c1 = 2 WHERE c4 = 0; +SELECT count(*) FROM t_part_dropped_1_prt_p2; + count +------- + 4 +(1 row) + +-- Split update on root relation should choose the correct partition +-- at insert (executor doesn't put the tuple to wrong partition for legacy +-- planner case). 
+EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part_dropped SET c1 = 3 WHERE c4 = 0; + QUERY PLAN +---------------------------------------------------------------------------------------------------------------------------------------------------- + Update + Output: t_part_dropped.c1, NULL::integer, t_part_dropped.c3, t_part_dropped.c4, (DMLAction), t_part_dropped.ctid + -> Redistribute Motion 3:3 (slice1; segments: 3) + Output: t_part_dropped.c1, t_part_dropped.c3, t_part_dropped.c4, t_part_dropped.ctid, t_part_dropped.gp_segment_id, (DMLAction) + Hash Key: t_part_dropped.c1 + -> Split + Output: t_part_dropped.c1, t_part_dropped.c3, t_part_dropped.c4, t_part_dropped.ctid, t_part_dropped.gp_segment_id, DMLAction + -> Result + Output: t_part_dropped.c1, t_part_dropped.c3, t_part_dropped.c4, 3, t_part_dropped.ctid, t_part_dropped.gp_segment_id + -> Sequence + Output: t_part_dropped.c1, t_part_dropped.c3, t_part_dropped.c4, t_part_dropped.ctid, t_part_dropped.gp_segment_id + -> Partition Selector for t_part_dropped (dynamic scan id: 1) + Partitions selected: 2 (out of 2) + -> Dynamic Seq Scan on public.t_part_dropped (dynamic scan id: 1) + Output: t_part_dropped.c1, t_part_dropped.c3, t_part_dropped.c4, t_part_dropped.ctid, t_part_dropped.gp_segment_id + Filter: (t_part_dropped.c4 = 0) + Optimizer: Pivotal Optimizer (GPORCA) +(17 rows) + +UPDATE t_part_dropped SET c1 = 3 WHERE c4 = 0; +SELECT count(*) FROM t_part_dropped_1_prt_p2; + count +------- + 4 +(1 row) + +SELECT * FROM t_part_dropped_1_prt_p0; + c1 | c3 | c4 +----+----+---- +(0 rows) + +-- For ORCA the partition selection error should not occur. 
+EXPLAIN (COSTS OFF, VERBOSE) DELETE FROM t_part_dropped_1_prt_p2; + QUERY PLAN +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Delete + Output: t_part_dropped_1_prt_p2.c1, t_part_dropped_1_prt_p2.c3, t_part_dropped_1_prt_p2.c4, "outer".ColRef_0010, t_part_dropped_1_prt_p2.ctid + -> Result + Output: t_part_dropped_1_prt_p2.c1, t_part_dropped_1_prt_p2.c3, t_part_dropped_1_prt_p2.c4, t_part_dropped_1_prt_p2.ctid, t_part_dropped_1_prt_p2.gp_segment_id, 0 + -> Seq Scan on public.t_part_dropped_1_prt_p2 + Output: t_part_dropped_1_prt_p2.c1, t_part_dropped_1_prt_p2.c3, t_part_dropped_1_prt_p2.c4, t_part_dropped_1_prt_p2.ctid, t_part_dropped_1_prt_p2.gp_segment_id + Optimizer: Pivotal Optimizer (GPORCA) +(7 rows) + +DELETE FROM t_part_dropped_1_prt_p2; +DROP TABLE t_part_dropped; +-- Test modifying DML on leaf partition after it was exchanged with a relation, +-- that contained dropped columns. Ensure that DML commands pass without +-- execution errors and produce valid results. 
+-- start_ignore +DROP TABLE IF EXISTS t_part; +DROP TABLE IF EXISTS t_new_part; +-- end_ignore +CREATE TABLE t_part (c1 int, c2 int, c3 int, c4 int) DISTRIBUTED BY (c1) +PARTITION BY LIST (c3) (PARTITION p0 VALUES (0)); +ALTER TABLE t_part ADD PARTITION p2 VALUES (2); +CREATE TABLE t_new_part (c1 int, c11 int, c2 int, c3 int, c4 int); +ALTER TABLE t_new_part DROP c11; +ALTER TABLE t_part EXCHANGE PARTITION FOR (2) WITH TABLE t_new_part; +EXPLAIN (COSTS OFF, VERBOSE) INSERT INTO t_part VALUES (1, 5, 2, 5); + QUERY PLAN +---------------------------------------- + Insert + Output: c1, c2, c3, c4, ColRef_0005 + -> Result + Output: c1, c2, c3, c4, 1 + -> Result + Output: c1, c2, c3, c4 + -> Result + Output: 1, 5, 2, 5 + -> Result + Output: true + Optimizer: Pivotal Optimizer (GPORCA) +(11 rows) + +INSERT INTO t_part VALUES (1, 5, 2, 5); +EXPLAIN (COSTS OFF, VERBOSE) INSERT INTO t_part_1_prt_p2 VALUES (1, 5, 2, 5); + QUERY PLAN +------------------------------------------------------ + Insert + Output: c1, NULL::integer, c2, c3, c4, ColRef_0005 + -> Result + Output: c1, c2, c3, c4, 1 + -> Result + Output: c1, c2, c3, c4 + -> Result + Output: 1, 5, 2, 5 + -> Result + Output: true + Optimizer: Pivotal Optimizer (GPORCA) +(11 rows) + +INSERT INTO t_part_1_prt_p2 VALUES (1, 5, 2, 5); +-- Ensure that split update on leaf and root partitions does not +-- throw partition selection error in both planners. 
+EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part_1_prt_p2 SET c1 = 2; + QUERY PLAN +----------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Update + Output: t_part_1_prt_p2.c1, NULL::integer, t_part_1_prt_p2.c2, t_part_1_prt_p2.c3, t_part_1_prt_p2.c4, (DMLAction), t_part_1_prt_p2.ctid + -> Redistribute Motion 3:3 (slice1; segments: 3) + Output: t_part_1_prt_p2.c1, t_part_1_prt_p2.c2, t_part_1_prt_p2.c3, t_part_1_prt_p2.c4, t_part_1_prt_p2.ctid, t_part_1_prt_p2.gp_segment_id, (DMLAction) + Hash Key: t_part_1_prt_p2.c1 + -> Split + Output: t_part_1_prt_p2.c1, t_part_1_prt_p2.c2, t_part_1_prt_p2.c3, t_part_1_prt_p2.c4, t_part_1_prt_p2.ctid, t_part_1_prt_p2.gp_segment_id, DMLAction + -> Result + Output: t_part_1_prt_p2.c1, t_part_1_prt_p2.c2, t_part_1_prt_p2.c3, t_part_1_prt_p2.c4, 2, t_part_1_prt_p2.ctid, t_part_1_prt_p2.gp_segment_id + -> Seq Scan on public.t_part_1_prt_p2 + Output: t_part_1_prt_p2.c1, t_part_1_prt_p2.c2, t_part_1_prt_p2.c3, t_part_1_prt_p2.c4, t_part_1_prt_p2.ctid, t_part_1_prt_p2.gp_segment_id + Optimizer: Pivotal Optimizer (GPORCA) +(12 rows) + +UPDATE t_part_1_prt_p2 SET c1 = 2; +EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part SET c1 = 3; + QUERY PLAN +----------------------------------------------------------------------------------------------------------------------- + Update + Output: t_part.c1, t_part.c2, t_part.c3, t_part.c4, (DMLAction), t_part.ctid + -> Redistribute Motion 3:3 (slice1; segments: 3) + Output: t_part.c1, t_part.c2, t_part.c3, t_part.c4, t_part.ctid, t_part.gp_segment_id, (DMLAction) + Hash Key: t_part.c1 + -> Split + Output: t_part.c1, t_part.c2, t_part.c3, t_part.c4, t_part.ctid, t_part.gp_segment_id, DMLAction + -> Result + Output: t_part.c1, t_part.c2, t_part.c3, t_part.c4, 3, t_part.ctid, t_part.gp_segment_id + -> Sequence + Output: t_part.c1, t_part.c2, t_part.c3, t_part.c4, t_part.ctid, t_part.gp_segment_id + -> 
Partition Selector for t_part (dynamic scan id: 1) + Partitions selected: 2 (out of 2) + -> Dynamic Seq Scan on public.t_part (dynamic scan id: 1) + Output: t_part.c1, t_part.c2, t_part.c3, t_part.c4, t_part.ctid, t_part.gp_segment_id + Optimizer: Pivotal Optimizer (GPORCA) +(16 rows) + +UPDATE t_part SET c1 = 3; +-- Ensure that split update on leaf partition does not throw constraint error +-- (executor does not choose the wrong partition at insert stage of update). +INSERT INTO t_part VALUES (1, 0, 2, 0); +UPDATE t_part_1_prt_p2 SET c1 = 2 WHERE c4 = 0; +SELECT count(*) FROM t_part_1_prt_p2; + count +------- + 3 +(1 row) + +-- For ORCA the partition selection error should not occur. +EXPLAIN (COSTS OFF, VERBOSE) DELETE FROM t_part_1_prt_p2; + QUERY PLAN +----------------------------------------------------------------------------------------------------------------------------------------------------------- + Delete + Output: t_part_1_prt_p2.c1, NULL::integer, t_part_1_prt_p2.c2, t_part_1_prt_p2.c3, t_part_1_prt_p2.c4, "outer".ColRef_0011, t_part_1_prt_p2.ctid + -> Result + Output: t_part_1_prt_p2.c1, t_part_1_prt_p2.c2, t_part_1_prt_p2.c3, t_part_1_prt_p2.c4, t_part_1_prt_p2.ctid, t_part_1_prt_p2.gp_segment_id, 0 + -> Seq Scan on public.t_part_1_prt_p2 + Output: t_part_1_prt_p2.c1, t_part_1_prt_p2.c2, t_part_1_prt_p2.c3, t_part_1_prt_p2.c4, t_part_1_prt_p2.ctid, t_part_1_prt_p2.gp_segment_id + Optimizer: Pivotal Optimizer (GPORCA) +(7 rows) + +DELETE FROM t_part_1_prt_p2; +DROP TABLE t_part; +DROP TABLE t_new_part; +-- Test split update execution of a plan from legacy planner in case +-- when parent relation has several partitions, and one of them has +-- physically-different attribute structure from parent's due to +-- dropped columns. Ensure that split update does not reconstruct tuple +-- of correct (without dropped attributes) partition. 
+CREATE TABLE t_part (c1 int, c2 int, c3 int, c4 int) DISTRIBUTED BY (c1) +PARTITION BY LIST (c3) (PARTITION p0 VALUES (0)); +-- Legacy planner UPDATE's plan consists of several subplans (partitioned +-- relations are considered in inheritance planner), and their execution +-- order varies depending on the order the partitions have been added. +-- Therefore, we add each partition through EXCHANGE to get UPDATE's +-- test plan in a form such that the t_new_part0 update comes first, and the +-- t_new_part2 comes second. This aspect is crucial because executor's +-- partitions related logic depended on that fact, what led to the +-- issue this test demonstrates. +-- This paritition is not compatible with the parent due to dropped columns +CREATE TABLE t_new_part0 (c1 int, c11 int, c2 int, c3 int, c4 int); +ALTER TABLE t_new_part0 drop c11; +ALTER TABLE t_part EXCHANGE PARTITION FOR (0) WITH TABLE t_new_part0; +-- This partition is compatible with the parent. +ALTER TABLE t_part ADD PARTITION p2 VALUES (2); +CREATE TABLE t_new_part2 (c1 int, c2 int, c3 int, c4 int); +ALTER TABLE t_part EXCHANGE PARTITION FOR (2) WITH TABLE t_new_part2; +-- Insert into correct partition, and perform split update on root, +-- that will execute split update on each subplan in case of inheritance +-- plan (legacy planner). Ensure that split update does not reconstruct the +-- tuple at insert. 
+INSERT INTO t_part VALUES (1, 4, 2, 2); +EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part SET c1 = 3; + QUERY PLAN +----------------------------------------------------------------------------------------------------------------------- + Update + Output: t_part.c1, t_part.c2, t_part.c3, t_part.c4, (DMLAction), t_part.ctid + -> Redistribute Motion 3:3 (slice1; segments: 3) + Output: t_part.c1, t_part.c2, t_part.c3, t_part.c4, t_part.ctid, t_part.gp_segment_id, (DMLAction) + Hash Key: t_part.c1 + -> Split + Output: t_part.c1, t_part.c2, t_part.c3, t_part.c4, t_part.ctid, t_part.gp_segment_id, DMLAction + -> Result + Output: t_part.c1, t_part.c2, t_part.c3, t_part.c4, 3, t_part.ctid, t_part.gp_segment_id + -> Sequence + Output: t_part.c1, t_part.c2, t_part.c3, t_part.c4, t_part.ctid, t_part.gp_segment_id + -> Partition Selector for t_part (dynamic scan id: 1) + Partitions selected: 2 (out of 2) + -> Dynamic Seq Scan on public.t_part (dynamic scan id: 1) + Output: t_part.c1, t_part.c2, t_part.c3, t_part.c4, t_part.ctid, t_part.gp_segment_id + Optimizer: Pivotal Optimizer (GPORCA) +(16 rows) + +UPDATE t_part SET c1 = 3; +SELECT * FROM t_part_1_prt_p2; + c1 | c2 | c3 | c4 +----+----+----+---- + 3 | 4 | 2 | 2 +(1 row) + +DROP TABLE t_part; +DROP TABLE t_new_part0; +DROP TABLE t_new_part2; diff --git a/src/test/regress/expected/rpt.out b/src/test/regress/expected/rpt.out index b0a595e468df..15e18ecb70ac 100644 --- a/src/test/regress/expected/rpt.out +++ b/src/test/regress/expected/rpt.out @@ -1350,9 +1350,68 @@ select c from rep_tab where c in (select distinct d from rand_tab); 2 (2 rows) +-- test for optimizer_enable_replicated_table +explain (costs off) select * from rep_tab; + QUERY PLAN +------------------------------------------ + Gather Motion 1:1 (slice1; segments: 1) + -> Seq Scan on rep_tab + Optimizer: Postgres query optimizer +(3 rows) + +set optimizer_enable_replicated_table=off; +set optimizer_trace_fallback=on; +explain (costs off) select * from rep_tab; + QUERY 
PLAN +------------------------------------------ + Gather Motion 1:1 (slice1; segments: 1) + -> Seq Scan on rep_tab + Optimizer: Postgres query optimizer +(3 rows) + +reset optimizer_trace_fallback; +reset optimizer_enable_replicated_table; +-- Ensure plan with Gather Motion node is generated. +drop table if exists t; +NOTICE: table "t" does not exist, skipping +create table t (i int, j int) distributed replicated; +insert into t values (1, 2); +explain (costs off) select j, (select j) AS "Correlated Field" from t; + QUERY PLAN +------------------------------------------ + Gather Motion 1:1 (slice1; segments: 1) + -> Seq Scan on t + SubPlan 1 (slice1; segments: 1) + -> Result + Optimizer: Postgres query optimizer +(5 rows) + +select j, (select j) AS "Correlated Field" from t; + j | Correlated Field +---+------------------ + 2 | 2 +(1 row) + +explain (costs off) select j, (select 5) AS "Uncorrelated Field" from t; + QUERY PLAN +------------------------------------------- + Gather Motion 1:1 (slice1; segments: 1) + -> Seq Scan on t + InitPlan 1 (returns $0) (slice2) + -> Result + Optimizer: Postgres query optimizer +(5 rows) + +select j, (select 5) AS "Uncorrelated Field" from t; + j | Uncorrelated Field +---+-------------------- + 2 | 5 +(1 row) + -- -- Check sub-selects with distributed replicated tables and volatile functions -- +drop table if exists t; create table t (i int) distributed replicated; create table t1 (a int) distributed by (a); create table t2 (a int, b float) distributed replicated; @@ -1513,6 +1572,130 @@ explain (costs off, verbose) select * from t1 where 1 <= ALL (select i from t gr Settings: enable_bitmapscan=off, enable_seqscan=off (20 rows) +set gp_cte_sharing = on; +-- ensure that the volatile function is executed on one segment if it is in the CTE target list +explain (costs off, verbose) with cte as ( + select a * random() as a from t2 +) +select * from cte join (select * from t1 join cte using(a)) b using(a); + QUERY PLAN 
+--------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice4; segments: 3) + Output: share0_ref1.a + -> Hash Join + Output: share0_ref1.a + Hash Cond: (share0_ref2.a = share0_ref1.a) + -> Hash Join + Output: share0_ref2.a + Hash Cond: ((t1.a)::double precision = share0_ref2.a) + -> Redistribute Motion 3:3 (slice1; segments: 3) + Output: t1.a + Hash Key: (t1.a)::double precision + -> Seq Scan on rpt.t1 + Output: t1.a + -> Hash + Output: share0_ref2.a + -> Redistribute Motion 1:3 (slice2; segments: 1) + Output: share0_ref2.a + Hash Key: share0_ref2.a + -> Shared Scan (share slice:id 2:0) + Output: share0_ref2.a + -> Hash + Output: share0_ref1.a + -> Redistribute Motion 1:3 (slice3; segments: 1) + Output: share0_ref1.a + Hash Key: share0_ref1.a + -> Shared Scan (share slice:id 3:0) + Output: share0_ref1.a + -> Materialize + Output: (((t2.a)::double precision * random())) + -> Seq Scan on rpt.t2 + Output: ((t2.a)::double precision * random()) + Optimizer: Postgres query optimizer + Settings: enable_bitmapscan=off, enable_seqscan=off, gp_cte_sharing=on, optimizer=off +(33 rows) + +set gp_cte_sharing = off; +explain (costs off, verbose) with cte as ( + select a, a * random() from t2 +) +select * from cte join t1 using(a); + QUERY PLAN +---------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice2; segments: 3) + Output: t2.a, (((t2.a)::double precision * random())) + -> Hash Join + Output: t2.a, (((t2.a)::double precision * random())) + Hash Cond: (t1.a = t2.a) + -> Seq Scan on rpt.t1 + Output: t1.a + -> Hash + Output: t2.a, (((t2.a)::double precision * random())) + -> Redistribute Motion 1:3 (slice1; segments: 1) + Output: t2.a, (((t2.a)::double precision * random())) + Hash Key: t2.a + -> Seq Scan on rpt.t2 + Output: t2.a, ((t2.a)::double precision * random()) + Optimizer: Postgres query optimizer + Settings: enable_bitmapscan=off, enable_seqscan=off, 
gp_cte_sharing=off, optimizer=off +(16 rows) + +reset gp_cte_sharing; +-- ensure that the volatile function is executed on one segment if it is in target list of subplan of multiset function +explain (costs off, verbose) select * from ( + SELECT count(*) as a FROM anytable_out( TABLE( SELECT random()::int from t2 ) ) +) a join t1 using(a); + QUERY PLAN +------------------------------------------------------------------------------ + Gather Motion 3:1 (slice2; segments: 3) + Output: (count(*)) + -> Hash Join + Output: (count(*)) + Hash Cond: (t1.a = (count(*))) + -> Seq Scan on rpt.t1 + Output: t1.a + -> Hash + Output: (count(*)) + -> Redistribute Motion 1:3 (slice1; segments: 1) + Output: (count(*)) + Hash Key: (count(*)) + -> Aggregate + Output: count(*) + -> Table Function Scan on pg_catalog.anytable_out + Output: anytable_out + -> Seq Scan on rpt.t2 + Output: (random())::integer + Optimizer: Postgres query optimizer + Settings: enable_bitmapscan=off, enable_seqscan=off, optimizer=off +(20 rows) + +-- if there is a volatile function in the target list of a plan with the locus type +-- General or Segment General, then such a plan should be executed on single +-- segment, since it is assumed that nodes with such locus types will give the same +-- result on all segments, which is impossible for a volatile function. 
+-- start_ignore +drop table if exists d; +NOTICE: table "d" does not exist, skipping +drop table if exists r; +NOTICE: table "r" does not exist, skipping +-- end_ignore +create table r (a int, b int) distributed replicated; +create table d (b int, a int default 1) distributed by (b); +insert into d select * from generate_series(0, 20) j; +-- change distribution without reorganize +alter table d set distributed randomly; +insert into r values (1, 1), (2, 2), (3, 3); +with cte as ( + select a, b * random() as rand from r +) +select count(distinct(rand)) from cte join d on cte.a = d.a; + count +------- + 1 +(1 row) + +drop table r; +drop table d; drop table if exists t; drop table if exists t1; drop table if exists t2; diff --git a/src/test/regress/expected/rpt_optimizer.out b/src/test/regress/expected/rpt_optimizer.out index 557392270fc6..c176fd5c8908 100644 --- a/src/test/regress/expected/rpt_optimizer.out +++ b/src/test/regress/expected/rpt_optimizer.out @@ -1354,9 +1354,77 @@ select c from rep_tab where c in (select distinct d from rand_tab); 2 (2 rows) +-- test for optimizer_enable_replicated_table +explain (costs off) select * from rep_tab; + QUERY PLAN +------------------------------------------ + Gather Motion 1:1 (slice1; segments: 1) + -> Seq Scan on rep_tab + Optimizer: Pivotal Optimizer (GPORCA) +(3 rows) + +set optimizer_enable_replicated_table=off; +set optimizer_trace_fallback=on; +explain (costs off) select * from rep_tab; +INFO: GPORCA failed to produce a plan, falling back to planner +DETAIL: Feature not supported: Use optimizer_enable_replicated_table to enable replicated tables +WARNING: relcache reference leak: relation "rep_tab" not closed + QUERY PLAN +------------------------------------------ + Gather Motion 1:1 (slice1; segments: 1) + -> Seq Scan on rep_tab + Optimizer: Postgres query optimizer +(3 rows) + +reset optimizer_trace_fallback; +reset optimizer_enable_replicated_table; +-- Ensure plan with Gather Motion node is generated. 
+drop table if exists t; +NOTICE: table "t" does not exist, skipping +create table t (i int, j int) distributed replicated; +insert into t values (1, 2); +explain (costs off) select j, (select j) AS "Correlated Field" from t; + QUERY PLAN +------------------------------------------ + Gather Motion 1:1 (slice1; segments: 1) + -> Result + -> Seq Scan on t + SubPlan 1 (slice1; segments: 1) + -> Result + -> Result + Optimizer: Pivotal Optimizer (GPORCA) +(7 rows) + +select j, (select j) AS "Correlated Field" from t; + j | Correlated Field +---+------------------ + 2 | 2 +(1 row) + +explain (costs off) select j, (select 5) AS "Uncorrelated Field" from t; + QUERY PLAN +------------------------------------------ + Gather Motion 1:1 (slice1; segments: 1) + -> Result + -> Nested Loop Left Join + Join Filter: true + -> Seq Scan on t + -> Materialize + -> Result + -> Result + Optimizer: Pivotal Optimizer (GPORCA) +(9 rows) + +select j, (select 5) AS "Uncorrelated Field" from t; + j | Uncorrelated Field +---+-------------------- + 2 | 5 +(1 row) + -- -- Check sub-selects with distributed replicated tables and volatile functions -- +drop table if exists t; create table t (i int) distributed replicated; create table t1 (a int) distributed by (a); create table t2 (a int, b float) distributed replicated; @@ -1517,6 +1585,130 @@ explain (costs off, verbose) select * from t1 where 1 <= ALL (select i from t gr Settings: enable_bitmapscan=off, enable_seqscan=off (20 rows) +set gp_cte_sharing = on; +-- ensure that the volatile function is executed on one segment if it is in the CTE target list +explain (costs off, verbose) with cte as ( + select a * random() as a from t2 +) +select * from cte join (select * from t1 join cte using(a)) b using(a); + QUERY PLAN +------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice4; segments: 3) + Output: share0_ref1.a + -> Hash Join + Output: share0_ref1.a + Hash Cond: (share0_ref2.a = 
share0_ref1.a) + -> Hash Join + Output: share0_ref2.a + Hash Cond: ((t1.a)::double precision = share0_ref2.a) + -> Redistribute Motion 3:3 (slice1; segments: 3) + Output: t1.a + Hash Key: (t1.a)::double precision + -> Seq Scan on rpt.t1 + Output: t1.a + -> Hash + Output: share0_ref2.a + -> Redistribute Motion 1:3 (slice2; segments: 1) + Output: share0_ref2.a + Hash Key: share0_ref2.a + -> Shared Scan (share slice:id 2:0) + Output: share0_ref2.a + -> Hash + Output: share0_ref1.a + -> Redistribute Motion 1:3 (slice3; segments: 1) + Output: share0_ref1.a + Hash Key: share0_ref1.a + -> Shared Scan (share slice:id 3:0) + Output: share0_ref1.a + -> Materialize + Output: (((t2.a)::double precision * random())) + -> Seq Scan on rpt.t2 + Output: ((t2.a)::double precision * random()) + Optimizer: Postgres query optimizer + Settings: enable_bitmapscan=off, enable_seqscan=off, gp_cte_sharing=on +(33 rows) + +set gp_cte_sharing = off; +explain (costs off, verbose) with cte as ( + select a, a * random() from t2 +) +select * from cte join t1 using(a); + QUERY PLAN +------------------------------------------------------------------------------- + Gather Motion 3:1 (slice2; segments: 3) + Output: t2.a, (((t2.a)::double precision * random())) + -> Hash Join + Output: t2.a, (((t2.a)::double precision * random())) + Hash Cond: (t1.a = t2.a) + -> Seq Scan on rpt.t1 + Output: t1.a + -> Hash + Output: t2.a, (((t2.a)::double precision * random())) + -> Redistribute Motion 1:3 (slice1; segments: 1) + Output: t2.a, (((t2.a)::double precision * random())) + Hash Key: t2.a + -> Seq Scan on rpt.t2 + Output: t2.a, ((t2.a)::double precision * random()) + Optimizer: Postgres query optimizer + Settings: enable_bitmapscan=off, enable_seqscan=off, gp_cte_sharing=off +(16 rows) + +reset gp_cte_sharing; +-- ensure that the volatile function is executed on one segment if it is in target list of subplan of multiset function +explain (costs off, verbose) select * from ( + SELECT count(*) as a FROM 
anytable_out( TABLE( SELECT random()::int from t2 ) ) +) a join t1 using(a); + QUERY PLAN +------------------------------------------------------------------------------ + Gather Motion 3:1 (slice2; segments: 3) + Output: (count(*)) + -> Hash Join + Output: (count(*)) + Hash Cond: (t1.a = (count(*))) + -> Seq Scan on rpt.t1 + Output: t1.a + -> Hash + Output: (count(*)) + -> Redistribute Motion 1:3 (slice1; segments: 1) + Output: (count(*)) + Hash Key: (count(*)) + -> Aggregate + Output: count(*) + -> Table Function Scan on pg_catalog.anytable_out + Output: anytable_out + -> Seq Scan on rpt.t2 + Output: (random())::integer + Optimizer: Postgres query optimizer + Settings: enable_bitmapscan=off, enable_seqscan=off +(20 rows) + +-- if there is a volatile function in the target list of a plan with the locus type +-- General or Segment General, then such a plan should be executed on single +-- segment, since it is assumed that nodes with such locus types will give the same +-- result on all segments, which is impossible for a volatile function. 
+-- start_ignore +drop table if exists d; +NOTICE: table "d" does not exist, skipping +drop table if exists r; +NOTICE: table "r" does not exist, skipping +-- end_ignore +create table r (a int, b int) distributed replicated; +create table d (b int, a int default 1) distributed by (b); +insert into d select * from generate_series(0, 20) j; +-- change distribution without reorganize +alter table d set distributed randomly; +insert into r values (1, 1), (2, 2), (3, 3); +with cte as ( + select a, b * random() as rand from r +) +select count(distinct(rand)) from cte join d on cte.a = d.a; + count +------- + 1 +(1 row) + +drop table r; +drop table d; drop table if exists t; drop table if exists t1; drop table if exists t2; diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out index 9187a83c7328..a77c73e9e3f1 100755 --- a/src/test/regress/expected/strings.out +++ b/src/test/regress/expected/strings.out @@ -1933,6 +1933,15 @@ SELECT encode(overlay(E'Th\\000omas'::bytea placing E'\\002\\003'::bytea from 5 Th\000o\x02\x03 (1 row) +-- copy unknown-type column from targetlist rather than reference to subquery outputs +CREATE DOMAIN public.date_timestamp AS timestamp without time zone; +create table dt1(a int, b int, c public.date_timestamp, d public.date_timestamp); +NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table. +HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. 
+insert into dt1 values(1, 1, now(), now()); +insert into dt1 select a, b, 'Thu Sep 14 03:19:54 EDT 2023' as c, 'Thu Sep 14 03:19:54 EDT 2023' as d from dt1; +DROP TABLE dt1; +DROP DOMAIN public.date_timestamp; -- Clean up GPDB-added tables DROP TABLE char_strings_tbl; DROP TABLE varchar_strings_tbl; diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index a375efe83766..22815988bb8d 100755 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -962,7 +962,11 @@ commit; --end_ignore -- Ensure that both planners produce valid plans for the query with the nested -- SubLink, which contains attributes referenced in query's GROUP BY clause. --- The inner part of SubPlan should contain only t.j. +-- Due to presence of non-grouping columns in targetList, ORCA performs query +-- normalization, during which ORCA establishes a correspondence between vars +-- from targetlist entries to grouping attributes. And this process should +-- correctly handle nested structures. The inner part of SubPlan in the test +-- should contain only t.j. -- start_ignore drop table if exists t; NOTICE: table "t" does not exist, skipping @@ -1000,8 +1004,12 @@ group by i, j; -- Ensure that both planners produce valid plans for the query with the nested -- SubLink when this SubLink is inside the GROUP BY clause. Attribute, which is --- not grouping column, is added to query targetList to make ORCA perform query --- normalization. For ORCA the fallback shouldn't occur. +-- not grouping column (1 as c), is added to query targetList to make ORCA +-- perform query normalization. During normalization ORCA modifies the vars of +-- the grouping elements of targetList in order to produce a new Query tree. +-- The modification of vars inside nested part of SubLinks should be handled +-- correctly. ORCA shouldn't fall back due to missing variable entry as a result +-- of incorrect query normalization. 
explain (verbose, costs off) select j, 1 as c, (select j from (select j) q2) q1 @@ -1038,8 +1046,9 @@ group by j, q1; (1 row) -- Ensure that both planners produce valid plans for the query with the nested --- SubLink, and this SubLink is under the aggregation. For ORCA the fallback --- shouldn't occur. +-- SubLink, and this SubLink is under aggregation. ORCA shouldn't fall back due +-- to missing variable entry as a result of incorrect query normalization. ORCA +-- should correctly process args of the aggregation during normalization. explain (verbose, costs off) select (select max((select t.i))) from t; QUERY PLAN diff --git a/src/test/regress/expected/subselect_gp.out b/src/test/regress/expected/subselect_gp.out index ce9060e971e3..2109d91f2009 100644 --- a/src/test/regress/expected/subselect_gp.out +++ b/src/test/regress/expected/subselect_gp.out @@ -3112,6 +3112,210 @@ select * from r where b in (select b from s where c=10 order by c limit 2); 1 | 2 | 3 (1 row) +-- Test nested query with aggregate inside a sublink, +-- ORCA should correctly normalize the aggregate expression inside the +-- sublink's nested query and the column variable accessed in aggregate should +-- be accessible to the aggregate after the normalization of query. 
+-- If the query is not supported, ORCA should gracefully fallback to postgres +explain (COSTS OFF) with t0 AS ( + SELECT + ROW_TO_JSON((SELECT x FROM (SELECT max(t.b)) x)) + AS c + FROM r + JOIN s ON true + JOIN s as t ON true + ) +SELECT c FROM t0; + QUERY PLAN +--------------------------------------------------------------------------------------- + Aggregate + -> Gather Motion 3:1 (slice3; segments: 3) + -> Aggregate + -> Nested Loop + -> Broadcast Motion 3:3 (slice2; segments: 3) + -> Nested Loop + -> Seq Scan on r + -> Materialize + -> Broadcast Motion 3:3 (slice1; segments: 3) + -> Seq Scan on s + -> Materialize + -> Seq Scan on s t + SubPlan 1 (slice0) + -> Subquery Scan on x + -> Result + Optimizer: Postgres query optimizer +(16 rows) + +-- +-- Test case for ORCA semi join with random table +-- See https://github.com/greenplum-db/gpdb/issues/16611 +-- +--- case for random distribute +create table table_left (l1 int, l2 int) distributed by (l1); +create table table_right (r1 int, r2 int) distributed randomly; +create index table_right_idx on table_right(r1); +insert into table_left values (1,1); +insert into table_right select i, i from generate_series(1, 300) i; +insert into table_right select 1, 1 from generate_series(1, 100) i; +--- make sure the same value (1,1) rows are inserted into different segments +select count(distinct gp_segment_id) > 1 from table_right where r1 = 1; + ?column? 
+---------- + t +(1 row) + +analyze table_left; +analyze table_right; +-- two types of semi join tests +explain (costs off) select * from table_left where exists (select 1 from table_right where l1 = r1); + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice2; segments: 3) + -> Hash Join + Hash Cond: (table_right.r1 = table_left.l1) + -> HashAggregate + Group Key: table_right.r1 + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: table_right.r1 + -> Seq Scan on table_right + -> Hash + -> Seq Scan on table_left + Optimizer: Postgres query optimizer +(11 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 | l2 +----+---- + 1 | 1 +(1 row) + +explain (costs off) select * from table_left where l1 in (select r1 from table_right); + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice2; segments: 3) + -> Hash Join + Hash Cond: (table_right.r1 = table_left.l1) + -> HashAggregate + Group Key: table_right.r1 + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: table_right.r1 + -> Seq Scan on table_right + -> Hash + -> Seq Scan on table_left + Optimizer: Postgres query optimizer +(11 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 | l2 +----+---- + 1 | 1 +(1 row) + +--- case for replicate distribute +alter table table_right set distributed replicated; +explain (costs off) select * from table_left where exists (select 1 from table_right where l1 = r1); + QUERY PLAN +----------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) + -> Hash Join + Hash Cond: (table_right.r1 = table_left.l1) + -> HashAggregate + Group Key: table_right.r1 + -> Seq Scan on table_right + -> Hash + -> Seq Scan on table_left + Optimizer: Postgres query optimizer +(9 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + 
l1 | l2 +----+---- + 1 | 1 +(1 row) + +explain (costs off) select * from table_left where l1 in (select r1 from table_right); + QUERY PLAN +----------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) + -> Hash Join + Hash Cond: (table_right.r1 = table_left.l1) + -> HashAggregate + Group Key: table_right.r1 + -> Seq Scan on table_right + -> Hash + -> Seq Scan on table_left + Optimizer: Postgres query optimizer +(9 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 | l2 +----+---- + 1 | 1 +(1 row) + +--- case for partition table with random distribute +drop table table_right; +create table table_right (r1 int, r2 int) distributed randomly partition by range (r1) ( start (0) end (300) every (100)); +NOTICE: CREATE TABLE will create partition "table_right_1_prt_1" for table "table_right" +NOTICE: CREATE TABLE will create partition "table_right_1_prt_2" for table "table_right" +NOTICE: CREATE TABLE will create partition "table_right_1_prt_3" for table "table_right" +create index table_right_idx on table_right(r1); +insert into table_right select i, i from generate_series(1, 299) i; +insert into table_right select 1, 1 from generate_series(1, 100) i; +analyze table_right; +explain (costs off) select * from table_left where exists (select 1 from table_right where l1 = r1); + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice2; segments: 3) + -> Hash Join + Hash Cond: (table_right_1_prt_1.r1 = table_left.l1) + -> HashAggregate + Group Key: table_right_1_prt_1.r1 + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: table_right_1_prt_1.r1 + -> Append + -> Seq Scan on table_right_1_prt_1 + -> Seq Scan on table_right_1_prt_2 + -> Seq Scan on table_right_1_prt_3 + -> Hash + -> Seq Scan on table_left + Optimizer: Postgres query optimizer +(14 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 
| l2 +----+---- + 1 | 1 +(1 row) + +explain (costs off) select * from table_left where l1 in (select r1 from table_right); + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice2; segments: 3) + -> Hash Join + Hash Cond: (table_right_1_prt_1.r1 = table_left.l1) + -> HashAggregate + Group Key: table_right_1_prt_1.r1 + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: table_right_1_prt_1.r1 + -> Append + -> Seq Scan on table_right_1_prt_1 + -> Seq Scan on table_right_1_prt_2 + -> Seq Scan on table_right_1_prt_3 + -> Hash + -> Seq Scan on table_left + Optimizer: Postgres query optimizer +(14 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 | l2 +----+---- + 1 | 1 +(1 row) + +-- clean up +drop table table_left; +drop table table_right; -- Test that Explicit Redistribute Motion is applied properly for -- queries that have modifying operation inside a SubPlan. That -- requires the ModifyTable's top Flow node to be copied correctly inside diff --git a/src/test/regress/expected/subselect_gp_1.out b/src/test/regress/expected/subselect_gp_1.out index 25655d29153a..a84cf6c0b411 100644 --- a/src/test/regress/expected/subselect_gp_1.out +++ b/src/test/regress/expected/subselect_gp_1.out @@ -3112,6 +3112,210 @@ select * from r where b in (select b from s where c=10 order by c limit 2); 1 | 2 | 3 (1 row) +-- Test nested query with aggregate inside a sublink, +-- ORCA should correctly normalize the aggregate expression inside the +-- sublink's nested query and the column variable accessed in aggregate should +-- be accessible to the aggregate after the normalization of query. 
+-- If the query is not supported, ORCA should gracefully fallback to postgres +explain (COSTS OFF) with t0 AS ( + SELECT + ROW_TO_JSON((SELECT x FROM (SELECT max(t.b)) x)) + AS c + FROM r + JOIN s ON true + JOIN s as t ON true + ) +SELECT c FROM t0; + QUERY PLAN +--------------------------------------------------------------------------------------- + Aggregate + -> Gather Motion 3:1 (slice3; segments: 3) + -> Aggregate + -> Nested Loop + -> Broadcast Motion 3:3 (slice2; segments: 3) + -> Nested Loop + -> Seq Scan on r + -> Materialize + -> Broadcast Motion 3:3 (slice1; segments: 3) + -> Seq Scan on s + -> Materialize + -> Seq Scan on s t + SubPlan 1 (slice0) + -> Subquery Scan on x + -> Result + Optimizer: Postgres query optimizer +(16 rows) + +-- +-- Test case for ORCA semi join with random table +-- See https://github.com/greenplum-db/gpdb/issues/16611 +-- +--- case for random distribute +create table table_left (l1 int, l2 int) distributed by (l1); +create table table_right (r1 int, r2 int) distributed randomly; +create index table_right_idx on table_right(r1); +insert into table_left values (1,1); +insert into table_right select i, i from generate_series(1, 300) i; +insert into table_right select 1, 1 from generate_series(1, 100) i; +--- make sure the same value (1,1) rows are inserted into different segments +select count(distinct gp_segment_id) > 1 from table_right where r1 = 1; + ?column? 
+---------- + t +(1 row) + +analyze table_left; +analyze table_right; +-- two types of semi join tests +explain (costs off) select * from table_left where exists (select 1 from table_right where l1 = r1); + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice2; segments: 3) + -> Hash Join + Hash Cond: (table_right.r1 = table_left.l1) + -> HashAggregate + Group Key: table_right.r1 + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: table_right.r1 + -> Seq Scan on table_right + -> Hash + -> Seq Scan on table_left + Optimizer: Postgres query optimizer +(11 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 | l2 +----+---- + 1 | 1 +(1 row) + +explain (costs off) select * from table_left where l1 in (select r1 from table_right); + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice2; segments: 3) + -> Hash Join + Hash Cond: (table_right.r1 = table_left.l1) + -> HashAggregate + Group Key: table_right.r1 + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: table_right.r1 + -> Seq Scan on table_right + -> Hash + -> Seq Scan on table_left + Optimizer: Postgres query optimizer +(11 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 | l2 +----+---- + 1 | 1 +(1 row) + +--- case for replicate distribute +alter table table_right set distributed replicated; +explain (costs off) select * from table_left where exists (select 1 from table_right where l1 = r1); + QUERY PLAN +----------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) + -> Hash Join + Hash Cond: (table_right.r1 = table_left.l1) + -> HashAggregate + Group Key: table_right.r1 + -> Seq Scan on table_right + -> Hash + -> Seq Scan on table_left + Optimizer: Postgres query optimizer +(9 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + 
l1 | l2 +----+---- + 1 | 1 +(1 row) + +explain (costs off) select * from table_left where l1 in (select r1 from table_right); + QUERY PLAN +----------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) + -> Hash Join + Hash Cond: (table_right.r1 = table_left.l1) + -> HashAggregate + Group Key: table_right.r1 + -> Seq Scan on table_right + -> Hash + -> Seq Scan on table_left + Optimizer: Postgres query optimizer +(9 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 | l2 +----+---- + 1 | 1 +(1 row) + +--- case for partition table with random distribute +drop table table_right; +create table table_right (r1 int, r2 int) distributed randomly partition by range (r1) ( start (0) end (300) every (100)); +NOTICE: CREATE TABLE will create partition "table_right_1_prt_1" for table "table_right" +NOTICE: CREATE TABLE will create partition "table_right_1_prt_2" for table "table_right" +NOTICE: CREATE TABLE will create partition "table_right_1_prt_3" for table "table_right" +create index table_right_idx on table_right(r1); +insert into table_right select i, i from generate_series(1, 299) i; +insert into table_right select 1, 1 from generate_series(1, 100) i; +analyze table_right; +explain (costs off) select * from table_left where exists (select 1 from table_right where l1 = r1); + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice2; segments: 3) + -> Hash Join + Hash Cond: (table_right_1_prt_1.r1 = table_left.l1) + -> HashAggregate + Group Key: table_right_1_prt_1.r1 + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: table_right_1_prt_1.r1 + -> Append + -> Seq Scan on table_right_1_prt_1 + -> Seq Scan on table_right_1_prt_2 + -> Seq Scan on table_right_1_prt_3 + -> Hash + -> Seq Scan on table_left + Optimizer: Postgres query optimizer +(14 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 
| l2 +----+---- + 1 | 1 +(1 row) + +explain (costs off) select * from table_left where l1 in (select r1 from table_right); + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice2; segments: 3) + -> Hash Join + Hash Cond: (table_right_1_prt_1.r1 = table_left.l1) + -> HashAggregate + Group Key: table_right_1_prt_1.r1 + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: table_right_1_prt_1.r1 + -> Append + -> Seq Scan on table_right_1_prt_1 + -> Seq Scan on table_right_1_prt_2 + -> Seq Scan on table_right_1_prt_3 + -> Hash + -> Seq Scan on table_left + Optimizer: Postgres query optimizer +(14 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 | l2 +----+---- + 1 | 1 +(1 row) + +-- clean up +drop table table_left; +drop table table_right; -- Test that Explicit Redistribute Motion is applied properly for -- queries that have modifying operation inside a SubPlan. That -- requires the ModifyTable's top Flow node to be copied correctly inside diff --git a/src/test/regress/expected/subselect_gp_optimizer.out b/src/test/regress/expected/subselect_gp_optimizer.out index b81ffb0be78b..f48c204b78a0 100644 --- a/src/test/regress/expected/subselect_gp_optimizer.out +++ b/src/test/regress/expected/subselect_gp_optimizer.out @@ -3253,6 +3253,210 @@ select * from r where b in (select b from s where c=10 order by c limit 2); 1 | 2 | 3 (1 row) +-- Test nested query with aggregate inside a sublink, +-- ORCA should correctly normalize the aggregate expression inside the +-- sublink's nested query and the column variable accessed in aggregate should +-- be accessible to the aggregate after the normalization of query. 
+-- If the query is not supported, ORCA should gracefully fallback to postgres +explain (COSTS OFF) with t0 AS ( + SELECT + ROW_TO_JSON((SELECT x FROM (SELECT max(t.b)) x)) + AS c + FROM r + JOIN s ON true + JOIN s as t ON true + ) +SELECT c FROM t0; + QUERY PLAN +--------------------------------------------------------------------------------------- + Aggregate + -> Gather Motion 3:1 (slice3; segments: 3) + -> Aggregate + -> Nested Loop + -> Broadcast Motion 3:3 (slice2; segments: 3) + -> Nested Loop + -> Seq Scan on r + -> Materialize + -> Broadcast Motion 3:3 (slice1; segments: 3) + -> Seq Scan on s + -> Materialize + -> Seq Scan on s t + SubPlan 1 (slice0) + -> Subquery Scan on x + -> Result + Optimizer: Postgres query optimizer +(16 rows) + +-- +-- Test case for ORCA semi join with random table +-- See https://github.com/greenplum-db/gpdb/issues/16611 +-- +--- case for random distribute +create table table_left (l1 int, l2 int) distributed by (l1); +create table table_right (r1 int, r2 int) distributed randomly; +create index table_right_idx on table_right(r1); +insert into table_left values (1,1); +insert into table_right select i, i from generate_series(1, 300) i; +insert into table_right select 1, 1 from generate_series(1, 100) i; +--- make sure the same value (1,1) rows are inserted into different segments +select count(distinct gp_segment_id) > 1 from table_right where r1 = 1; + ?column? 
+---------- + t +(1 row) + +analyze table_left; +analyze table_right; +-- two types of semi join tests +explain (costs off) select * from table_left where exists (select 1 from table_right where l1 = r1); + QUERY PLAN +------------------------------------------------------------------------ + Gather Motion 3:1 (slice2; segments: 3) + -> Hash Semi Join + Hash Cond: (table_left.l1 = table_right.r1) + -> Seq Scan on table_left + Filter: (NOT (l1 IS NULL)) + -> Hash + -> Result + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: table_right.r1 + -> Seq Scan on table_right + Optimizer: Pivotal Optimizer (GPORCA) +(11 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 | l2 +----+---- + 1 | 1 +(1 row) + +explain (costs off) select * from table_left where l1 in (select r1 from table_right); + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice2; segments: 3) + -> Hash Semi Join + Hash Cond: (table_left.l1 = table_right.r1) + -> Seq Scan on table_left + -> Hash + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: table_right.r1 + -> Seq Scan on table_right + Optimizer: Pivotal Optimizer (GPORCA) +(9 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 | l2 +----+---- + 1 | 1 +(1 row) + +--- case for replicate distribute +alter table table_right set distributed replicated; +explain (costs off) select * from table_left where exists (select 1 from table_right where l1 = r1); + QUERY PLAN +------------------------------------------------------------------------- + Gather Motion 1:1 (slice2; segments: 1) + -> Nested Loop + Join Filter: true + -> Broadcast Motion 3:1 (slice1; segments: 3) + -> Seq Scan on table_left + Filter: (NOT (l1 IS NULL)) + -> GroupAggregate + Group Key: table_right.r1 + -> Result + -> Index Scan using table_right_idx on table_right + Index Cond: (r1 = table_left.l1) + Optimizer: Pivotal Optimizer 
(GPORCA) +(12 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 | l2 +----+---- + 1 | 1 +(1 row) + +explain (costs off) select * from table_left where l1 in (select r1 from table_right); + QUERY PLAN +------------------------------------------------------------------- + Gather Motion 1:1 (slice2; segments: 1) + -> Nested Loop + Join Filter: true + -> Broadcast Motion 3:1 (slice1; segments: 3) + -> Seq Scan on table_left + -> GroupAggregate + Group Key: table_right.r1 + -> Index Scan using table_right_idx on table_right + Index Cond: (r1 = table_left.l1) + Optimizer: Pivotal Optimizer (GPORCA) +(10 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 | l2 +----+---- + 1 | 1 +(1 row) + +--- case for partition table with random distribute +drop table table_right; +create table table_right (r1 int, r2 int) distributed randomly partition by range (r1) ( start (0) end (300) every (100)); +NOTICE: CREATE TABLE will create partition "table_right_1_prt_1" for table "table_right" +NOTICE: CREATE TABLE will create partition "table_right_1_prt_2" for table "table_right" +NOTICE: CREATE TABLE will create partition "table_right_1_prt_3" for table "table_right" +create index table_right_idx on table_right(r1); +insert into table_right select i, i from generate_series(1, 299) i; +insert into table_right select 1, 1 from generate_series(1, 100) i; +analyze table_right; +explain (costs off) select * from table_left where exists (select 1 from table_right where l1 = r1); + QUERY PLAN +--------------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice2; segments: 3) + -> Hash Semi Join + Hash Cond: (table_left.l1 = table_right.r1) + -> Seq Scan on table_left + Filter: (NOT (l1 IS NULL)) + -> Hash + -> Result + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: table_right.r1 + -> Sequence + -> Partition Selector for table_right (dynamic 
scan id: 1) + Partitions selected: 3 (out of 3) + -> Dynamic Seq Scan on table_right (dynamic scan id: 1) + Optimizer: Pivotal Optimizer (GPORCA) +(14 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 | l2 +----+---- + 1 | 1 +(1 row) + +explain (costs off) select * from table_left where l1 in (select r1 from table_right); + QUERY PLAN +--------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice2; segments: 3) + -> Hash Semi Join + Hash Cond: (table_left.l1 = table_right.r1) + -> Seq Scan on table_left + -> Hash + -> Redistribute Motion 3:3 (slice1; segments: 3) + Hash Key: table_right.r1 + -> Sequence + -> Partition Selector for table_right (dynamic scan id: 1) + Partitions selected: 3 (out of 3) + -> Dynamic Seq Scan on table_right (dynamic scan id: 1) + Optimizer: Pivotal Optimizer (GPORCA) +(12 rows) + +select * from table_left where exists (select 1 from table_right where l1 = r1); + l1 | l2 +----+---- + 1 | 1 +(1 row) + +-- clean up +drop table table_left; +drop table table_right; -- Test that Explicit Redistribute Motion is applied properly for -- queries that have modifying operation inside a SubPlan. That -- requires the ModifyTable's top Flow node to be copied correctly inside diff --git a/src/test/regress/expected/subselect_optimizer.out b/src/test/regress/expected/subselect_optimizer.out index c869799d6f41..85e610f315b5 100644 --- a/src/test/regress/expected/subselect_optimizer.out +++ b/src/test/regress/expected/subselect_optimizer.out @@ -1011,7 +1011,11 @@ commit; --end_ignore -- Ensure that both planners produce valid plans for the query with the nested -- SubLink, which contains attributes referenced in query's GROUP BY clause. --- The inner part of SubPlan should contain only t.j. 
+-- Due to presence of non-grouping columns in targetList, ORCA performs query +-- normalization, during which ORCA establishes a correspondence between vars +-- from targetlist entries to grouping attributes. And this process should +-- correctly handle nested structures. The inner part of SubPlan in the test +-- should contain only t.j. -- start_ignore drop table if exists t; NOTICE: table "t" does not exist, skipping @@ -1056,8 +1060,12 @@ group by i, j; -- Ensure that both planners produce valid plans for the query with the nested -- SubLink when this SubLink is inside the GROUP BY clause. Attribute, which is --- not grouping column, is added to query targetList to make ORCA perform query --- normalization. For ORCA the fallback shouldn't occur. +-- not grouping column (1 as c), is added to query targetList to make ORCA +-- perform query normalization. During normalization ORCA modifies the vars of +-- the grouping elements of targetList in order to produce a new Query tree. +-- The modification of vars inside nested part of SubLinks should be handled +-- correctly. ORCA shouldn't fall back due to missing variable entry as a result +-- of incorrect query normalization. explain (verbose, costs off) select j, 1 as c, (select j from (select j) q2) q1 @@ -1102,8 +1110,9 @@ group by j, q1; (1 row) -- Ensure that both planners produce valid plans for the query with the nested --- SubLink, and this SubLink is under the aggregation. For ORCA the fallback --- shouldn't occur. +-- SubLink, and this SubLink is under aggregation. ORCA shouldn't fall back due +-- to missing variable entry as a result of incorrect query normalization. ORCA +-- should correctly process args of the aggregation during normalization. 
explain (verbose, costs off) select (select max((select t.i))) from t; QUERY PLAN diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out index 4f449766740e..2fe564e8cb96 100644 --- a/src/test/regress/expected/with.out +++ b/src/test/regress/expected/with.out @@ -2280,3 +2280,88 @@ WITH cte AS ( RESET optimizer; DROP TABLE d; +-- Test if sharing is disabled for a SegmentGeneral CTE to avoid deadlock if CTE is +-- executed with 1-gang and joined with n-gang +SET optimizer = off; +--start_ignore +DROP TABLE IF EXISTS d; +NOTICE: table "d" does not exist, skipping +DROP TABLE IF EXISTS r; +NOTICE: table "r" does not exist, skipping +--end_ignore +CREATE TABLE d (a int, b int) DISTRIBUTED BY (a); +INSERT INTO d VALUES ( 1, 2 ),( 2, 3 ); +CREATE TABLE r (a int, b int) DISTRIBUTED REPLICATED; +INSERT INTO r VALUES ( 1, 2 ),( 3, 4 ); +EXPLAIN (COSTS off) +WITH cte AS ( + SELECT count(*) a FROM r +) SELECT * FROM cte JOIN (SELECT * FROM d JOIN cte USING (a) LIMIT 1) d_join_cte USING (a); + QUERY PLAN +--------------------------------------------------------------- + Hash Join + Hash Cond: (d_join_cte.a = (count(*))) + -> Subquery Scan on d_join_cte + -> Limit + -> Gather Motion 3:1 (slice1; segments: 3) + -> Limit + -> Hash Join + Hash Cond: (d.a = (count(*))) + -> Seq Scan on d + -> Hash + -> Aggregate + -> Seq Scan on r + -> Hash + -> Gather Motion 1:1 (slice2; segments: 1) + -> Aggregate + -> Seq Scan on r r_1 + Optimizer: Postgres query optimizer +(17 rows) + +WITH cte AS ( + SELECT count(*) a FROM r +) SELECT * FROM cte JOIN (SELECT * FROM d JOIN cte USING (a) LIMIT 1) d_join_cte USING (a); + a | b +---+--- + 2 | 3 +(1 row) + +-- Test if sharing is disabled for a General CTE to avoid deadlock if CTE is +-- executed with coordinator gang and joined with n-gang +EXPLAIN (COSTS OFF) +WITH cte AS ( + SELECT count(*) a FROM (VALUES ( 1, 2 ),( 3, 4 )) v +) +SELECT * FROM cte JOIN (SELECT * FROM d JOIN cte USING (a) LIMIT 1) d_join_cte USING (a); 
+ QUERY PLAN +--------------------------------------------------------------------------- + Hash Join + Hash Cond: (d_join_cte.a = (count(*))) + -> Subquery Scan on d_join_cte + -> Limit + -> Gather Motion 3:1 (slice1; segments: 3) + -> Limit + -> Hash Join + Hash Cond: (d.a = (count(*))) + -> Seq Scan on d + -> Hash + -> Aggregate + -> Values Scan on "*VALUES*" + -> Hash + -> Aggregate + -> Values Scan on "*VALUES*_1" + Optimizer: Postgres query optimizer +(16 rows) + +WITH cte AS ( + SELECT count(*) a FROM (VALUES ( 1, 2 ),( 3, 4 )) v +) +SELECT * FROM cte JOIN (SELECT * FROM d JOIN cte USING (a) LIMIT 1) d_join_cte USING (a); + a | b +---+--- + 2 | 3 +(1 row) + +RESET optimizer; +DROP TABLE d; +DROP TABLE r; diff --git a/src/test/regress/expected/zlib.out b/src/test/regress/expected/zlib.out index 18f8e3deae08..e74822bcf218 100644 --- a/src/test/regress/expected/zlib.out +++ b/src/test/regress/expected/zlib.out @@ -20,6 +20,36 @@ CREATE TABLE test_zlib_hashjoin (i1 int, i2 int, i3 int, i4 int, i5 int, i6 int, INSERT INTO test_zlib_hashjoin SELECT i,i,i,i,i,i,i,i FROM (select generate_series(1, nsegments * 333333) as i from (select count(*) as nsegments from gp_segment_configuration where role='p' and content >= 0) foo) bar; +-- start_ignore +create language plpythonu; +-- end_ignore +-- Check if compressed work file count is limited to file_count_limit +-- If the parameter is_comp_buff_limit is true, it means the comp_workfile_created +-- must be smaller than file_count_limit because some work files are not compressed; +-- If the parameter is_comp_buff_limit is false, it means the comp_workfile_created +-- must be equal to file_count_limit because all work files are compressed. 
+create or replace function check_workfile_compressed(explain_query text,
+    is_comp_buff_limit bool)
+returns setof int as
+$$
+import re
+rv = plpy.execute(explain_query)
+search_text = 'Work file set'
+result = []
+for i in range(len(rv)):
+    cur_line = rv[i]['QUERY PLAN']
+    if search_text.lower() in cur_line.lower():
+        p = re.compile('(\d+) files \((\d+) compressed\)')
+        m = p.search(cur_line)
+        workfile_created = int(m.group(1))
+        comp_workfile_created = int(m.group(2))
+        if is_comp_buff_limit:
+            result.append(int(comp_workfile_created < workfile_created))
+        else:
+            result.append(int(comp_workfile_created == workfile_created))
+return result
+$$
+language plpythonu;
 SET statement_mem=5000;
 --Fail after workfile creation and before add it to workfile set
 select gp_inject_fault('workfile_creation_failure', 'reset', 2);
@@ -147,3 +177,104 @@ select gp_inject_fault('workfile_creation_failure', 'reset', 2);
  Success:
 (1 row)
 
+-- Test gp_workfile_compression_overhead_limit to control the memory limit used by
+-- compressed temp file
+DROP TABLE IF EXISTS test_zlib_memlimit;
+NOTICE: table "test_zlib_memlimit" does not exist, skipping
+create table test_zlib_memlimit(a int, b text, c timestamp) distributed by (a);
+insert into test_zlib_memlimit select id, 'test ' || id, clock_timestamp() from
+    (select generate_series(1, nsegments * 30000) as id from
+    (select count(*) as nsegments from gp_segment_configuration where role='p' and content >= 0) foo) bar;
+insert into test_zlib_memlimit select 1,'test', now() from
+    (select generate_series(1, nsegments * 2000) as id from
+    (select count(*) as nsegments from gp_segment_configuration where role='p' and content >= 0) foo) bar;
+insert into test_zlib_memlimit select id, 'test ' || id, clock_timestamp() from
+    (select generate_series(1, nsegments * 3000) as id from
+    (select count(*) as nsegments from gp_segment_configuration where role='p' and content >= 0) foo) bar;
+analyze test_zlib_memlimit;
+set statement_mem='4500kB';
+set gp_workfile_compression=on;
+set gp_workfile_limit_files_per_query=0;
+-- Run the query with a large value of gp_workfile_compression_overhead_limit
+-- The compressed file number should be equal to total work file number
+set gp_workfile_compression_overhead_limit=2048000;
+select * from check_workfile_compressed('
+explain (analyze)
+with B as (select distinct a+1 as a,b,c from test_zlib_memlimit)
+,C as (select distinct a+2 as a,b,c from test_zlib_memlimit)
+,D as (select a+3 as a,b,c from test_zlib_memlimit)
+,E as (select a+4 as a,b,c from test_zlib_memlimit)
+,F as (select (a+5)::text as a,b,c from test_zlib_memlimit)
+select count(*) from test_zlib_memlimit A
+inner join B on A.a = B.a
+inner join C on A.a = C.a
+inner join D on A.a = D.a
+inner join E on A.a = E.a
+inner join F on A.a::text = F.a ;',
+false) limit 6;
+ check_workfile_compressed
+---------------------------
+                         1
+                         1
+                         1
+                         1
+                         1
+                         1
+(6 rows)
+
+-- Run the query with a smaller value of gp_workfile_compression_overhead_limit
+-- The compressed file number should be less than total work file number
+set gp_workfile_compression_overhead_limit=1000;
+select * from check_workfile_compressed('
+explain (analyze)
+with B as (select distinct a+1 as a,b,c from test_zlib_memlimit)
+,C as (select distinct a+2 as a,b,c from test_zlib_memlimit)
+,D as (select a+3 as a,b,c from test_zlib_memlimit)
+,E as (select a+4 as a,b,c from test_zlib_memlimit)
+,F as (select (a+5)::text as a,b,c from test_zlib_memlimit)
+select count(*) from test_zlib_memlimit A
+inner join B on A.a = B.a
+inner join C on A.a = C.a
+inner join D on A.a = D.a
+inner join E on A.a = E.a
+inner join F on A.a::text = F.a ;',
+true) limit 6;
+ check_workfile_compressed
+---------------------------
+                         1
+                         1
+                         1
+                         1
+                         1
+                         1
+(6 rows)
+
+-- Run the query with gp_workfile_compression_overhead_limit=0, which means
+-- no limit
+-- The compressed file number should be equal to total work file number
+set gp_workfile_compression_overhead_limit=0;
+select * from check_workfile_compressed('
+explain (analyze)
+with B as (select distinct a+1 as a,b,c from test_zlib_memlimit)
+,C as (select distinct a+2 as a,b,c from test_zlib_memlimit)
+,D as (select a+3 as a,b,c from test_zlib_memlimit)
+,E as (select a+4 as a,b,c from test_zlib_memlimit)
+,F as (select (a+5)::text as a,b,c from test_zlib_memlimit)
+select count(*) from test_zlib_memlimit A
+inner join B on A.a = B.a
+inner join C on A.a = C.a
+inner join D on A.a = D.a
+inner join E on A.a = E.a
+inner join F on A.a::text = F.a ;',
+false) limit 6;
+ check_workfile_compressed
+---------------------------
+                         1
+                         1
+                         1
+                         1
+                         1
+                         1
+(6 rows)
+
+DROP TABLE test_zlib_memlimit;
diff --git a/src/test/regress/greenplum_schedule b/src/test/regress/greenplum_schedule
index d5856d84c8a5..b23652fe8713 100755
--- a/src/test/regress/greenplum_schedule
+++ b/src/test/regress/greenplum_schedule
@@ -41,7 +41,7 @@ test: instr_in_shmem_setup
 test: instr_in_shmem
 
 test: createdb
-test: gp_aggregates gp_metadata variadic_parameters default_parameters function_extensions spi gp_xml update_gp returning_gp resource_queue_with_rule gp_types gp_index gp_lock
+test: gp_aggregates gp_metadata variadic_parameters default_parameters function_extensions spi gp_xml update_gp returning_gp resource_queue_with_rule gp_types gp_index gp_lock gp_locale
 test: shared_scan
 test: spi_processed64bit
 test: python_processed64bit
@@ -308,4 +308,7 @@ test: aux_ao_rels_stat
 # check syslogger (since GP syslogger code is divergent from upstream)
 test: syslogger_gp
 
+# run this at the end of the schedule for more chance to catch abnormalities
+test: gp_check_files
+
 # end of tests
diff --git a/src/test/regress/input/alter_db_set_tablespace.source b/src/test/regress/input/alter_db_set_tablespace.source
index 56d83b18f275..61034f6dce6f 100644
--- a/src/test/regress/input/alter_db_set_tablespace.source
+++ b/src/test/regress/input/alter_db_set_tablespace.source
@@ -15,10 +15,8 @@ CREATE SCHEMA adst;
 
 SET search_path TO adst,public;
 
-CREATE OR REPLACE FUNCTION get_tablespace_version_directory_name()
-    RETURNS TEXT
-AS '@abs_builddir@/regress.so', 'get_tablespace_version_directory_name'
-    LANGUAGE C;
+-- to get function get_tablespace_version_directory_name()
+CREATE EXTENSION gp_check_functions;
 
 -- start_ignore
 CREATE LANGUAGE plpythonu;
diff --git a/src/test/regress/input/external_table.source b/src/test/regress/input/external_table.source
index 757be9c1ef4d..3724d4d899f8 100644
--- a/src/test/regress/input/external_table.source
+++ b/src/test/regress/input/external_table.source
@@ -293,6 +293,16 @@ SELECT encoding from gp_dist_random('pg_exttable') where urilocation='{gpfdist:/
 DROP EXTERNAL TABLE issue_9727;
 RESET client_encoding;
 
+-- Test "DROP OWNED BY" when everything of the protocol is granted to some user.
+-- GitHub Issue #12748: https://github.com/greenplum-db/gpdb/issues/12748
+CREATE TRUSTED PROTOCOL dummy_protocol_issue_12748 (readfunc = 'read_from_file', writefunc = 'write_to_file');
+CREATE ROLE test_role_issue_12748;
+GRANT ALL ON PROTOCOL dummy_protocol_issue_12748 TO test_role_issue_12748;
+DROP OWNED BY test_role_issue_12748;
+-- Clean up.
+DROP ROLE test_role_issue_12748;
+DROP PROTOCOL dummy_protocol_issue_12748;
+
 --
 -- WET tests
 --
diff --git a/src/test/regress/input/gp_check_files.source b/src/test/regress/input/gp_check_files.source
new file mode 100644
index 000000000000..255c923d1dd9
--- /dev/null
+++ b/src/test/regress/input/gp_check_files.source
@@ -0,0 +1,178 @@
+-- Test views/functions to check missing/orphaned data files
+
+-- start_matchsubs
+-- m/aoseg_\d+/
+-- s/aoseg_\d+/aoseg_xxx/g
+-- m/aocsseg_\d+/
+-- s/aocsseg_\d+/aocsseg_xxx/g
+-- m/aovisimap_\d+/
+-- s/aovisimap_\d+/aovisimap_xxx/g
+-- m/seg1_pg_tblspc_.*/
+-- s/seg1_pg_tblspc_.*/seg1_pg_tblspc_XXX/g
+-- m/ERROR\: could not rename .*/
+-- s/ERROR\: could not rename .*/ERROR\: could not rename XXX/g
+-- m/ERROR\: cannot rename .*/
+-- s/ERROR\: cannot rename .*/ERROR\: cannot rename XXX/g
+-- end_matchsubs
+
+create extension gp_check_functions;
+
+-- helper function to repeatedly run gp_check_orphaned_files for up to 10 minutes,
+-- in case any flakiness happens (like background worker makes LOCK pg_class unsuccessful etc.)
+CREATE OR REPLACE FUNCTION run_orphaned_files_view()
+RETURNS TABLE(gp_segment_id INT, filename TEXT) AS $$
+DECLARE
+    retry_counter INT := 0;
+BEGIN
+    WHILE retry_counter < 120 LOOP
+        BEGIN
+            RETURN QUERY SELECT q.gp_segment_id, q.filename FROM gp_check_orphaned_files q;
+            RETURN; -- If successful
+        EXCEPTION
+            WHEN OTHERS THEN
+                RAISE LOG 'attempt failed % with error: %', retry_counter + 1, SQLERRM;
+                -- When an exception occurs, wait for 5 seconds and then retry
+                PERFORM pg_sleep(5);
+                -- Refresh to get the latest pg_stat_activity
+                PERFORM pg_stat_clear_snapshot();
+                retry_counter := retry_counter + 1;
+        END;
+    END LOOP;
+
+    -- all retries failed
+    RAISE EXCEPTION 'failed to retrieve orphaned files after 10 minutes of retries.';
+END;
+$$ LANGUAGE plpgsql;
+
+-- we'll use a specific tablespace to test
+CREATE TABLESPACE checkfile_ts LOCATION '@testtablespace@';
+set default_tablespace = checkfile_ts;
+
+-- create a table that we'll delete the files to test missing files.
+-- this have to be created beforehand in order for the tablespace directories to be created.
+CREATE TABLE checkmissing_heap(a int, b int, c int);
+insert into checkmissing_heap select i,i,i from generate_series(1,100)i;
+
+--
+-- Tests for orphaned files
+--
+
+-- go to seg1's data directory for the tablespace we just created
+\cd @testtablespace@
+select dbid from gp_segment_configuration where content = 1 and role = 'p' \gset
+\cd :dbid
+select get_tablespace_version_directory_name() as version_dir \gset
+\cd :version_dir
+select oid from pg_database where datname = current_database() \gset
+\cd :oid
+
+-- create some orphaned files
+\! touch 987654
+\! touch 987654.3
+
+-- check orphaned files, note that this forces a checkpoint internally.
+set client_min_messages = ERROR;
+select gp_segment_id, filename from run_orphaned_files_view();
+reset client_min_messages;
+
+-- test moving the orphaned files
+
+-- firstly, should not move anything if the target directory doesn't exist
+select * from gp_move_orphaned_files('@testtablespace@/non_exist_dir');
+select gp_segment_id, filename from run_orphaned_files_view();
+
+-- should also fail to move if no proper permission to the target directory
+\! mkdir @testtablespace@/moving_orphaned_file_test
+\! chmod 000 @testtablespace@/moving_orphaned_file_test
+select * from gp_move_orphaned_files('@testtablespace@/moving_orphaned_file_test');
+select gp_segment_id, filename from run_orphaned_files_view();
+
+-- should not allow non-superuser to run,
+-- though it would complain as soon as non-superuser tries to lock pg_class in gp_move_orphaned_files
+create role check_file_test_role nosuperuser;
+set role = check_file_test_role;
+select * from gp_move_orphaned_files('@testtablespace@/moving_orphaned_file_test');
+reset role;
+drop role check_file_test_role;
+
+\! chmod 700 @testtablespace@/moving_orphaned_file_test
+-- should correctly move the orphaned files,
+-- filter out exact paths as that could vary
+\a
+select gp_segment_id, move_success, regexp_replace(oldpath, '^.*/(.+)$', '\1') as oldpath, regexp_replace(newpath, '^.*/(.+)$', '\1') as newpath
+from gp_move_orphaned_files('@testtablespace@/moving_orphaned_file_test');
+\a
+
+-- The moved orphaned files are in the target directory tree with a name that indicates its original location in data directory
+\cd @testtablespace@/moving_orphaned_file_test/
+
+-- should see the orphaned files being moved
+\! ls
+-- no orphaned files can be found now
+select gp_segment_id, filename from run_orphaned_files_view();
+
+-- should not affect existing tables
+select count(*) from checkmissing_heap;
+
+-- go back to the valid data directory
+\cd @testtablespace@
+select dbid from gp_segment_configuration where content = 1 and role = 'p' \gset
+\cd :dbid
+select get_tablespace_version_directory_name() as version_dir \gset
+\cd :version_dir
+select oid from pg_database where datname = current_database() \gset
+\cd :oid
+
+--
+-- Tests for missing files
+--
+
+-- Now remove the data file for the table we just created.
+-- But check to see if the working directory is what we expect (under
+-- the test tablespace). Also just delete one and only one file that
+-- is number-named.
+\! if pwd | grep -q "^@testtablespace@/.*$"; then find . -maxdepth 1 -type f -regex '.*\/[0-9]+' -exec rm {} \; -quit; fi
+
+-- now create AO/CO tables and delete only their extended files
+CREATE TABLE checkmissing_ao(a int, b int, c int) WITH (appendonly=true, orientation=row);
+CREATE TABLE checkmissing_co(a int, b int, c int) WITH (appendonly=true, orientation=column);
+insert into checkmissing_ao select i,i,i from generate_series(1,100)i;
+insert into checkmissing_co select i,i,i from generate_series(1,100)i;
+
+-- Now remove the extended data file '.1' for the AO/CO tables we just created.
+-- Still, check to see if the working directory is what we expect, and only
+-- delete exact two '.1' files.
+\! if pwd | grep -q "^@testtablespace@/.*$"; then find . -maxdepth 1 -type f -regex '.*\/[0-9]+\.1' -exec rm {} \; -quit; fi
+\! if pwd | grep -q "^@testtablespace@/.*$"; then find . -maxdepth 1 -type f -regex '.*\/[0-9]+\.1' -exec rm {} \; -quit; fi
+
+-- create some normal tables
+CREATE TABLE checknormal_heap(a int, b int, c int);
+CREATE TABLE checknormal_ao(a int, b int, c int) WITH (appendonly=true, orientation=row);
+CREATE TABLE checknormal_co(a int, b int, c int) WITH (appendonly=true, orientation=column);
+insert into checknormal_heap select i,i,i from generate_series(1,100)i;
+insert into checknormal_ao select i,i,i from generate_series(1,100)i;
+insert into checknormal_co select i,i,i from generate_series(1,100)i;
+
+-- check non-extended files
+select gp_segment_id, regexp_replace(filename, '\d+', 'x'), relname from gp_check_missing_files;
+
+SET client_min_messages = ERROR;
+
+-- check extended files
+select gp_segment_id, regexp_replace(filename, '\d+', 'x'), relname from gp_check_missing_files_ext;
+
+RESET client_min_messages;
+
+-- cleanup
+drop table checkmissing_heap;
+drop table checkmissing_ao;
+drop table checkmissing_co;
+drop table checknormal_heap;
+drop table checknormal_ao;
+drop table checknormal_co;
+
+\! rm -rf @testtablespace@/*;
+
+DROP TABLESPACE checkfile_ts;
+DROP EXTENSION gp_check_functions;
+
diff --git a/src/test/regress/input/gp_tablespace.source b/src/test/regress/input/gp_tablespace.source
index 07ab138e577f..c118dd56c6ab 100644
--- a/src/test/regress/input/gp_tablespace.source
+++ b/src/test/regress/input/gp_tablespace.source
@@ -35,10 +35,8 @@ BEGIN
 END;
 $$ language plpgsql;
 
-CREATE OR REPLACE FUNCTION get_tablespace_version_directory_name()
-    RETURNS TEXT
-    AS '@abs_builddir@/regress.so', 'get_tablespace_version_directory_name'
-    LANGUAGE C;
+-- to get function get_tablespace_version_directory_name()
+CREATE EXTENSION gp_check_functions;
 
 -- create tablespaces we can use
 
@@ -211,4 +209,5 @@ CREATE TABLE t_dir_empty(a int);
 \! rm -rf @testtablespace@/*;
 DROP TABLE IF EXISTS t_dir_empty;
 DROP TABLESPACE testspace_dir_empty;
+DROP EXTENSION gp_check_functions;
 
diff --git a/src/test/regress/output/alter_db_set_tablespace.source b/src/test/regress/output/alter_db_set_tablespace.source
index c1f40d30fc84..ba330572f29e 100644
--- a/src/test/regress/output/alter_db_set_tablespace.source
+++ b/src/test/regress/output/alter_db_set_tablespace.source
@@ -11,10 +11,8 @@
 -- end_ignore
 CREATE SCHEMA adst;
 SET search_path TO adst,public;
-CREATE OR REPLACE FUNCTION get_tablespace_version_directory_name()
-    RETURNS TEXT
-AS '@abs_builddir@/regress.so', 'get_tablespace_version_directory_name'
-    LANGUAGE C;
+-- to get function get_tablespace_version_directory_name()
+CREATE EXTENSION gp_check_functions;
 -- start_ignore
 CREATE LANGUAGE plpythonu;
 -- end_ignore
@@ -1431,7 +1429,7 @@ DROP TABLESPACE adst_destination_tablespace;
 -- Final cleanup
 DROP SCHEMA adst CASCADE;
 NOTICE: drop cascades to 5 other objects
-DETAIL: drop cascades to function get_tablespace_version_directory_name()
+DETAIL: drop cascades to extension gp_check_functions
 drop cascades to function setup_tablespace_location_dir_for_test(text)
 drop cascades to function setup()
 drop cascades to function list_db_tablespace(text,text)
diff --git a/src/test/regress/output/external_table.source b/src/test/regress/output/external_table.source
index 9d78d6a30468..71b91bcf95f0 100644
--- a/src/test/regress/output/external_table.source
+++ b/src/test/regress/output/external_table.source
@@ -394,6 +394,16 @@ SELECT encoding from gp_dist_random('pg_exttable') where urilocation='{gpfdist:/
 
 DROP EXTERNAL TABLE issue_9727;
 RESET client_encoding;
+-- Test "DROP OWNED BY" when everything of the protocol is granted to some user.
+-- GitHub Issue #12748: https://github.com/greenplum-db/gpdb/issues/12748
+CREATE TRUSTED PROTOCOL dummy_protocol_issue_12748 (readfunc = 'read_from_file', writefunc = 'write_to_file');
+CREATE ROLE test_role_issue_12748;
+NOTICE: resource queue required -- using default resource queue "pg_default"
+GRANT ALL ON PROTOCOL dummy_protocol_issue_12748 TO test_role_issue_12748;
+DROP OWNED BY test_role_issue_12748;
+-- Clean up.
+DROP ROLE test_role_issue_12748;
+DROP PROTOCOL dummy_protocol_issue_12748;
 --
 -- WET tests
 --
diff --git a/src/test/regress/output/gp_check_files.source b/src/test/regress/output/gp_check_files.source
new file mode 100644
index 000000000000..cff402742105
--- /dev/null
+++ b/src/test/regress/output/gp_check_files.source
@@ -0,0 +1,198 @@
+-- Test views/functions to check missing/orphaned data files
+-- start_matchsubs
+-- m/aoseg_\d+/
+-- s/aoseg_\d+/aoseg_xxx/g
+-- m/aocsseg_\d+/
+-- s/aocsseg_\d+/aocsseg_xxx/g
+-- m/aovisimap_\d+/
+-- s/aovisimap_\d+/aovisimap_xxx/g
+-- m/seg1_pg_tblspc_.*/
+-- s/seg1_pg_tblspc_.*/seg1_pg_tblspc_XXX/g
+-- m/ERROR\: could not rename .*/
+-- s/ERROR\: could not rename .*/ERROR\: could not rename XXX/g
+-- m/ERROR\: cannot rename .*/
+-- s/ERROR\: cannot rename .*/ERROR\: cannot rename XXX/g
+-- end_matchsubs
+create extension gp_check_functions;
+-- helper function to repeatedly run gp_check_orphaned_files for up to 10 minutes,
+-- in case any flakiness happens (like background worker makes LOCK pg_class unsuccessful etc.)
+CREATE OR REPLACE FUNCTION run_orphaned_files_view()
+RETURNS TABLE(gp_segment_id INT, filename TEXT) AS $$
+DECLARE
+    retry_counter INT := 0;
+BEGIN
+    WHILE retry_counter < 120 LOOP
+        BEGIN
+            RETURN QUERY SELECT q.gp_segment_id, q.filename FROM gp_check_orphaned_files q;
+            RETURN; -- If successful
+        EXCEPTION
+            WHEN OTHERS THEN
+                RAISE LOG 'attempt failed % with error: %', retry_counter + 1, SQLERRM;
+                -- When an exception occurs, wait for 5 seconds and then retry
+                PERFORM pg_sleep(5);
+                -- Refresh to get the latest pg_stat_activity
+                PERFORM pg_stat_clear_snapshot();
+                retry_counter := retry_counter + 1;
+        END;
+    END LOOP;
+
+    -- all retries failed
+    RAISE EXCEPTION 'failed to retrieve orphaned files after 10 minutes of retries.';
+END;
+$$ LANGUAGE plpgsql;
+-- we'll use a specific tablespace to test
+CREATE TABLESPACE checkfile_ts LOCATION '@testtablespace@';
+set default_tablespace = checkfile_ts;
+-- create a table that we'll delete the files to test missing files.
+-- this have to be created beforehand in order for the tablespace directories to be created.
+CREATE TABLE checkmissing_heap(a int, b int, c int);
+insert into checkmissing_heap select i,i,i from generate_series(1,100)i;
+--
+-- Tests for orphaned files
+--
+-- go to seg1's data directory for the tablespace we just created
+\cd @testtablespace@
+select dbid from gp_segment_configuration where content = 1 and role = 'p' \gset
+\cd :dbid
+select get_tablespace_version_directory_name() as version_dir \gset
+\cd :version_dir
+select oid from pg_database where datname = current_database() \gset
+\cd :oid
+-- create some orphaned files
+\! touch 987654
+\! touch 987654.3
+-- check orphaned files, note that this forces a checkpoint internally.
+set client_min_messages = ERROR;
+select gp_segment_id, filename from run_orphaned_files_view();
+ gp_segment_id | filename
+---------------+----------
+             1 | 987654.3
+             1 | 987654
+(2 rows)
+
+reset client_min_messages;
+-- test moving the orphaned files
+-- firstly, should not move anything if the target directory doesn't exist
+select * from gp_move_orphaned_files('@testtablespace@/non_exist_dir');
+ERROR: could not rename XXX
+select gp_segment_id, filename from run_orphaned_files_view();
+ gp_segment_id | filename
+---------------+----------
+             1 | 987654.3
+             1 | 987654
+(2 rows)
+
+-- should also fail to move if no proper permission to the target directory
+\! mkdir @testtablespace@/moving_orphaned_file_test
+\! chmod 000 @testtablespace@/moving_orphaned_file_test
+select * from gp_move_orphaned_files('@testtablespace@/moving_orphaned_file_test');
+ERROR: cannot rename XXX
+CONTEXT: PL/pgSQL function gp_move_orphaned_files(text) line 20 at RETURN QUERY
+select gp_segment_id, filename from run_orphaned_files_view();
+ gp_segment_id | filename
+---------------+----------
+             1 | 987654.3
+             1 | 987654
+(2 rows)
+
+-- should not allow non-superuser to run,
+-- though it would complain as soon as non-superuser tries to lock pg_class in gp_move_orphaned_files
+create role check_file_test_role nosuperuser;
+set role = check_file_test_role;
+select * from gp_move_orphaned_files('@testtablespace@/moving_orphaned_file_test');
+ERROR: permission denied for relation pg_class
+CONTEXT: SQL statement "LOCK TABLE pg_class IN SHARE MODE NOWAIT"
+PL/pgSQL function gp_move_orphaned_files(text) line 4 at SQL statement
+reset role;
+drop role check_file_test_role;
+\! chmod 700 @testtablespace@/moving_orphaned_file_test
+-- should correctly move the orphaned files,
+-- filter out exact paths as that could vary
+\a
+select gp_segment_id, move_success, regexp_replace(oldpath, '^.*/(.+)$', '\1') as oldpath, regexp_replace(newpath, '^.*/(.+)$', '\1') as newpath
+from gp_move_orphaned_files('@testtablespace@/moving_orphaned_file_test');
+gp_segment_id|move_success|oldpath|newpath
+1|t|987654|seg1_pg_tblspc_17816_GPDB_6_302307241_17470_987654
+1|t|987654.3|seg1_pg_tblspc_17816_GPDB_6_302307241_17470_987654.3
+(2 rows)
+\a
+-- The moved orphaned files are in the target directory tree with a name that indicates its original location in data directory
+\cd @testtablespace@/moving_orphaned_file_test/
+-- should see the orphaned files being moved
+\! ls
+seg1_pg_tblspc_37385_GPDB_6_302307241_37039_987654
+seg1_pg_tblspc_37385_GPDB_6_302307241_37039_987654.3
+-- no orphaned files can be found now
+select gp_segment_id, filename from run_orphaned_files_view();
+ gp_segment_id | filename
+---------------+----------
+(0 rows)
+
+-- should not affect existing tables
+select count(*) from checkmissing_heap;
+ count
+-------
+   100
+(1 row)
+
+-- go back to the valid data directory
+\cd @testtablespace@
+select dbid from gp_segment_configuration where content = 1 and role = 'p' \gset
+\cd :dbid
+select get_tablespace_version_directory_name() as version_dir \gset
+\cd :version_dir
+select oid from pg_database where datname = current_database() \gset
+\cd :oid
+--
+-- Tests for missing files
+--
+-- Now remove the data file for the table we just created.
+-- But check to see if the working directory is what we expect (under
+-- the test tablespace). Also just delete one and only one file that
+-- is number-named.
+\! if pwd | grep -q "^@testtablespace@/.*$"; then find . -maxdepth 1 -type f -regex '.*\/[0-9]+' -exec rm {} \; -quit; fi
+-- now create AO/CO tables and delete only their extended files
+CREATE TABLE checkmissing_ao(a int, b int, c int) WITH (appendonly=true, orientation=row);
+CREATE TABLE checkmissing_co(a int, b int, c int) WITH (appendonly=true, orientation=column);
+insert into checkmissing_ao select i,i,i from generate_series(1,100)i;
+insert into checkmissing_co select i,i,i from generate_series(1,100)i;
+-- Now remove the extended data file '.1' for the AO/CO tables we just created.
+-- Still, check to see if the working directory is what we expect, and only
+-- delete exact two '.1' files.
+\! if pwd | grep -q "^@testtablespace@/.*$"; then find . -maxdepth 1 -type f -regex '.*\/[0-9]+\.1' -exec rm {} \; -quit; fi
+\! if pwd | grep -q "^@testtablespace@/.*$"; then find . -maxdepth 1 -type f -regex '.*\/[0-9]+\.1' -exec rm {} \; -quit; fi
+-- create some normal tables
+CREATE TABLE checknormal_heap(a int, b int, c int);
+CREATE TABLE checknormal_ao(a int, b int, c int) WITH (appendonly=true, orientation=row);
+CREATE TABLE checknormal_co(a int, b int, c int) WITH (appendonly=true, orientation=column);
+insert into checknormal_heap select i,i,i from generate_series(1,100)i;
+insert into checknormal_ao select i,i,i from generate_series(1,100)i;
+insert into checknormal_co select i,i,i from generate_series(1,100)i;
+-- check non-extended files
+select gp_segment_id, regexp_replace(filename, '\d+', 'x'), relname from gp_check_missing_files;
+ gp_segment_id | regexp_replace |      relname
+---------------+----------------+-------------------
+             1 | x              | checkmissing_heap
+(1 row)
+
+SET client_min_messages = ERROR;
+-- check extended files
+select gp_segment_id, regexp_replace(filename, '\d+', 'x'), relname from gp_check_missing_files_ext;
+ gp_segment_id | regexp_replace |      relname
+---------------+----------------+-------------------
+             1 | x              | checkmissing_heap
+             1 | x.1            | checkmissing_ao
+             1 | x.1            | checkmissing_co
+(3 rows)
+
+RESET client_min_messages;
+-- cleanup
+drop table checkmissing_heap;
+drop table checkmissing_ao;
+drop table checkmissing_co;
+drop table checknormal_heap;
+drop table checknormal_ao;
+drop table checknormal_co;
+\! rm -rf @testtablespace@/*;
+DROP TABLESPACE checkfile_ts;
+DROP EXTENSION gp_check_functions;
diff --git a/src/test/regress/output/gp_tablespace.source b/src/test/regress/output/gp_tablespace.source
index 982d4d531cd7..941c4c79a99a 100644
--- a/src/test/regress/output/gp_tablespace.source
+++ b/src/test/regress/output/gp_tablespace.source
@@ -33,10 +33,8 @@ BEGIN
     return has_init_file_for_oid(relation_id);
 END;
 $$ language plpgsql;
-CREATE OR REPLACE FUNCTION get_tablespace_version_directory_name()
-    RETURNS TEXT
-    AS '@abs_builddir@/regress.so', 'get_tablespace_version_directory_name'
-    LANGUAGE C;
+-- to get function get_tablespace_version_directory_name()
+CREATE EXTENSION gp_check_functions;
 -- create tablespaces we can use
 CREATE TABLESPACE testspace LOCATION '@testtablespace@';
 CREATE TABLESPACE ul_testspace LOCATION '@testtablespace@_unlogged';
@@ -385,3 +383,4 @@ CREATE TABLE t_dir_empty(a int);
 \! rm -rf @testtablespace@/*;
 DROP TABLE IF EXISTS t_dir_empty;
 DROP TABLESPACE testspace_dir_empty;
+DROP EXTENSION gp_check_functions;
diff --git a/src/test/regress/regress_gp.c b/src/test/regress/regress_gp.c
index 43aba7ceead2..1f8321129d67 100644
--- a/src/test/regress/regress_gp.c
+++ b/src/test/regress/regress_gp.c
@@ -2149,13 +2149,6 @@ broken_int4out(PG_FUNCTION_ARGS)
 	return DirectFunctionCall1(int4out, Int32GetDatum(arg));
 }
 
-PG_FUNCTION_INFO_V1(get_tablespace_version_directory_name);
-Datum
-get_tablespace_version_directory_name(PG_FUNCTION_ARGS)
-{
-	PG_RETURN_TEXT_P(CStringGetTextDatum(GP_TABLESPACE_VERSION_DIRECTORY));
-}
-
 PG_FUNCTION_INFO_V1(gp_tablespace_temptablespaceOid);
 Datum
 gp_tablespace_temptablespaceOid(PG_FUNCTION_ARGS)
diff --git a/src/test/regress/sql/.gitignore b/src/test/regress/sql/.gitignore
index eced9e142fe2..7ec5f933e613 100644
--- a/src/test/regress/sql/.gitignore
+++ b/src/test/regress/sql/.gitignore
@@ -38,6 +38,7 @@ bb_mpph.sql
 transient_types.sql
 hooktest.sql
 gpcopy.sql
+gp_check_files.sql
 trigger_sets_oid.sql
 query_info_hook_test.sql
 gp_tablespace.sql
diff --git a/src/test/regress/sql/bfv_olap.sql b/src/test/regress/sql/bfv_olap.sql
index 01124ffbaa2b..8c2c290bc4bf 100644
--- a/src/test/regress/sql/bfv_olap.sql
+++ b/src/test/regress/sql/bfv_olap.sql
@@ -420,7 +420,6 @@ select * from (select sum(a.salary) over(), count(*)
                from t2_github_issue_10143 a
                group by a.salary) T;
 
--- this query currently falls back, needs to be fixed
 select (select rn from (select row_number() over () as rn, name
                         from t1_github_issue_10143
                         where code = a.code
diff --git a/src/test/regress/sql/bfv_planner.sql b/src/test/regress/sql/bfv_planner.sql
index c0a34800a9b8..c0fbdd03d84f 100644
--- a/src/test/regress/sql/bfv_planner.sql
+++ b/src/test/regress/sql/bfv_planner.sql
@@ -316,6 +316,56 @@ explain (costs off) select * from t_hashdist cross join (select a, count(1) as s
 -- limit
 explain (costs off) select * from t_hashdist cross join (select * from generate_series(1, 10) limit 1) x;
 
+set gp_cte_sharing = on;
+
+-- ensure that the volatile function is executed on one segment if it is in the CTE target list
+explain (costs off, verbose) with cte as (
+    select a * random() as a from generate_series(1, 5) a
+)
+select * from cte join (select * from t_hashdist join cte using(a)) b using(a);
+
+set gp_cte_sharing = off;
+
+explain (costs off, verbose) with cte as (
+    select a, a * random() from generate_series(1, 5) a
+)
+select * from cte join t_hashdist using(a);
+
+reset gp_cte_sharing;
+
+-- ensure that the volatile function is executed on one segment if it is in the union target list
+explain (costs off, verbose) select * from (
+    select random() as a from generate_series(1, 5)
+    union
+    select random() as a from generate_series(1, 5)
+)
+a join t_hashdist on a.a = t_hashdist.a;
+
+-- ensure that the volatile function is executed on one segment if it is in target list of subplan of multiset function
+explain (costs off, verbose) select * from (
+    SELECT count(*) as a FROM anytable_out( TABLE( SELECT random()::int from generate_series(1, 5) a ) )
+) a join t_hashdist using(a);
+
+-- if there is a volatile function in the target list of a plan with the locus type
+-- General or Segment General, then such a plan should be executed on single
+-- segment, since it is assumed that nodes with such locus types will give the same
+-- result on all segments, which is impossible for a volatile function.
+-- start_ignore
+drop table if exists d;
+-- end_ignore
+create table d (b int, a int default 1) distributed by (b);
+
+insert into d select * from generate_series(0, 20) j;
+-- change distribution without reorganize
+alter table d set distributed randomly;
+
+with cte as (
+    select a as a, a * random() as rand from generate_series(0, 3)a
+)
+select count(distinct(rand)) from cte join d on cte.a = d.a;
+
+drop table d;
+
 -- CTAS on general locus into replicated table
 create temp SEQUENCE test_seq;
 explain (costs off) create table t_rep as select nextval('test_seq') from (select generate_series(1,10)) t1 distributed replicated;
diff --git a/src/test/regress/sql/gp_locale.sql b/src/test/regress/sql/gp_locale.sql
new file mode 100644
index 000000000000..444352c9eddf
--- /dev/null
+++ b/src/test/regress/sql/gp_locale.sql
@@ -0,0 +1,61 @@
+-- ORCA uses functions (e.g. vswprintf) to translation to wide character
+-- format. But those libraries may fail if the current locale cannot handle the
+-- character set. This test checks that even when those libraries fail, ORCA is
+-- still able to generate plans.
+
+--
+-- Create a database that sets the minimum locale
+--
+DROP DATABASE IF EXISTS test_locale;
+CREATE DATABASE test_locale WITH LC_COLLATE='C' LC_CTYPE='C' TEMPLATE=template0;
+\c test_locale
+
+--
+-- drop/add/remove columns
+--
+CREATE TABLE hi_안녕세계 (a int, 안녕세계1 text, 안녕세계2 text, 안녕세계3 text) DISTRIBUTED BY (a);
+ALTER TABLE hi_안녕세계 DROP COLUMN 안녕세계2;
+ALTER TABLE hi_안녕세계 ADD COLUMN 안녕세계2_ADD_COLUMN text;
+ALTER TABLE hi_안녕세계 RENAME COLUMN 안녕세계3 TO こんにちわ3;
+
+INSERT INTO hi_안녕세계 VALUES(1, '안녕세계1 first', '안녕세2 first', '안녕세계3 first');
+INSERT INTO hi_안녕세계 VALUES(42, '안녕세계1 second', '안녕세2 second', '안녕세계3 second');
+
+--
+-- Try various queries containing multibyte character set and check the column
+-- name output
+--
+SET optimizer_trace_fallback=on;
+
+-- DELETE
+DELETE FROM hi_안녕세계 WHERE a=42;
+
+-- UPDATE
+UPDATE hi_안녕세계 SET 안녕세계1='안녕세계1 first UPDATE' WHERE 안녕세계1='안녕세계1 first';
+
+-- SELECT
+SELECT * FROM hi_안녕세계;
+
+SELECT 안녕세계1 || こんにちわ3 FROM hi_안녕세계;
+
+-- SELECT ALIAS
+SELECT 안녕세계1 AS 안녕세계1_Alias FROM hi_안녕세계;
+
+-- SUBQUERY
+SELECT * FROM (SELECT 안녕세계1 FROM hi_안녕세계) t;
+
+SELECT (SELECT こんにちわ3 FROM hi_안녕세계) FROM (SELECT 1) AS q;
+
+SELECT (SELECT (SELECT こんにちわ3 FROM hi_안녕세계) FROM hi_안녕세계) FROM (SELECT 1) AS q;
+
+-- CTE
+WITH cte AS
+(SELECT 안녕세계1, こんにちわ3 FROM hi_안녕세계) SELECT * FROM cte WHERE 안녕세계1 LIKE '안녕세계1%';
+
+WITH cte(안녕세계x, こんにちわx) AS
+(SELECT 안녕세계1, こんにちわ3 FROM hi_안녕세계) SELECT * FROM cte WHERE 안녕세계x LIKE '안녕세계1%';
+
+-- JOIN
+SELECT * FROM hi_안녕세계 hi_안녕세계1, hi_안녕세계 hi_안녕세계2 WHERE hi_안녕세계1.안녕세계1 LIKE '%UPDATE';
+
+RESET optimizer_trace_fallback;
diff --git a/src/test/regress/sql/matview.sql b/src/test/regress/sql/matview.sql
index 170e28d19ec8..e8dd415553a5 100644
--- a/src/test/regress/sql/matview.sql
+++ b/src/test/regress/sql/matview.sql
@@ -253,3 +253,15 @@ refresh materialized view mat_view_github_issue_11956;
 
 drop materialized view mat_view_github_issue_11956;
 drop table t_github_issue_11956;
+
+-- test REFRESH MATERIALIZED VIEW on AO table with index
+-- more details could be found at https://github.com/greenplum-db/gpdb/issues/16447
+CREATE TABLE base_table (idn character varying(10) NOT NULL);
+INSERT INTO base_table select i from generate_series(1, 5000) i;
+CREATE MATERIALIZED VIEW base_view WITH (APPENDONLY=true) AS SELECT tt1.idn AS idn_ban FROM base_table tt1;
+CREATE INDEX test_id1 on base_view using btree(idn_ban);
+REFRESH MATERIALIZED VIEW base_view ;
+SELECT * FROM base_view where idn_ban = '10';
+
+DROP MATERIALIZED VIEW base_view;
+DROP TABLE base_table;
diff --git a/src/test/regress/sql/qp_dropped_cols.sql b/src/test/regress/sql/qp_dropped_cols.sql
index 6823131c0993..5a93bd5aec72 100644
--- a/src/test/regress/sql/qp_dropped_cols.sql
+++ b/src/test/regress/sql/qp_dropped_cols.sql
@@ -8687,3 +8687,140 @@ DELETE FROM dist_key_dropped_pt WHERE b=6;
 -- the tables, or the pg_upgrade test fails.
 set client_min_messages='warning';
 drop schema qp_dropped_cols cascade;
+
+-- Test modifying DML on leaf partition when parent has dropped columns and
+-- the partition has not. Ensure that DML commands pass without execution
+-- errors and produce valid results.
+RESET search_path;
+-- start_ignore
+DROP TABLE IF EXISTS t_part_dropped;
+-- end_ignore
+CREATE TABLE t_part_dropped (c1 int, c2 int, c3 int, c4 int) DISTRIBUTED BY (c1)
+PARTITION BY LIST (c3) (PARTITION p0 VALUES (0));
+
+ALTER TABLE t_part_dropped DROP c2;
+ALTER TABLE t_part_dropped ADD PARTITION p2 VALUES (2);
+
+-- Partition selection should go smoothly when inserting into leaf
+-- partition with different attribute structure.
+EXPLAIN (COSTS OFF, VERBOSE) INSERT INTO t_part_dropped VALUES (1, 2, 4); +INSERT INTO t_part_dropped VALUES (1, 2, 4); + +EXPLAIN (COSTS OFF, VERBOSE) INSERT INTO t_part_dropped_1_prt_p2 VALUES (1, 2, 4); +INSERT INTO t_part_dropped_1_prt_p2 VALUES (1, 2, 4); + +INSERT INTO t_part_dropped_1_prt_p2 VALUES (1, 2, 0); + +-- Ensure that split update on leaf and root partitions does not +-- throw a partition selection error in either planner. +EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part_dropped_1_prt_p2 SET c1 = 2; +UPDATE t_part_dropped_1_prt_p2 SET c1 = 2; + +EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part_dropped SET c1 = 3; +UPDATE t_part_dropped SET c1 = 3; + +-- Ensure that split update on leaf partition does not throw a constraint error +-- (the executor does not choose the wrong partition at the insert stage of the update). +INSERT INTO t_part_dropped VALUES (1, 2, 0); +UPDATE t_part_dropped_1_prt_p2 SET c1 = 2 WHERE c4 = 0; + +SELECT count(*) FROM t_part_dropped_1_prt_p2; + +-- Split update on the root relation should choose the correct partition +-- at insert (in the legacy planner case, the executor doesn't put the tuple +-- into the wrong partition). +EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part_dropped SET c1 = 3 WHERE c4 = 0; +UPDATE t_part_dropped SET c1 = 3 WHERE c4 = 0; + +SELECT count(*) FROM t_part_dropped_1_prt_p2; +SELECT * FROM t_part_dropped_1_prt_p0; + +-- For ORCA the partition selection error should not occur. +EXPLAIN (COSTS OFF, VERBOSE) DELETE FROM t_part_dropped_1_prt_p2; +DELETE FROM t_part_dropped_1_prt_p2; + +DROP TABLE t_part_dropped; + +-- Test modifying DML on leaf partition after it was exchanged with a relation +-- that contained dropped columns. Ensure that DML commands pass without +-- execution errors and produce valid results.
+-- start_ignore +DROP TABLE IF EXISTS t_part; +DROP TABLE IF EXISTS t_new_part; +-- end_ignore +CREATE TABLE t_part (c1 int, c2 int, c3 int, c4 int) DISTRIBUTED BY (c1) +PARTITION BY LIST (c3) (PARTITION p0 VALUES (0)); + +ALTER TABLE t_part ADD PARTITION p2 VALUES (2); +CREATE TABLE t_new_part (c1 int, c11 int, c2 int, c3 int, c4 int); +ALTER TABLE t_new_part DROP c11; +ALTER TABLE t_part EXCHANGE PARTITION FOR (2) WITH TABLE t_new_part; + +EXPLAIN (COSTS OFF, VERBOSE) INSERT INTO t_part VALUES (1, 5, 2, 5); +INSERT INTO t_part VALUES (1, 5, 2, 5); + +EXPLAIN (COSTS OFF, VERBOSE) INSERT INTO t_part_1_prt_p2 VALUES (1, 5, 2, 5); +INSERT INTO t_part_1_prt_p2 VALUES (1, 5, 2, 5); + +-- Ensure that split update on leaf and root partitions does not +-- throw a partition selection error in either planner. +EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part_1_prt_p2 SET c1 = 2; +UPDATE t_part_1_prt_p2 SET c1 = 2; + +EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part SET c1 = 3; +UPDATE t_part SET c1 = 3; + +-- Ensure that split update on leaf partition does not throw a constraint error +-- (the executor does not choose the wrong partition at the insert stage of the update). +INSERT INTO t_part VALUES (1, 0, 2, 0); +UPDATE t_part_1_prt_p2 SET c1 = 2 WHERE c4 = 0; + +SELECT count(*) FROM t_part_1_prt_p2; + +-- For ORCA the partition selection error should not occur. +EXPLAIN (COSTS OFF, VERBOSE) DELETE FROM t_part_1_prt_p2; +DELETE FROM t_part_1_prt_p2; + +DROP TABLE t_part; +DROP TABLE t_new_part; + +-- Test split update execution of a plan from the legacy planner in the case +-- where the parent relation has several partitions, and one of them has a +-- physically different attribute structure from the parent's due to +-- dropped columns. Ensure that split update does not reconstruct the tuple +-- of the correct (without dropped attributes) partition.
+CREATE TABLE t_part (c1 int, c2 int, c3 int, c4 int) DISTRIBUTED BY (c1) +PARTITION BY LIST (c3) (PARTITION p0 VALUES (0)); + +-- The legacy planner's UPDATE plan consists of several subplans (partitioned +-- relations are handled by the inheritance planner), and their execution +-- order depends on the order in which the partitions were added. +-- Therefore, we add each partition through EXCHANGE so that the UPDATE's +-- test plan executes the t_new_part0 update first and the +-- t_new_part2 update second. This ordering is crucial because the executor's +-- partition-related logic depended on it, which led to the +-- issue this test demonstrates. +-- This partition is not compatible with the parent due to dropped columns. +CREATE TABLE t_new_part0 (c1 int, c11 int, c2 int, c3 int, c4 int); +ALTER TABLE t_new_part0 drop c11; +ALTER TABLE t_part EXCHANGE PARTITION FOR (0) WITH TABLE t_new_part0; + +-- This partition is compatible with the parent. +ALTER TABLE t_part ADD PARTITION p2 VALUES (2); +CREATE TABLE t_new_part2 (c1 int, c2 int, c3 int, c4 int); +ALTER TABLE t_part EXCHANGE PARTITION FOR (2) WITH TABLE t_new_part2; + +-- Insert into the correct partition, and perform a split update on the root, +-- which will execute a split update on each subplan in the case of an inheritance +-- plan (legacy planner). Ensure that split update does not reconstruct the +-- tuple at insert.
+INSERT INTO t_part VALUES (1, 4, 2, 2); + +EXPLAIN (COSTS OFF, VERBOSE) UPDATE t_part SET c1 = 3; +UPDATE t_part SET c1 = 3; + +SELECT * FROM t_part_1_prt_p2; + +DROP TABLE t_part; +DROP TABLE t_new_part0; +DROP TABLE t_new_part2; diff --git a/src/test/regress/sql/rpt.sql b/src/test/regress/sql/rpt.sql index 1e1b479d286c..ad91b109be74 100644 --- a/src/test/regress/sql/rpt.sql +++ b/src/test/regress/sql/rpt.sql @@ -542,9 +542,27 @@ select c from rep_tab where c in (select distinct a from dist_tab); explain select c from rep_tab where c in (select distinct d from rand_tab); select c from rep_tab where c in (select distinct d from rand_tab); +-- test for optimizer_enable_replicated_table +explain (costs off) select * from rep_tab; +set optimizer_enable_replicated_table=off; +set optimizer_trace_fallback=on; +explain (costs off) select * from rep_tab; +reset optimizer_trace_fallback; +reset optimizer_enable_replicated_table; + +-- Ensure plan with Gather Motion node is generated. +drop table if exists t; +create table t (i int, j int) distributed replicated; +insert into t values (1, 2); +explain (costs off) select j, (select j) AS "Correlated Field" from t; +select j, (select j) AS "Correlated Field" from t; +explain (costs off) select j, (select 5) AS "Uncorrelated Field" from t; +select j, (select 5) AS "Uncorrelated Field" from t; + -- -- Check sub-selects with distributed replicated tables and volatile functions -- +drop table if exists t; create table t (i int) distributed replicated; create table t1 (a int) distributed by (a); create table t2 (a int, b float) distributed replicated; @@ -562,6 +580,54 @@ explain (costs off, verbose) insert into t2 (a, b) select i, random() from t; explain (costs off, verbose) select * from t1 where a in (select f(i) from t where i=a and f(i) > 0); -- ensure we do not break broadcast motion explain (costs off, verbose) select * from t1 where 1 <= ALL (select i from t group by i having random() > 0); + +set gp_cte_sharing = on; + 
+-- ensure that the volatile function is executed on one segment if it is in the CTE target list +explain (costs off, verbose) with cte as ( + select a * random() as a from t2 +) +select * from cte join (select * from t1 join cte using(a)) b using(a); + +set gp_cte_sharing = off; + +explain (costs off, verbose) with cte as ( + select a, a * random() from t2 +) +select * from cte join t1 using(a); + +reset gp_cte_sharing; + +-- ensure that the volatile function is executed on one segment if it is in the target list of a subplan of a multiset function +explain (costs off, verbose) select * from ( + SELECT count(*) as a FROM anytable_out( TABLE( SELECT random()::int from t2 ) ) +) a join t1 using(a); + +-- if there is a volatile function in the target list of a plan with the locus type +-- General or Segment General, then such a plan should be executed on a single +-- segment, since it is assumed that nodes with such locus types will give the same +-- result on all segments, which is impossible for a volatile function.
+-- start_ignore +drop table if exists d; +drop table if exists r; +-- end_ignore +create table r (a int, b int) distributed replicated; +create table d (b int, a int default 1) distributed by (b); + +insert into d select * from generate_series(0, 20) j; +-- change distribution without reorganize +alter table d set distributed randomly; + +insert into r values (1, 1), (2, 2), (3, 3); + +with cte as ( + select a, b * random() as rand from r +) +select count(distinct(rand)) from cte join d on cte.a = d.a; + +drop table r; +drop table d; + drop table if exists t; drop table if exists t1; drop table if exists t2; diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql index 19265c9e78e0..4b6ca3c5d90c 100644 --- a/src/test/regress/sql/strings.sql +++ b/src/test/regress/sql/strings.sql @@ -644,6 +644,13 @@ SELECT encode(overlay(E'Th\\000omas'::bytea placing E'Th\\001omas'::bytea from 2 SELECT encode(overlay(E'Th\\000omas'::bytea placing E'\\002\\003'::bytea from 8),'escape'); SELECT encode(overlay(E'Th\\000omas'::bytea placing E'\\002\\003'::bytea from 5 for 3),'escape'); +-- copy unknown-type column from targetlist rather than reference to subquery outputs +CREATE DOMAIN public.date_timestamp AS timestamp without time zone; +create table dt1(a int, b int, c public.date_timestamp, d public.date_timestamp); +insert into dt1 values(1, 1, now(), now()); +insert into dt1 select a, b, 'Thu Sep 14 03:19:54 EDT 2023' as c, 'Thu Sep 14 03:19:54 EDT 2023' as d from dt1; +DROP TABLE dt1; +DROP DOMAIN public.date_timestamp; -- Clean up GPDB-added tables DROP TABLE char_strings_tbl; diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index 45da8e2f6fd6..e2ce210d0599 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -520,7 +520,11 @@ commit; -- Ensure that both planners produce valid plans for the query with the nested -- SubLink, which contains attributes referenced in query's GROUP 
BY clause. --- The inner part of SubPlan should contain only t.j. +-- Due to presence of non-grouping columns in targetList, ORCA performs query +-- normalization, during which ORCA establishes a correspondence between vars +-- from targetlist entries to grouping attributes. And this process should +-- correctly handle nested structures. The inner part of SubPlan in the test +-- should contain only t.j. -- start_ignore drop table if exists t; -- end_ignore @@ -540,8 +544,12 @@ group by i, j; -- Ensure that both planners produce valid plans for the query with the nested -- SubLink when this SubLink is inside the GROUP BY clause. Attribute, which is --- not grouping column, is added to query targetList to make ORCA perform query --- normalization. For ORCA the fallback shouldn't occur. +-- not grouping column (1 as c), is added to query targetList to make ORCA +-- perform query normalization. During normalization ORCA modifies the vars of +-- the grouping elements of targetList in order to produce a new Query tree. +-- The modification of vars inside nested part of SubLinks should be handled +-- correctly. ORCA shouldn't fall back due to missing variable entry as a result +-- of incorrect query normalization. explain (verbose, costs off) select j, 1 as c, (select j from (select j) q2) q1 @@ -554,8 +562,9 @@ from t group by j, q1; -- Ensure that both planners produce valid plans for the query with the nested --- SubLink, and this SubLink is under the aggregation. For ORCA the fallback --- shouldn't occur. +-- SubLink, and this SubLink is under aggregation. ORCA shouldn't fall back due +-- to missing variable entry as a result of incorrect query normalization. ORCA +-- should correctly process args of the aggregation during normalization. 
explain (verbose, costs off) select (select max((select t.i))) from t; diff --git a/src/test/regress/sql/subselect_gp.sql b/src/test/regress/sql/subselect_gp.sql index 1e50eb45904b..d41f299a2592 100644 --- a/src/test/regress/sql/subselect_gp.sql +++ b/src/test/regress/sql/subselect_gp.sql @@ -1216,6 +1216,67 @@ select * from r where b in (select b from s where c=10 order by c); explain (costs off) select * from r where b in (select b from s where c=10 order by c limit 2); select * from r where b in (select b from s where c=10 order by c limit 2); +-- Test nested query with aggregate inside a sublink. +-- ORCA should correctly normalize the aggregate expression inside the +-- sublink's nested query, and the column variable accessed in the aggregate should +-- still be accessible to the aggregate after query normalization. +-- If the query is not supported, ORCA should gracefully fall back to the Postgres planner. +explain (COSTS OFF) with t0 AS ( + SELECT + ROW_TO_JSON((SELECT x FROM (SELECT max(t.b)) x)) + AS c + FROM r + JOIN s ON true + JOIN s as t ON true + ) +SELECT c FROM t0; + +-- +-- Test case for ORCA semi join with a randomly distributed table +-- See https://github.com/greenplum-db/gpdb/issues/16611 +-- +--- case for random distribution +create table table_left (l1 int, l2 int) distributed by (l1); +create table table_right (r1 int, r2 int) distributed randomly; +create index table_right_idx on table_right(r1); +insert into table_left values (1,1); +insert into table_right select i, i from generate_series(1, 300) i; +insert into table_right select 1, 1 from generate_series(1, 100) i; + +--- make sure rows with the same value (1,1) are inserted into different segments +select count(distinct gp_segment_id) > 1 from table_right where r1 = 1; +analyze table_left; +analyze table_right; + +-- two types of semi join tests +explain (costs off) select * from table_left where exists (select 1 from table_right where l1 = r1); +select * from table_left where exists (select 1 from table_right where l1 = r1);
+explain (costs off) select * from table_left where l1 in (select r1 from table_right); +select * from table_left where exists (select 1 from table_right where l1 = r1); + +--- case for replicate distribute +alter table table_right set distributed replicated; +explain (costs off) select * from table_left where exists (select 1 from table_right where l1 = r1); +select * from table_left where exists (select 1 from table_right where l1 = r1); +explain (costs off) select * from table_left where l1 in (select r1 from table_right); +select * from table_left where exists (select 1 from table_right where l1 = r1); + +--- case for partition table with random distribute +drop table table_right; +create table table_right (r1 int, r2 int) distributed randomly partition by range (r1) ( start (0) end (300) every (100)); +create index table_right_idx on table_right(r1); +insert into table_right select i, i from generate_series(1, 299) i; +insert into table_right select 1, 1 from generate_series(1, 100) i; +analyze table_right; +explain (costs off) select * from table_left where exists (select 1 from table_right where l1 = r1); +select * from table_left where exists (select 1 from table_right where l1 = r1); +explain (costs off) select * from table_left where l1 in (select r1 from table_right); +select * from table_left where exists (select 1 from table_right where l1 = r1); + +-- clean up +drop table table_left; +drop table table_right; + -- Test that Explicit Redistribute Motion is applied properly for -- queries that have modifying operation inside a SubPlan. 
That -- requires the ModifyTable's top Flow node to be copied correctly inside diff --git a/src/test/regress/sql/truncate_gp.sql b/src/test/regress/sql/truncate_gp.sql index f0b8b3e2c6b5..e79417cddc99 100644 --- a/src/test/regress/sql/truncate_gp.sql +++ b/src/test/regress/sql/truncate_gp.sql @@ -86,3 +86,4 @@ end; -- the heap table segment file size after truncate should be zero select stat_table_segfile_size('regression', 'truncate_with_create_heap'); + diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql index 72b89a0f0c97..8d0b38522dbf 100644 --- a/src/test/regress/sql/with.sql +++ b/src/test/regress/sql/with.sql @@ -1083,3 +1083,42 @@ WITH cte AS ( RESET optimizer; DROP TABLE d; + +-- Test if sharing is disabled for a SegmentGeneral CTE to avoid deadlock if CTE is +-- executed with 1-gang and joined with n-gang +SET optimizer = off; +--start_ignore +DROP TABLE IF EXISTS d; +DROP TABLE IF EXISTS r; +--end_ignore + +CREATE TABLE d (a int, b int) DISTRIBUTED BY (a); +INSERT INTO d VALUES ( 1, 2 ),( 2, 3 ); +CREATE TABLE r (a int, b int) DISTRIBUTED REPLICATED; +INSERT INTO r VALUES ( 1, 2 ),( 3, 4 ); + +EXPLAIN (COSTS off) +WITH cte AS ( + SELECT count(*) a FROM r +) SELECT * FROM cte JOIN (SELECT * FROM d JOIN cte USING (a) LIMIT 1) d_join_cte USING (a); + +WITH cte AS ( + SELECT count(*) a FROM r +) SELECT * FROM cte JOIN (SELECT * FROM d JOIN cte USING (a) LIMIT 1) d_join_cte USING (a); + +-- Test if sharing is disabled for a General CTE to avoid deadlock if CTE is +-- executed with coordinator gang and joined with n-gang +EXPLAIN (COSTS OFF) +WITH cte AS ( + SELECT count(*) a FROM (VALUES ( 1, 2 ),( 3, 4 )) v +) +SELECT * FROM cte JOIN (SELECT * FROM d JOIN cte USING (a) LIMIT 1) d_join_cte USING (a); + +WITH cte AS ( + SELECT count(*) a FROM (VALUES ( 1, 2 ),( 3, 4 )) v +) +SELECT * FROM cte JOIN (SELECT * FROM d JOIN cte USING (a) LIMIT 1) d_join_cte USING (a); + +RESET optimizer; +DROP TABLE d; +DROP TABLE r; diff --git 
a/src/test/regress/sql/zlib.sql b/src/test/regress/sql/zlib.sql index 97720cf7ed6d..431cd6311244 100644 --- a/src/test/regress/sql/zlib.sql +++ b/src/test/regress/sql/zlib.sql @@ -23,6 +23,38 @@ INSERT INTO test_zlib_hashjoin SELECT i,i,i,i,i,i,i,i FROM (select generate_series(1, nsegments * 333333) as i from (select count(*) as nsegments from gp_segment_configuration where role='p' and content >= 0) foo) bar; +-- start_ignore +create language plpythonu; +-- end_ignore + +-- Check whether the number of compressed work files (comp_workfile_created) is limited +-- relative to the total number of work files (workfile_created). +-- If the parameter is_comp_buff_limit is true, comp_workfile_created +-- must be smaller than workfile_created because some work files are not compressed; +-- If the parameter is_comp_buff_limit is false, comp_workfile_created +-- must be equal to workfile_created because all work files are compressed. +create or replace function check_workfile_compressed(explain_query text, + is_comp_buff_limit bool) +returns setof int as +$$ +import re +rv = plpy.execute(explain_query) +search_text = 'Work file set' +result = [] +for i in range(len(rv)): + cur_line = rv[i]['QUERY PLAN'] + if search_text.lower() in cur_line.lower(): + p = re.compile('(\d+) files \((\d+) compressed\)') + m = p.search(cur_line) + workfile_created = int(m.group(1)) + comp_workfile_created = int(m.group(2)) + if is_comp_buff_limit: + result.append(int(comp_workfile_created < workfile_created)) + else: + result.append(int(comp_workfile_created == workfile_created)) +return result +$$ +language plpythonu; + SET statement_mem=5000; --Fail after workfile creation and before add it to workfile set @@ -86,3 +118,86 @@ drop table test_zlib; drop table test_zlib_t1; select gp_inject_fault('workfile_creation_failure', 'reset', 2); + +-- Test gp_workfile_compression_overhead_limit to control the memory limit used by +-- compressed temp files + +DROP TABLE IF EXISTS test_zlib_memlimit; +create table test_zlib_memlimit(a int, b text, c timestamp)
distributed by (a); +insert into test_zlib_memlimit select id, 'test ' || id, clock_timestamp() from + (select generate_series(1, nsegments * 30000) as id from + (select count(*) as nsegments from gp_segment_configuration where role='p' and content >= 0) foo) bar; +insert into test_zlib_memlimit select 1,'test', now() from + (select generate_series(1, nsegments * 2000) as id from + (select count(*) as nsegments from gp_segment_configuration where role='p' and content >= 0) foo) bar; +insert into test_zlib_memlimit select id, 'test ' || id, clock_timestamp() from + (select generate_series(1, nsegments * 3000) as id from + (select count(*) as nsegments from gp_segment_configuration where role='p' and content >= 0) foo) bar; +analyze test_zlib_memlimit; + +set statement_mem='4500kB'; +set gp_workfile_compression=on; +set gp_workfile_limit_files_per_query=0; + +-- Run the query with a large value of gp_workfile_compression_overhead_limit +-- The compressed file number should be equal to total work file number + +set gp_workfile_compression_overhead_limit=2048000; + +select * from check_workfile_compressed(' +explain (analyze) +with B as (select distinct a+1 as a,b,c from test_zlib_memlimit) +,C as (select distinct a+2 as a,b,c from test_zlib_memlimit) +,D as (select a+3 as a,b,c from test_zlib_memlimit) +,E as (select a+4 as a,b,c from test_zlib_memlimit) +,F as (select (a+5)::text as a,b,c from test_zlib_memlimit) +select count(*) from test_zlib_memlimit A +inner join B on A.a = B.a +inner join C on A.a = C.a +inner join D on A.a = D.a +inner join E on A.a = E.a +inner join F on A.a::text = F.a ;', +false) limit 6; + +-- Run the query with a smaller value of gp_workfile_compression_overhead_limit +-- The compressed file number should be less than total work file number + +set gp_workfile_compression_overhead_limit=1000; + +select * from check_workfile_compressed(' +explain (analyze) +with B as (select distinct a+1 as a,b,c from test_zlib_memlimit) +,C as (select 
distinct a+2 as a,b,c from test_zlib_memlimit) +,D as (select a+3 as a,b,c from test_zlib_memlimit) +,E as (select a+4 as a,b,c from test_zlib_memlimit) +,F as (select (a+5)::text as a,b,c from test_zlib_memlimit) +select count(*) from test_zlib_memlimit A +inner join B on A.a = B.a +inner join C on A.a = C.a +inner join D on A.a = D.a +inner join E on A.a = E.a +inner join F on A.a::text = F.a ;', +true) limit 6; + +-- Run the query with gp_workfile_compression_overhead_limit=0, which means +-- no limit +-- The compressed file number should be equal to total work file number + +set gp_workfile_compression_overhead_limit=0; + +select * from check_workfile_compressed(' +explain (analyze) +with B as (select distinct a+1 as a,b,c from test_zlib_memlimit) +,C as (select distinct a+2 as a,b,c from test_zlib_memlimit) +,D as (select a+3 as a,b,c from test_zlib_memlimit) +,E as (select a+4 as a,b,c from test_zlib_memlimit) +,F as (select (a+5)::text as a,b,c from test_zlib_memlimit) +select count(*) from test_zlib_memlimit A +inner join B on A.a = B.a +inner join C on A.a = C.a +inner join D on A.a = D.a +inner join E on A.a = E.a +inner join F on A.a::text = F.a ;', +false) limit 6; + +DROP TABLE test_zlib_memlimit;