Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix queryId calculation for queries with grouping clause #665

Merged
merged 17 commits into from
Jan 25, 2024

Conversation

whitehawk
Copy link

@whitehawk whitehawk commented Dec 21, 2023

Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tags T_GroupingClause,
T_GroupingFunc, T_GroupId and T_TableValueExpr. Jumble hashing is used
as queryId. Error message "unrecognized RTE kind: 7" caused by not handled
range table type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingFunc we append to the jumble the list of
arguments. Field 'ngrpcols' is not appended to the jumble, because it can be
deduced from the list of groupsets in grouping clause. 'ngrpcols' is the
number of unique grouping attributes in grouping clause. Equivalent queries
must have the same groupsets in grouping clause. Thus they will have the same
'ngrpcols'. So, adding 'ngrpcols' to jumble is redundant.
Plus handling for the T_Integer tag was added, because it is required to parse
the list of arguments. List of arguments (field 'args' of structure
GroupingFunc) is a list of T_Integer. The T_Integer element is the index of
GROUPING function parameter inside the array of unique grouping attributes
from grouping clause. We need to jumble value of this T_Integer, because
changing parameter in GROUPING function changes this index (and definitely
changes semantic of the query).

For T_GroupingClause we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For T_GroupId - this tag was added to switch case inside JumbleExpr to
suppress warning. The node tag for it was already handled. Struct GroupId
doesn't have any additional fields, that could be added to jumble.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.

Problem description:
Following warning messages appear
    "WARNING: unrecognized node type: 925"
    "WARNING: unrecognized node type: 924"
if 'pg_stat_statements' extension is enabled and query,
containing GROUP BY clause or GROUPING function, is executed.

Cause:
In JumbleExpr, called from JumbleQuery, there is no handling
for node tags T_GroupingClause and T_GroupingFunc.

Fix:
Added handling logic for missed node tags.
    For T_GroupingFunc we append to the jumble the tag itself
and list of arguments. Field 'ngrpcols' is not appended to the
jumble, because it can be deduced from the list of arguments
in grouping clause. Plus added handling for T_Integer tag,
because it is required to parse list of arguments.
    For T_GroupingClause we append to the jumble the tag itself,
grouping type and list of groupsets. Field 'location' is not
appended to the jumble, because it is the textual location from
parser and is not semantically significant.
@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/60788

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/909714

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/909715

If enabled, pg_stat_statements may cause warnings in case jumbling logic
doesn't handle some part of a query. This simple test will check that
no warnings are generated on different queries. Currently added 1 query
with GROUP clause and GROUPING function, as it recently caused such issues.
This test later may be improved by adding more test queries.
@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/60873

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/915047

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/915048

@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/60887

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/915839

@BenderArenadata
Copy link

Failed job Behave tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/917776

@BenderArenadata
Copy link

Failed job Behave tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/920047

Modified the pg_stat_statements test to enable the pg_stat_statements
extension automatically at the beginning of the test, and disable the
extension once the test is complete.
Also added minor refactoring of recently added code in JumbleExpr - moved
all new code to the end of the switch case in order to simplify the patch.
@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/61084

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/928426

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/928427

@BenderArenadata
Copy link

Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/928417

@BenderArenadata
Copy link

Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/928525

@KnightMurloc
Copy link

Can you provide a test not only to check for the absence of warnings, but also that the query text in pg_stat_statements differs with and without a patch (if possible). To check the correctness of the logic implementation.

src/backend/utils/misc/queryjumble.c Outdated Show resolved Hide resolved
src/backend/utils/misc/queryjumble.c Outdated Show resolved Hide resolved
@KnightMurloc
Copy link

it seems to me that the problem is more that queryId is calculated incorrectly (for example, the group by type is not taken into account). it is worth mentioning this as a problem in the description and testing it.

With this update the test for pg_stat_statements verifies that
queryId for queries with semantically different GROUP BY options are
different (thus these queries reside in seperate entries of pg_stat_statements
statistics).
@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/61416

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/949680

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/949681

@whitehawk whitehawk changed the title Fix warnings in Jumbling logic Fix queryId calculation for queries with grouping clause Jan 10, 2024
@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/62408

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/992650

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/992651

@BenderArenadata
Copy link

Failed job Behave tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/992648

@whitehawk whitehawk dismissed stale reviews from KnightMurloc and RekGRpth via c3bd1fb January 24, 2024 23:25
@BenderArenadata
Copy link

Allure report https://allure-ee.adsw.io/launch/62478

@BenderArenadata
Copy link

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/996082

@BenderArenadata
Copy link

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/996083

@BenderArenadata
Copy link

Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/996076

@BenderArenadata
Copy link

Failed job Behave tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/996080

@andr-sokolov andr-sokolov merged commit fa44e50 into adb-6.x-dev Jan 25, 2024
5 checks passed
@andr-sokolov andr-sokolov deleted the ADBDEV-4486 branch January 25, 2024 09:33
@Stolb27 Stolb27 mentioned this pull request Mar 13, 2024
silent-observer pushed a commit that referenced this pull request Nov 13, 2024
Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tags T_GroupingClause,
T_GroupingFunc, T_GroupId and T_TableValueExpr. Jumble hashing is used
as queryId. Error message "unrecognized RTE kind: 7" caused by not handled
range table type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingFunc we append to the jumble the list of
arguments. Field 'ngrpcols' is not appended to the jumble, because it can be
deduced from the list of groupsets in grouping clause. 'ngrpcols' is the
number of unique grouping attributes in grouping clause. Equivalent queries
must have the same groupsets in grouping clause. Thus they will have the same
'ngrpcols'. So, adding 'ngrpcols' to jumble is redundant.
Plus handling for the T_Integer tag was added, because it is required to parse
the list of arguments. List of arguments (field 'args' of structure
GroupingFunc) is a list of T_Integer. The T_Integer element is the index of
GROUPING function parameter inside the array of unique grouping attributes
from grouping clause. We need to jumble value of this T_Integer, because
changing parameter in GROUPING function changes this index (and definitely
changes semantic of the query).

For T_GroupingClause we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For T_GroupId - this tag was added to switch case inside JumbleExpr to
suppress warning. The node tag for it was already handled. Struct GroupId
doesn't have any additional fields, that could be added to jumble.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.

Changes from original commit:
1. Cases for T_GroupId and T_GroupingFunc are already present, no fix needed.
2. Case for T_Integer is not necessary since GPDB 7 uses T_IntList, removed.
3. T_GroupingClause is now named T_GroupingSet. Its case was not correct in
   GPDB 7, changed to match implementation from GPDB 6.
4. Test renamed to gp_pg_stat_statements, since there were already tests
   present in GPDB 7.
5. Currently the tests for pg_stat_statements do not set up
   shared_preload_libraries and they don't disable optimizer. Changed the new
   test to match existing ones, added a comment about optimizer to Makefile.
6. Fixed test output for GPDB 7. Notably, ROLLUP now outputs 1 row even if
   the table has no rows in it.

(cherry picked from commit fa44e50)
silent-observer pushed a commit that referenced this pull request Nov 14, 2024
Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tags T_GroupingClause,
T_GroupingFunc, T_GroupId and T_TableValueExpr. Jumble hashing is used
as queryId. Error message "unrecognized RTE kind: 7" caused by not handled
range table type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingFunc we append to the jumble the list of
arguments. Field 'ngrpcols' is not appended to the jumble, because it can be
deduced from the list of groupsets in grouping clause. 'ngrpcols' is the
number of unique grouping attributes in grouping clause. Equivalent queries
must have the same groupsets in grouping clause. Thus they will have the same
'ngrpcols'. So, adding 'ngrpcols' to jumble is redundant.
Plus handling for the T_Integer tag was added, because it is required to parse
the list of arguments. List of arguments (field 'args' of structure
GroupingFunc) is a list of T_Integer. The T_Integer element is the index of
GROUPING function parameter inside the array of unique grouping attributes
from grouping clause. We need to jumble value of this T_Integer, because
changing parameter in GROUPING function changes this index (and definitely
changes semantic of the query).

For T_GroupingClause we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For T_GroupId - this tag was added to switch case inside JumbleExpr to
suppress warning. The node tag for it was already handled. Struct GroupId
doesn't have any additional fields, that could be added to jumble.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.

Changes from original commit:
1. Cases for T_GroupId and T_GroupingFunc are already present, no fix needed.
2. Case for T_Integer is not necessary since GPDB 7 uses T_IntList, removed.
3. T_GroupingClause is now named T_GroupingSet. Its case was not correct in
   GPDB 7, changed to match implementation from GPDB 6.
4. Test renamed to jumble, since there were already tests present in GPDB 7.
5. The tests in pg_stat_statements were disabled because they do not set up
   shared_preload_libraries and they don't disable optimizer. Added files
   enable_pg_stat_statements and disable_pg_stat_statements to adapt
   pg_stat_statements tests to current testing utilities. This allowed to
   remove some initialization and cleanup from the new test.
6. Fixed test output for GPDB 7. Notably, ROLLUP now outputs 1 row even if
   the table has no rows in it.

(cherry picked from commit fa44e50)
silent-observer pushed a commit that referenced this pull request Nov 15, 2024
Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tags T_GroupingClause,
T_GroupingFunc, T_GroupId and T_TableValueExpr. Jumble hashing is used
as queryId. Error message "unrecognized RTE kind: 7" caused by not handled
range table type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingFunc we append to the jumble the list of
arguments. Field 'ngrpcols' is not appended to the jumble, because it can be
deduced from the list of groupsets in grouping clause. 'ngrpcols' is the
number of unique grouping attributes in grouping clause. Equivalent queries
must have the same groupsets in grouping clause. Thus they will have the same
'ngrpcols'. So, adding 'ngrpcols' to jumble is redundant.
Plus handling for the T_Integer tag was added, because it is required to parse
the list of arguments. List of arguments (field 'args' of structure
GroupingFunc) is a list of T_Integer. The T_Integer element is the index of
GROUPING function parameter inside the array of unique grouping attributes
from grouping clause. We need to jumble value of this T_Integer, because
changing parameter in GROUPING function changes this index (and definitely
changes semantic of the query).

For T_GroupingClause we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For T_GroupId - this tag was added to switch case inside JumbleExpr to
suppress warning. The node tag for it was already handled. Struct GroupId
doesn't have any additional fields, that could be added to jumble.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.

Changes from original commit:
1. Cases for T_GroupId and T_GroupingFunc are already present, no fix needed.
2. Case for T_Integer is not necessary since GPDB 7 uses T_IntList, removed.
3. T_GroupingClause is now named T_GroupingSet. Its case was not correct in
   GPDB 7, changed to match implementation from GPDB 6.
4. Test renamed to jumble, since there were already tests present in GPDB 7.
5. The tests in pg_stat_statements were disabled because they do not set up
   shared_preload_libraries and they don't disable optimizer. Added files
   enable_pg_stat_statements and disable_pg_stat_statements to adapt
   pg_stat_statements tests to current testing utilities. This allowed to
   remove some initialization and cleanup from the new test.
6. Fixed test output for GPDB 7. Notably, ROLLUP now outputs 1 row even if
   the table has no rows in it.

(cherry picked from commit fa44e50)
silent-observer pushed a commit that referenced this pull request Nov 15, 2024
Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tags T_GroupingClause,
T_GroupingFunc, T_GroupId and T_TableValueExpr. Jumble hashing is used
as queryId. Error message "unrecognized RTE kind: 7" caused by not handled
range table type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingFunc we append to the jumble the list of
arguments. Field 'ngrpcols' is not appended to the jumble, because it can be
deduced from the list of groupsets in grouping clause. 'ngrpcols' is the
number of unique grouping attributes in grouping clause. Equivalent queries
must have the same groupsets in grouping clause. Thus they will have the same
'ngrpcols'. So, adding 'ngrpcols' to jumble is redundant.
Plus handling for the T_Integer tag was added, because it is required to parse
the list of arguments. List of arguments (field 'args' of structure
GroupingFunc) is a list of T_Integer. The T_Integer element is the index of
GROUPING function parameter inside the array of unique grouping attributes
from grouping clause. We need to jumble value of this T_Integer, because
changing parameter in GROUPING function changes this index (and definitely
changes semantic of the query).

For T_GroupingClause we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For T_GroupId - this tag was added to switch case inside JumbleExpr to
suppress warning. The node tag for it was already handled. Struct GroupId
doesn't have any additional fields, that could be added to jumble.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.

Changes from original commit:
1. Cases for T_GroupId and T_GroupingFunc are already present, no fix needed.
2. Case for T_Integer is not necessary since GPDB 7 uses T_IntList, removed.
3. T_GroupingClause is now named T_GroupingSet. Its case was not correct in
   GPDB 7, changed to match implementation from GPDB 6.
4. Test renamed to jumble, since there were already tests present in GPDB 7.
5. The tests in pg_stat_statements were disabled because they do not set up
   shared_preload_libraries and they don't disable optimizer. Added files
   enable_pg_stat_statements and disable_pg_stat_statements to adapt
   pg_stat_statements tests to current testing utilities. This allowed to
   remove some initialization and cleanup from the new test.
6. Fixed test output for GPDB 7. Notably, ROLLUP now outputs 1 row even if
   the table has no rows in it.

(cherry picked from commit fa44e50)
silent-observer pushed a commit that referenced this pull request Nov 15, 2024
Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tags T_GroupingClause,
T_GroupingFunc, T_GroupId and T_TableValueExpr. Jumble hashing is used
as queryId. Error message "unrecognized RTE kind: 7" caused by not handled
range table type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingFunc we append to the jumble the list of
arguments. Field 'ngrpcols' is not appended to the jumble, because it can be
deduced from the list of groupsets in grouping clause. 'ngrpcols' is the
number of unique grouping attributes in grouping clause. Equivalent queries
must have the same groupsets in grouping clause. Thus they will have the same
'ngrpcols'. So, adding 'ngrpcols' to jumble is redundant.
Plus handling for the T_Integer tag was added, because it is required to parse
the list of arguments. List of arguments (field 'args' of structure
GroupingFunc) is a list of T_Integer. The T_Integer element is the index of
GROUPING function parameter inside the array of unique grouping attributes
from grouping clause. We need to jumble value of this T_Integer, because
changing parameter in GROUPING function changes this index (and definitely
changes semantic of the query).

For T_GroupingClause we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For T_GroupId - this tag was added to switch case inside JumbleExpr to
suppress warning. The node tag for it was already handled. Struct GroupId
doesn't have any additional fields, that could be added to jumble.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.

Changes from original commit:
1. Cases for T_GroupId and T_GroupingFunc are already present, no fix needed.
2. Case for T_Integer is not necessary since GPDB 7 uses T_IntList, removed.
3. T_GroupingClause is now named T_GroupingSet. Its case was not correct in
   GPDB 7, changed to match implementation from GPDB 6.
4. Test renamed to jumble, since there were already tests present in GPDB 7.
5. The tests in pg_stat_statements were disabled because they do not set up
   shared_preload_libraries and they don't disable optimizer. Added files
   enable_pg_stat_statements and disable_pg_stat_statements to adapt
   pg_stat_statements tests to current testing utilities. This allowed to
   remove some initialization and cleanup from the new test.
6. Fixed test output for GPDB 7. Notably, ROLLUP now outputs 1 row even if
   the table has no rows in it.

(cherry picked from commit fa44e50)
silent-observer pushed a commit that referenced this pull request Nov 18, 2024
Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tags T_GroupingClause,
T_GroupingFunc, T_GroupId and T_TableValueExpr. Jumble hashing is used
as queryId. Error message "unrecognized RTE kind: 7" caused by not handled
range table type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingFunc we append to the jumble the list of
arguments. Field 'ngrpcols' is not appended to the jumble, because it can be
deduced from the list of groupsets in grouping clause. 'ngrpcols' is the
number of unique grouping attributes in grouping clause. Equivalent queries
must have the same groupsets in grouping clause. Thus they will have the same
'ngrpcols'. So, adding 'ngrpcols' to jumble is redundant.
Plus handling for the T_Integer tag was added, because it is required to parse
the list of arguments. List of arguments (field 'args' of structure
GroupingFunc) is a list of T_Integer. The T_Integer element is the index of
GROUPING function parameter inside the array of unique grouping attributes
from grouping clause. We need to jumble value of this T_Integer, because
changing parameter in GROUPING function changes this index (and definitely
changes semantic of the query).

For T_GroupingClause we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For T_GroupId - this tag was added to switch case inside JumbleExpr to
suppress warning. The node tag for it was already handled. Struct GroupId
doesn't have any additional fields, that could be added to jumble.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.

Changes from original commit:
1. Cases for T_GroupId and T_GroupingFunc are already present, no fix needed.
2. Case for T_Integer is not necessary since GPDB 7 uses T_IntList, removed.
3. T_GroupingClause is now named T_GroupingSet. Its case was not correct in
   GPDB 7, changed to match implementation from GPDB 6.
4. Test renamed to jumble, since there were already tests present in GPDB 7.
5. The tests in pg_stat_statements were disabled because they do not set up
   shared_preload_libraries and they don't disable optimizer. Added files
   enable_pg_stat_statements and disable_pg_stat_statements to adapt
   pg_stat_statements tests to current testing utilities. This allowed to
   remove some initialization and cleanup from the new test.
6. Fixed test output for GPDB 7. Notably, ROLLUP now outputs 1 row even if
   the table has no rows in it.

(cherry picked from commit fa44e50)
silent-observer pushed a commit that referenced this pull request Nov 19, 2024
Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tags T_GroupingClause,
T_GroupingFunc, T_GroupId and T_TableValueExpr. Jumble hashing is used
as queryId. Error message "unrecognized RTE kind: 7" caused by not handled
range table type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingFunc we append to the jumble the list of
arguments. Field 'ngrpcols' is not appended to the jumble, because it can be
deduced from the list of groupsets in grouping clause. 'ngrpcols' is the
number of unique grouping attributes in grouping clause. Equivalent queries
must have the same groupsets in grouping clause. Thus they will have the same
'ngrpcols'. So, adding 'ngrpcols' to jumble is redundant.
Plus handling for the T_Integer tag was added, because it is required to parse
the list of arguments. List of arguments (field 'args' of structure
GroupingFunc) is a list of T_Integer. The T_Integer element is the index of
GROUPING function parameter inside the array of unique grouping attributes
from grouping clause. We need to jumble value of this T_Integer, because
changing parameter in GROUPING function changes this index (and definitely
changes semantic of the query).

For T_GroupingClause we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For T_GroupId - this tag was added to switch case inside JumbleExpr to
suppress warning. The node tag for it was already handled. Struct GroupId
doesn't have any additional fields, that could be added to jumble.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.

Changes from original commit:
1. Cases for T_GroupId and T_GroupingFunc are already present, no fix needed.
2. Remove case for T_Integer because it is not necessary since GPDB 7 uses
   T_IntList instead.
3. T_GroupingClause is now named T_GroupingSet. Its case was not correct in
   GPDB 7, change to match implementation from GPDB 6.
4. Rename test to jumble since there were already tests present in GPDB 7.
5. The tests in pg_stat_statements were disabled before because they do not
   set up shared_preload_libraries and they don't disable optimizer. Add files
   enable_pg_stat_statements and disable_pg_stat_statements to adapt
   pg_stat_statements tests to current testing utilities. This allows to
   remove some initialization and cleanup from the new test.
6. Fix test output for GPDB 7. Notably, ROLLUP now outputs 1 row even if
   the table has no rows in it.

(cherry picked from commit fa44e50)
silent-observer pushed a commit that referenced this pull request Nov 20, 2024
Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tags T_GroupingClause,
T_GroupingFunc, T_GroupId and T_TableValueExpr. Jumble hashing is used
as queryId. Error message "unrecognized RTE kind: 7" caused by not handled
range table type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingFunc we append to the jumble the list of
arguments. Field 'ngrpcols' is not appended to the jumble, because it can be
deduced from the list of groupsets in grouping clause. 'ngrpcols' is the
number of unique grouping attributes in grouping clause. Equivalent queries
must have the same groupsets in grouping clause. Thus they will have the same
'ngrpcols'. So, adding 'ngrpcols' to jumble is redundant.
Plus handling for the T_Integer tag was added, because it is required to parse
the list of arguments. List of arguments (field 'args' of structure
GroupingFunc) is a list of T_Integer. The T_Integer element is the index of
GROUPING function parameter inside the array of unique grouping attributes
from grouping clause. We need to jumble value of this T_Integer, because
changing parameter in GROUPING function changes this index (and definitely
changes semantic of the query).

For T_GroupingClause we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For T_GroupId - this tag was added to switch case inside JumbleExpr to
suppress warning. The node tag for it was already handled. Struct GroupId
doesn't have any additional fields, that could be added to jumble.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.

Changes from original commit:
1. Cases for T_GroupId and T_GroupingFunc are already present, no fix needed.
2. Remove case for T_Integer because it is not necessary since GPDB 7 uses
   T_IntList instead.
3. T_GroupingClause is now named T_GroupingSet. Its case was not correct in
   GPDB 7, change to match implementation from GPDB 6.
4. Rename test to jumble since there were already tests present in GPDB 7.
5. The tests in pg_stat_statements were disabled before because they do not
   set up shared_preload_libraries and they don't disable optimizer. Add files
   enable_pg_stat_statements and disable_pg_stat_statements to adapt
   pg_stat_statements tests to current testing utilities. This allows to
   remove some initialization and cleanup from the new test.
6. Fix test output for GPDB 7. Notably, ROLLUP now outputs 1 row even if
   the table has no rows in it.

(cherry picked from commit fa44e50)
silent-observer pushed a commit that referenced this pull request Nov 25, 2024
Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tag and T_TableValueExpr and
T_GroupingSet is handled incorrectly. Jumble hashing is used as queryId.
Error message "unrecognized RTE kind: 7" caused by not handled range table
type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingSet we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.

Changes from original commit:
1. Cases for T_GroupId and T_GroupingFunc are already present, no fix needed.
2. Remove case for T_Integer because it is not necessary since GPDB 7 uses
   T_IntList instead.
3. T_GroupingClause is now named T_GroupingSet. Its case was not correct in
   GPDB 7, change to match implementation from GPDB 6.
4. Rename test to jumble since there were already tests present in GPDB 7.
5. The tests in pg_stat_statements were disabled before because they do not
   set up shared_preload_libraries and they don't disable optimizer. Add files
   enable_pg_stat_statements and disable_pg_stat_statements to adapt
   pg_stat_statements tests to current testing utilities. This allows to
   remove some initialization and cleanup from the new test.
6. Fix test output for GPDB 7. Notably, ROLLUP now outputs 1 row even if
   the table has no rows in it.

(cherry picked from commit fa44e50)
silent-observer pushed a commit that referenced this pull request Nov 25, 2024
Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tag and T_TableValueExpr and
T_GroupingSet is handled incorrectly. Jumble hashing is used as queryId.
Error message "unrecognized RTE kind: 7" caused by not handled range table
type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingSet we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.

Changes from original commit:
1. Cases for T_GroupId and T_GroupingFunc are already present, no fix needed.
2. Remove case for T_Integer because it is not necessary since GPDB 7 uses
   T_IntList instead.
3. T_GroupingClause is now named T_GroupingSet. Its case was not correct in
   GPDB 7, change to match implementation from GPDB 6.
4. Rename test to jumble since there were already tests present in GPDB 7.
5. The tests in pg_stat_statements were disabled before because they do not
   set up shared_preload_libraries and they don't disable optimizer. Add files
   enable_pg_stat_statements and disable_pg_stat_statements to adapt
   pg_stat_statements tests to current testing utilities. This allows to
   remove some initialization and cleanup from the new test.
6. Fix test output for GPDB 7. Notably, ROLLUP now outputs 1 row even if
   the table has no rows in it.

(cherry picked from commit fa44e50)
hilltracer pushed a commit to hilltracer/gpdb that referenced this pull request Dec 9, 2024
Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tag and T_TableValueExpr and
T_GroupingSet is handled incorrectly. Jumble hashing is used as queryId.
Error message "unrecognized RTE kind: 7" caused by not handled range table
type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingSet we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.

Changes from original commit:
1. Cases for T_GroupId and T_GroupingFunc are already present, no fix needed.
2. Remove case for T_Integer because it is not necessary since GPDB 7 uses
   T_IntList instead.
3. T_GroupingClause is now named T_GroupingSet. Its case was not correct in
   GPDB 7, change to match implementation from GPDB 6.
4. Rename test to jumble since there were already tests present in GPDB 7.
5. The tests in pg_stat_statements were disabled before because they do not
   set up shared_preload_libraries and they don't disable optimizer. Add files
   enable_pg_stat_statements and disable_pg_stat_statements to adapt
   pg_stat_statements tests to current testing utilities. This allows to
   remove some initialization and cleanup from the new test.
6. Fix test output for GPDB 7. Notably, ROLLUP now outputs 1 row even if
   the table has no rows in it.

(cherry picked from commit fa44e50)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants