Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADBDEV-5479: Fix queryId calculation for queries with grouping clause #1119

Merged
merged 2 commits into from
Nov 25, 2024

Conversation

silent-observer
Copy link

@silent-observer silent-observer commented Nov 13, 2024

Fix queryId calculation for queries with grouping clause (#665)

Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tag and T_TableValueExpr and
T_GroupingSet is handled incorrectly. Jumble hashing is used as queryId.
Error message "unrecognized RTE kind: 7" caused by not handled range table
type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingSet we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.

Changes from original commit:

  1. Cases for T_GroupId and T_GroupingFunc are already present, no fix needed.
  2. Remove case for T_Integer because it is not necessary since GPDB 7 uses
    T_IntList instead.
  3. T_GroupingClause is now named T_GroupingSet. Its case was not correct in
    GPDB 7, change to match implementation from GPDB 6.
  4. Rename test to jumble since there were already tests present in GPDB 7.
  5. The tests in pg_stat_statements were disabled before because they do not
    set up shared_preload_libraries and they don't disable optimizer. Add files
    enable_pg_stat_statements and disable_pg_stat_statements to adapt
    pg_stat_statements tests to current testing utilities. This allows to
    remove some initialization and cleanup from the new test.
  6. Fix test output for GPDB 7. Notably, ROLLUP now outputs 1 row even if
    the table has no rows in it.

(cherry picked from commit fa44e50)


Note: Do not squash to preserve authorship
To test comment out the NO_INSTALLCHECK = 1 in contrib/pg_stat_statements/Makefile, and then run

gpconfig -c shared_preload_libraries -v 'pg_stat_statements'
gpconfig -c optimizer -v 'off'
gpstop -raq -M fast
cd contrib/pg_stat_statements
make installcheck

@silent-observer silent-observer marked this pull request as ready for review November 14, 2024 05:50
@silent-observer silent-observer force-pushed the ADBDEV-5479 branch 4 times, most recently from 56dfd47 to 255da63 Compare November 18, 2024 06:04
@whitehawk

This comment was marked as resolved.

@silent-observer silent-observer force-pushed the ADBDEV-5479 branch 2 times, most recently from a258684 to 2e77455 Compare November 20, 2024 06:42
@KnightMurloc
Copy link

Why do we need a new test file? Why can't we use an existing one?

@silent-observer
Copy link
Author

This test file is completely unrelated to the existing one: pg_stat_statements.sql tests the extension itself while jumble.sql tests jumbling logic that is currently in GPDB core (but not easily accessible outside of pg_stat_statements, that's why the test is here)

@KnightMurloc
Copy link

I'm not sure how good an idea it is to take the description of the commit from 6X as it is and just add a list of changes at the end. Considering that the patch is significantly different from the 6X. It seems to me that the description should be updated.

Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tag and T_TableValueExpr and
T_GroupingSet is handled incorrectly. Jumble hashing is used as queryId.
Error message "unrecognized RTE kind: 7" caused by not handled range table
type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingSet we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.

Changes from original commit:
1. Cases for T_GroupId and T_GroupingFunc are already present, no fix needed.
2. Remove case for T_Integer because it is not necessary since GPDB 7 uses
   T_IntList instead.
3. T_GroupingClause is now named T_GroupingSet. Its case was not correct in
   GPDB 7, change to match implementation from GPDB 6.
4. Rename test to jumble since there were already tests present in GPDB 7.
5. The tests in pg_stat_statements were disabled before because they do not
   set up shared_preload_libraries and they don't disable optimizer. Add files
   enable_pg_stat_statements and disable_pg_stat_statements to adapt
   pg_stat_statements tests to current testing utilities. This allows to
   remove some initialization and cleanup from the new test.
6. Fix test output for GPDB 7. Notably, ROLLUP now outputs 1 row even if
   the table has no rows in it.

(cherry picked from commit fa44e50)
@silent-observer silent-observer enabled auto-merge (rebase) November 25, 2024 08:04
@silent-observer silent-observer merged commit e8aa067 into adb-7.2.0 Nov 25, 2024
5 checks passed
@silent-observer silent-observer deleted the ADBDEV-5479 branch November 25, 2024 10:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants