Skip to content

Commit

Permalink
Fix queryId calculation for queries with grouping clause (#665)
Browse files Browse the repository at this point in the history
Problem description:
if 'pg_stat_statements' extension is enabled and a query containing GROUP BY
clause with ROLLUP, CUBE or GROUPING SETS or with GROUPING function is
executed, queryId calculation doesn't take into account these grouping clause
parameters and GROUPING functions. So, semantically different queries are
treated by pg_stat_statements as equivalent queries. Plus warning messages
appear during jumbling.
Also, jumbling logic doesn't handle properly queries with group_id() function
call and calls of functions with 'anytable' parameter.
First type of queries causes warnings.
Second type of query causes error message "unrecognized RTE kind: 7".

Expected correct behavior: queries with different grouping clauses or different
usage of the GROUPING function are treated by pg_stat_statements as different
queries, and no warning messages appear during the query execution.

Cause:
In JumbleExpr there is no handling for node tags T_GroupingClause,
T_GroupingFunc, T_GroupId and T_TableValueExpr. Jumble hashing is used
as queryId. Error message "unrecognized RTE kind: 7" caused by not handled
range table type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix:
Handling logic for missed node tags was added.
According to comments in queryjumble.c, main guideline how to handle
query tree is: "Rule of thumb for what to include is that we should ignore
anything not semantically significant (such as alias names) as well as
anything that can be deduced from child nodes (else we'd just be double-
hashing that piece of information)."

For T_GroupingFunc we append to the jumble the list of
arguments. Field 'ngrpcols' is not appended to the jumble, because it can be
deduced from the list of groupsets in grouping clause. 'ngrpcols' is the
number of unique grouping attributes in grouping clause. Equivalent queries
must have the same groupsets in grouping clause. Thus they will have the same
'ngrpcols'. So, adding 'ngrpcols' to jumble is redundant.
Plus handling for the T_Integer tag was added, because it is required to parse
the list of arguments. List of arguments (field 'args' of structure
GroupingFunc) is a list of T_Integer. The T_Integer element is the index of
GROUPING function parameter inside the array of unique grouping attributes
from grouping clause. We need to jumble value of this T_Integer, because
changing parameter in GROUPING function changes this index (and definitely
changes semantic of the query).

For T_GroupingClause we append to the jumble the grouping type
and list of groupsets. Field 'location' is not appended to the jumble because
it is the textual location from parser and is not semantically significant.

For T_GroupId - this tag was added to switch case inside JumbleExpr to
suppress warning. The node tag for it was already handled. Struct GroupId
doesn't have any additional fields, that could be added to jumble.

For RTE_TABLEFUNCTION - 'functions' field of RangeTblEntry was added to the
jumble. RangeTblEntry in case of RTE_TABLEFUNCTION uses 'functions' and
'subquery' fields (other significant fields are null). But 'subquery' is
duplicated in TableValueExpr below, so do not jumble it.

For T_TableValueExpr - 'subquery' field of TableValueExpr (the subquery
that is inside "TABLE()" statement) was added to the jumble.
  • Loading branch information
whitehawk authored Jan 25, 2024
1 parent 85fb13c commit fa44e50
Show file tree
Hide file tree
Showing 5 changed files with 230 additions and 0 deletions.
3 changes: 3 additions & 0 deletions GNUmakefile.in
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ $(call recurse,all install,src config)
all:
$(MAKE) -C contrib/amcheck all
$(MAKE) -C contrib/auto_explain all
$(MAKE) -C contrib/pg_stat_statements all
$(MAKE) -C contrib/citext all
$(MAKE) -C contrib/file_fdw all
$(MAKE) -C contrib/formatter all
Expand Down Expand Up @@ -67,6 +68,7 @@ html man:
install:
$(MAKE) -C contrib/amcheck $@
$(MAKE) -C contrib/auto_explain $@
$(MAKE) -C contrib/pg_stat_statements $@
$(MAKE) -C contrib/citext $@
#$(MAKE) -C contrib/file_fdw $@ # GPDB_91_MERGE_FIXME: disable installation until it's officially supported.
$(MAKE) -C contrib/formatter $@
Expand Down Expand Up @@ -174,6 +176,7 @@ $(call recurse,check-world,src/test src/pl src/interfaces/ecpg contrib src/bin g
# recipe body).
ICW_TARGETS = src/test src/pl src/interfaces/gppc
ICW_TARGETS += contrib/amcheck contrib/auto_explain contrib/citext
ICW_TARGETS += contrib/pg_stat_statements
ICW_TARGETS += contrib/formatter_fixedwidth
ICW_TARGETS += contrib/extprotocol
ICW_TARGETS += contrib/pg_trgm contrib/btree_gin contrib/isn
Expand Down
2 changes: 2 additions & 0 deletions contrib/pg_stat_statements/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@ EXTENSION = pg_stat_statements
DATA = pg_stat_statements--1.2.sql pg_stat_statements--1.1--1.2.sql \
pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql

REGRESS = test_pg_stat_statements

ifdef USE_PGXS
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
Expand Down
135 changes: 135 additions & 0 deletions contrib/pg_stat_statements/expected/test_pg_stat_statements.out
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
CREATE TABLE t(a int, b text) DISTRIBUTED BY (a);
-- Known issue: query is not added to pg_stat_statements statistics in
-- case it is planned by GPORCA. So disable GPORCA during tests.
SET optimizer=off;
SELECT pg_stat_statements_reset();
pg_stat_statements_reset
--------------------------

(1 row)

SELECT GROUPING (a) FROM t GROUP BY ROLLUP(a, b);
grouping
----------
(0 rows)

-- launch not equivalent query
SELECT GROUPING (b) FROM t GROUP BY ROLLUP(a, b);
grouping
----------
(0 rows)

-- check group_id() in a query
SELECT group_id() FROM t GROUP BY ROLLUP(a, b);
?column?
----------
(0 rows)

-- check that queries have separate entries
SELECT query, calls FROM pg_stat_statements ORDER BY query;
query | calls
---------------------------------------------------+-------
SELECT group_id() FROM t GROUP BY ROLLUP(a, b); | 1
SELECT GROUPING (a) FROM t GROUP BY ROLLUP(a, b); | 1
SELECT GROUPING (b) FROM t GROUP BY ROLLUP(a, b); | 1
SELECT pg_stat_statements_reset(); | 1
(4 rows)

SELECT pg_stat_statements_reset();
pg_stat_statements_reset
--------------------------

(1 row)

-- check that different grouping options result in separate entries
SELECT COUNT (*) FROM t GROUP BY ROLLUP(a, b);
count
-------
(0 rows)

SELECT COUNT (*) FROM t GROUP BY CUBE(a, b);
count
-------
(0 rows)

SELECT COUNT (*) FROM t GROUP BY GROUPING SETS(a, b);
count
-------
(0 rows)

SELECT COUNT (*) FROM t GROUP BY GROUPING SETS((a), (a, b));
count
-------
(0 rows)

SELECT COUNT (*) FROM t GROUP BY a, b;
count
-------
(0 rows)

SELECT query, calls FROM pg_stat_statements ORDER BY query;
query | calls
--------------------------------------------------------------+-------
SELECT COUNT (*) FROM t GROUP BY a, b; | 1
SELECT COUNT (*) FROM t GROUP BY CUBE(a, b); | 1
SELECT COUNT (*) FROM t GROUP BY GROUPING SETS((a), (a, b)); | 1
SELECT COUNT (*) FROM t GROUP BY GROUPING SETS(a, b); | 1
SELECT COUNT (*) FROM t GROUP BY ROLLUP(a, b); | 1
SELECT pg_stat_statements_reset(); | 1
(6 rows)

-- check several parameters options in ROLLUP
-- all should result in separate entries
SELECT pg_stat_statements_reset();
pg_stat_statements_reset
--------------------------

(1 row)

SELECT COUNT (*) FROM t GROUP BY ROLLUP(a, b);
count
-------
(0 rows)

SELECT COUNT (*) FROM t GROUP BY ROLLUP(b);
count
-------
(0 rows)

SELECT query, calls FROM pg_stat_statements ORDER BY query;
query | calls
------------------------------------------------+-------
SELECT COUNT (*) FROM t GROUP BY ROLLUP(a, b); | 1
SELECT COUNT (*) FROM t GROUP BY ROLLUP(b); | 1
SELECT pg_stat_statements_reset(); | 1
(3 rows)

--- check anytable parameter for a function
SELECT pg_stat_statements_reset();
pg_stat_statements_reset
--------------------------

(1 row)

-- call of anytable_out will cause an error,
-- thus prevent actual call by adding FALSE condition
SELECT * FROM anytable_out(TABLE(SELECT * FROM t)) WHERE 1 = 0;
anytable_out
--------------
(0 rows)

SELECT * FROM anytable_out(TABLE(SELECT * FROM t WHERE a=0)) WHERE 1 = 0;
anytable_out
--------------
(0 rows)

SELECT query, calls FROM pg_stat_statements ORDER BY query;
query | calls
---------------------------------------------------------------------------+-------
SELECT * FROM anytable_out(TABLE(SELECT * FROM t)) WHERE ? = ?; | 1
SELECT * FROM anytable_out(TABLE(SELECT * FROM t WHERE a=?)) WHERE ? = ?; | 1
SELECT pg_stat_statements_reset(); | 1
(3 rows)

RESET optimizer;
DROP TABLE t;
61 changes: 61 additions & 0 deletions contrib/pg_stat_statements/sql/test_pg_stat_statements.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
-- start_ignore
\! gpconfig -c shared_preload_libraries -v 'pg_stat_statements';
\! gpstop -raq -M fast;
\c
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
DROP TABLE IF EXISTS t;
-- end_ignore
CREATE TABLE t(a int, b text) DISTRIBUTED BY (a);

-- Known issue: query is not added to pg_stat_statements statistics in
-- case it is planned by GPORCA. So disable GPORCA during tests.
SET optimizer=off;

SELECT pg_stat_statements_reset();

SELECT GROUPING (a) FROM t GROUP BY ROLLUP(a, b);
-- launch not equivalent query
SELECT GROUPING (b) FROM t GROUP BY ROLLUP(a, b);
-- check group_id() in a query
SELECT group_id() FROM t GROUP BY ROLLUP(a, b);

-- check that queries have separate entries
SELECT query, calls FROM pg_stat_statements ORDER BY query;

SELECT pg_stat_statements_reset();

-- check that different grouping options result in separate entries
SELECT COUNT (*) FROM t GROUP BY ROLLUP(a, b);
SELECT COUNT (*) FROM t GROUP BY CUBE(a, b);
SELECT COUNT (*) FROM t GROUP BY GROUPING SETS(a, b);
SELECT COUNT (*) FROM t GROUP BY GROUPING SETS((a), (a, b));
SELECT COUNT (*) FROM t GROUP BY a, b;

SELECT query, calls FROM pg_stat_statements ORDER BY query;

-- check several parameters options in ROLLUP
-- all should result in separate entries
SELECT pg_stat_statements_reset();

SELECT COUNT (*) FROM t GROUP BY ROLLUP(a, b);
SELECT COUNT (*) FROM t GROUP BY ROLLUP(b);

SELECT query, calls FROM pg_stat_statements ORDER BY query;

--- check anytable parameter for a function
SELECT pg_stat_statements_reset();

-- call of anytable_out will cause an error,
-- thus prevent actual call by adding FALSE condition
SELECT * FROM anytable_out(TABLE(SELECT * FROM t)) WHERE 1 = 0;
SELECT * FROM anytable_out(TABLE(SELECT * FROM t WHERE a=0)) WHERE 1 = 0;

SELECT query, calls FROM pg_stat_statements ORDER BY query;

RESET optimizer;

DROP TABLE t;
-- start_ignore
\! gpconfig -r shared_preload_libraries;
\! gpstop -raq -M fast;
-- end_ignore
29 changes: 29 additions & 0 deletions src/backend/utils/misc/queryjumble.c
Original file line number Diff line number Diff line change
Expand Up @@ -179,6 +179,7 @@ JumbleRangeTable(JumbleState *jstate, List *rtable)
APP_JUMB(rte->jointype);
break;
case RTE_FUNCTION:
case RTE_TABLEFUNCTION:
JumbleExpr(jstate, (Node *) rte->functions);
break;
case RTE_VALUES:
Expand Down Expand Up @@ -597,6 +598,34 @@ JumbleExpr(JumbleState *jstate, Node *node)
JumbleExpr(jstate, rtfunc->funcexpr);
}
break;
case T_GroupId:
/*
* Struct GroupId has only the node tag, already added to the jumble
* in the code above. So just leave case empty to suppress warning.
*/
break;
case T_Integer:
APP_JUMB(intVal(node));
break;
case T_TableValueExpr:
{
TableValueExpr *tve = (TableValueExpr *) node;
JumbleQueryInternal(jstate, (Query *) tve->subquery);
}
break;
case T_GroupingClause:
{
GroupingClause *gc = (GroupingClause *) node;
APP_JUMB(gc->groupType);
JumbleExpr(jstate, (Node *) gc->groupsets);
}
break;
case T_GroupingFunc:
{
GroupingFunc *gf = (GroupingFunc *) node;
JumbleExpr(jstate, (Node *) gf->args);
}
break;
default:
/* Only a warning, since we can stumble along anyway */
elog(WARNING, "unrecognized node type: %d",
Expand Down

0 comments on commit fa44e50

Please sign in to comment.