Fix queryId calculation for queries with grouping clause (#665)

Problem description: If the 'pg_stat_statements' extension is enabled and a query is executed that contains a GROUP BY clause with ROLLUP, CUBE or GROUPING SETS, or that calls the GROUPING function, the queryId calculation does not take these grouping clause parameters and GROUPING functions into account. As a result, pg_stat_statements treats semantically different queries as equivalent, and warning messages appear during jumbling. The jumbling logic also does not properly handle queries that call the group_id() function or that call functions with an 'anytable' parameter. The first kind of query causes warnings. The second kind causes the error message "unrecognized RTE kind: 7".

Expected correct behavior: Queries with different grouping clauses or different usage of the GROUPING function are treated by pg_stat_statements as different queries, and no warning messages appear during query execution.

Cause: JumbleExpr has no handling for the node tags T_GroupingClause, T_GroupingFunc, T_GroupId and T_TableValueExpr. The jumble hash is used as the queryId. The error message "unrecognized RTE kind: 7" is caused by the unhandled range table type RTE_TABLEFUNCTION in JumbleRangeTable.

Fix: Handling logic for the missing node tags was added. According to the comments in queryjumble.c, the main guideline for handling the query tree is: "Rule of thumb for what to include is that we should ignore anything not semantically significant (such as alias names) as well as anything that can be deduced from child nodes (else we'd just be double-hashing that piece of information)."

For T_GroupingFunc we append the list of arguments to the jumble. The field 'ngrpcols' is not appended to the jumble, because it can be deduced from the list of groupsets in the grouping clause. 'ngrpcols' is the number of unique grouping attributes in the grouping clause. Equivalent queries must have the same groupsets in the grouping clause, so they will have the same 'ngrpcols'. Adding 'ngrpcols' to the jumble is therefore redundant.
Handling for the T_Integer tag was also added, because it is required to parse the list of arguments. The list of arguments (field 'args' of struct GroupingFunc) is a list of T_Integer. Each T_Integer element is the index of a GROUPING function parameter inside the array of unique grouping attributes from the grouping clause. We need to jumble the value of this T_Integer, because changing a parameter of the GROUPING function changes this index (and definitely changes the semantics of the query).

For T_GroupingClause we append the grouping type and the list of groupsets to the jumble. The field 'location' is not appended to the jumble, because it is the textual location from the parser and is not semantically significant.

For T_GroupId, this tag was added to the switch statement inside JumbleExpr to suppress the warning. The node tag for it was already handled. Struct GroupId has no additional fields that could be added to the jumble.

For RTE_TABLEFUNCTION, the 'functions' field of RangeTblEntry was added to the jumble. In the RTE_TABLEFUNCTION case, RangeTblEntry uses the 'functions' and 'subquery' fields (the other significant fields are null). But 'subquery' is duplicated in TableValueExpr below, so it is not jumbled here.

For T_TableValueExpr, the 'subquery' field of TableValueExpr (the subquery inside the "TABLE()" statement) was added to the jumble.

Changes from original commit:
1. Cases for T_GroupId and T_GroupingFunc are already present, no fix needed.
2. The case for T_Integer is not necessary, since GPDB 7 uses T_IntList. Removed.
3. T_GroupingClause is now named T_GroupingSet. Its case was not correct in GPDB 7, changed to match the implementation from GPDB 6.
4. Test renamed to gp_pg_stat_statements, since there were already tests present in GPDB 7.
5. Currently the tests for pg_stat_statements do not set up shared_preload_libraries and do not disable the optimizer. Changed the new test to match the existing ones and added a comment about the optimizer to the Makefile.
6. Fixed test output for GPDB 7. Notably, ROLLUP now outputs 1 row even if the table has no rows in it.

(cherry picked from commit fa44e50)
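
Illustration of the jumbling rule for grouping sets: the following is a minimal, self-contained sketch, not the code from this commit. It assumes a simplified stand-in node (ToyGroupingSet) and uses FNV-1a in place of the real jumble hash. All names prefixed with Toy/toy_ are hypothetical and do not exist in GPDB or PostgreSQL. It shows only the idea described above: the grouping kind and the grouping columns go into the jumble, while 'location' is skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the grouping kinds; not the GPDB enum. */
typedef enum
{
    TOY_GROUPING_SIMPLE,
    TOY_GROUPING_ROLLUP,
    TOY_GROUPING_CUBE,
    TOY_GROUPING_SETS
} ToyGroupingKind;

/* Hypothetical, flattened stand-in for a grouping set node. */
typedef struct
{
    ToyGroupingKind kind;   /* ROLLUP, CUBE, ...: semantically significant */
    const int *cols;        /* grouping column references: significant */
    size_t ncols;
    int location;           /* textual position from the parser: ignored */
} ToyGroupingSet;

/* FNV-1a offset basis, used as the starting value of the toy jumble. */
#define TOY_JUMBLE_SEED 14695981039346656037ULL

/* Append raw bytes to the running hash (FNV-1a, assumed for illustration). */
static uint64_t
toy_append(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;

    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Jumble the kind and the columns; deliberately skip 'location'. */
static uint64_t
toy_jumble_grouping_set(uint64_t h, const ToyGroupingSet *gs)
{
    int32_t kind = (int32_t) gs->kind;

    h = toy_append(h, &kind, sizeof(kind));
    for (size_t i = 0; i < gs->ncols; i++)
    {
        int32_t col = (int32_t) gs->cols[i];

        h = toy_append(h, &col, sizeof(col));
    }
    return h;
}
```

With this sketch, two nodes that differ only in 'location' produce the same hash, while a ROLLUP and a CUBE over the same columns produce different hashes. That matches the behavior the fix aims for in pg_stat_statements.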