digoal
2020-12-09
PostgreSQL , 选择性 , 多列统计信息 , 自定义统计信息
PG 14 支持通过多个自定义多字段组合统计信息来评估and, or等复杂条件的选择性.
Improve estimation of OR clauses using multiple extended statistics.
author Dean Rasheed <dean.a.rasheed@gmail.com>
Wed, 9 Dec 2020 03:39:24 +0800 (19:39 +0000)
committer Dean Rasheed <dean.a.rasheed@gmail.com>
Wed, 9 Dec 2020 03:39:24 +0800 (19:39 +0000)
commit 88b0898fe35a5a0325fca21bd4f3ed6dffb364c1
tree fd18e47ff06c40a6349c27f30b29ad57deedde82 tree | snapshot
parent f2a69b352de1dffc534c4835010e736018aa94de commit | diff
Improve estimation of OR clauses using multiple extended statistics.
When estimating an OR clause using multiple extended statistics
objects, treat the estimates for each set of clauses for each
statistics object as independent of one another. The overlap estimates
produced for each statistics object do not apply to clauses covered by
other statistics objects.
Dean Rasheed, reviewed by Tomas Vondra.
Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com
Improve estimation of ANDs under ORs using extended statistics.
author Dean Rasheed <dean.a.rasheed@gmail.com>
Wed, 9 Dec 2020 04:10:11 +0800 (20:10 +0000)
committer Dean Rasheed <dean.a.rasheed@gmail.com>
Wed, 9 Dec 2020 04:10:11 +0800 (20:10 +0000)
commit 4f5760d4afa9423fe4d38e4cbec48bf5e793e7e5
tree e2f545d18e44849f81d593c181c08c9bcb77126d tree | snapshot
parent 88b0898fe35a5a0325fca21bd4f3ed6dffb364c1 commit | diff
Improve estimation of ANDs under ORs using extended statistics.
Formerly, extended statistics only handled clauses that were
RestrictInfos. However, the restrictinfo machinery doesn't create
sub-AND RestrictInfos for AND clauses underneath OR clauses.
Therefore teach extended statistics to handle bare AND clauses,
looking for compatible RestrictInfo clauses underneath them.
Dean Rasheed, reviewed by Tomas Vondra.
Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com
选择性评估的sql函数如下.
这个函数改动一下就可以用于翻页的记录数估算.
1 -- Generic extended statistics support
2 --
3 -- Note: tables for which we check estimated row counts should be created
4 -- with autovacuum_enabled = off, so that we don't have unstable results
5 -- from auto-analyze happening when we didn't expect it.
6 --
7 -- check the number of estimated/actual rows in the top node
8 create function check_estimated_rows(text) returns table (estimated int, actual int)
9 language plpgsql as
10 $$
11 declare
12 ln text;
13 tmp text[];
14 first_row bool := true;
15 begin
16 for ln in
17 execute format('explain analyze %s', $1)
18 loop
19 if first_row then
20 first_row := false;
21 tmp := regexp_match(ln, 'rows=(\d*) .* rows=(\d*)');
22 return query select tmp[1]::int, tmp[2]::int;
23 end if;
24 end loop;
25 end;
26 $$;
从优化结果来看, 与实际的返回记录数相差无几.
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index dbbe984..6e1c4f3 100644 (file)
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -1706,7 +1706,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 A
SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
estimated | actual
-----------+--------
- 1714 | 1572
+ 1571 | 1572
(1 row)
DROP TABLE mcv_lists_multi;
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 6e1c4f3..7bfeaf8 100644 (file)
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -1594,13 +1594,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0
SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 AND b = 0 AND c = 0) OR (a = 1 AND b = 1 AND c = 1) OR (a = 2 AND b = 2 AND c = 2)');
estimated | actual
-----------+--------
- 300 | 306
+ 306 | 306
(1 row)
SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 AND b = 0) OR (a = 0 AND c = 0) OR (b = 0 AND c = 0)');
estimated | actual
-----------+--------
- 306 | 102
+ 108 | 102
(1 row)
DROP TABLE mcv_lists_partial;
函数改动如下, 可以用于翻页的记录数估算.
create function check_estimated_rows(text) returns table (estimated int)
language plpgsql as
$$
declare
ln text;
tmp text[];
first_row bool := true;
begin
for ln in
execute format('explain %s', $1)
loop
if first_row then
first_row := false;
tmp := regexp_match(ln, 'rows=(\d*)');
return query select tmp[1]::int;
end if;
end loop;
end;
$$;
postgres=# select * from check_estimated_rows($_$select * from pg_class$_$);
estimated
-----------
459146
(1 row)
postgres=# select count(*) from pg_class;
count
--------
459150
(1 row)
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.