Skip to content

Files

Latest commit

Dec 15, 2023
f400df2 · Dec 15, 2023

History

History
201 lines (154 loc) · 7.74 KB

20201209_02.md

File metadata and controls

201 lines (154 loc) · 7.74 KB

PostgreSQL 14 preview - 增强or, and多条件选择性评估 - 使用multiple extended statistics

作者

digoal

日期

2020-12-09

标签

PostgreSQL , 选择性 , 多列统计信息 , 自定义统计信息


背景

PG 14 支持通过多个自定义多字段组合统计信息来评估and, or等复杂条件的选择性.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=88b0898fe35a5a0325fca21bd4f3ed6dffb364c1

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=4f5760d4afa9423fe4d38e4cbec48bf5e793e7e5

Improve estimation of OR clauses using multiple extended statistics.    
author	Dean Rasheed <dean.a.rasheed@gmail.com>	    
Wed, 9 Dec 2020 03:39:24 +0800 (19:39 +0000)    
committer	Dean Rasheed <dean.a.rasheed@gmail.com>	    
Wed, 9 Dec 2020 03:39:24 +0800 (19:39 +0000)    
commit	88b0898fe35a5a0325fca21bd4f3ed6dffb364c1    
tree	fd18e47ff06c40a6349c27f30b29ad57deedde82	tree | snapshot    
parent	f2a69b352de1dffc534c4835010e736018aa94de	commit | diff    
Improve estimation of OR clauses using multiple extended statistics.    
    
When estimating an OR clause using multiple extended statistics    
objects, treat the estimates for each set of clauses for each    
statistics object as independent of one another. The overlap estimates    
produced for each statistics object do not apply to clauses covered by    
other statistics objects.    
    
Dean Rasheed, reviewed by Tomas Vondra.    
    
Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com    
Improve estimation of ANDs under ORs using extended statistics.    
author	Dean Rasheed <dean.a.rasheed@gmail.com>	    
Wed, 9 Dec 2020 04:10:11 +0800 (20:10 +0000)    
committer	Dean Rasheed <dean.a.rasheed@gmail.com>	    
Wed, 9 Dec 2020 04:10:11 +0800 (20:10 +0000)    
commit	4f5760d4afa9423fe4d38e4cbec48bf5e793e7e5    
tree	e2f545d18e44849f81d593c181c08c9bcb77126d	tree | snapshot    
parent	88b0898fe35a5a0325fca21bd4f3ed6dffb364c1	commit | diff    
Improve estimation of ANDs under ORs using extended statistics.    
    
Formerly, extended statistics only handled clauses that were    
RestrictInfos. However, the restrictinfo machinery doesn't create    
sub-AND RestrictInfos for AND clauses underneath OR clauses.    
Therefore teach extended statistics to handle bare AND clauses,    
looking for compatible RestrictInfo clauses underneath them.    
    
Dean Rasheed, reviewed by Tomas Vondra.    
    
Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com    

选择性评估的sql函数如下.

这个函数改动一下就可以用于翻页的记录数估算.

   1 -- Generic extended statistics support    
   2 --    
   3 -- Note: tables for which we check estimated row counts should be created    
   4 -- with autovacuum_enabled = off, so that we don't have unstable results    
   5 -- from auto-analyze happening when we didn't expect it.    
   6 --    
   7 -- check the number of estimated/actual rows in the top node    
   8 create function check_estimated_rows(text) returns table (estimated int, actual int)    
   9 language plpgsql as    
  10 $$    
  11 declare    
  12     ln text;    
  13     tmp text[];    
  14     first_row bool := true;    
  15 begin    
  16     for ln in    
  17         execute format('explain analyze %s', $1)    
  18     loop    
  19         if first_row then    
  20             first_row := false;    
  21             tmp := regexp_match(ln, 'rows=(\d*) .* rows=(\d*)');    
  22             return query select tmp[1]::int, tmp[2]::int;    
  23         end if;    
  24     end loop;    
  25 end;    
  26 $$;    

从优化结果来看, 与实际的返回记录数相差无几.

diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out    
index dbbe984..6e1c4f3 100644 (file)    
--- a/src/test/regress/expected/stats_ext.out    
+++ b/src/test/regress/expected/stats_ext.out    
@@ -1706,7 +1706,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 A    
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');    
  estimated | actual     
 -----------+--------    
-      1714 |   1572    
+      1571 |   1572    
 (1 row)    
     
 DROP TABLE mcv_lists_multi;    
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out    
index 6e1c4f3..7bfeaf8 100644 (file)    
--- a/src/test/regress/expected/stats_ext.out    
+++ b/src/test/regress/expected/stats_ext.out    
@@ -1594,13 +1594,13 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE a = 0    
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 AND b = 0 AND c = 0) OR (a = 1 AND b = 1 AND c = 1) OR (a = 2 AND b = 2 AND c = 2)');    
  estimated | actual     
 -----------+--------    
-       300 |    306    
+       306 |    306    
 (1 row)    
     
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 AND b = 0) OR (a = 0 AND c = 0) OR (b = 0 AND c = 0)');    
  estimated | actual     
 -----------+--------    
-       306 |    102    
+       108 |    102    
 (1 row)    
     
 DROP TABLE mcv_lists_partial;    

函数改动如下, 可以用于翻页的记录数估算.

create function check_estimated_rows(text) returns table (estimated int)    
language plpgsql as    
$$    
declare    
       ln text;    
       tmp text[];    
       first_row bool := true;    
   begin    
       for ln in    
           execute format('explain %s', $1)    
       loop    
           if first_row then    
               first_row := false;    
               tmp := regexp_match(ln, 'rows=(\d*)');    
               return query select tmp[1]::int;    
           end if;    
       end loop;    
   end;    
$$;    
  
  
postgres=# select * from check_estimated_rows($_$select * from pg_class$_$);  
 estimated   
-----------  
    459146  
(1 row)  
  
  
postgres=# select count(*) from pg_class;  
 count    
--------  
 459150  
(1 row)  

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

digoal's wechat