
Commit 1eeec5d

new doc

1 parent 849d09d commit 1eeec5d

File tree

3 files changed: +242 −0 lines changed

202405/20240511_01.md

Lines changed: 240 additions & 0 deletions
## DB Gripe Session, Episode 96 - PG's optimizer is far too weak at explain debugging and at giving tuning advice on cost calculation | selectivity | optimization paths

### Author
digoal

### Date
2024-05-10

### Tags
PostgreSQL , PolarDB , DuckDB , explain , node , path , possibility , cost factor , selectivity , statistics , cost formula , cost constant

----

## Background
[Video replay]()
CBO (cost-based optimization) is the method database optimizers commonly use today to choose the optimal execution path.

The method is there, but whether you actually get the correct (optimal) execution path depends on the accuracy of the cost calculation.

For example, a single SQL statement may be able to use an index scan, a bitmap scan, or a full table scan. How is the cost of each of these three scan methods estimated? By formula. The result depends on:
- the design of the formula, which shapes the computed cost;
- whether the statistics the formula consumes are accurate (they determine the selectivity);
- the coefficients in the formula (cost factors such as the random page cost and the sequential page cost), i.e. the planner cost constants below:
```
# - Planner Cost Constants -
#seq_page_cost = 1.0                    # measured on an arbitrary scale
#random_page_cost = 4.0                 # same scale as above
#cpu_tuple_cost = 0.01                  # same scale as above
#cpu_index_tuple_cost = 0.005           # same scale as above
#cpu_operator_cost = 0.0025             # same scale as above
#parallel_setup_cost = 1000.0           # same scale as above
#parallel_tuple_cost = 0.1              # same scale as above
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
#jit_above_cost = 100000                # perform JIT compilation if available
                                        # and query more expensive than this;
                                        # -1 disables
#jit_inline_above_cost = 500000         # inline small functions if query is
                                        # more expensive than this; -1 disables
#jit_optimize_above_cost = 500000       # use expensive JIT optimizations if
                                        # query is more expensive than this;
                                        # -1 disables
```
- the constants hard-coded into the formula (such as the fixed overhead of a `hash join`).
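These cost factors are ordinary GUCs, so you can inspect and override them at runtime. A minimal sketch (session-level; the SSD value shown is a common rule of thumb, not a measured calibration):

```
-- Inspect the current planner cost constants.
SELECT name, setting
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost', 'cpu_tuple_cost',
               'cpu_index_tuple_cost', 'cpu_operator_cost');

-- Override for the current session only, e.g. a commonly used SSD setting.
SET random_page_cost = 1.1;
```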
Every operation involves a cost calculation: scans have a cost, computation has a cost, index selection has a cost, joins have a cost, and so on. Each operation's cost is computed by a different formula, which can be loosely written as `cost = startup cost + func(cost factors * statistics -> selectivity, constants)`.
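For a concrete instance of this shape, the sequential scan formula is simply `seq_page_cost * pages + cpu_tuple_cost * rows` (startup cost 0, no selectivity term). A minimal sketch you can reproduce, assuming default cost constants; the table is hypothetical and the page count can vary slightly across builds:

```
CREATE TABLE t1 AS SELECT generate_series(1, 100000) AS id;
ANALYZE t1;

SELECT relpages, reltuples FROM pg_class WHERE relname = 't1';
--  relpages | reltuples
-- ----------+-----------
--       443 |    100000

EXPLAIN SELECT * FROM t1;
--  Seq Scan on t1  (cost=0.00..1443.00 rows=100000 width=4)
-- total cost = seq_page_cost * relpages + cpu_tuple_cost * reltuples
--            = 1.0 * 443 + 0.01 * 100000 = 1443
```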
The cost formulas are hard-coded, the selectivity formulas are hard-coded, and the cost constants are hard-coded. So if you want to influence plan selection, what can a user actually do? (A sketch of knobs 1, 2 and 4 in action follows the list.)
- 1. Make the statistics more accurate. There is usually little room here; you can raise `default_statistics_target` to increase the number of histogram buckets.
- 2. Make the cost factors more accurate (strictly speaking, better matched to your hardware: on spinning disks `random_page_cost` is much larger than `seq_page_cost`, while on SSD `random_page_cost` is about the same as `seq_page_cost`).
- 3. Supply hints directly, e.g. with the pg_hint_plan extension (its hints are written as `/*+ ... */` comments).
- 4. Flip GUC switches to forbid certain paths, e.g. `enable_hashjoin=off enable_indexscan=off ...`
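A minimal sketch of knobs 1, 2 and 4 (all session-level here; the table, column and values are illustrative, carried over from the `t1` example above):

```
-- Knob 1: more histogram buckets, then refresh the statistics.
SET default_statistics_target = 1000;                -- default is 100
ALTER TABLE t1 ALTER COLUMN id SET STATISTICS 1000;  -- per-column variant
ANALYZE t1;

-- Knob 2: cost factors better matched to SSD.
SET random_page_cost = 1.1;

-- Knob 4: veto a path for this session and watch the plan change.
SET enable_hashjoin = off;
EXPLAIN SELECT * FROM t1 a JOIN t1 b USING (id);
```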
This episode mainly concerns knob 2. I have written before about calibrating the factors:
- [《Optimizer cost factor calibration - PostgreSQL explain cost constants alignment to timestamp》](../201311/20131126_03.md)
- [《Optimizer cost factor calibration (precise accounting of disk, ssd, memory IO costs) - PostgreSQL real seq_page_cost & random_page_cost in disks, ssd, memory》](../201404/20140423_01.md)
- [《DB Gripe Session, Episode 12 - No automatic cost calibrator》](../202108/20210830_03.md)
But what I want to gripe about this episode is that PG has no built-in profiling of the planner, which is exactly what makes factor calibration such a hassle. The gripes:
- 1. explain cannot print all candidate execution plans along with their costs, estimated row counts, and so on (the closest workaround today is to veto paths one at a time; see the sketch after this list).
- 2. It does not print, for each node: the cost formula, the values of the cost factors, the selectivity formula, the statistics and the computed selectivity, the constants baked into the code, etc.
- 3. After explain analyze, when several paths were possible and reality diverged from the cost estimate so that the wrong plan was chosen, it cannot tell you why: inaccurate statistics skewing the selectivity calculation, or wrong factor values skewing the cost calculation.
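For gripe 1, the closest thing available today is to veto the winning path and re-explain, which surfaces the runner-up and its estimated cost one plan at a time (query is illustrative):

```
BEGIN;
EXPLAIN SELECT * FROM t1 a JOIN t1 b USING (id);  -- shows only the winner, e.g. a Hash Join
SET LOCAL enable_hashjoin = off;
EXPLAIN SELECT * FROM t1 a JOIN t1 b USING (id);  -- now the runner-up, e.g. a Merge Join
ROLLBACK;                                         -- SET LOCAL vanishes with the transaction
```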
I wish PG could do the following; it would make the tuning experience much better.

Behind a switch at explain time, add the following output:
- 1. Print, for each node: the cost formula, the values of the cost factors, the selectivity formula, the statistics and the computed selectivity, the constants baked into the code, etc.
- 2. Print, for each alternative path (e.g. every join method): its cost, selectivity, and actual elapsed time.
- 3. Analyze how well the cost matches the actual elapsed time and report the cause (inaccurate statistics skewing the selectivity, or wrong factor values skewing the cost), plus a suggestion: raise `default_statistics_target`? Change cost factor xxx to what value? Today this comparison is entirely manual; a sketch of the manual version follows the list.
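The manual version of point 3 is to line up the estimated rows against the actual rows node by node; a large gap points at stale or too-coarse statistics. A minimal sketch (output abbreviated and illustrative, reusing the hypothetical `t1`):

```
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t1 WHERE id < 1000;
--  Seq Scan on t1  (cost=0.00..1693.00 rows=1022 width=4)
--                  (actual time=0.011..9.500 rows=999 loops=1)
-- estimated rows=1022 vs actual rows=999: statistics are fine here;
-- a 10x-100x gap is what should trigger a higher default_statistics_target.
```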
Some code related to the cost calculation:

https://doxygen.postgresql.org/structJoinCostWorkspace.html
```
Data Fields
Cost        startup_cost
Cost        total_cost
Cost        run_cost
Cost        inner_run_cost
Cost        inner_rescan_run_cost
Cardinality outer_rows
Cardinality inner_rows
Cardinality outer_skip_rows
Cardinality inner_skip_rows
int         numbuckets
int         numbatches
Cardinality inner_rows_total
```
https://doxygen.postgresql.org/costsize_8c_source.html
```
/*
 * initial_cost_hashjoin
 *    Preliminary estimate of the cost of a hashjoin path.
 *
 * This must quickly produce lower-bound estimates of the path's startup and
 * total costs.  If we are unable to eliminate the proposed path from
 * consideration using the lower bounds, final_cost_hashjoin will be called
 * to obtain the final estimates.
 *
 * The exact division of labor between this function and final_cost_hashjoin
 * is private to them, and represents a tradeoff between speed of the initial
 * estimate and getting a tight lower bound.  We choose to not examine the
 * join quals here (other than by counting the number of hash clauses),
 * so we can't do much with CPU costs.  We do assume that
 * ExecChooseHashTableSize is cheap enough to use here.
 *
 * 'workspace' is to be filled with startup_cost, total_cost, and perhaps
 *    other data to be used by final_cost_hashjoin
 * 'jointype' is the type of join to be performed
 * 'hashclauses' is the list of joinclauses to be used as hash clauses
 * 'outer_path' is the outer input to the join
 * 'inner_path' is the inner input to the join
 * 'extra' contains miscellaneous information about the join
 * 'parallel_hash' indicates that inner_path is partial and that a shared
 *    hash table will be built in parallel
 */
void
initial_cost_hashjoin(PlannerInfo *root, JoinCostWorkspace *workspace,
                      JoinType jointype,
                      List *hashclauses,
                      Path *outer_path, Path *inner_path,
                      JoinPathExtraData *extra,
                      bool parallel_hash)
{
    Cost        startup_cost = 0;
    Cost        run_cost = 0;
    double      outer_path_rows = outer_path->rows;
    double      inner_path_rows = inner_path->rows;
    double      inner_path_rows_total = inner_path_rows;
    int         num_hashclauses = list_length(hashclauses);
    int         numbuckets;
    int         numbatches;
    int         num_skew_mcvs;
    size_t      space_allowed;  /* unused */

    /* cost of source data */
    startup_cost += outer_path->startup_cost;
    run_cost += outer_path->total_cost - outer_path->startup_cost;
    startup_cost += inner_path->total_cost;

    /*
     * Cost of computing hash function: must do it once per input tuple. We
     * charge one cpu_operator_cost for each column's hash function. Also,
     * tack on one cpu_tuple_cost per inner row, to model the costs of
     * inserting the row into the hashtable.
     *
     * XXX when a hashclause is more complex than a single operator, we really
     * should charge the extra eval costs of the left or right side, as
     * appropriate, here. This seems more work than it's worth at the moment.
     */
    startup_cost += (cpu_operator_cost * num_hashclauses + cpu_tuple_cost)
        * inner_path_rows;
    run_cost += cpu_operator_cost * num_hashclauses * outer_path_rows;

    /*
     * If this is a parallel hash build, then the value we have for
     * inner_rows_total currently refers only to the rows returned by each
     * participant. For shared hash table size estimation, we need the total
     * number, so we need to undo the division.
     */
    if (parallel_hash)
        inner_path_rows_total *= get_parallel_divisor(inner_path);

    /*
     * Get hash table size that executor would use for inner relation.
     *
     * XXX for the moment, always assume that skew optimization will be
     * performed. As long as SKEW_HASH_MEM_PERCENT is small, it's not worth
     * trying to determine that for sure.
     *
     * XXX at some point it might be interesting to try to account for skew
     * optimization in the cost estimate, but for now, we don't.
     */
    ExecChooseHashTableSize(inner_path_rows_total,
                            inner_path->pathtarget->width,
                            true,           /* useskew */
                            parallel_hash,  /* try_combined_hash_mem */
                            outer_path->parallel_workers,
                            &space_allowed,
                            &numbuckets,
                            &numbatches,
                            &num_skew_mcvs);

    /*
     * If inner relation is too big then we will need to "batch" the join,
     * which implies writing and reading most of the tuples to disk an extra
     * time. Charge seq_page_cost per page, since the I/O should be nice and
     * sequential. Writing the inner rel counts as startup cost, all the rest
     * as run cost.
     */
    if (numbatches > 1)
    {
        double      outerpages = page_size(outer_path_rows,
                                           outer_path->pathtarget->width);
        double      innerpages = page_size(inner_path_rows,
                                           inner_path->pathtarget->width);

        startup_cost += seq_page_cost * innerpages;
        run_cost += seq_page_cost * (innerpages + 2 * outerpages);
    }

    /* CPU costs left for later */

    /* Public result fields */
    workspace->startup_cost = startup_cost;
    workspace->total_cost = startup_cost + run_cost;
    /* Save private data for final_cost_hashjoin */
    workspace->run_cost = run_cost;
    workspace->numbuckets = numbuckets;
    workspace->numbatches = numbatches;
    workspace->inner_rows_total = inner_path_rows_total;
}
```
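To tie the code above back to something visible, here is a hedged sketch (hypothetical tables `a` and `b`; numbers worked out from the default cost constants, and the plan shape may differ on your system) of where the startup term computed by `initial_cost_hashjoin` surfaces in EXPLAIN:

```
CREATE TABLE a AS SELECT generate_series(1, 10000) AS id;
CREATE TABLE b AS SELECT generate_series(1, 10000) AS id;
ANALYZE a;
ANALYZE b;

EXPLAIN SELECT * FROM a JOIN b USING (id);
--  Hash Join  (cost=270.00.. ...)   -- total cost is finished by final_cost_hashjoin
--    ->  Seq Scan on a  (cost=0.00..145.00 rows=10000 width=4)
--    ->  Hash  (cost=145.00..145.00 rows=10000 width=4)
--          ->  Seq Scan on b  (cost=0.00..145.00 rows=10000 width=4)
--
-- startup_cost 270 = inner path total cost (145)
--   + (cpu_operator_cost * 1 hash clause + cpu_tuple_cost) * 10000 inner rows
--   = 145 + (0.0025 + 0.01) * 10000 = 270
```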
#### [What features do you hope PostgreSQL | open-source PolarDB will add?](https://github.com/digoal/blog/issues/76 "269ac3d1c492e938c0191101c7238216")

#### [PolarDB open-source database](https://openpolardb.com/home "57258f76c37864c6e6d23383d05714ea")

#### [PolarDB learning map](https://www.aliyun.com/database/openpolardb/activity "8642f60e04ed0c814bf9cb9677976bd4")

#### [PolarDB cloud service discount campaign in progress, from 55 CNY](https://www.aliyun.com/activity/new/polardb-yunparter?userCode=bsb3t4al "e0495c413bedacabb75ff1e880be465a")

#### [PostgreSQL solutions collection](../201706/20170601_02.md "40cff096e9ed7122c512b35d8561d9c8")

#### [德哥 / digoal's Github - Public welfare is a lifelong commitment.](https://github.com/digoal/blog/blob/master/README.md "22709685feb7cab07d30f30387f0a9ae")

#### [About 德哥](https://github.com/digoal/blog/blob/master/me/readme.md "a37735981e7704886ffd590565582dd0")

![digoal's wechat](../pic/digoal_weixin.jpg "f7ad92eeba24523fd47a6e1a0e691b59")

202405/readme.md

Lines changed: 1 addition & 0 deletions

```
@@ -2,6 +2,7 @@

 ### Article list
 ----
+##### 20240511_01.md [《DB Gripe Session, Episode 96 - PG's optimizer is far too weak at explain debugging and at giving tuning advice on cost calculation | selectivity | optimization paths》](20240511_01.md)
 ##### 20240510_05.md [《DB Gripe Session, Episode 95 - PG wire protocol extensibility could be more open》](20240510_05.md)
 ##### 20240510_04.md [《DB Gripe Session, Episode 94 - the PG data lake (datalake) experience is rather poor》](20240510_04.md)
 ##### 20240510_03.md [《DB Gripe Session, Episode 93 - converting a PG unlogged table to logged writes large amounts of REDO/WAL》](20240510_03.md)
```

README.md

Lines changed: 1 addition & 0 deletions

```
@@ -98,6 +98,7 @@ digoal's|PostgreSQL|articles|categories

 ### All documents
 ----
+##### 202405/20240511_01.md [《DB Gripe Session, Episode 96 - PG's optimizer is far too weak at explain debugging and at giving tuning advice on cost calculation | selectivity | optimization paths》](202405/20240511_01.md)
 ##### 202405/20240510_05.md [《DB Gripe Session, Episode 95 - PG wire protocol extensibility could be more open》](202405/20240510_05.md)
 ##### 202405/20240510_04.md [《DB Gripe Session, Episode 94 - the PG data lake (datalake) experience is rather poor》](202405/20240510_04.md)
 ##### 202405/20240510_03.md [《DB Gripe Session, Episode 93 - converting a PG unlogged table to logged writes large amounts of REDO/WAL》](202405/20240510_03.md)
```
