## DB吐槽大会, Episode 96 - PG optimizer's explain debugging and its tuning-advice output (cost calculation | selectivity | optimization paths) are too weak

### Author
digoal

### Date
2024-05-10

### Tags
PostgreSQL , PolarDB , DuckDB , explain , node , path , possibility , cost factor , selectivity , statistics , cost formula , cost constant
----

## Background
[Video replay]()

CBO (cost-based optimization) is the method database optimizers commonly use today to choose the optimal execution path.

The method is there, but whether it actually yields the correct (optimal) execution path depends on how accurately costs are computed.

For example, a single SQL statement may be able to use an index scan, a bitmap scan, or a full table scan. How is the cost of each of these three scan methods estimated? By formula:
- The design of the formula determines the computed cost,
  - how accurate the statistics the formula reads are (they determine selectivity; see the pg_stats query after this list),
  - the coefficients inside the formula (cost factors such as the random page cost and the sequential page cost),
  ```
  # - Planner Cost Constants -
  #seq_page_cost = 1.0                   # measured on an arbitrary scale
  #random_page_cost = 4.0                # same scale as above
  #cpu_tuple_cost = 0.01                 # same scale as above
  #cpu_index_tuple_cost = 0.005          # same scale as above
  #cpu_operator_cost = 0.0025            # same scale as above
  #parallel_setup_cost = 1000.0          # same scale as above
  #parallel_tuple_cost = 0.1             # same scale as above
  #min_parallel_table_scan_size = 8MB
  #min_parallel_index_scan_size = 512kB
  #effective_cache_size = 4GB
  #jit_above_cost = 100000               # perform JIT compilation if available
                                         # and query more expensive than this;
                                         # -1 disables
  #jit_inline_above_cost = 500000        # inline small functions if query is
                                         # more expensive than this; -1 disables
  #jit_optimize_above_cost = 500000      # use expensive JIT optimizations if
                                         # query is more expensive than this;
                                         # -1 disables
  ```
  - the constants baked into the formula (such as the fixed, hard-coded overhead charged to a `hash join`).
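
The statistics behind the selectivity estimates are visible in the `pg_stats` view. A minimal sketch (`your_table` is just a placeholder; substitute one of your own tables):

```
-- what the selectivity formulas read: per-column statistics collected by ANALYZE
SELECT attname, null_frac, n_distinct, correlation,
       most_common_vals, most_common_freqs, histogram_bounds
FROM pg_stats
WHERE tablename = 'your_table';
```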

Every operation involves a cost calculation: scans have a cost, expression evaluation has a cost, index selection has a cost, joins have a cost, and so on. Each operation has its own formula. Roughly, a formula can be expressed as `cost = startup cost + func(cost factor * statistics->selectivity, constants)`.
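
To make this concrete, here is a minimal sketch of the simplest case, the sequential-scan formula `total cost = seq_page_cost * relpages + cpu_tuple_cost * reltuples` (the table `cost_demo` is created only for the demo; the exact page count, and therefore the cost, varies slightly by platform):

```
-- build a throwaway table and collect statistics
CREATE TABLE cost_demo AS SELECT generate_series(1, 100000) AS id;
ANALYZE cost_demo;

-- the inputs the formula reads
SELECT relpages, reltuples FROM pg_class WHERE relname = 'cost_demo';
-- e.g. relpages = 443, reltuples = 100000

-- with the default constants seq_page_cost = 1.0 and cpu_tuple_cost = 0.01:
--   total cost = 1.0 * 443 + 0.01 * 100000 = 1443.00
EXPLAIN SELECT * FROM cost_demo;
-- Seq Scan on cost_demo  (cost=0.00..1443.00 rows=100000 width=4)
```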

The cost formulas are hard-coded, the row-selectivity formulas are hard-coded, and the cost constants are hard-coded. So what can a user actually do to influence plan choice? (A sketch of each lever follows this list.)
- 1. Make the statistics more accurate. There is usually not much you can do here beyond raising `default_statistics_target` to increase the number of histogram buckets.
- 2. Make the cost factors more accurate (strictly speaking, make them fit your hardware: on spinning disks random_page_cost should be much larger than seq_page_cost, while on SSD random_page_cost is close to seq_page_cost).
- 3. Give hints directly, e.g. with the pg_hint_plan extension.
- 4. Flip GUC switches to forbid certain paths, e.g. `enable_hashjoin=off enable_indexscan=off ...`.
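
A minimal sketch of those four levers (all table, column, and index names are placeholders, and item 3 assumes pg_hint_plan is installed):

```
-- 1. better statistics: raise the per-column target, then re-collect
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;

-- 2. cost factors matched to the hardware (SSD: random reads cost about the same as sequential)
SET random_page_cost = 1.1;

-- 3. a hint via the pg_hint_plan extension
/*+ IndexScan(orders orders_customer_id_idx) */
SELECT * FROM orders WHERE customer_id = 42;

-- 4. GUC switches that forbid whole classes of paths
SET enable_hashjoin = off;
SET enable_indexscan = off;
```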

This article mainly discusses item 2. I have written a few articles before on how to calibrate the factors (a rough calibration sketch follows the list below):
- [《Optimizer cost-factor calibration - PostgreSQL explain cost constants alignment to timestamp》](../201311/20131126_03.md)
- [《Optimizer cost-factor calibration (exact accounting of disk, SSD and memory IO overhead) - PostgreSQL real seq_page_cost & random_page_cost in disks, ssd, memory》](../201404/20140423_01.md)
- [《DB吐槽大会, Episode 12 - no automatic cost calibrator》](../202108/20210830_03.md)
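
One rough way to calibrate by hand (a sketch only, not necessarily the exact procedure in the articles above; `big_table` and its primary-key index are placeholders): measure the per-page time of a full sequential scan versus scattered index lookups, and set `random_page_cost` to roughly the ratio of the two.

```
-- sequential side: pages touched vs. elapsed time of a full scan
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM big_table;

-- random side: pages touched vs. elapsed time of scattered index lookups
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM big_table WHERE id = ANY (ARRAY[1, 500000, 999999]);

-- random_page_cost ≈ (random ms per page) / (sequential ms per page),
-- measured on a cold cache; repeat a few times and take the median
```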

But what I want to complain about in this episode is that PG has no built-in profiling capability, which is exactly what makes factor calibration so awkward. The gripes:
- 1. explain cannot print all candidate execution plans along with their costs, estimated row counts, and so on.
- 2. It does not print, for each node: the cost formula, the cost-factor values, the selectivity formula, the statistics and the resulting selectivity, the constants built into the code, etc.
- 3. After explain analyze, when there are multiple candidate paths and reality diverges from the estimates so that the wrong plan was chosen, it cannot tell you why: inaccurate statistics skewing the selectivity estimate, or incorrect cost-factor values skewing the cost calculation.

I hope PG can do the following; it would make the tuning experience much better.

Add explain switches that control the following extra output (the closest existing approximations are sketched after this list):
- 1. Print, for each node: the cost formula, the cost-factor values, the selectivity formula, the statistics and the resulting selectivity, the constants built into the code, etc.
- 2. Print, for every alternative path (e.g. each join method): its cost, its selectivity, and its actual elapsed time.
- 3. Analyze how well the cost matches the actual time and explain the cause: inaccurate statistics skewing selectivity, or incorrect cost-factor values skewing the cost. Then make a recommendation: raise `default_statistics_target`? Change such-and-such cost factor to what value?
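
None of this exists today. The closest approximations I know of are sketched below, and they expose far less than the wish list above:

```
-- debug_print_plan dumps the chosen plan tree (per-node startup/total cost and
-- row estimates) to the log, but says nothing about the rejected paths
SET client_min_messages = log;   -- or read the server log instead
SET debug_print_plan = on;
SELECT count(*) FROM pg_class;   -- plan tree appears as a LOG message

-- seeing every candidate path the planner considered currently requires a
-- server compiled with the OPTIMIZER_DEBUG option, which is not practical
-- on a production build
```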

Some code related to cost calculation:

https://doxygen.postgresql.org/structJoinCostWorkspace.html
```
Data Fields
Cost        startup_cost
Cost        total_cost
Cost        run_cost
Cost        inner_run_cost
Cost        inner_rescan_run_cost
Cardinality outer_rows
Cardinality inner_rows
Cardinality outer_skip_rows
Cardinality inner_skip_rows
int         numbuckets
int         numbatches
Cardinality inner_rows_total
```

https://doxygen.postgresql.org/costsize_8c_source.html
```
/*
 * initial_cost_hashjoin
 *    Preliminary estimate of the cost of a hashjoin path.
 *
 * This must quickly produce lower-bound estimates of the path's startup and
 * total costs.  If we are unable to eliminate the proposed path from
 * consideration using the lower bounds, final_cost_hashjoin will be called
 * to obtain the final estimates.
 *
 * The exact division of labor between this function and final_cost_hashjoin
 * is private to them, and represents a tradeoff between speed of the initial
 * estimate and getting a tight lower bound.  We choose to not examine the
 * join quals here (other than by counting the number of hash clauses),
 * so we can't do much with CPU costs.  We do assume that
 * ExecChooseHashTableSize is cheap enough to use here.
 *
 * 'workspace' is to be filled with startup_cost, total_cost, and perhaps
 *      other data to be used by final_cost_hashjoin
 * 'jointype' is the type of join to be performed
 * 'hashclauses' is the list of joinclauses to be used as hash clauses
 * 'outer_path' is the outer input to the join
 * 'inner_path' is the inner input to the join
 * 'extra' contains miscellaneous information about the join
 * 'parallel_hash' indicates that inner_path is partial and that a shared
 *      hash table will be built in parallel
 */
void
initial_cost_hashjoin(PlannerInfo *root, JoinCostWorkspace *workspace,
                      JoinType jointype,
                      List *hashclauses,
                      Path *outer_path, Path *inner_path,
                      JoinPathExtraData *extra,
                      bool parallel_hash)
{
    Cost        startup_cost = 0;
    Cost        run_cost = 0;
    double      outer_path_rows = outer_path->rows;
    double      inner_path_rows = inner_path->rows;
    double      inner_path_rows_total = inner_path_rows;
    int         num_hashclauses = list_length(hashclauses);
    int         numbuckets;
    int         numbatches;
    int         num_skew_mcvs;
    size_t      space_allowed;  /* unused */

    /* cost of source data */
    startup_cost += outer_path->startup_cost;
    run_cost += outer_path->total_cost - outer_path->startup_cost;
    startup_cost += inner_path->total_cost;

    /*
     * Cost of computing hash function: must do it once per input tuple. We
     * charge one cpu_operator_cost for each column's hash function.  Also,
     * tack on one cpu_tuple_cost per inner row, to model the costs of
     * inserting the row into the hashtable.
     *
     * XXX when a hashclause is more complex than a single operator, we really
     * should charge the extra eval costs of the left or right side, as
     * appropriate, here.  This seems more work than it's worth at the moment.
     */
    startup_cost += (cpu_operator_cost * num_hashclauses + cpu_tuple_cost)
        * inner_path_rows;
    run_cost += cpu_operator_cost * num_hashclauses * outer_path_rows;

    /*
     * If this is a parallel hash build, then the value we have for
     * inner_rows_total currently refers only to the rows returned by each
     * participant.  For shared hash table size estimation, we need the total
     * number, so we need to undo the division.
     */
    if (parallel_hash)
        inner_path_rows_total *= get_parallel_divisor(inner_path);

    /*
     * Get hash table size that executor would use for inner relation.
     *
     * XXX for the moment, always assume that skew optimization will be
     * performed.  As long as SKEW_HASH_MEM_PERCENT is small, it's not worth
     * trying to determine that for sure.
     *
     * XXX at some point it might be interesting to try to account for skew
     * optimization in the cost estimate, but for now, we don't.
     */
    ExecChooseHashTableSize(inner_path_rows_total,
                            inner_path->pathtarget->width,
                            true,   /* useskew */
                            parallel_hash,  /* try_combined_hash_mem */
                            outer_path->parallel_workers,
                            &space_allowed,
                            &numbuckets,
                            &numbatches,
                            &num_skew_mcvs);

    /*
     * If inner relation is too big then we will need to "batch" the join,
     * which implies writing and reading most of the tuples to disk an extra
     * time.  Charge seq_page_cost per page, since the I/O should be nice and
     * sequential.  Writing the inner rel counts as startup cost, all the rest
     * as run cost.
     */
    if (numbatches > 1)
    {
        double      outerpages = page_size(outer_path_rows,
                                           outer_path->pathtarget->width);
        double      innerpages = page_size(inner_path_rows,
                                           inner_path->pathtarget->width);

        startup_cost += seq_page_cost * innerpages;
        run_cost += seq_page_cost * (innerpages + 2 * outerpages);
    }

    /* CPU costs left for later */

    /* Public result fields */
    workspace->startup_cost = startup_cost;
    workspace->total_cost = startup_cost + run_cost;
    /* Save private data for final_cost_hashjoin */
    workspace->run_cost = run_cost;
    workspace->numbuckets = numbuckets;
    workspace->numbatches = numbatches;
    workspace->inner_rows_total = inner_path_rows_total;
}
```
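
As a rough illustration of how the startup-cost arithmetic above surfaces in EXPLAIN, here is a sketch with throwaway tables (page counts, and therefore the exact numbers, vary by platform):

```
CREATE TABLE t_outer AS SELECT generate_series(1, 100000) AS id;
CREATE TABLE t_inner AS SELECT generate_series(1, 1000)   AS id;
ANALYZE t_outer;
ANALYZE t_inner;

-- make sure the demo actually uses the hash join path
SET enable_nestloop = off;
SET enable_mergejoin = off;

EXPLAIN SELECT * FROM t_outer o JOIN t_inner i USING (id);

-- for a single-batch hash join with one hash clause, the code above charges:
--   startup_cost ≈ inner seq scan total cost
--                + (cpu_operator_cost * 1 + cpu_tuple_cost) * inner rows
-- with default constants and roughly 5 pages in t_inner:
--   inner seq scan ≈ 1.0 * 5 + 0.01 * 1000  = 15.00
--   hash build     ≈ (0.0025 + 0.01) * 1000 = 12.50
--   startup        ≈ 27.50  (the first number shown on the Hash Join node)
```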


#### [What features do you wish PostgreSQL | open-source PolarDB would add?](https://github.com/digoal/blog/issues/76 "269ac3d1c492e938c0191101c7238216")


#### [PolarDB open-source database](https://openpolardb.com/home "57258f76c37864c6e6d23383d05714ea")


#### [PolarDB learning roadmap](https://www.aliyun.com/database/openpolardb/activity "8642f60e04ed0c814bf9cb9677976bd4")


#### [PolarDB cloud service discount is running now, from 55 CNY](https://www.aliyun.com/activity/new/polardb-yunparter?userCode=bsb3t4al "e0495c413bedacabb75ff1e880be465a")


#### [PostgreSQL solutions collection](../201706/20170601_02.md "40cff096e9ed7122c512b35d8561d9c8")


#### [德哥 / digoal's Github - Public service is a lifelong commitment.](https://github.com/digoal/blog/blob/master/README.md "22709685feb7cab07d30f30387f0a9ae")


#### [About 德哥](https://github.com/digoal/blog/blob/master/me/readme.md "a37735981e7704886ffd590565582dd0")
