
Commit 1eeec5d

new doc

1 parent 849d09d commit 1eeec5d

File tree

3 files changed: +242 −0 lines changed

202405/20240511_01.md

Lines changed: 240 additions & 0 deletions
## DB Gripe Session, Episode 96 - PG's optimizer is far too weak at explain debugging and at giving tuning advice on cost calculation | selectivity | optimization paths

### Author
digoal

### Date
2024-05-10

### Tags
PostgreSQL , PolarDB , DuckDB , explain , node , path , possibility , cost factor , selectivity , statistics , cost formula , cost constant

----

## Background
[Video replay]()
CBO (cost-based optimization) is the method database optimizers commonly use today to choose the optimal execution path.

The method is there, but whether you actually get the correct (optimal) execution path depends on the accuracy of the cost calculation.

For example, a single SQL statement may be able to use an index scan, a bitmap scan, or a full table scan. How is the cost of each of these three scan methods estimated? By formula. The result depends on:
- the design of the formula, which shapes the computed cost;
- whether the statistics the formula consumes are accurate (they determine the selectivity);
- the coefficients in the formula (cost factors such as the random page cost and the sequential page cost), i.e. the planner cost constants below:
```
# - Planner Cost Constants -
#seq_page_cost = 1.0                    # measured on an arbitrary scale
#random_page_cost = 4.0                 # same scale as above
#cpu_tuple_cost = 0.01                  # same scale as above
#cpu_index_tuple_cost = 0.005           # same scale as above
#cpu_operator_cost = 0.0025             # same scale as above
#parallel_setup_cost = 1000.0           # same scale as above
#parallel_tuple_cost = 0.1              # same scale as above
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
#jit_above_cost = 100000                # perform JIT compilation if available
                                        # and query more expensive than this;
                                        # -1 disables
#jit_inline_above_cost = 500000         # inline small functions if query is
                                        # more expensive than this; -1 disables
#jit_optimize_above_cost = 500000       # use expensive JIT optimizations if
                                        # query is more expensive than this;
                                        # -1 disables
```
- the constants hard-coded into the formula (such as the fixed overhead of a `hash join`).
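These cost factors are ordinary GUCs, so you can inspect and override them at runtime. A minimal sketch (session-level; the SSD value shown is a common rule of thumb, not a measured calibration):

```
-- Inspect the current planner cost constants.
SELECT name, setting
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost', 'cpu_tuple_cost',
               'cpu_index_tuple_cost', 'cpu_operator_cost');

-- Override for the current session only, e.g. a commonly used SSD setting.
SET random_page_cost = 1.1;
```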
Every operation involves a cost calculation: scans have a cost, computation has a cost, index selection has a cost, joins have a cost, and so on. Each operation's cost is computed by a different formula, which can be loosely written as `cost = startup cost + func(cost factors * statistics -> selectivity, constants)`.
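For a concrete instance of this shape, the sequential scan formula is simply `seq_page_cost * pages + cpu_tuple_cost * rows` (startup cost 0, no selectivity term). A minimal sketch you can reproduce, assuming default cost constants; the table is hypothetical and the page count can vary slightly across builds:

```
CREATE TABLE t1 AS SELECT generate_series(1, 100000) AS id;
ANALYZE t1;

SELECT relpages, reltuples FROM pg_class WHERE relname = 't1';
--  relpages | reltuples
-- ----------+-----------
--       443 |    100000

EXPLAIN SELECT * FROM t1;
--  Seq Scan on t1  (cost=0.00..1443.00 rows=100000 width=4)
-- total cost = seq_page_cost * relpages + cpu_tuple_cost * reltuples
--            = 1.0 * 443 + 0.01 * 100000 = 1443
```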
The cost formulas are hard-coded, the selectivity formulas are hard-coded, and the cost constants are hard-coded. So if you want to influence plan selection, what can a user actually do? (A sketch of knobs 1, 2 and 4 in action follows the list.)
- 1. Make the statistics more accurate. There is usually little room here; you can raise `default_statistics_target` to increase the number of histogram buckets.
- 2. Make the cost factors more accurate (strictly speaking, better matched to your hardware: on spinning disks `random_page_cost` is much larger than `seq_page_cost`, while on SSD `random_page_cost` is about the same as `seq_page_cost`).
- 3. Supply hints directly, e.g. with the pg_hint_plan extension (its hints are written as `/*+ ... */` comments).
- 4. Flip GUC switches to forbid certain paths, e.g. `enable_hashjoin=off enable_indexscan=off ...`
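A minimal sketch of knobs 1, 2 and 4 (all session-level here; the table, column and values are illustrative, carried over from the `t1` example above):

```
-- Knob 1: more histogram buckets, then refresh the statistics.
SET default_statistics_target = 1000;                -- default is 100
ALTER TABLE t1 ALTER COLUMN id SET STATISTICS 1000;  -- per-column variant
ANALYZE t1;

-- Knob 2: cost factors better matched to SSD.
SET random_page_cost = 1.1;

-- Knob 4: veto a path for this session and watch the plan change.
SET enable_hashjoin = off;
EXPLAIN SELECT * FROM t1 a JOIN t1 b USING (id);
```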
This episode mainly concerns knob 2. I have written before about calibrating the factors:
- [《Optimizer cost factor calibration - PostgreSQL explain cost constants alignment to timestamp》](../201311/20131126_03.md)
- [《Optimizer cost factor calibration (precise accounting of disk, ssd, memory IO costs) - PostgreSQL real seq_page_cost & random_page_cost in disks, ssd, memory》](../201404/20140423_01.md)
- [《DB Gripe Session, Episode 12 - No automatic cost calibrator》](../202108/20210830_03.md)
But what I want to gripe about this episode is that PG has no built-in profiling of the planner, which is exactly what makes factor calibration such a hassle. The gripes:
- 1. explain cannot print all candidate execution plans along with their costs, estimated row counts, and so on (the closest workaround today is to veto paths one at a time; see the sketch after this list).
- 2. It does not print, for each node: the cost formula, the values of the cost factors, the selectivity formula, the statistics and the computed selectivity, the constants baked into the code, etc.
- 3. After explain analyze, when several paths were possible and reality diverged from the cost estimate so that the wrong plan was chosen, it cannot tell you why: inaccurate statistics skewing the selectivity calculation, or wrong factor values skewing the cost calculation.
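For gripe 1, the closest thing available today is to veto the winning path and re-explain, which surfaces the runner-up and its estimated cost one plan at a time (query is illustrative):

```
BEGIN;
EXPLAIN SELECT * FROM t1 a JOIN t1 b USING (id);  -- shows only the winner, e.g. a Hash Join
SET LOCAL enable_hashjoin = off;
EXPLAIN SELECT * FROM t1 a JOIN t1 b USING (id);  -- now the runner-up, e.g. a Merge Join
ROLLBACK;                                         -- SET LOCAL vanishes with the transaction
```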
I wish PG could do the following; it would make the tuning experience much better.

Behind a switch at explain time, add the following output:
- 1. Print, for each node: the cost formula, the values of the cost factors, the selectivity formula, the statistics and the computed selectivity, the constants baked into the code, etc.
- 2. Print, for each alternative path (e.g. every join method): its cost, selectivity, and actual elapsed time.
- 3. Analyze how well the cost matches the actual elapsed time and report the cause (inaccurate statistics skewing the selectivity, or wrong factor values skewing the cost), plus a suggestion: raise `default_statistics_target`? Change cost factor xxx to what value? Today this comparison is entirely manual; a sketch of the manual version follows the list.
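The manual version of point 3 is to line up the estimated rows against the actual rows node by node; a large gap points at stale or too-coarse statistics. A minimal sketch (output abbreviated and illustrative, reusing the hypothetical `t1`):

```
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t1 WHERE id < 1000;
--  Seq Scan on t1  (cost=0.00..1693.00 rows=1022 width=4)
--                  (actual time=0.011..9.500 rows=999 loops=1)
-- estimated rows=1022 vs actual rows=999: statistics are fine here;
-- a 10x-100x gap is what should trigger a higher default_statistics_target.
```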
Some code related to the cost calculation:

https://doxygen.postgresql.org/structJoinCostWorkspace.html
```
Data Fields
Cost        startup_cost
Cost        total_cost
Cost        run_cost
Cost        inner_run_cost
Cost        inner_rescan_run_cost
Cardinality outer_rows
Cardinality inner_rows
Cardinality outer_skip_rows
Cardinality inner_skip_rows
int         numbuckets
int         numbatches
Cardinality inner_rows_total
```
https://doxygen.postgresql.org/costsize_8c_source.html
```
/*
 * initial_cost_hashjoin
 *    Preliminary estimate of the cost of a hashjoin path.
 *
 * This must quickly produce lower-bound estimates of the path's startup and
 * total costs.  If we are unable to eliminate the proposed path from
 * consideration using the lower bounds, final_cost_hashjoin will be called
 * to obtain the final estimates.
 *
 * The exact division of labor between this function and final_cost_hashjoin
 * is private to them, and represents a tradeoff between speed of the initial
 * estimate and getting a tight lower bound.  We choose to not examine the
 * join quals here (other than by counting the number of hash clauses),
 * so we can't do much with CPU costs.  We do assume that
 * ExecChooseHashTableSize is cheap enough to use here.
 *
 * 'workspace' is to be filled with startup_cost, total_cost, and perhaps
 *    other data to be used by final_cost_hashjoin
 * 'jointype' is the type of join to be performed
 * 'hashclauses' is the list of joinclauses to be used as hash clauses
 * 'outer_path' is the outer input to the join
 * 'inner_path' is the inner input to the join
 * 'extra' contains miscellaneous information about the join
 * 'parallel_hash' indicates that inner_path is partial and that a shared
 *    hash table will be built in parallel
 */
void
initial_cost_hashjoin(PlannerInfo *root, JoinCostWorkspace *workspace,
                      JoinType jointype,
                      List *hashclauses,
                      Path *outer_path, Path *inner_path,
                      JoinPathExtraData *extra,
                      bool parallel_hash)
{
    Cost        startup_cost = 0;
    Cost        run_cost = 0;
    double      outer_path_rows = outer_path->rows;
    double      inner_path_rows = inner_path->rows;
    double      inner_path_rows_total = inner_path_rows;
    int         num_hashclauses = list_length(hashclauses);
    int         numbuckets;
    int         numbatches;
    int         num_skew_mcvs;
    size_t      space_allowed;  /* unused */

    /* cost of source data */
    startup_cost += outer_path->startup_cost;
    run_cost += outer_path->total_cost - outer_path->startup_cost;
    startup_cost += inner_path->total_cost;

    /*
     * Cost of computing hash function: must do it once per input tuple. We
     * charge one cpu_operator_cost for each column's hash function. Also,
     * tack on one cpu_tuple_cost per inner row, to model the costs of
     * inserting the row into the hashtable.
     *
     * XXX when a hashclause is more complex than a single operator, we really
     * should charge the extra eval costs of the left or right side, as
     * appropriate, here. This seems more work than it's worth at the moment.
     */
    startup_cost += (cpu_operator_cost * num_hashclauses + cpu_tuple_cost)
        * inner_path_rows;
    run_cost += cpu_operator_cost * num_hashclauses * outer_path_rows;

    /*
     * If this is a parallel hash build, then the value we have for
     * inner_rows_total currently refers only to the rows returned by each
     * participant. For shared hash table size estimation, we need the total
     * number, so we need to undo the division.
     */
    if (parallel_hash)
        inner_path_rows_total *= get_parallel_divisor(inner_path);

    /*
     * Get hash table size that executor would use for inner relation.
     *
     * XXX for the moment, always assume that skew optimization will be
     * performed. As long as SKEW_HASH_MEM_PERCENT is small, it's not worth
     * trying to determine that for sure.
     *
     * XXX at some point it might be interesting to try to account for skew
     * optimization in the cost estimate, but for now, we don't.
     */
    ExecChooseHashTableSize(inner_path_rows_total,
                            inner_path->pathtarget->width,
                            true,           /* useskew */
                            parallel_hash,  /* try_combined_hash_mem */
                            outer_path->parallel_workers,
                            &space_allowed,
                            &numbuckets,
                            &numbatches,
                            &num_skew_mcvs);

    /*
     * If inner relation is too big then we will need to "batch" the join,
     * which implies writing and reading most of the tuples to disk an extra
     * time. Charge seq_page_cost per page, since the I/O should be nice and
     * sequential. Writing the inner rel counts as startup cost, all the rest
     * as run cost.
     */
    if (numbatches > 1)
    {
        double      outerpages = page_size(outer_path_rows,
                                           outer_path->pathtarget->width);
        double      innerpages = page_size(inner_path_rows,
                                           inner_path->pathtarget->width);

        startup_cost += seq_page_cost * innerpages;
        run_cost += seq_page_cost * (innerpages + 2 * outerpages);
    }

    /* CPU costs left for later */

    /* Public result fields */
    workspace->startup_cost = startup_cost;
    workspace->total_cost = startup_cost + run_cost;
    /* Save private data for final_cost_hashjoin */
    workspace->run_cost = run_cost;
    workspace->numbuckets = numbuckets;
    workspace->numbatches = numbatches;
    workspace->inner_rows_total = inner_path_rows_total;
}
```
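To tie the code above back to something visible, here is a hedged sketch (hypothetical tables `a` and `b`; numbers worked out from the default cost constants, and the plan shape may differ on your system) of where the startup term computed by `initial_cost_hashjoin` surfaces in EXPLAIN:

```
CREATE TABLE a AS SELECT generate_series(1, 10000) AS id;
CREATE TABLE b AS SELECT generate_series(1, 10000) AS id;
ANALYZE a;
ANALYZE b;

EXPLAIN SELECT * FROM a JOIN b USING (id);
--  Hash Join  (cost=270.00.. ...)   -- total cost is finished by final_cost_hashjoin
--    ->  Seq Scan on a  (cost=0.00..145.00 rows=10000 width=4)
--    ->  Hash  (cost=145.00..145.00 rows=10000 width=4)
--          ->  Seq Scan on b  (cost=0.00..145.00 rows=10000 width=4)
--
-- startup_cost 270 = inner path total cost (145)
--   + (cpu_operator_cost * 1 hash clause + cpu_tuple_cost) * 10000 inner rows
--   = 145 + (0.0025 + 0.01) * 10000 = 270
```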
#### [What features do you hope PostgreSQL | open-source PolarDB will add?](https://github.com/digoal/blog/issues/76 "269ac3d1c492e938c0191101c7238216")

#### [PolarDB open-source database](https://openpolardb.com/home "57258f76c37864c6e6d23383d05714ea")

#### [PolarDB learning map](https://www.aliyun.com/database/openpolardb/activity "8642f60e04ed0c814bf9cb9677976bd4")

#### [PolarDB cloud service discount campaign in progress, from 55 CNY](https://www.aliyun.com/activity/new/polardb-yunparter?userCode=bsb3t4al "e0495c413bedacabb75ff1e880be465a")

#### [PostgreSQL solutions collection](../201706/20170601_02.md "40cff096e9ed7122c512b35d8561d9c8")

#### [德哥 / digoal's Github - Public welfare is a lifelong commitment.](https://github.com/digoal/blog/blob/master/README.md "22709685feb7cab07d30f30387f0a9ae")

#### [About 德哥](https://github.com/digoal/blog/blob/master/me/readme.md "a37735981e7704886ffd590565582dd0")

![digoal's wechat](../pic/digoal_weixin.jpg "f7ad92eeba24523fd47a6e1a0e691b59")

202405/readme.md

Lines changed: 1 addition & 0 deletions

```
@@ -2,6 +2,7 @@

 ### Article list
 ----
+##### 20240511_01.md [《DB Gripe Session, Episode 96 - PG's optimizer is far too weak at explain debugging and at giving tuning advice on cost calculation | selectivity | optimization paths》](20240511_01.md)
 ##### 20240510_05.md [《DB Gripe Session, Episode 95 - PG wire protocol extensibility could be more open》](20240510_05.md)
 ##### 20240510_04.md [《DB Gripe Session, Episode 94 - the PG data lake (datalake) experience is rather poor》](20240510_04.md)
 ##### 20240510_03.md [《DB Gripe Session, Episode 93 - converting a PG unlogged table to logged writes large amounts of REDO/WAL》](20240510_03.md)
```

README.md

Lines changed: 1 addition & 0 deletions

```
@@ -98,6 +98,7 @@ digoal's|PostgreSQL|articles|categories

 ### All documents
 ----
+##### 202405/20240511_01.md [《DB Gripe Session, Episode 96 - PG's optimizer is far too weak at explain debugging and at giving tuning advice on cost calculation | selectivity | optimization paths》](202405/20240511_01.md)
 ##### 202405/20240510_05.md [《DB Gripe Session, Episode 95 - PG wire protocol extensibility could be more open》](202405/20240510_05.md)
 ##### 202405/20240510_04.md [《DB Gripe Session, Episode 94 - the PG data lake (datalake) experience is rather poor》](202405/20240510_04.md)
 ##### 202405/20240510_03.md [《DB Gripe Session, Episode 93 - converting a PG unlogged table to logged writes large amounts of REDO/WAL》](202405/20240510_03.md)
```
