Skip to content

Commit 6859e2f

Browse files
authored
Update 20210922_03.md
1 parent 4ac4755 commit 6859e2f

File tree

1 file changed

+105
-13
lines changed

1 file changed

+105
-13
lines changed

202109/20210922_03.md

Lines changed: 105 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -20,26 +20,27 @@ PostgreSQL , awr , 分析 , baseline , benchmark
2020
- 为什么要定义基准? 有了基准才有度量标准, 不会出现为了优化而优化的情况, 走火入魔.
2121
- [《产品经理:依存度、规模化、可度量、周期性》](../202012/20201225_02.md)
2222
- 宏观分析较弱
23-
- `pg_stat_statements` 分析出来TOP SQL, 它合理不合理都需要深入分析.
23+
- `pg_stat_statements` 分析出来TOP SQL, 它合理不合理都需要深入分析. 例如总耗时最多的sql, 它是否已经优化到极限了? 其实不看一下执行计划也不好下判断. 是不是可以加一个维度, SQL还有多大的优化提升空间?
2424
- 缺乏`p99, p90`这类指标
2525
- 微观分析较弱
26-
- 例如连接数激增, 是业务行为还是RT抖动、锁等待导致?
26+
- 例如连接数激增, 是业务行为还是RT抖动、锁等待导致?
2727
- SQL性能问题如何分析? 怎么优化?
28-
- 缺乏如何优化的报告
29-
- 缺乏可提升多少的预测
28+
- 缺乏报告来指出每一条SQL应该如何优化.
29+
- 缺乏可提升多少性能的预测(例如展示耗时、IO、CPU开销、cost等维度的提升预期)
3030
- 为什么需要可预期? 只有可预期的目标才不是盲盒, 才不会有惊讶, 才能用于决策到底要不要实施优化. 预期目标需要有数据支撑、逻辑支撑.
3131
- PG 性能问题发现和分析能力较弱 2
32-
- 没有内置的 AWR, 较难发现宏观问题 (等待事件)
33-
- 没有内置的 性能洞察, 较难发现指定时间段的问题
32+
- 没有内置的 AWR, 较难发现宏观问题 (例如等待事件)
33+
- 没有内置的 性能洞察, 较难发现指定时间段的问题, 特别是业务曾经出现抖动的时间段.
3434
- 缺少 trace功能, 类似Oracle (10046, 10053) , 较难诊断SQL问题. trace dev para, auto explain, rsq, lock, query, iotiming, ...
35-
- 很多诊断需要编译时定义宏 `TRACE_SORT、LOCK_DEBUG、BTREE_BUILD_STATS、WAL_DEBUG`
35+
- 很多诊断需要在编译时定义宏(`pg_config_manual.h`) `TRACE_SORT、LOCK_DEBUG、BTREE_BUILD_STATS、WAL_DEBUG`
3636
- https://www.postgresql.org/docs/14/runtime-config-developer.html
37-
- 内置的probe使用门槛非常高, 需要开启debug 、需要使用dtrace或者stap 设置探针进行分析
37+
- 内置的probe使用门槛非常高, 需要开启debug 、需要使用dtrace或者stap(systemtap) 设置探针进行分析
3838
- 只有重量锁等待日志打印, 缺少LW锁等待统计, 并且没有视图分析SQL级别的等待事件, 等待事件需要到log中查询,
3939
- 开启 `log_lock_waits` , 并且只 记录超过 `deadlock_timeout` 的SQL.
4040
- 对SQL的单点问题分析较弱, SQL在过去发生的问题很难分析. (是执行计划的问题、锁等待的问题、还是资源竞争的问题?)
4141
- 只有锁等待可能被记录下来, 执行计划不会被记录, 每个执行node花费的时间、扫描的blocks、返回的记录数, OP耗费的时间等都需要记录分析 :
42-
- 执行计划的记录需要开启auto explain , 设置执行超时阈值, 并且需要等待问题再次发生, 而且不能针对单个sql的超时时间进行阈值设置, 只能设置全局阈值. 每条SQL的执行时长需求不一样, 单个阈值无法满足需求. (例如有些SQL就是分析型的, 本身就慢. )
42+
- 执行计划的记录需要开启auto explain , 设置执行超时阈值, 并且需要等待问题再次发生, 而且不能针对单个sql的超时时间进行阈值设置, 只能设置全局阈值. 每条SQL的执行时长需求不一样, 单个阈值无法满足需求. (例如有些SQL就是分析型的, 本身就慢. )
43+
- 对优化/诊断感兴趣可以阅读本文相关章节: [《2019-PostgreSQL 2天体系化培训 - 适合DBA》](../201901/20190105_01.md)
4344

4445
2、问题点背后涉及的技术原理
4546
- 1 基准是什么? 如何定义?
@@ -88,7 +89,8 @@ PostgreSQL , awr , 分析 , baseline , benchmark
8889
- [《Systemtap EXP: PostgreSQL IN-BUILD mark Class 5 - read|write relation》](../201310/20131016_05.md)
8990
- pg_profile 基准采集和对比插件, 但是还不够完善, 图表也不够美观(很难发现问题所在).
9091
- https://github.com/zubkov-andrei/pg_profile
91-
92+
- 很多企业被逼自己开发PG的监控、诊断、优化、审计等工具, 例如海信聚好看的dbdoctor已商业化(也有个人免费版), pawsql, d-smart等. 感兴趣的朋友可以找找.
93+
9294
6、业务上避免这个坑牺牲了什么, 会引入什么新的问题
9395
- 门槛非常高, 而且有些需求不能很好的解决:
9496
- 基准
@@ -99,9 +101,99 @@ PostgreSQL , awr , 分析 , baseline , benchmark
99101
- debug编译、宏、`auto_explain`、iotiming都会引入开销
100102

101103
7、数据库未来产品迭代如何修复这个坑
102-
- 希望全方位内置基准、性能洞察、分析、自动化推荐优化方法、优化效果预测等能力.
103-
104-
104+
- 希望全方位内置基准、性能洞察、分析、自动化推荐优化方法、优化效果预测等能力.
105+
106+
107+
systemtap参考:
108+
##### 201810/20181025_01.md [[转载] systemtap 跟踪分析 PostgreSQL》](../201810/20181025_01.md)
109+
##### 201505/20150509_01.md [《PostgreSQL 代码性能诊断之 - OProfile & Systemtap》](../201505/20150509_01.md)
110+
##### 201311/20131127_01.md [《stap trace blockdev's iops》](../201311/20131127_01.md)
111+
##### 201311/20131126_02.md [《USE blockdev --setra 0 and systemtap test real BLOCKDEV iops》](../201311/20131126_02.md)
112+
##### 201311/20131126_01.md [《设置进程亲和 - numactl 或 taskset - retrieve or set a process's CPU affinity (affect SYSTEMTAP TIME)》](../201311/20131126_01.md)
113+
##### 201311/20131121_02.md [《Systemtap examples, Identifying Contended User-Space Locks》](../201311/20131121_02.md)
114+
##### 201311/20131121_01.md [《Systemtap examples, Profiling - 6 Tracking System Call Volume Per Process》](../201311/20131121_01.md)
115+
##### 201311/20131120_03.md [《Systemtap examples, Profiling - 5 Tracking Most Frequently Used System Calls》](../201311/20131120_03.md)
116+
##### 201311/20131120_02.md [《Systemtap examples, Profiling - 4 Monitoring Polling Applications》](../201311/20131120_02.md)
117+
##### 201311/20131120_01.md [《Systemtap examples, Profiling - 3 Determining Time Spent in Kernel and User Space》](../201311/20131120_01.md)
118+
##### 201311/20131119_06.md [《Systemtap examples, Profiling - 2 Call Graph Tracing》](../201311/20131119_06.md)
119+
##### 201311/20131119_05.md [《Systemtap examples, Profiling - 1 Counting Function Calls Made》](../201311/20131119_05.md)
120+
##### 201311/20131119_04.md [《Systemtap examples, DISK IO - 7 Periodically Print I/O Block Time》](../201311/20131119_04.md)
121+
##### 201311/20131119_03.md [《Systemtap examples, DISK IO - 6 Monitoring Changes to File Attributes》](../201311/20131119_03.md)
122+
##### 201311/20131119_02.md [《Systemtap examples, DISK IO - 5 Monitoring Reads and Writes to a File》](../201311/20131119_02.md)
123+
##### 201311/20131119_01.md [《Systemtap examples, DISK IO - 4 I/O Monitoring (By Device)》](../201311/20131119_01.md)
124+
##### 201311/20131118_02.md [《Systemtap examples, DISK IO - 3 Track Cumulative IO》](../201311/20131118_02.md)
125+
##### 201311/20131118_01.md [《Systemtap examples, DISK IO - 2 Tracking I/O Time For Each File Read or Write》](../201311/20131118_01.md)
126+
##### 201311/20131115_01.md [《Systemtap examples, DISK IO - 1 Summarizing Disk Read/Write Traffic》](../201311/20131115_01.md)
127+
##### 201311/20131114_06.md [《Systemtap kernel.trace("\*") events source code》](../201311/20131114_06.md)
128+
##### 201311/20131114_05.md [《Systemtap examples, Network - 5 Monitoring Network Packets Drops in Kernel》](../201311/20131114_05.md)
129+
##### 201311/20131114_04.md [《Systemtap examples, Network - 4 Monitoring TCP Packets》](../201311/20131114_04.md)
130+
##### 201311/20131114_03.md [《Systemtap examples, Network - 3 Monitoring Incoming TCP Connections》](../201311/20131114_03.md)
131+
##### 201311/20131114_02.md [《Systemtap examples, Network - 2 Tracing Functions Called in Network Socket Code》](../201311/20131114_02.md)
132+
##### 201311/20131114_01.md [《Systemtap examples, Network - 1 Network Profiling》](../201311/20131114_01.md)
133+
##### 201311/20131112_01.md [《SystemTap Errors Introduce》](../201311/20131112_01.md)
134+
##### 201311/20131111_01.md [《SystemTap User-Space Stack Backtraces for x86 processors arch only》](../201311/20131111_01.md)
135+
##### 201311/20131107_01.md [《Systemtap Function thread_indent:string(delta:long)》](../201311/20131107_01.md)
136+
##### 201311/20131106_01.md [《SystemTap Flight Recorder Mode》](../201311/20131106_01.md)
137+
##### 201310/20131018_03.md [《PostgreSQL Dynamic Tracing using systemtap env prepare》](../201310/20131018_03.md)
138+
##### 201310/20131018_02.md [《Systemtap: PostgreSQL probe, USE @var("varname") or $varname get all local and global variables》](../201310/20131018_02.md)
139+
##### 201310/20131018_01.md [《Systemtap EXP: fix process probe global variables output BUG?(PostgreSQL checkpoint__done)》](../201310/20131018_01.md)
140+
##### 201310/20131017_04.md [《SystemTap Tapset: common used functions - 2》](../201310/20131017_04.md)
141+
##### 201310/20131017_03.md [《SystemTap Tapset: common used functions - 1》](../201310/20131017_03.md)
142+
##### 201310/20131017_02.md [《Systemtap EXP: PostgreSQL IN-BUILD mark Class 7 - others(statement,xlog,sort)》](../201310/20131017_02.md)
143+
##### 201310/20131017_01.md [《Systemtap EXP: PostgreSQL IN-BUILD mark Class 6 - lock》](../201310/20131017_01.md)
144+
##### 201310/20131016_05.md [《Systemtap EXP: PostgreSQL IN-BUILD mark Class 5 - read|write relation》](../201310/20131016_05.md)
145+
##### 201310/20131016_04.md [《Systemtap EXP: PostgreSQL IN-BUILD mark Class 4 - buffer》](../201310/20131016_04.md)
146+
##### 201310/20131016_03.md [《Systemtap EXP: PostgreSQL IN-BUILD mark Class 3 - checkpoint》](../201310/20131016_03.md)
147+
##### 201310/20131016_02.md [《Systemtap EXP: PostgreSQL IN-BUILD mark Class 2 - query》](../201310/20131016_02.md)
148+
##### 201310/20131016_01.md [《Systemtap EXP: PostgreSQL IN-BUILD mark Class 1 - transaction》](../201310/20131016_01.md)
149+
##### 201310/20131015_05.md [《Systemtap EXP: trace PostgreSQL netflow per session or per sql》](../201310/20131015_05.md)
150+
##### 201310/20131015_04.md [《Systemtap EXP: trace PostgreSQL instruction or block of instructions per sql or per session》](../201310/20131015_04.md)
151+
##### 201310/20131015_03.md [《Systemtap EXP: Trace PostgreSQL iostat per SQL statement 2》](../201310/20131015_03.md)
152+
##### 201310/20131015_02.md [《Systemtap EXP: Trace PostgreSQL iostat per SQL statement 1》](../201310/20131015_02.md)
153+
##### 201310/20131015_01.md [《Systemtap: Generating Instrumentation module(.ko) for Other Computers》](../201310/20131015_01.md)
154+
##### 201310/20131014_04.md [《Systemtap : stap PROCESSING 5 steps introduce》](../201310/20131014_04.md)
155+
##### 201310/20131014_03.md [《Systemtap BUG? : stap "-R no effect"》](../201310/20131014_03.md)
156+
##### 201310/20131014_02.md [《Systemtap Example : OUTPUT hist_linear for processes io size and io time "use @entry"》](../201310/20131014_02.md)
157+
##### 201310/20131014_01.md [《Systemtap(2.4) Example : array aggregate elements sorted by statistic operator (EXP. output TOPn IO processes)》](../201310/20131014_01.md)
158+
##### 201310/20131013_02.md [《PostgreSQL Systemtap example : Customize probe "SEE salted md5 value transfered on network"》](../201310/20131013_02.md)
159+
##### 201310/20131013_01.md [《Systemtap(2.4) fixed BUG(1.8) : delete from statistics(aggregates) type stored in array elements》](../201310/20131013_01.md)
160+
##### 201310/20131012_02.md [《Systemtap(1.8) BUG? : delete from statistics(aggregates) type stored in array elements》](../201310/20131012_02.md)
161+
##### 201310/20131012_01.md [《PostgreSQL Systemtap example : connection|close and session duration static》](../201310/20131012_01.md)
162+
##### 201310/20131011_01.md [《PostgreSQL Systemtap example : Customize probe "connect and disconnect"》](../201310/20131011_01.md)
163+
##### 201310/20131010_02.md [《PostgreSQL Systemtap example : autovacuum_naptime & databases in cluster》](../201310/20131010_02.md)
164+
##### 201310/20131010_01.md [《Systemtap Formatted output》](../201310/20131010_01.md)
165+
##### 201310/20131009_03.md [《Systemtap Statistics (aggregates) Data Type》](../201310/20131009_03.md)
166+
##### 201310/20131009_02.md [《Systemtap Associative array Data Type》](../201310/20131009_02.md)
167+
##### 201310/20131009_01.md [《Systemtap Statement types》](../201310/20131009_01.md)
168+
##### 201310/20131008_03.md [《Systemtap Preprocessor macros》](../201310/20131008_03.md)
169+
##### 201310/20131008_02.md [《Systemtap parse preprocessing stage - Conditional compilation》](../201310/20131008_02.md)
170+
##### 201310/20131008_01.md [《Systemtap Language elements - 1》](../201310/20131008_01.md)
171+
##### 201310/20131007_05.md [《Systemtap Special probe points (begin, end, error, never)》](../201310/20131007_05.md)
172+
##### 201310/20131007_04.md [《Systemtap Timer probes》](../201310/20131007_04.md)
173+
##### 201310/20131007_03.md [《Systemtap Syscall probes》](../201310/20131007_03.md)
174+
##### 201310/20131007_02.md [《Systemtap kernel Trace probes》](../201310/20131007_02.md)
175+
##### 201310/20131007_01.md [《Systemtap kernel Marker probes》](../201310/20131007_01.md)
176+
##### 201310/20131006_01.md [《Systemtap PROCFS probes》](../201310/20131006_01.md)
177+
##### 201309/20130930_03.md [《Systemtap Userspace probing - 4》](../201309/20130930_03.md)
178+
##### 201309/20130930_02.md [《Systemtap Userspace probing - 3》](../201309/20130930_02.md)
179+
##### 201309/20130930_01.md [《Systemtap Userspace probing - 2》](../201309/20130930_01.md)
180+
##### 201309/20130929_03.md [《Systemtap Userspace probing - 1》](../201309/20130929_03.md)
181+
##### 201309/20130929_02.md [《Systemtap DWARF-less probing (kprobe)》](../201309/20130929_02.md)
182+
##### 201309/20130929_01.md [《systemtap Built-in probe point types (DWARF-based kernel or module probes)》](../201309/20130929_01.md)
183+
##### 201309/20130913_01.md [《systemtap Auxiliary functions and Embedded C》](../201309/20130913_01.md)
184+
##### 201309/20130912_03.md [《systemtap local & global variables》](../201309/20130912_03.md)
185+
##### 201309/20130912_02.md [《systemtap probe aliases (Prologue-style = & Epilogue-style +=) and suffixes》](../201309/20130912_02.md)
186+
##### 201309/20130912_01.md [《systemtap probe point's "context variables" or "target variables"》](../201309/20130912_01.md)
187+
##### 201309/20130911_01.md [《systemtap probe point followed by ! or ? or "if (expr)"》](../201309/20130911_01.md)
188+
##### 201309/20130910_03.md [《find systemtap pre-built probe points & probe points reference manual》](../201309/20130910_03.md)
189+
##### 201309/20130910_02.md [《systemtap SAFETY AND SECURITY》](../201309/20130910_02.md)
190+
##### 201309/20130910_01.md [《systemtap optimized for variables》](../201309/20130910_01.md)
191+
##### 201309/20130903_05.md [《systemtap receive strings from address》](../201309/20130903_05.md)
192+
##### 201309/20130903_04.md [《use systemtap statistics vs pgbench progress output》](../201309/20130903_04.md)
193+
##### 201309/20130903_03.md [《Systemtap statistics type example》](../201309/20130903_03.md)
194+
##### 201309/20130903_02.md [《Systemtap supported data type (long,string,array,statistic), note don't support numeric except long》](../201309/20130903_02.md)
195+
##### 201309/20130903_01.md [《Eclipse Systemtap IDE》](../201309/20130903_01.md)
196+
##### 201308/20130814_02.md [《PostgreSQL SystemTap on Linux - 1》](../201308/20130814_02.md)
105197

106198
#### [PostgreSQL 许愿链接](https://github.com/digoal/blog/issues/76 "269ac3d1c492e938c0191101c7238216")
107199
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。[开不开森](https://github.com/digoal/blog/issues/76 "269ac3d1c492e938c0191101c7238216").

0 commit comments

Comments
 (0)