Skip to content

Commit bdce652

Browse files
committed
improve
1 parent 318b229 commit bdce652

File tree

5 files changed

+212
-3
lines changed

5 files changed

+212
-3
lines changed

201806/20180620_02.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
## 阿里云HybridDB for PG 使用空间紧张问题的建议处理流程与方法
1+
## 阿里云HybridDB for PG 空间紧张的解法 - 冷热分离、空间锁定、分区、压缩
22
33
### 作者
44
digoal

201806/20180622_02.md

Lines changed: 207 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,207 @@
1+
## PostgreSQL 锁等待排查实践 - 珍藏级 - process xxx1 acquired RowExclusiveLock on relation xxx2 of database xxx3 after xxx4 ms at xxx
2+
3+
### 作者
4+
digoal
5+
6+
### 日期
7+
2018-06-22
8+
9+
### 标签
10+
PostgreSQL , 锁等待 , ddl , 大锁 , 雪崩 , lock_timeout , deadlock_timeout , AB表切换 , 分区表
11+
12+
----
13+
14+
## 背景
15+
当SQL请求锁等待超过deadlock_timeout指定的时间时,报类似如下日志:
16+
17+
```
18+
LOG: process xxx1 acquired RowExclusiveLock on relation xxx2 of database xxx3 after xxx4 ms at xxx
19+
STATEMENT: INSERT ...........
20+
```
21+
22+
解释:
23+
24+
xxx1进程请求位于数据库xxx3中的xxx2对象的RowExclusiveLock锁,已等待xxx4秒。
25+
26+
同学们可能会纳闷,怎么insert也会等待?
27+
28+
其实不管什么操作,都有可能出现等待,只要请求的锁与已有或已经在队列中的LOCK级别有冲突就会出现等待。
29+
30+
![pic](20180622_02_pic_001.jpg)
31+
32+
https://www.postgresql.org/docs/10/static/explicit-locking.html
33+
34+
src/include/storage/lockdefs.h
35+
36+
```
37+
/* NoLock is not a lock mode, but a flag value meaning "don't get a lock" */
38+
#define NoLock 0
39+
40+
#define AccessShareLock 1 /* SELECT */
41+
#define RowShareLock 2 /* SELECT FOR UPDATE/FOR SHARE */
42+
#define RowExclusiveLock 3 /* INSERT, UPDATE, DELETE */
43+
#define ShareUpdateExclusiveLock 4 /* VACUUM (non-FULL),ANALYZE, CREATE INDEX
44+
* CONCURRENTLY */
45+
#define ShareLock 5 /* CREATE INDEX (WITHOUT CONCURRENTLY) */
46+
#define ShareRowExclusiveLock 6 /* like EXCLUSIVE MODE, but allows ROW
47+
* SHARE */
48+
#define ExclusiveLock 7 /* blocks ROW SHARE/SELECT...FOR UPDATE */
49+
#define AccessExclusiveLock 8 /* ALTER TABLE, DROP TABLE, VACUUM FULL,
50+
* and unqualified LOCK TABLE */
51+
```
52+
53+
## 如何分析? - 实例讲解
54+
请参考万能文章:
55+
56+
[《PostgreSQL 锁等待监控 珍藏级SQL - 谁堵塞了谁》](../201705/20170521_01.md)
57+
58+
1、开启审计日志
59+
60+
```
61+
log_destination = 'csvlog'
62+
logging_collector = on
63+
log_truncate_on_rotation = on
64+
log_statement = 'all'
65+
```
66+
67+
2、psql 挂一个打印锁等待的窗口
68+
69+
```
70+
psql
71+
72+
with
73+
t_wait as
74+
(
75+
select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted,
76+
a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath,
77+
b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name
78+
from pg_locks a,pg_stat_activity b where a.pid=b.pid and not a.granted
79+
),
80+
t_run as
81+
(
82+
select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted,
83+
a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath,
84+
b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name
85+
from pg_locks a,pg_stat_activity b where a.pid=b.pid and a.granted
86+
),
87+
t_overlap as
88+
(
89+
select r.* from t_wait w join t_run r on
90+
(
91+
r.locktype is not distinct from w.locktype and
92+
r.database is not distinct from w.database and
93+
r.relation is not distinct from w.relation and
94+
r.page is not distinct from w.page and
95+
r.tuple is not distinct from w.tuple and
96+
r.virtualxid is not distinct from w.virtualxid and
97+
r.transactionid is not distinct from w.transactionid and
98+
r.classid is not distinct from w.classid and
99+
r.objid is not distinct from w.objid and
100+
r.objsubid is not distinct from w.objsubid and
101+
r.pid <> w.pid
102+
)
103+
),
104+
t_unionall as
105+
(
106+
select r.* from t_overlap r
107+
union all
108+
select w.* from t_wait w
109+
)
110+
select locktype,datname,relation::regclass,page,tuple,virtualxid,transactionid::text,classid::regclass,objid,objsubid,
111+
string_agg(
112+
'Pid: '||case when pid is null then 'NULL' else pid::text end||chr(10)||
113+
'Lock_Granted: '||case when granted is null then 'NULL' else granted::text end||' , Mode: '||case when mode is null then 'NULL' else mode::text end||' , FastPath: '||case when fastpath is null then 'NULL' else fastpath::text end||' , VirtualTransaction: '||case when virtualtransaction is null then 'NULL' else virtualtransaction::text end||' , Session_State: '||case when state is null then 'NULL' else state::text end||chr(10)||
114+
'Username: '||case when usename is null then 'NULL' else usename::text end||' , Database: '||case when datname is null then 'NULL' else datname::text end||' , Client_Addr: '||case when client_addr is null then 'NULL' else client_addr::text end||' , Client_Port: '||case when client_port is null then 'NULL' else client_port::text end||' , Application_Name: '||case when application_name is null then 'NULL' else application_name::text end||chr(10)||
115+
'Xact_Start: '||case when xact_start is null then 'NULL' else xact_start::text end||' , Query_Start: '||case when query_start is null then 'NULL' else query_start::text end||' , Xact_Elapse: '||case when (now()-xact_start) is null then 'NULL' else (now()-xact_start)::text end||' , Query_Elapse: '||case when (now()-query_start) is null then 'NULL' else (now()-query_start)::text end||chr(10)||
116+
'SQL (Current SQL in Transaction): '||chr(10)||
117+
case when query is null then 'NULL' else query::text end,
118+
chr(10)||'--------'||chr(10)
119+
order by
120+
( case mode
121+
when 'INVALID' then 0
122+
when 'AccessShareLock' then 1
123+
when 'RowShareLock' then 2
124+
when 'RowExclusiveLock' then 3
125+
when 'ShareUpdateExclusiveLock' then 4
126+
when 'ShareLock' then 5
127+
when 'ShareRowExclusiveLock' then 6
128+
when 'ExclusiveLock' then 7
129+
when 'AccessExclusiveLock' then 8
130+
else 0
131+
end ) desc,
132+
(case when granted then 0 else 1 end)
133+
) as lock_conflict
134+
from t_unionall
135+
group by
136+
locktype,datname,relation,page,tuple,virtualxid,transactionid::text,classid,objid,objsubid ;
137+
138+
\watch 0.2
139+
```
140+
141+
3、tail 挂一个日志观测窗口
142+
143+
```
144+
for ((i=1;i>0;i=1)); do grep RowExclusiveLock *.csv ; sleep 0.2; done
145+
146+
147+
148+
for ((i=1;i>0;i=1)); do grep acquired *.csv ; sleep 0.2; done
149+
```
150+
151+
4、发现问题
152+
153+
在业务运行过程中,如果问题复现,一定能观测到日志。
154+
155+
通过锁等待的窗口,观测到其中一个会话对表xxx持有了```accessExclusiveLock```,也就是排他锁。通过前面的锁冲突表,你会发现这个锁和所有锁都冲突(实际上```DDL,VACUUM FULL```等操作都会持有排它锁,或者人为的发出```lock table xxx in access exclusive mode;```
156+
157+
所以,当然会堵塞其他对该表的INSERT操作了。
158+
159+
4\.1、根据锁等待的窗口拿到持有锁的PID,到审计日志里面查看这个PID在当前事务中,前面都发起了什么SQL。
160+
161+
4\.2、找到问题根源,原来这个事务发起了```ALTER TABLE XXX RENAME TO XXXXX;```的动作。
162+
163+
**这个事件也引起了INSERT变慢(实际上是锁等待,实践都花在了等待上面)的问题。**
164+
165+
很显然,业务上应该经常会有一些触发改表名的动作,比如为了防止一个表太大,经常做AB表切换的动作。
166+
167+
虽然切换表名只是修改元数据,但是这个瞬间的锁,在高并发的业务场景中,也会带来堵塞危害。
168+
169+
建议用户采用分区表,而不要在高并发业务中频繁使用DDL这样的大锁高危操作。
170+
171+
## 小结
172+
本文讲解了锁等待问题的排查方法。原因实际上是用户在业务中使用了AB表切换,虽然切换表名只是修改元数据,但是这个瞬间的锁,在高并发的业务场景中,也会带来堵塞危害。
173+
174+
建议用户采用分区表,而不要在高并发业务中频繁使用DDL这样的大锁高危操作。
175+
176+
[《PostgreSQL 9.x, 10, 11 hash分区表 用法举例》](../201805/20180524_05.md)
177+
178+
[《PostgreSQL 11 preview - 分区表 增强 汇总》](../201805/20180519_01.md)
179+
180+
[《PostgreSQL 查询涉及分区表过多导致的性能问题 - 性能诊断与优化(大量BIND, spin lock, SLEEP进程)》](../201801/20180124_01.md)
181+
182+
[《PostgreSQL 商用版本EPAS(阿里云ppas(Oracle 兼容版)) - 分区表性能优化 (堪比pg_pathman)》](../201801/20180122_03.md)
183+
184+
[《PostgreSQL 传统 hash 分区方法和性能》](../201711/20171122_02.md)
185+
186+
[《PostgreSQL 10 内置分区 vs pg_pathman perf profiling》](../201710/20171015_01.md)
187+
188+
[《PostgreSQL 10.0 preview 功能增强 - 内置分区表》](../201612/20161215_01.md)
189+
190+
[《PostgreSQL 9.5+ 高效分区表实现 - pg_pathman》](../201610/20161024_01.md)
191+
192+
另外。大锁操作,一定要注意防止雪崩。
193+
194+
最后,遇到问题要冷静思考,不要轻易认为数据库有问题。
195+
196+
197+
## 参考
198+
[《PostgreSQL 锁等待监控 珍藏级SQL - 谁堵塞了谁》](../201705/20170521_01.md)
199+
200+
[《PostgreSQL 设置单条SQL的执行超时 - 防雪崩》](201712/20171211_02.md)
201+
202+
[《如何防止数据库雪崩(泛洪 flood)》](201609/20160909_01.md)
203+
204+
205+
206+
<a rel="nofollow" href="http://info.flagcounter.com/h9V1" ><img src="http://s03.flagcounter.com/count/h9V1/bg_FFFFFF/txt_000000/border_CCCCCC/columns_2/maxflags_12/viewers_0/labels_0/pageviews_0/flags_0/" alt="Flag Counter" border="0" ></a>
207+

201806/20180622_02_pic_001.jpg

115 KB
Loading

201806/readme.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,13 +2,14 @@
22

33
### 文章列表
44
----
5+
##### 20180622_02.md [《PostgreSQL 锁等待排查实践 - 珍藏级 - process xxx1 acquired RowExclusiveLock on relation xxx2 of database xxx3 after xxx4 ms at xxx》](20180622_02.md)
56
##### 20180622_01.md [《PostgreSQL 空间包含 OR 组合查询 、 IN 大量重复值 CASE 优化 - 消重、消bitmapOr、消filter》](20180622_01.md)
67
##### 20180621_03.md [《在PostgreSQL中跑后台长任务的方法 - 使用dblink异步接口》](20180621_03.md)
78
##### 20180621_02.md [[转载] gdb 调试打印完整字符串》](20180621_02.md)
89
##### 20180621_01.md [《Greenplum 逻辑备份增强 - 备份期间不锁metadata(pg_class),不堵塞DDL》](20180621_01.md)
910
##### 20180620_04.md [《PostgreSQL 对象权限如何在元数据中获取 - 权限解读、定制化导出权限》](20180620_04.md)
1011
##### 20180620_03.md [《线下PG迁移到阿里云RDS PG - 兼容性、性能评估、迁移》](20180620_03.md)
11-
##### 20180620_02.md [《阿里云HybridDB for PG 使用空间紧张问题的建议处理流程与方法](20180620_02.md)
12+
##### 20180620_02.md [《阿里云HybridDB for PG 空间紧张的解法 - 冷热分离、空间锁定、分区、压缩](20180620_02.md)
1213
##### 20180620_01.md [《Greenplum segment级问题的排查 - gp_session_role=utility (含csvlog日志格式解读)》](20180620_01.md)
1314
##### 20180619_01.md [[未完待续] PostgreSQL pglogical 逻辑复制实现跨版本升级》](20180619_01.md)
1415
##### 20180616_01.md [《PostgreSQL 开启“审计日志、时间记录”带来的性能影响有多少?》](20180616_01.md)

README.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -31,13 +31,14 @@ digoal's|PostgreSQL|文章|归类
3131

3232
### 未归类文档如下
3333
----
34+
##### 201806/20180622_02.md [《PostgreSQL 锁等待排查实践 - 珍藏级 - process xxx1 acquired RowExclusiveLock on relation xxx2 of database xxx3 after xxx4 ms at xxx》](201806/20180622_02.md)
3435
##### 201806/20180622_01.md [《PostgreSQL 空间包含 OR 组合查询 、 IN 大量重复值 CASE 优化 - 消重、消bitmapOr、消filter》](201806/20180622_01.md)
3536
##### 201806/20180621_03.md [《在PostgreSQL中跑后台长任务的方法 - 使用dblink异步接口》](201806/20180621_03.md)
3637
##### 201806/20180621_02.md [[转载] gdb 调试打印完整字符串》](201806/20180621_02.md)
3738
##### 201806/20180621_01.md [《Greenplum 逻辑备份增强 - 备份期间不锁metadata(pg_class),不堵塞DDL》](201806/20180621_01.md)
3839
##### 201806/20180620_04.md [《PostgreSQL 对象权限如何在元数据中获取 - 权限解读、定制化导出权限》](201806/20180620_04.md)
3940
##### 201806/20180620_03.md [《线下PG迁移到阿里云RDS PG - 兼容性、性能评估、迁移》](201806/20180620_03.md)
40-
##### 201806/20180620_02.md [《阿里云HybridDB for PG 使用空间紧张问题的建议处理流程与方法](201806/20180620_02.md)
41+
##### 201806/20180620_02.md [《阿里云HybridDB for PG 空间紧张的解法 - 冷热分离、空间锁定、分区、压缩](201806/20180620_02.md)
4142
##### 201806/20180620_01.md [《Greenplum segment级问题的排查 - gp_session_role=utility (含csvlog日志格式解读)》](201806/20180620_01.md)
4243
##### 201806/20180619_01.md [[未完待续] PostgreSQL pglogical 逻辑复制实现跨版本升级》](201806/20180619_01.md)
4344
##### 201806/20180616_01.md [《PostgreSQL 开启“审计日志、时间记录”带来的性能影响有多少?》](201806/20180616_01.md)

0 commit comments

Comments
 (0)