
Commit f7a1d5b

digoal zhou authored and committed
new doc
1 parent b64930b commit f7a1d5b

22 files changed, +208 -16 lines changed

201606/20160626_01.md

Lines changed: 1 addition & 1 deletion
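@@ -47,7 +47,7 @@ https://yq.aliyun.com/articles/38377
 For scenarios with more complex logic, the data has to be fetched to the application side and processed there, which involves moving data and greatly amplifies network RT. The move-data pattern is gradually becoming the main culprit that hurts user experience and efficiency and wastes cost.

 ![2](https://oss-cn-hangzhou.aliyuncs.com/yqfiles/62dfdae72172a450199e8101efc7c28f8fba0e16.png)
-Suppose the database can be built into a product that at the same time (同事) offers data storage, management, and processing. With sufficient database hardware resources, handing the logic that the database can process over to the database greatly reduces latency, which is very effective in high-concurrency, low-latency application scenarios.
+Suppose the database can be built into a product that at the same time (同时) offers data storage, management, and processing. With sufficient database hardware resources, handing the logic that the database can process over to the database greatly reduces latency, which is very effective in high-concurrency, low-latency application scenarios.

 This is a test of the database's extensibility.
 <br />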

201607/20160728_01.md

Lines changed: 1 addition & 1 deletion
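@@ -565,7 +565,7 @@ UBER文章说的 查询会与恢复堵塞,说的是物理备库,但必须纠

 * Application of a vacuum cleanup record from WAL conflicts with standby transactions whose snapshots can still "see" any of the rows to be removed.

-The REDO for dead tuple reclamation on the primary is being replayed while, at the same time (同事), a query snapshot currently active on the standby still needs to see those rows.
+The REDO for dead tuple reclamation on the primary is being replayed while, at the same time (同时), a query snapshot currently active on the standby still needs to see those rows.

 This case can be controlled with parameters that give priority either to recovery or to queries. A time window can be configured.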
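The conflict in the hunk above (a vacuum cleanup record versus standby snapshots that can still see the rows) can be pictured with a small model. The sketch below is illustrative only: `StandbyQuery`, `snapshot_xmin`, `latest_removed_xid`, and `conflicting_queries` are hypothetical names that do not come from the changed file, and the comparison ignores transaction ID wraparound, which PostgreSQL itself has to handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandbyQuery:
    """A query running on the standby (hypothetical model, not a PostgreSQL type)."""
    pid: int
    snapshot_xmin: int  # oldest transaction ID whose effects the snapshot may still need


def conflicting_queries(latest_removed_xid, queries):
    """Return the standby queries whose snapshots could still see removed rows.

    Simplifying assumption: transaction IDs are compared as plain integers,
    so XID wraparound is ignored.
    """
    return [q for q in queries if q.snapshot_xmin <= latest_removed_xid]


queries = [
    StandbyQuery(pid=101, snapshot_xmin=900),   # started before the cleanup horizon
    StandbyQuery(pid=102, snapshot_xmin=1200),  # started after it
]

# A cleanup record that removes rows deleted by transactions up to XID 1000.
blocked = conflicting_queries(latest_removed_xid=1000, queries=queries)
print([q.pid for q in blocked])  # prints [101]
```

Only the query with the older snapshot is reported, which matches the wording of the hunk: the conflict exists only for snapshots that can still "see" the rows being removed.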

201608/20160815_03.md

Lines changed: 1 addition & 1 deletion
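@@ -39,7 +39,7 @@ PostgreSQL 的基于流复制的物理备库是基于redo的物理块复制备

 * Application of a vacuum cleanup record from WAL conflicts with standby transactions whose snapshots can still "see" any of the rows to be removed.

-The REDO for dead tuple reclamation on the primary is being replayed while, at the same time (同事), a query snapshot currently active on the standby still needs to see those rows.
+The REDO for dead tuple reclamation on the primary is being replayed while, at the same time (同时), a query snapshot currently active on the standby still needs to see those rows.

 This case can be controlled with parameters that give priority either to recovery or to queries. A time window can be configured.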
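The note in the hunk above says the recovery-versus-query priority is controlled by parameters with a configurable time window, without naming them. Assuming the parameter meant is `max_standby_streaming_delay` (an assumption; the changed file does not say), the decision might be modeled roughly as follows. The function name and arguments are hypothetical, and the sketch only reflects the documented meaning of the setting: a limit on the total time since the WAL was received, with -1 meaning wait forever.

```python
def replay_must_cancel(now_s, wal_receipt_s, max_standby_delay_ms):
    """Decide whether WAL replay stops waiting and cancels conflicting queries.

    now_s                -- current time in seconds
    wal_receipt_s        -- time the pending WAL was received, in seconds
    max_standby_delay_ms -- assumed to mirror max_standby_streaming_delay:
                            -1 waits forever (queries win),
                             0 cancels immediately (recovery wins),
                            >0 is the allowed window in milliseconds
    """
    if max_standby_delay_ms < 0:
        return False  # query priority: replay waits indefinitely
    deadline_s = wal_receipt_s + max_standby_delay_ms / 1000.0
    return now_s >= deadline_s


# WAL received at t=90s with a 30s window: replay waits until t=120s.
print(replay_must_cancel(100.0, 90.0, 30000))  # prints False
print(replay_must_cancel(125.0, 90.0, 30000))  # prints True
print(replay_must_cancel(125.0, 90.0, -1))     # prints False
```

The window is measured from WAL receipt, not from the start of the query, so a query that begins late in the window can be cancelled almost immediately.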

201610/20161012_01.md

Lines changed: 1 addition & 1 deletion
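@@ -14,7 +14,7 @@ PostgreSQL , DaaS , 模板 , schema , database , apply delay , standby
 ## Background
 Some vendors on the market provide DaaS, for example Heroku, and may run millions of database services;

-Likewise, PaaS platform providers also have many databases, and at the same time (同事) these databases are probably templated. Such vendors do not necessarily create a new database cluster for every customer to meet their database needs.
+Likewise, PaaS platform providers also have many databases, and at the same time (同时) these databases are probably templated. Such vendors do not necessarily create a new database cluster for every customer to meet their database needs.

 They most likely isolate different users by database or by schema.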

201702/20170215_02.md

Lines changed: 1 addition & 1 deletion
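@@ -25,7 +25,7 @@ https://blog.tankywoo.com/2015/01/29/sysctl-load-order.html

 It happened again yesterday, so today I set out to write a script that monitors the value of net.ipv4.ip_forward. I was halfway through when the power tripped. After the gateway machine came back up it could not forward traffic; the nat table was fine, and then I found that ip_forward had been reset to 0 as well.

-After I changed it back, the power tripped again a little later. Checking after boot basically confirmed that a reboot causes the problem, because whenever the gateway machine's services misbehaved before, colleagues (同事) simply rebooted the machine.
+After I changed it back, the power tripped again a little later. Checking after boot basically confirmed that a reboot causes the problem, because whenever the gateway machine's services misbehaved before, at the same time (同时) the machine was simply rebooted.

 Investigating further, the default Gentoo /etc/sysctl.conf configuration (main part):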

201911/20191112_01.md

Lines changed: 1 addition & 1 deletion
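@@ -14,7 +14,7 @@ PostgreSQL , pg_basebackup , 阿里云 , standby
 ## Background
 Build your own remote replica of RDS PG 11. Whether it sits in the same data center or a remote one, it works as long as the network is reachable. It uses PG streaming replication.

-Take a remote PG replica on ECS as an example. First make sure the storage and memory of the ECS instance are comparable to the RDS PG specification. Also recommended (同事建议)
+Take a remote PG replica on ECS as an example. First make sure the storage and memory of the ECS instance are comparable to the RDS PG specification.

 ## Example
 1. centos 7.x x64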

202003/20200310_01.md

Lines changed: 46 additions & 0 deletions
@@ -230,6 +230,52 @@ rn | 1

 Perhaps a future kernel will support such an interface, making it possible to find exactly which query is blocking WAL replay.

+The system functions PG currently provides for finding blockers can only be used to look up heavyweight lock conflicts:
+```
+-[ RECORD 1 ]-------+-----------------------------------------------------------------------------------------------
+Schema | pg_catalog
+Name | pg_blocking_pids
+Result data type | integer[]
+Argument data types | integer
+Type | func
+Volatility | volatile
+Parallel | safe
+Owner | postgres
+Security | invoker
+Access privileges |
+Language | internal
+Source code | pg_blocking_pids
+Description | get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock
+-[ RECORD 2 ]-------+-----------------------------------------------------------------------------------------------
+Schema | pg_catalog
+Name | pg_isolation_test_session_is_blocked
+Result data type | boolean
+Argument data types | integer, integer[]
+Type | func
+Volatility | volatile
+Parallel | safe
+Owner | postgres
+Security | invoker
+Access privileges |
+Language | internal
+Source code | pg_isolation_test_session_is_blocked
+Description | isolationtester support function
+-[ RECORD 3 ]-------+-----------------------------------------------------------------------------------------------
+Schema | pg_catalog
+Name | pg_safe_snapshot_blocking_pids
+Result data type | integer[]
+Argument data types | integer
+Type | func
+Volatility | volatile
+Parallel | safe
+Owner | postgres
+Security | invoker
+Access privileges |
+Language | internal
+Source code | pg_safe_snapshot_blocking_pids
+Description | get array of PIDs of sessions blocking specified backend PID from acquiring a safe snapshot
+```
+
 ## How much WAL is blocked and not yet replayed
 ```
 db1=# select pg_is_wal_replay_paused(),

202005/20200518_01.md

Lines changed: 1 addition & 1 deletion
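@@ -148,7 +148,7 @@ psql postgresql://localhost:1921/postgres?options="-c statement_timeout%3D5s -c
 - When the standby runs a big (long) query, IO on the primary surges and the garbage collection process spins idle: the table shows garbage, autovacuum starts scanning it, but the garbage cannot be reclaimed (because the standby depends on those dead versions), which causes a large amount of useless IO.
 - When the standby runs a big (long) query, tables on the primary bloat, because at times more garbage cannot be reclaimed promptly.

-5. If the delay is caused by resource limits on the standby itself, check whether the standby node's network bandwidth (贷款), CPU, or IO capacity is a bottleneck. Add resources where they are needed.
+5. If the delay is caused by resource limits on the standby itself, check whether the standby node's network bandwidth (带宽), CPU, or IO capacity is a bottleneck. Add resources where they are needed.


 ## References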

202011/20201117_02.md

Lines changed: 5 additions & 5 deletions
@@ -14,7 +14,7 @@ PostgreSQL , 冲突 , replay , standby , query
 ## Background
 https://www.cybertec-postgresql.com/en/streaming-replication-conflicts-in-postgresql/

-The several kinds of conflict are explained quite clearly. For the final section, Avoiding buffer pin conflicts, you can try enabling vacuum_defer_cleanup_age on the primary.
+The several kinds of conflict are explained quite clearly.

 Every means of avoiding conflicts may make garbage collection on the primary do useless work, burning IO and CPU without reclaiming any garbage.

@@ -41,14 +41,14 @@ This is the most frequent replication conflict.
 Snapshot conflicts can occur if VACUUM processes a table and removes dead tuples. This removal is replayed on the standby. Now a query on the standby may have started before VACUUM on the primary (it has an older snapshot), so it can still see the tuples that should be removed. This constitutes a snapshot conflict.

 ### Lock replication conflicts
-The queries on a standby server take an ACCESS EXCLUSIVE lock on the tables they are reading. So any ACCESS EXCLUSIVE lock on the primary (which conflicts with ACCESS SHARE) must be replayed on the standby to keep incompatible operations on the table from happening. PostgreSQL takes such a lock for operations that conflict with SELECT, for example DROP TABLE, TRUNCATE and many ALTER TABLE statements. If the standby should replay such a lock on a table that a query uses, we have a lock conflict.
+The queries on a standby server take an ACCESS SHARE lock on the tables they are reading. So any ACCESS EXCLUSIVE lock on the primary (which conflicts with ACCESS SHARE) must be replayed on the standby to keep incompatible operations on the table from happening. PostgreSQL takes such a lock for operations that conflict with SELECT, for example DROP TABLE, TRUNCATE and many ALTER TABLE statements. If the standby should replay such a lock on a table that a query uses, we have a lock conflict.

 ### Buffer pin replication conflicts
-One way to reduce the need for VACUUM is to use HOT updates. Then any query on the primary that accesses a page with dead heap-only tuples and can get an exclusive lock on it will prune the HOT chains. PostgreSQL always holds such page locks for a short time, so there is no conflict with processing on the primary. There are other causes for page locks, but this is perhaps the most frequent one.
+One way to reduce the need for VACUUM is to use HOT updates. Then any query on the primary that accesses a page with dead heap-only tuples and can get an exclusive lock on it will prune the HOT chains. (The query modifies the page to shorten the HOT chain, taking a lightweight lock.) PostgreSQL always holds such page locks for a short time, so there is no conflict with processing on the primary. There are other causes for page locks, but this is perhaps the most frequent one.

-When the standby server should replay such an exclusive page lock and a query is using the page (“has the page pinned” in PostgreSQL jargon), you get a buffer pin replication conflict. Pages can be pinned for a while, for example during a sequential scan of a table on the outer side of a nested loop join.
+When the standby server should replay such an exclusive page lock and a query is using the page (“has the page pinned” in PostgreSQL jargon), you get a buffer pin replication conflict. Pages can be pinned for a while, for example during a sequential scan of a table on the outer side of a nested loop join. (When the standby runs a nested loop join whose outer table is read by a full table scan, and WAL that prunes HOT chains on that same outer table is being replayed, the replay may wait for a long time.)

-HOT chain pruning can of course also lead to snapshot replication conflicts.
+HOT chain pruning can of course also lead to snapshot replication conflicts. (HOT chain pruning can also cause snapshot conflicts.)

 ### Rare kinds of replication conflicts
 The following types of conflict are rare and will not bother you:

202112/20211216_01.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-## Linux ftrace
+## Linux ftrace - kernel performance analysis - flame graphs

 ### Author
 digoal
@@ -7,7 +7,7 @@ digoal
 2021-12-16

 ### Tags
-PostgreSQL , Linux , trace , ftrace , strace , ptrace , read , write , IO
+PostgreSQL , Linux , flame graph , trace , ftrace , strace , ptrace , read , write , IO

 ----
