Skip to content

Commit f707548

Browse files
committed
new doc
1 parent 84ce6b0 commit f707548

File tree

6 files changed

+301
-0
lines changed

6 files changed

+301
-0
lines changed

201812/20181209_01.md

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,10 @@ PostgreSQL , 实时轨迹 , IoT , 车联网 , GIS , 离散IO , 顺序IO , IO放
4949

5050
7、text 聚合不带压缩
5151

52+
8、类聚簇表
53+
54+
[《PostgreSQL index include - 类聚簇表与应用(append only, IoT时空轨迹, 离散多行扫描与返回)》](../201905/20190503_03.md)
55+
5256
### 1 原始状态
5357

5458
```
@@ -250,6 +254,10 @@ insert into t_sensor_agg5 select sid,string_agg(t_sensor::text, '|' order by crt
250254
```
251255

252256
### 9 index only scan 类似聚集表效果
257+
注意:除了本文提到的full index达到类似聚簇表效果,还可以使用index include,达到同样效果,而且索引更加有效。
258+
259+
[《PostgreSQL index include - 类聚簇表与应用(append only, IoT时空轨迹, 离散多行扫描与返回)》](../201905/20190503_03.md)
260+
253261
所有内容作为INDEX的KEY,类似聚集表的效果(相邻内容在同一个INDEX PAGE里面)。查询时走INDEX ONLY SCAN扫描方法,扫描的BLOCK最少。
254262

255263
注意:btree 索引内容不能超过1/3 PAGE (因为BTREE是双向链表,一个PAGE至少要有一条有效记录,所以有这个限制。)。
@@ -369,6 +377,8 @@ index only scan(类似聚集表)(无IO 放大) | 8880 MB | 2363
369377

370378
[《PostgreSQL 时序最佳实践 - 证券交易系统数据库设计 - 阿里云RDS PostgreSQL最佳实践》](../201704/20170417_01.md)
371379

380+
[《PostgreSQL index include - 类聚簇表与应用(append only, IoT时空轨迹, 离散多行扫描与返回)》](../201905/20190503_03.md)
381+
372382

373383
<a rel="nofollow" href="http://info.flagcounter.com/h9V1" ><img src="http://s03.flagcounter.com/count/h9V1/bg_FFFFFF/txt_000000/border_CCCCCC/columns_2/maxflags_12/viewers_0/labels_0/pageviews_0/flags_0/" alt="Flag Counter" border="0" ></a>
374384

201905/20190503_03.md

Lines changed: 289 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,289 @@
1+
## PostgreSQL index include - 类聚簇表与应用(append only, IoT时空轨迹, 离散多行扫描与返回)
2+
3+
### 作者
4+
digoal
5+
6+
### 日期
7+
2019-05-03
8+
9+
### 标签
10+
PostgreSQL , 离散扫描 , IoT , append only , 类聚簇 , index include
11+
12+
----
13+
14+
## 背景
15+
https://use-the-index-luke.com/blog/2019-04/include-columns-in-btree-indexes
16+
17+
当一次SQL请求需要返回较多行,或者需要扫描较多行(即使使用索引)时,如果这些行在HEAP表中并非密集存储,而是非常离散的存储,那么扫描的记录数越多,访问的BLOCK就越多,性能会比较差。
18+
19+
优化思路:
20+
21+
1、cluster ,密集存储
22+
23+
让数据按索引的顺序密集存储,减少回表时IO放大
24+
25+
2、聚簇表
26+
27+
表的顺序与索引顺序一致,类似的还有index only scan(索引中包含所有需要搜索的字段,不回表)
28+
29+
3、预聚合
30+
31+
预先将需要访问的多条数据聚合成一条,例如轨迹数据,按目标对象聚合(例如单车ID),原始数据为点记录(多表),聚合成轨迹(单条)
32+
33+
4、index include
34+
35+
在索引中,放入额外属性内容,搜索时不需要回表,例如
36+
37+
```
38+
create index idx_t1_1 on t1 (id) include(c1,c2,c3,info,crt_time);
39+
40+
create index idx_t2_1 on t2 (id,c1,c2,c3,info,crt_time);
41+
```
42+
43+
以上两个索引的差异在哪里?
44+
45+
索引1,KEY是ID,在叶子节点中,存入KEY与(c1,c2,c3,info,crt_time)的内容。
46+
47+
索引2,KEY是(id,c1,c2,c3,info,crt_time),在所有节点中,存储的都是所有字段的值,比索引1要重,包括空间,索引维护,更新等。
48+
49+
应用举例:
50+
51+
[《PostgreSQL IoT,车联网 - 实时轨迹、行程实践 2 - (含index only scan类聚簇表效果)》](../201812/20181209_01.md)
52+
53+
[《PostgreSQL IoT,车联网 - 实时轨迹、行程实践 1》](../201812/20181207_01.md)
54+
55+
## index include例子
56+
对比三种情况(index , index only (full index) , index only (include) )的性能。
57+
58+
写入1000万数据,1000个KEY值,平均每个KEY值对应10000条数据,并且这1万行离散存储。
59+
60+
例如共享单车的轨迹,每个轨迹都是独立的点组成,同时有很多的单车在活动,所以存储到数据库时,每个单车的同一个轨迹的所有点实际上是离散存储在HEAP BLOCK中的。与本文涉及的内容相似。
61+
62+
1、include
63+
64+
```
65+
create table t1 (id int, c1 int, c2 int, c3 int, info text, crt_time timestamp);
66+
67+
create index idx_t1_1 on t1 (id) include(c1,c2,c3,info,crt_time);
68+
69+
postgres=# insert into t1 select (1000*random())::int,1,1,1,'test',now() from generate_series(1,10000000);
70+
INSERT 0 10000000
71+
Time: 40343.081 ms (00:40.343)
72+
```
73+
74+
2、full index
75+
76+
```
77+
create table t2(like t1);
78+
create index idx_t2_1 on t2 (id,c1,c2,c3,info,crt_time);
79+
80+
postgres=# insert into t2 select * from t1;
81+
INSERT 0 10000000
82+
Time: 52042.389 ms (00:52.042)
83+
```
84+
85+
3、index
86+
87+
```
88+
create table t3(like t1);
89+
create index idx_t3_1 on t3(id);
90+
postgres=# insert into t3 select * from t1;
91+
INSERT 0 10000000
92+
Time: 32631.633 ms (00:32.632)
93+
```
94+
95+
```
96+
vacuum analyze t1;
97+
vacuum analyze t2;
98+
vacuum analyze t3;
99+
```
100+
101+
4、查询效率
102+
103+
```
104+
postgres=# explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t1 where id=1;
105+
QUERY PLAN
106+
-----------------------------------------------------------------------------------------------------------------------------------
107+
Index Only Scan using idx_t1_1 on public.t1 (cost=0.43..236.40 rows=9901 width=29) (actual time=0.011..1.292 rows=10040 loops=1)
108+
Output: id, c1, c2, c3, info, crt_time
109+
Index Cond: (t1.id = 1)
110+
Heap Fetches: 0
111+
Buffers: shared hit=62
112+
Planning Time: 0.030 ms
113+
Execution Time: 1.833 ms
114+
(7 rows)
115+
```
116+
117+
```
118+
postgres=# explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t2 where id=1;
119+
QUERY PLAN
120+
-----------------------------------------------------------------------------------------------------------------------------------
121+
Index Only Scan using idx_t2_1 on public.t2 (cost=0.56..238.42 rows=9946 width=29) (actual time=0.031..1.504 rows=10040 loops=1)
122+
Output: id, c1, c2, c3, info, crt_time
123+
Index Cond: (t2.id = 1)
124+
Heap Fetches: 0
125+
Buffers: shared hit=63
126+
Planning Time: 0.078 ms
127+
Execution Time: 2.077 ms
128+
(7 rows)
129+
```
130+
131+
```
132+
postgres=# explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t3 where id=1;
133+
QUERY PLAN
134+
--------------------------------------------------------------------------------------------------------------------------
135+
Bitmap Heap Scan on public.t3 (cost=107.26..10153.94 rows=9952 width=29) (actual time=3.061..17.160 rows=10040 loops=1)
136+
Output: id, c1, c2, c3, info, crt_time
137+
Recheck Cond: (t3.id = 1)
138+
Heap Blocks: exact=9392
139+
Buffers: shared hit=9420
140+
-> Bitmap Index Scan on idx_t3_1 (cost=0.00..104.78 rows=9952 width=0) (actual time=1.618..1.618 rows=10040 loops=1)
141+
Index Cond: (t3.id = 1)
142+
Buffers: shared hit=28
143+
Planning Time: 0.085 ms
144+
Execution Time: 17.768 ms
145+
(10 rows)
146+
147+
Time: 18.204 ms
148+
149+
postgres=# set enable_bitmapscan=off;
150+
151+
postgres=# explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t3 where id=1;
152+
QUERY PLAN
153+
---------------------------------------------------------------------------------------------------------------------------------
154+
Index Scan using idx_t3_1 on public.t3 (cost=0.43..10457.29 rows=9952 width=29) (actual time=0.028..12.610 rows=10040 loops=1)
155+
Output: id, c1, c2, c3, info, crt_time
156+
Index Cond: (t3.id = 1)
157+
Buffers: shared hit=9420
158+
Planning Time: 0.087 ms
159+
Execution Time: 13.204 ms
160+
(6 rows)
161+
Time: 13.511 ms
162+
```
163+
164+
5、高并发查询性能对比
165+
166+
```
167+
vi test1.sql
168+
\set id random(1,1000)
169+
select id,c1,c2,c3,info,crt_time from t1 where id=:id;
170+
171+
vi test2.sql
172+
\set id random(1,1000)
173+
select id,c1,c2,c3,info,crt_time from t2 where id=:id;
174+
175+
vi test3.sql
176+
\set id random(1,1000)
177+
select id,c1,c2,c3,info,crt_time from t3 where id=:id;
178+
179+
180+
alter role all set enable_bitmapscan =off;
181+
```
182+
183+
5\.1、index only scan(index include)
184+
185+
```
186+
%Cpu(s): 32.7 us, 30.0 sy, 0.0 ni, 37.3 id
187+
188+
transaction type: ./test.sql
189+
scaling factor: 1
190+
query mode: prepared
191+
number of clients: 56
192+
number of threads: 56
193+
duration: 120 s
194+
number of transactions actually processed: 263335
195+
latency average = 25.519 ms
196+
latency stddev = 7.470 ms
197+
tps = 2193.947905 (including connections establishing)
198+
tps = 2194.053590 (excluding connections establishing)
199+
statement latencies in milliseconds:
200+
0.001 \set id random(1,1000)
201+
25.518 select id,c1,c2,c3,info,crt_time from t1 where id=:id;
202+
```
203+
204+
5\.2、index only scan(full index)
205+
206+
```
207+
%Cpu(s): 32.6 us, 30.1 sy, 0.0 ni, 37.3 id
208+
209+
transaction type: ./test.sql
210+
scaling factor: 1
211+
query mode: prepared
212+
number of clients: 56
213+
number of threads: 56
214+
duration: 120 s
215+
number of transactions actually processed: 262858
216+
latency average = 25.565 ms
217+
latency stddev = 7.574 ms
218+
tps = 2189.965138 (including connections establishing)
219+
tps = 2190.073948 (excluding connections establishing)
220+
statement latencies in milliseconds:
221+
0.001 \set id random(1,1000)
222+
25.564 select id,c1,c2,c3,info,crt_time from t2 where id=:id;
223+
```
224+
225+
5\.3、index scan(key only)
226+
227+
```
228+
%Cpu(s): 59.4 us, 12.6 sy, 0.0 ni, 28.0 id
229+
230+
scaling factor: 1
231+
query mode: prepared
232+
number of clients: 56
233+
number of threads: 56
234+
duration: 120 s
235+
number of transactions actually processed: 198793
236+
latency average = 33.804 ms
237+
latency stddev = 9.839 ms
238+
tps = 1656.139982 (including connections establishing)
239+
tps = 1656.227526 (excluding connections establishing)
240+
statement latencies in milliseconds:
241+
0.001 \set id random(1,1000)
242+
33.803 select id,c1,c2,c3,info,crt_time from t3 where id=:id;
243+
```
244+
245+
## 小结
246+
index include 应用场景
247+
248+
当一次SQL请求需要返回较多行,或者需要扫描较多行(即使使用索引)时,如果这些行在HEAP表中并非密集存储,而是非常离散的存储,那么扫描的记录数越多,访问的BLOCK就越多,性能会比较差。
249+
250+
![pic](20190503_03_pic_001.jpg)
251+
252+
index include技术,将key值以外的数据存储在index leaf page中,不需要回表就可以查询到这些数据,提高整体性能(同时又不需要将所有属性都放在KEY中,使得索引臃肿)。
253+
254+
![pic](20190503_03_pic_002.jpg)
255+
256+
例如共享单车的轨迹,每个轨迹都是独立的点组成,同时有很多的单车在活动,所以存储到数据库时,每个单车的同一个轨迹的所有点实际上是离散存储在HEAP BLOCK中的。与本文涉及的内容相似。
257+
258+
性能对比:
259+
260+
索引 | 写入1000万耗时 | KEY值搜索qps | CPU
261+
---|---|---
262+
index(key + include) | 40.3 | 2193 | 62.7%
263+
index(full index) | 52 | 2189 | 62.7%
264+
index(key only) | 32.6 | 1656 | 72%
265+
266+
## 参考
267+
[《PostgreSQL 12 preview - GiST 索引支持INCLUDE columns - 覆盖索引 - 类聚簇索引》](../201903/20190331_08.md)
268+
269+
[《PostgreSQL 10.0 preview 功能增强 - 唯一约束+附加字段组合功能索引 - 覆盖索引 - covering index》](../201703/20170312_23.md)
270+
271+
[《PostgreSQL IoT,车联网 - 实时轨迹、行程实践 2 - (含index only scan类聚簇表效果)》](../201812/20181209_01.md)
272+
273+
[《PostgreSQL IoT,车联网 - 实时轨迹、行程实践 1》](../201812/20181207_01.md)
274+
275+
[《PostgreSQL 9种索引的原理和应用场景》](../201706/20170627_01.md)
276+
277+
[《深入浅出PostgreSQL B-Tree索引结构》](../201605/20160528_01.md)
278+
279+
https://use-the-index-luke.com/blog/2019-04/include-columns-in-btree-indexes
280+
281+
282+
<a rel="nofollow" href="http://info.flagcounter.com/h9V1" ><img src="http://s03.flagcounter.com/count/h9V1/bg_FFFFFF/txt_000000/border_CCCCCC/columns_2/maxflags_12/viewers_0/labels_0/pageviews_0/flags_0/" alt="Flag Counter" border="0" ></a>
283+
284+
285+
## [digoal's 大量PostgreSQL文章入口](https://github.com/digoal/blog/blob/master/README.md "22709685feb7cab07d30f30387f0a9ae")
286+
287+
288+
## [免费领取阿里云RDS PostgreSQL实例、ECS虚拟机](https://free.aliyun.com/ "57258f76c37864c6e6d23383d05714ea")
289+

201905/20190503_03_pic_001.jpg

91.9 KB
Loading

201905/20190503_03_pic_002.jpg

89.6 KB
Loading

201905/readme.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,5 +2,6 @@
22

33
### 文章列表
44
----
5+
##### 20190503_03.md [《PostgreSQL index include - 类聚簇表与应用(append only, IoT时空轨迹, 离散多行扫描与返回)》](20190503_03.md)
56
##### 20190503_02.md [《PostgreSQL 与getrusage - 统计会话、语句资源使用情况, 同时也被用于资源隔离(EPAS,Greenplum) - log_parser_stats log_planner_stats log_executor_stats log_statement_stats》](20190503_02.md)
67
##### 20190503_01.md [《PostgreSQL 12 preview - vacuum 新增开关: index_cleanup , 是否同时回收索引》](20190503_01.md)

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,7 @@ digoal's|PostgreSQL|文章|归类
4444

4545
### 所有文档如下
4646
----
47+
##### 201905/20190503_03.md [《PostgreSQL index include - 类聚簇表与应用(append only, IoT时空轨迹, 离散多行扫描与返回)》](201905/20190503_03.md)
4748
##### 201905/20190503_02.md [《PostgreSQL 与getrusage - 统计会话、语句资源使用情况, 同时也被用于资源隔离(EPAS,Greenplum) - log_parser_stats log_planner_stats log_executor_stats log_statement_stats》](201905/20190503_02.md)
4849
##### 201905/20190503_01.md [《PostgreSQL 12 preview - vacuum 新增开关: index_cleanup , 是否同时回收索引》](201905/20190503_01.md)
4950
----

0 commit comments

Comments
 (0)