|
| 1 | +## PostgreSQL index include - 类聚簇表与应用(append only, IoT时空轨迹, 离散多行扫描与返回) |
| 2 | + |
| 3 | +### 作者 |
| 4 | +digoal |
| 5 | + |
| 6 | +### 日期 |
| 7 | +2019-05-03 |
| 8 | + |
| 9 | +### 标签 |
| 10 | +PostgreSQL , 离散扫描 , IoT , append only , 类聚簇 , index include |
| 11 | + |
| 12 | +---- |
| 13 | + |
| 14 | +## 背景 |
| 15 | +https://use-the-index-luke.com/blog/2019-04/include-columns-in-btree-indexes |
| 16 | + |
| 17 | +当一次SQL请求需要返回较多行,或者需要扫描较多行(即使使用索引)时,如果这些行在HEAP表中并非密集存储,而是非常离散的存储,那么扫描的记录数越多,访问的BLOCK就越多,性能会比较差。 |
| 18 | + |
| 19 | +优化思路: |
| 20 | + |
| 21 | +1、cluster ,密集存储 |
| 22 | + |
| 23 | +让数据按索引的顺序密集存储,减少回表时IO放大 |
| 24 | + |
| 25 | +2、聚簇表 |
| 26 | + |
| 27 | +表的顺序与索引顺序一致,类似的还有index only scan(索引中包含所有需要搜索的字段,不回表) |
| 28 | + |
| 29 | +3、预聚合 |
| 30 | + |
| 31 | +预先将需要访问的多条数据聚合成一条,例如轨迹数据,按目标对象聚合(例如单车ID),原始数据为点记录(多表),聚合成轨迹(单条) |
| 32 | + |
| 33 | +4、index include |
| 34 | + |
| 35 | +在索引中,放入额外属性内容,搜索时不需要回表,例如 |
| 36 | + |
| 37 | +``` |
| 38 | +create index idx_t1_1 on t1 (id) include(c1,c2,c3,info,crt_time); |
| 39 | + |
| 40 | +create index idx_t2_1 on t2 (id,c1,c2,c3,info,crt_time); |
| 41 | +``` |
| 42 | + |
| 43 | +以上两个索引的差异在哪里? |
| 44 | + |
| 45 | +索引1,KEY是ID,在叶子节点中,存入KEY与(c1,c2,c3,info,crt_time)的内容。 |
| 46 | + |
| 47 | +索引2,KEY是(id,c1,c2,c3,info,crt_time),在所有节点中,存储的都是所有字段的值,比索引1要重,包括空间,索引维护,更新等。 |
| 48 | + |
| 49 | +应用举例: |
| 50 | + |
| 51 | +[《PostgreSQL IoT,车联网 - 实时轨迹、行程实践 2 - (含index only scan类聚簇表效果)》](../201812/20181209_01.md) |
| 52 | + |
| 53 | +[《PostgreSQL IoT,车联网 - 实时轨迹、行程实践 1》](../201812/20181207_01.md) |
| 54 | + |
| 55 | +## index include例子 |
| 56 | +对比三种情况(index , index only (full index) , index only (include) )的性能。 |
| 57 | + |
| 58 | +写入1000万数据,1000个KEY值,平均每个KEY值对应10000条数据,并且这1万行离散存储。 |
| 59 | + |
| 60 | +例如共享单车的轨迹,每个轨迹都是独立的点组成,同时有很多的单车在活动,所以存储到数据库时,每个单车的同一个轨迹的所有点实际上是离散存储在HEAP BLOCK中的。与本文涉及的内容相似。 |
| 61 | + |
| 62 | +1、include |
| 63 | + |
| 64 | +``` |
| 65 | +create table t1 (id int, c1 int, c2 int, c3 int, info text, crt_time timestamp); |
| 66 | + |
| 67 | +create index idx_t1_1 on t1 (id) include(c1,c2,c3,info,crt_time); |
| 68 | + |
| 69 | +postgres=# insert into t1 select (1000*random())::int,1,1,1,'test',now() from generate_series(1,10000000); |
| 70 | +INSERT 0 10000000 |
| 71 | +Time: 40343.081 ms (00:40.343) |
| 72 | +``` |
| 73 | + |
| 74 | +2、full index |
| 75 | + |
| 76 | +``` |
| 77 | +create table t2(like t1); |
| 78 | +create index idx_t2_1 on t2 (id,c1,c2,c3,info,crt_time); |
| 79 | + |
| 80 | +postgres=# insert into t2 select * from t1; |
| 81 | +INSERT 0 10000000 |
| 82 | +Time: 52042.389 ms (00:52.042) |
| 83 | +``` |
| 84 | + |
| 85 | +3、index |
| 86 | + |
| 87 | +``` |
| 88 | +create table t3(like t1); |
| 89 | +create index idx_t3_1 on t3(id); |
| 90 | +postgres=# insert into t3 select * from t1; |
| 91 | +INSERT 0 10000000 |
| 92 | +Time: 32631.633 ms (00:32.632) |
| 93 | +``` |
| 94 | + |
| 95 | +``` |
| 96 | +vacuum analyze t1; |
| 97 | +vacuum analyze t2; |
| 98 | +vacuum analyze t3; |
| 99 | +``` |
| 100 | + |
| 101 | +4、查询效率 |
| 102 | + |
| 103 | +``` |
| 104 | +postgres=# explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t1 where id=1; |
| 105 | + QUERY PLAN |
| 106 | +----------------------------------------------------------------------------------------------------------------------------------- |
| 107 | + Index Only Scan using idx_t1_1 on public.t1 (cost=0.43..236.40 rows=9901 width=29) (actual time=0.011..1.292 rows=10040 loops=1) |
| 108 | + Output: id, c1, c2, c3, info, crt_time |
| 109 | + Index Cond: (t1.id = 1) |
| 110 | + Heap Fetches: 0 |
| 111 | + Buffers: shared hit=62 |
| 112 | + Planning Time: 0.030 ms |
| 113 | + Execution Time: 1.833 ms |
| 114 | +(7 rows) |
| 115 | +``` |
| 116 | + |
| 117 | +``` |
| 118 | +postgres=# explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t2 where id=1; |
| 119 | + QUERY PLAN |
| 120 | +----------------------------------------------------------------------------------------------------------------------------------- |
| 121 | + Index Only Scan using idx_t2_1 on public.t2 (cost=0.56..238.42 rows=9946 width=29) (actual time=0.031..1.504 rows=10040 loops=1) |
| 122 | + Output: id, c1, c2, c3, info, crt_time |
| 123 | + Index Cond: (t2.id = 1) |
| 124 | + Heap Fetches: 0 |
| 125 | + Buffers: shared hit=63 |
| 126 | + Planning Time: 0.078 ms |
| 127 | + Execution Time: 2.077 ms |
| 128 | +(7 rows) |
| 129 | +``` |
| 130 | + |
| 131 | +``` |
| 132 | +postgres=# explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t3 where id=1; |
| 133 | + QUERY PLAN |
| 134 | +-------------------------------------------------------------------------------------------------------------------------- |
| 135 | + Bitmap Heap Scan on public.t3 (cost=107.26..10153.94 rows=9952 width=29) (actual time=3.061..17.160 rows=10040 loops=1) |
| 136 | + Output: id, c1, c2, c3, info, crt_time |
| 137 | + Recheck Cond: (t3.id = 1) |
| 138 | + Heap Blocks: exact=9392 |
| 139 | + Buffers: shared hit=9420 |
| 140 | + -> Bitmap Index Scan on idx_t3_1 (cost=0.00..104.78 rows=9952 width=0) (actual time=1.618..1.618 rows=10040 loops=1) |
| 141 | + Index Cond: (t3.id = 1) |
| 142 | + Buffers: shared hit=28 |
| 143 | + Planning Time: 0.085 ms |
| 144 | + Execution Time: 17.768 ms |
| 145 | +(10 rows) |
| 146 | + |
| 147 | +Time: 18.204 ms |
| 148 | + |
| 149 | +postgres=# set enable_bitmapscan=off; |
| 150 | + |
| 151 | +postgres=# explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t3 where id=1; |
| 152 | + QUERY PLAN |
| 153 | +--------------------------------------------------------------------------------------------------------------------------------- |
| 154 | + Index Scan using idx_t3_1 on public.t3 (cost=0.43..10457.29 rows=9952 width=29) (actual time=0.028..12.610 rows=10040 loops=1) |
| 155 | + Output: id, c1, c2, c3, info, crt_time |
| 156 | + Index Cond: (t3.id = 1) |
| 157 | + Buffers: shared hit=9420 |
| 158 | + Planning Time: 0.087 ms |
| 159 | + Execution Time: 13.204 ms |
| 160 | +(6 rows) |
| 161 | +Time: 13.511 ms |
| 162 | +``` |
| 163 | + |
| 164 | +5、高并发查询性能对比 |
| 165 | + |
| 166 | +``` |
| 167 | +vi test1.sql |
| 168 | +\set id random(1,1000) |
| 169 | +select id,c1,c2,c3,info,crt_time from t1 where id=:id; |
| 170 | + |
| 171 | +vi test2.sql |
| 172 | +\set id random(1,1000) |
| 173 | +select id,c1,c2,c3,info,crt_time from t2 where id=:id; |
| 174 | + |
| 175 | +vi test3.sql |
| 176 | +\set id random(1,1000) |
| 177 | +select id,c1,c2,c3,info,crt_time from t3 where id=:id; |
| 178 | + |
| 179 | + |
| 180 | +alter role all set enable_bitmapscan =off; |
| 181 | +``` |
| 182 | + |
| 183 | +5\.1、index only scan(index include) |
| 184 | + |
| 185 | +``` |
| 186 | +%Cpu(s): 32.7 us, 30.0 sy, 0.0 ni, 37.3 id |
| 187 | + |
| 188 | +transaction type: ./test.sql |
| 189 | +scaling factor: 1 |
| 190 | +query mode: prepared |
| 191 | +number of clients: 56 |
| 192 | +number of threads: 56 |
| 193 | +duration: 120 s |
| 194 | +number of transactions actually processed: 263335 |
| 195 | +latency average = 25.519 ms |
| 196 | +latency stddev = 7.470 ms |
| 197 | +tps = 2193.947905 (including connections establishing) |
| 198 | +tps = 2194.053590 (excluding connections establishing) |
| 199 | +statement latencies in milliseconds: |
| 200 | + 0.001 \set id random(1,1000) |
| 201 | + 25.518 select id,c1,c2,c3,info,crt_time from t1 where id=:id; |
| 202 | +``` |
| 203 | + |
| 204 | +5\.2、index only scan(full index) |
| 205 | + |
| 206 | +``` |
| 207 | +%Cpu(s): 32.6 us, 30.1 sy, 0.0 ni, 37.3 id |
| 208 | + |
| 209 | +transaction type: ./test.sql |
| 210 | +scaling factor: 1 |
| 211 | +query mode: prepared |
| 212 | +number of clients: 56 |
| 213 | +number of threads: 56 |
| 214 | +duration: 120 s |
| 215 | +number of transactions actually processed: 262858 |
| 216 | +latency average = 25.565 ms |
| 217 | +latency stddev = 7.574 ms |
| 218 | +tps = 2189.965138 (including connections establishing) |
| 219 | +tps = 2190.073948 (excluding connections establishing) |
| 220 | +statement latencies in milliseconds: |
| 221 | + 0.001 \set id random(1,1000) |
| 222 | + 25.564 select id,c1,c2,c3,info,crt_time from t2 where id=:id; |
| 223 | +``` |
| 224 | + |
| 225 | +5\.3、index scan(key only) |
| 226 | + |
| 227 | +``` |
| 228 | +%Cpu(s): 59.4 us, 12.6 sy, 0.0 ni, 28.0 id |
| 229 | + |
| 230 | +scaling factor: 1 |
| 231 | +query mode: prepared |
| 232 | +number of clients: 56 |
| 233 | +number of threads: 56 |
| 234 | +duration: 120 s |
| 235 | +number of transactions actually processed: 198793 |
| 236 | +latency average = 33.804 ms |
| 237 | +latency stddev = 9.839 ms |
| 238 | +tps = 1656.139982 (including connections establishing) |
| 239 | +tps = 1656.227526 (excluding connections establishing) |
| 240 | +statement latencies in milliseconds: |
| 241 | + 0.001 \set id random(1,1000) |
| 242 | + 33.803 select id,c1,c2,c3,info,crt_time from t3 where id=:id; |
| 243 | +``` |
| 244 | + |
| 245 | +## 小结 |
| 246 | +index include 应用场景 |
| 247 | + |
| 248 | +当一次SQL请求需要返回较多行,或者需要扫描较多行(即使使用索引)时,如果这些行在HEAP表中并非密集存储,而是非常离散的存储,那么扫描的记录数越多,访问的BLOCK就越多,性能会比较差。 |
| 249 | + |
| 250 | + |
| 251 | + |
| 252 | +index include技术,将key值以外的数据存储在index leaf page中,不需要回表就可以查询到这些数据,提高整体性能(同时又不需要将所有属性都放在KEY中,使得索引臃肿)。 |
| 253 | + |
| 254 | + |
| 255 | + |
| 256 | +例如共享单车的轨迹,每个轨迹都是独立的点组成,同时有很多的单车在活动,所以存储到数据库时,每个单车的同一个轨迹的所有点实际上是离散存储在HEAP BLOCK中的。与本文涉及的内容相似。 |
| 257 | + |
| 258 | +性能对比: |
| 259 | + |
| 260 | +索引 | 写入1000万耗时 | KEY值搜索qps | CPU |
| 261 | +---|---|--- |
| 262 | +index(key + include) | 40.3 | 2193 | 62.7% |
| 263 | +index(full index) | 52 | 2189 | 62.7% |
| 264 | +index(key only) | 32.6 | 1656 | 72% |
| 265 | + |
| 266 | +## 参考 |
| 267 | +[《PostgreSQL 12 preview - GiST 索引支持INCLUDE columns - 覆盖索引 - 类聚簇索引》](../201903/20190331_08.md) |
| 268 | + |
| 269 | +[《PostgreSQL 10.0 preview 功能增强 - 唯一约束+附加字段组合功能索引 - 覆盖索引 - covering index》](../201703/20170312_23.md) |
| 270 | + |
| 271 | +[《PostgreSQL IoT,车联网 - 实时轨迹、行程实践 2 - (含index only scan类聚簇表效果)》](../201812/20181209_01.md) |
| 272 | + |
| 273 | +[《PostgreSQL IoT,车联网 - 实时轨迹、行程实践 1》](../201812/20181207_01.md) |
| 274 | + |
| 275 | +[《PostgreSQL 9种索引的原理和应用场景》](../201706/20170627_01.md) |
| 276 | + |
| 277 | +[《深入浅出PostgreSQL B-Tree索引结构》](../201605/20160528_01.md) |
| 278 | + |
| 279 | +https://use-the-index-luke.com/blog/2019-04/include-columns-in-btree-indexes |
| 280 | + |
| 281 | + |
| 282 | +<a rel="nofollow" href="http://info.flagcounter.com/h9V1" ><img src="http://s03.flagcounter.com/count/h9V1/bg_FFFFFF/txt_000000/border_CCCCCC/columns_2/maxflags_12/viewers_0/labels_0/pageviews_0/flags_0/" alt="Flag Counter" border="0" ></a> |
| 283 | + |
| 284 | + |
| 285 | +## [digoal's 大量PostgreSQL文章入口](https://github.com/digoal/blog/blob/master/README.md "22709685feb7cab07d30f30387f0a9ae") |
| 286 | + |
| 287 | + |
| 288 | +## [免费领取阿里云RDS PostgreSQL实例、ECS虚拟机](https://free.aliyun.com/ "57258f76c37864c6e6d23383d05714ea") |
| 289 | + |
0 commit comments