|
| 1 | +## PostgreSQL 11 preview - 新功能, 分区表全局索引管理 |
| 2 | + |
| 3 | +### 作者 |
| 4 | +digoal |
| 5 | + |
| 6 | +### 日期 |
| 7 | +2018-02-04 |
| 8 | + |
| 9 | +### 标签 |
| 10 | +PostgreSQL , 分区表 , 全局索引 , 分区索引 , 壳子索引 |
| 11 | + |
| 12 | +---- |
| 13 | + |
| 14 | +## 背景 |
| 15 | +PostgreSQL 10开始引入了原生支持的分区表,支持了range, list分区。 |
| 16 | + |
| 17 | +PostgreSQL 11对分区表的功能进行了增强,首先增加了hash分区表。其次,增加了分区表全局索引管理(PostgreSQL 10 分区表的主表上不支持索引,如果要建索引,不得不在每个分区上去定义,不太方便的。)。 |
| 18 | + |
| 19 | +以list分区为例,我们看看如何使用分区表的全局管理索引。 |
| 20 | + |
| 21 | +``` |
| 22 | +CREATE TABLE cities ( |
| 23 | + city_id bigserial not null, |
| 24 | + name text not null, |
| 25 | + population bigint |
| 26 | +) PARTITION BY LIST (left(lower(name), 1)); |
| 27 | + |
| 28 | +CREATE TABLE cities_ab |
| 29 | + PARTITION OF cities ( |
| 30 | + CONSTRAINT city_id_nonzero CHECK (city_id != 0) |
| 31 | +) FOR VALUES IN ('a', 'b'); |
| 32 | +``` |
| 33 | + |
| 34 | +## 创建分区表全局索引 |
| 35 | +1、创建全局索引 |
| 36 | + |
| 37 | +``` |
| 38 | +create index idx_cities_1 on cities(name); |
| 39 | +``` |
| 40 | + |
| 41 | +对现有分区,会自动添加这个索引 |
| 42 | + |
| 43 | +``` |
| 44 | +postgres=# \d+ cities |
| 45 | + Table "public.cities" |
| 46 | + Column | Type | Collation | Nullable | Default | Storage | Stats target | Description |
| 47 | +------------+--------+-----------+----------+-----------------------------------------+----------+--------------+------------- |
| 48 | + city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) | plain | | |
| 49 | + name | text | | not null | | extended | | |
| 50 | + population | bigint | | | | plain | | |
| 51 | +Partition key: LIST ("left"(lower(name), 1)) |
| 52 | +Indexes: |
| 53 | + "idx_cities_1" btree (name) |
| 54 | +Partitions: cities_ab FOR VALUES IN ('a', 'b') |
| 55 | + |
| 56 | +postgres=# \d cities_ab |
| 57 | + Table "public.cities_ab" |
| 58 | + Column | Type | Collation | Nullable | Default |
| 59 | +------------+--------+-----------+----------+----------------------------------------- |
| 60 | + city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) |
| 61 | + name | text | | not null | |
| 62 | + population | bigint | | | |
| 63 | +Partition of: cities FOR VALUES IN ('a', 'b') |
| 64 | +Indexes: |
| 65 | + "cities_ab_name_idx" btree (name) |
| 66 | +Check constraints: |
| 67 | + "city_id_nonzero" CHECK (city_id <> 0) |
| 68 | +``` |
| 69 | + |
| 70 | +元数据中,主表索引显示为'I'类型,分区上的索引显示为'i'类型。 |
| 71 | + |
| 72 | +``` |
| 73 | +postgres=# select relname,relkind from pg_class where relname ~ 'citi'; |
| 74 | + relname | relkind |
| 75 | +--------------------+--------- |
| 76 | + idx_cities_1 | I # 分区表 - 主表上的索引 |
| 77 | + cities_ab | r # 分区表 - 分区表 |
| 78 | + cities_ab_name_idx | i # 分区表 - 分区表上的索引 |
| 79 | + cities_city_id_seq | S # 序列 |
| 80 | + cities | p # 分区表 - 主表 |
| 81 | +(5 rows) |
| 82 | +``` |
| 83 | + |
| 84 | +2、新增分区,自动添加索引(如果这个分区上已经包含了同样定义的索引,那么会自动将这个索引attach到主表的索引中,而不会新建这个索引) |
| 85 | + |
| 86 | +``` |
| 87 | +CREATE TABLE cities_cd |
| 88 | + PARTITION OF cities ( |
| 89 | + CONSTRAINT city_id_nonzero CHECK (city_id != 0) |
| 90 | +) FOR VALUES IN ('c', 'd'); |
| 91 | +``` |
| 92 | + |
| 93 | +自动索引如下(cities_cd_name_idx): |
| 94 | + |
| 95 | +``` |
| 96 | +postgres=# \d+ cities_cd |
| 97 | + Table "public.cities_cd" |
| 98 | + Column | Type | Collation | Nullable | Default | Storage | Stats target | Description |
| 99 | +------------+--------+-----------+----------+-----------------------------------------+----------+--------------+------------- |
| 100 | + city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) | plain | | |
| 101 | + name | text | | not null | | extended | | |
| 102 | + population | bigint | | | | plain | | |
| 103 | +Partition of: cities FOR VALUES IN ('c', 'd') |
| 104 | +Partition constraint: (("left"(lower(name), 1) IS NOT NULL) AND ("left"(lower(name), 1) = ANY (ARRAY['c'::text, 'd'::text]))) |
| 105 | +Indexes: |
| 106 | + "cities_cd_name_idx" btree (name) |
| 107 | +Check constraints: |
| 108 | + "city_id_nonzero" CHECK (city_id <> 0) |
| 109 | +``` |
| 110 | + |
| 111 | +## 创建分区表 主表全局壳子索引 |
| 112 | +有些时候,不希望所有的分区表都自动创建某些索引,但是又想统一管理全局索引,怎么办? |
| 113 | + |
| 114 | +1、我们可以定义壳子索引,使用ONLY这个关键字,表示这个索引只建在当前这个表上。 |
| 115 | + |
| 116 | +``` |
| 117 | +postgres=# create index idx_cities_2 on only cities (population); |
| 118 | +CREATE INDEX |
| 119 | +``` |
| 120 | + |
| 121 | +这种方法创建出来的索引为INVALID索引。 |
| 122 | + |
| 123 | +``` |
| 124 | +postgres=# \d+ cities |
| 125 | + Table "public.cities" |
| 126 | + Column | Type | Collation | Nullable | Default | Storage | Stats target | Description |
| 127 | +------------+--------+-----------+----------+-----------------------------------------+----------+--------------+------------- |
| 128 | + city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) | plain | | |
| 129 | + name | text | | not null | | extended | | |
| 130 | + population | bigint | | | | plain | | |
| 131 | +Partition key: LIST ("left"(lower(name), 1)) |
| 132 | +Indexes: |
| 133 | + "idx_cities_1" btree (name) |
| 134 | + "idx_cities_2" btree (population) INVALID |
| 135 | +Partitions: cities_ab FOR VALUES IN ('a', 'b'), |
| 136 | + cities_cd FOR VALUES IN ('c', 'd') |
| 137 | +``` |
| 138 | + |
| 139 | +并且这个索引,也不会在分区上构建 |
| 140 | + |
| 141 | +``` |
| 142 | +postgres=# \d cities_ab |
| 143 | + Table "public.cities_ab" |
| 144 | + Column | Type | Collation | Nullable | Default |
| 145 | +------------+--------+-----------+----------+----------------------------------------- |
| 146 | + city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) |
| 147 | + name | text | | not null | |
| 148 | + population | bigint | | | |
| 149 | +Partition of: cities FOR VALUES IN ('a', 'b') |
| 150 | +Indexes: |
| 151 | + "cities_ab_name_idx" btree (name) |
| 152 | +Check constraints: |
| 153 | + "city_id_nonzero" CHECK (city_id <> 0) |
| 154 | +``` |
| 155 | + |
| 156 | +## 将分区表分区索引绑定到主表全局索引 |
| 157 | + |
| 158 | +1、比如我们有一些分区想构建某个字段的索引,而其他分区确不想构建(比如对于某些分区的数据,我们不想按population字段查询,没有必要建立那些分区的索引) |
| 159 | + |
| 160 | +``` |
| 161 | +postgres=# create index idx_cities_ab_2 on cities_ab (population); |
| 162 | +CREATE INDEX |
| 163 | +``` |
| 164 | + |
| 165 | +2、将这个分区上的索引,绑定到INVALID的全局壳子索引下面 |
| 166 | + |
| 167 | +``` |
| 168 | +postgres=# alter index idx_cities_2 attach partition idx_cities_ab_2; |
| 169 | +ALTER INDEX |
| 170 | +``` |
| 171 | + |
| 172 | +``` |
| 173 | +postgres=# \d+ cities |
| 174 | + Table "public.cities" |
| 175 | + Column | Type | Collation | Nullable | Default | Storage | Stats target | Description |
| 176 | +------------+--------+-----------+----------+-----------------------------------------+----------+--------------+------------- |
| 177 | + city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) | plain | | |
| 178 | + name | text | | not null | | extended | | |
| 179 | + population | bigint | | | | plain | | |
| 180 | +Partition key: LIST ("left"(lower(name), 1)) |
| 181 | +Indexes: |
| 182 | + "idx_cities_1" btree (name) |
| 183 | + "idx_cities_2" btree (population) INVALID |
| 184 | +Partitions: cities_ab FOR VALUES IN ('a', 'b'), |
| 185 | + cities_cd FOR VALUES IN ('c', 'd') |
| 186 | +``` |
| 187 | + |
| 188 | +3、那么现在的索引结构是这样的 |
| 189 | + |
| 190 | +``` |
| 191 | +postgres=# select relname,relkind from pg_class where relname ~ 'citi'; |
| 192 | + relname | relkind |
| 193 | +--------------------+--------- |
| 194 | + idx_cities_1 | I |
| 195 | + cities_ab | r |
| 196 | + cities_ab_name_idx | i |
| 197 | + cities_cd_name_idx | i |
| 198 | + cities_cd | r |
| 199 | + idx_cities_2 | I |
| 200 | + idx_cities_ab_2 | i |
| 201 | + cities_city_id_seq | S |
| 202 | + cities | p |
| 203 | +(9 rows) |
| 204 | +``` |
| 205 | + |
| 206 | +全局壳子索引的好处: |
| 207 | + |
| 208 | +1、新增的分区表,不会自动创建这个索引,而是需要手工添加。满足一些特殊需求(例如有些分区不会检索某个字段,而其他分区需要检索,那么可以区别对待。)、 |
| 209 | + |
| 210 | +## 全局索引管理的好处 |
| 211 | + |
| 212 | +1、新增分区自动创建全局索引一样的索引(如果分区中已经包含了一个定义一致的本地索引,那么这个索引会自动attach到全局索引下,并被其管理)。 |
| 213 | + |
| 214 | +2、删除全局索引时,自动删除已经attach在这个全局索引下面的所有索引。 |
| 215 | + |
| 216 | +``` |
| 217 | +postgres=# drop index idx_cities_1; |
| 218 | +DROP INDEX |
| 219 | +postgres=# \d cities |
| 220 | + Table "public.cities" |
| 221 | + Column | Type | Collation | Nullable | Default |
| 222 | +------------+--------+-----------+----------+----------------------------------------- |
| 223 | + city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) |
| 224 | + name | text | | not null | |
| 225 | + population | bigint | | | |
| 226 | +Partition key: LIST ("left"(lower(name), 1)) |
| 227 | +Indexes: |
| 228 | + "idx_cities_2" btree (population) INVALID |
| 229 | +Number of partitions: 2 (Use \d+ to list them.) |
| 230 | + |
| 231 | +postgres=# \d cities_ab |
| 232 | + Table "public.cities_ab" |
| 233 | + Column | Type | Collation | Nullable | Default |
| 234 | +------------+--------+-----------+----------+----------------------------------------- |
| 235 | + city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) |
| 236 | + name | text | | not null | |
| 237 | + population | bigint | | | |
| 238 | +Partition of: cities FOR VALUES IN ('a', 'b') |
| 239 | +Indexes: |
| 240 | + "idx_cities_ab_2" btree (population) |
| 241 | +Check constraints: |
| 242 | + "city_id_nonzero" CHECK (city_id <> 0) |
| 243 | + |
| 244 | +postgres=# \d cities_cd |
| 245 | + Table "public.cities_cd" |
| 246 | + Column | Type | Collation | Nullable | Default |
| 247 | +------------+--------+-----------+----------+----------------------------------------- |
| 248 | + city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) |
| 249 | + name | text | | not null | |
| 250 | + population | bigint | | | |
| 251 | +Partition of: cities FOR VALUES IN ('c', 'd') |
| 252 | +Check constraints: |
| 253 | + "city_id_nonzero" CHECK (city_id <> 0) |
| 254 | + |
| 255 | +``` |
| 256 | + |
| 257 | +在另一个分区上,创建一个本地索引。并且不将这个索引attach到主表的全局壳子索引上。 |
| 258 | + |
| 259 | +那么在删除主表索引时,这个本地索引不会被自动删除。 |
| 260 | + |
| 261 | +``` |
| 262 | +postgres=# create index idx_cities_cd_2 on cities_cd (population ); |
| 263 | +CREATE INDEX |
| 264 | +postgres=# drop index idx_cities_2; |
| 265 | +DROP INDEX |
| 266 | +postgres=# \d cities |
| 267 | + Table "public.cities" |
| 268 | + Column | Type | Collation | Nullable | Default |
| 269 | +------------+--------+-----------+----------+----------------------------------------- |
| 270 | + city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) |
| 271 | + name | text | | not null | |
| 272 | + population | bigint | | | |
| 273 | +Partition key: LIST ("left"(lower(name), 1)) |
| 274 | +Number of partitions: 2 (Use \d+ to list them.) |
| 275 | + |
| 276 | +postgres=# \d cities_ab |
| 277 | + Table "public.cities_ab" |
| 278 | + Column | Type | Collation | Nullable | Default |
| 279 | +------------+--------+-----------+----------+----------------------------------------- |
| 280 | + city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) |
| 281 | + name | text | | not null | |
| 282 | + population | bigint | | | |
| 283 | +Partition of: cities FOR VALUES IN ('a', 'b') |
| 284 | +Check constraints: |
| 285 | + "city_id_nonzero" CHECK (city_id <> 0) |
| 286 | + |
| 287 | +postgres=# \d cities_cd |
| 288 | + Table "public.cities_cd" |
| 289 | + Column | Type | Collation | Nullable | Default |
| 290 | +------------+--------+-----------+----------+----------------------------------------- |
| 291 | + city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) |
| 292 | + name | text | | not null | |
| 293 | + population | bigint | | | |
| 294 | +Partition of: cities FOR VALUES IN ('c', 'd') |
| 295 | +Indexes: |
| 296 | + "idx_cities_cd_2" btree (population) |
| 297 | +Check constraints: |
| 298 | + "city_id_nonzero" CHECK (city_id <> 0) |
| 299 | +``` |
| 300 | + |
| 301 | +## 分区索引好处 |
| 302 | + |
| 303 | +分区索引不被全局管理,比较灵活,但是管理麻烦一些。 |
| 304 | + |
| 305 | +好处是不同的分区可以拥有不同的索引定义。在一些数据特殊管理的场景很灵活。 |
| 306 | + |
| 307 | +例如某些分区会经常检索A字段,但是某些分区确经常检索B字段,那么就可以为不同的分区创建不同的本地索引,而不是全局一起构建。可以节约一些成本,但是管理成本可能会上升。 |
| 308 | + |
| 309 | +## 元信息 |
| 310 | + |
| 311 | +元数据中,主表索引显示为'I'类型,分区上的索引显示为'i'类型。 |
| 312 | + |
| 313 | +``` |
| 314 | +postgres=# select relname,relkind from pg_class where relname ~ 'citi'; |
| 315 | + relname | relkind |
| 316 | +--------------------+--------- |
| 317 | + idx_cities_1 | I # 分区表 - 主表上的索引 |
| 318 | + cities_ab | r # 分区表 - 分区表 |
| 319 | + cities_ab_name_idx | i # 分区表 - 分区表上的索引 |
| 320 | + cities_city_id_seq | S # 序列 |
| 321 | + cities | p # 分区表 - 主表 |
| 322 | +(5 rows) |
| 323 | +``` |
| 324 | + |
| 325 | +解释relkind中的含义: |
| 326 | + |
| 327 | +``` |
| 328 | +I = partiton table global index, 分区表 - 主表上的索引 |
| 329 | +r = ordinary table, 普通表 |
| 330 | +i = index, 普通表的索引,或者分区表分区上的本地索引 |
| 331 | +S = sequence, 序列 |
| 332 | +t = TOAST table, 切片表(变长字段压缩后超过BLOCK的1/4时,会存到切片表中) |
| 333 | +v = view, 视图 |
| 334 | +m = materialized view, 物化视图 |
| 335 | +c = composite type, 自定义复合类型 |
| 336 | +f = foreign table, 外部表 |
| 337 | +p = partitioned table 分区表 - 主表 |
| 338 | +``` |
| 339 | + |
| 340 | + |
| 341 | +## 小结 |
| 342 | +PostgreSQL 11 允许对分区表的主表创建索引,并内部支持了全局索引的管理机制(包括依赖、自动attach等)。 |
| 343 | + |
| 344 | +全局索引带来的好处, |
| 345 | + |
| 346 | +1、当删除全局索引时,会将所有分区的索引自动删除。 |
| 347 | + |
| 348 | +2、当新增分区时,会自动创建与全局索引一致的索引(假设这个分区已有这样的索引,那么会自动将已有索引attach到全局索引下,便于管理) |
| 349 | + |
| 350 | +3、对分区表创建索引时,会自动在所有分区下创建本地索引,并将分区本地索引ATTACH到全局索引下便于管理。 |
| 351 | + |
| 352 | +另外,全局索引还有一种壳子模式,即使用only关键字创建,它的好处: |
| 353 | + |
| 354 | +1、使用ONLY关键字在分区表-主表上创建索引时,索引状态为INVALID,表示新建分区时,不会自动添加这个同类结构的索引,也不会自动ATTACH已有索引到这个全局索引下面。 |
| 355 | + |
| 356 | +2、使用全局索引的壳子模式,对不同的分区采用不同的索引,仅对需要用于管理的索引ATTACH到这个壳子中来管理。可以满足一些特殊业务场景(例如某些分区会经常检索A字段,但是某些分区确经常检索B字段,那么就可以为不同的分区创建不同的本地索引,而不是全局一起构建。可以节约一些成本,但是管理成本可能会上升。)。 |
| 357 | + |
| 358 | +PostgreSQL 11的全局索引使用非常灵活。 |
| 359 | + |
| 360 | +## 参考 |
| 361 | + |
| 362 | +https://www.postgresql.org/docs/devel/static/sql-alterindex.html |
| 363 | + |
| 364 | +https://www.postgresql.org/docs/devel/static/sql-createindex.html |
| 365 | + |
| 366 | +https://www.postgresql.org/docs/devel/static/catalog-pg-class.html |
| 367 | + |
| 368 | + |
0 commit comments