Commit a279540

gin
1 parent a1c069f commit a279540

17 files changed: +999 -5 lines changed

201702/20170203_01.md

Lines changed: 17 additions & 5 deletions
@@ -14,15 +14,15 @@ PostgreSQL , gin , in , or , multi key , right link scan , skip scan
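## Background
PostgreSQL has an index type called GIN that is widely used for multi-valued types such as arrays and tokenized text, and is also used for fuzzy queries and similar workloads.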

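- A GIN index takes the values out of a column (for example an array or full-text search type) and stores them in a tree structure (similar to a B+tree: value + row pointers). For high-frequency values, to reduce the depth of the tree, the row pointers are stored in separate pages.
+ A GIN index takes the values out of a column (for example an array or full-text search type) and stores them in a tree structure (similar to a B-tree: key + heap row pointers). For low-frequency values, the row pointers are stored as a posting list directly in the leaf node of the GIN tree. For high-frequency values, the row pointers are stored in a separate tree structure (the posting tree), and the GIN leaf node stores a pointer to that posting tree.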

![pic](../201612/20161231_01_pic_001.jpg)

![pic](../201612/20161231_01_pic_002.jpg)

![pic](../201612/20161231_01_pic_003.jpg)

- GIN is essentially a tree keyed by element, whose value is either a pointer to a posting tree or a posting list.
+ GIN is essentially a tree keyed by element, whose value is either a "posting tree pointer" or a "posting list".

```
Internally, a GIN index contains a B-tree index constructed over keys,
@@ -31,12 +31,12 @@ where each key is an element of one or more indexed items (a member of an array,
and where each tuple in a leaf page contains either
- a pointer to a B-tree of heap pointers (a “posting tree”), // typically when key + ctids > 2000 bytes
+ a pointer to a B-tree of heap pointers (a “posting tree”), /

- or a simple list of heap pointers (a “posting list”) when the list is small enough to fit into a single index tuple along with the key value. // typically when key + ctids < 2000 bytes
+ or a simple list of heap pointers (a “posting list”) when the list is small enough to fit into a single index tuple along with the key value.
```

- For a detailed introduction to GIN, see
+ For some introductory material on GIN, see

[《Starting from tricky fuzzy queries - one of PostgreSQL's unique tricks: principles and technical background of the GIN , GiST , SP-GiST , RUM indexes》](../201612/20161231_01.md)
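
Editor's illustration (not part of the commit): the layout described in this hunk, a tree keyed by element whose value is either a posting list or a pointer to a posting tree, can be pictured with a toy inverted index. This is only a sketch under loud assumptions: `ToyGin`, `POSTING_LIST_MAX`, and the row-count threshold are all hypothetical names and values. Real GIN decides between the two forms by whether the list fits in a single index tuple, not by a fixed count, and it stores entries in a B-tree rather than a Python dict.

```python
from bisect import insort
from collections import defaultdict

# Hypothetical threshold, counted in row ids. Real GIN switches forms
# based on index tuple size, not on a fixed number of entries.
POSTING_LIST_MAX = 4


class ToyGin:
    """Toy inverted index: element (key) -> sorted list of row ids."""

    def __init__(self):
        self.entries = defaultdict(list)

    def insert(self, row_id, elements):
        # One indexed row contributes one posting per distinct element.
        for elem in set(elements):
            insort(self.entries[elem], row_id)

    def storage_kind(self, key):
        # Label the form the text describes for this key's postings.
        n = len(self.entries.get(key, []))
        if n == 0:
            return None
        return "posting list" if n <= POSTING_LIST_MAX else "posting tree"

    def search_any(self, keys):
        # Rows containing at least one key: union of the postings.
        hits = set()
        for k in keys:
            hits.update(self.entries.get(k, []))
        return sorted(hits)

    def search_all(self, keys):
        # Rows containing every key: intersection of the postings.
        lists = [set(self.entries.get(k, [])) for k in keys]
        return sorted(set.intersection(*lists)) if lists else []
```

Inserting rows `1: [pg, gin]`, `2: [gin, btree]`, `3: [gin]` makes `gin` the high-frequency key. Once its posting count passes the hypothetical threshold, `storage_kind("gin")` reports the posting tree form while the rarer keys stay as posting lists.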

@@ -236,6 +236,18 @@ Heavily modified from Alexander Korotkov's fast scan patch.

[《Recovering 618 seconds of lost youth with PostgreSQL - recursive convergence optimization》](../201612/20161201_01.md)

+ ## Posting list compression
+ Posting list compression is also one of the optimizations made to GIN in 9.4.
+
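+ ## fastupdate and pending list optimization
+ Inserting or updating a multi-valued value can touch several keys in the GIN index, so the I/O cost is high. To reduce this I/O and speed up writes and updates, the pending list structure was introduced. It works like a buffer: its data is not organized as a tree, so I/O can be merged effectively, and the speedup is very noticeable.
+
+ Note, however, that the pending list reduces query efficiency somewhat, especially when it holds a large amount of data. VACUUM can be run manually to merge the pending list into the GIN tree.
+
+ Alternatively, the merge is triggered once the pending list fills up, or it is performed by autovacuum.
+
+ https://www.postgresql.org/docs/9.6/static/gin-tips.html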
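
Editor's illustration (not part of the commit): the trade-off described in the added section can be sketched with a toy model. Writes land in an unordered pending list, a merge moves them into the main structure once a limit is exceeded, and every search has to scan the pending list as well. `ToyFastUpdateIndex` and `pending_limit` are hypothetical names, and the limit here is counted in rows purely as an assumption. This is not how PostgreSQL implements the feature, only a picture of why writes get cheaper and reads get somewhat slower.

```python
from collections import defaultdict


class ToyFastUpdateIndex:
    """Toy model of fastupdate: writes are buffered in a pending list."""

    def __init__(self, pending_limit=3):
        self.pending_limit = pending_limit  # hypothetical limit, in rows
        self.pending = []                   # (row_id, elements), arrival order
        self.tree = defaultdict(set)        # key -> row ids; stands in for the GIN tree
        self.merges = 0                     # number of merges performed

    def insert(self, row_id, elements):
        # Cheap write: a single append, no per-key tree update.
        self.pending.append((row_id, tuple(elements)))
        if len(self.pending) > self.pending_limit:
            self.flush()

    def flush(self):
        # Merge buffered rows into the main structure in one batch,
        # the role the text assigns to VACUUM or a full pending list.
        if not self.pending:
            return
        for row_id, elements in self.pending:
            for elem in elements:
                self.tree[elem].add(row_id)
        self.pending.clear()
        self.merges += 1

    def search(self, key):
        hits = set(self.tree.get(key, ()))
        # The pending list is unordered, so every buffered row is scanned.
        hits.update(r for r, elems in self.pending if key in elems)
        return sorted(hits)
```

In this sketch the cost of `search` grows with `len(self.pending)`, which mirrors the warning above about a large pending list. Calling `flush()` by hand plays the role of a manual VACUUM.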
+
## Others
btree_gin

201702/20170204_01.md

Lines changed: 980 additions & 0 deletions
Large diffs are not rendered by default.

201702/20170204_01_pic_001.png (81.4 KB)

201702/20170204_01_pic_002.png (13.1 KB)

201702/20170204_01_pic_003.png (10.3 KB)

201702/20170204_01_pic_004.png (4.72 KB)

201702/20170204_01_pic_005.png (1.94 KB)

201702/20170204_01_pic_006.png (1.82 KB)

201702/20170204_01_pic_007.png (2.81 KB)

201702/20170204_01_pic_008.png (3.02 KB)
