|
| 1 | +## 菜鸟末端轨迹 面面、点判断 空间索引性能 - 阿里云RDS PostgreSQL最佳实践 |
| 2 | + |
| 3 | +### 作者 |
| 4 | +digoal |
| 5 | + |
| 6 | +### 日期 |
| 7 | +2017-08-03 |
| 8 | + |
| 9 | +### 标签 |
| 10 | +PostgreSQL , PostGIS , 多边形 , 面 , 点 , 面点判断 , 菜鸟 |
| 11 | + |
| 12 | +---- |
| 13 | + |
| 14 | +## 背景 |
| 15 | +菜鸟末端轨迹项目中涉及的一个关键需求,面面判断。 |
| 16 | + |
| 17 | +在数据库中存储了一些多边形记录,约几百万到千万条记录,例如一个小区,在地图上是一个多边形。 |
| 18 | + |
| 19 | +不同的快递公司,会有各自不同的多边形划分方法(网点负责的片区(多边形),某个快递员负责的片区(多边形))。 |
| 20 | + |
| 21 | +用户在寄件时,根据用户的位置,查找对应快递公司负责这个片区的网点、或者负责该片区的快递员。 |
| 22 | + |
| 23 | + |
| 24 | + |
| 25 | +## 一、需求 |
| 26 | +1、在数据库中存储了一些静态的面信息,代表小区、园区、写字楼等等。所有的面不相交。 |
| 27 | + |
| 28 | +2、为了支持不同的业务类型,对一个地图,可能划分为不同的多边形组成。 |
| 29 | + |
| 30 | +例如不同的快递公司,会有各自不同的多边形划分方法(网点负责的片区(多边形),某个快递员负责的片区(多边形))。 |
| 31 | + |
| 32 | +因此在一张地图上,有多个图层,每个图层的多边形划分方法可能不一样。 |
| 33 | + |
| 34 | +3、快速的根据快递公司、客户的位置,求包含这个点的多边形(即得到对应快递公司负责这个片区的网点、或者负责该片区的快递员)。 |
| 35 | + |
| 36 | +## 二、架构设计 |
| 37 | +用到阿里云的RDS PostgreSQL,以及PG提供的PostGIS插件。 |
| 38 | + |
| 39 | +我们需要用到PostGIS的函数有两个 |
| 40 | + |
| 41 | +http://postgis.net/docs/manual-2.3/ST_Within.html |
| 42 | + |
| 43 | +1、ST_within |
| 44 | + |
| 45 | +ST_Within — Returns true if the geometry A is completely inside geometry B |
| 46 | + |
| 47 | +boolean ST_Within(geometry A, geometry B); |
| 48 | + |
| 49 | +Returns TRUE if geometry A is completely inside geometry B. For this function to make sense, the source geometries must both be of the same coordinate projection, having the same SRID. It is a given that if ST_Within(A,B) is true and ST_Within(B,A) is true, then the two geometries are considered spatially equal. |
| 50 | + |
| 51 | +This function call will automatically include a bounding box comparison that will make use of any indexes that are available on the geometries. To avoid index use, use the function _ST_Within. |
| 52 | + |
| 53 | +``` |
| 54 | +-- a circle within a circle |
| 55 | +SELECT ST_Within(smallc,smallc) As smallinsmall, |
| 56 | + ST_Within(smallc, bigc) As smallinbig, |
| 57 | + ST_Within(bigc,smallc) As biginsmall, |
| 58 | + ST_Within(ST_Union(smallc, bigc), bigc) as unioninbig, |
| 59 | + ST_Within(bigc, ST_Union(smallc, bigc)) as biginunion, |
| 60 | + ST_Equals(bigc, ST_Union(smallc, bigc)) as bigisunion |
| 61 | +FROM |
| 62 | +( |
| 63 | +SELECT ST_Buffer(ST_GeomFromText('POINT(50 50)'), 20) As smallc, |
| 64 | + ST_Buffer(ST_GeomFromText('POINT(50 50)'), 40) As bigc) As foo; |
| 65 | +-- Result |
| 66 | + smallinsmall | smallinbig | biginsmall | unioninbig | biginunion | bigisunion |
| 67 | +--------------+------------+------------+------------+------------+------------ |
| 68 | + t | t | f | t | t | t |
| 69 | +(1 row) |
| 70 | +``` |
| 71 | + |
| 72 | +2、ST_Contains |
| 73 | + |
| 74 | +ST_Contains — Returns true if and only if no points of B lie in the exterior of A, and at least one point of the interior of B lies in the interior of A. |
| 75 | + |
| 76 | +boolean ST_Contains(geometry geomA, geometry geomB); |
| 77 | + |
| 78 | +Returns TRUE if geometry B is completely inside geometry A. For this function to make sense, the source geometries must both be of the same coordinate projection, having the same SRID. ST_Contains is the inverse of ST_Within. So ST_Contains(A,B) implies ST_Within(B,A) except in the case of invalid geometries where the result is always false regardless or not defined. |
| 79 | + |
| 80 | +This function call will automatically include a bounding box comparison that will make use of any indexes that are available on the geometries. To avoid index use, use the function _ST_Contains. |
| 81 | + |
| 82 | +``` |
| 83 | +-- A circle within a circle |
| 84 | +SELECT ST_Contains(smallc, bigc) As smallcontainsbig, |
| 85 | + ST_Contains(bigc,smallc) As bigcontainssmall, |
| 86 | + ST_Contains(bigc, ST_Union(smallc, bigc)) as bigcontainsunion, |
| 87 | + ST_Equals(bigc, ST_Union(smallc, bigc)) as bigisunion, |
| 88 | + ST_Covers(bigc, ST_ExteriorRing(bigc)) As bigcoversexterior, |
| 89 | + ST_Contains(bigc, ST_ExteriorRing(bigc)) As bigcontainsexterior |
| 90 | +FROM (SELECT ST_Buffer(ST_GeomFromText('POINT(1 2)'), 10) As smallc, |
| 91 | + ST_Buffer(ST_GeomFromText('POINT(1 2)'), 20) As bigc) As foo; |
| 92 | + |
| 93 | +-- Result |
| 94 | + smallcontainsbig | bigcontainssmall | bigcontainsunion | bigisunion | bigcoversexterior | bigcontainsexterior |
| 95 | +------------------+------------------+------------------+------------+-------------------+--------------------- |
| 96 | + f | t | t | t | t | f |
| 97 | + |
| 98 | +-- Example demonstrating difference between contains and contains properly |
| 99 | +SELECT ST_GeometryType(geomA) As geomtype, ST_Contains(geomA,geomA) AS acontainsa, ST_ContainsProperly(geomA, geomA) AS acontainspropa, |
| 100 | + ST_Contains(geomA, ST_Boundary(geomA)) As acontainsba, ST_ContainsProperly(geomA, ST_Boundary(geomA)) As acontainspropba |
| 101 | +FROM (VALUES ( ST_Buffer(ST_Point(1,1), 5,1) ), |
| 102 | + ( ST_MakeLine(ST_Point(1,1), ST_Point(-1,-1) ) ), |
| 103 | + ( ST_Point(1,1) ) |
| 104 | + ) As foo(geomA); |
| 105 | + |
| 106 | + geomtype | acontainsa | acontainspropa | acontainsba | acontainspropba |
| 107 | +--------------+------------+----------------+-------------+----------------- |
| 108 | +ST_Polygon | t | f | f | f |
| 109 | +ST_LineString | t | f | f | f |
| 110 | +ST_Point | t | t | f | f |
| 111 | +``` |
| 112 | + |
| 113 | + |
| 114 | + |
| 115 | + |
| 116 | + |
| 117 | +## 三、DEMO与性能 |
| 118 | +为了简化测试,采样PG内置的几何类型进行测试,用法与PostGIS是类似的。 |
| 119 | + |
| 120 | +1、创建测试表 |
| 121 | + |
| 122 | +``` |
| 123 | +postgres=# create table po(id int, typid int, po polygon); |
| 124 | +CREATE TABLE |
| 125 | +``` |
| 126 | + |
| 127 | +2、创建分区表或分区索引 |
| 128 | + |
| 129 | +``` |
| 130 | +create extension btree_gist; |
| 131 | +create index idx_po_1 on po using gist(typid, po); |
| 132 | +``` |
| 133 | + |
| 134 | +3、创建空间排他约束,可选 |
| 135 | + |
| 136 | +如果要求单个typid内的po不重叠,可以创建空间排他约束 |
| 137 | + |
| 138 | +``` |
| 139 | +create table tbl_po(id int, typid int, po polygon) |
| 140 | +PARTITION BY LIST (typid); |
| 141 | + |
| 142 | +CREATE TABLE tbl_po_1 |
| 143 | + PARTITION OF tbl_po ( |
| 144 | + EXCLUDE USING gist (po WITH &&) |
| 145 | +) FOR VALUES IN (1); |
| 146 | + |
| 147 | +... |
| 148 | + |
| 149 | +CREATE TABLE tbl_po_20 |
| 150 | + PARTITION OF tbl_po ( |
| 151 | + EXCLUDE USING gist (po WITH &&) |
| 152 | +) FOR VALUES IN (20); |
| 153 | + |
| 154 | +查看某分区表的空间排他约束如下 |
| 155 | + |
| 156 | +postgres=# \d tbl_po_1 |
| 157 | + Table "postgres.tbl_po_1" |
| 158 | + Column | Type | Collation | Nullable | Default |
| 159 | +--------+---------+-----------+----------+--------- |
| 160 | + id | integer | | | |
| 161 | + typid | integer | | | |
| 162 | + po | polygon | | | |
| 163 | +Partition of: tbl_po FOR VALUES IN (1) |
| 164 | +Indexes: |
| 165 | + "tbl_po_1_po_excl" EXCLUDE USING gist (po WITH &&) |
| 166 | +``` |
| 167 | + |
| 168 | +4、写入1000万多边形测试数据 |
| 169 | + |
| 170 | +``` |
| 171 | +insert into po select id, random()*20, polygon('(('||x1||','||y1||'),('||x2||','||y2||'),('||x3||','||y3||'))') from (select id, 180-random()*180 x1, 180-random()*180 x2, 180-random()*180 x3, 90-random()*90 y1, 90-random()*90 y2, 90-random()*90 y3 from generate_series(1,10000000) t(id)) t; |
| 172 | +``` |
| 173 | + |
| 174 | +5、测试面点判断性能 |
| 175 | + |
| 176 | +查询包含point(1,1)的多边形,响应时间0.3毫秒。 |
| 177 | + |
| 178 | +``` |
| 179 | +postgres=# explain (analyze,verbose,timing,costs,buffers) select * from po where po @> polygon('((1,1),(1,1),(1,1))') limit 1; |
| 180 | + QUERY PLAN |
| 181 | +------------------------------------------------------------------------------------------------------------------------------------- |
| 182 | + Limit (cost=0.42..1.71 rows=1 width=93) (actual time=0.326..0.326 rows=1 loops=1) |
| 183 | + Output: id, typid, po |
| 184 | + Buffers: shared hit=71 |
| 185 | + -> Index Scan using idx_po_1 on postgres.po (cost=0.42..12940.12 rows=10000 width=93) (actual time=0.325..0.325 rows=1 loops=1) |
| 186 | + Output: id, typid, po |
| 187 | + Index Cond: (po.po @> '((1,1),(1,1),(1,1))'::polygon) |
| 188 | + Rows Removed by Index Recheck: 13 |
| 189 | + Buffers: shared hit=71 |
| 190 | + Planning time: 0.032 ms |
| 191 | + Execution time: 0.338 ms |
| 192 | +(10 rows) |
| 193 | +``` |
| 194 | + |
| 195 | +6、压测 |
| 196 | + |
| 197 | +``` |
| 198 | +vi test.sql |
| 199 | +\set x random(-180,180) |
| 200 | +\set y random(-90,90) |
| 201 | +select * from po where po @> polygon('((:x,:y),(:x,:y),(:x,:y))') limit 1; |
| 202 | + |
| 203 | +pgbench -M simple -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 100 |
| 204 | +transaction type: ./test.sql |
| 205 | +scaling factor: 1 |
| 206 | +query mode: simple |
| 207 | +number of clients: 64 |
| 208 | +number of threads: 64 |
| 209 | +duration: 100 s |
| 210 | +number of transactions actually processed: 30106928 |
| 211 | +latency average = 0.213 ms |
| 212 | +latency stddev = 0.159 ms |
| 213 | +tps = 301049.099821 (including connections establishing) |
| 214 | +tps = 301089.965566 (excluding connections establishing) |
| 215 | +script statistics: |
| 216 | + - statement latencies in milliseconds: |
| 217 | + 0.002 \set x random(-180,180) |
| 218 | + 0.001 \set y random(-90,90) |
| 219 | + 0.218 select * from po where po @> polygon('((:x,:y),(:x,:y),(:x,:y))') limit 1; |
| 220 | +``` |
| 221 | + |
| 222 | +**TPS:30万 ,平均响应时间:0.2毫秒** |
| 223 | + |
| 224 | +## 四、技术点 |
| 225 | + |
| 226 | +1、空间排他约束 |
| 227 | + |
| 228 | +这个约束可以用于强制记录中的多边形不相交。例如地图这类严谨数据,绝对不可能出现两个多边形相交的,否则就有领土纷争了。 |
| 229 | + |
| 230 | +PostgreSQL就是这么严谨,意不意外。 |
| 231 | + |
| 232 | +2、分区表 |
| 233 | + |
| 234 | +本例中不同的快递公司,对应不同的图层,每个快递公司根据网点、快递员负责的片区(多边形)划分为多个多边形。 |
| 235 | + |
| 236 | +使用LIST分区,每个分区对应一家快递公司。 |
| 237 | + |
| 238 | +3、空间索引 |
| 239 | + |
| 240 | +GiST空间索引,支持KNN、包含、相交、上下左右等空间搜索。 |
| 241 | + |
| 242 | +效率极高。 |
| 243 | + |
| 244 | +4、空间分区索引 |
| 245 | + |
| 246 | +[《分区索引的应用和实践 - 阿里云RDS PostgreSQL最佳实践》](../201707/20170721_01.md) |
| 247 | + |
| 248 | +5、面面、点判断 |
| 249 | + |
| 250 | +面面判断或面点判断是本例的主要需求,用户在寄包裹时,根据用户位置在数据库的一千万多边形中找出覆盖这个点的多边形。 |
| 251 | + |
| 252 | +## 五、云端产品 |
| 253 | + |
| 254 | +[阿里云 RDS PostgreSQL](https://www.aliyun.com/product/rds/postgresql) |
| 255 | + |
| 256 | +## 六、类似场景、案例 |
| 257 | + |
| 258 | +[《PostgreSQL 物流轨迹系统数据库需求分析与设计 - 包裹侠实时跟踪与召回》](../201704/20170418_01.md) |
| 259 | + |
| 260 | +## 七、小结 |
| 261 | +菜鸟末端轨迹项目中涉及的一个关键需求,面面判断。 |
| 262 | + |
| 263 | +在数据库中存储了一些多边形记录,约几百万到千万条记录,例如一个小区,在地图上是一个多边形。 |
| 264 | + |
| 265 | +不同的快递公司,会有各自不同的多边形划分方法(网点负责的片区(多边形),某个快递员负责的片区(多边形))。 |
| 266 | + |
| 267 | +用户在寄件时,根据用户的位置,查找对应快递公司负责这个片区的网点、或者负责该片区的快递员。 |
| 268 | + |
| 269 | +**使用阿里云RDS PostgreSQL,用户存放约1千万的多边形数据,单库实现了每秒30万的处理请求,单次请求平均响应时间约0.2毫秒。** |
| 270 | + |
| 271 | +惊不惊喜、意不意外。 |
| 272 | + |
| 273 | +## 八、参考 |
| 274 | + |
| 275 | +http://postgis.net/docs/manual-2.3/ST_Within.html |
| 276 | + |
| 277 | +[《分区索引的应用和实践 - 阿里云RDS PostgreSQL最佳实践》](../201707/20170721_01.md) |
0 commit comments