|
| 1 | +## PostgreSQL 生成空间热力图 |
| 2 | + |
| 3 | +### 作者 |
| 4 | +digoal |
| 5 | + |
| 6 | +### 日期 |
| 7 | +2018-07-25 |
| 8 | + |
| 9 | +### 标签 |
| 10 | +PostgreSQL , 热力图 , 空间切割 , 并行计算 , parallel safe |
| 11 | + |
| 12 | +---- |
| 13 | + |
| 14 | +## 背景 |
| 15 | +结合空间数据,计算基于地理位置信息的热力图,在空间数据可视化场景中是一个非常常见的需求。 |
| 16 | + |
| 17 | + |
| 18 | + |
| 19 | +结合流计算,可以实现实时的热力图计算。 |
| 20 | + |
| 21 | +结合并行计算,可以高效率的对海量数据进行热力图计算。 |
| 22 | + |
| 23 | +## 计算热力图中bucket的方法 |
| 24 | + |
| 25 | +https://www.postgresql.org/docs/devel/static/functions-math.html |
| 26 | + |
| 27 | +``` |
| 28 | +width_bucket(operand dp, b1 dp, b2 dp, count int) |
| 29 | +int |
| 30 | +return the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2; |
| 31 | +returns 0 or count+1 for an input outside the range |
| 32 | +width_bucket(5.35, 0.024, 10.06, 5) |
| 33 | +3 |
| 34 | + |
| 35 | +width_bucket(operand numeric, b1 numeric, b2 numeric, count int) |
| 36 | +int |
| 37 | +return the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2; |
| 38 | +returns 0 or count+1 for an input outside the range |
| 39 | +width_bucket(5.35, 0.024, 10.06, 5) |
| 40 | +3 |
| 41 | +``` |
| 42 | + |
| 43 | +例如 |
| 44 | + |
| 45 | +``` |
| 46 | +postgres=# select width_bucket(1,1,10,10); |
| 47 | + width_bucket |
| 48 | +-------------- |
| 49 | + 1 |
| 50 | +(1 row) |
| 51 | + |
| 52 | +postgres=# select width_bucket(0,1,10,10); |
| 53 | + width_bucket |
| 54 | +-------------- |
| 55 | + 0 |
| 56 | +(1 row) |
| 57 | + |
| 58 | +postgres=# select width_bucket(10,1,10,10); |
| 59 | + width_bucket |
| 60 | +-------------- |
| 61 | + 11 |
| 62 | +(1 row) |
| 63 | + |
| 64 | +postgres=# select width_bucket(9.9,1,10,10); |
| 65 | + width_bucket |
| 66 | +-------------- |
| 67 | + 10 |
| 68 | +(1 row) |
| 69 | +``` |
| 70 | + |
| 71 | +``` |
| 72 | +width_bucket( |
| 73 | + p1 -- 输入值 |
| 74 | + p2 -- 边界值(最小,包含) |
| 75 | + p3 -- 边界值(最大,不包含) |
| 76 | + p4 -- 切割份数 |
| 77 | +) |
| 78 | + |
| 79 | +当小于最小边界值时,返回0 |
| 80 | +当大于等于最大边界值时,返回p4+1 |
| 81 | +``` |
| 82 | + |
| 83 | + |
| 84 | +例如x轴的边界是1,10000,y轴的边界是1,10000。 |
| 85 | + |
| 86 | +x,y两个方向分别切割为50个bucket,一共2500个bucket,求一个点落在哪个bucket: |
| 87 | + |
| 88 | +``` |
| 89 | +width_bucket(pos[0], 1, 10001, 50), -- x轴落在哪列bucket |
| 90 | +width_bucket(pos[1], 1, 10001, 50), -- y轴落在哪列bucket |
| 91 | +``` |
| 92 | + |
| 93 | +## 例子 |
| 94 | +1、建表 |
| 95 | + |
| 96 | +``` |
| 97 | +create table tbl_pos( |
| 98 | + id int, |
| 99 | + info text, -- 信息 |
| 100 | + val float8, -- 取值 |
| 101 | + pos point -- 位置 |
| 102 | +); |
| 103 | +``` |
| 104 | + |
| 105 | +2、写入1亿个点 |
| 106 | + |
| 107 | +``` |
| 108 | +vi test.sql |
| 109 | +insert into tbl_pos values ( random()*100000, md5(random()::text), random()*1000, point((random()*10000::int), (random()*10000::int)) ); |
| 110 | + |
| 111 | + |
| 112 | + |
| 113 | +pgbench -M prepared -n -r -P 1 -f ./test.sql -c 50 -j 50 -t 2000000 |
| 114 | +``` |
| 115 | + |
| 116 | +3、热力图计算 |
| 117 | + |
| 118 | +强制并行计算 |
| 119 | + |
| 120 | +``` |
| 121 | +postgres=# set min_parallel_table_scan_size =0; |
| 122 | +SET |
| 123 | +postgres=# set min_parallel_index_scan_size =0; |
| 124 | +SET |
| 125 | +postgres=# set parallel_setup_cost =0; |
| 126 | +SET |
| 127 | +postgres=# set parallel_tuple_cost =0; |
| 128 | +SET |
| 129 | +postgres=# set max_parallel_workers_per_gather =28; |
| 130 | +SET |
| 131 | +postgres=# alter table tbl_pos set (parallel_workers =28); |
| 132 | +ALTER TABLE |
| 133 | +``` |
| 134 | + |
| 135 | +热力图计算SQL,效率还不错: |
| 136 | + |
| 137 | +``` |
| 138 | +select |
| 139 | + width_bucket(pos[0], 0, 10001, 50), -- x轴落在哪列bucket |
| 140 | + width_bucket(pos[1], 0, 10001, 50), -- y轴落在哪列bucket |
| 141 | + avg(val), |
| 142 | + min(val), |
| 143 | + max(val), |
| 144 | + stddev(val), |
| 145 | + count(*) |
| 146 | +from tbl_pos |
| 147 | + group by 1,2; |
| 148 | + |
| 149 | + |
| 150 | + QUERY PLAN |
| 151 | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 152 | + Finalize GroupAggregate (cost=1252812.00..1252928.00 rows=200 width=48) (actual time=2632.324..2672.909 rows=2500 loops=1) |
| 153 | + Group Key: (width_bucket(pos[0], '0'::double precision, '10001'::double precision, 50)), (width_bucket(pos[1], '0'::double precision, '10001'::double precision, 50)) |
| 154 | + -> Sort (cost=1252812.00..1252826.00 rows=5600 width=96) (actual time=2632.290..2648.544 rows=72500 loops=1) |
| 155 | + Sort Key: (width_bucket(pos[0], '0'::double precision, '10001'::double precision, 50)), (width_bucket(pos[1], '0'::double precision, '10001'::double precision, 50)) |
| 156 | + Sort Method: external merge Disk: 9824kB |
| 157 | + -> Gather (cost=1252460.37..1252463.37 rows=5600 width=96) (actual time=2532.132..2564.905 rows=72500 loops=1) |
| 158 | + Workers Planned: 28 |
| 159 | + Workers Launched: 28 |
| 160 | + -> Partial HashAggregate (cost=1252460.37..1252463.37 rows=200 width=96) (actual time=2522.428..2523.559 rows=2500 loops=29) |
| 161 | + Group Key: width_bucket(pos[0], '0'::double precision, '10001'::double precision, 50), width_bucket(pos[1], '0'::double precision, '10001'::double precision, 50) |
| 162 | + -> Parallel Seq Scan on tbl_pos (cost=0.00..1189951.79 rows=3571919 width=16) (actual time=0.030..1302.462 rows=3448276 loops=29) |
| 163 | + Planning time: 0.154 ms |
| 164 | + Execution time: 2676.288 ms |
| 165 | +(13 rows) |
| 166 | +``` |
| 167 | + |
| 168 | +样本 |
| 169 | + |
| 170 | +``` |
| 171 | +postgres=# select |
| 172 | + width_bucket(pos[0], 0, 10001, 10), -- x轴落在哪列bucket |
| 173 | + width_bucket(pos[1], 0, 10001, 10), -- y轴落在哪列bucket |
| 174 | + avg(val), |
| 175 | + min(val), |
| 176 | + max(val), |
| 177 | + stddev(val), |
| 178 | + count(*) |
| 179 | +from tbl_pos |
| 180 | + group by 1,2; |
| 181 | + width_bucket | width_bucket | avg | min | max | stddev | count |
| 182 | +--------------+--------------+------------------+----------------------+------------------+------------------+--------- |
| 183 | + 1 | 1 | 499.638668709335 | 0.000637955963611603 | 999.998900108039 | 288.562996477433 | 1002686 |
| 184 | + 1 | 2 | 499.772206697849 | 0.00113388523459435 | 999.999452847987 | 288.505295714968 | 1000891 |
| 185 | + 1 | 3 | 500.44455454312 | 0.00135181471705437 | 999.997937120497 | 288.45102360668 | 999911 |
| 186 | + 1 | 4 | 500.234164866407 | 0.00214902684092522 | 999.999100342393 | 288.707167816157 | 1000473 |
| 187 | + 1 | 5 | 499.793710464008 | 0.000125262886285782 | 999.999575316906 | 288.672382834812 | 999036 |
| 188 | + 1 | 6 | 500.366854944369 | 0.00212574377655983 | 999.999585561454 | 288.558891852102 | 998866 |
| 189 | + 1 | 7 | 499.825623783545 | 0.000547617673873901 | 999.999700114131 | 288.582317248892 | 1000902 |
| 190 | + 1 | 8 | 499.393569281915 | 0.00330200418829918 | 999.999083112925 | 288.561094278074 | 1000193 |
| 191 | + 1 | 9 | 499.713056248083 | 0.00243959948420525 | 999.999618623406 | 288.709997455837 | 1000017 |
| 192 | + 1 | 10 | 500.312448499828 | 0.00238511711359024 | 999.999850522727 | 288.865560266629 | 998469 |
| 193 | + 2 | 1 | 499.848655048635 | 0.00146497040987015 | 999.999508261681 | 288.639402346948 | 1000917 |
| 194 | + 2 | 2 | 500.084846394446 | 0.0005294568836689 | 999.999178107828 | 288.704696698903 | 997594 |
| 195 | + 2 | 3 | 499.99258346144 | 0.00163912773132324 | 999.99839020893 | 288.507497234907 | 1001310 |
| 196 | + 2 | 4 | 499.817295558208 | 0.00184541568160057 | 999.997940845788 | 288.767308817191 | 1000607 |
| 197 | + 2 | 5 | 499.87314410326 | 0.00135786831378937 | 999.999302905053 | 288.593077096809 | 998588 |
| 198 | + 2 | 6 | 499.825467223571 | 0.000847037881612778 | 999.998526647687 | 288.789326889728 | 1000426 |
| 199 | + 2 | 7 | 499.50907809986 | 7.4971467256546e-05 | 999.9989871867 | 288.535982009648 | 1001179 |
| 200 | + 2 | 8 | 499.850422744194 | 0.000966247171163559 | 999.999921303242 | 288.516738657089 | 1000745 |
| 201 | + 2 | 9 | 500.110417044655 | 0.000320374965667725 | 999.999660998583 | 288.77420504779 | 999978 |
| 202 | + 2 | 10 | 500.135548004555 | 0.000233296304941177 | 999.999852851033 | 288.520964728395 | 998363 |
| 203 | +........ |
| 204 | +``` |
| 205 | + |
| 206 | +取出数据,即可渲染。 |
| 207 | + |
| 208 | +结合流计算,可以在FEED数据写入时,实时的进行计算。而不是QUERY发起时计算。参考本文末尾的文档。 |
| 209 | + |
| 210 | +## 小结 |
| 211 | +PostgreSQL非常适合于时空数据的分析,包括本文提到的热力图分析。 |
| 212 | + |
| 213 | +使用并行计算,即查即算,1亿个点,差不多耗时2.7秒。 |
| 214 | + |
| 215 | +如果使用流式计算,写入时即算,查询时查的是结果,效率更高。 |
| 216 | + |
| 217 | +## 参考 |
| 218 | + |
| 219 | +1、求bucket值 |
| 220 | + |
| 221 | +https://www.postgresql.org/docs/devel/static/functions-math.html |
| 222 | + |
| 223 | +``` |
| 224 | +width_bucket(operand dp, b1 dp, b2 dp, count int) |
| 225 | +int |
| 226 | +return the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2; |
| 227 | +returns 0 or count+1 for an input outside the range |
| 228 | +width_bucket(5.35, 0.024, 10.06, 5) |
| 229 | +3 |
| 230 | + |
| 231 | +width_bucket(operand numeric, b1 numeric, b2 numeric, count int) |
| 232 | +int |
| 233 | +return the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2; |
| 234 | +returns 0 or count+1 for an input outside the range |
| 235 | +width_bucket(5.35, 0.024, 10.06, 5) |
| 236 | +3 |
| 237 | +``` |
| 238 | + |
| 239 | +2、求geometry对象的x,y,z值 |
| 240 | + |
| 241 | +http://postgis.net/docs/manual-2.4/reference.html |
| 242 | + |
| 243 | +``` |
| 244 | +ST_X — Return the X coordinate of the point, or NULL if not available. Input must be a point. |
| 245 | +ST_XMax — Returns X maxima of a bounding box 2d or 3d or a geometry. |
| 246 | +ST_XMin — Returns X minima of a bounding box 2d or 3d or a geometry. |
| 247 | +ST_Y — Return the Y coordinate of the point, or NULL if not available. Input must be a point. |
| 248 | +ST_YMax — Returns Y maxima of a bounding box 2d or 3d or a geometry. |
| 249 | +ST_YMin — Returns Y minima of a bounding box 2d or 3d or a geometry. |
| 250 | +ST_Z — Return the Z coordinate of the point, or NULL if not available. Input must be a point. |
| 251 | +ST_ZMax — Returns Z minima of a bounding box 2d or 3d or a geometry. |
| 252 | +ST_Zmflag — Returns ZM (dimension semantic) flag of the geometries as a small int. Values are: 0=2d, 1=3dm, 2=3dz, 3=4d. |
| 253 | +ST_ZMin — Returns Z minima of a bounding box 2d or 3d or a geometry. |
| 254 | +``` |
| 255 | + |
| 256 | +3、求point对象的x,y值 |
| 257 | + |
| 258 | +``` |
| 259 | +point[0] |
| 260 | + |
| 261 | +point[1] |
| 262 | +``` |
| 263 | + |
| 264 | +4、PostgreSQL 并行计算 |
| 265 | + |
| 266 | +[《PostgreSQL 11 preview - 并行计算 增强 汇总》](../201805/20180519_02.md) |
| 267 | + |
| 268 | +[《PostgreSQL 单表并行bulkload的extend file lock 冲突问题解决》](../201805/20180515_03.md) |
| 269 | + |
| 270 | +[《阿里云RDS PostgreSQL OSS 外部表实践 - (dblink异步调用封装并行) 从OSS并行导入数据》](../201804/20180427_01.md) |
| 271 | + |
| 272 | +[《PostgreSQL 变态并行拉取单表的方法 - 按块并行(按行号(ctid)并行) + dblink 异步调用》](../201804/20180410_03.md) |
| 273 | + |
| 274 | +[《PostgreSQL 11 preview - 多阶段并行聚合array_agg, string_agg》](../201803/20180322_11.md) |
| 275 | + |
| 276 | +[《PostgreSQL 11 preview - 分区表智能并行聚合、分组计算(已类似MPP架构,性能暴增)》](../201803/20180322_07.md) |
| 277 | + |
| 278 | +[《PostgreSQL Oracle 兼容性之 - 自定义并行聚合函数 PARALLEL_ENABLE AGGREGATE》](../201803/20180312_03.md) |
| 279 | + |
| 280 | +[《PostgreSQL VOPS 向量计算 + DBLINK异步并行 - 单实例 10亿 聚合计算跑进2秒》](../201802/20180210_01.md) |
| 281 | + |
| 282 | +[《PostgreSQL 相似搜索分布式架构设计与实践 - dblink异步调用与多机并行(远程 游标+记录 UDF实例)》](../201802/20180205_03.md) |
| 283 | + |
| 284 | +[《PostgreSQL 11 preview - Parallel Append(包括 union all\分区查询) (多表并行计算) sharding架构并行计算核心功能之一》](../201802/20180204_03.md) |
| 285 | + |
| 286 | +[《PostgreSQL 11 preview - 并行排序、并行索引 (性能线性暴增) 单实例100亿TOP-K仅40秒》](../201802/20180204_01.md) |
| 287 | + |
| 288 | +[《PostgreSQL 11 preview - 分区表智能并行JOIN (已类似MPP架构,性能暴增)》](../201802/20180202_02.md) |
| 289 | + |
| 290 | +[《PostgreSQL dblink异步调用实现 并行hash分片JOIN - 含数据交、并、差 提速案例 - 含dblink VS pg 11 parallel hash join VS pg 11 智能分区JOIN》](../201802/20180201_02.md) |
| 291 | + |
| 292 | +[《PostgreSQL 10 自定义并行计算聚合函数的原理与实践 - (含array_agg合并多个数组为单个一元数组的例子)》](../201801/20180119_04.md) |
| 293 | + |
| 294 | +[《惊天性能!单RDS PostgreSQL实例 支撑 2000亿 - 实时标签透视案例 (含dblink异步并行调用)》](../201712/20171223_01.md) |
| 295 | + |
| 296 | +[《HTAP数据库 PostgreSQL 场景与性能测试之 23 - (OLAP) 并行计算》](../201711/20171107_24.md) |
| 297 | + |
| 298 | +[《PostgreSQL 如何让 列存(外部列存) 并行起来》](../201710/20171014_01.md) |
| 299 | + |
| 300 | +[《阿里云RDS PostgreSQL OSS 外部表实践 - (dblink异步调用封装并行) 数据并行导出到OSS》](../201709/20170906_01.md) |
| 301 | + |
| 302 | +[《PostgreSQL 并行写入堆表,如何保证时序线性存储 - BRIN索引优化》](../201706/20170611_02.md) |
| 303 | + |
| 304 | +[《PostgreSQL 10.0 preview 功能增强 - 逻辑复制支持并行COPY初始化数据》](../201703/20170328_01.md) |
| 305 | + |
| 306 | +[《PostgreSQL 10.0 preview 多核并行增强 - 控制集群并行度》](../201703/20170313_12.md) |
| 307 | + |
| 308 | +[《PostgreSQL 10.0 preview 多核并行增强 - 索引扫描、子查询、VACUUM、fdw/csp钩子》](../201703/20170313_08.md) |
| 309 | + |
| 310 | +[《PostgreSQL 10.0 preview sharding增强 - 支持Append节点并行》](../201703/20170312_11.md) |
| 311 | + |
| 312 | +[《PostgreSQL 10.0 preview 多核并行增强 - 并行hash join支持shared hashdata, 节约哈希表内存提高效率》](../201703/20170312_08.md) |
| 313 | + |
| 314 | +[《PostgreSQL 10.0 preview sharding增强 - postgres_fdw 多节点异步并行执行》](../201703/20170312_07.md) |
| 315 | + |
| 316 | +[《PostgreSQL 10.0 preview 多核并行增强 - tuplesort 多核并行创建索引》](../201703/20170312_05.md) |
| 317 | + |
| 318 | +[《PostGIS 地理信息、栅格数据 多核并行处理(st_memunion, st_union)》](../201703/20170307_03.md) |
| 319 | + |
| 320 | +[《分析加速引擎黑科技 - LLVM、列存、多核并行、算子复用 大联姻 - 一起来开启PostgreSQL的百宝箱》](../201612/20161216_01.md) |
| 321 | + |
| 322 | +[《PostgreSQL 并行计算tpc-h测试和优化分析》](../201611/20161108_01.md) |
| 323 | + |
| 324 | +[《PostgreSQL 9.6 并行计算 在 xfs, ext4 下的表现对比》](../201610/20161002_02.md) |
| 325 | + |
| 326 | +[《PostgreSQL 9.6 并行计算 优化器算法浅析 - 以及如何强制并行度》](../201610/20161002_01.md) |
| 327 | + |
| 328 | +[《PostgreSQL 9.6 引领开源数据库攻克多核并行计算难题》](../201610/20161001_01.md) |
| 329 | + |
| 330 | +[《PostgreSQL 9.6 并行计算 优化器算法浅析》](../201608/20160816_02.md) |
| 331 | + |
| 332 | +5、流计算 |
| 333 | + |
| 334 | +[《PostgreSQL 流式处理应用实践 - 二手商品实时归类(异步消息notify/listen、阅后即焚)》](../201807/20180713_03.md) |
| 335 | + |
| 336 | +[《PostgreSQL 流计算插件pipelinedb sharding 集群版原理介绍 - 一个全功能的分布式流计算引擎》](../201803/20180314_04.md) |
| 337 | + |
| 338 | +[《[未完待续] PostgreSQL 流式fft傅里叶变换 (plpython + numpy + 数据库流式计算)》](../201803/20180307_01.md) |
| 339 | + |
| 340 | +[《PostgreSQL count-min sketch top-n 概率计算插件 cms_topn (结合窗口实现同比、环比、滑窗分析等) - 流计算核心功能之一》](../201803/20180301_03.md) |
| 341 | + |
| 342 | +[《阿里云RDS PostgreSQL varbitx实践 - 流式标签 (阅后即焚流式批量计算) - 万亿级,任意标签圈人,毫秒响应》](../201712/20171212_01.md) |
| 343 | + |
| 344 | +[《HTAP数据库 PostgreSQL 场景与性能测试之 32 - (OLTP) 高吞吐数据进出(堆存、行扫、无需索引) - 阅后即焚(JSON + 函数流式计算)》](../201711/20171107_33.md) |
| 345 | + |
| 346 | +[《HTAP数据库 PostgreSQL 场景与性能测试之 31 - (OLTP) 高吞吐数据进出(堆存、行扫、无需索引) - 阅后即焚(读写大吞吐并测)》](../201711/20171107_32.md) |
| 347 | + |
| 348 | +[《HTAP数据库 PostgreSQL 场景与性能测试之 27 - (OLTP) 物联网 - FEED日志, 流式处理 与 阅后即焚 (CTE)》](../201711/20171107_28.md) |
| 349 | + |
| 350 | +[《[未完待续] 流式机器学习(online machine learning) - pipelineDB with plR and plPython》](../201711/20171107_01.md) |
| 351 | + |
| 352 | +[《[未完待续] PostgreSQL 分布式流式计算 最佳实践》](../201710/20171030_01.md) |
| 353 | + |
| 354 | +[《打造云端流计算、在线业务、数据分析的业务数据闭环 - 阿里云RDS、HybridDB for PostgreSQL最佳实践》](../201707/20170728_01.md) |
| 355 | + |
| 356 | +[《PostgreSQL 10.0 preview 性能增强 - libpq支持pipeline batch模式减少网络交互提升性能》](../201703/20170312_15.md) |
| 357 | + |
| 358 | +[《流计算风云再起 - PostgreSQL携PipelineDB力挺IoT》](../201612/20161220_01.md) |
| 359 | + |
| 360 | +[《在PostgreSQL中实现update | delete limit - CTID扫描实践 (高效阅后即焚)》](../201608/20160827_01.md) |
| 361 | + |
| 362 | +[《PostgreSQL Oracle 兼容性之 - PL/SQL pipelined》](../201603/20160318_01.md) |
| 363 | + |
| 364 | +[《PostgreSQL 流计算指 pipelinedb 实时处理Kafka消息流》](../201510/20151021_02.md) |
| 365 | + |
| 366 | + |
| 367 | +<a rel="nofollow" href="http://info.flagcounter.com/h9V1" ><img src="http://s03.flagcounter.com/count/h9V1/bg_FFFFFF/txt_000000/border_CCCCCC/columns_2/maxflags_12/viewers_0/labels_0/pageviews_0/flags_0/" alt="Flag Counter" border="0" ></a> |
| 368 | + |
0 commit comments