Skip to content

Files

Latest commit

Dec 15, 2023
f400df2 · Dec 15, 2023

History

History
244 lines (166 loc) · 8.82 KB

20210709_01.md

File metadata and controls

244 lines (166 loc) · 8.82 KB

实时营销, 人群圈选推荐业务 性能优化 - memory copy+rb contains计算瓶颈 - rb hash分片

作者

digoal

日期

2021-07-09

标签

PostgreSQL , hash分片 , memory copy , rb_contains , 优化


背景

视频回放: https://www.bilibili.com/video/BV1xh41167uN/

实时营销, 人群圈选推荐业务:

用户数, 十亿级别  
每天刷新用户标签表.     单一标签最大uid个数, 亿级别  
每天重算定向推广规则表.  单一推广包最大uid个数, 亿级别  

用户标签表结构:

标签, uids rb      -- 千行级别,  每个rb里面可能有百万、千万个uid.   

定向推广规则表结构:

包名, uids rb      --  万行级别, 每个rb里面可能有百万、千万个uid.   

每个UID最多上万个定向规则.

1、运营或广告主根据标签组合生成定向规则(存入定向推广规则表), 无性能瓶颈.

2、推广逻辑:
对于某个推广位, 登陆app时, 根据运营(推广位)规则映射表, 找到这个推广位里面的定向规则有哪些, 同时检查哪些定向规则匹配当前用户, 根据匹配到的定向规则往这个广告位填充对应的营销内容.

select 标签 from tbl where 标签 in () and rb @> uid;   

第2步存在性能瓶颈, 因为rb很大(每个RB有几百万甚至上千万UID, RB字段估计几十到几百MB), 目测为判断uid是否在rb内时 memory copy和RB CONTAIN操作符计算的瓶颈.

优化方案

对rb按hash切分, 拆成多条或单条多字段结构.

标签, hash_val_mod, UIDs rb  
select 标签 from tbl where 标签 in () and hash_val_mod=? and rb @> uid;   

在mac book pro上, 优化后切分为128个分片, 性能提升50倍.

DEMO

强制索引扫描

alter role postgres set enable_seqscan=off;  
alter role postgres set enable_bitmapscan=off;  

优化前

create unlogged table test (id int, uids roaringbitmap);  
create index idx_test_1 on test (id);  
  
create or replace function gen_rb(int, int) returns roaringbitmap as $$  
  select rb_build_agg((random()*$1)::int) from generate_series(1,$2);  
$$ language sql strict;  
  
insert into test select generate_series(1,100), gen_rb(1000000000,5000000);  

每行500万个UID, 约10MB.

postgres=# \dt+  
                                    List of relations  
 Schema | Name  | Type  |  Owner   | Persistence | Access method |  Size   | Description   
--------+-------+-------+----------+-------------+---------------+---------+-------------  
 public | test  | table | postgres | unlogged    | heap          | 999 MB  |   
 public | test1 | table | postgres | unlogged    | heap          | 2425 MB |   
(2 rows)  
postgres=# explain select * from test where id in (1,2,3,4,5,6) and uids @> 10;  
                               QUERY PLAN                                 
------------------------------------------------------------------------  
 Index Scan using idx_test_1 on test  (cost=0.14..4.28 rows=1 width=22)  
   Index Cond: (id = ANY ('{1,2,3,4,5,6}'::integer[]))  
   Filter: (uids @> 10)  
(3 rows)  
vi test.sql  
\set v random(1,100)  
\set uid random(1,1000000000)  
select * from test where id in (:v, :v+1, :v+2, :v+3, :v+4, :v+5, :v+6, :v+7, :v+8, :v+9, :v+10) and uids @> :uid::int;    
  
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 16 -j 16 -T 120  
  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 16  
number of threads: 16  
duration: 120 s  
number of transactions actually processed: 8138  
latency average = 236.345 ms  
latency stddev = 104.607 ms  
initial connection time = 19.508 ms  
tps = 67.523957 (without initial connection time)  
statement latencies in milliseconds:  
         0.004  \set v random(1,100)  
         0.001  \set uid random(1,1000000000)  
       236.403  select * from test where id in (:v, :v+1, :v+2, :v+3, :v+4, :v+5, :v+6, :v+7, :v+8, :v+9, :v+10) and rb_contains(uids, :ui  

优化后

create unlogged table test1 (id int, hid int, uids roaringbitmap);   
create index idx_test1_1 on test1(hid,id);  
  
create or replace function gen_rb1(int, int) returns table(id int, rb roaringbitmap) as $$  
  select abs(mod(hashint4(uid), 128)) as id, rb_build_agg(uid) as rb from   
    (select (random()*$1)::int uid from generate_series(1,$2)) t   
  group by 1;  
$$ language sql strict;  
  
insert into test1 select id, (gen_rb1(1000000000,5000000)).* from generate_series(1,100) id;  

HASH分片为128后, 每行约4万个UID, 约1.9MB.

postgres=# \dt+  
                                    List of relations  
 Schema | Name  | Type  |  Owner   | Persistence | Access method |  Size   | Description   
--------+-------+-------+----------+-------------+---------------+---------+-------------  
 public | test  | table | postgres | unlogged    | heap          | 999 MB  |   
 public | test1 | table | postgres | unlogged    | heap          | 2425 MB |   
(2 rows)  
postgres=# explain select * from test1 where hid = abs(mod(hashint4(10), 128)) and id in (1,2,3,4,5,6) and uids @> 10;  
                                QUERY PLAN                                   
---------------------------------------------------------------------------  
 Index Scan using idx_test1_1 on test1  (cost=0.28..39.91 rows=1 width=26)  
   Index Cond: ((hid = 57) AND (id = ANY ('{1,2,3,4,5,6}'::integer[])))  
   Filter: (uids @> 10)  
(3 rows)  
vi test1.sql  
\set v random(1,100)  
\set uid random(1,1000000000)  
select * from test1 where hid = abs(mod(hashint4(:uid::int), 128)) and id in (:v, :v+1, :v+2, :v+3, :v+4, :v+5, :v+6, :v+7, :v+8, :v+9, :v+10) and uids @> :uid::int;  
  
pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 16 -j 16 -T 120  
  
  
transaction type: ./test1.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 16  
number of threads: 16  
duration: 120 s  
number of transactions actually processed: 387850  
latency average = 4.955 ms  
latency stddev = 11.985 ms  
initial connection time = 20.415 ms  
tps = 3221.912507 (without initial connection time)  
statement latencies in milliseconds:  
         0.001  \set v random(1,100)  
         0.001  \set uid random(1,1000000000)  
         4.957  select * from test1 where hid = abs(mod(hashint4(:uid::int), 128)) and id in (:v, :v+1, :v+2, :v+3, :v+4, :v+5, :v+6, :v+7, :  

结论

在mac book pro上, 优化后切分为128个分片, 性能提升50倍.

思路与之类似:
《重新发现PostgreSQL之美 - 26 这个推荐算法价值1亿》

《PostgreSQL 推荐系统优化总计 - 空间、时间、标量等混合多模查询场景, 大量已读过滤导致CPU IO剧增(类挖矿概率下降优化)》

《PostgreSQL multipolygon 空间索引查询过滤精简优化 - IO,CPU放大优化》

《PostgreSQL 空间切割(st_split, ST_Subdivide)功能扩展 - 空间对象网格化 (多边形GiST优化)》

《PostgreSQL 空间st_contains,st_within空间包含搜索优化 - 降IO和降CPU(bound box) (多边形GiST优化)》

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

digoal's wechat