digoal
2023-09-08
PostgreSQL , PolarDB , 数据库 , 教学
欢迎数据库应用开发者参与贡献场景, 在此issue回复即可, 共同建设《沉浸式数据库学习教学素材库》, 帮助开发者用好数据库, 提升开发者职业竞争力, 同时为企业降本提效.
- 系列课程的核心目标是教大家怎么用好数据库, 而不是怎么运维管理数据库、怎么开发数据库内核. 所以面向的对象是数据库的用户、应用开发者、应用架构师、数据库厂商的产品经理、售前售后专家等角色.
本文的实验可以使用永久免费的阿里云云起实验室来完成.
如果你本地有docker环境也可以把镜像拉到本地来做实验:
x86_64机器使用以下docker image:
ARM机器使用以下docker image:
在企业ERP软件、网站中经常会有一些让用户输入筛选条件(或者勾选筛选条件)的地方, 一个页面可能出现若干个选项, 每个选项用户可以进行勾选或下拉框选择.
例如淘宝网, 发货地是哪里, 商品价格范围, 商品类目, 内存大小, .... 很多选项提供选择.
分析业务场景, 经常会使用大宽表来表示对象的特征, 每个字段代表一个特征维度, 然后通过各个字段的组合条件来进行数据的统计分析.
营销场景, 和前面分析场景类似, 通过各个字段的组合条件圈选目标用户.
通常一个选项代表一个对象的某一个属性, 也就是用户可能根据任意组合条件进行筛选, 本实验主要目标是学习如何快速的在任意字段组合的条件输入下, 搜索到满足条件的数据.
1、创建1个测试大宽表, 1个PK, 1000个字段, 500个字符串, 500个int类型.
do language plpgsql $$
declare
sql text := '';
begin
drop table if exists tbl;
sql := 'create unlogged table tbl(id int primary key,';
for i in 1..500 loop
sql := sql || 'c' || i || ' text,';
end loop;
for i in 501..1000 loop
sql := sql || 'c' || i || ' int,';
end loop;
sql := rtrim(sql,',');
sql := sql||')';
execute sql;
end;
$$;
2、写入测试数据 10 万条, 每个字段的值随机生成.
do language plpgsql $$
declare
sql text := '';
begin
sql := 'insert into tbl select generate_series(1,100000), ';
for i in 1..500 loop
sql := sql || 'round((random()*100)::numeric, 2)::text,' ;
end loop;
for i in 501..1000 loop
sql := sql || '(random()*100)::int,' ;
end loop;
sql := rtrim(sql,',');
execute sql;
end;
$$;
1、每个字段创建1个索引
do language plpgsql $$
declare
sql text := '';
begin
for i in 1..1000 loop
sql := 'create index on tbl (c' || i || ')';
execute sql;
end loop;
end;
$$;
vacuum analyze tbl;
2、使用任意条件筛选记录
select id from tbl where c10='11.11' and c501=1 and c700=99 and c800=80 and c900=90 and c1000=1;
postgres=# explain analyze select id from tbl where c10='11.11' and c501=1 and c700=99 and c800=80 and c900=90 and c1000=1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl (cost=10.54..11.66 rows=1 width=4) (actual time=0.204..0.205 rows=0 loops=1)
Recheck Cond: ((c10 = '11.11'::text) AND (c1000 = 1))
Filter: ((c501 = 1) AND (c700 = 99) AND (c800 = 80) AND (c900 = 90))
-> BitmapAnd (cost=10.54..10.54 rows=1 width=0) (actual time=0.201..0.202 rows=0 loops=1)
-> Bitmap Index Scan on tbl_c10_idx (cost=0.00..1.47 rows=10 width=0) (actual time=0.017..0.018 rows=14 loops=1)
Index Cond: (c10 = '11.11'::text)
-> Bitmap Index Scan on tbl_c1000_idx (cost=0.00..8.82 rows=990 width=0) (actual time=0.180..0.181 rows=978 loops=1)
Index Cond: (c1000 = 1)
Planning Time: 23.427 ms
Execution Time: 0.252 ms
(10 rows)
编写测试函数, 自动生成查询语句, 第一个参数表示要查询几个字符串字段, 第二个参数表示要查询几个INT字段.
create or replace function get_uid(int, int) returns setof int as $$
declare
sql text := '';
begin
for i in 1..$1 loop
sql := sql || 'c' || ceil(random()*500)::int || ' = ''' || round((random()*100)::numeric, 2)::text || ''' and ';
end loop;
for i in 1..$2 loop
sql := sql || 'c' || 500 + ceil(random()*500)::int || ' = ' || (random()*100)::int || ' and ';
end loop;
sql := 'select id from tbl where ' || rtrim(sql, 'and ');
-- raise notice '%', sql;
return query execute sql;
end;
$$ language plpgsql strict;
-- 组合查询5个字符串字段条件, 10个int字段条件.
select * from get_uid(5, 10);
postgres=# select * from get_uid(5,10);
NOTICE: select id from tbl where c285 = '94.00' and c176 = '21.69' and c201 = '61.58' and c248 = '82.78' and c238 = '57.56' and c611 = 53 and c929 = 78 and c531 = 11 and c780 = 53 and c590 = 88 and c996 = 21 and c549 = 1 and c801 = 44 and c748 = 42 and c918 = 46
get_uid
---------
(0 rows)
postgres=# select * from get_uid(5,10);
NOTICE: select id from tbl where c72 = '31.83' and c6 = '61.87' and c401 = '98.84' and c311 = '37.58' and c59 = '4.44' and c782 = 18 and c609 = 17 and c581 = 64 and c896 = 38 and c986 = 91 and c967 = 88 and c898 = 92 and c640 = 47 and c899 = 14 and c935 = 44
get_uid
---------
(0 rows)
3、查询性能压测
vi t1.sql
select * from get_uid(5, 10);
pgbench -M prepared -n -r -P 1 -f ./t1.sql -c 10 -j 10 -T 120
transaction type: ./t1.sql
scaling factor: 1
query mode: prepared
number of clients: 10
number of threads: 10
duration: 120 s
number of transactions actually processed: 98063
latency average = 12.235 ms
latency stddev = 5.583 ms
initial connection time = 21.713 ms
tps = 817.238399 (without initial connection time)
statement latencies in milliseconds:
12.235 select * from get_uid(5, 10);
4、表和空间占用情况
postgres=# select pg_size_pretty(pg_table_size('tbl'::regclass));
pg_size_pretty
----------------
782 MB
(1 row)
postgres=# select pg_size_pretty(pg_indexes_size('tbl'::regclass));
pg_size_pretty
----------------
812 MB
(1 row)
PolarDB|PG 支持 bloom index, 在bloom index中的字段可以任意字段组合进行筛选.
先删除已有1000列的索引.
do language plpgsql $$
declare
begin
for i in 1..1000 loop
execute 'drop index tbl_c' || i || '_idx';
end loop;
end;
$$;
1、创建1000列的bloom index, 每个bloom index最多只能包含32列.
https://www.postgresql.org/docs/16/bloom.html
do language plpgsql $$
declare
sql text;
begin
execute 'create extension if not exists bloom';
for x in 1 .. (1000/32 + 1) loop
if x <> (1000/32 + 1) then
sql := 'create index on tbl using bloom (';
for i in ((x-1)*32+1) .. (x*32) loop
sql := sql || 'c' || i || ',';
end loop;
sql := rtrim(sql,',') || ')';
execute sql;
else
sql := 'create index on tbl using bloom (';
for i in ((x-1)*32+1) .. ((x-1)*32) + mod(1000,32) loop
sql := sql || 'c' || i || ',';
end loop;
sql := rtrim(sql,',') || ')';
execute sql;
end if;
end loop;
end;
$$;
vacuum analyze tbl;
2、使用任意条件筛选记录
select id from tbl where c10='11.11' and c501=1 and c700=99 and c800=80 and c900=90 and c1000=1;
postgres=# explain analyze select id from tbl where c10='11.11' and c501=1 and c700=99 and c800=80 and c900=90 and c1000=1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl (cost=967.80..979.04 rows=1 width=4) (actual time=168.529..168.530 rows=0 loops=1)
Recheck Cond: (c10 = '11.11'::text)
Rows Removed by Index Recheck: 30779
Filter: ((c501 = 1) AND (c1000 = 1) AND (c700 = 99) AND (c800 = 80) AND (c900 = 90))
Rows Removed by Filter: 14
Heap Blocks: exact=30793
-> Bitmap Index Scan on tbl_c1_c2_c3_c4_c5_c6_c7_c8_c9_c10_c11_c12_c13_c14_c15_c16__idx (cost=0.00..967.80 rows=10 width=0) (actual time=16.038..16.038 rows=30793 loops=1)
Index Cond: (c10 = '11.11'::text)
Planning Time: 1.040 ms
Execution Time: 168.615 ms
(10 rows)
-- 组合查询5个字符串字段条件, 10个int字段条件.
select * from get_uid(5, 10);
postgres=# select * from get_uid(5,10);
get_uid
---------
(0 rows)
Time: 4.318 ms
postgres=# select * from get_uid(5,10);
get_uid
---------
(0 rows)
Time: 198.422 ms
postgres=# select * from get_uid(5,10);
get_uid
---------
(0 rows)
Time: 5.622 ms
3、查询性能压测
vi t1.sql
select * from get_uid(5, 10);
pgbench -M prepared -n -r -P 1 -f ./t1.sql -c 10 -j 10 -T 120
transaction type: ./t1.sql
scaling factor: 1
query mode: prepared
number of clients: 10
number of threads: 10
duration: 120 s
number of transactions actually processed: 4650
latency average = 258.305 ms
latency stddev = 107.361 ms
initial connection time = 23.889 ms
tps = 38.697368 (without initial connection time)
statement latencies in milliseconds:
258.315 select * from get_uid(5, 10);
4、表和空间占用情况
postgres=# select pg_size_pretty(pg_table_size('tbl'::regclass));
pg_size_pretty
----------------
782 MB
(1 row)
postgres=# select pg_size_pretty(pg_indexes_size('tbl'::regclass));
pg_size_pretty
----------------
52 MB
(1 row)
PolarDB|PG 还支持数组类型, 由于我们这个实验要的是任意字段组合筛选, 可以把所有字段的值组合起来放到一个数组里面. 查询是使用数组的包含、相交计算得到结果.
PS:
- 如果字段很长, 可以考虑使用文本组合后的hashvalue作为数组的元素.
- hashvalue []
1、创建一张新的标签表
drop table if exists tbl1;
create unlogged table tbl1 (
id int primary key,
tags text[] -- 用数组表示1000个字段的标签值(如果原始字段内容特别长, 可以用hashvalue表示原始值, 从而降低存储空间.).
);
2、将原始1000个字段的表组合后写入到新的标签表.
do language plpgsql $$
declare
sql text := 'insert into tbl1 select id, array[';
begin
for i in 1..1000 loop
sql := sql || '''c' || i || '_'' || ' || 'c' || i || ',' ; -- 字段名_值 作为标签元素
end loop;
sql := rtrim(sql, ',') || '] from tbl';
execute sql;
end;
$$;
postgres=# select * from tbl1 limit 1;
-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | 1
tags | {c1_37.60,c2_84.41,c3_76.07,c4_96.78,c5_27.19,c6_78.18,c7_45.04,c8_45.72,c9_7.73,c10_18.37,c11_48.12,c12_62.91,c13_73.46,c14_37.79,c15_3.79,c16_55.38,c17_68.77,c18_35.94,c19_71.77,c20_0.94,c21_55.30,c22_15.09,c23_30.26,c24_42.45,c25_36.35,c26_28.48,c27_13.03,c28_31.33,c29_54.01,c30_90.54,c31_96.98,c32_47.14,c33_74.71,c34_93.53,c35_79.06,c36_92.60,c37_40.74,c38_15.57,c39_11.12,c40_91.00,c41_88.95,c42_94.40,c43_73.12,c44_83.46,c45_7.08,c46_9.04,c47_55.70,c48_14.95,c49_3.14,c50_39.39,c51_9.91,c52_1.65,c53_29.39,c54_19.60,c55_66.05,c56_17.97,c57_9.66,c58_24.26,c59_58.94,c60_25.86,c61_33.37,c62_12.66,c63_50.02,c64_72.93,c65_72.54,c66_44.18,c67_56.70,c68_10.01,c69_55.54,c70_80.24,c71_78.99,c72_57.40,c73_55.52,c74_21.91,c75_17.23,c76_84.85,c77_69.09,c78_38.74,c79_60.59,c80_52.37,c81_24.16,c82_18.13,c83_74.99,c84_91.11,c85_53.06,c86_75.59,c87_54.02,c88_30.04,c89_40.68,c90_86.54,c91_14.05,c92_0.25,c93_35.67,c94_28.88,c95_4.02,c96_72.56,c97_29.92,c98_11.59,c99_73.13,c100_14.34,c101_35.60,c102_35.98,c103_50.47,c104_38.60,c105_45.34,c106_19.85,c107_67.55,c108_74.99,c109_57.91,c110_66.72,c111_17.36,c112_21.40,c113_5.22,c114_87.46,c115_35.60,c116_52.42,c117_75.76,c118_64.95,c119_96.89,c120_22.94,c121_0.78,c122_75.48,c123_99.93,c124_30.99,c125_43.02,c126_57.93,c127_22.35,c128_2.51,c129_0.34,c130_12.60,c131_32.14,c132_54.56,c133_38.50,c134_57.47,c135_67.00,c136_44.46,c137_86.85,c138_10.47,c139_22.95,c140_46.56,c141_38.28,c142_68.81,c143_15.09,c144_30.42,c145_69.29,c146_40.56,c147_89.49,c148_73.24,c149_48.64,c150_43.95,c151_25.85,c152_93.73,c153_3.22,c154_23.65,c155_96.39,c156_75.81,c157_9.80,c158_64.14,c159_51.96,c160_80.48,c161_40.05,c162_82.98,c163_27.81,c164_57.60,c165_97.73,c166_13.58,c167_79.33,c168_40.64,c169_0.70,c170_90.48,c171_24.14,c172_75.43,c173_9.99,c174_10.54,c175_27.35,c176_25.21,c177_34.83,c178_91.67,c179_88.65,c180_51.24,c181_20.29,c182_95.45,c183_85.32,c184_71.06,c185_84.01,c186_29.37,c187_77.10,c188_89.29,c189_26.24,c190_36.31,c191_64.02,c192_13.59,c193_14.36,c194_99.00,c195_6.48,c196_57.40,c197_94.43,c198_1.04,c199_9.68,c200_79.27,c201_22.21,c202_58.98,c203_57.54,c204_66.13,c205_28.59,c206_63.66,c207_61.70,c208_75.12,c209_44.93,c210_88.45,c211_0.69,c212_59.30,c213_47.05,c214_56.53,c215_35.17,c216_10.17,c217_28.35,c218_31.59,c219_44.23,c220_10.84,c221_58.72,c222_54.51,c223_74.67,c224_55.09,c225_12.08,c226_93.03,c227_90.07,c228_8.72,c229_68.63,c230_78.50,c231_56.62,c232_70.63,c233_94.39,c234_51.91,c235_82.97,c236_94.70,c237_28.03,c238_35.68,c239_14.12,c240_12.86,c241_29.68,c242_43.42,c243_56.60,c244_20.45,c245_67.66,c246_71.48,c247_20.56,c248_53.80,c249_92.91,c250_16.49,c251_11.75,c252_74.90,c253_17.25,c254_94.71,c255_95.77,c256_39.67,c257_18.29,c258_13.39,c259_10.71,c260_35.76,c261_19.90,c262_50.96,c263_92.37,c264_45.87,c265_81.85,c266_33.20,c267_35.20,c268_98.80,c269_95.28,c270_62.16,c271_78.66,c272_77.73,c273_51.68,c274_34.61,c275_34.75,c276_42.58,c277_92.42,c278_20.22,c279_59.37,c280_9.13,c281_73.67,c282_5.69,c283_87.63,c284_93.38,c285_99.72,c286_21.40,c287_65.78,c288_47.18,c289_95.19,c290_78.22,c291_6.72,c292_65.46,c293_47.53,c294_54.66,c295_2.81,c296_60.69,c297_55.21,c298_92.57,c299_49.26,c300_65.05,c301_46.82,c302_12.05,c303_62.06,c304_12.45,c305_84.85,c306_25.28,c307_31.28,c308_62.96,c309_56.80,c310_72.05,c311_97.20,c312_90.81,c313_69.57,c314_76.08,c315_4.89,c316_60.47,c317_63.22,c318_17.95,c319_10.75,c320_44.96,c321_21.19,c322_73.59,c323_29.85,c324_95.67,c325_80.04,c326_92.25,c327_19.99,c328_90.12,c329_96.39,c330_99.94,c331_59.70,c332_63.63,c333_57.24,c334_55.74,c335_53.52,c336_63.41,c337_76.31,c338_52.85,c339_66.84,c340_52.45,c341_89.51,c342_14.87,c343_79.93,c344_79.86,c345_91.83,c346_61.32,c347_51.42,c348_29.58,c349_72.26,c350_5.01,c351_85.32,c352_58.64,c353_38.29,c354_45.73,c355_13.96,c356_48.01,c357_44.29,c358_68.46,c359_97.26,c360_64.19,c361_74.52,c362_66.49,c363_23.52,c364_80.23,c365_89.63,c366_46.15,c367_99.49,c368_34.49,c369_41.60,c370_36.89,c371_79.57,c372_29.55,c373_44.99,c374_9.71,c375_93.89,c376_48.77,c377_68.95,c378_85.02,c379_48.67,c380_12.65,c381_73.94,c382_96.74,c383_43.85,c384_21.41,c385_57.48,c386_37.34,c387_54.93,c388_55.51,c389_23.88,c390_76.12,c391_76.93,c392_94.78,c393_42.51,c394_62.36,c395_17.08,c396_95.15,c397_77.99,c398_62.66,c399_33.59,c400_44.40,c401_42.98,c402_56.34,c403_54.48,c404_34.15,c405_50.39,c406_20.74,c407_23.55,c408_2.44,c409_97.47,c410_75.42,c411_29.15,c412_51.66,c413_50.14,c414_44.89,c415_91.56,c416_51.93,c417_72.55,c418_18.14,c419_51.93,c420_35.04,c421_57.19,c422_82.50,c423_0.58,c424_43.31,c425_4.44,c426_41.01,c427_56.75,c428_59.71,c429_11.66,c430_67.89,c431_11.61,c432_37.53,c433_1.06,c434_45.81,c435_39.17,c436_11.12,c437_16.95,c438_70.54,c439_76.89,c440_85.26,c441_88.35,c442_11.89,c443_62.78,c444_12.44,c445_96.56,c446_20.83,c447_65.87,c448_10.62,c449_31.95,c450_59.45,c451_60.05,c452_94.49,c453_24.68,c454_66.18,c455_11.88,c456_79.59,c457_3.54,c458_20.19,c459_93.36,c460_20.00,c461_71.57,c462_27.41,c463_1.49,c464_98.28,c465_89.51,c466_61.01,c467_95.81,c468_53.58,c469_65.93,c470_70.93,c471_28.22,c472_4.13,c473_59.12,c474_52.95,c475_82.03,c476_50.77,c477_78.78,c478_85.48,c479_97.67,c480_84.56,c481_18.93,c482_3.01,c483_4.23,c484_24.73,c485_75.80,c486_54.20,c487_2.03,c488_81.63,c489_80.96,c490_69.57,c491_30.59,c492_37.76,c493_68.55,c494_42.37,c495_78.54,c496_18.97,c497_63.07,c498_63.40,c499_39.48,c500_56.01,c501_19,c502_79,c503_29,c504_18,c505_22,c506_96,c507_56,c508_1,c509_15,c510_97,c511_3,c512_51,c513_90,c514_74,c515_4,c516_26,c517_45,c518_10,c519_48,c520_26,c521_55,c522_55,c523_17,c524_59,c525_34,c526_45,c527_76,c528_71,c529_69,c530_74,c531_13,c532_27,c533_2,c534_29,c535_34,c536_41,c537_16,c538_18,c539_63,c540_35,c541_37,c542_7,c543_54,c544_69,c545_21,c546_88,c547_63,c548_21,c549_9,c550_20,c551_92,c552_22,c553_72,c554_89,c555_63,c556_78,c557_10,c558_7,c559_38,c560_31,c561_90,c562_46,c563_47,c564_12,c565_18,c566_88,c567_2,c568_17,c569_39,c570_51,c571_12,c572_48,c573_76,c574_92,c575_80,c576_49,c577_12,c578_1,c579_60,c580_97,c581_28,c582_33,c583_16,c584_29,c585_76,c586_40,c587_27,c588_52,c589_97,c590_10,c591_35,c592_6,c593_75,c594_7,c595_60,c596_76,c597_23,c598_91,c599_1,c600_3,c601_20,c602_32,c603_65,c604_12,c605_26,c606_80,c607_52,c608_55,c609_1,c610_58,c611_16,c612_44,c613_37,c614_2,c615_59,c616_28,c617_51,c618_44,c619_36,c620_11,c621_97,c622_32,c623_75,c624_20,c625_95,c626_19,c627_38,c628_95,c629_85,c630_94,c631_97,c632_25,c633_91,c634_95,c635_2,c636_52,c637_36,c638_2,c639_92,c640_78,c641_34,c642_79,c643_61,c644_11,c645_12,c646_66,c647_19,c648_70,c649_83,c650_78,c651_66,c652_13,c653_13,c654_92,c655_73,c656_96,c657_93,c658_13,c659_57,c660_8,c661_35,c662_13,c663_47,c664_47,c665_3,c666_33,c667_96,c668_97,c669_78,c670_47,c671_62,c672_43,c673_45,c674_45,c675_66,c676_66,c677_90,c678_45,c679_12,c680_74,c681_94,c682_33,c683_56,c684_1,c685_48,c686_10,c687_31,c688_95,c689_57,c690_33,c691_90,c692_78,c693_5,c694_76,c695_67,c696_4,c697_38,c698_3,c699_21,c700_54,c701_40,c702_42,c703_95,c704_97,c705_92,c706_24,c707_86,c708_75,c709_69,c710_81,c711_9,c712_16,c713_94,c714_55,c715_95,c716_3,c717_58,c718_32,c719_65,c720_61,c721_81,c722_55,c723_48,c724_18,c725_61,c726_64,c727_16,c728_54,c729_68,c730_3,c731_76,c732_47,c733_98,c734_28,c735_84,c736_15,c737_7,c738_93,c739_66,c740_74,c741_51,c742_74,c743_28,c744_54,c745_5,c746_80,c747_28,c748_33,c749_5,c750_64,c751_45,c752_16,c753_58,c754_31,c755_8,c756_13,c757_12,c758_63,c759_45,c760_97,c761_5,c762_87,c763_46,c764_88,c765_16,c766_68,c767_28,c768_1,c769_95,c770_6,c771_55,c772_76,c773_75,c774_74,c775_6,c776_49,c777_8,c778_34,c779_41,c780_36,c781_64,c782_71,c783_80,c784_56,c785_62,c786_87,c787_83,c788_22,c789_33,c790_3,c791_70,c792_6,c793_5,c794_59,c795_65,c796_16,c797_21,c798_50,c799_32,c800_97,c801_47,c802_52,c803_85,c804_55,c805_84,c806_74,c807_26,c808_24,c809_60,c810_36,c811_65,c812_29,c813_27,c814_88,c815_19,c816_9,c817_48,c818_57,c819_67,c820_70,c821_46,c822_19,c823_1,c824_27,c825_60,c826_100,c827_29,c828_52,c829_5,c830_20,c831_57,c832_91,c833_14,c834_62,c835_58,c836_78,c837_94,c838_51,c839_27,c840_14,c841_24,c842_59,c843_60,c844_99,c845_71,c846_96,c847_97,c848_91,c849_57,c850_73,c851_28,c852_12,c853_20,c854_82,c855_3,c856_96,c857_37,c858_72,c859_7,c860_39,c861_4,c862_5,c863_39,c864_39,c865_80,c866_4,c867_98,c868_6,c869_86,c870_6,c871_61,c872_60,c873_85,c874_99,c875_16,c876_83,c877_10,c878_7,c879_98,c880_11,c881_33,c882_80,c883_82,c884_3,c885_37,c886_64,c887_13,c888_17,c889_19,c890_64,c891_1,c892_96,c893_6,c894_84,c895_43,c896_72,c897_6,c898_27,c899_9,c900_45,c901_100,c902_58,c903_45,c904_91,c905_70,c906_94,c907_34,c908_11,c909_16,c910_16,c911_58,c912_91,c913_38,c914_22,c915_99,c916_31,c917_68,c918_72,c919_83,c920_61,c921_82,c922_5,c923_6,c924_43,c925_35,c926_27,c927_92,c928_30,c929_68,c930_100,c931_0,c932_69,c933_63,c934_34,c935_49,c936_49,c937_51,c938_61,c939_23,c940_31,c941_29,c942_63,c943_90,c944_31,c945_90,c946_42,c947_7,c948_49,c949_16,c950_10,c951_81,c952_29,c953_77,c954_25,c955_15,c956_53,c957_57,c958_94,c959_36,c960_74,c961_62,c962_88,c963_20,c964_90,c965_10,c966_18,c967_81,c968_0,c969_65,c970_79,c971_83,c972_95,c973_31,c974_60,c975_47,c976_75,c977_95,c978_54,c979_18,c980_82,c981_75,c982_16,c983_19,c984_33,c985_48,c986_82,c987_26,c988_74,c989_72,c990_1,c991_87,c992_28,c993_19,c994_34,c995_48,c996_98,c997_38,c998_88,c999_13,c1000_98}
3、在数组字段上创建倒排索引: gin index
create index on tbl1 using gin (tags);
vacuum analyze tbl1;
4、创建任意字段搜索查询函数, 模拟数组包含查询
create or replace function get_uid1(int, int) returns setof int as $$
declare
sql text := 'array[';
begin
for i in 1..$1 loop
sql := sql || '''c' || ceil(random()*500)::int || '_' || round((random()*100)::numeric, 2)::text || ''',';
end loop;
for i in 1..$2 loop
sql := sql || '''c' || 500 + ceil(random()*500)::int || '_' || (random()*100)::int || ''',';
end loop;
sql := 'select id from tbl1 where tags @> ' || rtrim(sql, ',') || ']';
-- raise notice '%', sql;
return query execute sql;
end;
$$ language plpgsql strict;
-- 组合查询5个字符串字段条件, 10个int字段条件.
select * from get_uid1(5, 10);
postgres=# select * from get_uid1(5, 10);
NOTICE: select id from tbl1 where tags @> array['c153_68.20','c336_2.45','c400_50.07','c197_48.06','c182_90.43','c960_36','c925_75','c525_53','c611_73','c828_21','c896_16','c864_1','c659_97','c666_86','c639_78']
get_uid1
----------
(0 rows)
Time: 4.368 ms
postgres=# select * from get_uid1(5, 10);
NOTICE: select id from tbl1 where tags @> array['c465_24.84','c500_92.92','c135_35.36','c61_63.15','c438_13.23','c969_96','c996_27','c602_5','c535_22','c720_10','c616_40','c619_28','c511_23','c856_77','c538_73']
get_uid1
----------
(0 rows)
Time: 4.324 ms
postgres=# explain analyze select id from tbl1 where tags @> array['c153_68.20','c336_2.45','c400_50.07','c197_48.06','c182_90.43','c960_36','c925_75','c525_53','c611_73','c828_21','c896_16','c864_1','c659_97','c666_86','c639_78'];
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl1 (cost=83.60..84.71 rows=1 width=4) (actual time=0.863..0.865 rows=0 loops=1)
Recheck Cond: (tags @> '{c153_68.20,c336_2.45,c400_50.07,c197_48.06,c182_90.43,c960_36,c925_75,c525_53,c611_73,c828_21,c896_16,c864_1,c659_97,c666_86,c639_78}'::text[])
-> Bitmap Index Scan on tbl1_tags_idx (cost=0.00..83.60 rows=1 width=0) (actual time=0.856..0.857 rows=0 loops=1)
Index Cond: (tags @> '{c153_68.20,c336_2.45,c400_50.07,c197_48.06,c182_90.43,c960_36,c925_75,c525_53,c611_73,c828_21,c896_16,c864_1,c659_97,c666_86,c639_78}'::text[])
Planning Time: 0.340 ms
Execution Time: 0.948 ms
(6 rows)
5、查询性能压测
vi t3.sql
select * from get_uid1(5, 10);
pgbench -M prepared -n -r -P 1 -f ./t3.sql -c 10 -j 10 -T 120
transaction type: ./t3.sql
scaling factor: 1
query mode: prepared
number of clients: 10
number of threads: 10
duration: 120 s
number of transactions actually processed: 3078754
latency average = 0.390 ms
latency stddev = 0.165 ms
initial connection time = 24.890 ms
tps = 25661.456683 (without initial connection time)
statement latencies in milliseconds:
0.390 select * from get_uid1(5, 10);
6、表和空间占用情况
postgres=# select pg_size_pretty(pg_table_size('tbl1'::regclass));
pg_size_pretty
----------------
795 MB
(1 row)
postgres=# select pg_size_pretty(pg_indexes_size('tbl1'::regclass));
pg_size_pretty
----------------
564 MB
(1 row)
在传统方法中针对任意字段组合过滤的场景, 使用了1000列全部都建上索引的方法来加速, 但是实际上在圈选或者分析场景中, 筛选的记录数是非常庞大的, IO消耗非凸显出来. 同时1000列索引还会带来写入降速严重的问题.
为了解决这个问题, PolarDB|PG 可以通过 duckdb_fdw 来支持parquet列存储, 带来的好处是存储压缩比例高节省空间、列向量计算提升查询性能、不需要创建索引不影响高速导入.
1、将数据库中的数据先导出到csv文件中, 准备转换为parquet文件.
su - postgres
cd ~
copy tbl to '/var/lib/postgresql/db1.csv' with (format csv , header true);
2、在duckdb中创建表
vi ddl.sql
do language plpgsql $$
declare
sql text := '';
begin
sql := 'create table tbl(id int primary key,';
for i in 1..500 loop
sql := sql || 'c' || i || ' text,';
end loop;
for i in 501..1000 loop
sql := sql || 'c' || i || ' int,';
end loop;
sql := rtrim(sql,',');
sql := sql || ');' ;
raise notice '%', sql;
end;
$$;
psql -f ddl.sql 2>&1 | grep NOTICE | awk -F 'NOTICE:' '{print $2}' | ./duckdb /var/lib/postgresql/db1.duckdb
3、将csv文件转换到parquet列存储格式, 并创建查询视图
./duckdb /var/lib/postgresql/db1.duckdb
D insert into tbl select * from '/var/lib/postgresql/db1.csv';
D copy tbl to '/var/lib/postgresql/db1.parquet' (FORMAT PARQUET);
D create view v_tbl as SELECT * FROM read_parquet('/var/lib/postgresql/db1.parquet');
D select count(*) from v_tbl;
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 100000 │
└──────────────┘
4、在PolarDB|PG中, 创建duckdb_fdw插件, 并将duckdb中的表作为外部表查询.
create extension if not exists duckdb_fdw;
CREATE SERVER DuckDB_server FOREIGN DATA WRAPPER duckdb_fdw OPTIONS (database '/var/lib/postgresql/db1.duckdb');
create schema duckdb;
IMPORT FOREIGN SCHEMA public FROM SERVER DuckDB_server INTO duckdb;
postgres=# \det duckdb.*
List of foreign tables
Schema | Table | Server
--------+-------+---------------
duckdb | tbl | duckdb_server
duckdb | v_tbl | duckdb_server
(2 rows)
postgres=# select count(*) from duckdb.v_tbl;
count
--------
100000
(1 row)
Time: 20.730 ms
postgres=# explain verbose select count(*) from duckdb.v_tbl;
QUERY PLAN
------------------------------------------------
Foreign Scan (cost=1.00..1.00 rows=1 width=8)
Output: (count(*))
SQLite query: SELECT count(*) FROM v_tbl
(3 rows)
Time: 7.842 ms
5、创建查询duckdb parquet的压测函数. 压测方法与前面的类似, 若干个字符串字段, 若干个整型字段.
create or replace function get_uid2(int, int) returns setof int8 as $$
declare
sql text := '';
begin
for i in 1..$1 loop
sql := sql || 'c' || ceil(random()*500)::int || ' = ''' || round((random()*100)::numeric, 2)::text || ''' and ';
end loop;
for i in 1..$2 loop
sql := sql || 'c' || 500 + ceil(random()*500)::int || ' = ' || (random()*100)::int || ' and ';
end loop;
sql := 'select id from duckdb.v_tbl where ' || rtrim(sql, 'and ');
-- raise notice '%', sql;
return query execute sql;
end;
$$ language plpgsql strict;
-- 组合查询5个字符串字段条件, 10个int字段条件.
select * from get_uid2(5, 10);
postgres=# select * from get_uid2(5, 10);
NOTICE: select id from duckdb.v_tbl where c386 = '92.82' and c370 = '3.55' and c110 = '33.45' and c60 = '10.28' and c184 = '39.87' and c520 = 5 and c662 = 71 and c586 = 14 and c777 = 68 and c617 = 25 and c843 = 95 and c611 = 79 and c775 = 48 and c598 = 92 and c899 = 74
get_uid2
----------
(0 rows)
Time: 47.058 ms
postgres=# select * from get_uid2(5, 10);
NOTICE: select id from duckdb.v_tbl where c100 = '93.50' and c62 = '91.11' and c441 = '72.54' and c6 = '50.63' and c349 = '65.93' and c925 = 29 and c959 = 44 and c518 = 79 and c781 = 78 and c649 = 46 and c961 = 16 and c857 = 17 and c664 = 73 and c950 = 56 and c811 = 4
get_uid2
----------
(0 rows)
Time: 47.597 ms
6、查询性能压测
vi t4.sql
select * from get_uid2(5, 10);
pgbench -M prepared -n -r -P 1 -f ./t4.sql -c 1 -j 1 -T 120
transaction type: ./t4.sql
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 120 s
number of transactions actually processed: 9003
latency average = 13.329 ms
latency stddev = 1.255 ms
initial connection time = 8.840 ms
tps = 75.021592 (without initial connection time)
statement latencies in milliseconds:
13.329 select * from get_uid2(5, 10);
7、表和空间占用情况
postgres@abaf67f643b7:~$ ll /var/lib/postgresql/db1.parquet
-rw-r--r-- 1 postgres postgres 262M Sep 8 09:07 /var/lib/postgresql/db1.parquet
10万数据量, 1000字段, 任意组合搜索.
方法 | 单次15个组合条件请求响应速度 (毫秒) | 处理吞吐 tps | 表占有空间 | 索引占有空间 |
---|---|---|---|---|
传统方法1000字段 + 1000 btree索引 | 12.235 | 817 | 782 MB | 812 MB |
PolarDB|PG bloom 索引 | 258.315 | 39 | 782 MB | 52 MB |
PolarDB|PG 数组 + gin索引 | 0.390 | 25661 | 795 MB | 564 MB |
PolarDB|PG + duckdb parquet 列存储 + 无索引 | 13.329 | 75 | 262 MB | 0 |
1 hashvalue
2 bloom index
3 array
4 gin index
5 向量特征, 向量距离, 向量索引?
6 bitmap index scan
1 本实验提到的都是等值或不等于的查询要求, 如果有范围查询怎么办? 阶梯化? 转化为有限空间内的值? 最终还是模拟等值查询!!!
2 向量特征?
4 列存储?
5 为什么会选择多个索引使用bitmap index方法?
6 and查询用数组怎么表达? @>
postgres=# \do+ @>
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Function | Description
------------+------+---------------+----------------+-------------+--------------------------------+-------------
pg_catalog | @> | anyarray | anyarray | boolean | arraycontains | contains
7 or查询在数组中如何表达? &&
postgres=# \do+ &&
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Function | Description
------------+------+---------------+----------------+-------------+--------------------------------+----------------------------------
pg_catalog | && | anyarray | anyarray | boolean | arrayoverlap | overlaps
8 如果原始字段内容特别长, 可以用hashvalue表示原始值, 从而降低存储空间.
9 duckdb_fdw 的duckdb server只能打开1次, 不能多个连接同时打开, 未来可能有改进空间. 否则每次初始化duckdb非常麻烦. 留个作业, 有兴趣的同学可以改进duckdb_fdw, 例如以RO模式同时打开, 或者支持更方便的克隆功能?