@@ -268,79 +268,97 @@ pipeline=# select * from cv_obj;
268268
2692693\. Hot article ID range
270270
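271- 200 million articles in total. LIKEs are generated with a Gaussian distribution: 95% of the article IDs fall within the middle 2.0/xx of the ID range, centered on the peak of the bell, and 67% fall within the middle 1.0/xx. The closer a value on the x-axis is to the top of the bell, the more likely it is to be generated. The larger xx is, the sharper the bell, i.e. the fewer high-frequency values there are.
271+ 200 million articles in total. LIKEs are generated with a Gaussian distribution: the article IDs within the middle 2.0/xx of the ID range, centered on the peak of the bell, account for 95% of the generated values. The article IDs within the middle 1.0/xx of the range account for 67%.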
272272
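273- Assuming 27,000 high-frequency articles fall within the 95% interval, xx = 14900.
273+ The closer a value on the x-axis is to the top of the bell (i.e. article ID = 100 million), the more likely it is to be generated.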
274+
275+ The larger xx is, the sharper the bell, i.e. the fewer hot articles there are.
274276
275277For the underlying principle, see
276278
277279[ 《Generating Poisson, Gaussian, Exponential, and Random Distribution Data - PostgreSQL pg_bench 》] ( .../201506/20150618_01.md )
278-
280+
279281![ pic] ( ../201506/20150618_01_pic_001.png )
280282
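2812834\. Random users LIKE random articles
282284
2832855\. Random users LIKE hot articles
284286
285287### First, generate the base data according to the requirements above
286- Stress-test script that LIKEs articles, with 1 million hot articles, generated with a Gaussian distribution
288+ Stress-test script that LIKEs articles, with article IDs drawn from a Gaussian distribution. After a long run, the number of LIKEs per article follows a Gaussian distribution, and the articles at the peak of the bell receive the most LIKEs.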
289+
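290+ Setting xx to 10.0 means that the article IDs within the middle 20% of the ID range, centered on the peak of the bell, account for 95% of the generated values. The article IDs within the middle 10% account for 67%.
291+
292+ The larger xx is, the higher the probability of the article IDs near the peak.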
287293
288294```
289295vi test.sql
290296\setrandom uid 1 100000000
291- \setrandom id 1 200000000 gaussian 14900 .0
297+ \setrandom id 1 200000000 gaussian 10 .0
292298select f_obj(:id,:uid);
293299```
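To make the Gaussian parameter more concrete, here is a minimal Python sketch of the arithmetic behind it. It assumes only the rule stated in this section (the middle 1.0/xx of the ID range covers about 67% of generated values, the middle 2.0/xx about 95%). The function name `gaussian_windows` is made up for illustration and is not part of pgbench.

```python
def gaussian_windows(id_min, id_max, xx):
    """Return the ID windows, centered on the middle of the range,
    that receive roughly 67% and 95% of the generated values.

    Assumption: middle 1.0/xx of the range -> ~67%,
                middle 2.0/xx of the range -> ~95%.
    """
    span = id_max - id_min + 1
    center = id_min + span / 2.0
    windows = {}
    for label, k in (("67%", 1.0), ("95%", 2.0)):
        width = span * k / xx
        windows[label] = (round(center - width / 2), round(center + width / 2))
    return windows


if __name__ == "__main__":
    # Compare the new parameter (10.0) with the old one (14900.0).
    for xx in (10.0, 14900.0):
        for label, (lo, hi) in gaussian_windows(1, 200_000_000, xx).items():
            print(f"xx={xx}: {label} of LIKEs hit IDs {lo}..{hi} ({hi - lo} IDs)")
```

With xx = 10.0 the 95% window spans about 40 million IDs, while with xx = 14900.0 it shrinks to roughly 27,000 IDs, which is where the "27,000 hot articles" figure of the earlier revision comes from.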
294300
295- Stress test with 256 connections. Result: 179,000 LIKE requests per second.
301+ Stress test with 256 connections. Result: 177,000 LIKE requests per second.
296302
297303```
298304pgbench -M prepared -n -r -P 1 -f ./test.sql -c 256 -j 256 -T 120
299305
300- transaction type: Custom query
301- scaling factor: 1
302- query mode: prepared
303- number of clients: 256
304- number of threads: 256
305- duration: 120 s
306- number of transactions actually processed: 21500685
307- latency average: 1.427 ms
308- latency stddev: 1.204 ms
309- tps = 179035.949606 (including connections establishing)
310- tps = 179047.297058 (excluding connections establishing)
311- statement latencies in milliseconds:
312- 0.002314 \setrandom uid 1 100000000
313- 0.002261 \setrandom id 1 200000000 gaussian 14900.0
314- 1.422216 select f_obj(:id,:uid);
306+ transaction type: Custom query
307+ scaling factor: 1
308+ query mode: prepared
309+ number of clients: 256
310+ number of threads: 256
311+ duration: 120 s
312+ number of transactions actually processed: 21331348
313+ latency average: 1.438 ms
314+ latency stddev: 0.591 ms
315+ tps = 177652.080934 (including connections establishing)
316+ tps = 177665.827969 (excluding connections establishing)
317+ statement latencies in milliseconds:
318+ 0.002267 \setrandom uid 1 100000000
319+ 0.002384 \setrandom id 1 200000000 gaussian 10.0
320+ 1.433405 select f_obj(:id,:uid);
315321```
316322
317323Number of articles after an interim stress-test run
318324
319325```
320- pipeline=# select count(*) from cv_obj;
321- count
322- ----------
323- 27612942
324- (1 row)
326+ pipeline=# select count(*) from cv_obj;
327+ count
328+ ----------
329+ 86842876
330+ (1 row)
325331
326332-- Query the LIKE counts of the articles near the peak of the bell
327333
328334pipeline=# select like_cnt from cv_obj where id=100000000;
329335 like_cnt
330336----------
331- 15060
337+ 18317
332338(1 row)
333339
334340pipeline=# select like_cnt from cv_obj where id=100000001;
335341 like_cnt
336342----------
337- 14927
343+ 18410
338344(1 row)
339345
340346pipeline=# select like_cnt from cv_obj where id=100000002;
341347 like_cnt
342348----------
343- 15156
349+ 18566
350+ (1 row)
351+
352+ pipeline=# select like_cnt from cv_obj where id=100000000-1;
353+ like_cnt
354+ ----------
355+ 18380
356+ (1 row)
357+
358+ pipeline=# select like_cnt from cv_obj where id=100000000-2;
359+ like_cnt
360+ ----------
361+ 18399
344362(1 row)
345363
346364Articles at the bottom edges of the bell receive very few LIKEs
@@ -353,7 +371,7 @@ pipeline=# select * from cv_obj where id>199999990;
353371
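354372This matches expectations, so the stress test continues. (Alternatively, an exponential distribution could be used for the test.)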
355373
356- No optimization has been done yet; CPU usage is as follows
374+ With no optimization applied yet, CPU usage is as follows
357375
358376```
359377Cpu(s): 35.2%us, 17.4%sy, 13.8%ni, 33.2%id, 0.3%wa, 0.0%hi, 0.1%si, 0.0%st
@@ -367,8 +385,6 @@ Cpu(s): 35.2%us, 17.4%sy, 13.8%ni, 33.2%id, 0.3%wa, 0.0%hi, 0.1%si, 0.0%st
367385
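368386Keep stress-testing LIKEs until LIKE data exists for 200 million articles, then move on to test 2.
369387
370- Alternatively, randomly generate 200 million LIKE records following the LIKE-count distribution described in the scenario. Relationship data also needs to be generated randomly, following the follow distribution described in the scenario.
371-
372388### Generate user relationship data
3733891\. User ID range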
374390
@@ -451,65 +467,105 @@ pipeline=# select count(*) from user_like_agg ;
451467
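4524683\. Among the users who LIKE a given article, which ones are my friends?
453469
454- Stress-test script 1: who LIKEd the article? How many times has the article been LIKEd?
470+ Stress-test script 1: who LIKEd the article?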
455471
456472```
457473vi test1.sql
458474\setrandom id 1 200000000
459- select who_like,like_cnt from cv_obj where id=:id;
475+ select who_like from cv_obj where id=:id;
460476
461477pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 128 -j 128 -T 120
462478```
463479
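464- Stress-test script 2: among the users who LIKE a given article, which ones are my friends?
480+ Stress-test script 2: how many times has the article been LIKEd?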
481+
482+ ```
483+ vi test2.sql
484+ \setrandom id 1 200000000
485+ select like_cnt from cv_obj where id=:id;
486+
487+ pgbench -M prepared -n -r -P 1 -f ./test2.sql -c 128 -j 128 -T 120
488+ ```
489+
490+ Stress-test script 3: among the users who LIKE a given article, which ones are my friends?
465491
466492```
467- vi test2 .sql
493+ vi test3 .sql
468494\setrandom id 1 200000000
469495\setrandom uid 1 100000000
470496select array_intersect(t1.who_like, t2.like_who) from (select who_like from cv_obj where id=:id) t1,(select array[like_who] as like_who from user_like_agg where uid=:uid) t2;
471497
472- pgbench -M prepared -n -r -P 1 -f ./test2 .sql -c 128 -j 128 -T 120
498+ pgbench -M prepared -n -r -P 1 -f ./test3 .sql -c 128 -j 128 -T 120
473499```
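The friends query above boils down to intersecting two user-ID arrays: the article's `who_like` list and the user's `like_who` list. The SQL function `array_intersect` is not defined in this section, so the Python sketch below only assumes that it returns the elements common to both arrays. It illustrates the idea and is not the article's implementation.

```python
def array_intersect(who_like, like_who):
    """Return the user IDs present in both lists.

    Assumed semantics: common elements only, each reported once,
    in the order they appear in who_like.
    """
    friends = set(like_who)          # O(1) membership checks
    seen = set()
    common = []
    for uid in who_like:
        if uid in friends and uid not in seen:
            seen.add(uid)
            common.append(uid)
    return common


if __name__ == "__main__":
    who_like = [3, 1, 2, 3]          # users who LIKEd the article
    like_who = [2, 3, 9]             # users that "I" follow
    print(array_intersect(who_like, like_who))
```

Building a set from one side keeps the cost linear in the combined array length, which matters when a hot article has a very long `who_like` array.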
474500
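475- Stress-test result 1: a primary-key lookup by object ID; reaching 1.04 million/s is not surprising.
501+ Stress-test result 1: who LIKEd the article? Reaching 1.01 million/s is not surprising.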
476502
477503```
478- transaction type: Custom query
479- scaling factor: 1
480- query mode: prepared
481- number of clients: 128
482- number of threads: 128
483- duration: 120 s
484- number of transactions actually processed: 125251141
485- latency average: 0.122 ms
486- latency stddev: 0.210 ms
487- tps = 1043643.576926 (including connections establishing)
488- tps = 1043716.991815 (excluding connections establishing)
489- statement latencies in milliseconds:
490- 0.001711 \setrandom id 1 1000000000
491- 0.119755 select who_like,like_cnt from cv_obj where id=:id;
504+ transaction type: Custom query
505+ scaling factor: 1
506+ query mode: prepared
507+ number of clients: 128
508+ number of threads: 128
509+ duration: 120 s
510+ number of transactions actually processed: 121935264
511+ latency average: 0.125 ms
512+ latency stddev: 0.203 ms
513+ tps = 1016035.198013 (including connections establishing)
514+ tps = 1016243.580731 (excluding connections establishing)
515+ statement latencies in milliseconds:
516+ 0.001589 \setrandom id 1 1000000000
517+ 0.123249 select who_like from cv_obj where id=:id;
492518```
493519
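494- Stress-test result 2: among the users who LIKE a given article, which ones are my friends? 822,000/s.
520+ Stress-test result 2: how many times has the article been LIKEd? 1.04 million/s.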
495521
496522```
497- transaction type: Custom query
498- scaling factor: 1
499- query mode: prepared
500- number of clients: 128
501- number of threads: 128
502- duration: 120 s
503- number of transactions actually processed: 98735109
504- latency average: 0.155 ms
505- latency stddev: 2.237 ms
506- tps = 822678.853360 (including connections establishing)
507- tps = 822803.996869 (excluding connections establishing)
508- statement latencies in milliseconds:
509- 0.001786 \setrandom id 1 1000000000
510- 0.000748 \setrandom uid 1 100000000
511- 0.151807 select array_intersect(t1.who_like, t2.like_who) from (select who_like from cv_obj where id=:id) t1,(select array[like_who] as like_who from user_like_agg where uid=:uid) t2;
523+ transaction type: Custom query
524+ scaling factor: 1
525+ query mode: prepared
526+ number of clients: 128
527+ number of threads: 128
528+ duration: 120 s
529+ number of transactions actually processed: 124966713
530+ latency average: 0.122 ms
531+ latency stddev: 0.204 ms
532+ tps = 1041268.730790 (including connections establishing)
533+ tps = 1041479.852625 (excluding connections establishing)
534+ statement latencies in milliseconds:
535+ 0.001708 \setrandom id 1 1000000000
536+ 0.120069 select like_cnt from cv_obj where id=:id;
512537```
538+
539+ Stress-test result 3: among the users who LIKE a given article, which ones are my friends? 648,000/s.
540+
541+ ```
542+ transaction type: Custom query
543+ scaling factor: 1
544+ query mode: prepared
545+ number of clients: 128
546+ number of threads: 128
547+ duration: 120 s
548+ number of transactions actually processed: 77802915
549+ latency average: 0.196 ms
550+ latency stddev: 1.649 ms
551+ tps = 648273.025370 (including connections establishing)
552+ tps = 648368.477278 (excluding connections establishing)
553+ statement latencies in milliseconds:
554+ 0.001719 \setrandom id 1 1000000000
555+ 0.000695 \setrandom uid 1 100000000
556+ 0.193728 select array_intersect(t1.who_like, t2.like_who) from (select who_like from cv_obj where id=:id) t1,(select array[like_who] as like_who from user_like_agg where uid=:uid) t2;
557+ ```
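As a rough sanity check on the three read-test results, throughput in a closed loop (each client has one request in flight at a time) should be close to clients divided by average latency. The Python sketch below applies that rule of thumb to the figures reported in the new revision. The helper name `expected_tps` is hypothetical, and the estimate ignores connection setup and client-side overhead.

```python
def expected_tps(clients, avg_latency_ms):
    """Estimate closed-loop throughput: clients / average latency (in seconds)."""
    return clients / (avg_latency_ms / 1000.0)


# (test name, clients, reported average latency in ms, reported tps)
reported = [
    ("who_like", 128, 0.125, 1016035.198013),
    ("like_cnt", 128, 0.122, 1041268.730790),
    ("friends", 128, 0.196, 648273.025370),
]

for name, clients, latency_ms, tps in reported:
    est = expected_tps(clients, latency_ms)
    print(f"{name}: estimated {est:,.0f} tps, reported {tps:,.0f} tps, "
          f"ratio {tps / est:.3f}")
```

All three reported tps values land within about 1% of the estimate, so the latency and throughput numbers are consistent with each other.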
558+
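559+ ## Optimization ideas
560+ 1\. The longer the array, the more space a row occupies. Storing the array out of line as TOAST slices can effectively speed up queries on the non-array columns.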
561+
562+ ```
563+ -- For example
564+
565+ alter table cv_obj alter column who_like set storage extended;
566+ ```
567+
568+ 2\. Profile, then optimize the hot spots that profiling reveals.
513569
514570## Summary
515571The most common operations on Weibo and Facebook:
@@ -548,19 +604,23 @@ statement latencies in milliseconds:
548604
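5496051\. Follow (LIKE) a Weibo post (article)
550606
551- 179,000/s; expected to exceed 300,000 after optimization.
607+ 177,000/s; expected to reach 300,000 after optimization.
552608
553- 2\. Who LIKEd the article? How many times has the article been LIKEd?
609+ 2\. Who LIKEd the article?
554610
555- 1.043 million/s
611+ 1.016 million/s
612+
613+ 3\. How many times has the article been LIKEd?
556614
557- 3 \. Among the users who LIKE a given article, which ones are my friends?
615+ 1.041 million/s
558616
559- 822,000/s
617+ 4 \. Among the users who LIKE a given article, which ones are my friends?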
560618
561- ![ pic ] ( 20170512_02_pic_003.jpg )
619+ 648,000/s
562620
563- Machine:
621+ ![ pic] ( 20170512_02_pic_003.jpg )
622+
623+ 5\. Machine:
564624
565625(An x86 server priced at around 100,000 (10W), 12 x 8 TB SATA disks, 1 SSD used as bcache)
566626
@@ -574,4 +634,4 @@ statement latencies in milliseconds:
574634
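575635[ 《PostgreSQL on Linux Best Deployment Manual》] ( ../201611/20161121_01.md )
576636
577- [ 《Generating Poisson, Gaussian, Exponential, and Random Distribution Data - PostgreSQL pg_bench 》] ( .../201506/20150618_01.md )
637+ [ 《Generating Poisson, Gaussian, Exponential, and Random Distribution Data - PostgreSQL pg_bench 》] ( .../201506/20150618_01.md )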