Commit 76887da

fix
1 parent f5bef5f commit 76887da

2 files changed: +136 -76 lines changed

201705/20170512_02.md

Lines changed: 136 additions & 76 deletions
Original file line number | Diff line number | Diff line change
@@ -268,79 +268,97 @@ pipeline=# select * from cv_obj;
268268

269269
3\. Hot article ID range
270270

271-
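There are 200 million articles in total, and LIKEs are generated with a Gaussian distribution: 95% of the article IDs fall within the 2.0/xx interval centered on the peak of the bell, and 67% fall within the 1.0/xx interval. The closer a value on the horizontal axis is to the top of the bell, the more likely it is to be generated. The smaller xx is, the sharper the bell, meaning there are fewer high-frequency values.
271+
There are 200 million articles in total, and LIKEs are generated with a Gaussian distribution. The article IDs within the 2.0/xx interval centered on the peak of the bell account for 95% of the generated values. The article IDs within the 1.0/xx interval account for 67% of the generated values.
272272

273-
Assuming there are 27,000 high-frequency articles spread across the 95% interval, XX=14900.
273+
The closer a value on the horizontal axis is to the top of the bell (that is, article ID = 100 million), the more likely it is to be generated.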
274+
275+
The larger xx is, the sharper the bell, meaning there are fewer hot articles.
274276

275277
For the underlying principle, see
276278

277279
[Generating Poisson, Gaussian, Exponential, and Random Distribution Data - PostgreSQL pg_bench](.../201506/20150618_01.md)
278-
280+
279281
![pic](../201506/20150618_01_pic_001.png)
280282

281283
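4\. Random users like random articles
282284

283285
5\. Random users like hot articles
284286

285287
### First, generate the base data according to the requirements above
286-
Stress-test script: LIKE articles, 1 million hot articles, generated with a Gaussian distribution
288+
Stress-test script: LIKE articles, with article IDs generated from a Gaussian distribution. After a long stress-test run, the number of LIKEs per article follows a Gaussian distribution, and the articles at the peak of the bell receive the most LIKEs.
289+
290+
xx is set to 10.0, which means the article IDs in the middle 20% of the range, centered on the peak of the bell, account for 95% of the generated values. The article IDs in the middle 10% of the range account for 67% of the generated values.
291+
292+
The larger xx is, the higher the probability of the article IDs at the peak of the bell.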
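
The 67% and 95% figures above can be checked numerically. The sketch below is not part of the commit; it assumes the truncated-normal model described in the pgbench documentation for the gaussian option, where the ID range is mapped onto a standard normal cut off at -xx and +xx. The function name `middle_coverage` is made up for illustration.

```python
import math

def middle_coverage(width_fraction, parameter):
    """Probability mass that a standard normal truncated at +/-parameter
    places in the middle `width_fraction` of the ID range (assumed model)."""
    # The two ends of the range sit at z = -parameter and z = +parameter,
    # so the middle `width_fraction` of the range ends at z = parameter * width_fraction.
    z = parameter * width_fraction
    # For a standard normal, P(|Z| <= z) = erf(z / sqrt(2)); dividing by the same
    # quantity at z = parameter renormalizes for the truncation.
    return math.erf(z / math.sqrt(2)) / math.erf(parameter / math.sqrt(2))

xx = 10.0
total_ids = 200_000_000
for z_width in (1.0, 2.0):
    fraction = z_width / xx
    print(f"middle {fraction:.0%} of the range (~{int(total_ids * fraction):,} IDs): "
          f"{middle_coverage(fraction, xx):.1%} of generated values")
```

With xx = 10.0 this gives roughly 68% for the middle 10% of the range and roughly 95% for the middle 20%, in line with the rounded figures quoted above.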
287293

288294
```
289295
vi test.sql
290296
\setrandom uid 1 100000000
291-
\setrandom id 1 200000000 gaussian 14900.0
297+
\setrandom id 1 200000000 gaussian 10.0
292298
select f_obj(:id,:uid);
293299
```
294300

295-
Stress test with 256 connections. Result: 179,000 LIKE requests per second
301+
Stress test with 256 connections. Result: 177,000 LIKE requests per second
296302

297303
```
298304
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 256 -j 256 -T 120
299305
300-
transaction type: Custom query
301-
scaling factor: 1
302-
query mode: prepared
303-
number of clients: 256
304-
number of threads: 256
305-
duration: 120 s
306-
number of transactions actually processed: 21500685
307-
latency average: 1.427 ms
308-
latency stddev: 1.204 ms
309-
tps = 179035.949606 (including connections establishing)
310-
tps = 179047.297058 (excluding connections establishing)
311-
statement latencies in milliseconds:
312-
0.002314 \setrandom uid 1 100000000
313-
0.002261 \setrandom id 1 200000000 gaussian 14900.0
314-
1.422216 select f_obj(:id,:uid);
306+
transaction type: Custom query
307+
scaling factor: 1
308+
query mode: prepared
309+
number of clients: 256
310+
number of threads: 256
311+
duration: 120 s
312+
number of transactions actually processed: 21331348
313+
latency average: 1.438 ms
314+
latency stddev: 0.591 ms
315+
tps = 177652.080934 (including connections establishing)
316+
tps = 177665.827969 (excluding connections establishing)
317+
statement latencies in milliseconds:
318+
0.002267 \setrandom uid 1 100000000
319+
0.002384 \setrandom id 1 200000000 gaussian 10.0
320+
1.433405 select f_obj(:id,:uid);
315321
```
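
As a sanity check on the new figures, throughput and average latency can be estimated from the transaction count, the nominal duration, and the client count. This is a rough sketch, not part of the commit: it assumes every client is busy for the whole run (Little's law), and the helper name `throughput_check` is hypothetical.

```python
def throughput_check(transactions, duration_s, clients):
    """Estimate tps and average latency (ms) from pgbench summary numbers."""
    tps = transactions / duration_s
    # With all clients continuously busy, latency is roughly clients / tps.
    latency_ms = clients / tps * 1000.0
    return tps, latency_ms

tps, latency_ms = throughput_check(21_331_348, 120, 256)
print(f"estimated tps: {tps:.0f}, estimated latency: {latency_ms:.3f} ms")
```

The estimates land close to the reported 177652 tps and 1.438 ms. The small gap is presumably because the measured run lasted slightly longer than the nominal 120 s.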
316322

317323
Number of articles after this stage of stress testing
318324

319325
```
320-
pipeline=# select count(*) from cv_obj;
321-
count
322-
----------
323-
27612942
324-
(1 row)
326+
pipeline=# select count(*) from cv_obj;
327+
count
328+
----------
329+
86842876
330+
(1 row)
325331
326332
-- Query the number of LIKEs received by articles near the peak of the bell
327333
328334
pipeline=# select like_cnt from cv_obj where id=100000000;
329335
like_cnt
330336
----------
331-
15060
337+
18317
332338
(1 row)
333339
334340
pipeline=# select like_cnt from cv_obj where id=100000001;
335341
like_cnt
336342
----------
337-
14927
343+
18410
338344
(1 row)
339345
340346
pipeline=# select like_cnt from cv_obj where id=100000002;
341347
like_cnt
342348
----------
343-
15156
349+
18566
350+
(1 row)
351+
352+
pipeline=# select like_cnt from cv_obj where id=100000000-1;
353+
like_cnt
354+
----------
355+
18380
356+
(1 row)
357+
358+
pipeline=# select like_cnt from cv_obj where id=100000000-2;
359+
like_cnt
360+
----------
361+
18399
344362
(1 row)
345363
346364
Articles at the bottom edges of the bell receive very few LIKEs
@@ -353,7 +371,7 @@ pipeline=# select * from cv_obj where id>199999990;
353371

354372
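This matches expectations, so the stress test continues. (Alternatively, we could test with an exponential distribution.)
355373

356-
No optimization has been done yet; CPU usage is as follows
374+
With no optimization done yet, CPU usage is as follows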
357375

358376
```
359377
Cpu(s): 35.2%us, 17.4%sy, 13.8%ni, 33.2%id, 0.3%wa, 0.0%hi, 0.1%si, 0.0%st
@@ -367,8 +385,6 @@ Cpu(s): 35.2%us, 17.4%sy, 13.8%ni, 33.2%id, 0.3%wa, 0.0%hi, 0.1%si, 0.0%st
367385

368386
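Keep stress-testing LIKE until LIKE data for 200 million articles has been generated, then move on to test 2.
369387

370-
Alternatively, randomly generate 200 million LIKE records following the LIKE-count distribution described in the scenario. Relationship data also needs to be generated randomly, following the follow distribution described in the scenario.
371-
372388
### Generate user relationship data
373389
1\. User ID range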
374390

@@ -451,65 +467,105 @@ pipeline=# select count(*) from user_like_agg ;
451467

452468
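3\. Among the users who LIKEd a given article, which ones are my friends?
453469

454-
Stress-test script 1: who LIKEd the article? How many times has the article been LIKEd?
470+
Stress-test script 1: who LIKEd the article?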
455471

456472
```
457473
vi test1.sql
458474
\setrandom id 1 200000000
459-
select who_like,like_cnt from cv_obj where id=:id;
475+
select who_like from cv_obj where id=:id;
460476
461477
pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 128 -j 128 -T 120
462478
```
463479

464-
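Stress-test script 2: among the users who LIKEd a given article, which ones are my friends?
480+
Stress-test script 2: how many times has the article been LIKEd?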
481+
482+
```
483+
vi test2.sql
484+
\setrandom id 1 200000000
485+
select like_cnt from cv_obj where id=:id;
486+
487+
pgbench -M prepared -n -r -P 1 -f ./test2.sql -c 128 -j 128 -T 120
488+
```
489+
490+
Stress-test script 3: among the users who LIKEd a given article, which ones are my friends?
465491

466492
```
467-
vi test2.sql
493+
vi test3.sql
468494
\setrandom id 1 200000000
469495
\setrandom uid 1 100000000
470496
select array_intersect(t1.who_like, t2.like_who) from (select who_like from cv_obj where id=:id) t1,(select array[like_who] as like_who from user_like_agg where uid=:uid) t2;
471497
472-
pgbench -M prepared -n -r -P 1 -f ./test2.sql -c 128 -j 128 -T 120
498+
pgbench -M prepared -n -r -P 1 -f ./test3.sql -c 128 -j 128 -T 120
473499
```
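
The friend query above is a set intersection between two arrays: the users who LIKEd the article (`who_like`) and the users I follow (`like_who`). The definition of `array_intersect` is not shown in this diff, so the sketch below only illustrates the intended semantics in Python. The function name and the choice to preserve order and drop duplicates are assumptions.

```python
def friends_who_liked(who_like, like_who):
    """Return user IDs present in both lists, in who_like order, without duplicates."""
    friends = set(like_who)  # O(1) membership tests
    seen = set()
    result = []
    for uid in who_like:
        if uid in friends and uid not in seen:
            seen.add(uid)
            result.append(uid)
    return result

print(friends_who_liked([5, 3, 9, 3], [3, 5, 7]))
```

Building a set from one side keeps the work linear in the combined array length, which matters as `who_like` grows for hot articles.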
474500

475-
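Stress-test result 1: a primary-key lookup by object ID. Reaching 1.04 million/s is no surprise.
501+
Stress-test result 1: who LIKEd the article? Reaching 1.01 million/s is no surprise.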
476502

477503
```
478-
transaction type: Custom query
479-
scaling factor: 1
480-
query mode: prepared
481-
number of clients: 128
482-
number of threads: 128
483-
duration: 120 s
484-
number of transactions actually processed: 125251141
485-
latency average: 0.122 ms
486-
latency stddev: 0.210 ms
487-
tps = 1043643.576926 (including connections establishing)
488-
tps = 1043716.991815 (excluding connections establishing)
489-
statement latencies in milliseconds:
490-
0.001711 \setrandom id 1 1000000000
491-
0.119755 select who_like,like_cnt from cv_obj where id=:id;
504+
transaction type: Custom query
505+
scaling factor: 1
506+
query mode: prepared
507+
number of clients: 128
508+
number of threads: 128
509+
duration: 120 s
510+
number of transactions actually processed: 121935264
511+
latency average: 0.125 ms
512+
latency stddev: 0.203 ms
513+
tps = 1016035.198013 (including connections establishing)
514+
tps = 1016243.580731 (excluding connections establishing)
515+
statement latencies in milliseconds:
516+
0.001589 \setrandom id 1 1000000000
517+
0.123249 select who_like from cv_obj where id=:id;
492518
```
493519

494-
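Stress-test result 2: among the users who LIKEd a given article, which ones are my friends? 822,000/s.
520+
Stress-test result 2: how many times has the article been LIKEd? 1.04 million/s.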
495521

496522
```
497-
transaction type: Custom query
498-
scaling factor: 1
499-
query mode: prepared
500-
number of clients: 128
501-
number of threads: 128
502-
duration: 120 s
503-
number of transactions actually processed: 98735109
504-
latency average: 0.155 ms
505-
latency stddev: 2.237 ms
506-
tps = 822678.853360 (including connections establishing)
507-
tps = 822803.996869 (excluding connections establishing)
508-
statement latencies in milliseconds:
509-
0.001786 \setrandom id 1 1000000000
510-
0.000748 \setrandom uid 1 100000000
511-
0.151807 select array_intersect(t1.who_like, t2.like_who) from (select who_like from cv_obj where id=:id) t1,(select array[like_who] as like_who from user_like_agg where uid=:uid) t2;
523+
transaction type: Custom query
524+
scaling factor: 1
525+
query mode: prepared
526+
number of clients: 128
527+
number of threads: 128
528+
duration: 120 s
529+
number of transactions actually processed: 124966713
530+
latency average: 0.122 ms
531+
latency stddev: 0.204 ms
532+
tps = 1041268.730790 (including connections establishing)
533+
tps = 1041479.852625 (excluding connections establishing)
534+
statement latencies in milliseconds:
535+
0.001708 \setrandom id 1 1000000000
536+
0.120069 select like_cnt from cv_obj where id=:id;
512537
```
538+
539+
Stress-test result 3: among the users who LIKEd a given article, which ones are my friends? 648,000/s.
540+
541+
```
542+
transaction type: Custom query
543+
scaling factor: 1
544+
query mode: prepared
545+
number of clients: 128
546+
number of threads: 128
547+
duration: 120 s
548+
number of transactions actually processed: 77802915
549+
latency average: 0.196 ms
550+
latency stddev: 1.649 ms
551+
tps = 648273.025370 (including connections establishing)
552+
tps = 648368.477278 (excluding connections establishing)
553+
statement latencies in milliseconds:
554+
0.001719 \setrandom id 1 1000000000
555+
0.000695 \setrandom uid 1 100000000
556+
0.193728 select array_intersect(t1.who_like, t2.like_who) from (select who_like from cv_obj where id=:id) t1,(select array[like_who] as like_who from user_like_agg where uid=:uid) t2;
557+
```
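
To compare runs such as the old and new friend-query results, the tps lines can be pulled out of the pgbench output shown above. This is a minimal sketch, not part of the commit: it assumes the output uses the exact `tps = ... (including|excluding connections establishing)` lines seen here, and the helper names are hypothetical.

```python
import re

TPS_LINE = re.compile(
    r"^tps = ([0-9.]+) \((including|excluding) connections establishing\)$",
    re.MULTILINE,
)

def parse_tps(output):
    """Map 'including' / 'excluding' to the tps value found in pgbench output."""
    return {kind: float(value) for value, kind in TPS_LINE.findall(output)}

def relative_change(old, new):
    """Fractional change from old to new (negative means slower)."""
    return (new - old) / old

old_run = """\
tps = 822678.853360 (including connections establishing)
tps = 822803.996869 (excluding connections establishing)
"""
new_run = """\
tps = 648273.025370 (including connections establishing)
tps = 648368.477278 (excluding connections establishing)
"""

change = relative_change(parse_tps(old_run)["including"], parse_tps(new_run)["including"])
print(f"friend-query throughput change: {change:.1%}")
```

The two runs were taken against different data volumes, so the change is a descriptive figure rather than evidence of a regression.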
558+
559+
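## Optimization ideas
560+
1\. The longer the array, the more space a row occupies. Storing the array out of line in TOAST slices can effectively speed up queries on the non-array columns.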
561+
562+
```
563+
-- For example
564+
565+
alter table cv_obj alter column who_like set storage extended;
566+
```
567+
568+
2\. Profile, then optimize the hotspots it reveals.
513569

514570
## Summary
515571
The most common operations on Weibo and Facebook:
@@ -548,19 +604,23 @@ statement latencies in milliseconds:
548604

549605
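1\. Follow a Weibo post (article)
550606

551-
179,000/s; expected to exceed 300,000 after optimization
607+
177,000/s; expected to reach 300,000 after optimization
552608

553-
2\. Who LIKEd the article? How many times has the article been LIKEd?
609+
2\. Who LIKEd the article?
554610

555-
1.043 million/s
611+
1.016 million/s
612+
613+
3\. How many times has the article been LIKEd?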
556614

557-
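3\. Among the users who LIKEd a given article, which ones are my friends?
615+
1.041 million/s
558616

559-
822,000/s
617+
4\. Among the users who LIKEd a given article, which ones are my friends?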
560618

561-
![pic](20170512_02_pic_003.jpg)
619+
648,000/s
562620

563-
Machine:
621+
![pic](20170512_02_pic_003.jpg)
622+
623+
5\. Machine:
564624

565625
(An x86 server in the ~100,000 price range, with 12 × 8 TB SATA disks and 1 SSD used as BCACHE)
566626

@@ -574,4 +634,4 @@ statement latencies in milliseconds:
574634

575635
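[PostgreSQL on Linux Best-Practice Deployment Guide](../201611/20161121_01.md)
576636

577-
[Generating Poisson, Gaussian, Exponential, and Random Distribution Data - PostgreSQL pg_bench](.../201506/20150618_01.md)
637+
[Generating Poisson, Gaussian, Exponential, and Random Distribution Data - PostgreSQL pg_bench](.../201506/20150618_01.md)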

201705/20170512_02_pic_003.jpg

-657 Bytes