digoal
diff --git a/201705/20170512_02.md b/201705/20170512_02.md
Lines changed: 136 additions & 76 deletions
diff --git a/201705/20170512_02_pic_003.jpg b/201705/20170512_02_pic_003.jpg
-657 Bytes
@@ -268,79 +268,97 @@ pipeline=# select * from cv_obj;
 
 3\. Hot article ID range
 
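-There are 200 million articles in total, and LIKEs follow a Gaussian distribution: 95% of article IDs fall within the 2.0/xx interval centered on the peak of the bell, and 67% fall within the 1.0/xx interval. The closer a value on the x-axis is to the top of the bell, the more likely it is to be generated. The smaller xx is, the sharper the bell, meaning fewer high-frequency values.
+There are 200 million articles in total, and the article IDs to LIKE are drawn from a Gaussian distribution. The article IDs in the middle 2.0/xx of the ID range, centered on the peak of the bell curve, account for about 95% of the draws; those in the middle 1.0/xx account for about 67%.
 
-Assuming there are 27,000 high-frequency articles spread across the 95% interval, XX=14900.
+The closer a value on the x-axis is to the top of the bell (that is, article ID = 100 million), the more likely it is to be generated.
+   
+The larger xx is, the sharper the bell, which means fewer hot articles.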
 
 For the underlying principle, see
 
 [Generating Poisson, Gaussian, exponential, and random distribution data - PostgreSQL pg_bench](.../201506/20150618_01.md)
-  
+   
 ![pic](../201506/20150618_01_pic_001.png)  
 
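 4\. Random users like random articles
 
 5\. Random users like hot articles
 
 ### First, generate the base data according to the requirements above
-Stress test script: LIKE articles, with 1 million hot articles generated using a Gaussian distribution
+Stress test script: LIKE articles, with article IDs generated from a Gaussian distribution. After a long stress test, the number of LIKEs per article follows a Gaussian distribution, and the articles at the peak of the bell receive the most LIKEs.
+  
+xx is set to 10.0, meaning the article IDs in the middle 20% of the ID range, centered on the peak of the bell, account for about 95% of the draws, and those in the middle 10% account for about 67%.
+  
+The larger xx is, the higher the probability of drawing the article IDs at the peak.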
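The 95% and 67% figures can be checked with a short calculation. This is a minimal sketch, assuming pgbench's documented Gaussian mapping, in which the ID range is mapped onto a standard normal distribution truncated at -xx and +xx. The function name `coverage` and the printed ID bounds are illustrative only and are not part of the original test.

```python
import math


def coverage(parameter: float, middle_fraction: float) -> float:
    """Share of draws landing in the central `middle_fraction` of the range,
    for a standard normal truncated to [-parameter, +parameter]."""
    def phi(z: float) -> float:
        # Cumulative distribution function of the standard normal.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    # The full range spans 2 * parameter, so the central slice
    # reaches parameter * middle_fraction on each side of the mean.
    half_width = parameter * middle_fraction
    inner = phi(half_width) - phi(-half_width)
    total = phi(parameter) - phi(-parameter)
    return inner / total


xx = 10.0            # Gaussian parameter used in test.sql
n_ids = 200_000_000  # article ID range 1..200000000

p_wide = coverage(xx, 2.0 / xx)    # middle 20% of the ID range
p_narrow = coverage(xx, 1.0 / xx)  # middle 10% of the ID range

center = n_ids // 2
wide_half = int(n_ids * (2.0 / xx) / 2)  # IDs on each side of the peak
print(f"middle 20%: IDs {center - wide_half}..{center + wide_half}, share {p_wide:.4f}")
print(f"middle 10%: share {p_narrow:.4f}")
```

Under this assumption the shares come out at about 95% and 68%, which agrees with the rounded 95% and 67% quoted above.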
 
 ```    
 vi test.sql    
 \setrandom uid 1 100000000    
-\setrandom id 1 200000000 gaussian 14900.0  
+\setrandom id 1 200000000 gaussian 10.0  
 select f_obj(:id,:uid);    
 ```    
 
-Stress test with 256 connections: about 179,000 LIKE requests per second.
+Stress test with 256 connections: about 177,000 LIKE requests per second.
 
 ```    
 pgbench -M prepared -n -r -P 1 -f ./test.sql -c 256 -j 256 -T 120    
     
-transaction type: Custom query  
-scaling factor: 1  
-query mode: prepared  
-number of clients: 256  
-number of threads: 256  
-duration: 120 s  
-number of transactions actually processed: 21500685  
-latency average: 1.427 ms  
-latency stddev: 1.204 ms  
-tps = 179035.949606 (including connections establishing)  
-tps = 179047.297058 (excluding connections establishing)  
-statement latencies in milliseconds:  
-        0.002314        \setrandom uid 1 100000000    
-        0.002261        \setrandom id 1 200000000 gaussian 14900.0  
-        1.422216        select f_obj(:id,:uid);    
+transaction type: Custom query
+scaling factor: 1
+query mode: prepared
+number of clients: 256
+number of threads: 256
+duration: 120 s
+number of transactions actually processed: 21331348
+latency average: 1.438 ms
+latency stddev: 0.591 ms
+tps = 177652.080934 (including connections establishing)
+tps = 177665.827969 (excluding connections establishing)
+statement latencies in milliseconds:
+        0.002267        \setrandom uid 1 100000000  
+        0.002384        \setrandom id 1 200000000 gaussian 10.0
+        1.433405        select f_obj(:id,:uid);  
 ```    
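As a sanity check on the pgbench report, the throughput and latency figures can be cross-checked against each other. This is a rough sketch using only numbers copied from the output above; the steady-state relation latency ≈ clients / tps is an approximation that assumes every client is always busy, and the variable names are illustrative.

```python
# Figures copied from the pgbench output with "gaussian 10.0".
clients = 256
duration_s = 120
transactions = 21_331_348
reported_tps = 177_652.080934

# Throughput estimated from the nominal duration. pgbench divides by the
# actually measured elapsed time, so a small difference is expected.
estimated_tps = transactions / duration_s

# With all clients busy, average latency is roughly clients / tps.
estimated_latency_ms = clients / reported_tps * 1000.0

print(f"estimated tps: {estimated_tps:.1f}")
print(f"estimated latency: {estimated_latency_ms:.3f} ms")
```

The estimated latency is roughly 1.44 ms, in line with the reported average of 1.438 ms.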
 
 Number of articles after this stage of the stress test
 
 ```    
-pipeline=# select count(*) from cv_obj;    
-  count       
-----------    
- 27612942    
-(1 row)    
+pipeline=# select count(*) from cv_obj;
+  count   
+----------
+ 86842876
+(1 row) 
     
 -- Check how many times the articles near the peak of the bell were liked
   
 pipeline=# select like_cnt from cv_obj where id=100000000;
  like_cnt 
 ----------
-    15060
+    18317
 (1 row)
 
 pipeline=# select like_cnt from cv_obj where id=100000001;
  like_cnt 
 ----------
-    14927
+    18410
 (1 row)
 
 pipeline=# select like_cnt from cv_obj where id=100000002;
  like_cnt 
 ----------
-    15156
+    18566
+(1 row)
+
+pipeline=# select like_cnt from cv_obj where id=100000000-1;
+ like_cnt 
+----------
+    18380
+(1 row)
+
+pipeline=# select like_cnt from cv_obj where id=100000000-2;
+ like_cnt 
+----------
+    18399
 (1 row)
   
 Articles at the bottom edges of the bell are rarely liked
@@ -353,7 +371,7 @@ pipeline=# select * from cv_obj where id>199999990;
 
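 This matches expectations, so continue the stress test. (Alternatively, we could test with an exponential distribution.)
 
-No optimization has been done yet; CPU usage is as follows
+With no optimization done yet, CPU usage is as follows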
 
 ```    
 Cpu(s): 35.2%us, 17.4%sy, 13.8%ni, 33.2%id,  0.3%wa,  0.0%hi,  0.1%si,  0.0%st    
@@ -367,8 +385,6 @@ Cpu(s): 35.2%us, 17.4%sy, 13.8%ni, 33.2%id,  0.3%wa,  0.0%hi,  0.1%si,  0.0%st
 
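 Keep stress testing LIKE until LIKE data for 200 million articles has been generated, then move on to test 2.
 
-Alternatively, randomly generate 200 million LIKE records following the LIKE-count distribution described in the scenario. Relationship data also needs to be generated randomly, following the follow distribution described in the scenario.
-    
 ### Generate user relationship data
 1\. User ID range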
 
@@ -451,65 +467,105 @@ pipeline=# select count(*) from user_like_agg ;
 
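 3\. Among the users who liked a given article, which are my friends?
 
-Stress test script 1: who liked an article, and how many times was it liked?
+Stress test script 1: who liked an article?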
 
 ```    
 vi test1.sql    
 \setrandom id 1 200000000    
-select who_like,like_cnt from cv_obj where id=:id;    
+select who_like from cv_obj where id=:id;    
     
 pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 128 -j 128 -T 120    
 ```    
 
-Stress test script 2: among the users who liked a given article, which are my friends?
+Stress test script 2: how many times was an article liked?
+  
+```
+vi test2.sql    
+\setrandom id 1 200000000    
+select like_cnt from cv_obj where id=:id;    
+    
+pgbench -M prepared -n -r -P 1 -f ./test2.sql -c 128 -j 128 -T 120    
+```
+  
+Stress test script 3: among the users who liked a given article, which are my friends?
 
 ```    
-vi test2.sql    
+vi test3.sql    
 \setrandom id 1 200000000    
 \setrandom uid 1 100000000    
 select array_intersect(t1.who_like, t2.like_who) from (select who_like from cv_obj where id=:id) t1,(select array[like_who] as like_who from user_like_agg where uid=:uid) t2;    
     
-pgbench -M prepared -n -r -P 1 -f ./test2.sql -c 128 -j 128 -T 120    
+pgbench -M prepared -n -r -P 1 -f ./test3.sql -c 128 -j 128 -T 120    
 ```    
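The friends-who-liked query boils down to intersecting two ID lists. The sketch below mimics that step in Python, assuming `array_intersect` returns the elements common to both arrays. The sample user IDs are made up for illustration and do not come from the test data.

```python
def array_intersect(who_like: list[int], like_who: list[int]) -> list[int]:
    """Return the user IDs present in both lists, in the order of who_like."""
    friends = set(like_who)  # set lookup keeps the scan linear
    return [uid for uid in who_like if uid in friends]


# Hypothetical sample data:
who_like = [101, 205, 333, 478]  # users who liked the article (cv_obj.who_like)
like_who = [205, 478, 900]       # users that "I" follow (user_like_agg.like_who)

mutual = array_intersect(who_like, like_who)
print(mutual)
```

The cost grows with the length of both arrays, which is one plausible reason this query is slower than the two primary-key lookups that read a single column.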
 
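-Stress test result 1: a primary-key lookup by object ID reaching 1.04 million/s is no surprise.
+Stress test result 1: who liked an article? Reaching 1.01 million/s is no surprise.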
 
 ```    
-transaction type: Custom query    
-scaling factor: 1    
-query mode: prepared    
-number of clients: 128    
-number of threads: 128    
-duration: 120 s    
-number of transactions actually processed: 125251141    
-latency average: 0.122 ms    
-latency stddev: 0.210 ms    
-tps = 1043643.576926 (including connections establishing)    
-tps = 1043716.991815 (excluding connections establishing)    
-statement latencies in milliseconds:    
-        0.001711        \setrandom id 1 1000000000    
-        0.119755        select who_like,like_cnt from cv_obj where id=:id;    
+transaction type: Custom query
+scaling factor: 1
+query mode: prepared
+number of clients: 128
+number of threads: 128
+duration: 120 s
+number of transactions actually processed: 121935264
+latency average: 0.125 ms
+latency stddev: 0.203 ms
+tps = 1016035.198013 (including connections establishing)
+tps = 1016243.580731 (excluding connections establishing)
+statement latencies in milliseconds:
+        0.001589        \setrandom id 1 1000000000
+        0.123249        select who_like  from cv_obj where id=:id;
 ```    
 
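-Stress test result 2: among the users who liked a given article, which are my friends? 822,000/s.
+Stress test result 2: how many times was an article liked? 1.04 million/s.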
 
 ```    
-transaction type: Custom query    
-scaling factor: 1    
-query mode: prepared    
-number of clients: 128    
-number of threads: 128    
-duration: 120 s    
-number of transactions actually processed: 98735109    
-latency average: 0.155 ms    
-latency stddev: 2.237 ms    
-tps = 822678.853360 (including connections establishing)    
-tps = 822803.996869 (excluding connections establishing)    
-statement latencies in milliseconds:    
-        0.001786        \setrandom id 1 1000000000    
-        0.000748        \setrandom uid 1 100000000    
-        0.151807        select array_intersect(t1.who_like, t2.like_who) from (select who_like from cv_obj where id=:id) t1,(select array[like_who] as like_who from user_like_agg where uid=:uid) t2;    
+transaction type: Custom query
+scaling factor: 1
+query mode: prepared
+number of clients: 128
+number of threads: 128
+duration: 120 s
+number of transactions actually processed: 124966713
+latency average: 0.122 ms
+latency stddev: 0.204 ms
+tps = 1041268.730790 (including connections establishing)
+tps = 1041479.852625 (excluding connections establishing)
+statement latencies in milliseconds:
+        0.001708        \setrandom id 1 1000000000
+        0.120069        select like_cnt from cv_obj where id=:id;
 ```    
+  
+Stress test result 3: among the users who liked a given article, which are my friends? 648,000/s.
+    
+```    
+transaction type: Custom query
+scaling factor: 1
+query mode: prepared
+number of clients: 128
+number of threads: 128
+duration: 120 s
+number of transactions actually processed: 77802915
+latency average: 0.196 ms
+latency stddev: 1.649 ms
+tps = 648273.025370 (including connections establishing)
+tps = 648368.477278 (excluding connections establishing)
+statement latencies in milliseconds:
+        0.001719        \setrandom id 1 1000000000
+        0.000695        \setrandom uid 1 100000000
+        0.193728        select array_intersect(t1.who_like, t2.like_who) from (select who_like from cv_obj where id=:id) t1,(select array[like_who] as like_who from user_like_agg where uid=:uid) t2;
+```    
+    
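+## Optimization ideas
+1\. The longer the array, the more space a single row takes. Storing the array out of line in TOAST slices can effectively speed up queries on the non-array columns.
+  
+```
+-- For example
+
+alter table cv_obj alter column who_like set storage extended;
+```
+  
+2\. Profile, then optimize the specific bottlenecks.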
 
 ## Summary
 The most common operations on Weibo and Facebook:
@@ -548,19 +604,23 @@ statement latencies in milliseconds:
 
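 1\. Follow a Weibo post (article)
 
-179,000/s, expected to exceed 300,000/s after optimization.
+177,000/s, expected to reach 300,000/s after optimization.
 
-2\. Who liked an article, and how many times was it liked?
+2\. Who liked an article?
 
-1,043,000/s
+1,016,000/s
+  
+3\. How many times was an article liked?
 
-3\. Among the users who liked a given article, which are my friends?
+1,041,000/s
 
-822,000/s
+4\. Among the users who liked a given article, which are my friends?
 
-![pic](20170512_02_pic_003.jpg)
+648,000/s
 
-Machine:
+![pic](20170512_02_pic_003.jpg)
+      
+5\. Machine:
 
 (x86 server in the roughly 100,000 price range, 12 x 8 TB SATA disks, 1 SSD used as BCACHE)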
 
@@ -574,4 +634,4 @@ statement latencies in milliseconds:
 
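 [PostgreSQL on Linux Best Deployment Manual](../201611/20161121_01.md)
 
-[Generating Poisson, Gaussian, exponential, and random distribution data - PostgreSQL pg_bench](.../201506/20150618_01.md)     
+[Generating Poisson, Gaussian, exponential, and random distribution data - PostgreSQL pg_bench](.../201506/20150618_01.md)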