digoal
diff --git a/‎201210/20121024_01.md‎
Lines changed: 689 additions & 0 deletions b/‎201210/20121024_01.md‎
Lines changed: 689 additions & 0 deletions
diff --git a/‎201210/readme.md‎
Lines changed: 1 addition & 0 deletions b/‎201210/readme.md‎
Lines changed: 1 addition & 0 deletions
diff --git a/‎201301/20130122_01.md‎
Lines changed: 342 additions & 0 deletions b/‎201301/20130122_01.md‎
Lines changed: 342 additions & 0 deletions
diff --git a/‎201301/readme.md‎
Lines changed: 1 addition & 0 deletions b/‎201301/readme.md‎
Lines changed: 1 addition & 0 deletions
diff --git a/‎201702/20170215_03.md‎
Lines changed: 11 additions & 1 deletion b/‎201702/20170215_03.md‎
Lines changed: 11 additions & 1 deletion
diff --git a/‎201702/20170215_03_pdf_001.pdf‎
369 KB b/‎201702/20170215_03_pdf_001.pdf‎
369 KB
@@ -1,4 +1,5 @@
 ### 文章列表  
 ----  
+##### 20121024_01.md   [《PostgreSQL 优化CASE - 无序UUID性能问题诊断》](20121024_01.md)  
 ##### 20121022_01.md   [《use pg_stat_plans in PostgreSQL 9.0, 9.1 and 9.2》](20121022_01.md)  
 ##### 20121014_01.md   [《How many xlogs | WAL can be generated by PostgreSQL?》](20121014_01.md)  
@@ -0,0 +1,342 @@
+## PostgreSQL sharding有序UUID最佳实践 - serial global uuid stored in 64bit int8  
+                                                                                    
+### 作者                                                                                                                                 
+digoal                                                                               
+                                                                                      
+### 日期                                                                                 
+2013-01-22                                                                                
+                                                                                  
+### 标签                                                                               
+PostgreSQL , uuid , 有序uuid , sharding    
+                                                                                    
+----                                                                              
+                                                                                       
+## 背景        
+Instagram 使用PostgreSQL数据库, 2012year中国PostgreSQL用户大会的时候他们来做过一次交流。  
+  
+现在Instagram的月度活跃用户数已经超过9000万，每天照片上传量超4000万。  
+  
+sharding, 一个非常关键的算法是如何产生所有节点全局唯一的ID。  
+  
+Instagram 使用int8来存储这个唯一ID. 把64个bit位拆成3个部分, 如下 :   
+  
+1\. 最高的41个bit位, 存储从某时间点开始经过的毫秒数. (区别于UNIX epoch, 自'1970-01-01 00:00:00' 以来的秒数)  
+  
+2\. 接下来的13个bit位, 存储shard ID.  
+  
+3\. 最后10个bit位, 存储序列值.  
+  
+例如 :   
+  
+1\. 指定'2010-01-01 00:00:00' 为这个起点, 这41个bit存储的是从这个起点开始历经的毫秒数.   
+  
+41个bit位无符号的情况下可以存储2^41=2199023255552个数字, 也就是约69.7year的数据.  
+  
+```  
+postgres=# select (2^41)/1000/60/60/24/365.0;  
+     ?column?       
+------------------  
+ 69.7305700010147  
+(1 row)  
+```  
+  
+如果把起始值设置为'2012-01-01'的话, 69.7year后也就是 '2081-09-01' 后这个算法将会有问题. 因为数值将大于41个bit位.  
+  
+2\. shard ID用了13个bit位, 所以可以存储8192个shard节点的信息.  
+  
+```  
+postgres=# select 2^13;  
+ ?column?   
+----------  
+     8192  
+(1 row)  
+```  
+  
+如果每个shard节点用到1个主机, 使用这个算法的集群最大可以扩到8192个主机.  
+  
+3\. 序列值占用10个bit位, 可以存储1024个值.  
+  
+```  
+postgres=# select 2^10;  
+ ?column?   
+----------  
+     1024  
+(1 row)  
+```  
+  
+因此可以这么来理解. 在1毫秒内, 每个shard节点, 允许产生1024个唯一值. 1秒产生102.4万个唯一值.  
+  
+整个集群1秒允许产生102.4*8192 = 83.88608亿个唯一值.  
+  
+```  
+postgres=# select 1024*1000;  
+ ?column?   
+----------  
+  1024000  
+(1 row)  
+```  
+  
+前段时间测试过2.0GHz 至强 8核的主机每秒约生成11万个序列值. 所以102.4万个唯一值这个宽度对于一台shard节点来说应该是没有问题的.  
+  
+```  
+PostgreSQL 的序列性能测试 :   
+测试机 :   
+CentOS 5.7 x64  
+PostgreSQL 9.2.1  
+DELL R610  
+CPU 2 * Intel(R) Xeon(R) CPU E5504  @ 2.00GHz  
+  
+1. 测试不开启cache的情况下取序列的速度 :   
+创建序列 :   
+ocz@db-172-16-3-150-> psql   
+psql (9.2.1)  
+Type "help" for help.  
+digoal=> create sequence seq_test;  
+CREATE SEQUENCE  
+查看当前序列ID :   
+digoal=> select * from seq_test ;  
+-[ RECORD 1 ]-+--------------------  
+sequence_name | seq_test  
+last_value    | 1  
+start_value   | 1  
+increment_by  | 1  
+max_value     | 9223372036854775807  
+min_value     | 1  
+cache_value   | 1  
+log_cnt       | 0  
+is_cycled     | f  
+is_called     | f  
+  
+pgbench测试脚本 :   
+ocz@db-172-16-3-150-> cat t.sql  
+select nextval('seq_test');  
+  
+测试结果 :   
+ocz@db-172-16-3-150-> pgbench -M prepared -n -r -f ./t.sql -c 16 -j 4 -T 30 -U digoal digoal  
+transaction type: Custom query  
+scaling factor: 1  
+query mode: prepared  
+number of clients: 16  
+number of threads: 4  
+duration: 30 s  
+number of transactions actually processed: 3085448  
+tps = 102832.533289 (including connections establishing)  
+tps = 102891.321540 (excluding connections establishing)  
+statement latencies in milliseconds:  
+        0.153352        select nextval('seq_test');  
+由此看出不启用cache的情况下每秒可取102891个序列值.  
+  
+2. 测试开启cache的情况下取序列的速度 :   
+digoal=> alter sequence seq_test restart with 1;  
+ALTER SEQUENCE  
+digoal=> alter sequence seq_test cache 100;  
+ALTER SEQUENCE  
+ocz@db-172-16-3-150-> pgbench -M prepared -n -r -f ./t.sql -c 16 -j 4 -T 30 -U digoal digoal  
+transaction type: Custom query  
+scaling factor: 1  
+query mode: prepared  
+number of clients: 16  
+number of threads: 4  
+duration: 30 s  
+number of transactions actually processed: 3359853  
+tps = 111975.743127 (including connections establishing)  
+tps = 112049.881187 (excluding connections establishing)  
+statement latencies in milliseconds:  
+        0.140799        select nextval('seq_test');  
+ocz@db-172-16-3-150-> psql digoal digoal  
+psql (9.2.1)  
+Type "help" for help.  
+digoal=> \x  
+Expanded display is on.  
+digoal=> select * from seq_test ;  
+-[ RECORD 1 ]-+--------------------  
+sequence_name | seq_test  
+last_value    | 3360400  
+start_value   | 1  
+increment_by  | 1  
+max_value     | 9223372036854775807  
+min_value     | 1  
+cache_value   | 100  
+log_cnt       | 32  
+is_cycled     | f  
+is_called     | t  
+获取速度为112049每秒. 略有提高. 但是如果非长连接的话, 将造成巨大的浪费. 如下 :   
+digoal=> alter sequence seq_test restart with 1;  
+ALTER SEQUENCE  
+  
+调整pgbench参数 :   
+  -C           establish new connection for each transaction  
+  
+测试结果 :   
+ocz@db-172-16-3-150-> pgbench -M simple -C -n -r -f ./t.sql -c 16 -j 4 -T 30 -U digoal digoal  
+transaction type: Custom query  
+scaling factor: 1  
+query mode: simple  
+number of clients: 16  
+number of threads: 4  
+duration: 30 s  
+number of transactions actually processed: 25865  
+tps = 861.986914 (including connections establishing)  
+tps = 59712.243088 (excluding connections establishing)  
+statement latencies in milliseconds:  
+        13.960707       select nextval('seq_test');  
+查看最后的序列值 :   
+ocz@db-172-16-3-150-> psql digoal digoal  
+psql (9.2.1)  
+Type "help" for help.  
+digoal=> \x  
+Expanded display is on.  
+digoal=> select * from seq_test ;  
+-[ RECORD 1 ]-+--------------------  
+sequence_name | seq_test  
+last_value    | 2588100  
+start_value   | 1  
+increment_by  | 1  
+max_value     | 9223372036854775807  
+min_value     | 1  
+cache_value   | 100  
+log_cnt       | 32  
+is_cycled     | f  
+is_called     | t  
+实际处理事务为25865个,  但是序列值已经增长到了2588100, 100倍的浪费.  
+测试结果仅供参考.  
+```  
+  
+例子 :   
+  
+假设起点为'2012-01-01', 转换成unix epoch再转换成毫秒后: 1325376000000  
+  
+```  
+postgres=# select EXTRACT(EPOCH FROM '2012-01-01 00:00:00'::timestamp) * 1000;  
+   ?column?      
+---------------  
+ 1325376000000  
+(1 row)  
+```  
+  
+逻辑的shard可以使用schema来区分, 当然也可以使用database来区分. 本例使用schema来区分.  
+  
+需要为每个shard创建生成全局唯一ID的函数 :   
+  
+以 shard_id = 5 这个shard节点为例, 起始epoch = 1325376000000.  
+  
+函数如下 :   
+  
+```  
+CREATE OR REPLACE FUNCTION insta5.next_id(OUT result bigint) AS $$  
+DECLARE  
+    our_epoch bigint := 1325376000000;  
+    seq_id bigint;  
+    now_millis bigint;  
+    shard_id int := 5;  
+BEGIN  
+    SELECT nextval('insta5.table_id_seq') % 1024 INTO seq_id;  
+    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;  
+    result := (now_millis - our_epoch) << 23;  
+    result := result | (shard_id << 10);  
+    result := result | (seq_id);  
+END;  
+$$ LANGUAGE PLPGSQL;  
+```  
+  
+shard 5 中对应的表结构, id为全局唯一主键, 默认值来自上面的函数产生的值.  
+  
+```  
+CREATE TABLE insta5.our_table (  
+    "id" bigint NOT NULL DEFAULT insta5.next_id(),  
+    ...rest of table schema...  
+)  
+```  
+  
+## 小结  
+1\. 因为int8是带符号整型, 如果第一个BIT=1, 得出负数.   
+  
+因此前34.87year这个函数产生的是正数, 后34.87year这个函数产生的是负数.  
+  
+```  
+postgres=# select 2^40/1000/60/60/24/365;  
+     ?column?       
+------------------  
+ 34.8652850005074  
+(1 row)  
+```  
+  
+例如 :   
+  
+正值 :   
+  
+```  
+postgres=# select date '2012-01-01'+ interval '34.8652850005074 year'  
+postgres-# ;  
+      ?column?         
+---------------------  
+ 2046-11-01 00:00:00  
+(1 row)  
+  
+postgres=# do language plpgsql $$  
+DECLARE  
+    our_epoch bigint := 1325376000000;  
+    seq_id bigint;  
+    now_millis bigint;  
+    shard_id int := 5;  
+    result bigint;  
+BEGIN  
+    SELECT 112345 % 1024 INTO seq_id;  
+  
+    SELECT FLOOR(EXTRACT(EPOCH FROM '2046-11-01 00:00:00'::timestamp) * 1000) INTO now_millis;  
+    result := (now_millis - our_epoch) << 23;  
+    result := result | (shard_id << 10);  
+    result := result | (seq_id);  
+    raise notice '%', result;  
+END;  
+$$;  
+NOTICE:  9221321628057605849  
+DO  
+```  
+  
+负值 :   
+  
+```  
+postgres=# select date '2012-01-01'+ interval '34.9652850005074 year';  
+      ?column?         
+---------------------  
+ 2046-12-01 00:00:00  
+(1 row)  
+  
+postgres=# do language plpgsql $$                                       
+DECLARE  
+    our_epoch bigint := 1325376000000;  
+    seq_id bigint;  
+    now_millis bigint;  
+    shard_id int := 5;  
+    result bigint;  
+BEGIN  
+    SELECT 112345 % 1024 INTO seq_id;  
+  
+    SELECT FLOOR(EXTRACT(EPOCH FROM '2046-12-01 00:00:00'::timestamp) * 1000) INTO now_millis;  
+    result := (now_millis - our_epoch) << 23;  
+    result := result | (shard_id << 10);  
+    result := result | (seq_id);  
+    raise notice '%', result;  
+END;  
+$$;  
+NOTICE:  -9203679173715945767  
+DO  
+```  
+  
+2\. 这个算法的好处还有1个就是它产生的值是有顺序的, 不是无序的UUID. 因此存储顺序和索引的顺序一致性非常高.  
+  
+对于使用索引查找是非常有效的.  
+  
+并且这个算法对于shard也非常方便.  
+  
+## 参考  
+1\.   
+  
+```  
+postgres=# \do |  
+                                  List of operators  
+   Schema   | Name | Left arg type | Right arg type | Result type |    Description      
+------------+------+---------------+----------------+-------------+-------------------  
+ pg_catalog | |    | bigint        | bigint         | bigint      | bitwise or  
+```  
@@ -2,6 +2,7 @@
 ----  
 ##### 20130128_01.md   [《PostgreSQL Developer Options (debug, trace, system table mod and so on...) 详解》](20130128_01.md)  
 ##### 20130127_01.md   [《PostgreSQL 9.3 will add pg_isready script to test PostgreSQL server if allowed to connect》](20130127_01.md)  
+##### 20130122_01.md   [《PostgreSQL sharding有序UUID最佳实践 - serial global uuid stored in 64bit int8》](20130122_01.md)  
 ##### 20130110_01.md   [《HOW to Change PostgreSQL's TOAST_TUPLE_THRESHOLD》](20130110_01.md)  
 ##### 20130109_01.md   [《PostgreSQL large row|column performance tuning case》](20130109_01.md)  
 ##### 20130105_01.md   [《PostgreSQL partition table name convert to data type》](20130105_01.md)  
 
@@ -236,14 +236,24 @@ relocatable = true
 然后就可以使用 create extension pg_sphere; 创建这个模块了    
 
 修改Makefile，略。      
+  
+## 注意
+pg_sphere，2017年开始重新拉的分支.
+
+https://github.com/pgsphere
+
+http://pgsphere.github.io/
+
+http://pgsphere.github.io/download.html
 
 ## 参考    
 http://pgsphere.projects.pgfoundry.org/index.html    
 
 http://skyview.gsfc.nasa.gov/xaminblog/index.php/tag/pgsphere/    
 
 https://github.com/mnullmei/pgsphere/tree/fixes-1-1-1    
+  
+[pg_sphere 1.1.5手册](20170215_03_pdf_001.pdf)  
 
-                                          
 [Count](http://info.flagcounter.com/h9V1)