digoal
diff --git a/201506/20150601_01.md b/201506/20150601_01.md
Lines changed: 6 additions & 0 deletions
diff --git a/201701/20170112_01.md b/201701/20170112_01.md
Lines changed: 216 additions & 0 deletions
diff --git a/201701/20170112_01_pic_001.png b/201701/20170112_01_pic_001.png
29.5 KB
diff --git a/201701/20170112_02.md b/201701/20170112_02.md
Lines changed: 131 additions & 0 deletions
@@ -112,6 +112,12 @@ revoke all on view pg_user_mapings from public;
 
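 9\. If the application's configuration file has to store a username and password, secure the application server so that the configuration file cannot leak.  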
 
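+10\. Harden the database's own authentication method. This changes the authentication protocol, so the client drivers must support it as well.  
+  
+Reference  
+  
+[One way to set a database user's password securely with PostgreSQL psql](../201701/20170112_01.md)  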
+  
 ## 2. Data Transmission Security
 Protect data in transit, so that even intercepted data is of no use to an attacker.  
 
 
@@ -0,0 +1,216 @@
+## One Way to Set a Database User's Password Securely with PostgreSQL psql  
+  
+### Author  
+digoal  
+  
+### Date  
+2017-01-12  
+  
+### Tags  
+PostgreSQL , psql , security , password , logging  
+                                                                                                      
+----                                                                                                    
+                                                                 
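+## Background  
+Nobody needs to be told how important passwords are, but do you know how many ways a password can leak?  
+  
+Most people assume that once a password is set, all that remains is to keep it from leaking.  
+  
+But have you considered that it may leak while it is being set?  
+  
+Take setting a database user's password: how many channels could leak it?  
+  
+For example, when we change a database user's password, the new password may pass through a flow this long before it is finally written to the pg_authid catalog.  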
+  
+![pic](20170112_01_pic_001.png)    
+  
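+Every one of these steps gives an attacker an opening. Setting a password is not the end of the story: it may well be intercepted while you are setting it.  
+  
+Even an intercepted or leaked MD5 hash is dangerous; for details see  
+  
+[PostgreSQL versus MySQL - secret key authentication](../201610/20161009_01.md)  
+  
+PostgreSQL is aware of this problem and is reworking authentication at the protocol level, as described in:  
+  
+[New Year's tech gift pack - which heavy weapons does PostgreSQL 10.0, due in autumn 2017, already carry?](../201701/20170101_01.md)  
+  
+So, for your own safety, I recommend reading the following database hardening guides carefully:  
+  
+[PostgreSQL password security guide](../201410/20141009_01.md)  
+  
+[PostgreSQL database security guide](../201506/20150601_01.md)  
+  
+[For DBAs only, Okamoto 003 series - database security first, have a good New Year](../201612/20161224_01.md)  
+  
+This article covers one improvement in the psql client: hiding the clear-text password when it is set. (Bear in mind that even this is not fully secure; security is always relative.)  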
+  
+## psql \password  
+psql has added the following command:  
+  
+```  
+  \password [USERNAME]   securely change the password for a user  
+```  
+  
+The corresponding source code follows. It converts the text the user types into an MD5 hash with the PQencryptPassword function, then calls PSQLexec to run ALTER USER XX PASSWORD 'MD5XX';  
+  
+src/bin/psql/command.c  
+  
+```  
+        /* \password -- set user password */  
+        else if (strcmp(cmd, "password") == 0)  
+        {  
+                char       *pw1;  
+                char       *pw2;  
+  
+                pw1 = simple_prompt("Enter new password: ", 100, false);  
+                pw2 = simple_prompt("Enter it again: ", 100, false);  
+  
+                if (strcmp(pw1, pw2) != 0)  
+                {  
+                        psql_error("Passwords didn't match.\n");  
+                        success = false;  
+                }  
+                else  
+                {  
+                        char       *opt0 = psql_scan_slash_option(scan_state, OT_SQLID, NULL, true);  
+                        char       *user;  
+                        char       *encrypted_password;  
+  
+                        if (opt0)  
+                                user = opt0;  
+                        else  
+                                user = PQuser(pset.db);  
+  
+                        encrypted_password = PQencryptPassword(pw1, user);  
+  
+                        if (!encrypted_password)  
+                        {  
+                                psql_error("Password encryption failed.\n");  
+                                success = false;  
+                        }  
+                        else  
+                        {  
+                                PQExpBufferData buf;  
+                                PGresult   *res;  
+  
+                                initPQExpBuffer(&buf);  
+                                printfPQExpBuffer(&buf, "ALTER USER %s PASSWORD ",  
+                                                                  fmtId(user));  
+                                appendStringLiteralConn(&buf, encrypted_password, pset.db);  
+                                res = PSQLexec(buf.data);  
+                                termPQExpBuffer(&buf);  
+                                if (!res)  
+                                        success = false;  
+                                else  
+                                        PQclear(res);  
+                                PQfreemem(encrypted_password);  
+                        }  
+  
+                        if (opt0)  
+                                free(opt0);  
+                }  
+                free(pw1);  
+                free(pw2);  
+        }  
+```  
+  
+src/bin/psql/common.c  
+  
+```  
+/*  
+ * PSQLexec  
+ *  
+ * This is the way to send "backdoor" queries (those not directly entered  
+ * by the user). It is subject to -E but not -e.  
+ *  
+ * Caller is responsible for handling the ensuing processing if a COPY  
+ * command is sent.  
+ *  
+ * Note: we don't bother to check PQclientEncoding; it is assumed that no  
+ * caller uses this path to issue "SET CLIENT_ENCODING".  
+ */  
+PGresult *  
+PSQLexec(const char *query)  
+{  
+        PGresult   *res;  
+  
+        if (!pset.db)  
+        {  
+                psql_error("You are currently not connected to a database.\n");  
+                return NULL;  
+        }  
+  
+        if (pset.echo_hidden != PSQL_ECHO_HIDDEN_OFF)  
+        {  
+                printf(_("********* QUERY **********\n"  
+                                 "%s\n"  
+                                 "**************************\n\n"), query);  
+                fflush(stdout);  
+                if (pset.logfile)  
+                {  
+                        fprintf(pset.logfile,  
+                                        _("********* QUERY **********\n"  
+                                          "%s\n"  
+                                          "**************************\n\n"), query);  
+                        fflush(pset.logfile);  
+                }  
+  
+                if (pset.echo_hidden == PSQL_ECHO_HIDDEN_NOEXEC)  
+                        return NULL;  
+        }  
+  
+        SetCancelConn();  
+  
+        res = PQexec(pset.db, query);  
+  
+        ResetCancelConn();  
+  
+        if (!AcceptResult(res))  
+        {  
+                ClearOrSaveResult(res);  
+                res = NULL;  
+        }  
+  
+        return res;  
+}  
+```  
+  
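To make the hashing step concrete, here is a minimal Python sketch of the value that ends up in the ALTER USER statement. It relies on PostgreSQL's MD5 scheme, in which the stored value is the literal `md5` followed by the MD5 hex digest of the password concatenated with the role name. The function names `pg_md5_password` and `alter_user_statement` are illustrative and not part of psql or libpq, and the quoting is simplified compared with what `fmtId` and `appendStringLiteralConn` do.

```python
import hashlib


def pg_md5_password(password: str, username: str) -> str:
    """Return the MD5 form of a password as stored in pg_authid.rolpassword.

    The digest covers the password concatenated with the role name, and the
    result is prefixed with the literal "md5".
    """
    digest = hashlib.md5((password + username).encode("utf-8")).hexdigest()
    return "md5" + digest


def alter_user_statement(password: str, username: str) -> str:
    """Build the statement that is sent once the password has been hashed.

    Only the hash appears in the statement text, never the clear text.
    The identifier quoting below is a simplification for illustration.
    """
    quoted_user = '"' + username.replace('"', '""') + '"'
    return "ALTER USER {} PASSWORD '{}'".format(
        quoted_user, pg_md5_password(password, username)
    )


if __name__ == "__main__":
    print(alter_user_statement("secret", "digoal"))
```

Because the role name is mixed into the digest, the same password yields different stored values for different roles, but any log or network capture of the statement still exposes the hash itself.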
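+## Test  
+Why is this not absolutely secure? Because MD5 itself is not secure, and there are still many channels through which the MD5 hash can leak.  
+  
+Every database is the same in this respect: there is no absolute security, only relative security. I strongly recommend the articles listed at the end of this post for hardening your database.  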
+   
+```  
+postgres=# set log_statement='all';  
+postgres=# set client_min_messages ='log';  
+postgres=# \password digoal  
+Enter new password:   
+Enter it again:   
+LOG:  statement: ALTER USER digoal PASSWORD 'md5462f71c79368ccf422f8a773ef40074d'  
+  
+postgres=# select * from pg_authid where rolname='digoal';  
+LOG:  statement: select * from pg_authid where rolname='digoal';  
+ rolname | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcanlogin | rolreplication | rolbypassrls | rolconnlimit |             rolpassword             | rolvaliduntil   
+---------+----------+------------+---------------+-------------+-------------+----------------+--------------+--------------+-------------------------------------+---------------  
+ digoal  | f        | t          | f             | f           | t           | f              | f            |           -1 | md5462f71c79368ccf422f8a773ef40074d |   
+(1 row)  
+```  
+  
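The reason a leaked MD5 hash matters can be sketched in a few lines of Python. Under PostgreSQL's MD5 password authentication, the server sends a random 4-byte salt and the client answers with `md5` followed by the MD5 of the stored hex digest concatenated with that salt. The sketch below only shows that the answer can be computed from the stored hash alone, without the clear-text password. The helper names are illustrative, and the salt value is made up.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def stored_hash(password: str, username: str) -> str:
    # The value kept in pg_authid.rolpassword for an MD5 password.
    return "md5" + md5_hex((password + username).encode("utf-8"))


def auth_response(stored: str, salt: bytes) -> str:
    """Compute the reply to an MD5 authentication challenge.

    Only the stored hash and the server-supplied salt are needed; the
    clear-text password never enters this function.
    """
    inner_hex = stored[len("md5"):]
    return "md5" + md5_hex(inner_hex.encode("ascii") + salt)


if __name__ == "__main__":
    salt = b"\x01\x02\x03\x04"  # made-up challenge salt
    leaked = "md5462f71c79368ccf422f8a773ef40074d"  # hash from the log above
    print(auth_response(leaked, salt))
```

Anyone who obtains the `md5...` string from a log, a backup, or pg_authid can therefore produce a valid response, which is why hiding the clear text in `\password` reduces the exposure but does not remove it.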
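+## Thoughts at the Source Level
+How could the source code avoid relying on a single encryption method?  
+  
+One option is to bring in a UUID that identifies the instance, such as the systemid, and to encrypt in multiple rounds. That could make cracking harder and avoid some of the problems caused by identical passwords.  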
+  
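As a rough illustration of this idea only, the Python sketch below compares the existing MD5 form with a hypothetical two-round variant that mixes in an instance identifier. The function `instance_bound_hash`, the `md5x` prefix, and the system identifier values are all assumptions made up for the example; PostgreSQL does not implement this scheme.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def current_hash(password: str, username: str) -> str:
    # Existing MD5 form: identical on every instance for the same role.
    return "md5" + md5_hex(password + username)


def instance_bound_hash(password: str, username: str, system_id: int) -> str:
    """Hypothetical variant: a second round that includes the instance id."""
    first_round = md5_hex(password + username)
    return "md5x" + md5_hex(first_round + str(system_id))


if __name__ == "__main__":
    # Made-up system identifiers for two different instances.
    instance_a, instance_b = 6374392058202286571, 6374392058202286572
    print(current_hash("secret", "digoal"))
    print(instance_bound_hash("secret", "digoal", instance_a))
    print(instance_bound_hash("secret", "digoal", instance_b))
```

With the instance identifier included, the same role and password no longer produce the same stored value on two instances, so a hash lifted from one instance cannot be matched directly against another.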
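+## References  
+[PostgreSQL versus MySQL - secret key authentication](../201610/20161009_01.md)  
+  
+[New Year's tech gift pack - which heavy weapons does PostgreSQL 10.0, due in autumn 2017, already carry?](../201701/20170101_01.md)  
+  
+[PostgreSQL password security guide](../201410/20141009_01.md)  
+  
+[PostgreSQL database security guide](../201506/20150601_01.md)  
+  
+[For DBAs only, Okamoto 003 series - database security first, have a good New Year](../201612/20161224_01.md)  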
+  
+            
+[Count](http://info.flagcounter.com/h9V1)                                                                                                  
+                               
@@ -0,0 +1,131 @@
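+## E-commerce Content Deduplication and Filtering (How to Efficiently Detect Reposts, Stolen Images, and Infringement?) - Optimization and Indexing Techniques for Text, Image-Set, and Array Similarity
+  
+### Author  
+digoal  
+  
+### Date  
+2017-01-12  
+  
+### Tags  
+PostgreSQL , rum , tsvector , array , smlar , similarity , content deduplication , content filtering , reposts , stolen images , infringement  
+  
+----  
+  
+## Background  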
+
+
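+## Several Approaches
+Which fits better, GIN or GiST?
+
+Because the degree of duplication is low, or the goal is precisely to limit duplication, GIN is the better fit.
+
+GIN is still a B-tree at heart. The full set of elements, whether lexemes or array elements, is determined by the business. Suppose there are 100 million products in total, 1,000 products per record, and about 1 million records. The GIN tree then holds 100 million keys, which takes only a few levels.
+
+A search matches at most 1,000 index items, which is enough to compute the similarity.
+
+Figure: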
+
+
+
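The lookup pattern described above can be sketched in Python with an in-memory inverted index standing in for GIN. Each element maps to the ids of the records that contain it, a query performs one lookup per query element, and the overlap counts are turned into a similarity score. The cosine-style formula and the names `build_inverted_index` and `similar_records` are choices made for this illustration, not a description of how GIN or smlar are implemented.

```python
import math
from collections import defaultdict


def build_inverted_index(records):
    """Map each element to the ids of the records containing it."""
    index = defaultdict(set)
    for rec_id, elements in records.items():
        for element in set(elements):
            index[element].add(rec_id)
    return index


def similar_records(index, sizes, query, threshold):
    """Return {record_id: similarity} for records at or above threshold."""
    query_set = set(query)
    overlap = defaultdict(int)
    for element in query_set:  # one index lookup per query element
        for rec_id in index.get(element, ()):
            overlap[rec_id] += 1
    result = {}
    for rec_id, common in overlap.items():
        # Cosine-style score: shared elements over the geometric mean of sizes.
        sim = common / math.sqrt(len(query_set) * sizes[rec_id])
        if sim >= threshold:
            result[rec_id] = sim
    return result


records = {1: [10, 20, 30, 40], 2: [10, 20, 30, 99], 3: [7, 8, 9, 10]}
index = build_inverted_index(records)
sizes = {rec_id: len(set(elems)) for rec_id, elems in records.items()}
hits = similar_records(index, sizes, [10, 20, 30, 40], threshold=0.7)
print(hits)
```

The number of index lookups depends on the size of the query array, not on the number of stored records, which is the property the 1,000-item estimate above relies on.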
+### smlar
+
+tsvector -> array -> int[] -> intarray(union) -> smlar(index)
+
+%
+
+
+### rum
+
+tsvector
+
+
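+## Similarity Algorithms
+[Starting from similarity algorithms - Effective similarity search in PostgreSQL](../201612/20161222_02.md)  
+
+## Text Similarity
+
+rum  
+[Starting from tricky fuzzy queries - one of PostgreSQL's unique skills: GIN , GiST , SP-GiST , RUM index principles and technical background](../201612/20161231_01.md)  
+[PostgreSQL full-text search acceleration, fast beyond compare - the RUM index interface (Pandora's box)](../201610/20161019_01.md)  
+[PostgreSQL text data analysis in practice - similarity analysis](../201608/20160817_01.md)  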
+
+
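+## Image-Set Similarity
+
+## Handling Images That Differ but Look Similar
+Out of scope for this article; see wavelet.
+
+
+## Handling Identical Images
+Convert each image to a hash value and store the values as an array.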
+
+
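A minimal Python sketch of this step follows, assuming that "identical" means byte-for-byte identical files. Each image is reduced to a 64-bit integer taken from a SHA-256 digest so that the value would fit a bigint array element, and two image sets are compared by the overlap of their arrays. The hash choice, the Jaccard measure, and the function names are assumptions for illustration; the article does not specify them.

```python
import hashlib


def image_hash(image_bytes: bytes) -> int:
    """Reduce an image file to a signed 64-bit integer."""
    digest = hashlib.sha256(image_bytes).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def image_set_signature(images):
    """Turn a set of images into a sorted array of distinct hash values."""
    return sorted({image_hash(img) for img in images})


def overlap_ratio(sig_a, sig_b) -> float:
    """Jaccard similarity of two signatures: shared hashes over all hashes."""
    a, b = set(sig_a), set(sig_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Stand-ins for file contents; real code would read the image files.
original = [b"img-a", b"img-b", b"img-c", b"img-d"]
suspect = [b"img-a", b"img-b", b"img-c", b"img-x"]
ratio = overlap_ratio(image_set_signature(original), image_set_signature(suspect))
print(ratio)
```

A cryptographic hash only catches exact copies: resizing or re-encoding an image changes every bit of the hash, which is the case the previous section leaves to wavelet-based methods.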
+## Array Similarity
+[Starting from similarity algorithms - Effective similarity search in PostgreSQL](../201612/20161222_02.md)  
+http://railsware.com/blog/2012/05/10/effective-similarity-search-in-postgresql/
+
+
+## hll
+Vector
+
+
+
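+## References
+Similarity  
+[Starting from similarity algorithms - Effective similarity search in PostgreSQL](../201612/20161222_02.md)  
+
+wavelet  
+[PostgreSQL in video and image deduplication and image search](../201611/20161126_01.md)
+
+
+rum  
+[Starting from tricky fuzzy queries - one of PostgreSQL's unique skills: GIN , GiST , SP-GiST , RUM index principles and technical background](../201612/20161231_01.md)  
+[PostgreSQL full-text search acceleration, fast beyond compare - the RUM index interface (Pandora's box)](../201610/20161019_01.md)  
+[PostgreSQL text data analysis in practice - similarity analysis](../201608/20160817_01.md)  
+https://github.com/postgrespro/rum
+
+Distance (similarity) algorithm reference  
+src/rum_ts_utils.c 
+
+Array (text) similarity algorithms
+
+[Starting from similarity algorithms - Effective similarity search in PostgreSQL](../201612/20161222_02.md) 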
+
+http://railsware.com/blog/2012/05/10/effective-similarity-search-in-postgresql/
+
+
+KNN with TF-IDF based Framework for Text Categorization
+
+http://www.sciencedirect.com/science/article/pii/S1877705814003750
+
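+Data mining - a Java implementation of a newsgroup18828 text classifier based on the Bayes and KNN algorithms (part 1)
+
+http://blog.csdn.net/yangliuy/article/details/7400984
+
+TF-IDF and cosine similarity in practice (1): automatic keyword extraction  
+http://www.ruanyifeng.com/blog/2013/03/tf-idf.html
+
+TF-IDF and cosine similarity in practice (2): finding similar articles  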
+http://www.ruanyifeng.com/blog/2013/03/cosine_similarity.html
+
+TF-IDF
+http://baike.baidu.com/view/1228847.htm
+
+
+hll
+https://research.neustar.biz/2013/02/04/open-source-release-postgresql-hll/
+http://docs.pipelinedb.com/probabilistic.html#hyperloglog
+https://www.citusdata.com/blog/2016/10/12/count-performance/
+
+exclusion constraints  
+https://www.postgresql.org/docs/9.6/static/sql-createtable.html
+
+pg_trgm
+https://www.postgresql.org/docs/9.6/static/pgtrgm.html
+
+Chinese word segmentation  
+https://github.com/jaiminpan/pg_jieba.git
+      
+      
+[Count](http://info.flagcounter.com/h9V1)                                                                                                  
+