Commit 1ce1bb3

committed
sql_ascii;
1 parent 9901d97 commit 1ce1bb3

15 files changed

+763
-75
lines changed

201212/20121228_01.md

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
## PostgreSQL WHY ERROR: invalid byte sequence for encoding "UTF8"

### Author
digoal

### Date
2012-12-28

### Tags
PostgreSQL , SQL_ASCII , garbled text , no encoding validity check , client_encoding

----

## Background
PostgreSQL users may have run into an error such as ERROR: invalid byte sequence for encoding "UTF8": 0x00.

This article explains what causes it.

The discussion below concerns the UTF8 character set. My test environment is as follows:

```
ocz@db-172-16-3-150-> psql digoal digoal
psql (9.2.1)
Type "help" for help.
digoal=> \l
                             List of databases
   Name    |  Owner   | Encoding | Collate | Ctype |   Access privileges
-----------+----------+----------+---------+-------+-----------------------
 digoal    | postgres | UTF8     | C       | C     | =Tc/postgres         +
           |          |          |         |       | postgres=CTc/postgres+
           |          |          |         |       | digoal=CTc/postgres
 postgres  | postgres | UTF8     | C       | C     |
 skycac    | postgres | UTF8     | C       | C     | =Tc/postgres         +
           |          |          |         |       | postgres=CTc/postgres+
           |          |          |         |       | skycac=CTc/postgres
 template0 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
 template1 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
(5 rows)
```
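
Encoding = UTF8.

For details on this character set, see the articles listed in the References section of this post.

UTF8 is a variable-length encoding. The original design allows 1 to 6 bytes per character; RFC 3629 (reference 3) restricts it to 1 to 4 bytes, which is also the range PostgreSQL's UTF8 encoding accepts.

It must follow these encoding rules:

![pic](20121228_01_pic_001.jpg)

The number of usable payload bits for each sequence length is 7, 11, 16, 21, 26, and 31 respectively.

In the figure above, each x marks a bit position that carries data; every other position has a fixed value. One benefit of this design is that reading the first byte is enough to know how many bytes the character occupies.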
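
The lead-byte rule can be sketched in a few lines of Python. This is only an illustration of the bit patterns described above, not PostgreSQL's implementation; the function names `utf8_sequence_length` and `payload_bits` are hypothetical, and the sketch covers the original 1 to 6 byte layout.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return the sequence length implied by a UTF-8 lead byte (original 1-6 byte layout)."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("not a byte value")
    if first_byte & 0b1000_0000 == 0b0000_0000:  # 0xxxxxxx
        return 1
    if first_byte & 0b1110_0000 == 0b1100_0000:  # 110xxxxx
        return 2
    if first_byte & 0b1111_0000 == 0b1110_0000:  # 1110xxxx
        return 3
    if first_byte & 0b1111_1000 == 0b1111_0000:  # 11110xxx
        return 4
    if first_byte & 0b1111_1100 == 0b1111_1000:  # 111110xx
        return 5
    if first_byte & 0b1111_1110 == 0b1111_1100:  # 1111110x
        return 6
    # 10xxxxxx is a continuation byte; 0xFE and 0xFF never appear in UTF-8.
    raise ValueError(f"0x{first_byte:02x} cannot start a UTF-8 sequence")


def payload_bits(length: int) -> int:
    """Return the number of x (data) bits in a sequence of the given length."""
    if not 1 <= length <= 6:
        raise ValueError("length must be between 1 and 6")
    if length == 1:
        return 7
    # The lead byte carries (7 - length) bits, each continuation byte carries 6.
    return (7 - length) + 6 * (length - 1)


if __name__ == "__main__":
    for lead in (0x24, 0xC2, 0xE2):
        print(hex(lead), utf8_sequence_length(lead))
    print([payload_bits(n) for n in range(1, 7)])
```

The second function reproduces the 7, 11, 16, 21, 26, 31 series: each additional byte adds six data bits and costs one bit in the lead byte.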

Because of these rules, any byte sequence that does not conform to them is treated as invalid.

Examples of valid sequences:

![pic](20121228_01_pic_002.jpg)

The black digits in the Binary UTF-8 column are the fixed-position bits.

For example, querying the Hexadecimal UTF-8 values above in the database:

```
digoal=> select E'\x24';
 ?column?
----------
 $
(1 row)
digoal=> select E'\xC2\xA2';
 ?column?
----------
 ¢
(1 row)
digoal=> select E'\xe2\x82\xac';
 ?column?
----------
 €
(1 row)
```

The reverse conversion also works:

```
digoal=> select 'a'::bytea;
 bytea
-------
 \x61
(1 row)
digoal=> select 'abc'::bytea;
  bytea
----------
 \x616263
(1 row)
digoal=> select '你好'::bytea;
     bytea
----------------
 \xe4bda0e5a5bd
(1 row)
```
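
The same round trip can be reproduced outside the database. The sketch below uses Python's built-in codec rather than PostgreSQL, and the helper names `to_hex` and `from_hex` are hypothetical; it only shows that the hex digits are the UTF-8 bytes of the text.

```python
def to_hex(text: str) -> str:
    """Encode text as UTF-8 and return the bytes as a hex string."""
    return text.encode("utf-8").hex()


def from_hex(hex_digits: str) -> str:
    """Decode a hex string of UTF-8 bytes back into text."""
    return bytes.fromhex(hex_digits).decode("utf-8")


if __name__ == "__main__":
    for sample in ("a", "abc", "你好"):
        encoded = to_hex(sample)
        # Print the text, its UTF-8 bytes in hex, and the decoded result.
        print(sample, encoded, from_hex(encoded))
```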
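
If the input bytes violate the rules shown in the figure, an error is raised.

For example:

10001111 is 8F in hexadecimal. Its leading bits 10 mark a continuation byte, which cannot start a character, so the query fails:

```
digoal=> select E'\x8f';
ERROR:  invalid byte sequence for encoding "UTF8": 0x8f
```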
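
A client can run a similar check before sending data to the server. The following is a minimal sketch that relies on Python's strict UTF-8 decoder, which follows RFC 3629; it is not PostgreSQL's validation code, and `is_valid_utf8` is a hypothetical helper name.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    samples = [b"\x24", b"\xc2\xa2", b"\xe2\x82\xac", b"\x8f", b"\xe2\x82"]
    for raw in samples:
        # Print each sample in hex alongside the validation result.
        print(raw.hex(), is_valid_utf8(raw))
```

The lone continuation byte `8f` and the truncated sequence `e2 82` are both rejected, while the three valid examples from the figure pass.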

Or:

```
digoal=> select E'\x00';
ERROR:  invalid byte sequence for encoding "UTF8": 0x00
```
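
Why does 0x00 fail as well, when it is a valid UTF8 character?

Because PostgreSQL's text types cannot store the NUL byte: the lexical-structure documentation (reference 4) states that the character with code zero cannot be in a string constant. psql has a related limitation: "psql does not support embedded NUL bytes in variable values." NUL here is E'\x00'.

To store NUL, use the bytea type:

```
digoal=> select '\x00'::bytea;
 bytea
-------
 \x00
(1 row)
```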
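
On the client side, one possible way to deal with this is to detect NUL characters before the value reaches a text column. The sketch below is an assumption about how an application might do that, with hypothetical helper names `contains_nul` and `strip_nul`; whether to strip the character or store the raw bytes in a bytea column instead depends on the application.

```python
def contains_nul(value: str) -> bool:
    """Return True if the string holds a NUL character (code point zero)."""
    return "\x00" in value


def strip_nul(value: str) -> str:
    """Return the string with every NUL character removed."""
    return value.replace("\x00", "")


if __name__ == "__main__":
    # The single byte 0x00 is valid UTF-8 and decodes to U+0000.
    decoded = b"\x00".decode("utf-8")
    print(contains_nul(decoded))

    dirty = "abc\x00def"
    # repr() makes the removed NUL visible in the output.
    print(repr(dirty), "->", repr(strip_nul(dirty)))
```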

The convert functions can be used to check whether a byte sequence is valid in a given encoding:

```
postgres=# \df *convert*
                              List of functions
   Schema   |     Name     | Result data type | Argument data types |  Type
------------+--------------+------------------+---------------------+--------
 pg_catalog | convert      | bytea            | bytea, name, name   | normal
 pg_catalog | convert_from | text             | bytea, name         | normal
 pg_catalog | convert_to   | bytea            | text, name          | normal
(3 rows)
```

For the escape syntax, see these sections of the PostgreSQL documentation:

String Constants with C-style Escapes

String Constants with Unicode Escapes

## References
1\. http://en.wikipedia.org/wiki/UTF-8

2\. http://en.wikipedia.org/wiki/Unicode

3\. http://tools.ietf.org/html/rfc3629

4\. http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html

5\. http://stackoverflow.com/questions/1347646/postgres-error-on-insert-error-invalid-byte-sequence-for-encoding-utf8-0x0?rq=1

[Count](http://info.flagcounter.com/h9V1)


201212/20121228_01_pic_001.jpg

34 KB

201212/20121228_01_pic_002.jpg

44.2 KB

201212/readme.md

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@
 ##### 20121217_01.md [《performance tuning about multi-rows query aggregated to single-row query》](20121217_01.md)
 ##### 20121218_01.md [《PostgreSQL plpgsql variadic argments , parameters - variable number of arguments》](20121218_01.md)
 ##### 20121218_02.md [《PostgreSQL aggregate function customize》](20121218_02.md)
+##### 20121228_01.md [《PostgreSQL WHY ERROR: invalid byte sequence for encoding "UTF8"》](20121228_01.md)
