digoal
2016-11-25
PostgreSQL , int128 , clang , gcc , icc
PostgreSQL 9.4以及以前的版本,在INT,INT2,INT8的聚合计算中,为了保证数据不会溢出,中间结果使用numeric来存储。
numeric是PostgreSQL自己实现的一种数值类型,可以存储非常大的数值(估计是做科学计算的需求),但是牺牲了一定的性能。
为了提高聚合,特别是大数据量的聚合时的性能,社区借用了编译器支持的int128类型,作为数据库int, int2, int8的中间计算结果,从而提升计算性能。
gcc,clang,icc都支持int128
1. gcc
6.8 128-bit Integers
As an extension the integer scalar type __int128 is supported for targets which have an integer mode wide enough to hold 128 bits.
Simply write __int128 for a signed 128-bit integer, or unsigned __int128 for an unsigned 128-bit integer.
There is no support in GCC for expressing an integer constant of type __int128 for targets with long long integer less than 128 bits wide.
2. icc
From what I can tell, at least icc 13.0.1+ support __int128_t and __uint128_t. Courtesy of Matt Godbolt's Compiler Explorer:
__int128_t ai (__int128_t x, __int128_t y) {
return x + y;
}
__int128_t mi (__int128_t x, __int128_t y) {
return x * y;
}
__int128_t di (__int128_t x, __int128_t y) {
return x / y;
}
__int128_t ri (__int128_t x, __int128_t y) {
return x % y;
}
compiles to:
L__routine_start_ai_0:
ai:
add rdi, rdx #2.14
mov rax, rdi #2.14
adc rsi, rcx #2.14
mov rdx, rsi #2.14
ret #2.14
L__routine_start_mi_1:
mi:
mov rax, rdi #6.14
imul rsi, rdx #6.14
imul rcx, rdi #6.14
mul rdx #6.14
add rsi, rcx #6.14
add rdx, rsi #6.14
ret #6.14
L__routine_start_di_2:
di:
push rsi #9.44
call __divti3 #10.14
pop rcx #10.14
ret #10.14
L__routine_start_ri_3:
ri:
push rsi #13.44
call __modti3 #14.14
pop rcx #14.14
ret #14.14
with icc 13.0.1 (http://goo.gl/UnxEFt).
编译时根据编译器的特性自动判断是否使用int128特性.
Add, optional, support for 128bit integers.
We will, for the foreseeable future, not expose 128 bit datatypes to
SQL. But being able to use 128bit math will allow us, in a later patch,
to use 128bit accumulators for some aggregates; leading to noticeable
speedups over using numeric.
So far we only detect a gcc/clang extension that supports 128bit math,
but no 128bit literals, and no *printf support. We might want to expand
this in the future to further compilers; if there are any that that
provide similar support.
config/c-compiler.m4 diff | blob | blame | history
configure diff | blob | blame | history
configure.in diff | blob | blame | history
src/include/c.h diff | blob | blame | history
src/include/pg_config.h.in diff | blob | blame | history
src/include/pg_config.h.win32 diff | blob | blame | history
There was recently talk about if we should start using 128-bit integers
(where available) to speed up the aggregate functions over integers
which uses numeric for their internal state. So I hacked together a
patch for this to see what the performance gain would be.
Previous thread:
http://www.postgresql.org/message-id/20141017182500.GF2075@alap3.anarazel.de
What the patch does is switching from using numerics in the aggregate
state to int128 and then convert the type from the 128-bit integer in
the final function.
The functions where we can make use of int128 states are:
- sum(int8)
- avg(int8)
- var_*(int2)
- var_*(int4)
- stdev_*(int2)
- stdev_*(int4)
The initial benchmark results look very promising. When summing 10
million int8 I get a speedup of ~2.5x and similarly for var_samp() on 10
million int4 I see a speed up of ~3.7x. To me this indicates that it is
worth the extra code. What do you say? Is this worth implementing?
The current patch still requires work. I have not written the detection
of int128 support yet, and the patch needs code cleanup (for example: I
used an int16_ prefix on the added functions, suggestions for better
names are welcome). I also need to decide on what estimate to use for
the size of that state.
The patch should work and pass make check on platforms where __int128_t
is supported.
The simple benchmarks:
CREATE TABLE test_int8 AS SELECT x::int8 FROM generate_series(1,
10000000) x;
Before:
# SELECT sum(x) FROM test_int8;
sum
----------------
50000005000000
(1 row)
Time: 2521.217 ms
After:
# SELECT sum(x) FROM test_int8;
sum
----------------
50000005000000
(1 row)
Time: 1022.811 ms
CREATE TABLE test_int4 AS SELECT x::int4 FROM generate_series(1,
10000000) x;
Before:
# SELECT var_samp(x) FROM test_int4;
var_samp
--------------------
8333334166666.6667
(1 row)
Time: 3808.546 ms
After:
# SELECT var_samp(x) FROM test_int4;
var_samp
--------------------
8333334166666.6667
(1 row)
Time: 1033.243 ms
3. https://www.postgresql.org/message-id/flat/544BB5F1.50709%40proxel.se#544BB5F1.50709@proxel.se
4. http://stackoverflow.com/questions/16365840/128-bit-integers-supporting-and-in-the-intel-c-compiler
5. https://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.