digoal
2018-01-13
PostgreSQL , HTAP , PPAS , enterprisedb , 资源隔离 , cpu , io
随着硬件发展,单机的能力越来越强,单个数据库可能服务于多个业务,或者服务于混合场景。例如即有分析型也有在线事务型处理(HTAP)。
资源隔离管理成为非常重要的需求。
对于Linux的用户,使用cgroup可以用来隔离进程的网络、CPU、IO、内存等。PostgreSQL为进程模型,使用cgroup可以实现资源隔离。
《Linux cgroup - cpu与cpuset子系统讲解》
《cgroup告诉你如何计算 PostgreSQL 数据库实例用了多少内存》
在内核层面实现资源隔离也是一个不错的选择,可以实现库级、会话级、用户级的资源限制和隔离。例如Greenplum的resource queue。
《Greenplum 内存与负载管理(resource queue)最佳实践》
在PostgreSQL商用发行版本Enterprisedb中,提供了内核层面的resource queue功能,可以控制用户的IO和CPU使用。
控制包括:
1、可以用多少个CPU CORE
2、每秒可以产生多少KB 的 dirty IO(实际上是shared buffer dirty page的产生速度,非dirty page都不算。通过pg_stat_statements.shared_blks_dirtied统计)
1、整个PG集群允许多少个活跃资源组(默认16),重启生效。
vi postgresql.conf
edb_max_resource_groups = 16
edb_max_resource_groups 务必设置足够大,大于你需要创建的资源组个数。
2、设置当前会话使用哪个资源组,设置优先级如下:
会话级 > 连接参数 > 用户级 > 数据库级 > 集群级。
edb_resource_group = ''
1、创建资源组
Command: CREATE RESOURCE GROUP
Description: define a new resource group
Syntax:
CREATE RESOURCE GROUP name
postgres=# create resource group abc;
2、配置资源组的限制,可以设置CPU和IO两种限制。
Command: ALTER RESOURCE GROUP
Description: change the definition of a resource group
Syntax:
ALTER RESOURCE GROUP name RENAME TO new_name
ALTER RESOURCE GROUP name SET parameter { TO | = } { value | DEFAULT }
ALTER RESOURCE GROUP name RESET parameter
cpu_rate_limit
dirty_rate_limit
-- ABC这个组可以使用32个CPU核。
alter resource group abc set cpu_rate_limit = 32;
-- abc这个组,每秒最多产生 400MB dirty shared buffer.
alter resource group abc set dirty_rate_limit =409600;
-- 0表示不限制。
3、查看系统中创建了多少资源组,以及资源组的限制。
postgres=# select * from edb_resource_group ;
rgrpname | rgrpcpuratelimit | rgrpdirtyratelimit
----------+------------------+--------------------
b | 0 | 0
abc | 32 | 409600
(2 rows)
4、设置当前会话资源组,当前会话生效。
postgres=# set edb_resource_group ='b';
SET
4.1、连接时,指定资源组参数。
enterprisedb@pg11-test-> psql options=-cedb_resource_group=rs2
psql.bin (11.1.7)
Type "help" for help.
postgres=# show edb_resource_group ;
edb_resource_group
--------------------
rs2
(1 row)
4.2、设置用户级资源组,使用这个用户连接时生效。
postgres=# alter role enterprisedb set edb_resource_group ='b';
ALTER ROLE
4.3、设置库级资源组,连接到这个库时生效。
postgres=# alter database postgres set edb_resource_group ='abc';
ALTER DATABASE
4.4、设置集群级资源组,集群级默认资源组。
vi $PGDATA/postgresql.conf
edb_resource_group = 'abc'
5、查看集群中的用户、数据库级资源控制配置参数。
postgres=#
select
coalesce(datname,'ALL') as dbs,
coalesce(rolname,'ALL') as roles,
setconfig
from pg_db_role_setting t1
left join pg_database t2
on (t1.setdatabase=t2.oid)
left join pg_roles t3
on (t1.setrole=t3.oid);
dbs | roles | setconfig
----------+--------------+------------------------
ALL | enterprisedb | {edb_resource_group=b}
(1 row)
6、查看当前会话属于哪个资源组。
postgres=# show edb_resource_group ;
edb_resource_group
--------------------
b
(1 row)
7、查看当前资源组的资源使用情况。
per_process_cpu_rate_limit 表示这个组内,当前平均每个进程的CPU核数使用限制。
per_process_dirty_rate_limit 表示这个组内,当前平均每个进程限制的产生dirty page的速度(KB/s)
随着进程数的加减,平均值会有变化。每个进程的限制并不是平均数,组内的进程可以相互抢资源,如果只有一个活跃进程在组内,则这个活跃进程可以使用这个组的所有资源。
postgres=# select * from edb_all_resource_groups ;
group_name | active_processes | cpu_rate_limit | per_process_cpu_rate_limit | dirty_rate_limit | per_process_dirty_rate_limit
------------+------------------+----------------+----------------------------+------------------+------------------------------
b | 1 | 0 | 16777216 | 0 | 16777216
abc | 0 | 32 | | 409600 |
(2 rows)
组与组之间没有牵制,所以多个组可以配置超过机器的总资源限制(类似云计算的超卖),所以超卖在系统繁忙时,可能相互抢占。
8、查询资源组系统参数配置。
postgres=# select * from pg_show_all_file_settings() where name='edb_resource_group';
sourcefile | sourceline | seqno | name | setting | applied | error
--------------------------------------------+------------+-------+--------------------+---------+---------+-------
/data04/ppas11/pg_root4000/postgresql.conf | 179 | 4 | edb_resource_group | rs3 | t |
(1 row)
周期性的统计组内进程的资源使用情况,并限制资源(采用sleep process方式)的使用。
PostgreSQL 代码中,getrusage是来获得CPU资源的使用情况的。
(仅限制shared buffer中的dirty page产生速度)
1、安装pg_stat_statements插件
reset 统计信息,执行query,查询在同一个resource group中的所有pid的统计信息
edb=# SELECT query, rows, total_time, shared_blks_dirtied FROM pg_stat_statements;
-[ RECORD 1 ]-------+--------------------------------------------------
query | INESRTINTO t1 VALUES (generate_series (?,?), ?);
rows | 10000
total_time | 2432.32
shared_blks_dirtied | 10003
。。。。。
2、分别计算在同一个resource group中的所有pid的dirty统计。
First note that the times of session 1 (28407.435) and session 2 (31343.458) are close to each other as they are both in the same resource group with dirty_rate_limit set to 6144, as compared to the times of session 3 (52727.846) and session 4 (56063.697), which are in the resource group with dirty_rate_limit set to 3072. The latter group has a slower dirty rate limit so the expected processing time is longer as is the case for sessions 3 and 4.
资源组1的两个进程,分别统计计算,然后相加,不能超过限制。
The actual dirty rate for session 1 is calculated as follows.
- The number of blocks dirtied per millisecond (ms) is 10003 blocks / 28407.435 ms, which yields 0.35212612 blocks per millisecond.
- Multiply the result by 1000 to give the number of shared blocks dirtied per second (1 second = 1000 ms), which yields 352.12612 blocks per second.
- Multiply the result by 8.192 to give the number of kilobytes dirtied per second (1 block = 8.192 kilobytes), which yields approximately 2885 kilobytes per second.
The actual dirty rate for session 2 is calculated as follows.
- The number of blocks dirtied per millisecond (ms) is 10003 blocks / 31343.458 ms, which yields 0.31914156 blocks per millisecond.
- Multiply the result by 1000 to give the number of shared blocks dirtied per second (1 second = 1000 ms), which yields 319.14156 blocks per second.
- Multiply the result by 8.192 to give the number of kilobytes dirtied per second (1 block = 8.192 kilobytes), which yields approximately 2614 kilobytes per second.
The combined dirty rate from session 1 (2885 kilobytes per second) and from session 2 (2614 kilobytes per second) yields 5499 kilobytes per second, which is near the set dirty rate limit of the resource group (6144 kilobytes per seconds).
资源组2的两个进程,分别统计计算,然后相加,不能超过限制。
The actual dirty rate for session 3 is calculated as follows.
- The number of blocks dirtied per millisecond (ms) is 10003 blocks / 52727.846 ms, which yields 0.18971001 blocks per millisecond.
- Multiply the result by 1000 to give the number of shared blocks dirtied per second (1 second = 1000 ms), which yields 189.71001 blocks per second.
- Multiply the result by 8.192 to give the number of kilobytes dirtied per second (1 block = 8.192 kilobytes), which yields approximately 1554 kilobytes per second.
The actual dirty rate for session 4 is calculated as follows.
- The number of blocks dirtied per millisecond (ms) is 10003 blocks / 56063.697 ms, which yields 0.17842205 blocks per millisecond. Multiply the result by 1000 to give the number of shared blocks dirtied per second (1 second = 1000 ms), which yields 178.42205 blocks per second.
- Multiply the result by 8.192 to give the number of kilobytes dirtied per second (1 block = 8.192 kilobytes), which yields approximately 1462 kilobytes per second.
The combined dirty rate from session 3 (1554 kilobytes per second) and from session 4 (1462 kilobytes per second) yields 3016 kilobytes per second, which is near the set dirty rate limit of the resource group (3072 kilobytes per seconds).
Thus, this demonstrates how EDB Resource Manager keeps the aggregate dirty rate of the active processes in its groups close to the dirty rate limit set for each group.
略,实际上也是算出resource group中每个进程的平均cpu消耗,相加,不超过resource group配置的cpu (核数)限制。
edb=# ALTER RESOURCE GROUP resgrp_b SET cpu_rate_limit TO .4; -- 0.4 核
ALTER RESOURCE GROUP
edb=# ALTER RESOURCE GROUP resgrp_c SET cpu_rate_limit TO .3; -- 0.4 核
ALTER RESOURCE GROUP
$ top
top - 17:45:09 up 5:23, 8 users, load average: 0.47, 0.17, 0.26
Tasks: 203 total, 4 running, 199 sleeping, 0 stopped, 0 zombie
Cpu(s): 70.2%us, 0.0%sy, 0.0%ni, 29.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0
Mem: 1025624k total, 806140k used, 219484k free, 25296k buffers
Swap: 103420k total, 13404k used, 90016k free, 374092k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29857 enterpri 20 0 195m 4820 3324 S 19.9 0.5 4:25.02 edb-postgres
28915 enterpri 20 0 195m 5900 4212 R 19.6 0.6 9:07.50 edb-postgres
29023 enterpri 20 0 195m 4744 3248 R 16.3 0.5 4:01.73 edb-postgres
11019 enterpri 20 0 195m 4120 2764 R 15.3 0.4 0:04.92 edb-postgres
2907 user 20 0 98.7m 12m 9112 S 1.3 1.2 0:56.54 vmware-user-lo
3040 user 20 0 278m 22m 14m S 1.3 2.2 4:38.73 knotify4
The two resource groups in use have CPU usage limits of 40% and 30%. The sum of the %CPU column for the first two edb-postgres processes is 39.5 (approximately 40%, which is the limit for resgrp_b) and the sum of the %CPU column for the third and fourth edb-postgres processes is 31.6 (approximately 30%, which is the limit for resgrp_c).
The sum of the CPU usage limits of the two resource groups to which these processes belong is 70%. The following output shows that the sum of the four processes borders around 70%.
$ while [[ 1 -eq 1 ]]; do top -d0.5 -b -n2 | grep edb-postgres | awk '{ SUM += $9} END { print SUM / 2 }'; done
61.8
76.4
72.6
69.55
64.55
79.95
68.55
71.25
74.85
62
74.85
76.9
72.4
65.9
74.9
68.25
查看集群中的用户、数据库级资源控制配置参数。
postgres=#
select
coalesce(datname,'ALL'),
coalesce(rolname,'ALL'),
setconfig
from pg_db_role_setting t1
left join pg_database t2
on (t1.setdatabase=t2.oid)
left join pg_roles t3
on (t1.setrole=t3.oid);
dbs | roles | setconfig
----------+----------+--------------------------
ALL | postgres | {edb_resource_group=rs1}
ALL | r1 | {edb_resource_group=rs2}
edb | ALL | {edb_resource_group=rs2}
postgres | ALL | {edb_resource_group=rs1}
(4 rows)
查看当前资源组的资源使用情况。
per_process_cpu_rate_limit 表示这个组内,当前平均每个进程的CPU核数使用限制。
per_process_dirty_rate_limit 表示这个组内,当前平均每个进程限制的产生dirty page的速度(KB/s)
随着进程数的加减,平均值会有变化。每个进程的限制并不是平均数,组内的进程可以相互抢资源,如果只有一个活跃进程在组内,则这个活跃进程可以使用这个组的所有资源。
postgres=# select * from edb_all_resource_groups ;
group_name | active_processes | cpu_rate_limit | per_process_cpu_rate_limit | dirty_rate_limit | per_process_dirty_rate_limit
------------+------------------+----------------+----------------------------+------------------+------------------------------
b | 1 | 0 | 16777216 | 0 | 16777216
abc | 0 | 32 | | 409600 |
(2 rows)
postgres=# select * from edb_all_resource_groups ;
group_name | active_processes | cpu_rate_limit | per_process_cpu_rate_limit | dirty_rate_limit | per_process_dirty_rate_limit
------------+------------------+----------------+----------------------------+------------------+------------------------------
rs1 | 1 | 4 | 16777216 | 10000 | 16777216
rs3 | 0 | 0 | | 0 |
rs2 | 0 | 2 | | 2000 |
(3 rows)
查看当前会话属于哪个资源组。
postgres=# show edb_resource_group ;
edb_resource_group
--------------------
b
(1 row)
https://www.enterprisedb.com/docs/en/10.0/EPAS_Guide_v10/toc.html
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.