## PostgreSQL database can't start up because of memory overcommit

### Author
digoal

### Date
2015-07-30

### Tags
PostgreSQL , oom , resource limits

----

## Background
You may have run into a database that refuses to start with an error like this:

```
postgres@digoal-> FATAL: XX000: could not map anonymous shared memory: Cannot allocate memory
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 3322716160 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
LOCATION: CreateAnonymousSegment, pg_shmem.c:398
```

The cause shows up in /proc/meminfo, in the CommitLimit and Committed_AS fields, which the kernel documentation describes as follows:

```
CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'),
this is the total amount of memory currently available to
be allocated on the system. This limit is only adhered to
if strict overcommit accounting is enabled (mode 2 in
'vm.overcommit_memory').
The CommitLimit is calculated with the following formula:
CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
overcommit_ratio / 100 + [total swap pages]
For example, on a system with 1G of physical RAM and 7G
of swap with a `vm.overcommit_ratio` of 30 it would
yield a CommitLimit of 7.3G.
For more details, see the memory overcommit documentation
in vm/overcommit-accounting.
Committed_AS: The amount of memory presently allocated on the system.
The committed memory is a sum of all of the memory which
has been allocated by processes, even if it has not been
"used" by them as of yet. A process which malloc()'s 1G
of memory, but only touches 300M of it will show up as
using 1G. This 1G is memory which has been "committed" to
by the VM and can be used at any time by the allocating
application. With strict overcommit enabled on the system
(mode 2 in 'vm.overcommit_memory'),allocations which would
exceed the CommitLimit (detailed above) will not be permitted.
This is useful if one needs to guarantee that processes will
not fail due to lack of memory once that memory has been
successfully allocated.
```
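
On a live system these two fields can be read straight from /proc/meminfo. The sketch below is a minimal illustration, not part of the original article: the helper names `parse_meminfo`, `read_meminfo` and `commit_headroom_kb` are hypothetical, and the headroom figure only matters under vm.overcommit_memory=2, the only mode in which CommitLimit is enforced.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: integer value} dict.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Field:" prefix
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


def commit_headroom_kb(fields):
    """kB that can still be committed before Committed_AS reaches CommitLimit.

    Only meaningful with vm.overcommit_memory=2; a negative value means
    Committed_AS is already above the limit (possible in modes 0 and 1).
    """
    return fields["CommitLimit"] - fields["Committed_AS"]


# Usage with the values shown later in this article:
sample = "CommitLimit:     2005788 kB\nCommitted_AS:     132384 kB\n"
print(commit_headroom_kb(parse_meminfo(sample)))  # 1873404
```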
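
How these fields are enforced depends on the value of vm.overcommit_memory:

When vm.overcommit_memory=0 (the default), the kernel uses heuristic overcommit handling: obvious overcommits of address space are refused, and root is allowed to allocate slightly more memory than other users.

When vm.overcommit_memory=1, the kernel always overcommits.

When vm.overcommit_memory=2, the kernel does not overcommit: Committed_AS is not allowed to exceed CommitLimit.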
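
The three modes can be summarized as a simplified decision function. This is a hypothetical sketch, not kernel code, and it models only what the text above states: mode 0's heuristic (which depends on the live state of the system) is not modeled at all, and the mode 2 check ignores the small reserves the kernel holds back, so a request right at the boundary may still be refused.

```python
def overcommit_allows(mode, request_kb, commit_limit_kb, committed_as_kb):
    """Roughly decide whether a new commitment of request_kb would be granted.

    Returns True/False for modes 1 and 2, and None for mode 0, whose
    heuristic is not modeled in this sketch.
    """
    if mode == 1:
        return True  # always overcommit
    if mode == 2:
        # strict accounting: Committed_AS may not exceed CommitLimit
        return committed_as_kb + request_kb <= commit_limit_kb
    if mode == 0:
        return None  # heuristic mode: not modeled here
    raise ValueError(f"unknown vm.overcommit_memory value: {mode}")


# PostgreSQL's 3322716160-byte request from the error above, in kB,
# checked against the CommitLimit / Committed_AS values from this article:
request_kb = 3322716160 // 1024
print(overcommit_allows(2, request_kb, 2005788, 132384))  # False
print(overcommit_allows(1, request_kb, 2005788, 132384))  # True
```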

How the commit limit is calculated:

```
The CommitLimit is calculated with the following formula:
CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
overcommit_ratio / 100 + [total swap pages]
For example, on a system with 1G of physical RAM and 7G
of swap with a `vm.overcommit_ratio` of 30 it would
yield a CommitLimit of 7.3G.
[root@digoal postgresql-9.4.4]# free
             total       used       free     shared    buffers     cached
Mem:       1914436     713976    1200460      72588      32384     529364
-/+ buffers/cache:     152228    1762208
Swap:      1048572     542080     506492
[root@digoal ~]# cat /proc/meminfo |grep Commit
CommitLimit: 2005788 kB
Committed_AS: 132384 kB
```

The CommitLimit of roughly 2 GB in this example comes from that formula. The numbers match vm.overcommit_ratio at its default value of 50: half of the 1914436 kB of RAM plus the 1048572 kB of swap comes to about 2005788 kB. The 3322716160-byte (3244840 kB) request in the startup error above is larger than this entire limit.
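
The figure can be reproduced from the formula. This sketch rests on stated assumptions: 4 kB base pages (the x86/x86_64 default), no huge TLB pages reserved, and vm.overcommit_ratio at 50. It works in whole pages with integer division, mirroring the kernel's page-based arithmetic, which is why it lands on exactly 2005788 kB rather than the 2005790 kB a straight kB calculation gives. The function name `commit_limit_kb` is hypothetical.

```python
PAGE_KB = 4  # assumption: 4 kB base pages (x86/x86_64 default)


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, hugetlb_kb=0):
    """CommitLimit = (RAM pages - huge TLB pages) * ratio / 100 + swap pages.

    Computed in whole pages with integer division, then converted back to kB.
    """
    ram_pages = mem_total_kb // PAGE_KB
    huge_pages = hugetlb_kb // PAGE_KB
    swap_pages = swap_total_kb // PAGE_KB
    limit_pages = (ram_pages - huge_pages) * overcommit_ratio // 100 + swap_pages
    return limit_pages * PAGE_KB


# Totals from the `free` output above (kB).
limit = commit_limit_kb(mem_total_kb=1914436, swap_total_kb=1048572)
print(f"CommitLimit: {limit} kB")  # CommitLimit: 2005788 kB

# Size from the HINT line of the startup error, converted to kB.
request_kb = 3322716160 // 1024
print(request_kb, request_kb > limit)  # 3244840 True

# The documentation's example: 1G RAM, 7G swap, ratio 30 -> about 7.3G.
doc = commit_limit_kb(1024 * 1024, 7 * 1024 * 1024, overcommit_ratio=30)
print(round(doc / (1024 * 1024), 1))  # 7.3
```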
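
Overcommit exists because memory obtained with malloc() is usually not used right away. If several processes each request a large block at the same time and overcommit is forbidden, some of those requests may fail even though physical memory is actually still available. The Linux kernel therefore offers several policies. Mode 2 is the more dependable, conservative choice: as the kernel documentation puts it, a process receives an error at allocation time instead of being killed later while accessing its pages. Mode 1 is riskier, because it can lead to out-of-memory (OOM) situations.

So when the database fails to start for this reason, either reduce the amount of memory it requests (for example, by lowering shared_buffers or max_connections) or change the overcommit policy.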
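
To put rough numbers on those two options for this machine, the sketch below uses the figures from this article (the 3244840 kB request, 1914436 kB of RAM, 1048572 kB of swap, Committed_AS of 132384 kB and CommitLimit of 2005788 kB). Both helpers are hypothetical and only apply under vm.overcommit_memory=2; they ignore page rounding and the small reserves the kernel keeps back, so treat the results as approximate lower bounds.

```python
def required_reduction_kb(request_kb, commit_limit_kb, committed_as_kb):
    """How much smaller the request must be to fit under the current limit."""
    return max(0, committed_as_kb + request_kb - commit_limit_kb)


def min_overcommit_ratio(request_kb, mem_total_kb, swap_total_kb, committed_as_kb):
    """Smallest integer vm.overcommit_ratio satisfying
    Committed_AS + request <= RAM * ratio / 100 + swap (huge pages ignored).
    """
    needed_from_ram_kb = committed_as_kb + request_kb - swap_total_kb
    if needed_from_ram_kb <= 0:
        return 0  # swap alone already covers the request
    # integer ceiling of needed_from_ram_kb * 100 / mem_total_kb
    return -(-needed_from_ram_kb * 100 // mem_total_kb)


request_kb = 3322716160 // 1024
print(required_reduction_kb(request_kb, 2005788, 132384))          # 1371436
print(min_overcommit_ratio(request_kb, 1914436, 1048572, 132384))  # 122
```

A ratio above 100 means the request plus existing commitments is larger than RAM and swap combined.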

## References
1\. kernel-doc-2.6.32/Documentation/filesystems/proc.txt

```
MemTotal: Total usable ram (i.e. physical ram minus a few reserved
bits and the kernel binary code)
MemFree: The sum of LowFree+HighFree
MemAvailable: An estimate of how much memory is available for starting new
applications, without swapping. Calculated from MemFree,
SReclaimable, the size of the file LRU lists, and the low
watermarks in each zone.
The estimate takes into account that the system needs some
page cache to function well, and that not all reclaimable
slab will be reclaimable, due to items being in use. The
impact of those factors will vary from system to system.
This line is only reported if sysctl vm.meminfo_legacy_layout = 0
Buffers: Relatively temporary storage for raw disk blocks
shouldn't get tremendously large (20MB or so)
Cached: in-memory cache for files read from the disk (the
pagecache). Doesn't include SwapCached
SwapCached: Memory that once was swapped out, is swapped back in but
still also is in the swapfile (if memory is needed it
doesn't need to be swapped out AGAIN because it is already
in the swapfile. This saves I/O)
Active: Memory that has been used more recently and usually not
reclaimed unless absolutely necessary.
Inactive: Memory which has been less recently used. It is more
eligible to be reclaimed for other purposes
HighTotal:
HighFree: Highmem is all memory above ~860MB of physical memory
Highmem areas are for use by userspace programs, or
for the pagecache. The kernel must use tricks to access
this memory, making it slower to access than lowmem.
LowTotal:
LowFree: Lowmem is memory which can be used for everything that
highmem can be used for, but it is also available for the
kernel's use for its own data structures. Among many
other things, it is where everything from the Slab is
allocated. Bad things happen when you're out of lowmem.
SwapTotal: total amount of swap space available
SwapFree: Memory which has been evicted from RAM, and is temporarily
on the disk
Dirty: Memory which is waiting to get written back to the disk
Writeback: Memory which is actively being written back to the disk
AnonPages: Non-file backed pages mapped into userspace page tables
AnonHugePages: Non-file backed huge pages mapped into userspace page tables
Mapped: files which have been mmaped, such as libraries
Slab: in-kernel data structures cache
SReclaimable: Part of Slab, that might be reclaimed, such as caches
SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
PageTables: amount of memory dedicated to the lowest level of page
tables.
NFS_Unstable: NFS pages sent to the server, but not yet committed to stable
storage
Bounce: Memory used for block device "bounce buffers"
WritebackTmp: Memory used by FUSE for temporary writeback buffers
CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'),
this is the total amount of memory currently available to
be allocated on the system. This limit is only adhered to
if strict overcommit accounting is enabled (mode 2 in
'vm.overcommit_memory').
The CommitLimit is calculated with the following formula:
CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
overcommit_ratio / 100 + [total swap pages]
For example, on a system with 1G of physical RAM and 7G
of swap with a `vm.overcommit_ratio` of 30 it would
yield a CommitLimit of 7.3G.
For more details, see the memory overcommit documentation
in vm/overcommit-accounting.
Committed_AS: The amount of memory presently allocated on the system.
The committed memory is a sum of all of the memory which
has been allocated by processes, even if it has not been
"used" by them as of yet. A process which malloc()'s 1G
of memory, but only touches 300M of it will show up as
using 1G. This 1G is memory which has been "committed" to
by the VM and can be used at any time by the allocating
application. With strict overcommit enabled on the system
(mode 2 in 'vm.overcommit_memory'),allocations which would
exceed the CommitLimit (detailed above) will not be permitted.
This is useful if one needs to guarantee that processes will
not fail due to lack of memory once that memory has been
successfully allocated.
VmallocTotal: total size of vmalloc memory area
VmallocUsed: amount of vmalloc area which is used
VmallocChunk: largest contiguous block of vmalloc area which is free
```

2\. kernel-doc-2.6.32/Documentation/vm/overcommit-accounting

```
The Linux kernel supports the following overcommit handling modes

0 - Heuristic overcommit handling. Obvious overcommits of
    address space are refused. Used for a typical system. It
    ensures a seriously wild allocation fails while allowing
    overcommit to reduce swap usage. root is allowed to
    allocate slighly more memory in this mode. This is the
    default.

1 - Always overcommit. Appropriate for some scientific
    applications.

2 - Don't overcommit. The total address space commit
    for the system is not permitted to exceed swap + a
    configurable amount (default is 50%) of physical RAM.
    Depending on the amount you use, in most situations
    this means a process will not be killed while accessing
    pages but will receive errors on memory allocation as
    appropriate.

The overcommit policy is set via the sysctl `vm.overcommit_memory'.

The overcommit amount can be set via `vm.overcommit_ratio' (percentage)
or `vm.overcommit_kbytes' (absolute value).

The current overcommit limit and amount committed are viewable in
/proc/meminfo as CommitLimit and Committed_AS respectively.

Gotchas
-------

The C language stack growth does an implicit mremap. If you want absolute
guarantees and run close to the edge you MUST mmap your stack for the
largest size you think you will need. For typical stack usage this does
not matter much but it's a corner case if you really really care

In mode 2 the MAP_NORESERVE flag is ignored.


How It Works
------------

The overcommit is based on the following rules

For a file backed map
    SHARED or READ-only - 0 cost (the file is the map not swap)
    PRIVATE WRITABLE - size of mapping per instance

For an anonymous or /dev/zero map
    SHARED - size of mapping
    PRIVATE READ-only - 0 cost (but of little use)
    PRIVATE WRITABLE - size of mapping per instance

Additional accounting
    Pages made writable copies by mmap
    shmfs memory drawn from the same pool

Status
------

o We account mmap memory mappings
o We account mprotect changes in commit
o We account mremap changes in size
o We account brk
o We account munmap
o We report the commit status in /proc
o Account and check on fork
o Review stack handling/building on exec
o SHMfs accounting
o Implement actual limit enforcement

To Do
-----
o Account ptrace pages (this is hard)
```
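
The accounting rules under "How It Works" can be restated as a small function. This is an illustrative sketch of that table, not kernel code; the function name and its parameters are hypothetical. Assuming, as the startup error's wording suggests, that CreateAnonymousSegment maps an anonymous shared region, these rules charge the whole segment against the commit limit.

```python
def commit_cost(size, file_backed, shared, writable):
    """Commit charge for one new mapping of `size` bytes, following the
    overcommit-accounting rules for file-backed and anonymous maps."""
    if file_backed:
        if shared or not writable:
            return 0  # the file backs the pages, not swap
        return size  # PRIVATE WRITABLE: charged per instance
    # anonymous or /dev/zero mapping
    if shared:
        return size  # SHARED: charged the size of the mapping
    if not writable:
        return 0  # PRIVATE READ-only: free, but of little use
    return size  # PRIVATE WRITABLE: charged per instance


# An anonymous shared segment the size of PostgreSQL's request above:
print(commit_cost(3322716160, file_backed=False, shared=True, writable=True))  # 3322716160
# A file mapped private and read-only (e.g. library text) costs nothing:
print(commit_cost(1 << 20, file_backed=True, shared=False, writable=False))    # 0
```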