
Commit 9083861

committed
new doc
1 parent d8a266e commit 9083861

File tree

4 files changed: +336 -0 lines changed

201803/20180325_02.md

Lines changed: 68 additions & 0 deletions
@@ -393,6 +393,74 @@ https://lwn.net/Articles/376606/
```
yum install -y libhugetlbfs*
```

### Mounting hugetlbfs directly
https://www.ibm.com/developerworks/cn/linux/l-cn-hugetlb/index.html

The example below is taken from the documentation shipped with the Linux kernel source (Documentation/vm/hugetlbpage.txt). Before hugetlbfs can be used, the kernel must be built (make menuconfig) with the CONFIG_HUGETLB_PAGE and CONFIG_HUGETLBFS options enabled; both can be found under the File systems menu of the kernel configuration.
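On most distribution kernels these options are already enabled. A quick sanity check of the running kernel, before rebuilding anything, is sketched below (these are standard procfs locations):

```
# is hugetlbfs support compiled into the running kernel?
grep hugetlbfs /proc/filesystems

# current default huge page size and reservation counters
grep -i huge /proc/meminfo
```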
Once the kernel has been built and booted successfully, mount the hugetlbfs pseudo filesystem onto a directory of the root filesystem so that it becomes accessible. The command is:
```
mount none /mnt/huge -t hugetlbfs
```
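Note that mounting hugetlbfs does not by itself reserve any huge pages (reservation is covered earlier in this article); if HugePages_Total is 0, the mmap() call in the listing further below will typically fail with ENOMEM. A minimal reservation, assuming the default 2MB huge page size, looks like:

```
# reserve 128 x 2MB huge pages (256MB); adjust the count as needed
echo 128 > /proc/sys/vm/nr_hugepages

# confirm the reservation took effect
grep HugePages_Total /proc/meminfo
```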
From then on, any file created under /mnt/huge/ is mapped into memory using 2MB as the basic paging unit. Note that files in hugetlbfs do not support the read/write system calls (such as read() or write()); they are normally accessed through memory mapping. To better illustrate the use of huge pages, an example follows; it is also taken from the kernel documentation mentioned above, slightly simplified.

Calling read or write on them directly fails:
```
mount none /mnt -t hugetlbfs

cd /mnt

[root@pg11-test mnt]# dd if=/dev/zero of=./test.img bs=1M count=1000
dd: error writing ‘./test.img’: Invalid argument
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000605269 s, 0.0 kB/s
```
Instead, access the files with mmap, as in the listing below:
Listing 1. A Linux huge page application example

```
#include <fcntl.h>
#include <stdio.h>      /* perror */
#include <unistd.h>     /* close, unlink */
#include <sys/mman.h>
#include <errno.h>

#define MAP_LENGTH (10*1024*1024)   /* must be a multiple of the huge page size */

int main()
{
    int fd;
    void *addr;

    /* create a file in the hugetlbfs mount */
    fd = open("/mnt/huge/test", O_CREAT | O_RDWR, 0755);
    if (fd < 0) {
        perror("Err: ");
        return -1;
    }

    /* map the file into the address space of the current process */
    addr = mmap(0, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        perror("Err: ");
        close(fd);
        unlink("/mnt/huge/test");
        return -1;
    }

    /* from now on, application data can be stored on huge pages via addr */

    munmap(addr, MAP_LENGTH);
    close(fd);
    unlink("/mnt/huge/test");
    return 0;
}
```
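To try the listing, save it to a file (the name hugepage_test.c below is only an illustration) and build it with gcc:

```
gcc -o hugepage_test hugepage_test.c
./hugepage_test
```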
## Summary
1. View and modify the huge page sizes currently supported by Linux.

201902/20190211_03.md

Lines changed: 266 additions & 0 deletions
@@ -0,0 +1,266 @@
## Using Linux memory filesystems - tmpfs, ramfs, shmfs

### Author
digoal

### Date
2019-02-11

### Tags
PostgreSQL , hugetlbfs , hugepage , memory filesystem , ramfs , tmpfs , shmfs

----

## Background
When running tests on a machine with poor IO devices, you can put the data on an in-memory filesystem, so that IO overhead does not distort the test results.

Usage is straightforward:
### tmpfs or shmfs
Mount a shmfs of a given size on /dev/shm and set the correct permissions.

For tmpfs you do not need to specify a size. Memory allocated by tmpfs or shmfs is pageable.

For example, to mount shmfs:
```
# mount -t shm shmfs -o size=20g /dev/shm

Edit /etc/fstab:

shmfs /dev/shm shm size=20g 0 0
```
Or, to mount tmpfs:
```
# mount -t tmpfs tmpfs /dev/shm

Edit /etc/fstab:

none /dev/shm tmpfs defaults 0 0
```
### ramfs
ramfs is similar to shmfs, except that its pages are neither pageable nor swappable.

This approach provides the commonly desired effect. ramfs is created by:
```
umount /dev/shm

mount -t ramfs ramfs /dev/shm
```
## Examples
```
[root@pg11-test ~]# mkdir /mnt/tmpfs
[root@pg11-test ~]# mkdir /mnt/ramfs
```
1. tmpfs
```
mount -t tmpfs tmpfs /mnt/tmpfs -o size=10G,noatime,nodiratime,rw
mkdir /mnt/tmpfs/a
chmod 777 /mnt/tmpfs/a
```
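The size=10G option is a hard cap on this tmpfs. A quick way to confirm it is enforced is sketched below (fill.img is just an illustrative file name; dd is expected to stop with "No space left on device" once the cap is reached):

```
# attempt to write 20GB into the 10GB tmpfs; dd stops at the cap
dd if=/dev/zero of=/mnt/tmpfs/a/fill.img bs=1M count=20480
rm -f /mnt/tmpfs/a/fill.img
```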
2. ramfs
```
mount -t ramfs ramfs /mnt/ramfs -o noatime,nodiratime,rw,data=writeback,nodelalloc,nobarrier
mkdir /mnt/ramfs/a
chmod 777 /mnt/ramfs/a
```
ramfs cannot be size-limited at mount time; even if you pass a size option it has no effect. The mount point also does not appear in the df output, although it is in fact mounted.
```
[root@pg11-test ~]# mount
tmpfs on /mnt/tmpfs type tmpfs (rw,noatime,nodiratime,size=10485760k)
ramfs on /mnt/ramfs type ramfs (rw,noatime,nodiratime,data=writeback,nodelalloc,nobarrier)

[root@pg11-test ~]# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/vda1                  197G   17G  171G   9% /
devtmpfs                   252G     0  252G   0% /dev
tmpfs                      252G  936K  252G   1% /dev/shm
tmpfs                      252G  676K  252G   1% /run
tmpfs                      252G     0  252G   0% /sys/fs/cgroup
/dev/mapper/vgdata01-lv03  4.0T  549G  3.5T  14% /data03
/dev/mapper/vgdata01-lv02  4.0T  335G  3.7T   9% /data02
/dev/mapper/vgdata01-lv01  4.0T  1.5T  2.6T  37% /data01
tmpfs                       51G     0   51G   0% /run/user/0
/dev/mapper/vgdata01-lv04  2.0T  621G  1.3T  32% /data04
tmpfs                       10G     0   10G   0% /mnt/tmpfs
```
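Because ramfs is neither size-capped nor visible in df, a runaway writer can consume all of the host's memory. A simple way to keep an eye on it, using the mount point created above:

```
# df reports nothing for ramfs, so check its actual usage directly
du -sh /mnt/ramfs

# and watch overall memory consumption on the host
free -g
```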
### Memory filesystem performance
#### Use PostgreSQL's fsync test tool, pg_test_fsync, to measure fsync performance on the memory filesystems.
```
su - digoal

digoal@pg11-test-> pg_test_fsync -f /mnt/tmpfs/a/1
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                               n/a*
        fdatasync                        1137033.436 ops/sec       1 usecs/op
        fsync                            1146431.736 ops/sec       1 usecs/op
        fsync_writethrough                          n/a
        open_sync                                   n/a*
* This file system and its mount options do not support direct
  I/O, e.g. ext4 in journaled mode.

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                               n/a*
        fdatasync                         622763.705 ops/sec       2 usecs/op
        fsync                             625990.998 ops/sec       2 usecs/op
        fsync_writethrough                          n/a
        open_sync                                   n/a*
* This file system and its mount options do not support direct
  I/O, e.g. ext4 in journaled mode.

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
         1 * 16kB open_sync write                   n/a*
         2 *  8kB open_sync writes                  n/a*
         4 *  4kB open_sync writes                  n/a*
         8 *  2kB open_sync writes                  n/a*
        16 *  1kB open_sync writes                  n/a*

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
        write, fsync, close               317779.892 ops/sec       3 usecs/op
        write, close, fsync               317769.037 ops/sec       3 usecs/op

Non-sync'ed 8kB writes:
        write                             529490.541 ops/sec       2 usecs/op

digoal@pg11-test-> pg_test_fsync -f /mnt/ramfs/a/1
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                               n/a*
        fdatasync                        1146515.453 ops/sec       1 usecs/op
        fsync                            1149912.760 ops/sec       1 usecs/op
        fsync_writethrough                          n/a
        open_sync                                   n/a*
* This file system and its mount options do not support direct
  I/O, e.g. ext4 in journaled mode.

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                               n/a*
        fdatasync                         621456.930 ops/sec       2 usecs/op
        fsync                             624811.200 ops/sec       2 usecs/op
        fsync_writethrough                          n/a
        open_sync                                   n/a*
* This file system and its mount options do not support direct
  I/O, e.g. ext4 in journaled mode.

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
         1 * 16kB open_sync write                   n/a*
         2 *  8kB open_sync writes                  n/a*
         4 *  4kB open_sync writes                  n/a*
         8 *  2kB open_sync writes                  n/a*
        16 *  1kB open_sync writes                  n/a*

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
        write, fsync, close               314754.770 ops/sec       3 usecs/op
        write, close, fsync               314509.045 ops/sec       3 usecs/op

Non-sync'ed 8kB writes:
        write                             517299.869 ops/sec       2 usecs/op
```
#### Local disk performance, for comparison:
```
digoal@pg11-test-> pg_test_fsync -f /data01/digoal/1
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                      46574.176 ops/sec      21 usecs/op
        fdatasync                          40183.743 ops/sec      25 usecs/op
        fsync                              36875.852 ops/sec      27 usecs/op
        fsync_writethrough                         n/a
        open_sync                          42927.560 ops/sec      23 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                      17121.111 ops/sec      58 usecs/op
        fdatasync                          26438.641 ops/sec      38 usecs/op
        fsync                              24562.907 ops/sec      41 usecs/op
        fsync_writethrough                         n/a
        open_sync                          15698.199 ops/sec      64 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
         1 * 16kB open_sync write          28793.172 ops/sec      35 usecs/op
         2 *  8kB open_sync writes         15720.156 ops/sec      64 usecs/op
         4 *  4kB open_sync writes         10007.818 ops/sec     100 usecs/op
         8 *  2kB open_sync writes          5698.259 ops/sec     175 usecs/op
        16 *  1kB open_sync writes          3116.232 ops/sec     321 usecs/op

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
        write, fsync, close                33399.473 ops/sec      30 usecs/op
        write, close, fsync                33216.001 ops/sec      30 usecs/op

Non-sync'ed 8kB writes:
        write                             376584.982 ops/sec       3 usecs/op
```
The performance difference is obvious: on the memory filesystems a single 8kB fsync runs at over 1.1 million ops/sec, roughly 30 times faster than the ~37,000 ops/sec achieved on the local disk.
## Other notes
hugetlbfs can also be mounted as a filesystem backed by huge pages, but it does not support the read/write interfaces; its files must be accessed via mmap.

See

https://www.ibm.com/developerworks/cn/linux/l-cn-hugetlb/index.html
## References
https://docs.oracle.com/cd/E11882_01/server.112/e10839/appi_vlm.htm#UNXAR397

http://www.cnblogs.com/jintianfree/p/3993893.html

https://lwn.net/Articles/376606/

https://www.ibm.com/developerworks/cn/linux/l-cn-hugetlb/index.html

[《PostgreSQL Huge Page 使用建议 - 大内存主机、实例注意》](../201803/20180325_02.md)
## [digoal's PostgreSQL article index](https://github.com/digoal/blog/blob/master/README.md "22709685feb7cab07d30f30387f0a9ae")

## [Get a free Alibaba Cloud RDS PostgreSQL instance or ECS VM](https://free.aliyun.com/ "57258f76c37864c6e6d23383d05714ea")

201902/readme.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@

### Article list
----
##### 20190211_03.md [《linux 内存文件系统使用 - tmpfs, ramfs, shmfs》](20190211_03.md)
##### 20190211_02.md [[] PostgreSQL 轻量级周边工具 pg_lightool》](20190211_02.md)
##### 20190211_01.md [[] PG wal日志解析工具功能增强并更名为WalMiner》](20190211_01.md)
##### 20190205_01.md [《KEY 管理 - kms (Key management services) , hsm (hardware security modules) , hsm aas》](20190205_01.md)

README.md

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ digoal's|PostgreSQL|文章|归类

### All documents
----
##### 201902/20190211_03.md [《linux 内存文件系统使用 - tmpfs, ramfs, shmfs》](201902/20190211_03.md)
##### 201902/20190211_02.md [[] PostgreSQL 轻量级周边工具 pg_lightool》](201902/20190211_02.md)
##### 201902/20190211_01.md [[] PG wal日志解析工具功能增强并更名为WalMiner》](201902/20190211_01.md)
##### 201902/20190205_01.md [《KEY 管理 - kms (Key management services) , hsm (hardware security modules) , hsm aas》](201902/20190205_01.md)
