Author: | Karel Ha |
---|---|
Contact: | [email protected], [email protected] |
Created on: | Sat Oct 31 2015 |
Move to a folder in AFS:
mathemage@mathemage-TTL-TEKNOPRO:~$ cd /afs/cern.ch/user/k/kha/ mathemage@mathemage-TTL-TEKNOPRO:/afs/cern.ch/user/k/kha$ mkdir bufread-lhcb
and clone the created repo on GitHub:
mathemage@mathemage-TTL-TEKNOPRO:/afs/cern.ch/user/k/kha$ git clone [email protected]:mathemage/bufread-lhcb.git
Implement buffered read for LHCb DAQ system. Current system uses blocksizes, which are suboptimal in terms of throughput. It has been discovered that the ideal blocksize for disk-read is 16 MB. However, system call for read() uses different sizes.
Therefore, it's more advisable to load from a disk into a temporary buffer in memory and then read() from this buffer instead. The size of the buffer should be 16 MB in order to exploit optimal disk-read speed.
In case of a read() that overlaps the end of buffer, a secondary buffer of 16 MB is available. Once it's reached, the new data is loaded into the original (primary) buffer and the roles of the primary and the secondary buffers are swapped.
Code has been written in C language. It compiles to .so file. With LD_PRELOAD trick it can intercept system calls open(), close() and read(), which are used by cp command.
- MIN() = preprocessor macro for minimum of two values
- *orig_open_f_type, *orig_close_f_type resp. *orig_read_f_type = pointer to functions data type, for retrieving original system version of syscalls open(), close() resp. read()
- init_buffers() = allocate and initialize memory (array of arrays) for buffers (done only once in the program)
- open() = intercepted version of open(); primary and secondary buffers (on the index of file descriptor) are allocated and pre-filled with data from file
- close() = intercepted version of close(); primary and secondary buffers (corresponding to the file descriptor) are deallocated
- read() = intercepted version of read(); see section Algorithm for the explanation
Compile by make, make all or make libbufread. This creates the shared library libbufread.so.
Remove temporary and *.so files by make clean.
The flags -fPIC and -ldl are needed to store the original versions of system calls.
bytes_to_load, btl = the maximum number of bytes that are possible to read at this moment, at the current step. It is set to minimum of count, bs - cur_pos (number of bytes left to the end of the primary buffer) and ba, because it may not be greater than any of these three values. The three diagrams of buffers shows the three possible scenarios:
- the one leading from count is the case, where count is the minimum of all three values. Note that remaining bytes now reside only in the primary buffer.
- the one leading from bs - cur_pos is the case, where bs - cur_pos is the minimum of all three values.
- If count or ba are strictly greater, then cur_pos will reach the beginning of the secondary buffer. In that case, the roles of primary and secondary buffer will be swapped and the secondary one will be pre-filled with data from the file.
- the one leading from ba is the case, where ba is the minimum of all three values.
- If count is strictly greater, then there's not enough bytes until the end of file and via the bytes_to_load this will reflect in the total bytes_read (there have been less bytes read than requested by read().
- If bs - cur_pos is strictly greater, then all bytes available reside in the current (primary) buffer. That's why the secondary buffer is crossed out.
count = the number of requested bytes left to read. This number is decreased accordingly with every load from primary buffer to the final buffer buf.
bs, BLOCKSIZE = the size of the primary and also secondary buffer. It is set to 16 MB.
cur_pos, cur_pos[], cur_pos[fd], current_positions[], current_positions[fd], in the source code also called current_positions[] = the number of bytes from the beginning of the primary buffer that have been already used / loaded. It advances every time the data are copied from the primary buffer to the final buffer buf. There's one integer variable per each file descriptor fd.
ba, bytes_available[], bytes_available[fd] = "bytes available". The number of unread bytes from the file that have been loaded to the primary and secondary buffer, but they haven't been copied to the final buffer buf yet. ba doesn't have to be a multiple of bs, in case the end of file is reached (valid bytes from the file don't reside the whole buffer, see the diagram where ba is the minimum).
buf = the final buffer given as an argument to read()
prim[], prim[fd], primary_buffers[], primary_buffers[fd] = the primary buffer corresponding to the file descriptor fd
sec[], sec[fd], secondary_buffers[], secondary_buffers[fd] = the secondary buffer corresponding to the file descriptor fd
bytes_read = the total number of bytes copied from the primary and secondary buffers to buf, altogether across all steps. It usually equals to count, unless the end of file is reached prematurely (in which case it equals to the filesize). bytes_read serves as the return value of read()
swap_buffers(), swap_buffers(fd) = exchange the role of the primary and secondary buffer by swapping their pointers
orig_read() = the original system version of the syscall read()
request to open a new file
- open the file using the original syscall open() and retrieve the file descriptor fd from the return value
- iterate over lines of whitelist.conf (i.e. a configuration file containing all directories, in which bufread is allowed)
- if the file is located in a subdirectory of some directory listed in whitelist.conf, allocate primary and secondary buffers of BLOCKSIZE size, on indices corresponding to fd
- initialize the corresponding current positions to 0
- pre-fill both buffers with initial data from the file, increasing the corresponding bytes_available value accordingly
- return fd as the file descriptor
request to read count bytes from a file descriptor fd to buffer buf
- if pointers to both buffers are not NULL (i.e. the buffers have been allocated), continue with bufread version of read(). Otherwise use the original system call read().
- until count is non-zero and there are still bytes available from the file, repeat:
- calculate bytes_to_load (see subsection Legend)
- copy bytes_to_load bytes from primary buffer (on correct reading position, determined by current_positions) to buffer buf
- shift bytes_to_load forward in buffer buf (pointer arithmetics) and in the primary buffer (by increasing current_positions[fd])
- decrease number of bytes_available and count by (already processed) bytes_to_load bytes
- increase number of bytes_read by (already read) bytes_to_load bytes
- if the secondary buffer is reached:
- swap the pointers to buffers (and thus their respective roles)
- pre-fill the secondary buffer with data from the file
- increase bytes_available accordingly by number of bytes read from the file
- re-initialize current_positions to 0
- return bytes_read
request to close the file
- close the file using the original syscall close() and retrieve the return value return_result
- deallocate both buffers provided the corresponding pointers are not NULL
- return return_result
- store abs_pathname = pathname with prepended current if pathname is a relative path
- for each line dir of whitelist.conf
- sanitize dir
- remove the trailing newline
- append a slash (if it's missing)
- prepend the current directory (if dir is a relative path)
- check if dir is a prefix of the tested abs_pathname -> if so, break the loop
- sanitize dir
- close files, clean up and return the boolean result "found"
Testing is done by copying chosen input files and checking, whether the output files differ from them. Using LD_PRELOAD trick, the regular read() is intercepted and the bufread version is used instead.
This script generates several input files with random content using /dev/urandom. The size of the files are multiples (see $factors in the script) of the chosen blocksize $bs (set to 16 MB). The writing permission are removed at the end, since these files serve as exclusively input files.
An example of a run:
mathemage@mathemage-TTL-TEKNOPRO:/afs/cern.ch/user/k/kha/bufread-lhcb/src$ ./gen-io-testfiles.sh head -c 0 < /dev/urandom > /tmp/io-testfiles/0-B.in && chmod a-w /tmp/io-testfiles/0-B.in head -c 8388608 < /dev/urandom > /tmp/io-testfiles/8388608-B.in && chmod a-w /tmp/io-testfiles/8388608-B.in head -c 16777216 < /dev/urandom > /tmp/io-testfiles/16777216-B.in && chmod a-w /tmp/io-testfiles/16777216-B.in head -c 25165824 < /dev/urandom > /tmp/io-testfiles/25165824-B.in && chmod a-w /tmp/io-testfiles/25165824-B.in head -c 33554432 < /dev/urandom > /tmp/io-testfiles/33554432-B.in && chmod a-w /tmp/io-testfiles/33554432-B.in head -c 41943040 < /dev/urandom > /tmp/io-testfiles/41943040-B.in && chmod a-w /tmp/io-testfiles/41943040-B.in head -c 50331648 < /dev/urandom > /tmp/io-testfiles/50331648-B.in && chmod a-w /tmp/io-testfiles/50331648-B.in head -c 58720256 < /dev/urandom > /tmp/io-testfiles/58720256-B.in && chmod a-w /tmp/io-testfiles/58720256-B.in head -c 67108864 < /dev/urandom > /tmp/io-testfiles/67108864-B.in && chmod a-w /tmp/io-testfiles/67108864-B.in head -c 75497472 < /dev/urandom > /tmp/io-testfiles/75497472-B.in && chmod a-w /tmp/io-testfiles/75497472-B.in head -c 1677721600 < /dev/urandom > /tmp/io-testfiles/1677721600-B.in && chmod a-w /tmp/io-testfiles/1677721600-B.in head -c 16777216000 < /dev/urandom > /tmp/io-testfiles/16777216000-B.in && chmod a-w /tmp/io-testfiles/16777216000-B.in head -c 52707134 < /dev/urandom > /tmp/io-testfiles/52707134-B.in && chmod a-w /tmp/io-testfiles/52707134-B.in head -c 527071340 < /dev/urandom > /tmp/io-testfiles/527071340-B.in && chmod a-w /tmp/io-testfiles/527071340-B.in head -c 5270713401 < /dev/urandom > /tmp/io-testfiles/5270713401-B.in && chmod a-w /tmp/io-testfiles/5270713401-B.in
This script
- recompiles the library.
- For every input file in the given directory $dir
- creates an empty output file with *.out extension.
- adjust permission rights accordingly.
- intercept read() by adding the bufread library to LD_PRELOAD variable.
- copy from the input file to the output file.
- compares the input file and the output file by diff and exits with failure if they differ.
An example of a run:
mathemage@mathemage-TTL-TEKNOPRO:/afs/cern.ch/user/k/kha/bufread-lhcb/src$ ./test-bufread.sh rm -f *.o *.so gcc -Wall -g -O2 -fPIC -ldl -shared -I../include libbufread.c -o libbufread.so Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/0-B.in /tmp/io-testfiles/0-B.out -f && diff /tmp/io-testfiles/0-B.in /tmp/io-testfiles/0-B.out Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/16777216000-B.in /tmp/io-testfiles/16777216000-B.out -f && diff /tmp/io-testfiles/16777216000-B.in /tmp/io-testfiles/16777216000-B.out Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/1677721600-B.in /tmp/io-testfiles/1677721600-B.out -f && diff /tmp/io-testfiles/1677721600-B.in /tmp/io-testfiles/1677721600-B.out Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/16777216-B.in /tmp/io-testfiles/16777216-B.out -f && diff /tmp/io-testfiles/16777216-B.in /tmp/io-testfiles/16777216-B.out Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/25165824-B.in /tmp/io-testfiles/25165824-B.out -f && diff /tmp/io-testfiles/25165824-B.in /tmp/io-testfiles/25165824-B.out Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/33554432-B.in /tmp/io-testfiles/33554432-B.out -f && diff /tmp/io-testfiles/33554432-B.in /tmp/io-testfiles/33554432-B.out Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/41943040-B.in /tmp/io-testfiles/41943040-B.out -f && diff /tmp/io-testfiles/41943040-B.in /tmp/io-testfiles/41943040-B.out Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/50331648-B.in /tmp/io-testfiles/50331648-B.out -f && diff /tmp/io-testfiles/50331648-B.in /tmp/io-testfiles/50331648-B.out Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/5270713401-B.in /tmp/io-testfiles/5270713401-B.out -f && diff /tmp/io-testfiles/5270713401-B.in /tmp/io-testfiles/5270713401-B.out Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/527071340-B.in /tmp/io-testfiles/527071340-B.out -f && diff /tmp/io-testfiles/527071340-B.in /tmp/io-testfiles/527071340-B.out Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/52707134-B.in /tmp/io-testfiles/52707134-B.out -f && diff /tmp/io-testfiles/52707134-B.in /tmp/io-testfiles/52707134-B.out Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/58720256-B.in /tmp/io-testfiles/58720256-B.out -f && diff /tmp/io-testfiles/58720256-B.in /tmp/io-testfiles/58720256-B.out Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/67108864-B.in /tmp/io-testfiles/67108864-B.out -f && diff /tmp/io-testfiles/67108864-B.in /tmp/io-testfiles/67108864-B.out Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/75497472-B.in /tmp/io-testfiles/75497472-B.out -f && diff /tmp/io-testfiles/75497472-B.in /tmp/io-testfiles/75497472-B.out Executing: LD_PRELOAD=:./libbufread.so cp /tmp/io-testfiles/8388608-B.in /tmp/io-testfiles/8388608-B.out -f && diff /tmp/io-testfiles/8388608-B.in /tmp/io-testfiles/8388608-B.out
The script identical to the one as above, with restriction to input files of size < 30 MB.
The script tests intercepted cp for relative filepaths. It copies and compares all the infiles in the test/ folder.
These are some suggestions for future work and how to improve the project:
- Check for file permissions and flags. Possibly duplicate them (see open() function).
- Set open_files_limit dynamically, perhaps using ulimit -n.
- As of now, even the destinations of cp must be in the whitelist -> deal with this by adding a new feature?
- Set the path to whitelist.conf in a more flexible way: now it needs to be in the same folder as libbufread.so -> with system variable? It cannot be a command-line argument, though, because we are passing libbufread.so as a shared library.
- Stress-test bufread in a multi-threaded / parallel environment, when there are several threads / instances of cp copying between various files. My supervisor Rainer Schwemmer recommends fio tool for this task: http://freecode.com/projects/fio
- Implement and use the safe version of malloc() instead. This safe_malloc() should check, if memory was allocated and the returned pointer is not NULL, otherwise the program should exit.