xrprof

xrprof (formerly rtrace) is an external sampling profiler for R on Linux and Windows.

Many R users will be familiar with using the built-in sampling profiler Rprof() to generate data on what their code is doing, and there are several excellent tools to facilitate understanding these samples (or serve as a front-end), including the profvis package.

However, the reach of Rprof() and related tools is limited: the profiler is "internal", in the sense that it must be manually switched on to work, either during interactive work (for example, to profile an individual function), or perhaps by modifying the script to include Rprof() calls before running it again.

In contrast, xrprof can be used to profile code that is already running:

$ Rscript myscript.R &
# sudo may be required.
$ xrprof -p <PID> -F 50 > Rprof.out

External sampling profilers have proven extremely useful for diagnosing and fixing performance issues (or other bugs) in production environments. This project joins a long list of similar tools for other languages, such as perf (the Linux system profiler), jstack (for Java), rbspy (for Ruby), Pyflame (for Python), VSPerfCmd (for C#/.NET), and many others.

Building

On Linux

xrprof depends on libelf and libunwind, so you must have their headers to compile the program. For example, on Debian-based systems (including Ubuntu), you can install these (along with libcap2-bin, which provides the setcap tool used by the install step) with

$ sudo apt-get install libelf-dev libunwind-dev libcap2-bin

A simple Makefile is provided. Build the binary with

$ git clone https://github.com/atheriel/xrprof
$ cd xrprof
$ make

To install the profiler to your system, use

$ sudo make install

This will install the binary to /usr/local/bin and use setcap to mark it for use without sudo. The install target supports prefix and DESTDIR.

On Windows

You must have a build environment set up. For R users, the best option is to use R's own Rtools for Windows (which is also used to install packages from source). Launch "Rtools MinGW 64-bit" from the Start Menu, then fetch and build the source with

$ git clone https://github.com/atheriel/xrprof
$ cd xrprof
$ make -f Makefile.win

The resulting xrprof.exe program can be run from cmd.exe or PowerShell.

Usage

The profiler has a simple interface:

Usage: xrprof [-F <freq>] [-d <duration>] -p <pid>

Samples are written to standard output in the Rprof.out format; errors and other messages are written to standard error.

On Windows, R's process ID (PID) can be looked up in Task Manager.

Along with the sampling profiler itself, there is also a stackcollapse-Rprof.R script in tools/ that converts the Rprof.out format to one that can be understood by Brendan Gregg's FlameGraph tool. You can use this to produce graphs like the one below:

$ stackcollapse-Rprof.R Rprof.out | flamegraph.pl > Rprof.svg
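To make the conversion step less opaque, here is a minimal Python sketch of the kind of folding that a stack-collapse script performs. It is not the stackcollapse-Rprof.R script itself. It assumes the basic Rprof.out layout (a `sample.interval=` header followed by one line per sample, listing quoted call names with the innermost call first) and ignores line-profiling and memory-profiling annotations. The function name `collapse_rprof` and the sample data are made up for illustration.

```python
import re
from collections import Counter


def collapse_rprof(text):
    """Fold Rprof.out text into 'outermost;...;innermost count' lines."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the header (e.g. "sample.interval=20000").
        if not line or line.startswith("sample.interval="):
            continue
        # Each sample lists quoted call names, innermost call first.
        frames = re.findall(r'"([^"]*)"', line)
        if not frames:
            continue
        # FlameGraph expects the outermost frame first, joined by ';'.
        counts[";".join(reversed(frames))] += 1
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Made-up sample data, for illustration only.
sample = '''sample.interval=20000
"Sys.sleep" "slow" "main"
"Sys.sleep" "slow" "main"
"sum" "fast" "main"
'''

for folded in collapse_rprof(sample):
    print(folded)
```

Each output line is a semicolon-separated stack followed by the number of samples in which it was seen, which is the input format flamegraph.pl reads.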

Example FlameGraph

Running Under Docker

A public Docker image is available at atheriel/xrprof. Since xrprof reads the memory of other running programs, it must be run as a privileged container in the host PID namespace. For example:

$ docker run --privileged --pid=host -it atheriel/xrprof -p <PID>

Okay, How Does it Work?

On Linux, much like other sampling profilers, xrprof uses the ptrace system call to attach to a running R process and a mix of ptrace and process_vm_readv to read that process's memory, following pointers along the way.

The R-specific aspect of this is to locate and decode the R_GlobalContext structure inside of the R interpreter that stores information on the currently executing R code.
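The idea of "following pointers" through another process's memory can be shown with a toy model. The Python sketch below is not how xrprof is implemented and does not use R's real context layout: the two-field record, the `read_bytes` helper (standing in for a remote read such as process_vm_readv), and the fake memory image are all assumptions made for illustration. Real R contexts also store R objects rather than plain C strings, so decoding them takes more work than shown here.

```python
import struct

# Hypothetical 64-bit record layout (NOT R's real context structure):
#   offset 0: address of the next (outer) context, 0 at the top level
#   offset 8: address of a NUL-terminated function name
CONTEXT = struct.Struct("<QQ")


def read_bytes(memory, addr, size):
    """Stand-in for a remote memory read such as process_vm_readv."""
    return memory[addr:addr + size]


def read_cstring(memory, addr, limit=256):
    """Read a NUL-terminated string starting at addr."""
    raw = read_bytes(memory, addr, limit)
    return raw.split(b"\0", 1)[0].decode()


def walk_contexts(memory, head_addr, max_depth=1000):
    """Follow next-pointers from the innermost context outward."""
    names = []
    addr = head_addr
    while addr != 0 and len(names) < max_depth:
        record = read_bytes(memory, addr, CONTEXT.size)
        next_addr, name_addr = CONTEXT.unpack(record)
        names.append(read_cstring(memory, name_addr))
        addr = next_addr
    return names


# Build a fake memory image containing three chained contexts.
image = bytearray(256)


def put(addr, data):
    image[addr:addr + len(data)] = data


put(0x80, b"main\0")
put(0x90, b"slow\0")
put(0xA0, b"Sys.sleep\0")
put(0x10, CONTEXT.pack(0, 0x80))     # outermost context
put(0x20, CONTEXT.pack(0x10, 0x90))
put(0x30, CONTEXT.pack(0x20, 0xA0))  # innermost context

stack = walk_contexts(bytes(image), 0x30)
print(stack)
```

The `max_depth` guard matters in a sampler: the target keeps running between reads unless it is stopped, so a chain read from live memory may be inconsistent and should never be trusted to terminate on its own.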

In order to defeat address space layout randomization (ASLR), xrprof will search through the ELF files loaded into memory (as listed in /proc/<pid>/maps) for the symbols required, either in the executable itself or in libR.so (if it appears R has been compiled to use it).
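As a rough illustration of the first half of that search, the Python sketch below scans the text of a maps file for mappings backed by libR.so and reports the lowest start address. It only demonstrates the /proc/<pid>/maps line format (address range, permissions, offset, device, inode, optional path). The function name `find_mappings` and the example addresses are invented, and a real profiler would still need to resolve each symbol's offset within the file (for example from the ELF symbol table) and add it to the load address.

```python
def find_mappings(maps_text, needle="libR.so"):
    """Return (start, end, perms, path) for mappings whose path contains needle."""
    found = []
    for line in maps_text.splitlines():
        # Fields: address-range perms offset dev inode [pathname]
        fields = line.strip().split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        addr, perms, _offset, _dev, _inode, path = fields
        if needle not in path:
            continue
        start, end = (int(part, 16) for part in addr.split("-"))
        found.append((start, end, perms, path))
    return found


# Invented example contents; real files are read from /proc/<pid>/maps.
example = """\
55d0c8a00000-55d0c8a01000 r-xp 00000000 08:01 131 /usr/lib/R/bin/exec/R
7f3a10000000-7f3a10021000 rw-p 00000000 00:00 0
7f3a12400000-7f3a127c0000 r-xp 00000000 08:01 262 /usr/lib/R/lib/libR.so
7f3a129c0000-7f3a129e0000 rw-p 003c0000 08:01 262 /usr/lib/R/lib/libR.so
"""

hits = find_mappings(example)
base = min(start for start, _end, _perms, _path in hits)
print(hex(base))
```

If no mapping matches, the same function can be called with a different `needle` to look at the R executable itself, mirroring the two cases described above.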

xrprof is mount-namespace-aware, so it supports profiling R processes running inside Docker containers.

On Windows, xrprof makes use of APIs like ReadProcessMemory(), NtSuspendProcess(), and SymFromName() to achieve the analogous result.

Credits

The project was inspired by Julia Evans's blog posts on writing rbspy and later by my discovery of Evan Klitzke's work (and writing) on Pyflame.

License

This project contains portions of the source code of R itself, which is copyright the R Core Developers and licensed under the GPLv2.

The remaining code is copyright its authors and is available under the same license, GPLv2.