Skip to content

Latest commit

 

History

History
78 lines (67 loc) · 3.82 KB

conceptual.org

File metadata and controls

78 lines (67 loc) · 3.82 KB

Conceptual Introduction

In most operating system APIs, when you run a syscall, you implicitly operate on the current process. In rsyscall, we explicitly specify the process in which we want to run a syscall.

For rsyscall to create a new process in which the user can run syscalls, it starts a process running the “rsyscall server stub”, which reads syscall requests from a file descriptor and sends back responses.

Several parts of the Linux API are non-trivial to use in such an environment; fork, clone and exec, among others. We have designed and implemented clean methods for using such syscalls, and make them all available to the user.

The rsyscall API allows us to initialize new processes in a new way. Some other systems specify all the characteristics of the new process up front, like NT’s CreateProcess or POSIX’s posix_spawn; but that requires explicit support for everything we want to change about the new process. Still other systems copy the attributes for the new process from the current process, like fork; but that’s inefficient.

rsyscall allows you to create a new process which shares everything (address space, file descriptor tables, etc) with the current process, and so is cheap to create, as with traditional threading models. Then, you can mutate this process by calling arbitrary syscalls inside of it, and gradually unshare things through calls to unshare and execve. This is more efficient than fork, and more powerful than posix_spawn.

Besides the efficiency benefit, there’s also a comprehensibility benefit of rsyscall relative to fork (and similar calls like vfork). fork is a single system call which returns twice, and splits your program into two contexts of execution; among other things, this makes it difficult to coordinate actions between both contexts.

In rsyscall there is a single running program, which is a completely conventional straight-line program. This single running program explicitly acts on both processes in a “single-threaded” manner, freely interleaving actions in either process. For example, one line can open a file in the child process, then the next line can use that file in the parent process, since the file descriptor table is shared.

In the classification of threading systems as “1:1”, “N:1”, and “M:N”, rsyscall is in a new category: “1:N”. A single “application-level thread” maps on to multiple “kernel-level entities”. This is possible and useful because we explicitly denote which “kernel-level entity” (which process) we want to run a syscall in.

In addition to the process creation and manipulation benefits, we also provide robust, wide-ranging, low-level support for many Linux features. rsyscall is useful for writing many kinds of scripts and applications which make heavy use of the Linux API. Such programs don’t have to give up on high-level language features and use C, nor do they have to use other languages with complex runtimes that can silently interfere with features like unshare and clone.

We’ve implemented the rsyscall API initially in Python, along with a limited C API, to maximize its usability.

In Python, we treat each syscall as a method on some object, such as Process or FileDescriptor or ChildPid. We’ve also used the type annotation features of Python 3. This is idiomatic for Python, but we’d love to support other languages. Other languages will likely have different approaches to the API. If you’d like to work on supporting other languages, just file an issue. Any language is interesting, but particularly interesting would be languages with rich type systems, like Haskell and OCaml, and languages that currently have poor APIs for interacting with Linux, like Java.

To learn more about the specifics of the API, take a look at the documentation.