-
Notifications
You must be signed in to change notification settings - Fork 115
Implementation details
Some of these details are out of date, especially due to the decision to abolish shadow threads in favor of synchronous tracing.
JQF uses the ASM toolkit to instrument Java bytecode on-the-fly as classes are loaded by the JVM. JQF uses a heavily modified version of the janala2 classes, originally developed by Koushik Sen for performing concolic execution of Java programs.
Janala instruments all applications classes and several JDK classes (classes with names java.lang.*
and sun.*
are not instrumented). The instrumentation injects a static method call after every bytecode instruction to a class that essentially snoops on everything that is being executed. There is one static method call per bytecode instruction type; the arguments to this call are a uniquely generated instruction identifier (the iid
) as well as instruction operands.
The snooping class can therefore recreate the entire execution state of any JVM thread by maintaining a shadow operand stack and call stack.
In JQF, we ignore snoops on most arithmetic and logical data manipulating instructions. We are mainly interested in control-flow instructions such as method calls and branches. Unfortunately, Java has many different ways of invoking method calls (e.g. INVOKESTATIC
, INVOKEINTERFACE
, INVOKEVIRTUAL
, INVOKESPECIAL
) as well as branching (e.g. IF_CMPNE
, GOTO
, TABLESWITCH
, LOOKUPSWITCH
) plus many other control-flow instructions (e.g. IRETURN
, ARETURN
, ATHROW
). Additionally, there is a lot of noise from instrumentation across multiple JVM threads plus the fact that the JVM hijacks application threads to invoke class loaders when classes are referenced for the first time. Therefore, JQF collects these bytecode instructions and forms a coherent sequence of trace events that are emitted to a Guidance instance.
Trace events provide a higher-level abstraction than bytecode instructions. For example, conditional jumps, gotos and switch statements are analyzed and a BRANCH
event is emitted. Similarly, the various method call and return instructions are coalesced into CALL
and RETURN
events.
By matching call and return events, JQF builds a shadow call stack per-thread. This is useful for many reasons such as ignoring events generated before and after a test method is invoked in the fuzzing loop (so as not to slow-down the test input generation) as well as ignoring events generated by JVM class-loading.
For maximizing performance, JQF maintains a shadow thread for each application thread, called a tracer. A tracer processes the bytecode instructions executed by its application thread and emits the appropriate trace events to the Guidance
instance in a coherent sequence.
The static methods invoked by the injected code in application threads put these instructions on thread-local queues, which are shared only with the thread's own tracer. These queues are implemented as a lock-free single-producer single-consumer circular buffers with busy waiting -- this is only guaranteed to be deadlock-free if the producer and consumer thread are fixed and distinct, which is true for JQF because the producer is always the application thread and the consumer is always its shadow thread.
AFL is designed to perform coverage-guided fuzzing of C programs that take input through a file or STDIN. AFL communicates with target programs using a 256KB shared memory region called trace_bits
, which is a map of 216 keys to 8-bit counters. AFL instruments the target program such that on each basic block transition (i.e. control-flow graph edge), the program increments the 8-bit counter corresponding to the key associated with that edge (this association is done statically at instrumentation time). The AFL fuzzing loop is roughly as follows:
u8* trace_bits = create_shared_memory();
while(1) {
input = generate_next_input();
run_target(input);
if (has_new_bits(trace_bits)) {
save_input(input);
}
}
Where has_new_bits
is a side-effecting function that returns true only if trace_bits
has new non-zero bits as compared to the union of all bits seen cumulatively in all the previous iterations.
The AFL integration in JQF is via a proxy program that sits in between AFL and JQF and communicates with each of them.
The proxy masquerades as a target C program to AFL, in that running the proxy with some input causes the trace_bits
to be updated. However, instead of being an AFL-instrumented C program, the proxy is in fact just a mediator that communicates with AFLGuidance
using FIFO pipes.
The AFLGuidance
instance registers a callback with JQF to handle trace events by updating its own copy of trace_bits
, which are then sent to the proxy whenever Guidance#handleResult()
is invoked. The proxy in turn updates the shared memory region and signals to AFL that a run is complete. The method Guidance#hasInput()
blocks until the proxy receives input from AFL. If a test run results in a FAILURE (e.g. due to an assertion error), then JQF informs the proxy as such, which fakes a segmentation fault signal to AFL; thus, AFL marks the input as a crash and saves it for further analysis.
The source code examples in the wiki pages can be freely re-used under the same license as the rest of JQF.