Hooks in angr are very powerful! You can use them to modify a program's behavior in any way you could imagine. However, the exact way you might want to program a specific hook may be non-obvious. This chapter should serve as a guide when programming SimProcedures.
Here's an example that will remove all bugs from any program:
>>> from angr import Project, SimProcedure
>>> project = Project('examples/fauxware/fauxware')
>>> class BugFree(SimProcedure):
...     def run(self, argc, argv):
...         print('Program running with argc=%s and argv=%s' % (argc, argv))
...         return 0
...
>>> # this assumes we have symbols for the binary
>>> project.hook_symbol('main', BugFree())
>>> # Run a quick execution!
>>> simgr = project.factory.simulation_manager()
>>> simgr.run() # step until no more active states
Program running with argc=<SAO <BV64 0x0>> and argv=<SAO <BV64 0x7fffffffffeffa0>>
<SimulationManager with 1 deadended>
Now, whenever program execution reaches the main function, instead of executing the actual main function, it will execute this procedure! It just prints a message and returns 0.
Now, let's talk about what happens at the boundaries of this function!
When entering the function, where do the values that go into the arguments come from?
You can define your run() function with as many arguments as you like, and the SimProcedure runtime will automatically extract those arguments from the program state via a calling convention and call your run() function with them.
Similarly, when you return a value from run(), it is placed into the state (again according to the calling convention), and the actual control-flow action of returning from a function is performed. Depending on the architecture, this may involve jumping to the link register or jumping to the result of a stack pop.
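To make the argument-extraction idea concrete, here is a toy model in plain Python. It is not angr's implementation: the register names assume the System V AMD64 calling convention, and run_with_cc is a hypothetical helper that counts the parameters run() declares, pulls that many values out of a register dict, and writes the return value back into rax.

```python
import inspect

# Toy model only (not angr's code): System V AMD64 argument registers, in order,
# and the register that holds the return value.
ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
RET_REG = "rax"


def run_with_cc(run_func, regs):
    """Call run_func with as many register arguments as it declares,
    then store its return value in the return register."""
    # Count the parameters run_func declares (self is already bound on a method)
    n_args = len(inspect.signature(run_func).parameters)
    args = [regs[name] for name in ARG_REGS[:n_args]]
    result = run_func(*args)
    regs[RET_REG] = result  # "placed into the state" per the calling convention
    return regs


class BugFree:
    def run(self, argc, argv):
        return 0


regs = {"rdi": 1, "rsi": 0x7FFFFFFFFFEFFA0, "rdx": 0, "rcx": 0, "r8": 0, "r9": 0, "rax": 0xDEAD}
run_with_cc(BugFree().run, regs)
print(regs["rax"])  # 0
```

A run() with three parameters would receive rdi, rsi, and rdx in the same way.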
It should be clear at this point that the SimProcedure we just wrote is meant to completely replace whatever function it is hooked over. In fact, the original use case for SimProcedures was replacing library functions. More on that later.
On a Project instance, the dict project._sim_procedures is a mapping from addresses to SimProcedure instances.
When the execution pipeline reaches an address that is present in that dict (that is, an address that is hooked), it will execute project._sim_procedures[address].execute(state).
This will consult the calling convention to extract the arguments, make a copy of itself in order to preserve thread safety, and run the run() method.
It is important to produce a new instance of the SimProcedure each time it is run: running a SimProcedure necessarily mutates state on the SimProcedure instance, so each step needs its own copy, lest we run into race conditions in multithreaded environments.
This arrangement also means that you might want to reuse a single SimProcedure instance in multiple hooks.
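The copy-per-execution rule can be sketched in plain Python. ToyProcedure below is a hypothetical stand-in, not angr's SimProcedure: execute() makes a shallow copy with copy.copy and only ever mutates that copy, so the instance registered as the hook stays untouched no matter how many times it runs.

```python
import copy


class ToyProcedure:
    """Hypothetical stand-in for a SimProcedure; models only copy-on-execute."""

    def __init__(self):
        self.state = None  # per-execution data lives on the instance

    def execute(self, state):
        inst = copy.copy(self)  # fresh instance for this step
        inst.state = state      # mutate the copy, never the hooked original
        return inst.run()

    def run(self):
        return self.state["arg"] * 2


hooked = ToyProcedure()
print(hooked.execute({"arg": 3}), hooked.execute({"arg": 5}))  # 6 10
print(hooked.state)  # None: the shared hooked instance was never mutated
```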
What if you want to hook the same SimProcedure in several places, but tweaked slightly each time?
angr supports this: any additional keyword arguments you pass to the constructor of your SimProcedure will be passed as keyword arguments to your SimProcedure's run() method.
Pretty cool!
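Here is a minimal plain-Python model of that keyword-argument forwarding. ToyProcedure and ReturnConstant are hypothetical names, and the arguments are passed to execute() directly rather than extracted from a state, which is an assumption made so the sketch runs without angr.

```python
class ToyProcedure:
    """Hypothetical stand-in for angr.SimProcedure: extra constructor kwargs
    are remembered and forwarded to run() on every execution."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def execute(self, *args):
        # angr would extract *args from the state via the calling convention;
        # here they are simply passed in.
        return self.run(*args, **self.kwargs)


class ReturnConstant(ToyProcedure):
    def run(self, argc, value=0):
        return value


# One procedure class, configured differently for two hypothetical hook sites
one_hook = ReturnConstant(value=1)
other_hook = ReturnConstant(value=42)
print(one_hook.execute(5), other_hook.execute(5))  # 1 42
```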
If you were paying attention to the example earlier, you noticed that when we printed out the arguments to the run() function, they came out as a weird <SAO <BV64 0xSTUFF>> class.
This is a SimActionObject.
Basically, you don't need to worry about it too much; it's just a thin wrapper over a normal bitvector.
It does a bit of tracking of what exactly you do with it inside the SimProcedure, which is helpful for static analysis.
You may also have noticed that we directly returned the Python int 0 from the procedure.
This will automatically be promoted to a word-sized bitvector!
You can return a native number, a bitvector, or a SimActionObject.
When you want to write a procedure that deals with floating-point numbers, you will need to specify the calling convention manually.
It's not too hard; just create a calling convention, e.g. mycc = project.factory.cc_from_arg_kinds((True, True), ret_fp=True) for two floating-point arguments and a floating-point return value, and pass it to the procedure you hook: project.hook(address, ProcedureClass(cc=mycc)).
This method of passing in a calling convention works for all calling conventions, so if angr's autodetected one isn't right, you can fix that.
How can you exit a SimProcedure?
We've already gone over the simplest way to do this: returning a value from run().
This is actually shorthand for calling self.ret(value).
self.ret() is the function that knows how to perform the specific action of returning from a function.
SimProcedures can use lots of different functions like this!
- ret(expr): Return from a function
- jump(addr): Jump to an address in the binary
- exit(code): Terminate the program
- call(addr, args, continue_at): Call a function in the binary
- inline_call(procedure, *args): Call another SimProcedure in-line and return the results
That second-last one deserves some looking-at. We'll get there after a quick detour...
What if we want to add a conditional branch out of a SimProcedure? To do that, you'll need to work directly with the SimSuccessors object for the current execution step.
The interface for this is self.successors.add_successor(state, addr, guard, jumpkind).
All of these parameters should have an obvious meaning if you've followed along so far.
Keep in mind that the state you pass in will NOT be copied and WILL be mutated, so be sure to make a copy beforehand if there will be more work to do!
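Why the copy matters can be seen with a toy model. ToyState and ToySuccessors are hypothetical stand-ins that mimic only the "no copy, mutated in place" behavior; the guards are plain strings as placeholders for symbolic booleans, and copy.deepcopy stands in for what would be state.copy() in real angr code.

```python
import copy


class ToyState:
    """Hypothetical minimal state: an instruction pointer plus a constraint list."""

    def __init__(self, ip):
        self.ip = ip
        self.constraints = []
        self.jumpkind = None


class ToySuccessors:
    """Toy model of add_successor: it keeps and mutates the state it is given."""

    def __init__(self):
        self.states = []

    def add_successor(self, state, addr, guard, jumpkind):
        state.ip = addr                  # no copy is made here...
        state.constraints.append(guard)  # ...so the caller's object changes
        state.jumpkind = jumpkind
        self.states.append(state)


state = ToyState(0x1000)
succ = ToySuccessors()
# First branch: hand over a copy, since the state is still needed afterwards
succ.add_successor(copy.deepcopy(state), 0x2000, "x == 0", "Ijk_Boring")
# Last branch: the original state is no longer needed, so it can be passed directly
succ.add_successor(state, 0x3000, "x != 0", "Ijk_Boring")
print([(hex(s.ip), s.constraints) for s in succ.states])
# [('0x2000', ['x == 0']), ('0x3000', ['x != 0'])]
```

Passing state itself to both calls would leave a single object in the list twice, carrying both guards and the last address.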
How can we call a function in the binary and have execution resume within our SimProcedure?
There is a whole bunch of infrastructure called the "SimProcedure Continuation" that will let you do this.
When you use self.call(addr, args, continue_at), addr is expected to be the address you'd like to call, args is the tuple of arguments you'd like to call it with, and continue_at is the name of another method in your SimProcedure class that you'd like execution to continue at when the callee returns.
This method must have the same signature as the run() method.
Furthermore, you can pass the keyword argument cc to specify the calling convention that ought to be used to communicate with the callee.
When you do this, your current step finishes, and execution resumes in the next step at the function you've specified.
When that function returns, it has to return to some concrete address!
That address is specified by the SimProcedure runtime: an address is allocated in angr's externs segment to be used as the return site for the given method call.
It is then hooked with a copy of the procedure instance, tweaked to run the specified continue_at function instead of run(), with the same args and kwargs as the first time.
There are two pieces of metadata you need to attach to your SimProcedure class in order to use the continuation subsystem correctly:
- Set the class variable IS_FUNCTION = True
- Set the class variable local_vars to a tuple of strings, where each string is the name of an instance variable on your SimProcedure whose value you would like to persist across the call, so that it is still available when execution returns. Local variables can be any type so long as you don't mutate their instances.
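The continuation protocol can be modeled roughly in plain Python. Everything here (ToyProcedure, toy_runtime, CallThreeTimes, callee) is hypothetical and only mimics the behavior described above: after each call, a fresh copy of the procedure is made, only the names in local_vars are carried over, and the continue_at method is invoked with the same arguments as run(). In angr itself, the callee runs as real binary code over later steps and the return site is a hooked address in the externs segment.

```python
import copy


class ToyProcedure:
    """Hypothetical stand-in for a SimProcedure that supports continuations."""

    IS_FUNCTION = True
    local_vars = ()

    def call(self, func, args, continue_at):
        # Only record the request; toy_runtime performs the call after this step ends
        self.pending = (func, args, continue_at)


def toy_runtime(proc, args):
    """Run proc, emulate each requested call, then resume at continue_at."""
    inst = copy.copy(proc)
    inst.pending = None
    result = inst.run(*args)
    while inst.pending is not None:
        func, call_args, continue_at = inst.pending
        func(*call_args)  # stand-in for executing the callee in the binary
        # Persist only the names listed in local_vars
        saved = {name: getattr(inst, name) for name in proc.local_vars}
        inst = copy.copy(proc)  # fresh copy for the return site
        inst.pending = None
        for name, value in saved.items():
            setattr(inst, name, value)
        # Resume at the continue_at method with the same args as the first time
        result = getattr(inst, continue_at)(*args)
    return result


calls = []


def callee(n):
    calls.append(n)


class CallThreeTimes(ToyProcedure):
    local_vars = ("remaining",)

    def run(self, func):
        self.remaining = 3
        return self.step(func)

    def step(self, func):  # same signature as run(), as required
        if self.remaining == 0:
            return "done"
        self.remaining -= 1  # rebinds the attribute; no shared object is mutated
        self.call(func, (self.remaining,), "step")


print(toy_runtime(CallThreeTimes(), (callee,)), calls)  # done [2, 1, 0]
```

Dropping "remaining" from local_vars would make the fresh copy fail with an AttributeError in step(), which is the toy equivalent of forgetting to declare a local variable.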
You may have guessed by now that there exists some sort of auxiliary storage in order to hold on to all this data.
You would be right!
The state plugin state.callstack has an entry called .procedure_data, which the SimProcedure runtime uses to store information local to the current call frame.
angr tracks the stack pointer in order to make the current top of state.callstack a meaningful local data store.
This is data that ought to be stored in memory in a stack frame, but it can't be serialized and/or memory allocation is hard.
As an example, let's look at the SimProcedure that angr uses internally to run all the shared library initializers for a full_init_state for a Linux program:
import angr

class LinuxLoader(angr.SimProcedure):
    NO_RET = True
    IS_FUNCTION = True
    local_vars = ('initializers',)

    def run(self):
        self.initializers = self.project.loader.initializers
        self.run_initializer()

    def run_initializer(self):
        if len(self.initializers) == 0:
            # No initializers left: set up the entry state and jump to the program entry point
            self.project._simos.set_entry_register_values(self.state)
            self.jump(self.project.entry)
        else:
            # Take the first initializer without mutating the shared list,
            # call it, and resume at run_initializer when it returns
            addr = self.initializers[0]
            self.initializers = self.initializers[1:]
            self.call(addr, (self.state.posix.argc, self.state.posix.argv, self.state.posix.environ), 'run_initializer')
This is a particularly clever usage of the SimProcedure continuations.
First, notice that the current project is available for use on the procedure instance.
This gives you a lot of power; for safety, you generally only want to use the project as a read-only or append-only data structure.
Here we're just getting the list of dynamic initializers from the loader.
Then, for as long as the list isn't empty, we take a single function pointer from the front of the list (being careful not to mutate the list, since the list object is shared across states) and call it, returning to the run_initializer function again.
When we run out of initializers, we set up the entry state and jump to the program entry point.
Very cool!
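The reason LinuxLoader rebinds self.initializers to a slice instead of popping from it can be shown with a plain Python list. This is a sketch: run_all and the addresses are hypothetical, and a while loop stands in for the chain of self.call continuations.

```python
# Hypothetical: one list object shared by every state, like the initializer list
shared_initializers = [0x401000, 0x401100, 0x401200]


def run_all(initializers, call):
    """Mimic run_initializer: take the head, rebind to the tail, repeat."""
    remaining = initializers
    while remaining:
        addr = remaining[0]
        remaining = remaining[1:]  # a new list; the shared one is untouched
        call(addr)
    return "jump to entry"


called = []
print(run_all(shared_initializers, called.append))  # jump to entry
print([hex(a) for a in called])  # ['0x401000', '0x401100', '0x401200']
print(len(shared_initializers))  # 3: still intact for every other state
```

Using remaining.pop(0) instead would empty the one list object that all states share.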
As a brief aside, you can store global variables in state.globals.
This is a dictionary that just gets shallow-copied from state to successor state.
Because it's only a shallow copy, its members are the same instances, so the same rules as local variables in SimProcedure continuations apply.
You need to be careful not to mutate any item that is used as a global variable unless you know exactly what you're doing.
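The shallow-copy pitfall is ordinary Python dict behavior. In this sketch a plain dict and dict.copy() stand in for state.globals and its per-successor copy, an assumption that captures only the shallow-copy semantics described above.

```python
# A plain dict stands in for state.globals; .copy() is a shallow copy
parent = {"counter": 0, "seen": []}
child = parent.copy()
child["counter"] += 1           # rebinds the child's key: parent unaffected
child["seen"].append(0x401000)  # mutates the shared list: parent affected!
print(parent["counter"], parent["seen"])  # 0 [4198400]

# Safe pattern: build a new object and rebind the key instead of mutating
parent = {"counter": 0, "seen": []}
child = parent.copy()
child["seen"] = child["seen"] + [0x401000]
print(parent["seen"], child["seen"])  # [] [4198400]
```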
We've already looked at the class variable IS_FUNCTION, which allows you to use the SimProcedure continuation.
There are a few more class variables you can set, though these have no direct benefit to you; they merely mark attributes of your function so that static analysis knows what it's doing.
- NO_RET: Set this to True if control flow will never return from this function
- ADDS_EXITS: Set this to True if you do any control flow other than returning
- IS_SYSCALL: Self-explanatory
Furthermore, if you set ADDS_EXITS, you may also want to define the method static_exits().
This function takes a single parameter, a list of IRSBs that would be executed in the run-up to your function, and asks you to return a list of all the exits that you know would be produced by your function in that case.
The return value is expected to be a list of tuples of (address (int), jumpkind (str)).
This is meant to be a quick, best-effort analysis, and you shouldn't try to do anything crazy or intensive to get your answer.
The process of writing and using a SimProcedure assumes that you want to hook over a whole function. What if you don't? There's an alternate interface for hooking, a user hook, that lets you streamline the process of hooking sections of code.
>>> @project.hook(0x1234, length=5)
... def set_rax(state):
...     state.regs.rax = 1
This is a lot simpler! The idea is to use a single function instead of an entire SimProcedure subclass. No extraction of arguments is performed, no complex control flow happens.
Where execution goes next is controlled by the length argument.
After the function finishes executing in this example, the next step will start at 5 bytes after the hooked address.
If the length argument is omitted or set to zero, execution will resume executing the binary code at exactly the hooked address, without re-triggering the hook.
The Ijk_NoHook jumpkind allows this to happen.
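The resume-address rule for a user hook whose function returns nothing can be sketched as a small function. next_address is hypothetical, and the Ijk_Boring jumpkind for the nonzero-length case is an assumption; the text above only specifies Ijk_NoHook for the zero-length case.

```python
def next_address(hook_addr, length=0):
    """Toy model of where execution resumes after a user hook returns None.

    With a nonzero length, execution skips `length` bytes past the hook
    (Ijk_Boring is an assumed jumpkind here). With length 0, it resumes at
    the hooked address itself via Ijk_NoHook, so the hook is not re-triggered.
    """
    if length:
        return hook_addr + length, "Ijk_Boring"
    return hook_addr, "Ijk_NoHook"


print(next_address(0x1234, 5))  # (4665, 'Ijk_Boring')
print(next_address(0x1234))     # (4660, 'Ijk_NoHook')
```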
If you want more control over control flow coming out of a user hook, you can return a list of successor states.
Each successor will be expected to have state.regs.ip, state.scratch.guard, and state.scratch.jumpkind set.
The IP is the target instruction pointer, the guard is a symbolic boolean representing the condition under which this successor is taken as opposed to the others, and the jumpkind is a VEX enum string, like Ijk_Boring, representing the nature of the branch.
The general rule is, if you want your SimProcedure to either be able to extract function arguments or cause a program return, write a full SimProcedure class. Otherwise, use a user hook.
As you should recall from the section on loading a binary, dynamically linked programs have a list of symbols that they must import from the libraries they have listed as dependencies, and angr will make sure, rain or shine, that every import symbol gets resolved by some address, whether it's a real implementation of the function or just a dummy address hooked with a do-nothing stub.
As a result, you can just use the Project.hook_symbol API to hook the address referred to by a symbol!
This means that you can replace library functions with your own code.
For instance, to replace rand() with a function that always returns a consistent sequence of values:
>>> class NotVeryRand(SimProcedure):
...     def run(self, return_values=None):
...         rand_idx = self.state.globals.get('rand_idx', 0) % len(return_values)
...         out = return_values[rand_idx]
...         self.state.globals['rand_idx'] = rand_idx + 1
...         return out
...
>>> project.hook_symbol('rand', NotVeryRand(return_values=[413, 612, 1025, 1111]))
Now, whenever the program tries to call rand(), it'll return the integers from the return_values list in a loop.
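You can sanity-check the cycling logic without a binary by pulling NotVeryRand's indexing into a plain function. Here a dict stands in for state.globals, an assumption made so the example runs without angr; not_very_rand is a hypothetical helper.

```python
def not_very_rand(globals_, return_values):
    """Same indexing logic as NotVeryRand.run, with a dict for state.globals."""
    rand_idx = globals_.get('rand_idx', 0) % len(return_values)
    out = return_values[rand_idx]
    globals_['rand_idx'] = rand_idx + 1
    return out


g = {}
values = [413, 612, 1025, 1111]
print([not_very_rand(g, values) for _ in range(6)])
# [413, 612, 1025, 1111, 413, 612]
```

Because state.globals is copied from each state to its successors, every execution path keeps its own counter; the counter is an int, which is immutable, so this is safe under the shallow-copy rule.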