Debugging the Disassembly
As we don't have any symbols for the Civ4 exes, the only option you have if you want to debug them is to use the disassembly view:
For fairly self-explanatory reasons this is a more painful process than debugging the DLL, where you can step through the source code, inspect variables by name, etc. However, you can still use the same tools of stepping through the code and inspecting variables; it is just more difficult (and sometimes too difficult to be worth it).
Registers are the symbolic names for the "variables" that assembly operations operate on.
In our case they are: EAX, EBX, ECX, EDX, ESI, EDI, EIP, ESP, EBP, EFL.
Some of these registers have special meanings and uses depending on the context.
The E*X registers are general purpose registers, but in some cases are assigned specific uses. For instance, when calling a member function of an object, ECX often holds the this pointer.
You can view the value of these at all times using the Registers debug window (Debug > Windows > Registers, at the very bottom).
You will see memory addresses everywhere, either directly in the assembly, or passed around in registers. A lot of the assembly code is just moving values to and from memory addresses via registers (the mov instruction when used with dword ptr).
They can refer to functions or data; it is all just bytes in memory that can be addressed.
Here you can see a couple of different uses:
- To jump directly to a position in the code (je means "jump if equal", where "equal" was calculated in a previous instruction further up). Once compiled the code is entirely static in memory, and thus can be referred to by hard-coded addresses in this manner. If statements in the code will generally be compiled into jump instructions of this type.
- A value being copied from a memory address, where the address is specified by another register with an offset applied, into a register. This usually indicates values being read from class members, e.g. reading eax+4, if eax points to the start of a CvUnit, would return the CvEntity* of that unit. This is because, when CvUnit is laid out in memory, CvEntity* m_pEntity; is at an offset of 4 bytes from the start of the object (the first 4 bytes are taken by something called a vtable pointer, which I won't go into here).
- A value being copied from a register directly into an address, where the address is specified by another register with an offset applied. This generally indicates a class data member is being written to.
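The read and write cases above can be mimicked in C++ as a rough sketch. The FakeUnit struct here is hypothetical: it uses two fixed 32-bit fields to stand in for the vtable pointer and m_pEntity as they would sit in the 32-bit exe, and the real CvUnit has many more members. The helper names readDword and writeDword are also made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the first 8 bytes of a CvUnit in a 32-bit process.
// Fixed-width fields keep the layout identical on any host machine.
struct FakeUnit {
    std::uint32_t vtablePtr; // offset 0: address of the vtable
    std::uint32_t pEntity;   // offset 4: stands in for CvEntity* m_pEntity
};
static_assert(sizeof(FakeUnit) == 8, "expected two tightly packed 32-bit fields");

// Roughly what `mov ecx, dword ptr [eax+4]` does: read 4 bytes at base + offset.
std::uint32_t readDword(const void* base, std::size_t offset) {
    assert(base != nullptr);
    std::uint32_t value = 0;
    std::memcpy(&value, static_cast<const unsigned char*>(base) + offset, sizeof(value));
    return value;
}

// Roughly what `mov dword ptr [eax+4], ecx` does: write 4 bytes at base + offset.
void writeDword(void* base, std::size_t offset, std::uint32_t value) {
    assert(base != nullptr);
    std::memcpy(static_cast<unsigned char*>(base) + offset, &value, sizeof(value));
}
```

With a FakeUnit at hand, readDword(&unit, 4) returns the value stored in pEntity, which is the same lookup the disassembly performs with eax+4.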
The stack can be used to push values for later use. For instance, when calling functions it is common to push the function parameters onto the stack. The function itself then just pops them off the stack so it can use them.
The call instruction is used to call functions. The functions are always specified by an address, either specified directly (a literal value), in a register, or read from a specified memory address.
Here you can see all three in that order:
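As a loose C++ analogy for the three forms, function pointers behave much the same way. The function answer and the three wrapper names are hypothetical, and the assembly in the comments is only indicative of what a compiler might emit.

```cpp
#include <cassert>

// Hypothetical target function, standing in for code at some address.
int answer() { return 42; }

using Fn = int (*)();

// 1. Direct call: the target address is a literal inside the instruction,
//    e.g. `call <literal address>`.
int callDirect() { return answer(); }

// 2. Call through a register: the address was loaded earlier,
//    e.g. `call eax`.
int callViaRegister(Fn inRegister) {
    assert(inRegister != nullptr);
    return inRegister();
}

// 3. Call through memory: the instruction names a memory slot that holds
//    the address, e.g. `call dword ptr ds:[0BC1988h]`.
int callViaMemory(const Fn* slot) {
    assert(slot != nullptr && *slot != nullptr);
    return (*slot)(); // dereference the slot first, then call what it holds
}
```

The third form is the one to watch for: the number in the instruction is where the function's address is stored, not the address of the function itself.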
How parameters are provided to the function depends on something called the "calling convention". See here for a detailed guide on the topic. The CvGameCoreDLL uses something called the thiscall calling convention (at least when calling member functions of objects). It is specifically designed to work well for class functions that need a this pointer. The this pointer itself is passed in via the ECX register, and the rest of the parameters are pushed onto the stack in reverse order.
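To make the ordering concrete, here is a toy model of where thiscall puts its inputs. ToyCpu, prepareThiscall and readArg are hypothetical names, and the model is deliberately simplified: a real call instruction also pushes a return address on top of the arguments, which is left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of the two places thiscall uses for its inputs.
struct ToyCpu {
    std::uint32_t ecx = 0;            // holds the `this` pointer
    std::vector<std::uint32_t> stack; // back() is the top of the stack
};

// Caller side: push the arguments in reverse (right-to-left) order,
// then load `this` into ECX.
void prepareThiscall(ToyCpu& cpu, std::uint32_t thisPtr,
                     const std::vector<std::uint32_t>& args) {
    for (auto it = args.rbegin(); it != args.rend(); ++it) {
        cpu.stack.push_back(*it);
    }
    cpu.ecx = thisPtr;
}

// Callee side: argument `index` (0 = first declared parameter) sits
// `index` slots below the top of the stack in this simplified model.
std::uint32_t readArg(const ToyCpu& cpu, std::size_t index) {
    assert(index < cpu.stack.size());
    return cpu.stack[cpu.stack.size() - 1 - index];
}
```

Because the arguments are pushed last-to-first, the first declared parameter ends up nearest the top of the stack, which is why the pushes you see just before a call appear in the opposite order to the C++ signature.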
Here I will present a few tricks that can help you make sense out of what you are looking at.
You can still use the Watch windows (Locals, Autos, Watch) when looking at the disassembly; you just need to know a couple of things.
Casting is what allows you to see the C++ objects that the assembly is operating on, as you would in normal source debugging.
Simply cast any number or register to the type you want to view it as. You will need to qualify the type with the DLL name so that the debugger knows where to look for the type definition; this is done by prepending CvGameCoreDLL.dll! (note the exclamation mark at the end) to the class name:
Here you can see that ESI contains the address of a CvUnit object. I can either cast the register directly or I can cast a memory address directly.
How do you know what type to cast it to? Well, you will need to work it out, or you can try guessing. For instance, if you have part of the DLL callstack intact then you can guess from context what objects the exe might be operating on and try them out. If the cast object is gibberish then it probably isn't that type (OR it is a pointer invalidation bug and the memory was already freed and reused by something else).
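Under the hood, a watch-window cast such as (CvGameCoreDLL.dll!CvUnit*)ESI amounts to reinterpreting a plain number as the address of a typed object. The sketch below shows the same idea in C++; ToyUnit and viewAsUnit are hypothetical and only stand in for a real DLL class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type; stands in for a class defined in the DLL.
struct ToyUnit {
    int id;
    int x;
    int y;
};

// Treat a raw number (for example the contents of a register) as the
// address of an object of the chosen type. Nothing checks that the guess
// is right: a wrong type simply yields gibberish fields.
const ToyUnit* viewAsUnit(std::uintptr_t rawAddress) {
    assert(rawAddress != 0);
    return reinterpret_cast<const ToyUnit*>(rawAddress);
}
```

This is also why a wrong guess is harmless in the watch window: the debugger only reinterprets the bytes for display and does not modify them.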
To see which function an address refers to, you need to give a context and cast to a function pointer by prefixing the address/register with {,,CvGameCoreDLL.dll}(void(*)()).
You can see here without the context the register just shows the address, but if you add the context it will automatically show you the DLL function name that is being referred to by that register.
This trick is particularly useful when you see addresses being referred to directly in the disassembly like ds:[0BC1988h]. These addresses refer to our functions in the CvGameCoreDLL.dll. This means when you see them you can work out what functions the exe was calling.
For instance, let's say you get a crash and the disassembly looks like this:
You can see the red line where the crash occurred. Above the red line there are a few call operations that target ds: addresses, one of them underlined in green. These are good candidates for looking at in the watch window. One thing to note is that the dword ptr ds:[...] syntax means that the actual address of the function is stored in memory at the address specified in the []. This means we need to dereference this memory like *(int*)... and then look up the function it points at:
You can see the context underlined in orange, the dereferenced address in green, and the resolved function name in yellow.
Note that in the disassembly window hex values are specified by a trailing h, but in the watch window they must use a leading 0x instead, as in C++.
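If you do this often, the conversion is easy to script. The helpers below are a hypothetical sketch: toWatchHex rewrites a disassembly-style value into watch-window style, and makeFunctionWatch assumes the context prefix, the function pointer cast and the *(int*) dereference described above combine by simple concatenation.

```cpp
#include <cassert>
#include <string>

// Convert disassembly-style hex ("0BC1988h") to watch-window style ("0x0BC1988").
// Input without a trailing 'h' is returned unchanged.
std::string toWatchHex(const std::string& disasmHex) {
    if (disasmHex.empty() || (disasmHex.back() != 'h' && disasmHex.back() != 'H')) {
        return disasmHex;
    }
    return "0x" + disasmHex.substr(0, disasmHex.size() - 1);
}

// Build a watch expression for the target of a `call dword ptr ds:[...]`:
// context prefix, function pointer cast, then the dereferenced slot.
std::string makeFunctionWatch(const std::string& disasmHex) {
    assert(!disasmHex.empty());
    return "{,,CvGameCoreDLL.dll}(void(*)())*(int*)" + toWatchHex(disasmHex);
}
```

For the address used earlier, makeFunctionWatch("0BC1988h") produces a string you can paste into the Watch window.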