Compiler and Code Generator Debugging
This page contains notes and tips related to debugging the Open Watcom compilers and especially the code generator. It is assumed the reader is familiar with the pertinent Compiler Architecture.
An important first step is isolating the problem. This may take some effort but is usually well worth the trouble. The vast majority of bugs can be condensed into testcases with under five lines of code; this makes it much easier to zero in on the issue without being distracted by irrelevancies. Not infrequently, isolating the problem also provides clues as to what might be causing it.
For C code, if the initial testcase is complex, it is recommended to run it through a preprocessor and strip away anything that does not directly contribute to the problem. Typedefs may be removed, macros are (obviously) expanded and conditionally compiled code does not get in the way. Function bodies should be replaced by extern declarations to the maximum extent possible.
For debugging front end issues, it is necessary to get rid of headers and as many redundant declarations as possible so that they don't get in the way. For back end problems, headers don't matter much but it is critical to minimize the amount of generated code and reduce its complexity. The only exception to this rule is a situation where the compiler crashes (not that this ever happens!), in which case executing the compiler under debugger control may be enough.
Tip: If the compiler is not crashing but dying with an internal error, place a breakpoint on a function called Zoiks.
The next step in debugging the compilers is getting a debug build. Apart from having full debugging information, debug builds include a number of routines to dump internal data structures and, especially in the case of the C++ front end, a number of additional sanity checks.
Depending on one's area of interest, a debug build of the code generator may or may not be needed. For work that is restricted purely to the language front end, this is not necessary. For anything involving lower level code generation, a debug build of the cg is needed. It is often useful even for bugs in the front ends where the calls to the cg are incorrect. Pre-made debug directories are supplied for Win32 and OS/2, for instance bld/cg/intel/386/nt386.dbg (Win32-hosted 386 code generator). Simply run wmake in the appropriate directory. Feel free to create new ones for your host platform.
Similarly, there are pre-made debug directories for the front ends, such as bld/cc/dnt386.386 (Win32-hosted 386 C front end). Note that the makefiles turn on debugging and wmake may be run without any additional input, unlike the usual situation where debugging needs to be explicitly requested.
Once a debug compiler is built, a number of avenues are available. One of the simplest is tracing of the cg library calls. This is accomplished using the -lc switch (C compiler only), or by specifying #pragma on( dump_cg ) in the source. Understanding the output takes a little practice, but isn't tremendously difficult.
Tracing the cg interface calls is often a good starting point when it isn't clear whether a problem is in the front end or back end. A review of the call trace typically shows whether the generated code corresponds to the calls (and hence the front end is incorrect) or the generated code is different (and hence the back end is incorrect).
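As a minimal sketch of the pragma route, the testcase below enables the trace from within the source. The `__WATCOMC__` guard is an addition for illustration so that other compilers skip the pragma; the trace itself is only produced by a debug build of the compiler, as described above.

```c
/* Minimal testcase with cg call tracing requested in the source.
 * The guard is an illustrative addition: other compilers do not know
 * this pragma, and only a debug build of the Open Watcom C compiler
 * is expected to act on it. */
#ifdef __WATCOMC__
#pragma on( dump_cg )
#endif

int foo( int a, int b )
{
    return( a > b ? 3 : 4 );
}
```

Keeping the testcase this small means the trace contains little more than the calls for one function, which makes it easier to compare against the generated code.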
The C front end does not offer a whole lot in the way of debug instrumentation and hence a debugger with cleverly planted breakpoints and strategically selected watches is by far the most productive tool.
There are, however, several useful routines callable from the debugger. Most of them are located in cfedump.c, but a few are also in cmac1.c.
The most basic way to use those routines is to simply execute them from the debugger using the call command, such as executing
call DumpProgram
on the debugger command line (reminder: the command line may be brought up by hitting colon, à la vi). This will execute the routine and print information to the debuggee's console.
A more interesting way is capturing the program's output and displaying it within the debugger. Note that this works well on OS/2 but may not work on other platforms, depending on debugger trap file capabilities.
vc DumpProgram
will call DumpProgram and View its Captured output in a debugger window.
The above mentioned DumpProgram routine will print a human-readable version of the cfe's representation of the source module. Examine the cfedump.c module to see which functions are available. Keep in mind that if a function takes arguments, you must supply them.
For instance, the DumpStmt function takes a single argument of TREEPTR type. If you call it without any argument, it's almost certainly going to crash. But suppose you're debugging the AddrFold function in cdinit.c and the debugger stopped in the middle of it. You could run
vc DumpStmt(CurFuncNode)
to see what goodies the CurFuncNode global is pointing to. But you can also execute
vc DumpStmt(tree)
to dump what the tree argument to AddrFold represents.
The debugger includes a good expression evaluator so you shouldn't feel constrained in what information you can dump; you could for instance run
vc DumpStmt(tree->left->right)
to dig a little deeper into the expression tree.
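To make the idea of a debugger-callable dump routine concrete, here is a small sketch of one that walks an expression tree. Everything in it is hypothetical: the `node` type and `dump_tree` are illustration only and are not the front end's TREEPTR or DumpStmt, which carry far more information.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical node layout, for illustration only. */
typedef struct node {
    const char  *op;        /* operator or leaf text */
    struct node *left;
    struct node *right;
} node;

/* Append an indented, one-node-per-line rendering of the tree to buf.
 * Writing to a buffer rather than the console keeps the sketch easy
 * to check; a real dump routine would simply print. */
static void dump_tree( const node *n, int depth, char *buf, size_t size )
{
    size_t len;

    if( n == NULL )
        return;
    len = strlen( buf );
    if( len < size )
        snprintf( buf + len, size - len, "%*s%s\n", depth * 2, "", n->op );
    dump_tree( n->left, depth + 1, buf, size );
    dump_tree( n->right, depth + 1, buf, size );
}
```

Passing a subtree such as the left or right child to a routine of this shape dumps only that part of the expression, which is the same idea as handing tree->left->right to DumpStmt.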
The C++ front end provides a good amount of debug instrumentation in several flavours. There is considerable passive debugging help in the form of debug assertions. In some cases, simply running a debug version of the compiler will trigger an assertion and point in the direction of the problem.
The C++ compiler also provides a way to dump various internal structures to the console. This is controlled through #pragma statements. See bld/plusplus/notes/debug.txt for further information.
For diagnosing many problems related to code generation, a good way to start is with a breakpoint at the call to FixReturns in function Generate (in generate.c). Generate is called for every function in the program to generate its code.
The code generator provides a wealth of debug routines to print internal data structures. These are mostly located in modules whose names start with dump. Taking a brief look at these modules to see what's available is highly recommended (needless to say, implementing additional dumping functions is always an option).
The most useful function is DumpBlk. This function will dump the pseudo-assembly representation of the current function, with some additional information. While this pseudo-assembly is not formally documented, reading it is easy for anyone passingly familiar with assembler. All instructions are simple operations such as MOV, ADD, XOR, CALL, NOP. Results and operands are 'names', often machine registers or memory locations, but also constants and (initially) temporaries. It is often instructive to watch how the pseudo-assembly changes in the process of optimization and code generation, and running DumpBlk after each function call and viewing its output often shows where the code generation went wrong.
An example may be helpful. Consider the following C source code:
int foo( int a, int b )
{
    return( a > b ? 3 : 4 );
}
Running DumpBlk early inside Generate will show the following output:
002A93B0 Block 1(1) L002A4260 foo Depth 0
----Jmp -------------LBL -------------------------------------------------------
00000000 Origins:
DADADADADADADADADADADADADADADADA DADADADA EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002A94F0 ( 1): nop XX
DADADADADADADADADADADADADADADADA DADADADA EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002A95F0 ( 2): parm I4 ==> EAX
DADADADADADADADADADADADADADADADA DADADADA EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002A9670 ( 3): cnv I4 I4 EAX ==> t1(a)
DADADADADADADADADADADADADADADADA DADADADA EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002A9730 ( 4): parm I4 ==> EDX
DADADADADADADADADADADADADADADADA DADADADA EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002A97B0 ( 5): cnv I4 I4 EDX ==> t2(b)
00000000000000000000000000000000 00000000
002A9488 Destinations: L002A9830
002A9FB0 Block 2(2) L002A9830 *** NULL *** Depth 0
--------Cond -------------------------------------------------------------------
00000000 Origins:
DADADADADADADADADADADADADADADADA DADADADA EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
Line number=3
002A9F30 ( 6): if > I4 T=0 t1(a), t2(b) then Block 7(3333) else Block 129(2793392)
00000000000000000000000000000000 00000000
002AA088 Destinations: L002AA1D0, L002A9E70
002A9870 Block 3(3) L002AA1D0 *** NULL *** label dies Depth 0
----Jmp ------------------------------------------------------------------------
00000000 Origins:
DADADADADADADADADADADADADADADADA DADADADA EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002AA250 ( 7): mov I4 00000003 ==> t4
00000000000000000000000000000000 00000000
002A9948 Destinations: L002A9EB0
002AA2D0 Block 4(4) L002A9E70 *** NULL *** label dies Depth 0
----Jmp ------------------------------------------------------------------------
00000000 Origins:
DADADADADADADADADADADADADADADADA DADADADA EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002AA3D0 ( 8): mov I4 00000004 ==> t4
00000000000000000000000000000000 00000000
002AA3A8 Destinations: L002A9EB0
002AA450 Block 5(5) L002A9EB0 *** NULL *** label dies Depth 0
Ret ----------------------------------------------------------------------------
00000000 Origins:
DADADADADADADADADADADADADADADADA DADADADA EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002AA550 ( 9): mov I4 t4 ==> t3
DADADADADADADADADADADADADADADADA DADADADA EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
Line number=4
002AA650 ( 10): mov U4 t3 ==> EAX
DADADADADADADADADADADADADADADADA DADADADA EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002AA5D0 ( 11): nop XX
00000000000000000000000000000000 00000000
002AA528 Destinations:
First off, the 0xDADA pattern is what the memory allocator pre-initializes memory to. Any memory containing this pattern is unused. The code is split into basic blocks, all nicely separated. Several types of blocks are shown here. The Jmp blocks simply transfer control to another block. The Cond block ends with a conditional and will jump to one of two basic blocks depending on the result of the comparison. The final block is marked with Ret and represents return from the function.
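The fill pattern technique can be sketched in a few lines of C. This is not the code generator's allocator; `debug_alloc` and `looks_untouched` are hypothetical names, and the only detail taken from the dump is the 0xDA byte value.

```c
#include <stdlib.h>
#include <string.h>

#define FILL_BYTE 0xDA      /* byte value seen in the dump above */

/* Hypothetical debug allocator: hands out memory pre-filled with the
 * pattern so that untouched fields stand out in a dump. */
static void *debug_alloc( size_t size )
{
    void *p = malloc( size );

    if( p != NULL )
        memset( p, FILL_BYTE, size );
    return( p );
}

/* Returns 1 if every byte still holds the fill pattern. */
static int looks_untouched( const void *p, size_t size )
{
    const unsigned char *bytes = p;
    size_t              i;

    for( i = 0; i < size; ++i ) {
        if( bytes[i] != FILL_BYTE )
            return( 0 );
    }
    return( 1 );
}
```

A field that still reads as DADADADA in a dump has most likely never been written, which is usually more informative than a field that happens to be zero.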
Where applicable, instructions are preceded by source code line number. Each instruction has its ID, initially sequential numbers, from 1 to 11 in the example. Every block has information about its origins (who jumps to it) and destinations (where it jumps to), but it hasn't been filled in yet. Note that each basic block except the first has one or more origins and each basic block except the last has one or two destinations. Blocks with no origins (which may arise during optimization) are dead code and will be culled.
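The relationship between destinations, origins and dead blocks can be sketched as follows. The `block` structure and both functions are hypothetical and much simpler than the code generator's own data structures; the sketch only assumes that a block has at most two destinations and that block 0 is the function entry.

```c
#define MAX_DEST 2
#define NO_BLOCK (-1)

/* Hypothetical, simplified basic block: only its successors are kept. */
typedef struct {
    int dest[MAX_DEST];     /* indices of destination blocks, NO_BLOCK if unused */
} block;

/* For each block, count how many blocks list it as a destination. */
static void count_origins( const block *blocks, int n, int *origins )
{
    int i, j;

    for( i = 0; i < n; ++i )
        origins[i] = 0;
    for( i = 0; i < n; ++i ) {
        for( j = 0; j < MAX_DEST; ++j ) {
            if( blocks[i].dest[j] != NO_BLOCK )
                origins[blocks[i].dest[j]]++;
        }
    }
}

/* Block 0 is the entry; any other block without origins is dead code. */
static int is_dead( const int *origins, int index )
{
    return( index != 0 && origins[index] == 0 );
}
```

Applied to the five blocks of the example, every block but the first ends up with at least one origin, so nothing would be culled at this stage.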
The first block represents function prolog and contains information about passed arguments. In this case, because the 386 compiler with register calling convention was used, the parameters (parm) were passed in registers EAX and EDX. Observe that the type of each instruction is provided, in this case I4 for signed 32-bit integer. These are converted (cnv) into temporaries t1 and t2. The dump provides information about the variable that a temporary corresponds to (if any); also note that the conversions do not really change type and will likely be eliminated shortly.
The second block contains a comparison instruction (if) with a 'greater than' condition code. The two temporaries t1 and t2 are compared and control will be transferred to one of two blocks that follow.
The next two blocks simply assign (mov) a constant, either 3 or 4, to temporary t4. The final block moves t4 to t3, which is then assigned to EAX and used as function return value.
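One way to read the five blocks is to write the same function in C with one label per basic block. This is only a reading aid: `foo_blocks` is a hypothetical name, and the variables are named after the temporaries in the dump.

```c
/* foo() restated so that each label corresponds to one basic block
 * of the initial dump; t1..t4 mirror the temporaries shown there. */
int foo_blocks( int a, int b )
{
    int t1, t2, t3, t4;

    /* Block 1 (Jmp): prolog, parameters are copied into temporaries */
    t1 = a;
    t2 = b;
    goto block2;
block2:     /* Block 2 (Cond): compare and pick one of two successors */
    if( t1 > t2 )
        goto block3;
    else
        goto block4;
block3:     /* Block 3 (Jmp) */
    t4 = 3;
    goto block5;
block4:     /* Block 4 (Jmp) */
    t4 = 4;
    goto block5;
block5:     /* Block 5 (Ret): result is moved to the return value */
    t3 = t4;
    return( t3 );
}
```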
After working through the pre-optimization steps and just before register allocation, the pseudo-assembly is much transformed:
002A93B0 Block 1(1) L002A4260 foo Depth 0
----Jmp -------------LBL -------------------------------------------------------
00000000 Origins:
IN 00000000000000000000000000000000 OUT 00000006000000000000000000000000
DEF 00000006000000000000000000000000 USE 00000000000000000000000000000000
LOAD 00000000000000000000000000000000 STOR 00000000000000000000000000000000
00000000000000000000000000000000 00000000 ESP:FS:ES:DS:CS:SS
002A94F0 ( 0): nop XX
00000000000000000000000000000000 00000000 ESP:FS:ES:DS:CS:SS
002A95F0 ( 10): parm I4 ==> EAX
00000000000000000000000000000000 00000000 EAX:ESP:FS:ES:DS:CS:SS
002A9670 ( 20): cnv I4 I4 EAX ==> t1(a)
00000004000000000000000000000000 00000000 ESP:FS:ES:DS:CS:SS
002A9730 ( 30): parm I4 ==> EDX
00000004000000000000000000000000 00000000 EDX:ESP:FS:ES:DS:CS:SS
002A97B0 ( 40): cnv I4 I4 EDX ==> t2(b)
00000006000000000000000000000000 00000000 ESP:FS:ES:DS:CS:SS
002AA250 ( 50): nop XX
00000006000000000000000000000000 00000000 ESP:FS:ES:DS:CS:SS
002A9488 Destinations: Block 2(5)
002AA450 Block 2(5) L002A9830 *** NULL *** label dies Depth 0
Ret ----------------------------------------------------------------------------
002A9488 Origins: Block 1(1)
IN 00000006000000000000000000000000 OUT 00000000000000000000000000000000
DEF 00000001000000000000000000000000 USE 00000006000000000000000000000000
LOAD 00000000000000000000000000000000 STOR 00000000000000000000000000000000
00000006000000000000000000000000 00000000 ESP:FS:ES:DS:CS:SS
Line number=3
002A9F30 ( 60): if <= I4 T=0 t1(a), t2(b) ==> t5
00000000000000000000000000000000 00000002 ESP:FS:ES:DS:CS:SS
002AA550 ( 70): cnv I4 U1 t5 ==> t6
00000000000000000000000000000000 00000001 ESP:FS:ES:DS:CS:SS
002AA9F0 ( 80): add I4 t6, 00000003 ==> t4
00000001000000000000000000000000 00000000 ESP:FS:ES:DS:CS:SS
Line number=4
002AA650 ( 90): mov U4 t4 ==> EAX
00000000000000000000000000000000 00000000 EAX:ESP:FS:ES:DS:CS:SS
002AA5D0 ( 100): nop XX
00000000000000000000000000000000 00000000 EAX:ESP:FS:ES:DS:CS:SS
002AA3D0 ( 110): nop XX
00000000000000000000000000000000 00000000 EAX:ESP:FS:ES:DS:CS:SS
002AA528 Destinations:
We only have two basic blocks now because the code generator figured out that the IA-32 SETcc instruction can be used for the conditional. The pseudo-assembly instruction is still called if, but now it is a conditional assignment, not conditional jump. The long rows of 0xDADA are gone too, mostly replaced by zeros. This data tells the code generator which registers are 'live' at each point. At the top of each block, a summary for the block is provided with information about registers that are INput or OUTput, DEFined and USEd, LOADed and STORed.
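The per-block summary can be related to the textbook liveness equation, in which a name is live on entry if the block uses it, or if it is live on exit and the block does not redefine it. Whether the code generator computes its sets exactly this way is an assumption, not something the dump confirms; the sketch merely agrees with the values shown, taking the leading eight hex digits of each set as a 32-bit mask.

```c
#include <stdint.h>

/* Textbook backward dataflow step (an assumption about the summary,
 * not taken from the code generator sources):
 *     IN = USE | ( OUT & ~DEF )                                   */
static uint32_t live_in( uint32_t use, uint32_t def, uint32_t live_out )
{
    return( use | ( live_out & ~def ) );
}
```

With the dump's values, the second block (USE 6, DEF 1, OUT 0) yields IN 6, and the first block (USE 0, DEF 6, OUT 6) yields IN 0, which matches the IN fields printed for both blocks.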
Some of the old temporaries are now gone but new ones have shown up (the temp numbers are not reused). The register allocator will get rid of them. After register allocation is performed and the code further optimized, the pseudo-assembly looks like this:
002A93B0 Block 1(1) L002A4260 foo Depth 0
----Jmp -------------LBL -------------------------------------------------------
00000000 Origins:
IN 00000000000000000000000000000000 OUT 00000006000000000000000000000000
DEF 00000006000000000000000000000000 USE 00000000000000000000000000000000
LOAD 00000000000000000000000000000000 STOR 00000000000000000000000000000000
00000000000000000000000000000000 00000000 EAX:EDX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
002A94F0 ( 0): nop XX
00000000000000000000000000000000 00000000 EAX:EDX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
002AA250 ( 10): nop XX
00000000000000000000000000000000 00000000 EAX:EDX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
002A9488 Destinations: L002A9830
002AA450 Block 2(5) L002A9830 *** NULL *** label dies Depth 0
Ret ----------------------------------------------------------------------------
00000000 Origins:
IN 00000006000000000000000000000000 OUT 00000000000000000000000000000000
DEF 00000001000000000000000000000000 USE 00000006000000000000000000000000
LOAD 00000000000000000000000000000000 STOR 00000000000000000000000000000000
00000000000000000000000000000000 00000000 EAX:EDX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
Line number=3
002A9F30 ( 20): if <= I4 T=0 EAX, EDX ==> AL
00000000000000000000000000000000 00000000 ESP:FS:ES:DS:CS:SS:AL:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
002AA550 ( 30): cnv I4 U1 AL ==> EAX
00000000000000000000000000000000 00000000 EAX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
002AA9F0 ( 40): add I4 EAX, 00000003 ==> EAX
00000000000000000000000000000000 00000000 EAX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
Line number=4
002AA5D0 ( 50): nop XX
00000000000000000000000000000000 00000000 EAX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
002AA3D0 ( 60): nop XX
00000000000000000000000000000000 00000000 EAX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
002AA528 Destinations:
The prolog is now empty because parameters arrived in registers and didn't need any further work. There are now only three instructions left. It may be easiest to compare them with the final generated code:
0000 foo_:
0000 39 D0 cmp eax,edx
0002 0F 9E C0 setle al
0005 0F B6 C0 movzx eax,al
0008 83 C0 03 add eax,0x00000003
000B C3 ret
The if pseudo-instruction got turned into two machine instructions, CMP and SETLE. Conversion from U1 type to I4 is handled by MOVZX, although it could also be implemented using AND. The final ADD looks the same in IA-32 assembler as it did in the pseudo-assembly, and of course there is now also a return instruction.
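The generated sequence can be modelled in C to check that it computes the same result as the original conditional expression. The function names are hypothetical, and the comments map each statement to the machine instruction it stands for.

```c
#include <stdint.h>

/* C model of the generated code for foo(); illustration only. */
int foo_setcc( int a, int b )
{
    uint8_t al  = ( a <= b );   /* cmp eax,edx ; setle al  (0 or 1) */
    int32_t eax = al;           /* movzx eax,al                     */

    return( eax + 3 );          /* add eax,3 ; ret                  */
}

/* The AND alternative mentioned above: since AL is the low byte of
 * EAX, masking EAX with 0xFF zero-extends the byte in place. */
uint32_t zero_extend_and( uint32_t eax )
{
    return( eax & 0xFFu );
}
```

The condition is inverted relative to the source (<= instead of >) because adding the 0 or 1 result to 3 then gives 3 when a > b and 4 otherwise.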