Skip to content

Files

Latest commit

 

History

History

2-b_compiler

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
C BOOTSTRAP STAGE 2

The main product of stage 2 is a compiler.  The original intention had 
been to implement a B compiler, perhaps with the caveat that the few
constructs that changed syntactically between B and C would be 
implemented in the C way -- for example, += not =+, and 'extern' not
'extrn'.  But it rapidly became obvious that the B memory model was
not forwards compatible with the C memory model on machines with byte
addressing.  Implementing B on a x86 would require pointers to be
represented as integer offsets into memory, and a pointer dereference,
*ptr, would translate into a ModR/M + SIB instruction: (%ebx, %eax, 4) 
where %ebx is NULL and %eax is the pointer value.  Taking the address 
of an automatic variable would be worse and involve an explicit bit-
shift.

Nevertheless, B's lack of a type system significantly simplifies the 
implementation, and this feature of B has been retained in the stage 4 
compiler.  Our single type is a 32-bit integer which also serves as an
address.  Incrementing the value increments the underlying address by 
one, as with a char* in C.  This means, that unlike in B, incrementing 
an address does not move to the next integer in an array: use ptr += 4 
for that.  However, subscripting with [] works with 32-bit word offsets,
so that ptr[1] is equivalent to *(ptr + 4).  To treat a pointer as a
string and get character-level access, B uses two functions lchar(s,n)
and rchar(s,n,c) to get and set a character, respectively, at the given
offset.  These are provided in char.s.  When types are introduced in a
subsequent stage, this behaviour can be preserved because subscripting
an int (other than with a pointer) is not legal in C.

For forwards compatibility, certain type constructs are allowed and
completely ignored.  The (otherwise unsupported) int keyword may be
placed immediately after auto, and the identifiers in an auto 
declaration can be preceded with one or more *.  A list of parameter 
declarations may precede the opening brace of a function.

Summary of differences from the historical B:

  * Compound assignment operators are spelt OP= instead of =OP.
  * There are no relop assignment operators (e.g. =<, =>=, ===).
  * Definitions require '=' (i.e. 'i = 42' not 'i 42').
  * Arrays require a size (i.e. 'auto a[1] = {0}' not 'auto a[] = {0}').
  * Arrays with too many intialisers do not expand to accommodate them.
  * The '{' ... '}' around single-statement functions are required.
  * We support logical && and || complete with short circuiting.
  * We support the 'continue' keyword from C.
  * We support C's 'do' ... 'while' loop construct.
  * We allow 'static' on global variables and functions.
  * The return statement does not require brackets.
  * We don't allow backspace (character 0x7F) or dot (.) in identifiers.
  * The escape characters in strings is \ not *, and there is no \e.
  * We don't support the switch statement, or therefore case labels.
  * We don't support goto and labeled statements.
  * Not a difference, but B did not support 'for' loops and neither do we.

The stage 2 compiler is a simple afair, making a single pass over the 
input file and code generation is done straight out of the parser,
without building an abstract syntax tree (AST) representation.  This 
means that the code generated is very inefficient, and even very obvious
optimisations are not made.  As an example, all lvalue-to-rvalue
conversions are done as separate statements, so to read a local auto
variable, we generated LEA -offset(%ebp), %eax; MOVL (%eax), %eax
instead of the more obvious MOVL -offset(%ebp), %eax.

  Usage: bc -S file.b