After tampering with the code and inevitably deciding to figure out a lot of stuffs by my own, I have stuck in certain topics and unstuck myself eventually. The following is a list of all such topics that might bring frustration to people (like me) who are uninitiated in the Conjuration magic of Compiler Design:
We use flags to indicate that the program is in a break chain, or a return chain, and there are two key issues we need to take care of:
- These flags must propogate through the chain, and most importantly whoever calls the codeblock() should follow up with checks against the flags. If the checks return True then the caller functions must return as well.
- These flags must be reset ONLY in the CALLER of functions they serve
- TODO: We can probably modify the logic of break flag -- instead of use two flags, maybe we can just use one flag but reset it in stmt(). There is no need to reset it in codeblock()
- break flag serves while statement, so we need to reset it in the CALLER of while statement, that is, either codeblock() or stmt(). codeblock() is a bit special as it is the caller of many other functions, thus we need to use a second break flag to tell codeblock() that we already broke out of the while loop we intended to
- return flag serves return statement, so we need to reset it in the CALLER of return statement, that is, functioncallstmt()
This is the most difficult one:
-
In the tokenizer, the
DEDENT
tokens must have the correct columns (check bookmarkrm00
intokenizer.py
) -
In the parser, whenever a
break
occurs, the program should perform the following:-
in break(), it must consume all DEDENT, including the one usually reserved for the parent codeblock() -- the one within while loop (
parser.py
, bookmarkrm01
) -
in break(), a flag must be set to pass to all its parent caller functions (
parser.py
, bookmarkrm01
) -
while
loops must set another flag to indicate that the code breaks out of thewhile
loop, everything in the loop has been skipped (including the saidDEDENT
reserved for thecodeblock
within thewhile
loop), seecodeblock()
for why we need a second flag (parser.py
, bookmarkrm02
) (parser.py
, bookmarkrm03
) -
Now
while
loops can have two kinds of call stacks, one iswhilestmt()
<-compoundstmt()
<-codeblock()
, this is when thewhile
loop is wrapped inside of acodeblock
, and the other iswhilestmt()
<-compoundstmt()
->stmt()
, this is when thewhile
loop is at the outmost layer, i.e. NOT wrapped within acodeblock
- For situation 1, the parent
codeblock()
should clear the two flags -- but we need to make sure that it IS the right one, e.g. thiscodeblock
actually wraps anif
which has abreak
, and both are within awhile
loop, so you have a call stack that looks like:
break()
<-simplestmt()
<-stmt()
<-codeblock()
<-ifstmt()
<-compoundstmt()
<-stmt()
<-codeblock()
<-whilestmt()
<-compoundstmt()
<-stmt()
<-codeblock()
<-...,and only the last one is the correct one to reset the two flags -- we need to make sure that the second flag has been set by the
while
loop -- which means thiscodeblock
is indeed the grandparent of the while loop- For situation 2, it's the same stuff but now it looks like:
break()
<-simplestmt()
<-stmt()
<-codeblock()
<-ifstmt()
<-compoundstmt()
<-stmt()
<-codeblock()
<-whilestmt()
<-compoundstmt()
<-stmt()
<-program()
<-...,Now that
stmt()
must clear the two flags -- and always check whether both are set. (check bookmarkrm04
intokenizer.py
)- We might be able to use the common parent node --
stmt()
to reset the two flags but this needs more experiments.
- For situation 1, the parent
-
Pre-requsite modifications:
The two symbol tables are to take the following formats:
- Entries that have a simple value is a variable
- Entires that have a complex value is a function imprint Maybe later we should just add a type field...
{
"a": 3,
"foo":{
"parameters": [
{"a": 1},
{"b": 2},
{"c": 3}
]
}
}
Consider the following code:
blah = 100
# Whatever before foo()
def foo(a, b, c):
d = a + b
e = 2 * c
while d <= e:
print(d)
if 5 * d < e:
d *= 2
else:
d += 1
e -= 1
def bar():
para1 = 2
para2 = 5
global blah
foo(para1, para2, blah)
# Whatever to be executed before
bar()
So imagine how the interpreter should behave when running this part of the code
blah = 100
Interpreter first check whether functioncalldepth
is 0:
- If
0
then we are in global, business usual - If not
0
then we are in local, checklocalsymboltable
and upsert properly
def foo(a, b, c):
Interpreter should call defstmt()
which stores the following information into globalsymboltable
(right now functions can only be defined in global scope):
"foo": <value of tokenindex>
And of course if "foo" is already being defined it raises a RuntimeError (However, why shouldn't we allow same name for variable and function?). Then whenever there is a function call to foo()
the interpreter can move tokenindex
to this place, after saving the return address of course.
Then the interpreter should skip all tokens until it hits a token that:
- Is of column 1 (back to global scope)
- Is not of
INDENT
orDEDENT
def bar():
Same story as point 2
bar()
This is a function call. The following should happen:
-
Check whether we are in global scope
- If
True
, we don't need to worry about backing up the local env - If
False
, we are making a function call within a function. We need to back up the local env and clear the local env for the upcoming callee function
- If
-
Push the return address (index of first token following the function call) onto the return address stack
-
Since there is no parameter, we keep
localsymboltable
as is (or clear it just in case) -
Check the global symbol table and jump to the token index of
bar()
codeblock
para1 = 2
Is para1
in globalvardeclared
?
- Yes -> Update
globalsymboltable
- No -> Upsert
para1
ontolocalsymboltable
para2 = 5
Same as above
global blah
Add blah
into the globalvardeclared
set
foo(para1, para2, blah)
This is a function call. The following should happen:
-
Check whether we are in global scope
- If
True
, we don't need to worry about backing up the local env - If
False
, we are making a function call within a function. We need to back up the local env and clear the local env for the upcoming callee function
- If
-
Push the return address (index of first token following the function call) onto the return address stack
-
There are three parameters. Push onto
localsymboltable
for the callee function to pick up later (recall that we already backup the local env intolocalsymboltablebackup
) -> This is to put the first value into the "parameters" list of "foo", and so on. -
Check the global symbol table and jump to the token index of
foo()
codeblock
d = a + b
The interpreter checks localsymboltable
and finds d
to be a new symbol. On the right side of the assignment both variables are known.
e = 2 * c
The interpreter checks localsymboltable
and finds d
to be a new symbol. On the right side of the assignment both variables are known.
while d <= e:
This goes into whilestmt()
print(d)
if 5 * d < e:
d *= 2
else:
d += 1
e -= 1
The only problem is that we must ensure that the function returns properly:
- Pops return address (index of token) from
returnaddrstack
and set tokenindex to it; - Decrease
functioncalldepth