CS3012: Formal Languages and Compilers
Intermediate Code Generation
Why use intermediate code?
analysis independent from target language
optimisation independent from target language
porting to new machines requires only a change of one component of the compiler
We will generate three-address code,
using syntax-directed definitions.
Three Address Code
Statements in this language are of the form:
x := y op z
where x, y and z are names, constants or compiler-generated temporary variables, and op stands for any operator.
A more complicated statement like
d := a+b*c
would have to be translated to
t1 := b * c
d := a + t1
where t1 is a compiler-generated temporary variable.
expression: a := b * c + b / c
postfix: a b c * b c / + :=
syntax tree (children indented under their parent):
:=
    a
    +
        *
            b
            c
        /
            b
            c
three-address code:
t1 := b * c
t2 := b / c
t3 := t1 + t2
a := t3
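The relationship between the three representations can be sketched in Python. This is only an illustration, not part of the course material: the tuple encoding of the tree and the helper names postfix and three_address are assumptions. Postfix is a post-order walk of the tree, and three-address code is the same walk with a fresh temporary naming each interior node.

```python
from itertools import count

# Assumed encoding: a tree is a name (str) or a tuple (op, left, right).
TREE = (":=", "a", ("+", ("*", "b", "c"), ("/", "b", "c")))

def postfix(tree):
    """Post-order traversal: operands first, then the operator."""
    if isinstance(tree, str):
        return [tree]
    op, left, right = tree
    return postfix(left) + postfix(right) + [op]

def three_address(tree):
    """Same traversal, but each interior node gets a temporary name."""
    code, temps = [], count(1)

    def walk(node):
        if isinstance(node, str):
            return node                      # a name is already an operand
        op, left, right = node
        l, r = walk(left), walk(right)
        t = f"t{next(temps)}"
        code.append(f"{t} := {l} {op} {r}")
        return t

    _, target, expr = tree                   # root is the assignment
    result = walk(expr)
    code.append(f"{target} := {result}")
    return code

postfix_text = " ".join(postfix(TREE))
tac = three_address(TREE)
print(postfix_text)
print("\n".join(tac))
```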
Three-address statements
x := y op z               assignment
x := op y                 unary assignment
x := y                    copy
goto L                    unconditional jump
if x relop y goto L       conditional jump
param x                   procedure call
call p n                  procedure call
return y                  procedure call
x := y[i]                 indexed assignment
x[i] := y                 indexed assignment
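To make the meaning of the jump statements concrete, here is a minimal interpreter sketch for a subset of these statements (assignment, copy, goto and conditional jump). The tuple encoding, the explicit "label" pseudo-statement and the function run are assumptions made for this illustration only.

```python
import operator

BINOPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
RELOPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq}

def run(program, env):
    """Execute tuple-encoded three-address statements; env maps names to values."""
    labels = {ins[1]: pc for pc, ins in enumerate(program) if ins[0] == "label"}

    def val(a):
        # Strings are names looked up in env; anything else is a constant.
        return env[a] if isinstance(a, str) else a

    pc = 0
    while pc < len(program):
        ins = program[pc]
        kind = ins[0]
        if kind == "binop":                  # x := y op z
            _, x, y, op, z = ins
            env[x] = BINOPS[op](val(y), val(z))
        elif kind == "copy":                 # x := y
            _, x, y = ins
            env[x] = val(y)
        elif kind == "goto":                 # goto L
            pc = labels[ins[1]]
            continue
        elif kind == "if":                   # if x relop y goto L
            _, x, relop, y, target = ins
            if RELOPS[relop](val(x), val(y)):
                pc = labels[target]
                continue
        pc += 1                              # "label" itself does nothing
    return env

# s := 1 + 2 + ... + n, written with jumps only
program = [
    ("copy", "s", 0),
    ("copy", "i", 1),
    ("label", "L1"),
    ("if", "n", "<", "i", "L2"),
    ("binop", "s", "s", "+", "i"),
    ("binop", "i", "i", "+", 1),
    ("goto", "L1"),
    ("label", "L2"),
]
result = run(program, {"n": 4})
print(result["s"])
```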
A Syntax-directed Translation
To generate three-address code from source, we will use syntax-directed definitions.
First, we will consider the language of assignments and expressions.
S will have one attribute "code", which will contain the three-address code fragment of the assignment.
E will have two attributes:
code - the corresponding code fragment
place - the name that will hold the value corresponding to E.
The notation gen(x ":=" y "+" z) represents
x := y + z
The notation <fragment> || expr means concatenate the expression onto the end of the code fragment.
1) S -> id := E
       S.code := E.code || gen(id.place ":=" E.place)
2) E1 -> E2 + E3
       E1.place := newtemp();
       E1.code := E2.code || E3.code || gen(E1.place ":=" E2.place "+" E3.place)
3) E1 -> E2 * E3
       E1.place := newtemp();
       E1.code := E2.code || E3.code || gen(E1.place ":=" E2.place "*" E3.place)
4) E1 -> -E2
       E1.place := newtemp();
       E1.code := E2.code || gen(E1.place ":=" "uminus" E2.place)
5) E1 -> ( E2 )
       E1.place := E2.place;
       E1.code := E2.code
6) E -> id
       E.place := id.place;
       E.code := ""
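The semantic rules can be sketched as one small Python function per production, each returning the synthesised attributes. This is a hedged illustration: the Attr class and the function names are assumptions, "||" is modelled as list concatenation, and the rule for parentheses is assumed to pass E2's place through unchanged, since it emits no code that could assign a fresh temporary.

```python
from dataclasses import dataclass
from itertools import count

_temps = count(1)

def newtemp():
    return f"t{next(_temps)}"

def gen(*parts):
    """Build a one-statement code fragment."""
    return [" ".join(parts)]

@dataclass
class Attr:
    place: str    # name holding the value of the expression
    code: list    # three-address statements computing it

def ident(name):                         # E -> id
    return Attr(place=name, code=[])

def binary(op, e2, e3):                  # E1 -> E2 + E3  |  E1 -> E2 * E3
    place = newtemp()
    return Attr(place, e2.code + e3.code + gen(place, ":=", e2.place, op, e3.place))

def negate(e2):                          # E1 -> -E2
    place = newtemp()
    return Attr(place, e2.code + gen(place, ":=", "uminus", e2.place))

def paren(e2):                           # E1 -> ( E2 ), assumed pass-through
    return Attr(e2.place, e2.code)

def assign(name, e):                     # S -> id := E
    return e.code + gen(name, ":=", e.place)

# a := b * c + b * -c, built bottom-up as a parser would reduce it
left = binary("*", ident("b"), ident("c"))
right = binary("*", ident("b"), negate(ident("c")))
code = assign("a", binary("+", left, right))
print("\n".join(code))
```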
a := b * c + b * -c
parse tree (children indented under their parent):
S
    a
    :=
    E8n
        E3n
            E1n
                b
            *
            E2n
                c
        +
        E7n
            E4n
                b
            *
            E6n
                -
                E5n
                    c
Constructing the Attributes
node   place   code
E1n    b
E2n    c
E3n    t1      E1n.code || E2n.code || t1 := b * c
E4n    b
E5n    c
E6n    t2      E5n.code || t2 := uminus c
E7n    t3      E4n.code || E6n.code || t3 := b * t2
E8n    t4      E3n.code || E7n.code || t4 := t1 + t3
S              E8n.code || a := t4
Flow of Control
We can extend that syntax-directed definition to handle flow of control statements:
S1 -> while E do S2
S1.begin := newlabel();
S1.after := newlabel();
S1.code := gen(S1.begin ":") || E.code ||
           gen("if" E.place "= 0 goto" S1.after) ||
           S2.code || gen("goto" S1.begin) || gen(S1.after ":")
The attributes "begin" and "after" will hold labels, and newlabel() will return a new label.
labels        code
              ...
S1.begin :    E.code
              if E.place = 0 goto S1.after
              S2.code
              goto S1.begin
S1.after :    ...
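The layout of the loop code can be sketched as a Python function that assembles the fragments in order. The function name gen_while and the sample condition and body are assumptions for illustration; code fragments are plain lists of strings.

```python
from itertools import count

_labels = count(1)

def newlabel():
    return f"L{next(_labels)}"

def gen_while(e_code, e_place, s2_code):
    """Assemble the code for S1 -> while E do S2 from already-translated parts."""
    begin, after = newlabel(), newlabel()
    return (
        [f"{begin}:"]                            # S1.begin
        + e_code                                 # evaluate the condition
        + [f"if {e_place} = 0 goto {after}"]     # leave the loop when E is 0
        + s2_code                                # loop body
        + [f"goto {begin}", f"{after}:"]         # jump back, then S1.after
    )

# Hypothetical loop: while (n - i) do i := i + 1
loop = gen_while(e_code=["t1 := n - i"], e_place="t1", s2_code=["i := i + 1"])
print("\n".join(loop))
```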
Looking up the Symbol Table
1) S -> id := E
       p := lookup(id.name);
       if p ≠ nil then emit(p ":=" E.place) else error
2) E1 -> E2 + E3
       E1.place := newtemp();
       emit(E1.place ":=" E2.place "+" E3.place)
3) E1 -> E2 * E3
       E1.place := newtemp();
       emit(E1.place ":=" E2.place "*" E3.place)
4) E1 -> -E2
       E1.place := newtemp();
       emit(E1.place ":= uminus" E2.place)
5) E1 -> ( E2 )
       E1.place := E2.place
6) E -> id
       p := lookup(id.name);
       if p ≠ nil then E.place := p else error
res := a * (alpha + -b)
Assume res, a, alpha and b have already been declared, and placed in the symbol table:
index   lexptr      token   attributes
5       -> res      ID_T
6       -> a        ID_T
7       -> alpha    ID_T
8       -> b        ID_T
processed string           attributes          output
res := a
res := E1                  E1.place = <6>
res := E1 * (alpha
res := E1 * (E2            E2.place = <7>
res := E1 * (E2 + -b
res := E1 * (E2 + -E3      E3.place = <8>
res := E1 * (E2 + E4       E4.place = <9>      <9> := uminus <8>
res := E1 * (E5            E5.place = <10>     <10> := <7> + <9>
res := E1 * (E5)
res := E1 * E6             E6.place = <10>
res := E7                  E7.place = <11>     <11> := <6> * <10>
S                                              <5> := <11>
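The trace can be reproduced with a short Python sketch of the lookup-based rules. It assumes that newtemp() allocates the next free symbol table index, which is what the places <9>, <10> and <11> in the trace suggest; the dictionary layout and function names are hypothetical.

```python
# Symbol table after the declarations: index -> lexeme (layout is an assumption).
symtab = {5: "res", 6: "a", 7: "alpha", 8: "b"}
output = []

def lookup(name):
    """Return the index of a declared name, or None if it is undeclared."""
    for index, lexeme in symtab.items():
        if lexeme == name:
            return index
    return None

def newtemp():
    # Assumption: temporaries take the next free symbol table index.
    index = max(symtab) + 1
    symtab[index] = f"temp{index}"
    return index

def emit(text):
    output.append(text)

def e_id(name):                          # E -> id
    p = lookup(name)
    if p is None:
        raise NameError(f"undeclared identifier: {name}")
    return p

def binary(op, e2, e3):                  # E1 -> E2 + E3  |  E1 -> E2 * E3
    place = newtemp()
    emit(f"<{place}> := <{e2}> {op} <{e3}>")
    return place

def negate(e2):                          # E1 -> -E2
    place = newtemp()
    emit(f"<{place}> := uminus <{e2}>")
    return place

def assign(name, e):                     # S -> id := E
    p = lookup(name)
    if p is None:
        raise NameError(f"undeclared identifier: {name}")
    emit(f"<{p}> := <{e}>")

# res := a * (alpha + -b), reduced in the same order as the trace
e1 = e_id("a")
e2 = e_id("alpha")
e4 = negate(e_id("b"))
e5 = binary("+", e2, e4)                 # the parentheses add no code
e7 = binary("*", e1, e5)
assign("res", e7)
print("\n".join(output))
```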
Arrays
We will store the elements of an array in a block of consecutive locations.
A is an array
w is the width of each element
low is the lower bound on the index
base is the address of A
The ith element of A begins at location:
base + (i - low) * w
or
i * w + (base - (low * w))
where c = base - (low * w) does not depend on i.
We then store c with A in the symbol table, and the address of A[i] is then c + (i * w)
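A quick numeric check of the rearrangement, as a Python sketch. The values chosen for base, low and w are made up for illustration.

```python
def precompute_c(base, low, w):
    """The part of the address that does not depend on the index."""
    return base - low * w

def address(c, i, w):
    """Address of A[i] using the precomputed constant."""
    return c + i * w

# Hypothetical array: starts at address 1000, indices from 1, 4-byte elements.
base, low, w = 1000, 1, 4
c = precompute_c(base, low, w)

# Both forms of the formula agree for every index tried.
for i in range(low, low + 10):
    assert address(c, i, w) == base + (i - low) * w

print(c, address(c, 3, w))
```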
Multi-dimensional Arrays
We will consider arrays stored row by row
low1 is the lower bound on the first index
low2 is the lower bound on the second index
n2 is the number of values the second index can take (its upper bound minus low2, plus 1)
The address of A[i,j] is:
base + ((i - low1) * n2 + (j - low2)) * w
or
((i * n2) + j) * w + (base - ((low1 * n2) + low2) * w)
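The row-by-row formula can be checked with a small Python sketch. The array shape and numbers below are hypothetical, and n2 is taken to be the number of elements in a row; with that reading the elements occupy consecutive locations, as a row-by-row layout requires.

```python
def address_2d(base, low1, low2, n2, w, i, j):
    """Direct form: skip (i - low1) full rows, then (j - low2) elements."""
    return base + ((i - low1) * n2 + (j - low2)) * w

def address_2d_split(base, low1, low2, n2, w, i, j):
    """Rearranged form: the second term can be computed at compile time."""
    constant = base - ((low1 * n2) + low2) * w
    return ((i * n2) + j) * w + constant

# Hypothetical array A[1..3, 1..5] of 4-byte elements at address 1000.
base, low1, low2, w = 1000, 1, 1, 4
rows, n2 = 3, 5

addresses = [
    address_2d(base, low1, low2, n2, w, i, j)
    for i in range(low1, low1 + rows)
    for j in range(low2, low2 + n2)
]
print(address_2d(base, low1, low2, n2, w, 2, 3))
```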
Grammar of Array References
The obvious grammar for indexing array elements is:
L -> id [ Elist ] | id
Elist -> Elist , E | E
We will use, however, a different grammar, that allows us to build up the index limits as we construct the Elists:
L -> Elist ] | id
Elist -> Elist , E | id [ E
We also need:
attributes:
Elist.array - the symbol table entry of the array
Elist.ndim - number of dimensions
Elist.place - temp value
L.place - position in symbol table
L.offset - offset into the array
functions:
limit(array,i) - the limit of the ith dimension of the array
c(array) - returns the pre-computed formula
width(array) - returns w
The syntax-directed definition
1) S -> L := E
       if L.offset = null
       then emit(L.place ":=" E.place)
       else emit(L.place "[" L.offset "] :=" E.place)
2) E1 -> E2 + E3
       E1.place := newtemp();
       emit(E1.place ":=" E2.place "+" E3.place)
3) E1 -> (E2)
       E1.place := E2.place
4) E -> L
       if L.offset = null
       then E.place := L.place
       else E.place := newtemp();
            emit(E.place ":=" L.place "[" L.offset "]")
5) L -> Elist ]
       L.place := newtemp();
       L.offset := newtemp();
       emit(L.place ":=" c(Elist.array));
       emit(L.offset ":=" Elist.place "*" width(Elist.array))
6) L -> id
       L.place := id.place;
       L.offset := null
7) Elist1 -> Elist2 , E
       t := newtemp();
       m := Elist2.ndim + 1;
       emit(t ":=" Elist2.place "*" limit(Elist2.array, m));
       emit(t ":=" t "+" E.place);
       Elist1.array := Elist2.array;
       Elist1.place := t;
       Elist1.ndim := m
8) Elist -> id [ E
       Elist.array := id.place;
       Elist.place := E.place;
       Elist.ndim := 1
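To see what these rules emit for a concrete statement, here is a Python sketch that applies rules 8, 7, 5, 4, 6 and 1 in the order a bottom-up parser would for x := A[i, j]. The ARRAYS table, its numbers (a 10 by 20 array of 4-byte elements with precomputed constant 916) and the function names are assumptions made for illustration.

```python
from itertools import count

# Hypothetical array information normally kept in the symbol table.
ARRAYS = {"A": {"c": 916, "width": 4, "limits": [10, 20]}}

_temps = count(1)
output = []

def newtemp():
    return f"t{next(_temps)}"

def emit(*parts):
    output.append(" ".join(str(p) for p in parts))

def limit(array, m):
    return ARRAYS[array]["limits"][m - 1]

def c(array):
    return ARRAYS[array]["c"]

def width(array):
    return ARRAYS[array]["width"]

def elist_start(array, e_place):             # 8) Elist -> id [ E
    return {"array": array, "place": e_place, "ndim": 1}

def elist_extend(elist2, e_place):           # 7) Elist1 -> Elist2 , E
    t = newtemp()
    m = elist2["ndim"] + 1
    emit(t, ":=", elist2["place"], "*", limit(elist2["array"], m))
    emit(t, ":=", t, "+", e_place)
    return {"array": elist2["array"], "place": t, "ndim": m}

def l_from_elist(elist):                     # 5) L -> Elist ]
    place, offset = newtemp(), newtemp()
    emit(place, ":=", c(elist["array"]))
    emit(offset, ":=", elist["place"], "*", width(elist["array"]))
    return {"place": place, "offset": offset}

def l_from_id(name):                         # 6) L -> id
    return {"place": name, "offset": None}

def e_from_l(l):                             # 4) E -> L
    if l["offset"] is None:
        return l["place"]
    place = newtemp()
    emit(f"{place} := {l['place']}[{l['offset']}]")
    return place

def assign(l, e_place):                      # 1) S -> L := E
    if l["offset"] is None:
        emit(l["place"], ":=", e_place)
    else:
        emit(f"{l['place']}[{l['offset']}] := {e_place}")

# x := A[i, j]
rhs = e_from_l(l_from_elist(elist_extend(elist_start("A", "i"), "j")))
assign(l_from_id("x"), rhs)
print("\n".join(output))
```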
Type Conversion
We have seen before how to compute the type expression for complex expressions using more than one data type.
It is the job of the compiler to construct the necessary three-address code to do any automatic type conversion required.
We will assume that there are two basic types, integer and real, and we may have to convert integers to reals.
We assume that there is a function
inttoreal
and two different "+" operators:
int+
real+
Semantic rule for E1 -> E2 + E3
E1.place := newtemp();
if E2.type = integer and E3.type = integer then
begin
    emit(E1.place ":=" E2.place "int+" E3.place);
    E1.type := integer
end
else if E2.type = real and E3.type = real then
begin
    emit(E1.place ":=" E2.place "real+" E3.place);
    E1.type := real
end
else if E2.type = integer and E3.type = real then
begin
    u := newtemp();
    emit(u ":= inttoreal" E2.place);
    emit(E1.place ":=" u "real+" E3.place);
    E1.type := real
end
else if E2.type = real and E3.type = integer then
begin
    u := newtemp();
    emit(u ":= inttoreal" E3.place);
    emit(E1.place ":=" E2.place "real+" u);
    E1.type := real
end
else E1.type := type_error;
We would also require similar semantic rules for
E1 -> E2 * E3
using operators "int*" and "real*".
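The rule for "+" can be sketched in Python as follows. This is an illustration under assumptions: expressions are represented as (place, type) pairs, the function name add is hypothetical, and the integer operand is the one converted in each mixed case.

```python
from itertools import count

_temps = count(1)
output = []

def newtemp():
    return f"t{next(_temps)}"

def emit(text):
    output.append(text)

def add(e2, e3):
    """E1 -> E2 + E3 with automatic integer-to-real conversion."""
    (p2, t2), (p3, t3) = e2, e3
    place = newtemp()
    if t2 == "integer" and t3 == "integer":
        emit(f"{place} := {p2} int+ {p3}")
        return place, "integer"
    if t2 == "real" and t3 == "real":
        emit(f"{place} := {p2} real+ {p3}")
        return place, "real"
    if t2 == "integer" and t3 == "real":
        u = newtemp()
        emit(f"{u} := inttoreal {p2}")       # convert the left operand
        emit(f"{place} := {u} real+ {p3}")
        return place, "real"
    if t2 == "real" and t3 == "integer":
        u = newtemp()
        emit(f"{u} := inttoreal {p3}")       # convert the right operand
        emit(f"{place} := {p2} real+ {u}")
        return place, "real"
    return place, "type_error"

# Hypothetical operands: x is real, i is integer.
result = add(("x", "real"), ("i", "integer"))
print(result)
print("\n".join(output))
```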
Generating the code
processed string              attributes          output
id1 := id2 * (id3 + -id4)
id1 := E1 * (id3 + -id4)      E1.place = <6>
id1 := E1 * (E2 + -id4)       E2.place = <7>
id1 := E1 * (E2 + -E3)        E3.place = <8>
id1 := E1 * (E2 + E4)         E4.place = <9>      <9> := uminus <8>
id1 := E1 * (E5)              E5.place = <10>     <10> := <7> + <9>
id1 := E1 * E6                E6.place = <10>
id1 := E7                     E7.place = <11>     <11> := <6> * <10>
S                                                 <5> := <11>