Date post: | 29-Dec-2015 |
Upload: | nicholas-phelps |
Structure of a Compiler
• The front end of a compiler is efficient and can be automated
• The back end is generally hard to automate, and finding the optimum solution requires exponential time
• Intermediate code generation can affect the performance of the back end
[Figure: compiler pipeline: Scanner → Parser → Semantic Analysis → Intermediate Code Generation → IR → Code Optimization → Instruction Selection → Instruction Scheduling → Register Allocation]
Intermediate Representations
• Abstract Syntax Trees (AST)
• Directed Acyclic Graphs (DAG)
• Control Flow Graphs (CFG)
• Static Single Assignment Form (SSA)
• Stack Machine Code
• Three Address Code
• Hybrid approaches mix graphical and linear representations
– SGI and Sun compilers use three-address code but provide ASTs for loops, if-statements, and array references
– Use three-address code in basic blocks in control flow graphs
[Figure annotations: the representations above are ordered from high-level to low-level and are grouped into graphical IRs and linear IRs]
Abstract Syntax Trees (ASTs)
if (x < y)
x = 5*y + 5*y/3;
else
y = 5;
x = x+y;
Statements
  IfStmt
    <
      x
      y
    AssignStmt
      x
      +
        *
          5
          y
        /
          *
            5
            y
          3
    AssignStmt
      y
      5
  AssignStmt
    x
    +
      x
      y
Directed Acyclic Graphs (DAGs)
• Use directed acyclic graphs to represent expressions
– Use a unique node for each expression
if (x < y)
x = 5*y + 5*y/3;
else
y = 5;
x = x+y;
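[Figure: DAG for the fragment above. The structure matches the AST (Statements, IfStmt, <, AssignStmt, +, *, /), but each distinct expression has a single node, so the leaves x, y, and 5 and the subexpression 5*y are shared rather than duplicated.]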
Control Flow Graphs (CFGs)
• Nodes in the control flow graph are basic blocks
– A basic block is a sequence of statements always entered at the beginning of the block and exited at the end
• Edges in the control flow graph represent the control flow
if (x < y)
x = 5*y + 5*y/3;
else
y = 5;
x = x+y;
B0: if (x < y) goto B1 else goto B2
B1: x = 5*y + 5*y/3
B2: y = 5
B3: x = x+y
Edges: B0 → B1, B0 → B2, B1 → B3, B2 → B3
• Each block has a sequence of statements
• No jump from or to the middle of the block
• Once a block starts executing, it will execute till the end
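The block properties above suggest how a linear instruction list can be cut into basic blocks. The following is a minimal sketch, assuming the common "leader" rule (the first instruction, every jump target, and every instruction that follows a jump start a new block), which the slides do not spell out. The tuple layout `(label, text, jump_targets)` and the name `split_basic_blocks` are hypothetical, chosen only for illustration.

```python
def split_basic_blocks(instrs):
    """Split a list of (label, text, jump_targets) tuples into basic blocks.

    label is a string or None; jump_targets lists the labels the
    instruction may jump to (empty for ordinary statements).
    """
    targets = {t for _, _, jumps in instrs for t in jumps}
    leaders = set()
    for i, (label, _, jumps) in enumerate(instrs):
        if i == 0 or label in targets:
            leaders.add(i)          # first instruction or a jump target
        if jumps and i + 1 < len(instrs):
            leaders.add(i + 1)      # instruction following a jump
    order = sorted(leaders)
    blocks = []
    for k, start in enumerate(order):
        end = order[k + 1] if k + 1 < len(order) else len(instrs)
        blocks.append(instrs[start:end])
    return blocks


# The running example, written with an explicit jump at the end of B1.
PROGRAM = [
    (None, "if x < y goto B1 else goto B2", ["B1", "B2"]),
    ("B1", "x = 5*y + 5*y/3", []),
    (None, "goto B3", ["B3"]),
    ("B2", "y = 5", []),
    ("B3", "x = x+y", []),
]

for block in split_basic_blocks(PROGRAM):
    print([text for _, text, _ in block])
```

Under these assumptions the example splits into four blocks, corresponding to B0 through B3; B2 reaches B3 by falling through.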
Stack Machine Code
if (x < y)
x = 5*y + 5*y/3;
else
y = 5;
x = x+y;

      load x      (pushes the value at the location x to the stack)
      load y
      iflt L1     (pops the top two elements and compares them)
      goto L2
L1:   push 5
      load y
      multiply    (pops the top two elements, multiplies them, and pushes the result back to the stack)
      push 5
      load y
      multiply
      push 3
      divide
      add
      store x     (stores the value at the top of the stack to the location x)
      goto L3
L2:   push 5
      store y
L3:   load x
      load y
      add
      store x
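To make the instruction semantics concrete, here is a minimal sketch of an interpreter for the stack machine code of this slide. The tuple encoding `(label, op, arg)`, the function name `run`, and the choice of integer division for `divide` are assumptions made for illustration; the slides do not define them.

```python
def run(program, memory):
    """Execute (label, op, arg) tuples using an operand stack."""
    labels = {lab: i for i, (lab, _, _) in enumerate(program) if lab}
    stack, pc = [], 0
    while pc < len(program):
        _, op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)               # push a constant
        elif op == "load":
            stack.append(memory[arg])       # push the value at location arg
        elif op == "store":
            memory[arg] = stack.pop()       # pop the top into location arg
        elif op in ("add", "multiply", "divide"):
            right, left = stack.pop(), stack.pop()
            if op == "add":
                stack.append(left + right)
            elif op == "multiply":
                stack.append(left * right)
            else:
                stack.append(left // right)  # assumption: integer division
        elif op == "iflt":
            right, left = stack.pop(), stack.pop()
            if left < right:                # jump when first operand < second
                pc = labels[arg]
        elif op == "goto":
            pc = labels[arg]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


PROGRAM = [
    (None, "load", "x"), (None, "load", "y"),
    (None, "iflt", "L1"), (None, "goto", "L2"),
    ("L1", "push", 5), (None, "load", "y"), (None, "multiply", None),
    (None, "push", 5), (None, "load", "y"), (None, "multiply", None),
    (None, "push", 3), (None, "divide", None), (None, "add", None),
    (None, "store", "x"), (None, "goto", "L3"),
    ("L2", "push", 5), (None, "store", "y"),
    ("L3", "load", "x"), (None, "load", "y"),
    (None, "add", None), (None, "store", "x"),
]

print(run(PROGRAM, {"x": 1, "y": 2}))
print(run(PROGRAM, {"x": 3, "y": 2}))
```

With x = 1 and y = 2 the true branch runs; with x = 3 and y = 2 the else branch runs. In both cases the final `x = x+y` executes at L3.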
Three-Address Code
• Each instruction can have at most three operands
• Assignments
– x := y
– x := y op z      op: binary arithmetic or logical operators
– x := op y        op: unary operators (minus, negation, integer to float conversion)
• Branch
– goto L           Execute the statement labeled L next
• Conditional Branch
– if x relop y goto L      relop: <, >, <=, >=, ==, !=
• if the condition holds, we execute the statement labeled L next
• if the condition does not hold, we execute the statement following this statement next
Data structures for three address codes
• Quadruples
– Have four fields: op, arg1, arg2, and result
• Triples
– Temporaries are not used; instead, instructions refer to the results of other instructions
• Indirect triples
– In addition to the triples, we use a list of pointers to triples
Quadruples
• A record structure with 4 fields
– op, arg1, arg2 and result
• Examples
– For x := y op z we have:
• y in arg1, z in arg2 and x in result
– For unary operators, arg2 not used
• The contents of the fields are pointers to symbol table (ST) entries
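As an illustration of the record layout, here is a minimal sketch of quadruples for a = b * minus c + b * minus c. It assumes plain strings stand in for pointers to symbol table entries; `Quad` and `evaluate` are hypothetical names, and the small evaluator exists only to check that the sequence computes what the expression says.

```python
from collections import namedtuple

# One record with the four fields named on the slide.
Quad = namedtuple("Quad", ["op", "arg1", "arg2", "result"])

quads = [
    Quad("minus", "c", None, "t1"),   # unary operator: arg2 is not used
    Quad("*", "b", "t1", "t2"),
    Quad("minus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad("=", "t5", None, "a"),
]


def evaluate(quads, env):
    """Run the quadruples over a name -> value mapping (a stand-in for the ST)."""
    env = dict(env)
    for q in quads:
        left = env[q.arg1]
        if q.op == "minus":
            env[q.result] = -left
        elif q.op == "=":
            env[q.result] = left
        elif q.op == "*":
            env[q.result] = left * env[q.arg2]
        elif q.op == "+":
            env[q.result] = left + env[q.arg2]
        else:
            raise ValueError(f"unknown operator: {q.op}")
    return env


print(evaluate(quads, {"b": 2, "c": 3})["a"])
```

Note that every temporary (t1 to t5) needs an entry in the mapping, which mirrors the point that temporaries generated for quadruples must be entered in the symbol table.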
Triples
• Temps generated in quadruples must be entered in the symbol table
• To avoid this, we can refer to a temp value by the location of the statement that computes it
– We can have records with only 3 fields
• op, arg1 and arg2
– Fields arg1 and arg2 can be pointers to ST entries or to triple structures for temp values
Indirect Triples
• Listing of pointers to triples, rather than triples themselves
• Example
– We can use an array to list pointers to triples in the desired order
Example
• b * minus c + b * minus c

Three address code
  t1 = minus c
  t2 = b * t1
  t3 = minus c
  t4 = b * t3
  t5 = t2 + t4
  a = t5

Quadruples
  op      arg1   arg2   result
  minus   c             t1
  *       b      t1     t2
  minus   c             t3
  *       b      t3     t4
  +       t2     t4     t5
  =       t5            a

Triples
       op      arg1   arg2
  0    minus   c
  1    *       b      (0)
  2    minus   c
  3    *       b      (2)
  4    +       (1)    (3)
  5    =       a      (4)

Indirect Triples
  Instruction list:
       op
  35   (0)
  36   (1)
  37   (2)
  38   (3)
  39   (4)
  40   (5)
  Triples:
       op      arg1   arg2
  0    minus   c
  1    *       b      (0)
  2    minus   c
  3    *       b      (2)
  4    +       (1)    (3)
  5    =       a      (4)
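The triples and indirect triples of this example can be sketched as follows. This is an illustration under stated assumptions: an integer argument stands for a reference to another triple, a string stands for a symbol table entry, and `run_triples` is a hypothetical helper used only to check the result.

```python
# Triples: no result field; an int argument refers to the triple at that index.
triples = [
    ("minus", "c", None),   # (0)
    ("*", "b", 0),          # (1)
    ("minus", "c", None),   # (2)
    ("*", "b", 2),          # (3)
    ("+", 1, 3),            # (4)
    ("=", "a", 4),          # (5)
]

# Indirect triples: a separate list gives the execution order.
order = [0, 1, 2, 3, 4, 5]


def run_triples(triples, order, env):
    """Evaluate triples in the given order over a name -> value mapping."""
    env = dict(env)
    value = {}                       # triple index -> computed value

    def val(arg):
        return value[arg] if isinstance(arg, int) else env[arg]

    for i in order:
        op, arg1, arg2 = triples[i]
        if op == "minus":
            value[i] = -val(arg1)
        elif op == "*":
            value[i] = val(arg1) * val(arg2)
        elif op == "+":
            value[i] = val(arg1) + val(arg2)
        elif op == "=":
            env[arg1] = val(arg2)
        else:
            raise ValueError(f"unknown operator: {op}")
    return env


print(run_triples(triples, order, {"b": 2, "c": 3})["a"])
# Reordering only touches the order list; the triples keep their indices.
print(run_triples(triples, [2, 3, 0, 1, 4, 5], {"b": 2, "c": 3})["a"])
```

The second call hints at the design motivation for the extra list: instructions can be moved by permuting `order`, without renumbering the references stored inside the triples.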
Examples
1. X=(a+b)*-c/d
Three-Address Code Generation for a Simple Grammar
Productions and Semantic Rules
S → id := E
    id.place ← lookup(id.name);
    S.code ← E.code || gen(id.place ':=' E.place);
E → E1 + E2
    E.place ← newtemp();
    E.code ← E1.code || E2.code || gen(E.place ':=' E1.place '+' E2.place);
E → E1 * E2
    E.place ← newtemp();
    E.code ← E1.code || E2.code || gen(E.place ':=' E1.place '*' E2.place);
E → ( E1 )
    E.code ← E1.code;
    E.place ← E1.place;
E → − E1
    E.place ← newtemp();
    E.code ← E1.code || gen(E.place ':=' 'uminus' E1.place);
E → id
    E.place ← lookup(id.name);
    E.code ← '' (empty string)

Attributes:
  E.place: location that holds the value of expression E
  E.code: sequence of instructions that are generated for E
Procedures:
  newtemp(): Returns a new temporary each time it is called
  gen(): Generates an instruction (has to be called with appropriate arguments)
  lookup(id.name): Returns the location of id from the symbol table
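The semantic rules above can be sketched as a recursive walk over an expression tree. This is a minimal illustration, not the slides' implementation: the tuple-based AST (`("id", name)`, `("uminus", e)`, `("+", e1, e2)`, `("*", e1, e2)`) and the class name `TacGen` are assumptions, `lookup(id.name)` is approximated by using the name itself, and temporaries are allocated after the operands have been translated.

```python
import itertools


class TacGen:
    """Returns (place, code) pairs, mirroring E.place and E.code."""

    def __init__(self):
        self._temps = itertools.count(1)

    def newtemp(self):
        return f"t{next(self._temps)}"

    def expr(self, node):
        kind = node[0]
        if kind == "id":
            return node[1], []                 # E.place is the id, E.code is empty
        if kind == "uminus":
            p1, c1 = self.expr(node[1])
            place = self.newtemp()
            return place, c1 + [f"{place} := uminus {p1}"]
        if kind in ("+", "*"):
            p1, c1 = self.expr(node[1])
            p2, c2 = self.expr(node[2])
            place = self.newtemp()
            return place, c1 + c2 + [f"{place} := {p1} {kind} {p2}"]
        raise ValueError(f"unknown node: {kind}")

    def assign(self, name, node):
        place, code = self.expr(node)
        return code + [f"{name} := {place}"]   # S.code


b_times_minus_c = ("*", ("id", "b"), ("uminus", ("id", "c")))
code = TacGen().assign("a", ("+", b_times_minus_c, b_times_minus_c))
print("\n".join(code))
```

Parenthesized expressions need no node of their own here, which matches the rule for E → ( E1 ): it passes E1.code and E1.place through unchanged.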
Stack Machine Code Generation for a Simple Grammar
Productions and Semantic Rules
S → id := E
    id.place ← lookup(id.name);
    S.code ← E.code || gen('store' id.place);
E → E1 + E2
    E.code ← E1.code || E2.code || gen('add');
    (arguments for the add instruction are at the top of the stack)
E → E1 * E2
    E.code ← E1.code || E2.code || gen('multiply');
E → ( E1 )
    E.code ← E1.code;
E → − E1
    E.code ← E1.code || gen('negate');
E → id
    E.code ← gen('load' id.place)

Attributes:
  E.code: sequence of instructions that are generated for E
  (no place for an expression is needed since the result of an expression is stored in the operand stack)
Procedures:
  newtemp(): Returns a new temporary each time it is called
  gen(): Generates an instruction (has to be called with appropriate arguments)
  lookup(id.name): Returns the location of id from the symbol table
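For comparison with the three-address version, the stack machine rules can be sketched in a few lines. The tuple-based AST and the function names `gen_expr` and `gen_assign` are assumptions for illustration, and identifier names stand in for the locations returned by lookup.

```python
def gen_expr(node):
    """Return E.code for an expression; the result ends up on the stack."""
    kind = node[0]
    if kind == "id":
        return [f"load {node[1]}"]
    if kind == "uminus":
        return gen_expr(node[1]) + ["negate"]
    if kind == "+":
        return gen_expr(node[1]) + gen_expr(node[2]) + ["add"]
    if kind == "*":
        return gen_expr(node[1]) + gen_expr(node[2]) + ["multiply"]
    raise ValueError(f"unknown node: {kind}")


def gen_assign(name, node):
    """Return S.code for `name := expression`."""
    return gen_expr(node) + [f"store {name}"]


# a := b + c * -d
tree = ("+", ("id", "b"), ("*", ("id", "c"), ("uminus", ("id", "d"))))
print("\n".join(gen_assign("a", tree)))
```

No temporaries appear in the output: the operand stack plays the role that E.place plays in the three-address translation.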
Code Generation for Boolean Expressions
• Two approaches
– Numerical representation
– Implicit representation
• Numerical representation
– Use 1 to represent true, use 0 to represent false
– For three-address code, store this result in a temporary
– For stack machine code, store this result on the stack
• Implicit representation
– Boolean expressions used in flow-of-control statements (such as if-statements and while-statements) do not have to compute a value explicitly; they just need to branch to the right instruction
– Generate code for boolean expressions that branches to the appropriate instruction based on the result of the boolean expression
Boolean Expressions: Numerical Representation
Attributes:
  E.place: location that holds the value of expression E
  E.code: sequence of instructions that are generated for E
  id.place: location for id
Global variable:
  nextstat: holds the location of the next instruction to be generated (each call to gen() increments nextstat by 1)

Productions and Semantic Rules
E → id1 relop id2
    E.place ← newtemp();
    E.code ← gen('if' id1.place relop.op id2.place 'goto' nextstat+3)
             || gen(E.place ':=' '0')
             || gen('goto' nextstat+2)
             || gen(E.place ':=' '1');
E → E1 and E2
    E.place ← newtemp();
    E.code ← E1.code || E2.code
             || gen(E.place ':=' E1.place 'and' E2.place);
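The role of nextstat in these rules can be seen in a small sketch. The class `Emitter`, its starting address of 100, and the method names are assumptions for illustration; the point is only that `nextstat+3` and `nextstat+2` are evaluated at the moment each instruction is generated.

```python
class Emitter:
    def __init__(self, start=100):
        self.start = start
        self.code = []
        self._ntemps = 0

    @property
    def nextstat(self):
        # Location of the next instruction to be generated.
        return self.start + len(self.code)

    def gen(self, text):
        self.code.append(f"{self.nextstat} {text}")

    def newtemp(self):
        self._ntemps += 1
        return f"t{self._ntemps}"

    def relop(self, left, op, right):
        """E -> id1 relop id2: leaves 1 or 0 in a fresh temporary."""
        place = self.newtemp()
        self.gen(f"if {left} {op} {right} goto {self.nextstat + 3}")
        self.gen(f"{place} := 0")
        self.gen(f"goto {self.nextstat + 2}")
        self.gen(f"{place} := 1")
        return place

    def and_(self, place1, place2):
        """E -> E1 and E2: combines two already computed values."""
        place = self.newtemp()
        self.gen(f"{place} := {place1} and {place2}")
        return place


em = Emitter()
t1 = em.relop("x", "<", "y")
t2 = em.relop("a", "=", "b")
em.and_(t1, t2)
print("\n".join(em.code))
```

Both operands are always evaluated in this representation; there is no short-circuiting, because the `and` is computed on the stored 0/1 values.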
Boolean Expressions: Implicit Representation
Productions and Semantic Rules
E → id1 relop id2
    E.code ← gen('if' id1.place relop.op id2.place 'goto' E.true)
             || gen('goto' E.false);
E → E1 and E2
    E1.true ← newlabel();
    E1.false ← E.false;   (short-circuiting)
    E2.true ← E.true;
    E2.false ← E.false;
    E.code ← E1.code || gen(E1.true ':') || E2.code;

Attributes:
  E.code: sequence of instructions that are generated for E
  E.false: instruction to branch to if E evaluates to false
  E.true: instruction to branch to if E evaluates to true
  (E.code is synthesized whereas E.true and E.false are inherited)
  id.place: location for id
Notes:
  relop.op can be any relational operator: <, >, ==, <=, >=, !=
  The places for E.true and E.false in the relop rule will be filled with labels later on, when they become available
  The label generated for E1.true will be inserted into the place for E1.true in the code generated for E1
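A minimal sketch of the implicit representation follows. To keep it to a single pass, the inherited attributes E.true and E.false are passed down as function parameters, which sidesteps the "filled in later" problem rather than solving it. The tuple-based AST and the names `label_source` and `gen_bool` are assumptions for illustration.

```python
import itertools


def label_source():
    """Return a newlabel() function producing L1, L2, ..."""
    counter = itertools.count(1)
    return lambda: f"L{next(counter)}"


def gen_bool(node, true, false, newlabel):
    """Generate jumping code; `true` and `false` are inherited labels.

    node is ("relop", op, left, right) or ("and", e1, e2).
    """
    if node[0] == "relop":
        _, op, left, right = node
        return [f"if {left} {op} {right} goto {true}", f"goto {false}"]
    if node[0] == "and":
        _, e1, e2 = node
        e1_true = newlabel()
        # Short-circuiting: if e1 is false, e2 is never evaluated.
        return (gen_bool(e1, e1_true, false, newlabel)
                + [f"{e1_true}:"]
                + gen_bool(e2, true, false, newlabel))
    raise ValueError(f"unknown node: {node[0]}")


expr = ("and", ("relop", "<", "x", "y"), ("relop", "==", "a", "b"))
code = gen_bool(expr, "LTrue", "LFalse", label_source())
print("\n".join(code))
```

No value is stored anywhere: the outcome of the expression is represented only by which label control reaches.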
Example
Input boolean expression:
x < y and a == b

Numerical representation:
  100 if x < y goto 103
  101 t1 := 0
  102 goto 104
  103 t1 := 1
  104 if a = b goto 107
  105 t2 := 0
  106 goto 108
  107 t2 := 1
  108 t3 := t1 and t2
(The numbers 100 to 108 are the locations of three-address code instructions; they are not labels.)

Implicit representation:
          if x < y goto L1
          goto LFalse
  L1:     if a = b goto LTrue
          goto LFalse
          ...
  LTrue:
  LFalse:
(These labels will be generated later on and inserted into the corresponding places.)
Flow-of-Control Statements
If-then-else
• Branch based on the result of boolean expression
Loops
• Evaluate condition before loop (if needed)
• Evaluate condition after loop
• Branch back to the top if condition holds
Merges test with last block of loop body
While, for, do, and until all fit this basic model
[Figure: loop structure: Pre-test → Loop body → Post-test → Next block, with the post-test branching back to the top of the loop body]
Flow-of-Control Statements: Code Structure
S → if E then S1 else S2

            E.code         (jumps to E.true if E evaluates to true, to E.false if E evaluates to false)
  E.true:   S1.code
            goto S.next
  E.false:  S2.code
  S.next:   ...

S → while E do S1

  S.begin:  E.code         (jumps to E.true or to E.false)
  E.true:   S1.code
            goto S.begin
  E.false:  ...

Another approach is to place E.code after S1.code
Flow-of-Control Statements
Productions and Semantic Rules
S → if E then S1 else S2
    E.true ← newlabel();
    E.false ← newlabel();
    S1.next ← S.next;
    S2.next ← S.next;
    S.code ← E.code || gen(E.true ':') || S1.code
             || gen('goto' S.next) || gen(E.false ':') || S2.code;
S → while E do S1
    S.begin ← newlabel();
    E.true ← newlabel();
    E.false ← S.next;
    S1.next ← S.begin;
    S.code ← gen(S.begin ':') || E.code || gen(E.true ':') || S1.code
             || gen('goto' S.begin);
S → S1 ; S2
    S1.next ← newlabel();
    S2.next ← S.next;
    S.code ← S1.code || gen(S1.next ':') || S2.code

Attributes:
  S.code: sequence of instructions that are generated for S
  S.next: label of the instruction that will be executed immediately after S
  (S.next is an inherited attribute)
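The three statement rules can be sketched as one recursive function in which S.next is passed down as a parameter. This is an illustration under stated assumptions: conditions are restricted to a single relational test written as a `(left, op, right)` tuple, simple statements are given as ready-made instruction lists, and the names `CodeGen`, `cond`, and `stmt` are hypothetical.

```python
import itertools


class CodeGen:
    def __init__(self):
        self._labels = itertools.count(1)

    def newlabel(self):
        return f"L{next(self._labels)}"

    def cond(self, c, true, false):
        left, op, right = c
        return [f"if {left} {op} {right} goto {true}", f"goto {false}"]

    def stmt(self, s, next_label):
        """Return S.code; next_label plays the role of the inherited S.next."""
        kind = s[0]
        if kind == "simple":
            return list(s[1])
        if kind == "if":
            _, c, s1, s2 = s
            true, false = self.newlabel(), self.newlabel()
            return (self.cond(c, true, false) + [f"{true}:"]
                    + self.stmt(s1, next_label) + [f"goto {next_label}"]
                    + [f"{false}:"] + self.stmt(s2, next_label))
        if kind == "while":
            _, c, body = s
            begin, true = self.newlabel(), self.newlabel()
            return ([f"{begin}:"] + self.cond(c, true, next_label)
                    + [f"{true}:"] + self.stmt(body, begin)
                    + [f"goto {begin}"])
        if kind == "seq":
            _, s1, s2 = s
            middle = self.newlabel()
            return (self.stmt(s1, middle) + [f"{middle}:"]
                    + self.stmt(s2, next_label))
        raise ValueError(f"unknown statement: {kind}")


loop = ("while", ("a", "<", "b"),
        ("if", ("c", "<", "d"),
         ("simple", ["t1 := y + z", "x := t1"]),
         ("simple", ["t2 := y - z", "x := t2"])))
code = CodeGen().stmt(loop, "LNext")
print("\n".join(code))
```

Inside the loop body, S1.next is the loop's own begin label, so the `goto` emitted after the then-branch jumps straight back to the test.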
Example
Input code fragment:
while (a < b) {
  if (c < d)
    x = y + z;
  else
    x = y - z
}

  L1:     if a < b goto L2
          goto LNext
  L2:     if c < d goto L3
          goto L4
  L3:     t1 := y + z
          x := t1
          goto L1
  L4:     t2 := y - z
          x := t2
          goto L1
  LNext:  ...
Backpatching
• E.true, E.false, and S.next may not be computable in a single pass (they are inherited attributes)
• Backpatching is a technique for generating labels for E.true, E.false, and S.next and inserting them into the appropriate locations
• Basic idea
– Keep lists E.truelist, E.falselist, S.nextlist
• E.truelist: the list of instructions where the label for E.true has to be inserted when it becomes available
• S.nextlist: the list of instructions where the label for S.next has to be inserted when it becomes available
– When the labels E.true, E.false, and S.next are computed, they are inserted into the instructions in these lists
Flow-of-Control Statements: Case Statements
Case Statements
1 Evaluate the controlling expression
2 Branch to the selected case
3 Execute the code for that case
4 Branch to the statement after the case
Part 2 is the key
Strategies
• Linear search (nested if-then-else constructs)
• Build a table of case values & binary search it
• Directly compute an address (requires dense case set)
– Use an array of labels that is addressed by the case value
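The three strategies for step 2 can be compared in a short sketch. Each function maps a case value to the label of the code to branch to; the function names, the `(value, label)` pair encoding, and the use of a default label for unmatched values are assumptions for illustration.

```python
from bisect import bisect_left


def linear_search(cases, value, default):
    """Nested if-then-else: test each case value in turn."""
    for case_value, label in cases:
        if value == case_value:
            return label
    return default


def binary_search(cases, value, default):
    """Table of case values, sorted by value, searched by bisection."""
    keys = [k for k, _ in cases]
    i = bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return cases[i][1]
    return default


def make_jump_table(cases, default):
    """Array of labels addressed by the case value (needs a dense case set)."""
    low = min(k for k, _ in cases)
    high = max(k for k, _ in cases)
    table = [default] * (high - low + 1)
    for k, label in cases:
        table[k - low] = label

    def dispatch(value):
        if low <= value <= high:
            return table[value - low]   # address computed directly
        return default

    return dispatch


CASES = [(1, "L1"), (2, "L2"), (4, "L4")]   # sorted by case value
dispatch = make_jump_table(CASES, "LDefault")
for v in range(0, 6):
    print(v, linear_search(CASES, v, "LDefault"),
          binary_search(CASES, v, "LDefault"), dispatch(v))
```

The jump table spends one slot per value between the smallest and largest case, including gaps such as 3 here, which is why it only pays off when the case set is dense.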
Type Conversions
Mixed-type expressions
• Insert conversions as needed from conversion table
– i2f r1, r2 (convert the integer value in register r1 to float, and store the result in register r2)
• Most languages have symmetric conversion tables
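A conversion table for addition and the insertion of i2f can be sketched as follows. The two-type table, the helper names `make_newtemp` and `gen_add`, and the use of temporaries instead of registers are assumptions for illustration; only the `i2f source, destination` form comes from the slide.

```python
import itertools

# Result type of `left + right`, keyed by the operand types.
ADD_TABLE = {
    ("int", "int"): "int",
    ("int", "float"): "float",
    ("float", "int"): "float",
    ("float", "float"): "float",
}


def make_newtemp():
    counter = itertools.count(1)
    return lambda: f"t{next(counter)}"


def gen_add(left, right, newtemp):
    """left and right are (place, type) pairs; returns ((place, type), code)."""
    (lplace, ltype), (rplace, rtype) = left, right
    result_type = ADD_TABLE[(ltype, rtype)]
    code = []

    def convert(place, typ):
        if typ == result_type:
            return place
        # With this table the only mismatch is an int operand of a float sum.
        tmp = newtemp()
        code.append(f"i2f {place}, {tmp}")
        return tmp

    lplace = convert(lplace, ltype)
    rplace = convert(rplace, rtype)
    dest = newtemp()
    code.append(f"{dest} := {lplace} + {rplace}")
    return (dest, result_type), code


result, code = gen_add(("i", "int"), ("x", "float"), make_newtemp())
print(result, code)
```

The table is symmetric in the sense that swapping the operand types never changes the result type.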
[Table: typical addition conversion table]
Translation of a simple if-statement
Backpatching
• The previous code for Boolean expressions inserts symbolic labels for jumps
• It therefore needs a separate pass to set them to the appropriate addresses
• We can use a technique named backpatching to avoid this
• We assume we save instructions into an array, and labels will be indices in the array
• For a nonterminal B we use two attributes, B.truelist and B.falselist, together with the following functions:
– makelist(i): creates a new list containing only i, an index into the array of instructions
– Merge(p1,p2): concatenates the lists pointed to by p1 and p2 and returns a pointer to the concatenated list
– Backpatch(p,i): inserts i as the target label for each of the instructions on the list pointed to by p
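The three functions can be sketched together with jumping code for x < 100 || x > 200 && x != y. This is an illustration under stated assumptions: instructions are stored in a Python list, addresses start at 100, Python lists stand in for the pointer-based lists, the function names are written in lowercase, and the tuple-based AST and the helper `gen` are hypothetical.

```python
class Backpatcher:
    def __init__(self, start=100):
        self.start = start
        self.instrs = []            # each entry is [text, target or None]

    @property
    def nextinstr(self):
        return self.start + len(self.instrs)

    def emit(self, text):
        self.instrs.append([text, None])   # the jump target is left open

    def makelist(self, i):
        return [i]

    def merge(self, p1, p2):
        return p1 + p2

    def backpatch(self, p, target):
        for i in p:
            self.instrs[i - self.start][1] = target

    def relop(self, left, op, right):
        truelist = self.makelist(self.nextinstr)
        falselist = self.makelist(self.nextinstr + 1)
        self.emit(f"if {left} {op} {right} goto")
        self.emit("goto")
        return truelist, falselist


def gen(bp, node):
    """Return (truelist, falselist) for a boolean expression tree."""
    if node[0] == "relop":
        return bp.relop(*node[1:])
    _, e1, e2 = node
    t1, f1 = gen(bp, e1)
    marker = bp.nextinstr           # address of the first instruction of e2
    t2, f2 = gen(bp, e2)
    if node[0] == "or":
        bp.backpatch(f1, marker)    # e1 false: go on to evaluate e2
        return bp.merge(t1, t2), f2
    if node[0] == "and":
        bp.backpatch(t1, marker)    # e1 true: go on to evaluate e2
        return t2, bp.merge(f1, f2)
    raise ValueError(f"unknown node: {node[0]}")


bp = Backpatcher()
expr = ("or", ("relop", "x", "<", "100"),
        ("and", ("relop", "x", ">", "200"), ("relop", "x", "!=", "y")))
truelist, falselist = gen(bp, expr)
for i, (text, target) in enumerate(bp.instrs):
    print(bp.start + i, text, "_" if target is None else target)
print("truelist:", truelist, "falselist:", falselist)
```

The jumps that remain open after this pass are exactly the ones on the returned truelist and falselist; the enclosing statement fills them in once it knows where the true and false exits lead.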
Backpatching for Boolean Expressions
• Annotated parse tree for x < 100 || x > 200 && x != y
Flow-of-Control Statements
Translation of a switch-statement