Compiler QBank From CD
  • Anna University Solved Question Papers

    B.E./B.Tech. 6th Semester
    Computer Science and Engineering

    Delhi Chennai


  • Copyright 2015 Pearson India Education Services Pvt. Ltd

    This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, resold, hired out, or otherwise circulated without the publishers prior written consent in any form of binding or cover other than that in which it is published and without a similar condition including this condition being imposed on the subsequent purchaser and without limiting the rights under copyright reserved above, no part of this publication may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording or otherwise), without the prior written permission of both the copyright owner and the publisher of this book. ISBN 978-93-325-4246-4 First Impression

    Published by Pearson India Education Services Pvt. Ltd, CIN: U72200TN2005PTC057128, formerly known as TutorVista Global Pvt. Ltd, licensee of Pearson Education in South Asia. Head Office: 7th Floor, Knowledge Boulevard, A-8(A), Sector 62, Noida 201 309, U.P., India. Registered Office: Module G4, Ground Floor, Elnet Software City, TS-140, Blocks 2 and 9, Rajiv Gandhi Salai, Taramani, Chennai 600 113, Tamil Nadu, India. Fax: 080-30461003, Phone: 080-30461060, www.pearson.co.in, Email: [email protected].

  • Semester-VI

    Principles of Compiler Design


  • The aim of this publication is to supply information taken from sources believed to be valid and reliable. This is not an attempt to render any type of professional advice or analysis, nor is it to be treated as such. While much care has been taken to ensure the veracity and currency of the information presented within, neither the publisher nor its authors bear any responsibility for any damage arising from inadvertent omissions, negligence or inaccuracies (typographical or factual) that may have found their way into this book.


  • B.E./B.TECH. DEGREE EXAMINATION,

    MAY/JUNE 2012

    Sixth Semester

    Computer Science and Engineering

    CS 2352 /CS 62/ 10144 CS 602 PRINCIPLES OF COMPILER DESIGN

    (Regulation 2008)

    Time: Three hours Maximum: 100 marks

    Answer All Questions

    PART A (10 × 2 = 20 marks)

    1. Mention a few cousins of the compiler.

    2. What are the possible error recovery actions in a lexical analyzer?

    3. Define an ambiguous grammar.

    4. What is a dangling reference?

    5. Why are quadruples preferred over triples in an optimizing compiler?

    6. List out the motivations for backpatching.

    7. Define a flow graph.

    8. How is register assignment performed for outer loops?

    9. What is the use of algebraic identities in the optimization of basic blocks?

    10. List out two properties of a reducible flow graph.

    PART B (5 × 16 = 80 marks)

    11. (a) (i) What are the various phases of the compiler? Explain each phase in detail. (10)

    (ii) Briefly explain the compiler construction tools. (6)


    (Common to Information Technology)

  • 2.4 B.E./B.Tech. Question Papers

    Or

    (b) (i) What are the issues in Lexical analysis? (4)

    (ii) Elaborate in detail the recognition of tokens. (12)

    12. (a) (i) Construct the predictive parser for the following grammar: S → (L) | a, L → L,S | S (10)

    (ii) Describe the conflicts that may occur during shift-reduce parsing. (6)

    Or

    (b) (i) Explain in detail the specification of a simple type checker. (10)

    (ii) How is run-time memory subdivided into code and data areas? Explain. (6)

    13. (a) (i) Describe the various types of three-address statements. (8)

    (ii) How can names be looked up in the symbol table? Discuss. (8)

    Or

    (b) (i) Discuss the different methods for translating Boolean expressions in detail. (12)

    (ii) Explain the following grammar for a simple procedure call statement: S → call id(Elist) (4)

    14. (a) (i) Explain in detail the various issues in the design of a code generator. (10)

    (ii) Write an algorithm to partition a sequence of three-address statements into basic blocks. (6)

    Or

    (b) (i) Explain the code generation algorithm in detail. (8)

    (ii) Construct the DAG for the following basic block: (8)
        d := b*c
        e := a+b


  • Principles of Compiler Design (May/June 2012) 2.5

        b := b*c
        a := e-d

    15. (a) (i) Explain the principal sources of optimization in detail. (8)

    (ii) Discuss the various peephole optimizations in detail. (8)

    Or

    (b) (i) How is the data-flow analysis of a structured program traced? Discuss. (6)

    (ii) Explain common-subexpression elimination, copy propagation and transformations for moving loop-invariant computations in detail. (10)


  • Solutions

    PART A

    1. The cousins of a compiler are the programs that make up the context in which the compiler typically operates: the preprocessor, assembler, loader and link editor.

    2. a. Deleting an extraneous character
       b. Inserting a missing character
       c. Replacing an incorrect character by a correct character
       d. Transposing two adjacent characters

    3. A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
       Example: Given grammar G: E → E+E | E*E | (E) | -E | id
       The sentence id+id*id has the following two distinct leftmost derivations:

       E ⇒ E + E               E ⇒ E * E
       E ⇒ id + E              E ⇒ E + E * E
       E ⇒ id + E * E          E ⇒ id + E * E
       E ⇒ id + id * E         E ⇒ id + id * E
       E ⇒ id + id * id        E ⇒ id + id * id

    4. If a heap variable is destroyed, any remaining pointer variable or object reference that still refers to it is said to contain a dangling reference. Unlike lower-level languages such as C, dereferencing a dangling reference will not crash or corrupt your IDL session; it will, however, fail with an error message.
       For example:
       ; Create a new heap variable.
       A = PTR_NEW(23)
       ; Print A and the value of the heap variable A points to.
       PRINT, A, *A
       IDL prints: 23

    5. In the quadruple representation, temporaries have explicit names, so the symbol-table entries for those temporaries can be obtained directly. The advantage of the quadruple representation is that one can quickly access the value of a temporary variable through the symbol table; the use of temporaries introduces a level of indirection through the symbol table.



    In the triple representation, by contrast, results are referred to by pointers (positions in the list of instructions) rather than by temporary names, so intermediate results are not entered in the symbol table.
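The contrast above can be made concrete with a small sketch. This example and its names are illustrative only, not from the question paper; it encodes the expression a := b*c + b*c once as quadruples and once as triples.

```python
# Quadruples name every intermediate result explicitly (t1, t2, ...),
# so each temporary can get its own symbol-table entry and statements
# can be moved without renumbering anything.
quadruples = [
    ("*", "b", "c", "t1"),    # t1 := b * c
    ("*", "b", "c", "t2"),    # t2 := b * c
    ("+", "t1", "t2", "t3"),  # t3 := t1 + t2
    (":=", "t3", None, "a"),  # a  := t3
]

# Triples refer to earlier results by instruction position, so statement (2)
# below means "the result of instruction 0 plus the result of instruction 1";
# reordering the list would silently change those references.
triples = [
    ("*", "b", "c"),    # (0)
    ("*", "b", "c"),    # (1)
    ("+", (0,), (1,)),  # (2) operands are references to instructions 0 and 1
    (":=", "a", (2,)),  # (3)
]
```

The tuple shapes `(op, arg1, arg2, result)` and `(op, arg1, arg2)` are my own encoding; the point is only that quadruples carry an explicit result field while triples rely on position.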

    6. The backpatching technique is used to overcome the problem of processing incomplete information in a single pass.

    7. A flow graph is a directed graph in which flow-of-control information is added to the basic blocks.

       (i) The nodes of the flow graph are the basic blocks.
       (ii) The block whose leader is the first statement is called the initial block.
       (iii) There is a directed edge from block B1 to block B2 if B2 immediately follows B1 in the given sequence; we say that B1 is a predecessor of B2.

    8. Consider two loops, where L1 is an outer loop and L2 is an inner loop nested within it, and suppose the allocation of a variable a to some register is to be done. The approximate scenario is as given below:

       Loop L1
           ...          } L1 − L2
           Loop L2
               ...      } L2
           ...          } L1 − L2

       The following criteria should be adopted for register assignment for the outer loop:
       1. If a is allocated in loop L2, then it should not be allocated in L1 − L2.
       2. If a is allocated in L1 and it is not allocated in L2, then store a on entrance to L2 and load a while leaving L2.
       3. If a is allocated in L2 and not in L1, then load a on entrance to L2 and store a on exit from L2.

    9. There is no end to the amount of algebraic simplification that can be attempted through peephole optimization, but only a few algebraic identities occur frequently enough to be worth implementing. For example, statements such as
       x := x + 0
       x := x / 1
       x := x * 1
       can be eliminated.
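A minimal peephole pass for exactly these identities might look as follows; the function name and the `(dst, op, arg1, arg2)` tuple encoding of three-address statements are my own assumptions, not the book's.

```python
def peephole_algebraic(code):
    """Drop three-address statements that are algebraic identities.

    `code` is a list of (dst, op, arg1, arg2) tuples; statements of the
    form x := x+0, x := x*1 and x := x/1 change nothing and are removed.
    """
    out = []
    for dst, op, a, b in code:
        if dst == a and ((op == "+" and b == 0) or
                         (op in ("*", "/") and b == 1)):
            continue  # identity: result is already in dst
        out.append((dst, op, a, b))
    return out
```

For example, `peephole_algebraic([("x","+","x",0), ("y","*","x",2), ("x","/","x",1)])` keeps only the `y := x * 2` statement.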

    10. A reducible flow graph is one whose edges can be partitioned into two classes, forward edges and back edges, with the following properties:
        1. The forward edges form an acyclic graph.
        2. The back edges are edges whose heads dominate their tails.



    PART B

    11. (a) (i) Phases of a Compiler

    A compiler operates in phases, each of which transforms the source program from one representation into another. The following are the phases of the compiler:

    Main phases:

    1) Lexical analysis
    2) Syntax analysis
    3) Semantic analysis
    4) Intermediate code generation
    5) Code optimization
    6) Code generation

    Sub-Phases:

    1) Symbol table management
    2) Error handling

    Lexical Analysis:

    It is the first phase of the compiler. Lexical analysis is also called scanning. It is the phase of compilation in which the complete source code is scanned and broken up into groups of strings called tokens.

    It reads the characters one by one, starting from left to right, and forms tokens. A token represents a logically cohesive sequence of characters, such as a keyword, operator, identifier or special symbol.
    Example: position := initial + rate * 60
    1. The identifier position
    2. The assignment symbol :=
    3. The identifier initial
    4. The plus sign
    5. The identifier rate
    6. The multiplication sign
    7. The constant number 60
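The scanning step described above can be sketched with a small regex-based tokenizer. The token names and the `TOKEN_SPEC` table are hypothetical choices for illustration, not part of the original answer.

```python
import re

# Each (name, pattern) pair describes one token class; SKIP matches
# whitespace, which the scanner discards.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP",     r"[+*]"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src):
    """Scan src left to right and return a list of (token, lexeme) pairs."""
    tokens = []
    for m in TOKEN_RE.finditer(src):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

Running `tokenize("position := initial + rate * 60")` produces exactly the seven tokens enumerated above.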

    Syntax Analysis:

    Syntax analysis is the second phase of the compiler; it is also known as parsing. The parser gets the token stream as input from the lexical analyzer and generates a syntax tree as output.



    Syntax tree:

    It is a tree in which interior nodes are operators and exterior (leaf) nodes are operands.
    Example: For position := initial + rate * 60, the syntax tree is

                :=
               /  \
        position   +
                  / \
           initial   *
                    / \
                rate   60

    Semantic Analysis:

    Semantic analysis is the third phase of the compiler. It takes the parse tree produced by syntax analysis as input and checks it for semantic consistency.

    It performs type conversions where the context requires them, for example converting an integer operand to a real (float) value:

                :=
               /  \
        position   +
                  / \
           initial   *
                    / \
                rate   inttofloat
                           |
                           60

    Intermediate code generation:
    This phase takes input from semantic analysis and converts it into an intermediate representation such as three-address code.

    This code comes in a variety of forms, such as quadruples, triples and indirect triples. Three-address code consists of a sequence of instructions, each of which has at most three operands.
    Example:
    t1 := inttofloat(60)
    t2 := rate * t1
    t3 := initial + t2
    position := t3

    Code Optimization:

    Code optimization gets the intermediate code as input and produces optimized intermediate code as output. This phase reduces the redundant code and attempts to improve the intermediate code so that faster-running machine code will result.



    During code optimization the result of the program is not affected. To improve the code generation, the optimization involves:
    1) deduction and removal of dead code (unreachable code)
    2) calculation of constants in expressions and terms
    3) collapsing of repeated expressions into a temporary
    4) loop unrolling
    5) moving code outside the loop
    6) removal of unwanted temporary variables

    Example:
    t1 := rate * 60
    position := initial + t1

    Code Generation:

    Code Generation gets input from code optimization phase and produces the target code or object code as result.

    Intermediate instructions are translated into a sequence of machine instructions that perform the same task. Code generation involves:
    1) allocation of registers and memory
    2) generation of correct references
    3) generation of correct data types
    4) generation of missing code

    Machine instructions:
    MOV rate, R1
    MUL #60, R1
    MOV initial, R2
    ADD R2, R1
    MOV R1, position

    Symbol Table Management:

    The symbol table is used to store all the information about the identifiers used in the program. It is a data structure containing a record for each identifier, with fields for the attributes of the identifier. It allows the compiler to find the record for each identifier quickly and to store or retrieve data from that record. Whenever an identifier is detected in any of the phases, it is stored in the symbol table.



    Error Handling:

    Each phase can encounter errors. After detecting an error, a phase must handle the error so that compilation can proceed.

    In lexical analysis, errors occur in the separation of tokens.

    In syntax analysis, errors occur during construction of the syntax tree.

    In semantic analysis, errors occur when the compiler detects constructs with the right syntactic structure but no meaning, and during type conversion.

    In code optimization, errors occur when the result is affected by the optimization.

    In code generation, an error is reported when code is missing, etc.

    [Figure: the source program flows through the Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, Intermediate Code Generator, Code Optimization and Code Generation phases to produce the Object Program, with Symbol Table Management and Error Detection and Handling interacting with every phase.]

    Fig: Phases of compiler

    (ii) These are specialized tools that have been developed to help implement the various phases of a compiler. The following are the compiler construction tools:
    1. Scanner generators
    2. Parser generators
    3. Syntax-directed translation engines
    4. Automatic code generators
    5. Data-flow engines



    1. Scanner Generator:

    These generate lexical analyzers, normally from a specification based on regular expressions.

    The basic organization of the resulting lexical analyzer is a finite automaton.

    2. Parser Generators:

    These produce syntax analyzers, normally from input that is based on a context-free grammar.

    Syntax analysis consumes a large fraction of the running time of a compiler.

    Example-YACC (Yet Another Compiler-Compiler).

    3. Syntax-Directed Translation:

    These produce routines that walk the parse tree and as a result generate intermediate code.

    Each translation is defined in terms of translations at its neighbor nodes in the tree.

    4. Automatic Code Generators:

    These take a collection of rules that translate intermediate language into machine language. The rules must include sufficient detail to handle the different possible access methods for data.

    5. Data-Flow Engines:

    These perform code optimization using data-flow analysis, that is, the gathering of information about how values are transmitted from one part of a program to every other part.

    (b) (i) There are several reasons for separating the analysis phase of compiling into lexical analysis and parsing:
    1. Simpler design is perhaps the most important consideration. The separation of lexical analysis from parsing often allows us to simplify one or the other of these phases.
    2. Compiler efficiency is improved. A separate lexical analyzer allows us to construct a specialized and potentially more efficient processor for the task. A large amount of time is spent reading the source program and partitioning it into tokens; specialized buffering techniques for reading input characters and processing tokens can significantly speed up the performance of a compiler.
    3. Compiler portability is enhanced. Input-alphabet peculiarities and other device-specific anomalies can be restricted to the



    lexical analyzer. The representation of special or non-standard symbols, such as ↑ in Pascal, can be isolated in the lexical analyzer.

    (ii) Consider the following grammar fragment:
         stmt → if expr then stmt
              | if expr then stmt else stmt
              | ε
         expr → term relop term | term
         term → id | num
         where the terminals if, then, else, relop, id and num generate sets of strings given by the following regular definitions:
         if    → if
         then  → then
         else  → else
         relop → < | <= | = | <> | > | >=
         id    → letter (letter | digit)*
         num   → digit+ (. digit+)? (E (+|-)? digit+)?

    For this language fragment, the lexical analyzer will recognize the keywords if, then and else, as well as the lexemes denoted by relop, id and num. To simplify matters, we assume keywords are reserved; that is, they cannot be used as identifiers.

    Transition diagrams

    It is a diagrammatic representation to depict the action that will take place when a lexical analyzer is called by the parser to get the next token. It is used to keep track of information about the characters that are seen as the forward pointer scans the input.

    Transition diagram for identifiers and keywords:

                              letter or digit
                                  ┌─────┐
                                  ▼     │
    start ──▶ (9) ──letter──▶ ( 10 ) ───┘ ──other──▶ ( 11 )  return(gettoken( ), install_id( ))
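The behaviour of this diagram can be sketched in a few lines of code; the helper name `get_word_token` and the reserved-word set are my own, standing in for the book's `gettoken()`/`install_id()` actions.

```python
KEYWORDS = {"if", "then", "else"}  # assumed reserved, as the text states

def get_word_token(src, pos):
    """Starting at a letter in src, return (token, lexeme, next_pos).

    Mirrors the diagram: loop while the input is a letter or digit,
    then classify the lexeme as a keyword or as an ordinary identifier.
    """
    end = pos
    while end < len(src) and src[end].isalnum():
        end += 1
    lexeme = src[pos:end]
    token = lexeme if lexeme in KEYWORDS else "id"
    return token, lexeme, end
```

For example, scanning `"if x1 then"` from position 0 yields the keyword token `if`, and scanning from position 3 yields an `id` token with lexeme `x1`.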

    12. (a) (i) The given grammar is left recursive because of L → L,S | S. We first eliminate the left recursion: a production of the form A → Aα | β can be converted to
        A  → βA′
        A′ → αA′ | ε



    We can write L → L,S | S as
        L  → S L′
        L′ → , S L′ | ε
    Now the grammar taken for predictive parsing is
        S  → (L) | a
        L  → S L′
        L′ → , S L′ | ε
    Next we compute FIRST and FOLLOW of the nonterminals:
        FIRST(S)  = {(, a}
        FIRST(L)  = {(, a}
        FIRST(L′) = {,, ε}
        FOLLOW(S)  = {,, ), $}
        FOLLOW(L)  = {)}
        FOLLOW(L′) = {)}
    The predictive parsing table can be constructed as

                a           (           )            ,             $
    S           S → a       S → (L)
    L           L → S L′    L → S L′
    L′                                  L′ → ε       L′ → , S L′

    Using the table, the string (a,a) is parsed as follows:

    Stack       Input       Action
    $S          (a,a)$      S → (L)
    $)L(        (a,a)$      match (
    $)L         a,a)$       L → S L′
    $)L′S       a,a)$       S → a
    $)L′a       a,a)$       match a
    $)L′        ,a)$        L′ → , S L′
    $)L′S,      ,a)$        match ,
    $)L′S       a)$         S → a
    $)L′a       a)$         match a
    $)L′        )$          L′ → ε
    $)          )$          match )
    $           $           Accept
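The same table and trace can be exercised with a small table-driven parser. This is a sketch under my own encoding (ASCII `L'` for L′, single-character tokens), not code from the paper.

```python
# LL(1) table for S -> (L) | a, L -> S L', L' -> , S L' | epsilon.
TABLE = {
    ("S", "("): ["(", "L", ")"],
    ("S", "a"): ["a"],
    ("L", "("): ["S", "L'"],
    ("L", "a"): ["S", "L'"],
    ("L'", ","): [",", "S", "L'"],
    ("L'", ")"): [],            # L' -> epsilon
}
NONTERMINALS = {"S", "L", "L'"}

def ll1_parse(tokens):
    """Table-driven predictive parse; returns True iff the input is accepted."""
    stack = ["$", "S"]          # end marker at the bottom, start symbol on top
    tokens = list(tokens) + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            prod = TABLE.get((top, tokens[i]))
            if prod is None:    # blank table entry: syntax error
                return False
            stack.extend(reversed(prod))
        else:                   # terminal or $: must match the lookahead
            if top != tokens[i]:
                return False
            i += 1
    return i == len(tokens)
```

`ll1_parse("(a,a)")` accepts, reproducing the trace above, while a malformed string such as `"(a,)"` is rejected.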



    (ii) Conflicts in shift-reduce parsing:
    There are two kinds of conflicts that can occur in shift-reduce parsing:
    1. Shift-reduce conflict: the parser cannot decide whether to shift or to reduce.
    2. Reduce-reduce conflict: the parser cannot decide which of several reductions to make.

    1. Shift-reduce conflict:
    Example:
    Consider the grammar E → E+E | E*E | id and input id+id*id

    Stack      Input     Action                Stack      Input     Action
    $E+E       *id$      Reduce by E → E+E     $E+E       *id$      Shift
    $E         *id$      Shift                 $E+E*      id$       Shift
    $E*        id$       Shift                 $E+E*id    $         Reduce by E → id
    $E*id      $         Reduce by E → id      $E+E*E     $         Reduce by E → E*E
    $E*E       $         Reduce by E → E*E     $E+E       $         Reduce by E → E+E
    $E         $                               $E         $

    2. Reduce-reduce conflict:

    Example:
    Consider the grammar:
    M → R+R | R+c | R
    R → c
    and input c+c

    Stack      Input     Action               Stack      Input     Action
    $          c+c$      Shift                $          c+c$      Shift
    $c         +c$       Reduce by R → c      $c         +c$       Reduce by R → c
    $R         +c$       Shift                $R         +c$       Shift
    $R+        c$        Shift                $R+        c$        Shift
    $R+c       $         Reduce by R → c      $R+c       $         Reduce by M → R+c
    $R+R       $         Reduce by M → R+R    $M         $
    $M         $

    12. (b) (i) The type checker is a translation scheme that synthesizes the type of each expression from the types of its subexpressions. An identifier must be declared before it is used. The type checker can handle arrays, pointers, statements and functions.

    A Simple Language

    Consider the following grammar:
    P → D ; E
    D → D ; D | id : T
    T → char | integer | array [ num ] of T | ↑T
    E → literal | num | id | E mod E | E [ E ] | E↑

    Translation scheme:

    P → D ; E
    D → D ; D
    D → id : T                 { addtype(id.entry, T.type) }
    T → char                   { T.type := char }
    T → integer                { T.type := integer }
    T → ↑T1                    { T.type := pointer(T1.type) }
    T → array [ num ] of T1    { T.type := array(1..num.val, T1.type) }
    In the above language there are two basic types, char and integer; type_error is used to signal errors; and the prefix operator ↑ builds a pointer type. For example, ↑integer leads to the type expression pointer(integer).

    Type checking of expressions

    In the following rules, the attribute type for E gives the type expression assigned to the expression generated by E.
    1. E → literal    { E.type := char }
       E → num        { E.type := integer }
       Here, constants represented by the tokens literal and num have type char and integer, respectively.
    2. E → id         { E.type := lookup(id.entry) }
       lookup(e) is used to fetch the type saved in the symbol-table entry pointed to by e.




    3. E → E1 mod E2    { E.type := if E1.type = integer and E2.type = integer then integer else type_error }
       The expression formed by applying the mod operator to two subexpressions of type integer has type integer; otherwise, its type is type_error.
    4. E → E1 [ E2 ]    { E.type := if E2.type = integer and E1.type = array(s,t) then t else type_error }
       In an array reference E1 [ E2 ], the index expression E2 must have type integer. The result is the element type t obtained from the type array(s,t) of E1.
    5. E → E1↑          { E.type := if E1.type = pointer(t) then t else type_error }
       The postfix operator ↑ yields the object pointed to by its operand. The type of E is the type t of the object pointed to by the pointer E1.

    Type checking of statements

    Statements do not have values; hence the basic type void can be assigned to them. If an error is detected within a statement, then type_error is assigned.

    Translation scheme for checking the type of statements:

    1. Assignment statement:
       S → id := E        { S.type := if id.type = E.type then void else type_error }
    2. Conditional statement:
       S → if E then S1   { S.type := if E.type = boolean then S1.type else type_error }
    3. While statement:
       S → while E do S1  { S.type := if E.type = boolean then S1.type else type_error }
    4. Sequence of statements:
       S → S1 ; S2        { S.type := if S1.type = void and S2.type = void then void else type_error }
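The four statement rules can be rendered as runnable checks; the function names are mine, and representing types as plain strings is an assumption for illustration.

```python
def check_assign(id_type, expr_type):
    """S -> id := E: void if the declared and expression types agree."""
    return "void" if id_type == expr_type else "type_error"

def check_if(cond_type, body_type):
    """S -> if E then S1: the condition must be boolean."""
    return body_type if cond_type == "boolean" else "type_error"

def check_while(cond_type, body_type):
    """S -> while E do S1: same boolean-condition requirement."""
    return body_type if cond_type == "boolean" else "type_error"

def check_seq(t1, t2):
    """S -> S1 ; S2: void only if both component statements are void."""
    return "void" if t1 == "void" and t2 == "void" else "type_error"
```

For example, `check_if("boolean", "void")` yields `"void"`, while `check_assign("integer", "char")` yields `"type_error"`.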

    Type checking of functions

    The rule for checking the type of a function application is:
    E → E1 ( E2 )    { E.type := if E2.type = s and E1.type = s → t then t else type_error }



    (b) (ii) Storage Organization:

    The executing target program runs in its own logical address space in which each program value has a location.

    The management and organization of this logical address space is shared between the compiler, the operating system and the target machine. The operating system maps the logical addresses into physical addresses, which are usually spread throughout memory.

    Typical subdivision of run-time memory:

    Code

    Static Data

    Stack

    free memory

    Heap

    Run-time storage comes in blocks, where a byte is the smallest unit of addressable memory. Four bytes form a machine word. Multibyte objects are stored in consecutive bytes and given the address of the first byte.

    The storage layout for data objects is strongly influenced by the addressing constraints of the target machine.

    Although a character array of length 10 needs only enough bytes to hold 10 characters, a compiler may allocate 12 bytes to get alignment, leaving 2 bytes unused.

    This unused space due to alignment considerations is referred to as padding.

    The size of some program objects may be known at compile time; these may be placed in an area called static.

    The dynamic areas used to maximize the utilization of space at run time are stack and heap.

    13. (a) (i) The common three-address statements are:
    1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical operation.
    2. Assignment instructions of the form x := op y, where op is a unary operation. Essential unary operations include unary minus, logical negation, shift operators, and conversion operators that, for example, convert a fixed-point number to a floating-point number.
    3. Copy statements of the form x := y, where the value of y is assigned to x.



    4. The unconditional jump goto L. The three-address statement with label L is the next to be executed.
    5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands in relation relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in the usual sequence.
    6. param x and call p,n for procedure calls, and return y, where y representing a returned value is optional. For example,
       param x1
       param x2
       ...
       param xn
       call p,n
       is generated as part of a call of the procedure p(x1, x2, ..., xn).
    7. Indexed assignments of the form x := y[i] and x[i] := y.
    8. Address and pointer assignments of the form x := &y, x := *y, and *x := y.

    (a) (ii) There are two types of name representation:

    1. Fixed-length name
    2. Variable-length name

    1. Fixed-length name

    A fixed amount of space for each name is allocated in the symbol table. With this kind of storage, if a name is too small there is wastage of space.
    For example, the names calculate, sum, a and b would each occupy an equal, fixed-size field of the name array, with the shorter names leaving most of their field unused.

    2. Variable-Length Name

    Only the amount of space required by the string is used to store each name. A name can then be referred to by the starting index and the length of its text.



    For example:

    Starting index    Length    Attribute
          0             10
         10              4
         14              2
         16              2

    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
    c a l c u l a t e $  s  u  m  $  a  $  b  $

    Here each length counts the name together with its '$' terminator.
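The variable-length scheme can be sketched as follows. The class name `NameTable` is my own, and here each stored length excludes the `'$'` terminator, a minor simplification of the table above.

```python
class NameTable:
    """All identifier text lives in one string buffer; each symbol-table
    entry stores only a (start, length) pair into it."""

    def __init__(self):
        self.buffer = ""
        self.entries = []          # list of (starting index, length)

    def add(self, name):
        """Append a name to the buffer and return its entry index."""
        start = len(self.buffer)
        self.buffer += name + "$"  # '$' terminates each name, as in the figure
        self.entries.append((start, len(name)))
        return len(self.entries) - 1

    def get(self, index):
        """Recover a name from its (start, length) entry."""
        start, length = self.entries[index]
        return self.buffer[start:start + length]
```

Adding calculate, sum, a and b in order produces the buffer `calculate$sum$a$b$`, matching the character layout shown above.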

    13. (b) (i) Boolean expressions have two primary purposes. They are used to compute logical values, but more often they are used as conditional expressions in statements that alter the flow of control, such as if-then-else or while-do statements.

    Boolean expressions are composed of the boolean operators (and, or, and not) applied to elements that are boolean variables or relational expressions. Relational expressions are of the form E1 relop E2, where E1 and E2 are arithmetic expressions. Here we consider boolean expressions generated by the following grammar:
    E → E or E | E and E | not E | ( E ) | id relop id | true | false

    Methods of Translating Boolean Expressions:

    There are two principal methods of representing the value of a boolean expression:
    i. Encode true and false numerically and evaluate a boolean expression analogously to an arithmetic expression. Often, 1 is used to denote true and 0 to denote false.
    ii. Implement boolean expressions by flow of control, that is, represent the value of a boolean expression by a position reached in a program. This method is particularly convenient in implementing boolean expressions in flow-of-control statements, such as if-then and while-do statements.

    Numerical Representation

    Here, 1 denotes true and 0 denotes false. Expressions will be evaluated completely from left to right, in a manner similar to arithmetic expressions.
    For example, the translation for a or b and not c



    is the three-address sequence
    t1 := not c
    t2 := b and t1
    t3 := a or t2
    A relational expression such as a < b is equivalent to the conditional statement if a < b then 1 else 0, which can be translated into the three-address code sequence (again, we arbitrarily start statement numbers at 100):
    100 : if a < b goto 103
    101 : t := 0
    102 : goto 104
    103 : t := 1
    104 : ...

    Translation scheme using a numerical representation for Booleans

    E → E1 or E2       { E.place := newtemp;
                         emit(E.place ':=' E1.place 'or' E2.place) }
    E → E1 and E2      { E.place := newtemp;
                         emit(E.place ':=' E1.place 'and' E2.place) }
    E → not E1         { E.place := newtemp;
                         emit(E.place ':=' 'not' E1.place) }
    E → ( E1 )         { E.place := E1.place }
    E → id1 relop id2  { E.place := newtemp;
                         emit('if' id1.place relop.op id2.place 'goto' nextstat+3);
                         emit(E.place ':=' '0');
                         emit('goto' nextstat+2);
                         emit(E.place ':=' '1') }
    E → true           { E.place := newtemp; emit(E.place ':=' '1') }
    E → false          { E.place := newtemp; emit(E.place ':=' '0') }
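For the id1 relop id2 production, the emitted sequence can be sketched directly; `translate_relop` and its fixed temporary name `t1` are assumptions for illustration, standing in for the newtemp/emit machinery.

```python
def translate_relop(id1, relop, id2, nextstat=100):
    """Emit the four-instruction pattern assigning 1 or 0 to a temporary."""
    t = "t1"  # a single fixed temporary; newtemp() would generate fresh ones
    return [
        f"{nextstat}: if {id1} {relop} {id2} goto {nextstat + 3}",
        f"{nextstat + 1}: {t} := 0",
        f"{nextstat + 2}: goto {nextstat + 4}",
        f"{nextstat + 3}: {t} := 1",
    ]
```

Called as `translate_relop("a", "<", "b")`, it reproduces the 100-104 sequence shown for a < b above.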

    Short-Circuit Code:

    We can also translate a boolean expression into three-address code without generating code for any of the boolean operators and without having the code necessarily evaluate the entire expression. This style of evaluation is sometimes called short-circuit or jumping code. It is possible to evaluate boolean expressions without generating code for the boolean operators



    and, or, and not if we represent the value of an expression by a position in the code sequence.

    Translation of a < b or c < d and e < f

    100 : if a < b goto 103        107 : t2 := 1
    101 : t1 := 0                  108 : if e < f goto 111
    102 : goto 104                 109 : t3 := 0
    103 : t1 := 1                  110 : goto 112
    104 : if c < d goto 107        111 : t3 := 1
    105 : t2 := 0                  112 : t4 := t2 and t3
    106 : goto 108                 113 : t5 := t1 or t4

    Flow-of-Control Statements

    We now consider the translation of boolean expressions into three-address code in the context of if-then, if-then-else, and while-do statements such as those generated by the followinggrammar:S if E then S1 | if E then S1 else S2 | while E do S1In each of these productions, E is the Boolean expression to be translated. In the translation, we assume that a three-address statement can be symbolically labeled, and that the function newlabel returns a new symbolic label each time it is called.

E.true is the label to which control flows if E is true, and E.false is the label to which control flows if E is false.

The semantic rules for translating a flow-of-control statement S allow control to flow from the translation S.code to the three-address instruction immediately following S.code. S.next is a label that is attached to the first three-address instruction to be executed after the code for S.

Code for if-then, if-then-else, and while-do statements

[Figure: skeleton code layouts. (a) if-then: E.code (with jumps to E.true and E.false), then E.true:, S1.code, then E.false: (which equals S.next). (b) if-then-else: E.code, then E.true:, S1.code, goto S.next, then E.false:, S2.code, then S.next:.]


  • Principles of Compiler Design (May/June 2012) 2.23

[Figure: (c) while-do: S.begin:, E.code (with jumps to E.true and E.false), then E.true:, S1.code, goto S.begin, then E.false: (which equals S.next).]

Syntax-directed definition for flow-of-control statements

    PRODUCTION SEMANTIC RULES

S → if E then S1
    E.true := newlabel;
    E.false := S.next;
    S1.next := S.next;
    S.code := E.code || gen(E.true ':') || S1.code

S → if E then S1 else S2
    E.true := newlabel;
    E.false := newlabel;
    S1.next := S.next;
    S2.next := S.next;
    S.code := E.code || gen(E.true ':') || S1.code || gen('goto' S.next) || gen(E.false ':') || S2.code

S → while E do S1
    S.begin := newlabel;
    E.true := newlabel;
    E.false := S.next;
    S1.next := S.begin;
    S.code := gen(S.begin ':') || E.code || gen(E.true ':') || S1.code || gen('goto' S.begin)

Control-Flow Translation of Boolean Expressions:
Syntax-directed definition to produce three-address code for Booleans

    PRODUCTION SEMANTIC RULES

E → E1 or E2
    E1.true := E.true;
    E1.false := newlabel;
    E2.true := E.true;
    E2.false := E.false;
    E.code := E1.code || gen(E1.false ':') || E2.code

E → E1 and E2
    E1.true := newlabel;
    E1.false := E.false;
    E2.true := E.true;
    E2.false := E.false;
    E.code := E1.code || gen(E1.true ':') || E2.code

E → not E1
    E1.true := E.false;
    E1.false := E.true;
    E.code := E1.code

E → ( E1 )
    E1.true := E.true;
    E1.false := E.false;
    E.code := E1.code

E → id1 relop id2
    E.code := gen('if' id1.place relop.op id2.place 'goto' E.true) || gen('goto' E.false)

E → true
    E.code := gen('goto' E.true)

E → false
    E.code := gen('goto' E.false)
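These rules translate directly into a recursive generator. The Python sketch below is my own rendering of the syntax-directed definition; the tuple encoding of expressions and the label names are assumptions made for illustration. It produces the jumping code for a < b or (c < d and e < f):

```python
labels = 0

def newlabel():
    global labels
    labels += 1
    return f"L{labels}"

def gen(e, true, false, code):
    """Emit jumping code for boolean expression e with targets true/false.
    e is a nested tuple: ('or'|'and', e1, e2), ('not', e1),
    or ('relop', x, op, y)."""
    kind = e[0]
    if kind == "or":            # if E1 is false, fall through to E2
        mid = newlabel()
        gen(e[1], true, mid, code)
        code.append(f"{mid}:")
        gen(e[2], true, false, code)
    elif kind == "and":         # if E1 is true, fall through to E2
        mid = newlabel()
        gen(e[1], mid, false, code)
        code.append(f"{mid}:")
        gen(e[2], true, false, code)
    elif kind == "not":         # swap the true and false targets
        gen(e[1], false, true, code)
    else:                       # ('relop', x, op, y)
        _, x, op, y = e
        code.append(f"if {x} {op} {y} goto {true}")
        code.append(f"goto {false}")

code = []
expr = ("or", ("relop", "a", "<", "b"),
              ("and", ("relop", "c", "<", "d"),
                      ("relop", "e", "<", "f")))
gen(expr, "Ltrue", "Lfalse", code)
```

No boolean operator produces an instruction of its own: the or and not rules only choose labels, exactly as in the table above.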

(ii) The procedure is such an important and frequently used programming construct that it is imperative for a compiler to generate good code for procedure calls and returns. The run-time routines that handle procedure argument passing, calls and returns are part of the run-time support package.

Let us consider a grammar for a simple procedure call statement:
1. S → call id ( Elist )
2. Elist → Elist , E
3. Elist → E

    Calling Sequences:

The translation for a call includes a calling sequence, a sequence of actions taken on entry to and exit from each procedure. The following are the actions that take place in a calling sequence:
1. When a procedure call occurs, space must be allocated for the activation record of the called procedure.


    2. The arguments of the called procedure must be evaluated and made available to the called procedure in a known place.

    3. Environment pointers must be established to enable the called procedure to access data in enclosing blocks.

4. The state of the calling procedure must be saved so it can resume execution after the call. Also saved in a known place is the return address, the location to which the called routine must transfer after it is finished.

    5. Finally a jump to the beginning of the code for the called procedure must be generated.

6. For example, consider the following syntax-directed translation:
(1) S → call id ( Elist )
      { for each item p on queue do emit('param' p);
        emit('call' id.place) }
(2) Elist → Elist , E
      { append E.place to the end of queue }
(3) Elist → E
      { initialize queue to contain only E.place }
Here, the code for S is the code for Elist, which evaluates the arguments, followed by a param p statement for each argument, followed by a call statement. queue is emptied and then gets a single pointer to the symbol table location for the name that denotes the value of E.
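The translation above amounts to queuing the argument places left to right and then flushing the queue as param statements ahead of the call. A minimal Python sketch (the function name and list representation are my own assumptions):

```python
def translate_call(proc_place, arg_places):
    """S -> call id(Elist): emit one 'param p' per queued argument
    place, in order, followed by the call itself."""
    queue = list(arg_places)   # filled left to right by the Elist productions
    code = [f"param {p}" for p in queue]
    code.append(f"call {proc_place}")
    return code

three_addr = translate_call("foo", ["t1", "t2", "x"])
```

For a call foo(t1, t2, x) this yields the param statements for t1, t2 and x followed by the call statement.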

14. (a) (i) Issues in the design of code generator
The following issues arise during the code generation phase:
1. Input to code generator
2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Evaluation order

    1. Input to code generator:

The input to the code generator consists of the intermediate representation of the source program produced by the front end, together with information in the symbol table to determine the run-time addresses of the data objects denoted by the names in the intermediate representation.
Intermediate representation can be:
a. Linear representation such as postfix notation
b. Three-address representation such as quadruples


c. Virtual machine representation such as stack machine code
d. Graphical representations such as syntax trees and DAGs
Prior to code generation, the source program must be scanned, parsed and translated into intermediate representation, along with the necessary type checking. Therefore, the input to code generation is assumed to be error-free.

    2. Target program:

The output of the code generator is the target program. The output may be:
a. Absolute machine language: it can be placed in a fixed memory location and executed immediately.
b. Relocatable machine language: it allows subprograms to be compiled separately.
c. Assembly language: code generation is made easier.

    3. Memory management:

    Names in the source program are mapped to addresses of data objects in run-time memory by the front end and code generator.

The code generator makes use of the symbol table; that is, a name in a three-address statement refers to the symbol-table entry for the name.

Labels in three-address statements have to be converted to addresses of instructions. For example, the statement
j : goto i
generates a jump instruction as follows:
If i < j, a backward jump instruction with target address equal to the location of the code for quadruple i is generated.
If i > j, the jump is forward. We must store on a list for quadruple i the location of the first machine instruction generated for quadruple j. When i is processed, the machine locations for all instructions that jump forward to i are filled in.

    4. Instruction selection:

The instructions of the target machine should be complete and uniform. Instruction speeds and machine idioms are important factors when efficiency of the target program is considered. The quality of the generated code is determined by its speed and size.
For example:
x := y + z
a := x + t


The code for the above statements can be generated as follows:
MOV y, R0
ADD z, R0
MOV R0, x
MOV x, R0
ADD t, R0
MOV R0, a

    5. Register allocation

Instructions involving register operands are shorter and faster than those involving operands in memory.
The use of registers is subdivided into two subproblems:
Register allocation: the set of variables that will reside in registers at a point in the program is selected.
Register assignment: the specific register that a variable will reside in is picked.
For example, consider the division instruction of the form:
D x, y
where x is the dividend, held in the even register of an even/odd register pair, and y is the divisor. After the division, the even register holds the remainder and the odd register holds the quotient.

    6. Evaluation order

The order in which the computations are performed can affect the efficiency of the target code. Some computation orders require fewer registers to hold intermediate results than others.

14. (a) (ii) Basic Blocks
A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without any halt or possibility of branching except at the end. The following sequence of three-address statements forms a basic block:
Example:
t1 := a * a
t2 := a * b
t3 := 2 * t2
t4 := t1 + t3
t5 := b * b
t6 := t4 + t5


    Basic Block Construction:

Algorithm: Partition into basic blocks
Input: A sequence of three-address statements
Output: A list of basic blocks with each three-address statement in exactly one block
Method:
1. We first determine the set of leaders, the first statements of basic blocks. The rules we use are the following:
   a. The first statement is a leader.
   b. Any statement that is the target of a conditional or unconditional goto is a leader.
   c. Any statement that immediately follows a goto or conditional goto statement is a leader.
2. For each leader, its basic block consists of the leader and all statements up to but not including the next leader or the end of the program.
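The leader rules can be implemented in a few lines. In this sketch (the representation is my own assumption) instructions are strings and jump targets are written as 0-based statement indices:

```python
def basic_blocks(code):
    """Partition a list of three-address statements into basic blocks
    using the leader rules (a)-(c)."""
    leaders = {0}                                  # rule (a): first statement
    for i, instr in enumerate(code):
        if "goto" in instr:
            target = int(instr.rsplit("goto", 1)[1])
            leaders.add(target)                    # rule (b): jump target
            if i + 1 < len(code):
                leaders.add(i + 1)                 # rule (c): follows a jump
    bounds = sorted(leaders) + [len(code)]
    # each block runs from one leader up to (not including) the next
    return [code[bounds[k]:bounds[k + 1]] for k in range(len(bounds) - 1)]

prog = [
    "i := m - 1",          # 0  leader (rule a)
    "j := n",              # 1
    "t1 := 4 * i",         # 2  leader (rule b: jump target)
    "if t1 < t2 goto 2",   # 3
    "x := a",              # 4  leader (rule c: follows a jump)
]
blocks = basic_blocks(prog)
```

The five statements partition into three blocks, with the leaders at statements 0, 2 and 4.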

(b) (i) A code generator generates target code for a sequence of three-address statements and effectively uses registers to store operands of the statements.
For example, consider the three-address statement a := b + c. It can have the following sequences of code:
ADD Rj, Ri    cost = 1  (if Ri contains b and Rj contains c)
or
ADD c, Ri     cost = 2  (if c is in a memory location)
or
MOV c, Rj     cost = 3  (move c from memory to Rj, then add)
ADD Rj, Ri

    Register and Address Descriptors:

A register descriptor is used to keep track of what is currently in each register. The register descriptors show that initially all the registers are empty.
An address descriptor stores the location where the current value of a name can be found at run time.

    A code-generation algorithm:

Gen_code(operand1, operand2)
{
  If (operand1.addressmode = R)
  {
    If (operator = +)
      Generate (ADD operand2, R0);
    Else if (operator = -)
      Generate (SUB operand2, R0);
    Else if (operator = *)
      Generate (MUL operand2, R0);
    Else if (operator = /)
      Generate (DIV operand2, R0);
  }
  Else if (operand2.addressmode = R)
  {
    If (operator = +)
      Generate (ADD operand1, R0);
    Else if (operator = -)
      Generate (SUB operand1, R0);
    Else if (operator = *)
      Generate (MUL operand1, R0);
    Else if (operator = /)
      Generate (DIV operand1, R0);
  }
  Else
  {
    Generate (MOV operand1, R0);
    If (operator = +)
      Generate (ADD operand2, R0);
    Else if (operator = -)
      Generate (SUB operand2, R0);
    Else if (operator = *)
      Generate (MUL operand2, R0);
    Else if (operator = /)
      Generate (DIV operand2, R0);
  }
}

The algorithm takes as input a sequence of three-address statements constituting a basic block.

For each three-address statement of the form x := y op z, perform the following actions:
1. Invoke a function getreg to determine the location L where the result of the computation y op z should be stored.
2. Consult the address descriptor for y to determine y', the current location of y. Prefer the register for y if the value of y is currently both in memory and a register. If the value of y is not already in L, generate the instruction MOV y', L to place a copy of y in L.


3. Generate the instruction OP z', L, where z' is a current location of z. Prefer a register to a memory location if z is in both. Update the address descriptor of x to indicate that x is in location L. If the current value of x is in L, update its descriptor and remove x from all other descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the block, and are in registers, alter the register descriptor to indicate that, after execution of x := y op z, those registers will no longer contain y or z.

(ii) The DAG can be constructed in the following steps:
[Figure: Step 1 creates a * node labeled d with children b and c; Step 2 adds a + node labeled e with a among its children; Steps 3 and 4 combine the two, attaching the remaining identifier labels to the existing nodes, giving the final DAG with the + node (labeled e) over a and the * node (labeled d, b) over b and c.]

15. (a) (i) A transformation of a program is called local if it can be performed by looking only at the statements in a basic block; otherwise, it is called global.

Many transformations can be performed at both the local and global levels. Local transformations are usually performed first.


    Function-Preserving Transformations

There are a number of ways in which a compiler can improve a program without changing the function it computes. The transformations
1. Common subexpression elimination
2. Copy propagation
3. Dead-code elimination
4. Constant folding
are common examples of such function-preserving transformations. The other transformations come up primarily when global optimizations are performed.

Common Subexpression Elimination:

An occurrence of an expression E is called a common subexpression if E was previously computed, and the values of variables in E have not changed since the previous computation. We can avoid recomputing the expression if we can use the previously computed value.
For example:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t4 := 4*i
t5 := n
t6 := b[t4] + t5
The above code can be optimized using common subexpression elimination as:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t5 := n
t6 := b[t1] + t5
The common subexpression t4 := 4*i is eliminated as its computation is already in t1, and the value of i has not changed between definition and use.
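A local version of this transformation can be sketched over a basic block whose right-hand sides are token tuples (a representation I am assuming for illustration, not from the text):

```python
def local_cse(block):
    """Eliminate recomputations inside one basic block.
    block: list of (dst, rhs) with rhs a tuple of operand/operator tokens."""
    available = {}   # rhs already computed -> temp that holds it
    replaced = {}    # eliminated temp -> the temp it equals
    out = []
    for dst, rhs in block:
        # canonicalize operands through earlier replacements
        rhs = tuple(replaced.get(tok, tok) for tok in rhs)
        if rhs in available:
            replaced[dst] = available[rhs]   # drop the recomputation
        else:
            available[rhs] = dst
            out.append((dst, rhs))
        # an assignment to dst kills expressions that use dst
        available = {e: t for e, t in available.items() if dst not in e}
    return out, replaced

block = [
    ("t1", ("4", "*", "i")),
    ("t2", ("a", "[", "t1", "]")),
    ("t3", ("4", "*", "j")),
    ("t4", ("4", "*", "i")),                    # common subexpression
    ("t5", ("n",)),
    ("t6", ("b", "[", "t4", "]", "+", "t5")),
]
optimized, replaced = local_cse(block)
```

On the example from the text, t4 is eliminated and t6 is rewritten to use t1, matching the optimized code shown above.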

    Copy Propagation:

Assignments of the form f := g are called copy statements, or copies for short. The idea behind the copy-propagation transformation is to use g for f, wherever possible after the copy statement f := g. Copy propagation means use of one variable instead of another. This may not appear to be an improvement, but as we shall see it gives us an opportunity to eliminate x.


For example:
x = Pi;
A = x*r*r;
The optimization using copy propagation can be done as follows:
A = Pi*r*r;
Here the variable x is eliminated.

Dead-Code Elimination:

A variable is live at a point in a program if its value can be used subsequently; otherwise, it is dead at that point. A related idea is dead or useless code: statements that compute values that never get used. While the programmer is unlikely to introduce any dead code intentionally, it may appear as the result of previous transformations. An optimization can be done by eliminating dead code.
Example:
i = 0;
if (i == 1)
{
  a = b + 5;
}
Here, the if statement is dead code because the condition will never be satisfied.

    Constant folding:

We can eliminate both the test and the printing from the object code. More generally, deducing at compile time that the value of an expression is a constant and using the constant instead is known as constant folding.
One advantage of copy propagation is that it often turns the copy statement into dead code.
For example,
a = 3.14157 / 2
can be replaced by
a = 1.570
thereby eliminating a division operation.
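For arithmetic on constants this is easy to demonstrate with Python's ast module. The sketch below is my own illustration (not from the text); it folds any binary operation whose operands are both constants and needs Python 3.9+ for ast.unparse:

```python
import ast

class Fold(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)   # fold children first, bottom-up
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            # both operands known at "compile time": evaluate now
            expr = ast.fix_missing_locations(ast.Expression(body=node))
            value = eval(compile(expr, "<fold>", "eval"))
            return ast.copy_location(ast.Constant(value), node)
        return node

def fold(src):
    """Return src with all constant subexpressions folded."""
    return ast.unparse(Fold().visit(ast.parse(src, mode="eval")))

folded_all = fold("3 + 4 * 5")    # fully constant expression
folded_mix = fold("x + 2 * 8")    # only the constant part folds
```

A fully constant expression collapses to a single literal, while a mixed expression keeps its variable part and folds only the constants.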

    Loop Optimizations:

We now give a brief introduction to a very important place for optimizations, namely loops, especially the inner loops where programs tend to spend the bulk of their time. The running time of a program may be improved if we decrease the number of instructions in an inner loop, even if we increase the amount of code outside that loop.


Three techniques are important for loop optimization:
Code motion, which moves code outside a loop;
Induction-variable elimination, which we apply to eliminate induction variables from inner loops;
Reduction in strength, which replaces an expensive operation by a cheaper one, such as a multiplication by an addition.

    Code Motion:

An important modification that decreases the amount of code in a loop is code motion. This transformation takes an expression that yields the same result independent of the number of times a loop is executed (a loop-invariant computation) and places the expression before the loop. Note that the notion "before the loop" assumes the existence of an entry for the loop. For example, evaluation of limit-2 is a loop-invariant computation in the following while-statement:
while (i <= limit-2)  /* statement does not change limit */
Code motion will result in the equivalent of:
t = limit-2;
while (i <= t)  /* statement does not change limit or t */
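The loop-invariance test can be phrased as a fixed point: a statement is invariant if every operand is either defined outside the loop or computed by a statement already known to be invariant. A sketch under the simplifying assumption (mine, not the text's) that each name is assigned at most once per iteration:

```python
def loop_invariant(body):
    """body: list of (dst, operands) statements inside the loop.
    Returns the set of destinations whose computation is loop-invariant."""
    defined = {dst for dst, _ in body}   # names assigned inside the loop
    invariant = set()
    changed = True
    while changed:
        changed = False
        for dst, ops in body:
            if dst not in invariant and all(
                op not in defined or op in invariant for op in ops
            ):
                invariant.add(dst)
                changed = True
    return invariant

body = [
    ("t", ("limit", "2")),   # t = limit - 2: operands set outside the loop
    ("i", ("i", "1")),       # i = i + 1: depends on i, assigned in the loop
]
hoistable = loop_invariant(body)
```

Here only t = limit - 2 qualifies and can be moved before the loop, exactly as in the while-statement example above.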

    Induction Variables :

A variable x is called an induction variable of a loop L if the value of the variable gets changed every time: it is either decremented or incremented by some constant.
For example:
B1:
i := i + 1
t1 := 4 * j
t2 := a[t1]
if t2


For example, x² is invariably cheaper to implement as x*x than as a call to an exponentiation routine.

(ii) A statement-by-statement code-generation strategy often produces target code that contains redundant instructions and suboptimal constructs. The quality of such target code can be improved by applying optimizing transformations to the target program.

A simple but effective technique for improving the target code is peephole optimization, a method for trying to improve the performance of the target program by examining a short sequence of target instructions (called the peephole) and replacing these instructions by a shorter or faster sequence, whenever possible.
The following are examples of program transformations that are characteristic of peephole optimizations:

Redundant-instruction elimination
Flow-of-control optimizations
Algebraic simplifications
Use of machine idioms
Unreachable code

Redundant-instruction elimination

If we see the instruction sequence
MOV R0, a
MOV a, R0
we can eliminate the second instruction, since a is already in R0.

    Unreachable Code:

We can eliminate unreachable instructions. For example:
sum = 0;
if (sum)
  print("%d", sum);
The body of this if statement will never be executed, hence we can eliminate such unreachable code.

Flow-of-Control Optimizations:

The unnecessary jumps on jumps can be eliminated in either the intermediate code or the target code by the following type of peephole optimization. We can replace the jump sequence
    goto L1
    ...
L1: goto L2
by the sequence
    goto L2
    ...
L1: goto L2
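A peephole pass for jumps-to-jumps can be sketched as follows; the instruction strings and the label-to-index map are my own representation, not from the text:

```python
def collapse_jumps(instrs, labels):
    """Retarget 'goto L1' to 'goto L2' when the instruction at L1 is
    itself 'goto L2'. instrs: list of strings; labels: label -> index."""
    out = []
    for ins in instrs:
        seen = set()
        while ins.startswith("goto "):
            target = ins.split()[1]
            nxt = instrs[labels[target]]
            if nxt.startswith("goto ") and target not in seen:
                seen.add(target)   # guard against cycles of gotos
                ins = nxt          # jump to a jump: follow the chain
            else:
                break
        out.append(ins)
    return out

instrs = ["goto L1", "x := 1", "goto L2", "y := 2"]
labels = {"L1": 2, "L2": 3}
shortened = collapse_jumps(instrs, labels)
```

The first instruction is retargeted from L1 straight to L2; if the statement at L1 then has no other predecessors, it becomes dead and can be removed by a later pass.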

Algebraic Simplification:

There is no end to the amount of algebraic simplification that can be attempted through peephole optimization. Only a few algebraic identities occur frequently enough that it is worth considering implementing them. For example, statements such as
x := x + 0
or
x := x * 1
can be eliminated.

    Reduction in Strength:

Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. Certain machine instructions are considerably cheaper than others and can often be used as special cases of more expensive operators.

For example, x² is invariably cheaper to implement as x*x than as a call to an exponentiation routine:
x² becomes x * x

    Use of Machine Idioms:

The target machine may have hardware instructions to implement certain specific operations efficiently. For example, some machines have auto-increment and auto-decrement addressing modes. These add or subtract one from an operand before or after using its value.

The use of these modes greatly improves the quality of code when pushing or popping a stack, as in parameter passing. These modes can also be used in code for statements like i := i + 1:
i := i + 1 becomes i++
i := i - 1 becomes i--

(b) (i) Flow graphs for control-flow constructs such as do-while statements have a useful property: there is a single beginning point at which control enters and a single end point that control leaves from when execution of the statement is over. We exploit this property when we talk of the definitions reaching the beginning and the end of statements with the following syntax:
S → id := E | S ; S | if E then S else S | do S while E
E → id + id | id


Expressions in this language are similar to those in the intermediate code, but the flow graphs for statements have restricted forms.

[Figure: restricted flow-graph forms for S1; S2 (the region for S1 followed by the region for S2), if E then S1 else S2 (a test block branching to the regions for S1 and S2), and do S1 while E (the region for S1 followed by a test block with a back edge to S1).]

We define a portion of a flow graph called a region to be a set of nodes N that includes a header, which dominates all other nodes in the region. All edges between nodes in N are in the region, except for some that enter the header.

The portion of a flow graph corresponding to a statement S is a region that obeys the further restriction that control can flow to just one outside block when it leaves the region.

We say that the beginning points of the dummy blocks at the entry and exit of a statement's region are the beginning and end points, respectively, of the statement. The equations are an inductive, or syntax-directed, definition of the sets in[S], out[S], gen[S], and kill[S] for all statements S.

gen[S] is the set of definitions generated by S, while kill[S] is the set of definitions that never reach the end of S.
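Over an explicit flow graph, the same in/out equations are solved by iterating to a fixed point. The sketch below is my own illustration; the block names, gen/kill sets and predecessor lists form an invented two-definition example:

```python
def reaching_definitions(blocks, gen, kill, preds):
    """Solve in[B] = union of out[P] over predecessors P, and
    out[B] = gen[B] | (in[B] - kill[B]), by iteration to a fixed point."""
    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            IN[b] = set().union(*(OUT[p] for p in preds[b])) if preds[b] else set()
            new_out = gen[b] | (IN[b] - kill[b])
            if new_out != OUT[b]:
                OUT[b], changed = new_out, True
    return IN, OUT

# d1: i := 0 in B1; d2: i := i + 1 in B2 (B2 loops back to itself)
blocks = ["B1", "B2", "B3"]
gen   = {"B1": {"d1"}, "B2": {"d2"}, "B3": set()}
kill  = {"B1": {"d2"}, "B2": {"d1"}, "B3": set()}
preds = {"B1": [], "B2": ["B1", "B2"], "B3": ["B2"]}
IN, OUT = reaching_definitions(blocks, gen, kill, preds)
```

Both definitions of i reach the top of the loop block B2, but only d2 survives to its exit because B2 redefines i.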

(ii) Code improving transformations:
Algorithms for performing code improving transformations rely on data-flow information. Here we consider common subexpression elimination, copy propagation, and elimination of induction variables.

Global transformations are not a substitute for local transformations; both must be performed.

Elimination of global common subexpressions:

The available-expressions information allows us to determine whether an expression at point p in a flow graph is a common subexpression. With the following algorithm we can eliminate common subexpressions.


Algorithm: Global common subexpression elimination.
Input: A flow graph with available-expression information.
Output: A flow graph after eliminating common subexpressions.
Method: For every statement s of the form x := y + z such that y + z is available at the beginning of s's block, and neither y nor z is defined prior to statement s in that block, do the following:
1. Discover the evaluations of y + z that reach s's block.
2. Create a new variable u.
3. Replace each statement w := y + z found in (1) by
   u := y + z
   w := u
4. Replace statement s by x := u.
Let us apply this algorithm and perform global common subexpression elimination.
Example:

[Worked example, shown in the original as a multi-column table: Step 1 lists the original code (t1 := 4*k; t2 := a[t1]; …; t5 := 4*k; t6 := a[t5]). Steps 2 and 3 introduce a new variable m for the common subexpression (m := 4*k; t1 := m; t5 := m). Step 4 then reuses the earlier evaluations, replacing the recomputations of 4*k and a[t1] in t5 and t6 by references to the statements that first computed them.]

    Copy propagation:

An assignment of the form a := b is called a copy statement. The idea behind the copy-propagation transformation is to use b for a wherever possible after the copy statement a := b.
Algorithm: Copy propagation.
Input: A flow graph G, with ud-chains giving the definitions reaching block B.
Output: A flow graph after applying the copy-propagation transformation.
Method: For each copy s: x := y do the following:
1. Determine those uses of x that are reached by this definition of x, namely s: x := y.
2. Determine whether for every use of x found in (1), s is in c_in[B], where B is the block of this particular use, and moreover no definitions of x or y occur prior to this use of x within B. Recall that if s is in c_in[B] then s is the only definition of x that reaches B.
3. If s meets the conditions of (2), then remove s and replace all uses of x found in (1) by y.


Steps 1 and 2:
x := t3        (copy statement)
a[t1] := t2
a[t4] := x     (use)
y := x + 3     (use)
a[t5] := y
Since the values of t3 and x are not altered along the path from the definition, we replace x by t3 and then eliminate the copy statement:
a[t1] := t2
a[t4] := t3
y := t3 + 3
a[t5] := y

Elimination of induction variables:

A variable x is called an induction variable of a loop L if every time the variable x changes value, it is incremented or decremented by some constant. For example, i is an induction variable for the loop for i := 1 to 10.
While eliminating induction variables, first of all we have to identify all the induction variables. Generally, induction variables come in the following forms:
a := i * b
a := b * i
a := i ± b
a := b ± i
where b is a constant and i is an induction variable, basic or otherwise. If i is basic, then a is in the family of i; which family a belongs to depends on the definition of i.
Algorithm: Elimination of induction variables.
Input: A loop L with reaching-definition information, loop-invariant computation information and live-variable information.
Output: A flow graph without induction variables.
Method: Consider each basic induction variable i whose only uses are to compute other induction variables in its family and in conditional branches. Take some j in i's family, preferably one such that c and d in its triple are as simple as possible, and


modify each test that i appears in to use j instead. We assume in the following that c is positive. A test of the form if i relop x goto B, where x is not an induction variable, is replaced by
r := c * x   /* r := x if c is 1 */
r := r + d   /* omit if d is 0 */
if j relop r goto B
where r is a new temporary.

The case if x relop i goto B is handled analogously. If there are two induction variables i1 and i2 in the test if i1 relop i2 goto B, then we check whether both i1 and i2 can be replaced. The easy case is when we have j1 with its triple and j2 with its triple, and c1 = c2 and d1 = d2. Then i1 relop i2 is equivalent to j1 relop j2.

Now, consider each induction variable j for which a statement j := s was introduced. First check that there can be no assignment to s between the introduced statement j := s and any use of j. In the usual situation, j is used in the block in which it is defined, simplifying this check; otherwise, reaching-definitions information, plus some graph analysis, is needed to implement the check. Then replace all uses of j by uses of s and delete statement j := s.

[Figure: flow graph with blocks B1–B6 (B1: i = m - 1; j = n; t1 = 4*n; v = a[t1]; B2 and B3 step i and j through the array; B4 tests i >= j; B5 and B6 perform the swaps). Strength reduction replaces the computation t4 = 4*j in block B3 by the update t4 = t4 - 4, maintained in step with j.]

Fig: Strength reduction applied to 4*j in block B3


[Figure: the same flow graph after induction-variable elimination. The temporaries t2 and t4 are maintained directly (t2 = t2 + 4 in B2, t4 = t4 - 4 in B3), the test in B4 becomes if t2 > t4 goto B6, and the assignments to i and j are removed.]

Fig: Flow graph after induction-variable elimination


  • B.E. / B.Tech. DEGREE EXAMINATION,

    NOV/DEC 2011

    Sixth Semester

    Computer Science and Engineering

    CS 2352 PRINCIPLES OF COMPILER

    DESIGN (Regulation 2008)

    Time : Three hours Maximum : 100 marks

    Answer All questions.

PART A (10 × 2 = 20 marks)

    1. What is the role of lexical analyzer?

2. Give the transition diagram for an identifier.

3. Define handle pruning.

    4. Mention the two rules for type checking.

5. Construct the syntax tree for the following assignment statement: a := b * - c + b * - c.

    6. What are the types of three address statements?

7. Define basic blocks and flow graphs.

    8. What is DAG?

9. List out the criteria for code-improving transformations.

    10. When does dangling reference occur?


    (Common to Information Technology)


PART B (5 × 16 = 80 marks)

    11. (a) (i) Describe the various phases of compiler and trace it with the program segment (position: = initial + rate * 60). (10)

(ii) State the compiler construction tools. Explain them. (6)

    Or

(b) (i) Explain briefly about input buffering in reading the source program for finding the tokens. (8)

    (ii) Construct the minimized DFA for the regular expression

    (0+1)*(0+1) 10. (8)

12. (a) Construct a canonical parsing table for the grammar given below. Also explain the algorithm used. (16)
E → E + T
E → T
T → T * F
T → F
F → ( E )
F → id

    Or

    (b) What are the different storage allocation strategies? Explain. (16)

13. (a) (i) Write down the translation scheme to generate code for assignment statement. Use the scheme for generating three-address code for the assignment statement g := a + b - c * d. (8)

    (ii) Describe the various methods of implementing three-address statements. (8)

    Or

(b) (i) How can backpatching be used to generate code for Boolean expressions and flow-of-control statements? (10)

(ii) Write a short note on procedure calls. (6)

    14. (a) (i) Discuss the issues in the design of code generator. (10)


(ii) Explain the structure-preserving transformations for basic blocks. (6)

    Or

    (b) (i) Explain in detail about the simple code generator. (8)

(ii) Discuss briefly about peephole optimization. (8)

    15. (a) Describe in detail the principal sources of optimization. (16)

    Or

    (b) (i) Explain in detail optimization of basic blocks with example. (8)

    (ii) Write about data-flow analysis of structured programs. (8)


    Solutions

    PART A

    1.

    [Figure: the source program enters the lexical analyzer, which passes tokens to the parser; the parser and semantic analyzer produce the syntax tree; both phases interact with the symbol-table manager.]

    Main task: take a token sequence from the scanner and verify that it is a syntactically correct program.
    Secondary tasks:
    1. Process declarations and set up symbol-table information accordingly, in preparation for semantic analysis.
    2. Construct a syntax tree in preparation for intermediate code generation.

    2. [Transition diagram for identifiers: from start state 1, a letter leads to state 2; state 2 loops on letter or digit; any other character leads to state 3, where the action return(gettoken(), install_id()) is performed.]

    3. In bottom-up parsing, the process of detecting handles and using them in reductions is called handle pruning.
    Example: Consider the grammar
    E -> E+E
    E -> id
    Now consider the string id+id+id. The rightmost derivation is
    E => E+E => E+E+E => E+E+id => E+id+id => id+id+id
    The handle at each step is shown in the following table.


    Right-sentential form    Handle    Production
    id+id+id                 id        E -> id
    E+id+id                  id        E -> id
    E+E+id                   id        E -> id
    E+E+E                    E+E       E -> E+E
    E+E                      E+E       E -> E+E
    E
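The reduction sequence above can be mechanized. The following Python sketch (the function name and token encoding are illustrative, not from the text) repeatedly finds the leftmost handle for the toy grammar E -> E+E | id and prunes it:

```python
def handle_prune(tokens):
    """Reduce a token string to the start symbol by repeatedly
    pruning the leftmost handle: first id (reduce by E -> id),
    then E+E (reduce by E -> E+E)."""
    symbols = list(tokens)
    steps = []
    while symbols != ["E"]:
        if "id" in symbols:                   # handle: id
            symbols[symbols.index("id")] = "E"
        elif symbols[:3] == ["E", "+", "E"]:  # handle: E+E
            symbols[:3] = ["E"]
        else:
            raise ValueError("no handle found")
        steps.append("".join(symbols))
    return steps

print(handle_prune(["id", "+", "id", "+", "id"]))
```

Each element of the returned list matches one row of the table of right-sentential forms.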

    4. A type checker verifies that the type of a construct matches that expected by its context. For example, the arithmetic operator mod in Pascal requires integer operands, so a type checker verifies that the operands of mod have type integer.

    Type information gathered by a type checker may be needed when code is generated.
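As a tiny illustration of such a rule (the function name and the type-error convention are mine, not from any particular compiler), the mod check can be sketched as:

```python
def check_mod(left_type, right_type):
    """Type-check 'x mod y': both operands must be integers;
    the result type is then integer, otherwise a type error."""
    if left_type == "integer" and right_type == "integer":
        return "integer"
    return "type_error"
```

The result type computed here is exactly the "type information gathered by a type checker" that later phases can use.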

    5. [Figure: syntax tree for the given assignment statement: the root is :=, its left child is the identifier a, and its right subtree is the + expression over the operands b and c.]

    6. 1. Assignment statements of the form x := y op z
    2. Assignment instructions of the form x := op y
    3. Copy statements of the form x := y
    4. The unconditional jump goto L
    5. Conditional jumps such as if x relop y goto L

    7. A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without any halt or possibility of branching except at the end.

    The following sequence of three-address statements forms a basic block:
    t1 := a * a
    t2 := a * b
    t3 := 2 * t2
    t4 := t1 + t3
    t5 := b * b
    t6 := t4 + t5


    Flow Graphs

    A flow graph is a directed graph containing the flow-of-control information for the set of basic blocks making up a program. The nodes of the flow graph are basic blocks. It has a distinguished initial node.
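The standard way to find the basic blocks is the leader algorithm: the first statement, every jump target, and every statement after a jump start a new block. A minimal Python sketch (the statement layout and the encoding of labels as statement indices are assumptions of this sketch):

```python
def partition_blocks(code):
    """Split a list of three-address statements into basic blocks.
    Jumps are written 'goto <index>' or 'if ... goto <index>',
    where the target is the index of a statement."""
    leaders = {0}                                  # rule 1: first statement
    for i, stmt in enumerate(code):
        if stmt.startswith("goto") or stmt.startswith("if"):
            leaders.add(int(stmt.split()[-1]))     # rule 2: jump target
            if i + 1 < len(code):
                leaders.add(i + 1)                 # rule 3: statement after a jump
    starts = sorted(leaders)
    return [code[a:b] for a, b in zip(starts, starts[1:] + [len(code)])]
```

Each returned sub-list is one basic block; the blocks then become the nodes of the flow graph.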

    8. A DAG for a basic block is a directed acyclic graph with the following labels on nodes:
    1. Leaves are labeled by unique identifiers, either variable names or constants.
    2. Interior nodes are labeled by an operator symbol.
    3. Nodes are also optionally given a sequence of identifiers as labels, to store the computed values.
    DAGs are useful data structures for implementing transformations on basic blocks.
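A minimal sketch of DAG construction for statements of the form "x := y op z" (the parsing and node representation are simplified assumptions): identical (op, child, child) triples share one node, which is exactly how common sub-expressions are detected.

```python
def build_dag(block):
    """Build DAG nodes for a basic block of 'x := y op z' statements.
    nodes maps (op, left, right) to a shared interior node; current maps
    each variable to the node that currently holds its value."""
    nodes, current = {}, {}
    def node_for(name):
        return current.setdefault(name, ("leaf", name))
    for stmt in block:
        target, expr = [part.strip() for part in stmt.split(":=")]
        y, op, z = expr.split()
        key = (op, node_for(y), node_for(z))
        nodes.setdefault(key, key)      # reuse an existing node if present
        current[target] = key
    return nodes, current
```

For the block ["t1 := a + b", "t2 := a + b"], a single interior node is created and both t1 and t2 are attached to it as labels.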

    9. Algorithms for performing the code-improving transformations rely on data-flow information. Here we consider common sub-expression elimination, copy propagation, and transformations for moving loop-invariant computations out of loops and for eliminating induction variables.

    10. Whenever storage can be deallocated, the problem of dangling references arises. A dangling reference occurs when there is a reference to storage that has been deallocated. It is a logical error to use dangling references, since the value of deallocated storage is undefined according to the semantics of most languages. Worse, since that storage may later be allocated to another datum, mysterious bugs can appear in programs with dangling references.

    PART B

    11. (a) (i) Phases of Compiler
    A compiler operates in phases, each of which transforms the source program from one representation into another. The following are the phases of the compiler:

    Main phases:

    1. Lexical analysis
    2. Syntax analysis
    3. Semantic analysis
    4. Intermediate code generation
    5. Code optimization
    6. Code generation


    Sub-Phases:

    1. Symbol table management
    2. Error handling

    Lexical Analysis:

    It is the first phase of the compiler. Lexical analysis is also called scanning. It is the phase of compilation in which the complete source code is scanned and broken up into groups of strings called tokens.

    It reads the characters one by one, starting from left to right, and forms the tokens. A token represents a logically cohesive sequence of characters such as keywords, operators, identifiers, special symbols etc.
    Example: position := initial + rate * 60
    1. The identifier position
    2. The assignment symbol :=
    3. The identifier initial
    4. The plus sign
    5. The identifier rate
    6. The multiplication sign
    7. The constant number 60

    Syntax Analysis:

    Syntax analysis is the second phase of the compiler. The syntax analyzer is also known as the parser. It gets the token stream as input from the lexical analyzer of the compiler and generates the syntax tree as output.

    Syntax tree:

    It is a tree in which interior nodes are operators and exterior nodes are operands.
    Example: For position := initial + rate * 60, the syntax tree is

              :=
             /  \
      position   +
                / \
         initial   *
                  / \
              rate   60

    Semantic Analysis:

    Semantic analysis is the third phase of the compiler. It gets the parse tree from the syntax analyzer as input and checks it for semantic consistency.


    It performs type checking and inserts the required type conversions; here the integer 60 is converted to a floating-point value:

              :=
             /  \
      position   +
                / \
         initial   *
                  / \
              rate   inttofloat
                         |
                         60

    Intermediate code generation:

    Intermediate code generation gets input from the semantic analysis and converts the input into an intermediate representation such as three-address code.

    This code comes in a variety of forms: quadruples, triples and indirect triples. The three-address code consists of a sequence of instructions, each of which has at most three operands.
    Example:
    t1 := inttofloat(60)
    t2 := rate * t1
    t3 := initial + t2
    position := t3

    Code Optimization:

    Code optimization gets the intermediate code as input and produces optimized intermediate code as output. This phase reduces the redundant code and attempts to improve the intermediate code so that faster-running machine code will result.

    During code optimization, the result of the program is not affected. To improve the code, the optimization involves
    1. deduction and removal of dead code (unreachable code)
    2. calculation of constants in expressions and terms
    3. collapsing of repeated expressions into temporaries
    4. loop unrolling
    5. moving code outside the loop
    6. removal of unwanted temporary variables

    Example:
    t1 := rate * 60
    position := initial + t1


    Code Generation:

    Code generation gets input from the code optimization phase and produces the target code or object code as result.

    Intermediate instructions are translated into a sequence of machine instructions that perform the same task.

    The code generation involves
    1. allocation of registers and memory
    2. generation of correct references
    3. generation of correct data types
    4. generation of missing code

    Machine instructions:
    MOV rate, R1
    MUL #60, R1
    MOV initial, R2
    ADD R2, R1
    MOV R1, position

    Symbol Table Management:

    The symbol table is used to store all the information about identifiers used in the program.

    It is a data structure containing a record for each identifier, with fields for the attributes of the identifier.

    It allows us to find the record for each identifier quickly and to store or retrieve data from that record.

    Whenever an identifier is detected in any of the phases, it is stored in the symbol table.

    Error Handling:

    Each phase can encounter errors. After detecting an error, a phase must handle the error so that compilation can proceed.

    In lexical analysis, errors occur in the separation of tokens. In syntax analysis, errors occur during construction of the syntax tree. In semantic analysis, errors occur when the compiler detects constructs with the right syntactic structure but no meaning, and during type conversion. In code optimization, errors occur when the result is affected by the optimization. In code generation, an error is shown when code is missing etc.


    [Figure: Phases of compiler. The source program passes in turn through the lexical analyzer, syntax analyzer, semantic analyzer, intermediate code generator, code optimization and code generation to produce the object program; symbol table management and error detection and handling interact with every phase.]

    (ii) These are specialized tools that have been developed for helping to implement the various phases of a compiler. The following are the compiler construction tools:
    1. Scanner generators
    2. Parser generators
    3. Syntax-directed translation engines
    4. Automatic code generators
    5. Data-flow engines

    1. Scanner Generators:

    These generate lexical analyzers, normally from a specification based on regular expressions.

    The basic organization of the resulting lexical analyzer is a finite automaton.

    2. Parser Generators:

    These produce syntax analyzers, normally from input that is based on a context-free grammar.

    Syntax analysis consumes a large fraction of the running time of a compiler.

    Example: YACC (Yet Another Compiler-Compiler).


    3. Syntax-Directed Translation:

    These produce routines that walk the parse tree and as a result generate intermediate code.

    Each translation is defined in terms of translations at its neighbor nodes in the tree.

    4. Automatic Code Generators:

    It takes a collection of rules to translate intermediate language into machine language. The rules must include sufficient details to handle different possible access methods for data.

    5. Data-Flow Engines:

    It does code optimization using data-flow analysis, that is, the gathering of information about how values are transmitted from one part of a program to every other part.

    (b) (i) As characters are read from left to right, each character is stored in the buffer to form a meaningful token, as shown below:

        A = B + C
        ^       ^
        |       forward (look-ahead) pointer
        beginning of the token

    We introduce a two-buffer scheme that handles large lookaheads safely. We then consider an improvement involving sentinels that saves time checking for the ends of buffers.

    Buffer Pairs

    A buffer is divided into two N-character halves, as shown below:

        : : E : : = : : M : * : C : * : * : 2 : eof : : :

    The pointer lexeme_beginning marks the start of the current lexeme and the pointer forward scans ahead of it.

    Each buffer is of the same size N, and N is usually the number of characters on one disk block. E.g., 1024 or 4096 bytes.

    Using one system read command we can read N characters into a buffer. If fewer than N characters remain in the input file, then a special character, represented by eof, marks the end of the source file.


    Two pointers to the input are maintained:
    1. Pointer lexeme_beginning marks the beginning of the current lexeme, whose extent we are attempting to determine.
    2. Pointer forward scans ahead until a pattern match is found. Once the next lexeme is determined, forward is set to the character at its right end.

    The string of characters between the two pointers is the current lexeme. After the lexeme is recorded as an attribute value of a token returned to the parser, lexeme_beginning is set to the character immediately after the lexeme just found.

    Advancing forward pointer:

    Advancing the forward pointer requires that we first test whether we have reached the end of one of the buffers; if so, we must reload the other buffer from the input and move forward to the beginning of the newly loaded buffer. If the end of the second buffer is reached, we must again reload the first buffer with input, and the pointer wraps to the beginning of the buffer.

    Code to advance forward pointer:

    if forward at end of first half then begin
        reload second half;
        forward := forward + 1
    end
    else if forward at end of second half then begin
        reload first half;
        move forward to beginning of first half
    end
    else forward := forward + 1;

    Sentinels

    For each character read, we make two tests: one for the end of the buffer, and one to determine what character is read. We can combine the buffer-end test with the test for the current character if we extend each buffer to hold a sentinel character at the end. The sentinel is a special character that cannot be part of the source program, and a natural choice is the character eof.

    The sentinel arrangement is as shown below:

        : : E : : = : : M : * : eof : C : * : * : 2 : eof : : : eof

    As before, lexeme_beginning and forward point into the buffer; each half now ends with the eof sentinel.


    Code to advance forward pointer:

    forward := forward + 1;
    if forward↑ = eof then begin
        if forward at end of first half then begin
            reload second half;
            forward := forward + 1
        end
        else if forward at end of second half then begin
            reload first half;
            move forward to beginning of first half
        end
        else /* eof within a buffer signifying end of input */
            terminate lexical analysis
    end
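The same logic in executable form: a Python sketch of sentinel-based scanning (the sentinel value, the whitespace-delimited lexeme rule, and the buffer representation are assumptions of this sketch). A sentinel at the end of a half means "reload the other half"; a sentinel with no half left means end of input.

```python
SENTINEL = "\0"   # stands in for eof; assumed never to occur in the source

def scan(halves):
    """Scan whitespace-separated lexemes from a sequence of buffer
    halves, each terminated by SENTINEL, switching halves on reload.
    Lexemes may span a buffer boundary."""
    bufs = [h + SENTINEL for h in halves]
    cur, i, lexeme, out = 0, 0, "", []
    while True:
        ch = bufs[cur][i]
        i += 1
        if ch == SENTINEL:
            if cur + 1 < len(bufs):       # end of this half: switch buffers
                cur, i = cur + 1, 0
                continue
            if lexeme:
                out.append(lexeme)
            return out                    # true end of input
        if ch == " ":
            if lexeme:
                out.append(lexeme)
            lexeme = ""
        else:
            lexeme += ch

print(scan(["E = M ", "* C * ", "* 2"]))
```

Note how only one test per character is needed in the common case; the buffer-end check runs only when the sentinel itself is seen.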

    (ii) Construct the minimized DFA for the regular expression

    (0 + 1)*(0 + 1)10

    NFA for (0+1)*(0+1)10:

    State q0 loops on 0,1 (for (0+1)*); q0 -> q1 on 0,1 (for the final (0+1)); q1 -> q2 on 1; q2 -> q3 on 0, and q3 is the accepting state.

    Transition table for the NFA:

    States      0           1
    q0          {q0,q1}     {q0,q1}
    q1          -           {q2}
    q2          {q3}        -
    *q3         -           -


    Minimized DFA Table

    New state   States          0             1
    A           [q0]            [q0,q1]       [q0,q1]
    B           [q1]            -             [q2]
    C           [q2]            [q3]          -
    D           *[q3]           -             -
    E           [q0,q1]         [q0,q1]       [q0,q1,q2]
    F           [q0,q1,q2]      [q0,q1,q3]    [q0,q1,q2]
    G           *[q0,q1,q3]     [q0,q1]       [q0,q1,q2]

    (B, C and D are not reachable from the start state A, so only A, E, F and G appear in the resulting DFA.)

    Minimized DFA diagram (as transitions):

    A --0,1--> E
    E --0--> E    E --1--> F
    F --0--> G    F --1--> F
    G --0--> E    G --1--> F

    A is the start state and G is the accepting state.
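The construction can be checked by simulating the NFA directly: the set of reachable NFA states after each input symbol is exactly the DFA state the subset construction would be in (transition function taken from the NFA table above).

```python
def accepts(s):
    """Simulate the NFA for (0+1)*(0+1)10 by tracking the set of
    reachable states; these sets correspond to DFA states A, E, F, G."""
    delta = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0", "q1"},
             ("q1", "1"): {"q2"}, ("q2", "0"): {"q3"}}
    states = {"q0"}
    for ch in s:
        states = set().union(*[delta.get((q, ch), set()) for q in states])
    return "q3" in states   # q3 is the accepting state
```

The accepted strings are exactly those ending in 010 or 110, i.e. (0+1) followed by 10.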

    12. (a) (i) Grammar:
    E -> E+T        E -> T
    T -> T*F        T -> F
    F -> (E)        F -> id

    Input: An augmented grammar G'
    Output: The canonical LR parsing table

    Algorithm:

    1. Initially construct the set of items C = {I0, I1, I2, ..., In}, where C is the collection of sets of LR(1) items for the input grammar G'.
    2. The parsing actions are based on each item set Ii. The actions are as given below:
       a. If [A -> α.aβ, b] is in Ii and goto(Ii, a) = Ij, then create an entry in the action table: action[Ii, a] = shift j.
       b. If [A -> α., a] is in Ii, then in the action table set action[Ii, a] = reduce by A -> α. Here A should not be S'.
       c. If [S' -> S., $] is in Ii, then action[Ii, $] = accept.
    3. The goto part of the LR table is filled as follows: the goto transitions for state Ii are considered for nonterminals only. If goto(Ii, A) = Ij, then goto[Ii, A] = j.
    4. All the entries not defined by rules 2 and 3 are considered to be error.
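The item sets used in step 2 come from the closure operation. A runnable Python sketch of LR(1) closure for this grammar follows (items are (head, rhs, dot, lookahead) tuples; since the grammar has no ε-productions, FIRST of a symbol string is decided by its first symbol, hardcoded from the FIRST sets computed below). Note that the full closure also produces + and * lookaheads for the nested items, which the hand-worked listing below elides by tracking only $.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"}}

def first_of(seq, lookahead):
    """FIRST(seq . lookahead); with no epsilon-productions,
    the first symbol of seq decides."""
    if not seq:
        return {lookahead}
    sym = seq[0]
    return FIRST[sym] if sym in GRAMMAR else {sym}

def closure(items):
    """LR(1) closure: for [A -> a.Xb, la], add [X -> .g, b]
    for every production X -> g and every b in FIRST(b la)."""
    items = set(items)
    work = list(items)
    while work:
        head, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for b in first_of(rhs[dot + 1:], la):
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0, b)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return items

I0 = closure({("E'", ("E",), 0, "$")})
```

Applying goto (advance the dot over one symbol, then re-close) to I0 and its successors yields the canonical collection used for the table.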

    Augmented Grammar

    E' -> E
    1. E -> E+T    2. F -> (E)    3. E -> T
    4. F -> id     5. T -> T*F    6. T -> F

    FIRST(E) = {(, id}    FOLLOW(E) = {+, ), $}
    FIRST(T) = {(, id}    FOLLOW(T) = {+, *, ), $}
    FIRST(F) = {(, id}    FOLLOW(F) = {+, *, ), $}

    Initially add [E' -> .E, $] as the first item in I0.
    Closure rule: if [A -> α.Xβ, a] is in the set and X -> γ is a production, add [X -> .γ, b] for every b in FIRST(βa). Matching [E' -> .E, $] with this rule gives X = E, β = ε, a = $, so FIRST(ε$) = {$}. Hence the E-productions are added with lookahead $:
    E -> .E+T, $
    E -> .T, $

    I0:  E' -> .E, $
         E -> .E+T, $
         E -> .T, $
         T -> .T*F, $
         T -> .F, $
         F -> .(E), $
         F -> .id, $

    I1 = goto(I0, E):
         E' -> E., $
         E -> E.+T, $

    I2 = goto(I0, ( ):
         F -> (.E), $
         E -> .E+T, $
         E -> .T, $
         T -> .T*F, $
         T -> .F, $
         F -> .(E), $
         F -> .id, $

    I3 = goto(I0, T):
         E -> T., $
         T -> T.*F, $

    I4 = goto(I0, id):
         F -> id., $

    I5 = goto(I0, F):
         T -> F., $

    I6 = goto(I1, +):
         E -> E+.T, $
         T -> .T*F, $
         T -> .F, $
         F -> .(E), $
         F -> .id, $

    I7 = goto(I2, E):
         F -> (E.), $
         E -> E.+T, $

    I8 = goto(I3, *):
         T -> T*.F, $
         F -> .(E), $
         F -> .id, $

    I9 = goto(I6, T):
         E -> E+T., $
         T -> T.*F, $

    I10 = goto(I7, )):
         F -> (E)., $

    I11 = goto(I8, F):
         T -> T*F., $

    Also goto(I2, T) = I3, goto(I2, F) = I5, goto(I6, F) = I5, goto(I7, +) = I6 and goto(I9, *) = I8.

    Parsing table

    State   id    (     +     *     )     $     E    T    F
    0       s4    s2                             1    3    5
    1                   s6                acc
    2       s4    s2                             7    3    5
    3                   r3    s8    r3    r3
    4                   r4    r4    r4    r4
    5                   r6    r6    r6    r6
    6       s4    s2                                  9    5
    7                   s6          s10
    8       s4    s2                                       11
    9                   r1    s8    r1    r1
    10                  r2    r2    r2    r2
    11                  r5    r5    r5    r5

    (Reduce actions are entered under the symbols in FOLLOW of the production's left-hand side; the accept action appears in state 1 on $.)


    (b) The different storage allocation strategies are:
    1. Static allocation: lays out storage for all data objects at compile time.
    2. Stack allocation: manages the run-time storage as a stack.
    3. Heap allocation: allocates and deallocates storage as needed at run time from a data area known as the heap.

    Static Allocation:

    In static allocation, names are bound to storage as the program is compiled, so there is no need for a run-time support package.

    Since the bindings do not change at run-time, every time a proce-dure is activated, its names are bound to the same storage locations.

    Therefore values of local names are retained across activations of a procedure. That is, when control returns to a procedure the values of the locals are the same as they were when control left the last time.

    From the type of a name, the compiler determines the amount of storage for the name and decides where the target code can find the data it operates on.

    Stack Allocation:

    All compilers for languages that use procedures, functions or methods as units of user-defined actions manage at least part of their run-time memory as a stack.

    Each time a procedure is called, space for its local variables is pushed onto a stack, and when the procedure terminates, that space is popped off the stack.

    Calling sequences:

    Procedure calls are implemented by what is called a calling sequence, which consists of code that allocates an activation record on the stack and enters information into its fields.

    A return sequence is the corresponding code to restore the state of the machine so the calling procedure can continue its execution after the call.

    The code in a calling sequence is often divided between the calling procedure (caller) and the procedure it calls (callee).

    When designing calling sequences and the layout of activation records, the following principles are helpful:

    Values communicated between caller and callee are generally placed at the beginning of the callee's activation record, so they are as close as possible to the caller's activation record.

    Fixed-length items are generally placed in the middle. Such items typically include the control link, the access link, and the machine-status fields.


    Items whose size may not be known early enough are placed at the end of the activation record. The most common example is a dynamically sized array, where the value of one of the callee's parameters determines the length of the array.

    We must locate the top-of-stack pointer judiciously. A common approach is to have it point to the end of the fixed-length fields in the activation record. Fixed-length data can then be accessed by fixed offsets, known to the intermediate-code generator, relative to the top-of-stack pointer.

    [Figure: two overlapping activation records on the stack. The caller's record and the callee's record share the block of parameters and returned values; each record contains, in order, parameters and returned values, the control link, links and saved status, and temporaries and local data. top_sp points to the end of the fixed-length fields of the callee's record. The region up to and including the parameters is the caller's responsibility; the remainder of the callee's record is the callee's responsibility.]

    Fig: Division of tasks between caller and callee

    The calling sequence and its division between caller and callee are as follows.

    The caller evaluates the actual parameters.

    The caller stores a return address and the old value of top_sp into the callee's activation record. The caller then increments top_sp to the respective position.

    The callee saves the register values and other status information. The callee initializes its local data and begins execution.

    A suitable, corresponding return sequence is:

    The callee places the return value next to the parameters.

    Using the information in the machine-status field, the callee restores top_sp and the other registers, and then branches to the return address that the caller placed in the status field.

    Although top_sp has been decremented, the caller knows where the return value is, relative to the current value of top_sp; the caller therefore may use that value.
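These two sequences can be modeled with a toy Python runtime stack (the class names, field layout and frame-size handling are illustrative only, not a real calling convention): calling pushes an activation record holding the parameters, the control link and the saved top_sp; returning leaves the return value, restores top_sp and pops the record.

```python
class ActivationRecord:
    """Toy activation record: parameters/return value first, then the
    control link (caller's record) and saved status, then locals."""
    def __init__(self, params, control_link, saved_top_sp):
        self.params = params
        self.return_value = None
        self.control_link = control_link
        self.saved_top_sp = saved_top_sp
        self.locals = {}

class RuntimeStack:
    def __init__(self):
        self.records, self.top_sp = [], 0

    def call(self, params, frame_size):
        """Caller's half: store params and old top_sp, bump top_sp."""
        rec = ActivationRecord(params,
                               self.records[-1] if self.records else None,
                               self.top_sp)
        self.records.append(rec)
        self.top_sp += frame_size
        return rec

    def ret(self, value):
        """Callee's half: leave the return value, restore top_sp, pop."""
        rec = self.records.pop()
        rec.return_value = value
        self.top_sp = rec.saved_top_sp
        return value

stack = RuntimeStack()
stack.call(params=(3, 4), frame_size=16)
result = stack.ret(7)
```

The control link stored in each record is what lets the machine walk back through the chain of active calls.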


    Variable length data on stack:

    The run-time memory management system must deal frequently with the allocation of space for objects, the sizes of which are not known at the compile time

