CS1352_MAY07

7/28/2019 CS1352_MAY07


MAY/JUNE-'07/CS1352-Answer Key

CS1352 Principles of Compiler Design

University Question Key

May/June 2007

PART-A

1. Define a preprocessor.

A preprocessor produces input to a compiler. Its functions are macro processing, file inclusion, rational preprocessors and language extensions.

2. What are the issues in lexical analysis?

The issues are the reasons for separating lexical analysis from parsing:
Simpler design.
Compiler efficiency is improved.
Compiler portability is enhanced.

3. Eliminate the left recursion from the following grammar

A->Ac | Aad | bd | c

The rule to eliminate left recursion is: A->Aα | β can be converted to A->βA' and A'->αA' | ε. So, the grammar after eliminating left recursion is:

A->bdA' | cA'
A'->cA' | adA' | ε
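The rule above can be sketched in Python. This is only an illustration under stated assumptions: the function name `eliminate_immediate_left_recursion` and the list-of-symbols representation are hypothetical, and the sketch handles immediate left recursion only.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Turn A -> A a1 | ... | b1 | ... into A -> b A' and A' -> a A' | epsilon.

    Each alternative is a list of grammar symbols; [] stands for epsilon.
    """
    new_nt = nonterminal + "'"
    # alpha parts: alternatives that start with the nonterminal itself
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # beta parts: all other alternatives
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }


grammar = eliminate_immediate_left_recursion(
    "A", [["A", "c"], ["A", "a", "d"], ["b", "d"], ["c"]]
)
for head, bodies in grammar.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

Run on the grammar of this question, the sketch reproduces the productions given in the answer.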

4. What are the disadvantages of operator precedence parsing?

An operator such as minus has two different precedences (unary and binary), so tokens like the minus sign are hard to handle. This kind of parsing is applicable only to a small class of grammars.

5. Write the properties of intermediate language.

Intermediate codes are machine independent codes, but they are close to machine instructions.
The given program in a source language is converted to an equivalent program in an intermediate language by the intermediate code generator.
The intermediate language can be one of many different languages, and the designer of the compiler decides which one to use.
Syntax trees can be used as an intermediate language.
Postfix notation can be used as an intermediate language.
Three-address code (quadruples, triples and indirect triples) can be used as an intermediate language.

6. What is back patching?

Back patching is the activity of filling in unspecified label information using appropriate semantic actions during the code generation process. The functions used in the semantic actions are mklist(i), merge_list(p1,p2) and backpatch(p,i).


http://engineerportal.blogspot.in/



7. What are the applications of DAG?

Determining the common sub-expressions.
Determining which identifiers have their values used in the block.
Determining which statements of the block compute values that could be used outside the block.

8. Give the primary structure preserving transformations on Basic Blocks.

Common sub expression elimination
Dead-code elimination
Renaming of temporary variables
Interchange of two independent adjacent statements

9. What do you mean by code motion?

Code motion decreases the amount of code in a loop. It takes an expression that yields the same result independent of the number of times the loop is executed (a loop-invariant computation) and places it before the loop.

10. Draw the diagram of the general activation record and give the purpose of any two fields.

Returned value
Actual parameters
Optional control link
Optional access link
Saved machine status
Local data
Temporaries

Temporaries are used to hold values that arise in the evaluation of expressions. The returned value field is used by the called procedure to return a value to the calling procedure.

PART B

11. a. i. Write about the phases of a compiler and, assuming an input, show the output of the various phases. (10)

The process of compilation is very complex, so it is customary, from both the logical and the implementation point of view, to partition the compilation process into several phases. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation. (2)

The source program is a stream of characters, e.g. pos = init + rate * 60 (6)

Lexical analysis: groups characters into non-separable units, called tokens, and generates the token stream: id1 = id2 + id3 * const. The information about the identifiers must be stored somewhere (the symbol table).

Syntax analysis: checks whether the token stream meets the grammatical specification of the language and generates the syntax tree.



Semantic analysis: checks whether the program has a meaning (e.g. if pos is a record and init and rate are integers, then the assignment does not make sense).

Tree after syntax analysis:

        :=
       /  \
    id1    +
          / \
       id2   *
            / \
         id3   60

Tree after semantic analysis:

        :=
       /  \
    id1    +
          / \
       id2   *
            / \
         id3   inttoreal
                  |
                  60

Intermediate code generation: intermediate code is something that is both close to the final machine code and easy to manipulate (for optimization). One example is three-address code:

dst = op1 op op2

The three-address code for the assignment statement:

temp1 = inttoreal(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3

Code optimization: produces better but semantically equivalent code.

temp1 = id3 * 60.0
id1 = id2 + temp1

Code generation: generates assembly.

MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
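As a rough illustration of the intermediate code generation phase, the following Python sketch walks the tree produced by semantic analysis and emits three-address statements. The tuple-based tree encoding and the helper name `gen_tac` are assumptions made for this example, not part of the original answer.

```python
import itertools


def gen_tac(node, code, counter):
    """Return the place holding node's value, appending statements to code."""
    if isinstance(node, str):  # leaf: identifier or constant
        return node
    op, *args = node
    places = [gen_tac(arg, code, counter) for arg in args]
    temp = f"temp{next(counter)}"
    if len(places) == 1:  # unary operator such as inttoreal
        code.append(f"{temp} = {op}({places[0]})")
    else:  # binary operator
        code.append(f"{temp} = {places[0]} {op} {places[1]}")
    return temp


# Right-hand side of id1 := id2 + id3 * inttoreal(60)
tree = ("+", "id2", ("*", "id3", ("inttoreal", "60")))
code = []
result = gen_tac(tree, code, itertools.count(1))
code.append(f"id1 = {result}")
print("\n".join(code))
```

The printed statements are the same four three-address statements listed above, before optimization.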

Symbol Table Creation / Maintenance

Contains info (storage, type, scope, args) on each meaningful token, typically identifiers.
Data structure created / initialized during lexical analysis.
Utilized / updated during later analysis and synthesis.

Error Handling

Detection of different errors, which correspond to all phases.
Each phase should know how to deal with an error, so that compilation can proceed and further errors can be detected.



Figure: phases of a compiler. Source Program -> 1 Lexical Analyzer -> 2 Syntax Analyzer -> 3 Semantic Analyzer -> 4 Intermediate Code Generator -> 5 Code Optimizer -> 6 Code Generator -> Target Program. The Symbol-table Manager and the Error Handler interact with all six phases. (2)

ii. Explain briefly about compiler construction tools. (6)

Parser Generators : Produce Syntax Analyzers

Scanner Generators : Produce Lexical Analyzers

Syntax-directed Translation Engines : Generate Intermediate Code

Automatic Code Generators : Generate Actual Code

Data-Flow Engines : Support Optimization

(OR)

b. i. Construct the NFA for (a|b)*a(a|b) using Thompson's construction algorithm. (10)

The algorithm is syntax directed in that it uses the syntactic structure of the regular expression to guide the construction process. First, parse the regular expression r into its constituent sub expressions. Then, using the construction rules, build NFAs for each of the basic symbols in r and combine them according to the operators (union, concatenation and closure) of r.
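The NFA diagram itself did not survive in this copy, so here is a minimal Python sketch of the construction for (a|b)*a(a|b). It is an illustration under assumptions: all function names are hypothetical, epsilon is represented by `None`, and concatenation links the two fragments with an epsilon edge instead of merging the final state of one with the start state of the other, as the textbook rule does.

```python
import itertools

EPS = None  # label for epsilon transitions (a convention of this sketch)
_ids = itertools.count()


def new_state():
    return next(_ids)


def symbol(c, trans):
    s, f = new_state(), new_state()
    trans.append((s, c, f))
    return s, f


def union(x, y, trans):
    s, f = new_state(), new_state()
    trans += [(s, EPS, x[0]), (s, EPS, y[0]), (x[1], EPS, f), (y[1], EPS, f)]
    return s, f


def concat(x, y, trans):
    trans.append((x[1], EPS, y[0]))
    return x[0], y[1]


def star(x, trans):
    s, f = new_state(), new_state()
    trans += [(s, EPS, x[0]), (x[1], EPS, f), (x[1], EPS, x[0]), (s, EPS, f)]
    return s, f


def eps_closure(states, trans):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, c, r in trans:
            if p == q and c is EPS and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def accepts(nfa, trans, text):
    """Simulate the NFA on text using sets of states."""
    start, final = nfa
    current = eps_closure({start}, trans)
    for ch in text:
        moved = {r for p, c, r in trans if p in current and c == ch}
        current = eps_closure(moved, trans)
    return final in current


trans = []
prefix = star(union(symbol("a", trans), symbol("b", trans), trans), trans)
middle = symbol("a", trans)
suffix = union(symbol("a", trans), symbol("b", trans), trans)
nfa = concat(concat(prefix, middle, trans), suffix, trans)

for word in ["ab", "aa", "bbab", "b", "ba"]:
    print(word, accepts(nfa, trans, word))
```

The simulation accepts exactly the strings whose second-to-last symbol is a, which is the language of the regular expression.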



ii. Explain about Input buffering technique. (6)

Determining the next lexeme often requires reading the input beyond the end of the lexeme.

Buffer Pairs: (2)

Buffer pairs address efficiency and are used with lookahead on the input. It is a specialized buffering technique that reduces the overhead required to process an input character. The buffer is divided into two N-character halves, and two pointers are used. The technique is needed when the lexical analyzer has to look ahead several characters beyond the lexeme for a pattern before a match is announced. One pointer, lexeme_beginning, marks the first character of the current lexeme; the other, the forward pointer, scans ahead until a match is found. The string of characters between the two pointers forms the lexeme.



Increment procedure for forward pointer: (2)

if forward at end of first half then
    reload second half
    forward += 1
else if forward at end of second half then
    reload the first half
    move forward to beginning of first half
else
    forward += 1

Sentinels: (2)

A sentinel is a special character that cannot be part of the source program, e.g. eof. It is placed at the end of each buffer half and reduces the two tests per character to one.

Increment procedure for forward pointer using sentinels:

forward += 1
if character at forward = eof then
    if forward at end of first half then
        reload second half
        forward += 1
    else if forward at end of second half then
        reload the first half
        move forward to beginning of first half
    else
        terminate lexical analysis
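The sentinel scheme can be sketched as follows. This is a simplified illustration, not a production lexer buffer: the class name `BufferPair`, the tiny half size `N = 4`, and the choice of `"\0"` as the sentinel are assumptions, and the lexeme_beginning pointer is left out.

```python
import io

EOF = "\0"  # sentinel; assumed never to occur in the source text
N = 4       # half size, kept tiny so that reloads actually happen


class BufferPair:
    def __init__(self, stream):
        self.stream = stream
        # Layout: [0..N-1] first half, [N] sentinel,
        #         [N+1..2N] second half, [2N+1] sentinel.
        self.buf = [EOF] * (2 * N + 2)
        self._load(0)
        self.forward = -1  # the first advance moves to index 0

    def _load(self, start):
        chunk = self.stream.read(N)
        for i in range(N):
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def next_char(self):
        """Advance forward; return the next character or None at end of input."""
        self.forward += 1
        if self.buf[self.forward] == EOF:  # the single test per character
            if self.forward == N:  # sentinel at end of first half
                self._load(N + 1)
                self.forward += 1
            elif self.forward == 2 * N + 1:  # sentinel at end of second half
                self._load(0)
                self.forward = 0
            else:  # eof inside a half: real end of input
                self.forward -= 1
                return None
            if self.buf[self.forward] == EOF:  # reload found no more input
                self.forward -= 1
                return None
        return self.buf[self.forward]


def read_all(text):
    pair = BufferPair(io.StringIO(text))
    chars = []
    while (c := pair.next_char()) is not None:
        chars.append(c)
    return "".join(chars)


print(read_all("pos = init + rate * 60"))
```

In the common case only the comparison against the sentinel is executed; the tests for the end of a half run only when a sentinel is actually seen.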

12.a. i. Construct predictive parsing table for the grammar (10)

S->(L) | a
L->L,S | S

After the elimination of left recursion: (2)

S->(L) | a
L->SL'
L'->,SL' | ε

Calculation of First: (2)

First(S) = {(, a}
First(L) = {(, a}
First(L') = {, , ε}

Calculation of Follow: (2)

Follow(S) = {$, , , )}
Follow(L) = {)}
Follow(L') = {)}

Predictive parsing table: (4)

Non-terminal   a         (         )         ,           $
S              S->a      S->(L)    -         -           -
L              L->SL'    L->SL'    -         -           -
L'             -         -         L'->ε     L'->,SL'    -

(- marks an error entry.)
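The First, Follow and table computations above can be checked with a short Python sketch. The data layout (a dictionary of productions, `"ε"` for the empty string, `"$"` for end of input) and all function names are assumptions chosen for this illustration.

```python
EPS, END = "ε", "$"
GRAMMAR = {
    "S": [["(", "L", ")"], ["a"]],
    "L": [["S", "L'"]],
    "L'": [[",", "S", "L'"], [EPS]],
}
START = "S"


def first_of(seq, first):
    """FIRST of a sequence of grammar symbols."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in GRAMMAR else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # every symbol in seq can derive the empty string
    return out


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def build_table(first, follow):
    table = {}
    for nt, bodies in GRAMMAR.items():
        for body in bodies:
            f = first_of(body, first)
            targets = (f - {EPS}) | (follow[nt] if EPS in f else set())
            for t in targets:
                assert (nt, t) not in table, "grammar is not LL(1)"
                table[(nt, t)] = body
    return table


first = compute_first()
follow = compute_follow(first)
table = build_table(first, follow)
for (nt, t), body in sorted(table.items()):
    print(f"M[{nt}, {t}] = {nt} -> {' '.join(body)}")
```

The sketch fills six table entries, one for each non-error cell of the table above.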



ii. What are the different strategies that a parser can employ to recover from syntax errors? (6)

Panic mode recovery
On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found.

Phrase level recovery
On discovering an error, the parser may perform local correction on the remaining input, e.g. replace a prefix of the remaining input by some string that allows the parser to continue.

Error productions
Augment the grammar for the language with productions that generate the erroneous constructs. If an error production is used by the parser, it generates appropriate error diagnostics to indicate the erroneous construct that has been recognized in the input.

Global correction
It makes minimal changes to the incorrect input string to obtain a globally least-cost correction.

(OR)

b. i. Construct the CLR parsing table from (10)

S->AA, A->Aa | b

Augmented grammar:

(0) S'->S
(1) S->AA
(2) A->Aa
(3) A->b

Sets of LR(1) items:

I0: S'->.S, $
    S->.AA, $
    A->.Aa, a/b
    A->.b, a/b

I1: goto(I0, S)
    S'->S., $

I2: goto(I0, A)
    S->A.A, $
    A->A.a, a/b
    A->.Aa, $/a
    A->.b, $/a

I3: goto(I0, b)
    A->b., a/b

I4: goto(I2, A)
    S->AA., $
    A->A.a, $/a

I5: goto(I2, a)
    A->Aa., a/b

I6: goto(I2, b)
    A->b., $/a

I7: goto(I4, a)
    A->Aa., $/a

Parsing table:

          Action                Goto
State     a      b      $       S     A
0         -      s3     -       1     2
1         -      -      acc     -     -
2         s5     s6     -       -     4
3         r3     r3     -       -     -
4         s7     -      r1      -     -
5         r2     r2     -       -     -
6         r3     -      r3      -     -
7         r2     -      r2      -     -

(- marks an error entry; rN means reduce by production N.)



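ii. Write Operator-precedence parsing algorithm. (6)

set ip to point to the first symbol of w$;
repeat forever
    if $ is on top of the stack and ip points to $ then
        return
    else begin
        let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by ip;
        if a <· b or a =· b then begin
            push b onto the stack;
            advance ip to the next input symbol;
        end
        else if a ·> b then
            repeat
                pop the stack
            until the top stack terminal is related by <· to the terminal most recently popped
        else error()
    end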



Triples

       Op       arg1   arg2
(0)    uminus   c
(1)    *        b      (0)
(2)    uminus   c
(3)    *        b      (2)
(4)    +        (1)    (3)
(5)    assign   a      (4)

Indirect Triples

       Statement
(0)    (14)
(1)    (15)
(2)    (16)
(3)    (17)
(4)    (18)
(5)    (19)

       Op       arg1   arg2
(14)   uminus   c
(15)   *        b      (14)
(16)   uminus   c
(17)   *        b      (16)
(18)   +        (15)   (17)
(19)   assign   a      (18)

ii. Give the syntax-directed definition for flow of control statements. (8)

Flow of control statements:
S-> if E then S1 | if E then S1 else S2 | while E do S1

If-statement: (4)

Semantic rules for if E then S1:

E.true := newlabel;
E.false := S.next;
S1.next := S.next;
S.code := E.code || gen(E.true :) || S1.code

Semantic rules for if E then S1 else S2:

E.true := newlabel;
E.false := newlabel;
S1.next := S.next;
S2.next := S.next;
S.code := E.code || gen(E.true :) || S1.code || gen(goto S.next) || gen(E.false :) || S2.code

Example:
Statement: a and b and c

If a were false, we need not evaluate the rest of the expression. So we insert the labels E.true and E.false in the appropriate places.

if a goto E.true
goto E.false
E.true: if b goto E1.true
goto E.false
E1.true: if c goto E2.true
goto E.false
E2.true: exp = 1
E.false: exp = 0



Semantic rules for while E do S1: (4)

S.begin := newlabel;
E.true := newlabel;
E.false := S.next;
S1.next := S.begin;
S.code := gen(S.begin :) || E.code || gen(E.true :) || S1.code || gen(goto S.begin)

Example:

while a



The corresponding semantic rules are given by:

E -> E1 or M E2 { backpatch(E1.falselist, M.quad); E.truelist = merge(E2.truelist, E1.truelist); E.falselist = E2.falselist; }
E -> E1 and M E2 { backpatch(E1.truelist, M.quad); E.falselist = merge(E1.falselist, E2.falselist); E.truelist = E2.truelist; }
E -> not E1 { E.truelist = E1.falselist; E.falselist = E1.truelist; }
E -> (E1) { E.truelist = E1.truelist; E.falselist = E1.falselist; }
E -> id1 relop id2 { E.truelist = makelist(nextquad); E.falselist = makelist(nextquad + 1); emit(if id1.place relop.op id2.place goto ____); emit(goto ____); }
E -> false { E.falselist = makelist(nextquad); emit(goto ____) }
E -> true { E.truelist = makelist(nextquad); emit(goto ____) }
M -> ε { M.quad = nextquad }

Example:

Consider the string:a



S->begin L end { S.nextlist := L.nextlist }
S->A { S.nextlist := nil }
L->L1; M S { backpatch(L1.nextlist, M.quad); L.nextlist := S.nextlist }
L->S { L.nextlist := S.nextlist }

Here, the jumps out of statements are filled in when their targets are found. Not only do Boolean expressions need two lists of jumps, for when the expression is true and when it is false, but statements also need a list of jumps (given by the attribute nextlist) to the code that follows them in the execution sequence.
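The backpatching rules for Boolean expressions can be traced with a small Python sketch. The quadruple representation, the helper names `relop`, `or_` and `and_`, and the sample expression a < b or c < d and e < f are assumptions made for this illustration; quadruples are numbered from 0.

```python
quads = []  # each entry is [instruction text, jump target or None]


def nextquad():
    return len(quads)


def emit(text):
    quads.append([text, None])  # target is left unfilled for now


def makelist(i):
    return [i]


def merge(p1, p2):
    return p1 + p2


def backpatch(p, target):
    for i in p:
        quads[i][1] = target


def relop(lhs, op, rhs):
    """E -> id1 relop id2: returns (truelist, falselist)."""
    truelist = makelist(nextquad())
    falselist = makelist(nextquad() + 1)
    emit(f"if {lhs} {op} {rhs} goto")
    emit("goto")
    return truelist, falselist


def or_(e1, m_quad, e2):
    """E -> E1 or M E2"""
    backpatch(e1[1], m_quad)
    return merge(e1[0], e2[0]), e2[1]


def and_(e1, m_quad, e2):
    """E -> E1 and M E2"""
    backpatch(e1[0], m_quad)
    return e2[0], merge(e1[1], e2[1])


# a < b or (c < d and e < f); each M records nextquad before its right operand
e1 = relop("a", "<", "b")
m1 = nextquad()
e2 = relop("c", "<", "d")
m2 = nextquad()
e3 = relop("e", "<", "f")
truelist, falselist = or_(e1, m1, and_(e2, m2, e3))

for i, (text, target) in enumerate(quads):
    print(i, text, "____" if target is None else target)
print("truelist:", truelist, "falselist:", falselist)
```

The jumps still printed as ____ are exactly those on the final truelist and falselist; they are filled in later, once the targets for true and false are known.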

ii. Write short notes on procedure calls. (6)

A procedure is such an important and frequently used programming construct that it is imperative for a compiler to generate good code for procedure calls and returns. (2)

Consider the following grammar for a simple procedure call statement:

S-> call id (Elist)
Elist -> Elist, E
Elist -> E

Calling sequences: (2)

The translation for a call includes a calling sequence, a sequence of actions taken on entry to and exit from each procedure.

Example: (2)

Syntax directed translation:

S-> call id (Elist)
    { for each item p on queue do
          emit(param p);
      emit(call id.place) }
Elist -> Elist, E
    { append E.place to the end of the queue }
Elist -> E
    { initialize queue to contain only E.place }

E.g. Call p1(int a, int b)

param a
param b
call p1

14.a. i. Write in detail about the issues in the design of a code generator. (10)

Input to the code generator
The intermediate representation of the source program: linear representations such as postfix notation, three-address representations such as quadruples, virtual machine representations such as stack machine code, and graphical representations such as syntax trees and DAGs.

Target programs
The output, such as absolute machine language, relocatable machine language or assembly language.

Memory management
Mapping of names in the source program to addresses of data objects in run-time memory is done by the front end and the code generator.



Instruction selection
The nature of the instruction set of the target machine determines the difficulty of instruction selection.

Register allocation
Instructions involving registers are shorter and faster. The use of registers is divided into two sub problems:
o During register allocation, we select the set of variables that will reside in registers at a point in the program.
o During a subsequent register assignment phase, we pick the specific register that a variable will reside in.

Choice of evaluation order
The order in which computations are performed affects the efficiency of the target code.

Approaches to code generation

ii. What are the steps needed to compute the next use information? (6)

If the name in a register is no longer needed, then the register can be assigned to some other name. This idea of keeping a name in storage only if it will be used subsequently can be applied in a number of contexts.

Computing next uses: (2)

The use of a name in a three-address statement is defined as follows: suppose a three-address statement i assigns a value to x. If statement j has x as an operand and control can flow from statement i to j along a path that has no intervening assignments to x, then we say statement j uses the value of x computed at i.

Example:

x := i
j := x op y // j uses the value of x

Algorithm to determine next use: (2)

The algorithm to determine next uses makes a backward pass over each basic block, recording for each name x whether x has a next use in the block and, if not, whether it is live on exit from the block (using data flow analysis). Suppose we reach three-address statement i: x := y op z in our backward scan. Then do the following:

1. Attach to statement i the information currently found in the symbol table regarding the next use and the liveness of x, y and z.
2. In the symbol table, set x to not live and no next use.
3. In the symbol table, set y and z to live and the next uses of y and z to i.
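The backward pass can be sketched in Python as follows. The representation is an assumption for this illustration: a block is a list of (x, y, z) triples standing for x := y op z, the symbol table is a dictionary mapping each name to a (live, next use) pair, and liveness on exit is passed in as a set.

```python
def next_use(block, live_on_exit):
    """Return, per statement, the (live, next_use) info attached to its names."""
    table = {}
    for x, y, z in block:
        for name in (x, y, z):
            table[name] = (name in live_on_exit, None)

    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):  # backward scan
        x, y, z = block[i]
        # Step 1: attach the current symbol-table information to statement i.
        info[i] = {name: table[name] for name in (x, y, z)}
        # Step 2: x is redefined here, so it is not live just before i.
        table[x] = (False, None)
        # Step 3: y and z are used at i (done after step 2, so x := x op z works).
        table[y] = (True, i)
        table[z] = (True, i)
    return info


block = [
    ("t1", "a", "b"),    # 0: t1 := a * b
    ("t2", "t1", "c"),   # 1: t2 := t1 + c
    ("a", "t2", "t1"),   # 2: a  := t2 + t1
]
info = next_use(block, live_on_exit={"a"})
for i, entry in enumerate(info):
    print(i, entry)
```

Steps 2 and 3 must be performed in this order; otherwise a statement such as x := x op z would wrongly record x as not live.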

(OR)

b. i. Discuss briefly about DAG representation of basic blocks. (10)

A DAG for a basic block is a directed acyclic graph in which: (2)
Leaves are labeled by unique ids, either variable names or constants.
Interior nodes are labeled by operators.
Nodes are also given a sequence of ids for labels to store the computed values.

It is useful for implementing transformations on basic blocks and shows how values computed by a statement are used in subsequent statements.



e.g.
t1 := 4 * i
t2 := a[t1]

The DAG is: (2)

        [ ]  t2
       /   \
      a     *  t1
           / \
          4   i

Algorithm for the construction of DAG: (4)

Input: A basic block
Output: DAG for that basic block, having
a label for each node, where leaves are labeled by identifiers and interior nodes by operator symbols;
for each node, a list of identifiers to hold the computed values.

The statements have one of three forms: 1) x = y op z 2) x = op y 3) x = y

Step 1: If node(y) is undefined, create a leaf labeled y and let node(y) be this node. In 1), if node(z) is undefined, create a leaf labeled z and let that leaf be node(z).
Step 2: For 1), check for a node op with left child node(y) and right child node(z) (a common sub expression); if there is none, create such a node. For 2), check for a node op with the single child node(y); if there is none, create such a node. In both cases let n be the node found or created. For 3), let n be node(y).
Step 3: Delete x from the list of identifiers for node(x). Append x to the list of attached identifiers for the node n found in step 2 and set node(x) to n.
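The three steps can be sketched in Python. The class name `DagBuilder` and its fields are assumptions made for this illustration, and the sketch ignores the special handling needed for assignments to array elements and through pointers.

```python
class DagBuilder:
    def __init__(self):
        self.nodes = []     # each node: {"label", "children", "ids"}
        self.current = {}   # node(x): identifier -> index of its node
        self.interior = {}  # (op, children) -> index, to find common subexpressions

    def _leaf(self, name):
        # Step 1: create a leaf if node(name) is undefined.
        if name not in self.current:
            self.nodes.append({"label": name, "children": (), "ids": []})
            self.current[name] = len(self.nodes) - 1
        return self.current[name]

    def assign(self, x, op=None, y=None, z=None):
        """Handle x = y op z, x = op y, or x = y (when op is None)."""
        children = tuple(self._leaf(a) for a in (y, z) if a is not None)
        # Step 2: find or create the node n that computes the value.
        if op is None:
            n = children[0]
        else:
            key = (op, children)
            if key not in self.interior:
                self.nodes.append({"label": op, "children": children, "ids": []})
                self.interior[key] = len(self.nodes) - 1
            n = self.interior[key]
        # Step 3: move x from its old node to n.
        old = self.current.get(x)
        if old is not None and x in self.nodes[old]["ids"]:
            self.nodes[old]["ids"].remove(x)
        self.nodes[n]["ids"].append(x)
        self.current[x] = n
        return n


dag = DagBuilder()
n1 = dag.assign("t1", "*", "4", "i")    # t1 := 4 * i
n2 = dag.assign("t2", "[]", "a", "t1")  # t2 := a[t1]
n3 = dag.assign("t3", "*", "4", "i")    # t3 := 4 * i, a common subexpression
for index, node in enumerate(dag.nodes):
    print(index, node)
```

The third statement is added only to show sharing: it reuses the node already built for 4 * i, so t1 and t3 end up attached to the same node.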

Applications of DAG: (2)

Determining the common sub-expressions.
Determining which identifiers have their values used in the block.
Determining which statements compute values that could be used outside the block.
Simplifying the list of quadruples by eliminating the common sub-expressions and not performing assignments of the form x := y unless they are necessary.

ii. Explain the characteristics of peephole optimization (6)

Peephole optimization is a simple and effective technique for locally improving target code. It improves the performance of the target program by examining a short sequence of target instructions and replacing these instructions by a shorter or faster sequence whenever possible.

The peephole is a small, moving window on the target program. The technique is:
Local in nature
Pattern driven
Limited by the size of the window

Characteristics of peephole optimization:

Redundant instruction elimination
Flow of control optimization
Algebraic simplification
Use of machine idioms



Constant Folding
x := 32
x := x + 32
becomes x := 64

Unreachable Code
An unlabeled instruction immediately following an unconditional jump is removed.
goto L2
x := x + 1 (unneeded)

Flow of control optimizations
Unnecessary jumps are eliminated.
goto L1
L1: goto L2
becomes goto L2

Algebraic Simplification
x := x + 0 (unneeded)

Dead code elimination
x := 32 (where x is not used after the statement)
y := x + y
becomes y := y + 32

Reduction in strength
Replace expensive operations by equivalent cheaper ones.
x := x * 2
becomes x := x + x
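A few of these patterns can be sketched as a one-pass peephole optimizer in Python. This is only an illustration under assumptions: instructions are strings with space-separated tokens, a labeled instruction starts with a token ending in a colon, and only three of the transformations listed above are implemented.

```python
def peephole(code):
    """One pass over a list of instruction strings; returns the improved list."""
    out = []
    i = 0
    while i < len(code):
        ins = code[i]
        parts = ins.split()
        is_binary = len(parts) == 5 and parts[1] == ":="
        # Algebraic simplification: x := x + 0 does nothing, so drop it.
        if is_binary and parts[0] == parts[2] and parts[3:] == ["+", "0"]:
            i += 1
            continue
        # Reduction in strength: x := y * 2 becomes x := y + y.
        if is_binary and parts[3:] == ["*", "2"]:
            ins = f"{parts[0]} := {parts[2]} + {parts[2]}"
        out.append(ins)
        i += 1
        # Unreachable code: skip unlabeled instructions after an unconditional jump.
        if parts[0] == "goto":
            while i < len(code) and ":" not in code[i].split()[0]:
                i += 1
    return out


program = [
    "x := x + 0",
    "y := y * 2",
    "goto L2",
    "x := x + 1",
    "L2: z := y",
]
optimized = peephole(program)
print("\n".join(optimized))
```

A real peephole optimizer typically repeats such passes until no pattern applies, because one replacement can expose another.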

15.a. i. Describe the principal sources of optimization. (8)

Code optimization is needed to make the code run faster or take less space or both.

Function preserving transformations:
Common sub expression elimination
Copy propagation
Dead-code elimination
Constant folding

Common sub expression elimination: (2)
E is called a common sub expression if E was previously computed and the values of the variables in E have not changed since the previous computation.

Copy propagation: (2)
Assignments of the form f := g are called copy statements, or copies for short. The idea is to use g for f wherever possible after the copy statement.

Dead code elimination: (2)
A variable is live at a point in the program if its value can be used subsequently; otherwise it is dead at that point.
Constant folding: deducing at compile time that the value of an expression is a constant and using the constant instead.

Loop optimization: (2)
Code motion: moving code outside the loop. It takes an expression that yields the same result independent of the number of times a loop is executed (a loop-invariant computation) and places the expression before the loop.
Induction variable elimination
Reduction in strength: replacing an expensive operation by a cheaper one.



ii. Write about Data flow analysis of structured programs. (8)

Flow graphs for control-flow constructs such as do-while statements have a useful property: there is a single beginning point at which control enters and a single end point that control leaves from when execution of the statement is over.

Some structured control constructs:

Define a portion of a flow graph called a region to be a set of nodes N that includes a header, which dominates all other nodes in the region. All edges between nodes in N are in the region, except for some that enter the header. The portion of a flow graph corresponding to a statement S is a region that obeys the further restriction that control can flow to just one outside block when it leaves the region.

gen[S] is the set of definitions generated by S.
kill[S] is the set of definitions that never reach the end of S, even if they reach the beginning.
Both are synthesized attributes; they are computed bottom-up, from the smallest statements to the largest.

Data-flow equations for reaching definitions:



(OR)

b. i. What are the different storage allocation strategies? Explain. (10)

Strategies: (2)

Static allocation lays out storage for all data objects at compile time.
Stack allocation manages the run-time storage as a stack.
Heap allocation allocates and deallocates storage as needed at run time from the heap area.

Static allocation: (2)

Names are bound to storage at compile time.
No run-time support package is needed.
Every time a procedure is activated, its names are bound to the same storage locations.
The compiler must decide where activation records should go.

Limitations:
The size of a data object must be known at compile time.
Recursive procedures are restricted.
Data structures can't be created dynamically.

Stack allocation: (3)

Activation records are pushed and popped as activations begin and end.
Locals are bound to fresh storage in each activation and deleted when the activation ends.
Call sequence and return sequence, divided between caller and callee.
Dangling references.

Heap allocation: (3)

Stack allocation cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.

Allocate pieces of memory for activation records, which can be deallocated in any order.
Maintain a linked list of free blocks.
Fill a request for size s with a block of size s', where s' is the smallest size greater than or equal to s.
Use a heap manager, which takes care of defragmentation and garbage collection.

ii. Write short notes on parameter passing. (6)

Call by value
A formal parameter is treated just like a local name. Its storage is in the activation record of the called procedure.
The caller evaluates the actual parameter and places its r-value in the storage for the formal.



Call by reference
If an actual parameter is a name or an expression having an l-value, then that l-value itself is passed.
However, if it is an expression that has no l-value (e.g. a+b or 2), then the expression is evaluated in a new location and the address of that location is passed.

Copy-Restore: hybrid between call-by-value and call-by-reference (copy in, copy out)
The actual parameters are evaluated, their r-values are passed, and the l-values of the actuals are determined.
When the called procedure is done, the r-values of the formals are copied back to the l-values of the actuals.

Call by name
Inline expansion (procedures are treated like macros).


Date post:	03-Apr-2018
Category:	Documents
Upload:	sridharanc23