CS 153: Concepts of Compiler Design
October 17 Class Meeting
Department of Computer Science
San Jose State University
Fall 2011
Instructor: Ron Mak
www.cs.sjsu.edu/~mak
SJSU Dept. of Computer Science, Fall 2011: October 17
CS 153: Concepts of Compiler Design, © R. Mak
Shomit Ghose
History of Computing Speaker
    Venture capitalist
    Partner, ONSET Ventures
    “Micro-History: An Examination of the Brief but Successful Life of a Silicon Valley Start-up”
    Wednesday, Oct. 19, 6:00-7:00 PM
    Auditorium ENGR 189
    Reception before the talk in ENGR 294 at 5:00 PM
Midterm Solution: Question 1
1. List and describe five software engineering techniques we employed to make the code manageable and understandable.

Initial framework classes
    Validate the architecture early.
Partitioning
    Language-dependent front end
    Language-independent middle tier and back end
    The back end can be either an interpreter or a compiler.
Early initial end-to-end thread
    Always build on working code.
Design patterns
    Strategy, factory, etc.
    “Code to the interfaces.”
    “Closed for modification, open for extension.”
Team development tools
    Subversion source control
Midterm Solution: Question 2
2. What is the purpose of the symbol table stack and how does it achieve its purpose?

Purpose: Implement static scoping.
    Push a symbol table onto the stack whenever the parser enters a scope.
    Pop the symbol table off the stack when the parser leaves a scope.
    Search only the local (topmost) symbol table to determine if an identifier has been declared in the local scope.
    Search the entire stack from top to bottom to determine if an identifier has been declared in the local or an outer scope.
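
The push, pop, and two kinds of search described above can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: the class name SymTabStackSketch and its methods are hypothetical (they are not the course's classes), and each symbol table is just a map from an identifier name to a description string.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a symbol table stack for static scoping.
// Each symbol table maps an identifier name to a description string.
class SymTabStackSketch {
    private final Deque<Map<String, String>> stack = new ArrayDeque<>();

    // The parser enters a scope: push a new, empty symbol table.
    void enterScope() { stack.push(new HashMap<>()); }

    // The parser leaves a scope: pop the topmost symbol table.
    void exitScope() { stack.pop(); }

    // Declare an identifier in the local (topmost) scope.
    // Assumes at least one scope has been entered.
    void declare(String name, String info) { stack.peek().put(name, info); }

    // Search only the local (topmost) symbol table.
    String lookupLocal(String name) { return stack.peek().get(name); }

    // Search the whole stack from top to bottom:
    // the local scope first, then each enclosing scope.
    String lookup(String name) {
        for (Map<String, String> table : stack) {
            String info = table.get(name);
            if (info != null) return info;
        }
        return null; // not declared in any visible scope
    }
}
```

Because ArrayDeque iterates from the most recently pushed element, lookup() finds an inner declaration before an outer one with the same name, which is the behavior static scoping requires.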
Midterm Solution: Question 3
3. What is the purpose of the runtime stack and how does it achieve its purpose?

Purpose: To store runtime values according to the call chain.
    Push an activation record onto the stack whenever the main program or a procedure or function is called.
    Pop the activation record off the stack upon return.
    The topmost activation record at level n contains the current values of the local variables and formal parameters of the currently active procedure or function at level n.
    Use a runtime display to optimize accessing the appropriate activation record on the stack.
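
A minimal Java sketch of these ideas follows. The names ActivationRecordSketch and RuntimeStackSketch are hypothetical, and the sketch deliberately omits the display: it finds the record for a nesting level by searching down from the top, which is the search a display is meant to avoid.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical activation record: one per active routine call.
class ActivationRecordSketch {
    final String routineName;
    final int nestingLevel;
    // Current values of the local variables and formal parameters.
    final Map<String, Object> memoryMap = new HashMap<>();

    ActivationRecordSketch(String routineName, int nestingLevel) {
        this.routineName = routineName;
        this.nestingLevel = nestingLevel;
    }
}

// Hypothetical runtime stack of activation records.
class RuntimeStackSketch {
    private final List<ActivationRecordSketch> records = new ArrayList<>();

    // Called when the main program, a procedure, or a function is called.
    void push(ActivationRecordSketch ar) { records.add(ar); }

    // Called upon return.
    ActivationRecordSketch pop() { return records.remove(records.size() - 1); }

    // Find the topmost activation record at the given nesting level
    // by searching from the top of the stack downward.
    ActivationRecordSketch topmostAt(int level) {
        for (int i = records.size() - 1; i >= 0; i--) {
            if (records.get(i).nestingLevel == level) return records.get(i);
        }
        return null;
    }
}
```

With a recursive routine, several records at the same level can be on the stack at once; only the topmost one holds the values of the currently active call.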
Midterm Solution: Question 4
4. Implement the ternary conditional operator in Pascal using the keywords IF, THEN, and ELSE.

a. Modify the syntax diagrams.

[Syntax diagram: a factor is a variable, a number, a string, NOT followed by a factor, an expression in parentheses, or a conditional.]
[Syntax diagram: a conditional is IF <expression-1> THEN <expression-2> ELSE <expression-3>.]

The result at run time of evaluating the conditional operator is a single value, the result of evaluating either <expression-2> or <expression-3>.
Therefore, a conditional expression must be a factor.
Midterm Solution: Question 4
b. What type checking operations are necessary while parsing a conditional operator?

    <expression-1> must be boolean.
    <expression-2> and <expression-3> must be type compatible with the surrounding operators (preferably they should be the same type) or be assignment compatible with the target variable.
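
As a rough illustration, the checks could look like the Java sketch below. Everything in it is hypothetical (the Type enum, the class, and the method are not from the course code), and it makes a simplifying assumption: it only accepts the preferred case where <expression-2> and <expression-3> have the same type, and leaves compatibility with the surrounding operators or the assignment target to the caller.

```java
// Hypothetical sketch of type checking for a conditional expression.
class ConditionalTypeCheckSketch {
    enum Type { BOOLEAN, INTEGER, REAL }

    // Returns the type of "IF e1 THEN e2 ELSE e3", or throws on a type error.
    // Simplifying assumption: e2 and e3 must have exactly the same type.
    static Type check(Type e1, Type e2, Type e3) {
        if (e1 != Type.BOOLEAN) {
            throw new IllegalArgumentException("<expression-1> must be boolean");
        }
        if (e2 != e3) {
            throw new IllegalArgumentException(
                "<expression-2> and <expression-3> must have the same type");
        }
        return e2; // the conditional yields a single value of this type
    }
}
```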
Midterm Solution: Question 4
c. Draw a parse tree for the statement

    k := i - j*IF m-n = 0 THEN m*n ELSE m+n

Note that the conditional does not change any precedence rules.

:=
    k
    -
        i
        *
            j
            IF
                =
                    -
                        m
                        n
                    0
                *
                    m
                    n
                +
                    m
                    n
Midterm Solution: Question 5
5. Describe the purpose of each of the following hash tables (or tree maps) and describe its keys (or give an example of a key).

a. symbol table
    Store the symbol table entries for the identifiers declared within a given scope.
    Keys: Names of the identifiers

b. symbol table entry
    Store the attributes of an identifier.
    Keys: Attribute enum constants such as ROUTINE_CODE

c. type specification object
    Store attributes about a data type.
    Keys: Attribute enum constants such as ARRAY_INDEX_TYPE
Midterm Solution: Question 5
d. parse tree node
    Store the attributes of a parse tree node.
    Keys: Attribute enum constants LINE, ID, and VALUE

e. memory map
    Store the runtime values of the local variables and formal parameters of a program, procedure, or function.
    Keys: The names of the variables and parameters
Midterm Solution: Question 6
6. How to implement the ENDALL reserved word?

Front end
    Modify the scanner to recognize ENDALL as a reserved word.
    Modify method CompoundStatementParser.parse() to include ENDALL as a statement list terminator.
    Modify method StatementParser.parseList()
        Stop looping if the global flag endAllFlag is true.
        Set endAllFlag to true after consuming the ENDALL keyword.
    Modify method StatementParser.parse()
        Set endAllFlag to false after consuming the BEGIN keyword.
Middle tier
    No changes
Back end
    No changes
Midterm Solution: Question 7
7. Classic Pascal included the WITH statement.

a. What must the Pascal parser do in order to parse a WITH statement?

After parsing the record variable following the WITH keyword, the parser must:
    Determine the record type of the variable.
    Push the record type’s symbol table onto the symbol table stack.
    When parsing the nested statements of the WITH statement, look up identifiers first in the record type’s symbol table to determine whether or not they are record fields.
    At the end of the WITH statement, pop off the record type’s symbol table.
Midterm Solution: Question 7
b. What advantages would a WITH statement have at run time?

None at all, if the WITH statement is considered to be shorthand for the programmer (“syntactic sugar”). However, if the parse tree contains a WITH node, then the record variable only needs to be evaluated once. This would be a performance optimization, especially if the record variable is complicated, such as having subscripts, fields, and pointer dereferencing.

c. How would you implement a WITH statement in the interpreter’s back end?

In the syntactic sugar case, do nothing.
In the WITH node case, the interpreter must allocate an extra slot in the activation record to store the value of the record variable.
Minimum Acceptable Compiler Project
    At least two data types with type checking.
    Basic arithmetic operations with operator precedence.
    Assignment statements.
    At least one conditional control statement (e.g., IF).
    At least one looping control statement.
    Procedures or functions with calls and returns.
    Parameters passed by value or by reference.
    Basic error recovery (skip to semicolon or end of line).
    Sample source programs written in the source language.
    Generate Jasmin code that can be assembled.
    Execute the resulting .class file standalone (preferred) or with a test harness.
    No crashes (e.g., null pointer exceptions).

70 points/100
Ideas for Programming Languages
A language that works with a database such as MySQL
    Combines Pascal and SQL for writing database applications
    Not PL/SQL: use the language to write client programs
    Compiled code makes JDBC calls hidden from the programmer
A language that can access web pages
    Statements that “scrape” pages to extract information
A language for generating business reports
    A Pascal-like language that combines report writer features
A string-processing language
    Combines Pascal and Perl for writing applications that involve pattern matching and string transformations
Can We Build a Better Scanner?
Our scanner in the front end is relatively easy to understand and follow.
    Separate scanner classes for each token type.
However, it’s big and slow.
    Separate scanner classes for each token type.
    Create lots of objects and make lots of method calls.
We can write a more compact and faster scanner.
    However, it may be harder to understand and follow.
17
Deterministic Finite Automata (DFA)
Pascal identifier
    Regular expression: <letter> ( <letter> | <digit> )*
    Implement the regular expression with a finite automaton (AKA finite state machine):

[Diagram: state 1 (start state) goes to state 2 on a letter; state 2 loops back to itself on a letter or a digit; state 2 goes to state 3 (accepting state) on [other].]

This automaton is a deterministic finite automaton (DFA).
    At each state, the next input character uniquely determines which transition to take to the next state.
State-Transition Matrix
Represent the behavior of a DFA by a state-transition matrix:
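[Figure: the identifier DFA (states 1, 2, and 3, with transitions on letter, digit, and [other]) shown with its state-transition matrix.]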
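
As an illustration, the identifier DFA can be written as a small matrix in Java. This is a sketch, not the slide's matrix: it assumes state 1 is the start state, state 2 loops on letters and digits, and state 3 is the accepting state reached on any other character. The class name, the constant names, and the scanIdentifier() method are hypothetical.

```java
// Hypothetical table-driven recognizer for <letter> ( <letter> | <digit> )*
class IdentifierDfaSketch {
    // Input character classes (matrix column indexes).
    static final int LETTER = 0, DIGIT = 1, OTHER = 2;
    static final int ERR = -1; // no transition

    // Rows are states 1, 2, 3, stored at index (state - 1).
    static final int[][] MATRIX = {
        /*            letter digit other */
        /* state 1 */ { 2,   ERR,  ERR },
        /* state 2 */ { 2,   2,    3   },
        /* state 3 */ { ERR, ERR,  ERR }, // accepting state
    };

    static int typeOf(char ch) {
        return Character.isLetter(ch) ? LETTER
             : Character.isDigit(ch)  ? DIGIT
             :                          OTHER;
    }

    // Returns the identifier at the start of the input, or null if there is none.
    static String scanIdentifier(String input) {
        int state = 1; // start state
        int pos = 0;
        while (state != 3 && state != ERR) {
            // Treat the end of the input as an [other] character.
            char ch = (pos < input.length()) ? input.charAt(pos) : ' ';
            state = MATRIX[state - 1][typeOf(ch)];
            if (state == 2) pos++; // consume a character that belongs to the identifier
        }
        return (state == 3) ? input.substring(0, pos) : null;
    }
}
```

The character that causes the transition into the accepting state is not consumed, so it remains available as the start of the next token.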
DFA for a Pascal Number
[Diagram: DFA for a Pascal number, with states 0 and 3 through 12 and transitions on digit, +, -, ., E, and [other].]
DFA for a Pascal Identifier or Number
[Diagram: combined DFA for a Pascal identifier or number, with states 0 through 12 and transitions on letter, digit, +, -, ., E, and [other].]

private static final int matrix[][] = {
    /*        letter digit    +     -     .     E  other */
    /*  0 */ {   1,    4,    3,    3,  ERR,    1,  ERR },
    /*  1 */ {   1,    1,   -2,   -2,   -2,    1,   -2 },
    /*  2 */ { ERR,  ERR,  ERR,  ERR,  ERR,  ERR,  ERR },
    /*  3 */ { ERR,    4,  ERR,  ERR,  ERR,  ERR,  ERR },
    /*  4 */ {  -5,    4,   -5,   -5,    6,    9,   -5 },
    /*  5 */ { ERR,  ERR,  ERR,  ERR,  ERR,  ERR,  ERR },
    /*  6 */ { ERR,    7,  ERR,  ERR,  ERR,  ERR,  ERR },
    /*  7 */ {  -8,    7,   -8,   -8,   -8,    9,   -8 },
    /*  8 */ { ERR,  ERR,  ERR,  ERR,  ERR,  ERR,  ERR },
    /*  9 */ { ERR,   11,   10,   10,  ERR,  ERR,  ERR },
    /* 10 */ { ERR,   11,  ERR,  ERR,  ERR,  ERR,  ERR },
    /* 11 */ { -12,   11,  -12,  -12,  -12,  -12,  -12 },
    /* 12 */ { ERR,  ERR,  ERR,  ERR,  ERR,  ERR,  ERR },
};

Negative numbers in the matrix are the accepting states.
Notice how the letter ‘E’ is handled!
A Simple DFA Scanner

public class SimpleDFAScanner
{
    // Input characters.
    private static final int LETTER = 0;
    private static final int DIGIT  = 1;
    private static final int PLUS   = 2;
    private static final int MINUS  = 3;
    private static final int DOT    = 4;
    private static final int E      = 5;
    private static final int OTHER  = 6;

    private static final int ERR = -99999;  // error state

    private static final int matrix[][] = { ... };

    private char ch;    // current input character
    private int state;  // current state

    ...
}
A Simple DFA Scanner, cont’d
int typeOf(char ch)
{
    return (ch == 'E')            ? E
         : Character.isLetter(ch) ? LETTER
         : Character.isDigit(ch)  ? DIGIT
         : (ch == '+')            ? PLUS
         : (ch == '-')            ? MINUS
         : (ch == '.')            ? DOT
         :                          OTHER;
}
A Simple DFA Scanner, cont’d

private String nextToken()
    throws IOException
{
    while (Character.isWhitespace(ch)) nextChar();
    if (ch == 0) return null;  // EOF?

    state = 0;  // start state
    StringBuilder buffer = new StringBuilder();

    while (state >= 0) {  // not accepting state
        state = matrix[state][typeOf(ch)];  // transit

        if ((state >= 0) || (state == ERR)) {
            buffer.append(ch);  // build token string
            nextChar();
        }
    }

    return buffer.toString();
}

This is the heart of the scanner.
Table-driven scanners can be very fast!
A Simple DFA Scanner, cont’d

private void scan()
    throws IOException
{
    nextChar();

    while (ch != 0) {  // EOF?
        String token = nextToken();

        if (token != null) {
            System.out.print("=====> \"" + token + "\" ");
            String tokenType =
                  (state ==  -2) ? "IDENTIFIER"
                : (state ==  -5) ? "INTEGER"
                : (state ==  -8) ? "REAL (fraction only)"
                : (state == -12) ? "REAL"
                :                  "*** ERROR ***";
            System.out.println(tokenType);
        }
    }
}

How do we know which token we just got?

Demo
Backus Naur Form (BNF)
A text-based way to describe source language syntax.
    Named after John Backus and Peter Naur.
    Text-based means it can be read by a program ...
    ... such as a compiler-compiler that can automatically generate a parser for a source language after reading (and parsing) the language’s syntax rules written in BNF.
Uses certain meta-symbols.
    Symbols that are part of BNF itself but are not necessarily part of the syntax of the source language.

    ::=    “is defined as”
    |      “or”
    < >    Surround names of nonterminal (not literal) items
BNF Example: U.S. Postal Address

<postal-address>   ::= <name-part> <street-part> <city-state-part>
<name-part>        ::= <first-part> <last-name>
                     | <first-part> <last-name> <suffix>
<first-part>       ::= <first-name> | <capital-letter> .
<street-part>      ::= <house-number> <street-name>
                     | <house-number> <street-name> <apartment-number>
<city-state-part>  ::= <city-name> , <state-code> <ZIP-code>
<suffix>           ::= Sr. | Jr. | <roman-numeral>
<first-name>       ::= <name>
<last-name>        ::= <name>
<street-name>      ::= <name>
<city-name>        ::= <name>
<house-number>     ::= <number>
<apartment-number> ::= <number>
<state-code>       ::= <capital-letter> <capital-letter>
<capital-letter>   ::= A|B|C|D|E|F|G|H|I|J|K|L|M
                     |N|O|P|Q|R|S|T|U|V|W|X|Y|Z
<name>             ::= …
<number>           ::= …
etc.
BNF: Optional and Repeated Items
To show optional items in BNF, use the vertical bar |.
    “An expression is a simple expression optionally followed by a relational operator and another simple expression.”

    <expression> ::= <simple expression>
                   | <simple expression> <rel op> <simple expression>

BNF uses recursion for repeated items.
    “A digit sequence is a digit followed by zero or more digits.”

    Right recursive:
    <digit sequence> ::= <digit>
                       | <digit> <digit sequence>

    Left recursive:
    <digit sequence> ::= <digit>
                       | <digit sequence> <digit>
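
To see how recursion in a rule expresses repetition, the right-recursive form of <digit sequence> can be turned almost directly into a recursive Java method. This is only a sketch; the class and method names are hypothetical, and the input is a plain string rather than a token stream.

```java
// Hypothetical recognizer written from the right-recursive rule:
//   <digit sequence> ::= <digit> | <digit> <digit sequence>
class DigitSequenceSketch {
    // Returns the index just past the digit sequence that starts at pos,
    // or -1 if no digit sequence starts there.
    static int parseDigitSequence(String s, int pos) {
        // Both alternatives begin with one <digit>.
        if (pos >= s.length() || !Character.isDigit(s.charAt(pos))) return -1;

        // Try the second alternative: <digit> <digit sequence>.
        int rest = parseDigitSequence(s, pos + 1);

        // If no further digit sequence follows, fall back to the single <digit>.
        return (rest == -1) ? pos + 1 : rest;
    }
}
```

A method written the same way from the left-recursive form would call itself before consuming any input and so would never terminate, which is one reason the right-recursive form is the convenient one for hand-written recursive parsing methods.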
BNF Example: Pascal Number
<digit sequence>   ::= <digit> | <digit> <digit sequence>
<unsigned integer> ::= <digit sequence>
<unsigned real>    ::= <unsigned integer>.<digit sequence>
                     | <unsigned integer>.<digit sequence> <e> <scale factor>
                     | <unsigned integer> <e> <scale factor>
<unsigned number>  ::= <unsigned integer> | <unsigned real>
<scale factor>     ::= <unsigned integer> | <sign> <unsigned integer>
<e>                ::= E | e
<sign>             ::= + | -

Repetition via recursion: <digit sequence>.
The sign is optional: <scale factor>.
BNF Example: Pascal IF Statement
<if statement> ::= IF <expression> THEN <statement>
                 | IF <expression> THEN <statement> ELSE <statement>

It should be straightforward to write a parsing method from either the syntax diagram or the BNF.
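
One possible shape for such a parsing method is sketched below in Java. It is not the course's parser: the class IfParserSketch is hypothetical, tokens are whitespace-separated strings, and an <expression> or a non-IF <statement> is assumed to be a single stand-in token so that the example stays self-contained. The result is a parenthesized string instead of a real parse tree.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical recursive-descent sketch for the IF statement rule.
class IfParserSketch {
    private final List<String> tokens;
    private int pos = 0;

    IfParserSketch(String source) {
        tokens = Arrays.asList(source.trim().split("\\s+"));
    }

    // Current token, or null at the end of the input.
    private String peek() { return (pos < tokens.size()) ? tokens.get(pos) : null; }

    // Consume the expected keyword or report a syntax error.
    private void expect(String keyword) {
        if (!keyword.equals(peek())) {
            throw new IllegalStateException("Expected " + keyword + " but found " + peek());
        }
        pos++;
    }

    // Stand-in for a real expression parser: one token.
    private String parseExpression() {
        if (peek() == null) throw new IllegalStateException("Unexpected end of input");
        return tokens.get(pos++);
    }

    // <statement>: an IF statement, or a single stand-in token.
    String parseStatement() {
        if ("IF".equals(peek())) return parseIf();
        if (peek() == null) throw new IllegalStateException("Unexpected end of input");
        return tokens.get(pos++);
    }

    // <if statement> ::= IF <expression> THEN <statement>
    //                  | IF <expression> THEN <statement> ELSE <statement>
    String parseIf() {
        expect("IF");
        String condition = parseExpression();
        expect("THEN");
        String thenPart = parseStatement();

        if ("ELSE".equals(peek())) { // the ELSE part is optional
            pos++;
            String elsePart = parseStatement();
            return "(IF " + condition + " " + thenPart + " " + elsePart + ")";
        }
        return "(IF " + condition + " " + thenPart + ")";
    }
}
```

Because the method checks for ELSE immediately after parsing the THEN statement, an ELSE in a nested IF attaches to the nearest IF, which is the usual Pascal interpretation.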