CSC 415: Translators and Compilers, Spring 2009
Chapter 4
Syntactic Analysis
Chart 2
Syntactic Analysis
– Sub-phases of Syntactic Analysis
– Grammars Revisited
– Parsing
– Abstract Syntax Trees
– Scanning
– Case Study: Syntactic Analysis in the Triangle Compiler
Chart 3
Structure of a Compiler
[Figure: compiler pipeline. Source code → Lexical Analyzer → (tokens) → Parser & Semantic Analyzer → (parse tree) → Intermediate Code Generation → (intermediate representation) → Optimization → (intermediate representation) → Assembly Code Generation → Assembly code. A Symbol Table box sits alongside the phases.]
Chart 4
Syntactic Analysis
Main function
– Parse source program to discover its phrase structure
– Recursive-descent parsing
– Constructing an AST
– Scanning to group characters into tokens
Chart 5
Sub-phases of Syntactic Analysis
Scanning (or lexical analysis)
– Source program transformed to a stream of tokens
    Identifiers, literals, operators, keywords, punctuation
– Comments and blank spaces discarded
Parsing
– To determine the source program’s phrase structure
– Source program is input as a stream of tokens (from the Scanner)
– Treats each token as a terminal symbol
Representation of phrase structure
– AST
Chart 6
Lexical Analysis – A Simple Example
Scan the file character by character and group characters into words and punctuation (tokens); remove white space and comments
Source program:
    let var y: Integer
    in !new year
       y := y+1
Tokens for this example:
    let  var  y  :  Integer  in  y  :=  y  +  1
Note: !new year does not appear in the list of tokens. Comments are removed along with white space.
Chart 7
Creating Tokens – Mini-Triangle Example
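Source program:
    let var y: Integer
    in !new year
       y := y+1
[Figure: Input Converter → Buffer → Scanner. The buffer holds the character string l e t ␣ v a r ␣ y : ␣ I n t e g e r ␣ i n . . . (␣ = space).]
Tokens produced by the Scanner (spelling, kind):
    let        let
    var        var
    y          Ident.
    :          colon
    Integer    Ident.
    in         in
    y          Ident.
    :=         becomes
    y          Ident.
    +          op.
    1          Intlit.
    (end)      eot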
Chart 8
Tokens in Triangle
// literals, identifiers, operators...
INTLITERAL = 0, "<int>", CHARLITERAL = 1, "<char>", IDENTIFIER = 2, "<identifier>", OPERATOR = 3, "<operator>",
// reserved words - must be in alphabetical order...
ARRAY = 4, "array", BEGIN = 5, "begin", CONST = 6, "const", DO = 7, "do", ELSE = 8, "else", END = 9, "end", FUNC = 10, "func", IF = 11, "if", IN = 12, "in", LET = 13, "let", OF = 14, "of", PROC = 15, "proc", RECORD = 16, "record", THEN = 17, "then", TYPE = 18, "type", VAR = 19, "var", WHILE = 20, "while",
// punctuation...
DOT = 21, ".", COLON = 22, ":", SEMICOLON = 23, ";", COMMA = 24, ",", BECOMES = 25, ":=", IS = 26, "~",
// brackets...
LPAREN = 27, "(", RPAREN = 28, ")", LBRACKET = 29, "[", RBRACKET = 30, "]", LCURLY = 31, "{", RCURLY = 32, "}",
// special tokens...
EOT = 33, "", ERROR = 34, "<error>"
Chart 9
Grammars Revisited
Context-free grammars
– Generate a set of sentences
– Each sentence is a string of terminal symbols
– An unambiguous sentence has a unique phrase structure embodied in its syntax tree
Develop parsers from context-free grammars
Chart 10
Regular Expressions
A regular expression (RE) is a convenient notation for expressing a set of strings of terminal symbols
Main features
– ‘|’ separates alternatives
– ‘*’ indicates that the previous item may be repeated zero or more times
– ‘(’ and ‘)’ are grouping parentheses
ε denotes the empty string, a special string of length 0
Chart 11
Regular Expression Basics
Algebraic Properties
– | is commutative and associative
    r|s = s|r
    r|(s|t) = (r|s)|t
– Concatenation is associative
    (rs)t = r(st)
– Concatenation distributes over |
    r(s|t) = rs|rt
    (s|t)r = sr|tr
– ε is the identity for concatenation
    εr = r
    rε = r
– * is idempotent
    r** = r*
    r* = (r|ε)*
Chart 12
Regular Expression Basics
Common Extensions
– r+ : one or more of expression r, same as rr*
– r^k : k repetitions of r, e.g. r^3 = rrr
– ~r : the characters not in the expression r, e.g. ~[\t\n]
– r-z : range of characters, e.g. [0-9a-z]
– r? : zero or one copy of expression r (used for fields of an expression that are optional)
Chart 13
Regular Expression Example
Regular Expression for Representing Months
– Examples of legal inputs
    January represented as 1 or 01
    October represented as 10
– First Try: [0|1|ε][0-9]
    0, 1, or ε followed by a digit between 0 and 9
    Matches all legal inputs? Yes
        1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
    Matches any illegal inputs? Yes
        0, 00, 18
Chart 14
Regular Expression Example
Regular Expression for Representing Months
– Examples of legal inputs
    January represented as 1 or 01
    October represented as 10
– Second Try: [1-9]|(0[1-9])|(1[0-2])
    Any digit between 1 and 9, or 0 followed by any digit between 1 and 9, or 1 followed by any digit between 0 and 2
    Matches all legal inputs? Yes
        1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
    Matches any illegal inputs? No
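The two attempts can be checked mechanically. The sketch below is an illustration only: it transcribes both month expressions into java.util.regex syntax (the first try is written as [01]?[0-9], which is an assumed equivalent of "0, 1, or ε followed by a digit"), and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class MonthRegexDemo {
    // First try: 0, 1, or nothing, followed by one digit.
    static final Pattern FIRST_TRY = Pattern.compile("[01]?[0-9]");
    // Second try, copied from the slide.
    static final Pattern SECOND_TRY = Pattern.compile("[1-9]|(0[1-9])|(1[0-2])");

    // matches() requires the whole string to match, not just a prefix.
    static boolean firstTry(String s) { return FIRST_TRY.matcher(s).matches(); }
    static boolean secondTry(String s) { return SECOND_TRY.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"1", "01", "10", "12", "0", "00", "18"}) {
            System.out.println(s + ": first=" + firstTry(s) + " second=" + secondTry(s));
        }
    }
}
```

Running it shows the first try accepting the illegal inputs 0, 00 and 18, while the second try rejects them and still accepts every legal month.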
Chart 15
Regular Expression Example
Regular Expression for Floating Point Numbers
– Examples of legal inputs
    1.0, 0.2, 3.14159, -1.0, 2.7e8, 1.0E-6, -2.5e+5
    Assume that a 0 is required before numbers less than 1 and that extra leading zeros are not prevented, so numbers such as 0011 or 0003.14159 are legal
– Building the regular expression
    Assume digit → 0|1|2|3|4|5|6|7|8|9
    Handle simple decimals such as 1.0, 0.2, 3.14159
        digit+.digit+    (1 or more digits, followed by ., followed by 1 or more digits)
    Add an optional sign (only minus, no plus)
        (-|ε)digit+.digit+   or   -?digit+.digit+
Chart 16
Regular Expression Example
Regular Expression for Floating Point Numbers (cont.)
– Building the regular expression (cont.)
    Format for the exponent
        (E|e)(+|-)?(digit+)
    Adding it as an optional expression to the decimal part
        (-|ε)digit+.digit+((E|e)(+|-)?(digit+))?
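As a quick check of the final expression, here is a sketch that transcribes it into java.util.regex syntax. The transcription is an assumption of this example: digit becomes [0-9], the decimal point is escaped so it matches only a literal ".", and the class name is hypothetical.

```java
import java.util.regex.Pattern;

class FloatRegexDemo {
    // (-|ε) digit+ . digit+ ((E|e)(+|-)?(digit+))?
    static final Pattern FLOAT =
        Pattern.compile("-?[0-9]+\\.[0-9]+([Ee][+-]?[0-9]+)?");

    static boolean isFloat(String s) { return FLOAT.matcher(s).matches(); }

    public static void main(String[] args) {
        String[] samples = {"1.0", "0.2", "3.14159", "-1.0", "2.7e8",
                            "1.0E-6", "-2.5e+5", ".5", "+1.0", "1."};
        for (String s : samples) {
            System.out.println(s + " -> " + isFloat(s));
        }
    }
}
```

All seven legal examples from the slide match; .5 (no leading digit), +1.0 (plus sign) and 1. (no fraction digits) do not.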
Chart 17
Extended BNF
Extended BNF (EBNF)
– Combination of BNF and RE
– N ::= X, where N is a nonterminal symbol and X is an extended RE, i.e., an RE constructed from both terminal and nonterminal symbols
– EBNF
    Right-hand side may use |, *, (, )
    Right-hand side may contain both terminal and nonterminal symbols
Chart 18
Example EBNF
Expression ::= primary-Expression (Operator primary-Expression)*
primary-Expression ::= Identifier | ( Expression )
Identifier ::= a|b|c|d|e
Operator ::= +|-|*|/
Generates
    e
    a + b
    a – b – c
    a + (b * c)
    a + (b + c) / d
    a – (b – (c – (d – e)))
Chart 19
Grammar Transformations
Left Factorization
XY | XZ is equivalent to X(Y | Z)
Before:
single-Command ::= V-name := Expression
    | if Expression then single-Command
    | if Expression then single-Command else single-Command
After:
single-Command ::= V-name := Expression
    | if Expression then single-Command (ε | else single-Command)
Chart 20
Grammar Transformations
Elimination of left recursion
N ::= X | NY is equivalent to N ::= X(Y)*
Identifier ::= Letter
    | Identifier Letter
    | Identifier Digit
is equivalent to
Identifier ::= Letter
    | Identifier (Letter | Digit)
which is equivalent to
Identifier ::= Letter (Letter | Digit)*
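Once left recursion is eliminated, the iteration (Letter | Digit)* maps directly onto a loop. The following is a minimal sketch of a recognizer for Identifier ::= Letter (Letter | Digit)*; the class and method names are made up for illustration, and letters are assumed to be the ASCII letters.

```java
class IdentifierDemo {
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Identifier ::= Letter (Letter | Digit)*
    static boolean isIdentifier(String s) {
        // The leading Letter is mandatory.
        if (s.isEmpty() || !isLetter(s.charAt(0))) return false;
        // (Letter | Digit)* becomes a loop over the remaining characters.
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!isLetter(c) && !isDigit(c)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("x1y2"));
        System.out.println(isIdentifier("1x"));
    }
}
```

The left-recursive form Identifier ::= Identifier Letter could not be coded this way: a method that begins by calling itself would never consume any input.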
Chart 21
Grammar Transformations
Substitution of nonterminal symbols
Given N ::= X, we can substitute each occurrence of N with X iff N ::= X is nonrecursive and is the only production rule for N
single-Command ::= for Control-Variable := Expression To-or-Downto Expression do single-Command
    | …
Control-Variable ::= Identifier
To-or-Downto ::= to
    | downto
After substitution:
single-Command ::= for Identifier := Expression (to | downto) Expression do single-Command
    | …
Chart 22
Starter Sets
Starter set of an RE X
– starters[[X]]
– Set of terminal symbols that can start a string generated by X
Examples
– starters[[his | her | its]] = {h, i}
– starters[[(re)* set]] = {r, s}
Chart 23
Starter Sets
Precise and complete definition of starters:
    starters[[ε]] = {}
    starters[[t]] = {t}, where t is a terminal symbol
    starters[[X Y]] = starters[[X]] ∪ starters[[Y]], if X generates ε
    starters[[X Y]] = starters[[X]], if X does not generate ε
    starters[[X | Y]] = starters[[X]] ∪ starters[[Y]]
    starters[[X*]] = starters[[X]]
To generalize to the starter set of an extended RE, add
– starters[[N]] = starters[[X]], where N is a nonterminal symbol defined by the production rule N ::= X
Chart 24
Example Starter Set
Expression ::= primary-Expression (Operator primary-Expression)*
primary-Expression ::= Identifier | ( Expression )
Identifier ::= a|b|c|d|e
Operator ::= +|-|*|/
starters[[Expression]]
    = starters[[primary-Expression (Operator primary-Expression)*]]
    = starters[[primary-Expression]]
    = starters[[Identifier]] ∪ starters[[ ( Expression ) ]]
    = starters[[a | b | c | d | e]] ∪ { ( }
    = {a, b, c, d, e, (}
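The starter-set rules can be turned into code almost line for line. The sketch below is one possible encoding, not something taken from the Triangle compiler: each extended RE is an object that reports whether it can generate the empty string and what its starters are. The names Re, t, seq, alt, star, n and chars are hypothetical, and the sketch assumes the grammar has no left recursion (otherwise the recursive lookups would not terminate).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class StarterSetDemo {
    interface Re {
        boolean nullable();        // can this RE generate the empty string?
        Set<String> starters();    // terminals that can start a generated string
    }

    static final Map<String, Re> RULES = new HashMap<>();

    static Re t(final String symbol) {            // terminal symbol t
        return new Re() {
            public boolean nullable() { return false; }
            public Set<String> starters() {
                Set<String> s = new TreeSet<>();
                s.add(symbol);
                return s;
            }
        };
    }

    static Re seq(final Re x, final Re y) {       // X Y
        return new Re() {
            public boolean nullable() { return x.nullable() && y.nullable(); }
            public Set<String> starters() {
                Set<String> s = new TreeSet<>(x.starters());
                if (x.nullable()) s.addAll(y.starters());  // only if X generates ε
                return s;
            }
        };
    }

    static Re alt(final Re x, final Re y) {       // X | Y
        return new Re() {
            public boolean nullable() { return x.nullable() || y.nullable(); }
            public Set<String> starters() {
                Set<String> s = new TreeSet<>(x.starters());
                s.addAll(y.starters());
                return s;
            }
        };
    }

    static Re star(final Re x) {                  // X*
        return new Re() {
            public boolean nullable() { return true; }
            public Set<String> starters() { return x.starters(); }
        };
    }

    static Re n(final String name) {              // nonterminal N, looked up lazily
        return new Re() {
            public boolean nullable() { return RULES.get(name).nullable(); }
            public Set<String> starters() { return RULES.get(name).starters(); }
        };
    }

    static Re chars(String word) {                // a word as a sequence of characters
        Re r = t(word.substring(0, 1));
        for (int i = 1; i < word.length(); i++) r = seq(r, t(word.substring(i, i + 1)));
        return r;
    }

    static {
        RULES.put("Expression",
            seq(n("primary-Expression"), star(seq(n("Operator"), n("primary-Expression")))));
        RULES.put("primary-Expression",
            alt(n("Identifier"), seq(t("("), seq(n("Expression"), t(")")))));
        RULES.put("Identifier", alt(t("a"), alt(t("b"), alt(t("c"), alt(t("d"), t("e"))))));
        RULES.put("Operator", alt(t("+"), alt(t("-"), alt(t("*"), t("/")))));
    }

    public static void main(String[] args) {
        System.out.println(n("Expression").starters());
        System.out.println(alt(chars("his"), alt(chars("her"), chars("its"))).starters());
        System.out.println(seq(star(chars("re")), chars("set")).starters());
    }
}
```

The three printed sets correspond to starters[[Expression]], starters[[his | her | its]] and starters[[(re)* set]] from the slides.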
Chart 25
Scanning (Lexical Analysis)
The purpose of scanning is to recognize tokens in the source program, that is, to group input characters (the source program text) into tokens.
Difference between parsing and scanning:
– Parsing groups terminal symbols, which are tokens, into larger phrases such as expressions and commands, and analyzes the tokens for correctness and structure
– Scanning groups individual characters into tokens
Chart 28
What Does a Scanner Do?
Handle keywords (reserved words)
– Recognizes identifiers and keywords
– Match explicitly
    Write a regular expression for each keyword
    An identifier is any alphanumeric string that is not a keyword
– Match as an identifier, perform lookup
    No special regular expressions for keywords
    When an identifier is found, perform a lookup in a preloaded keyword table
How does Triangle handle keywords? Discuss in terms of efficiency and ease of coding.
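To make the "match as an identifier, perform lookup" strategy concrete, here is a small hand-written scanner for the let ... in example. It is a sketch under stated assumptions, not the Triangle scanner: the keyword table holds only the three keywords the example needs, a comment runs from ! to the end of the line, tokens are rendered as plain strings, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MiniScannerDemo {
    // Preloaded keyword table (only the keywords used in the example).
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList("let", "var", "in"));

    static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                        // discard white space
            } else if (c == '!') {                          // discard comment up to end of line
                while (i < src.length() && src.charAt(i) != '\n') i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                String word = src.substring(start, i);
                // Scan as an identifier first, then look it up in the keyword table.
                tokens.add(KEYWORDS.contains(word) ? word : "Ident(" + word + ")");
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add("Intlit(" + src.substring(start, i) + ")");
            } else if (c == ':') {
                // One character of lookahead separates ":" from ":=".
                if (i + 1 < src.length() && src.charAt(i + 1) == '=') {
                    tokens.add("becomes");
                    i += 2;
                } else {
                    tokens.add("colon");
                    i++;
                }
            } else if ("+-*/<>=".indexOf(c) >= 0) {
                tokens.add("op(" + c + ")");
                i++;
            } else {
                tokens.add("error(" + c + ")");             // error token for the parser to report
                i++;
            }
        }
        tokens.add("eot");
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("let var y: Integer\nin !new year\n  y := y+1"));
    }
}
```

The lookup costs one hash probe per identifier and keeps the scanning code independent of the keyword list; matching each keyword explicitly would need a separate pattern per keyword.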
Chart 29
What Does a Scanner Do?
Remove white space
– Tabs, spaces, new lines
Remove comments
– Single line
    -- Ada comment
– Multi-line, start and end delimiters
    { Pascal comment }
    /* C comment */
– Nested
– Runaway comments
    Nonterminated comments can’t be detected until end of file
Chart 30
What Does a Scanner Do?
Perform look ahead
– Multi-character tokens
    1..10 vs. 1.10
    &, &&
    <, <=
    etc.
Challenging input languages
– FORTRAN
    Keywords not reserved
    Blanks are not a delimiter
    Example (comma vs. decimal)
        DO10I=1,5   start of a do loop (equivalent to a C for loop)
        DO10I=1.5   an assignment statement, assignment to variable DO10I
Chart 31
What Does a Scanner Do?
Challenging input languages (cont.)
– PL/I: keywords are not reserved
    IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;
Chart 32
What Does a Scanner Do?
Error Handling
– Error token passed to the parser, which reports the error
– Recovery
    Delete the characters of the current token that have been read so far and restart scanning at the next unread character
    Delete the first character of the current lexeme and resume scanning from the next character
– Examples of lexical errors:
    3.25e   bad format for a constant
    Var#1   illegal character
– Some errors that are not lexical errors
    Mistyped keywords, e.g. Begim
    Mismatched parentheses
    Undeclared variables
Chart 33
Scanner Implementation
Issues
– Simpler design: the parser doesn’t have to worry about white space, etc.
– Improved compiler efficiency: allows the construction of a specialized and potentially more efficient processor
– Enhanced compiler portability: input alphabet peculiarities and other device-specific anomalies can be restricted to the scanner
Chart 34
Scanner Implementation
What are the keywords in Triangle?
How are keywords and identifiers implemented in Triangle?
Is look ahead implemented in Triangle?
– If so, how?
Chart 35
Structure of a Compiler
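[Figure: compiler pipeline with the Parser and the Semantic Analyzer drawn as separate phases. Boxes: Lexical Analyzer, Parser, Semantic Analyzer, Intermediate Code Generation, Optimization, Assembly Code Generation, plus a Symbol Table. Flow: Source code → tokens → parse tree → intermediate representation → intermediate representation → Assembly code.]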
Chart 36
Parsing
Given an unambiguous, context-free grammar, parsing is
– Recognition of an input string, i.e., deciding whether or not the input string is a sentence of the grammar
– Parsing of an input string, i.e., recognition of the input string plus determination of its phrase structure. The phrase structure can be represented by a syntax tree, or otherwise.
The grammar must be unambiguous so that every sentence of the grammar has exactly one syntax tree.
Chart 37
Parsing
The syntax of programming language constructs is described by context-free grammars.
Advantages of unambiguous, context-free grammars
– A precise, yet easy-to-understand, syntactic specification of the programming language
– For certain classes of grammars we can automatically construct an efficient parser that determines whether a source program is syntactically well formed
– Imparts a structure to a programming language that is useful for the translation of source programs into correct object code and for the detection of errors
– Easier to add new constructs to the language if the implementation is based on a grammatical description of the language
Chart 38
Parsing
Check the syntax (structure) of a program and create a tree representation of the program
Programming languages have non-regular constructs
– Nesting
– Recursion
Context-free grammars are used to express the syntax of programming languages
sequence of tokens → parser → syntax tree
Chart 39
Context-Free Grammars
Comprised of
– A set of tokens or terminal symbols
– A set of non-terminal symbols
– A set of rules or productions which express the legal relationships between symbols
– A start or goal symbol
Example:
1. expr → expr – digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Tokens: -, +, 0, 1, 2, …, 9
Non-terminals: expr, digit
Start symbol: expr
Chart 40
Context-Free Grammars
1. expr → expr – digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
[Figure: syntax tree for 3 + 8 - 2, shown here with indentation for depth]
expr
    expr
        expr
            digit
                3
        +
        digit
            8
    -
    digit
        2
Chart 41
Checking for Correct Syntax
Given a grammar for a language and a program, how do you know if the syntax of the program is legal?
A legal program can be derived from the start symbol of the grammar
Grammar must be unambiguous and context-free
Chart 42
Deriving a String
The derivation begins with the start symbol
At each step of a derivation the right-hand side of a grammar rule is used to replace a non-terminal symbol
Continue replacing non-terminals until only terminal symbols remain
1. expr → expr – digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
expr ⇒ expr – digit       (Rule 1)
     ⇒ expr – 2           (Rule 4)
     ⇒ expr + digit – 2   (Rule 2)
     ⇒ expr + 8 – 2       (Rule 4)
     ⇒ digit + 8 – 2      (Rule 3)
     ⇒ 3 + 8 – 2          (Rule 4)
Chart 43
Rightmost Derivation
The rightmost non-terminal is replaced in each step
1. expr → expr – digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
expr ⇒ expr – digit                 (Rule 1)
expr – digit ⇒ expr – 2             (Rule 4)
expr – 2 ⇒ expr + digit – 2         (Rule 2)
expr + digit – 2 ⇒ expr + 8 – 2     (Rule 4)
expr + 8 – 2 ⇒ digit + 8 – 2        (Rule 3)
digit + 8 – 2 ⇒ 3 + 8 – 2           (Rule 4)
Chart 44
Leftmost Derivation
The leftmost non-terminal is replaced in each step
1. expr → expr – digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
expr ⇒ expr – digit                              (Rule 1)
expr – digit ⇒ expr + digit – digit              (Rule 2)
expr + digit – digit ⇒ digit + digit – digit     (Rule 3)
digit + digit – digit ⇒ 3 + digit – digit        (Rule 4)
3 + digit – digit ⇒ 3 + 8 – digit                (Rule 4)
3 + 8 – digit ⇒ 3 + 8 – 2                        (Rule 4)
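The leftmost derivation can be replayed mechanically: keep the sentential form as a list of symbols and, at each step, replace the first non-terminal by the chosen right-hand side. The sketch below does exactly that for 3 + 8 - 2. The class and method names are hypothetical, and the caller is trusted to pick a right-hand side that belongs to the non-terminal being replaced (the code does not check this).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LeftmostDerivationDemo {
    static boolean isNonterminal(String symbol) {
        return symbol.equals("expr") || symbol.equals("digit");
    }

    // Replaces the leftmost non-terminal of `form` by `rhs`.
    static List<String> step(List<String> form, String... rhs) {
        for (int i = 0; i < form.size(); i++) {
            if (isNonterminal(form.get(i))) {
                List<String> next = new ArrayList<>(form.subList(0, i));
                next.addAll(Arrays.asList(rhs));
                next.addAll(form.subList(i + 1, form.size()));
                return next;
            }
        }
        throw new IllegalStateException("no non-terminal left to replace");
    }

    // Returns every sentential form of the derivation, start symbol first.
    static List<String> derive() {
        String[][] choices = {
            {"expr", "-", "digit"},   // rule 1
            {"expr", "+", "digit"},   // rule 2
            {"digit"},                // rule 3
            {"3"}, {"8"}, {"2"}       // rule 4, applied three times
        };
        List<String> forms = new ArrayList<>();
        List<String> form = Arrays.asList("expr");
        forms.add(String.join(" ", form));
        for (String[] rhs : choices) {
            form = step(form, rhs);
            forms.add(String.join(" ", form));
        }
        return forms;
    }

    public static void main(String[] args) {
        for (String f : derive()) System.out.println(f);
    }
}
```

Scanning from the right instead of the left in step would give the rightmost derivation, provided the rule choices are reordered to match.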
Chart 45
Leftmost Derivation
The leftmost non-terminal is replaced in each step
1. expr ⇒ expr – digit                              (Rule 1)
2. expr – digit ⇒ expr + digit – digit              (Rule 2)
3. expr + digit – digit ⇒ digit + digit – digit     (Rule 3)
4. digit + digit – digit ⇒ 3 + digit – digit        (Rule 4)
5. 3 + digit – digit ⇒ 3 + 8 – digit                (Rule 4)
6. 3 + 8 – digit ⇒ 3 + 8 – 2                        (Rule 4)
[Figure: syntax tree for 3 + 8 - 2, with its nodes labeled 1 to 6 to match the derivation steps.]
Chart 46
Bottom-Up Parsing
Parser examines terminal symbols of the input string, in order from left to right
Reconstructs the syntax tree from the bottom (terminal nodes) up (toward the root node)
Bottom-up parsing reduces a string w to the start symbol of the grammar.
– At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.
Chart 47
Bottom-Up Parsing
Types of bottom-up parsing algorithms
– Shift-reduce parsing
    At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.
– LR(k) parsing
    L is for left-to-right scanning of the input, R is for constructing a rightmost derivation in reverse, and k is the number of input symbols of look-ahead that are used in making parsing decisions.
Chart 48
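Bottom-Up Parsing Example: 3+8-2
1. expr → expr – digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
[Figure: the syntax tree grows upward from the input 3 + 8 - 2 in successive frames. 3 and 8 are each reduced to digit (rule 4), then the digit above 3 is reduced to expr (rule 3).]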
Chart 49
Bottom-Up Parsing Example: 3+8-2
[Figure: the remaining frames. 2 is reduced to digit (rule 4), expr + digit is reduced to expr (rule 2), and finally expr – digit is reduced to expr (rule 1), the start symbol.]
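The reductions for 3 + 8 - 2 can be reproduced with a tiny hand-coded shift-reduce loop. This is a sketch specialized to the four-rule expr grammar, not a general LR parser: the reduce decisions are hard-wired rather than table-driven, tokens are single characters, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ShiftReduceDemo {
    // Returns the reductions in the order they are performed.
    static List<String> parse(String input) {
        List<String> stack = new ArrayList<>();
        List<String> reductions = new ArrayList<>();
        for (char c : input.replace(" ", "").toCharArray()) {
            if (Character.isDigit(c)) {
                stack.add("digit");                    // shift c, then reduce by rule 4
                reductions.add("digit -> " + c);
            } else if (c == '+' || c == '-') {
                stack.add(String.valueOf(c));          // shift the operator
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
            reduce(stack, reductions);
        }
        if (stack.size() != 1 || !stack.get(0).equals("expr")) {
            throw new IllegalArgumentException("syntax error");
        }
        return reductions;
    }

    // Reduces the stack contents if they match a right-hand side.
    static void reduce(List<String> stack, List<String> reductions) {
        int n = stack.size();
        if (n == 3 && stack.get(0).equals("expr")
                && (stack.get(1).equals("+") || stack.get(1).equals("-"))
                && stack.get(2).equals("digit")) {
            String op = stack.get(1);
            stack.clear();
            stack.add("expr");                         // rule 1 or rule 2
            reductions.add("expr -> expr " + op + " digit");
        } else if (n == 1 && stack.get(0).equals("digit")) {
            stack.set(0, "expr");                      // rule 3
            reductions.add("expr -> digit");
        }
    }

    static boolean accepts(String input) {
        try {
            parse(input);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String r : parse("3 + 8 - 2")) System.out.println(r);
    }
}
```

Read from last to first, the printed reductions are the steps of the rightmost derivation of 3 + 8 - 2, which is the "rightmost derivation in reverse" property described for bottom-up parsing.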
Chart 50
Bottom-Up Parsing Example: abbcde
1. S → aABe
2. A → Abc | b
3. B → d
Example input: abbcde
abbcde reduces to aAbcde   (A → b)
[Figure: input a b b c d e with an A node built above the first b.]
Chart 51
Bottom-Up Parsing Example: abbcde
1. S → aABe
2. A → Abc | b
3. B → d
Example input: abbcde
aAbcde reduces to aAde   (A → Abc)
[Figure: a second A node built above A b c.]
Chart 52
Bottom-Up Parsing Example: abbcde
1. S → aABe
2. A → Abc | b
3. B → d
Example input: abbcde
aAde reduces to aABe   (B → d)
[Figure: a B node built above d.]
Chart 53
Bottom-Up Parsing Example: abbcde
1. S → aABe
2. A → Abc | b
3. B → d
Example input: abbcde
aABe reduces to S   (S → aABe)
[Figure: the completed syntax tree, with root S above a, A, B and e.]
Chart 54
Bottom-Up Parsing Example: the cat sees a rat.
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Example input: the cat sees a rat.
the cat sees a rat. reduces to the Noun sees a rat.   (Noun → cat)
[Figure: a Noun node built above cat.]
Chart 55
Bottom-Up Parsing Example: the cat sees a rat.
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Example input: the cat sees a rat.
the Noun sees a rat. reduces to Subject sees a rat.   (Subject → the Noun)
[Figure: a Subject node built above the and Noun.]
Chart 56
Bottom-Up Parsing Example: the cat sees a rat.
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Example input: the cat sees a rat.
Subject sees a rat. reduces to Subject Verb a rat.   (Verb → sees)
[Figure: a Verb node built above sees.]
Chart 57
Bottom-Up Parsing Example: the cat sees a rat.
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Example input: the cat sees a rat.
Subject Verb a rat. reduces to Subject Verb a Noun.   (Noun → rat)
[Figure: a second Noun node built above rat.]
Chart 58
Bottom-Up Parsing Example: the cat sees a rat.
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Example input: the cat sees a rat.
Subject Verb a Noun. reduces to Subject Verb Object.   (Object → a Noun)
[Figure: an Object node built above a and Noun.]
What would happen if we chose ‘Subject → a Noun’ instead of ‘Object → a Noun’?
Chart 59
Bottom-Up Parsing Example: the cat sees a rat.
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Example input: the cat sees a rat.
Subject Verb Object. reduces to Sentence   (Sentence → Subject Verb Object .)
[Figure: the completed syntax tree, with root Sentence above Subject, Verb, Object and the period.]
Chart 60
Top-Down Parsing
The parser examines the terminal symbols of the input string, in order from left to right.
The parser reconstructs its syntax tree from the top (root node) down (towards the terminal nodes).
An attempt to find the leftmost derivation for an input string
Chart 61
Top-Down Parsers
General rules for top-down parsers
– Start with just a stub for the root node
– At each step the parser takes the leftmost stub
– If the stub is labeled by terminal symbol t, the parser connects it to the next input terminal symbol, which must be t. (If not, the parser has detected a syntactic error.)
– If the stub is labeled by nonterminal symbol N, the parser chooses one of the production rules N ::= X1…Xn, and grows branches from the node labeled by N to new stubs labeled X1, …, Xn (in order from left to right).
– Parsing succeeds when and if the whole input string is connected up to the syntax tree.
Chart 62
Top-Down Parsing
Two forms
– Backtracking parsers
    Guess which rule to apply, then back up and change the choice if they cannot proceed
– Predictive parsers
    Predict which rule to apply by using look-ahead tokens
Backtracking parsers are not very efficient. We will cover predictive parsers.
Chart 63
Predictive Parsers
Many types
– LL(1) parsing
    First L is for scanning the input from left to right; second L is for producing a leftmost derivation; 1 is for using one input symbol of look-ahead
    Table driven, with an explicit stack to maintain the parse tree
– Recursive-descent parsing
    Uses recursive subroutines to traverse the parse tree
Chart 64
Predictive Parsers (Lookahead)
Lookahead in predictive parsing
– The lookahead token (next token in the input) is used to determine which rule should be used next
– For example:
1. term → num term’
2. term’ → ‘+’ num term’ | ‘-’ num term’ | ε
3. num → ‘0’|‘1’|‘2’|…|‘9’
Example input: 7 + 3 - 2
[Figure: parse tree under construction. term is expanded to num term’; num matches 7, and the lookahead token + selects the rule term’ → ‘+’ num term’.]
Chart 65
Predictive Parsers (Lookahead)
1. term → num term’
2. term’ → ‘+’ num term’ | ‘-’ num term’ | ε
3. num → ‘0’|‘1’|‘2’|…|‘9’
Example input: 7 + 3 - 2
[Figure: parse tree under construction. num matches 3, and the lookahead token - selects the rule term’ → ‘-’ num term’.]
Chart 66
Predictive Parsers (Lookahead)
1. term → num term’
2. term’ → ‘+’ num term’ | ‘-’ num term’ | ε
3. num → ‘0’|‘1’|‘2’|…|‘9’
Example input: 7 + 3 - 2
[Figure: completed parse tree. num matches 2; the input is exhausted, so the last term’ is expanded to ε.]
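The term / term’ example can be coded directly as a predictive parser: one method per non-terminal, with a single character of lookahead deciding which alternative of term’ to use. The sketch below records the rule chosen at each step; its names and the trace format are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PredictiveTermDemo {
    private final String input;                 // input with blanks removed
    private int pos = 0;
    private final List<String> trace = new ArrayList<>();

    private PredictiveTermDemo(String text) { input = text.replace(" ", ""); }

    // The lookahead: the next unread character, or '$' at end of input.
    private char lookahead() { return pos < input.length() ? input.charAt(pos) : '$'; }

    // term -> num term'
    private void term() {
        trace.add("term -> num term'");
        num();
        termPrime();
    }

    // term' -> '+' num term' | '-' num term' | empty
    private void termPrime() {
        char la = lookahead();
        if (la == '+' || la == '-') {           // lookahead selects an operator alternative
            trace.add("term' -> " + la + " num term'");
            pos++;
            num();
            termPrime();
        } else {                                // anything else selects the empty alternative
            trace.add("term' -> empty");
        }
    }

    // num -> '0' | '1' | ... | '9'
    private void num() {
        if (!Character.isDigit(lookahead())) {
            throw new IllegalArgumentException("digit expected at position " + pos);
        }
        trace.add("num -> " + lookahead());
        pos++;
    }

    static List<String> parse(String text) {
        PredictiveTermDemo p = new PredictiveTermDemo(text);
        p.term();
        if (p.pos != p.input.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return p.trace;
    }

    public static void main(String[] args) {
        for (String rule : parse("7 + 3 - 2")) System.out.println(rule);
    }
}
```

No backtracking is ever needed: the three alternatives of term’ start with different tokens, so one token of lookahead is enough to choose among them.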
Chart 67
Recursive-Descent Parsing
Top-down parsing algorithm
– Consists of a group of methods (programs) parseN, one for each nonterminal symbol N of the grammar
– The task of each method parseN is to parse a single N-phrase
– These parsing methods cooperate to parse complete sentences
Chart 68
Recursive-Descent Parsing
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Example input: the cat sees a rat.
[Figure: root node Sentence with four stubs, Subject, Verb, Object and the period, above the input the cat sees a rat .]
a. Decide which production rule to apply. There is only one, rule 1. This step creates four stubs.
Chart 69
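Recursive-Descent Parsing
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Example input: the cat sees a rat.
[Figure: the tree after further steps. Sentence has children Subject, Verb, Object and the period; a Noun node has been added below Subject, above the input the cat sees a rat .]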
Chart 72
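Recursive-Descent Parsing
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Example input: the cat sees a rat.
[Figure: the tree in its later steps. Sentence has children Subject, Verb, Object and the period; there is now a Noun node below Subject and another below Object, above the input the cat sees a rat .]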
Chart 75
Recursive-Descent Parser for Micro-English
Parsing methods, one per nonterminal symbol:
    parseSentence
    parseSubject
    parseObject
    parseVerb
    parseNoun
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Chart 76
Recursive-Descent Parser for Micro-English
parseSentence
    parseSubject
    parseVerb
    parseObject
    parseEnd
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Rule being coded: Sentence → Subject Verb Object .
Chart 77
Recursive-Descent Parser for Micro-English
parseSubject
    if input = “I”
        accept
    else if input = “a”
        accept
        parseNoun
    else if input = “the”
        accept
        parseNoun
    else error
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Rule being coded: Subject → I | a Noun | the Noun
Chart 78
Recursive-Descent Parser for Micro-English
parseNoun
    if input = “cat”
        accept
    else if input = “mat”
        accept
    else if input = “rat”
        accept
    else error
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Rule being coded: Noun → cat | mat | rat
Chart 79
Recursive-Descent Parser for Micro-English
parseObject
    if input = “me”
        accept
    else if input = “a”
        accept
        parseNoun
    else if input = “the”
        accept
        parseNoun
    else error
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Rule being coded: Object → me | a Noun | the Noun
Chart 80
Recursive-Descent Parser for Micro-English
parseVerb
    if input = “like”
        accept
    else if input = “is”
        accept
    else if input = “see”
        accept
    else if input = “sees”
        accept
    else error
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Rule being coded: Verb → like | is | see | sees
Chart 81
Recursive-Descent Parser for Micro-English
parseEnd
    if input = “.”
        accept
    else error
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Symbol being matched: the final period of rule 1
Chart 82
Systematic Development of a Recursive-Descent Parser
Given a (suitable) context-free grammar
– Express the grammar in EBNF, with a single production rule for each nonterminal symbol, and perform any necessary grammar transformations
    Always eliminate left recursion
    Always left-factorize whenever possible
– Transcribe each EBNF production rule N ::= X to a parsing method parseN, whose body is determined by X
– Make the parser consist of:
    A private variable currentToken
    Private parsing methods developed in the previous step
    Private auxiliary methods accept and acceptIt, both of which call the scanner
    A public parse method that calls parseS (where S is the start symbol of the grammar), having first called the scanner to store the first input token in currentToken
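Following this recipe for Micro-English gives a parser along the lines below. It is a sketch rather than the course's reference code: the "scanner" is just an array of blank-separated words (with the period split off as its own token), errors are reported by throwing IllegalStateException, and the accepts helper is added purely for demonstration.

```java
class MicroEnglishParser {
    private static final String EOT = "<eot>";
    private final String[] tokens;        // stands in for the scanner
    private int next = 0;
    private String currentToken;

    MicroEnglishParser(String sentence) {
        tokens = sentence.replace(".", " . ").trim().split("\\s+");
        currentToken = tokens[0];         // first input token
    }

    // Fetches the next token unconditionally.
    private void acceptIt() {
        next++;
        currentToken = next < tokens.length ? tokens[next] : EOT;
    }

    // Fetches the next token only if the current one is the expected one.
    private void accept(String expected) {
        if (!currentToken.equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + currentToken);
        }
        acceptIt();
    }

    private void parseSentence() {        // Sentence ::= Subject Verb Object .
        parseSubject();
        parseVerb();
        parseObject();
        accept(".");
    }

    private void parseSubject() {         // Subject ::= I | a Noun | the Noun
        switch (currentToken) {
            case "I": acceptIt(); break;
            case "a": case "the": acceptIt(); parseNoun(); break;
            default: throw new IllegalStateException("subject expected");
        }
    }

    private void parseObject() {          // Object ::= me | a Noun | the Noun
        switch (currentToken) {
            case "me": acceptIt(); break;
            case "a": case "the": acceptIt(); parseNoun(); break;
            default: throw new IllegalStateException("object expected");
        }
    }

    private void parseNoun() {            // Noun ::= cat | mat | rat
        switch (currentToken) {
            case "cat": case "mat": case "rat": acceptIt(); break;
            default: throw new IllegalStateException("noun expected");
        }
    }

    private void parseVerb() {            // Verb ::= like | is | see | sees
        switch (currentToken) {
            case "like": case "is": case "see": case "sees": acceptIt(); break;
            default: throw new IllegalStateException("verb expected");
        }
    }

    public void parse() {
        parseSentence();
        if (!currentToken.equals(EOT)) throw new IllegalStateException("text after end of sentence");
    }

    static boolean accepts(String sentence) {
        try {
            new MicroEnglishParser(sentence).parse();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("the cat sees a rat."));
    }
}
```

Each parsing method mirrors one production rule, so the call tree of a successful parse has the same shape as the sentence's syntax tree.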
Chart 83
Quote of the Week
“C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows away your whole leg.”– Bjarne Stroustrup
Chart 84
Quote of the Week
Did you really say that? Dr. Bjarne Stroustrup: Yes, I did say something along the lines of C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows your whole leg off. What people tend to miss is that what I said about C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems. Someone who avoids the simple problems may simply be heading for a not-so-simple one. One problem with very supporting and protective environments is that the hard problems may be discovered too late or be too hard to remedy once discovered. Also, a rare problem is harder to find than a frequent one because you don't suspect it. I also said, "Within C++, there is a much smaller and cleaner language struggling to get out." For example, that quote can be found on page 207 of The Design and Evolution of C++. And no, that smaller and cleaner language is not Java or C#. The quote occurs in a section entitled "Beyond Files and Syntax". I was pointing out that the C++ semantics is much cleaner than its syntax. I was thinking of programming styles, libraries and programming environments that emphasized the cleaner and more effective practices over archaic uses focused on the low-level aspects of C.
Chart 85
Converting EBNF Production Rules to Parsing Methods
For production rule N ::= X
– Convert the production rule to a parsing method named parseN
    private void parseN () {
        parse X
    }
– Refine parse ε to a dummy statement
– Refine parse t (where t is a terminal symbol) to accept(t) or acceptIt()
– Refine parse N (where N is a nonterminal symbol) to a call of the corresponding parsing method
    parseN()
– Refine parse X Y to
    {
        parse X
        parse Y
    }
– Refine parse X|Y to
    switch (currentToken.kind) {
    cases in starters[[X]]:
        parse X
        break;
    cases in starters[[Y]]:
        parse Y
        break;
    default:
        report a syntax error
    }
Chart 86
Converting EBNF Production Rules to Parsing Methods
For X | Y
– Choose parse X only if the current token is one that can start an X-phrase
– Choose parse Y only if the current token is one that can start a Y-phrase
    starters[[X]] and starters[[Y]] must be disjoint
For X*
– Choose
    while (currentToken.kind is in starters[[X]])
        parse X
    starters[[X]] must be disjoint from the set of tokens that can follow X* in this particular context
Chart 87
Converting EBNF Production Rules to Parsing Methods
A grammar that satisfies both these conditions is called an LL(1) grammar
Recursive-descent parsing is suitable only for LL(1) grammars
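As an illustration of the refinement rules, the sketch below applies them to the earlier Expression grammar: X | Y becomes a switch on the current token, and X* becomes a while loop guarded by starters[[X]]. Tokens are single characters, blanks are dropped, and the class, EOT marker and accepts helper are assumptions of this example.

```java
class ExpressionParserDemo {
    private static final char EOT = '\0';   // marks end of text
    private final String input;
    private int pos = 0;
    private char currentToken;

    private ExpressionParserDemo(String text) {
        input = text.replace(" ", "");
        currentToken = input.isEmpty() ? EOT : input.charAt(0);
    }

    private void acceptIt() {
        pos++;
        currentToken = pos < input.length() ? input.charAt(pos) : EOT;
    }

    private void accept(char expected) {
        if (currentToken != expected) throw new IllegalStateException("expected " + expected);
        acceptIt();
    }

    private static boolean isOperator(char c) {
        return c == '+' || c == '-' || c == '*' || c == '/';
    }

    // Expression ::= primary-Expression (Operator primary-Expression)*
    private void parseExpression() {
        parsePrimaryExpression();
        // X* becomes a loop: continue while the token is in starters[[Operator ...]].
        while (isOperator(currentToken)) {
            acceptIt();
            parsePrimaryExpression();
        }
    }

    // primary-Expression ::= Identifier | ( Expression )
    private void parsePrimaryExpression() {
        switch (currentToken) {
            case 'a': case 'b': case 'c': case 'd': case 'e':   // starters[[Identifier]]
                acceptIt();
                break;
            case '(':                                           // starters[[ ( Expression ) ]]
                acceptIt();
                parseExpression();
                accept(')');
                break;
            default:
                throw new IllegalStateException("syntax error at position " + pos);
        }
    }

    static boolean accepts(String text) {
        ExpressionParserDemo p = new ExpressionParserDemo(text);
        try {
            p.parseExpression();
            return p.currentToken == EOT;   // the whole input must be consumed
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("a - (b - (c - (d - e)))"));
        System.out.println(accepts("a +"));
    }
}
```

Both LL(1) conditions hold here: the two alternatives of primary-Expression have disjoint starter sets ({a, b, c, d, e} and {(}), and the operators that drive the loop cannot follow an Expression, which is followed only by ")" or the end of the text.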
Chart 88
Error Repair
Good programming languages are designed with a relatively large “distance” between syntactically correct programs, to increase the likelihood that conceptual mistakes are caught as syntactic errors.
Error repair usually occurs at two levels:
– Local: repairs mistakes with little global import, such as missing semicolons and undeclared variables
– Scope: repairs the program text so that scopes are correct. Errors of this kind include unbalanced parentheses and begin/end blocks.
Chart 89
Error Repair
Repair actions can be divided into insertions and deletions. Typically the compiler will use some lookahead and backtracking in attempting to make progress in the parse. There is great variation among compilers, though some languages (PL/C) carry a tradition of good error repair.
Goals of error repair are:
– No input should cause the compiler to collapse
– Illegal constructs are flagged
– Frequently occurring errors are repaired gracefully
– Minimal stuttering or cascading of errors
LL-style parsing lends itself well to error repair, since the compiler uses the grammar’s rules to predict what should occur next in the input
Chart 90
Triangle single-Command
single-Command ::= ε
    | V-name := Expression
    | Identifier ( Actual-Parameter-Sequence )
    | begin Command end
    | let Declaration in single-Command
    | if Expression then single-Command else single-Command
    | while Expression do single-Command
V-name ::= Identifier
    | V-name . Identifier
    | V-name [ Expression ]
Identifier ::= Letter (Letter | Digit)*
Letter ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
    |A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
Digit ::= 0|1|2|3|4|5|6|7|8|9
Chart 91
Starter Sets
Starter Set for RE
– starters[[X]] is the set of terminal symbols that can start a string generated by X
Example
starters[[single-Command]] = starters[[:=, (, begin, let, if, while]]
What about V-name vs Identifier?
– Use look ahead when an Identifier is encountered, to look for := or (.
Chart 92
Mini-Triangle Production Rules
Program ::= Command                               Program (1.14)
Command ::= V-name := Expression                  AssignCommand (1.15a)
    | Identifier ( Expression )                   CallCommand (1.15b)
    | Command ; Command                           SequentialCommand (1.15c)
    | if Expression then Command else Command     IfCommand (1.15d)
    | while Expression do Command                 WhileCommand (1.15e)
    | let Declaration in Command                  LetCommand (1.15f)
Expression ::= Integer-Literal                    IntegerExpression (1.16a)
    | V-name                                      VnameExpression (1.16b)
    | Operator Expression                         UnaryExpression (1.16c)
    | Expression Operator Expression              BinaryExpression (1.16d)
V-name ::= Identifier                             SimpleVname (1.17)
Declaration ::= const Identifier ~ Expression     ConstDeclaration (1.18a)
    | var Identifier : Type-denoter               VarDeclaration (1.18b)
    | Declaration ; Declaration                   SequentialDeclaration (1.18c)
Type-denoter ::= Identifier                       SimpleTypeDenoter (1.19)
Chart 93
Abstract Syntax Trees
An explicit representation of the source program’s phrase structure
AST for Mini-Triangle
Chart 94
Abstract Syntax Trees
Program ASTs (P):
    Program ::= Command                           Program (1.14)
    [Figure: a Program node with one child, C.]
Command ASTs (C):
    Command ::= V-name := Expression              AssignCommand (1.15a)
        | Identifier ( Expression )               CallCommand (1.15b)
        | Command ; Command                       SequentialCommand (1.15c)
    [Figure: (1.15a) an AssignCommand node with children V and E; (1.15b) a CallCommand node with children Identifier (carrying its spelling) and E; (1.15c) a SequentialCommand node with children C1 and C2.]
Chart 95
Abstract Syntax Trees
Command ASTs (C):
    Command ::= …
        | if Expression then Command else Command     IfCommand (1.15d)
        | while Expression do Command                 WhileCommand (1.15e)
        | let Declaration in Command                  LetCommand (1.15f)
    [Figure: (1.15d) an IfCommand node with children E, C1 and C2; (1.15e) a WhileCommand node with children E and C; (1.15f) a LetCommand node with children D and C.]
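One way to represent these ASTs in code is a small class hierarchy with one class per labeled production. The sketch below covers only a few of the node kinds and is not the Triangle compiler's AST package: the class layout, the plain-string V-name and the toString rendering are simplifications assumed for illustration.

```java
class MiniTriangleAstDemo {
    // Every node is an AST; commands and expressions are separate families.
    static abstract class AST { }
    static abstract class Command extends AST { }
    static abstract class Expression extends AST { }

    static class IntegerExpression extends Expression {          // (1.16a)
        final String spelling;
        IntegerExpression(String spelling) { this.spelling = spelling; }
        public String toString() { return "IntegerExpression(" + spelling + ")"; }
    }

    static class VnameExpression extends Expression {            // (1.16b)
        final String vname;
        VnameExpression(String vname) { this.vname = vname; }
        public String toString() { return "VnameExpression(" + vname + ")"; }
    }

    static class BinaryExpression extends Expression {           // (1.16d)
        final Expression e1, e2;
        final String op;
        BinaryExpression(Expression e1, String op, Expression e2) {
            this.e1 = e1; this.op = op; this.e2 = e2;
        }
        public String toString() { return "BinaryExpression(" + e1 + ", " + op + ", " + e2 + ")"; }
    }

    static class AssignCommand extends Command {                 // (1.15a): children V and E
        final String vname;
        final Expression e;
        AssignCommand(String vname, Expression e) { this.vname = vname; this.e = e; }
        public String toString() { return "AssignCommand(" + vname + ", " + e + ")"; }
    }

    static class SequentialCommand extends Command {             // (1.15c): children C1 and C2
        final Command c1, c2;
        SequentialCommand(Command c1, Command c2) { this.c1 = c1; this.c2 = c2; }
        public String toString() { return "SequentialCommand(" + c1 + ", " + c2 + ")"; }
    }

    static class WhileCommand extends Command {                  // (1.15e): children E and C
        final Expression e;
        final Command c;
        WhileCommand(Expression e, Command c) { this.e = e; this.c = c; }
        public String toString() { return "WhileCommand(" + e + ", " + c + ")"; }
    }

    // Builds the AST for the command  y := y + 1.
    static Command example() {
        return new AssignCommand("y",
            new BinaryExpression(new VnameExpression("y"), "+", new IntegerExpression("1")));
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

The AST keeps only what later phases need: the punctuation and keywords of the concrete syntax (:=, while, do) are implied by the node class and do not appear as children.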