
1 Syntax Analysis Introduction to parsers Context-free grammars Push-down automata Top-down parsing...

Date post: 26-Dec-2015
Upload: thomasine-stanley
Page 1: 1 Syntax Analysis Introduction to parsers Context-free grammars Push-down automata Top-down parsing LL grammars and parsers Bottom-up parsing LR grammars.

Syntax Analysis

• Introduction to parsers
• Context-free grammars
• Push-down automata
• Top-down parsing
• LL grammars and parsers
• Bottom-up parsing
• LR grammars and parsers
• Bison/Yacc - parser generators
• Error Handling: Detection & Recovery

Introduction to parsers

[Figure: block diagram with the components Lexical Analyzer, Parser (specified by a CFG), Semantic Analyzer, and Symbol Table; edge labels: source, token, next token, syntax tree, code.]

Context Free Grammar

• CFG & terminology: rewrite vs. reduce, derivation
• Language and CFL: equivalence & CNF
• Parsing vs. derivation: lm/rm derivation & parse tree, ambiguity & resolution
• Expressive power

Derivation is the reverse of parsing. If we know how sentences are derived, we may find a parsing method by working in the reverse direction.

CFG: An Example

Terminals: id, ‘+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’
Nonterminals: expr, op
Productions:
  expr → expr op expr
  expr → ‘(’ expr ‘)’
  expr → ‘-’ expr
  expr → id
  op → ‘+’ | ‘-’ | ‘*’ | ‘/’
The start symbol: expr

Notational Conventions in CFG

• a, b, c, …, [+-0-9], id: symbols in Σ
• A, B, C, …, S, expr, stmt: symbols in N
• U, V, W, …, X, Y, Z: grammar symbols in (Σ+N)
• α, β, γ, … denote strings in (Σ+N)*
• u, v, w, … denote strings in Σ*
• A → α | β | γ | δ is an abbreviation of A → α, A → β, A → γ, A → δ
• Alternatives: α, β, γ, δ at RHS

Context-Free Grammars

A set of terminals: basic symbols from which sentences are formed

A set of nonterminals: syntactic variables denoting sets of strings

A set of productions: rules specifying how the terminals and nonterminals can be combined to form sentences

The start symbol: a distinguished nonterminal denoting the language

CFG: Components
Specification for Structures & Constituency

• CFG: formal specification of structure (parse trees)
– G = {Σ, N, P, S}
– Σ: terminal symbols
– N: non-terminal symbols
– P: production rules
– S: start symbol

CFG: Components

• Σ: terminal symbols
– the input symbols of the language
  • programming languages: tokens (reserved words, variables, operators, …)
  • natural languages: words or parts of speech
– pre-terminal: parts of speech (when words are regarded as terminals)

• N: non-terminal symbols
– groups of terminals and/or other non-terminals

• S: start symbol: the largest constituent of a parse tree

CFG: Components

• P: production (re-writing) rules
– form: A → β (A: non-terminal, β: string of terminals and non-terminals)
– meaning: A re-writes to (“consists of”, “derives into”) β, or β is reduced to A
– start with “S-productions” (S → β)

Derivations

A derivation step is an application of a production as a rewriting rule

E ⇒ - E

A sequence of derivation steps

E ⇒ - E ⇒ - ( E ) ⇒ - ( id )

is called a derivation of “- ( id )” from E

The symbol ⇒* denotes “derives in zero or more steps”; the symbol ⇒+ denotes “derives in one or more steps”

CFG: Accepted Languages

• Context-Free Language
– Language accepted by a CFG
  • L(G) = { w | S ⇒+ w } (strings of terminals that can be derived from the start symbol)
– Proof of acceptance: by induction
  • On the number of derivation steps
  • On the length of the input string

Context-Free Languages

A context-free language L(G) is the language defined by a context-free grammar G

A string of terminals w is in L(G) if and only if S ⇒+ w; w is called a sentence of G

If S ⇒* α, where α may contain nonterminals, then we call α a sentential form of G

E ⇒ - E ⇒ - ( E ) ⇒ - ( id )

G1 is equivalent to G2 if L(G1) = L(G2)

CFG: Equivalence

• Chomsky Normal Form (CNF) (Chomsky, 1963):
– ε-free, and
– every production rule has one of the following forms:
  • A → A1 A2 [two non-terminals: A1, A2], or
  • A → a [a terminal: a]
– i.e., two non-terminals or one terminal at the RHS

• Properties:
– Generates binary parse trees
– Good simplification for some algorithms
  • e.g., grammar training with the inside-outside algorithm (Baker 1979)
– Good tool for theoretical proofs
  • e.g., time complexity

CFG: Equivalence

• Every CFG can be converted into a weakly equivalent grammar in CNF
– equivalence: L(G1) = L(G2)
  • strongly equivalent: the grammars assign the same phrase structure to each sentence (except for renaming of non-terminals)
  • weakly equivalent: the grammars generate the same language but do not assign the same phrase structure to each sentence
– e.g., A → B C D == {A → B X, X → C D}
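The binarization in the example above can be sketched in a few lines of Python. This is only an illustration of that one step of CNF conversion (splitting long right-hand sides), not a full CNF converter; the function name `binarize` and the naming scheme for fresh non-terminals (`A_1`, `A_2`, …, playing the role of X) are assumptions made for this sketch.

```python
def binarize(lhs, rhs):
    """Split lhs -> rhs into rules with at most two symbols on the RHS.

    Fresh non-terminals are named lhs_1, lhs_2, ... (hypothetical naming).
    Returns a list of (lhs, rhs_tuple) rules.
    """
    rules = []
    current = lhs
    symbols = list(rhs)
    n = 0
    while len(symbols) > 2:
        n += 1
        fresh = f"{lhs}_{n}"
        # Keep the first symbol, push the rest into a fresh non-terminal.
        rules.append((current, (symbols[0], fresh)))
        current = fresh
        symbols = symbols[1:]
    rules.append((current, tuple(symbols)))
    return rules


print(binarize("A", ("B", "C", "D")))
```

The resulting grammar is only weakly equivalent to the original one: the language is unchanged, but the parse trees gain the extra internal nodes.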

CFG: An Example

Terminals: id, ‘+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’
Nonterminals: E, op
Productions:
  E → E op E …[R1]
  E → ‘(’ E ‘)’ …[R2]
  E → ‘-’ E …[R3]
  E → id …[R4]
  op → ‘+’ | ‘-’ | ‘*’ | ‘/’
The start symbol: E

Left- & Right-most Derivations

Each derivation step needs to choose
– a nonterminal to rewrite
– an alternative to apply

A leftmost derivation always chooses the leftmost nonterminal to rewrite

E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )

A rightmost (canonical) derivation always chooses the rightmost nonterminal to rewrite

E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )

Left- & Right-most Derivations

Representation of leftmost/rightmost derivations: use the sequence of productions (or production numbers) to represent a derivation sequence.

Example:
E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )
=> [3], [2], [1], [4], [4] (~ R3, R2, R1, R4, R4)

Advantage: a compact representation for a parse tree (data compression). Each parse tree has a unique leftmost/rightmost derivation.

[Figure: top of the corresponding parse tree, annotated with R3, R2, R1.]

Parse Trees

A parse tree is a graphical representation for a derivation that filters out the order of choosing nonterminals for rewriting

[Figure: parse tree fragment with NP and PP nodes over the words “girl”, “in”, “the park”.]

Context Free Grammar (CFG): Specification for Structures & Constituency

• Parse Tree: graphical representation of structure
– Root node (S): a sentence-level structure
– Internal nodes: constituents of the sentence
– Arcs: relationship between parent nodes and their children (constituents)
– Terminal nodes: surface forms of the input symbols (e.g., words)

• Bracketed notation: alternative representation
– e.g., [I saw [the [girl [in [the park]]]]]

Parse Tree: “I saw the girl in the park”

[Figure: 1st parse. Parse tree with nodes S, NP, VP, PP over “I saw the girl in the park”; part-of-speech tags: pron v det n p det n.]

Parse Tree: “I saw the girl in the park”

[Figure: 2nd parse. Parse tree with nodes S, NP, VP, PP over “I saw the girl in the park”; part-of-speech tags: pron v det n p det n.]

LM & RM: An Example

[Figure: parse tree for - ( id + id ): the root E has children - and E; that E has children ( E ); the inner E has children E + E, each deriving id.]

E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )

E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )

Parse Trees & Derivations

Many derivations may correspond to the same parse tree, but every parse tree has associated with it a unique leftmost and a unique rightmost derivation

Ambiguous Grammar

A grammar is ambiguous if it produces more than one parse tree for some sentence, i.e., more than one leftmost (or rightmost) derivation

E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
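To make the ambiguity concrete, here is a small Python sketch that counts the parse trees of a token string. It assumes the grammar implied by the two derivations, E → E + E | E * E | id; the function name `count_parse_trees` is made up for this illustration.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("id + id * id".split()))
```

A count greater than one for any sentence is exactly the condition in the definition: the sentence has more than one parse tree, so the grammar is ambiguous.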

Ambiguous Grammar

[Figure: the two parse trees for id + id * id: one with + at the root and id * id as its right subtree, the other with * at the root and id + id as its left subtree.]

Resolving Ambiguity

Use disambiguating rules to throw away undesirable parse trees

Rewrite grammars by incorporating disambiguating rules into unambiguous grammars

An Example

The dangling-else grammar

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

Two parse trees for
if E1 then if E2 then S1 else S2

An Example

[Figure: the two parse trees for the nested if statement: in one the else belongs to the outer if, in the other to the inner if.]

Preferred parse: the else is matched with the closest then

Disambiguating Rules

Rule: match each else with the closest previous unmatched then

Remove undesired state transitions in the pushdown automaton (parser)
– shift/reduce conflict on “else”
  • 1st parse: reduce
  • 2nd parse: shift

Grammar Rewriting

stmt → m_stmt | unm_stmt
m_stmt → if expr then m_stmt else m_stmt | other
unm_stmt → if expr then stmt | if expr then m_stmt else unm_stmt

• m_stmt: a statement with only paired then-else
• The statement between a then and its else must be an m_stmt, so it cannot contain an unmatched then: we want this then-else pair matched

RE vs. CFG

Every language described by a RE can also be described by a CFG

Example: (a|b)*abb
  A0 → a A0 | b A0 | a A1
  A1 → b A2
  A2 → b A3
  A3 → ε

(1) Right branching
(2) Starts with a terminal symbol

RE vs. CFG

Regular Grammar:
• Right branching
• Starts with a terminal symbol

[Figure: right-branching parse tree over A0, A0, A0, A1, A2, A3 for a string in (a|b)* abb.]

RE vs. CFG

RE: (a | b)*abb

[Figure: NFA with start state 0 and states 0 to 3, labelled A0 to A3; state 0 loops on a and b, followed by the transitions 0 -a-> 1 -b-> 2 -b-> 3.]

A0 → a A0 | b A0 | a A1
A1 → b A2
A2 → b A3
A3 → ε

RE vs. CFG

a DFA for (a | b)*abb

[Figure: DFA with start state 0 and states 0 to 3, labelled A0 to A3; its transitions correspond one-to-one to the productions below.]

A0 → b A0 | a A1
A1 → a A1 | b A2
A2 → a A1 | b A3
A3 → a A1 | b A0 | ε

CFG: Expressive Power (cont.)

• Writing a CFG for a FSA (RE)
– define a non-terminal Ni for the state with state number i
– start symbol S = N0 (assuming that state 0 is the initial state)
– for each transition δ(i,a)=j (from state i to state j on input symbol a), add a new production Ni → a Nj to P (if a == ε, add Ni → Nj)
– for each final state i, add a new production Ni → ε to P
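The construction above is mechanical enough to write down directly. The following Python sketch follows the four steps literally; the data layout (a dict from `(state, symbol)` to a list of target states, with `""` standing for ε) and the function name `fsa_to_grammar` are assumptions chosen for the illustration.

```python
def fsa_to_grammar(transitions, finals):
    """Build a regular grammar from an FSA.

    transitions: {(i, a): [j, ...]}; a == "" stands for an ε-move.
    finals: iterable of final state numbers.
    Returns {nonterminal: [rhs_tuple, ...]}; () is the ε right-hand side.
    State i becomes the non-terminal "Ni"; the start symbol is "N0".
    """
    productions = {}
    for (i, a), targets in transitions.items():
        for j in targets:
            # δ(i, a) = j  gives  Ni → a Nj   (or Ni → Nj for an ε-move)
            rhs = (f"N{j}",) if a == "" else (a, f"N{j}")
            productions.setdefault(f"N{i}", []).append(rhs)
    for i in finals:
        # each final state i gives  Ni → ε
        productions.setdefault(f"N{i}", []).append(())
    return productions


# The NFA for (a|b)*abb with states 0..3 and final state 3.
nfa = {(0, "a"): [0, 1], (0, "b"): [0], (1, "b"): [2], (2, "b"): [3]}
for lhs, alts in fsa_to_grammar(nfa, finals=[3]).items():
    print(lhs, "→", " | ".join(" ".join(rhs) or "ε" for rhs in alts))
```

Applied to the NFA for (a|b)*abb, this reproduces the grammar A0 → a A0 | b A0 | a A1, A1 → b A2, A2 → b A3, A3 → ε, with Ni in place of Ai.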

CFG: Expressive Power

• CFG vs. Regular Expression (R.E.)
– Every R.E. can be recognized by a FSA
– Every FSA can be represented by a CFG with production rules of the form: A → a B | ε
– (known as a “Regular Grammar”)

• Therefore, L(RE) ⊂ L(CFG)

CFG: Expressive Power (cont.)

• Chomsky Hierarchy:
– R.E.: Regular set (recognized by FSAs)
– CFG: Context-free (Pushdown automata)
– CSG: Context-sensitive (Linear bounded automata)
– Unrestricted: Recursively enumerable (Turing Machine)

Push-Down Automata

[Figure: a push-down automaton drawn as a finite automaton connected to an input, an output, and a stack.]

RE vs. CFG

Why use REs for lexical syntax?
– Lexical rules do not need a notation as powerful as CFGs
– REs are more concise and easier to understand than CFGs
– More efficient lexical analyzers can be constructed from REs than from CFGs
– REs provide a way of modularizing the front end into two manageable-sized components

CFG vs. Finite-State Machine

• Inappropriateness of FSA– Constituents: only terminals

– Recursion: do not allow A => … B … => … A …

• RTN (Recursive Transition Network)– FSA with augmentation of recursion

– arc: terminal or non-terminal

– if arc is non-terminal: call to a sub-transition network & return upon traversal

Nonregular Constructs

REs can denote only a fixed number of repetitions or an unspecified number of repetitions of one given construct, e.g., a*b*

A nonregular construct:
– L = {a^n b^n | n ≥ 1}

Non-Context-Free Constructs

CFGs can denote only a fixed number of repetitions or an unspecified number of repetitions of one or two (paired) given constructs, e.g., a^n b^n

Some non-context-free constructs:
– L1 = {wcw | w is in (a | b)*}
  • declaration/use of identifiers
– L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
  • #formal arguments/#actual arguments
– L3 = {a^n b^n c^n | n ≥ 0}
  • e.g., b: backspace, c: underscore

Context-Free Constructs

FA (RE) cannot keep counts

CFGs can keep count of two items but not three

Similar context-free constructs:
– L’1 = {w c w^R | w is in (a | b)*, R: reverse order}
– L’2 = {a^n b^m c^m d^n | n ≥ 1 and m ≥ 1}
– L’’2 = {a^n b^n c^m d^m | n ≥ 1 and m ≥ 1}
– L’3 = {a^n b^n | n ≥ 1}
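As a rough illustration of what “keeping count” means, the Python sketch below recognizes {a^n b^n | n ≥ 1} with a single stack, which is the extra memory a push-down automaton has over a finite automaton. The function name `is_anbn` is invented for this example, and the code mimics the idea of a PDA rather than being a formal construction.

```python
def is_anbn(s):
    """Return True if s is in {a^n b^n | n >= 1}, using one stack."""
    stack = []
    i = 0
    # Phase 1: push one marker per leading 'a'.
    while i < len(s) and s[i] == "a":
        stack.append("a")
        i += 1
    # Phase 2: pop one marker per following 'b'.
    while i < len(s) and s[i] == "b" and stack:
        stack.pop()
        i += 1
    # Accept only if all input is consumed and the counts matched.
    return len(s) > 0 and i == len(s) and not stack


for word in ["ab", "aabb", "aab", "abb", "ba", ""]:
    print(repr(word), is_anbn(word))
```

The same single stack cannot also check a third block of c's once the markers have been popped, which is the informal reason a^n b^n c^n is out of reach for CFGs.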

CFG Parsers

Types of CFG Parsers

Universal: can parse any CFG (CYK, Earley)
– CYK: exhaustively matches sub-ranges of input tokens against grammar rules, from smaller ranges to larger ranges
– Earley: exhaustively enumerates possible expectations from left to right, according to the current input token and the grammar

Non-universal: not all CFGs can be parsed (e.g., recursive descent parser)

Universal (applicable to all grammars) does NOT always mean efficient
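The “smaller ranges to larger ranges” idea behind CYK can be sketched as follows. This is a minimal recognizer that assumes the grammar is already in Chomsky Normal Form; the rule encoding (`unary` as `(A, a)` pairs, `binary` as `(A, B, C)` triples) and the sample grammar for a^n b^n are assumptions made for the illustration.

```python
def cyk(tokens, unary, binary, start):
    """CYK recognizer for a grammar in Chomsky Normal Form.

    unary:  iterable of (A, a)    for rules A -> a
    binary: iterable of (A, B, C) for rules A -> B C
    """
    n = len(tokens)
    if n == 0:
        return False
    # table[i][l] = set of non-terminals deriving tokens[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][1] = {A for A, a in unary if a == tok}
    for length in range(2, n + 1):            # smaller ranges first
        for i in range(n - length + 1):
            for split in range(1, length):    # every way to cut the range
                left = table[i][split]
                right = table[i + split][length - split]
                for A, B, C in binary:
                    if B in left and C in right:
                        table[i][length].add(A)
    return start in table[0][n]


# CNF grammar for a^n b^n (n >= 1): S -> A X | A B, X -> S B, A -> a, B -> b
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "X"), ("S", "A", "B"), ("X", "S", "B")]
print(cyk(list("aabb"), UNARY, BINARY, "S"))
print(cyk(list("aab"), UNARY, BINARY, "S"))
```

The three nested loops over range length, start position, and split point give the cubic running time in the input length, which is why universal parsers are usually slower than the deterministic ones discussed next.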

Types of CFG Parsers

Practical parsers: [“what is a good parser?”]

Simple: simple program structure
– Left-to-right (or right-to-left) scan
  • middle-out or island-driven parsing is often not preferred
– Top-down or bottom-up matching

Efficient: efficient for good/bad inputs
– Parse normal syntax quickly
– Detect errors immediately on the next token

Deterministic: no alternative choices during parsing, given the next token
– Small lookahead buffer (also contributes to efficiency)

Types of CFG Parsers

Top Down: matching from the start symbol down to terminal tokens

Bottom Up: matching input tokens with reducible rules, from terminals up to the start symbol

Efficient CFG Parsers

Top Down: LL Parsers
– Matching from the start symbol down to terminal tokens, left-to-right, according to a leftmost derivation sequence

Bottom Up: LR Parsers
– Matching input tokens with reducible rules, left-to-right, from terminals up to the start symbol, in the reverse order of a rightmost derivation sequence

Efficient CFG Parsers

Efficient & deterministic parsing is only possible for some subclasses of grammars, with special parsing algorithms

Top Down: parsing LL grammars with LL parsers

Bottom Up: parsing LR grammars with LR parsers
– LR grammars are a larger class of grammars than LL grammars

Parsing Table Construction for Efficient Parsers

Parsing table: a pre-computed table (according to the grammar), indicating the appropriate action(s) to take in any predefined state when some input token(s) is/are under examination

Lookahead symbol(s): the input symbol(s) under examination for determining the next action(s)

[Table: example parsing table with lookahead columns id, +, *, num; row State-0 holds action-1 and action-3, row State-1 holds action-2 and action-5, row State-2 holds action-4.]

Good parsers do not change their code when the grammar is revised: they are table driven.

Parsing Table Construction for Efficient Parsers

Parsing table construction:
– Decide on a pre-defined number of lookaheads to use for predicting the next state
– Define and enumerate all the unique states for the parsing method
– Decide the actions to take in all states with all possible lookahead(s)

Parsing Table Construction for Efficient Parsers

X-Parser: you can invent any parser and call it the X-Parser
– But its parsing algorithm may not handle all grammars deterministically, and thus efficiently.

X-Grammar: any grammar whose parsing table for the X-parsing method/X-Parser has no conflicting actions in any state

Non-X Grammar: a grammar whose table has more than one action to take in some state

Parsing Table Construction for Efficient Parsers

k: the number of lookahead symbols used by a parser to determine the next action
– A larger number of lookahead symbols tends to make conflicting actions less likely
– But it may result in a much larger table that grows exponentially with the number of lookaheads
– It does not guarantee unambiguous parsing for some grammars (inherently ambiguous ones), even with infinite lookahead

X(k) Parser: an X Parser that uses k lookahead symbols to determine the next action

X(k) Grammar: any grammar deterministically parsable with an X(k) Parser

Types of Grammars Capable of Efficient Parsing

LL(k) Grammars
– Grammars that can be deterministically parsed using an LL(k) parsing algorithm
– e.g., LL(1) grammar

LR(k) Grammars
– Grammars that can be deterministically parsed using an LR(k) parsing algorithm
– e.g., SLR(1) grammar, LR(1) grammar, LALR(1) grammar

Top-Down CFG Parsers

Recursive Descent Parser

vs.

Non-Recursive LL(1) Parser

Top-Down Parsing

Construct a parse tree from the root to the leaves using leftmost derivation

S → c A B
A → a b | a
B → d
input: cad

[Figure: four snapshots of the growing parse tree: S over c A B; then a under A; then a b under A; finally a under A and d under B.]

Predictive Parsing

Top-down parsing without backtracking
– there is only one alternative production to choose at each derivation step

stmt → if expr then stmt else stmt
     | while expr do stmt
     | begin stmt_list end

LL(k) Parsing

The first L stands for scanning the input from left to right

The second L stands for producing a leftmost derivation

The k stands for the number of input symbols for lookahead used to choose alternative productions at each derivation step

LL(1) Parsing

Use one input symbol of lookahead
– Same as recursive-descent parsing
– But implemented as non-recursive predictive parsing

Recursive Descent Parsing (more)

The parser consists of a set of (possibly recursive) procedures

Each procedure is associated with a nonterminal of the grammar

The calling sequence of procedures in processing the input implicitly defines a parse tree for the input

An Example

type → simple
     | id
     | array [ simple ] of type

simple → integer
       | char
       | num dotdot num

An Example

[Figure: parse tree for “array [ num dotdot num ] of integer”: type over array [ simple ] of type, where simple derives num dotdot num and the inner type derives simple, which derives integer.]

An Example

procedure type;
begin
  if lookahead is in { integer, char, num } then
    simple
  else if lookahead = id then
    match(id)
  else if lookahead = array then begin
    match(array); match('['); simple; match(']'); match(of); type
  end
  else
    error
end;

An Example

procedure match(t : token);
begin
  if lookahead = t then
    lookahead := nexttoken
  else
    error
end;

An Example

procedure simple;
begin
  if lookahead = integer then
    match(integer)
  else if lookahead = char then
    match(char)
  else if lookahead = num then begin
    match(num); match(dotdot); match(num)
  end
  else
    error
end;
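For readers who want to run the three procedures, here is one possible Python transcription. It is a sketch under a few assumptions that are not on the slides: tokens arrive as a list of strings, `"$"` is appended as an end marker, `error` becomes a `ParseError` exception, and the class and helper names (`TypeParser`, `parses`) are invented for the example.

```python
class ParseError(Exception):
    """Raised where the pseudocode calls `error`."""


class TypeParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.lookahead == t:
            self.pos += 1                    # lookahead := nexttoken
        else:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")

    def type_(self):                         # procedure type
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "id":
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")

    def simple(self):                        # procedure simple
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")


def parses(tokens):
    """Return True if the whole token list is a `type`."""
    parser = TypeParser(tokens)
    try:
        parser.type_()
        parser.match("$")
    except ParseError:
        return False
    return True


print(parses("array [ num dotdot num ] of integer".split()))
```

Each method corresponds to one nonterminal, and the chain of calls made while parsing traces out the parse tree, as described for recursive descent parsing.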

LL(k) Constraint: Left Recursion

A grammar is left recursive if it has a nonterminal A such that A ⇒+ A α

A → A α | β   can be rewritten as   A → β R,  R → α R | ε

[Figure: parse trees before the rewrite (a chain of A nodes growing to the left) and after it (A over β and a chain of R nodes growing to the right); both derive β α*.]

Direct/Immediate Left Recursion

A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn

i.e., A → A αi | βj (i = 1..m; j = 1..n)

is equivalent to …

A → β1 A' | β2 A' | ... | βn A'
A' → α1 A' | α2 A' | ... | αm A' | ε

Both generate (β1 | β2 | ... | βn) (α1 | α2 | ... | αm)*

An Example

E → E + T | T
T → T * F | F
F → ( E ) | id

becomes

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

Indirect Left Recursion

G0:
  S → A a | b
  A → A c | S d | ε

Problem: indirect left recursion: S ⇒ A a ⇒ S d a

Solution, step 1: indirect to direct left recursion:
  A → A c | A a d | b d | ε

Solution, step 2: direct left recursion to right recursion:
  S → A a | b
  A → b d A' | A'
  A' → c A' | a d A' | ε

• Scan rules top-down
• Rules must not start with symbols defined earlier (=> substitute them if any)
• Resolve direct recursion

Indirect Left Recursion

Input. Grammar G with no cycles or ε-productions.
Output. An equivalent grammar with no left recursion.

1. Arrange the nonterminals in some order A1, A2, ..., An
2. for i := 1 to n do begin
     // Step 1: substitute leading symbols of Ai that are previous Aj's
     for j := 1 to i - 1 do begin
       replace each production of the form Ai → Aj γ ( j < i )
       by the productions Ai → δ1 γ | δ2 γ | ... | δk γ,
       where Aj → δ1 | δ2 | ... | δk are all the current Aj-productions;
     end
     // Step 2
     eliminate direct left recursion among the Ai-productions
   end
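The algorithm can be tried out with the short Python sketch below. It assumes a grammar stored as a dict from nonterminal to a list of right-hand-side tuples, with `()` for ε and a prime appended to name the new nonterminal; the function names are invented for this illustration. The slide example G0 contains A → ε, which the stated precondition excludes in general, but this particular grammar still works.

```python
def eliminate_immediate(A, alternatives):
    """A → A α | β  becomes  A → β A',  A' → α A' | ε."""
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == A]
    others = [alt for alt in alternatives if not alt or alt[0] != A]
    if not recursive:
        return {A: list(alternatives)}
    new = A + "'"
    return {
        A: [beta + (new,) for beta in others],
        new: [alpha + (new,) for alpha in recursive] + [()],
    }


def eliminate_left_recursion(grammar, order):
    """Apply the two-step algorithm to `grammar` in the given order."""
    g = {A: list(alts) for A, alts in grammar.items()}
    for i, Ai in enumerate(order):
        for Aj in order[:i]:
            # Step 1: replace Ai → Aj γ by Ai → δ γ for every Aj → δ.
            substituted = []
            for alt in g[Ai]:
                if alt and alt[0] == Aj:
                    substituted.extend(delta + alt[1:] for delta in g[Aj])
                else:
                    substituted.append(alt)
            g[Ai] = substituted
        # Step 2: remove direct left recursion among the Ai-productions.
        g.update(eliminate_immediate(Ai, g[Ai]))
    return g


G0 = {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ()]}
for lhs, alts in eliminate_left_recursion(G0, ["S", "A"]).items():
    print(lhs, "→", " | ".join(" ".join(rhs) or "ε" for rhs in alts))
```

Run on G0 with the order S, A, the sketch yields S → A a | b, A → b d A' | A', A' → c A' | a d A' | ε, matching the hand-worked indirect left recursion example.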

Left Factoring

Two alternatives of a nonterminal A have a nontrivial common prefix α if α ≠ ε and

A → α β1 | α β2

Left factoring rewrites them as

A → α A'
A' → β1 | β2


An Example

Before left factoring:
S → i E t S | i E t S e S | a
E → b

After left factoring:
S → i E t S S' | a
S' → e S | ε
E → b
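One pass of left factoring can be sketched as follows. The encoding (alternatives as lists of symbols, `["ε"]` for the empty alternative) and the names `common_prefix` and `left_factor` are assumptions of this sketch. It factors only one level; if the leftover suffixes again share a prefix, the function would have to be applied to the new nonterminal as well.

```python
EPS = "ε"

def common_prefix(seqs):
    """Longest prefix shared by all symbol lists in seqs."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(nt, alts):
    """A -> α β1 | α β2 | γ  becomes  A -> α A' | γ,  A' -> β1 | β2."""
    grammar = {nt: []}
    groups = {}
    for alt in alts:                      # group alternatives by first symbol
        groups.setdefault(alt[0], []).append(alt)
    for group in groups.values():
        if len(group) == 1:               # nothing to factor
            grammar[nt].append(group[0])
            continue
        prefix = common_prefix(group)
        new_nt = nt + "'" * len(grammar)  # A', A'', ... (naming is an assumption)
        grammar[nt].append(prefix + [new_nt])
        grammar[new_nt] = [alt[len(prefix):] or [EPS] for alt in group]
    return grammar

factored = left_factor("S", [["i", "E", "t", "S"],
                             ["i", "E", "t", "S", "e", "S"],
                             ["a"]])
for nt, alts in factored.items():
    print(nt, "→", " | ".join(" ".join(alt) for alt in alts))
```

Applied to the if-then-else grammar, it produces S → i E t S S' | a together with an S' that derives either e S or ε, matching the example up to the order of alternatives.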


Top-Down Parsing: as Stack Matching

Construct a parse tree from the root to the leaves using leftmost derivation

S → c A B
A → a b | a
B → d
input: cad

[Figure: four successive parse trees for the input cad. (1) S expanded to c A B; (2) a leaf a attached under A; (3) A expanded to a b; (4) A expanded to a and B expanded to d.]


Nonrecursive Predictive Parsing – General State

[Figure: parser model. The parsing program (parser/driver) reads the Input buffer (a b c … x y z), consults the Parsing table, works on the Stack (X on top), and produces the Output.]

Predictive: pre-computed parsing actions, M[X,a] = {X -> Y1 Y2 … Yk}

Non-Recursive: "Stack + Driver Program" (instead of recursive procedures)


Nonrecursive Predictive Parsing – Expand Non-terminal

[Figure: the same parser model. X on the stack has been replaced by Y1 Y2 … Yk according to M[X,a] = {X -> Y1 Y2 … Yk}; the input is a b c … x y z.]


Nonrecursive Predictive Parsing – Match Terminal

[Figure: the same parser model. The stack holds Y1 Y2 … Yk with Y1 = a, which is matched against the input symbol a of a b c … x y z.]


Nonrecursive Predictive Parsing - Error Recovery

[Figure: the same parser model. The stack holds Y1 Y2 … Yk, annotated Y1 = a and Y2 = c, against the input a b c … x y z.]



Stack Operations

Match
– when the top stack symbol is a terminal and it matches the input symbol, pop the top stack symbol and advance the input pointer

Expand
– when the top stack symbol is a nonterminal, replace this symbol by the right-hand side of one of its productions
• Leftmost RHS symbol at Top-of-Stack


An Example

type → simple | id | array [ simple ] of type

simple → integer | char | num dotdot num


An Example

(E = expand, M = match)

Action  Stack (top at right)        Input
E       type                        array [ num dotdot num ] of integer
M       type of ] simple [ array    array [ num dotdot num ] of integer
M       type of ] simple [          [ num dotdot num ] of integer
E       type of ] simple            num dotdot num ] of integer
M       type of ] num dotdot num    num dotdot num ] of integer
M       type of ] num dotdot        dotdot num ] of integer
M       type of ] num               num ] of integer
M       type of ]                   ] of integer
M       type of                     of integer
E       type                        integer
E       simple                      integer
M       integer                     integer


Parsing Program

push $ and then S onto the stack, where S is the start symbol
set ip to point to the first symbol of w$;   // try to match S$ with w$
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error   // or error_recovery()
    else   // X is a nonterminal
        if M[X, a] = X → Y1 Y2 ... Yk then
            pop X from the stack and push Yk ... Y2 Y1 onto the stack   // Y1 ends up on top
        else error   // or error_recovery()
until X = $
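The driver loop above can be made concrete with a small Python sketch. The table is hard-coded for the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id; the dictionary encoding of M, the use of an empty list for an ε-production, and the function name `parse` are assumptions of this sketch rather than anything prescribed by the slides.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[X, a] -> right-hand side; an empty list stands for an ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}

def parse(tokens, start="E"):
    """Return the productions applied, in order, or raise SyntaxError."""
    stack = ["$", start]            # top of stack is the end of the list
    tokens = list(tokens) + ["$"]
    ip = 0
    output = []
    while stack:
        x, a = stack[-1], tokens[ip]
        if x not in NONTERMINALS:   # X is a terminal or $: match
            if x != a:
                raise SyntaxError(f"expected {x!r}, found {a!r}")
            stack.pop()
            ip += 1
        elif (x, a) in TABLE:       # X is a nonterminal: expand
            rhs = TABLE[(x, a)]
            stack.pop()
            stack.extend(reversed(rhs))   # push Yk ... Y1 so that Y1 is on top
            output.append((x, rhs))
        else:
            raise SyntaxError(f"no entry M[{x}, {a}]")
    return output

steps = parse(["id", "+", "id", "*", "id"])
for lhs, rhs in steps:
    print(lhs, "→", " ".join(rhs) or "ε")
```

The loop ends when the bottom marker $ has been matched against the end of the input, which corresponds to the `until X = $` condition of the pseudocode.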


Parser Driven by a Parsing Table: Non-recursive Descent

X() {   // WITHOUT ε-production: X→ε
    if (LA='a') then
        Y1(); Y2(); … Yk();
    else if (LA='b')
        Z1(); Z2(); …; Zm();
    else ERROR();   // no X→ε
    // else RETURN; if X→ε exists
}   // Recursive descent procedure for matching X

      a                  b                  c      d
X     X → Y1 Y2 … Yk     X → Z1 Z2 … Zm
Y1    Y1 → α1            Y1 → α2
Z1    Z1 → β1            Z1 → β2

'a' in FirstSet( Y1 Y2 … Yk )

'b' in FirstSet( Z1 Z2 … Zm )


Parser Driven by a Parsing Table: Non-recursive Descent

X() {   // WITH ε-production: X→ε
    if (LA='a') then
        Y1(); Y2(); … Yk();
    else if (LA='b')
        Z1(); Z2(); …; Zm();
    // else ERROR();   // no X→ε
    else if (LA=??) RETURN;   // if X→ε exists
}   // Recursive descent procedure for matching X

      a                  b                  c      d
X     X → Y1 Y2 … Yk     X → Z1 Z2 … Zm            X → ε
Y1    Y1 → α1            Y1 → α2
Z1    Z1 → β1            Z1 → β2

'a' in FirstSet( Y1 Y2 … Yk )

'b' in FirstSet( Z1 Z2 … Zm )

'd' in FollowSet(X)   (S =>* … X d …)


First Sets: Predictive Parsing

The first set of a string α is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in the first set of α.
• ε is used simply to flag whether α can be null when computing the First Set
• ε is not for matching any real input when parsing

FIRST(α) = { a | α ⇒* a β } ∪ { ε, if α ⇒* ε }
FIRST(α) includes ε: means that α ⇒* ε


Compute First Sets

• If X is a terminal, then FIRST(X) is {X}
• If X is a nonterminal and X → ε is a production, then add ε to FIRST(X)
• If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then add a to FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1).
• If ε is in FIRST(Yj) for all j, then add ε to FIRST(X)
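These rules are usually applied repeatedly until no FIRST set grows any more. A minimal fixed-point sketch in Python follows; the grammar encoding (an empty list for an ε-production), the explicit `terminals` argument, and the names `first_of_string` and `compute_first` are assumptions made for illustration.

```python
EPS = "ε"

def first_of_string(symbols, first):
    """FIRST of a symbol string Y1 Y2 ... Yk, given FIRST of each symbol."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result          # Yi is not nullable: later symbols are hidden
    result.add(EPS)                # every Yi is nullable (or the string is empty)
    return result

def compute_first(grammar, terminals):
    first = {t: {t} for t in terminals}          # FIRST(X) = {X} for terminals
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:                               # iterate until nothing is added
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_string(alt, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = compute_first(GRAMMAR, {"+", "*", "(", ")", "id"})
for nt in GRAMMAR:
    print(f"FIRST({nt}) =", sorted(FIRST[nt]))
```

The helper `first_of_string` is the piece a table constructor needs later, since table entries are driven by FIRST of a whole right-hand side rather than of a single symbol.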


Follow Sets: Matching Empty

What to do when matching null: A → ε ?
• TD recursive descent parsing: "assumes" success
• LL: more predictive => Follow Set of 'A'

The follow set of a nonterminal A is the set of terminals that can appear immediately to the right of A in some sentential form, namely, if

S ⇒* α A a β

then a is in the follow set of A.


Compute Follow Sets

• Initialization: place $ in FOLLOW(S), where S is the start symbol and $ is the input right end marker.
• If there is a production A → α B β, then everything in FIRST(β) except for ε is placed in FOLLOW(B)
  – ε is not considered a visible input to follow any symbol
• If there is a production A → α B, or A → α B β where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B)
  – S ⇒* … A a … implies S ⇒* … α B a …
  – YES: "every symbol that can follow A will also follow B"
  – NO!: "every symbol that can follow B will also follow A"
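The three FOLLOW rules can likewise be run to a fixed point. The sketch below is self-contained, so it recomputes FIRST in a compact form before computing FOLLOW. The encoding (empty list for ε, any symbol that is not a dictionary key treated as a terminal) and the function names are assumptions of this sketch.

```python
EPS, END = "ε", "$"

def first_of(symbols, first):
    out = set()
    for s in symbols:
        out |= first[s] - {EPS}
        if EPS not in first[s]:
            return out
    return out | {EPS}

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    for alts in grammar.values():          # symbols that are not keys are terminals
        for alt in alts:
            for s in alt:
                first.setdefault(s, {s})
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of(alt, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                 # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                for i, b in enumerate(alt):
                    if b not in grammar:   # only nonterminals have FOLLOW sets
                        continue
                    beta_first = first_of(alt[i + 1:], first)
                    new = beta_first - {EPS}          # rule 2
                    if EPS in beta_first:             # rule 3: β is empty or nullable
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FOLLOW = compute_follow(GRAMMAR, "E", compute_first(GRAMMAR))
for nt in GRAMMAR:
    print(f"FOLLOW({nt}) =", sorted(FOLLOW[nt]))
```

Note the direction of rule 3 in the code: FOLLOW(A) flows into FOLLOW(B), never the other way round, which is exactly the YES/NO distinction made on the slide.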


An Example

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }


Constructing Parsing Table

Input. Grammar G.

Output. Parsing Table M.

Method.

1. For each production A → α of the grammar, do steps 2 and 3.

2. For each terminal a in FIRST(α), add A → α to M[A, a].

3. If ε is in FIRST(α) [α ⇒* ε], add A → α to M[A, b] for each terminal b [including '$'] in FOLLOW(A).
   - If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].

4. Make each undefined entry of M be error.
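Steps 1 to 3 amount to a double loop over productions and lookaheads. The sketch below hard-codes the FIRST and FOLLOW sets of the expression grammar from the preceding example so that it stays short. The encoding (empty list for ε, table cells as lists of right-hand sides so that conflicts remain visible) and the names `first_of` and `build_table` are assumptions of this sketch.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"},
          "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
          "F": {"+", "*", ")", "$"}}

def first_of(symbols):
    out = set()
    for s in symbols:
        f = FIRST.get(s, {s})          # a terminal's FIRST set is the terminal itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}

def build_table(grammar):
    table = {}
    for a, alts in grammar.items():            # step 1: every production A -> α
        for alt in alts:
            f = first_of(alt)
            lookaheads = f - {EPS}             # step 2: terminals in FIRST(α)
            if EPS in f:
                lookaheads |= FOLLOW[a]        # step 3: FOLLOW(A), including $
            for t in lookaheads:
                table.setdefault((a, t), []).append(alt)
    return table                               # step 4: missing keys mean "error"

M = build_table(GRAMMAR)
conflicts = {cell: rhs for cell, rhs in M.items() if len(rhs) > 1}
print(len(M), "entries,", len(conflicts), "conflicts")
```

Keeping a list per cell is a deliberate choice here: a cell with more than one production is precisely a multiply-defined entry, so the same function doubles as an LL(1) check.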


LL(1) Parsing Table Construction

A() {   // WITH/WITHOUT ε-productions: A → α (α ⇒* ε)
    if (LA='a' in First(Y1 Y2 … Yk)) then
        Y1(); Y2(); … Yk();
    else if (LA='b' in Follow(A) & ε in First(Z1 Z2 … Zm))
        Z1(); Z2(); …; Zm();   // Nullable
    else ERROR();
}   // Recursive version of LL(1) parser

     a in First(α)    b in Follow(A)      c not in First(α) or Follow(A)
A    A → α            A → α (α ⇒* ε)      error
B
C

When to apply A → α ?   (including A → ε)
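The recursive version of the LL(1) parser can be written out for the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id. In this sketch each nonterminal becomes one method, the ε-alternative is chosen when the lookahead is in the FOLLOW set, and everything else is an error. The class name `Parser`, the method names, and the helper `accepts` are assumptions made for illustration.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]
        self.pos = 0

    @property
    def la(self):                       # lookahead symbol (LA on the slide)
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.la != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.la!r}")
        self.pos += 1

    def E(self):                        # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.la == "+":              # '+' in FIRST(+ T E')
            self.match("+")
            self.T()
            self.E_prime()
        elif self.la in (")", "$"):     # lookahead in FOLLOW(E'): apply E' -> ε
            return
        else:
            raise SyntaxError(f"unexpected {self.la!r} in E'")

    def T(self):                        # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.la == "*":              # '*' in FIRST(* F T')
            self.match("*")
            self.F()
            self.T_prime()
        elif self.la in ("+", ")", "$"):  # lookahead in FOLLOW(T'): apply T' -> ε
            return
        else:
            raise SyntaxError(f"unexpected {self.la!r} in T'")

    def F(self):
        if self.la == "id":             # F -> id
            self.match("id")
        elif self.la == "(":            # F -> ( E )
            self.match("(")
            self.E()
            self.match(")")
        else:
            raise SyntaxError(f"unexpected {self.la!r} in F")

def accepts(tokens):
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")               # the whole input must be consumed
        return True
    except SyntaxError:
        return False

print(accepts(["id", "+", "id", "*", "id"]))
```

Each `if`/`elif` branch corresponds to one column group of the table: FIRST of the right-hand side for ordinary productions, FOLLOW of the nonterminal for the nullable one.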


An Example

      id         +            *            (          )         $
E     E → TE'                              E → TE'
E'               E' → +TE'                            E' → ε    E' → ε
T     T → FT'                              T → FT'
T'               T' → ε       T' → *FT'               T' → ε    T' → ε
F     F → id                               F → (E)


An Example

Stack       Input             Output
$E          id + id * id$
$E'T        id + id * id$     E → TE'
$E'T'F      id + id * id$     T → FT'
$E'T'id     id + id * id$     F → id
$E'T'       + id * id$
$E'         + id * id$        T' → ε
$E'T+       + id * id$        E' → +TE'
$E'T        id * id$
$E'T'F      id * id$          T → FT'
$E'T'id     id * id$          F → id
$E'T'       * id$
$E'T'F*     * id$             T' → *FT'
$E'T'F      id$
$E'T'id     id$               F → id
$E'T'       $
$E'         $                 T' → ε
$           $                 E' → ε


LL(1) Grammars

A grammar is an LL(1) grammar if its predictive parsing table has no multiply-defined entries.


A Counter Example

S → i E t S S' | a
S' → e S | ε
E → b

      a         b         e                     i                  t     $
S     S → a                                     S → i E t S S'
S'                        S' → ε, S' → e S                               S' → ε
E               E → b

e ∈ FOLLOW(S')

e ∈ FIRST(e S)
Disambiguation: matching closest "then"


LL(1) Grammars or Not ??

A grammar G is LL(1) iff whenever A → α | β are two distinct productions of G, the following conditions hold:
– For no terminal a do both α and β derive strings beginning with a.
  • or… M[A, first(α) & first(β)] entries will have conflicting actions
– At most one of α and β can derive the empty string
  • or… M[A, follow(A)] entries have conflicting actions
– If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
  • or… M[A, first(α) & follow(A)] entries have conflicting actions
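The three conditions can be checked pairwise without building the table. The sketch below does so for the if-then-else grammar of the counter example, with its FIRST and FOLLOW sets written out by hand. The encoding (empty list for ε), the labels "FIRST/FIRST" and "FIRST/FOLLOW", and the function name `ll1_violations` are assumptions of this sketch.

```python
from itertools import combinations

EPS = "ε"

GRAMMAR = {
    "S": [["i", "E", "t", "S", "S'"], ["a"]],
    "S'": [["e", "S"], []],
    "E": [["b"]],
}
FIRST = {"S": {"i", "a"}, "S'": {"e", EPS}, "E": {"b"}}
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}

def first_of(symbols):
    out = set()
    for s in symbols:
        f = FIRST.get(s, {s})          # a terminal's FIRST set is the terminal itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}

def ll1_violations(grammar):
    problems = []
    for a, alts in grammar.items():
        for alpha, beta in combinations(alts, 2):
            fa, fb = first_of(alpha), first_of(beta)
            # Conditions 1 and 2: a shared terminal, or ε in both FIRST sets.
            if fa & fb:
                problems.append((a, "FIRST/FIRST", fa & fb))
            # Condition 3, checked in both directions.
            for x, y in ((fa, fb), (fb, fa)):
                if EPS in y and x & FOLLOW[a]:
                    problems.append((a, "FIRST/FOLLOW", x & FOLLOW[a]))
    return problems

violations = ll1_violations(GRAMMAR)
print(violations)
```

For this grammar the only violation is on S': the terminal e begins e S and is also in FOLLOW(S'), which is the same conflict that shows up as the doubly defined table entry M[S', e].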


Non-LL(1) Grammar: Ambiguous According to LL(1) Parsing Table Construction

     a in First(α) & First(β)    b in Follow(A)       a in First(α) & Follow(A)
A    A → α                       A → α (α ⇒* ε)       A → α (α ⇏* ε, but α ⇒* a …)
     A → β                       A → β (β ⇒* ε)       A → β (β ⇒* ε)
B
C

When will A → α and A → β appear in the same table cell ??

S' → e S | ε        X → X a | b


LL(1) Grammars or Not??

If G is left-recursive or ambiguous, then M will have at least one multiply-defined entry => non-LL(1)

E.g., X → X a | b

=> FIRST(X) = {b} (and, of course, FIRST(b) = {b})

=> M[X,b] includes both {X → X a} and {X → b}

i.e., ambiguous grammars and grammars with left-recursive productions cannot be LL(1).

No LL(1) grammar can be ambiguous


Error Recovery for LL Parsers


Syntactic Errors

• Empty entries in a parsing table:
  – A syntactic error is encountered when the lookahead symbol corresponding to this entry is in the input buffer
  – Error recovery information can be encoded in such entries to take appropriate actions upon error
• Error Detection:
  – (1) Stacktop = x && x != input (a)
  – (2) Stacktop = A && M[A, a] = empty (error)


Error Recovery Strategies

• Panic mode: skip tokens until a token in a set of synchronizing tokens appears
  – INS (insertion) type of errors
  – sync at delimiters, keywords, …, that have clear functions
• Phrase Level Recovery
  – local INS (insertion), DEL (deletion), SUB (substitution) types of errors
• Error Production
  – define error patterns ("error productions") in grammar
• Global Correction [Grammar Correction]
  – minimum distance correction


Error Recovery – Panic Mode

Panic mode: skip tokens until a token in a set of synchronizing tokens appears

Commonly used synchronizing tokens:
– SUB(A,ip): use FOLLOW(A) as sync set for A (pop A)
– use the FIRST set of a higher construct as sync set for a lower construct
– INS(ip): use FIRST(A) as sync set for A
– *ip = ε: use the production deriving ε as the default
– DEL(ip): if a terminal on the stack cannot be matched, pop the terminal


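Error Recovery – Panic Mode

[Figure: stack and input configurations (columns Action, Stack, Input) for the three recovery actions: SUB(A,ip), with A on top of the stack and the input synchronized on Follow(A); INS(ip), with A on top of the stack and the input synchronized on First(A); DEL(ip), with a terminal x on top of the stack that *ip does not match.]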


Error Recovery Actions Using Follow & First Sets to Sync

Expanding non-terminal A:
• M[A,a] = error (blank):
  – Skip "a" in input = delete all such "a" (until sync with sync symbol, b) /* panic */
• M[A,b] = sync (at FOLLOW(A)):
  – Pop "A" from stack = "b" is a sync symbol following A
• M[A,b] = A → α (== sync at FIRST(A)):
  – Expand A as α (same as normal parsing action)

Matching terminal "x":
• (*sp = "x") != "a":
  – Pop(x) from stack = missing input token "x"


An Example

      id         +            *            (          )         $
E     E → TE'                              E → TE'    sync      sync
E'               E' → +TE'                            E' → ε    E' → ε
T     T → FT'    sync                      T → FT'    sync      sync
T'               T' → ε       T' → *FT'               T' → ε    T' → ε
F     F → id     sync         sync         F → (E)    sync      sync

FOLLOW(F) = { +, *, ), $ }

FOLLOW(E) = FOLLOW(E') = { ), $ }

FIRST(X) is used to expand non-ε-productions or to sync (on errors)

FOLLOW(X) is used to expand ε-productions or to sync (on errors)


An Example

Stack       Input             Output
$E          ) id * + id$      error, skip )
$E          id * + id$        id is in FIRST(E)
$E'T        id * + id$        E → TE'
$E'T'F      id * + id$        T → FT'
$E'T'id     id * + id$        F → id
$E'T'       * + id$
$E'T'F*     * + id$           T' → *FT'
$E'T'F      + id$             error, M[F,+] = sync / FOLLOW(F)
$E'T'       + id$             F popped
$E'         + id$             T' → ε
$E'T+       + id$             E' → +TE'
$E'T        id$
$E'T'F      id$               T → FT'
$E'T'id     id$               F → id
$E'T'       $
$E'         $                 T' → ε
$           $                 E' → ε
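The trace above can be reproduced with a small panic-mode driver. This is a sketch under explicit assumptions: the table with its sync entries is hard-coded as a dictionary, an empty list stands for an ε-production, and the function name `parse_with_recovery` is invented here. One rule is also an assumption of this sketch rather than something stated on the slides: a sync entry is treated as "skip the input symbol" when the nonterminal is the only symbol above $, because popping it would leave nothing to parse with. That is what makes the first step skip ")" even though M[E, )] is marked sync.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
SYNC = "sync"

TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E", ")"): SYNC, ("E", "$"): SYNC,
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T", "+"): SYNC, ("T", ")"): SYNC, ("T", "$"): SYNC,
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
    ("F", "+"): SYNC, ("F", "*"): SYNC, ("F", ")"): SYNC, ("F", "$"): SYNC,
}

def parse_with_recovery(tokens, start="E"):
    """Parse to the end of the input and return the list of recovery actions."""
    stack = ["$", start]
    tokens = list(tokens) + ["$"]
    ip, errors = 0, []
    while stack:
        x, a = stack[-1], tokens[ip]
        if x not in NONTERMINALS:              # terminal or $ on top
            if x == a:
                stack.pop()
                ip += 1
            elif x == "$":                     # leftover input: skip it
                errors.append(("skip", a))
                ip += 1
            else:                              # missing token x: pop it
                errors.append(("pop", x))
                stack.pop()
            continue
        entry = TABLE.get((x, a))
        if isinstance(entry, list):            # normal expansion
            stack.pop()
            stack.extend(reversed(entry))
        elif a == "$" or (entry == SYNC and len(stack) > 2):
            errors.append(("pop", x))          # sync: give up on nonterminal x
            stack.pop()
        else:                                  # blank entry: skip the input symbol
            errors.append(("skip", a))
            ip += 1
    return errors

recovery = parse_with_recovery([")", "id", "*", "+", "id"])
print(recovery)
```

The end marker is never skipped: when the lookahead is $ and no production applies, the sketch pops the stack symbol instead, so the loop always terminates.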


Parse Tree - Error Recovered

[Figure: parse tree built after error recovery for the input ) id * + id, rooted at E, showing the skipped ")" and the popped F.]

) id * + id => id * F + id

