CMPT 755 Compilers - cs.sfu.caanoop/courses/CMPT-755-Fall-2004/parsing… · Predictive Top-Down...


CMPT 755
Compilers
Anoop Sarkar

http://www.cs.sfu.ca/~anoop

Parsing - Roadmap

• Parser:
  – decision procedure: builds a parse tree
• Top-down vs. bottom-up
• LL(1) – Deterministic Parsing
  – recursive-descent
  – table-driven
• LR(k) – Deterministic Parsing
  – LR(0), SLR(1), LR(1), LALR(1)
• Parsing arbitrary CFGs – Polynomial time parsing

Top-Down vs. Bottom-Up

Grammar:
S → A B
A → c | ε
B → cbB | ca

Input String: ccbca

Top-Down/leftmost:
S ⇒ AB ⇒ cB ⇒ ccbB ⇒ ccbca
Rules applied, in order: S → AB, A → c, B → cbB, B → ca

Bottom-Up/rightmost:
ccbca ⇐ Acbca ⇐ AcbB ⇐ AB ⇐ S
Rules applied, in order: A → c, B → ca, B → cbB, S → AB

Top-Down: Backtracking

S → A B
A → c | ε
B → cbB | ca

True/False: S ⇒* cbca?

Sentential form   Input   Action
S                 cbca    try S→AB
AB                cbca    try A→c
cB                cbca    match c
B                 bca     dead-end, try A→ε
εB                cbca    try B→cbB
cbB               cbca    match c
bB                bca     match b
B                 ca      try B→cbB
cbB               ca      match c
bB                a       dead-end, try B→ca
ca                ca      match c
a                 a       match a, Done!

Backtracking

Input: cad

Grammar 1:
S → cAd | c
A → ad | a
Result: Failure

Grammar 2:
S → cAd | c
A → a | ad
Result: Success

Recursive descent parser does not backtrack into rules that succeed

Transition Diagram

S → cAa
A → cB | B
B → bcB | ε

[Diagram for S: a path with edges labeled c, A, a]

Predictive Top-Down Parser

• Knows which production to choose based on a single lookahead symbol
• Need LL(1) grammars
  – First L: reads input Left to right
  – Second L: produce Leftmost derivation
  – 1: one symbol of lookahead
• Can’t have left-recursion
• Must be left-factored (no left-factors)
• Not all grammars can be made LL(1)

Leftmost derivation for id + id * id

E → E + E
E → E * E
E → ( E )
E → - E
E → id

E ⇒ E + E
  ⇒ id + E
  ⇒ id + E * E
  ⇒ id + id * E
  ⇒ id + id * id

E ⇒*lm id + E * E

Predictive Parsing Table

Productions:
1. T → F T’
2. T’ → ε
3. T’ → * F T’
4. F → id
5. F → ( T )

      *              (             )          id           $
T                    T → F T’                 T → F T’
T’    T’ → * F T’                  T’ → ε                  T’ → ε
F                    F → ( T )                F → id

Trace “(id)*id”

The stack is written with its bottom ($) on the left. The Output column shows the production whose expansion produced the stack in that row.

Stack        Input       Output
$T           (id)*id$
$T’F         (id)*id$    T → F T’
$T’)T(       (id)*id$    F → ( T )
$T’)T        id)*id$
$T’)T’F      id)*id$     T → F T’
$T’)T’id     id)*id$     F → id
$T’)T’       )*id$
$T’)         )*id$       T’ → ε
$T’          *id$
$T’F*        *id$        T’ → * F T’
$T’F         id$
$T’id        id$         F → id
$T’          $
$            $           T’ → ε

Table-Driven Parsing

stack.push($); stack.push(S);
a = input.read();
forever do begin
    X = stack.peek();
    if X = a and a = $ then return SUCCESS;
    elsif X = a and a != $ then
        pop X; a = input.read();
    elsif X != a and X ∈ N and M[X,a] is defined then
        pop X; push the right-hand side of M[X,a] in reverse order, so that its leftmost symbol ends up on top;
    else ERROR!
end
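The driver loop above can be sketched in Python as follows. The table is the predictive parsing table for the T/T’/F grammar shown earlier; the names `TABLE`, `NONTERMINALS` and `ll1_parse`, and the choice to return the list of productions used, are assumptions of this sketch rather than part of the slides.

```python
# Predictive parsing table M[X, a] for:
#   T -> F T'    T' -> * F T' | ε    F -> id | ( T )
TABLE = {
    ("T", "("): ("F", "T'"),
    ("T", "id"): ("F", "T'"),
    ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): (),            # T' -> ε
    ("T'", "$"): (),            # T' -> ε
    ("F", "id"): ("id",),
    ("F", "("): ("(", "T", ")"),
}
NONTERMINALS = {"T", "T'", "F"}

def ll1_parse(tokens, start="T"):
    """Return the productions used (a leftmost derivation) or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    pos = 0
    stack = ["$", start]            # top of stack is the end of the list
    output = []
    while True:
        top, a = stack[-1], tokens[pos]
        if top == a == "$":
            return output
        if top == a:                # matching terminal: pop and advance
            stack.pop()
            pos += 1
        elif top in NONTERMINALS and (top, a) in TABLE:
            rhs = TABLE[(top, a)]
            stack.pop()
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
            output.append((top, rhs))
        else:
            raise SyntaxError(f"unexpected {a!r} with {top!r} on the stack")
```

Calling `ll1_parse(["(", "id", ")", "*", "id"])` applies the same productions, in the same order, as the trace of “(id)*id”.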

Predictive Parsing table

• Given a grammar, produce the predictive parsing table

• For all rules A → α | β we need to know which lookahead symbols select each alternative

• Based on the lookahead symbol the table can be used to pick which rule to push onto the stack

• This can be done using two sets: FIRST and FOLLOW

FIRST and FOLLOW

Conditions for LL(1)

• Necessary conditions:
  – no ambiguity
  – no left recursion
  – Left factored grammar
• A grammar G is LL(1) iff, whenever A → α | β:
  1. First(α) ∩ First(β) = ∅
  2. α ⇒* ε implies !(β ⇒* ε)
  3. α ⇒* ε implies First(β) ∩ Follow(A) = ∅

proc First(α: string of symbols)
// assume α = X1 X2 X3 … Xn
if X1 ∈ T then First(α) := {X1}
else begin
    i := 1;
    First(α) := First(X1)\{ε};
    while i <= n and Xi ⇒* ε do begin
        if i < n then First(α) := First(α) ∪ First(Xi+1)\{ε};
        else First(α) := First(α) ∪ {ε};
        i := i + 1;
    end
end

proc First(X); modified
foreach X ∈ T do First(X) := {X};
foreach p ∈ P : X → ε do First(X) := {ε};
repeat
    foreach X ∈ N, p : X → Y1 Y2 Y3 … Yn do begin
        i := 1;
        while i <= n and Yi ⇒* ε do begin
            First(X) := First(X) ∪ First(Yi)\{ε};
            i := i+1;
        end
        if i = n+1 then First(X) := First(X) ∪ {ε};
        else First(X) := First(X) ∪ First(Yi);
    end
until no change in any First(X);

proc Follow(N: non-terminal)
Follow(S) := {$};
repeat
    foreach p ∈ P do
        case p = A → αBβ begin
            Follow(B) := Follow(B) ∪ First(β)\{ε};
            if ε ∈ First(β) then Follow(B) := Follow(B) ∪ Follow(A);
        end
        case p = A → αB
            Follow(B) := Follow(B) ∪ Follow(A);
until no change in any Follow(N)
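The two fixed-point procedures above can be sketched together in Python. The representation is an assumption of this sketch: a grammar is a dict from nonterminal to a list of right-hand sides, the empty list stands for ε, and the names `first_of_string` and `first_follow` are hypothetical.

```python
EPS = "ε"
END = "$"

def first_of_string(symbols, first):
    """FIRST of a symbol string, given FIRST sets of the single symbols."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)      # every symbol was nullable, or the string is empty
    return result

def first_follow(grammar, start):
    nonterminals = set(grammar)
    terminals = {s for rhss in grammar.values() for rhs in rhss for s in rhs}
    terminals -= nonterminals
    first = {t: {t} for t in terminals}
    first.update({n: set() for n in nonterminals})
    changed = True
    while changed:                       # iterate FIRST to a fixed point
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                new = first_of_string(rhs, first)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    follow = {n: set() for n in nonterminals}
    follow[start].add(END)
    changed = True
    while changed:                       # iterate FOLLOW to a fixed point
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in nonterminals:
                        continue
                    tail = first_of_string(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:      # β is nullable or empty
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow

# Grammar S -> cAa, A -> cB | B, B -> bcB | ε
GRAMMAR = {
    "S": [["c", "A", "a"]],
    "A": [["c", "B"], ["B"]],
    "B": [["b", "c", "B"], []],
}
FIRST, FOLLOW = first_follow(GRAMMAR, "S")
```

For this grammar the sketch yields First(A) = {b, c, ε}, First(B) = {b, ε}, Follow(A) = Follow(B) = {a}.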

Example First/Follow

S → AB
A → c | ε
B → cbB | ca

First(A) = {c, ε}    Follow(A) = {c}
First(B) = {c}       Follow(B) = {$}
First(S) = {c}       Follow(S) = {$}

Follow(A) ∩ First(c) = {c}
First(cbB) = First(ca) = {c}

Not an LL(1) grammar

Converting to LL(1)

Original grammar:
S → AB
A → c | ε
B → cbB | ca

Strings generated: c (c b c b … c b) c a   or   (c b c b … c b) c a

Note that the grammar is regular: c? (cb)* ca

same as: c c? (bc)* a

LL(1) grammar:
S → cAa
A → cB | B
B → bcB | ε

Strings generated: c c (b c b … c b c) a   or   c (b c b … c b c) a

Verifying LL(1) using F/F sets

S → cAa
A → cB | B
B → bcB | ε

First(S) = {c}          Follow(S) = {$}
First(A) = {b, c, ε}    Follow(A) = {a}
First(B) = {b, ε}       Follow(B) = {a}

Building the Parse Table

• Compute First and Follow sets
• For each production A → α
  – foreach a ∈ First(α) add A → α to M[A,a]
  – If ε ∈ First(α) add A → α to M[A,b] for each b in Follow(A)
  – If ε ∈ First(α) add A → α to M[A,$] if $ ∈ Follow(A)
  – All undefined entries are errors
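A minimal sketch of the table-filling rules above, in Python. To keep it short, First(α) for each right-hand side and the Follow sets are written out by hand from the examples in these slides; `build_table` and the tuple layout are assumptions of the sketch. A cell with more than one production is reported as a conflict, which means the grammar is not LL(1).

```python
EPS = "ε"

def build_table(productions, follow):
    """productions: list of (lhs, rhs, First(rhs)). Returns (table, conflicts)."""
    table = {}
    conflicts = set()
    for lhs, rhs, first_rhs in productions:
        lookaheads = first_rhs - {EPS}
        if EPS in first_rhs:
            # Follow(lhs) already contains $ when $ can follow lhs.
            lookaheads |= follow[lhs]
        for a in lookaheads:
            if (lhs, a) in table:
                conflicts.add((lhs, a))
            table.setdefault((lhs, a), []).append(rhs)
    return table, conflicts

# LL(1) version: S -> cAa, A -> cB | B, B -> bcB | ε
GOOD = [
    ("S", ("c", "A", "a"), {"c"}),
    ("A", ("c", "B"), {"c"}),
    ("A", ("B",), {"b", EPS}),
    ("B", ("b", "c", "B"), {"b"}),
    ("B", (), {EPS}),
]
GOOD_FOLLOW = {"S": {"$"}, "A": {"a"}, "B": {"a"}}

# Original version: S -> AB, A -> c | ε, B -> cbB | ca
BAD = [
    ("S", ("A", "B"), {"c"}),
    ("A", ("c",), {"c"}),
    ("A", (), {EPS}),
    ("B", ("c", "b", "B"), {"c"}),
    ("B", ("c", "a"), {"c"}),
]
BAD_FOLLOW = {"S": {"$"}, "A": {"c"}, "B": {"$"}}

good_table, good_conflicts = build_table(GOOD, GOOD_FOLLOW)
bad_table, bad_conflicts = build_table(BAD, BAD_FOLLOW)
```

The first grammar fills six cells without conflicts; the second has two productions in both M[A,c] and M[B,c].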

Revisit conditions for LL(1)

• A grammar G is LL(1) iff, whenever A → α | β:
  1. First(α) ∩ First(β) = ∅
  2. α ⇒* ε implies !(β ⇒* ε)
  3. α ⇒* ε implies First(β) ∩ Follow(A) = ∅
• No more than one entry per table field

Error Handling

• Reporting & Recovery
  – Report as soon as possible
  – Suitable error messages
  – Resume after error
  – Avoid cascading errors
• Phrase-level vs. Panic-mode recovery

Panic-Mode Recovery

• Skip tokens until synchronizing set is seen
  – Follow(A)
    • garbage or missing things after
  – Higher-level start symbols
  – First(A)
    • garbage before
  – Epsilon
    • if nullable
  – Pop/Insert terminal
    • “auto-insert”
• Add “synch” actions to table

Summary so far

• LL(1) grammars
  – necessary conditions
    • No left recursion
    • Left-factored
• Not all languages can be generated by an LL(1) grammar
• LL(1) grammars can be parsed by a simple predictive recursive-descent parser
  – Alternative: table-driven top-down parser

Bottom-up parsing overview

• Start from terminal symbols, search for a path to the start symbol
• Apply shift and reduce actions: postpone decisions
• LR parsing:
  – L: left to right parsing
  – R: rightmost derivation (in reverse or bottom-up)
• LR(0) → SLR(1) → LR(1) → LALR(1)
  – 0 or 1 or k lookahead symbols

Actions in Shift-Reduce Parsing

• Shift
  – add terminal to parse stack, advance input
• Reduce
  – If αw is on the stack, and A → w, and there is a β ∈ T* such that S ⇒*rm αAβ ⇒rm αwβ, then we can prune the handle w; we reduce αw to αA on the stack
  – αw is a viable prefix
• Error
• Accept

Questions

• When to shift/reduce?
  – What are valid handles?
  – Ambiguity: Shift/reduce conflict
• If reducing, using which production?
  – Ambiguity: Reduce/reduce conflict

Rightmost derivation for id + id * id

E → E + E
E → E * E
E → ( E )
E → - E
E → id

E ⇒ E * E
  ⇒ E * id
  ⇒ E + E * id
  ⇒ E + id * id
  ⇒ id + id * id

Read bottom-up, the parser starts from id + id * id: shift, then reduce with E → id.

E ⇒*rm E + E * id

LR Parsing

• Table-based parser
  – Creates rightmost derivation (in reverse)
  – For “less massaged” grammars than LL(1)
• Data structures:
  – Stack of states/symbols {s}
  – Action table: action[s, a]; a ∈ T
  – Goto table: goto[s, X]; X ∈ N

Action/Goto Table

Productions:
1. T → F
2. T → T*F
3. F → id
4. F → (T)

State   *     (     )     id    $      T    F
0             S5          S8           2    1
1       R1    R1    R1    R1    R1
2       S3                      Acc!
3             S5          S8                4
4       R2    R2    R2    R2    R2
5             S5          S8           6    1
6       S3          S7
7       R4    R4    R4    R4    R4
8       R3    R3    R3    R3    R3

Trace “(id)*id”

Stack      Input            Action
0          ( id ) * id $    Shift S5
0 5        id ) * id $      Shift S8
0 5 8      ) * id $         Reduce 3 F→id, pop 8, goto [5,F]=1
0 5 1      ) * id $         Reduce 1 T→F, pop 1, goto [5,T]=6
0 5 6      ) * id $         Shift S7
0 5 6 7    * id $           Reduce 4 F→(T), pop 7 6 5, goto [0,F]=1
0 1        * id $           Reduce 1 T→F, pop 1, goto [0,T]=2
0 2        * id $           Shift S3
0 2 3      id $             Shift S8
0 2 3 8    $                Reduce 3 F→id, pop 8, goto [3,F]=4
0 2 3 4    $                Reduce 2 T→T*F, pop 4 3 2, goto [0,T]=2
0 2        $                Accept

Tracing LR: action[s, a]

• case shift u:
  – push state u
  – read new a
• case reduce r:
  – lookup production r: X → Y1..Yk;
  – pop k states, find state u
  – push goto[u, X]
• case accept: done
• no entry in action table: error
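The four cases above can be sketched as a small Python driver. The tables are transcribed from the Action/Goto table for the grammar T → F | T*F, F → id | (T); the encoding (`("s", u)`, `("r", r)`, `("acc",)`) and the name `lr_parse` are assumptions of this sketch.

```python
# Production number -> (left-hand side, length of right-hand side)
PRODUCTIONS = {1: ("T", 1), 2: ("T", 3), 3: ("F", 1), 4: ("F", 3)}
TERMINALS = ["*", "(", ")", "id", "$"]

ACTION = {
    (0, "("): ("s", 5), (0, "id"): ("s", 8),
    (2, "*"): ("s", 3), (2, "$"): ("acc",),
    (3, "("): ("s", 5), (3, "id"): ("s", 8),
    (5, "("): ("s", 5), (5, "id"): ("s", 8),
    (6, "*"): ("s", 3), (6, ")"): ("s", 7),
}
# LR(0) reduce states reduce on every terminal.
for state, prod in ((1, 1), (4, 2), (7, 4), (8, 3)):
    for t in TERMINALS:
        ACTION[(state, t)] = ("r", prod)

GOTO = {(0, "T"): 2, (0, "F"): 1, (3, "F"): 4, (5, "T"): 6, (5, "F"): 1}

def lr_parse(tokens):
    """Return the production numbers reduced, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack, pos, reductions = [0], 0, []
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:                       # no entry: error
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        if act[0] == "s":                     # shift u: push state, read new a
            stack.append(act[1])
            pos += 1
        elif act[0] == "r":                   # reduce r: pop k states, push goto
            lhs, k = PRODUCTIONS[act[1]]
            del stack[-k:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(act[1])
        else:                                 # accept
            return reductions
```

For “(id)*id” the reductions come out as 3, 1, 4, 1, 3, 2, which is the rightmost derivation in reverse.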

Configuration set

• Each set is a parser state
• Consider
    T → T * • F
    F → • ( T )
    F → • id
• Like NFA-to-DFA conversion

Closure

Closure property:
• If T → X1 … Xi • Xi+1 … Xn is in the set, and Xi+1 is a nonterminal, then Xi+1 → • Y1 … Ym is in the set as well, for all productions Xi+1 → Y1 … Ym
• Compute as fixed point

Starting Configuration

• Augment Grammar with S’
• Add production S’ → S
• Initial configuration set is closure(S’ → • S)

Grammar:
S’ → T
T → F | T * F
F → id | ( T )

Example: I = closure(S’ → • T)

S’ → • T
T → • T * F
T → • F
F → • id
F → • ( T )

Successor(I, X)

Informally: “move by symbol X”
1. move dot to the right in all items where dot is before X
2. remove all other items (viable prefixes only!)
3. compute closure

Successor Example

S’ → T
T → F | T * F
F → id | ( T )

I = {S’ → • T, T → • F, T → • T * F, F → • id, F → • ( T ) }

Compute Successor(I, “(“):

{ F → ( • T ), T → • F, T → • T * F, F → • id, F → • ( T ) }

Sets-of-Items Construction

Family of configuration sets

function items(G’)
    C = { closure({S’ → • S}) };
    do
        foreach I ∈ C do
            foreach X ∈ (N ∪ T) do
                C = C ∪ { Successor(I, X) };
    while C changes;
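Closure, Successor and the sets-of-items construction can be sketched in Python as below. An item is represented as a tuple (lhs, rhs, dot position); that representation and the names `closure`, `successor` and `lr0_items` are assumptions of this sketch. Unlike the pseudocode, empty successor sets are not added to the family.

```python
# Augmented grammar: S' -> T, T -> F | T * F, F -> id | ( T )
GRAMMAR = [
    ("S'", ("T",)),
    ("T", ("F",)),
    ("T", ("T", "*", "F")),
    ("F", ("id",)),
    ("F", ("(", "T", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = {s for _, rhs in GRAMMAR for s in rhs}

def closure(items):
    """Add X -> • γ for every nonterminal X that appears right after a dot."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for l, r in GRAMMAR:
                if l == rhs[dot] and (l, r, 0) not in items:
                    items.add((l, r, 0))
                    work.append((l, r, 0))
    return frozenset(items)

def successor(items, symbol):
    """Move the dot over `symbol`, drop all other items, then close."""
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == symbol}
    return closure(moved) if moved else frozenset()

def lr0_items():
    start = closure({("S'", ("T",), 0)})
    states = {start}
    work = [start]
    while work:                       # repeat until no new set appears
        state = work.pop()
        for x in SYMBOLS:
            nxt = successor(state, x)
            if nxt and nxt not in states:
                states.add(nxt)
                work.append(nxt)
    return states
```

For this grammar the construction produces nine configuration sets, matching states 0 to 8 of the Action/Goto table.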

Productions:
1. T → F
2. T → T*F
3. F → id
4. F → (T)

Configuration sets:

0: S’ → • T
   T → • F
   T → • T * F
   F → • id
   F → • ( T )

1: T → F •                (Reduce 1)

2: S’ → T •               (on $: Accept)
   T → T • * F

3: T → T * • F
   F → • id
   F → • ( T )

4: T → T * F •            (Reduce 2)

5: F → ( • T )
   T → • F
   T → • T * F
   F → • id
   F → • ( T )

6: F → ( T • )
   T → T • * F

7: F → ( T ) •            (Reduce 4)

8: F → id •               (Reduce 3)

LR(0) Construction

1. Construct F = {I0, I1, …In}
2. a) if {A → α•} ∈ Ii and A != S’
      then action[i, _] := reduce A → α
   b) if {S’ → S•} ∈ Ii
      then action[i,$] := accept
   c) if {A → α•aβ} ∈ Ii and Successor(Ii,a) = Ij
      then action[i,a] := shift j
3. if Successor(Ii,A) = Ij then goto[i,A] := j

LR(0) Construction (cont’d)

4. All entries not defined are errors
5. Make sure I0 is the initial state

• Note: LR(0) always reduces if {A → α•} ∈ Ii, no lookahead
• Shift and reduce items can’t be in the same configuration set
  – Accepting state doesn’t count as reduce item
• At most one reduce item per set

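Set-of-items with Epsilon rules

Grammar:
S → AaAb
S → BaBb
A → ε
B → ε

Configuration sets:
Start:           S’ → •S, S → •AaAb, S → •BaBb, A → ε•, B → ε•
On S:            S’ → S•
On A:            S → A•aAb
  then on a:     S → Aa•Ab, A → ε•
  then on A:     S → AaA•b
  then on b:     S → AaAb•
On B:            S → B•aBb
  then on a:     S → Ba•Bb, B → ε•
  then on B:     S → BaB•b
  then on b:     S → BaBb•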

LR(0) conflicts:

Need more lookahead: SLR(1)

S’ → F
F → id | ( T )
F → id = T ;
T → T * F
T → id

5: F → id •
   F → id • = T ;
Shift/reduce conflict

2: F → id •
   T → id •
Reduce/Reduce conflict

SLR(1) : Simple LR(1) Parsing

S’ → T
T → F | T * F | C ( T )
F → id | id ++ | ( T )
C → id

0: S’ → • T
   T → • F
   T → • T * F
   T → • C (T)
   F → • id
   F → • id ++
   F → • ( T )
   C → • id

1: F → id •
   F → id • ++
   C → id •

Follow(F) = ? { *, ), $ }
Follow(C) = ? { ( }

action[1,*] = action[1,)] = action[1,$] = Reduce F → id
action[1,(] = Reduce C → id
action[1,++] = Shift

SLR(1) Construction

1. Construct F = {I0, I1, …In}
2. a) if {A → α•} ∈ Ii and A != S’
      then action[i, b] := reduce A → α for all b ∈ Follow(A)
   b) if {S’ → S•} ∈ Ii
      then action[i, $] := accept
   c) if {A → α•aβ} ∈ Ii and Successor(Ii, a) = Ij
      then action[i, a] := shift j
3. if Successor(Ii, A) = Ij then goto[i, A] := j

SLR(1) Construction (cont’d)

• Note: SLR(1) only reduces {A → α•} if the lookahead is in Follow(A)
• Shift and reduce items, or more than one reduce item, can be in the same configuration set as long as the lookaheads are disjoint

SLR(1) Conditions

• A grammar is SLR(1) if for each configuration set:
  – For any item {A → α•xβ: x ∈ T} there is no {B → γ•: x ∈ Follow(B)}
  – For any two items {A → α•} and {B → β•}, Follow(A) ∩ Follow(B) = ∅

LR(0) Grammars ⊂ SLR(1) Grammars

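Is this grammar SLR(1)?

Grammar:
S → AaAb
S → BaBb
A → ε
B → ε

Configuration sets:
Start:           S’ → •S, S → •AaAb, S → •BaBb, A → ε•, B → ε•
On S:            S’ → S•
On A:            S → A•aAb
  then on a:     S → Aa•Ab, A → ε•
  then on A:     S → AaA•b
  then on b:     S → AaAb•
On B:            S → B•aBb
  then on a:     S → Ba•Bb, B → ε•
  then on B:     S → BaB•b
  then on b:     S → BaBb•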

SLR limitation: lack of context

S’ → S
S → L = R | R
L → *R | id
R → L

0: S’ → • S
   S → • L = R
   S → • R
   L → • * R
   L → • id
   R → • L

1: L → id •

2: S → L • = R            (reached from 0 on L)
   R → L •

3: S → L = • R
   R → • L
   L → • * R
   L → • id

Follow(R) = ? { =, $ }

Input: id = id

Solution: Canonical LR(1)

• Extend definition of configuration
  – Remember lookahead
• New closure method
• Extend definition of Successor

LR(1) Configurations

• [A → α•β, a] for a ∈ T is valid for a viable prefix δα if there is a rightmost derivation
    S ⇒*rm δAη ⇒rm δαβη
  and (η = aγ) or (η = ε and a = $)
• Notation: [A → α•β, a/b/c]
  – if [A → α•β, a], [A → α•β, b], [A → α•β, c] are valid configurations

LR(1) Configurations

S → B B
B → a B | b

• S ⇒*rm aaBab ⇒rm aaaBab
• Item [B → a • B, a] is valid for viable prefix aaa
• S ⇒*rm BaB ⇒rm BaaB
• Also, item [B → a • B, $] is valid for viable prefix Baa

LR(1) Closure

Closure property:
• If [A → α • Bβ, a] is in the set, then [B → • γ, b] is in the set if b ∈ First(βa)
• Compute as fixed point
• Only include contextually valid lookaheads to guide reducing to B
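The LR(1) closure rule can be sketched in Python as below, using the grammar S’ → S, S → L = R | R, L → *R | id, R → L from the SLR limitation example. The sketch assumes a grammar without ε-productions, so First(βa) is First of the first symbol of β when β is non-empty and {a} otherwise; items are tuples (lhs, rhs, dot, lookahead), and all names are hypothetical.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}

def first_sets():
    """FIRST for each nonterminal; valid only because no rule derives ε."""
    first = {n: set() for n in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for lhs, rhss in GRAMMAR.items():
            for rhs in rhss:
                head = rhs[0]
                new = first[head] if head in GRAMMAR else {head}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first

FIRST = first_sets()

def closure1(items):
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            beta = rhs[dot + 1:]
            if beta:      # First(βa) = First(β) since β is not nullable here
                lookaheads = FIRST[beta[0]] if beta[0] in GRAMMAR else {beta[0]}
            else:         # β is empty: First(a) = {a}
                lookaheads = {la}
            for prod in GRAMMAR[rhs[dot]]:
                for b in lookaheads:
                    item = (rhs[dot], prod, 0, b)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return items

I0 = closure1({("S'", ("S",), 0, "$")})
```

`I0` contains eight items: the L items appear once with lookahead = (from S → • L = R) and once with lookahead $ (from R → • L).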

Starting Configuration

• Augment Grammar with S’ just like for LR(0), SLR(1)
• Initial configuration set is I = closure([S’ → • S, $])

Example: closure([S’ → • S, $])

Grammar:
S’ → S
S → L = R | R
L → *R | id
R → L

[S’ → • S, $]
[S → • L = R, $]
[S → • R, $]
[L → • * R, =]
[L → • id, =]
[R → • L, $]
[L → • *R, $]
[L → • id, $]

LR(1) Successor(C, X)

• Let I = [A → α • Bβ, a]
• Successor(I, B) = closure([A → αB • β, a])

LR(1) Example: *id = id

0: S’ → • S, $
   S → • L = R, $
   S → • R, $
   L → • * R, =/$
   L → • id, =/$
   R → • L, $

1: L → id •, $/=

2: S → L • = R, $         (reached from 0 on L)
   R → L •, $

3: S → L = • R, $
   R → • L, $
   L → • *R, $
   L → • id, $

4: L → id •, $

5: R → L •, $             (reached on L)

6: S → L = R •, $         (reached on R)

7: S’ → S •, $

LR(1) Example: *id = id

0: S’ → • S, $
   S → • L = R, $
   S → • R, $
   L → • * R, =/$
   L → • id, =/$
   R → • L, $

8: L → * • R, =/$
   R → • L, =/$
   L → • *R, =/$
   L → • id, =/$

1: L → id •, =/$          (reached on id)

10: R → L •, =/$          (reached on L)

11: L → *R •, =/$         (reached on R)

12: S → R •, $            (reached on R)

LR(1) Construction

1. Construct F = {I0, I1, …In}
2. a) if [A → α•, a] ∈ Ii and A != S’
      then action[i, a] := reduce A → α
   b) if [S’ → S•, $] ∈ Ii
      then action[i, $] := accept
   c) if [A → α•aβ, b] ∈ Ii and Successor(Ii, a) = Ij
      then action[i, a] := shift j
3. if Successor(Ii, A) = Ij then goto[i, A] := j

LR(1) Construction (cont’d)

• Note: LR(1) only reduces using A → α for [A → α•, a] if a follows
• LR(1) states remember context by virtue of lookahead
• Possibly many states!
  – LALR(1) combines some states

LR(1) Conditions

• A grammar is LR(1) if for each configuration set the following holds:
  – For any item [A → α•xβ, a] with x ∈ T there is no [B → γ•, x]
  – For any two complete items [A → γ•, a] and [B → β•, b] it holds that a != b.
• Grammars:
  – LR(0) ⊂ SLR(1) ⊂ LR(1) ⊂ LR(k)
• Languages expressible by grammars:
  – LR(0) ⊂ SLR(1) ⊂ LR(1) = LR(k)

Canonical LR(1) Recap

• LR(1) uses left context, current handle and lookahead to decide when to reduce or shift
• Most powerful parser so far
• LALR(1) is a practical simplification with fewer states

Merging States in LALR(1)

Grammar:
S’ → S
S → XX
X → aX
X → b

• Same Core Set
• Different lookaheads

3: X → a • X, a/b
   X → • a X, a/b
   X → • b, a/b

6: X → a • X, $
   X → • a X, $
   X → • b, $

Merged state 36:
   X → a • X, a/b/$
   X → • a X, a/b/$
   X → • b, a/b/$

R/R conflicts when merging

Grammar fragment:
B → d
B → f X g
X → …

2: B → d •, c
   B → f X g •, e

4: B → d •, g
   B → f X g •, c

Merged state 24:
   B → d •, c/g
   B → f X g •, c/e

• If R/R conflicts are introduced, the grammar is not LALR(1)!

LALR(1)

• LALR(1) Condition:
  – Merging in this way does not introduce reduce/reduce conflicts
  – Shift/reduce can’t be introduced
• Merging brute force or step-by-step
• More compact than canonical LR, like SLR(1)
• More powerful than SLR(1)
  – Not always merge to full Follow Set
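Merging by core and the check for newly introduced reduce/reduce conflicts can be sketched in Python as follows. Items are tuples (lhs, rhs, dot, lookahead); the function names and this representation are assumptions of the sketch, and only the "brute force" variant (build all LR(1) states, then merge) is shown.

```python
from collections import defaultdict

def merge_by_core(states):
    """Union all LR(1) states that share the same core (items minus lookahead)."""
    merged = defaultdict(set)
    for state in states:
        core = frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)
        merged[core] |= set(state)
    return list(merged.values())

def reduce_reduce_conflicts(state):
    """Lookaheads on which two different complete items want to reduce."""
    seen = {}
    conflicts = set()
    for lhs, rhs, dot, la in state:
        if dot == len(rhs):                      # complete item A -> α •
            if la in seen and seen[la] != (lhs, rhs):
                conflicts.add(la)
            seen.setdefault(la, (lhs, rhs))
    return conflicts

# States 3 and 6 of the grammar S -> XX, X -> aX | b
state3 = {("X", ("a", "X"), 1, la) for la in "ab"} \
       | {("X", ("a", "X"), 0, la) for la in "ab"} \
       | {("X", ("b",), 0, la) for la in "ab"}
state6 = {("X", ("a", "X"), 1, "$"), ("X", ("a", "X"), 0, "$"), ("X", ("b",), 0, "$")}

# States 2 and 4 of the fragment B -> d | f X g
state2 = {("B", ("d",), 1, "c"), ("B", ("f", "X", "g"), 3, "e")}
state4 = {("B", ("d",), 1, "g"), ("B", ("f", "X", "g"), 3, "c")}

merged36 = merge_by_core([state3, state6])
merged24 = merge_by_core([state2, state4])
```

Merging states 3 and 6 gives a single state with lookaheads a/b/$ and no conflict. Merging states 2 and 4 gives a state in which both complete items reduce on c, so that grammar is not LALR(1).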

S/R & ambiguous grammars

• Lx(k) Grammar vs. Language
  – A grammar is Lx(k) if it can be parsed by the Lx(k) method, according to criteria that are specific to the method.
  – An Lx(k) grammar may or may not exist for a language.
• Even if a given grammar is not LR(k), a shift/reduce parser can sometimes handle it by accounting for ambiguities
  – Example: ‘dangling’ else
    • Preferring shift to reduce means matching inner ‘if’

Dangling ‘else’

1. S → if E then S
2. S → if E then S else S

• Viable prefix “if E then if E then S”
  – Then read else
• Shift “else” (means go for 2)
• Reduce (reduce using production #1)
• NB: dangling else as written above is ambiguous
  – NB: Ambiguity can be resolved, but there’s still no LR(k) grammar

Precedence & Associativity

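• Consider E → E - E | E * E | id

[Figure: the parser configuration E - E • at the point where the next input symbol decides between shift and reduce. For id - id * id the lookahead is *; for id - id - id the lookahead is -, and the parser reduces E - E to E.]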

Precedence Relations

• Let A → w be a rule in the grammar
• And b is a terminal
• In some state q of the LR(1) parser there is a shift-reduce conflict:
  – either reduce with A → w or shift on b
• Write down a rule, either:
    A → w, < b  or  A → w, > b
• A → w, < b means the rule has less precedence and so we shift if we see b in the lookahead
• A → w, > b means the rule has higher precedence and so we reduce if we see b in the lookahead
• If there are multiple terminals with shift-reduce conflicts, then we list them all:
    A → w, > b, < c, > d
• Consider the grammar
    E → E + E | E * E | ( E ) | a
• Assume left-association so that E+E+E is interpreted as (E+E)+E
• Assume multiplication has higher precedence than addition
• Then we can write precedence rules/relations:
    E → E + E, > +, < *
    E → E * E, > +, > *
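One way to derive such precedence relations from numeric precedence levels and associativity, in the style used by yacc-like tools, is sketched below in Python. The tables `PREC` and `ASSOC`, the function `resolve`, and the assumption that * is also left-associative are illustrative choices for this sketch.

```python
# Higher number = binds tighter (assumed levels for this example).
PREC = {"+": 1, "*": 2}
ASSOC = {"+": "left", "*": "left"}

def resolve(rule_op, lookahead):
    """Decide the conflict for an item E -> E rule_op E • with `lookahead` next.

    Returns "reduce" when the rule wins (written A -> w, > b on the slide)
    and "shift" when the lookahead wins (written A -> w, < b).
    """
    if PREC[rule_op] > PREC[lookahead]:
        return "reduce"
    if PREC[rule_op] < PREC[lookahead]:
        return "shift"
    # Same precedence level: associativity breaks the tie.
    return "reduce" if ASSOC[rule_op] == "left" else "shift"
```

With these tables, `resolve("+", "*")` returns "shift" and the other three combinations return "reduce", which corresponds to E → E + E, > +, < * and E → E * E, > +, > *.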

Precedence & Associativity

State with a completed E + E:
   1: E → E + E •
   1: E → E • + E
   2: E → E • * E

State with a completed E * E:
   2: E → E * E •
   1: E → E • + E
   2: E → E • * E

Precedence relations:
   E → E + E, > +, < *
   E → E * E, > +, > *

Handling S/R & R/R Conflicts

• Have a conflict?
  – No? – Done, grammar is compliant.
• Already using most powerful parser available?
  – No? – Upgrade and goto 1
• Can the grammar be rearranged so that the conflict disappears?
  – While preserving the language!

Conflicts revisited (cont’d)

• Can the grammar be rearranged so that the conflict disappears?
  – No?
    • Is the conflict S/R and does shift-to-reduce preference yield the desired result?
      – Yes: Done. (Example: dangling else)
    • Else: Bad luck
  – Yes: Is it worth it?
    • Yes, resolve conflict.
    • No: live with default or specified conflict resolution (precedence, associativity)

Compiler (parser) compilers

• Rather than build a parser for a particular grammar (e.g. recursive descent), write down a grammar as a text file
• Run it through a compiler compiler, which produces a parser for that grammar
• The parser is a program that can be compiled; it accepts input strings and produces user-defined output

Compiler (parser) compilers

• For LR parsing, all it needs to do is produce the action/goto table
  – Yacc (yet another compiler compiler) was distributed with Unix, the most popular tool. Uses LALR(1).
  – Many variants of yacc exist for many languages
• As we will see later, translation of the parse tree into machine code (or anything else) can also be written down with the grammar
• Handling errors and interaction with the lexical analyzer have to be precisely defined

Parsing CFGs

• Consider the problem of parsing with arbitrary CFGs
• For any input string, the parser has to produce a parse tree
• The simpler problem: print yes if the input string is generated by the grammar, print no otherwise
• This problem is called recognition

CKY Recognition Algorithm

• The Cocke-Kasami-Younger algorithm
• As we shall see it runs in time that is polynomial in the size of the input
• It takes space polynomial in the size of the input
• Remarkable fact: it can find all possible parse trees (exponentially many) in polynomial time

Chomsky Normal Form

• Before we can see how CKY works, we need to convert the input CFG into Chomsky Normal Form
• CNF means that the input CFG G is converted to a new CFG G’ in which all rules are of the form:
    A → B C
    A → a

Epsilon Removal

• First step, remove epsilon rules
    A → B C
    C → ε | C D | a
    D → b
    B → b
• After ε-removal (for each rule, add a copy with the nullable C omitted):
    A → B C | B
    C → C D | D | a
    D → b
    B → b

Removal of Chain Rules

• Second step, remove chain rules
    A → B C | C D C
    C → D | a
    D → d
    B → b
• After removal of chain rules:
    A → B a | B D | a D a | a D D | D D a | D D D
    D → d
    B → b

Eliminate terminals from RHS

• Third step, remove terminals from the rhs of rules
    A → B a C d
• After removal of terminals from the rhs:
    A → B N1 C N2
    N1 → a
    N2 → d

Binarize RHS with Nonterminals

• Fourth step, convert the rhs of each rule to have two non-terminals
    A → B N1 C N2
    N1 → a
    N2 → d
• After converting to binary form:
    A → B N3
    N3 → N1 N4
    N4 → C N2
    N1 → a
    N2 → d
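The third and fourth steps can be sketched for a single rule in Python. The function name `cnf_rule` and the fresh-name scheme N1, N2, … are assumptions of this sketch; it expects a right-hand side of length two or more and does not perform ε-removal or chain-rule removal.

```python
def cnf_rule(lhs, rhs, terminals):
    """Rewrite one rule lhs -> rhs (len(rhs) >= 2) into CNF-shaped rules."""
    rules = []
    names = {}          # terminal -> fresh nonterminal standing for it
    count = 0
    symbols = []
    # Step 3: replace each terminal in the rhs by a fresh nonterminal.
    for sym in rhs:
        if sym in terminals:
            if sym not in names:
                count += 1
                names[sym] = f"N{count}"
                rules.append((names[sym], (sym,)))
            symbols.append(names[sym])
        else:
            symbols.append(sym)
    # Step 4: binarize from the left, introducing one fresh nonterminal
    # for each remaining suffix longer than two symbols.
    while len(symbols) > 2:
        count += 1
        fresh = f"N{count}"
        rules.append((lhs, (symbols[0], fresh)))
        lhs, symbols = fresh, symbols[1:]
    rules.append((lhs, tuple(symbols)))
    return rules

RESULT = cnf_rule("A", ("B", "a", "C", "d"), {"a", "d"})
```

For A → B a C d this yields N1 → a, N2 → d, A → B N3, N3 → N1 N4, N4 → C N2, the same rules as on the slide.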

CKY algorithm

• We will consider the working of the algorithm on an example CFG and input string
• Example CFG:
    S → A X | Y B
    X → A B | B A
    Y → B A
    A → a
    B → a
• Example input string: aaa

CKY Algorithm

Input positions: 0 a 1 a 2 a 3

chart[0][1] = chart[1][2] = chart[2][3] = {A, B}    (A → a, B → a)
chart[0][2] = chart[1][3] = {X, Y}                  (X → A B | B A, Y → B A)
chart[0][3] = {S}                                   (S → A(0,1) X(1,3), S → Y(0,2) B(2,3))

Parse trees

CKY Algorithm

Input string input of size n
Create a 2D table chart of size n²; each cell holds a set of non-terminals

for i = 0 to n-1
    add A to chart[i][i+1] if there is a rule A → a and input[i] = a
for j = 2 to n
    for i = j-2 downto 0
        for k = i+1 to j-1
            add A to chart[i][j] if there is a rule A → B C and B ∈ chart[i][k] and C ∈ chart[k][j]
return yes if chart[0][n] has the start symbol
else return no
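A direct Python sketch of the recognizer above, applied to the example CFG and the input aaa. The split of the grammar into `UNARY` and `BINARY` rule lists and the name `cky` are assumptions of this sketch; the chart is allocated as (n+1) × (n+1) so that chart[i][j] can be indexed by string positions 0..n.

```python
# Example CFG in Chomsky Normal Form
UNARY = [("A", "a"), ("B", "a")]                      # A -> a
BINARY = [                                            # A -> B C
    ("S", ("A", "X")), ("S", ("Y", "B")),
    ("X", ("A", "B")), ("X", ("B", "A")),
    ("Y", ("B", "A")),
]

def cky(words, unary=UNARY, binary=BINARY, start="S"):
    """Return (accepted, chart); chart[i][j] is the set of non-terminals
    that derive words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                     # spans of length 1
        chart[i][i + 1] = {lhs for lhs, t in unary if t == w}
    for j in range(2, n + 1):                         # right end of the span
        for i in range(j - 2, -1, -1):                # left end of the span
            for k in range(i + 1, j):                 # split point
                for lhs, (b, c) in binary:
                    if b in chart[i][k] and c in chart[k][j]:
                        chart[i][j].add(lhs)
    return start in chart[0][n], chart

ACCEPTED, CHART = cky(list("aaa"))
```

The three nested loops over positions give the n³ factor in the running time; the innermost loop over rules contributes the grammar-dependent factor.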

CKY algorithm summary

• Parsing arbitrary CFGs
• For the CKY algorithm, the time complexity is O(|G|² n³)
• The space requirement is O(n²)
• The CKY algorithm handles arbitrary ambiguous CFGs
• All ambiguous choices are stored in the chart
• For compilers we consider parsing algorithms for CFGs that do not handle ambiguous grammars

GLR – Generalized LR Parsing

• Works for any CFG (just like CKY algorithm)
  – Masaru Tomita [1986]
• If you have a shift/reduce conflict, just clone your stack and shift in one clone, reduce in the other clone
  – proceed in lockstep
  – parsers that get into error states die
  – merge parsers that lead to identical reductions (graph structured stack)

Parsing - Summary

• Parsing arbitrary CFGs: O(n³) time complexity
• Top-down vs. bottom-up
• Lookahead: FIRST and FOLLOW sets
• LL(1) – Parsing: O(n) time complexity
  – recursive-descent and table-driven predictive parsing
• LR(k) – Parsing: O(n) time complexity
  – LR(0), SLR(1), LR(1), LALR(1)
• Resolving shift/reduce conflicts
  – using precedence, associativity
