1 Syntax Analysis. 2 Introduction to parsers Context-free grammars Push-down automata Top-down...

Post on 24-Dec-2015


Syntax Analysis

Introduction to parsers
Context-free grammars
Push-down automata
Top-down parsing
Bottom-up parsing
Bison - a parser generator

Introduction to parsers

[Figure: the parser in the compiler front end. Boxes: Lexical Analyzer, Parser, Semantic Analyzer, Symbol Table. Edge labels: source, token, next token, syntax tree, code.]

Context-Free Grammars

A set of terminals: basic symbols from which sentences are formed

A set of nonterminals: syntactic categories denoting sets of sentences

A set of productions: rules specifying how the terminals and nonterminals can be combined to form sentences

The start symbol: a distinguished nonterminal denoting the language

An Example

Terminals: id, '+', '-', '*', '/', '(', ')'
Nonterminals: expr, op
Productions:
  expr → expr op expr
  expr → '(' expr ')'
  expr → '-' expr
  expr → id
  op → '+' | '-' | '*' | '/'
The start symbol: expr

Derivations

A derivation step is an application of a production as a rewriting rule:
  E ⇒ - E

A sequence of derivation steps
  E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
is called a derivation of "- ( id )" from E

The symbol ⇒* denotes "derives in zero or more steps"; the symbol ⇒+ denotes "derives in one or more steps":
  E ⇒* - ( id )    E ⇒+ - ( id )

Context-Free Languages

A context-free language L(G) is the language defined by a context-free grammar G

A string of terminals ω is in L(G) if and only if S ⇒+ ω; ω is called a sentence of G

If S ⇒* α, where α may contain nonterminals, then we call α a sentential form of G:
  E ⇒ - E ⇒ - ( E ) ⇒ - ( id )

G1 is equivalent to G2 if L(G1) = L(G2)

Left- & Right-most Derivations

Each derivation step needs to choose
– a nonterminal to rewrite
– a production to apply

A leftmost derivation always chooses the leftmost nonterminal to rewrite

E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )

A rightmost derivation always chooses the rightmost nonterminal to rewrite

E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )

Parse Trees

A parse tree is a graphical representation for a derivation that filters out the order of choosing nonterminals for rewriting

Many derivations may correspond to the same parse tree, but every parse tree has associated with it a unique leftmost and a unique rightmost derivation

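An Example

[Figure: parse tree for - ( id + id ). The root E has children - and E; that E has children (, E, and ); the inner E has children E, +, and E, each of which derives id.]

E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )

E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )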

Ambiguous Grammar

A grammar is ambiguous if it produces more than one parse tree for some sentence

E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
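To make the ambiguity concrete, here is a minimal sketch that counts the parse trees of a token string. It assumes only the fragment E → E + E | E * E | id used in the two derivations; the function name `count_parse_trees` is made up for illustration.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id (assumed fragment)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id + id * id".split()))
```

A count greater than one for some sentence shows that the grammar is ambiguous.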

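Ambiguous Grammar

[Figure: the two parse trees for id + id * id. In the first, the root is E + E and the right operand expands to E * E. In the second, the root is E * E and the left operand expands to E + E.]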

Resolving Ambiguity

Use disambiguating rules to throw away undesirable parse trees

Rewrite grammars by incorporating disambiguating rules into grammars

An Example

The dangling-else grammar
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other

Two parse trees for
  if E1 then if E2 then S1 else S2

An Example

[Figure: the two parse trees for the dangling-else sentence. In one, the else belongs to the inner if-then statement; in the other, it belongs to the outer one.]

Disambiguating Rules

Rule: match each else with the closest previous unmatched then

Remove undesired state transitions in the pushdown automaton

Grammar Rewriting

stmt → m_stmt | unm_stmt

m_stmt → if expr then m_stmt else m_stmt | other

unm_stmt → if expr then stmt | if expr then m_stmt else unm_stmt

RE vs. CFG

Every language described by a RE can also be described by a CFG

Why use REs for lexical syntax?
– Lexical rules do not need a notation as powerful as CFGs
– REs are more concise and easier to understand than CFGs
– More efficient lexical analyzers can be constructed from REs than from CFGs
– REs provide a way of modularizing the front end into two manageable-sized components

Push-Down Automata

[Figure: model of a push-down automaton. Labels: Finite Automaton, Input (ending in $), Stack (with $ at the bottom), Output.]

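An Example

S' → S $
S → a S b
S → ε

[Figure: push-down automaton for this grammar, with states 0 to 3 and a start arrow. Transitions carry (input, stack top) labels such as (a, $), (a, a), (b, a), and ($, $).]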

Nonregular Constructs

REs can denote only a fixed number of repetitions or an unspecified number of repetitions of one given construct:
  a^n, a*

A nonregular construct:
– L = { a^n b^n | n ≥ 0 }

Non-Context-Free Constructs

CFGs can denote only a fixed number of repetitions or an unspecified number of repetitions of one or two given constructs

Some non-context-free constructs:
– L1 = { wcw | w is in (a | b)* }
– L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 }
– L3 = { a^n b^n c^n | n ≥ 0 }

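Words of Encouragement

The way of great learning lies in manifesting bright virtue, in caring for the people, and in resting in the highest good.

-- The Great Learning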

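Top-Down Parsing

Construct a parse tree from the root to the leaves using leftmost derivation

1. S → c A B        input: cad
2. A → a b
3. A → a
4. B → d

[Figure: successive parse trees for input cad. Production 1 expands S to c A B. Production 2 expands A to a b, which fails to match the input, so the parser backtracks. Production 3 expands A to a, and production 4 expands B to d.]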

Predictive Parsing

A top-down parsing without backtracking
– there is only one alternative production to choose at each derivation step

stmt → if expr then stmt else stmt
     | while expr do stmt
     | begin stmt_list end

LL(k) Parsing

The first L stands for scanning the input from left to right

The second L stands for producing a leftmost derivation

The k stands for the number of lookahead input symbols used to choose alternative productions at each derivation step

LL(1) Parsing

Use one input symbol of lookahead

Recursive-descent parsing

Nonrecursive predictive parsing

An Example

LL(1): S → a b e | c d e

LL(2): S → a b e | a d e

Recursive Descent Parsing

The parser consists of a set of (possibly recursive) procedures

Each procedure is associated with a nonterminal of the grammar that is responsible to derive the productions of that nonterminal

Each procedure should be able to choose a unique production to derive based on the current token

An Example

type → simple | id | array [ simple ] of type

simple → integer | char | num dotdot num

FIRST(simple) = {integer, char, num}

Recursive Descent Parsing

♥ For each terminal in the production, the terminal is matched with the current token

♥ For each nonterminal in the production, the procedure associated with the nonterminal is called

♥ The sequence of matchings and procedure calls in processing the input implicitly defines a parse tree for the input

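An Example

[Figure: parse tree for the input below. The root type expands to array [ simple ] of type; simple expands to num dotdot num; the inner type expands to simple, which expands to integer.]

array [ num dotdot num ] of integer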

An Example

procedure match(t : terminal);
begin
  if lookahead = t then
    lookahead := nexttoken
  else
    error
end;

An Example

procedure type;
begin
  if lookahead is in { integer, char, num } then
    simple
  else if lookahead = id then
    match(id)
  else if lookahead = array then begin
    match(array); match('['); simple;
    match(']'); match(of); type
  end
  else
    error
end;

An Example

procedure simple;
begin
  if lookahead = integer then
    match(integer)
  else if lookahead = char then
    match(char)
  else if lookahead = num then begin
    match(num); match(dotdot); match(num)
  end
  else
    error
end;
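The three procedures can be tried out with a small runnable sketch. This is one possible Python transcription, not the slides' own code: tokens are assumed to be plain strings, "$" is an assumed end marker, and `Parser`, `ParseError`, and `parse` are illustrative names.

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        # "$" is an assumed end-of-input marker appended to the token list.
        self.tokens = list(tokens) + ["$"]
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        # Consume the current token if it equals t, otherwise report an error.
        if self.lookahead == t:
            self.pos += 1
        else:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")

    def type_(self):
        # type -> simple | id | array [ simple ] of type
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "id":
            self.match("id")
        elif self.lookahead == "array":
            self.match("array"); self.match("["); self.simple()
            self.match("]"); self.match("of"); self.type_()
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

    def simple(self):
        # simple -> integer | char | num dotdot num
        if self.lookahead in ("integer", "char"):
            self.match(self.lookahead)
        elif self.lookahead == "num":
            self.match("num"); self.match("dotdot"); self.match("num")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

def parse(tokens):
    parser = Parser(tokens)
    parser.type_()
    parser.match("$")   # the whole input must be consumed
    return True

print(parse("array [ num dotdot num ] of integer".split()))
```

Each method chooses its production from the lookahead token alone, which is what makes the parser predictive.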

First Sets

The first set of a string α is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in the first set of α.

First Sets

If X is a terminal, then FIRST(X) is {X}

If X is a nonterminal and X → ε is a production, then add ε to FIRST(X)

If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then add a to FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1). If ε is in FIRST(Yj) for all j, then add ε to FIRST(X)

An Example

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

FIRST(F) = { (, id }
FIRST(T') = { *, ε }, FIRST(T) = { (, id }
FIRST(E') = { +, ε }, FIRST(E) = { (, id }

Follow Sets

The follow set of a nonterminal A is the set of terminals that can appear immediately to the right of A in some sentential form; namely, if

  S ⇒* α A a β

then a is in the follow set of A.

Follow Sets

Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker

If there is a production A → α B β, then everything in FIRST(β) except for ε is placed in FOLLOW(B)

If there is a production A → α B, or A → α B β where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B)

An Example

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }, FIRST(T') = { *, ε }
FOLLOW(E) = { ), $ }, FOLLOW(E') = { ), $ }
FOLLOW(T) = { +, ), $ }, FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
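The FIRST and FOLLOW rules can be run as fixed-point iterations. The sketch below is one way to do that for the expression grammar above. The dictionary encoding of the grammar, the `EPS` marker, and the function names are choices made for this illustration.

```python
EPS = "ε"   # marker for the empty string (an encoding choice for this sketch)

# Each nonterminal maps to its right-hand sides; [] encodes an ε-production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

def first_of_string(symbols, first):
    """FIRST of a symbol string, given FIRST sets of the nonterminals."""
    result = set()
    for sym in symbols:
        f = first[sym] if sym in first else {sym}   # a terminal's FIRST is itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)        # every symbol (or none at all) can vanish
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # repeat until nothing new is added
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue        # only nonterminals have FOLLOW sets
                    rest = first_of_string(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:     # the tail can vanish: inherit FOLLOW(nt)
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, START)
print(FIRST)
print(FOLLOW)
```

The computed sets can be compared with the ones listed for this grammar.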

Nonrecursive Predictive Parsing

[Figure: model of a table-driven predictive parser. Labels: Input, Stack, Parsing driver, Parsing table, Output.]

Stack Operations

Match
– when the top stack symbol is a terminal and it matches the input token, pop the terminal and advance the input pointer

Expand
– when the top stack symbol is a nonterminal, replace this symbol by the right-hand side of one of its productions (pop the nonterminal and push the right-hand side of a production in reverse order)

An Example

type → simple | id | array [ simple ] of type

simple → integer | char | num dotdot num

An Example

(E = expand, M = match; the top of the stack is on the right)

Action  Stack                         Input
E       type                          array [ num dotdot num ] of integer
M       type of ] simple [ array      array [ num dotdot num ] of integer
M       type of ] simple [            [ num dotdot num ] of integer
E       type of ] simple              num dotdot num ] of integer
M       type of ] num dotdot num      num dotdot num ] of integer
M       type of ] num dotdot          dotdot num ] of integer
M       type of ] num                 num ] of integer
M       type of ]                     ] of integer
M       type of                       of integer
E       type                          integer
E       simple                        integer
M       integer                       integer

Parsing Driver

push $S onto the stack, where S is the start symbol
set ip to point to the first symbol of w$;
repeat
  let X be the top stack symbol and a the symbol pointed to by ip;
  if X is a terminal or $ then
    if X = a then
      pop X from the stack and advance ip
    else error
  else /* X is a nonterminal */
    if M[X, a] = X → Y1 Y2 ... Yk then
      pop X from the stack and push Yk ... Y2 Y1 onto the stack
    else error
until X = $ and a = $

Constructing Parsing Table

Input. Grammar G.

Output. Parsing Table M.

Method.

1. For each production A → α, do steps 2 and 3.

2. For each terminal a in FIRST(α), add A → α to M[A, a].

3. If ε is in FIRST(α), add A → α to M[A, b] for each symbol b in FOLLOW(A).

4. Make each undefined entry of M be error.

An Example

      id         +            *            (          )         $
E     E → TE'                              E → TE'
E'               E' → +TE'                            E' → ε    E' → ε
T     T → FT'                              T → FT'
T'               T' → ε       T' → *FT'               T' → ε    T' → ε
F     F → id                               F → (E)

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }, FIRST(T') = { *, ε }
FOLLOW(E) = { ), $ }, FOLLOW(E') = { ), $ }
FOLLOW(T) = { +, ), $ }, FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }

An Example

Stack       Input            Output
$E          id + id * id$
$E'T        id + id * id$    E → TE'
$E'T'F      id + id * id$    T → FT'
$E'T'id     id + id * id$    F → id
$E'T'       + id * id$
$E'         + id * id$       T' → ε
$E'T+       + id * id$       E' → +TE'
$E'T        id * id$
$E'T'F      id * id$         T → FT'
$E'T'id     id * id$         F → id
$E'T'       * id$
$E'T'F*     * id$            T' → *FT'
$E'T'F      id$
$E'T'id     id$              F → id
$E'T'       $
$E'         $                T' → ε
$           $                E' → ε
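The trace can be reproduced with a short table-driven sketch. The `TABLE` dictionary below hard-codes the LL(1) table of the expression grammar. Its encoding (keys are (nonterminal, token) pairs, `[]` stands for ε) and the name `ll1_parse` are assumptions of this example.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# LL(1) table for the expression grammar; [] encodes an ε right-hand side.
TABLE = {
    ("E", "id"): ["T", "E'"],       ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],  ("E'", ")"): [],  ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],       ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],            ("F", "("): ["(", "E", ")"],
}

def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order (a leftmost derivation)."""
    inp = list(tokens) + ["$"]
    stack = ["$", start]            # top of the stack is the end of the list
    ip = 0
    output = []
    while stack:
        x, a = stack[-1], inp[ip]
        if x not in NONTERMINALS:   # terminal or $: must match the input
            if x != a:
                raise SyntaxError(f"expected {x!r}, found {a!r}")
            stack.pop()
            ip += 1
        else:                       # nonterminal: expand using the table
            rhs = TABLE.get((x, a))
            if rhs is None:
                raise SyntaxError(f"no entry for M[{x}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))   # push the right-hand side in reverse
            output.append((x, tuple(rhs)))
    return output

for lhs, rhs in ll1_parse("id + id * id".split()):
    print(lhs, "→", " ".join(rhs) or "ε")
```

The printed productions appear in the same order as the Output column of the trace.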

LL(1) Grammars

A grammar is an LL(1) grammar if its LL(1) parsing table has no multiply-defined entries

A Counter Example

S → i E t S S' | a      FOLLOW(S) = {$, e}
S' → e S | ε            FOLLOW(S') = {$, e}
E → b                   FOLLOW(E) = {t}

      a        b        e                    i                 t     $
S     S → a                                  S → i E t S S'
S'                      S' → e S, S' → ε                             S' → ε
E              E → b

The entry M[S', e] is multiply defined, so the grammar is not LL(1).

LL(1) Grammars

A grammar G is LL(1) iff whenever A → α | β are two distinct productions of G, the following conditions hold:
– For no terminal a do both α and β derive strings beginning with a.
– At most one of α and β can derive the empty string.
– If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).

FIRST(α) ∩ FIRST(β) = ∅

FIRST(α) ∩ FOLLOW(A) = ∅

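Left Recursion

A grammar is left recursive if it has a nonterminal A such that A ⇒+ A α

A → A α | β        A → β R
                   R → α R | ε

[Figure: the two corresponding parse trees. With the left-recursive form, A nests down the left edge of the tree. With the rewritten form, R nests down the right edge.]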

Direct Left Recursion

A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn

A → β1 A' | β2 A' | ... | βn A'

A' → α1 A' | α2 A' | ... | αm A' | ε

An Example

E → E + T | T
T → T * F | F
F → ( E ) | id

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
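The rewriting of direct left recursion is mechanical, as the following sketch shows for one nonterminal. The list-of-lists encoding of right-hand sides and the function name are illustrative. The sketch also assumes that the primed name (for example E') is not already used in the grammar.

```python
def eliminate_direct_left_recursion(nt, productions):
    """Rewrite A -> A α1 | ... | A αm | β1 | ... | βn without left recursion.

    `productions` is a list of right-hand sides (lists of symbols); [] is ε.
    Returns a dict mapping nonterminals to their new right-hand sides.
    """
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not alphas:
        return {nt: productions}    # nothing to do
    new_nt = nt + "'"               # assumed to be a fresh name
    return {
        nt: [beta + [new_nt] for beta in betas],                     # A  -> βi A'
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],       # A' -> αi A' | ε
    }

print(eliminate_direct_left_recursion("E", [["E", "+", "T"], ["T"]]))
print(eliminate_direct_left_recursion("T", [["T", "*", "F"], ["F"]]))
```

Applied to E and T, this yields the E' and T' productions of the rewritten grammar. F has no left-recursive production and is returned unchanged.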

Indirect Left Recursion

S → A a | b
A → A c | S d | ε

S ⇒ A a ⇒ S d a

A → A c | A a d | b d | ε

S → A a | b
A → b d A' | A'
A' → c A' | a d A' | ε

Indirect Left Recursion

Input. Grammar G with no cycles (derivations of the form A ⇒+ A) or ε-productions (productions of the form A → ε).
Output. An equivalent grammar with no left recursion.
1. Arrange the nonterminals in some order A1, A2, ..., An
2. for i := 1 to n do begin
     for j := 1 to i - 1 do begin
       replace each production of the form Ai → Aj γ
       by the productions Ai → δ1 γ | δ2 γ | ... | δk γ,
       where Aj → δ1 | δ2 | ... | δk are all the current Aj-productions;
     end
     eliminate direct left recursion among the Ai-productions
   end

Left Factoring

Two alternatives of a nonterminal A have a nontrivial common prefix if α ≠ ε, and

A → α β1 | α β2

A → α A'
A' → β1 | β2

An Example

S → i E t S | i E t S e S | a
E → b

S → i E t S S' | a
S' → e S | ε
E → b

Error Recovery

Panic mode: skip tokens until a token in a set of synchronizing tokens appears

1. If a terminal on stack cannot be matched, pop the terminal

2. use FOLLOW(A) as sync set for A (pop A)

3. use the first set of a higher construct as sync set for A

4. use FIRST(A) as sync set for A

5. use the production deriving ε as the default for A

An Example

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }

An Example

      id         +            *            (          )         $
E     E → TE'                              E → TE'    sync2     sync2
E'               E' → +TE'                            E' → ε    E' → ε
T     T → FT'    sync2                     T → FT'    sync2     sync2
T'               T' → ε       T' → *FT'               T' → ε    T' → ε
F     F → id     sync2        sync2        F → (E)    sync2     sync2

An Example

Stack       Input            Output
$E          ) id * + id$     error, skip )
$E          id * + id$
$E'T        id * + id$       E → TE'
$E'T'F      id * + id$       T → FT'
$E'T'id     id * + id$       F → id
$E'T'       * + id$
$E'T'F*     * + id$          T' → *FT'
$E'T'F      + id$            error
$E'T'       + id$            F has been popped
$E'         + id$
$E'T+       + id$            E' → +TE'
$E'T        id$
$E'T'F      id$              T → FT'
$E'T'id     id$              F → id
$E'T'       $
$E'         $                T' → ε
$           $                E' → ε

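Words of Encouragement

Fan Chi asked about benevolence. The Master said: "Love others."

The Master said: "People's faults are characteristic of their kind. Observe a person's faults, and you will know his benevolence." -- The Analects

The purpose of life is to seek happiness. -- The Dalai Lama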

Bottom-Up Parsing

Construct a parse tree from the leaves to the root using rightmost derivation in reverse

S → a A B e        input: abbcde
A → A b c | b
B → d

[Figure: the parse tree for abbcde built bottom-up in four steps. First the leftmost b is reduced to A, then A b c to A, then d to B, and finally a A B e to S.]

abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S

Handles

A handle of a right-sentential form γ consists of
– a production A → β
– a position of γ where β can be replaced by A to produce the previous right-sentential form in a rightmost derivation of γ

abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S

A → b        A → A b c        B → d        S → a A B e

Handle Pruning

α β w ⇐rm α A w ⇐*rm S

[Figure: a parse tree rooted at S with an interior node A.]

The string to the right of the handle contains only terminals

A is the bottommost leftmost interior node with all its children in the tree

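An Example

[Figure: handle pruning on the parse tree of abbcde. Each step removes the children of the current handle, leaving the frontiers a A b c d e, a A d e, a A B e, and finally S.]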

Shift-Reduce Parsing

[Figure: model of a shift-reduce parser. Labels: Input (ending in $), Stack (with $ at the bottom and the handle on top), Parsing driver, Parsing table, Output.]

Stack Operations

Shift: shift the next input symbol onto the top of the stack

Reduce: replace the handle at the top of the stack with the corresponding nonterminal

Accept: announce successful completion of the parsing

Error: call an error recovery routine

An Example

(S = shift, R = reduce, A = accept)

Action  Stack          Input
S       $              a b b c d e $
S       $ a            b b c d e $
R       $ a b          b c d e $
S       $ a A          b c d e $
S       $ a A b        c d e $
R       $ a A b c      d e $
S       $ a A          d e $
R       $ a A d        e $
S       $ a A B        e $
R       $ a A B e      $
A       $ S            $

Shift/Reduce Conflict

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

Stack                               Input
$ - - - if expr then stmt           else - - - $

Shift ⇒ if expr then stmt else stmt
Reduce ⇒ if expr then stmt

Reduce/Reduce Conflict

stmt → id ( para_list )
stmt → expr := expr
para_list → para_list , para
para_list → para
para → id
expr → id ( expr_list )
expr → id
expr_list → expr_list , expr
expr_list → expr

Stack                    Input
$ - - - id ( id          , id ) - - - $

$ - - - procid ( id      , id ) - - - $

LR(k) Parsing

The L stands for scanning the input from left to right

The R stands for constructing a rightmost derivation in reverse

The k stands for the number of lookahead input symbols used to make parsing decisions

LR Parsing

The LR parsing algorithm

Constructing SLR(1) parsing tables

Constructing LR(1) parsing tables

Constructing LALR(1) parsing tables

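Model of an LR Parser

[Figure: model of an LR parser. The input ends in $. The stack holds S0 X1 S1 ... Xm-1 Sm-1 Xm Sm with state Sm on top and $ at the bottom. The parsing driver consults a parsing table with Action and Goto parts and produces the output.]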

An Example

(1) E → E + T    (2) E → T
(3) T → T * F    (4) T → F
(5) F → ( E )    (6) F → id

State   Action                              Goto
        id    +     *     (     )     $     E    T    F
0       s5                s4                1    2    3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                8    2    3
5             r6    r6          r6    r6
6       s5                s4                     9    3
7       s5                s4                          10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5

An Example

Action  Stack                      Input
s5      $0                         id + id * id $
r6      $0 id5                     + id * id $
r4      $0 F3                      + id * id $
r2      $0 T2                      + id * id $
s6      $0 E1                      + id * id $
s5      $0 E1 +6                   id * id $
r6      $0 E1 +6 id5               * id $
r4      $0 E1 +6 F3                * id $
s7      $0 E1 +6 T9                * id $
s5      $0 E1 +6 T9 *7             id $
r6      $0 E1 +6 T9 *7 id5         $
r3      $0 E1 +6 T9 *7 F10         $
r1      $0 E1 +6 T9                $
acc     $0 E1                      $

LR Parsing Driver

push $s0 onto the stack, where s0 is the initial state
set ip to point to the first symbol of w$;
repeat
  let s be the top state on the stack and a the symbol pointed to by ip;
  if action[s, a] == shift s' then
    push a and s' onto the stack and advance ip
  else if action[s, a] == reduce A → β then
    pop 2 * |β| symbols off the stack;
    s' = goto[top(), A];
    push A and s' onto the stack
  else if action[s, a] == accept then
    return
  else error
until false
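The driver can be exercised against the SLR table of the expression grammar with productions (1) to (6). The sketch below hard-codes that table. Unlike the pseudocode, it keeps only states on the stack, so a reduction pops |β| entries instead of 2 * |β|. The dictionary encoding and the name `lr_parse` are choices made for this illustration.

```python
# Production number -> (left-hand side, length of right-hand side).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

REDUCE_ON = ("+", "*", ")", "$")
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {t: "r4" for t in REDUCE_ON},
    4: {"id": "s5", "(": "s4"},
    5: {t: "r6" for t in REDUCE_ON},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {t: "r3" for t in REDUCE_ON},
    11: {t: "r5" for t in REDUCE_ON},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}

def lr_parse(tokens):
    """Return the production numbers used, i.e. a rightmost derivation in reverse."""
    inp = list(tokens) + ["$"]
    stack = [0]                 # states only; each state implies its grammar symbol
    ip = 0
    reductions = []
    while True:
        act = ACTION[stack[-1]].get(inp[ip])
        if act is None:
            raise SyntaxError(f"unexpected token {inp[ip]!r}")
        if act == "acc":
            return reductions
        if act[0] == "s":       # shift: push the new state and advance
            stack.append(int(act[1:]))
            ip += 1
        else:                   # reduce by production n: pop |β| states, then goto
            n = int(act[1:])
            lhs, length = PRODUCTIONS[n]
            del stack[len(stack) - length:]
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(n)

print(lr_parse("id + id * id".split()))
```

The returned numbers correspond to the reduce actions in the trace for id + id * id.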

LR(0) Items

• An LR(0) item of a grammar G is a production of G with a dot at some position of the right-hand side, A → α • β

• The production A → X Y Z yields the following four LR(0) items

A → • X Y Z, A → X • Y Z, A → X Y • Z, A → X Y Z •

• An LR(0) item represents a state in an NPDA indicating how much of a production we have seen at a given point in the parsing process

From CFG to NPDA

• The state A → α • B β will go to the state B → • γ via an edge of the empty string

• The state A → α • a β will go to the state A → α a • β via an edge of terminal a (a shifting)

• The state A → α • will cause a reduction on seeing a terminal in FOLLOW(A)

• The state A → α • B β will go to the state A → α B • β via an edge of nonterminal B (after a reduction)

An Example

1. E' → E
2. E → E + T    3. E → T
4. T → T * F    5. T → F
6. F → ( E )    7. F → id

Augmented grammar: Easier to identify the accepting state

An Example

[Figure: the NPDA for the augmented expression grammar. Its states are the individual LR(0) items, from E' → • E through F → ( E ) •, connected by ε edges and by edges labeled with the grammar symbols E, T, F, +, *, (, ), and id.]

From NPDA to DPDA

• There are two functions performed on sets of LR(0) items (DPDA states)

• The function closure(I) adds more items to I when there is a dot to the left of a nonterminal (corresponding to ε edges)

• The function goto(I, X) moves the dot past the symbol X in all items in I that contain X (corresponding to non-ε edges)

The Closure Function

function closure(I);
begin
  J := I;
  repeat
    for each item A → α • B β in J and each production B → γ of G
        such that B → • γ is not in J do
      J := J ∪ { B → • γ }
  until no more items can be added to J;
  return J
end

An Example

1. E' → E
2. E → E + T    3. E → T
4. T → T * F    5. T → F
6. F → ( E )    7. F → id

s0 = E' → • E,
I0 = closure({s0}) = { E' → • E, E → • E + T, E → • T, T → • T * F, T → • F, F → • ( E ), F → • id }

The Goto Function

function goto(I, X);
begin
  set J to the empty set
  for any item A → α • X β in I do
    add A → α X • β to J
  return closure(J)
end

An Example

I0 = { E' → • E, E → • E + T, E → • T, T → • T * F, T → • F, F → • ( E ), F → • id }

goto(I0, E) = closure({ E' → E •, E → E • + T }) = { E' → E •, E → E • + T }

Subset Construction

function items(G');
begin
  C := { closure({ S' → • S }) }
  repeat
    for each set of items I in C and each symbol X do
      J := goto(I, X)
      if J is not empty and not in C then
        C := C ∪ { J }
  until no more sets of items can be added to C
  return C
end

An Example

1. E' → E
2. E → E + T    3. E → T
4. T → T * F    5. T → F
6. F → ( E )    7. F → id

I0: E' → • E, E → • E + T, E → • T, T → • T * F, T → • F, F → • ( E ), F → • id
goto(I0, E) = I1: E' → E •, E → E • + T
goto(I0, T) = I2: E → T •, T → T • * F
goto(I0, F) = I3: T → F •
goto(I0, '(') = I4: F → ( • E ), E → • E + T, E → • T, T → • T * F, T → • F, F → • ( E ), F → • id
goto(I0, id) = I5: F → id •
goto(I1, '+') = I6: E → E + • T, T → • T * F, T → • F, F → • ( E ), F → • id
goto(I2, '*') = I7: T → T * • F, F → • ( E ), F → • id
goto(I4, E) = I8: F → ( E • ), E → E • + T
goto(I6, T) = I9: E → E + T •, T → T • * F
goto(I7, F) = I10: T → T * F •
goto(I8, ')') = I11: F → ( E ) •
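The closure, goto, and items functions translate almost line for line into code. The sketch below is one such transcription for the augmented expression grammar. It represents an item as a (production index, dot position) pair, which is an encoding chosen for this example. The states may be discovered in a different order than the I0 to I11 numbering.

```python
# Augmented expression grammar; production 0 is E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted(NONTERMINALS | {s for _, rhs in GRAMMAR for s in rhs})

def closure(items):
    """Add B -> • γ for every item with the dot in front of nonterminal B."""
    result = set(items)
    work = list(items)
    while work:
        p, dot = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)

def goto(items, x):
    """Move the dot past x in every item that has x right after the dot."""
    moved = {(p, dot + 1) for p, dot in items
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == x}
    return closure(moved)

def items():
    """Collect all sets of LR(0) items reachable from the initial set."""
    start = closure({(0, 0)})
    collection = [start]
    seen = {start}
    for state in collection:            # the list grows while we iterate over it
        for x in SYMBOLS:
            target = goto(state, x)
            if target and target not in seen:
                seen.add(target)
                collection.append(target)
    return collection

print(len(items()))
```

The number of sets found can be checked against the I0 to I11 listing.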

An Example

[Figure: the DPDA (LR(0) automaton) for the expression grammar, with states 0 to 11 holding the item sets I0 to I11 and transitions labeled E, T, F, +, *, (, ), and id as given by the goto function.]

SLR(1) Parsing Table Generation

procedure SLR(G');
begin
  for each state I in items(G') do begin
    if A → α • a β in I and goto(I, a) = J for a terminal a then
      action[I, a] = "shift J"
    if A → α • in I and A ≠ S' then
      action[I, a] = "reduce A → α" for all a in FOLLOW(A)
    if S' → S • in I then
      action[I, $] = "accept"
    if A → α • X β in I and goto(I, X) = J for a nonterminal X then
      goto[I, X] = J
  end
  all other entries in action and goto are made error
end

An Example

State   +     *     (     )     id    $     E    T    F
0                   s4          s5          1    2    3
1       s6                            a
2       r3    s7          r3          r3
3       r5    r5          r5          r5
4                   s4          s5          8    2    3
5       r7    r7          r7          r7
6                   s4          s5               9    3
7                   s4          s5                    10
8       s6                s11
9       r2    s7          r2          r2
10      r4    r4          r4          r4
11      r6    r6          r6          r6

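Words of Encouragement

The Master said: "Only the benevolent can truly like others and truly dislike others."

The Master said: "If one sets one's heart on benevolence, one will be free from evil." -- The Analects

The Master said: "People of resolve and benevolence do not seek to live at the expense of benevolence; they may give up their lives to fulfill it."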

LR(1) Items

• An LR(1) item of a grammar G is a pair, ( A → α • β, a ), of an LR(0) item A → α • β and a lookahead symbol a

• The lookahead has no effect in an LR(1) item of the form ( A → α • β, a ), where β is not ε

• An LR(1) item of the form ( A → α •, a ) calls for a reduction by A → α only if the next input symbol is a

The Closure Function

function closure(I);
begin
  J := I;
  repeat
    for each item (A → α • B β, a) in J and each production B → γ of G
        and each b ∈ FIRST(β a) such that (B → • γ, b) is not in J do
      J := J ∪ { (B → • γ, b) }
  until no more items can be added to J;
  return J
end

The Goto Function

function goto(I, X);
begin
  set J to the empty set
  for any item (A → α • X β, a) in I do
    add (A → α X • β, a) to J
  return closure(J)
end

Subset Construction

function items(G');
begin
  C := { closure({ (S' → • S, $) }) }
  repeat
    for each set of items I in C and each symbol X do
      J := goto(I, X)
      if J is not empty and not in C then
        C := C ∪ { J }
  until no more sets of items can be added to C
  return C
end

An Example

1. S' → S
2. S → C C
3. C → c C
4. C → d

An Example

I0: closure({(S' → • S, $)}) = (S' → • S, $) (S → • C C, $) (C → • c C, c/d) (C → • d, c/d)

I1: goto(I0, S) = (S' → S •, $)

I2: goto(I0, C) = (S → C • C, $) (C → • c C, $) (C → • d, $)

I3: goto(I0, c) = (C → c • C, c/d) (C → • c C, c/d) (C → • d, c/d)

I4: goto(I0, d) = (C → d •, c/d)

I5: goto(I2, C) = (S → C C •, $)

An Example

I6: goto(I2, c) = (C → c • C, $) (C → • c C, $) (C → • d, $)

I7: goto(I2, d) = (C → d •, $)

I8: goto(I3, C) = (C → c C •, c/d)

    goto(I3, c) = I3

    goto(I3, d) = I4

I9: goto(I6, C) = (C → c C •, $)

    goto(I6, c) = I6

    goto(I6, d) = I7

LR(1) Parsing Table Generation

procedure LR(G');
begin
  for each state I in items(G') do begin
    if (A → α • a β, b) in I and goto(I, a) = J for a terminal a then
      action[I, a] = "shift J"
    if (A → α •, a) in I and A ≠ S' then
      action[I, a] = "reduce A → α"
    if (S' → S •, $) in I then
      action[I, $] = "accept"
    if (A → α • X β, a) in I and goto(I, X) = J for a nonterminal X then
      goto[I, X] = J
  end
  all other entries in action and goto are made error
end

An Example

State   c     d     $     S    C
0       s3    s4          1    2
1                   a
2       s6    s7               5
3       s3    s4               8
4       r4    r4
5                   r2
6       s6    s7               9
7                   r4
8       r3    r3
9                   r3

The Core of LR(1) Items

• The core of a set of LR(1) items is the set of their first components (i.e., LR(0) items)

• The core of the set of LR(1) items
  { (C → c • C, c/d), (C → • c C, c/d), (C → • d, c/d) }
  is { C → c • C, C → • c C, C → • d }

Merging Cores

I3: { (C → c • C, c/d) (C → • c C, c/d) (C → • d, c/d) }

I4: { (C → d •, c/d) }

I8: { (C → c C •, c/d) }

I6: { (C → c • C, $) (C → • c C, $) (C → • d, $) }

I7: { (C → d •, $) }

I9: { (C → c C •, $) }
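To see the merge numerically, the following sketch builds the LR(1) item sets for the grammar S' → S, S → C C, C → c C, C → d and then groups them by core. It is a simplified illustration. An item is a (production index, dot, lookahead) triple, and `first_of` relies on the assumption that no nonterminal of this grammar derives ε. All function names are made up for the example.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted(NONTERMINALS | {s for _, rhs in GRAMMAR for s in rhs})

def compute_first():
    # FIRST sets of nonterminals; valid because this grammar has no ε-productions.
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

FIRST = compute_first()

def first_of(symbols):
    # FIRST of a non-empty symbol string (no symbol can vanish here).
    head = symbols[0]
    return FIRST[head] if head in NONTERMINALS else {head}

def closure(items):
    result, work = set(items), list(items)
    while work:
        p, dot, a = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for b in first_of(rhs[dot + 1:] + (a,)):      # b ∈ FIRST(β a)
                for q, (lhs, _) in enumerate(GRAMMAR):
                    item = (q, 0, b)
                    if lhs == rhs[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

def goto(items, x):
    moved = {(p, dot + 1, a) for p, dot, a in items
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == x}
    return closure(moved)

def lr1_items():
    start = closure({(0, 0, "$")})
    collection, seen = [start], {start}
    for state in collection:            # the list grows while we iterate over it
        for x in SYMBOLS:
            target = goto(state, x)
            if target and target not in seen:
                seen.add(target)
                collection.append(target)
    return collection

def merge_cores(collection):
    # Union all item sets that share the same core (items without lookaheads).
    merged = {}
    for state in collection:
        core = frozenset((p, dot) for p, dot, _ in state)
        merged.setdefault(core, set()).update(state)
    return list(merged.values())

states = lr1_items()
print(len(states), len(merge_cores(states)))
```

The three pairs with equal cores (I3 and I6, I4 and I7, I8 and I9) collapse into one state each, which is where the LALR(1) state names 36, 47, and 89 come from.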

LALR(1) Parsing Table Generation

procedure LALR(G');
begin
  for each state I in mergeCore(items(G')) do begin
    if (A → α • a β, b) in I and goto(I, a) = J for a terminal a then
      action[I, a] = "shift J"
    if (A → α •, a) in I and A ≠ S' then
      action[I, a] = "reduce A → α"
    if (S' → S •, $) in I then
      action[I, $] = "accept"
    if (A → α • X β, a) in I and goto(I, X) = J for a nonterminal X then
      goto[I, X] = J
  end
  all other entries in action and goto are made error
end

An Example

State   c      d      $     S    C
0       s36    s47          1    2
1                     a
2       s36    s47               5
36      s36    s47               89
47      r4     r4     r4
5                     r2
89      r3     r3     r3

LR Grammars

• A grammar is SLR(1) iff its SLR(1) parsing table has no multiply-defined entries

• A grammar is LR(1) iff its LR(1) parsing table has no multiply-defined entries

• A grammar is LALR(1) iff its LALR(1) parsing table has no multiply-defined entries

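Hierarchy of Grammar Classes

[Figure: nested grammar classes. Grammars are divided into unambiguous and ambiguous grammars. Within the unambiguous grammars, LR(k) contains LL(k) and LR(1); LR(1) contains LALR(1), which contains SLR(1); LL(k) contains LL(1).]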

Hierarchy of Grammar Classes

• Why LL(k) ⊂ LR(k)?

• Why SLR(k) ⊂ LALR(k) ⊂ LR(k)?

LL(k) vs. LR(k)

• For a grammar to be LL(k), we must be able to recognize the use of a production by seeing only the first k symbols of what its right-hand side derives

• For a grammar to be LR(k), we must be able to recognize the use of a production by having seen all of what is derived from its right-hand side with k more symbols of lookahead

LALR(k) vs. LR(k)

• The merge of the sets of LR(1) items having the same core does not introduce shift/reduce conflicts

• Suppose there is a shift/reduce conflict on lookahead a in the merged set because of
  1. (A → α •, a)
  2. (B → β • a γ, b)

• Then some set of items has item (A → α •, a), and since the cores of all sets merged are the same, it must have an item (B → β • a γ, c) for some c

• But then this set has the same shift/reduce conflict on a

LALR(k) vs. LR(k)

• The merge of the sets of LR(1) items having the same core may introduce reduce/reduce conflicts

• As an example, consider the grammar
  1. S' → S
  2. S → a A d | a B e | b A e | b B d
  3. A → c
  4. B → c
  that generates acd, ace, bce, bcd

• The set {(A → c •, d), (B → c •, e)} is valid for acx

• The set {(A → c •, e), (B → c •, d)} is valid for bcx

• But the union {(A → c •, d/e), (B → c •, d/e)} generates a reduce/reduce conflict

SLR(k) vs. LALR(k)

1. S' → S
2. S → L = R
3. S → R
4. L → * R
5. L → id
6. R → L

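SLR(k) vs. LALR(k)

I0: closure({S' → • S}) = S' → • S, S → • L = R, S → • R, L → • * R, L → • id, R → • L

I1: goto(I0, S) = S' → S •

I2: goto(I0, L) = S → L • = R, R → L •

I3: goto(I0, R) = S → R •

I4: goto(I0, *) = L → * • R, R → • L, L → • * R, L → • id

I5: goto(I0, id) = L → id •

FOLLOW(R) = {=, $}

Because = is in FOLLOW(R), the SLR(1) table has a shift/reduce conflict in I2 on =: shift for S → L • = R, or reduce by R → L.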

SLR(k) vs. LALR(k)

I6: goto(I2, =) = S → L = • R, R → • L, L → • * R, L → • id

I7: goto(I4, R) = L → * R •

I8: goto(I4, L) = R → L •

I9: goto(I6, R) = S → L = R •

SLR(k) vs. LALR(k)

I0: closure({(S' → • S, $)}) = (S' → • S, $) (S → • L = R, $) (S → • R, $) (L → • * R, =/$) (L → • id, =/$) (R → • L, $)

I1: goto(I0, S) = (S' → S •, $)

I2: goto(I0, L) = (S → L • = R, $) (R → L •, $)

I3: goto(I0, R) = (S → R •, $)

I4: goto(I0, *) = (L → * • R, =/$) (R → • L, =/$) (L → • * R, =/$) (L → • id, =/$)

I5: goto(I0, id) = (L → id •, =/$)

SLR(k) vs. LALR(k)

I6: goto(I2, =) = (S → L = • R, $) (R → • L, $) (L → • * R, $) (L → • id, $)

I7: goto(I4, R) = (L → * R •, =/$)

I8: goto(I4, L) = (R → L •, =/$)

I9: goto(I6, R) = (S → L = R •, $)

I10: goto(I6, L) = (R → L •, $)

I11: goto(I6, *) = (L → * • R, $) (R → • L, $) (L → • * R, $) (L → • id, $)

I12: goto(I6, id) = (L → id •, $)

I13: goto(I11, R) = (L → * R •, $)

(I11 has the same core as I4, and I12 has the same core as I5.)

Bison – A Parser Generator

A language for specifying parsers and semantic analyzers

[Figure: the Bison workflow. The Bison compiler turns lang.y into lang.tab.c, plus lang.tab.h with the -d option. The C compiler turns lang.tab.c into a.out. At run time, a.out takes tokens and produces a syntax tree.]

Bison Programs

%{
C declarations
%}
Bison declarations
%%
Grammar rules
%%
Additional C code

An Example

line → expr '\n'
expr → expr '+' term | term
term → term '*' factor | factor
factor → '(' expr ')' | DIGIT

An Example

%token DIGIT
%start line
%%
line  : expr '\n'         {printf("line: expr \\n\n");}
      ;
expr  : expr '+' term     {printf("expr: expr + term\n");}
      | term              {printf("expr: term\n");}
      ;
term  : term '*' factor   {printf("term: term * factor\n");}
      | factor            {printf("term: factor\n");}
      ;
factor: '(' expr ')'      {printf("factor: ( expr )\n");}
      | DIGIT             {printf("factor: DIGIT\n");}
      ;

Functions and Variables

• yyparse(): the parser function

• yylex(): the lexical analyzer function. Bison recognizes any non-positive value as indicating the end of the input

• yylval: the attribute value of a token. Its default type is int, and it can be declared to be multiple types in the first section using

  %union {
    int ival;
    double dval;
  }

Conflict Resolutions

• A reduce/reduce conflict is resolved by choosing the production listed first

• A shift/reduce conflict is resolved in favor of shift

• A mechanism for assigning precedences and associativities to terminals

Precedence and Associativity

• The precedence and associativity of operators are declared simultaneously

  %nonassoc '<'    /* lowest */
  %left '+' '-'
  %right '^'       /* highest */

• The precedence of a rule is determined by the precedence of its rightmost terminal

• The precedence of a rule can be modified by adding %prec <terminal> to its right end

An Example

%{
#include <stdio.h>
%}

%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS

%%

An Example

line : expr '\n'
     ;
expr : expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '-' expr %prec UMINUS
     | '(' expr ')'
     | NUMBER
     ;

Error Recovery

• Error recovery is performed via error productions

• An error production is a production containing the predefined terminal error

• After adding an error production,

  A → α B β | α error β

  on encountering an error in the middle of B, the parser pops symbols from its stack until α, shifts error, and skips input tokens until a token in FIRST(β)

Error Recovery

• The parser can report a syntax error by calling the user provided function yyerror(char *)

• The parser will suppress the report of another error message for 3 tokens

• You can resume error report immediately by using the macro yyerrok

• Error productions are used for major nonterminals

An Example

line : expr '\n'
     | error '\n'   {yyerror("reenter last line:");
                     yyerrok;}
     ;
expr : expr '+' expr
     | expr '*' expr
     | '-' expr %prec UMINUS
     | '(' expr ')'
     | NUMBER
     ;

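Words of Encouragement

The Master said: "It is best to live where benevolence dwells. If one chooses not to dwell in benevolence, how can one be called wise?"

The Master said: "Having heard the Way in the morning, one may die content in the evening." -- The Analects

The Master said: "Those without benevolence cannot long endure hardship, nor can they long enjoy comfort. The benevolent rest in benevolence; the wise profit from it."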