+ All Categories
Home > Documents > The Parser

The Parser

Date post: 23-Mar-2016
Category:
Upload: audra
View: 89 times
Download: 6 times
Share this document with a friend
Description:
The Parser. Its job: Check and verify syntax based on specified syntax rules Report errors Build IR Good news the process can be automated. Parsing. The scanner recognizes words The parser recognizes syntactic units Can we use regular expressions? - PowerPoint PPT Presentation
32
1 The Parser Its job: Check and verify syntax based on specified syntax rules Report errors Build IR Good news the process can be automated
Transcript
Page 1: The Parser

1

The Parser• Its job:

– Check and verify syntax based on specified syntax rules

– Report errors– Build IR

• Good news– the process can be automated

Page 2: The Parser

2

Parsing• The scanner recognizes words • The parser recognizes syntactic units• Can we use regular expressions?

– No. How would we recognize L={pkqk} then?• We use Context-Free Grammars (CFGs)

to specify context-free syntax.• Coming up:

– CFGs– Top-down parsing– Bottom-up parsing

Page 3: The Parser

3

CFGs• A CFG describes how a sentence of a

language may be generated.• Example:

• Use this grammar to generate the sentence mwa ha ha ha

EvilLaugh mwa EvilCackleEvilCackle ha EvilCackleEvilCackle ha

Page 4: The Parser

4

CFGs• A CFG is a quadruple (N, T, R, S) where

– N is the set of non-terminal symbols– T is the set of terminal symbols– S N is the starting symbol– R N(NT)* is a set of rules

• Example: Consider G = (N, T, R, S) where – N = {S}– T ={ (, ) }– R ={ S (S) , SSS, S }

Page 5: The Parser

5

Derivations• The language described by a CFG is the set of

strings that can be derived from the start symbol using the rules of the grammar.

• At each step, we choose a non-terminal to replace.

S (S) (SS) ((S)S) (( )S) (( )(S)) (( )((S))) (( )(( )))

derivation sentential form

This example demonstrates a leftmost derivation : one where we always expand the leftmost non-terminal in the sentential form.

Page 6: The Parser

6

Derivations and parse trees• We can describe a derivation using a

graphical representation called parse tree:– the root is labeled with the start symbol, S– each internal node is labeled with a non-

terminal– the children of an internal node A are the right-

hand side of a production A– each leaf is labeled with a terminal

• A parse tree has a unique leftmost and a unique rightmost derivation (however, we cannot tell which one was used by looking at the tree)

Page 7: The Parser

7

Derivations and parse trees• So, how can we use the grammar

described earlier to verify the syntax of "(( )((( ))))"?– We must try to find a derivation for that string.– We can work top-down (starting at the

root/start symbol) or bottom-up (starting at the leaves).

• Careful! – There may be more than one grammars to

describe the same language. – Not all grammars are suitable

Page 8: The Parser

8

Problems in parsing• Consider S if E then S else S | if E

then S– What is the parse tree for

if E1 then S1 else if E2 then S2 else S3

– There are two possible parse trees! This problem is called ambiguity

• A CFG is ambiguous if one or more terminal strings have multiple leftmost derivations from the start symbol.

Page 9: The Parser

9

Ambiguity• There is no general algorithm to tell

whether a CFG is ambiguous or not.• There is no standard procedure for

eliminating ambiguity.• Some languages are inherently

ambiguous.– In those cases, any grammar we come up

with will be ambiguous.

Page 10: The Parser

10

Ambiguity• In general, we try to eliminate

ambiguity by rewriting the grammar.• Example:

– EE+E | EE | id becomes:

EE+T | T TTF | F F id

Page 11: The Parser

11

Ambiguity• In general, we try to eliminate

ambiguity by rewriting the grammar.• Example:

– Sif E then S else S | if E then S | other becomes:

S EwithElse | EnoElseEwithElse if E then EwithElse else EwithElse | otherEnoElse if E then S

| if E then EwithElse else EnoElse

Page 12: The Parser

12

Top-down parsing• Main idea:

– Start at the root, grow towards leaves– Pick a production and try to match input– May need to backtrack– Some grammars are backtrack-free

• Example:– Use the expression grammar to parse x-2*y

Page 13: The Parser

13

Grammar problems• Because we try to generate a leftmost

derivation by scanning the input from left to right, grammars of the form A A x may cause endless recursion.

• Such grammars are called left-recursive and they must be transformed if we want to use a top-down parser.

Page 14: The Parser

14

Left recursion• A grammar is left recursive if for a non-

terminal A, there is a derivation A+ A

• There are three types of left recursion:– direct (A A x)– indirect (A B C, B A )– hidden (A B A, B )

Page 15: The Parser

15

Left recursion• To eliminate direct left recursion replace

A A1 | A2 | ... | Am | 1 | 2 | ... | n

with

A 1B | 2B | ... | nB B 1B | 2B | ... | mB |

Page 16: The Parser

16

Left recursion• How about this:

S EE E+TE TT E-TT id

There is direct recursion: EE+TThere is indirect recursion: TE+T, ET

Algorithm for eliminating indirect recursionList the nonterminals in some order A1, A2, ...,Anfor i=1 to n for j=1 to i-1 if there is a production AiAj, replace Aj with its rhs eliminate any direct left recursion on Ai

Page 17: The Parser

17

Eliminating indirect left recursion

S EE E+TE TT E-TT FF E*FF id

i=Sordering: S, E, T, FS EE E+TE TT E-TT FF E*FF id

i=ES EE TE'E'+TE'|T E-TT FF E*FF id

i=T, j=ES EE TE'E'+TE'|T TE'-TT FF E*FF id

S EE TE'E'+TE'|T FT'T' E'-TT'|F E*FF id

Page 18: The Parser

18

Eliminating indirect left recursioni=F, j=E

S EE TE'E'+TE'|T FT'T' E'-TT'|F TE'*FF id

i=F, j=TS EE TE'E'+TE'|T FT'T' E'-TT'|F FT'E'*FF id

S EE TE'E'+TE'|T FT'T' E'-TT'|F idF'F' T'E'*FF'|

Page 19: The Parser

19

Grammar problems• Consider S if E then S else S | if E

then S– Which of the two productions should we use

to expand non-terminal S when the next token is if?

– We can solve this problem by factoring out the common part in these rules. This way, we are postponing the decision about which rule to choose until we have more information (namely, whether there is an else or not).

– This is called left factoring

Page 20: The Parser

20

Left factoring

A 1 | 2 |...| n | becomes

A B| B 1 | 2 |...| n

Page 21: The Parser

21

Grammar problems• A symbol XV is useless if

– there is no derivation from X to any string in the language (non-terminating)

– there is no derivation from S that reaches a sentential form containing X (non-reachable)

• Reduced grammar = a grammar that does not contain any useless symbols.

Page 22: The Parser

22

Useless symbols• In order to remove useless symbols,

apply two algorithms:– First, remove all non-terminating symbols– Then, remove all non-reachable symbols.

• The order is important!– For example, consider S + X where

contains a non-terminating symbol. What will happen if we apply the algorithms in the wrong order?

• Concrete example: S AB | a, A a

Page 23: The Parser

23

Useless symbols• Example

Initial grammar:

S AB | CA

A a

B CB | AB

C cB | b

D aD | d

Algorithm 1 (terminating symbols):

A is in because of A a

C is in because of C b

D is in because of D d

S is in because A, C are in and S AC

Page 24: The Parser

24

Useless symbols• Example continued

After algorithm 1:

S CA

A a

C b

D aD | d

Algorithm 2 (reachable symbols):

S is in because it is the start symbol

C and A are in because S is in and S CA

Final grammar:

S CA

A a

C b

Page 25: The Parser

25

Predictive parsing• Basic idea:

– Given A | the parser should be able to choose between and

• How?– What if we do some "preprocessing" to

answer the question: Given a non-terminal A and lookahead t, which (if any) production of A is guaranteed to start with a t?

Page 26: The Parser

26

Predictive parsing• If we have two productions: A , we

want a distinct way of choosing the correct one.

• Define: – for G, x FIRST() iff * x

• If FIRST() and FIRST() contain no common symbols, we will know whether we should choose A or A by looking at the lookahead symbol.– Almost right...

Page 27: The Parser

27

Predictive parsing• Compute FIRST(X) as follows:

– if X is a terminal, then FIRST(X)={X}– if X is a production, then add to FIRST(X)– if X is a non-terminal and XY1Y2...Yn is a

production, add FIRST(Yi) to FIRST(X) if the preceding Yjs contain in their FIRSTs

Page 28: The Parser

28

Predictive parsing• What if we have a "candidate"

production A where = or *? • We could expand if we knew that there

is some sentential form where the current input symbol appears after A.

• Define: – for AN, xFOLLOW(A) iff S*Ax

Page 29: The Parser

29

Predictive parsing• Compute FOLLOW as follows:

– FOLLOW(S) contains EOF– For productions AB, everything in

FIRST() except goes into FOLLOW(B)– For productions AB or AB where

FIRST() contains , FOLLOW(B) contains everything that is in FOLLOW(A)

Page 30: The Parser

30

Predictive parsing• Armed with

– FIRST– FOLLOW

• we can build parser where no backtracking is required!

Page 31: The Parser

31

Predictive parsing (w/table)• For each production A do:

– For each terminal a FIRST() add A to entry M[A,a]

– If FIRST(), add A to entry M[A,b] for each terminal b FOLLOW(A).

– If FIRST() and EOFFOLLOW(A), add A to M[A,EOF]

• Use table and stack to simulate recursion.

Page 32: The Parser

32

Recursive Descent Parsing• Basic idea:

– Write a routine to recognize each lhs– This produces a parser with mutually

recursive routines.– Good for hand-coded parsers.


Recommended