
1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations:...

Date post: 14-Dec-2015
Upload: alex-mesman

Parsing

The scanner recognizes words. The parser recognizes syntactic units.

Parser operations:
Check and verify syntax based on specified syntax rules
Report errors
Build IR

Automation: the process can be automated.


Parsing

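Check and verify syntax based on specified syntax rules. Are regular expressions sufficient for describing syntax?

Example 1: infix expressions. Example 2: nested parentheses. Both allow nesting to any depth. Matching each ( with its ) therefore requires unbounded counting, which regular expressions (finite automata) cannot do, so regular expressions are not sufficient.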

We use Context-Free Grammars (CFGs) to specify context-free syntax.

A CFG describes how a sentence of a language may be generated.

Example:

Use this grammar to generate the sentence mwa ha ha ha!

EvilLaugh → mwa EvilCackle
EvilCackle → ha EvilCackle
EvilCackle → ha!
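The derivation can be replayed mechanically. The Python sketch below is our own illustration and does not come from the slides. The dictionary encoding and the helper `derive` are assumptions. At each step the sketch rewrites the leftmost non-terminal, using the alternative selected by an index in `choices`.

```python
# The EvilLaugh grammar: each non-terminal maps to its list of alternatives.
RULES = {
    "EvilLaugh": [["mwa", "EvilCackle"]],
    "EvilCackle": [["ha", "EvilCackle"], ["ha!"]],
}

def derive(start, choices):
    """Apply one rule per entry in `choices` (an index into the alternatives
    of the leftmost non-terminal) and return every sentential form."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # position of the leftmost non-terminal in the current form
        i = next(k for k, sym in enumerate(form) if sym in RULES)
        form = form[:i] + RULES[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

for step in derive("EvilLaugh", [0, 0, 0, 1]):
    print(step)
```

With the choices [0, 0, 0, 1], the sketch prints EvilLaugh, then mwa EvilCackle, mwa ha EvilCackle, mwa ha ha EvilCackle, and finally the sentence mwa ha ha ha!.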


CFGs

A CFG is a quadruple (N, T, R, S) where
N is the set of non-terminal symbols
T is the set of terminal symbols
S ∈ N is the start symbol
R ⊆ N × (N ∪ T)* is a set of rules

Example: the grammar of nested parentheses is G = (N, T, R, S) where N = {S}, T = { (, ) }, R = { S → (S), S → SS, S → ε }
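Below is a minimal Python sketch of the definition, with the nested-parentheses grammar written as data. Three parts of it are our own choices: the encoding (each rule is a pair of a left-hand side and a right-hand-side tuple, and ε is the empty tuple), the helper `is_cfg`, and the check that N and T are disjoint. That last check is the usual convention, although the slide does not state it.

```python
# G = (N, T, R, S) for nested parentheses.
N = {"S"}
T = {"(", ")"}
R = [
    ("S", ("(", "S", ")")),  # S → (S)
    ("S", ("S", "S")),       # S → SS
    ("S", ()),               # S → ε (empty right-hand side)
]
START = "S"

def is_cfg(n, t, r, s):
    """Check the shape required by the definition: N and T disjoint,
    S ∈ N, and every rule in N × (N ∪ T)*."""
    symbols = n | t
    return (
        not (n & t)
        and s in n
        and all(lhs in n and all(x in symbols for x in rhs) for lhs, rhs in r)
    )

print(is_cfg(N, T, R, START))                    # True
print(is_cfg(N, T, R + [("(", ("S",))], START))  # False: terminal on the left
```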


Derivations

The language described by a CFG is the set of strings that can be derived from the start symbol using the rules of the grammar.

At each step, we choose a non-terminal to replace.

S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (( )S) ⇒ (( )(S)) ⇒ (( )((S))) ⇒ (( )(( )))

The whole chain is a derivation. Each string in it is a sentential form.

This example demonstrates a leftmost derivation: one where we always expand the leftmost non-terminal in the sentential form.
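The Python sketch below checks the parentheses derivation step by step. It is our own code. It assumes that every grammar symbol is a single character, which holds for this grammar, and it writes ε as the empty string. To count as a leftmost step, each step must rewrite the leftmost non-terminal using one of the rules.

```python
# Rules of the nested-parentheses grammar; "" is the empty string ε.
RULES = {"S": ["(S)", "SS", ""]}

def is_leftmost_step(before, after):
    """True if `after` results from rewriting the leftmost
    non-terminal of `before` with one of its rules."""
    positions = [i for i, ch in enumerate(before) if ch in RULES]
    if not positions:
        return False  # a sentence has nothing left to expand
    i = positions[0]
    return any(before[:i] + rhs + before[i + 1:] == after
               for rhs in RULES[before[i]])

# The derivation of (( )(( ))), with the spaces removed.
steps = ["S", "(S)", "(SS)", "((S)S)", "(()S)", "(()(S))", "(()((S)))", "(()(()))"]
print(all(is_leftmost_step(a, b) for a, b in zip(steps, steps[1:])))  # True
print(is_leftmost_step("(SS)", "(S(S))"))  # False: the rightmost S was expanded
```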


Derivations and parse trees

We can describe a derivation using a graphical representation called a parse tree:
the root is labeled with the start symbol, S
each internal node is labeled with a non-terminal
the children of an internal node A are the right-hand side of a production A → α
each leaf is labeled with a terminal

A parse tree has a unique leftmost and a unique rightmost derivation (however, we cannot tell which one was used by looking at the tree)
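The Python sketch below makes this point concrete. It is our own code, and `Node`, `derivation` and `paren` are hypothetical names. It builds the parse tree of ()() and reads two derivations off the same tree. The first derivation always expands the leftmost non-terminal node, and the second always expands the rightmost one. Both derivations use the same five productions, only in a different order.

```python
from dataclasses import dataclass, field

NONTERMINALS = {"S"}

@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)  # list of Node

def derivation(root, leftmost=True):
    """Read a derivation off a parse tree by repeatedly replacing the
    leftmost (or rightmost) non-terminal node with its children."""
    form = [root]
    steps = ["".join(n.symbol for n in form)]
    while True:
        open_nodes = [i for i, n in enumerate(form) if n.symbol in NONTERMINALS]
        if not open_nodes:
            return steps
        i = open_nodes[0] if leftmost else open_nodes[-1]
        form = form[:i] + form[i].children + form[i + 1:]
        steps.append("".join(n.symbol for n in form))

def paren(inner):
    """Node for the production S → (S)."""
    return Node("S", [Node("("), inner, Node(")")])

# Parse tree for ()(): S → SS, each S → (S), each inner S → ε.
# An S node with no children stands for S → ε (usually drawn with an ε leaf).
tree = Node("S", [paren(Node("S")), paren(Node("S"))])

print(derivation(tree, leftmost=True))
print(derivation(tree, leftmost=False))
```

The first call returns S, SS, (S)S, ()S, ()(S), ()(). The second call returns S, SS, S(S), S(), (S)(), ()(). These are two different derivations with a single parse tree.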


Derivations and parse trees

So, how can we use the grammar described earlier to verify the syntax of "(( )((( ))))"? We must try to find a derivation for that string. We can work top-down (starting at the root/start symbol) or bottom-up (starting at the leaves).

Careful! There may be more than one grammar describing the same language, and not all of them are suitable for parsing.
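Here is a minimal top-down sketch in Python (our own code). The grammar above is a poor fit for this approach: S → SS is left-recursive, and the grammar is ambiguous. The sketch therefore swaps in S → (S)S | ε. This grammar is our assumption rather than the slides' choice, but it generates the same language of balanced parentheses. With it, one character of lookahead is enough: choose S → (S)S when the next character is (, and S → ε otherwise.

```python
def parse_s(s, pos):
    """Recursive descent for S → ( S ) S | ε; returns the position after S."""
    if pos < len(s) and s[pos] == "(":
        pos = parse_s(s, pos + 1)              # inner S
        if pos >= len(s) or s[pos] != ")":
            raise SyntaxError(f"expected ')' at position {pos}")
        return parse_s(s, pos + 1)             # trailing S
    return pos                                 # S → ε

def accepts(text):
    s = text.replace(" ", "")                  # the slides space out "( )"
    try:
        return parse_s(s, 0) == len(s)
    except SyntaxError:
        return False

print(accepts("(( )((( ))))"))  # True
print(accepts("(( )((( )))"))   # False: one ')' is missing
```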


Problems in parsing

Consider S → if E then S else S | if E then S. What is the parse tree for

if E then if E then S else S

There are two possible parse trees! This problem is called ambiguity.

A CFG is ambiguous if one or more terminal strings have multiple leftmost derivations from the start symbol.

Parse tree 1 (else belongs to the inner if): S → if E then S, where that S → if E then S else S
Parse tree 2 (else belongs to the outer if): S → if E then S else S, where the first inner S → if E then S


Ambiguity

There is no general algorithm to tell whether a CFG is ambiguous or not.

There is no standard procedure for eliminating ambiguity.

Some languages are inherently ambiguous. In those cases, any grammar we come up with will be ambiguous.


Ambiguity

In general, we try to eliminate ambiguity by rewriting the grammar.

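Example: E → E+E | E*E | id becomes:

E → E+T | T
T → T*F | F
F → id

The new non-terminals give * higher precedence than + and make both operators left-associative, so each string has exactly one parse tree.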
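The Python sketch below (our own code) shows the rewritten grammar producing a single tree for each expression. It assumes the input is already split into a list of the tokens "id", "+" and "*". A recursive-descent procedure cannot begin E → E+T by calling itself without recursing forever. The sketch therefore reads each left-recursive rule as a loop: E → E+T | T becomes a T followed by zero or more + T. Building the tree left to right in this way keeps the left associativity that the grammar specifies. This is a common implementation technique, not a method taken from the slides.

```python
def parse_expression(tokens):
    """Parse E → E+T | T, T → T*F | F, F → id into nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_f():                      # F → id
        nonlocal pos
        if peek() != "id":
            raise SyntaxError(f"expected id at position {pos}")
        pos += 1
        return "id"

    def parse_t():                      # T → T*F | F, read as F (* F)*
        nonlocal pos
        node = parse_f()
        while peek() == "*":
            pos += 1
            node = ("*", node, parse_f())
        return node

    def parse_e():                      # E → E+T | T, read as T (+ T)*
        nonlocal pos
        node = parse_t()
        while peek() == "+":
            pos += 1
            node = ("+", node, parse_t())
        return node

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree

print(parse_expression(["id", "+", "id", "*", "id"]))
# ('+', 'id', ('*', 'id', 'id')): * binds tighter than +
```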


Ambiguity

In general, we try to eliminate ambiguity by rewriting the grammar.

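Example: S → if E then S else S | if E then S | other

becomes:

S → EwithElse | EnoElse

EwithElse → if E then EwithElse else EwithElse | other
EnoElse → if E then S | if E then EwithElse else EnoElse

The branch before any else must be an EwithElse, which contains no unmatched then. As a result, each else is matched with the closest unmatched then.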
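The Python sketch below (our own code) checks the claim by counting parse trees. It uses memoized splitting of the token span, and `count_parses` is a hypothetical helper name. The sketch makes two further assumptions. First, the bare S leaves of the earlier example are written as the terminal other, and E is treated as a single terminal token. Second, the grammar has no ε-rules and no cycles, so every symbol covers at least one token. The original grammar gives two trees for the dangling-else sentence, and the rewritten grammar gives one.

```python
from functools import lru_cache

def count_parses(grammar, start, tokens):
    """Count parse trees for `tokens`; symbols not in `grammar` are terminals.
    Assumes no ε-rules and no cycles A ⇒+ A, so the recursion terminates."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):            # trees for symbol over tokens[i:j]
        if symbol not in grammar:
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(count_seq(tuple(rhs), i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):           # ways to split tokens[i:j] over rhs
        if not rhs:
            return 1 if i == j else 0
        return sum(count(rhs[0], i, k) * count_seq(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return count(start, 0, len(tokens))

AMBIGUOUS = {"S": [["if", "E", "then", "S", "else", "S"],
                   ["if", "E", "then", "S"], ["other"]]}
REWRITTEN = {
    "S": [["EwithElse"], ["EnoElse"]],
    "EwithElse": [["if", "E", "then", "EwithElse", "else", "EwithElse"], ["other"]],
    "EnoElse": [["if", "E", "then", "S"],
                ["if", "E", "then", "EwithElse", "else", "EnoElse"]],
}
sentence = "if E then if E then other else other".split()
print(count_parses(AMBIGUOUS, "S", sentence))  # 2
print(count_parses(REWRITTEN, "S", sentence))  # 1
```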


Top-down parsing

Main idea:
Start at the root, grow towards the leaves
Pick a production and try to match the input
May need to backtrack

Example: use the expression grammar to parse x-2*y
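The slide does not spell out which expression grammar it means here. The Python sketch below (our own code) therefore assumes a right-recursive version: E → T + E | T - E | T, T → F * T | F, F → id | num. With the left-recursive form, this naive top-down search would never terminate. One side effect is that - becomes right-associative in the tree, but recognizing the input is unaffected. Each generator yields every position where its symbol can end. Trying the alternatives in order, and resuming a generator when a later match fails, implements the backtracking.

```python
# Assumed grammar (right-recursive, so top-down search terminates):
#   E → T + E | T - E | T     T → F * T | F     F → id | num
GRAMMAR = {
    "E": [["T", "+", "E"], ["T", "-", "E"], ["T"]],
    "T": [["F", "*", "T"], ["F"]],
    "F": [["id"], ["num"]],
}

def matches(terminal, token):
    if terminal == "id":
        return token.isidentifier()
    if terminal == "num":
        return token.isdigit()
    return token == terminal

def parse(symbol, tokens, pos):
    """Yield every end position of a derivation of `symbol` starting at `pos`."""
    if symbol not in GRAMMAR:
        if pos < len(tokens) and matches(symbol, tokens[pos]):
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:          # try each production in order
        yield from parse_seq(rhs, tokens, pos)

def parse_seq(rhs, tokens, pos):
    if not rhs:
        yield pos
        return
    for mid in parse(rhs[0], tokens, pos):   # on failure, back up and retry
        yield from parse_seq(rhs[1:], tokens, mid)

def recognize(tokens):
    return any(end == len(tokens) for end in parse("E", tokens, 0))

print(recognize(["x", "-", "2", "*", "y"]))  # True
print(recognize(["x", "-", "*", "y"]))       # False
```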


Grammar problems

Because we try to generate a leftmost derivation by scanning the input from left to right, grammars with rules of the form A → Ax may cause endless recursion. Expanding A produces A again as the leftmost symbol, without consuming any input.

Such grammars are called left-recursive, and they must be transformed if we want to use a top-down parser.
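The deliberately naive Python sketch below (our own code) shows the problem concretely. It is a top-down procedure for the left-recursive rule E → E + id | id that tries the first alternative first. It calls itself at the same input position before consuming any token, so the recursion never bottoms out, and Python eventually stops it with a RecursionError.

```python
def parse_e(tokens, pos):
    """Naive top-down procedure for E → E + id | id, trying E → E + id first."""
    # Expanding E → E + id means parsing E first, at the same position and
    # without consuming input, so this call repeats until the stack overflows.
    end = parse_e(tokens, pos)
    if tokens[end:end + 2] == ["+", "id"]:
        return end + 2
    # The alternative E → id would be tried here, but control never gets this far.
    if tokens[pos:pos + 1] == ["id"]:
        return pos + 1
    raise SyntaxError("no alternative matched")

def loops_forever(tokens):
    try:
        parse_e(tokens, 0)
    except RecursionError:
        return True
    return False

print(loops_forever(["id", "+", "id"]))  # True
```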


Left recursion

A grammar is left recursive if, for some non-terminal A, there is a derivation A ⇒+ Aα.

There are three types of left recursion: direct (A → Ax), indirect (A → BC, B → A), and hidden (A → BA, B → ε).
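The Python sketch below (our own code) detects all three kinds of left recursion. It first computes which non-terminals can derive ε. It then draws an edge A → X whenever X can appear leftmost after applying one rule for A, skipping over any nullable symbols in front of X. A non-terminal is left-recursive if it can reach itself in this graph. The example grammars follow the three patterns above. We added the terminating alternatives (y, a, b) so that the examples are complete grammars.

```python
def nullable_symbols(grammar):
    """Non-terminals that can derive ε (fixed-point iteration)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            if a not in nullable and any(all(x in nullable for x in rhs) for rhs in alts):
                nullable.add(a)
                changed = True
    return nullable

def left_recursive(grammar):
    """Return the non-terminals A with A ⇒+ Aα (direct, indirect or hidden)."""
    nullable = nullable_symbols(grammar)
    # Edge A → X when some rule A → Y1 ... Yk X ... has every Yi nullable.
    edges = {a: set() for a in grammar}
    for a, alts in grammar.items():
        for rhs in alts:
            for x in rhs:
                if x in grammar:
                    edges[a].add(x)
                if x not in nullable:   # terminals are never nullable
                    break

    def reaches(start, target):
        stack, seen = list(edges[start]), set()
        while stack:
            n = stack.pop()
            if n == target:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(edges[n])
        return False

    return {a for a in grammar if reaches(a, a)}

direct = {"A": [["A", "x"], ["y"]]}
indirect = {"A": [["B", "C"], ["a"]], "B": [["A"], ["b"]], "C": [["c"]]}
hidden = {"A": [["B", "A"], ["a"]], "B": [[]]}   # [] is ε
print(left_recursive(direct), left_recursive(indirect), left_recursive(hidden))
```

The sketch reports A for the direct and hidden cases, and both A and B for the indirect case.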


Left recursion

To eliminate direct left recursion, replace

A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn   (where no βi starts with A)

with

A → β1B | β2B | ... | βnB

B → α1B | α2B | ... | αmB | ε
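The rewrite above translates directly into code. In this Python sketch (our own code), grammars are dictionaries of alternatives and an empty list stands for ε. The new non-terminal is named A' instead of B. That naming scheme is our assumption, chosen to match the E' used in the later example.

```python
def eliminate_direct(grammar, a):
    """Return a copy of `grammar` with direct left recursion on `a` removed:
    A → Aα1 | ... | Aαm | β1 | ... | βn  becomes
    A → β1A' | ... | βnA'   and   A' → α1A' | ... | αmA' | ε."""
    alphas = [rhs[1:] for rhs in grammar[a] if rhs[:1] == [a]]
    if not alphas:
        return dict(grammar)                  # nothing to do
    betas = [rhs for rhs in grammar[a] if rhs[:1] != [a]]
    b = a + "'"                               # hypothetical naming scheme
    result = dict(grammar)
    result[a] = [beta + [b] for beta in betas]
    result[b] = [alpha + [b] for alpha in alphas] + [[]]   # [] is ε
    return result

g = {"E": [["E", "+", "T"], ["T"]], "T": [["id"]]}
h = eliminate_direct(g, "E")
print(h["E"])    # [['T', "E'"]]
print(h["E'"])   # [['+', 'T', "E'"], []]
```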


Left recursion

How about this: S → E, E → E+T, E → T, T → E-T, T → id

There is direct recursion: E → E+T. There is indirect recursion: T → E-T, E → T.

Algorithm for eliminating indirect recursion:
List the non-terminals in some order A1, A2, ..., An
for i = 1 to n
    for j = 1 to i-1
        if there is a production Ai → Ajα, replace Aj with each of its right-hand sides
    eliminate any direct left recursion on Ai
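Below is a Python sketch of the algorithm above (our own code). It carries its own small in-place helper for direct left recursion, and it names new non-terminals with a prime (E', T'), which is our assumption. It is applied to the slide's grammar with the ordering S, E, T. The textbook version of this algorithm assumes that the input has no cycles and no ε-productions, and the example satisfies both conditions.

```python
def eliminate_direct(g, a):
    """In place: A → Aα | β  becomes  A → βA', A' → αA' | ε."""
    alphas = [rhs[1:] for rhs in g[a] if rhs[:1] == [a]]
    if alphas:
        betas = [rhs for rhs in g[a] if rhs[:1] != [a]]
        new = a + "'"                                    # hypothetical naming scheme
        g[a] = [beta + [new] for beta in betas]
        g[new] = [alpha + [new] for alpha in alphas] + [[]]  # [] is ε

def eliminate_left_recursion(grammar, order):
    """For i = 1..n: substitute each earlier Aj into rules Ai → Aj γ,
    then remove direct left recursion on Ai."""
    g = {a: [list(rhs) for rhs in alts] for a, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            new_alts = []
            for rhs in g[ai]:
                if rhs[:1] == [aj]:
                    # Ai → Aj γ becomes Ai → δ γ for every rule Aj → δ
                    new_alts.extend(delta + rhs[1:] for delta in g[aj])
                else:
                    new_alts.append(rhs)
            g[ai] = new_alts
        eliminate_direct(g, ai)
    return g

# Grammar from the slide: S → E, E → E+T | T, T → E-T | id
G = {"S": [["E"]], "E": [["E", "+", "T"], ["T"]], "T": [["E", "-", "T"], ["id"]]}
for lhs, alts in eliminate_left_recursion(G, ["S", "E", "T"]).items():
    print(lhs, "→", " | ".join(" ".join(rhs) or "ε" for rhs in alts))
```

The sketch prints S → E, E → T E', T → id T', E' → + T E' | ε, and T' → E' - T T' | ε.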


Eliminating indirect left recursion

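S → E, E → E+T, E → T, T → E-T, T → F, F → E*F, F → id

Ordering: S, E, T, F

i = S (no change):
S → E, E → E+T, E → T, T → E-T, T → F, F → E*F, F → id

i = E (eliminate direct left recursion on E):
S → E, E → TE', E' → +TE' | ε, T → E-T, T → F, F → E*F, F → id

i = T, j = E (replace E in T → E-T with its right-hand side):
S → E, E → TE', E' → +TE' | ε, T → TE'-T, T → F, F → E*F, F → id

Eliminate direct left recursion on T:
S → E, E → TE', E' → +TE' | ε, T → FT', T' → E'-TT' | ε, F → E*F, F → id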


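Eliminating indirect left recursion

i = F, j = E (replace E in F → E*F):
S → E, E → TE', E' → +TE' | ε, T → FT', T' → E'-TT' | ε, F → TE'*F, F → id

i = F, j = T (replace T in F → TE'*F):
S → E, E → TE', E' → +TE' | ε, T → FT', T' → E'-TT' | ε, F → FT'E'*F, F → id

Eliminate direct left recursion on F:
S → E, E → TE', E' → +TE' | ε, T → FT', T' → E'-TT' | ε, F → idF', F' → T'E'*FF' | ε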


Grammar problems

Consider S → if E then S else S | if E then S. Which of the two productions should we use to expand non-terminal S when the next token is if? We can solve this problem by factoring out the common part of these rules. This way, we postpone the decision about which rule to choose until we have more information (namely, whether there is an else or not).

This is called left factoring.


Left factoring

A → αβ1 | αβ2 | ... | αβn | γ

becomes

A → αB | γ
B → β1 | β2 | ... | βn
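Below is a Python sketch of one round of left factoring (our own code). It groups the alternatives of a non-terminal by their first symbol. Each group with more than one alternative then becomes A → αB, with B → β1 | ... | βn, where α is the longest common prefix of the group. The new names (S', S'', ...) are a hypothetical scheme, and an empty list stands for ε. A grammar may need several rounds before no two alternatives share a prefix.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of symbol lists."""
    prefix = []
    for column in zip(*seqs):
        if any(sym != column[0] for sym in column):
            break
        prefix.append(column[0])
    return prefix

def left_factor(grammar, a):
    """One round of left factoring for non-terminal `a`."""
    groups = {}
    for rhs in grammar[a]:
        groups.setdefault(tuple(rhs[:1]), []).append(rhs)
    result = dict(grammar)
    new_alts = []
    count = 0
    for first, alts in groups.items():
        if len(alts) < 2 or not first:
            new_alts.extend(alts)           # nothing to factor
            continue
        alpha = common_prefix(alts)
        count += 1
        b = a + "'" * count                 # hypothetical naming scheme
        new_alts.append(alpha + [b])        # A → αB
        result[b] = [rhs[len(alpha):] for rhs in alts]   # B → β1 | ... ; [] is ε
    result[a] = new_alts
    return result

g = {"S": [["if", "E", "then", "S", "else", "S"], ["if", "E", "then", "S"], ["other"]]}
h = left_factor(g, "S")
print(h["S"])   # [['if', 'E', 'then', 'S', "S'"], ['other']]
print(h["S'"])  # [['else', 'S'], []]
```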


Grammar problems

A symbol X ∈ V (where V = N ∪ T) is useless if
there is no derivation from X to any string of terminals (non-terminating), or
there is no derivation from S that reaches a sentential form containing X (non-reachable).

Reduced grammar = a grammar that does not contain any useless symbols.


Useless symbols

In order to remove useless symbols, apply two algorithms: first, remove all non-terminating symbols; then, remove all non-reachable symbols.

The order is important! For example, consider S ⇒+ αXβ where α contains a non-terminating symbol. What will happen if we apply the algorithms in the wrong order?

Concrete example: S → AB | a, A → a
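The Python sketch below (our own code) runs both orders on the concrete example. B has no rules there, so the sketch lists it with an empty list of alternatives so that it counts as a non-terminal. In the wrong order, A is still reachable through S → AB when the reachability pass runs. That rule is deleted afterwards, which leaves the useless A behind.

```python
def terminating(grammar):
    """Non-terminals that derive some string of terminals (fixed point)."""
    done, changed = set(), True
    while changed:
        changed = False
        for a, alts in grammar.items():
            if a not in done and any(
                all(x in done or x not in grammar for x in rhs) for rhs in alts
            ):
                done.add(a)
                changed = True
    return done

def reachable(grammar, start):
    """Non-terminals that appear in some sentential form derived from start."""
    seen, stack = {start}, [start]
    while stack:
        for rhs in grammar[stack.pop()]:
            for x in rhs:
                if x in grammar and x not in seen:
                    seen.add(x)
                    stack.append(x)
    return seen

def keep_only(grammar, keep):
    """Drop non-terminals outside `keep` and every rule that mentions them."""
    return {
        a: [rhs for rhs in alts if all(x in keep or x not in grammar for x in rhs)]
        for a, alts in grammar.items() if a in keep
    }

def reduce_grammar(grammar, start, terminating_first=True):
    if terminating_first:
        g = keep_only(grammar, terminating(grammar))
        return keep_only(g, reachable(g, start))
    g = keep_only(grammar, reachable(grammar, start))
    return keep_only(g, terminating(g))

G = {"S": [["A", "B"], ["a"]], "A": [["a"]], "B": []}
print(reduce_grammar(G, "S"))                           # {'S': [['a']]}
print(reduce_grammar(G, "S", terminating_first=False))  # {'S': [['a']], 'A': [['a']]}
```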


Useless symbols

Example

Initial grammar:

S → AB | CA

A → a

B → CB | AB

C → cB | b

D → aD | d

Algorithm 1 (terminating symbols):

A is in because of A → a

C is in because of C → b

D is in because of D → d

S is in because A, C are in and S → CA

B is never added: both of its rules, B → CB and B → AB, require B itself.


Useless symbols

Example continued

After algorithm 1:

S → CA

A → a

C → b

D → aD | d

Algorithm 2 (reachable symbols):

S is in because it is the start symbol

C and A are in because S is in and S → CA

Final grammar:

S → CA

A → a

C → b

