
2/20/2008 Prof. Hilfinger CS164 Lecture 12

Earley’s Algorithm: General Context-Free Parsing

Lecture 12
P. N. Hilfinger

Parsing General Context-Free Grammars

• Shift-reduce parsing can work for most practical applications.

• However, one must sometimes munge the grammar, though not as much as LL(1).

• It cannot handle ambiguity, nor situations where resolving ambiguities requires looking far ahead.

• Today, we’ll look at a method that can: Earley’s Algorithm.

• In fact, shift-reduce parsing is a highly optimized special case of this algorithm.

Earley’s Algorithm: Basic Idea

• Scan tokens left-to-right.

• At each point, keep track of all possible subtrees that could include the current point in the input, based on everything seen so far.

• At the end of the input, if there is a tree that is rooted at the start symbol, we’ve found a parse (possibly many).

Some Notation

• If input is s = s₁s₂…sₙ then “position k” in the input is just after sₖ and before sₖ₊₁, with position 0 at the beginning and position n at the end.

• At each input position, k, compute a set of items, where each item has the form

    A → α • β, m

where A → αβ is a production and 0 ≤ m ≤ k.

• Together, the items in the set describe all subtrees of possible parse trees that begin or end at position k or have a child that does.
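One possible way to represent such an item in code is sketched below. The class name `Item`, its field names, and the helper `next_symbol` are illustrative assumptions, not part of the lecture; the dot is stored as a count of right-hand-side symbols already matched.

```python
from typing import NamedTuple, Optional, Tuple


class Item(NamedTuple):
    """Hypothetical representation of an item  A → α • β, m."""
    lhs: str                # A
    rhs: Tuple[str, ...]    # the whole right-hand side, α followed by β
    dot: int                # number of symbols in α (already matched)
    start: int              # m, the position where the match of α began

    def next_symbol(self) -> Optional[str]:
        """Return the symbol just after the dot, or None if β is empty."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}, {self.start}"


item = Item("E", ("E", "+", "T"), 1, 0)
print(item)                # E → E • + T, 0
print(item.next_symbol())  # +
```

Because a named tuple is hashable, items of this shape can be stored directly in a Python `set`, one set per input position.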

Meaning of an Item

• An item A → α • β, m at position k means:

  1. The input between positions m and k matches α.

  2. Depending on what sₖ₊₁…sₙ is, there might be a subtree formed from production A → αβ in the (or a) parse tree for the entire string.

  3. So when β is empty, the item means that there is a possible handle for A → α that ends at k.

• So that leaves the problem of figuring out what items to put in each set.

Example

• Grammar:

    E → E + T
    E → T
    T → T * int
    T → int

• Input:

0 int 1 + 2 int 3 * 4 int 5

• At position 0, we expect to see an E to our right, formed from one of E’s productions.

• Plus, since an E can start with a T, we won’t be surprised by a T formed from one of its productions.
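As a rough illustration of "an E can start with a T", the sketch below encodes the example grammar as a dictionary and computes which nonterminals can appear leftmost under a given symbol. The names `GRAMMAR`, `TOKENS`, and `can_start` are assumptions made for this example, and the function assumes the grammar has no empty productions.

```python
# The example grammar: each nonterminal maps to its right-hand sides.
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}

# The example input; positions 0..5 lie between (and around) the tokens.
TOKENS = ["int", "+", "int", "*", "int"]


def can_start(grammar, symbol):
    """Nonterminals that may begin something derived from `symbol`.

    Assumes no empty productions, so only the first right-hand-side
    symbol of each production needs to be followed.
    """
    seen, todo = set(), [symbol]
    while todo:
        current = todo.pop()
        if current in seen or current not in grammar:
            continue  # already handled, or a terminal
        seen.add(current)
        for rhs in grammar[current]:
            if rhs:
                todo.append(rhs[0])
    return seen


print(sorted(can_start(GRAMMAR, "E")))  # ['E', 'T']
```

At position 0 this is why productions of both E and T are expected.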

Example: Getting Started

Item set 0 (input shown: 0 int 1 +)

Start with items for start symbol E:

    E → • T, 0
    E → • E + T, 0

and (since E can start with T), also add items for T:

    T → • int, 0
    T → • T * int, 0

Closure Items

• Whenever we have an item B → α • A β, j in item set m, it indicates that a substring producing A might start at this position.

• That’s what the item A → • γ, m means, so we also add those items (for each production A → γ) to item set m.

• These are called closure items.

• Other items are kernel items.
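The closure rule can be sketched as a small worklist loop. This is a minimal illustration rather than the lecture's own code: the function name `closure`, the tuple layout `(lhs, rhs, dot, start)`, and the `GRAMMAR` dictionary are all assumptions made here.

```python
def closure(grammar, items, position):
    """Return `items` plus all closure items for the set at `position`.

    An item is a tuple (lhs, rhs, dot, start); `dot` counts the
    right-hand-side symbols already matched.
    """
    result = set(items)
    todo = list(items)
    while todo:
        lhs, rhs, dot, start = todo.pop()
        # Dot in front of a nonterminal A: predict every production A → γ.
        if dot < len(rhs) and rhs[dot] in grammar:
            for production in grammar[rhs[dot]]:
                new_item = (rhs[dot], production, 0, position)
                if new_item not in result:
                    result.add(new_item)
                    todo.append(new_item)
    return result


GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}

kernel = {("E", ("E", "+", "T"), 2, 0)}  # E → E + • T, 0 in item set 2
for item in sorted(closure(GRAMMAR, kernel, 2)):
    print(item)
```

For the kernel item E → E + • T, 0 at position 2, this adds the two T items that start at 2.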

Example: Computing next item set

Item set 0:

    E → • T, 0
    E → • E + T, 0
    T → • int, 0
    T → • T * int, 0

Item set 1 (after int):

    T → int •, 0
    T → T • * int, 0
    E → T •, 0
    E → E • + T, 0

Next input symbol: +

Computing next item set

• For each item of the form A → α • c β, k in item set m, where c = sₘ₊₁ is the next input symbol, insert A → α c • β, k in item set m+1.

• For each complete item, A → γ •, k in item set m+1, and each item B → α • A β, j back in item set k, add item B → α A • β, j to item set m+1. (When creating a parse tree, the A in this new item will have children γ, as denoted by dashed red arrows in our examples).
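The two rules above (scan a terminal, then complete finished productions) might be coded roughly as follows. The function name `next_items` and the tuple layout `(lhs, rhs, dot, start)` are assumptions for this sketch; closure items are deliberately not added here, and the code assumes a grammar without empty productions, so every completed item refers back to an earlier set.

```python
def next_items(sets, token):
    """Compute item set m+1 (without closure items) from sets[0..m].

    An item is a tuple (lhs, rhs, dot, start).
    """
    new_set = set()
    # Scan: move the dot over the terminal that matches the next token.
    for lhs, rhs, dot, start in sets[-1]:
        if dot < len(rhs) and rhs[dot] == token:
            new_set.add((lhs, rhs, dot + 1, start))
    # Complete: for each finished A → γ •, k, advance items in set k
    # that have the dot in front of A.
    todo = list(new_set)
    while todo:
        lhs, rhs, dot, start = todo.pop()
        if dot == len(rhs):
            for b_lhs, b_rhs, b_dot, b_start in sets[start]:
                if b_dot < len(b_rhs) and b_rhs[b_dot] == lhs:
                    moved = (b_lhs, b_rhs, b_dot + 1, b_start)
                    if moved not in new_set:
                        new_set.add(moved)
                        todo.append(moved)
    return new_set


set0 = {
    ("E", ("T",), 0, 0),
    ("E", ("E", "+", "T"), 0, 0),
    ("T", ("int",), 0, 0),
    ("T", ("T", "*", "int"), 0, 0),
}
set1 = next_items([set0], "int")
for item in sorted(set1):
    print(item)
```

Starting from item set 0 of the example and scanning `int`, this yields the four items of item set 1.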

Continuing the Example, Set 2

Item set 1:

    T → int •, 0
    T → T • * int, 0
    E → T •, 0
    E → E • + T, 0

Item set 2 (after +):

    E → E + • T, 0

  closure items:

    T → • T * int, 2
    T → • int, 2

Next input symbol: int

Continuing the Example, Set 3

Item set 2:

    E → E + • T, 0
    T → • T * int, 2
    T → • int, 2

Item set 3 (after int):

    T → int •, 2
    T → T • * int, 2
    E → E + T •, 0
    E → E • + T, 0        (from item set 0)

Next input symbol: *

Continuing the Example, Sets 4 & 5

Item set 3:

    T → int •, 2
    T → T • * int, 2
    E → E + T •, 0
    E → E • + T, 0

Item set 4 (after *):

    T → T * • int, 2

Item set 5 (after int):

    T → T * int •, 2
    T → T • * int, 2
    E → E + T •, 0        ACCEPT!
    E → E • + T, 0

Accepting the String

• In the last item set, we have a completed item for the start symbol that started in set 0.

• That means “the input between 0 and end matches an entire production for the start symbol,” so the string parses correctly.
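Putting the pieces together, a compact recognizer could look like the sketch below. It is one possible arrangement, not the lecture's implementation: the name `earley_recognize`, the tuple layout `(lhs, rhs, dot, start)`, and the choice to re-run closure and completion on each set until nothing changes (simple rather than fast) are all assumptions of this example.

```python
def earley_recognize(grammar, start_symbol, tokens):
    """Return (accepted, item_sets) for `tokens` under `grammar`."""
    n = len(tokens)
    sets = [set() for _ in range(n + 1)]
    sets[0] = {(start_symbol, rhs, 0, 0) for rhs in grammar[start_symbol]}
    for k in range(n + 1):
        changed = True
        while changed:  # repeat closure and completion until set k is stable
            changed = False
            for lhs, rhs, dot, start in list(sets[k]):
                if dot < len(rhs):
                    # Closure: predict productions of the symbol after the dot
                    # (terminals have no productions, so nothing is added).
                    new = {(rhs[dot], prod, 0, k)
                           for prod in grammar.get(rhs[dot], [])}
                else:
                    # Completion: advance items in set `start` waiting for lhs.
                    new = {(b_lhs, b_rhs, b_dot + 1, b_start)
                           for b_lhs, b_rhs, b_dot, b_start in sets[start]
                           if b_dot < len(b_rhs) and b_rhs[b_dot] == lhs}
                if not new <= sets[k]:
                    sets[k] |= new
                    changed = True
        if k < n:
            # Scan: move the dot over the next input token.
            sets[k + 1] = {(lhs, rhs, dot + 1, start)
                           for lhs, rhs, dot, start in sets[k]
                           if dot < len(rhs) and rhs[dot] == tokens[k]}
    # Accept: a completed start-symbol item in the last set that began at 0.
    accepted = any(lhs == start_symbol and dot == len(rhs) and start == 0
                   for lhs, rhs, dot, start in sets[n])
    return accepted, sets


GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}

ok, item_sets = earley_recognize(GRAMMAR, "E", ["int", "+", "int", "*", "int"])
print(ok)                            # True
print([len(s) for s in item_sets])   # [4, 4, 3, 4, 1, 4]
```

The set sizes printed for the example input match the item sets listed on the earlier slides.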

Retrieving a Parse Tree or Derivation

• Start with a completed item in the last set that produces the whole input (has form S → … •, 0 for start symbol S).

• Follow the red arrows to find how to expand that symbol.

• Work backwards through the sets to find the expansions of the other nonterminals.
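The slides follow recorded arrows between items. The sketch below swaps in a simpler substitute: it keeps no arrows and instead re-searches the finished item sets, matching each right-hand side from right to left. The names `earley_sets`, `parse_tree`, and `match_children` are hypothetical, and the code assumes a grammar with no empty productions and no cycles of the form A ⇒ … ⇒ A, since either would make the search loop or miss trees.

```python
def earley_sets(grammar, start_symbol, tokens):
    """Build the item sets; items are tuples (lhs, rhs, dot, start)."""
    n = len(tokens)
    sets = [set() for _ in range(n + 1)]
    sets[0] = {(start_symbol, rhs, 0, 0) for rhs in grammar[start_symbol]}
    for k in range(n + 1):
        changed = True
        while changed:
            changed = False
            for lhs, rhs, dot, start in list(sets[k]):
                if dot < len(rhs):   # closure
                    new = {(rhs[dot], p, 0, k) for p in grammar.get(rhs[dot], [])}
                else:                # completion
                    new = {(bl, br, bd + 1, bs) for bl, br, bd, bs in sets[start]
                           if bd < len(br) and br[bd] == lhs}
                if not new <= sets[k]:
                    sets[k] |= new
                    changed = True
        if k < n:                    # scan
            sets[k + 1] = {(l, r, d + 1, s) for l, r, d, s in sets[k]
                           if d < len(r) and r[d] == tokens[k]}
    return sets


def parse_tree(grammar, sets, tokens, symbol, i, j):
    """Return one tree for `symbol` spanning positions i..j, or None."""
    if symbol not in grammar:  # terminal: must be exactly the token at i
        return symbol if j == i + 1 and tokens[i] == symbol else None
    for lhs, rhs, dot, start in sets[j]:
        if lhs == symbol and start == i and dot == len(rhs):
            kids = match_children(grammar, sets, tokens, rhs, i, j)
            if kids is not None:
                return (symbol, kids)
    return None


def match_children(grammar, sets, tokens, rhs, i, j):
    """Match the symbols of `rhs` against positions i..j, last symbol first."""
    if not rhs:
        return [] if i == j else None
    for mid in range(j - 1, i - 1, -1):  # the last symbol covers mid..j
        last = parse_tree(grammar, sets, tokens, rhs[-1], mid, j)
        if last is not None:
            rest = match_children(grammar, sets, tokens, rhs[:-1], i, mid)
            if rest is not None:
                return rest + [last]
    return None


GRAMMAR = {"E": [("E", "+", "T"), ("T",)], "T": [("T", "*", "int"), ("int",)]}
TOKENS = ["int", "+", "int", "*", "int"]
SETS = earley_sets(GRAMMAR, "E", TOKENS)
print(parse_tree(GRAMMAR, SETS, TOKENS, "E", 0, len(TOKENS)))
```

Storing back-pointers while the sets are built, as the slides do with arrows, avoids this re-searching; the version here trades that efficiency for brevity.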

Getting a Tree from our Example (I)

Item set 5 (after the last int):

    T → T * int •, 2
    T → T • * int, 2
    E → E + T •, 0        (start here)
    E → E • + T, 0

Tree so far:

    E
      E
      +
      T
        T
        *
        int

To find out how to expand this T (the first child of T → T * int), go back to chart 3 (before * int).

Getting a Tree from our Example (II)

Item set 3 (after the second int):

    T → int •, 2
    T → T • * int, 2
    E → E + T •, 0
    E → E • + T, 0

Tree so far:

    E
      E
      +
      T
        T
          int
        *
        int

To find out how to expand this E (the first child of E → E + T), go back to chart 1 (before +).

Figuring out Where to Look

• In the last slide, we had to figure out where to look for the derivation of the E in E + T.

• We used the items

    T → T * int •, 2 and T → int •, 2

to get the T in E + T, both of which tell us that the T started after item set #2.

• And since + is a terminal, we then have to go back one more.

Getting a Tree from our Example (III)

Item set 1 (after the first int):

    T → int •, 0
    T → T • * int, 0
    E → T •, 0            (start here)
    E → E • + T, 0

Completed tree:

    E
      E
        T
          int
      +
      T
        T
          int
        *
        int

An Ambiguous Grammar (I)

• Grammar:

    E → E + E
    E → E * E
    E → int

• Input:

0 int 1 + 2 int 3 * 4 int 5

Item set 0:

    E → • int, 0
    E → • E + E, 0
    E → • E * E, 0

Item set 1 (after int):

    E → int •, 0
    E → E • + E, 0
    E → E • * E, 0

An Ambiguous Grammar (II)

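Item set 1:

    E → int •, 0
    E → E • + E, 0
    E → E • * E, 0

Item set 2 (after +):

    E → E + • E, 0
    E → • int, 2
    E → • E + E, 2
    E → • E * E, 2

Item set 3 (after int):

    E → int •, 2
    E → E • + E, 2
    E → E • * E, 2
    E → E + E •, 0
    E → E • + E, 0
    E → E • * E, 0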

An Ambiguous Grammar (III)

Item set 3:

    E → int •, 2
    E → E • + E, 2
    E → E • * E, 2
    E → E + E •, 0
    E → E • + E, 0
    E → E • * E, 0

Item set 4 (after *):

    E → E * • E, 2
    E → E * • E, 0
    E → • int, 4
    E → • E + E, 4
    E → • E * E, 4

Item set 5 (after int):

    E → int •, 4
    E → E * E •, 2
    E → E * E •, 0
    E → E • + E, 4
    E → E • * E, 4
    E → E + E •, 0

There are two ways to produce the E starting at 0, reflecting ambiguity.
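To make the "two ways" concrete, the short sketch below writes down the items shown for item set 5 as tuples `(lhs, rhs, dot, start)` and filters for completed E items that began at position 0. The tuple layout and the names `SET_5` and `complete_from_zero` are assumptions for illustration.

```python
# Item set 5 for the ambiguous grammar, as (lhs, rhs, dot, start) tuples;
# `dot` counts the right-hand-side symbols already matched.
SET_5 = [
    ("E", ("int",), 1, 4),         # E → int •, 4
    ("E", ("E", "*", "E"), 3, 2),  # E → E * E •, 2
    ("E", ("E", "*", "E"), 3, 0),  # E → E * E •, 0
    ("E", ("E", "+", "E"), 1, 4),  # E → E • + E, 4
    ("E", ("E", "*", "E"), 1, 4),  # E → E • * E, 4
    ("E", ("E", "+", "E"), 3, 0),  # E → E + E •, 0
]

# Completed items for the start symbol that span the whole input.
complete_from_zero = [
    rhs for lhs, rhs, dot, start in SET_5
    if lhs == "E" and dot == len(rhs) and start == 0
]

for rhs in complete_from_zero:
    print("E →", " ".join(rhs))
# E → E * E
# E → E + E
```

Each surviving item corresponds to a different top-level production for the whole input, one grouping as (int + int) * int and the other as int + (int * int).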

Just for Fun…

Grammar:

    E → E E
    E → ε

Grammar is ferociously ambiguous: produces ε an infinite number of ways!

Item set 0:

    E → •, 0
    E → • E E, 0
    E → E • E, 0
    E → E E •, 0

Relationship to LR Shift-Reduce Parsing

• With an LR(1) grammar, we never have item sets where two items have the same production, with the dot in the same place, but different starting positions.

• So, ignoring the starting positions, there is a finite number of possible item sets.

• These are the states in the shift-reduce parser.
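The property in the first bullet can be checked on individual item sets. The sketch below is only an illustration on two sets taken from the examples, not a proof about a whole grammar; the helper name `repeated_cores` and the tuple layout `(lhs, rhs, dot, start)` are assumptions.

```python
from collections import defaultdict


def repeated_cores(item_set):
    """Map each (production, dot) pair that occurs with more than one
    starting position to the set of those starting positions."""
    starts = defaultdict(set)
    for lhs, rhs, dot, start in item_set:
        starts[(lhs, rhs, dot)].add(start)
    return {core: s for core, s in starts.items() if len(s) > 1}


# Item set 3 of the first (unambiguous) example grammar.
lr_set_3 = {
    ("T", ("int",), 1, 2),
    ("T", ("T", "*", "int"), 1, 2),
    ("E", ("E", "+", "T"), 3, 0),
    ("E", ("E", "+", "T"), 1, 0),
}

# Item set 4 of the ambiguous grammar.
ambiguous_set_4 = {
    ("E", ("E", "*", "E"), 2, 2),
    ("E", ("E", "*", "E"), 2, 0),
    ("E", ("int",), 0, 4),
    ("E", ("E", "+", "E"), 0, 4),
    ("E", ("E", "*", "E"), 0, 4),
}

print(repeated_cores(lr_set_3))         # {}
print(repeated_cores(ambiguous_set_4))  # {('E', ('E', '*', 'E'), 2): {0, 2}}
```

When no core repeats, dropping the starting positions loses nothing needed to tell the items of a set apart, which is what lets a finite collection of item sets serve as parser states.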

