
CYK Parser

Date post: 01-Jan-2016
Upload: dacey-buck
Transcript
Page 1: CYK Parser

CYK Parser

By Carla and Cornelia Kempa

Page 2: CYK Parser

Overview

• Top-down and bottom-up methods

• Non-directional methods: the Unger parser (top-down) and the CYK parser (bottom-up)

Page 3: CYK Parser

Cocke-Younger-Kasami method

Page 4: CYK Parser

Recognition phase

Page 5: CYK Parser

Example grammar

• Number(s) → Integer | Real
• Integer → Digit | Integer Digit
• Real → Integer Fraction Scale
• Fraction → . Integer
• Scale → e Sign Integer | Empty
• Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Empty → ɛ
• Sign → + | -

Page 6: CYK Parser

Example sentence: 32.5e+1

• 1. Concentrate on the substrings of the input sentence.

Page 7: CYK Parser

Building the recognition table

Page 8: CYK Parser
Page 9: CYK Parser
Page 10: CYK Parser
Page 11: CYK Parser
Page 12: CYK Parser
Page 13: CYK Parser
Page 14: CYK Parser
Page 15: CYK Parser
Page 16: CYK Parser

32.5e+1 is in the language.

• What problems can we already see in this example?

Page 17: CYK Parser

Another complication: ɛ-rules

Input: 43.1

Page 18: CYK Parser
Page 19: CYK Parser

The ɛ-problem

The shortest substrings of any input sentence are the ɛ-substrings.

We must compute $R_\varepsilon$, the set of non-terminals that derive ɛ:

$R_\varepsilon = \{\text{Empty}, \text{Scale}\}$
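Computing $R_\varepsilon$ is itself a small fixed-point iteration: a non-terminal derives ɛ if one of its right-hand sides consists only of symbols already known to derive ɛ. A minimal Python sketch, assuming a hypothetical dictionary encoding of the example grammar (`GRAMMAR` and `compute_r_eps` are illustrative names, not from the slides):

```python
# Hypothetical encoding of the example number grammar: each non-terminal maps
# to a list of right-hand sides; a right-hand side is a tuple of symbols, and
# any symbol that is not a key of the dict is a terminal. () is an ɛ-rule.
GRAMMAR = {
    "Number": [("Integer",), ("Real",)],
    "Integer": [("Digit",), ("Integer", "Digit")],
    "Real": [("Integer", "Fraction", "Scale")],
    "Fraction": [(".", "Integer")],
    "Scale": [("e", "Sign", "Integer"), ("Empty",)],
    "Digit": [(d,) for d in "0123456789"],
    "Empty": [()],
    "Sign": [("+",), ("-",)],
}


def compute_r_eps(grammar):
    """Return R_eps, the set of non-terminals that derive the empty string."""
    r_eps = set()
    changed = True
    while changed:  # repeat until no new non-terminal is added
        changed = False
        for lhs, rhss in grammar.items():
            # An empty RHS trivially qualifies; terminals never derive ɛ.
            if lhs not in r_eps and any(all(s in r_eps for s in rhs) for rhs in rhss):
                r_eps.add(lhs)
                changed = True
    return r_eps


print(sorted(compute_r_eps(GRAMMAR)))  # ['Empty', 'Scale']
```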

Page 20: CYK Parser

Non-empty substrings of the input sentence

• Input: $z = z_1 z_2 z_3 z_4 \ldots z_n$

• Compute the set of non-terminals that derive the substring of z starting at position i, of length l.

Page 21: CYK Parser

Terminology (also on the handout)

• $i$: the index at which the substring starts

• $l$: the length of the substring

• $R_{s_{i,l}}$: the set of non-terminals deriving the substring $s_{i,l}$

• $s_{i,0} = \varepsilon$

• The set of non-terminals deriving ɛ: $R_{s_{i,0}} = R_\varepsilon$

Page 22: CYK Parser

$s_{i,l} = z_i z_{i+1} \ldots z_{i+l-1}$
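With 1-based positions as on the slides, $s_{i,l}$ corresponds to a Python slice that starts at index i-1. A small hypothetical helper (the name `substring` is illustrative):

```python
def substring(z, i, l):
    """Return s_{i,l} = z_i z_{i+1} ... z_{i+l-1}, with 1-based i as on the slides."""
    if not (1 <= i <= len(z) + 1 and 0 <= l <= len(z) - i + 1):
        raise ValueError("s_{i,l} must lie inside z")
    return z[i - 1 : i - 1 + l]


z = "32.5e+1"
print(substring(z, 2, 4))        # 2.5e
print(repr(substring(z, 3, 0)))  # '' (s_{i,0} is the empty string)
```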

Page 23: CYK Parser

The set of non-terminals deriving the substring $s_{i,l}$ is $R_{s_{i,l}}$:

1.) Substrings of length 0

$s_{i,0} = \varepsilon$ and $R_{s_{i,0}} = R_\varepsilon$

2.) Short substrings

3.) Longer substrings (say $l = j$)

All the information on substrings with $l < j$ is already available.

Page 24: CYK Parser

Check each right-hand side (RHS) in the grammar to see whether it derives $s_{i,l}$:

• $L \to A_1 \ldots A_m$

$s_{i,l}$ is divided into m segments (each possibly empty):

$A_1$ derives the first segment of $s_{i,l}$

$A_2$ derives the second segment of $s_{i,l}$

…

Page 25: CYK Parser

$A_1 \ldots A_m \Rightarrow^* s_{i,l}$

• So $A_1$ derives a first part of $s_{i,l}$ (let's say a first part of length k):

$A_1 \Rightarrow^* s_{i,k}$

$A_1$ is in the set $R_{s_{i,k}}$

Page 26: CYK Parser

$A_1 \ldots A_m \Rightarrow^* s_{i,l}$

• Assuming this, $A_2 \ldots A_m$ has to derive the rest:

$A_2 \ldots A_m \Rightarrow^* s_{i+k,\,l-k}$

This is attempted for every k.
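The segment-by-segment check can be written as a short recursive function that tries every length k for the first segment and hands the rest to $A_2 \ldots A_m$. A Python sketch, assuming a hypothetical table `R` that maps `(i, l)` to $R_{s_{i,l}}$ and is already filled for the cells it needs; the cells below are entered by hand for the input 43.1 (`rhs_derives` is an illustrative name):

```python
def rhs_derives(rhs, i, l, R, z, nonterminals):
    """Check whether the symbols A_1 ... A_m in `rhs` derive s_{i,l}.

    R maps (i, l) to the set R_{s_{i,l}}; missing entries count as empty sets.
    """
    if not rhs:  # no symbols left: only the empty substring can be derived
        return l == 0
    first, rest = rhs[0], rhs[1:]
    for k in range(l + 1):  # try every length k for the first segment
        if first in nonterminals:
            ok = first in R.get((i, k), set())  # A_1 in R_{s_{i,k}}
        else:
            ok = k == 1 and z[i - 1] == first   # a terminal matches one symbol
        # A_2 ... A_m must derive the rest, s_{i+k, l-k}
        if ok and rhs_derives(rest, i + k, l - k, R, z, nonterminals):
            return True
    return False


z = "43.1"
NONTERMINALS = {"Number", "Integer", "Real", "Fraction", "Scale", "Digit", "Empty", "Sign"}
R = {  # hand-filled excerpt of the table for this input
    (1, 2): {"Integer", "Number"},  # "43"
    (3, 2): {"Fraction"},           # ".1"
    (5, 0): {"Empty", "Scale"},     # ɛ at the end of the input
}
print(rhs_derives(("Integer", "Fraction", "Scale"), 1, 4, R, z, NONTERMINALS))  # True
```

Here Scale derives the empty last segment, which is exactly the situation the ɛ-rules create.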

Page 27: CYK Parser

Problems with this approach

1) Consider $A_2 \ldots A_m$:

m could be 1 and $A_1$ a non-terminal, so we are dealing with a unit rule.

$A_1$ must then derive the whole substring $s_{i,l}$ and thus be a member of $R_{s_{i,l}}$.

But that's the set we are computing right now …

Page 28: CYK Parser

Solution to this problem

• $A_1 \Rightarrow^* s_{i,l}$

• Somewhere along the derivation there must be a first step not using a unit rule:

$A_1 \to B \to \ldots \to C \Rightarrow^* s_{i,l}$

C is the first non-terminal in the derivation that uses a non-unit rule.

Page 29: CYK Parser

Solution cont.

At some stage C is added to $R_{s_{i,l}}$.

If we repeat the process, at some point B will be added, and in the next step $A_1$ will be added.

We have to repeat the process again and again until no new non-terminals are added to $R_{s_{i,l}}$.
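The repetition can be isolated in a small sketch that handles only unit rules (in the full algorithm every rule is re-checked on each pass). The function `close_under_unit_rules` and the `(A, B)` pair encoding for a rule A → B are hypothetical:

```python
def close_under_unit_rules(found, unit_rules):
    """Add every A with a unit rule A -> B and B already in `found`,
    repeating until no new non-terminal is added (a fixed point)."""
    result = set(found)
    changed = True
    while changed:
        changed = False
        for a, b in unit_rules:
            if b in result and a not in result:
                result.add(a)
                changed = True
    return result


# Unit rules of the example number grammar, as (A, B) pairs for A -> B.
UNIT_RULES = [("Number", "Integer"), ("Number", "Real"),
              ("Integer", "Digit"), ("Scale", "Empty")]

# For the input "7", only Digit derives it with a non-unit rule (Digit -> 7).
print(sorted(close_under_unit_rules({"Digit"}, UNIT_RULES)))
# ['Digit', 'Integer', 'Number']
```

In the first pass Number is skipped because Integer has not been added yet; only the second pass adds it, which is why the loop runs until nothing changes.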

Page 30: CYK Parser

Problem 2

ɛ-rules: consider the case where all but one of the $A_t$ derive ɛ:

$B \to A_1 A_2 A_3 A_4 A_5 \ldots A_t$, where B and $A_1, \ldots, A_t$ are non-terminals and $A_2, \ldots, A_t$ derive ɛ.

So what remains is $B \to A_1$: a unit rule.

Page 31: CYK Parser

We have now computed all the $R_{s_{i,l}}$.

• If the start symbol S is a member of $R_{s_{1,n}}$, then S derives z ($= s_{1,n}$), the input string.
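Putting the pieces together, CYK recognition for a general CF grammar (no normal form) might be sketched in Python as follows. It assumes the same hypothetical dictionary encoding of the number grammar; `recognize` and `derives` are illustrative names. Length 0 is handled by the same repeat-until-nothing-changes loop, which yields $R_\varepsilon$ at every position, and the loop also takes care of unit rules:

```python
# Hypothetical dict encoding of the number grammar: each non-terminal maps to
# its right-hand sides; any symbol that is not a key is a terminal; () is ɛ.
GRAMMAR = {
    "Number": [("Integer",), ("Real",)],
    "Integer": [("Digit",), ("Integer", "Digit")],
    "Real": [("Integer", "Fraction", "Scale")],
    "Fraction": [(".", "Integer")],
    "Scale": [("e", "Sign", "Integer"), ("Empty",)],
    "Digit": [(d,) for d in "0123456789"],
    "Empty": [()],
    "Sign": [("+",), ("-",)],
}


def recognize(grammar, start, z):
    """General CYK recognition: fill R_{s_{i,l}} for l = 0, 1, ..., n and
    report whether the start symbol is in R_{s_{1,n}}."""
    n = len(z)
    R = {}  # R[(i, l)] is the set of non-terminals deriving s_{i,l}

    def derives(rhs, i, l):
        # Can A_1 ... A_m derive s_{i,l}? Try every length k of the first segment.
        if not rhs:
            return l == 0
        first, rest = rhs[0], rhs[1:]
        for k in range(l + 1):
            if first in grammar:
                # For k == l this reads the set currently being computed.
                ok = first in R.get((i, k), set())
            else:
                ok = k == 1 and z[i - 1] == first
            if ok and derives(rest, i + k, l - k):
                return True
        return False

    for l in range(n + 1):  # shortest substrings first, starting with ɛ
        for i in range(1, n - l + 2):
            current = R.setdefault((i, l), set())
            changed = True
            while changed:  # repeat until no new non-terminal is added
                changed = False
                for lhs, rhss in grammar.items():
                    if lhs not in current and any(derives(r, i, l) for r in rhss):
                        current.add(lhs)
                        changed = True
    return start in R[(1, n)]


for sentence in ["32.5e+1", "43.1", "7", "32.", "e+1"]:
    print(sentence, recognize(GRAMMAR, "Number", sentence))
# 32.5e+1 True
# 43.1 True
# 7 True
# 32. False
# e+1 False
```

Trying every segment length recursively is exponential in the length of a right-hand side; that is acceptable here because no rule has more than three symbols.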

Page 32: CYK Parser

CYK recognition with a grammar in ****-form:

• What restrictions do we want to have on our grammar?

Page 33: CYK Parser

Useful restrictions

• No ɛ-rules
• No unit rules
• Limit the length of the right-hand side of each rule, say to two

• What we get out of this:
• A → a
• A → BC

• where a is a terminal and A, B, C are non-terminals

Page 34: CYK Parser

Chomsky Normal Form…

(… not only to annoy students)

• Perfect grammar for CYK

Page 35: CYK Parser

How CYK works for a grammar in CNF

• $R_\varepsilon$ is empty.

• $R_{s_{i,1}}$ can be read directly from the rules of the form A → a.

• A rule A → BC can never derive a single terminal.

Page 36: CYK Parser

Procedure

• Iteratively (as before):
• 1) Fill the sets $R_{s_{i,1}}$ directly (substrings of length 1)
• 2) Process all substrings of length 2
• 3) Process all substrings of length 3
• 4) … and so on, up to the substring of length n
• For the first step we use the rules of the form A → a
• For all the following steps we have to use the rules of the form A → BC
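For a grammar in CNF the whole procedure fits in a few loops. A minimal Python sketch, assuming rules are given as hypothetical lists of `(A, a)` pairs and `(A, B, C)` triples; `cyk_cnf` is an illustrative name, and the test grammar for $a^n b^n$ is an example chosen here, not one from the slides:

```python
def cyk_cnf(terminal_rules, binary_rules, start, z):
    """CYK recognition for a grammar in Chomsky Normal Form.

    terminal_rules: list of (A, a) pairs, one per rule A -> a
    binary_rules:   list of (A, B, C) triples, one per rule A -> B C
    """
    n = len(z)
    if n == 0:
        return False  # the restricted form has no ɛ-rules
    R = {}
    # Step 1: substrings of length 1, read directly from the rules A -> a.
    for i in range(1, n + 1):
        R[(i, 1)] = {lhs for lhs, t in terminal_rules if t == z[i - 1]}
    # Following steps: lengths 2, 3, ..., n, using only the rules A -> B C.
    for l in range(2, n + 1):
        for i in range(1, n - l + 2):
            R[(i, l)] = set()
            for k in range(1, l):  # does a split point k exist?
                left, right = R[(i, k)], R[(i + k, l - k)]
                for a, b, c in binary_rules:
                    if b in left and c in right:
                        R[(i, l)].add(a)
    return start in R[(1, n)]


# Hypothetical example grammar for a^n b^n (n >= 1): S -> A B | A X, X -> S B.
TERMINAL_RULES = [("A", "a"), ("B", "b")]
BINARY_RULES = [("S", "A", "B"), ("S", "A", "X"), ("X", "S", "B")]

for w in ["ab", "aabb", "aab", "ba"]:
    print(w, cyk_cnf(TERMINAL_RULES, BINARY_RULES, "S", w))
# ab True
# aabb True
# aab False
# ba False
```

Each cell only looks at strictly shorter substrings, so, unlike the general version, no repetition until a fixed point is needed.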

Page 37: CYK Parser

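CYK and CNF

A rule A → BC derives $s_{i,l}$ if there is a k such that $B \in R_{s_{i,k}}$ and $C \in R_{s_{i+k,l-k}}$. The question the CYK parser has to answer is:

Does such a k exist?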

Page 38: CYK Parser

Answering this question is easy:

• Just try all possibilities

• No problem, since you are a computer ;-)

• Range of k: from 1 to l-1

• All the sets $R_{s_{i,k}}$ and $R_{s_{i+k,l-k}}$ have already been computed at this point

Page 39: CYK Parser

Transform our sample CF grammar into Chomsky Normal Form

• Overview

• 1) Eliminate ɛ-rules

• 2) Eliminate unit rules

• 3) Remove non-productive non-terminals

• 4) Remove non-reachable non-terminals

• 5) Modify the rest until all grammar rules are of the form A → a or A → BC

Page 40: CYK Parser

Our number grammar in CNF

• Number(s) → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Number(s) → Integer Digit
• Number(s) → N1 Scale´ | Integer Fraction
• N1 → Integer Fraction
• Integer → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Integer → Integer Digit
• Fraction → T1 Integer
• T1 → .
• Scale´ → N2 Integer
• N2 → T2 Sign
• T2 → e
• Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Sign → + | -

Page 41: CYK Parser

Building the recognition table

• Input :

Our example grammar in CNF

Input sentence: 32.5e+1
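The recognition table for 32.5e+1 can be filled mechanically from the CNF number grammar. A Python sketch, assuming a hypothetical rule encoding (`TERMINAL_RULES`, `BINARY_RULES`, and `recognition_table` are illustrative names; `Scale´` is kept as a plain string):

```python
DIGITS = "0123456789"
# Hypothetical encoding of the CNF number grammar:
# rules A -> a as (A, a) pairs, rules A -> B C as (A, B, C) triples.
TERMINAL_RULES = ([(a, d) for a in ("Number", "Integer", "Digit") for d in DIGITS]
                  + [("T1", "."), ("T2", "e"), ("Sign", "+"), ("Sign", "-")])
BINARY_RULES = [
    ("Number", "Integer", "Digit"), ("Number", "N1", "Scale´"),
    ("Number", "Integer", "Fraction"), ("N1", "Integer", "Fraction"),
    ("Integer", "Integer", "Digit"), ("Fraction", "T1", "Integer"),
    ("Scale´", "N2", "Integer"), ("N2", "T2", "Sign"),
]


def recognition_table(z):
    """Return R with R[(i, l)] = R_{s_{i,l}} for every non-empty substring of z."""
    n = len(z)
    R = {}
    for i in range(1, n + 1):  # bottom row: rules of the form A -> a
        R[(i, 1)] = {a for a, t in TERMINAL_RULES if t == z[i - 1]}
    for l in range(2, n + 1):  # longer substrings: rules of the form A -> B C
        for i in range(1, n - l + 2):
            R[(i, l)] = {a for a, b, c in BINARY_RULES
                         for k in range(1, l)
                         if b in R[(i, k)] and c in R[(i + k, l - k)]}
    return R


z = "32.5e+1"
R = recognition_table(z)
for l in range(len(z), 0, -1):  # longest substring on top, bottom row last
    print(l, [sorted(R[(i, l)]) for i in range(1, len(z) - l + 2)])
print("Number" in R[(1, len(z))])  # True
```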

Page 42: CYK Parser

Building the recognition table

• 1) Bottom row: read directly from the grammar (rules of the form A → a)

• 2) Check each RHS in the grammar

Page 43: CYK Parser

Check each RHS of the grammar

• Two ways (example: 2.5e, i.e. $s_{2,4}$):

• 1) Check each RHS, e.g. N1 Scale´

• 2) Compute the possible right-hand sides from the recognition table

Page 44: CYK Parser

How this is done

1) N1 is not in $R_{s_{2,1}}$ or $R_{s_{2,2}}$. N1 is a member of $R_{s_{2,3}}$, but Scale´ is not a member of $R_{s_{5,1}}$.

2) $R_{s_{2,4}}$ is the set of non-terminals that have a RHS AB where either:

• A in $R_{s_{2,1}}$ and B in $R_{s_{3,3}}$, or
• A in $R_{s_{2,2}}$ and B in $R_{s_{4,2}}$, or
• A in $R_{s_{2,3}}$ and B in $R_{s_{5,1}}$

Possible combinations: N1 T2 or Number T2. Our grammar has no such RHS, so nothing is added to $R_{s_{2,4}}$.
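The second way can be checked in a few lines: collect every pair (B, C) from the three splits of $s_{2,4}$ and look for a matching right-hand side. A Python sketch with the needed cells entered by hand (they follow from the CNF grammar, but treat them as an assumption; `combine` is an illustrative name):

```python
# Binary rules A -> B C of the CNF number grammar, as (A, B, C) triples.
BINARY_RULES = [
    ("Number", "Integer", "Digit"), ("Number", "N1", "Scale´"),
    ("Number", "Integer", "Fraction"), ("N1", "Integer", "Fraction"),
    ("Integer", "Integer", "Digit"), ("Fraction", "T1", "Integer"),
    ("Scale´", "N2", "Integer"), ("N2", "T2", "Sign"),
]

# Cells needed for s_{2,4} = "2.5e" in the input 32.5e+1, filled in by hand.
R = {
    (2, 1): {"Number", "Integer", "Digit"},  # "2"
    (2, 2): set(),                           # "2."
    (2, 3): {"N1", "Number"},                # "2.5"
    (3, 3): set(),                           # ".5e"
    (4, 2): set(),                           # "5e"
    (5, 1): {"T2"},                          # "e"
}


def combine(i, l):
    """Return the candidate pairs (B, C) and the resulting set R_{s_{i,l}}."""
    pairs = set()
    for k in range(1, l):
        for b in R[(i, k)]:
            for c in R[(i + k, l - k)]:
                pairs.add((b, c))
    found = {a for a, b, c in BINARY_RULES if (b, c) in pairs}
    return pairs, found


pairs, found = combine(2, 4)
print(sorted(pairs))  # [('N1', 'T2'), ('Number', 'T2')]
print(found)          # set()
```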

Page 45: CYK Parser

Recognition table

Page 46: CYK Parser

Recognition table (well-formed substring table)

Page 47: CYK Parser

Computing $R_{s_{i,l}}$: follow the arrows V and W simultaneously.

For a rule A → BC:

B is a member of a set on the V arrow,

C is a member of a set on the W arrow.
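The cells visited by the two arrows are exactly the pairs $R_{s_{i,k}}$ (V) and $R_{s_{i+k,l-k}}$ (W) for k = 1, …, l-1. A small hypothetical helper that lists them as `(start, length)` cells; for $s_{2,4}$ it reproduces the three combinations from the 2.5e example:

```python
def arrow_cells(i, l):
    """Pairs of cells combined to compute R_{s_{i,l}}, written as (start, length):
    the V cell holds the left part s_{i,k}, the W cell the right part s_{i+k,l-k}."""
    return [((i, k), (i + k, l - k)) for k in range(1, l)]


for v_cell, w_cell in arrow_cells(2, 4):
    print("V", v_cell, "W", w_cell)
# V (2, 1) W (3, 3)
# V (2, 2) W (4, 2)
# V (2, 3) W (5, 1)
```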

Page 48: CYK Parser

Comparison

• This process is much less complicated than the one we saw before

• Why?

Page 49: CYK Parser

Conclusion

» This process is much less complicated

• Reasons:

1) We do not have to repeat the process again and again until no new non-terminals are added to $R_{s_{i,l}}$

(the parts derived by B and C are proper substrings of $s_{i,l}$ and can never be equal to $s_{i,l}$ itself)

Page 50: CYK Parser

Reasons cont.

2) We only have to find one place where the substring is split into two:

A → B C

Here!

Page 51: CYK Parser

Result of the algorithm we have seen so far:

• A complete collection of sets $R_{s_{i,l}}$

• These sets can be organized in a triangular table:

Page 52: CYK Parser

Cost of the CYK algorithm

• Operations dependent on n, the number of input symbols:

• $n(n+1)/2$ substrings to be examined

• For each substring: at most n-1 different split positions k (worst case)

Page 53: CYK Parser

Cost of the CYK algorithm cont.

• All other operations are independent of n, so the algorithm works in time at most proportional to $n^3$.

That's far more efficient than exhaustive search, which takes time exponential in the length of the input sentence.
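Counting the work more precisely: there are n-l+1 substrings of length l, and a substring of length l has l-1 split positions. Summing over all lengths gives the number of substrings and the total number of split positions to examine:

```latex
\begin{aligned}
\text{substrings:} \quad & \sum_{l=1}^{n} (n - l + 1) = \frac{n(n+1)}{2} \\
\text{split positions:} \quad & \sum_{l=2}^{n} (n - l + 1)(l - 1) = \frac{n^3 - n}{6} \le n^3
\end{aligned}
```

For the example sentence 32.5e+1 (n = 7) this means 28 substrings and 56 split positions, each checked against a fixed set of grammar rules.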

