The CYK Parsing Method
Chiyo Hotani
Tanya Petrova
CL2 Parsing Course
28 November, 2007
Overview
• CYK Recognition with CF grammar
  • Basic Algorithm
  • Problems: unit-rules, ε-rules
  • Recognition with a grammar in CNF
• CYK Parsing with CNF
  • Parsing with CNF
  • Recognition Table
• Chart Parsing
• Summary
  • Advantages and Disadvantages
  • Other remarks
Basic Algorithm of CYK Recognition (1)
Example Grammar:
A grammar describing numbers in scientific notation
Input: 32.5e+1
derivations of substrings of length 1
Basic Algorithm of CYK Recognition (2)
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Sign -> + | -
NumberS -> Integer | Real
Integer -> Digit | Integer Digit
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
derivations of substrings of length 1
Unit rule: a rule of the form A -> B, where A and B are non-terminals. A derivation can contain chains of unit rules.
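Because unit rules can form chains, finding one non-terminal for a substring may bring in others. A minimal sketch, assuming unit rules are stored as (A, B) pairs; the function name `unit_closure` and the list `UNIT_RULES` are illustrative, and the start symbol is written `Number`:

```python
def unit_closure(nonterminals, unit_rules):
    """Return every non-terminal that derives a member of `nonterminals`
    through a chain of zero or more unit rules A -> B."""
    closure = set(nonterminals)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in unit_rules:
            if rhs in closure and lhs not in closure:
                closure.add(lhs)
                changed = True
    return closure


# Unit rules of the example grammar, as (A, B) pairs for A -> B.
UNIT_RULES = [
    ("Number", "Integer"),
    ("Number", "Real"),
    ("Integer", "Digit"),
    ("Scale", "Empty"),
]

# A single digit is a Digit, hence an Integer, hence a Number.
print(sorted(unit_closure({"Digit"}, UNIT_RULES)))
# prints ['Digit', 'Integer', 'Number']
```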
Basic Algorithm of CYK Recognition (3)
NumberS -> Integer | Real
Integer -> Digit | Integer Digit
Fraction -> . Integer
Scale -> e Sign Integer | Empty
Basic Algorithm of CYK Recognition (4)
NumberS -> Integer | Real
Real -> Integer Fraction Scale
Number does indeed derive 32.5e+1.
Basic Algorithm of CYK Recognition (5)
ε-rules
Basic Algorithm of CYK Recognition (6)
R_ε = { Empty, Scale }
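The set R_ε can be computed by repeatedly adding non-terminals that have a right-hand side made up entirely of members of R_ε. A sketch under assumptions: the grammar is a dictionary of right-hand-side tuples, the rule Empty -> ε is written as the empty tuple, the start symbol is written `Number`, and the names `nullable_nonterminals` and `GRAMMAR` are illustrative.

```python
def nullable_nonterminals(grammar):
    """Return the set of non-terminals that can derive the empty string.

    `grammar` maps each non-terminal to a list of right-hand sides,
    each a tuple of symbols; the empty tuple () stands for an ε-rule.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in nullable:
                continue
            # A right-hand side is nullable when all of its symbols are;
            # all() over an empty tuple is True, which covers A -> ε.
            if any(all(sym in nullable for sym in rhs) for rhs in alternatives):
                nullable.add(lhs)
                changed = True
    return nullable


DIGITS = "0123456789"
GRAMMAR = {
    "Number": [("Integer",), ("Real",)],
    "Integer": [("Digit",), ("Integer", "Digit")],
    "Real": [("Integer", "Fraction", "Scale")],
    "Fraction": [(".", "Integer")],
    "Scale": [("e", "Sign", "Integer"), ("Empty",)],
    "Digit": [(d,) for d in DIGITS],
    "Sign": [("+",), ("-",)],
    "Empty": [()],  # assumed ε-rule: Empty -> ε
}

print(sorted(nullable_nonterminals(GRAMMAR)))
# prints ['Empty', 'Scale']
```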
sentence: z = z_1 z_2 ... z_n
s_{i,l} = z_i z_{i+1} ... z_{i+l-1}: the substring of z starting at position i, of length l
R_{s_{i,l}}: the set of non-terminals deriving the substring s_{i,l}
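The notation counts positions from 1, while Python slices count from 0. A small sketch of the mapping; the helper names `substring` and `all_substrings` are illustrative:

```python
def substring(z, i, l):
    """Return s_{i,l}: the substring of z of length l starting at 1-based position i."""
    return z[i - 1 : i - 1 + l]


def all_substrings(z):
    """Yield (i, l, s_{i,l}) for every non-empty substring, shortest first."""
    n = len(z)
    for l in range(1, n + 1):
        for i in range(1, n - l + 2):
            yield i, l, substring(z, i, l)


print(substring("32.5e+1", 2, 4))
# prints 2.5e
```

A sentence of length n has n(n+1)/2 non-empty substrings, so the input 32.5e+1 gives 28 sets to compute.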
A graphical presentation of substrings
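The basic algorithm can be sketched as follows. This is one possible formulation, not necessarily the one on the slides: the grammar is a dictionary of right-hand-side tuples, R_ε is passed in as `nullable`, any symbol that is not a key of the grammar counts as a terminal, and all function and variable names are illustrative.

```python
def cyk_recognize(grammar, nullable, start, z):
    """General CYK recognition for a context-free grammar.

    grammar:  dict non-terminal -> list of right-hand sides (tuples of symbols)
    nullable: set of non-terminals that derive the empty string (R_ε)
    """
    n = len(z)
    if n == 0:
        return start in nullable
    # table[(i, l)] plays the role of R_{s_{i,l}}, with 1-based i.
    table = {(i, l): set() for l in range(1, n + 1) for i in range(1, n - l + 2)}

    def derives(symbol, i, l):
        if l == 0:
            return symbol in nullable
        if symbol not in grammar:  # terminal symbol
            return l == 1 and z[i - 1] == symbol
        return symbol in table[(i, l)]

    def rhs_derives(rhs, i, l):
        if not rhs:
            return l == 0
        # Let the first symbol cover k characters and the rest cover l - k.
        return any(
            derives(rhs[0], i, k) and rhs_derives(rhs[1:], i + k, l - k)
            for k in range(l + 1)
        )

    for l in range(1, n + 1):  # shortest substrings first
        for i in range(1, n - l + 2):
            changed = True
            while changed:  # unit rules and ε-rules force repeated passes
                changed = False
                for lhs, alternatives in grammar.items():
                    if lhs not in table[(i, l)] and any(
                        rhs_derives(rhs, i, l) for rhs in alternatives
                    ):
                        table[(i, l)].add(lhs)
                        changed = True
    return start in table[(1, n)]


GRAMMAR = {
    "Number": [("Integer",), ("Real",)],
    "Integer": [("Digit",), ("Integer", "Digit")],
    "Real": [("Integer", "Fraction", "Scale")],
    "Fraction": [(".", "Integer")],
    "Scale": [("e", "Sign", "Integer"), ("Empty",)],
    "Digit": [(d,) for d in "0123456789"],
    "Sign": [("+",), ("-",)],
    "Empty": [()],  # assumed ε-rule: Empty -> ε
}

print(cyk_recognize(GRAMMAR, {"Empty", "Scale"}, "Number", "32.5e+1"))
# prints True
```

The inner `while` loop is the costly part: a set R_{s_{i,l}} may have to be revisited several times before it stops growing.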
Basic Algorithm of CYK Recognition (7)
CYK recognition with a grammar in CNF
Required restrictions:
• Eliminate ε-rules and unit rules
• Limit the maximum length of the RHS of each rule to 2
CNF
No ε-rules and no unit rules; all rules have one of the following two forms:
A -> a
A -> B C
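Whether a grammar meets these restrictions can be checked mechanically. A minimal sketch, assuming the grammar is a dictionary from non-terminals to lists of right-hand-side tuples; the name `is_cnf` and both sample grammars are illustrative:

```python
def is_cnf(grammar):
    """Return True if every rule has the form A -> a or A -> B C."""
    nonterminals = set(grammar)
    for alternatives in grammar.values():
        for rhs in alternatives:
            terminal_rule = len(rhs) == 1 and rhs[0] not in nonterminals
            binary_rule = len(rhs) == 2 and all(s in nonterminals for s in rhs)
            if not (terminal_rule or binary_rule):
                return False  # ε-rule, unit rule, or RHS that is too long or mixed
    return True


# Integer -> Digit is a unit rule, so this fragment is not in CNF.
with_unit_rule = {
    "Integer": [("Digit",), ("Integer", "Digit")],
    "Digit": [("0",), ("1",)],
}
# Replacing the unit rule by the terminal rules of Digit gives a CNF fragment.
without_unit_rule = {
    "Integer": [("0",), ("1",), ("Integer", "Digit")],
    "Digit": [("0",), ("1",)],
}

print(is_cnf(with_unit_rule), is_cnf(without_unit_rule))
# prints False True
```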
Our example grammar in CNF
CYK Parsing with CNF
Building the recognition table
Input:
Our example grammar in CNF
input sentence: 32.5 e + 1
CYK Parsing with the CNF
bottom row: read directly from the grammar (rules of the form A -> a)
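Filling the bottom row amounts to one lookup per input symbol. A sketch under assumptions: the CNF grammar itself is not reproduced in this text, so the terminal rules below are an assumed CNF version of the example grammar, with T1 as an assumed name for the non-terminal producing the decimal point.

```python
DIGITS = "0123456789"

# Assumed terminal rules A -> a of the CNF grammar, stored as
# non-terminal -> set of terminals it can produce.
TERMINAL_RULES = {
    "Number": set(DIGITS),
    "Integer": set(DIGITS),
    "Digit": set(DIGITS),
    "T1": {"."},
    "T2": {"e"},
    "Sign": {"+", "-"},
}


def bottom_row(terminal_rules, z):
    """Return the sets R_{s_{i,1}} for i = 1..n as a list (index 0 holds i = 1)."""
    return [
        {lhs for lhs, terminals in terminal_rules.items() if symbol in terminals}
        for symbol in z
    ]


for symbol, cell in zip("32.5e+1", bottom_row(TERMINAL_RULES, "32.5e+1")):
    print(symbol, sorted(cell))
```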
Two ways to compute R_{s_{i,l}}:
check each right-hand side
compute possible right-hand sides from the recognition table
How this is done
Example: 2.5e ( = s_{2,4})
1) Checking the right-hand side N1 Scale':
N1 is not in R_{s_{2,1}} or R_{s_{2,2}}
N1 is a member of R_{s_{2,3}}
But Scale' is not a member of R_{s_{5,1}}
2) R_{s_{2,4}} is the set of non-terminals that have a right-hand side A B where either:
A in R_{s_{2,1}} and B in R_{s_{3,3}}
A in R_{s_{2,2}} and B in R_{s_{4,2}}
A in R_{s_{2,3}} and B in R_{s_{5,1}}
Possible combinations: N1 T2 or Number T2
In our grammar we do not have such a right-hand side, so nothing is added to R_{s_{2,4}}.
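The second way, computing candidate right-hand sides from the table, can be sketched as below. The CNF grammar is not reproduced in this text, so the rules here are an assumed CNF version of the example grammar: N1, T2, Scale' and Number appear in the example above, while T1 and N2 are assumed names.

```python
DIGITS = "0123456789"

# Assumed rules A -> a, stored as non-terminal -> set of terminals.
TERMINAL_RULES = {
    "Number": set(DIGITS),
    "Integer": set(DIGITS),
    "Digit": set(DIGITS),
    "T1": {"."},
    "T2": {"e"},
    "Sign": {"+", "-"},
}

# Assumed rules A -> B C, stored as (A, B, C).
BINARY_RULES = [
    ("Number", "Integer", "Digit"),
    ("Number", "N1", "Scale'"),
    ("Number", "Integer", "Fraction"),
    ("N1", "Integer", "Fraction"),
    ("Integer", "Integer", "Digit"),
    ("Fraction", "T1", "Integer"),
    ("Scale'", "N2", "Integer"),
    ("N2", "T2", "Sign"),
]


def cyk_table(z):
    """Build the recognition table; table[(i, l)] is R_{s_{i,l}} with 1-based i."""
    n = len(z)
    table = {}
    for i in range(1, n + 1):  # bottom row, from the rules A -> a
        table[(i, 1)] = {a for a, ts in TERMINAL_RULES.items() if z[i - 1] in ts}
    for l in range(2, n + 1):
        for i in range(1, n - l + 2):
            cell = set()
            for k in range(1, l):  # split s_{i,l} into s_{i,k} and s_{i+k,l-k}
                left, right = table[(i, k)], table[(i + k, l - k)]
                for a, b, c in BINARY_RULES:
                    if b in left and c in right:
                        cell.add(a)
            table[(i, l)] = cell
    return table


table = cyk_table("32.5e+1")
print(sorted(table[(2, 3)]))     # prints ['N1', 'Number']
print(table[(2, 4)])             # prints set()
print("Number" in table[(1, 7)]) # prints True
```

Each cell is filled in a single pass, because both parts of every split are strictly shorter than the substring itself.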
Recognition table
[Figure: recognition table, indexed by start position i and length l]
As a result we find out that:
This process is much less complicated than the one we saw before
Reasons
• We do not have to repeat the process again and again until no new non-terminals are added to R_{s_{i,l}} (the substrings we are dealing with are proper substrings and cannot be equal to the string we start with)
• We only have to find one place where the substring must be split in two: for a rule A -> B C, the split falls between B and C
Chart Parsing
A chart is just a recognition table.
A short retrospective of CYK
First: recognition table using the original grammar.
Then: transforming grammar to CNF.
A short retrospective of CYK cont.
CNF is useful for improving the efficiency, but it is actually a bit too restrictive
Disadvantage of CNF: the resulting recognition table lacks the information we need to construct a derivation using the original grammar!
A short retrospective of CYK cont.
In the transformation process, some non-terminals were thrown away (non-productive)
Missing information could be added.
A short retrospective of CYK cont.
Result: almost the same recognition table.
Extra information on non-terminals
Obtained in a simpler and much more efficient way.
Thank you
for your attention!