Natural Language Processing
Lecture 12: Context-Free Parsing
Levels of Linguistic Representation
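[Figure: levels of linguistic representation. Labels in the figure: discourse, semantics, pragmatics, syntax, lexemes, morphology, orthography / phonology, phonetics, and speech / text at the bottom. Arrows are labeled "analysis" and "generation". Part of the figure is marked "most of this class".]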
Context-Free Grammars
• Using grammars: recognition, parsing
• Parsing algorithms: top-down, bottom-up
• CNF
• CKY algorithm (Cocke-Younger-Kasami)
Parsing vs Word Matching
• Consider
    The student who was taught by Alan Black won the prize
• Who won the prize?
• String matching
    "Alan Black won the prize."
• Parsing based
    ((The student (who was taught by Alan Black)) won the prize)
    "The student won the prize"
Context-Free Grammars
• Vocabulary of terminal symbols, Σ
• Set of nonterminal symbols (a.k.a. variables), N
• Special start symbol S ∈ N
• Production rules of the form X → α, where
    X ∈ N
    α ∈ (N ∪ Σ)*
Two Related Problems
• Input: sentence w = (w_1, ..., w_n) and CFG G
• Output (recognition): true if w ∈ Language(G)
• Output (parsing): one or more derivations for w, under G
Parsing as Search
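[Figure: a parse tree drawn as a search space, with the start symbol S at the top and the words w_1 ... w_n at the bottom. Top-down search works from S toward the words; bottom-up search works from the words toward S.]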
Implementing Recognizers as Search
Agenda = { state0 }
while (Agenda not empty)
    s = pop a state from Agenda
    if s is a success-state
        return s                      // valid parse tree
    else if s is not a failure-state:
        generate new states from s
        push new states onto Agenda
return nil                            // no parse!
Example Grammar and Lexicon
Recursive Descent (A Top-Down Parser)
• Start state: (S, 0)
• Scan: From (w_{j+1} β, j), you can get to (β, j + 1).
• Predict: If Z → γ, then from (Z β, j), you can get to (γ β, j).
• Final state: (ε, n)
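The Scan and Predict moves can be tried out as a small agenda-based search. The following is a minimal sketch, not the lecture's own code: the function name and the toy grammar (S → NP VP, VP → V NP, NP → John | Delta, V → flies) are assumptions chosen for illustration, and the grammar is deliberately free of left recursion, since this kind of top-down search can loop forever on left-recursive rules.

```python
# Toy grammar, assumed for illustration: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["John"], ["Delta"]],
    "V": [["flies"]],
}


def recursive_descent_recognize(words, grammar=GRAMMAR, start="S"):
    """Return True if `words` can be derived from `start` (top-down search)."""
    n = len(words)
    agenda = [((start,), 0)]              # start state: (S, 0)
    while agenda:
        stack, j = agenda.pop()           # depth-first: newest state first
        if not stack:
            if j == n:
                return True               # final state: (epsilon, n)
            continue                      # failure state: words left over
        first, rest = stack[0], stack[1:]
        if first in grammar:
            # Predict: replace the leading nonterminal Z by each gamma.
            for gamma in grammar[first]:
                agenda.append((tuple(gamma) + rest, j))
        elif j < n and words[j] == first:
            # Scan: the leading terminal matches the next word.
            agenda.append((rest, j + 1))
    return False                          # agenda exhausted: no parse


print(recursive_descent_recognize("John flies Delta".split()))
```

Popping from the end of the list makes the search depth-first; popping from the front would give breadth-first search over the same states.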
Example Grammar and Lexicon
Shift-Reduce (A Bottom-Up Parser)
• Start state: (ε, 0)
• Shift: From (α, j), you can get to (α w_{j+1}, j + 1).
• Reduce: If Z → γ, then from (α γ, j) you can get to (α Z, j).
• Final state: (S, n)
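The Shift and Reduce moves fit the same agenda pattern. This is a minimal sketch under assumptions: the function name and the toy grammar are illustrative, the search tries every shift and every reduce (no conflict-resolution strategy), and the grammar has no empty right-hand sides, so the set of reachable states is finite.

```python
# Toy grammar, assumed for illustration: (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("John",)),
    ("NP", ("Delta",)),
    ("V", ("flies",)),
]


def shift_reduce_recognize(words, rules=RULES, start="S"):
    """Return True if the words reduce to `start` (exhaustive bottom-up search)."""
    n = len(words)
    agenda = [((), 0)]                    # start state: (epsilon, 0)
    seen = set()                          # avoid revisiting identical states
    while agenda:
        state = agenda.pop()
        if state in seen:
            continue
        seen.add(state)
        stack, j = state
        if stack == (start,) and j == n:
            return True                   # final state: (S, n)
        if j < n:
            # Shift: move the next word onto the stack.
            agenda.append((stack + (words[j],), j + 1))
        for lhs, rhs in rules:
            # Reduce: replace a stack suffix gamma by Z when Z -> gamma.
            k = len(rhs)
            if k <= len(stack) and stack[-k:] == rhs:
                agenda.append((stack[:-k] + (lhs,), j))
    return False


print(shift_reduce_recognize("John flies Delta".split()))
```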
Simple Grammar
• S -> NP VP
• VP -> V NP
• NP -> John
• NP -> Delta
• V -> flies
Context-Free Grammars in Chomsky Normal Form
• Vocabulary of terminal symbols, Σ
• Set of nonterminal symbols (a.k.a. variables), N
• Special start symbol S ∈ N
• Production rules of the form X → α, where
    X ∈ N
    α ∈ N² ∪ Σ (two nonterminals or a single terminal)
Convert CFGs to CNF
• For each rule X → A B C
• Rewrite as X → A X2 and X2 → B C
• This introduces a new nonterminal, X2
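The rewriting step generalizes to right-hand sides of any length by peeling off one symbol at a time. A minimal sketch, assuming a hypothetical naming scheme for the fresh nonterminals (`X_1`, `X_1_2`, ...) that does not clash with existing symbols. It only handles long rules, as on the slide; a full CNF conversion also has to deal with unit rules, empty rules, and terminals mixed into longer right-hand sides.

```python
def binarize(rules):
    """Split every rule with more than two right-hand-side symbols into binary rules.

    `rules` is a list of (lhs, rhs) pairs, with rhs a sequence of symbols.
    """
    out = []
    counter = 0
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            counter += 1
            new_nt = f"{lhs}_{counter}"   # fresh nonterminal (hypothetical naming)
            out.append((lhs, (rhs[0], new_nt)))
            lhs, rhs = new_nt, rhs[1:]    # continue with the remainder
        out.append((lhs, rhs))
    return out


print(binarize([("X", ("A", "B", "C"))]))
# [('X', ('A', 'X_1')), ('X_1', ('B', 'C'))]
```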
CKY Algorithm
for i = 1 ... n
    C[i-1, i] = { V | V → w_i }
for ℓ = 2 ... n                      // width
    for i = 0 ... n - ℓ              // left boundary
        k = i + ℓ                    // right boundary
        for j = i + 1 ... k - 1      // midpoint
            C[i, k] = C[i, k] ∪ { V | V → Y Z, Y ∈ C[i, j], Z ∈ C[j, k] }
return true if S ∈ C[0, n]
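The pseudocode translates almost line for line into Python. The following is a sketch, not the lecture's implementation: the CNF grammar below is an assumption, made up to resemble the "book this flight through Houston" example, and the dictionary layout (`LEXICON`, `BINARY`) is just one convenient encoding.

```python
from collections import defaultdict

# Hypothetical CNF grammar (assumed for illustration).
LEXICON = {                       # word -> nonterminals V with V -> word
    "book": {"Noun", "Verb"},
    "this": {"Det"},
    "flight": {"Noun"},
    "through": {"Prep"},
    "Houston": {"PNoun", "NP"},
}
BINARY = {                        # (Y, Z) -> nonterminals V with V -> Y Z
    ("Det", "Noun"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("Prep", "NP"): {"PP"},
    ("Verb", "NP"): {"VP", "S"},
}


def cky_recognize(words, lexicon=LEXICON, binary=BINARY, start="S"):
    """Return (recognized, chart); chart[i, k] holds nonterminals spanning i..k."""
    n = len(words)
    chart = defaultdict(set)
    for i in range(1, n + 1):                     # width-1 spans from the lexicon
        chart[i - 1, i] = set(lexicon.get(words[i - 1], ()))
    for width in range(2, n + 1):                 # width of span
        for i in range(0, n - width + 1):         # left boundary
            k = i + width                         # right boundary
            for j in range(i + 1, k):             # midpoint
                for y in chart[i, j]:
                    for z in chart[j, k]:
                        chart[i, k] |= binary.get((y, z), set())
    return start in chart[0, n], chart


ok, chart = cky_recognize("book this flight through Houston".split())
print(ok, sorted(chart[1, 3]), sorted(chart[3, 5]))
```

Indexing the binary rules by their right-hand side `(Y, Z)` avoids looping over the whole grammar for every pair of cells.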
CKY Algorithm: Chart
[Chart for "book this flight through Houston". The slides fill the chart in one cell at a time; the completed chart is shown here. The cell in row i and column k is C[i, k], the nonterminals spanning words i+1 through k; "-" marks a cell that stays empty.]

         k = 1        k = 2    k = 3     k = 4      k = 5
         book         this     flight    through    Houston
i = 0    Noun,Verb    -        VP,S      -          S
i = 1                 Det      NP        -          NP
i = 2                          Noun      -          -
i = 3                                    Prep       PP
i = 4                                               PNoun,NP
CKY Equations
CKY Complexity
• CKY worst case is O(n³ · |G|), where |G| is the size of the grammar
• The best case is the same as the worst case
• (Other algorithms do better in the average case)
Probabilistic/Weighted Parsing
Example: ambiguous parse
Probabilistic CFG
Ambiguous parse w/probabilities
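P(left) = 2.2 × 10^-6    P(right) = 6.1 × 10^-7
[Figure: the two candidate parse trees, each node annotated with the probability of the rule used there. Annotations as extracted: 0.05, 0.20, 0.20, 0.20, 0.75, 0.30, 0.60, 0.10, 0.40 and 0.05, 0.10, 0.20, 0.20, 0.75, 0.75, 0.30, 0.60, 0.10, 0.40.]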
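Under a probabilistic CFG, the probability of a parse tree is the product of the probabilities of the rules used in it. A small check, assuming that the first nine probabilities listed with the figure (0.05 through 0.40) are the rule probabilities of the left tree; that grouping is a guess from the extracted slide, though the product does agree with the stated P(left).

```python
import math

# Assumed to be the rule probabilities annotated on the left parse tree.
left_tree_rule_probs = [0.05, 0.20, 0.20, 0.20, 0.75, 0.30, 0.60, 0.10, 0.40]

# Probability of a tree = product of the probabilities of its rules.
p_left = math.prod(left_tree_rule_probs)
print(f"P(left) is about {p_left:.1e}")
```

In practice the product is usually computed as a sum of log probabilities, which avoids underflow on longer sentences.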
Review: Context-Free Grammars
• Vocabulary of terminal symbols, Σ
• Set of nonterminal symbols (a.k.a. variables), N
• Special start symbol S ∈ N
• Production rules of the form X → α, where
    X ∈ N
    α ∈ (N ∪ Σ)* (in CNF: α ∈ N² ∪ Σ)
Probabilistic Context-Free Grammars
• Vocabulary of terminal symbols, Σ
• Set of nonterminal symbols (a.k.a. variables), N
• Special start symbol S ∈ N
• Production rules of the form X → α, each with a positive weight p(X → α), where
    X ∈ N
    α ∈ (N ∪ Σ)* (in CNF: α ∈ N² ∪ Σ)
    ∀X ∈ N, ∑_α p(X → α) = 1
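The last condition says that the rules sharing a left-hand side form a probability distribution. A minimal sketch of that check; the grammar, its probabilities, and the function name are all made up for illustration.

```python
import math
from collections import defaultdict

# Hypothetical PCFG: (lhs, rhs) -> p(lhs -> rhs). Values are illustrative only.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 1.0,
    ("NP", ("John",)): 0.6,
    ("NP", ("Delta",)): 0.4,
    ("V", ("flies",)): 1.0,
}


def is_normalized(pcfg, tol=1e-9):
    """Check that all weights are positive and sum to 1 for each left-hand side."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in pcfg.items():
        if p <= 0:
            return False                  # weights must be positive
        totals[lhs] += p
    return all(math.isclose(t, 1.0, abs_tol=tol) for t in totals.values())


print(is_normalized(PCFG))
```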
CKY Algorithm: Review
for i = 1 ... n
    C[i-1, i] = { V | V → w_i }
for ℓ = 2 ... n                      // width
    for i = 0 ... n - ℓ              // left boundary
        k = i + ℓ                    // right boundary
        for j = i + 1 ... k - 1      // midpoint
            C[i, k] = C[i, k] ∪ { V | V → Y Z, Y ∈ C[i, j], Z ∈ C[j, k] }
return true if S ∈ C[0, n]
Weighted CKY Algorithm
for i = 1 ... n, V ∈ N
    C[V, i-1, i] = p(V → w_i)
for ℓ = 2 ... n                      // width of span
    for i = 0 ... n - ℓ              // left boundary
        k = i + ℓ                    // right boundary
        for j = i + 1 ... k - 1      // midpoint
            for each binary rule V → Y Z
                C[V, i, k] = max{ C[V, i, k], C[Y, i, j] × C[Z, j, k] × p(V → Y Z) }
return true if S ∈ C[·, 0, n]        // i.e., C[S, 0, n] > 0; that value is the weight of the best parse
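A sketch of the weighted version in Python, returning the weight of the best parse rather than a boolean. The grammar, its probabilities, and the dictionary encoding are assumptions for illustration. Missing chart entries are treated as 0, so an unparseable input yields 0.0.

```python
from collections import defaultdict

# Hypothetical PCFG in CNF (illustrative values).
LEXICAL = {                       # (V, word) -> p(V -> word)
    ("NP", "John"): 0.6,
    ("NP", "Delta"): 0.4,
    ("V", "flies"): 1.0,
}
BINARY = {                        # (V, Y, Z) -> p(V -> Y Z)
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}


def weighted_cky(words, lexical=LEXICAL, binary=BINARY, start="S"):
    """Return the probability of the best parse of `words` (0.0 if none)."""
    n = len(words)
    C = defaultdict(float)                        # C[V, i, k]; default 0.0
    for i in range(1, n + 1):
        for (v, w), p in lexical.items():
            if w == words[i - 1]:
                C[v, i - 1, i] = p
    for width in range(2, n + 1):                 # width of span
        for i in range(0, n - width + 1):         # left boundary
            k = i + width                         # right boundary
            for j in range(i + 1, k):             # midpoint
                for (v, y, z), p in binary.items():
                    score = C[y, i, j] * C[z, j, k] * p
                    if score > C[v, i, k]:        # keep the max over derivations
                        C[v, i, k] = score
    return C[start, 0, n]


print(weighted_cky("John flies Delta".split()))
```

Storing a backpointer (the winning rule and midpoint) alongside each maximum would let the best tree be read back out of the chart.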
P-CKY algorithm from book
Parsing as (Weighted) Deduction
Earley’s Algorithm
Example Grammar (same for CKY)
02/25/2020 Speech and Language Processing - Jurafsky and Martin
Earley Parsing
• Allows arbitrary CFGs
• Top-down control
• Fills a table (or chart) in a single sweep over the input
  – Table is length N+1; N is number of words
  – Table entries represent
    • Completed constituents and their locations
    • In-progress constituents
    • Predicted constituents
States
• The table entries are called states and are represented with dotted rules.
    S → • VP               A VP is predicted
    NP → Det • Nominal     An NP is in progress
    VP → V NP •            A VP has been found
States/Locations
• S → • VP [0,0]
    A VP is predicted at the start of the sentence
• NP → Det • Nominal [1,2]
    An NP is in progress; the Det goes from 1 to 2
• VP → V NP • [0,3]
    A VP has been found starting at 0 and ending at 3
Earley top-level
• As with most dynamic programming approaches, the answer is found by looking in the table in the right place.
• In this case, there should be an S state in the final column that spans from 0 to N and is complete. That is,
    S → α • [0,N]
• If that’s the case, you’re done.
Earley top-level (2)
• So sweep through the table from 0 to N…
  – New predicted states are created by starting top-down from S
  – New incomplete states are created by advancing existing states as new constituents are discovered
  – New complete states are created in the same way.
Earley top-level (3)
• More specifically…
  1. Predict all the states you can upfront
  2. Read a word
     1. Extend states based on matches
     2. Generate new predictions
     3. Go to step 2
  3. When you're out of words, look at the chart to see if you have a winner
Earley code: top-level
Earley code: 3 main functions
Extended Earley Example
• Book that flight
• We should find: an S from 0 to 3 that is a completed state
Earley's Algorithm in equations
• We can look at this from the declarative programming point of view too.
    ROOT → • S [0,0]        goal: ROOT → S • [0,n]
    book the flight through Chicago
Earley’s Algorithm: PREDICT
Given V → α • X β [i, j] and the rule X → γ, create X → • γ [j, j]

Chart:
    ROOT → • S [0,0]
    S → • VP [0,0]
    S → • NP VP [0,0]
    ...
    VP → • V NP [0,0]
    ...
    NP → • DT N [0,0]
    ...
Input: book the flight through Chicago
Example: from ROOT → • S [0,0] and the rule S → VP, create S → • VP [0,0]
Earley’s Algorithm: SCAN
Given V → α • T β [i, j] and the rule T → w_{j+1}, create T → w_{j+1} • [j, j+1]

Chart:
    ROOT → • S [0,0]
    S → • VP [0,0]
    S → • NP VP [0,0]
    ...
    VP → • V NP [0,0]
    ...
    NP → • DT N [0,0]
    ...
    V → book • [0,1]
Input: book the flight through Chicago
Example: from VP → • V NP [0,0] and the rule V → book, create V → book • [0,1]
Earley’s Algorithm: COMPLETE
Given V → α • X β [i, j] and X → γ • [j, k], create V → α X • β [i, k]

Chart:
    ROOT → • S [0,0]
    S → • VP [0,0]
    S → • NP VP [0,0]
    ...
    VP → • V NP [0,0]
    ...
    NP → • DT N [0,0]
    ...
    V → book • [0,1]
    VP → V • NP [0,1]
Input: book the flight through Chicago
Example: from VP → • V NP [0,0] and V → book • [0,1], create VP → V • NP [0,1]
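The three rules PREDICT, SCAN, and COMPLETE are enough for a small recognizer. This is a sketch under assumptions, not the textbook's code: the lexicon is made up, symbols without grammar rules are treated as parts of speech, and the grammar is assumed to have no empty right-hand sides. A state (lhs, rhs, dot, i) stored in chart[j] stands for the dotted rule with span [i, j].

```python
# Grammar fragment in the spirit of the slides; the lexicon is hypothetical.
GRAMMAR = {
    "ROOT": [("S",)],
    "S": [("VP",), ("NP", "VP")],
    "VP": [("V", "NP")],
    "NP": [("DT", "N")],
}
LEXICON = {"V": {"book"}, "DT": {"the", "that"}, "N": {"flight"}}


def earley_recognize(words, grammar=GRAMMAR, lexicon=LEXICON):
    """Return True if ROOT -> S . [0, n] ends up in the last chart column."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]        # chart[j]: states ending at j

    def add(j, state):
        if state not in chart[j]:
            chart[j].append(state)

    add(0, ("ROOT", ("S",), 0, 0))            # ROOT -> . S [0,0]
    for j in range(n + 1):
        idx = 0
        while idx < len(chart[j]):            # chart[j] grows as we process it
            lhs, rhs, dot, i = chart[j][idx]
            idx += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:
                    # PREDICT: X -> . gamma [j, j] for every rule X -> gamma
                    for gamma in grammar[nxt]:
                        add(j, (nxt, gamma, 0, j))
                elif j < n and words[j] in lexicon.get(nxt, ()):
                    # SCAN: T -> w_{j+1} . [j, j+1]
                    add(j + 1, (nxt, (words[j],), 1, j))
            else:
                # COMPLETE: advance every state in chart[i] waiting for lhs
                for plhs, prhs, pdot, pi in list(chart[i]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        add(j, (plhs, prhs, pdot + 1, pi))
    return ("ROOT", ("S",), 1, 0) in chart[n]


print(earley_recognize("book that flight".split()))
```

Lists are used for the chart columns to keep the processing order explicit; a production version would index states by the symbol after the dot to speed up COMPLETE.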
Thought Questions
• Runtime?
  – O(n³)
• Memory?
  – O(n²)
• Can we make it faster?
• Recovering trees?
Make it an Earley Parser
Parsing as Search, Again
Implementing Recognizers as Search
Agenda = { state0 }
while (Agenda not empty)
    s = pop a state from Agenda
    if s is a success-state
        return s                      // valid parse tree
    else if s is not a failure-state:
        generate new states from s
        push new states onto Agenda
return nil                            // no parse!
Agenda-Based Probabilistic Parsing
Agenda = { (item, value) : initial updates from equations }
// items take the form [X, i, j]; values are reals
while (Agenda not empty)
    u = pop an update from Agenda
    if u.item is goal
        return u.value                // valid parse tree
    else if u.value > Chart[u.item]
        store Chart[u.item] ← u.value
        if u.item combines with other Chart items:
            generate new updates from u and items stored in Chart
            push new updates onto Agenda
return nil                            // no parse!
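One way to make the agenda loop concrete is a best-first parser for a PCFG in CNF, where the agenda is a priority queue ordered by value. The sketch below makes several assumptions: the grammar and its probabilities are made up, updates are popped highest-value first, and all rule probabilities are at most 1, so a popped value can never be beaten by a later update for the same item. Under those assumptions the first time the goal item is popped, its value is the probability of the best parse.

```python
import heapq

# Hypothetical PCFG in CNF (illustrative values).
LEXICAL = {("NP", "John"): 0.6, ("NP", "Delta"): 0.4, ("V", "flies"): 1.0}
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}


def agenda_parse(words, lexical=LEXICAL, binary=BINARY, start="S"):
    """Best-first agenda parsing; returns the best parse probability or None."""
    n = len(words)
    goal = (start, 0, n)
    chart = {}                                    # item (X, i, j) -> value
    agenda = []                                   # min-heap on negated value
    for i, w in enumerate(words):                 # initial updates: lexical rules
        for (x, word), p in lexical.items():
            if word == w:
                heapq.heappush(agenda, (-p, (x, i, i + 1)))
    while agenda:
        neg_value, item = heapq.heappop(agenda)
        value = -neg_value
        if item == goal:
            return value
        if value > chart.get(item, 0.0):
            chart[item] = value
            x, i, j = item
            for (v, y, z), p in binary.items():
                for (x2, i2, j2), value2 in list(chart.items()):
                    if y == x and z == x2 and i2 == j:    # item is the left child
                        heapq.heappush(agenda, (-(value * value2 * p), (v, i, j2)))
                    if z == x and y == x2 and j2 == i:    # item is the right child
                        heapq.heappush(agenda, (-(value2 * value * p), (v, i2, j)))
    return None                                   # no parse


print(agenda_parse("John flies Delta".split()))
```

Scanning the whole chart for neighbors keeps the sketch short; indexing chart items by start and end position would avoid that linear scan.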
Catalog of CF Parsing Algorithms
• Recognition/Boolean vs. parsing/probabilistic
• Chomsky normal form/CKY vs. general/Earley's
• Exhaustive vs. agenda