Basic Parsing with Context-Free Grammars
CS 4705
Julia Hirschberg
Some slides adapted from Kathy McKeown and Dan Jurafsky
Syntactic Parsing
• Declarative formalisms like CFGs and FSAs define the legal strings of a language, but they only tell you whether a given string is legal in a particular language
• Parsing algorithms specify how to recognize the strings of a language and assign one (or more) syntactic analyses to each string
S -> NP VP            VP -> V
S -> Aux NP VP        VP -> V PP
S -> VP               PP -> Prep NP
NP -> Det Nom         N -> old | dog | footsteps | young
NP -> PropN           V -> dog | eat | sleep | bark | meow
Nom -> Adj N          Aux -> does | can
Nom -> N              Prep -> from | to | on | of
Nom -> N Nom          PropN -> Fido | Felix
Nom -> Nom PP         Det -> that | this | a | the
VP -> V NP            Adj -> old | happy | young
“The old dog the footsteps of the young”
[S [NP [Det The] [Nom [N old]]] [VP [V dog] [NP [Det the] [Nom [N footsteps] [PP of the young]]]]]
How do we create this parse tree?
Parsing is a form of Search
• We search FSAs by
  – Finding the correct path through the automaton
  – Search space defined by structure of FSA
• We search CFGs by
  – Finding the correct parse tree among all possible parse trees
  – Search space defined by the grammar
• Constraints provided by the input sentence and the automaton or grammar
Top Down Parsing
• Builds from the root S node to the leaves
• Expectation-based
• Common top-down search strategy
  – Top-down, left-to-right, with backtracking
  – Try first rule such that LHS is S
  – Next expand all constituents on RHS
  – Iterate until all leaves are POS
  – Backtrack when candidate POS does not match POS of current word in input string
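The top-down, left-to-right strategy with backtracking can be sketched as a small recognizer over the example grammar. This is only an illustration under stated assumptions: the names `GRAMMAR`, `LEXICON`, `expand`, and `recognize` are hypothetical, and the left-recursive rule Nom -> Nom PP is replaced by Nom -> N PP, because a naive top-down search would loop forever on left recursion.

```python
# Example grammar; Nom -> Nom PP is swapped for Nom -> N PP (assumption)
# so that naive top-down expansion terminates.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PropN"]],
    "Nom": [["Adj", "N"], ["N"], ["N", "Nom"], ["N", "PP"]],
    "VP": [["V"], ["V", "PP"], ["V", "NP"]],
    "PP": [["Prep", "NP"]],
}
# Words are stored in lower case because the input is lower-cased.
LEXICON = {
    "N": {"old", "dog", "footsteps", "young"},
    "V": {"dog", "eat", "sleep", "bark", "meow"},
    "Aux": {"does", "can"},
    "Prep": {"from", "to", "on", "of"},
    "PropN": {"fido", "felix"},
    "Det": {"that", "this", "a", "the"},
    "Adj": {"old", "happy", "young"},
}


def expand(symbols, words, i):
    """Yield every position j such that `symbols` can derive words[i:j]."""
    if not symbols:
        yield i
        return
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:
        # POS leaf: it must match the current word, otherwise this path dies
        # and the caller backtracks to its next alternative.
        if i < len(words) and words[i] in LEXICON[first]:
            yield from expand(rest, words, i + 1)
        return
    for rhs in GRAMMAR[first]:  # try the rules for `first` in order
        for j in expand(rhs, words, i):
            yield from expand(rest, words, j)


def recognize(sentence):
    """Return True if some expansion of S covers the whole sentence."""
    words = sentence.lower().split()
    return any(j == len(words) for j in expand(["S"], words, 0))


print(recognize("The old dog the footsteps of the young"))
```

On the example sentence the search first tries Nom -> Adj N ("old dog"), fails to find a verb at "the", and backtracks to Nom -> N ("old") so that "dog" can be the verb.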
Expanding the Rules
• The old dog the footsteps of the young
• Where does backtracking happen?
• What are the computational disadvantages?
• What are the advantages?
• What could we do to improve the process?
Bottom Up Parsing
• Parser begins with words of input and builds up trees, applying grammar rules whose RHS matches
  Det N   V   Det N         Prep Det N
  The old dog the footsteps of   the young
  Det Adj N   Det N         Prep Det N
  The old dog the footsteps of   the young
• Parse continues until an S root node is reached or no further node expansion is possible
Bottom Up Parsing
• When does disambiguation occur?
• What are the computational advantages and disadvantages?
• What could we do to make this process more efficient?
Issues to Address
• Ambiguity
  – POS
  – Attachment
    • PP…
    • Coordination: old dogs and cats
  – Overgenerating useless hypotheses
  – Regenerating good hypotheses
Dynamic Programming
• Fill in tables with solutions to subproblems
• For parsing
  – Store possible subtrees for each substring as they are discovered in the input
  – Ambiguous strings are given multiple entries
  – Table look-up to come up with final parse(s)
• Many parsers take advantage of this approach
Review: Minimal Edit Distance
• Simple example of DP: find the minimal ‘distance’ between 2 strings
  – Minimal number of operations (insert, delete, substitute) needed to transform one string into another
  – Levenshtein distances (subst = 1 or 2)
  – Key idea: minimal path between substrings is on the minimal path between the beginning and end of the 2 strings
Example of MED Calculation
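As a rough illustration of the table-filling idea, the sketch below computes minimal edit distance with insertions and deletions costing 1 and a configurable substitution cost. The function name `min_edit_distance` and the parameter `sub_cost` are hypothetical names chosen for this example.

```python
def min_edit_distance(source, target, sub_cost=2):
    """Minimal edit distance; insert and delete cost 1, substitute costs sub_cost."""
    n, m = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every character of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every character of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                             # deletion
                dist[i][j - 1] + 1,                             # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost), # match or substitution
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution"))      # substitution cost 2
print(min_edit_distance("kitten", "sitting", sub_cost=1))
```

Each cell depends only on its three neighbors (left, above, diagonal), which is the same property the parsing tables below rely on.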
DP for Parsing
• Table cells represent state of parse of input up to this point
• Can be calculated from neighboring state(s)
• Only need to parse each substring once for each possible analysis into constituents
Parsers Using DP
• CKY Parsing Algorithm
  – Bottom-up
  – Grammar must be in Chomsky Normal Form
  – The parse tree might not be consistent with linguistic theory
• Earley Parsing Algorithm
  – Top-down
  – Expectations about constituents are confirmed by input
  – A POS tag for a word that is not predicted is never added
• Chart Parser
Cocke-Kasami-Younger Algorithm
• Convert grammar to Chomsky Normal Form
  – Every CFG has a weakly equivalent CNF grammar
  – A -> B C (non-terminals)
  – A -> w (terminal)
  – Basic ideas
    • Keep rules conforming to CNF
    • Introduce dummy non-terminals for rules that mix terminals and non-terminals (e.g., A -> B w becomes A -> B B', B' -> w)
    • Rewrite RHS of unit productions with RHS of all non-unit productions they lead to (e.g., A -> B, B -> w becomes A -> w)
    • For RHS longer than 2 non-terminals, replace leftmost pairs of non-terminals with a new non-terminal and add a new production rule (e.g., A -> B C D becomes A -> Z D, Z -> B C)
    • For ε-productions, find all occurrences of LHS in 2-variable RHSs and create new rule without the LHS (e.g., C -> A B, A -> ε becomes C -> B)
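One of the conversion steps above, shortening right-hand sides longer than two symbols, can be sketched as follows. This covers only that single step, not the full CNF conversion; the function name `binarize` and the fresh non-terminal names `X1`, `X2`, ... are assumptions made for the example.

```python
def binarize(rules):
    """Replace leftmost pairs of symbols until every RHS has at most 2 symbols.

    `rules` is a list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    """
    out = []
    fresh = 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            fresh += 1
            new_nt = f"X{fresh}"                    # hypothetical dummy non-terminal
            out.append((new_nt, (rhs[0], rhs[1])))  # X -> leftmost pair
            rhs = [new_nt] + rhs[2:]                # the pair is replaced by X
        out.append((lhs, tuple(rhs)))
    return out


for rule in binarize([("A", ("B", "C", "D")), ("S", ("Aux", "NP", "VP"))]):
    print(rule)
```

For A -> B C D this produces X1 -> B C and A -> X1 D, which matches the A -> Z D, Z -> B C pattern with X1 in the role of Z.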
A CFG
Figure 13.8
CYK in Action
• Each non-terminal above POS level has 2 daughters
  – Encode entire parse tree in an (N+1) x (N+1) table
  – Each cell [i, j] contains all non-terminals that span positions i to j between input words
  – Cell [0, N] represents all input
  – For each [i, j] and each k such that i < k < j, [i, k] is to the left and [k, j] is below in the table
  – Diagonal contains POS of each input word
  – Fill in table from diagonal on up
  – For any cell [i, j], the cells (constituents) contributing to [i, j] are to the left and below, already filled in
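The table-filling scheme above might look like the following recognizer sketch. The grammar tables `LEXICAL` and `BINARY` are a small hand-made CNF fragment invented for this example (not the grammar of Figure 13.8), and `cky_recognize` is a hypothetical name.

```python
from collections import defaultdict

# Hand-made CNF fragment (assumption): unit productions are already collapsed,
# e.g. S -> VP and VP -> Verb NP give S -> Verb NP.
LEXICAL = {
    "book": {"Verb", "Nom"},
    "that": {"Det"},
    "flight": {"Nom"},
    "from": {"Prep"},
    "houston": {"NP"},
}
BINARY = {
    ("Verb", "NP"): {"S", "VP"},
    ("Det", "Nom"): {"NP"},
    ("Nom", "PP"): {"Nom"},
    ("Prep", "NP"): {"PP"},
}


def cky_recognize(words, lexical, binary, start="S"):
    """Fill table[i, j] with every non-terminal spanning positions i..j."""
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):                  # one column per word
        table[j - 1, j] |= lexical.get(words[j - 1], set())  # diagonal cell
        for i in range(j - 2, -1, -1):         # move up the column
            for k in range(i + 1, j):          # every split point i < k < j
                for b in table[i, k]:          # cell to the left
                    for c in table[k, j]:      # cell below
                        table[i, j] |= binary.get((b, c), set())
    return start in table[0, n], table


ok, table = cky_recognize("book that flight".split(), LEXICAL, BINARY)
print(ok, sorted(table[1, 3]))
```

Because columns are filled left to right and each column bottom to top, the cells [i, k] and [k, j] are always complete before [i, j] is computed.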
Figure 13.8
CYK Parse Table
CYK Algorithm
Filling in [0, N]: Adding X2 [0, N]
Filling the Final Column (1)
Filling the Final Column (2)
Earley Algorithm
• Top-down parsing algorithm using DP
• Allows arbitrary CFGs: closer to linguistics
• Fills a chart of length N+1 in a single sweep over input of N words
  – Chart entries represent state of parse at each word position
    • Completed constituents and their locations
    • In-progress constituents
    • Predicted constituents
Parser States
• The table entries are called states and are represented with dotted rules
  S -> • VP              A VP is predicted
  NP -> Det • Nominal    An NP is in progress
  VP -> V NP •           A VP has been found
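A dotted rule can be represented as a small record holding the rule, the dot position, and the span covered so far. The sketch below is one possible encoding; the class name `State`, its field names, and the spans used in the examples are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    lhs: str      # left-hand side of the rule
    rhs: tuple    # right-hand side symbols
    dot: int      # number of RHS symbols already found
    start: int    # input position where the constituent begins
    end: int      # input position reached so far

    def is_complete(self):
        """The dot has reached the right end of the rule."""
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """Symbol immediately right of the dot, or None if complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} -> {' '.join(symbols)} [{self.start},{self.end}]"


print(State("S", ("VP",), 0, 0, 0))               # a VP is predicted
print(State("NP", ("Det", "Nominal"), 1, 1, 2))   # an NP is in progress
print(State("VP", ("V", "NP"), 2, 0, 3))          # a VP has been found
```

Making the record immutable and hashable lets a chart entry reject duplicate states cheaply.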
CFG for Fragment of English
S -> NP VP            VP -> V
S -> Aux NP VP        PP -> Prep NP
S -> VP               N -> book | flight | meal | money
NP -> Det Nom         V -> book | include | prefer
NP -> PropN           Aux -> does
Nom -> N Nom          Prep -> from | to | on
Nom -> N              PropN -> Houston | TWA
Nom -> Nom PP         Det -> that | this | a | the
VP -> V NP
Some Parse States for Book that flight
[Figure: chart states S8 through S13]
Filling in the Chart
• March through chart left-to-right
• At each step, apply 1 of 3 operators
  – Predictor
    • Create new states representing top-down expectations
  – Scanner
    • Match word predictions (rule with POS following dot) to words in input
  – Completer
    • When a state is complete, see what rules were looking for that complete constituent
Top Level Earley
Predictor
• Given a state
  – With a non-terminal to right of dot (not a part-of-speech category)
  – Create a new state for each expansion of the non-terminal
  – Put predicted states in same chart cell as generating state, beginning and ending where generating state ends
  – So predictor looking at
    • S -> • VP [0,0]
  – results in
    • VP -> • Verb [0,0]
    • VP -> • Verb NP [0,0]
Scanner
• Given a state
  – With a non-terminal to right of dot that is a POS category
  – If next word in input matches this POS
  – Create a new state with dot moved past the non-terminal
    • E.g., scanner looking at VP -> • Verb NP [0,0]
  – If next word can be a verb, add new state
    • VP -> Verb • NP [0,1]
  – Add this state to chart entry following current one
  – NB: Earley uses top-down expectations to disambiguate POS: only POS predicted by some state can be added to chart
Completer
• Given a state
  – Whose dot has reached right end of rule
  – Parser has discovered a constituent over some span of input
  – Find and advance all previous states that are ‘looking for’ this category
  – Copy state, move dot, insert in current chart entry
• E.g., if processing
  – NP -> Det Nominal • [1,3] and if state expecting an NP like VP -> Verb • NP [0,1] in chart
• Add
  – VP -> Verb NP • [0,3] to same cell of chart
Reaching a Final State
• Find an S state in the chart that spans input from 0 to N and is complete
• Declare victory
  – S -> α • [0,N]
Converting from Recognizer to Parser
• Augment the “Completer” to include pointer to each previous (now completed) state
• Read off all the backpointers from every complete S
Gist of Earley Parsing
1. Predict all the states you can as soon as you can
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the last (N+1st) chart entry, Chart[N], to see if you have a winner
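The predict, scan, complete loop can be put together in a compact recognizer for the English fragment grammar. This is a sketch under assumptions rather than a definitive implementation: states are plain tuples, the dummy start symbol `GAMMA` and the names `GRAMMAR`, `LEXICON`, and `earley_recognize` are hypothetical, the scanner advances the dot directly past a POS category as described on the Scanner slide, and ε-productions are not handled.

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom"), ("PropN",)],
    "Nom": [("N", "Nom"), ("N",), ("Nom", "PP")],
    "VP": [("V",), ("V", "NP")],
    "PP": [("Prep", "NP")],
}
# Words are stored in lower case because the input is lower-cased.
LEXICON = {
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
    "Det": {"that", "this", "a", "the"},
}


def earley_recognize(sentence):
    words = sentence.lower().split()
    n = len(words)
    chart = [[] for _ in range(n + 1)]  # N+1 entries for N words

    def add(state, k):
        if state not in chart[k]:       # never store the same state twice
            chart[k].append(state)

    # A state is (lhs, rhs, dot, start); its end is the chart index it sits in.
    add(("GAMMA", ("S",), 0, 0), 0)     # dummy start state (assumption)
    for k in range(n + 1):
        # chart[k] grows while we iterate; the loop also visits new states.
        for lhs, rhs, dot, start in chart[k]:
            if dot == len(rhs):
                # Completer: advance states that were looking for `lhs`.
                for l2, r2, d2, s2 in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, s2), k)
            elif rhs[dot] in LEXICON:
                # Scanner: the predicted POS must match the next word.
                if k < n and words[k] in LEXICON[rhs[dot]]:
                    add((lhs, rhs, dot + 1, start), k + 1)
            else:
                # Predictor: one new state per expansion of the non-terminal.
                for expansion in GRAMMAR[rhs[dot]]:
                    add((rhs[dot], expansion, 0, k), k)
    return ("GAMMA", ("S",), 1, 0) in chart[n]


print(earley_recognize("Book that flight"))
```

Because duplicate states are rejected, the left-recursive rule Nom -> Nom PP is predicted only once per chart entry, so it does not cause the infinite loop that naive top-down search suffers from.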
Example
• Book that flight
• Goal: Find a completed S from 0 to 3
• Chart[0] shows Predictor operations
• Chart[1] S12 shows Scanner
• Chart[3] shows Completer stage
Figure 13.14
Figure 13.14 continued
Final Parse States
Chart Parsing
• CKY and Earley are deterministic: given an input, all actions are taken in a predetermined order
• Chart Parsing allows for flexibility of events via a separate policy that determines the order of an agenda of states
  – Policy determines order in which states are created and predictions made
  – Fundamental rule: if chart includes 2 contiguous states such that one provides a constituent the other needs, a new state spanning the two states is created with the new information
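The fundamental rule can be written as a single function that tries to combine two states. This is a minimal sketch: states are tuples of the form (lhs, rhs, dot, start, end), and the function name `fundamental_rule` is hypothetical. A real chart parser would call it from an agenda loop whose ordering policy is chosen separately.

```python
def fundamental_rule(active, passive):
    """Combine an incomplete state with a complete one, or return None.

    Both arguments are (lhs, rhs, dot, start, end) tuples.
    """
    lhs, rhs, dot, start, end = active
    p_lhs, p_rhs, p_dot, p_start, p_end = passive
    contiguous = end == p_start                      # passive starts where active ends
    passive_complete = p_dot == len(p_rhs)           # passive provides a constituent
    needs_it = dot < len(rhs) and rhs[dot] == p_lhs  # active needs that constituent
    if contiguous and passive_complete and needs_it:
        # New state spans both: the dot moves past the found constituent.
        return (lhs, rhs, dot + 1, start, p_end)
    return None


active = ("VP", ("Verb", "NP"), 1, 0, 1)        # VP -> Verb • NP [0,1]
passive = ("NP", ("Det", "Nominal"), 2, 1, 3)   # NP -> Det Nominal • [1,3]
print(fundamental_rule(active, passive))
```

With these two inputs the result corresponds to VP -> Verb NP • [0,3], the same state the Earley Completer would add.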
Summing Up
• Parsing as search: what search strategies to use?
  – Top down
  – Bottom up
  – How to combine?
• How to parse as little as possible
  – Dynamic Programming
  – Different policies for ordering states to be processed
  – Next: Shallow Parsing and Review
S
NP VP
NPV
DETNOM
N PP
DET NOM
N
The old dog the
footstepsof the young
How do we create this parse tree
Parsing is a form of Search
bull We search FSAs byndash Finding the correct path through the automatonndash Search space defined by structure of FSA
bull We search CFGs byndash Finding the correct parse tree among all possible
parse treesndash Search space defined by the grammar
bull Constraints provided by the input sentence and the automaton or grammar
5
Top Down Parsing
bull Builds from the root S node to the leavesbull Expectation-basedbull Common top-down search strategy
ndash Top-down left-to-right with backtrackingndash Try first rule st LHS is Sndash Next expand all constituents on RHSndash Iterate until all leaves are POSndash Backtrack when candidate POS does not match POS of
current word in input string
6
S NP VP VP V
S Aux NP VP VP -gt V PP
S -gt VP PP -gt Prep NP
NP Det Nom N old | dog | footsteps | young
NP PropN V dog | eat | sleep | bark | meow
Nom -gt Adj N Aux does | can
Nom N Prep from | to | on | of
Nom N Nom PropN Fido | Felix
Nom Nom PP Det that | this | a | the
VP V NP Adj -gt old | happy| young
ldquoThe old dog the footsteps of the youngrdquo
Expanding the Rules
bull The old dog the footsteps of the youngbull Where does backtracking happen bull What are the computational disadvantagesbull What are the advantagesbull What could we do to improve the process
8
Bottom Up Parsing
bull Parser begins with words of input and builds up trees applying grammar rules whose RHS matches
Det N V Det N Prep Det N
The old dog the footsteps of the young
Det Adj N Det N Prep Det N
The old dog the footsteps of the young
Parse continues until an S root node reached or no further node expansion possible
9
S NP VP VP V
S Aux NP VP VP -gt V PP
S -gt VP PP -gt Prep NP
NP Det Nom N old | dog | footsteps | young
NP PropN V dog | eat | sleep | bark | meow
Nom -gt Adj N Aux does | can
Nom N Prep from | to | on | of
Nom N Nom PropN Fido | Felix
Nom Nom PP Det that | this | a | the
VP V NP Adj -gt old | happy| young
ldquoThe old dog the footsteps of the youngrdquo
Bottom Up Parsing
• When does disambiguation occur?
• What are the computational advantages and disadvantages?
• What could we do to make this process more efficient?
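A naive bottom-up recognizer can be sketched as a search over sentential forms: start from the words, repeatedly replace any substring that matches a rule's RHS with its LHS, and stop when only S remains. This is an illustrative sketch under assumptions (the names `RULES`, `reductions`, and `recognize` are hypothetical, and POS assignments are treated as ordinary rules); it is exhaustive rather than efficient.

```python
# Phrase-structure rules transcribed from the slide as (LHS, RHS) pairs.
RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nom")), ("NP", ("PropN",)),
    ("Nom", ("Adj", "N")), ("Nom", ("N",)), ("Nom", ("N", "Nom")),
    ("Nom", ("Nom", "PP")),
    ("VP", ("V", "NP")), ("VP", ("V",)), ("VP", ("V", "PP")),
    ("PP", ("Prep", "NP")),
]
LEXICON = {
    "N": {"old", "dog", "footsteps", "young"},
    "V": {"dog", "eat", "sleep", "bark", "meow"},
    "Aux": {"does", "can"},
    "Prep": {"from", "to", "on", "of"},
    "PropN": {"fido", "felix"},
    "Det": {"that", "this", "a", "the"},
    "Adj": {"old", "happy", "young"},
}
# Treat each POS -> word entry as an ordinary rule with a one-word RHS.
RULES += [(pos, (word,)) for pos, vocab in LEXICON.items() for word in sorted(vocab)]


def reductions(form):
    """Yield every form obtained by replacing one RHS match with its LHS."""
    for lhs, rhs in RULES:
        width = len(rhs)
        for i in range(len(form) - width + 1):
            if form[i:i + width] == rhs:
                yield form[:i] + (lhs,) + form[i + width:]


def recognize(sentence):
    """Depth-first search from the words up to the single symbol S."""
    start = tuple(sentence.lower().split())
    seen = {start}  # avoids exploring the same sentential form twice
    stack = [start]
    while stack:
        form = stack.pop()
        if form == ("S",):
            return True
        for nxt in reductions(form):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False  # no further reduction possible and no S root was reached


print(recognize("The old dog the footsteps of the young"))
```

Both taggings shown on the slide (Det N V ... and Det Adj N ...) are explored here; only the search itself discovers that the second one cannot reach S. The `seen` set is a crude form of the "do not regenerate hypotheses" idea that dynamic programming makes systematic.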
Issues to Address
• Ambiguity
  – POS
  – Attachment
    • PP…
    • Coordination: old dogs and cats
  – Overgenerating useless hypotheses
  – Regenerating good hypotheses
Dynamic Programming
• Fill in tables with solutions to subproblems
• For parsing:
  – Store possible subtrees for each substring as they are discovered in the input
  – Ambiguous strings are given multiple entries
  – Table look-up to come up with final parse(s)
• Many parsers take advantage of this approach
Review: Minimal Edit Distance
• Simple example of DP: find the minimal ‘distance’ between 2 strings
  – Minimal number of operations (insert, delete, substitute) needed to transform one string into another
  – Levenshtein distances (subst = 1 or 2)
  – Key idea: minimal path between substrings is on the minimal path between the beginning and end of the 2 strings
Example of MED Calculation
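The table-filling idea can be sketched as follows. This is a minimal illustration, assuming insertions and deletions cost 1 and exposing the substitution cost as a parameter (1 or 2, as mentioned above); the function name `min_edit_distance` is hypothetical.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimal edit distance via dynamic programming (insert/delete cost 1)."""
    n, m = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                              # deletion
                dist[i][j - 1] + 1,                              # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # substitution
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution"))
print(min_edit_distance("intention", "execution", sub_cost=2))
```

Each cell depends only on its three neighbors (left, above, diagonal), which is the same "calculated from neighboring states" property that the parsing tables rely on.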
DP for Parsing
• Table cells represent state of parse of input up to this point
• Can be calculated from neighboring state(s)
• Only need to parse each substring once for each possible analysis into constituents
Parsers Using DP
• CKY Parsing Algorithm
  – Bottom-up
  – Grammar must be in Chomsky Normal Form
  – The parse tree might not be consistent with linguistic theory
• Earley Parsing Algorithm
  – Top-down
  – Expectations about constituents are confirmed by input
  – A POS tag for a word that is not predicted is never added
• Chart Parser
Cocke-Kasami-Younger Algorithm
• Convert grammar to Chomsky Normal Form
  – Every CFG has a weakly equivalent CNF grammar
  – A -> B C (non-terminals)
  – A -> w (terminal)
  – Basic ideas:
    • Keep rules conforming to CNF
    • Introduce dummy non-terminals for rules that mix terminals and non-terminals (e.g. A -> B w becomes A -> B B’, B’ -> w)
    • Rewrite RHS of unit productions with RHS of all non-unit productions they lead to (e.g. A -> B, B -> w becomes A -> w)
    • For RHS longer than 2 non-terminals, replace leftmost pairs of non-terminals with a new non-terminal and add a new production rule (e.g. A -> B C D becomes A -> Z D, Z -> B C)
    • For ε-productions, find all occurrences of LHS in 2-variable RHSs and create new rule without the LHS (e.g. C -> A B, A -> ε becomes C -> B)
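The first three conversion steps can be sketched as below. This is an illustrative sketch, not a complete CNF converter: it assumes the grammar has no ε-productions (that step is omitted), and the names `to_cnf`, `T_<word>`, and `X<n>` for the dummy non-terminals are hypothetical.

```python
def to_cnf(rules, nonterminals):
    """Convert (lhs, rhs) rules to CNF; assumes no epsilon-productions."""
    nts = set(nonterminals)

    # Step 1: in rules mixing terminals and non-terminals, give each
    # terminal its own dummy non-terminal (A -> B w becomes A -> B T_w).
    step1 = []
    for lhs, rhs in rules:
        if len(rhs) >= 2:
            fixed = []
            for sym in rhs:
                if sym in nts:
                    fixed.append(sym)
                else:
                    dummy = "T_" + sym
                    nts.add(dummy)
                    step1.append((dummy, (sym,)))
                    fixed.append(dummy)
            rhs = tuple(fixed)
        step1.append((lhs, tuple(rhs)))

    # Step 2: binarize long RHSs by replacing the leftmost pair
    # (A -> B C D becomes A -> X1 D, X1 -> B C).
    step2, counter = [], 0
    for lhs, rhs in step1:
        while len(rhs) > 2:
            counter += 1
            new = f"X{counter}"
            nts.add(new)
            step2.append((new, rhs[:2]))
            rhs = (new,) + rhs[2:]
        step2.append((lhs, rhs))

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in nts

    # Step 3: replace unit productions by the non-unit rules they lead to.
    cnf = set()
    for a in nts:
        reachable, frontier = {a}, [a]
        while frontier:
            b = frontier.pop()
            for lhs, rhs in step2:
                if lhs == b and is_unit(rhs) and rhs[0] not in reachable:
                    reachable.add(rhs[0])
                    frontier.append(rhs[0])
        for lhs, rhs in step2:
            if lhs in reachable and not is_unit(rhs):
                cnf.add((a, rhs))
    return cnf


# A small made-up grammar that exercises all three steps.
NONTERMINALS = {"S", "NP", "VP", "PP", "V", "N"}
RULES = [
    ("S", ("NP", "VP")), ("S", ("VP",)),
    ("VP", ("V",)), ("VP", ("V", "NP", "PP")),
    ("NP", ("the", "N")), ("PP", ("on", "NP")),
    ("V", ("book",)), ("N", ("flight",)),
]
cnf = to_cnf(RULES, NONTERMINALS)
for lhs, rhs in sorted(cnf):
    print(lhs, "->", " ".join(rhs))
```

Note how the unit chain S -> VP -> V ends up as S -> book in the output: this is one reason a CKY parse tree may not match the tree a linguist would draw with the original grammar.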
A CFG
Figure 13.8
CYK in Action
• Each non-terminal above POS level has 2 daughters
  – Encode entire parse tree in (N+1) x (N+1) table
  – Each cell [i,j] contains all non-terminals that span positions [i-j] between input words
  – Cell [0,N] represents all input
  – For each [i,j] s.t. i<k<j, [i,k] is to left and [k,j] is below in table
  – Diagonal contains POS of each input word
  – Fill in table from diagonal on up
  – For any cell [i,j], cells (constituents) contributing to [i,j] are to left and below, already filled in
Figure 13.8
CYK Parse Table
CYK Algorithm
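The table-filling procedure can be sketched as a recognizer like the one below. It is a sketch under assumptions: the CNF grammar is a hand-converted version of a small English fragment (unit productions folded into the lexical entries, `X1` a dummy non-terminal), and the names `BINARY`, `LEXICAL`, and `cky` are hypothetical.

```python
# Hand-converted CNF grammar (an assumption made for illustration).
BINARY = {
    ("NP", "VP"): {"S"},
    ("X1", "VP"): {"S"},       # S -> Aux NP VP, binarized with dummy X1
    ("Aux", "NP"): {"X1"},
    ("V", "NP"): {"S", "VP"},  # VP -> V NP, plus S -> VP folded in
    ("Det", "Nom"): {"NP"},
    ("N", "Nom"): {"Nom"},
    ("Nom", "PP"): {"Nom"},
    ("Prep", "NP"): {"PP"},
}
LEXICAL = {
    "book": {"N", "Nom", "V", "VP", "S"},
    "flight": {"N", "Nom"},
    "meal": {"N", "Nom"},
    "money": {"N", "Nom"},
    "include": {"V", "VP", "S"},
    "prefer": {"V", "VP", "S"},
    "does": {"Aux"},
    "that": {"Det"}, "this": {"Det"}, "a": {"Det"}, "the": {"Det"},
    "from": {"Prep"}, "to": {"Prep"}, "on": {"Prep"},
    "houston": {"PropN", "NP"}, "twa": {"PropN", "NP"},
}


def cky(words):
    """Fill the upper-triangular table; table[i][j] covers words[i:j]."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):              # one column per input word
        table[j - 1][j] = set(LEXICAL.get(words[j - 1], ()))  # diagonal
        for i in range(j - 2, -1, -1):     # move up the column
            for k in range(i + 1, j):      # every split point i < k < j
                for b in table[i][k]:      # cell to the left
                    for c in table[k][j]:  # cell below
                        table[i][j] |= BINARY.get((b, c), set())
    return table


words = "book that flight".split()
table = cky(words)
print("S" in table[0][len(words)])
```

Filling column by column, bottom to top, guarantees that the cells to the left and below any [i,j] are already complete when [i,j] is computed.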
Filling in [0,N]: Adding X2 [0,N]
Filling the Final Column (1)
Filling the Final Column (2)
Earley Algorithm
• Top-down parsing algorithm using DP
• Allows arbitrary CFGs: closer to linguistics
• Fills a chart of length N+1 in a single sweep over input of N words
  – Chart entries represent state of parse at each word position
    • Completed constituents and their locations
    • In-progress constituents
    • Predicted constituents
Parser States
• The table-entries are called states and are represented with dotted-rules
    S -> • VP              A VP is predicted
    NP -> Det • Nominal    An NP is in progress
    VP -> V NP •           A VP has been found
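One possible way to represent a dotted rule in code is sketched below. The class name `State`, its fields, and the example spans are assumptions made for illustration; the spans are not taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """A dotted rule plus the input span it covers (hypothetical layout)."""
    lhs: str
    rhs: tuple
    dot: int    # number of RHS symbols already recognized
    start: int  # where the constituent begins
    end: int    # how far the dot has progressed in the input

    def is_complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """Symbol right of the dot, or None if the state is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} -> {' '.join(symbols)} [{self.start},{self.end}]"


predicted = State("S", ("VP",), 0, 0, 0)
in_progress = State("NP", ("Det", "Nominal"), 1, 1, 2)
found = State("VP", ("V", "NP"), 2, 0, 3)
for state in (predicted, in_progress, found):
    print(state, "| next:", state.next_symbol())
```

Making the class frozen keeps states hashable, which is convenient for checking whether a state is already in a chart entry.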
CFG for Fragment of English
S -> NP VP           VP -> V
S -> Aux NP VP       PP -> Prep NP
S -> VP              N -> book | flight | meal | money
NP -> Det Nom        V -> book | include | prefer
NP -> PropN          Aux -> does
Nom -> N Nom         Prep -> from | to | on
Nom -> N             PropN -> Houston | TWA
Nom -> Nom PP        Det -> that | this | a | the
VP -> V NP
Some Parse States for “Book that flight”
Filling in the Chart
• March through chart left-to-right
• At each step, apply 1 of 3 operators
  – Predictor
    • Create new states representing top-down expectations
  – Scanner
    • Match word predictions (rule with POS following dot) to words in input
  – Completer
    • When a state is complete, see what rules were looking for that complete constituent
Top Level Earley
Predictor
• Given a state
  – With a non-terminal to right of dot (not a part-of-speech category)
  – Create a new state for each expansion of the non-terminal
  – Put predicted states in same chart cell as generating state, beginning and ending where generating state ends
  – So predictor looking at
    • S -> • VP [0,0]
  – results in
    • VP -> • Verb [0,0]
    • VP -> • Verb NP [0,0]
Scanner
• Given a state
  – With a non-terminal to right of dot that is a POS category
  – If next word in input matches this POS
  – Create a new state with dot moved past the non-terminal
    • E.g., scanner looking at VP -> • Verb NP [0,0]
  – If next word can be a verb, add new state:
    • VP -> Verb • NP [0,1]
  – Add this state to chart entry following current one
  – NB: Earley uses top-down input to disambiguate POS -- only POS predicted by some state can be added to chart
Completer
• Given a state
  – Whose dot has reached right end of rule
  – Parser has discovered a constituent over some span of input
  – Find and advance all previous states that are ‘looking for’ this category
  – Copy state, move dot, insert in current chart entry
• E.g., if processing:
  – NP -> Det Nominal • [1,3] and if state expecting an NP like VP -> Verb • NP [0,1] in chart
• Add
  – VP -> Verb NP • [0,3] to same cell of chart
Reaching a Final State
• Find an S state in the last of the N+1 chart entries that spans input from 0 to N and is complete
• Declare victory:
  – S -> α • [0,N]
Converting from Recognizer to Parser
• Augment the “Completer” to include pointer to each previous (now completed) state
• Read off all the backpointers from every complete S
Gist of Earley Parsing
1. Predict all the states you can as soon as you can
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the last (N+1th) chart entry to see if you have a winner
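Putting Predictor, Scanner, and Completer together, an Earley recognizer can be sketched roughly as follows. This is a simplified illustration under assumptions: the scanner moves the dot directly past the POS symbol (as described on the Scanner slide), a state's end position is implied by the chart entry it sits in, and the names `State`, `GAMMA`, `earley`, and `recognize` are hypothetical. Backpointers are not stored, so this is a recognizer, not a parser.

```python
from collections import namedtuple

# English fragment from the slides; dict layout is an assumption.
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom"), ("PropN",)],
    "Nom": [("N", "Nom"), ("N",), ("Nom", "PP")],
    "VP": [("V", "NP"), ("V",)],
    "PP": [("Prep", "NP")],
}
LEXICON = {
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
    "Det": {"that", "this", "a", "the"},
}
# A state in chart[k] spans input positions [start, k].
State = namedtuple("State", "lhs rhs dot start")


def earley(words):
    """Fill the N+1 chart entries in one left-to-right sweep."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(state, k):
        if state not in chart[k]:  # never store the same state twice
            chart[k].append(state)

    add(State("GAMMA", ("S",), 0, 0), 0)  # dummy start state
    for k in range(n + 1):
        for state in chart[k]:  # the list may grow while we iterate
            if state.dot < len(state.rhs):
                nxt = state.rhs[state.dot]
                if nxt in GRAMMAR:  # Predictor: expand the non-terminal
                    for rhs in GRAMMAR[nxt]:
                        add(State(nxt, rhs, 0, k), k)
                elif k < n and words[k] in LEXICON.get(nxt, ()):
                    # Scanner: predicted POS matches the next word
                    add(state._replace(dot=state.dot + 1), k + 1)
            else:  # Completer: advance states that were waiting for lhs
                for prev in chart[state.start]:
                    if prev.dot < len(prev.rhs) and prev.rhs[prev.dot] == state.lhs:
                        add(prev._replace(dot=prev.dot + 1), k)
    return chart


def recognize(sentence):
    words = sentence.lower().split()
    chart = earley(words)
    return State("GAMMA", ("S",), 1, 0) in chart[len(words)]


print(recognize("Book that flight"))
```

Because only predicted POS categories are ever scanned, the noun reading of "book" never enters the chart at position 0 in this sketch; nothing in Chart[0] is looking for an N.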
Example
• Book that flight
• Goal: Find a completed S from 0 to 3
• Chart[0] shows Predictor operations
• Chart[1] S12 shows Scanner
• Chart[3] shows Completer stage
Figure 13.14
Figure 13.14 continued
Final Parse States
Chart Parsing
• CKY and Earley are deterministic: given an input, all actions are taken in a predetermined order
• Chart Parsing allows for flexibility of events via a separate policy that determines order of an agenda of states
  – Policy determines order in which states are created and predictions made
  – Fundamental rule: if chart includes 2 contiguous states s.t. one provides a constituent the other needs, a new state spanning the two states is created with the new information
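An agenda-driven chart parser can be sketched as below, with the policy reduced to a choice between a queue and a stack. This is a bottom-up variant written for illustration only; the names `Edge`, `chart_parse`, `accepts`, and the `policy` parameter are assumptions, not part of the lecture.

```python
from collections import deque, namedtuple

RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nom")), ("NP", ("PropN",)),
    ("Nom", ("N", "Nom")), ("Nom", ("N",)), ("Nom", ("Nom", "PP")),
    ("VP", ("V", "NP")), ("VP", ("V",)),
    ("PP", ("Prep", "NP")),
]
LEXICON = {
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
    "Det": {"that", "this", "a", "the"},
}
Edge = namedtuple("Edge", "lhs rhs dot start end")


def chart_parse(words, policy="queue"):
    """Process an agenda of edges; `policy` decides which edge comes next."""
    agenda = deque()
    for i, word in enumerate(words):
        for pos, vocab in LEXICON.items():
            if word in vocab:  # complete lexical edge for each possible POS
                agenda.append(Edge(pos, (word,), 1, i, i + 1))
    chart = set()
    while agenda:
        edge = agenda.popleft() if policy == "queue" else agenda.pop()
        if edge in chart:
            continue
        chart.add(edge)
        if edge.dot == len(edge.rhs):  # complete edge
            for lhs, rhs in RULES:     # start rules whose RHS begins with it
                if rhs[0] == edge.lhs:
                    agenda.append(Edge(lhs, rhs, 1, edge.start, edge.end))
            for other in chart:        # fundamental rule, edge on the right
                if (other.end == edge.start and other.dot < len(other.rhs)
                        and other.rhs[other.dot] == edge.lhs):
                    agenda.append(other._replace(dot=other.dot + 1, end=edge.end))
        else:                          # active edge still needs a constituent
            need = edge.rhs[edge.dot]
            for other in chart:        # fundamental rule, edge on the left
                if (other.start == edge.end and other.dot == len(other.rhs)
                        and other.lhs == need):
                    agenda.append(edge._replace(dot=edge.dot + 1, end=other.end))
    return chart


def accepts(sentence, policy="queue"):
    words = sentence.lower().split()
    return any(e.lhs == "S" and e.dot == len(e.rhs)
               and (e.start, e.end) == (0, len(words))
               for e in chart_parse(words, policy))


print(accepts("Book that flight"), accepts("Book that flight", policy="stack"))
```

Switching the policy changes the order in which edges are created, but in this sketch the final chart is the same either way, since the fundamental rule is applied whichever of the two edges arrives second.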
Summing Up
• Parsing as search: what search strategies to use?
  – Top down
  – Bottom up
  – How to combine?
• How to parse as little as possible
  – Dynamic Programming
  – Different policies for ordering states to be processed
  – Next: Shallow Parsing and Review
Expanding the Rules
• The old dog the footsteps of the young
• Where does backtracking happen?
• What are the computational disadvantages?
• What are the advantages?
• What could we do to improve the process?
Bottom Up Parsing
• Parser begins with words of input and builds up trees, applying grammar rules whose RHS matches

  Det  N    V    Det  N          Prep  Det  N
  The  old  dog  the  footsteps  of    the  young

  Det  Adj  N    Det  N          Prep  Det  N
  The  old  dog  the  footsteps  of    the  young

• Parse continues until an S root node is reached or no further node expansion is possible
S   -> NP VP          VP    -> V
S   -> Aux NP VP      VP    -> V PP
S   -> VP             PP    -> Prep NP
NP  -> Det Nom        N     -> old | dog | footsteps | young
NP  -> PropN          V     -> dog | eat | sleep | bark | meow
Nom -> Adj N          Aux   -> does | can
Nom -> N              Prep  -> from | to | on | of
Nom -> N Nom          PropN -> Fido | Felix
Nom -> Nom PP         Det   -> that | this | a | the
VP  -> V NP           Adj   -> old | happy | young
“The old dog the footsteps of the young”
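A bottom-up parser starts from the words, so its first step is to look up every part of speech the lexicon allows for each word. The sketch below does only that step for the lexicon above; the names `LEXICON`, `pos_candidates` and `pos_sequences` are illustrative assumptions, not part of the lecture.

```python
from itertools import product

# Lexical rules from the grammar above, stored as POS -> set of words.
LEXICON = {
    "Det": {"that", "this", "a", "the"},
    "N": {"old", "dog", "footsteps", "young"},
    "V": {"dog", "eat", "sleep", "bark", "meow"},
    "Adj": {"old", "happy", "young"},
    "Prep": {"from", "to", "on", "of"},
    "Aux": {"does", "can"},
    "PropN": {"Fido", "Felix"},
}


def pos_candidates(word):
    """Return every POS the lexicon allows for this word (case-insensitive)."""
    return sorted(
        tag
        for tag, words in LEXICON.items()
        if word.lower() in {w.lower() for w in words}
    )


def pos_sequences(sentence):
    """Enumerate all POS sequences a bottom-up parser could start from."""
    options = [pos_candidates(word) for word in sentence.split()]
    return [list(seq) for seq in product(*options)]


if __name__ == "__main__":
    for seq in pos_sequences("The old dog the footsteps of the young"):
        print(" ".join(seq))
```

Because "old", "dog" and "young" are each ambiguous, the parser has several starting points, and only the grammar rules applied later can rule most of them out.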
Bottom Up Parsing
• When does disambiguation occur?
• What are the computational advantages and disadvantages?
• What could we do to make this process more efficient?
Issues to Address
• Ambiguity
  – POS
  – Attachment
    • PP…
    • Coordination: old dogs and cats
  – Overgenerating useless hypotheses
  – Regenerating good hypotheses
Dynamic Programming
• Fill in tables with solutions to subproblems
• For parsing
  – Store possible subtrees for each substring as they are discovered in the input
  – Ambiguous strings are given multiple entries
  – Table look-up to come up with final parse(s)
• Many parsers take advantage of this approach
Review: Minimal Edit Distance
• Simple example of DP: find the minimal ‘distance’ between 2 strings
  – Minimal number of operations (insert, delete, substitute) needed to transform one string into another
  – Levenshtein distances (subst=1 or 2)
  – Key idea: minimal path between substrings is on the minimal path between the beginning and end of the 2 strings
Example of MED Calculation
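A minimal sketch of the MED table computation, assuming insert and delete each cost 1 and substitution costs `sub_cost` (1 or 2, the two Levenshtein variants). The function name and the example strings are illustrative choices, not taken from the lecture.

```python
def min_edit_distance(source, target, sub_cost=2):
    """Fill the DP table; dist[i][j] is the cost of turning source[:i] into target[:j]."""
    n, m = len(source), len(target)
    dist = [[0] * (m + 1) for _ in range(n + 1)]

    # First column: delete every source character. First row: insert every target character.
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                             # delete
                dist[i][j - 1] + 1,                             # insert
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # substitute or copy
            )
    return dist[n][m]


if __name__ == "__main__":
    print(min_edit_distance("intention", "execution", sub_cost=2))
    print(min_edit_distance("intention", "execution", sub_cost=1))
```

Each cell is computed once from its three neighbors, which is the same reuse of subproblem solutions that the parsing tables rely on.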
DP for Parsing
• Table cells represent state of parse of input up to this point
• Can be calculated from neighboring state(s)
• Only need to parse each substring once for each possible analysis into constituents
Parsers Using DP
• CKY Parsing Algorithm
  – Bottom-up
  – Grammar must be in Chomsky Normal Form
  – The parse tree might not be consistent with linguistic theory
• Earley Parsing Algorithm
  – Top-down
  – Expectations about constituents are confirmed by input
  – A POS tag for a word that is not predicted is never added
• Chart Parser
Cocke-Kasami-Younger Algorithm
• Convert grammar to Chomsky Normal Form
  – Every CFG has a weakly equivalent CNF grammar
  – A -> B C (non-terminals)
  – A -> w (terminal)
  – Basic ideas
    • Keep rules conforming to CNF
    • Introduce dummy non-terminals for rules that mix terminals and non-terminals (e.g. A -> B w becomes A -> B B’, B’ -> w)
    • Rewrite RHS of unit productions with RHS of all non-unit productions they lead to (e.g. A -> B, B -> w becomes A -> w)
    • For RHS longer than 2 non-terminals, replace leftmost pairs of non-terminals with a new non-terminal and add a new production rule (e.g. A -> B C D becomes A -> Z D, Z -> B C)
    • For ε-productions, find all occurrences of LHS in 2-variable RHSs and create new rule without the LHS (e.g. C -> A B, A -> ε becomes C -> B)
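One of the conversion steps above, splitting long right-hand sides, can be sketched as follows. This covers only that step (not unit productions, mixed rules or ε-productions), and the rule representation and the generated names `X1`, `X2`, ... are assumptions made for illustration.

```python
def binarize(rules):
    """Split every rule whose RHS has more than 2 symbols.

    rules: list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    The leftmost pair of symbols is repeatedly replaced by a fresh
    non-terminal, e.g. A -> B C D becomes X1 -> B C and A -> X1 D.
    """
    result = []
    counter = 0
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            counter += 1
            new_symbol = f"X{counter}"           # hypothetical dummy non-terminal
            result.append((new_symbol, rhs[:2]))  # new rule for the leftmost pair
            rhs = (new_symbol,) + rhs[2:]         # shorten the original RHS
        result.append((lhs, rhs))
    return result


if __name__ == "__main__":
    for lhs, rhs in binarize([("S", ("Aux", "NP", "VP")), ("S", ("NP", "VP"))]):
        print(lhs, "->", " ".join(rhs))
```

Rules that already have one or two symbols on the right pass through unchanged.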
A CFG
Figure 13.8
CYK in Action
• Each non-terminal above POS level has 2 daughters
  – Encode entire parse tree in N+1 x N+1 table
  – Each cell [i,j] contains all non-terminals that span positions i to j between input words
  – Cell [0,N] represents all input
  – For each [i,j] s.t. i<k<j, [i,k] is to left and [k,j] is below in table
  – Diagonal contains POS of each input word
  – Fill in table from diagonal on up
  – For any cell [i,j], cells (constituents) contributing to [i,j] are to left and below, already filled in
Figure 13.8
CYK Parse Table
CYK Algorithm
Filling in [0,N]: Adding X2 [0,N]
Filling the Final Column (1)
Filling the Final Column (2)
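The table-filling procedure can be sketched as a small recognizer. This is a hedged illustration, not the pseudocode from the slides: it assumes a grammar already in CNF, passed as two dictionaries (`lexical` for A -> w rules, `binary` for A -> B C rules), and the toy grammar in the usage example is an assumption as well.

```python
from collections import defaultdict


def cky_recognize(words, lexical, binary, start="S"):
    """CKY recognizer for a CNF grammar.

    lexical: dict mapping a word to the set of non-terminals A with A -> word
    binary:  dict mapping (B, C) to the set of non-terminals A with A -> B C
    Returns (accepted, table), where table[(i, j)] holds the non-terminals
    spanning positions i to j.
    """
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):
        # Diagonal cell: categories of the j-th word.
        table[(j - 1, j)] = set(lexical.get(words[j - 1], ()))
        # Fill column j upward, from the diagonal to row 0.
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for left in table[(i, k)]:        # cell to the left
                    for right in table[(k, j)]:   # cell below
                        table[(i, j)] |= binary.get((left, right), set())
    return start in table[(0, n)], table


if __name__ == "__main__":
    lexical = {
        "book": {"V", "N", "Nom"},
        "that": {"Det"},
        "flight": {"N", "Nom"},
    }
    binary = {("Det", "Nom"): {"NP"}, ("V", "NP"): {"S", "VP"}}
    accepted, table = cky_recognize(["book", "that", "flight"], lexical, binary)
    print(accepted, sorted(table[(0, 3)]))
```

Filling column by column guarantees that when cell [i,j] is computed, every [i,k] to its left and every [k,j] below it is already complete.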
Earley Algorithm
• Top-down parsing algorithm using DP
• Allows arbitrary CFGs: closer to linguistics
• Fills a chart of length N+1 in a single sweep over input of N words
  – Chart entries represent state of parse at each word position
    • Completed constituents and their locations
    • In-progress constituents
    • Predicted constituents
Parser States
• The table entries are called states and are represented with dotted rules

  S  -> . VP             A VP is predicted
  NP -> Det . Nominal    An NP is in progress
  VP -> V NP .           A VP has been found
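A dotted rule can be represented as a small record holding the rule, the dot position and the span covered so far. The class below is one possible sketch. The field and method names are assumptions, and the spans in the usage example are illustrative values for "Book that flight".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class State:
    lhs: str               # left-hand side of the rule
    rhs: Tuple[str, ...]   # right-hand side symbols
    dot: int               # number of RHS symbols already recognized
    start: int             # input position where the constituent begins
    end: int               # input position the dot has reached

    def next_symbol(self) -> Optional[str]:
        """Symbol immediately to the right of the dot, or None if complete."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, ".")
        return f"{self.lhs} -> {' '.join(symbols)} [{self.start},{self.end}]"


if __name__ == "__main__":
    print(State("S", ("VP",), 0, 0, 0))              # predicted
    print(State("NP", ("Det", "Nominal"), 1, 1, 2))  # in progress
    print(State("VP", ("V", "NP"), 2, 0, 3))         # found
```

Making the record frozen (hashable) lets a chart cell reject duplicate states cheaply.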
CFG for Fragment of English
S   -> NP VP          VP    -> V
S   -> Aux NP VP      PP    -> Prep NP
S   -> VP             N     -> book | flight | meal | money
NP  -> Det Nom        V     -> book | include | prefer
NP  -> PropN          Aux   -> does
Nom -> N Nom          Prep  -> from | to | on
Nom -> N              PropN -> Houston | TWA
Nom -> Nom PP         Det   -> that | this | a | the
VP  -> V NP
Some Parse States for Book that flight
[Figure: chart diagram; state labels S8 through S13]
Filling in the Chart
• March through chart left-to-right
• At each step, apply 1 of 3 operators
  – Predictor
    • Create new states representing top-down expectations
  – Scanner
    • Match word predictions (rule with POS following dot) to words in input
  – Completer
    • When a state is complete, see what rules were looking for that complete constituent
Top Level Earley
Predictor
• Given a state
  – With a non-terminal to right of dot (not a part-of-speech category)
  – Create a new state for each expansion of the non-terminal
  – Put predicted states in same chart cell as generating state, beginning and ending where generating state ends
  – So predictor looking at
    • S -> . VP [0,0]
  – results in
    • VP -> . Verb [0,0]
    • VP -> . Verb NP [0,0]
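The Predictor step can be sketched as a function over states written as plain tuples `(lhs, rhs, dot, start, end)`. The tuple layout, the `GRAMMAR` dictionary and the function name are assumptions for illustration.

```python
# Phrase-structure rules only; POS categories such as Verb are not expanded here.
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "VP": [("Verb",), ("Verb", "NP")],
}


def predictor(state, grammar):
    """Create one new state per expansion of the non-terminal after the dot.

    Each predicted state has its dot at position 0 and begins and ends
    where the generating state ends.
    """
    lhs, rhs, dot, start, end = state
    wanted = rhs[dot]
    return [(wanted, expansion, 0, end, end) for expansion in grammar[wanted]]


if __name__ == "__main__":
    for new_state in predictor(("S", ("VP",), 0, 0, 0), GRAMMAR):
        print(new_state)
```

Applied to S -> . VP [0,0], this yields the two VP states listed on the slide.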
Scanner
• Given a state
  – With a non-terminal to right of dot that is a POS category
  – If next word in input matches this POS
  – Create a new state with dot moved past the non-terminal
    • E.g. scanner looking at VP -> . Verb NP [0,0]
  – If next word can be a verb, add new state
    • VP -> Verb . NP [0,1]
  – Add this state to chart entry following current one
  – NB: Earley uses top-down input to disambiguate POS -- only POS predicted by some state can be added to chart
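A sketch of the Scanner as described on this slide, again with states as tuples `(lhs, rhs, dot, start, end)`. The `LEXICON` contents and the function name are assumptions. The variant shown advances the dot in the predicting rule directly, following the slide's example.

```python
LEXICON = {
    "book": {"Verb", "Noun"},
    "that": {"Det"},
    "flight": {"Noun"},
}


def scanner(state, words, lexicon):
    """Advance the dot past a POS category if the next input word can have that POS.

    Returns the new state (which belongs in the following chart entry),
    or None if the word does not match or the input is exhausted.
    """
    lhs, rhs, dot, start, end = state
    pos = rhs[dot]
    if end < len(words) and pos in lexicon.get(words[end], set()):
        return (lhs, rhs, dot + 1, start, end + 1)
    return None


if __name__ == "__main__":
    words = ["book", "that", "flight"]
    print(scanner(("VP", ("Verb", "NP"), 0, 0, 0), words, LEXICON))
```

Since the scanner only runs on states that predicted a POS, the Noun reading of "book" is never entered in the chart unless some state expects a Noun at that position.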
Completer
• Given a state
  – Whose dot has reached right end of rule
  – Parser has discovered a constituent over some span of input
  – Find and advance all previous states that are ‘looking for’ this category
  – Copy state, move dot, insert in current chart entry
• E.g. if processing
  – NP -> Det Nominal . [1,3] and if state expecting an NP like VP -> Verb . NP [0,1] in chart
• Add
  – VP -> Verb NP . [0,3] to same cell of chart
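The Completer step can be sketched in the same tuple notation `(lhs, rhs, dot, start, end)`. The chart is assumed to be a list in which `chart[k]` holds the states whose dot has reached position k; that layout and the function name are illustrative assumptions.

```python
def completer(completed, chart):
    """Advance every earlier state that was waiting for the completed category.

    The waiting states live in the chart entry where the completed
    constituent began; each advanced copy ends where the completed one ends.
    """
    c_lhs, _c_rhs, _c_dot, c_start, c_end = completed
    advanced = []
    for lhs, rhs, dot, start, _end in chart[c_start]:
        if dot < len(rhs) and rhs[dot] == c_lhs:
            advanced.append((lhs, rhs, dot + 1, start, c_end))
    return advanced


if __name__ == "__main__":
    chart = [[], [("VP", ("Verb", "NP"), 1, 0, 1)], [], []]
    print(completer(("NP", ("Det", "Nominal"), 2, 1, 3), chart))
```

With NP -> Det Nominal . [1,3] completed and VP -> Verb . NP [0,1] waiting in chart[1], the result is VP -> Verb NP . [0,3], as in the slide's example.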
Reaching a Final State
• Find an S state in the last chart entry that spans input from 0 to N and is complete
• Declare victory
  – S -> α . [0,N]
Converting from Recognizer to Parser
• Augment the “Completer” to include pointer to each previous (now completed) state
• Read off all the backpointers from every complete S
Gist of Earley Parsing
1. Predict all the states you can as soon as you can
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at chart entry N+1 (the last one, Chart[N]) to see if you have a winner
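The three steps above can be put together into a compact recognizer. This is a sketch under stated assumptions, not the lecture's pseudocode: states are tuples `(lhs, rhs, dot, start, end)`, the grammar has no ε-productions, a dummy `GAMMA -> S` state seeds the chart, and the scanner advances the predicting rule directly.

```python
def earley_recognize(words, grammar, lexicon, start="S"):
    """Return (accepted, chart) for an epsilon-free CFG.

    grammar: dict non-terminal -> list of RHS tuples (phrase-structure rules)
    lexicon: dict word -> set of POS categories
    """
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(state, k):
        if state not in chart[k]:       # never store the same state twice
            chart[k].append(state)

    add(("GAMMA", (start,), 0, 0, 0), 0)  # hypothetical dummy start state
    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):        # chart[k] may grow while we scan it
            lhs, rhs, dot, origin, _end = chart[k][i]
            i += 1
            if dot == len(rhs):
                # Completer: advance states that were waiting for lhs.
                for p_lhs, p_rhs, p_dot, p_origin, _ in list(chart[origin]):
                    if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                        add((p_lhs, p_rhs, p_dot + 1, p_origin, k), k)
            elif rhs[dot] in grammar:
                # Predictor: expand the non-terminal after the dot.
                for expansion in grammar[rhs[dot]]:
                    add((rhs[dot], expansion, 0, k, k), k)
            elif k < n and rhs[dot] in lexicon.get(words[k], set()):
                # Scanner: the next word can have the predicted POS.
                add((lhs, rhs, dot + 1, origin, k + 1), k + 1)
    return ("GAMMA", (start,), 1, 0, n) in chart[n], chart


if __name__ == "__main__":
    grammar = {
        "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
        "NP": [("Det", "Nom"), ("PropN",)],
        "Nom": [("N", "Nom"), ("N",), ("Nom", "PP")],
        "VP": [("V",), ("V", "NP")],
        "PP": [("Prep", "NP")],
    }
    lexicon = {"book": {"N", "V"}, "that": {"Det"}, "flight": {"N"}}
    accepted, chart = earley_recognize(["book", "that", "flight"], grammar, lexicon)
    print(accepted, [len(cell) for cell in chart])
```

The duplicate check in `add` is what keeps the left-recursive rule Nom -> Nom PP from looping forever, which is one reason Earley can accept arbitrary CFGs.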
Example
• Book that flight
• Goal: Find a completed S from 0 to 3
• Chart[0] shows Predictor operations
• Chart[1] S12 shows Scanner
• Chart[3] shows Completer stage
Figure 13.14
Figure 13.14 continued
Final Parse States
Chart Parsing
• CKY and Earley are deterministic: given an input, all actions are taken in a predetermined order
• Chart Parsing allows for flexibility of events via a separate policy that determines the order of an agenda of states
  – Policy determines order in which states are created and predictions made
  – Fundamental rule: if chart includes 2 contiguous states s.t. one provides a constituent the other needs, a new state spanning the two states is created with the new information
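The fundamental rule can be sketched as a function that combines one state still needing a constituent with one complete state that provides it. States are written as tuples `(lhs, rhs, dot, start, end)`. The layout and the names `fundamental_rule`, `active` and `passive` are assumptions for illustration.

```python
def fundamental_rule(active, passive):
    """Combine two contiguous states, or return None if they do not fit.

    active:  a state whose dot is before some category (it still needs it)
    passive: a complete state of exactly that category, starting where
             the active state ends
    """
    a_lhs, a_rhs, a_dot, a_start, a_end = active
    p_lhs, p_rhs, p_dot, p_start, p_end = passive
    needs_it = a_dot < len(a_rhs) and a_rhs[a_dot] == p_lhs
    provides_it = p_dot == len(p_rhs)
    contiguous = a_end == p_start
    if needs_it and provides_it and contiguous:
        # New state spans both inputs, with the dot moved past the category.
        return (a_lhs, a_rhs, a_dot + 1, a_start, p_end)
    return None


if __name__ == "__main__":
    active = ("VP", ("Verb", "NP"), 1, 0, 1)
    passive = ("NP", ("Det", "Nominal"), 2, 1, 3)
    print(fundamental_rule(active, passive))
```

The rule says nothing about when it is applied; that ordering is left to the agenda policy, which is the flexibility chart parsing adds over CKY and Earley.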
Summing Up
• Parsing as search: what search strategies to use?
  – Top down
  – Bottom up
  – How to combine?
• How to parse as little as possible
  – Dynamic Programming
  – Different policies for ordering states to be processed
  – Next: Shallow Parsing and Review
Issues to Address
• Ambiguity
  – POS
  – Attachment
    • PP…
    • Coordination: old dogs and cats
  – Overgenerating useless hypotheses
  – Regenerating good hypotheses
Dynamic Programming
• Fill in tables with solutions to subproblems
• For parsing:
  – Store possible subtrees for each substring as they are discovered in the input
  – Ambiguous strings are given multiple entries
  – Table look-up to come up with final parse(s)
• Many parsers take advantage of this approach
Review Minimal Edit Distance
• Simple example of DP: find the minimal ‘distance’ between 2 strings
  – Minimal number of operations (insert, delete, substitute) needed to transform one string into another
  – Levenshtein distances (subst = 1 or 2)
  – Key idea: minimal path between substrings is on the minimal path between the beginning and end of the 2 strings
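The table-filling idea can be sketched in a few lines of Python. This is a minimal illustration rather than the lecture's own code: the function name min_edit_distance and the parameter sub_cost are assumptions made here, with sub_cost standing for the "subst = 1 or 2" choice on the slide.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 2) -> int:
    """Minimal edit distance by dynamic programming.

    Insertions and deletions cost 1; substitutions cost sub_cost
    (1 or 2, matching the two Levenshtein variants on the slide).
    """
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i              # delete every character of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j              # insert every character of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                            # deletion
                dist[i][j - 1] + 1,                            # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution"))              # substitution cost 2
print(min_edit_distance("intention", "execution", sub_cost=1))  # substitution cost 1
```

Each cell is computed once from its three neighbours, which is the same "solve each subproblem once" property the parsing tables rely on.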
Example of MED Calculation
DP for Parsing
• Table cells represent state of parse of input up to this point
• Can be calculated from neighboring state(s)
• Only need to parse each substring once for each possible analysis into constituents
Parsers Using DP
• CKY Parsing Algorithm
  – Bottom-up
  – Grammar must be in Chomsky Normal Form
  – The parse tree might not be consistent with linguistic theory
• Earley Parsing Algorithm
  – Top-down
  – Expectations about constituents are confirmed by input
  – A POS tag for a word that is not predicted is never added
• Chart Parser
Cocke-Kasami-Younger Algorithm
• Convert grammar to Chomsky Normal Form
  – Every CFG has a weakly equivalent CNF grammar
  – A -> B C (non-terminals)
  – A -> w (terminal)
  – Basic ideas:
    • Keep rules conforming to CNF
    • Introduce dummy non-terminals for rules that mix terminals and non-terminals (e.g. A -> B w becomes A -> B B', B' -> w)
    • Rewrite RHS of unit productions with RHS of all non-unit productions they lead to (e.g. A -> B, B -> w becomes A -> w)
    • For RHS longer than 2 non-terminals, replace leftmost pairs of non-terminals with a new non-terminal and add a new production rule (e.g. A -> B C D becomes A -> Z D, Z -> B C)
    • For ε-productions, find all occurrences of LHS in 2-variable RHSs and create new rule without the LHS (e.g. C -> A B, A -> ε becomes C -> B)
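As an illustration of just one of these steps, the "RHS longer than 2" rewrite can be sketched as follows. This is a hedged sketch, not a full CNF converter: the function name binarize, the (lhs, rhs) tuple encoding of rules, and the generated names Z1, Z2, ... are assumptions made here, and the code assumes those names do not already occur in the grammar.

```python
def binarize(rules):
    """Replace leftmost pairs of symbols until every RHS has at most 2 symbols.

    rules is a list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    New non-terminals are named Z1, Z2, ... (assumed unused elsewhere).
    """
    out = []
    counter = 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            counter += 1
            new_nt = f"Z{counter}"
            out.append((new_nt, (rhs[0], rhs[1])))   # Z -> B C
            rhs = [new_nt] + rhs[2:]                 # A -> Z D ...
        out.append((lhs, tuple(rhs)))
    return out


# The slide's example: A -> B C D becomes A -> Z D, Z -> B C
for lhs, rhs in binarize([("A", ("B", "C", "D"))]):
    print(lhs, "->", " ".join(rhs))
```

Rules that are already binary or lexical pass through unchanged, which corresponds to "keep rules conforming to CNF".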
A CFG
Figure 13.8
CYK in Action
• Each non-terminal above POS level has 2 daughters
  – Encode entire parse tree in (N+1) x (N+1) table
  – Each cell [i,j] contains all non-terminals that span positions i to j between input words
  – Cell [0,N] represents all input
  – For each [i,j] and each k s.t. i < k < j, [i,k] is to the left and [k,j] is below in the table
  – Diagonal contains POS of each input word
  – Fill in table from diagonal on up
  – For any cell [i,j], cells (constituents) contributing to [i,j] are to left and below, already filled in
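The table-filling order described above can be sketched as a small recognizer. This is a minimal sketch under stated assumptions: the function name cky_recognize, the dictionary-based table, and the tiny CNF grammar and lexicon below are hypothetical stand-ins chosen for illustration, not the grammar from the lecture's figure.

```python
from collections import defaultdict


def cky_recognize(words, lexicon, binary_rules, start="S"):
    """CKY recognizer for a grammar in Chomsky Normal Form.

    lexicon maps a word to the set of POS tags it can have;
    binary_rules is a list of (A, (B, C)) for rules A -> B C.
    table[(i, j)] holds every non-terminal spanning positions i..j.
    """
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):                    # columns, left to right
        table[(j - 1, j)] |= lexicon.get(words[j - 1], set())   # diagonal: POS
        for i in range(j - 2, -1, -1):           # rows, from the diagonal on up
            for k in range(i + 1, j):            # split point, i < k < j
                for a, (b, c) in binary_rules:
                    if b in table[(i, k)] and c in table[(k, j)]:
                        table[(i, j)].add(a)
    return start in table[(0, n)], table


# Hypothetical toy grammar, already in CNF
lexicon = {"book": {"Verb", "Nominal"}, "that": {"Det"}, "flight": {"Nominal"}}
rules = [("S", ("Verb", "NP")), ("VP", ("Verb", "NP")), ("NP", ("Det", "Nominal"))]

ok, table = cky_recognize(["book", "that", "flight"], lexicon, rules)
print(ok, sorted(table[(0, 3)]))
```

Because column j is filled after columns to its left and rows are filled from the diagonal upward, both [i,k] and [k,j] are complete by the time [i,j] is computed.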
Figure 13.8
CYK Parse Table
CYK Algorithm
Filling in [0,N]: Adding X2 [0,N]
Filling the Final Column (1)
Filling the Final Column (2)
Earley Algorithm
• Top-down parsing algorithm using DP
• Allows arbitrary CFGs: closer to linguistics
• Fills a chart of length N+1 in a single sweep over input of N words
  – Chart entries represent state of parse at each word position
    • Completed constituents and their locations
    • In-progress constituents
    • Predicted constituents
Parser States
• The table entries are called states and are represented with dotted rules
  S -> · VP              A VP is predicted
  NP -> Det · Nominal    An NP is in progress
  VP -> V NP ·           A VP has been found
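One possible way to represent such a state in code is sketched below. The class name State, its field names, and the [start,end] spans used in the demo are assumptions made for illustration; the slide gives only the dotted rules themselves.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class State:
    """A dotted rule plus the input span it covers."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int        # number of RHS symbols already recognized
    start: int      # where the constituent begins
    end: int        # how far the dot has advanced in the input

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        """Symbol right of the dot, or None if the rule is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "·")
        return f"{self.lhs} -> {' '.join(symbols)} [{self.start},{self.end}]"


# The three situations from the slide (spans are illustrative assumptions)
predicted = State("S", ("VP",), 0, 0, 0)
in_progress = State("NP", ("Det", "Nominal"), 1, 1, 2)
found = State("VP", ("V", "NP"), 2, 0, 3)
for s in (predicted, in_progress, found):
    print(s, "| complete:", s.is_complete(), "| next:", s.next_symbol())
```

Making the class frozen keeps states hashable, so a chart entry can cheaply check whether a state has already been added.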
CFG for Fragment of English
S -> NP VP          VP -> V
S -> Aux NP VP      PP -> Prep NP
S -> VP             N -> book | flight | meal | money
NP -> Det Nom       V -> book | include | prefer
NP -> PropN         Aux -> does
Nom -> N Nom        Prep -> from | to | on
Nom -> N            PropN -> Houston | TWA
Nom -> Nom PP       Det -> that | this | a | the
VP -> V NP
Some Parse States for Book that flight (figure: states S8 to S13)
Filling in the Chart
• March through chart left-to-right
• At each step, apply 1 of 3 operators
  – Predictor
    • Create new states representing top-down expectations
  – Scanner
    • Match word predictions (rule with POS following dot) to words in input
  – Completer
    • When a state is complete, see what rules were looking for that complete constituent
Top Level Earley
Predictor
• Given a state
  – With a non-terminal to right of dot (not a part-of-speech category)
  – Create a new state for each expansion of the non-terminal
  – Put predicted states in same chart cell as generating state, beginning and ending where generating state ends
  – So predictor looking at
    • S -> · VP [0,0]
  – results in
    • VP -> · Verb [0,0]
    • VP -> · Verb NP [0,0]
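The predictor step on this example can be sketched as below. This is an illustrative sketch only: states are encoded as plain tuples (lhs, rhs, dot, start, end), and the names GRAMMAR, POS, and predictor are assumptions made here, with GRAMMAR holding just the rules the example needs.

```python
# Hypothetical fragment: only the rules needed for the slide's example
GRAMMAR = {
    "S": [("VP",)],
    "VP": [("Verb",), ("Verb", "NP")],
}
POS = {"Verb", "Det", "Noun"}   # part-of-speech categories are left to the scanner


def predictor(state, chart):
    """Add one new state per expansion of the non-terminal right of the dot."""
    lhs, rhs, dot, start, end = state
    nxt = rhs[dot]
    assert nxt not in POS, "POS categories are handled by the scanner"
    for expansion in GRAMMAR[nxt]:
        new_state = (nxt, expansion, 0, end, end)   # begins and ends where state ends
        if new_state not in chart[end]:
            chart[end].append(new_state)


chart = [[("S", ("VP",), 0, 0, 0)]]     # S -> · VP [0,0]
predictor(chart[0][0], chart)
for st in chart[0]:
    print(st)
```

The membership check before appending is what keeps a left-recursive rule from being predicted endlessly in the same chart entry.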
Scanner
• Given a state
  – With a non-terminal to right of dot that is a POS category
  – If next word in input matches this POS
  – Create a new state with dot moved past the non-terminal
• E.g. scanner looking at VP -> · Verb NP [0,0]
  – If next word can be a verb, add new state
    • VP -> Verb · NP [0,1]
  – Add this state to chart entry following current one
  – NB: Earley uses top-down input to disambiguate POS: only POS predicted by some state can be added to chart
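A sketch of the scanner on the same example follows, using the formulation on this slide (the dot is moved directly past the POS category). States are tuples (lhs, rhs, dot, start, end); the names POS_OF and scanner and the small lexicon are hypothetical.

```python
# Hypothetical lexicon: the POS categories each word can have
POS_OF = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}


def scanner(state, words, chart):
    """If the next input word can have the POS right of the dot, advance the dot."""
    lhs, rhs, dot, start, end = state
    pos = rhs[dot]
    if end < len(words) and pos in POS_OF.get(words[end], set()):
        new_state = (lhs, rhs, dot + 1, start, end + 1)
        if new_state not in chart[end + 1]:
            chart[end + 1].append(new_state)    # goes in the following chart entry


words = ["book", "that", "flight"]
chart = [[] for _ in range(len(words) + 1)]
state = ("VP", ("Verb", "NP"), 0, 0, 0)         # VP -> · Verb NP [0,0]
chart[0].append(state)
scanner(state, words, chart)
print(chart[1])                                  # VP -> Verb · NP [0,1]
```

Although "book" could also be a noun, nothing about nouns enters the chart here, because no state predicted one at position 0.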
Completer
• Given a state
  – Whose dot has reached right end of rule
  – Parser has discovered a constituent over some span of input
  – Find and advance all previous states that are ‘looking for’ this category
  – Copy state, move dot, insert in current chart entry
• E.g. if processing
  – NP -> Det Nominal · [1,3] and if state expecting an NP, like VP -> Verb · NP [0,1], in chart
• Add
  – VP -> Verb NP · [0,3] to same cell of chart
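The completer on this example can be sketched as follows. States are again tuples (lhs, rhs, dot, start, end), and the function name completer is an assumption; the two states placed in the chart are exactly the ones named on the slide.

```python
def completer(completed, chart):
    """Advance every earlier state that was waiting for the completed category."""
    c_lhs, c_rhs, c_dot, c_start, c_end = completed
    assert c_dot == len(c_rhs), "completer only applies to complete states"
    # Waiting states end where the completed constituent starts
    for lhs, rhs, dot, start, end in list(chart[c_start]):
        if dot < len(rhs) and rhs[dot] == c_lhs:
            new_state = (lhs, rhs, dot + 1, start, c_end)   # copy state, move dot
            if new_state not in chart[c_end]:
                chart[c_end].append(new_state)              # insert in current entry


chart = [[] for _ in range(4)]
waiting = ("VP", ("Verb", "NP"), 1, 0, 1)       # VP -> Verb · NP [0,1]
done = ("NP", ("Det", "Nominal"), 2, 1, 3)      # NP -> Det Nominal · [1,3]
chart[1].append(waiting)
chart[3].append(done)
completer(done, chart)
print(chart[3][-1])                              # VP -> Verb NP · [0,3]
```

The new state keeps the waiting state's start position and takes the completed state's end position, which is how the span [0,3] arises from [0,1] and [1,3].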
Reaching a Final State
• Find an S state in chart that spans input from 0 to N (in the last of the N+1 chart entries) and is complete
• Declare victory
  – S -> α · [0,N]
Converting from Recognizer to Parser
• Augment the “Completer” to include pointer to each previous (now completed) state
bull Read off all the backpointers from every complete S
Gist of Earley Parsing
1. Predict all the states you can as soon as you can
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the last chart entry, Chart[N] (the (N+1)th), to see if you have a winner
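Putting the three operators together, a compact recognizer might look like the sketch below. It is an illustration under several assumptions rather than the lecture's own pseudocode: states are tuples (lhs, rhs, dot, origin) with the end position implied by the chart index, GAMMA is a hypothetical dummy start symbol, the scanner follows the slide's formulation of moving the dot past the POS directly, and ε-rules are not handled. The grammar and lexicon are the "CFG for Fragment of English" from the earlier slide.

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom"), ("PropN",)],
    "Nom": [("N", "Nom"), ("N",), ("Nom", "PP")],
    "VP": [("V",), ("V", "NP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"Houston", "TWA"},
    "Det": {"that", "this", "a", "the"},
}


def earley_recognize(words, start="S"):
    """Return True if words can be derived from start (no epsilon rules)."""
    chart = [[] for _ in range(len(words) + 1)]

    def add(state, k):
        if state not in chart[k]:
            chart[k].append(state)

    add(("GAMMA", (start,), 0, 0), 0)            # dummy state: GAMMA -> · S
    for k in range(len(words) + 1):
        i = 0
        while i < len(chart[k]):                 # chart[k] grows as we process it
            lhs, rhs, dot, origin = chart[k][i]
            i += 1
            if dot == len(rhs):                  # Completer
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, o2), k)
            elif rhs[dot] in LEXICON:            # Scanner
                if k < len(words) and words[k] in LEXICON[rhs[dot]]:
                    add((lhs, rhs, dot + 1, origin), k + 1)
            else:                                # Predictor
                for expansion in GRAMMAR[rhs[dot]]:
                    add((rhs[dot], expansion, 0, k), k)
    # A winner is a complete start state spanning the whole input
    return ("GAMMA", (start,), 1, 0) in chart[len(words)]


print(earley_recognize("book that flight".split()))
print(earley_recognize("that flight".split()))
```

The duplicate check in add is what lets the left-recursive rule Nom -> Nom PP be used without looping, which a plain top-down parser with backtracking could not do.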
Example
• Book that flight
• Goal: Find a completed S from 0 to 3
• Chart[0] shows Predictor operations
• Chart[1]: S12 shows Scanner
• Chart[3] shows Completer stage
Figure 13.14
Figure 13.14 continued
Final Parse States
Chart Parsing
• CKY and Earley are deterministic: given an input, all actions are taken in a predetermined order
• Chart Parsing allows for flexibility of events via a separate policy that determines the order of an agenda of states
  – Policy determines order in which states are created and predictions made
  – Fundamental rule: if chart includes 2 contiguous states s.t. one provides a constituent the other needs, a new state spanning the two states is created with the new information
Summing Up
• Parsing as search: what search strategies to use?
  – Top down
  – Bottom up
  – How to combine?
• How to parse as little as possible
  – Dynamic Programming
  – Different policies for ordering states to be processed
  – Next: Shallow Parsing and Review
Dynamic Programming
bull Fill in tables with solutions to subproblemsbull For parsing
ndash Store possible subtrees for each substring as they are discovered in the input
ndash Ambiguous strings are given multiple entriesndash Table look-up to come up with final parse(s)
bull Many parsers take advantage of this approach
Review Minimal Edit Distance
bull Simple example of DP find the minimal lsquodistancersquo between 2 stringsndash Minimal number of operations (insert delete
substitute) needed to transform one string into another
ndash Levenstein distances (subst=1 or 2)ndash Key idea minimal path between substrings is on
the minimal path between the beginning and end of the 2 strings
Example of MED Calculation
DP for Parsing
bull Table cells represented state of parse of input up to this point
bull Can be calculated from neighboring state(s)bull Only need to parse each substring once for each
possible analysis into constituents
Parsers Using DP
bull CKY Parsing Algorithmndash Bottom-upndash Grammar must be in Chomsky Normal Formndash The parse tree might not be consistent with linguistic
theorybull Earley Parsing Algorithm
ndash Top-downndash Expectations about constituents are confirmed by inputndash A POS tag for a word that is not predicted is never added
bull Chart Parser
17
Cocke-Kasami-Younger Algorithm
bull Convert grammar to Chomsky Normal Formndash Every CFG has a weakly equivalent CNF grammarndash A B C (non-terminals)ndash A w (terminal)ndash Basic ideas
bull Keep rules conforming to CNFbull Introduce dummy non-terminals for rules that mix terminal and non-
terminals (eg A Bw becomes A BBrsquo Brsquo w)bull Rewrite RHS of unit productions with RHS of all non-unit productions
they lead to (eg A B B w becomes A w)bull For RHS longer than 2 non-terminals replace leftmost pairs of non-
terminals with a new non-terminal and add a new production rule (eg A BCD becomes A ZD Z BC)
bull For ε-productions find all occurences of LHS in 2-variable RHSs and create new rule without the LHS (eg C ABA ε becomes CB)
A CFG
Figure 138
CYK in Action
bull Each non-terminal above POS level has 2 daughtersndash Encode entire parse tree in N+1 x N+1 tablendash Each cell [ij] contains all non-terminals that span
positions [i-j] betw input wordsndash Cell [0N] represents all inputndash For each [ij] st iltkltj [ik] is to left and [kj] is
below in tablendash Diagonal contains POS of each input wordndash Fill in table from diagonal on up
ndash For any cell [ij] cells (constituents) contributing to [ij] are to left and below already filled in
Figure 138
CYK Parse Table
X2
CYK Algorithm
Filling in [0N] Adding X2[0n]
Filling the Final Column (1)
Filling the Final Column (2)
X2
Earley Algorithm
bull Top-down parsing algorithm using DPbull Allows arbitrary CFGs closer to linguisticsbull Fills a chart of length N+1 in a single sweep over
input of N wordsndash Chart entries represent state of parse at each word
positionbull Completed constituents and their locationsbull In-progress constituentsbull Predicted constituents
29
Parser States
bull The table-entries are called states and are represented with dotted-rules
S -gt VP A VP is predicted
NP -gt Det Nominal An NP is in progress
VP -gt V NP A VP has been found
30
CFG for Fragment of EnglishS NP VP VP V
S Aux NP VP PP -gt Prep NP
S VP N book | flight | meal | money
NP Det Nom V book | include | prefer
NP PropN Aux does
Nom N Nom Prep from | to | on
Nom N PropN Houston | TWA
Nom Nom PP Det that | this | a | the
VP V NP
S8
S9
S10
S11
S13
S12
S8
S9
S8
Some Parse States for Book that flight
Filling in the Chart
bull March through chart left-to-rightbull At each step apply 1 of 3 operators
ndash Predictorbull Create new states representing top-down expectations
ndash Scannerbull Match word predictions (rule with POS following dot)
to words in input
ndash Completerbull When a state is complete see what rules were looking
for that complete constituent
33
Top Level Earley
Predictor
bull Given a statendash With a non-terminal to right of dot (not a part-of-speech
category)ndash Create a new state for each expansion of the non-terminalndash Put predicted states in same chart cell as generating state
beginning and ending where generating state ends ndash So predictor looking at
bull S -gt VP [00] ndash results in
bull VP -gt Verb [00]bull VP -gt Verb NP [00]
35
Scanner
bull Given a statendash With a non-terminal to right of dot that is a POS categoryndash If next word in input matches this POSndash Create a new state with dot moved past the non-terminal
bull Eg scanner looking at VP -gt Verb NP [00]
ndash If next word can be a verb add new statebull VP -gt Verb NP [01]
ndash Add this state to chart entry following current onendash NB Earley uses top-down input to disambiguate POS --
only POS predicted by some state can be added to chart
36
Completer
bull Given a state ndash Whose dot has reached right end of rulendash Parser has discovered a constituent over some span of inputndash Find and advance all previous states that are lsquolooking forrsquo
this categoryndash Copy state move dot insert in current chart entry
bull Eg if processingndash NP -gt Det Nominal [13] and if state expecting an NP like
VP -gt Verb NP [01] in chartbull Add
ndash VP -gt Verb NP [03] to same cell of chart
37
Reaching a Final State
bull Find an S state in chart that spans input from 0 to N+1 and is complete
bull Declare victoryndash S ndashgt α [0N+1]
38
Converting from Recognizer to Parser
bull Augment the ldquoCompleterrdquo to include pointer to each previous (now completed) state
bull Read off all the backpointers from every complete S
39
Gist of Earley Parsing
1 Predict all the states you can as soon as you can
2 Read a word1 Extend states based on matches
2 Add new predictions
3 Go to 2
3 Look at N+1 to see if you have a winner
40
Example
bull Book that flightbull Goal Find a completed S from 0 to 3bull Chart[0] shows Predictor operationsbull Chart[1] S12 shows Scannerbull Chart[3] shows Completer stage
41
Figure 1314
Figure 1314 continued
Final Parse States
Chart Parsing
bull CKY and Earley are deterministic given an input all actions are taken is predetermined order
bull Chart Parsing allows for flexibility of events via separate policy that determines order of an agenda of statesndash Policy determines order in which states are created
and predictions madendash Fundamental rule if chart includes 2 contiguous
states st one provides a constituent the other needs a new state spanning the two states is created with the new information
Summing Up
bull Parsing as search what search strategies to usendash Top downndash Bottom upndash How to combine
bull How to parse as little as possiblendash Dynamic Programmingndash Different policies for ordering states to be
processedndash Next Shallow Parsing and Review
46
Review Minimal Edit Distance
bull Simple example of DP find the minimal lsquodistancersquo between 2 stringsndash Minimal number of operations (insert delete
substitute) needed to transform one string into another
ndash Levenstein distances (subst=1 or 2)ndash Key idea minimal path between substrings is on
the minimal path between the beginning and end of the 2 strings
Example of MED Calculation
DP for Parsing
bull Table cells represented state of parse of input up to this point
bull Can be calculated from neighboring state(s)bull Only need to parse each substring once for each
possible analysis into constituents
Parsers Using DP
bull CKY Parsing Algorithmndash Bottom-upndash Grammar must be in Chomsky Normal Formndash The parse tree might not be consistent with linguistic
theorybull Earley Parsing Algorithm
ndash Top-downndash Expectations about constituents are confirmed by inputndash A POS tag for a word that is not predicted is never added
bull Chart Parser
17
Cocke-Kasami-Younger Algorithm
bull Convert grammar to Chomsky Normal Formndash Every CFG has a weakly equivalent CNF grammarndash A B C (non-terminals)ndash A w (terminal)ndash Basic ideas
bull Keep rules conforming to CNFbull Introduce dummy non-terminals for rules that mix terminal and non-
terminals (eg A Bw becomes A BBrsquo Brsquo w)bull Rewrite RHS of unit productions with RHS of all non-unit productions
they lead to (eg A B B w becomes A w)bull For RHS longer than 2 non-terminals replace leftmost pairs of non-
terminals with a new non-terminal and add a new production rule (eg A BCD becomes A ZD Z BC)
bull For ε-productions find all occurences of LHS in 2-variable RHSs and create new rule without the LHS (eg C ABA ε becomes CB)
A CFG
Figure 138
CYK in Action
bull Each non-terminal above POS level has 2 daughtersndash Encode entire parse tree in N+1 x N+1 tablendash Each cell [ij] contains all non-terminals that span
positions [i-j] betw input wordsndash Cell [0N] represents all inputndash For each [ij] st iltkltj [ik] is to left and [kj] is
below in tablendash Diagonal contains POS of each input wordndash Fill in table from diagonal on up
ndash For any cell [ij] cells (constituents) contributing to [ij] are to left and below already filled in
Figure 138
CYK Parse Table
X2
CYK Algorithm
Filling in [0N] Adding X2[0n]
Filling the Final Column (1)
Filling the Final Column (2)
X2
Earley Algorithm
bull Top-down parsing algorithm using DPbull Allows arbitrary CFGs closer to linguisticsbull Fills a chart of length N+1 in a single sweep over
input of N wordsndash Chart entries represent state of parse at each word
positionbull Completed constituents and their locationsbull In-progress constituentsbull Predicted constituents
29
Parser States
bull The table-entries are called states and are represented with dotted-rules
S -gt VP A VP is predicted
NP -gt Det Nominal An NP is in progress
VP -gt V NP A VP has been found
30
CFG for Fragment of EnglishS NP VP VP V
S Aux NP VP PP -gt Prep NP
S VP N book | flight | meal | money
NP Det Nom V book | include | prefer
NP PropN Aux does
Nom N Nom Prep from | to | on
Nom N PropN Houston | TWA
Nom Nom PP Det that | this | a | the
VP V NP
S8
S9
S10
S11
S13
S12
S8
S9
S8
Some Parse States for Book that flight
Filling in the Chart
bull March through chart left-to-rightbull At each step apply 1 of 3 operators
ndash Predictorbull Create new states representing top-down expectations
ndash Scannerbull Match word predictions (rule with POS following dot)
to words in input
ndash Completerbull When a state is complete see what rules were looking
for that complete constituent
33
Top Level Earley
Predictor
bull Given a statendash With a non-terminal to right of dot (not a part-of-speech
category)ndash Create a new state for each expansion of the non-terminalndash Put predicted states in same chart cell as generating state
beginning and ending where generating state ends ndash So predictor looking at
bull S -gt VP [00] ndash results in
bull VP -gt Verb [00]bull VP -gt Verb NP [00]
35
Scanner
bull Given a statendash With a non-terminal to right of dot that is a POS categoryndash If next word in input matches this POSndash Create a new state with dot moved past the non-terminal
bull Eg scanner looking at VP -gt Verb NP [00]
ndash If next word can be a verb add new statebull VP -gt Verb NP [01]
ndash Add this state to chart entry following current onendash NB Earley uses top-down input to disambiguate POS --
only POS predicted by some state can be added to chart
36
Completer
bull Given a state ndash Whose dot has reached right end of rulendash Parser has discovered a constituent over some span of inputndash Find and advance all previous states that are lsquolooking forrsquo
this categoryndash Copy state move dot insert in current chart entry
bull Eg if processingndash NP -gt Det Nominal [13] and if state expecting an NP like
VP -gt Verb NP [01] in chartbull Add
ndash VP -gt Verb NP [03] to same cell of chart
37
Reaching a Final State
bull Find an S state in chart that spans input from 0 to N+1 and is complete
bull Declare victoryndash S ndashgt α [0N+1]
38
Converting from Recognizer to Parser
bull Augment the ldquoCompleterrdquo to include pointer to each previous (now completed) state
bull Read off all the backpointers from every complete S
39
Gist of Earley Parsing
1 Predict all the states you can as soon as you can
2 Read a word1 Extend states based on matches
2 Add new predictions
3 Go to 2
3 Look at N+1 to see if you have a winner
40
Example
bull Book that flightbull Goal Find a completed S from 0 to 3bull Chart[0] shows Predictor operationsbull Chart[1] S12 shows Scannerbull Chart[3] shows Completer stage
41
Figure 1314
Figure 1314 continued
Final Parse States
Chart Parsing
bull CKY and Earley are deterministic given an input all actions are taken is predetermined order
bull Chart Parsing allows for flexibility of events via separate policy that determines order of an agenda of statesndash Policy determines order in which states are created
and predictions madendash Fundamental rule if chart includes 2 contiguous
states st one provides a constituent the other needs a new state spanning the two states is created with the new information
Summing Up
bull Parsing as search what search strategies to usendash Top downndash Bottom upndash How to combine
bull How to parse as little as possiblendash Dynamic Programmingndash Different policies for ordering states to be
processedndash Next Shallow Parsing and Review
46
Example of MED Calculation
DP for Parsing
bull Table cells represented state of parse of input up to this point
bull Can be calculated from neighboring state(s)bull Only need to parse each substring once for each
possible analysis into constituents
Parsers Using DP
bull CKY Parsing Algorithmndash Bottom-upndash Grammar must be in Chomsky Normal Formndash The parse tree might not be consistent with linguistic
theorybull Earley Parsing Algorithm
ndash Top-downndash Expectations about constituents are confirmed by inputndash A POS tag for a word that is not predicted is never added
bull Chart Parser
17
Cocke-Kasami-Younger Algorithm
bull Convert grammar to Chomsky Normal Formndash Every CFG has a weakly equivalent CNF grammarndash A B C (non-terminals)ndash A w (terminal)ndash Basic ideas
bull Keep rules conforming to CNFbull Introduce dummy non-terminals for rules that mix terminal and non-
terminals (eg A Bw becomes A BBrsquo Brsquo w)bull Rewrite RHS of unit productions with RHS of all non-unit productions
they lead to (eg A B B w becomes A w)bull For RHS longer than 2 non-terminals replace leftmost pairs of non-
terminals with a new non-terminal and add a new production rule (eg A BCD becomes A ZD Z BC)
bull For ε-productions find all occurences of LHS in 2-variable RHSs and create new rule without the LHS (eg C ABA ε becomes CB)
A CFG
Figure 138
CYK in Action
bull Each non-terminal above POS level has 2 daughtersndash Encode entire parse tree in N+1 x N+1 tablendash Each cell [ij] contains all non-terminals that span
positions [i-j] betw input wordsndash Cell [0N] represents all inputndash For each [ij] st iltkltj [ik] is to left and [kj] is
below in tablendash Diagonal contains POS of each input wordndash Fill in table from diagonal on up
ndash For any cell [ij] cells (constituents) contributing to [ij] are to left and below already filled in
Figure 138
CYK Parse Table
X2
CYK Algorithm
Filling in [0N] Adding X2[0n]
Filling the Final Column (1)
Filling the Final Column (2)
X2
Earley Algorithm
bull Top-down parsing algorithm using DPbull Allows arbitrary CFGs closer to linguisticsbull Fills a chart of length N+1 in a single sweep over
input of N wordsndash Chart entries represent state of parse at each word
positionbull Completed constituents and their locationsbull In-progress constituentsbull Predicted constituents
29
Parser States
bull The table-entries are called states and are represented with dotted-rules
S -gt VP A VP is predicted
NP -gt Det Nominal An NP is in progress
VP -gt V NP A VP has been found
30
CFG for Fragment of EnglishS NP VP VP V
S Aux NP VP PP -gt Prep NP
S VP N book | flight | meal | money
NP Det Nom V book | include | prefer
NP PropN Aux does
Nom N Nom Prep from | to | on
Nom N PropN Houston | TWA
Nom Nom PP Det that | this | a | the
VP V NP
S8
S9
S10
S11
S13
S12
S8
S9
S8
Some Parse States for Book that flight
Filling in the Chart
bull March through chart left-to-rightbull At each step apply 1 of 3 operators
ndash Predictorbull Create new states representing top-down expectations
ndash Scannerbull Match word predictions (rule with POS following dot)
to words in input
ndash Completerbull When a state is complete see what rules were looking
for that complete constituent
33
Top Level Earley
Predictor
bull Given a statendash With a non-terminal to right of dot (not a part-of-speech
category)ndash Create a new state for each expansion of the non-terminalndash Put predicted states in same chart cell as generating state
beginning and ending where generating state ends ndash So predictor looking at
bull S -gt VP [00] ndash results in
bull VP -gt Verb [00]bull VP -gt Verb NP [00]
35
Scanner
bull Given a statendash With a non-terminal to right of dot that is a POS categoryndash If next word in input matches this POSndash Create a new state with dot moved past the non-terminal
bull Eg scanner looking at VP -gt Verb NP [00]
ndash If next word can be a verb add new statebull VP -gt Verb NP [01]
ndash Add this state to chart entry following current onendash NB Earley uses top-down input to disambiguate POS --
only POS predicted by some state can be added to chart
36
Completer
bull Given a state ndash Whose dot has reached right end of rulendash Parser has discovered a constituent over some span of inputndash Find and advance all previous states that are lsquolooking forrsquo
this categoryndash Copy state move dot insert in current chart entry
bull Eg if processingndash NP -gt Det Nominal [13] and if state expecting an NP like
VP -gt Verb NP [01] in chartbull Add
ndash VP -gt Verb NP [03] to same cell of chart
37
Reaching a Final State
bull Find an S state in chart that spans input from 0 to N+1 and is complete
bull Declare victoryndash S ndashgt α [0N+1]
38
Converting from Recognizer to Parser
bull Augment the ldquoCompleterrdquo to include pointer to each previous (now completed) state
bull Read off all the backpointers from every complete S
39
Gist of Earley Parsing
1 Predict all the states you can as soon as you can
2 Read a word1 Extend states based on matches
2 Add new predictions
3 Go to 2
3 Look at N+1 to see if you have a winner
40
Example
bull Book that flightbull Goal Find a completed S from 0 to 3bull Chart[0] shows Predictor operationsbull Chart[1] S12 shows Scannerbull Chart[3] shows Completer stage
41
Figure 1314
Figure 1314 continued
Final Parse States
Chart Parsing
bull CKY and Earley are deterministic given an input all actions are taken is predetermined order
bull Chart Parsing allows for flexibility of events via separate policy that determines order of an agenda of statesndash Policy determines order in which states are created
and predictions madendash Fundamental rule if chart includes 2 contiguous
states st one provides a constituent the other needs a new state spanning the two states is created with the new information
Summing Up
bull Parsing as search what search strategies to usendash Top downndash Bottom upndash How to combine
bull How to parse as little as possiblendash Dynamic Programmingndash Different policies for ordering states to be
processedndash Next Shallow Parsing and Review
46
DP for Parsing
bull Table cells represented state of parse of input up to this point
bull Can be calculated from neighboring state(s)bull Only need to parse each substring once for each
possible analysis into constituents
Parsers Using DP
bull CKY Parsing Algorithmndash Bottom-upndash Grammar must be in Chomsky Normal Formndash The parse tree might not be consistent with linguistic
theorybull Earley Parsing Algorithm
ndash Top-downndash Expectations about constituents are confirmed by inputndash A POS tag for a word that is not predicted is never added
bull Chart Parser
17
Cocke-Kasami-Younger Algorithm
bull Convert grammar to Chomsky Normal Formndash Every CFG has a weakly equivalent CNF grammarndash A B C (non-terminals)ndash A w (terminal)ndash Basic ideas
bull Keep rules conforming to CNFbull Introduce dummy non-terminals for rules that mix terminal and non-
terminals (eg A Bw becomes A BBrsquo Brsquo w)bull Rewrite RHS of unit productions with RHS of all non-unit productions
they lead to (eg A B B w becomes A w)bull For RHS longer than 2 non-terminals replace leftmost pairs of non-
terminals with a new non-terminal and add a new production rule (eg A BCD becomes A ZD Z BC)
bull For ε-productions find all occurences of LHS in 2-variable RHSs and create new rule without the LHS (eg C ABA ε becomes CB)
A CFG
Figure 138
CYK in Action
bull Each non-terminal above POS level has 2 daughtersndash Encode entire parse tree in N+1 x N+1 tablendash Each cell [ij] contains all non-terminals that span
positions [i-j] betw input wordsndash Cell [0N] represents all inputndash For each [ij] st iltkltj [ik] is to left and [kj] is
below in tablendash Diagonal contains POS of each input wordndash Fill in table from diagonal on up
ndash For any cell [ij] cells (constituents) contributing to [ij] are to left and below already filled in
Figure 138
CYK Parse Table
X2
CYK Algorithm
Filling in [0N] Adding X2[0n]
Filling the Final Column (1)
Filling the Final Column (2)
X2
Earley Algorithm
bull Top-down parsing algorithm using DPbull Allows arbitrary CFGs closer to linguisticsbull Fills a chart of length N+1 in a single sweep over
input of N wordsndash Chart entries represent state of parse at each word
positionbull Completed constituents and their locationsbull In-progress constituentsbull Predicted constituents
29
Parser States
bull The table-entries are called states and are represented with dotted-rules
S -gt VP A VP is predicted
NP -gt Det Nominal An NP is in progress
VP -gt V NP A VP has been found
30
CFG for Fragment of EnglishS NP VP VP V
S Aux NP VP PP -gt Prep NP
S VP N book | flight | meal | money
NP Det Nom V book | include | prefer
NP PropN Aux does
Nom N Nom Prep from | to | on
Nom N PropN Houston | TWA
Nom Nom PP Det that | this | a | the
VP V NP
S8
S9
S10
S11
S13
S12
S8
S9
S8
Some Parse States for Book that flight
Filling in the Chart
bull March through chart left-to-rightbull At each step apply 1 of 3 operators
ndash Predictorbull Create new states representing top-down expectations
ndash Scannerbull Match word predictions (rule with POS following dot)
to words in input
ndash Completerbull When a state is complete see what rules were looking
for that complete constituent
33
Top Level Earley
Predictor
bull Given a statendash With a non-terminal to right of dot (not a part-of-speech
category)ndash Create a new state for each expansion of the non-terminalndash Put predicted states in same chart cell as generating state
beginning and ending where generating state ends ndash So predictor looking at
bull S -gt VP [00] ndash results in
bull VP -gt Verb [00]bull VP -gt Verb NP [00]
35
Scanner
bull Given a statendash With a non-terminal to right of dot that is a POS categoryndash If next word in input matches this POSndash Create a new state with dot moved past the non-terminal
bull Eg scanner looking at VP -gt Verb NP [00]
ndash If next word can be a verb add new statebull VP -gt Verb NP [01]
ndash Add this state to chart entry following current onendash NB Earley uses top-down input to disambiguate POS --
only POS predicted by some state can be added to chart
36
Completer
bull Given a state ndash Whose dot has reached right end of rulendash Parser has discovered a constituent over some span of inputndash Find and advance all previous states that are lsquolooking forrsquo
this categoryndash Copy state move dot insert in current chart entry
bull Eg if processingndash NP -gt Det Nominal [13] and if state expecting an NP like
VP -gt Verb NP [01] in chartbull Add
ndash VP -gt Verb NP [03] to same cell of chart
37
Reaching a Final State
bull Find an S state in chart that spans input from 0 to N+1 and is complete
bull Declare victoryndash S ndashgt α [0N+1]
38
Converting from Recognizer to Parser
bull Augment the ldquoCompleterrdquo to include pointer to each previous (now completed) state
bull Read off all the backpointers from every complete S
39
Gist of Earley Parsing
1 Predict all the states you can as soon as you can
2 Read a word1 Extend states based on matches
2 Add new predictions
3 Go to 2
3 Look at N+1 to see if you have a winner
40
Example
bull Book that flightbull Goal Find a completed S from 0 to 3bull Chart[0] shows Predictor operationsbull Chart[1] S12 shows Scannerbull Chart[3] shows Completer stage
41
Figure 1314
Figure 1314 continued
Final Parse States
Chart Parsing
• CKY and Earley are deterministic: given an input, all actions are taken in a predetermined order
• Chart parsing allows for flexibility of events via a separate policy that determines the order of an agenda of states
  - Policy determines the order in which states are created and predictions made
  - Fundamental rule: if the chart includes 2 contiguous states such that one provides a constituent the other needs, a new state spanning the two states is created with the new information
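The fundamental rule can be written as a single function over two edges. This is only a sketch: the `Edge` tuple and the name `fundamental_rule` are assumptions, and the agenda and its ordering policy are left out.

```python
from collections import namedtuple

# An edge is a dotted rule over a span; an "active" edge still needs something.
Edge = namedtuple("Edge", "lhs rhs dot start end")

def fundamental_rule(active, passive):
    """Combine an active edge with an adjacent complete edge, if they fit."""
    is_complete = passive.dot == len(passive.rhs)
    needs_more = active.dot < len(active.rhs)
    if (is_complete and needs_more
            and active.end == passive.start
            and active.rhs[active.dot] == passive.lhs):
        return Edge(active.lhs, active.rhs, active.dot + 1,
                    active.start, passive.end)
    return None

vp = Edge("VP", ("Verb", "NP"), 1, 0, 1)      # needs an NP starting at 1
np = Edge("NP", ("Det", "Nominal"), 2, 1, 3)  # complete NP over [1,3]
combined = fundamental_rule(vp, np)
print(combined)
```

Because the rule itself says nothing about when it fires, the order in which edges are taken off the agenda can be chosen freely.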
Summing Up
• Parsing as search: what search strategies to use?
  - Top down
  - Bottom up
  - How to combine?
• How to parse as little as possible
  - Dynamic Programming
  - Different policies for ordering states to be processed
  - Next: Shallow Parsing and Review
Parsers Using DP
• CKY Parsing Algorithm
  - Bottom-up
  - Grammar must be in Chomsky Normal Form
  - The parse tree might not be consistent with linguistic theory
• Earley Parsing Algorithm
  - Top-down
  - Expectations about constituents are confirmed by input
  - A POS tag for a word that is not predicted is never added
• Chart Parser
Cocke-Kasami-Younger Algorithm
• Convert grammar to Chomsky Normal Form
  - Every CFG has a weakly equivalent CNF grammar
  - A -> B C (non-terminals)
  - A -> w (terminal)
  - Basic ideas
    • Keep rules conforming to CNF
    • Introduce dummy non-terminals for rules that mix terminals and non-terminals (e.g. A -> B w becomes A -> B B', B' -> w)
    • Rewrite RHS of unit productions with RHS of all non-unit productions they lead to (e.g. A -> B, B -> w becomes A -> w)
    • For RHS longer than 2 non-terminals, replace leftmost pairs of non-terminals with a new non-terminal and add a new production rule (e.g. A -> B C D becomes A -> Z D, Z -> B C)
    • For ε-productions, find all occurrences of LHS in 2-variable RHSs and create a new rule without the LHS (e.g. C -> A B, A -> ε becomes C -> B)
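The step for long right-hand sides can be sketched as below. It covers only that one conversion step; the function name `binarize` and the X1, X2, ... naming of new non-terminals are assumptions made for this example.

```python
def binarize(rules):
    """Split rules whose RHS has more than 2 symbols into binary rules.

    `rules` is a list of (lhs, rhs_tuple) pairs. New non-terminals are
    named X1, X2, ... (a naming choice made for this sketch).
    """
    out, counter = [], 0
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            counter += 1
            new_nt = f"X{counter}"
            out.append((new_nt, rhs[:2]))   # Z -> B C
            rhs = (new_nt,) + rhs[2:]       # A -> Z D
        out.append((lhs, rhs))
    return out

cnf_rules = binarize([("A", ("B", "C", "D")), ("S", ("NP", "VP"))])
for rule in cnf_rules:
    print(rule)
```

Rules that are already binary, such as S -> NP VP, pass through unchanged.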
A CFG
Figure 13.8
CYK in Action
• Each non-terminal above POS level has 2 daughters
  - Encode entire parse tree in an (N+1) x (N+1) table
  - Each cell [i,j] contains all non-terminals that span positions i to j between input words
  - Cell [0,N] represents all input
  - For each [i,j] and each k such that i<k<j, [i,k] is to the left and [k,j] is below in the table
  - Diagonal contains POS of each input word
  - Fill in table from diagonal on up
  - For any cell [i,j], cells (constituents) contributing to [i,j] are to the left and below, already filled in
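The table-filling order described above can be sketched as a short recognizer. The CNF grammar here is an assumption for illustration (S -> V NP stands in for a folded unit production S -> VP), and the names `BINARY`, `LEXICAL` and `cky` are invented for this example.

```python
from collections import defaultdict

# Illustrative CNF grammar (an assumption for this sketch).
BINARY = {            # (B, C) -> set of A such that A -> B C
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP", "S"},
}
LEXICAL = {           # word -> set of POS categories (A -> w)
    "book": {"V", "N"}, "that": {"Det"}, "flight": {"N"},
}

def cky(words, binary=BINARY, lexical=LEXICAL):
    n = len(words)
    table = defaultdict(set)                # table[i, j] spans positions i..j
    for j in range(1, n + 1):
        # Diagonal cell: POS categories of word j.
        table[j - 1, j] |= lexical.get(words[j - 1], set())
        for i in range(j - 2, -1, -1):      # fill column j from the diagonal up
            for k in range(i + 1, j):       # split point, i < k < j
                for b in table[i, k]:       # cell to the left
                    for c in table[k, j]:   # cell below
                        table[i, j] |= binary.get((b, c), set())
    return table

table = cky("book that flight".split())
print(sorted(table[0, 3]))
```

The input is accepted when S appears in cell [0,N], here `table[0, 3]`.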
Figure 13.8
CYK Parse Table
CYK Algorithm
Filling in [0,N]: Adding X2 [0,N]
Filling the Final Column (1)
Filling the Final Column (2)
Earley Algorithm
• Top-down parsing algorithm using DP
• Allows arbitrary CFGs: closer to linguistics
• Fills a chart of length N+1 in a single sweep over input of N words
  - Chart entries represent state of parse at each word position
    • Completed constituents and their locations
    • In-progress constituents
    • Predicted constituents
Parser States
• The table entries are called states and are represented with dotted rules
  S -> · VP              A VP is predicted
  NP -> Det · Nominal    An NP is in progress
  VP -> V NP ·           A VP has been found
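One way to hold a dotted rule in code is as a rule plus a dot index and a span. The helper below is a hypothetical illustration (the name `dotted` and the span values in the example are assumptions) that prints states in the notation used on the slides.

```python
def dotted(lhs, rhs, dot, start, end):
    """Render a state as a dotted rule with its [start,end] span."""
    symbols = list(rhs[:dot]) + ["·"] + list(rhs[dot:])
    return f"{lhs} -> {' '.join(symbols)} [{start},{end}]"

print(dotted("S", ("VP",), 0, 0, 0))             # a VP is predicted
print(dotted("NP", ("Det", "Nominal"), 1, 1, 2)) # an NP is in progress
print(dotted("VP", ("V", "NP"), 2, 0, 3))        # a VP has been found
```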
CFG for Fragment of English
S -> NP VP          VP -> V
S -> Aux NP VP      PP -> Prep NP
S -> VP             N -> book | flight | meal | money
NP -> Det Nom       V -> book | include | prefer
NP -> PropN         Aux -> does
Nom -> N Nom        Prep -> from | to | on
Nom -> N            PropN -> Houston | TWA
Nom -> Nom PP       Det -> that | this | a | the
VP -> V NP
Some Parse States for Book that flight
Filling in the Chart
• March through chart left-to-right
• At each step, apply 1 of 3 operators
  - Predictor
    • Create new states representing top-down expectations
  - Scanner
    • Match word predictions (rule with POS following dot) to words in input
  - Completer
    • When a state is complete, see what rules were looking for that complete constituent
Top Level Earley
Figure 138
CYK Parse Table
X2
CYK Algorithm
Filling in [0N] Adding X2[0n]
Filling the Final Column (1)
Filling the Final Column (2)
X2
Earley Algorithm
bull Top-down parsing algorithm using DPbull Allows arbitrary CFGs closer to linguisticsbull Fills a chart of length N+1 in a single sweep over
input of N wordsndash Chart entries represent state of parse at each word
positionbull Completed constituents and their locationsbull In-progress constituentsbull Predicted constituents
29
Parser States
bull The table-entries are called states and are represented with dotted-rules
S -gt VP A VP is predicted
NP -gt Det Nominal An NP is in progress
VP -gt V NP A VP has been found
30
CFG for Fragment of EnglishS NP VP VP V
S Aux NP VP PP -gt Prep NP
S VP N book | flight | meal | money
NP Det Nom V book | include | prefer
NP PropN Aux does
Nom N Nom Prep from | to | on
Nom N PropN Houston | TWA
Nom Nom PP Det that | this | a | the
VP V NP
S8
S9
S10
S11
S13
S12
S8
S9
S8
Some Parse States for Book that flight
Filling in the Chart
• March through chart left-to-right
• At each step, apply 1 of 3 operators
  – Predictor
    • Create new states representing top-down expectations
  – Scanner
    • Match word predictions (rule with POS following dot) to words in input
  – Completer
    • When a state is complete, see what rules were looking for that complete constituent
Top Level Earley
Predictor
• Given a state
  – With a non-terminal to right of dot (not a part-of-speech category)
  – Create a new state for each expansion of the non-terminal
  – Put predicted states in same chart cell as generating state, beginning and ending where generating state ends
  – So predictor looking at
    • S -> • VP [0,0]
  – results in
    • VP -> • Verb [0,0]
    • VP -> • Verb NP [0,0]
Scanner
• Given a state
  – With a non-terminal to right of dot that is a POS category
  – If next word in input matches this POS
  – Create a new state with dot moved past the non-terminal
• E.g., scanner looking at VP -> • Verb NP [0,0]
  – If next word can be a verb, add new state
    • VP -> Verb • NP [0,1]
  – Add this state to chart entry following current one
  – N.B.: Earley uses top-down input to disambiguate POS: only a POS predicted by some state can be added to the chart
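A minimal sketch of the Scanner under the same assumed tuple encoding `(lhs, rhs, dot, start, end)`. The `LEXICON` mapping and the function name `scanner` are illustrative; the word lists follow the grammar fragment, with Verb standing in for V.

```python
# Hypothetical lexicon: word -> set of POS categories it can have.
LEXICON = {
    "book": {"N", "Verb"},
    "that": {"Det"},
    "flight": {"N"},
}


def scanner(state, words, lexicon=LEXICON):
    """Advance the dot over a POS category if the next input word can have it."""
    lhs, rhs, dot, start, end = state
    if dot >= len(rhs) or end >= len(words):
        return None                     # complete state, or no input left
    pos = rhs[dot]
    if pos in lexicon.get(words[end].lower(), set()):
        # The new state belongs in the chart entry after the current one.
        return (lhs, rhs, dot + 1, start, end + 1)
    return None


words = ["Book", "that", "flight"]
advanced = scanner(("VP", ("Verb", "NP"), 0, 0, 0), words)
blocked = scanner(("NP", ("Det", "Nominal"), 0, 0, 0), words)
print(advanced)
print(blocked)
```

Because the scanner is only ever called on a state that predicted the POS, "book" is read as a verb here and its noun reading never enters the chart unless some state predicts N at position 0.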
Completer
• Given a state
  – Whose dot has reached right end of rule
  – Parser has discovered a constituent over some span of input
  – Find and advance all previous states that are ‘looking for’ this category
  – Copy state, move dot, insert in current chart entry
• E.g., if processing
  – NP -> Det Nominal • [1,3] and if state expecting an NP like VP -> Verb • NP [0,1] in chart
• Add
  – VP -> Verb NP • [0,3] to same cell of chart
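The Completer can be sketched in the same assumed tuple encoding `(lhs, rhs, dot, start, end)`, with the chart as a list of lists indexed by end position. The function name `completer` and this chart layout are illustrative choices.

```python
def completer(completed, chart):
    """Advance every earlier state that was waiting for the completed category."""
    c_lhs, c_rhs, c_dot, c_start, c_end = completed
    if c_dot != len(c_rhs):
        raise ValueError("completer expects a state whose dot is at the right end")
    advanced = []
    # States waiting for this constituent end exactly where it starts.
    for lhs, rhs, dot, start, end in chart[c_start]:
        if dot < len(rhs) and rhs[dot] == c_lhs:
            # Copy the state, move the dot, and extend the span to c_end.
            advanced.append((lhs, rhs, dot + 1, start, c_end))
    return advanced


# chart[k] holds the states whose dot has reached input position k.
chart = [[], [("VP", ("Verb", "NP"), 1, 0, 1)], [], []]
new_states = completer(("NP", ("Det", "Nominal"), 2, 1, 3), chart)
chart[3].extend(new_states)
print(new_states)
```

Looking only in `chart[c_start]` is what keeps the step cheap: a state can consume the NP spanning [1,3] only if its own dot stopped at position 1.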
Reaching a Final State
• Find an S state in the last chart entry (the N+1th, Chart[N]) that spans input from 0 to N and is complete
• Declare victory
  – S -> α • [0,N]
Converting from Recognizer to Parser
• Augment the “Completer” to include pointer to each previous (now completed) state
• Read off all the backpointers from every complete S
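One way to picture the backpointer idea is the sketch below. It is a simplification under stated assumptions: states are tuples `(lhs, rhs, dot, start, end)`, the dummy `GAMMA` start rule, the `back` dictionary, and the reduced grammar are all illustrative, and only the first derivation found for each state is kept, so it returns a single tree rather than every parse.

```python
GRAMMAR = {
    "S": [("VP",)],
    "VP": [("V",), ("V", "NP")],
    "NP": [("Det", "Nom")],
    "Nom": [("N",)],
}
LEXICON = {"V": {"book"}, "N": {"book", "flight"}, "Det": {"that"}}


def earley_parse(words, grammar=GRAMMAR, lexicon=LEXICON, root="S"):
    n = len(words)
    chart = [[] for _ in range(n + 1)]
    back = {}  # state -> children: completed states or (POS, word) leaves

    def add(state, k, children):
        if state not in back:           # keep the first derivation only
            back[state] = children
            chart[k].append(state)

    def tree(node):
        if len(node) == 2:              # (POS, word) leaf
            return node
        return (node[0],) + tuple(tree(child) for child in back[node])

    add(("GAMMA", (root,), 0, 0, 0), 0, ())
    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):        # chart[k] grows while it is processed
            state = chart[k][i]
            i += 1
            lhs, rhs, dot, start, _ = state
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:      # Predictor
                    for expansion in grammar[nxt]:
                        add((nxt, expansion, 0, k, k), k, ())
                elif k < n and words[k].lower() in lexicon.get(nxt, ()):
                    # Scanner: record the word as a leaf child
                    add((lhs, rhs, dot + 1, start, k + 1), k + 1,
                        back[state] + ((nxt, words[k]),))
            else:                       # Completer, augmented with a backpointer
                for prev in list(chart[start]):
                    p_lhs, p_rhs, p_dot, p_start, _ = prev
                    if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                        add((p_lhs, p_rhs, p_dot + 1, p_start, k), k,
                            back[prev] + (state,))
    final = ("GAMMA", (root,), 1, 0, n)
    return tree(back[final][0]) if final in back else None


parse = earley_parse(["book", "that", "flight"])
print(parse)
```

To recover all parses instead of one, `back` would need to hold a list of alternative child tuples per state, and `tree` would enumerate their combinations.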
Gist of Earley Parsing
1. Predict all the states you can as soon as you can
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the last chart entry (the N+1th, Chart[N]) to see if you have a winner
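The steps above can be put together in a compact recognizer. This is a sketch under assumptions, not a reference implementation: states are tuples `(lhs, rhs, dot, start, end)`, `GAMMA` is a made-up dummy start symbol, the grammar has no empty rules, and the rules and word lists are taken from the grammar fragment for English.

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom"), ("PropN",)],
    "Nom": [("N", "Nom"), ("N",), ("Nom", "PP")],
    "VP": [("V",), ("V", "NP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
    "Det": {"that", "this", "a", "the"},
}


def earley_recognize(words, grammar=GRAMMAR, lexicon=LEXICON, root="S"):
    """Return (accepted, chart) for a list of words."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]      # N+1 entries for N words

    def add(state, k):
        if state not in chart[k]:           # never add the same state twice
            chart[k].append(state)

    add(("GAMMA", (root,), 0, 0, 0), 0)     # dummy start state
    for k in range(n + 1):                  # single left-to-right sweep
        i = 0
        while i < len(chart[k]):            # chart[k] grows while it is processed
            lhs, rhs, dot, start, _ = chart[k][i]
            i += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:          # Predictor
                    for expansion in grammar[nxt]:
                        add((nxt, expansion, 0, k, k), k)
                elif k < n and words[k].lower() in lexicon.get(nxt, ()):
                    add((lhs, rhs, dot + 1, start, k + 1), k + 1)   # Scanner
            else:                           # Completer
                for p_lhs, p_rhs, p_dot, p_start, _ in list(chart[start]):
                    if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                        add((p_lhs, p_rhs, p_dot + 1, p_start, k), k)
    # Winner: a complete start state spanning the whole input, in the last entry.
    return ("GAMMA", (root,), 1, 0, n) in chart[n], chart


accepted, chart = earley_recognize("Book that flight".split())
print("accepted:", accepted)
for k, entry in enumerate(chart):
    print(f"Chart[{k}]: {len(entry)} states")
```

The duplicate check in `add` is what makes the left-recursive rule Nom -> Nom PP harmless: predicting Nom a second time at the same position adds nothing new, so the loop terminates.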
Example
• Book that flight
• Goal: Find a completed S from 0 to 3
• Chart[0] shows Predictor operations
• Chart[1]: S12 shows Scanner
• Chart[3] shows Completer stage
Figure 13.14
Figure 13.14 (continued)
Final Parse States
Chart Parsing
• CKY and Earley are deterministic: given an input, all actions are taken in a predetermined order
• Chart Parsing allows for flexibility of events via a separate policy that determines the order of an agenda of states
  – Policy determines order in which states are created and predictions made
  – Fundamental rule: if chart includes 2 contiguous states s.t. one provides a constituent the other needs, a new state spanning the two states is created with the new information
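The fundamental rule and the role of the agenda policy can be sketched with a small bottom-up chart recognizer. Everything named here is an assumption for illustration: the tuple encoding `(lhs, rhs, dot, start, end)`, the functions `fundamental_rule` and `chart_recognize`, the two policies "queue" and "stack", and the reduced grammar.

```python
from collections import deque

GRAMMAR = [
    ("S", ("VP",)),
    ("VP", ("V", "NP")),
    ("NP", ("Det", "Nom")),
    ("Nom", ("N",)),
]
LEXICON = {"book": ["V", "N"], "that": ["Det"], "flight": ["N"]}


def fundamental_rule(active, complete):
    """Combine two contiguous states if one provides what the other needs."""
    a_lhs, a_rhs, a_dot, a_start, a_end = active
    c_lhs, c_rhs, c_dot, c_start, c_end = complete
    if (a_dot < len(a_rhs) and c_dot == len(c_rhs)
            and a_end == c_start and a_rhs[a_dot] == c_lhs):
        return (a_lhs, a_rhs, a_dot + 1, a_start, c_end)   # spans both states
    return None


def chart_recognize(words, policy="queue", root="S"):
    agenda = deque()
    for i, word in enumerate(words):                # seed with lexical states
        for pos in LEXICON.get(word.lower(), []):
            agenda.append((pos, (word.lower(),), 1, i, i + 1))
    chart = set()
    while agenda:
        # The policy only decides which pending state is processed next.
        state = agenda.popleft() if policy == "queue" else agenda.pop()
        if state in chart:
            continue
        chart.add(state)
        lhs, rhs, dot, start, end = state
        if dot == len(rhs):
            # Bottom-up prediction: start every rule whose first symbol is lhs.
            for g_lhs, g_rhs in GRAMMAR:
                if g_rhs[0] == lhs:
                    agenda.append((g_lhs, g_rhs, 0, start, start))
            pairs = [(other, state) for other in chart]
        else:
            pairs = [(state, other) for other in chart]
        for active, complete in pairs:
            new_state = fundamental_rule(active, complete)
            if new_state is not None:
                agenda.append(new_state)
    accepted = any(l == root and d == len(r) and s == 0 and e == len(words)
                   for l, r, d, s, e in chart)
    return accepted, chart


ok_queue, chart_queue = chart_recognize(["book", "that", "flight"], policy="queue")
ok_stack, chart_stack = chart_recognize(["book", "that", "flight"], policy="stack")
print(ok_queue, ok_stack, chart_queue == chart_stack)
```

Both policies end with the same chart because the chart is the closure of the same rules; the policy changes only the order in which states are created, which matters once a parser stops early or ranks states.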
Summing Up
• Parsing as search: what search strategies to use?
  – Top down
  – Bottom up
  – How to combine?
• How to parse as little as possible
  – Dynamic Programming
  – Different policies for ordering states to be processed
  – Next: Shallow Parsing and Review
CFG for Fragment of English
S -> NP VP          VP -> V
S -> Aux NP VP      PP -> Prep NP
S -> VP             N -> book | flight | meal | money
NP -> Det Nom       V -> book | include | prefer
NP -> PropN         Aux -> does
Nom -> N Nom        Prep -> from | to | on
Nom -> N            PropN -> Houston | TWA
Nom -> Nom PP       Det -> that | this | a | the
VP -> V NP
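For readers who want to experiment, the grammar fragment above could be written down as plain Python data. The names `GRAMMAR`, `LEXICON`, and `parts_of_speech` are hypothetical, chosen only for this sketch; the rules and word lists themselves are the ones on the slide.

```python
# Phrase-structure rules: each left-hand side maps to its expansions.
GRAMMAR = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "VP":  [["V", "NP"], ["V"]],
    "PP":  [["Prep", "NP"]],
}

# Lexicon: each part of speech maps to the words it can cover.
LEXICON = {
    "N":     {"book", "flight", "meal", "money"},
    "V":     {"book", "include", "prefer"},
    "Aux":   {"does"},
    "Prep":  {"from", "to", "on"},
    "PropN": {"Houston", "TWA"},
    "Det":   {"that", "this", "a", "the"},
}

def parts_of_speech(word):
    """Return every POS category whose word list contains `word`."""
    return {pos for pos, words in LEXICON.items() if word in words}

print(sorted(parts_of_speech("book")))
```

The lookup makes the ambiguity of "book" (noun or verb) explicit, which is the ambiguity a parser has to resolve in "Book that flight".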
Some Parse States for Book that flight
[Figure: parse states labeled S8 through S13]
Filling in the Chart
• March through chart left-to-right
• At each step, apply 1 of 3 operators
  - Predictor
    - Create new states representing top-down expectations
  - Scanner
    - Match word predictions (rule with POS following dot) to words in input
  - Completer
    - When a state is complete, see what rules were looking for that complete constituent
33
Top Level Earley
Predictor
• Given a state
  - With a non-terminal to right of dot (not a part-of-speech category)
  - Create a new state for each expansion of the non-terminal
  - Put predicted states in same chart cell as generating state, beginning and ending where generating state ends
  - So predictor looking at
    - S -> • VP [0,0]
  - results in
    - VP -> • Verb [0,0]
    - VP -> • Verb NP [0,0]
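The Predictor step might look like the following in Python. This is only a sketch: the `State` tuple, the two-rule grammar fragment, the `POS` set, and the `show` helper are assumptions made so that the slide's S/VP example can be reproduced.

```python
from collections import namedtuple

# Assumed representation: a dotted rule plus the [start, end] span it covers.
State = namedtuple("State", "lhs rhs dot start end")

# Toy grammar fragment (assumed for illustration).
GRAMMAR = {"S": [("VP",)], "VP": [("Verb",), ("Verb", "NP")]}
POS = {"Verb", "Det", "Noun"}   # part-of-speech categories: never predicted

def predictor(state, grammar=GRAMMAR):
    """Return the states predicted from `state`, all spanning [end, end]."""
    if state.dot >= len(state.rhs):
        return []                     # dot at right end: nothing to predict
    nxt = state.rhs[state.dot]
    if nxt in POS:
        return []                     # POS categories are left to the Scanner
    return [State(nxt, rhs, 0, state.end, state.end)
            for rhs in grammar.get(nxt, [])]

def show(s):
    """Render a state as a dotted rule, e.g. 'VP -> • Verb NP [0,0]'."""
    parts = list(s.rhs[:s.dot]) + ["•"] + list(s.rhs[s.dot:])
    return f"{s.lhs} -> {' '.join(parts)} [{s.start},{s.end}]"

for new in predictor(State("S", ("VP",), 0, 0, 0)):
    print(show(new))
# VP -> • Verb [0,0]
# VP -> • Verb NP [0,0]
```

Each predicted state begins and ends where the generating state ends, so it goes into the same chart cell.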
35
Scanner
• Given a state
  - With a non-terminal to right of dot that is a POS category
  - If next word in input matches this POS
  - Create a new state with dot moved past the non-terminal
• E.g., scanner looking at VP -> • Verb NP [0,0]
  - If next word can be a verb, add new state
    - VP -> Verb • NP [0,1]
  - Add this state to the chart entry following the current one
  - NB: Earley uses top-down input to disambiguate POS; only a POS predicted by some state can be added to the chart
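A possible rendering of the Scanner step is shown below. The `State` tuple and the small word-to-POS `LEXICON` are hypothetical; the example state is the one from the slide.

```python
from collections import namedtuple

# Assumed representation: a dotted rule plus the [start, end] span it covers.
State = namedtuple("State", "lhs rhs dot start end")

# Hypothetical lexicon: which POS categories each word can take.
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}

def scanner(state, words, lexicon=LEXICON):
    """If the next input word can be the POS after the dot, return the
    advanced state (destined for chart[end + 1]); otherwise return None."""
    if state.dot >= len(state.rhs) or state.end >= len(words):
        return None
    pos = state.rhs[state.dot]
    word = words[state.end].lower()
    if pos in lexicon.get(word, set()):
        return State(state.lhs, state.rhs, state.dot + 1,
                     state.start, state.end + 1)
    return None

words = ["Book", "that", "flight"]
print(scanner(State("VP", ("Verb", "NP"), 0, 0, 0), words))
```

Because the function is driven by the state's own expectation, "book" is only ever treated as a verb when some state has predicted a Verb, which is the top-down disambiguation noted on the slide.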
36