Basic Parsing with Context-Free Grammars

CS 4705

Julia Hirschberg

Some slides adapted from Kathy McKeown and Dan Jurafsky

Syntactic Parsing

• Declarative formalisms like CFGs and FSAs define the legal strings of a language, but only tell you whether a given string is legal in a particular language
• Parsing algorithms specify how to recognize the strings of a language and assign one (or more) syntactic analyses to each string

S -> NP VP              VP -> V
S -> Aux NP VP          VP -> V PP
S -> VP                 PP -> Prep NP
NP -> Det Nom           N -> old | dog | footsteps | young
NP -> PropN             V -> dog | eat | sleep | bark | meow
Nom -> Adj N            Aux -> does | can
Nom -> N                Prep -> from | to | on | of
Nom -> N Nom            PropN -> Fido | Felix
Nom -> Nom PP           Det -> that | this | a | the
VP -> V NP              Adj -> old | happy | young

“The old dog the footsteps of the young”

[S [NP [Det The] [Nom [N old]]]
   [VP [V dog]
       [NP [Det the]
           [Nom [Nom [N footsteps]]
                [PP [Prep of]
                    [NP [Det the] [Nom [N young]]]]]]]]

How do we create this parse tree?

Parsing is a form of Search

• We search FSAs by
  - Finding the correct path through the automaton
  - Search space defined by structure of FSA
• We search CFGs by
  - Finding the correct parse tree among all possible parse trees
  - Search space defined by the grammar
• Constraints provided by the input sentence and the automaton or grammar

Top Down Parsing

• Builds from the root S node to the leaves
• Expectation-based
• Common top-down search strategy
  - Top-down, left-to-right, with backtracking
  - Try first rule such that LHS is S
  - Next expand all constituents on RHS
  - Iterate until all leaves are POS
  - Backtrack when candidate POS does not match POS of current word in input string
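
The strategy above can be sketched as a small backtracking recognizer over the example grammar. This is only an illustration, not code from the slides: the function name `parse`, the dictionary encoding of the grammar, and the lower-cased lexicon are assumptions. One extra guard is added that the slides do not mention: because the grammar has no ε-rules, every pending symbol needs at least one word, so a branch with more pending symbols than remaining words is abandoned. Without it, the left-recursive rule Nom -> Nom PP would expand forever.

```python
# Example grammar from the slides, encoded as dictionaries (an assumed encoding).
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PropN"]],
    "Nom": [["Adj", "N"], ["N"], ["N", "Nom"], ["Nom", "PP"]],
    "VP": [["V"], ["V", "PP"], ["V", "NP"]],
    "PP": [["Prep", "NP"]],
}
LEXICON = {
    "N": {"old", "dog", "footsteps", "young"},
    "V": {"dog", "eat", "sleep", "bark", "meow"},
    "Aux": {"does", "can"},
    "Prep": {"from", "to", "on", "of"},
    "PropN": {"fido", "felix"},
    "Det": {"that", "this", "a", "the"},
    "Adj": {"old", "happy", "young"},
}


def parse(goals, words):
    """Yield one list of trees (one tree per goal) for each way the
    sequence of symbols `goals` can derive exactly `words`."""
    if not goals:
        if not words:
            yield []
        return
    # No epsilon rules: each pending goal must cover at least one word.
    # This prunes hopeless branches and stops left recursion.
    if len(goals) > len(words):
        return
    first, rest = goals[0], goals[1:]
    if first in LEXICON:
        # Leaf is a POS: it must match the current word, otherwise backtrack.
        if words[0] in LEXICON[first]:
            for trees in parse(rest, words[1:]):
                yield [(first, words[0])] + trees
        return
    # Try each expansion of the leftmost non-terminal in turn.
    for rhs in GRAMMAR[first]:
        for trees in parse(rhs + rest, words):
            children, others = trees[:len(rhs)], trees[len(rhs):]
            yield [(first, children)] + others


sentence = "the old dog the footsteps of the young".split()
for (tree,) in parse(["S"], sentence):
    print(tree)
```

Each failed POS match simply ends a generator branch, which is where backtracking happens: control returns to the most recent rule choice and the next expansion is tried.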

Expanding the Rules

• The old dog the footsteps of the young.
• Where does backtracking happen?
• What are the computational disadvantages?
• What are the advantages?
• What could we do to improve the process?

Bottom Up Parsing

• Parser begins with words of input and builds up trees, applying grammar rules whose RHS matches

Det  N    V    Det  N          Prep  Det  N
The  old  dog  the  footsteps  of    the  young

Det  Adj  N    Det  N          Prep  Det  N
The  old  dog  the  footsteps  of    the  young

Parse continues until an S root node is reached or no further node expansion is possible
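
The first bottom-up step, assigning every possible POS to each word, can be sketched as follows. This is a hedged illustration: the name `tag_sequences` and the lower-cased lexicon dictionary are assumptions, and a real bottom-up parser would go on to apply grammar rules to each candidate sequence rather than stop here.

```python
from itertools import product

# Lexicon from the example grammar (words lower-cased; an assumed encoding).
LEXICON = {
    "N": {"old", "dog", "footsteps", "young"},
    "V": {"dog", "eat", "sleep", "bark", "meow"},
    "Aux": {"does", "can"},
    "Prep": {"from", "to", "on", "of"},
    "PropN": {"fido", "felix"},
    "Det": {"that", "this", "a", "the"},
    "Adj": {"old", "happy", "young"},
}


def tag_sequences(words):
    """Return every POS sequence the lexicon allows for `words`."""
    options = [
        [pos for pos, vocab in LEXICON.items() if word.lower() in vocab]
        for word in words
    ]
    # Cartesian product: one choice of POS per word.
    return [list(seq) for seq in product(*options)]


sequences = tag_sequences("The old dog the footsteps of the young".split())
for seq in sequences:
    print(" ".join(seq))
```

Three of the words are ambiguous (old, dog, young), so the sentence has 2 x 2 x 2 = 8 candidate sequences, including the two shown above. Only some of them can be built up to an S, which is one source of the wasted work in pure bottom-up parsing.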

Bottom Up Parsing

• When does disambiguation occur?
• What are the computational advantages and disadvantages?
• What could we do to make this process more efficient?

Issues to Address

• Ambiguity
  - POS
  - Attachment
    • PP…
    • Coordination: old dogs and cats
  - Overgenerating useless hypotheses
  - Regenerating good hypotheses

Dynamic Programming

• Fill in tables with solutions to subproblems
• For parsing:
  - Store possible subtrees for each substring as they are discovered in the input
  - Ambiguous strings are given multiple entries
  - Table look-up to come up with final parse(s)
• Many parsers take advantage of this approach

Review Minimal Edit Distance

• Simple example of DP: find the minimal ‘distance’ between 2 strings
  - Minimal number of operations (insert, delete, substitute) needed to transform one string into another
  - Levenshtein distances (subst=1 or 2)
  - Key idea: minimal path between substrings is on the minimal path between the beginning and end of the 2 strings
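
A minimal sketch of the table-filling computation, assuming insertions and deletions cost 1 and the substitution cost is a parameter (1 or 2, as on the slide). The function and parameter names are illustrative.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum edit distance between two strings by dynamic programming.

    dist[i][j] holds the cheapest way to turn source[:i] into target[:j].
    Insert and delete cost 1; substitution costs `sub_cost`.
    """
    n, m = len(source), len(target)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i            # delete every character of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j            # insert every character of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                            # deletion
                dist[i][j - 1] + 1,                            # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # substitution or match
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution", sub_cost=1))  # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```

Each cell is computed once from its three neighbors, which is the same reuse of subproblem solutions that the parsing tables below rely on.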

Example of MED Calculation

DP for Parsing

• Table cells represent state of parse of input up to this point
• Can be calculated from neighboring state(s)
• Only need to parse each substring once for each possible analysis into constituents

Parsers Using DP

• CKY Parsing Algorithm
  - Bottom-up
  - Grammar must be in Chomsky Normal Form
  - The parse tree might not be consistent with linguistic theory
• Earley Parsing Algorithm
  - Top-down
  - Expectations about constituents are confirmed by input
  - A POS tag for a word that is not predicted is never added
• Chart Parser

Cocke-Kasami-Younger Algorithm

• Convert grammar to Chomsky Normal Form
  - Every CFG has a weakly equivalent CNF grammar
  - A -> B C (non-terminals)
  - A -> w (terminal)
  - Basic ideas:
    • Keep rules conforming to CNF
    • Introduce dummy non-terminals for rules that mix terminals and non-terminals (e.g., A -> B w becomes A -> B B', B' -> w)
    • Rewrite RHS of unit productions with RHS of all non-unit productions they lead to (e.g., A -> B, B -> w becomes A -> w)
    • For RHS longer than 2 non-terminals, replace leftmost pairs of non-terminals with a new non-terminal and add a new production rule (e.g., A -> B C D becomes A -> Z D, Z -> B C)
    • For ε-productions, find all occurrences of LHS in 2-variable RHSs and create new rule without the LHS (e.g., C -> A B, A -> ε becomes C -> B)
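
One of the conversion steps, splitting right-hand sides longer than two symbols, can be sketched as below. This covers only that single step, not the full CNF conversion. The rule encoding as `(lhs, rhs)` pairs, the function name `binarize`, and the generated names `X1`, `X2`, ... are assumptions, and the sketch assumes those names do not already occur in the grammar.

```python
def binarize(rules):
    """Replace leftmost pairs in long right-hand sides with new non-terminals.

    `rules` is a list of (lhs, rhs) pairs where rhs is a tuple of symbols.
    A -> B C D becomes X1 -> B C and A -> X1 D.
    """
    result = []
    counter = 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            counter += 1
            new_symbol = f"X{counter}"          # assumed to be unused elsewhere
            result.append((new_symbol, (rhs[0], rhs[1])))
            rhs = [new_symbol] + rhs[2:]        # leftmost pair replaced
        result.append((lhs, tuple(rhs)))
    return result


for lhs, rhs in binarize([("S", ("Aux", "NP", "VP")), ("VP", ("V", "NP"))]):
    print(lhs, "->", " ".join(rhs))
```

Rules that are already binary or shorter pass through unchanged, which matches the "keep rules conforming to CNF" idea for this step.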

A CFG

[Figure 13.8]

CYK in Action

• Each non-terminal above POS level has 2 daughters
  - Encode entire parse tree in an (N+1) x (N+1) table
  - Each cell [i,j] contains all non-terminals that span positions i through j between input words
  - Cell [0,N] represents all input
  - For each [i,j] and each k such that i < k < j, [i,k] is to the left and [k,j] is below in the table
  - Diagonal contains POS of each input word
  - Fill in table from diagonal on up
  - For any cell [i,j], cells (constituents) contributing to [i,j] are to the left and below, already filled in
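
The table-filling scheme above can be sketched as a small recognizer. The tiny CNF grammar here is hand-written for illustration and is an assumption, not the grammar from the figure: for example, S -> V NP stands in for S -> VP, VP -> V NP after unit productions are removed. The names `cky`, `LEXICAL`, and `BINARY` are also illustrative.

```python
# Hypothetical CNF fragment: word -> POS/non-terminals, and
# (B, C) -> set of A for every binary rule A -> B C.
LEXICAL = {
    "book": {"V", "Nom"},
    "that": {"Det"},
    "flight": {"Nom"},
}
BINARY = {
    ("V", "NP"): {"S", "VP"},
    ("Det", "Nom"): {"NP"},
}


def cky(words, lexical=LEXICAL, binary=BINARY):
    """Fill the CKY table; table[i][j] is the set of non-terminals
    spanning positions i..j between input words."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):
        # Diagonal cell: categories of the j-th word.
        table[j - 1][j] = set(lexical.get(words[j - 1], ()))
        # Work up the column; every [i,k] and [k,j] needed is already filled.
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for left in table[i][k]:
                    for right in table[k][j]:
                        table[i][j] |= binary.get((left, right), set())
    return table


words = "book that flight".split()
table = cky(words)
print("S" in table[0][len(words)])
```

The input is accepted when S appears in cell [0,N]. Storing backpointers alongside each non-terminal would turn this recognizer into a parser.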

[Figure 13.8]

CYK Parse Table

CYK Algorithm

Filling in [0,N]: Adding X2

Filling the Final Column (1)

Filling the Final Column (2)

Earley Algorithm

• Top-down parsing algorithm using DP
• Allows arbitrary CFGs: closer to linguistics
• Fills a chart of length N+1 in a single sweep over input of N words
  - Chart entries represent state of parse at each word position
    • Completed constituents and their locations
    • In-progress constituents
    • Predicted constituents

Parser States

• The table entries are called states and are represented with dotted rules

S -> • VP               A VP is predicted
NP -> Det • Nominal     An NP is in progress
VP -> V NP •            A VP has been found
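
One possible way to represent such a state in code is sketched below. The class and field names are assumptions, and the spans used in the example are illustrative values rather than ones given on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """A dotted rule together with the input span it covers so far."""
    lhs: str        # left-hand side of the rule
    rhs: tuple      # right-hand side symbols
    dot: int        # number of rhs symbols already recognized
    start: int      # position where the constituent begins
    end: int        # position the dot has reached

    def is_complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """Symbol just right of the dot, or None if the state is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} -> {' '.join(symbols)} [{self.start},{self.end}]"


print(State("S", ("VP",), 0, 0, 0))              # predicted
print(State("NP", ("Det", "Nominal"), 1, 1, 2))  # in progress
print(State("VP", ("V", "NP"), 2, 0, 3))         # found
```

Making the class frozen keeps states hashable, so duplicates can be detected cheaply when they are added to a chart entry.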

CFG for Fragment of English

S -> NP VP              VP -> V
S -> Aux NP VP          PP -> Prep NP
S -> VP                 N -> book | flight | meal | money
NP -> Det Nom           V -> book | include | prefer
NP -> PropN             Aux -> does
Nom -> N Nom            Prep -> from | to | on
Nom -> N                PropN -> Houston | TWA
Nom -> Nom PP           Det -> that | this | a | the
VP -> V NP

Some Parse States for Book that flight

[Figure: chart diagram of parse states labeled S8 through S13]

Filling in the Chart

• March through chart left-to-right
• At each step, apply 1 of 3 operators
  - Predictor
    • Create new states representing top-down expectations
  - Scanner
    • Match word predictions (rule with POS following dot) to words in input
  - Completer
    • When a state is complete, see what rules were looking for that complete constituent

Top Level Earley

Predictor

• Given a state
  - With a non-terminal to right of dot (not a part-of-speech category)
  - Create a new state for each expansion of the non-terminal
  - Put predicted states in same chart cell as generating state, beginning and ending where generating state ends
  - So predictor looking at
    • S -> • VP [0,0]
  - results in
    • VP -> • Verb [0,0]
    • VP -> • Verb NP [0,0]

Scanner

• Given a state
  - With a non-terminal to right of dot that is a POS category
  - If next word in input matches this POS
  - Create a new state with dot moved past the non-terminal
    • E.g., scanner looking at VP -> • Verb NP [0,0]
  - If next word can be a verb, add new state
    • VP -> Verb • NP [0,1]
  - Add this state to chart entry following current one
  - NB: Earley uses top-down input to disambiguate POS: only POS predicted by some state can be added to chart

Completer

• Given a state
  - Whose dot has reached right end of rule
  - Parser has discovered a constituent over some span of input
  - Find and advance all previous states that are ‘looking for’ this category
  - Copy state, move dot, insert in current chart entry
• E.g., if processing
  - NP -> Det Nominal • [1,3] and if state expecting an NP like VP -> Verb • NP [0,1] in chart
• Add
  - VP -> Verb NP • [0,3] to same cell of chart

Reaching a Final State

• Find an S state in the last chart entry that spans input from 0 to N and is complete
• Declare victory:
  - S -> α • [0,N]

Converting from Recognizer to Parser

• Augment the “Completer” to include pointer to each previous (now completed) state
• Read off all the backpointers from every complete S

Gist of Earley Parsing

1. Predict all the states you can as soon as you can
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the last (N+1th) chart entry, Chart[N], to see if you have a winner
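
The loop above, with the Predictor, Scanner, and Completer operators, can be sketched as a compact recognizer over the fragment-of-English grammar. This is a simplified illustration under stated assumptions, not the textbook pseudocode: the names `Item`, `earley_chart`, `recognize`, and the dummy start symbol `GAMMA` are invented here, the lexicon is lower-cased, and the grammar is assumed to have no ε-rules. As on the Scanner slide, the scanner advances the dot directly over a POS category.

```python
from collections import namedtuple

# A dotted rule plus its start position; its end position is the index
# of the chart entry that holds it.
Item = namedtuple("Item", "lhs rhs dot start")

GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom"), ("PropN",)],
    "Nom": [("N", "Nom"), ("N",), ("Nom", "PP")],
    "VP": [("V",), ("V", "NP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
    "Det": {"that", "this", "a", "the"},
}


def earley_chart(words):
    """Fill a chart of N+1 entries in one left-to-right sweep."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(item, k):
        if item not in chart[k]:        # never store the same state twice
            chart[k].append(item)

    add(Item("GAMMA", ("S",), 0, 0), 0)  # dummy start state (assumed name)
    for k in range(n + 1):
        for item in chart[k]:            # chart[k] grows while we iterate
            if item.dot < len(item.rhs):
                nxt = item.rhs[item.dot]
                if nxt in GRAMMAR:
                    # Predictor: expand the non-terminal right of the dot.
                    for rhs in GRAMMAR[nxt]:
                        add(Item(nxt, rhs, 0, k), k)
                elif k < n and words[k] in LEXICON[nxt]:
                    # Scanner: the next word can have the predicted POS.
                    add(item._replace(dot=item.dot + 1), k + 1)
            else:
                # Completer: advance every state that was waiting for item.lhs.
                for prev in chart[item.start]:
                    if prev.dot < len(prev.rhs) and prev.rhs[prev.dot] == item.lhs:
                        add(prev._replace(dot=prev.dot + 1), k)
    return chart


def recognize(words):
    """True if the last chart entry holds a complete S starting at 0."""
    last = earley_chart(words)[len(words)]
    return any(i.lhs == "S" and i.dot == len(i.rhs) and i.start == 0 for i in last)


print(recognize("book that flight".split()))
```

Because predictions are only made for categories some state is waiting for, "book" is scanned as a V here and never as an N at position 0. Recording which completed states advanced each item would add the backpointers needed to read off parse trees.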

Example

• Book that flight
• Goal: Find a completed S from 0 to 3
• Chart[0] shows Predictor operations
• Chart[1] S12 shows Scanner
• Chart[3] shows Completer stage

[Figure 13.14]

[Figure 13.14, continued]

Final Parse States

Chart Parsing

• CKY and Earley are deterministic: given an input, all actions are taken in a predetermined order
• Chart Parsing allows for flexibility of events via a separate policy that determines the order of an agenda of states
  - Policy determines order in which states are created and predictions made
  - Fundamental rule: if chart includes 2 contiguous states such that one provides a constituent the other needs, a new state spanning the two states is created with the new information
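
The fundamental rule can be sketched as a single function over two chart states. The `Edge` tuple and the function name are assumptions made for illustration, and a full chart parser would also need the agenda and ordering policy, which are left out here.

```python
from collections import namedtuple

# A dotted rule with the span it covers (an assumed representation).
Edge = namedtuple("Edge", "lhs rhs dot start end")


def fundamental_rule(active, passive):
    """Combine an incomplete edge with a contiguous complete edge.

    Returns the new edge spanning both, or None if the rule does not apply.
    """
    contiguous = active.end == passive.start
    passive_complete = passive.dot == len(passive.rhs)
    active_needs_it = (
        active.dot < len(active.rhs) and active.rhs[active.dot] == passive.lhs
    )
    if contiguous and passive_complete and active_needs_it:
        return Edge(active.lhs, active.rhs, active.dot + 1, active.start, passive.end)
    return None


verb_phrase = Edge("VP", ("Verb", "NP"), 1, 0, 1)      # VP -> Verb • NP [0,1]
noun_phrase = Edge("NP", ("Det", "Nominal"), 2, 1, 3)  # NP -> Det Nominal • [1,3]
print(fundamental_rule(verb_phrase, noun_phrase))
```

With these two edges the result is VP -> Verb NP • [0,3], the same step the Earley Completer performs. The difference in chart parsing is that the agenda policy, not a fixed sweep, decides when this combination is tried.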

Summing Up

• Parsing as search: what search strategies to use?
  - Top down
  - Bottom up
  - How to combine?
• How to parse as little as possible
  - Dynamic Programming
  - Different policies for ordering states to be processed
  - Next: Shallow Parsing and Review

46

  • Basic Parsing with Context-Free Grammars
  • Syntactic Parsing
  • ldquoThe old dog the footsteps of the youngrdquo
  • How do we create this parse tree
  • Parsing is a form of Search
  • Top Down Parsing
  • ldquoThe old dog the footsteps of the youngrdquo (2)
  • Expanding the Rules
  • Bottom Up Parsing
  • ldquoThe old dog the footsteps of the youngrdquo (3)
  • Bottom Up Parsing (2)
  • Issues to Address
  • Dynamic Programming
  • Review Minimal Edit Distance
  • Example of MED Calculation
  • DP for Parsing
  • Parsers Using DP
  • Cocke-Kasami-Younger Algorithm
  • A CFG
  • Figure 138
  • CYK in Action
  • Slide 22
  • Figure 138 (2)
  • CYK Parse Table
  • CYK Algorithm
  • Filling in [0N] Adding X2
  • Filling the Final Column (1)
  • Filling the Final Column (2)
  • Earley Algorithm
  • Parser States
  • CFG for Fragment of English
  • Some Parse States for Book that flight
  • Filling in the Chart
  • Top Level Earley
  • Predictor
  • Scanner
  • Completer
  • Reaching a Final State
  • Converting from Recognizer to Parser
  • Gist of Earley Parsing
  • Example
  • Figure 1314
  • Figure 1314 continued
  • Final Parse States
  • Chart Parsing
  • Summing Up

Syntactic Parsing

bull Declarative formalisms like CFGs FSAs define the legal strings of a language -- but only tell you whether a given string is legal in a particular language

bull Parsing algorithms specify how to recognize the strings of a language and assign one (or more) syntactic analyses to each string

2

S NP VP VP V

S Aux NP VP VP -gt V PP

S -gt VP PP -gt Prep NP

NP Det Nom N old | dog | footsteps | young

NP PropN V dog | eat | sleep | bark | meow

Nom -gt Adj N Aux does | can

Nom N Prep from | to | on | of

Nom N Nom PropN Fido | Felix

Nom Nom PP Det that | this | a | the

VP V NP Adj -gt old | happy| young

ldquoThe old dog the footsteps of the youngrdquo

S

NP VP

NPV

DETNOM

N PP

DET NOM

N

The old dog the

footstepsof the young

How do we create this parse tree

Parsing is a form of Search

bull We search FSAs byndash Finding the correct path through the automatonndash Search space defined by structure of FSA

bull We search CFGs byndash Finding the correct parse tree among all possible

parse treesndash Search space defined by the grammar

bull Constraints provided by the input sentence and the automaton or grammar

5

Top Down Parsing

bull Builds from the root S node to the leavesbull Expectation-basedbull Common top-down search strategy

ndash Top-down left-to-right with backtrackingndash Try first rule st LHS is Sndash Next expand all constituents on RHSndash Iterate until all leaves are POSndash Backtrack when candidate POS does not match POS of

current word in input string

6

S NP VP VP V

S Aux NP VP VP -gt V PP

S -gt VP PP -gt Prep NP

NP Det Nom N old | dog | footsteps | young

NP PropN V dog | eat | sleep | bark | meow

Nom -gt Adj N Aux does | can

Nom N Prep from | to | on | of

Nom N Nom PropN Fido | Felix

Nom Nom PP Det that | this | a | the

VP V NP Adj -gt old | happy| young

ldquoThe old dog the footsteps of the youngrdquo

Expanding the Rules

bull The old dog the footsteps of the youngbull Where does backtracking happen bull What are the computational disadvantagesbull What are the advantagesbull What could we do to improve the process

8

Bottom Up Parsing

bull Parser begins with words of input and builds up trees applying grammar rules whose RHS matches

Det N V Det N Prep Det N

The old dog the footsteps of the young

Det Adj N Det N Prep Det N

The old dog the footsteps of the young

Parse continues until an S root node reached or no further node expansion possible

9

S NP VP VP V

S Aux NP VP VP -gt V PP

S -gt VP PP -gt Prep NP

NP Det Nom N old | dog | footsteps | young

NP PropN V dog | eat | sleep | bark | meow

Nom -gt Adj N Aux does | can

Nom N Prep from | to | on | of

Nom N Nom PropN Fido | Felix

Nom Nom PP Det that | this | a | the

VP V NP Adj -gt old | happy| young

ldquoThe old dog the footsteps of the youngrdquo

Bottom Up Parsing

bull When does disambiguation occurbull What are the computational advantages and

disadvantagesbull What could we do to make this process more

efficient

11

Issues to Address

bull Ambiguityndash POSndash Attachment

bull PPhellipbull Coordination old dogs and cats

ndash Overgenerating useless hypothesesndash Regenerating good hypotheses

Dynamic Programming

bull Fill in tables with solutions to subproblemsbull For parsing

ndash Store possible subtrees for each substring as they are discovered in the input

ndash Ambiguous strings are given multiple entriesndash Table look-up to come up with final parse(s)

bull Many parsers take advantage of this approach

Review Minimal Edit Distance

bull Simple example of DP find the minimal lsquodistancersquo between 2 stringsndash Minimal number of operations (insert delete

substitute) needed to transform one string into another

ndash Levenstein distances (subst=1 or 2)ndash Key idea minimal path between substrings is on

the minimal path between the beginning and end of the 2 strings

Example of MED Calculation

DP for Parsing

bull Table cells represented state of parse of input up to this point

bull Can be calculated from neighboring state(s)bull Only need to parse each substring once for each

possible analysis into constituents

Parsers Using DP

bull CKY Parsing Algorithmndash Bottom-upndash Grammar must be in Chomsky Normal Formndash The parse tree might not be consistent with linguistic

theorybull Earley Parsing Algorithm

ndash Top-downndash Expectations about constituents are confirmed by inputndash A POS tag for a word that is not predicted is never added

bull Chart Parser

17

Cocke-Kasami-Younger Algorithm

bull Convert grammar to Chomsky Normal Formndash Every CFG has a weakly equivalent CNF grammarndash A B C (non-terminals)ndash A w (terminal)ndash Basic ideas

bull Keep rules conforming to CNFbull Introduce dummy non-terminals for rules that mix terminal and non-

terminals (eg A Bw becomes A BBrsquo Brsquo w)bull Rewrite RHS of unit productions with RHS of all non-unit productions

they lead to (eg A B B w becomes A w)bull For RHS longer than 2 non-terminals replace leftmost pairs of non-

terminals with a new non-terminal and add a new production rule (eg A BCD becomes A ZD Z BC)

bull For ε-productions find all occurences of LHS in 2-variable RHSs and create new rule without the LHS (eg C ABA ε becomes CB)

A CFG

Figure 138

CYK in Action

bull Each non-terminal above POS level has 2 daughtersndash Encode entire parse tree in N+1 x N+1 tablendash Each cell [ij] contains all non-terminals that span

positions [i-j] betw input wordsndash Cell [0N] represents all inputndash For each [ij] st iltkltj [ik] is to left and [kj] is

below in tablendash Diagonal contains POS of each input wordndash Fill in table from diagonal on up

ndash For any cell [ij] cells (constituents) contributing to [ij] are to left and below already filled in

Figure 138

CYK Parse Table

X2

CYK Algorithm

Filling in [0N] Adding X2[0n]

Filling the Final Column (1)

Filling the Final Column (2)

X2

Earley Algorithm

bull Top-down parsing algorithm using DPbull Allows arbitrary CFGs closer to linguisticsbull Fills a chart of length N+1 in a single sweep over

input of N wordsndash Chart entries represent state of parse at each word

positionbull Completed constituents and their locationsbull In-progress constituentsbull Predicted constituents

29

Parser States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

30

CFG for Fragment of EnglishS NP VP VP V

S Aux NP VP PP -gt Prep NP

S VP N book | flight | meal | money

NP Det Nom V book | include | prefer

NP PropN Aux does

Nom N Nom Prep from | to | on

Nom N PropN Houston | TWA

Nom Nom PP Det that | this | a | the

VP V NP

S8

S9

S10

S11

S13

S12

S8

S9

S8

Some Parse States for Book that flight

Filling in the Chart

bull March through chart left-to-rightbull At each step apply 1 of 3 operators

ndash Predictorbull Create new states representing top-down expectations

ndash Scannerbull Match word predictions (rule with POS following dot)

to words in input

ndash Completerbull When a state is complete see what rules were looking

for that complete constituent

33

Top Level Earley

Predictor

bull Given a statendash With a non-terminal to right of dot (not a part-of-speech

category)ndash Create a new state for each expansion of the non-terminalndash Put predicted states in same chart cell as generating state

beginning and ending where generating state ends ndash So predictor looking at

bull S -gt VP [00] ndash results in

bull VP -gt Verb [00]bull VP -gt Verb NP [00]

35

Scanner

bull Given a statendash With a non-terminal to right of dot that is a POS categoryndash If next word in input matches this POSndash Create a new state with dot moved past the non-terminal

bull Eg scanner looking at VP -gt Verb NP [00]

ndash If next word can be a verb add new statebull VP -gt Verb NP [01]

ndash Add this state to chart entry following current onendash NB Earley uses top-down input to disambiguate POS --

only POS predicted by some state can be added to chart

36

Completer

bull Given a state ndash Whose dot has reached right end of rulendash Parser has discovered a constituent over some span of inputndash Find and advance all previous states that are lsquolooking forrsquo

this categoryndash Copy state move dot insert in current chart entry

bull Eg if processingndash NP -gt Det Nominal [13] and if state expecting an NP like

VP -gt Verb NP [01] in chartbull Add

ndash VP -gt Verb NP [03] to same cell of chart

37

Reaching a Final State

bull Find an S state in chart that spans input from 0 to N+1 and is complete

bull Declare victoryndash S ndashgt α [0N+1]

38

Converting from Recognizer to Parser

bull Augment the ldquoCompleterrdquo to include pointer to each previous (now completed) state

bull Read off all the backpointers from every complete S

39

Gist of Earley Parsing

1 Predict all the states you can as soon as you can

2 Read a word1 Extend states based on matches

2 Add new predictions

3 Go to 2

3 Look at N+1 to see if you have a winner

40

Example

bull Book that flightbull Goal Find a completed S from 0 to 3bull Chart[0] shows Predictor operationsbull Chart[1] S12 shows Scannerbull Chart[3] shows Completer stage

41

Figure 1314

Figure 1314 continued

Final Parse States

Chart Parsing

bull CKY and Earley are deterministic given an input all actions are taken is predetermined order

bull Chart Parsing allows for flexibility of events via separate policy that determines order of an agenda of statesndash Policy determines order in which states are created

and predictions madendash Fundamental rule if chart includes 2 contiguous

states st one provides a constituent the other needs a new state spanning the two states is created with the new information

Summing Up

bull Parsing as search what search strategies to usendash Top downndash Bottom upndash How to combine

bull How to parse as little as possiblendash Dynamic Programmingndash Different policies for ordering states to be

processedndash Next Shallow Parsing and Review

46

  • Basic Parsing with Context-Free Grammars
  • Syntactic Parsing
  • ldquoThe old dog the footsteps of the youngrdquo
  • How do we create this parse tree
  • Parsing is a form of Search
  • Top Down Parsing
  • ldquoThe old dog the footsteps of the youngrdquo (2)
  • Expanding the Rules
  • Bottom Up Parsing
  • ldquoThe old dog the footsteps of the youngrdquo (3)
  • Bottom Up Parsing (2)
  • Issues to Address
  • Dynamic Programming
  • Review Minimal Edit Distance
  • Example of MED Calculation
  • DP for Parsing
  • Parsers Using DP
  • Cocke-Kasami-Younger Algorithm
  • A CFG
  • Figure 138
  • CYK in Action
  • Slide 22
  • Figure 138 (2)
  • CYK Parse Table
  • CYK Algorithm
  • Filling in [0N] Adding X2
  • Filling the Final Column (1)
  • Filling the Final Column (2)
  • Earley Algorithm
  • Parser States
  • CFG for Fragment of English
  • Some Parse States for Book that flight
  • Filling in the Chart
  • Top Level Earley
  • Predictor
  • Scanner
  • Completer
  • Reaching a Final State
  • Converting from Recognizer to Parser
  • Gist of Earley Parsing
  • Example
  • Figure 1314
  • Figure 1314 continued
  • Final Parse States
  • Chart Parsing
  • Summing Up

S NP VP VP V

S Aux NP VP VP -gt V PP

S -gt VP PP -gt Prep NP

NP Det Nom N old | dog | footsteps | young

NP PropN V dog | eat | sleep | bark | meow

Nom -gt Adj N Aux does | can

Nom N Prep from | to | on | of

Nom N Nom PropN Fido | Felix

Nom Nom PP Det that | this | a | the

VP V NP Adj -gt old | happy| young

ldquoThe old dog the footsteps of the youngrdquo

S

NP VP

NPV

DETNOM

N PP

DET NOM

N

The old dog the

footstepsof the young

How do we create this parse tree

Parsing is a form of Search

bull We search FSAs byndash Finding the correct path through the automatonndash Search space defined by structure of FSA

bull We search CFGs byndash Finding the correct parse tree among all possible

parse treesndash Search space defined by the grammar

bull Constraints provided by the input sentence and the automaton or grammar

5

Top Down Parsing

bull Builds from the root S node to the leavesbull Expectation-basedbull Common top-down search strategy

ndash Top-down left-to-right with backtrackingndash Try first rule st LHS is Sndash Next expand all constituents on RHSndash Iterate until all leaves are POSndash Backtrack when candidate POS does not match POS of

current word in input string

6

S NP VP VP V

S Aux NP VP VP -gt V PP

S -gt VP PP -gt Prep NP

NP Det Nom N old | dog | footsteps | young

NP PropN V dog | eat | sleep | bark | meow

Nom -gt Adj N Aux does | can

Nom N Prep from | to | on | of

Nom N Nom PropN Fido | Felix

Nom Nom PP Det that | this | a | the

VP V NP Adj -gt old | happy| young

ldquoThe old dog the footsteps of the youngrdquo

Expanding the Rules

bull The old dog the footsteps of the youngbull Where does backtracking happen bull What are the computational disadvantagesbull What are the advantagesbull What could we do to improve the process

8

Bottom Up Parsing

bull Parser begins with words of input and builds up trees applying grammar rules whose RHS matches

Det N V Det N Prep Det N

The old dog the footsteps of the young

Det Adj N Det N Prep Det N

The old dog the footsteps of the young

Parse continues until an S root node reached or no further node expansion possible

9

S NP VP VP V

S Aux NP VP VP -gt V PP

S -gt VP PP -gt Prep NP

NP Det Nom N old | dog | footsteps | young

NP PropN V dog | eat | sleep | bark | meow

Nom -gt Adj N Aux does | can

Nom N Prep from | to | on | of

Nom N Nom PropN Fido | Felix

Nom Nom PP Det that | this | a | the

VP V NP Adj -gt old | happy| young

ldquoThe old dog the footsteps of the youngrdquo

Bottom Up Parsing

bull When does disambiguation occurbull What are the computational advantages and

disadvantagesbull What could we do to make this process more

efficient

11

Issues to Address

bull Ambiguityndash POSndash Attachment

bull PPhellipbull Coordination old dogs and cats

ndash Overgenerating useless hypothesesndash Regenerating good hypotheses

Dynamic Programming

bull Fill in tables with solutions to subproblemsbull For parsing

ndash Store possible subtrees for each substring as they are discovered in the input

ndash Ambiguous strings are given multiple entriesndash Table look-up to come up with final parse(s)

bull Many parsers take advantage of this approach

Review Minimal Edit Distance

bull Simple example of DP find the minimal lsquodistancersquo between 2 stringsndash Minimal number of operations (insert delete

substitute) needed to transform one string into another

ndash Levenstein distances (subst=1 or 2)ndash Key idea minimal path between substrings is on

the minimal path between the beginning and end of the 2 strings

Example of MED Calculation

DP for Parsing

bull Table cells represented state of parse of input up to this point

bull Can be calculated from neighboring state(s)bull Only need to parse each substring once for each

possible analysis into constituents

Parsers Using DP

bull CKY Parsing Algorithmndash Bottom-upndash Grammar must be in Chomsky Normal Formndash The parse tree might not be consistent with linguistic

theorybull Earley Parsing Algorithm

ndash Top-downndash Expectations about constituents are confirmed by inputndash A POS tag for a word that is not predicted is never added

bull Chart Parser

17

Cocke-Kasami-Younger Algorithm

bull Convert grammar to Chomsky Normal Formndash Every CFG has a weakly equivalent CNF grammarndash A B C (non-terminals)ndash A w (terminal)ndash Basic ideas

bull Keep rules conforming to CNFbull Introduce dummy non-terminals for rules that mix terminal and non-

terminals (eg A Bw becomes A BBrsquo Brsquo w)bull Rewrite RHS of unit productions with RHS of all non-unit productions

they lead to (eg A B B w becomes A w)bull For RHS longer than 2 non-terminals replace leftmost pairs of non-

terminals with a new non-terminal and add a new production rule (eg A BCD becomes A ZD Z BC)

bull For ε-productions find all occurences of LHS in 2-variable RHSs and create new rule without the LHS (eg C ABA ε becomes CB)

A CFG

Figure 138

CYK in Action

bull Each non-terminal above POS level has 2 daughtersndash Encode entire parse tree in N+1 x N+1 tablendash Each cell [ij] contains all non-terminals that span

positions [i-j] betw input wordsndash Cell [0N] represents all inputndash For each [ij] st iltkltj [ik] is to left and [kj] is

below in tablendash Diagonal contains POS of each input wordndash Fill in table from diagonal on up

ndash For any cell [ij] cells (constituents) contributing to [ij] are to left and below already filled in

Figure 138

CYK Parse Table

X2

CYK Algorithm

Filling in [0N] Adding X2[0n]

Filling the Final Column (1)

Filling the Final Column (2)

X2

Earley Algorithm

bull Top-down parsing algorithm using DPbull Allows arbitrary CFGs closer to linguisticsbull Fills a chart of length N+1 in a single sweep over

input of N wordsndash Chart entries represent state of parse at each word

positionbull Completed constituents and their locationsbull In-progress constituentsbull Predicted constituents

29

Parser States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

30

CFG for Fragment of EnglishS NP VP VP V

S Aux NP VP PP -gt Prep NP

S VP N book | flight | meal | money

NP Det Nom V book | include | prefer

NP PropN Aux does

Nom N Nom Prep from | to | on

Nom N PropN Houston | TWA

Nom Nom PP Det that | this | a | the

VP V NP

S8

S9

S10

S11

S13

S12

S8

S9

S8

Some Parse States for Book that flight

Filling in the Chart

bull March through chart left-to-rightbull At each step apply 1 of 3 operators

ndash Predictorbull Create new states representing top-down expectations

ndash Scannerbull Match word predictions (rule with POS following dot)

to words in input

ndash Completerbull When a state is complete see what rules were looking

for that complete constituent

33

Top Level Earley

Predictor

bull Given a statendash With a non-terminal to right of dot (not a part-of-speech

category)ndash Create a new state for each expansion of the non-terminalndash Put predicted states in same chart cell as generating state

beginning and ending where generating state ends ndash So predictor looking at

bull S -gt VP [00] ndash results in

bull VP -gt Verb [00]bull VP -gt Verb NP [00]

35

Scanner

bull Given a statendash With a non-terminal to right of dot that is a POS categoryndash If next word in input matches this POSndash Create a new state with dot moved past the non-terminal

bull Eg scanner looking at VP -gt Verb NP [00]

ndash If next word can be a verb add new statebull VP -gt Verb NP [01]

ndash Add this state to chart entry following current onendash NB Earley uses top-down input to disambiguate POS --

only POS predicted by some state can be added to chart

36

Completer

bull Given a state ndash Whose dot has reached right end of rulendash Parser has discovered a constituent over some span of inputndash Find and advance all previous states that are lsquolooking forrsquo

this categoryndash Copy state move dot insert in current chart entry

bull Eg if processingndash NP -gt Det Nominal [13] and if state expecting an NP like

VP -gt Verb NP [01] in chartbull Add

ndash VP -gt Verb NP [03] to same cell of chart

37

Reaching a Final State

bull Find an S state in chart that spans input from 0 to N+1 and is complete

bull Declare victoryndash S ndashgt α [0N+1]

38

Converting from Recognizer to Parser

• Augment the "Completer" to include a pointer to each previous (now completed) state
• Read off all the backpointers from every complete S
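One way to picture the backpointers is to let each state carry the children that advanced its dot. The sketch below is an assumption-laden illustration, not the slides' implementation: the `children` field and the helpers `scan`, `complete`, and `tree` are hypothetical, and the states are built by hand rather than by a full parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    lhs: str
    rhs: tuple
    dot: int
    start: int
    end: int
    # Backpointers: one entry per symbol left of the dot, either a scanned
    # word (str) or the completed State that supplied the constituent.
    children: tuple = ()


def scan(state, word):
    """Advance the dot over a POS and remember the word that matched it."""
    return State(state.lhs, state.rhs, state.dot + 1, state.start,
                 state.end + 1, state.children + (word,))


def complete(waiting, completed):
    """Advance the dot over a constituent and keep a pointer to the completed state."""
    return State(waiting.lhs, waiting.rhs, waiting.dot + 1, waiting.start,
                 completed.end, waiting.children + (completed,))


def tree(state):
    """Read the parse tree off the backpointers, as nested tuples."""
    kids = [tree(child) if isinstance(child, State) else (symbol, child)
            for symbol, child in zip(state.rhs, state.children)]
    return (state.lhs, *kids)


# Hand-built derivation of "book that flight".
vp = scan(State("VP", ("Verb", "NP"), 0, 0, 0), "book")
np = scan(State("NP", ("Det", "Nominal"), 0, 1, 1), "that")
nominal = scan(State("Nominal", ("Noun",), 0, 2, 2), "flight")
np = complete(np, nominal)
vp = complete(vp, np)
s = complete(State("S", ("VP",), 0, 0, 0), vp)
parse = tree(s)
```

Because `children` takes part in equality here, two analyses of the same span stay distinct states. A fuller implementation would more likely store a list of alternative backpointers per state.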

Gist of Earley Parsing

1. Predict all the states you can, as soon as you can
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the last chart entry (the (N+1)th, chart[N]) to see if you have a winner
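Putting the three operators together gives a compact recognizer. This is a sketch under stated assumptions: the toy `GRAMMAR`, `LEXICON`, and `POS` tables are made up for "Book that flight", the grammar has no ε-productions, and the scanner moves the dot inside the predicted rule rather than adding a separate lexical state.

```python
from dataclasses import dataclass

# Hypothetical toy grammar and lexicon.
GRAMMAR = {
    "S": [("VP",)],
    "VP": [("Verb",), ("Verb", "NP")],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",)],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}
POS = {"Verb", "Det", "Noun"}


@dataclass(frozen=True)
class State:
    lhs: str
    rhs: tuple
    dot: int
    start: int
    end: int

    def next_symbol(self):
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None


def recognize(words, grammar=GRAMMAR, lexicon=LEXICON, pos_tags=POS):
    n = len(words)
    chart = [[] for _ in range(n + 1)]        # N+1 entries for N words

    def add(state):
        if state not in chart[state.end]:     # never store a state twice
            chart[state.end].append(state)

    for rhs in grammar["S"]:
        add(State("S", rhs, 0, 0, 0))

    for i in range(n + 1):                    # single left-to-right sweep
        for state in chart[i]:                # chart[i] may grow as we iterate
            symbol = state.next_symbol()
            if symbol is None:                # Completer
                for waiting in chart[state.start]:
                    if waiting.next_symbol() == state.lhs:
                        add(State(waiting.lhs, waiting.rhs, waiting.dot + 1,
                                  waiting.start, i))
            elif symbol in pos_tags:          # Scanner
                if i < n and symbol in lexicon.get(words[i], set()):
                    add(State(state.lhs, state.rhs, state.dot + 1,
                              state.start, i + 1))
            else:                             # Predictor
                for rhs in grammar.get(symbol, []):
                    add(State(symbol, rhs, 0, i, i))

    # Winner: a complete S in the last entry that started at position 0.
    return any(s.lhs == "S" and s.next_symbol() is None and s.start == 0
               for s in chart[n])


accepted = recognize(["book", "that", "flight"])
prefix_only = recognize(["book", "that"])
```

The duplicate check in `add` is what makes this dynamic programming: each state is stored once per chart entry no matter how many derivations lead to it.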

Example

• Book that flight
• Goal: find a completed S from 0 to 3
• Chart[0] shows Predictor operations
• Chart[1], S12, shows Scanner
• Chart[3] shows Completer stage

Figure 13.14

Figure 13.14 continued

Final Parse States

Chart Parsing

• CKY and Earley are deterministic: given an input, all actions are taken in a predetermined order
• Chart parsing allows for flexible ordering of events via a separate policy that determines the order of an agenda of states
  – The policy determines the order in which states are created and predictions made
  – Fundamental rule: if the chart includes two contiguous states such that one provides a constituent the other needs, a new state spanning the two states is created with the new information
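The agenda idea can be sketched as follows. Everything here is an illustrative assumption: the `Edge` class, the toy `GRAMMAR` and `LEXICON`, the bottom-up prediction strategy, and the use of `deque.popleft` or `deque.pop` as the "policy". The point is only that the fundamental rule stays the same while the processing order is pluggable.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical toy grammar (as lhs/rhs pairs) and lexicon.
GRAMMAR = [("S", ("VP",)), ("VP", ("Verb", "NP")), ("NP", ("Det", "Noun"))]
LEXICON = {"book": ["Verb", "Noun"], "that": ["Det"], "flight": ["Noun"]}


@dataclass(frozen=True)
class Edge:
    lhs: str
    rhs: tuple
    dot: int
    start: int
    end: int

    @property
    def complete(self):
        return self.dot == len(self.rhs)


def chart_parse(words, pop=deque.popleft):
    """Agenda-driven recognizer; `pop` is the policy (popleft = FIFO, deque.pop = LIFO)."""
    # Seed the agenda with one complete lexical edge per word/POS pair.
    agenda = deque(Edge(pos, (w,), 1, i, i + 1)
                   for i, w in enumerate(words) for pos in LEXICON[w])
    chart = set()
    while agenda:
        edge = pop(agenda)
        if edge in chart:
            continue
        chart.add(edge)
        new = []
        if edge.complete:
            # Bottom-up prediction: start any rule whose first symbol is this constituent.
            new += [Edge(lhs, rhs, 0, edge.start, edge.start)
                    for lhs, rhs in GRAMMAR if rhs[0] == edge.lhs]
            # Fundamental rule, with `edge` providing the constituent.
            new += [Edge(a.lhs, a.rhs, a.dot + 1, a.start, edge.end)
                    for a in chart
                    if not a.complete and a.end == edge.start
                    and a.rhs[a.dot] == edge.lhs]
        else:
            # Fundamental rule, with `edge` as the state that needs a constituent.
            new += [Edge(edge.lhs, edge.rhs, edge.dot + 1, edge.start, c.end)
                    for c in chart
                    if c.complete and c.start == edge.end
                    and c.lhs == edge.rhs[edge.dot]]
        agenda.extend(new)
    return any(e.lhs == "S" and e.complete and e.start == 0 and e.end == len(words)
               for e in chart)


fifo_result = chart_parse(["book", "that", "flight"])
lifo_result = chart_parse(["book", "that", "flight"], pop=deque.pop)
```

Both policies reach the same answer on this input because every pair of contiguous edges is combined when the later of the two is taken off the agenda. Only the order in which states are created differs.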

Summing Up

• Parsing as search: what search strategies to use?
  – Top down
  – Bottom up
  – How to combine?
• How to parse as little as possible
  – Dynamic programming
  – Different policies for ordering states to be processed
  – Next: Shallow Parsing and Review








Top Down Parsing

bull Builds from the root S node to the leavesbull Expectation-basedbull Common top-down search strategy

ndash Top-down left-to-right with backtrackingndash Try first rule st LHS is Sndash Next expand all constituents on RHSndash Iterate until all leaves are POSndash Backtrack when candidate POS does not match POS of

current word in input string

6

S NP VP VP V

S Aux NP VP VP -gt V PP

S -gt VP PP -gt Prep NP

NP Det Nom N old | dog | footsteps | young

NP PropN V dog | eat | sleep | bark | meow

Nom -gt Adj N Aux does | can

Nom N Prep from | to | on | of

Nom N Nom PropN Fido | Felix

Nom Nom PP Det that | this | a | the

VP V NP Adj -gt old | happy| young

ldquoThe old dog the footsteps of the youngrdquo

Expanding the Rules

bull The old dog the footsteps of the youngbull Where does backtracking happen bull What are the computational disadvantagesbull What are the advantagesbull What could we do to improve the process

8

Bottom Up Parsing

bull Parser begins with words of input and builds up trees applying grammar rules whose RHS matches

Det N V Det N Prep Det N

The old dog the footsteps of the young

Det Adj N Det N Prep Det N

The old dog the footsteps of the young

Parse continues until an S root node reached or no further node expansion possible

9

S NP VP VP V

S Aux NP VP VP -gt V PP

S -gt VP PP -gt Prep NP

NP Det Nom N old | dog | footsteps | young

NP PropN V dog | eat | sleep | bark | meow

Nom -gt Adj N Aux does | can

Nom N Prep from | to | on | of

Nom N Nom PropN Fido | Felix

Nom Nom PP Det that | this | a | the

VP V NP Adj -gt old | happy| young

ldquoThe old dog the footsteps of the youngrdquo

Bottom Up Parsing

bull When does disambiguation occurbull What are the computational advantages and

disadvantagesbull What could we do to make this process more

efficient

11

Issues to Address

bull Ambiguityndash POSndash Attachment

bull PPhellipbull Coordination old dogs and cats

ndash Overgenerating useless hypothesesndash Regenerating good hypotheses

Dynamic Programming

bull Fill in tables with solutions to subproblemsbull For parsing

ndash Store possible subtrees for each substring as they are discovered in the input

ndash Ambiguous strings are given multiple entriesndash Table look-up to come up with final parse(s)

bull Many parsers take advantage of this approach

Review Minimal Edit Distance

bull Simple example of DP find the minimal lsquodistancersquo between 2 stringsndash Minimal number of operations (insert delete

substitute) needed to transform one string into another

ndash Levenstein distances (subst=1 or 2)ndash Key idea minimal path between substrings is on

the minimal path between the beginning and end of the 2 strings

Example of MED Calculation

DP for Parsing

bull Table cells represented state of parse of input up to this point

bull Can be calculated from neighboring state(s)bull Only need to parse each substring once for each

possible analysis into constituents

Parsers Using DP

bull CKY Parsing Algorithmndash Bottom-upndash Grammar must be in Chomsky Normal Formndash The parse tree might not be consistent with linguistic

theorybull Earley Parsing Algorithm

ndash Top-downndash Expectations about constituents are confirmed by inputndash A POS tag for a word that is not predicted is never added

bull Chart Parser

17

Cocke-Kasami-Younger Algorithm

bull Convert grammar to Chomsky Normal Formndash Every CFG has a weakly equivalent CNF grammarndash A B C (non-terminals)ndash A w (terminal)ndash Basic ideas

bull Keep rules conforming to CNFbull Introduce dummy non-terminals for rules that mix terminal and non-

terminals (eg A Bw becomes A BBrsquo Brsquo w)bull Rewrite RHS of unit productions with RHS of all non-unit productions

they lead to (eg A B B w becomes A w)bull For RHS longer than 2 non-terminals replace leftmost pairs of non-

terminals with a new non-terminal and add a new production rule (eg A BCD becomes A ZD Z BC)

bull For ε-productions find all occurences of LHS in 2-variable RHSs and create new rule without the LHS (eg C ABA ε becomes CB)

A CFG

Figure 138

CYK in Action

bull Each non-terminal above POS level has 2 daughtersndash Encode entire parse tree in N+1 x N+1 tablendash Each cell [ij] contains all non-terminals that span

positions [i-j] betw input wordsndash Cell [0N] represents all inputndash For each [ij] st iltkltj [ik] is to left and [kj] is

below in tablendash Diagonal contains POS of each input wordndash Fill in table from diagonal on up

ndash For any cell [ij] cells (constituents) contributing to [ij] are to left and below already filled in

Figure 138

CYK Parse Table

X2

CYK Algorithm

Filling in [0N] Adding X2[0n]

Filling the Final Column (1)

Filling the Final Column (2)

X2

Earley Algorithm

bull Top-down parsing algorithm using DPbull Allows arbitrary CFGs closer to linguisticsbull Fills a chart of length N+1 in a single sweep over

input of N wordsndash Chart entries represent state of parse at each word

positionbull Completed constituents and their locationsbull In-progress constituentsbull Predicted constituents

29

Parser States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

30

CFG for Fragment of EnglishS NP VP VP V

S Aux NP VP PP -gt Prep NP

S VP N book | flight | meal | money

NP Det Nom V book | include | prefer

NP PropN Aux does

Nom N Nom Prep from | to | on

Nom N PropN Houston | TWA

Nom Nom PP Det that | this | a | the

VP V NP

S8

S9

S10

S11

S13

S12

S8

S9

S8

Some Parse States for Book that flight

Filling in the Chart

bull March through chart left-to-rightbull At each step apply 1 of 3 operators

ndash Predictorbull Create new states representing top-down expectations

ndash Scannerbull Match word predictions (rule with POS following dot)

to words in input

ndash Completerbull When a state is complete see what rules were looking

for that complete constituent

33

Top Level Earley

Predictor

bull Given a statendash With a non-terminal to right of dot (not a part-of-speech

category)ndash Create a new state for each expansion of the non-terminalndash Put predicted states in same chart cell as generating state

beginning and ending where generating state ends ndash So predictor looking at

bull S -gt VP [00] ndash results in

bull VP -gt Verb [00]bull VP -gt Verb NP [00]

35

Scanner

bull Given a statendash With a non-terminal to right of dot that is a POS categoryndash If next word in input matches this POSndash Create a new state with dot moved past the non-terminal

bull Eg scanner looking at VP -gt Verb NP [00]

ndash If next word can be a verb add new statebull VP -gt Verb NP [01]

ndash Add this state to chart entry following current onendash NB Earley uses top-down input to disambiguate POS --

only POS predicted by some state can be added to chart

36

Completer

bull Given a state ndash Whose dot has reached right end of rulendash Parser has discovered a constituent over some span of inputndash Find and advance all previous states that are lsquolooking forrsquo

this categoryndash Copy state move dot insert in current chart entry

bull Eg if processingndash NP -gt Det Nominal [13] and if state expecting an NP like

VP -gt Verb NP [01] in chartbull Add

ndash VP -gt Verb NP [03] to same cell of chart

37

Reaching a Final State

bull Find an S state in chart that spans input from 0 to N+1 and is complete

bull Declare victoryndash S ndashgt α [0N+1]

38

Converting from Recognizer to Parser

bull Augment the ldquoCompleterrdquo to include pointer to each previous (now completed) state

bull Read off all the backpointers from every complete S

39

Gist of Earley Parsing

1 Predict all the states you can as soon as you can

2 Read a word1 Extend states based on matches

2 Add new predictions

3 Go to 2

3 Look at N+1 to see if you have a winner

40

Example

bull Book that flightbull Goal Find a completed S from 0 to 3bull Chart[0] shows Predictor operationsbull Chart[1] S12 shows Scannerbull Chart[3] shows Completer stage

41

Figure 1314

Figure 1314 continued

Final Parse States

Chart Parsing

• CKY and Earley are deterministic: given an input, all actions are taken in a predetermined order
• Chart parsing allows for flexible ordering of events via a separate policy that determines the order of an agenda of states
  – Policy determines order in which states are created and predictions made
  – Fundamental rule: if the chart includes 2 contiguous states such that one provides a constituent the other needs, a new state spanning the two states is created with the new information
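The fundamental rule can be illustrated with a short sketch. The `Edge` record and the `fundamental_rule` function are hypothetical names chosen for this example; a real chart parser would also need an agenda and a policy for ordering it, which are not shown here.

```python
from collections import namedtuple

# Hypothetical edge: lhs -> found . needed, spanning [start, end].
Edge = namedtuple("Edge", "lhs found needed start end")


def fundamental_rule(active, passive):
    """Combine an incomplete edge with an adjacent complete edge, or return None."""
    if not active.needed or passive.needed:
        return None  # active must still need something; passive must be complete
    if active.end != passive.start:
        return None  # the two edges must be contiguous
    if active.needed[0] != passive.lhs:
        return None  # passive must provide the constituent that active needs
    return Edge(
        active.lhs,
        active.found + (passive.lhs,),
        active.needed[1:],
        active.start,
        passive.end,
    )


# VP -> Verb . NP [0,1] combined with NP -> Det Nominal . [1,3]
vp_active = Edge("VP", ("Verb",), ("NP",), 0, 1)
np_complete = Edge("NP", ("Det", "Nominal"), (), 1, 3)
vp_complete = fundamental_rule(vp_active, np_complete)
```

Here `vp_complete` is the edge VP -> Verb NP . spanning [0,3], the same result the Earley Completer produces, but reached without committing to a fixed processing order.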

Summing Up

• Parsing as search: what search strategies to use?
  – Top down
  – Bottom up
  – How to combine?
• How to parse as little as possible
  – Dynamic Programming
  – Different policies for ordering states to be processed
  – Next: Shallow Parsing and Review

  • Basic Parsing with Context-Free Grammars
  • Syntactic Parsing
  • “The old dog the footsteps of the young”
  • How do we create this parse tree
  • Parsing is a form of Search
  • Top Down Parsing
  • “The old dog the footsteps of the young” (2)
  • Expanding the Rules
  • Bottom Up Parsing
  • “The old dog the footsteps of the young” (3)
  • Bottom Up Parsing (2)
  • Issues to Address
  • Dynamic Programming
  • Review: Minimal Edit Distance
  • Example of MED Calculation
  • DP for Parsing
  • Parsers Using DP
  • Cocke-Kasami-Younger Algorithm
  • A CFG
  • Figure 13.8
  • CYK in Action
  • Slide 22
  • Figure 13.8 (2)
  • CYK Parse Table
  • CYK Algorithm
  • Filling in [0,N]: Adding X2
  • Filling the Final Column (1)
  • Filling the Final Column (2)
  • Earley Algorithm
  • Parser States
  • CFG for Fragment of English
  • Some Parse States for “Book that flight”
  • Filling in the Chart
  • Top Level Earley
  • Predictor
  • Scanner
  • Completer
  • Reaching a Final State
  • Converting from Recognizer to Parser
  • Gist of Earley Parsing
  • Example
  • Figure 13.14
  • Figure 13.14 continued
  • Final Parse States
  • Chart Parsing
  • Summing Up

S -> NP VP          VP -> V
S -> Aux NP VP      VP -> V PP
S -> VP             PP -> Prep NP
NP -> Det Nom       N -> old | dog | footsteps | young
NP -> PropN         V -> dog | eat | sleep | bark | meow
Nom -> Adj N        Aux -> does | can
Nom -> N            Prep -> from | to | on | of
Nom -> N Nom        PropN -> Fido | Felix
Nom -> Nom PP       Det -> that | this | a | the
VP -> V NP          Adj -> old | happy | young

“The old dog the footsteps of the young”

Expanding the Rules

• The old dog the footsteps of the young
• Where does backtracking happen?
• What are the computational disadvantages?
• What are the advantages?
• What could we do to improve the process?

Bottom Up Parsing

• Parser begins with words of input and builds up trees, applying grammar rules whose RHS matches

Det N   V   Det N         Prep Det N
The old dog the footsteps of   the young

Det Adj N   Det N         Prep Det N
The old dog the footsteps of   the young

Parse continues until an S root node is reached or no further node expansion is possible
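The two tag rows above are only two of the starting points a bottom-up parser has to consider. As a rough illustration, the sketch below enumerates every POS sequence the lexicon allows for the example sentence. The `LEXICON` dictionary is an assumed encoding of the slides' lexical rules, and `tag_sequences` is a hypothetical helper name.

```python
from itertools import product

# Assumed encoding of the lexical rules for the words in the example sentence.
LEXICON = {
    "the": ["Det"],
    "old": ["N", "Adj"],
    "dog": ["N", "V"],
    "footsteps": ["N"],
    "of": ["Prep"],
    "young": ["N", "Adj"],
}


def tag_sequences(words):
    """Return every POS sequence the lexicon licenses for the words."""
    options = [LEXICON[word.lower()] for word in words]
    return [list(tags) for tags in product(*options)]


sentence = "The old dog the footsteps of the young".split()
sequences = tag_sequences(sentence)
```

With this lexicon three words are two-way ambiguous, so `sequences` holds 2 x 2 x 2 = 8 candidates, and a naive bottom-up parser may build partial trees over sequences that can never lead to an S.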

Bottom Up Parsing (2)

• When does disambiguation occur?
• What are the computational advantages and disadvantages?
• What could we do to make this process more efficient?

Issues to Address

• Ambiguity
  – POS
  – Attachment
    • PP…
    • Coordination: old dogs and cats
  – Overgenerating useless hypotheses
  – Regenerating good hypotheses

Dynamic Programming

• Fill in tables with solutions to subproblems
• For parsing
  – Store possible subtrees for each substring as they are discovered in the input
  – Ambiguous strings are given multiple entries
  – Table look-up to come up with final parse(s)
• Many parsers take advantage of this approach

Review: Minimal Edit Distance

• Simple example of DP: find the minimal ‘distance’ between 2 strings
  – Minimal number of operations (insert, delete, substitute) needed to transform one string into another
  – Levenshtein distances (subst=1 or 2)
  – Key idea: minimal path between substrings is on the minimal path between the beginning and end of the 2 strings

Example of MED Calculation
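The table-filling calculation can be sketched as follows. This is a minimal sketch, not the slide's original worked figure: the function name `min_edit_distance` and the `sub_cost` parameter are assumptions, with `sub_cost` set to 1 or 2 to match the two Levenshtein variants mentioned above.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimal number of weighted edits turning source into target."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i  # delete every character of source[:i]
    for j in range(1, cols):
        dist[0][j] = j  # insert every character of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                              # deletion
                dist[i][j - 1] + 1,                              # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[-1][-1]
```

Each cell depends only on its three neighbours, which is the "minimal path between substrings" idea. For the pair "intention" and "execution" the function returns 5 with `sub_cost=1` and 8 with `sub_cost=2`.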

DP for Parsing

• Table cells represent state of parse of input up to this point
• Can be calculated from neighboring state(s)
• Only need to parse each substring once for each possible analysis into constituents

Parsers Using DP

• CKY Parsing Algorithm
  – Bottom-up
  – Grammar must be in Chomsky Normal Form
  – The parse tree might not be consistent with linguistic theory
• Earley Parsing Algorithm
  – Top-down
  – Expectations about constituents are confirmed by input
  – A POS tag for a word that is not predicted is never added
• Chart Parser

Cocke-Kasami-Younger Algorithm

• Convert grammar to Chomsky Normal Form
  – Every CFG has a weakly equivalent CNF grammar
  – A -> B C (non-terminals)
  – A -> w (terminal)
  – Basic ideas
    • Keep rules conforming to CNF
    • Introduce dummy non-terminals for rules that mix terminals and non-terminals (e.g., A -> B w becomes A -> B B', B' -> w)
    • Rewrite RHS of unit productions with RHS of all non-unit productions they lead to (e.g., A -> B, B -> w becomes A -> w)
    • For RHS longer than 2 non-terminals, replace leftmost pairs of non-terminals with a new non-terminal and add a new production rule (e.g., A -> B C D becomes A -> Z D, Z -> B C)
    • For ε-productions, find all occurrences of LHS in 2-variable RHSs and create new rule without the LHS (e.g., C -> A B, A -> ε becomes C -> B)
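Of the conversion steps above, the binarization of long right-hand sides is the easiest to show in code. The sketch below covers only that one step; the function name `binarize` and the generated non-terminal names `X1`, `X2`, ... are assumptions, and the sketch presumes the grammar does not already use those names.

```python
def binarize(rules):
    """Replace leftmost pairs in long RHSs until every RHS has at most 2 symbols.

    rules is a list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    """
    result = []
    counter = 0
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            counter += 1
            new_symbol = f"X{counter}"  # hypothetical fresh non-terminal
            result.append((new_symbol, rhs[:2]))  # e.g. X1 -> B C
            rhs = (new_symbol,) + rhs[2:]         # e.g. A -> X1 D
        result.append((lhs, rhs))
    return result
```

For example, `binarize([("A", ("B", "C", "D"))])` yields X1 -> B C followed by A -> X1 D, matching the A -> Z D, Z -> B C pattern above. Handling mixed terminals, unit productions and ε-productions would need separate passes.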

A CFG

Figure 13.8

CYK in Action

• Each non-terminal above POS level has 2 daughters
  – Encode entire parse tree in an (N+1) x (N+1) table
  – Each cell [i,j] contains all non-terminals that span positions i to j between input words
  – Cell [0,N] represents all input
  – For each [i,j] and each k such that i < k < j, [i,k] is to the left and [k,j] is below in the table
  – Diagonal contains POS of each input word
  – Fill in table from diagonal on up
  – For any cell [i,j], cells (constituents) contributing to [i,j] are to the left and below, already filled in
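The table-filling order described above can be sketched as below. The grammar here is a small hand-made CNF grammar, not the one in Figure 13.8: `BINARY`, `LEXICAL` and `cky_table` are illustrative names, and unit productions such as Nom -> N and S -> VP have been folded into the entries by hand.

```python
# Assumed CNF rules, keyed by the pair of daughters: (B, C) -> set of parents A.
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "Nom"): {"NP"},
    ("V", "NP"): {"VP", "S"},  # S -> VP folded in
    ("N", "Nom"): {"Nom"},
    ("Nom", "PP"): {"Nom"},
    ("Prep", "NP"): {"PP"},
}
# Assumed lexical entries, with unit productions already applied.
LEXICAL = {
    "book": {"N", "Nom", "V", "VP", "S"},
    "flight": {"N", "Nom"},
    "meal": {"N", "Nom"},
    "that": {"Det"},
    "the": {"Det"},
    "a": {"Det"},
    "from": {"Prep"},
    "houston": {"PropN", "NP"},
}


def cky_table(words):
    """Fill the (N+1) x (N+1) table; table[i][j] holds non-terminals spanning i..j."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):
        # Diagonal cell: categories of word j.
        table[j - 1][j] |= LEXICAL.get(words[j - 1].lower(), set())
        # Work up column j; every cell to the left and below is already filled.
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for left in table[i][k]:
                    for below in table[k][j]:
                        table[i][j] |= BINARY.get((left, below), set())
    return table


table = cky_table("Book that flight".split())
```

The input is accepted when `"S"` is in `table[0][N]`. Storing backpointers alongside each non-terminal would turn this recognizer into a parser.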

Figure 13.8

CYK Parse Table

CYK Algorithm

Filling in [0,N]: Adding X2

Filling the Final Column (1)

Filling the Final Column (2)

Earley Algorithm

• Top-down parsing algorithm using DP
• Allows arbitrary CFGs: closer to linguistics
• Fills a chart of length N+1 in a single sweep over input of N words
  – Chart entries represent state of parse at each word position
    • Completed constituents and their locations
    • In-progress constituents
    • Predicted constituents

Parser States

• The table entries are called states and are represented with dotted rules

S -> • VP               A VP is predicted
NP -> Det • Nominal     An NP is in progress
VP -> V NP •            A VP has been found
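A dotted rule is just a grammar rule plus a position. As a small sketch (the class name `DottedRule` and its methods are assumptions made for illustration), the three states above can be written as:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DottedRule:
    lhs: str
    rhs: tuple
    dot: int  # number of RHS symbols already recognized

    def is_complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """Symbol immediately to the right of the dot, or None if complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} -> {' '.join(symbols)}"


predicted = DottedRule("S", ("VP",), 0)                # a VP is predicted
in_progress = DottedRule("NP", ("Det", "Nominal"), 1)  # an NP is in progress
found = DottedRule("VP", ("V", "NP"), 2)               # a VP has been found
```

An Earley state additionally records the input span the rule covers, which this sketch leaves out.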

CFG for Fragment of English

S -> NP VP          VP -> V
S -> Aux NP VP      PP -> Prep NP
S -> VP             N -> book | flight | meal | money
NP -> Det Nom       V -> book | include | prefer
NP -> PropN         Aux -> does
Nom -> N Nom        Prep -> from | to | on
Nom -> N            PropN -> Houston | TWA
Nom -> Nom PP       Det -> that | this | a | the
VP -> V NP

Some Parse States for “Book that flight”

[Figure: parse states labeled S8 through S13]

Filling in the Chart

• March through chart left-to-right
• At each step, apply 1 of 3 operators
  – Predictor
    • Create new states representing top-down expectations
  – Scanner
    • Match word predictions (rule with POS following dot) to words in input
  – Completer
    • When a state is complete, see what rules were looking for that complete constituent

Top Level Earley

Predictor

• Given a state
  – With a non-terminal to right of dot (not a part-of-speech category)
  – Create a new state for each expansion of the non-terminal
  – Put predicted states in same chart cell as generating state, beginning and ending where generating state ends
  – So predictor looking at
    • S -> • VP [0,0]
  – results in
    • VP -> • Verb [0,0]
    • VP -> • Verb NP [0,0]

S NP VP VP V

S Aux NP VP VP -gt V PP

S -gt VP PP -gt Prep NP

NP Det Nom N old | dog | footsteps | young

NP PropN V dog | eat | sleep | bark | meow

Nom -gt Adj N Aux does | can

Nom N Prep from | to | on | of

Nom N Nom PropN Fido | Felix

Nom Nom PP Det that | this | a | the

VP V NP Adj -gt old | happy| young

ldquoThe old dog the footsteps of the youngrdquo

Bottom Up Parsing

bull When does disambiguation occurbull What are the computational advantages and

disadvantagesbull What could we do to make this process more

efficient

11

Issues to Address

bull Ambiguityndash POSndash Attachment

bull PPhellipbull Coordination old dogs and cats

ndash Overgenerating useless hypothesesndash Regenerating good hypotheses

Dynamic Programming

bull Fill in tables with solutions to subproblemsbull For parsing

ndash Store possible subtrees for each substring as they are discovered in the input

ndash Ambiguous strings are given multiple entriesndash Table look-up to come up with final parse(s)

bull Many parsers take advantage of this approach

Review Minimal Edit Distance

bull Simple example of DP find the minimal lsquodistancersquo between 2 stringsndash Minimal number of operations (insert delete

substitute) needed to transform one string into another

ndash Levenstein distances (subst=1 or 2)ndash Key idea minimal path between substrings is on

the minimal path between the beginning and end of the 2 strings

Example of MED Calculation

DP for Parsing

• Table cells represent state of parse of input up to this point
• Can be calculated from neighboring state(s)
• Only need to parse each substring once for each possible analysis into constituents

Parsers Using DP

• CKY Parsing Algorithm
  – Bottom-up
  – Grammar must be in Chomsky Normal Form
  – The parse tree might not be consistent with linguistic theory
• Earley Parsing Algorithm
  – Top-down
  – Expectations about constituents are confirmed by input
  – A POS tag for a word that is not predicted is never added
• Chart Parser

Cocke-Kasami-Younger Algorithm

• Convert grammar to Chomsky Normal Form
  – Every CFG has a weakly equivalent CNF grammar
  – A -> B C (non-terminals)
  – A -> w (terminal)
  – Basic ideas:
    • Keep rules conforming to CNF
    • Introduce dummy non-terminals for rules that mix terminals and non-terminals (e.g. A -> B w becomes A -> B B'; B' -> w)
    • Rewrite RHS of unit productions with RHS of all non-unit productions they lead to (e.g. A -> B; B -> w becomes A -> w)
    • For RHS longer than 2 non-terminals, replace leftmost pairs of non-terminals with a new non-terminal and add a new production rule (e.g. A -> B C D becomes A -> Z D; Z -> B C)
    • For ε-productions, find all occurrences of LHS in 2-variable RHSs and create new rule without the LHS (e.g. C -> A B; A -> ε becomes C -> B)
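Two of the conversion steps above (dummy non-terminals for mixed rules, and splitting long right-hand sides) can be sketched as follows. This is a partial illustration only: unit productions and ε-productions are not handled, and the function name `to_binary_rules`, the generated names `X1, X2, ...` and the primed dummy names are assumptions made for this example, presumed not to clash with names already in the grammar.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (LHS, RHS symbols)


def to_binary_rules(rules: List[Rule], nonterminals: Set[str]) -> List[Rule]:
    """Lift terminals out of mixed rules and binarize long right-hand sides."""
    out: List[Rule] = []
    dummy: Dict[str, str] = {}  # terminal -> dummy non-terminal (hypothetical naming)
    fresh = 0
    for lhs, rhs in rules:
        symbols = list(rhs)
        if len(symbols) > 1:
            # Mixed rule: replace each terminal w by a dummy non-terminal W' -> w.
            for i, sym in enumerate(symbols):
                if sym not in nonterminals:
                    if sym not in dummy:
                        dummy[sym] = sym.upper() + "'"
                        out.append((dummy[sym], (sym,)))
                    symbols[i] = dummy[sym]
        # Long rule: fold the leftmost pair into a new non-terminal until two remain.
        while len(symbols) > 2:
            fresh += 1
            new_nt = f"X{fresh}"
            out.append((new_nt, (symbols[0], symbols[1])))
            symbols = [new_nt] + symbols[2:]
        out.append((lhs, tuple(symbols)))
    return out


rules = [("A", ("B", "C", "D")), ("A", ("B", "w"))]
for lhs, rhs in to_binary_rules(rules, {"A", "B", "C", "D"}):
    print(lhs, "->", " ".join(rhs))
```

After this pass every rule has at most two symbols on the right, and any two-symbol rule contains only non-terminals; a full CNF converter would still need the unit-production and ε-production steps listed above.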

A CFG

Figure 13.8

CYK in Action

• Each non-terminal above POS level has 2 daughters
  – Encode entire parse tree in (N+1) x (N+1) table
  – Each cell [i,j] contains all non-terminals that span positions i to j between input words
  – Cell [0,N] represents all input
  – For each [i,j] s.t. i<k<j, [i,k] is to left and [k,j] is below in table
  – Diagonal contains POS of each input word
  – Fill in table from diagonal on up
  – For any cell [i,j], cells (constituents) contributing to [i,j] are to left and below, already filled in
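A minimal sketch of the table-filling loop, assuming the grammar is already in CNF. The function name `cky_table` and the two dictionaries are illustrative choices, and the toy grammar is a hand-simplified CNF fragment for "book that flight" written for this example, not the grammar from the figure.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def cky_table(
    words: List[str],
    lexical: Dict[str, Set[str]],             # word -> POS categories (A -> w rules)
    binary: Dict[Tuple[str, str], Set[str]],  # (B, C) -> all A with A -> B C
) -> Dict[Tuple[int, int], Set[str]]:
    """Fill the CKY table; cell (i, j) holds non-terminals spanning words[i:j]."""
    n = len(words)
    table: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for j in range(1, n + 1):               # one column per word, left to right
        table[(j - 1, j)] |= lexical.get(words[j - 1], set())  # diagonal: POS
        for i in range(j - 2, -1, -1):      # move up the column from the diagonal
            for k in range(i + 1, j):       # every split point with i < k < j
                for b in table[(i, k)]:     # cell to the left
                    for c in table[(k, j)]:  # cell below
                        table[(i, j)] |= binary.get((b, c), set())
    return table


# Hypothetical CNF toy grammar: S -> V NP, VP -> V NP, NP -> Det N.
lexical = {"book": {"V", "N"}, "that": {"Det"}, "flight": {"N"}}
binary = {("V", "NP"): {"S", "VP"}, ("Det", "N"): {"NP"}}

table = cky_table("book that flight".split(), lexical, binary)
print("S" in table[(0, 3)])
```

This version only recognizes; storing, for each entry, the split point k and the pair (B, C) that produced it would let the parse trees be read back out of the table.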

Figure 13.8

CYK Parse Table

CYK Algorithm

Filling in [0,N]: Adding X2

Filling the Final Column (1)

Filling the Final Column (2)

Earley Algorithm

• Top-down parsing algorithm using DP
• Allows arbitrary CFGs: closer to linguistics
• Fills a chart of length N+1 in a single sweep over input of N words
  – Chart entries represent state of parse at each word position
    • Completed constituents and their locations
    • In-progress constituents
    • Predicted constituents

Parser States

• The table-entries are called states and are represented with dotted-rules

S -> . VP               A VP is predicted
NP -> Det . Nominal     An NP is in progress
VP -> V NP .            A VP has been found
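One way to represent such a state in code is a small immutable record holding the rule, the dot position, and the span covered so far. The class and field names below are assumptions made for this sketch; the dot is printed as "." between the recognized and the still-expected symbols.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class State:
    lhs: str               # left-hand side of the rule
    rhs: Tuple[str, ...]   # right-hand side symbols
    dot: int               # number of RHS symbols already recognized
    start: int             # input position where the constituent begins
    end: int               # input position reached so far

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        """Symbol just right of the dot, or None if the rule is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, ".")
        return f"{self.lhs} -> {' '.join(symbols)} [{self.start},{self.end}]"


print(State("S", ("VP",), 0, 0, 0))               # a VP is predicted
print(State("NP", ("Det", "Nominal"), 1, 1, 2))   # an NP is in progress
print(State("VP", ("V", "NP"), 2, 0, 3))          # a VP has been found
```

Making the record frozen keeps states hashable, so a chart entry can cheaply check whether a state has already been added.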

CFG for Fragment of English

S -> NP VP
S -> Aux NP VP
S -> VP
NP -> Det Nom
NP -> PropN
Nom -> N Nom
Nom -> N
Nom -> Nom PP
VP -> V NP
VP -> V
PP -> Prep NP
N -> book | flight | meal | money
V -> book | include | prefer
Aux -> does
Prep -> from | to | on
PropN -> Houston | TWA
Det -> that | this | a | the

Some Parse States for Book that flight (figure: states S8 through S13)

Filling in the Chart

• March through chart left-to-right
• At each step, apply 1 of 3 operators
  – Predictor
    • Create new states representing top-down expectations
  – Scanner
    • Match word predictions (rule with POS following dot) to words in input
  – Completer
    • When a state is complete, see what rules were looking for that complete constituent

Top Level Earley

Predictor

• Given a state
  – With a non-terminal to right of dot (not a part-of-speech category)
  – Create a new state for each expansion of the non-terminal
  – Put predicted states in same chart cell as generating state, beginning and ending where generating state ends
  – So predictor looking at
    • S -> . VP [0,0]
  – results in
    • VP -> . Verb [0,0]
    • VP -> . Verb NP [0,0]

Scanner

• Given a state
  – With a non-terminal to right of dot that is a POS category
  – If next word in input matches this POS
  – Create a new state with dot moved past the non-terminal
    • E.g., scanner looking at VP -> . Verb NP [0,0]
  – If next word can be a verb, add new state
    • VP -> Verb . NP [0,1]
  – Add this state to chart entry following current one
  – NB: Earley uses top-down input to disambiguate POS: only POS predicted by some state can be added to chart

Completer

• Given a state
  – Whose dot has reached right end of rule
  – Parser has discovered a constituent over some span of input
  – Find and advance all previous states that are 'looking for' this category
  – Copy state, move dot, insert in current chart entry
• E.g., if processing
  – NP -> Det Nominal . [1,3] and if state expecting an NP like VP -> Verb . NP [0,1] in chart
• Add
  – VP -> Verb NP . [0,3] to same cell of chart

Reaching a Final State

• Find an S state in the last of the N+1 chart entries that spans input from 0 to N and is complete
• Declare victory
  – S -> α . [0,N]

Converting from Recognizer to Parser

• Augment the "Completer" to include pointer to each previous (now completed) state
• Read off all the backpointers from every complete S

Gist of Earley Parsing

1. Predict all the states you can as soon as you can
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the last of the N+1 chart entries to see if you have a winner
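The three operators and the final check can be put together in a compact recognizer. This is a sketch under stated assumptions rather than the textbook pseudocode: states are plain tuples `(lhs, rhs, dot, start)` whose end position is the chart entry they sit in, the dummy start symbol `GAMMA` and the function name `earley_recognize` are introduced here, the scanner advances the dot directly past a matching POS, and the grammar is assumed to contain no ε-productions.

```python
from typing import Dict, List, Set, Tuple

# (lhs, rhs, dot, start); the end position is the index of the chart entry.
State = Tuple[str, Tuple[str, ...], int, int]


def earley_recognize(
    words: List[str],
    grammar: Dict[str, List[Tuple[str, ...]]],  # non-terminal -> expansions
    lexicon: Dict[str, Set[str]],               # word -> possible POS categories
    start_symbol: str = "S",
) -> bool:
    n = len(words)
    chart: List[List[State]] = [[] for _ in range(n + 1)]

    def add(state: State, position: int) -> None:
        if state not in chart[position]:        # never store a state twice
            chart[position].append(state)

    add(("GAMMA", (start_symbol,), 0, 0), 0)    # dummy start state
    for i in range(n + 1):
        for state in chart[i]:                  # the list grows while we walk it
            lhs, rhs, dot, start = state
            if dot < len(rhs) and rhs[dot] in grammar:
                # Predictor: expand the non-terminal right of the dot at [i,i].
                for expansion in grammar[rhs[dot]]:
                    add((rhs[dot], expansion, 0, i), i)
            elif dot < len(rhs):
                # Scanner: a POS is expected; check it against the next word.
                if i < n and rhs[dot] in lexicon.get(words[i], set()):
                    add((lhs, rhs, dot + 1, start), i + 1)
            else:
                # Completer: advance every state that was waiting for lhs.
                for l2, r2, d2, s2 in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, s2), i)
    return ("GAMMA", (start_symbol,), 1, 0) in chart[n]


grammar = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom"), ("PropN",)],
    "Nom": [("N", "Nom"), ("N",), ("Nom", "PP")],
    "VP": [("V", "NP"), ("V",)],
    "PP": [("Prep", "NP")],
}
lexicon = {
    "book": {"N", "V"}, "flight": {"N"}, "meal": {"N"}, "money": {"N"},
    "include": {"V"}, "prefer": {"V"}, "does": {"Aux"},
    "from": {"Prep"}, "to": {"Prep"}, "on": {"Prep"},
    "Houston": {"PropN"}, "TWA": {"PropN"},
    "that": {"Det"}, "this": {"Det"}, "a": {"Det"}, "the": {"Det"},
}

print(earley_recognize("book that flight".split(), grammar, lexicon))
```

Because duplicate states are never added, the left-recursive rule Nom -> Nom PP does not cause the predictor to loop. Turning this into a parser would mean storing, in the completer, a pointer from each advanced state to the completed state that advanced it.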

Example

• Book that flight
• Goal: Find a completed S from 0 to 3
• Chart[0] shows Predictor operations
• Chart[1] S12 shows Scanner
• Chart[3] shows Completer stage

Figure 13.14

Figure 13.14 continued

Final Parse States

Chart Parsing

• CKY and Earley are deterministic: given an input, all actions are taken in a predetermined order
• Chart Parsing allows for flexibility of events via separate policy that determines order of an agenda of states
  – Policy determines order in which states are created and predictions made
  – Fundamental rule: if chart includes 2 contiguous states s.t. one provides a constituent the other needs, a new state spanning the two states is created with the new information

Summing Up

• Parsing as search: what search strategies to use?
  – Top down
  – Bottom up
  – How to combine?
• How to parse as little as possible
  – Dynamic Programming
  – Different policies for ordering states to be processed
  – Next: Shallow Parsing and Review




  • Basic Parsing with Context-Free Grammars
  • Syntactic Parsing
  • “The old dog the footsteps of the young”
  • How do we create this parse tree
  • Parsing is a form of Search
  • Top Down Parsing
  • “The old dog the footsteps of the young” (2)
  • Expanding the Rules
  • Bottom Up Parsing
  • “The old dog the footsteps of the young” (3)
  • Bottom Up Parsing (2)
  • Issues to Address
  • Dynamic Programming
  • Review: Minimal Edit Distance
  • Example of MED Calculation
  • DP for Parsing
  • Parsers Using DP
  • Cocke-Kasami-Younger Algorithm
  • A CFG
  • Figure 13.8
  • CYK in Action
  • Slide 22
  • Figure 13.8 (2)
  • CYK Parse Table
  • CYK Algorithm
  • Filling in [0,N]: Adding X2
  • Filling the Final Column (1)
  • Filling the Final Column (2)
  • Earley Algorithm
  • Parser States
  • CFG for Fragment of English
  • Some Parse States for Book that flight
  • Filling in the Chart
  • Top Level Earley
  • Predictor
  • Scanner
  • Completer
  • Reaching a Final State
  • Converting from Recognizer to Parser
  • Gist of Earley Parsing
  • Example
  • Figure 13.14
  • Figure 13.14 continued
  • Final Parse States
  • Chart Parsing
  • Summing Up

Review: Minimal Edit Distance

• Simple example of DP: find the minimal ‘distance’ between 2 strings
  – Minimal number of operations (insert, delete, substitute) needed to transform one string into another
  – Levenshtein distances (subst=1 or 2)
  – Key idea: minimal path between substrings is on the minimal path between the beginning and end of the 2 strings
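As a reminder of how the table is filled, here is a minimal Python sketch of the edit-distance computation. The function name and the `sub_cost` parameter are assumptions for illustration; insertions and deletions cost 1, and substitution costs 1 or 2 depending on the variant chosen.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 2) -> int:
    """Minimum cost of turning source into target (insert/delete cost 1)."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of transforming source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i                      # delete everything
    for j in range(1, m + 1):
        dist[0][j] = j                      # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                             # deletion
                dist[i][j - 1] + 1,                             # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost)  # substitution or match
            )
    return dist[n][m]


d2 = min_edit_distance("intention", "execution")               # subst = 2
d1 = min_edit_distance("intention", "execution", sub_cost=1)   # subst = 1
```

Each cell depends only on its three neighbours (left, below, diagonal), which is the same property the parsing tables rely on.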

Example of MED Calculation

DP for Parsing

• Table cells represent the state of the parse of the input up to this point
• Can be calculated from neighboring state(s)
• Only need to parse each substring once for each possible analysis into constituents

Parsers Using DP

• CKY Parsing Algorithm
  – Bottom-up
  – Grammar must be in Chomsky Normal Form
  – The parse tree might not be consistent with linguistic theory
• Earley Parsing Algorithm
  – Top-down
  – Expectations about constituents are confirmed by input
  – A POS tag for a word that is not predicted is never added
• Chart Parser

Cocke-Kasami-Younger Algorithm

• Convert grammar to Chomsky Normal Form
  – Every CFG has a weakly equivalent CNF grammar
  – A -> B C (non-terminals)
  – A -> w (terminal)
  – Basic ideas:
    • Keep rules conforming to CNF
    • Introduce dummy non-terminals for rules that mix terminals and non-terminals (e.g. A -> B w becomes A -> B B’; B’ -> w)
    • Rewrite RHS of unit productions with RHS of all non-unit productions they lead to (e.g. A -> B; B -> w becomes A -> w)
    • For RHS longer than 2 non-terminals, replace leftmost pairs of non-terminals with a new non-terminal and add a new production rule (e.g. A -> B C D becomes A -> Z D; Z -> B C)
    • For ε-productions, find all occurrences of LHS in 2-variable RHSs and create new rule without the LHS (e.g. C -> A B; A -> ε becomes C -> B)
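One of the conversion steps above, shortening long right-hand sides, can be sketched as follows. This covers only that single step, not the full CNF conversion; the rule representation and the generated names `X1`, `X2`, ... are assumptions for illustration, and the sketch assumes those names do not already occur in the grammar.

```python
def binarize(rules):
    """Replace leftmost pairs of symbols until every RHS has length <= 2.

    rules: list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    """
    out = []
    fresh = 0
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            fresh += 1
            new_nt = f"X{fresh}"              # hypothetical dummy non-terminal
            out.append((new_nt, rhs[:2]))     # Z -> B C
            rhs = (new_nt,) + rhs[2:]         # A -> Z D ...
        out.append((lhs, rhs))
    return out


short = binarize([("A", ("B", "C", "D"))])
longer = binarize([("S", ("Aux", "NP", "VP", "PP"))])
```

Rules that already have one or two symbols on the right-hand side pass through unchanged.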

A CFG

Figure 13.8

CYK in Action

• Each non-terminal above POS level has 2 daughters
  – Encode entire parse tree in an (N+1) x (N+1) table
  – Each cell [i,j] contains all non-terminals that span positions [i-j] between input words
  – Cell [0,N] represents all input
  – For each [i,j] such that i<k<j, [i,k] is to the left and [k,j] is below in the table
  – Diagonal contains POS of each input word
  – Fill in table from diagonal on up
  – For any cell [i,j], cells (constituents) contributing to [i,j] are to the left and below, already filled in
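The table-filling order described above can be sketched as a small CKY recognizer in Python. The toy CNF grammar and lexicon below are assumptions made up for this example (they are not the grammar of Figure 13.8), and the function name is hypothetical.

```python
from collections import defaultdict


def cky_recognize(words, lexicon, binary_rules, start="S"):
    """Return (accepted, table); table[(i, j)] holds non-terminals spanning words[i:j]."""
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):
        # Diagonal cell: POS categories of word j-1
        table[(j - 1, j)] |= lexicon.get(words[j - 1], set())
        # Fill column j from the diagonal upward
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):          # split point, i < k < j
                for lhs, (b, c) in binary_rules:
                    if b in table[(i, k)] and c in table[(k, j)]:
                        table[(i, j)].add(lhs)
    return start in table[(0, n)], table


# Toy CNF grammar (assumed for illustration): unit productions already folded in.
LEXICON = {
    "book": {"Verb", "Noun", "Nominal"},
    "that": {"Det"},
    "flight": {"Noun", "Nominal"},
}
BINARY_RULES = [
    ("S", ("Verb", "NP")),
    ("VP", ("Verb", "NP")),
    ("NP", ("Det", "Nominal")),
]

ok, table = cky_recognize("book that flight".split(), LEXICON, BINARY_RULES)
```

This is a recognizer only; turning it into a parser would mean storing, for each non-terminal in a cell, the split point and the two daughters that produced it.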

Figure 13.8

CYK Parse Table

CYK Algorithm

Filling in [0,N]: Adding X2

Filling the Final Column (1)

Filling the Final Column (2)

Earley Algorithm

• Top-down parsing algorithm using DP
• Allows arbitrary CFGs: closer to linguistics
• Fills a chart of length N+1 in a single sweep over input of N words
  – Chart entries represent state of parse at each word position
    • Completed constituents and their locations
    • In-progress constituents
    • Predicted constituents

Parser States

• The table entries are called states and are represented with dotted rules

S -> • VP               A VP is predicted
NP -> Det • Nominal     An NP is in progress
VP -> V NP •            A VP has been found
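A dotted rule is easy to represent as a small record. The sketch below is one possible encoding, not the one used in the slides: the class name `State` and its field names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class State:
    lhs: str                 # left-hand side of the rule
    rhs: Tuple[str, ...]     # right-hand side symbols
    dot: int                 # number of RHS symbols already recognized
    start: int               # where the constituent begins
    end: int                 # how far the dot has progressed in the input

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        """Symbol immediately to the right of the dot, or None if complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} -> {' '.join(symbols)} [{self.start},{self.end}]"


predicted = State("S", ("VP",), 0, 0, 0)
in_progress = State("NP", ("Det", "Nominal"), 1, 1, 2)
found = State("VP", ("V", "NP"), 2, 0, 3)
```

Making the record immutable and hashable means duplicate states can be detected cheaply when adding to a chart entry.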


CFG for Fragment of English

S -> NP VP          VP -> V
S -> Aux NP VP      PP -> Prep NP
S -> VP             N -> book | flight | meal | money
NP -> Det Nom       V -> book | include | prefer
NP -> PropN         Aux -> does
Nom -> N Nom        Prep -> from | to | on
Nom -> N            PropN -> Houston | TWA
Nom -> Nom PP       Det -> that | this | a | the
VP -> V NP

Some Parse States for Book that flight

Filling in the Chart

• March through chart left-to-right
• At each step, apply 1 of 3 operators
  – Predictor
    • Create new states representing top-down expectations
  – Scanner
    • Match word predictions (rule with POS following dot) to words in input
  – Completer
    • When a state is complete, see what rules were looking for that complete constituent

Top Level Earley

Predictor

• Given a state
  – With a non-terminal to right of dot (not a part-of-speech category)
  – Create a new state for each expansion of the non-terminal
  – Put predicted states in same chart cell as generating state, beginning and ending where generating state ends
  – So predictor looking at
    • S -> • VP [0,0]
  – results in
    • VP -> • Verb [0,0]
    • VP -> • Verb NP [0,0]

Scanner

• Given a state
  – With a non-terminal to right of dot that is a POS category
  – If next word in input matches this POS
  – Create a new state with dot moved past the non-terminal
    • E.g., scanner looking at VP -> • Verb NP [0,0]
  – If next word can be a verb, add new state
    • VP -> Verb • NP [0,1]
  – Add this state to chart entry following current one
  – NB: Earley uses top-down input to disambiguate POS -- only POS predicted by some state can be added to chart

Completer

• Given a state
  – Whose dot has reached right end of rule
  – Parser has discovered a constituent over some span of input
  – Find and advance all previous states that are ‘looking for’ this category
  – Copy state, move dot, insert in current chart entry
• E.g., if processing
  – NP -> Det Nominal • [1,3] and if state expecting an NP like VP -> Verb • NP [0,1] in chart
• Add
  – VP -> Verb NP • [0,3] to same cell of chart

Reaching a Final State

• Find an S state in the chart that spans the input from 0 to N (the last of the N+1 chart entries) and is complete
• Declare victory
  – S -> α • [0,N]
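Putting Predictor, Scanner, Completer and the final-state check together, a compact Earley recognizer might look like the sketch below. It is an illustration under stated assumptions, not the pseudocode from the slides: states are plain tuples `(lhs, rhs, dot, origin)`, the dummy start symbol `GAMMA` and the function name are made up, the grammar has no ε-productions, and the scanner advances the dot directly over a POS category as described on the Scanner slide.

```python
def earley_recognize(words, grammar, lexicon, start="S"):
    """grammar: non-terminal -> list of RHS tuples; lexicon: word -> set of POS."""
    pos_tags = set().union(*lexicon.values())
    n = len(words)
    chart = [[] for _ in range(n + 1)]       # N+1 chart entries for N words

    def enqueue(state, k):
        if state not in chart[k]:
            chart[k].append(state)

    enqueue(("GAMMA", (start,), 0, 0), 0)    # dummy start state
    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):             # chart[k] may grow as we go
            lhs, rhs, dot, origin = chart[k][i]
            i += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in pos_tags:
                    # Scanner: next word must be able to have this POS
                    if k < n and nxt in lexicon.get(words[k], set()):
                        enqueue((lhs, rhs, dot + 1, origin), k + 1)
                else:
                    # Predictor: one new state per expansion, starting at k
                    for expansion in grammar.get(nxt, []):
                        enqueue((nxt, expansion, 0, k), k)
            else:
                # Completer: advance states that were looking for lhs
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        enqueue((l2, r2, d2 + 1, o2), k)
    accepted = ("GAMMA", (start,), 1, 0) in chart[n]
    return accepted, chart


# Grammar and lexicon transcribed from the "CFG for Fragment of English" slide.
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom"), ("PropN",)],
    "Nom": [("N", "Nom"), ("N",), ("Nom", "PP")],
    "VP": [("V",), ("V", "NP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {
    "book": {"N", "V"}, "flight": {"N"}, "meal": {"N"}, "money": {"N"},
    "include": {"V"}, "prefer": {"V"}, "does": {"Aux"},
    "from": {"Prep"}, "to": {"Prep"}, "on": {"Prep"},
    "Houston": {"PropN"}, "TWA": {"PropN"},
    "that": {"Det"}, "this": {"Det"}, "a": {"Det"}, "the": {"Det"},
}

accepted, chart = earley_recognize("book that flight".split(), GRAMMAR, LEXICON)
```

Note that the left-recursive rule Nom -> Nom PP causes no trouble here, because a predicted state that is already in the chart entry is not added again.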


Converting from Recognizer to Parser

• Augment the “Completer” to include pointer to each previous (now completed) state
• Read off all the backpointers from every complete S

Cocke-Kasami-Younger Algorithm

bull Convert grammar to Chomsky Normal Formndash Every CFG has a weakly equivalent CNF grammarndash A B C (non-terminals)ndash A w (terminal)ndash Basic ideas

bull Keep rules conforming to CNFbull Introduce dummy non-terminals for rules that mix terminal and non-

terminals (eg A Bw becomes A BBrsquo Brsquo w)bull Rewrite RHS of unit productions with RHS of all non-unit productions

they lead to (eg A B B w becomes A w)bull For RHS longer than 2 non-terminals replace leftmost pairs of non-

terminals with a new non-terminal and add a new production rule (eg A BCD becomes A ZD Z BC)

bull For ε-productions find all occurences of LHS in 2-variable RHSs and create new rule without the LHS (eg C ABA ε becomes CB)

A CFG

Figure 138

CYK in Action

bull Each non-terminal above POS level has 2 daughtersndash Encode entire parse tree in N+1 x N+1 tablendash Each cell [ij] contains all non-terminals that span

positions [i-j] betw input wordsndash Cell [0N] represents all inputndash For each [ij] st iltkltj [ik] is to left and [kj] is

below in tablendash Diagonal contains POS of each input wordndash Fill in table from diagonal on up

ndash For any cell [ij] cells (constituents) contributing to [ij] are to left and below already filled in

Figure 138

CYK Parse Table

X2

CYK Algorithm

Filling in [0N] Adding X2[0n]

Filling the Final Column (1)

Filling the Final Column (2)

X2

Earley Algorithm

• Top-down parsing algorithm using DP
• Allows arbitrary CFGs: closer to linguistics
• Fills a chart of length N+1 in a single sweep over input of N words
  – Chart entries represent state of parse at each word position
    • Completed constituents and their locations
    • In-progress constituents
    • Predicted constituents

Parser States

• The table-entries are called states and are represented with dotted-rules
  S -> • VP                A VP is predicted
  NP -> Det • Nominal      An NP is in progress
  VP -> V NP •             A VP has been found
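One possible way to represent such a state in Python is sketched below. The class name, field names, and the spans attached to the three example states are assumptions for illustration (the slide shows the dotted rules without spans). An ASCII `.` stands in for the dot when printing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    lhs: str      # left-hand side of the rule
    rhs: tuple    # right-hand side symbols
    dot: int      # number of RHS symbols already recognized
    start: int    # input position where the constituent begins
    end: int      # input position reached so far

    def is_complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, ".")  # '.' marks the dot position
        return "%s -> %s [%d,%d]" % (
            self.lhs, " ".join(symbols), self.start, self.end)

predicted = State("S", ("VP",), 0, 0, 0)                # a VP is predicted
in_progress = State("NP", ("Det", "Nominal"), 1, 1, 2)  # an NP is in progress
found = State("VP", ("V", "NP"), 2, 0, 3)               # a VP has been found
for state in (predicted, in_progress, found):
    print(state)
```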

CFG for Fragment of English

S -> NP VP              VP -> V
S -> Aux NP VP          PP -> Prep NP
S -> VP                 N -> book | flight | meal | money
NP -> Det Nom           V -> book | include | prefer
NP -> PropN             Aux -> does
Nom -> N Nom            Prep -> from | to | on
Nom -> N                PropN -> Houston | TWA
Nom -> Nom PP           Det -> that | this | a | the
VP -> V NP

Some Parse States for Book that flight

Filling in the Chart

• March through chart left-to-right
• At each step, apply 1 of 3 operators
  – Predictor
    • Create new states representing top-down expectations
  – Scanner
    • Match word predictions (rule with POS following dot) to words in input
  – Completer
    • When a state is complete, see what rules were looking for that complete constituent

Top Level Earley

Predictor

• Given a state
  – With a non-terminal to right of dot (not a part-of-speech category)
  – Create a new state for each expansion of the non-terminal
  – Put predicted states in same chart cell as generating state, beginning and ending where generating state ends
  – So predictor looking at
    • S -> • VP [0,0]
  – results in
    • VP -> • Verb [0,0]
    • VP -> • Verb NP [0,0]
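The predictor step can be sketched in Python roughly as follows. The `State` tuple, the dictionary grammar, and the list-of-lists chart are assumptions chosen for illustration, not the slides' own pseudocode.

```python
from collections import namedtuple

# A state is a dotted rule plus the span [start, end] it covers.
State = namedtuple("State", "lhs rhs dot start end")

def predictor(state, grammar, chart):
    """Add a state for every expansion of the non-terminal right of the dot."""
    wanted = state.rhs[state.dot]
    for rhs in grammar.get(wanted, []):
        # Predicted states begin and end where the generating state ends.
        new_state = State(wanted, rhs, 0, state.end, state.end)
        if new_state not in chart[state.end]:
            chart[state.end].append(new_state)

grammar = {"S": [("VP",)], "VP": [("Verb",), ("Verb", "NP")]}
chart = [[State("S", ("VP",), 0, 0, 0)]]
predictor(chart[0][0], grammar, chart)
print(chart[0])
```

After the call, `chart[0]` holds the original S state followed by the two predicted VP states from the example above.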

Scanner

• Given a state
  – With a non-terminal to right of dot that is a POS category
  – If next word in input matches this POS
  – Create a new state with dot moved past the non-terminal
    • E.g., scanner looking at VP -> • Verb NP [0,0]
  – If next word can be a verb, add new state
    • VP -> Verb • NP [0,1]
  – Add this state to chart entry following current one
  – NB: Earley uses top-down input to disambiguate POS -- only POS predicted by some state can be added to chart
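A minimal Python sketch of the scanner as described above, which advances the dot directly in the state that predicted the POS. The `State` tuple, the word-to-POS lexicon, and the chart layout are assumptions for illustration.

```python
from collections import namedtuple

State = namedtuple("State", "lhs rhs dot start end")

def scanner(state, words, lexicon, chart):
    """If the next word can be the POS right of the dot, advance the dot."""
    pos = state.rhs[state.dot]
    if state.end < len(words) and pos in lexicon.get(words[state.end], set()):
        new_state = State(state.lhs, state.rhs, state.dot + 1,
                          state.start, state.end + 1)
        # The new state goes into the chart entry following the current one.
        if new_state not in chart[state.end + 1]:
            chart[state.end + 1].append(new_state)

words = ["book", "that", "flight"]
lexicon = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}
chart = [[State("VP", ("Verb", "NP"), 0, 0, 0)], [], [], []]
scanner(chart[0][0], words, lexicon, chart)
print(chart[1])
```

Although "book" can also be a noun, only the predicted Verb reading produces a state, which is the top-down disambiguation noted in the last bullet.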

Completer

• Given a state
  – Whose dot has reached right end of rule
  – Parser has discovered a constituent over some span of input
  – Find and advance all previous states that are 'looking for' this category
  – Copy state, move dot, insert in current chart entry
• E.g., if processing
  – NP -> Det Nominal • [1,3] and if state expecting an NP like VP -> Verb • NP [0,1] in chart
• Add
  – VP -> Verb NP • [0,3] to same cell of chart
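The completer step can be sketched in Python along these lines. The `State` tuple and chart layout are assumptions for illustration, and the chart is pre-filled by hand with the two states from the example.

```python
from collections import namedtuple

State = namedtuple("State", "lhs rhs dot start end")

def completer(done, chart):
    """Advance every state that was waiting for the constituent just completed."""
    # States looking for this category end where the completed one starts.
    for waiting in list(chart[done.start]):
        if (waiting.dot < len(waiting.rhs)
                and waiting.rhs[waiting.dot] == done.lhs):
            # Copy the state, move the dot, insert in the current chart entry.
            advanced = State(waiting.lhs, waiting.rhs, waiting.dot + 1,
                             waiting.start, done.end)
            if advanced not in chart[done.end]:
                chart[done.end].append(advanced)

chart = [[], [State("VP", ("Verb", "NP"), 1, 0, 1)], [], []]
done = State("NP", ("Det", "Nominal"), 2, 1, 3)
chart[3].append(done)
completer(done, chart)
print(chart[3])
```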

Reaching a Final State

• Find an S state in chart that spans input from 0 to N and is complete (the chart has N+1 entries, so this state sits in the last entry, Chart[N])
• Declare victory
  – S -> α • [0,N]

Converting from Recognizer to Parser

• Augment the "Completer" to include pointer to each previous (now completed) state
• Read off all the backpointers from every complete S

Gist of Earley Parsing

1. Predict all the states you can, as soon as you can
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the last chart entry, Chart[N], to see if you have a winner
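Putting the three operators together, the loop above can be sketched as a small Earley recognizer in Python. This is a sketch under stated assumptions: the `State` tuple, the dictionary encoding of the grammar fragment, and the word-to-POS lexicon are chosen for illustration, the scanner advances the dot directly as in the Scanner slide, and ε-productions are not supported.

```python
from collections import namedtuple

State = namedtuple("State", "lhs rhs dot start end")

def earley_recognize(words, grammar, lexicon, start="S"):
    """Return True if a complete start-symbol state spans the whole input."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(state, position):
        if state not in chart[position]:
            chart[position].append(state)

    for rhs in grammar[start]:
        add(State(start, rhs, 0, 0, 0), 0)

    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] grows while it is processed
            state = chart[i][j]
            j += 1
            if state.dot == len(state.rhs):
                # Completer: advance states waiting for this category.
                for waiting in list(chart[state.start]):
                    if (waiting.dot < len(waiting.rhs)
                            and waiting.rhs[waiting.dot] == state.lhs):
                        add(State(waiting.lhs, waiting.rhs, waiting.dot + 1,
                                  waiting.start, i), i)
            else:
                symbol = state.rhs[state.dot]
                if symbol in grammar:
                    # Predictor: expand the non-terminal right of the dot.
                    for rhs in grammar[symbol]:
                        add(State(symbol, rhs, 0, i, i), i)
                elif i < n and symbol in lexicon.get(words[i], set()):
                    # Scanner: the next word matches the predicted POS.
                    add(State(state.lhs, state.rhs, state.dot + 1,
                              state.start, i + 1), i + 1)
    return any(s.lhs == start and s.dot == len(s.rhs) and s.start == 0
               for s in chart[n])

grammar = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom"), ("PropN",)],
    "Nom": [("N", "Nom"), ("N",), ("Nom", "PP")],
    "VP": [("V",), ("V", "NP")],
    "PP": [("Prep", "NP")],
}
lexicon = {
    "book": {"N", "V"}, "flight": {"N"}, "meal": {"N"}, "include": {"V"},
    "does": {"Aux"}, "that": {"Det"}, "this": {"Det"}, "a": {"Det"},
}
print(earley_recognize("book that flight".split(), grammar, lexicon))
```

The duplicate check in `add` is what keeps the left-recursive rule Nom -> Nom PP from predicting forever.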

Example

• Book that flight
• Goal: Find a completed S from 0 to 3
• Chart[0] shows Predictor operations
• Chart[1] S12 shows Scanner
• Chart[3] shows Completer stage

Figure 13.14

Figure 13.14 continued

Final Parse States

Chart Parsing

• CKY and Earley are deterministic: given an input, all actions are taken in a predetermined order
• Chart Parsing allows for flexibility of events via separate policy that determines order of an agenda of states
  – Policy determines order in which states are created and predictions made
  – Fundamental rule: if chart includes 2 contiguous states s.t. one provides a constituent the other needs, a new state spanning the two states is created with the new information
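The agenda idea can be sketched in Python as below. The slides name only the agenda policy and the fundamental rule. The bottom-up way of proposing new rules, the `Edge` tuple, the two policies (`"fifo"` and `"lifo"`), and the toy grammar are assumptions made for illustration.

```python
from collections import deque, namedtuple

Edge = namedtuple("Edge", "lhs rhs dot start end")

def chart_parse(words, grammar, lexicon, policy="fifo"):
    """Agenda-driven chart parser; the policy decides which edge comes next."""
    agenda = deque()
    for i, word in enumerate(words):
        for pos in sorted(lexicon[word]):
            agenda.append(Edge(pos, (word,), 1, i, i + 1))
    chart = []
    while agenda:
        edge = agenda.popleft() if policy == "fifo" else agenda.pop()
        if edge in chart:
            continue
        chart.append(edge)
        if edge.dot == len(edge.rhs):
            # Bottom-up proposal: start every rule whose RHS begins here.
            for lhs, rhs in grammar:
                if rhs[0] == edge.lhs:
                    agenda.append(Edge(lhs, rhs, 0, edge.start, edge.start))
            # Fundamental rule: a waiting edge on the left needs this one.
            for active in chart:
                if (active.dot < len(active.rhs)
                        and active.end == edge.start
                        and active.rhs[active.dot] == edge.lhs):
                    agenda.append(Edge(active.lhs, active.rhs,
                                       active.dot + 1, active.start, edge.end))
        else:
            # Fundamental rule: a complete edge on the right fills the need.
            for done in chart:
                if (done.dot == len(done.rhs) and done.start == edge.end
                        and done.lhs == edge.rhs[edge.dot]):
                    agenda.append(Edge(edge.lhs, edge.rhs,
                                       edge.dot + 1, edge.start, done.end))
    return chart

grammar = [
    ("S", ("NP", "VP")), ("S", ("VP",)), ("NP", ("Det", "Nom")),
    ("Nom", ("N",)), ("VP", ("V",)), ("VP", ("V", "NP")),
]
lexicon = {"book": {"N", "V"}, "that": {"Det"}, "flight": {"N"}}
words = ["book", "that", "flight"]
fifo_chart = chart_parse(words, grammar, lexicon, policy="fifo")
lifo_chart = chart_parse(words, grammar, lexicon, policy="lifo")
print(Edge("S", ("VP",), 1, 0, 3) in fifo_chart)
```

Both policies end with the same set of edges; only the order in which the edges are created differs.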

Summing Up

• Parsing as search: what search strategies to use?
  – Top down
  – Bottom up
  – How to combine?
• How to parse as little as possible
  – Dynamic Programming
  – Different policies for ordering states to be processed
  – Next: Shallow Parsing and Review





  • Basic Parsing with Context-Free Grammars
  • Syntactic Parsing
  • ldquoThe old dog the footsteps of the youngrdquo
  • How do we create this parse tree
  • Parsing is a form of Search
  • Top Down Parsing
  • ldquoThe old dog the footsteps of the youngrdquo (2)
  • Expanding the Rules
  • Bottom Up Parsing
  • ldquoThe old dog the footsteps of the youngrdquo (3)
  • Bottom Up Parsing (2)
  • Issues to Address
  • Dynamic Programming
  • Review Minimal Edit Distance
  • Example of MED Calculation
  • DP for Parsing
  • Parsers Using DP
  • Cocke-Kasami-Younger Algorithm
  • A CFG
  • Figure 138
  • CYK in Action
  • Slide 22
  • Figure 138 (2)
  • CYK Parse Table
  • CYK Algorithm
  • Filling in [0N] Adding X2
  • Filling the Final Column (1)
  • Filling the Final Column (2)
  • Earley Algorithm
  • Parser States
  • CFG for Fragment of English
  • Some Parse States for Book that flight
  • Filling in the Chart
  • Top Level Earley
  • Predictor
  • Scanner
  • Completer
  • Reaching a Final State
  • Converting from Recognizer to Parser
  • Gist of Earley Parsing
  • Example
  • Figure 1314
  • Figure 1314 continued
  • Final Parse States
  • Chart Parsing
  • Summing Up

Figure 138

CYK Parse Table

X2

CYK Algorithm

Filling in [0N] Adding X2[0n]

Filling the Final Column (1)

Filling the Final Column (2)

X2

Earley Algorithm

bull Top-down parsing algorithm using DPbull Allows arbitrary CFGs closer to linguisticsbull Fills a chart of length N+1 in a single sweep over

input of N wordsndash Chart entries represent state of parse at each word

positionbull Completed constituents and their locationsbull In-progress constituentsbull Predicted constituents

29

Parser States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

30

CFG for Fragment of EnglishS NP VP VP V

S Aux NP VP PP -gt Prep NP

S VP N book | flight | meal | money

NP Det Nom V book | include | prefer

NP PropN Aux does

Nom N Nom Prep from | to | on

Nom N PropN Houston | TWA

Nom Nom PP Det that | this | a | the

VP V NP

S8

S9

S10

S11

S13

S12

S8

S9

S8

Some Parse States for Book that flight

Filling in the Chart

bull March through chart left-to-rightbull At each step apply 1 of 3 operators

ndash Predictorbull Create new states representing top-down expectations

ndash Scannerbull Match word predictions (rule with POS following dot)

to words in input

ndash Completerbull When a state is complete see what rules were looking

for that complete constituent

33

Top Level Earley

Predictor

bull Given a statendash With a non-terminal to right of dot (not a part-of-speech

category)ndash Create a new state for each expansion of the non-terminalndash Put predicted states in same chart cell as generating state

beginning and ending where generating state ends ndash So predictor looking at

bull S -gt VP [00] ndash results in

bull VP -gt Verb [00]bull VP -gt Verb NP [00]

35

Scanner

bull Given a statendash With a non-terminal to right of dot that is a POS categoryndash If next word in input matches this POSndash Create a new state with dot moved past the non-terminal

bull Eg scanner looking at VP -gt Verb NP [00]

ndash If next word can be a verb add new statebull VP -gt Verb NP [01]

ndash Add this state to chart entry following current onendash NB Earley uses top-down input to disambiguate POS --

only POS predicted by some state can be added to chart

36

Completer

bull Given a state ndash Whose dot has reached right end of rulendash Parser has discovered a constituent over some span of inputndash Find and advance all previous states that are lsquolooking forrsquo

this categoryndash Copy state move dot insert in current chart entry

bull Eg if processingndash NP -gt Det Nominal [13] and if state expecting an NP like

VP -gt Verb NP [01] in chartbull Add

ndash VP -gt Verb NP [03] to same cell of chart

37

Reaching a Final State

bull Find an S state in chart that spans input from 0 to N+1 and is complete

bull Declare victoryndash S ndashgt α [0N+1]

38

Converting from Recognizer to Parser

• Augment the "Completer" to include pointer to each previous (now completed) state
• Read off all the backpointers from every complete S
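One way to picture the backpointer idea is the sketch below. It assumes a `children` field on each state and stores scanned words as `(POS, word)` pairs; both are illustrative choices, not the slides' own representation.

```python
from typing import NamedTuple

class State(NamedTuple):
    lhs: str
    rhs: tuple
    dot: int
    start: int
    end: int
    children: tuple = ()   # backpointers: completed child states or (POS, word) pairs

def advance(waiting, completed):
    """Completer step that also records a pointer to the completed state."""
    return waiting._replace(dot=waiting.dot + 1, end=completed.end,
                            children=waiting.children + (completed,))

def tree(state):
    """Read the backpointers off a complete state as a nested tuple."""
    return (state.lhs,) + tuple(tree(c) if isinstance(c, State) else c
                                for c in state.children)

# States built by hand for "book that flight" (the Scanner's work is assumed done).
nominal = State("Nominal", ("Noun",), 1, 2, 3, (("Noun", "flight"),))
np_ = advance(State("NP", ("Det", "Nominal"), 1, 1, 2, (("Det", "that"),)), nominal)
vp = advance(State("VP", ("Verb", "NP"), 1, 0, 1, (("Verb", "book"),)), np_)
s = advance(State("S", ("VP",), 0, 0, 0), vp)
print(tree(s))
```

With this representation two states that differ only in their children are distinct chart entries, so an ambiguous sentence yields more than one complete S, each with its own tree.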

Gist of Earley Parsing

1. Predict all the states you can as soon as you can
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the last chart entry (the N+1st, Chart[N]) to see if you have a winner
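Putting the three operators together gives a compact recognizer. The sketch below is one possible arrangement under stated assumptions: the grammar and lexicon are transcribed from the "CFG for Fragment of English" slide, while the `State` tuple, the `add` helper, and the function name `earley_recognize` are hypothetical.

```python
from typing import NamedTuple

class State(NamedTuple):
    lhs: str
    rhs: tuple
    dot: int
    start: int
    end: int

GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom"), ("PropN",)],
    "Nom": [("N", "Nom"), ("N",), ("Nom", "PP")],
    "VP": [("V",), ("V", "NP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {  # POS category -> words
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"Houston", "TWA"},
    "Det": {"that", "this", "a", "the"},
}

def add(state, cell):
    if state not in cell:   # never add the same state twice
        cell.append(state)

def earley_recognize(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]   # N+1 entries for N words
    for rhs in GRAMMAR["S"]:
        add(State("S", rhs, 0, 0, 0), chart[0])
    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):         # chart[i] may grow while we process it
            state = chart[i][j]
            j += 1
            if state.dot == len(state.rhs):            # Completer
                for waiting in list(chart[state.start]):
                    if (waiting.dot < len(waiting.rhs)
                            and waiting.rhs[waiting.dot] == state.lhs):
                        add(waiting._replace(dot=waiting.dot + 1, end=i), chart[i])
            elif state.rhs[state.dot] in LEXICON:      # Scanner
                if i < n and words[i] in LEXICON[state.rhs[state.dot]]:
                    add(state._replace(dot=state.dot + 1, end=i + 1), chart[i + 1])
            else:                                      # Predictor
                nxt = state.rhs[state.dot]
                for rhs in GRAMMAR[nxt]:
                    add(State(nxt, rhs, 0, i, i), chart[i])
    # A winner is a complete S spanning the whole input, found in the last entry.
    return any(s.lhs == "S" and s.start == 0 and s.dot == len(s.rhs)
               for s in chart[n])

print(earley_recognize("book that flight".split()))
print(earley_recognize("book that".split()))
```

The duplicate check in `add` is what lets the left-recursive rule Nom -> Nom PP coexist with top-down prediction without looping.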

Example

• Book that flight
• Goal: Find a completed S from 0 to 3
• Chart[0] shows Predictor operations
• Chart[1]: S12 shows Scanner
• Chart[3] shows Completer stage

Figure 13.14

Figure 13.14 continued

Final Parse States

Chart Parsing

• CKY and Earley are deterministic: given an input, all actions are taken in a predetermined order
• Chart Parsing allows for flexibility of events via a separate policy that determines the order of an agenda of states
  - Policy determines order in which states are created and predictions made
  - Fundamental rule: if chart includes 2 contiguous states s.t. one provides a constituent the other needs, a new state spanning the two states is created with the new information
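The agenda and the fundamental rule can be illustrated with a small sketch. Everything here is an assumption made for the example: the `Edge` tuple, the function names, and the two policies (first-in-first-out versus last-in-first-out) stand in for whatever ordering policy a real chart parser would use.

```python
from collections import deque
from typing import NamedTuple

class Edge(NamedTuple):
    lhs: str
    rhs: tuple
    dot: int
    start: int
    end: int

def fundamental_rule(active, passive):
    """Combine two contiguous edges when one provides what the other needs."""
    if (active.dot < len(active.rhs) and passive.dot == len(passive.rhs)
            and active.rhs[active.dot] == passive.lhs
            and active.end == passive.start):
        return active._replace(dot=active.dot + 1, end=passive.end)
    return None

def run_agenda(initial_edges, policy="fifo"):
    agenda = deque(initial_edges)
    chart = []
    while agenda:
        # The policy only decides which pending edge is processed next.
        edge = agenda.popleft() if policy == "fifo" else agenda.pop()
        if edge in chart:
            continue
        chart.append(edge)
        for other in chart:
            for active, passive in ((edge, other), (other, edge)):
                new = fundamental_rule(active, passive)
                if new is not None and new not in chart:
                    agenda.append(new)
    return chart

edges = [
    Edge("S", ("VP",), 0, 0, 0),
    Edge("VP", ("Verb", "NP"), 1, 0, 1),
    Edge("NP", ("Det", "Nominal"), 1, 1, 2),
    Edge("Nominal", ("Noun",), 1, 2, 3),
]
fifo = run_agenda(edges, "fifo")
lifo = run_agenda(edges, "lifo")
print(set(fifo) == set(lifo), fifo == lifo)
```

On this input both policies end with the same set of edges, including S -> VP . [0,3], but they create them in a different order, which is the flexibility the slide describes.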

Summing Up

• Parsing as search: what search strategies to use?
  - Top down
  - Bottom up
  - How to combine?
• How to parse as little as possible
  - Dynamic Programming
  - Different policies for ordering states to be processed
  - Next: Shallow Parsing and Review

  • Basic Parsing with Context-Free Grammars
  • Syntactic Parsing
  • "The old dog the footsteps of the young"
  • How do we create this parse tree
  • Parsing is a form of Search
  • Top Down Parsing
  • "The old dog the footsteps of the young" (2)
  • Expanding the Rules
  • Bottom Up Parsing
  • "The old dog the footsteps of the young" (3)
  • Bottom Up Parsing (2)
  • Issues to Address
  • Dynamic Programming
  • Review Minimal Edit Distance
  • Example of MED Calculation
  • DP for Parsing
  • Parsers Using DP
  • Cocke-Kasami-Younger Algorithm
  • A CFG
  • Figure 13.8
  • CYK in Action
  • Slide 22
  • Figure 13.8 (2)
  • CYK Parse Table
  • CYK Algorithm
  • Filling in [0,N]: Adding X2
  • Filling the Final Column (1)
  • Filling the Final Column (2)
  • Earley Algorithm
  • Parser States
  • CFG for Fragment of English
  • Some Parse States for Book that flight
  • Filling in the Chart
  • Top Level Earley
  • Predictor
  • Scanner
  • Completer
  • Reaching a Final State
  • Converting from Recognizer to Parser
  • Gist of Earley Parsing
  • Example
  • Figure 13.14
  • Figure 13.14 continued
  • Final Parse States
  • Chart Parsing
  • Summing Up

CYK Parse Table

CYK Algorithm

Filling in [0,N]: Adding X2

Filling the Final Column (1)

Filling the Final Column (2)

Earley Algorithm

• Top-down parsing algorithm using DP
• Allows arbitrary CFGs: closer to linguistics
• Fills a chart of length N+1 in a single sweep over input of N words
  - Chart entries represent state of parse at each word position
    • Completed constituents and their locations
    • In-progress constituents
    • Predicted constituents

Parser States

• The table entries are called states and are represented with dotted rules

S -> . VP                A VP is predicted
NP -> Det . Nominal      An NP is in progress
VP -> V NP .             A VP has been found
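A dotted rule is easy to model as a small record. The sketch below is illustrative only: the class name `State`, its field names, and the span values attached to the three example rules are assumptions, since the slide shows the rules without spans.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    lhs: str      # left-hand side of the rule
    rhs: tuple    # right-hand side symbols
    dot: int      # how many RHS symbols have been recognized so far
    start: int    # where the constituent begins
    end: int      # how far the dot has progressed in the input

    def next_symbol(self):
        """Symbol just right of the dot, or None if the rule is complete."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self):
        return self.dot == len(self.rhs)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, ".")
        return f"{self.lhs} -> {' '.join(symbols)} [{self.start},{self.end}]"

predicted = State("S", ("VP",), 0, 0, 0)                 # a VP is predicted
in_progress = State("NP", ("Det", "Nominal"), 1, 1, 2)   # an NP is in progress
found = State("VP", ("V", "NP"), 2, 0, 3)                # a VP has been found
for state in (predicted, in_progress, found):
    print(state)
```

Making the record frozen keeps states hashable and comparable, which is convenient when a chart entry must not contain the same state twice.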

CFG for Fragment of English

S -> NP VP            VP -> V
S -> Aux NP VP        PP -> Prep NP
S -> VP               N -> book | flight | meal | money
NP -> Det Nom         V -> book | include | prefer
NP -> PropN           Aux -> does
Nom -> N Nom          Prep -> from | to | on
Nom -> N              PropN -> Houston | TWA
Nom -> Nom PP         Det -> that | this | a | the
VP -> V NP


Some Parse States for Book that flight






Earley Algorithm

bull Top-down parsing algorithm using DPbull Allows arbitrary CFGs closer to linguisticsbull Fills a chart of length N+1 in a single sweep over

input of N wordsndash Chart entries represent state of parse at each word

positionbull Completed constituents and their locationsbull In-progress constituentsbull Predicted constituents

29

Parser States

bull The table-entries are called states and are represented with dotted-rules

S -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

30

CFG for Fragment of EnglishS NP VP VP V

S Aux NP VP PP -gt Prep NP

S VP N book | flight | meal | money

NP Det Nom V book | include | prefer

NP PropN Aux does

Nom N Nom Prep from | to | on

Nom N PropN Houston | TWA

Nom Nom PP Det that | this | a | the

VP V NP

S8

S9

S10

S11

S13

S12

S8

S9

S8

Some Parse States for Book that flight

Filling in the Chart

bull March through chart left-to-rightbull At each step apply 1 of 3 operators

ndash Predictorbull Create new states representing top-down expectations

ndash Scannerbull Match word predictions (rule with POS following dot)

to words in input

ndash Completerbull When a state is complete see what rules were looking

for that complete constituent

33

Top Level Earley

Predictor

bull Given a statendash With a non-terminal to right of dot (not a part-of-speech

category)ndash Create a new state for each expansion of the non-terminalndash Put predicted states in same chart cell as generating state

beginning and ending where generating state ends ndash So predictor looking at

bull S -gt VP [00] ndash results in

bull VP -gt Verb [00]bull VP -gt Verb NP [00]

35

Scanner

• Given a state
  – With a non-terminal to right of dot that is a POS category
  – If next word in input matches this POS
  – Create a new state with dot moved past the non-terminal
• E.g., scanner looking at VP -> · Verb NP [0,0]
  – If next word can be a verb, add new state
    • VP -> Verb · NP [0,1]
  – Add this state to chart entry following current one
  – NB: Earley uses top-down predictions to disambiguate POS: only a POS predicted by some state can be added to chart
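A minimal sketch of the Scanner, following the slide's variant in which the dot moves directly past the POS category. The names `State`, `LEXICON`, and `scanner` are assumptions for illustration, and the lexicon is a small hand-made word-to-POS table.

```python
from collections import namedtuple

# Hypothetical state record: rule, dot position, and span [start, end].
State = namedtuple("State", "lhs rhs dot start end")

# Toy lexicon (assumed): each word maps to the POS categories it can take.
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}


def scanner(state, words, lexicon):
    """Move the dot past a POS category if the next input word can have that POS.

    Returns the new state (which belongs in chart entry state.end + 1) or None.
    """
    pos = state.rhs[state.dot]      # POS category to the right of the dot
    j = state.end
    if j < len(words) and pos in lexicon.get(words[j], set()):
        return State(state.lhs, state.rhs, state.dot + 1, state.start, j + 1)
    return None


words = ["book", "that", "flight"]
scanned = scanner(State("VP", ("Verb", "NP"), 0, 0, 0), words, LEXICON)
print(scanned)   # VP -> Verb · NP [0,1]

# "book" is never tried as a Det, because only predicted categories are scanned.
rejected = scanner(State("NP", ("Det", "Nominal"), 0, 0, 0), words, LEXICON)
```

The lookup is driven by the predicted category rather than by the word, which is how the Noun reading of "book" is ignored unless some state actually predicts a Noun at that position.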

Completer

• Given a state
  – Whose dot has reached right end of rule
  – Parser has discovered a constituent over some span of input
  – Find and advance all previous states that are 'looking for' this category
  – Copy state, move dot, insert in current chart entry
• E.g., if processing
  – NP -> Det Nominal · [1,3] and if state expecting an NP like VP -> Verb · NP [0,1] in chart
• Add
  – VP -> Verb NP · [0,3] to the same chart cell as the completed NP
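The Completer can be sketched as a scan over the chart entry where the finished constituent began, copying and advancing every state that was waiting for it. `State`, `completer`, and the dictionary-based `chart` are hypothetical names chosen for this illustration.

```python
from collections import namedtuple

# Hypothetical state record: rule, dot position, and span [start, end].
State = namedtuple("State", "lhs rhs dot start end")


def completer(completed, chart):
    """Advance each state in chart[completed.start] that is looking for completed.lhs.

    The returned copies end where the completed constituent ends, so they
    belong in the same chart entry as `completed`.
    """
    advanced = []
    for waiting in chart[completed.start]:
        if waiting.dot < len(waiting.rhs) and waiting.rhs[waiting.dot] == completed.lhs:
            advanced.append(
                State(waiting.lhs, waiting.rhs, waiting.dot + 1,
                      waiting.start, completed.end)
            )
    return advanced


# Chart entry 1 holds VP -> Verb · NP [0,1], which is waiting for an NP.
chart = {1: [State("VP", ("Verb", "NP"), 1, 0, 1)]}
new_states = completer(State("NP", ("Det", "Nominal"), 2, 1, 3), chart)
print(new_states)   # VP -> Verb NP · [0,3]
```

Only the entry at the completed constituent's start position needs to be searched, since a state can consume the NP only if its own dot sits exactly where the NP begins.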

Reaching a Final State

• Find an S state in the last chart entry (the (N+1)-th, at position N) that spans input from 0 to N and is complete
• Declare victory
  – S -> α · [0,N]

Converting from Recognizer to Parser

• Augment the "Completer" to include a pointer to each previous (now completed) state
• Read off all the backpointers from every complete S
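To illustrate reading a tree off backpointers, the sketch below assumes each state carries a `back` tuple with one entry per recognized right-hand-side symbol: the scanned word for a POS category, or the completed child state otherwise. This layout, the `tree` function, and the hand-built states are assumptions, not the slides' exact bookkeeping.

```python
from collections import namedtuple

# Hypothetical state record extended with a tuple of backpointers.
State = namedtuple("State", "lhs rhs dot start end back")


def tree(state):
    """Follow backpointers from a completed state and return a bracketed tree."""
    children = []
    for symbol, child in zip(state.rhs, state.back):
        if isinstance(child, State):
            children.append(tree(child))            # completed sub-constituent
        else:
            children.append(f"({symbol} {child})")  # POS category over a scanned word
    return f"({state.lhs} {' '.join(children)})"


# Completed states for "book that flight", built by hand for illustration.
nom = State("Nom", ("N",), 1, 2, 3, ("flight",))
np_ = State("NP", ("Det", "Nom"), 2, 1, 3, ("that", nom))
vp = State("VP", ("V", "NP"), 2, 0, 3, ("book", np_))
s = State("S", ("VP",), 1, 0, 3, (vp,))

print(tree(s))
```

With an ambiguous input there would be several complete S states (or several backpointer sets per state), and each one would yield its own tree.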

Gist of Earley Parsing

1. Predict all the states you can as soon as you can
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the last chart entry (the (N+1)-th) to see if you have a winner
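The steps above can be put together into a compact recognizer. This is a sketch under stated assumptions rather than the lecture's own pseudocode: it uses the CFG fragment from the slides, adds a dummy start state `GAMMA -> · S` (a common convention, assumed here), moves the dot directly over POS categories as in the Scanner slide, and assumes the grammar has no empty rules. All identifiers are hypothetical.

```python
from collections import namedtuple

State = namedtuple("State", "lhs rhs dot start end")

GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom"), ("PropN",)],
    "Nom": [("N", "Nom"), ("N",), ("Nom", "PP")],
    "VP": [("V", "NP"), ("V",)],
    "PP": [("Prep", "NP")],
}
LEXICON = {
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
    "Det": {"that", "this", "a", "the"},
}


def earley_recognize(words, grammar=GRAMMAR, lexicon=LEXICON, start="S"):
    """Return (accepted, chart) for a list of words."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]       # N+1 entries for N words

    def add(state, k):
        if state not in chart[k]:            # never store the same state twice
            chart[k].append(state)

    add(State("GAMMA", (start,), 0, 0, 0), 0)
    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):             # chart[k] can grow while we walk it
            st = chart[k][i]
            i += 1
            if st.dot == len(st.rhs):        # Completer
                for w in list(chart[st.start]):
                    if w.dot < len(w.rhs) and w.rhs[w.dot] == st.lhs:
                        add(State(w.lhs, w.rhs, w.dot + 1, w.start, k), k)
            elif st.rhs[st.dot] in grammar:  # Predictor
                for rhs in grammar[st.rhs[st.dot]]:
                    add(State(st.rhs[st.dot], rhs, 0, k, k), k)
            elif k < n and words[k].lower() in lexicon.get(st.rhs[st.dot], ()):
                # Scanner: the next word can have the predicted POS
                add(State(st.lhs, st.rhs, st.dot + 1, st.start, k + 1), k + 1)
    accepted = State("GAMMA", (start,), 1, 0, n) in chart[n]
    return accepted, chart


ok, chart = earley_recognize("book that flight".split())
print(ok)
for k, entry in enumerate(chart):
    print(k, len(entry), "states")
```

Checking for duplicates before adding a state is what keeps the left-recursive rule Nom -> Nom PP from looping forever.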

Example

• Book that flight
• Goal: Find a completed S from 0 to 3
• Chart[0] shows Predictor operations
• Chart[1] S12 shows Scanner
• Chart[3] shows Completer stage

Figure 13.14

Figure 13.14 continued

Final Parse States

Chart Parsing

• CKY and Earley are deterministic: given an input, all actions are taken in a predetermined order
• Chart Parsing allows for flexibility of events via a separate policy that determines the order of an agenda of states
  – Policy determines order in which states are created and predictions made
  – Fundamental rule: if chart includes 2 contiguous states such that one provides a constituent the other needs, a new state spanning the two states is created with the new information
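The fundamental rule and the role of the agenda policy can be sketched as follows. The slides do not fix a prediction strategy, so this illustration assumes a simple bottom-up one; `Edge`, `fundamental_rule`, and `chart_parse` are hypothetical names, and the policy is just the function used to pop the agenda (`deque.pop` for last-in-first-out, `deque.popleft` for first-in-first-out).

```python
from collections import deque, namedtuple

Edge = namedtuple("Edge", "lhs rhs dot start end")

GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom")],
    "Nom": [("N",)],
    "VP": [("V", "NP"), ("V",)],
}
LEXICON = {"N": {"book", "flight"}, "V": {"book"}, "Det": {"that"}}


def fundamental_rule(active, complete):
    """Combine an active edge with an adjacent complete edge that it needs."""
    if (active.dot < len(active.rhs) and complete.dot == len(complete.rhs)
            and active.end == complete.start
            and active.rhs[active.dot] == complete.lhs):
        return Edge(active.lhs, active.rhs, active.dot + 1,
                    active.start, complete.end)
    return None


def chart_parse(words, pop, grammar=GRAMMAR, lexicon=LEXICON):
    """Agenda-driven chart parser; `pop` is the policy that picks the next edge."""
    agenda = deque(
        Edge(pos, (w,), 1, i, i + 1)              # complete lexical edges
        for i, w in enumerate(words)
        for pos, vocab in lexicon.items() if w in vocab
    )
    chart = []
    while agenda:
        edge = pop(agenda)
        if edge in chart:
            continue
        chart.append(edge)
        if edge.dot == len(edge.rhs):             # bottom-up prediction (assumed)
            for lhs, expansions in grammar.items():
                for rhs in expansions:
                    if rhs[0] == edge.lhs:
                        agenda.append(Edge(lhs, rhs, 0, edge.start, edge.start))
        for other in list(chart):                 # fundamental rule, both roles
            for active, complete in ((edge, other), (other, edge)):
                new = fundamental_rule(active, complete)
                if new is not None:
                    agenda.append(new)
    return chart


words = ["book", "that", "flight"]
lifo = chart_parse(words, deque.pop)        # depth-first style policy
fifo = chart_parse(words, deque.popleft)    # breadth-first style policy
print(Edge("S", ("VP",), 1, 0, 3) in lifo, set(lifo) == set(fifo))
```

In this sketch the policy changes only the order in which edges are created; both runs end with the same set of edges in the chart.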

Summing Up

• Parsing as search: what search strategies to use
  – Top down
  – Bottom up
  – How to combine
• How to parse as little as possible
  – Dynamic Programming
  – Different policies for ordering states to be processed
  – Next: Shallow Parsing and Review

  • Basic Parsing with Context-Free Grammars
  • Syntactic Parsing
  • "The old dog the footsteps of the young"
  • How do we create this parse tree?
  • Parsing is a form of Search
  • Top Down Parsing
  • "The old dog the footsteps of the young" (2)
  • Expanding the Rules
  • Bottom Up Parsing
  • "The old dog the footsteps of the young" (3)
  • Bottom Up Parsing (2)
  • Issues to Address
  • Dynamic Programming
  • Review: Minimal Edit Distance
  • Example of MED Calculation
  • DP for Parsing
  • Parsers Using DP
  • Cocke-Kasami-Younger Algorithm
  • A CFG
  • Figure 13.8
  • CYK in Action
  • Slide 22
  • Figure 13.8 (2)
  • CYK Parse Table
  • CYK Algorithm
  • Filling in [0,N]: Adding X2
  • Filling the Final Column (1)
  • Filling the Final Column (2)
  • Earley Algorithm
  • Parser States
  • CFG for Fragment of English
  • Some Parse States for Book that flight
  • Filling in the Chart
  • Top Level Earley
  • Predictor
  • Scanner
  • Completer
  • Reaching a Final State
  • Converting from Recognizer to Parser
  • Gist of Earley Parsing
  • Example
  • Figure 13.14
  • Figure 13.14 continued
  • Final Parse States
  • Chart Parsing
  • Summing Up

Scanner

bull Given a statendash With a non-terminal to right of dot that is a POS categoryndash If next word in input matches this POSndash Create a new state with dot moved past the non-terminal

bull Eg scanner looking at VP -gt Verb NP [00]

ndash If next word can be a verb add new statebull VP -gt Verb NP [01]

ndash Add this state to chart entry following current onendash NB Earley uses top-down input to disambiguate POS --

only POS predicted by some state can be added to chart

36

Completer

bull Given a state ndash Whose dot has reached right end of rulendash Parser has discovered a constituent over some span of inputndash Find and advance all previous states that are lsquolooking forrsquo

this categoryndash Copy state move dot insert in current chart entry

bull Eg if processingndash NP -gt Det Nominal [13] and if state expecting an NP like

VP -gt Verb NP [01] in chartbull Add

ndash VP -gt Verb NP [03] to same cell of chart

37

Reaching a Final State

bull Find an S state in chart that spans input from 0 to N+1 and is complete

bull Declare victoryndash S ndashgt α [0N+1]

38

Converting from Recognizer to Parser

bull Augment the ldquoCompleterrdquo to include pointer to each previous (now completed) state

bull Read off all the backpointers from every complete S

39

Gist of Earley Parsing

1. Predict all the states you can as soon as you can
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the N+1th chart entry (Chart[N]) to see if you have a winner
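The loop above can be put together as a compact recognizer. This is a sketch under assumptions, not the textbook pseudocode: the toy `GRAMMAR` and `LEXICON`, the dummy `GAMMA` start rule, and the scanner variant that advances the dot directly over a POS category are all choices made for illustration. The grammar has no empty rules.

```python
from collections import namedtuple

# Dotted rule lhs -> rhs with the dot before rhs[dot], spanning [start, end].
State = namedtuple("State", "lhs rhs dot start end")

# Hypothetical toy grammar (non-terminals only) and lexicon (word -> POS set).
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",)],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}


def next_symbol(s):
    return s.rhs[s.dot] if s.dot < len(s.rhs) else None


def add(chart, i, s):
    if s not in chart[i]:
        chart[i].append(s)


def recognize(words, start="S"):
    n = len(words)
    chart = [[] for _ in range(n + 1)]
    add(chart, 0, State("GAMMA", (start,), 0, 0, 0))      # dummy start state
    for i in range(n + 1):
        for s in chart[i]:                  # chart[i] may grow while we iterate
            sym = next_symbol(s)
            if sym is None:                 # Completer: dot at right end
                for w in chart[s.start]:
                    if next_symbol(w) == s.lhs:
                        add(chart, i, w._replace(dot=w.dot + 1, end=i))
            elif sym in GRAMMAR:            # Predictor: expand a non-terminal
                for rhs in GRAMMAR[sym]:
                    add(chart, i, State(sym, rhs, 0, i, i))
            elif i < n and sym in LEXICON.get(words[i], ()):   # Scanner: POS matches word
                add(chart, i + 1, s._replace(dot=s.dot + 1, end=i + 1))
    # Winner: the dummy start rule is complete over the whole input in the last entry.
    return State("GAMMA", (start,), 1, 0, n) in chart[n]


print(recognize("book that flight".split()))  # True
print(recognize("that book".split()))         # False
```

Each state is added to a chart entry at most once, which is where the dynamic-programming saving comes from: a constituent found over a span is never re-derived.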


Example

• Book that flight
• Goal: Find a completed S from 0 to 3
• Chart[0] shows Predictor operations
• Chart[1] S12 shows Scanner
• Chart[3] shows Completer stage


Figure 13.14

Figure 13.14 continued

Final Parse States

Chart Parsing

• CKY and Earley are deterministic: given an input, all actions are taken in a predetermined order
• Chart Parsing allows for flexibility of events via a separate policy that determines the order of an agenda of states
  – Policy determines order in which states are created and predictions made
  – Fundamental rule: if chart includes 2 contiguous states such that one provides a constituent the other needs, a new state spanning the two states is created with the new information
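The fundamental rule and the idea of an agenda policy can be sketched separately. The `Edge` tuple, the `fundamental_rule` and `pop_next` helpers, and the two policy names are assumptions made for this illustration; a real chart parser would wire them into a loop that moves edges from the agenda into the chart.

```python
from collections import deque, namedtuple

# Dotted rule lhs -> rhs with the dot before rhs[dot], spanning [start, end].
Edge = namedtuple("Edge", "lhs rhs dot start end")


def fundamental_rule(active, inactive):
    """Combine an incomplete edge with a contiguous complete edge it needs, else None."""
    needs = active.rhs[active.dot] if active.dot < len(active.rhs) else None
    complete = inactive.dot == len(inactive.rhs)
    if complete and needs == inactive.lhs and active.end == inactive.start:
        # New edge spans both: dot advanced, right boundary extended.
        return active._replace(dot=active.dot + 1, end=inactive.end)
    return None


def pop_next(agenda, policy):
    """The policy decides which pending edge is processed next."""
    # Stack order gives depth-first behaviour, queue order gives breadth-first.
    return agenda.pop() if policy == "depth-first" else agenda.popleft()


vp_active = Edge("VP", ("Verb", "NP"), 1, 0, 1)      # VP -> Verb · NP [0,1]
np_complete = Edge("NP", ("Det", "Nominal"), 2, 1, 3)  # NP -> Det Nominal · [1,3]
print(fundamental_rule(vp_active, np_complete))

agenda = deque(["e1", "e2", "e3"])
print(pop_next(agenda, "depth-first"))    # e3
print(pop_next(agenda, "breadth-first"))  # e1
```

Swapping the policy changes only the order in which edges are combined, not the set of edges the rule can eventually produce.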

Summing Up

• Parsing as search: what search strategies to use?
  – Top down
  – Bottom up
  – How to combine?
• How to parse as little as possible
  – Dynamic Programming
  – Different policies for ordering states to be processed
  – Next: Shallow Parsing and Review

