Speech & NLP (Fall 2014): Parsing with Context-Free Grammars and Probabilistic Context-Free Grammars


Speech & NLP

Parsing with Context-Free Grammars &

Probabilistic Context-Free Grammars

Vladimir Kulyukin

www.vkedco.blogspot.com

Outline

● Background
● Chart Parsing with Context-Free Grammars: The Earley Algorithm
● Probabilistic Context-Free Grammars
● Part-of-Speech Tagging

Parts of Speech & Syntactic Categories

Part of Speech vs. Syntactic Category

In NLP, every wordform has a part of speech (e.g., NOUN, PREPOSITION, VERB, DET, etc.)

Syntactic categories are higher-level in that they are composed of parts of speech (e.g., NP → DET NOMINAL)

In context-free grammars for formal languages a part of speech is any variable V that has a production V → t, where t is a terminal

NL Example

Suppose a CFG G has the following productions:

S ::= NP VP
S ::= AUX NP VP
S ::= VP
NP ::= DET NOMINAL
NOMINAL ::= NOUN
NOMINAL ::= NOUN NOMINAL
NP ::= ProperNoun
VP ::= VERB
VP ::= VERB NP

DET ::= this | that | a
NOUN ::= book | flight
VERB ::= book | include
AUX ::= does
PREP ::= from | to | on
ProperNoun ::= Houston

Then S, NP, VP, and NOMINAL are syntactic categories; DET, NOUN, VERB, AUX, PREP, and ProperNoun are parts of speech.

Formal Language Example

Suppose a CFG G has the following productions:

E ::= E MINUS E
E ::= E TIMES E
E ::= E EQUALS E
E ::= E LESS E
E ::= LP E RP
E ::= A
E ::= B
E ::= C

MINUS ::= -
TIMES ::= *
EQUALS ::= =
LESS ::= <
LP ::= (
RP ::= )
A ::= a
B ::= b
C ::= c

Then E is the only syntactic category in G; MINUS, TIMES, EQUALS, LESS, LP, RP, A, B, and C are parts of speech.

Earley Algorithm

General Remarks

The Earley Algorithm (EA) uses a dynamic programming approach to find all possible parse trees of a given input

EA makes a single left-to-right pass that gradually fills an array called a chart (this class of algorithms is known as chart parsers)

Each parse tree that is found can be retrieved from the chart, which makes the algorithm well suited for handling partial inputs

EA can be used as a prediction management framework

Helper Data Structures & Functions

Production {
  LHS; // left-hand side variable
  RHS; // sequence of right-hand side variables and terminals
}

// Suppose that A is a syntactic or part-of-speech category and G is a CFG.
FindProductions(A, G) returns the set of grammar productions { (A → λ) | A is the LHS and λ is the RHS of a production in G }

PartOfSpeech(wordform) returns the set of part-of-speech symbols for wordform (e.g., PartOfSpeech("make") returns { NOUN, VERB })
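As an illustration, here is a minimal Java sketch of these two helpers backed by plain maps. The class and field names (SimpleGrammar, mProductions, mPosMap) are hypothetical and are not part of the CFGrammar implementation shown later in these notes.

import java.util.*;

// A minimal sketch of FindProductions and PartOfSpeech, assuming productions
// and part-of-speech entries are stored in two maps.
public class SimpleGrammar {
    // LHS variable -> list of its right-hand sides
    private final Map<String, List<List<String>>> mProductions = new TreeMap<>();
    // terminal (wordform) -> set of part-of-speech symbols
    private final Map<String, Set<String>> mPosMap = new TreeMap<>();

    public void addProduction(String lhs, List<String> rhs) {
        mProductions.computeIfAbsent(lhs, k -> new ArrayList<>()).add(rhs);
    }

    public void addPartOfSpeech(String wordform, String pos) {
        mPosMap.computeIfAbsent(wordform, k -> new TreeSet<>()).add(pos);
    }

    // FindProductions(A, G): all right-hand sides of productions whose LHS is A
    public List<List<String>> findProductions(String lhs) {
        return mProductions.getOrDefault(lhs, Collections.emptyList());
    }

    // PartOfSpeech(wordform): all part-of-speech symbols for the wordform
    public Set<String> partOfSpeech(String wordform) {
        return mPosMap.getOrDefault(wordform, Collections.emptySet());
    }
}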

NL Example

Suppose a CFG G has the following productions:

S ::= NP VP
S ::= AUX NP VP
S ::= VP
NP ::= DET NOMINAL
NOMINAL ::= NOUN
NOMINAL ::= NOUN NOMINAL
NP ::= ProperNoun
VP ::= VERB
VP ::= VERB NP

DET ::= this | that | a
NOUN ::= book | flight
VERB ::= book | include
AUX ::= does
PREP ::= from | to | on
ProperNoun ::= Houston

Then:

FindProductions(S, G) returns { S ::= NP VP, S ::= AUX NP VP, S ::= VP };
PartOfSpeech(book) returns { NOUN, VERB }

Formal Language Example

Suppose a CFG G has the following productions:

E ::= E MINUS E
E ::= E TIMES E
E ::= E EQUALS E
E ::= E LESS E
E ::= LP E RP
E ::= A
E ::= B
E ::= C

MINUS ::= -
TIMES ::= *
EQUALS ::= =
LESS ::= <
LP ::= (
RP ::= )
A ::= a
B ::= b
C ::= c

Then:

FindProductions(E, G) returns { E ::= E MINUS E, E ::= E TIMES E, E ::= E EQUALS E, E ::= E LESS E, E ::= LP E RP, E ::= A, E ::= B, E ::= C };
PartOfSpeech(a) returns { A };
PartOfSpeech(<) returns { LESS }

Parser States (aka Dotted Rules)

Input & Input Positions

INPUT: “MAKE A LEFT”

Input positions fall between wordforms:

0 “MAKE” 1 “A” 2 “LEFT” 3

Relationship b/w Input & Parser States

(Figure omitted: a production state aligned with the input span it covers.)

A parser state of the form (B → E F * G, i, j) records the current state of a production over the input: i is the input start position, j is the input end position, and the dot (*) marks how much of the RHS has been recognized so far. A state of the form (A → B C D *, 0, N) spans the entire N-wordform input.

Input & Parser State: Examples

INPUT: “MAKE A LEFT”
0 “MAKE” 1 “A” 2 “LEFT” 3

(NP → DET * NOMINAL, 1, 2)
(VP → VERB NP *, 0, 3)
(S → * VP, 0, 0)

Predictions, Scans, & Completions

Prediction

Prediction is the creation of new parser states (aka dotted rules) that predict what is expected to be seen in the input

If a parser state’s production’s right-hand side has the dot to the left of a variable (i.e., non-terminal) that is not a part-of-speech category, prediction is applicable

Prediction generates new parser states that start and end at the end position of the parser state from which they were predicted

Prediction Example

● Suppose a CFG grammar G contains the following productions: S → VP, VP → Verb, VP → Verb NP

● Suppose that there is a parser state (S → * VP, 0, 0)
● Since VP is not a part-of-speech category, prediction applies
● Prediction generates two new parser states from the old parser state (S → * VP, 0, 0):
● 1) (VP → * Verb, 0, 0)
● 2) (VP → * Verb NP, 0, 0)

Prediction Pseudocode

// G is a grammar, chart[] is an array of parser states
Predict((A → α * X β, start, end), G, chart[]) {
  Productions = FindProductions(X, G);
  for each production (X → λ) in Productions {
    // Insert a new parser state (X → * λ, end, end) into chart[end]
    AddToChart((X → * λ, end, end), chart[end]);
  }
}

Example

INPUT: “a * b”
0 “a” 1 “*” 2 “b” 3

Suppose a CFG G has these productions:

E ::= E MINUS E
E ::= E TIMES E
E ::= E EQUALS E
E ::= E LESS E
E ::= LP E RP
E ::= A
E ::= B
E ::= C

MINUS ::= -
TIMES ::= *
EQUALS ::= =
LESS ::= <
LP ::= (
RP ::= )
A ::= a
B ::= b
C ::= c

Starting in chart[0], the predictor generates the following parser states:

1. ((λ → * E), 0, 0) // dummy prediction to start everything going
2. ((E → * E MINUS E), 0, 0) // prediction from E ::= E MINUS E
3. ((E → * E TIMES E), 0, 0) // prediction from E ::= E TIMES E
4. ((E → * E EQUALS E), 0, 0) // prediction from E ::= E EQUALS E
5. ((E → * E LESS E), 0, 0) // prediction from E ::= E LESS E
6. ((E → * LP E RP), 0, 0) // prediction from E ::= LP E RP
7. ((E → * A), 0, 0) // prediction from E ::= A
8. ((E → * B), 0, 0) // prediction from E ::= B
9. ((E → * C), 0, 0) // prediction from E ::= C

Scanning

Scanning advances the input pointer exactly one wordform to the right (or left, as the case may be)

Scanning applies to any parser state whose production has a dot (any other symbol can be used) to the left of a part-of-speech category

Scanning creates a new parser state (NPS) from an old parser state (OPS) by 1) moving the dot in the OPS's production's RHS one symbol to the right; 2) incrementing the OPS's input end position by 1; and 3) placing the NPS in the chart entry that follows the OPS's chart entry

Scanning Example

Suppose the current wordform at position 0 in the input is MAKE

Suppose an old parser state in chart[0] is (VP → * VERB NP, 0, 0)

Since VERB is a part-of-speech category and MAKE is a VERB, scanning applies and generates a new parser state (VP → VERB * NP, 0, 1) that is placed in chart[1]

Scanning Pseudocode

// α, β are possibly empty sequences of variables and terminals
Scan((A → α * X β, start, end), input[], G, chart[]) {
  POSList = PartOfSpeech(input[end]);
  if ( X is in POSList ) {
    AddToChart((X → input[end] *, end, end+1), chart[end+1]);
  }
}

Example

INPUT: “a * b”

CFG G (same as above)

1. ((λ → * E), 0, 0) // dummy prediction to start everything going
2. ((E → * E MINUS E), 0, 0) // prediction from E ::= E MINUS E
3. ((E → * E TIMES E), 0, 0) // prediction from E ::= E TIMES E
4. ((E → * E EQUALS E), 0, 0) // prediction from E ::= E EQUALS E
5. ((E → * E LESS E), 0, 0) // prediction from E ::= E LESS E
6. ((E → * LP E RP), 0, 0) // prediction from E ::= LP E RP
7. ((E → * A), 0, 0) // prediction from E ::= A
8. ((E → * B), 0, 0) // prediction from E ::= B
9. ((E → * C), 0, 0) // prediction from E ::= C
10. ((A → a *), 0, 1) // scanning

Scanning is applicable in state ((E → * A), 0, 0), because A is a part-of-speech category and the input at position 0 is a.

Completion

Completion is the process that indicates the completion of a production's RHS

Completion is applied to any parser state whose dot has reached the end of its production's RHS (i.e., the dot is to the right of the rightmost symbol)

In other words, completion signals that the parser has completed a production (LHS → RHS) over a specific portion of the input

Completion triggers the process of finding all previously created parser states that wait for a specific category (LHS) at the completed parser state's start position

Completion Example

● Suppose that there are two parser states:
● 1) (NP → DET NOMINAL *, 1, 3)
● 2) (VP → VERB * NP, 0, 1)
● Since the dot has reached the end of the 1st production's RHS, completion applies (this parser state is called the completed parser state)
● Completion finds all states that end at position 1 and look for NP; in this case, (VP → VERB * NP, 0, 1) is found
● For each found old parser state (OPS):
  ● a new parser state (NPS) is created by advancing the OPS's dot one symbol to the right and setting the NPS's end position to the completed parser state's end position; in this case, (VP → VERB NP *, 0, 3)
  ● the NPS is placed in chart[completed parser state's end position]

Completion Pseudocode

// α, β, λ are sequences of variables and terminals, G is a grammar
Complete((A → λ *, compStart, compEnd), input[], G, chart[]) {
  for each old parser state ((X → α * A β), oldStart, compStart) in chart[compStart] {
    AddToChart((X → α A * β, oldStart, compEnd), chart[compEnd]);
  }
}

EA Pseudocode

EarleyParse(input[], G, chart[]) {
  AddToChart((λ → * S, 0, 0), chart[0]);
  for i from 0 to length(input[]) {
    for each parser state PS in chart[i] {
      if ( isIncomplete(PS) && !isPartOfSpeech(nextCat(PS)) ) {
        Predict(PS, G, chart[]);
      } else if ( isIncomplete(PS) && isPartOfSpeech(nextCat(PS)) ) {
        Scan(PS, input[], G, chart[]);
      } else {
        Complete(PS, input[], G, chart[]);
      }
    }
  }
}

Example

INPUT: “a * b”

CFG G (same as above)

1) ((λ → * E), 0, 0) // dummy prediction to start everything
2) ((E → * E MINUS E), 0, 0) // prediction
3) ((E → * E TIMES E), 0, 0) // prediction
4) ((E → * E EQUALS E), 0, 0) // prediction
5) ((E → * E LESS E), 0, 0) // prediction
6) ((E → * LP E RP), 0, 0) // prediction
7) ((E → * A), 0, 0) // prediction
8) ((E → * B), 0, 0) // prediction
9) ((E → * C), 0, 0) // prediction
10) ((A → a *), 0, 1) // scanning
11) ((E → A *), 0, 1) // completion of parser state 7
12) ((λ → E *), 0, 1) // completion of parser state 1
13) ((E → E * MINUS E), 0, 1) // completion of parser state 2
14) ((E → E * TIMES E), 0, 1) // completion of parser state 3
15) ((E → E * EQUALS E), 0, 1) // completion of parser state 4
16) ((E → E * LESS E), 0, 1) // completion of parser state 5

Implementation Notes on the Earley Parser

source code is here

CFGSymbol.java

public class CFGSymbol {
    String mSymbolName;

    public CFGSymbol() {
        mSymbolName = "";
    }

    public CFGSymbol(String n) {
        mSymbolName = new String(n);
    }

    public CFGSymbol(CFGSymbol s) {
        mSymbolName = new String(s.mSymbolName);
    }

    public String toString() {
        return mSymbolName;
    }

    public boolean isEqual(CFGSymbol s) {
        if ( s == null ) return false;
        else return this.mSymbolName.equals(s.mSymbolName);
    }
}

CFProduction.java

public class CFProduction {
    int mID;
    CFGSymbol mLHS; // left-hand side
    ArrayList<CFGSymbol> mRHS; // right-hand side

    // production ID defaults to 0
    public CFProduction(CFGSymbol lhs, ArrayList<CFGSymbol> rhs) {
        this.mID = 0;
        this.mLHS = new CFGSymbol(lhs);
        this.mRHS = new ArrayList<CFGSymbol>();
        Iterator<CFGSymbol> iter = rhs.iterator();
        while ( iter.hasNext() ) {
            this.mRHS.add(new CFGSymbol(iter.next()));
        }
    }

    public CFProduction(int id, CFGSymbol lhs, ArrayList<CFGSymbol> rhs) {
        this(lhs, rhs);
        this.mID = id;
    }
}

CFGrammar.java

public class CFGrammar {

    protected ArrayList<String> mVariables; // sorted array of variables
    protected ArrayList<String> mTerminals; // sorted array of terminals
    protected TreeMap<String, ArrayList<CFProduction>> mProductions; // variable-to-productions map, where the variable is the LHS
    protected CFGSymbol mStartSymbol;
    protected ArrayList<CFProduction> mIdToProductionMap; // given an id, find a production
    protected ArrayList<String> mPosVars; // sorted list of parts of speech
    protected TreeMap<String, ArrayList<CFGSymbol>> mTerminalsToPosVarsMap;

    ...
}

CFGrammar.java: Compiling CFG from File

public CFGrammar() {
    mVariables = new ArrayList<String>();
    mTerminals = new ArrayList<String>();
    mProductions = new TreeMap<String, ArrayList<CFProduction>>();
    mIdToProductionMap = new ArrayList<CFProduction>();
    mPosVars = new ArrayList<String>();
    mTerminalsToPosVarsMap = new TreeMap<String, ArrayList<CFGSymbol>>();
    mStartSymbol = null;
}

public CFGrammar(String grammarFilePath) {
    this();
    compileCFGGrammar(grammarFilePath);
}

RecognizerState.java

public class RecognizerState {
    int mRuleNum; // number of the CFG rule that this state tracks
    int mDotPos; // dot position in the RHS of the rule
    int mInputFromPos; // where in the input the rule starts
    int mUptoPos; // where in the input the rule ends
    CFProduction mTrackedRule; // the actual CFG rule being tracked; the rule's number == mRuleNum

    public CFGSymbol nextCat() {
        if ( mDotPos < mTrackedRule.mRHS.size() )
            return mTrackedRule.mRHS.get(mDotPos);
        else
            return null;
    }

    public boolean isComplete() {
        return mDotPos == mTrackedRule.mRHS.size();
    }

    public int getDotPos() { return mDotPos; }
    public int getFromPos() { return mInputFromPos; }
    public int getUptoPos() { return mUptoPos; }
    public CFProduction getCFGRule() { return mTrackedRule; }
    public int getRuleNum() { return mRuleNum; }

    ...
}

ParserState.java

public class ParserState extends RecognizerState {
    static int mCount = 0;
    int mID = 0; // the unique id of this ParserState
    ArrayList<ParserState> mParentParserStates = null; // parent parser states
    String mOrigin = "N"; // can be N (None), S (Scanner), P (Predictor), C (Completer)

    public ParserState(int ruleNum, int dotPos, int fromPos, int uptoPos, CFProduction r) {
        super(ruleNum, dotPos, fromPos, uptoPos, r);
        mID = mCount;
        mCount = mCount + 1;
        mParentParserStates = new ArrayList<ParserState>();
    }

    String getID() { return "PS" + mID; }

    void addPreviousState(ParserState ps) { mParentParserStates.add(ps); }

    // add all previous states of ps into this state
    void addPreviousStatesOf(ParserState ps) {
        if ( ps.mParentParserStates.isEmpty() ) return;
        Iterator<ParserState> iter = ps.mParentParserStates.iterator();
        while ( iter.hasNext() ) {
            addPreviousState(iter.next());
        }
    }

    void setOrigin(String origin) { mOrigin = origin; }
}

Retrieval of Parse Trees

Each parser state is augmented with an additional member variable that points to its parents, i.e., the completed parser states that generated it

The parse tree retrieval algorithm begins with each parser state that spans the entire input and completes a production whose LHS is S

The algorithm recursively retrieves the sub-tree of each parent and then links those sub-trees into the tree of the current parser state
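As a rough illustration (not the author's implementation), here is a minimal Java sketch of this retrieval idea. It assumes the ParserState class above and that mParentParserStates holds the completed states for the constituents of the state's production; the TreeNode and ParseTreeRetriever classes are hypothetical.

import java.util.ArrayList;

// A minimal sketch of recursive parse-tree retrieval from completed states.
class TreeNode {
    String mCategory;                              // LHS of the completed production
    ArrayList<TreeNode> mChildren = new ArrayList<TreeNode>();

    TreeNode(String category) { mCategory = category; }
}

class ParseTreeRetriever {
    // Call this on each completed state whose LHS is S and that spans 0..N.
    TreeNode retrieve(ParserState ps) {
        TreeNode root = new TreeNode(ps.getCFGRule().mLHS.toString());
        // Recursively retrieve the sub-tree of each parent (constituent) state
        // and link it under the current node; a state with no parents is a leaf.
        for (ParserState parent : ps.mParentParserStates) {
            root.mChildren.add(retrieve(parent));
        }
        return root;
    }
}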

Probabilistic Context-Free Grammars

Probabilistic CFGs

PCFGs extend regular CFGs

Productions are assigned probabilities

The probability of a derivation is the product of the probabilities of all productions applied in that derivation

The production probabilities are typically learned by applying various machine learning approaches to very large treebanks (e.g., the Penn Treebank Project at www.cis.upenn.edu/~treebank)

PCFGs have been used in NLP, RNA analysis, and programming language design

Formal Definition

A PCFG G is a 5-tuple (N, Σ, P, S, D), where:

1. N is a set of non-terminal symbols;
2. Σ is a set of terminal symbols;
3. P is a set of productions of the form A → α, where A ∈ N and α ∈ (Σ ∪ N)*;
4. S is the start symbol;
5. D is a function that assigns a probability to each production in P.
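To connect this definition to the Java classes above, a probabilistic production can simply pair a CFProduction with a probability. The class below (PCFProduction) is a hypothetical sketch, not part of the author's code.

import java.util.ArrayList;

// Hypothetical sketch: a CFG production annotated with a probability,
// reusing the CFProduction and CFGSymbol classes shown earlier.
public class PCFProduction extends CFProduction {
    double mProbability; // D(A -> alpha)

    public PCFProduction(CFGSymbol lhs, ArrayList<CFGSymbol> rhs, double p) {
        super(lhs, rhs);
        mProbability = p;
    }

    public double getProbability() { return mProbability; }
}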

Production Probabilities

Each production in a PCFG is of the form A → α [p], where p is the conditional probability P(A → α | A) that the non-terminal A will expand as α.

Production Probabilities

All possible expansions of a non-terminal must sum up to 1. For example:

S → NP VP [.75]
S → Aux NP VP [.15]
S → VP [.10]

Aux → can [.40]
Aux → does [.30]
Aux → do [.30]
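As an illustration of this constraint, the sketch below (hypothetical, not from the slides) checks that the probabilities of all expansions of each non-terminal sum to 1 within a small tolerance, using the example grammar fragment above.

import java.util.*;

// Hypothetical sketch: verify that, for every LHS non-terminal, the
// probabilities of its expansions sum to (approximately) 1.
public class PCFGChecker {
    public static boolean sumsToOne(Map<String, List<Double>> expansionProbs) {
        final double EPSILON = 1e-9;
        for (Map.Entry<String, List<Double>> entry : expansionProbs.entrySet()) {
            double sum = 0.0;
            for (double p : entry.getValue()) sum += p;
            if (Math.abs(sum - 1.0) > EPSILON) {
                System.out.println(entry.getKey() + " sums to " + sum);
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<Double>> probs = new TreeMap<>();
        probs.put("S", Arrays.asList(0.75, 0.15, 0.10));   // S expansions
        probs.put("Aux", Arrays.asList(0.40, 0.30, 0.30)); // Aux expansions
        System.out.println(sumsToOne(probs)); // prints true
    }
}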

PCFG & Disambiguation

Given a PCFG G, we can assign a probability to each parse tree of a given sentence

The parse tree with the largest probability is chosen as the parse of the sentence

PCFG-based disambiguation approaches typically take the winner-takes-all approach: the parse with the highest probability always wins

Probability of a Parse Tree

Let T be a specific parse tree. Then

P(T, S) = ∏ p(r(n)),

where the product is over all nodes n in T, r(n) is the rule used to expand n, and p(r(n)) is the probability of that rule.

Probability of a Parse Tree (Example)

(Parse tree figure for “Can you book TWA flights”: S → Aux NP VP; Aux → Can; NP → Pronoun → you; VP → V NP NP; V → book; NP → PNoun → TWA; NP → Nom → Noun → flights.)

PCFG Fragment:

S → AUX NP VP [.15]
AUX → Can [.40]
NP → Pronoun [.40]
VP → V NP NP [.05]
V → book [.30]
NP → PNoun [.35]
PNoun → TWA [.40]
NP → Nom [.05]
Nom → Noun [.75]
Noun → flights [.50]

P(T, S) = p(S → AUX NP VP) · p(AUX → Can) · p(NP → Pronoun) · p(Pronoun → you) · p(VP → V NP NP) · p(V → book) · p(NP → PNoun) · p(PNoun → TWA) · p(NP → Nom) · p(Nom → Noun) · p(Noun → flights)
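A small hypothetical sketch of this computation follows; since the fragment above does not give a probability for Pronoun → you, a placeholder value of 1.0 stands in for it.

// Hypothetical sketch: the probability of a parse tree is the product of
// the probabilities of the rules used at its nodes.
public class TreeProbability {
    public static double product(double[] ruleProbs) {
        double p = 1.0;
        for (double rp : ruleProbs) p *= rp;
        return p;
    }

    public static void main(String[] args) {
        // Rule probabilities from the PCFG fragment above.
        double[] probs = {
            0.15, // S -> AUX NP VP
            0.40, // AUX -> Can
            0.40, // NP -> Pronoun
            1.0,  // Pronoun -> you (placeholder; not given in the fragment)
            0.05, // VP -> V NP NP
            0.30, // V -> book
            0.35, // NP -> PNoun
            0.40, // PNoun -> TWA
            0.05, // NP -> Nom
            0.75, // Nom -> Noun
            0.50  // Noun -> flights
        };
        System.out.println(product(probs));
    }
}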

Picking Most Probable Parse Tree

Let T(S) be the set of parse trees for S. Then

T̂ = argmax over T in T(S) of P(T | S)
  = argmax over T in T(S) of P(T, S) / P(S)
  = argmax over T in T(S) of P(T, S)
  = argmax over T in T(S) of P(T)

1. The first simplification is possible because P(S) is the same for all T's.
2. The second simplification is possible because P(T, S) = P(T).

Part-of-Speech Tagging with Apache OpenNLP

Installing Apache OpenNLP

Download & unzip the JAR files from http://opennlp.apache.org/cgi-bin/download.cgi

Create a Java project and add the JARs to your project. For example, you can include the following JARs:
  jwnl-1.3.3.jar
  opennlp-maxent-3.0.3.jar
  opennlp-tools-1.5.3.jar
  opennlp-uima-1.5.3.jar

Download the language model files from http://opennlp.sourceforge.net/models-1.5/

Place the model files into a directory

Tokenization of Input

Apache OpenNLP provides a set of tools that can be used to:
- Split text into tokens or sentences
- Do part-of-speech tagging and use the output as the input to a parser

The categories used in OpenNLP POS tagging are listed at http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
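A minimal sketch of tokenization and POS tagging with the OpenNLP 1.5.x API follows; it assumes the model files en-token.bin and en-pos-maxent.bin have been downloaded into a local models/ directory (the paths are assumptions, not part of the slides).

import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

public class PosTagDemo {
    public static void main(String[] args) throws Exception {
        // Load the tokenizer and POS tagger models (paths are assumptions).
        InputStream tokStream = new FileInputStream("models/en-token.bin");
        InputStream posStream = new FileInputStream("models/en-pos-maxent.bin");
        TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokStream));
        POSTaggerME tagger = new POSTaggerME(new POSModel(posStream));

        // Tokenize the input and tag each token with a Penn Treebank POS tag.
        String[] tokens = tokenizer.tokenize("Book a flight from Houston.");
        String[] tags = tagger.tag(tokens);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + "/" + tags[i]);
        }
        tokStream.close();
        posStream.close();
    }
}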

PCFG & Dependency Parsing with Stanford Parser

Stanford Parser

The Stanford Parser is an example of a probabilistic parser

Probabilistic parsers require extensive training, but, once the training is done, they work robustly

Even when they fail (e.g., because of independence assumptions and lexical dependencies), they give you partially completed parse trees that can be put to productive use

The Stanford Parser does both PCFG and dependency parsing
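A minimal sketch of PCFG parsing with the Stanford Parser's Java API follows; the model path and the LexicalizedParser.loadModel/parse calls reflect recent 3.x releases and may differ in other versions, so treat them as assumptions.

import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

public class StanfordParseDemo {
    public static void main(String[] args) {
        // Load the English PCFG model bundled with the parser distribution
        // (path is an assumption; it may differ between releases).
        LexicalizedParser parser = LexicalizedParser.loadModel(
            "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

        // Parse a sentence and print the resulting phrase-structure tree.
        Tree tree = parser.parse("Book a flight from Houston to New York.");
        tree.pennPrint();
    }
}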

References & Reading Suggestions

Earley, J. (1968). An Efficient Context-Free Parsing Algorithm. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA

Ch. 12 in Jurafsky, D. & Martin, J. Speech and Language Processing. Prentice Hall

Hopcroft, J. & Ullman, J. Introduction to Automata Theory, Languages, and Computation. Narosa Publishing House

Stanford Parser: http://nlp.stanford.edu/software/lex-parser.shtml