October 2008 csa3180: Sentence Parsing Algorithms 1

CSA350: NLP Algorithms

Sentence Parsing I

• The Parsing Problem
• Parsing as Search
• Top Down/Bottom Up Parsing Strategies

References

• This lecture is largely based on material found in Jurafsky & Martin chapter 13

Handling Sentences

• Sentence boundary detection.

• Finite state techniques are fine for certain kinds of analysis:
  – named entity recognition
  – NP chunking

• But FS techniques are of limited use when trying to compute grammatical relationships between parts of sentences.

• We need these to get at meanings.

Grammatical Relationships: e.g. subject

Wikipedia definition:

The subject has the grammatical function in a sentence of relating its constituent (a noun phrase) by means of the verb to any other elements present in the sentence, i.e. objects, complements and adverbials.

Grammatical Relationships: e.g. subject

• The dictionary helps me find words.
• Ice cream appeared on the table.
• The man that is sitting over there told me that he just bought a ticket to Tahiti.
• Nothing else is good enough.
• That nothing else is good enough shouldn't come as a surprise.
• To eat six different kinds of vegetables a day is healthy.

Why not use FS techniques for describing NL sentences?

• Descriptive Adequacy
  – Some NL phenomena cannot be described within the FS framework.
  – Example: central embedding
• Notational Efficiency
  – The notation does not facilitate 'factoring out' the similarities.
  – To describe sentences of the form subject-verb-object using an FSA, we must describe possible subjects and possible objects separately, even though almost all phrases that can appear as one can equally appear as the other.

Central Embedding

• The following sentences
  – The cat(1) spat(1)
  – The cat(1) the boy(2) saw(2) spat(1)
  – The cat(1) the boy(2) the girl(3) liked(3) saw(2) spat(1)

• Require at least a grammar of the form S → Aⁿ Bⁿ
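
The nesting pattern above can be sketched as a small definite clause grammar. This is an illustration only: the non-terminal names (`s`, `embedded`, `np`, `n`, `v`) and the tiny lexicon are assumptions, not part of the lecture's grammar.

```prolog
% Centre embedding: n noun phrases followed by exactly n verbs, nested
% like matching brackets (the A^n B^n pattern).
s --> np, embedded, v.

embedded --> [].                    % no further embedding
embedded --> np, embedded, v.       % one more noun phrase/verb pair

np --> [the], n.

% Assumed mini-lexicon covering the three example sentences.
n --> [cat].
n --> [boy].
n --> [girl].
v --> [spat].
v --> [saw].
v --> [liked].

% ?- phrase(s, [the,cat,the,boy,saw,spat]).   % succeeds
% ?- phrase(s, [the,cat,the,boy,spat]).       % fails: one verb is missing
```

Each `np` introduced by `embedded` has to be closed by its own `v`, so the number of noun phrases and verbs always agrees. That is the dependency a finite-state description cannot enforce for unbounded n.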

DCG-style Grammar/Lexicon

% GRAMMAR
s --> np, vp.
s --> aux, np, vp.
s --> vp.
np --> det, nom.
np --> pn.
nom --> noun.
nom --> noun, nom.
nom --> nom, pp.    % left-recursive: can loop under Prolog's top-down execution
pp --> prep, np.
vp --> v.
vp --> v, np.

% LEXICON (categories named det and noun to match the grammar rules)
det --> [that] ; [this] ; [a].
noun --> [book] ; [flight] ; [meal] ; [money].
v --> [book] ; [include] ; [prefer].
aux --> [does].
prep --> [from] ; [to] ; [on].
pn --> ['Houston'] ; ['TWA'].

Definite Clause Grammars

• Prolog based

• LHS --> RHS1, RHS2, ..., {code}.

• s(s(NP,VP)) --> np(NP), vp(VP), {mk_subj(NP)}.

• Rules are translated into an executable Prolog program.

• No clear distinction between rules for grammar and lexicon.
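
A minimal sketch of these points, assuming a standard Prolog with DCG translation and `phrase/2`. The slide does not define `mk_subj`, so the embedded `{...}` code here is a number-agreement check chosen for illustration; the lexical entries are likewise assumptions.

```prolog
% A DCG fragment that builds a parse tree in its first argument and runs
% ordinary Prolog code inside {...}.
s(s(NP,VP))        --> np(NP, Num), vp(VP, Num).
np(np(D,N), Num)   --> d(D), n(N, Num).
vp(vp(V), Num)     --> v(V, Num).

d(d(the))          --> [the].
n(n(W), Num)       --> [W], { noun(W, Num) }.   % lexical lookup as embedded code
v(v(W), Num)       --> [W], { verb(W, Num) }.

% Assumed lexicon, stored as plain facts.
noun(dog, sg).
noun(dogs, pl).
verb(barks, sg).
verb(bark, pl).

% The translation adds two hidden arguments for the input list, roughly:
%   s(s(NP,VP), S0, S) :- np(NP, Num, S0, S1), vp(VP, Num, S1, S).

% ?- phrase(s(T), [the,dog,barks]).
%    T = s(np(d(the),n(dog)),vp(v(barks))).
% ?- phrase(s(_), [the,dogs,barks]).     % fails: number disagreement
```

The rules for `n` and `v` look words up through `{...}` while the rule for `d` lists its word directly, which illustrates why the line between grammar rules and lexicon is not clear-cut in a DCG.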

Parsing Problem

• Given grammar G and sentence A, discover all valid parse trees for G that exactly cover A.

[Figure: parse tree for "book that flight": (S (VP (V book) (NP (Det that) (Nom (N flight)))))]

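The elephant is in the trousers

I shot an elephant in my trousers

[Figure: parse tree with node labels S, NP, VP, NP, PP, NP; the PP "in my trousers" attaches to the NP "an elephant"]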

I was wearing the trousers

I shot an elephant in my trousers

[Figure: parse tree with node labels S, NP, VP, PP, NP; the PP "in my trousers" attaches to the VP]
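
The two readings can be reproduced with a small tree-building DCG. This is a sketch under assumptions: the rule set, the tree shapes and the helper `parses/2` are made up for illustration, and `findall/3` is used to collect every tree that exactly covers the input, as the parsing problem requires.

```prolog
% Assumed grammar fragment with two places where a PP can attach.
s(s(NP,VP))      --> np(NP), vp(VP).

np(np(pro(i)))   --> [i].
np(np(D,N))      --> det(D), n(N).
np(np(D,N,PP))   --> det(D), n(N), pp(PP).    % PP attaches to the noun phrase

vp(vp(V,NP))     --> v(V), np(NP).
vp(vp(V,NP,PP))  --> v(V), np(NP), pp(PP).    % PP attaches to the verb phrase

pp(pp(P,NP))     --> p(P), np(NP).

det(d(an))       --> [an].
det(d(my))       --> [my].
n(n(elephant))   --> [elephant].
n(n(trousers))   --> [trousers].
v(v(shot))       --> [shot].
p(p(in))         --> [in].

% parses(+Words, -Trees): all parse trees that exactly cover Words.
parses(Words, Trees) :-
    findall(T, phrase(s(T), Words), Trees).

% ?- parses([i,shot,an,elephant,in,my,trousers], Trees), length(Trees, N).
%    N = 2.
```

One tree contains `np(d(an),n(elephant),pp(...))` (the elephant is in the trousers) and the other contains `vp(v(shot),np(...),pp(...))` (the shooter is wearing them).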

Parsing as Search

• Search within a space defined by
  – Start State
  – Goal State
  – State to state transformations

• Two distinct parsing strategies:
  – Top down
  – Bottom up

• Different parsing strategy, different state space, different problem.

• N.B. Parsing strategy ≠ search strategy

Top Down

• Each state comprises:
  – a tree
  – an open node
  – an input pointer

• Together these encode the current state of the parse.

• A top-down parser tries to build the tree from the root node S down to the leaves, replacing nodes that carry non-terminal labels with the RHS of corresponding grammar rules.

• Nodes with pre-terminal (word class) labels are compared to input words.
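
A rough sketch of this state space, assuming a toy grammar and lexicon. For brevity the state keeps only the list of open nodes and the remaining input (standing in for the input pointer) and omits the tree, so it recognises rather than parses. The names `td_step/2`, `goal/1`, `search/3`, `combine/4` and `recognise/2` are hypothetical. The same transformations are explored with two different search strategies.

```prolog
:- use_module(library(lists)).   % append/3

% Assumed toy grammar and lexicon.
rule(s,  [np,vp]).
rule(np, [d,n]).
rule(vp, [v]).
rule(vp, [v,np]).
word(d, the).
word(n, dog).
word(n, cat).
word(v, chases).

% td_step(+State, -Next): one top-down transformation.
td_step(state([C|Cs], Input), state(Open, Input)) :-
    rule(C, RHS),                  % replace the leftmost open node by a rule's RHS
    append(RHS, Cs, Open).
td_step(state([C|Cs], [W|Ws]), state(Cs, Ws)) :-
    word(C, W).                    % compare a pre-terminal with the next input word

goal(state([], [])).               % nothing left to expand, all input consumed

% search(+Agenda, +Strategy, -Goal): same state space, different search order.
search([State|_], _, State) :-
    goal(State).
search([State|Rest], Strategy, Goal) :-
    findall(Next, td_step(State, Next), Successors),
    combine(Strategy, Successors, Rest, Agenda),
    search(Agenda, Strategy, Goal).

combine(depth_first,   New, Old, Agenda) :- append(New, Old, Agenda).
combine(breadth_first, New, Old, Agenda) :- append(Old, New, Agenda).

% recognise(+Words, +Strategy): start from the single open node s.
recognise(Words, Strategy) :-
    once(search([state([s], Words)], Strategy, _)).

% ?- recognise([the,dog,chases,the,cat], depth_first).     % succeeds
% ?- recognise([the,dog,chases,the,cat], breadth_first).   % succeeds
% ?- recognise([the,dog,the], depth_first).                % fails
```

Only `combine/4` differs between the two runs, which is one way to see that the parsing strategy (how states are transformed) is separate from the search strategy (the order in which states are visited).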

Top Down Search Space

[Figure: top-down search space, with the start node and the goal node marked]

Bottom Up

• Each state is a forest of trees.

• Start node is a forest of nodes labelled with pre-terminal categories (word classes derived from lexicon)

• Transformations look for places where RHS of rules can fit.

• Any such place is replaced with a node labelled with LHS of rule.
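
The bottom-up transformations can be sketched in the same style. This is an assumption-laden illustration: the forest is reduced to the list of its root labels, the grammar and lexicon are toy examples, and `start_state/2`, `bu_step/2` and `bu_recognise/1` are hypothetical names.

```prolog
:- use_module(library(lists)).   % append/3

% Assumed toy grammar and lexicon.
rule(s,  [np,vp]).
rule(np, [d,n]).
rule(vp, [v]).
rule(vp, [v,np]).
word(d, the).
word(n, dog).
word(n, cat).
word(v, chases).

% start_state(+Words, -Forest): one pre-terminal node per input word.
start_state([], []).
start_state([W|Ws], [C|Cs]) :-
    word(C, W),
    start_state(Ws, Cs).

% bu_step(+Forest, -NewForest): find a place where the RHS of a rule fits
% and replace it with a node labelled with the rule's LHS.
bu_step(Forest, NewForest) :-
    rule(LHS, RHS),
    append(Before, Rest, Forest),
    append(RHS, After, Rest),
    append(Before, [LHS|After], NewForest).

% bu_recognise(+Forest): depth-first search using Prolog backtracking.
bu_recognise([s]).
bu_recognise(Forest) :-
    bu_step(Forest, Next),
    bu_recognise(Next).

% ?- start_state([the,dog,chases,the,cat], F).
%    F = [d,n,v,d,n].
% ?- start_state([the,dog,chases,the,cat], F), once(bu_recognise(F)).   % succeeds
```

Some paths in this space are dead ends. For example, reducing `v` to `vp` too early leads to the forest `[s,np]`, which no rule can rewrite, so the search has to back up.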

Bottom Up Search Space

[Figure: bottom-up search space, including a failed BU derivation]

Top Down vs Bottom Up Search Spaces

• Top down
  – For: space excludes trees that cannot be derived from S
  – Against: space includes trees that are not consistent with the input

• Bottom up
  – For: space excludes states containing trees that cannot lead to input text segments.
  – Against: space includes states containing subtrees that can never lead to an S node.

Top Down Parsing - Remarks

• Top-down parsers do well if there is useful grammar driven control: search can be directed by the grammar.

• Not too many different rules for the same category

• Not too much distance between non-terminal and terminal categories.

• Top-down is unsuitable for rewriting parts of speech (preterminals) with words (terminals). In practice that is always done bottom-up as lexical lookup.

Bottom Up Parsing - Remarks

• It is data-directed: it attempts to parse the words that are there.

• Does well, e.g. for lexical lookup.

• Does badly if there are many rules with similar RHS categories.

• Inefficient when there is great lexical ambiguity (grammar-driven control might help here)

• Empty categories: termination problem unless rewriting of empty constituents is somehow restricted (but then it's generally incomplete)

Basic Parsing Algorithms

• Top Down

• Bottom Up

• see Jurafsky & Martin Ch. 10

Top Down Algorithm

Recoding the Grammar/Lexicon

% Grammar
rule(s,[np,vp]).
rule(np,[d,n]).
rule(vp,[v]).
rule(vp,[v,np]).

% Lexicon
word(d,the).
word(n,dog).
word(n,cat).
word(n,dogs).
word(n,cats).
word(v,chase).
word(v,chases).

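Top Down Depth First Recognition in Prolog

% parse(Category, Input, Rest): Input starts with a phrase of the given
% Category, and Rest is what remains after it.
parse(C,[Word|S],S) :-
    word(C,Word).             % e.g. word(n,cat).

parse(C,S1,S) :-
    rule(C,Cs),               % e.g. rule(s,[np,vp]).
    parse_list(Cs,S1,S).

% parse_list(Categories, Input, Rest): parse each category in turn.
parse_list([],S,S).
parse_list([C|Cs],S1,S) :-
    parse(C,S1,S2),
    parse_list(Cs,S2,S).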
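
With the grammar and lexicon from the previous slide loaded, the recogniser would be called as `?- parse(s,[the,dog,chases,the,cat],[]).` The variant below is a sketch, not part of the lecture: it adds an extra argument so that the same top-down, depth-first procedure also returns a parse tree. The use of `=..` to build tree terms is an assumption about how one might represent trees.

```prolog
% Grammar and lexicon (a subset of the recoded rule/2 and word/2 facts).
rule(s,  [np,vp]).
rule(np, [d,n]).
rule(vp, [v]).
rule(vp, [v,np]).
word(d, the).
word(n, dog).
word(n, cat).
word(v, chases).

% parse(+Category, -Tree, +Input, -Rest)
parse(C, Tree, [Word|S], S) :-
    word(C, Word),
    Tree =.. [C, Word].            % e.g. n(dog)
parse(C, Tree, S1, S) :-
    rule(C, Cs),
    parse_list(Cs, Trees, S1, S),
    Tree =.. [C|Trees].            % e.g. np(d(the), n(dog))

% parse_list(+Categories, -Trees, +Input, -Rest)
parse_list([], [], S, S).
parse_list([C|Cs], [T|Ts], S1, S) :-
    parse(C, T, S1, S2),
    parse_list(Cs, Ts, S2, S).

% ?- parse(s, T, [the,dog,chases,the,cat], []).
%    T = s(np(d(the),n(dog)),vp(v(chases),np(d(the),n(cat)))).
```

The rule `vp --> v` is tried first and leaves `the cat` unconsumed, so Prolog backtracks into the second `vp` rule before the query succeeds.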

Derivation: top down, left-to-right, depth first

Bottom Up Shift/Reduce Algorithm

• Two data structures
  – input string
  – stack

• Repeat until input is exhausted
  – Shift word to stack
  – Reduce stack using grammar and lexicon until no further reductions are possible

• Unlike top down, algorithm does not require category to be specified in advance. It simply finds all possible trees.

Shift/Reduce Operation

(The stack is written with its top at the left.)

Step  Action   Stack       Input
0     (start)              the dog barked
1     shift    the         dog barked
2     reduce   d           dog barked
3     shift    dog d       barked
4     reduce   n d         barked
5     reduce   np          barked
6     shift    barked np
7     reduce   v np
8     reduce   vp np
9     reduce   s

Shift/Reduce Implementation

parse(S,Res) :- sr(S,[],Res).

sr(S,Stk,Res) :-
    shift(Stk,S,NewStk,S1),
    reduce(NewStk,RedStk),
    sr(S1,RedStk,Res).
sr([],Res,Res).

%     stack  sent   nstack  nsent
shift(X,     [H|Y], [H|X],  Y).

reduce(Stk,RedStk) :-
    brule(Stk,Stk2),
    reduce(Stk2,RedStk).
reduce(Stk,Stk).

% grammar
brule([vp,np|X],[s|X]).
brule([n,d|X],[np|X]).
brule([np,v|X],[vp|X]).
brule([v|X],[vp|X]).

% interface to lexicon
brule([Word|X],[C|X]) :- word(C,Word).
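
To try the parser, its clauses can be loaded together with a lexicon. The sketch below repeats them in runnable form; `word(v,barked)` and the helper `recognise/1` are assumptions added here (the recoded lexicon has no entry for "barked", which the traced example needs).

```prolog
parse(S, Res) :- sr(S, [], Res).

sr(S, Stk, Res) :-
    shift(Stk, S, NewStk, S1),
    reduce(NewStk, RedStk),
    sr(S1, RedStk, Res).
sr([], Res, Res).

shift(X, [H|Y], [H|X], Y).

reduce(Stk, RedStk) :-
    brule(Stk, Stk2),
    reduce(Stk2, RedStk).
reduce(Stk, Stk).

% Grammar rules, stored backward.
brule([vp,np|X], [s|X]).
brule([n,d|X],   [np|X]).
brule([np,v|X],  [vp|X]).
brule([v|X],     [vp|X]).
% Interface to the lexicon.
brule([Word|X], [C|X]) :- word(C, Word).

% Lexicon; word(v,barked) is an assumed extra entry.
word(d, the).
word(n, dog).
word(n, cat).
word(v, chases).
word(v, barked).

% recognise(+Words): hypothetical helper. Succeeds if some sequence of
% shifts and reductions leaves exactly [s] on the stack.
recognise(Words) :- parse(Words, [s]).

% ?- recognise([the,dog,barked]).               % succeeds
% ?- parse([the,dog,chases,the,cat], Res).      % first answer: Res = [np,s]
% ?- recognise([the,dog,chases,the,cat]).       % succeeds after backtracking
```

The first answer for the longer sentence is the stack `[np,s]`, because `chases` is reduced to `vp` and then to `s` before the object has been shifted. Asking specifically for `[s]` makes Prolog backtrack over the second `reduce/2` clause until it finds the derivation that delays that reduction.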

Shift/Reduce Operation

• Words are shifted to the beginning of the stack, which ends up in reverse order.

• The reduce step is simplified if we also store the rules backward, so that the rule s → np vp is stored as the fact

brule([vp,np|X],[s|X]).

• The term [a,b|X] matches any list whose first and second elements are a and b respectively.

• The first argument directly matches the stack to which this rule applies.

• The second argument is what the stack becomes after reduction.
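
As an alternative to writing the backward rules by hand, they could be derived from forward `rule/2` facts. The following is only a sketch of that idea; it redefines `brule/2` in terms of `rule/2` and assumes `reverse/2` and `append/3` from the standard list library.

```prolog
:- use_module(library(lists)).   % reverse/2, append/3

% Forward grammar rules, as in the recoded grammar.
rule(s,  [np,vp]).
rule(np, [d,n]).
rule(vp, [v,np]).
rule(vp, [v]).

% brule(+Stack, -NewStack): the reversed RHS of some rule must be a
% prefix of the stack (top first); it is replaced by the rule's LHS.
brule(Stack, [LHS|Rest]) :-
    rule(LHS, RHS),
    reverse(RHS, Reversed),
    append(Reversed, Rest, Stack).

% ?- brule([vp,np], New).        % New = [s]
% ?- brule([n,d,v,np], New).     % New = [np,v,np]
% ?- brule([np,v,np], New).      % New = [vp,np]
```

Storing the facts already reversed, as the slides do, avoids calling `reverse/2` on every reduction attempt; the derived version trades that for a single copy of the grammar.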

Shift Reduce Parser

• Standard implementations do not perform backtracking (e.g. NLTK)

• Only one result is returned even when sentence is ambiguous.

• May fail even when the sentence is grammatical

• Shift/Reduce conflict

• Reduce/Reduce conflict
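
A sketch of what goes wrong without backtracking, assuming a parser that always reduces when it can and commits to the first applicable rule. The predicates `det_parse/2`, `det_sr/3` and `det_reduce/2` are hypothetical names, and this is not a description of how NLTK is implemented.

```prolog
% Backward rules and lexicon, as in the shift/reduce implementation.
brule([vp,np|X], [s|X]).
brule([n,d|X],   [np|X]).
brule([np,v|X],  [vp|X]).
brule([v|X],     [vp|X]).
brule([Word|X],  [C|X]) :- word(C, Word).

word(d, the).
word(n, dog).
word(n, cat).
word(v, chases).

% det_parse(+Words, -Stack): deterministic shift/reduce, no backtracking.
det_parse(Words, Stack) :-
    det_sr(Words, [], Stack).

det_sr([], Stack, Stack).
det_sr([W|Ws], Stack, Result) :-
    det_reduce([W|Stack], Reduced),     % shift W, then reduce greedily
    det_sr(Ws, Reduced, Result).

% det_reduce(+Stack, -Reduced): commit to the first reduction found.
det_reduce(Stack, Reduced) :-
    (   brule(Stack, Stack1)
    ->  det_reduce(Stack1, Reduced)
    ;   Reduced = Stack
    ).

% ?- det_parse([the,dog,chases], S).            % S = [s]
% ?- det_parse([the,dog,chases,the,cat], S).    % S = [np,s]
```

After `chases` is shifted the parser could either reduce `v` to `vp` or shift the object first: a shift/reduce conflict. Always reducing turns `the dog chases` into `s` too early, so the grammatical five-word sentence ends with `[np,s]` on the stack instead of `[s]`.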

Handling Conflicts

• Shift-reduce parsers may employ policies for resolving such conflicts, e.g.

• For Shift/Reduce Conflicts
  – Prefer shift
  – Prefer reduce

• For Reduce/Reduce Conflicts
  – Choose reduction which removes most elements from the stack
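
The reduce/reduce policy can be sketched as follows. The grammar is an assumption chosen so that a conflict actually arises (`np` can be built from `n` alone or from `d n`), and `reduction/3` and `best_reduction/2` are hypothetical helpers.

```prolog
:- use_module(library(lists)).   % reverse/2, append/3, last/2

% Assumed rules; the two np rules compete on a stack with n on top of d.
rule(np, [n]).
rule(np, [d,n]).
rule(vp, [v]).
rule(vp, [v,np]).

% reduction(+Stack, -Removed, -NewStack): one applicable reduction and
% the number of stack elements it removes.
reduction(Stack, Removed, [LHS|Rest]) :-
    rule(LHS, RHS),
    reverse(RHS, Reversed),
    append(Reversed, Rest, Stack),
    length(RHS, Removed).

% best_reduction(+Stack, -NewStack): among all applicable reductions,
% choose the one that removes most elements; fails if none applies.
best_reduction(Stack, NewStack) :-
    findall(Removed-New, reduction(Stack, Removed, New), Candidates),
    Candidates \== [],
    keysort(Candidates, Sorted),       % ascending by elements removed
    last(Sorted, _-NewStack).

% ?- best_reduction([n,d], New).       % New = [np]    (not [np,d])
% ?- best_reduction([np,v], New).      % New = [vp]
% ?- best_reduction([d], New).         % fails: nothing to reduce
```

When two candidates remove the same number of elements, this sketch simply takes the one whose rule is listed last, since `keysort/2` keeps equal keys in their original order. A real parser would need an explicit tie-breaking policy.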

