
Lecture 1, 7/21/2005 Natural Language Processing 1

CS60057 Speech & Natural Language Processing

Autumn 2007

Lecture 12

22 August 2007


LING 180 SYMBSYS 138 Intro to Computer Speech and Language Processing

Lecture 13: Grammar and Parsing (I)

October 24, 2006

Dan Jurafsky

Thanks to Jim Martin for many of these slides!


Outline for Grammar/Parsing Week

- Context-Free Grammars and Constituency
- Some common CFG phenomena for English
  - Sentence-level constructions
  - NP, PP, VP
  - Coordination
  - Subcategorization
- Top-down and Bottom-up Parsing
- Earley Parsing
- Quick sketch of probabilistic parsing


Review

Parts of Speech: basic syntactic/morphological categories that words belong to

Part-of-Speech tagging: assigning parts of speech to all the words in a sentence


Syntax

Syntax: from Greek syntaxis, “setting out together, arrangement”

Refers to the way words are arranged together, and the relationship between them.

Distinction:
- Prescriptive grammar: how people ought to talk
- Descriptive grammar: how they do talk

The goal of syntax is to model the knowledge that people unconsciously have about the grammar of their native language.


Syntax

Why should you care?
- Grammar checkers
- Question answering
- Information extraction
- Machine translation


4 key ideas of syntax

- Constituency (we’ll spend most of our time on this)
- Grammatical relations
- Subcategorization
- Lexical dependencies

Plus one part we won’t have time for: Movement/long-distance dependency


Context-Free Grammars

Capture constituency and ordering:
- Ordering: What are the rules that govern the ordering of words and bigger units in the language?
- Constituency: How words group into units and how the various kinds of units behave


Constituency

Noun phrases (NPs):
- Three parties from Brooklyn
- A high-class spot such as Mindy’s
- The Broadway coppers
- They
- Harry the Horse
- The reason he comes into the Hot Box

How do we know these form a constituent? They can all appear before a verb:
- Three parties from Brooklyn arrive…
- A high-class spot such as Mindy’s attracts…
- The Broadway coppers love…
- They sit…


Constituency (II)

They can all appear before a verb:
- Three parties from Brooklyn arrive…
- A high-class spot such as Mindy’s attracts…
- The Broadway coppers love…
- They sit…

But individual words can’t always appear before verbs:
- *from arrive…
- *as attracts…
- *the is…
- *spot is…

We must be able to state generalizations like: noun phrases occur before verbs.


Constituency (III)

Preposing and postposing:
- On September 17th, I’d like to fly from Atlanta to Denver
- I’d like to fly on September 17th from Atlanta to Denver
- I’d like to fly from Atlanta to Denver on September 17th

But not:
- *On September, I’d like to fly 17th from Atlanta to Denver
- *On I’d like to fly September 17th from Atlanta to Denver


CFG Examples

S -> NP VP
NP -> Det NOMINAL
NOMINAL -> Noun
VP -> Verb
Det -> a
Noun -> flight
Verb -> left
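As a minimal sketch, assume a Python dictionary that maps each non-terminal to its list of right-hand sides. This representation is chosen here for illustration and is not prescribed by the slides. With it, the rules above can run as a synthesis machine that enumerates every string the grammar generates:

```python
from itertools import product

# The toy grammar from the slide: each non-terminal maps to a list of
# right-hand sides; any symbol that is not a key is treated as a terminal.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}

def expand(symbol):
    """Return every terminal string (a tuple of words) derivable from symbol.

    Assumes the grammar is non-recursive, so the set of strings is finite.
    """
    if symbol not in GRAMMAR:              # a terminal derives only itself
        return [(symbol,)]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion choice for each child, left to right.
        for parts in product(*(expand(child) for child in rhs)):
            results.append(tuple(word for part in parts for word in part))
    return results

print([" ".join(words) for words in expand("S")])
```

This toy grammar has no recursion, so its language is finite and contains only "a flight left".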


CFGs

S -> NP VP

This says that:
- there are units called S, NP, and VP in this language
- an S consists of an NP followed immediately by a VP

It doesn’t say that that’s the only kind of S, nor does it say that this is the only place that NPs and VPs occur.


Generativity

As with FSAs and FSTs, you can view these rules as either analysis or synthesis machines:
- Generate strings in the language
- Reject strings not in the language
- Impose structures (trees) on strings in the language


Derivations

A derivation is a sequence of rules applied to a string that accounts for that string:
- it covers all the elements in the string
- it covers only the elements in the string
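As a sketch, take the toy grammar from the "CFG Examples" slide and store it in an assumed Python dictionary, as before. A leftmost derivation of "a flight left" can then be traced by always rewriting the leftmost non-terminal. Each symbol here has only one rule, so there are no choices to make:

```python
# Toy grammar from the "CFG Examples" slide (one right-hand side per symbol).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}

def leftmost_derivation(start="S"):
    """Rewrite the leftmost non-terminal at each step until only words remain.

    Returns the sentential forms, one per step, as space-separated strings.
    """
    form = [start]
    steps = [" ".join(form)]
    while any(sym in GRAMMAR for sym in form):
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][0] + form[i + 1:]   # apply the rule
        steps.append(" ".join(form))
    return steps

for step in leftmost_derivation():
    print(step)
```

The derivation uses seven rewriting steps, one per rule. The final line contains all of the words of the string and only those words.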


Derivations as Trees


CFGs more formally

A context-free grammar has 4 parameters (“is a 4-tuple”):

1) A set of non-terminal symbols (“variables”) $N$

2) A set of terminal symbols $\Sigma$ (disjoint from $N$)

3) A set of productions $P$, each of the form $A \rightarrow \alpha$, where $A$ is a non-terminal and $\alpha$ is a string of symbols from the infinite set of strings $(\Sigma \cup N)^*$

4) A designated start symbol $S$
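The 4-tuple can be written down directly. This sketch assumes a frozen Python dataclass with hypothetical field names. On construction it checks the conditions stated above:
- $N$ and $\Sigma$ are disjoint.
- Every left-hand side is in $N$.
- Every right-hand side is a string over $\Sigma \cup N$.
- $S$ is in $N$.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    """A context-free grammar as the 4-tuple (N, Sigma, P, S)."""
    nonterminals: frozenset   # N
    terminals: frozenset      # Sigma, must be disjoint from N
    productions: tuple        # P: pairs (A, alpha), alpha a tuple over Sigma and N
    start: str                # S, a designated member of N

    def __post_init__(self):
        if self.nonterminals & self.terminals:
            raise ValueError("N and Sigma must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not in N")
            if not set(rhs) <= symbols:
                raise ValueError(f"right-hand side {rhs!r} uses unknown symbols")

# The toy grammar from the "CFG Examples" slide as a 4-tuple.
g = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "NOMINAL", "Noun", "Verb"}),
    terminals=frozenset({"a", "flight", "left"}),
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "NOMINAL")),
        ("NOMINAL", ("Noun",)),
        ("VP", ("Verb",)),
        ("Det", ("a",)),
        ("Noun", ("flight",)),
        ("Verb", ("left",)),
    ),
    start="S",
)
print(len(g.productions), "productions, start symbol", g.start)
```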


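Defining a CF language via derivation

A string A derives a string B if A can be rewritten as B via some series of rule applications.

More formally: if $A \rightarrow \beta$ is a production of $P$, and $\alpha$ and $\gamma$ are any strings in the set $(\Sigma \cup N)^*$, then we say that $\alpha A \gamma$ directly derives $\alpha \beta \gamma$, or $\alpha A \gamma \Rightarrow \alpha \beta \gamma$.

Derivation is a generalization of direct derivation. Let $\alpha_1, \alpha_2, \ldots, \alpha_m$ be strings in $(\Sigma \cup N)^*$, $m \geq 1$, such that $\alpha_1 \Rightarrow \alpha_2, \alpha_2 \Rightarrow \alpha_3, \ldots, \alpha_{m-1} \Rightarrow \alpha_m$. We say that $\alpha_1$ derives $\alpha_m$, or $\alpha_1 \stackrel{*}{\Rightarrow} \alpha_m$.

We then formally define the language $L_G$ generated by grammar $G$ as the set of strings composed of terminal symbols that can be derived from $S$:

$L_G = \{ w \mid w \in \Sigma^* \text{ and } S \stackrel{*}{\Rightarrow} w \}$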


Parsing

Parsing is the process of taking a string and a grammar and returning one (or possibly many) parse trees for that string.


Context?

The notion of context in CFGs has nothing to do with the ordinary meaning of the word context in language.

All it really means is that the non-terminal on the left-hand side of a rule is out there all by itself (free of context)

A -> B C

Means that I can rewrite an A as a B followed by a C regardless of the context in which A is found


Key Constituents (English)

- Sentences
- Noun phrases
- Verb phrases
- Prepositional phrases


Sentence-Types

Declaratives: A plane left (S -> NP VP)

Imperatives: Leave! (S -> VP)

Yes-No Questions: Did the plane leave? (S -> Aux NP VP)

WH Questions: When did the plane leave? (S -> WH Aux NP VP)


NPs

NP -> Pronoun: I came, you saw it, they conquered

NP -> Proper-Noun: Los Angeles is west of Texas; John Hennessy is the president of Stanford

NP -> Det Noun: The president

NP -> Nominal; Nominal -> Noun Noun: A morning flight to Denver


PPs

PP -> Preposition NP
- From LA
- To Boston
- On Tuesday
- With lunch


Recursion

We’ll have to deal with rules such as the following where the non-terminal on the left also appears somewhere on the right (directly).

NP -> NP PP [[The flight] [to Boston]]

VP -> VP PP [[departed Miami] [at noon]]


Recursion

Of course, this is what makes syntax interesting

flights from Denver

Flights from Denver to Miami

Flights from Denver to Miami in February

Flights from Denver to Miami in February on a Friday

Flights from Denver to Miami in February on a Friday under $300

Flights from Denver to Miami in February on a Friday under $300 with lunch


Recursion

Of course, this is what makes syntax interesting

[[flights] [from Denver]]

[[[Flights] [from Denver]] [to Miami]]

[[[[Flights] [from Denver]] [to Miami]] [in February]]

[[[[[Flights] [from Denver]] [to Miami]] [in February]] [on a Friday]]

Etc.
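The nested bracketing above is what repeated application of NP -> NP PP produces when every PP attaches to the NP built so far. A short sketch rebuilds it. The function name attach_pps is hypothetical:

```python
def attach_pps(head, pps):
    """Return the bracketing built by applying NP -> NP PP once per PP, left to right."""
    np = f"[{head}]"
    for pp in pps:
        np = f"[{np} [{pp}]]"      # the NP built so far becomes the left daughter
    return np

pps = ["from Denver", "to Miami", "in February", "on a Friday"]
for k in range(1, len(pps) + 1):
    print(attach_pps("Flights", pps[:k]))
```

This sketch produces only the reading in which every PP attaches to the growing NP. The grammar also licenses other attachments.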


Implications of recursion and context-freeness

If you have a rule like VP -> V NP

It only cares that the thing after the verb is an NP. It doesn’t have to know about the internal affairs of that NP


The Point

VP -> V NP

I hate

flights from Denver

Flights from Denver to Miami

Flights from Denver to Miami in February

Flights from Denver to Miami in February on a Friday

Flights from Denver to Miami in February on a Friday under $300

Flights from Denver to Miami in February on a Friday under $300 with lunch


Bracketed Notation

[S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]
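Labeled bracketings like this one are easy to read back into trees. The sketch below makes two assumptions: parse_brackets is a hypothetical helper, and trees are represented as nested tuples. It parses a balanced version of the bracketing above, with the verb and each NP closed. It raises ValueError when a closing bracket is missing, stray, or extra:

```python
import re

def parse_brackets(text):
    """Parse a labeled bracketing such as "[S [NP ...] [VP ...]]" into nested tuples.

    Each constituent becomes (label, child, ...); words stay plain strings.
    Raises ValueError for a missing, stray, or extra closing bracket.
    """
    tokens = re.findall(r"\[|\]|[^\[\]\s]+", text)
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        if tok == "]":
            raise ValueError("unexpected closing bracket")
        pos += 1
        if tok != "[":
            return tok                              # a word
        if pos >= len(tokens) or tokens[pos] in ("[", "]"):
            raise ValueError("missing label after '['")
        label = tokens[pos]
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != "]":
            children.append(node())
        if pos == len(tokens):
            raise ValueError("missing closing bracket")
        pos += 1                                    # consume "]"
        return (label, *children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("extra material after the tree")
    return tree

example = "[S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]"
print(parse_brackets(example))
```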


Coordination Constructions

S -> S and S: John went to NY and Mary followed him

NP -> NP and NP

VP -> VP and VP

…

In fact the right rule for English is

X -> X and X


Problems

- Agreement
- Subcategorization
- Movement (for want of a better term)


Agreement

This dog Those dogs

This dog eats Those dogs eat

*This dogs *Those dog

*This dog eat *Those dogs eats


Possible CFG Solution

S -> NP VP
NP -> Det Nominal
VP -> V NP
…

SgS -> SgNP SgVP
PlS -> PlNP PlVP
SgNP -> SgDet SgNom
PlNP -> PlDet PlNom
PlVP -> PlV NP
SgVP -> SgV NP
…


CFG Solution for Agreement

It works and stays within the power of CFGs. But it’s ugly, and it doesn’t scale all that well.
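The sketch below shows why the approach doesn't scale. It uses a hypothetical notation in which a trailing "*" marks symbols that must agree, and copies each rule once per combination of feature values:

```python
from itertools import product

# Base rules written once; a trailing "*" marks a symbol that must agree.
BASE_RULES = [
    ("S*", ["NP*", "VP*"]),
    ("NP*", ["Det*", "Nom*"]),
    ("VP*", ["V*", "NP"]),      # the object NP need not agree with the subject
]

def mark(symbol, prefix):
    """Replace the agreement marker on a starred symbol with a feature prefix."""
    return prefix + symbol[:-1] if symbol.endswith("*") else symbol

def expand_agreement(rules, features):
    """Copy every rule once per combination of feature values."""
    expanded = []
    for combo in product(*features):
        prefix = "".join(combo)
        for lhs, rhs in rules:
            expanded.append((mark(lhs, prefix), [mark(s, prefix) for s in rhs]))
    return expanded

number_only = expand_agreement(BASE_RULES, [["Sg", "Pl"]])
for lhs, rhs in number_only:
    print(lhs, "->", " ".join(rhs))
print(len(number_only), len(expand_agreement(BASE_RULES, [["1", "2", "3"], ["Sg", "Pl"]])))
```

With number alone, three base rules become six. Adding a three-way person feature triples that to 18, and every further feature multiplies the grammar again.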


Subcategorization

Sneeze: John sneezed

Find: Please find [a flight to NY]NP

Give: Give [me]NP[a cheaper fare]NP

Help: Can you help [me]NP[with a flight]PP

Prefer: I prefer [to leave earlier]TO-VP

Said: You said [United has a flight]S


Subcategorization

*John sneezed the book

*I prefer United has a flight

*Give with a flight

Subcat expresses the constraints that a predicate (verb for now) places on the number and syntactic types of arguments it wants to take (occur with).
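One way to state these constraints outside the phrase-structure rules is a lexicon of argument frames for each verb. The sketch below uses a hypothetical SUBCAT table that covers only the frames in the examples above. Real verbs usually allow several frames:

```python
# Hypothetical subcategorization lexicon: for each verb, the argument frames
# (sequences of constituent types after the verb) taken from the slide examples.
SUBCAT = {
    "sneeze": [[]],
    "find": [["NP"]],
    "give": [["NP", "NP"]],
    "help": [["NP", "PP"]],
    "prefer": [["TO-VP"]],
    "say": [["S"]],
}

def licenses(verb, args):
    """True if some frame listed for `verb` matches the argument types exactly."""
    return list(args) in SUBCAT.get(verb, [])

print(licenses("sneeze", []))          # John sneezed
print(licenses("sneeze", ["NP"]))      # *John sneezed the book
print(licenses("prefer", ["S"]))       # *I prefer United has a flight
```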


So?

So the various rules for VPs overgenerate: they permit strings containing verbs and arguments that don’t go together. For example, given VP -> V NP,

“sneezed the book” is a VP, since “sneeze” is a verb and “the book” is a valid NP.


Subcategorization

Sneeze: John sneezed

Find: Please find [a flight to NY]NP

Give: Give [me]NP[a cheaper fare]NP

Help: Can you help [me]NP[with a flight]PP

Prefer: I prefer [to leave earlier]TO-VP

Told: I was told [United has a flight]S


Forward Pointer

It turns out that verb subcategorization facts will provide a key element for semantic analysis (determining who did what to whom in an event).


Possible CFG Solution

VP -> V
VP -> V NP
VP -> V NP PP
…

VP -> IntransV
VP -> TransV NP
VP -> TransVwPP NP PP
…


Movement

Core example: My travel agent booked the flight


Movement

Core example: [[My travel agent]NP [booked [the flight]NP]VP]S

I.e. “book” is a straightforward transitive verb. It expects a single NP arg within the VP as an argument, and a single NP arg as the subject.


Movement

What about: Which flight do you want me to have the travel agent book?

The direct object argument to “book” isn’t appearing in the right place. It is in fact a long way from where it’s supposed to appear.

And note that it’s separated from its verb by 2 other verbs.


CFGs: a summary

CFGs appear to be just about what we need to account for a lot of basic syntactic structure in English.

But there are problems:
- They can be dealt with adequately, although not elegantly, by staying within the CFG framework.
- There are simpler, more elegant solutions that take us out of the CFG framework (beyond its formal power): syntactic theories such as HPSG, LFG, CCG, Minimalism, etc.


Other Syntactic stuff

Grammatical Relations

- Subject: I booked a flight to New York; The flight was booked by my agent.
- Object: I booked a flight to New York
- Complement: I said that I wanted to leave


Dependency Parsing

Word-to-word links instead of constituency:
- Based on the European rather than the American tradition, but dates back to the Greeks
- The original notions of Subject, Object, and the progenitor of subcategorization (called ‘valence’) came out of dependency theory.

Dependency parsing is quite popular as a computational model, since relationships between words are quite useful.


Parsing

Parsing: assigning correct trees to input strings
- Correct tree: a tree that covers all and only the elements of the input and has an S at the top
- For now: enumerate all possible trees

A further task, disambiguation, means choosing the correct tree from among all the possible trees.


Parsing: examples using Rion Snow’s visualizer

Stanford parser http://ai.stanford.edu/~rion/parsing/stanford_viz.html

Minipar parser http://ai.stanford.edu/~rion/parsing/minipar_viz.html

Link grammar parser http://ai.stanford.edu/~rion/parsing/linkparser_viz.html


Treebanks

Parsed corpora in the form of trees

Examples:


Parsed Corpora: Treebanks

The Penn Treebank
- The Brown corpus
- The WSJ corpus

Tgrep: http://www.ldc.upenn.edu/ldc/online/treebank/


Parsing

As with everything of interest, parsing involves a search which involves the making of choices

We’ll start with some basic (meaning bad) methods before moving on to the one or two that you need to know


For Now

Assume…
- You have all the words already in some buffer
- The input isn’t POS tagged
- We won’t worry about morphological analysis
- All the words are known


Top-Down Parsing

Since we’re trying to find trees rooted with an S (sentence), start with the rules that give us an S.

Then work your way down from there to the words.
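A minimal top-down, depth-first, left-to-right recognizer can be written with Python generators. The sketch assumes a hypothetical toy grammar, not taken from these slides, with no left recursion and no empty rules. It always expands the leftmost goal first, tries rules in the order listed, and backtracks on failure:

```python
# Hypothetical toy grammar (not from the slides): no left recursion, no empty rules.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "Det": [["that"], ["this"], ["a"]],
    "Noun": [["book"], ["flight"], ["meal"]],
    "Verb": [["book"], ["include"], ["prefer"]],
    "Aux": [["does"]],
    "ProperNoun": [["Houston"]],
}

def expand(symbol, words, i):
    """Yield (tree, j) for every way `symbol` can cover words[i:j], trying rules in order."""
    if symbol not in GRAMMAR:                  # a word: it must match the next input word
        if i < len(words) and words[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in GRAMMAR[symbol]:                # top-down: expand the goal with each rule
        for children, j in expand_seq(rhs, words, i):
            yield (symbol, *children), j

def expand_seq(symbols, words, i):
    """Yield (children, j) for a sequence of goals, always expanding the leftmost first."""
    if not symbols:
        yield (), i
        return
    for first, j in expand(symbols[0], words, i):          # depth-first on the leftmost goal
        for rest, k in expand_seq(symbols[1:], words, j):  # then the remaining goals
            yield (first, *rest), k

def parses(sentence):
    """Return every tree rooted in S that covers all and only the words."""
    words = sentence.split()
    return [tree for tree, j in expand("S", words, 0) if j == len(words)]

print(parses("book that flight"))
```

Only one analysis of "book that flight" survives, with "book" as a verb. The parser also builds a tree for S -> VP -> Verb alone, but it is discarded because it leaves two words uncovered.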


Top Down Space


Bottom-Up Parsing

Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way.

Then work your way up from there.
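A bottom-up counterpart can be sketched as an exhaustive search over reductions. Start from the words and repeatedly replace any substring that matches a rule's right-hand side with its left-hand side. The search succeeds if the input reduces to S alone. The grammar is a hypothetical toy grammar restated as (lhs, rhs) pairs. This sketch illustrates the search space; it is not an efficient parser:

```python
# Hypothetical toy grammar as (lhs, rhs) pairs; words appear as right-hand sides.
RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nominal")), ("NP", ("ProperNoun",)),
    ("Nominal", ("Noun",)), ("Nominal", ("Noun", "Nominal")),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")),
    ("Det", ("that",)), ("Det", ("this",)), ("Det", ("a",)),
    ("Noun", ("book",)), ("Noun", ("flight",)), ("Noun", ("meal",)),
    ("Verb", ("book",)), ("Verb", ("include",)), ("Verb", ("prefer",)),
    ("Aux", ("does",)), ("ProperNoun", ("Houston",)),
]

def bottom_up_recognize(sentence):
    """Return True if the words can be reduced, rule by rule, to the single symbol S."""
    start = tuple(sentence.split())
    stack, seen = [start], {start}
    while stack:
        form = stack.pop()                     # depth-first over partial analyses
        if form == ("S",):
            return True
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(form) - n + 1):
                if form[i:i + n] == rhs:       # a right-hand side matches here: reduce it
                    new = form[:i] + (lhs,) + form[i + n:]
                    if new not in seen:        # avoid revisiting the same analysis
                        seen.add(new)
                        stack.append(new)
    return False

print(bottom_up_recognize("book that flight"))
```

The visited set keeps the search finite. Reductions never lengthen the sequence, so there are only finitely many sequences to explore.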


Bottom-Up Space


Control

Of course, in both cases we left out how to keep track of the search space and how to make choices:
- Which node to try to expand next
- Which grammar rule to use to expand a node


Top-Down, Depth-First, Left-to-Right Search


Example



Control

Does this sequence make any sense?


Top-Down and Bottom-Up

Top-down
- Only searches for trees that can be answers (i.e., S’s)
- But also suggests trees that are not consistent with the words

Bottom-up
- Only forms trees consistent with the words
- But suggests trees that make no sense globally


So Combine Them

There are a million ways to combine top-down expectations with bottom-up data to get more efficient searches

Most use one kind as the control and the other as a filter, as in top-down parsing with bottom-up filtering.
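Bottom-up filtering is commonly implemented with a left-corner table, which lists for each non-terminal the categories that can begin it. The sketch below makes two assumptions: a hypothetical toy grammar over parts of speech, and that the part of speech of the next word is already known:

```python
# Hypothetical toy grammar over parts of speech (the lexicon is left out).
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}

def left_corners(grammar):
    """Map each non-terminal to every symbol that can begin a string it derives."""
    corners = {a: {rhs[0] for rhs in rules} for a, rules in grammar.items()}
    changed = True
    while changed:                                   # repeat until nothing new is added
        changed = False
        for a, cs in corners.items():
            for b in list(cs):
                new = corners.get(b, set()) - cs
                if new:
                    cs |= new
                    changed = True
    return corners

def rules_worth_trying(grammar, corners, lhs, next_pos):
    """Bottom-up filter: keep rules whose first symbol can start with next_pos."""
    return [rhs for rhs in grammar[lhs]
            if rhs[0] == next_pos or next_pos in corners.get(rhs[0], set())]

corners = left_corners(GRAMMAR)
print(rules_worth_trying(GRAMMAR, corners, "S", "Aux"))
print(rules_worth_trying(GRAMMAR, corners, "S", "Det"))
```

For a sentence starting with "does" (an Aux), only S -> Aux NP VP is worth trying. The top-down parser never explores the NP VP or VP expansions.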


Bottom-Up Filtering


Top-Down, Depth-First, Left-to-Right Search


TopDownDepthFirstLeftToRight (II)


TopDownDepthFirstLeftToRight (III)



TopDownDepthFirstLeftToRight (IV)



Adding Bottom-Up Filtering


3 problems with TDDFLtR Parser

- Left-recursion
- Ambiguity
- Inefficient reparsing of subtrees


Left-Recursion

What happens in the following situation?

S -> NP VP
S -> Aux NP VP
NP -> NP PP
NP -> Det Nominal
…

with the sentence starting with

Did the flight…
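Consider a naive top-down parser that expands NP with NP -> NP PP. It calls itself on NP again without consuming any input, so it never terminates. Left-recursive non-terminals can be found ahead of time with a left-corner closure. This sketch uses the rules listed above and assumes there are no empty productions:

```python
def left_recursive(grammar):
    """Return the non-terminals A that can derive a string beginning with A itself.

    Assumes no empty (epsilon) productions, so only the first symbol of each
    right-hand side can be a left corner.
    """
    corners = {a: {rhs[0] for rhs in rules if rhs} for a, rules in grammar.items()}
    changed = True
    while changed:                      # transitive closure of the left-corner relation
        changed = False
        for a, cs in corners.items():
            reachable = set().union(*(corners.get(b, set()) for b in cs))
            if not reachable <= cs:
                cs |= reachable
                changed = True
    return {a for a, cs in corners.items() if a in cs}

# The rules from the slide (Det, Nominal, PP, etc. are left unexpanded).
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP": [["NP", "PP"], ["Det", "Nominal"]],
}
print(left_recursive(GRAMMAR))
```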


Ambiguity

One morning I shot an elephant in my pajamas. How he got into my pajamas I don’t know. (Groucho Marx)


Lots of ambiguity

VP -> VP PP
NP -> NP PP

Show me the meal on flight 286 from SF to Denver: 14 parses!


Lots of ambiguity

Church and Patil (1982):
- The number of parses for such sentences grows at the rate of the number of parenthesizations of arithmetic expressions,
- which grow with the Catalan numbers: $C(n) = \frac{1}{n+1}\binom{2n}{n}$

PPs   Parses
1     2
2     5
3     14
4     42
5     132
6     429
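The Catalan numbers are cheap to compute exactly. This sketch assumes that a phrase ending in n PPs has C(n+1) attachment structures. That assumption gives C(4) = 14 for the three PPs in the flight example:

```python
from math import comb

def catalan(n):
    """C(n) = (1 / (n + 1)) * binom(2n, n), computed exactly with integers."""
    return comb(2 * n, n) // (n + 1)

for pps in range(1, 7):
    print(pps, catalan(pps + 1))
```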


Avoiding Repeated Work

Parsing is hard, and slow. It’s wasteful to redo stuff over and over and over.

Consider an attempt to top-down parse the following as an NP

A flight from Indi to Houston on TWA


Dynamic Programming

We need a method that fills a table with partial results that:
- does not do (avoidable) repeated work
- does not fall prey to left-recursion
- can find all the pieces of an exponential number of trees in polynomial time
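The CKY algorithm is one such method. The outline names Earley parsing; CKY is a different dynamic-programming parser, shown here only because it is short. This sketch assumes a hypothetical grammar in Chomsky normal form. Each table cell is computed once from smaller spans and then reused. The left-recursive rule NP -> NP PP causes no trouble:

```python
# Hypothetical grammar in Chomsky normal form: binary rules A -> B C plus a
# lexicon mapping each word to the categories that can cover it directly.
BINARY = [
    ("S", "NP", "VP"),
    ("S", "Verb", "NP"),          # imperative S -> VP with VP -> Verb NP folded in
    ("VP", "Verb", "NP"),
    ("NP", "Det", "Nominal"),
    ("NP", "NP", "PP"),           # left-recursive, which a table-filling parser handles
    ("Nominal", "Nominal", "PP"),
    ("PP", "Prep", "NP"),
]
LEXICON = {
    "book": {"Verb", "Noun", "Nominal"},
    "that": {"Det"},
    "flight": {"Noun", "Nominal"},
    "to": {"Prep"},
    "Houston": {"NP"},
}

def cky_recognize(words, start="S"):
    """Return True if `start` can span the whole input (CKY recognition)."""
    n = len(words)
    # table[i][j] holds every category that can cover words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICON.get(word, ()))
    for length in range(2, n + 1):              # shorter spans are filled first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):           # every split point of words[i:j]
                for lhs, left, right in BINARY:
                    if left in table[i][k] and right in table[k][j]:
                        table[i][j].add(lhs)
    return start in table[0][n]

print(cky_recognize("book that flight to Houston".split()))
```

The three nested loops over spans and split points make the work cubic in the sentence length. The naive search explored earlier has no such bound.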


Possible improvements

In bigram POS tagging, we condition a tag only on the preceding tag. Why not:
- use more context (e.g., use a trigram model)
  - more precise: “is clearly marked” --> verb, past participle; “he clearly marked” --> verb, past tense
  - combine trigram, bigram, and unigram models
- condition on words too

But with an n-gram approach, this is too costly (too many parameters to model).
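Combining the trigram, bigram, and unigram models is often done by linear interpolation. The sketch below assumes fixed, hypothetical weights; in practice these are usually tuned on held-out data. It does not condition on words:

```python
from collections import Counter

def train(tag_sequences):
    """Count tag unigrams, bigrams, trigrams and the histories they condition on."""
    counts = {k: Counter() for k in ("uni", "bi", "tri", "hist1", "hist2")}
    total = 0
    for tags in tag_sequences:
        padded = ["<s>", "<s>"] + list(tags)   # pad so the first tags have a history
        for i in range(2, len(padded)):
            t2, t1, t = padded[i - 2], padded[i - 1], padded[i]
            counts["uni"][t] += 1
            counts["bi"][(t1, t)] += 1
            counts["tri"][(t2, t1, t)] += 1
            counts["hist1"][t1] += 1            # times t1 was the previous tag
            counts["hist2"][(t2, t1)] += 1      # times (t2, t1) were the previous two tags
            total += 1
    return counts, total

def p_interp(counts, total, t2, t1, t, lambdas=(0.6, 0.3, 0.1)):
    """P(t | t2, t1) as a weighted mix of trigram, bigram and unigram estimates."""
    l3, l2, l1 = lambdas                        # hypothetical weights that sum to 1
    h2, h1 = counts["hist2"][(t2, t1)], counts["hist1"][t1]
    p3 = counts["tri"][(t2, t1, t)] / h2 if h2 else 0.0
    p2 = counts["bi"][(t1, t)] / h1 if h1 else 0.0
    p1 = counts["uni"][t] / total if total else 0.0
    return l3 * p3 + l2 * p2 + l1 * p1

counts, total = train([["DT", "NN", "VBD"], ["DT", "NN", "VBZ"]])
print(p_interp(counts, total, "DT", "NN", "VBD"))
```

When the trigram history was never seen, its term drops to zero. The bigram and unigram estimates still give the tag some probability.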

