
CPSC 503
Computational Linguistics

Lecture 8
Giuseppe Carenini


Today 1/10

• Finish POS tagging
• Start Syntax / Parsing (Chp 12!)


Evaluating Taggers

• Accuracy: percent correct (most current taggers 96-97%) *test on unseen data!*

• Human ceiling: agreement rate of humans on the classification (96-97%)

• Unigram baseline: assign each token to the class it occurred in most frequently in the training set (race -> NN). (91%)

• What is causing the errors? Build a confusion matrix…


Confusion matrix

• Precision?
• Recall?
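Not from the original slides: as a concrete illustration, here is a minimal Python sketch of how a tagger's confusion matrix, accuracy, and per-tag precision/recall might be computed from aligned gold and predicted tag sequences. The toy data and tag choices below are made up for illustration.

```python
from collections import Counter

def confusion_matrix(gold, predicted):
    """Count (gold_tag, predicted_tag) pairs over aligned tag sequences."""
    return Counter(zip(gold, predicted))

def precision_recall(matrix, tag):
    """Precision and recall for one tag, read off the confusion matrix."""
    tp = matrix[(tag, tag)]
    predicted_as_tag = sum(c for (g, p), c in matrix.items() if p == tag)
    gold_is_tag = sum(c for (g, p), c in matrix.items() if g == tag)
    precision = tp / predicted_as_tag if predicted_as_tag else 0.0
    recall = tp / gold_is_tag if gold_is_tag else 0.0
    return precision, recall

# Hypothetical toy data: gold tags vs. a tagger's output, token by token.
gold      = ["NN", "VBD", "NN", "JJ", "NNP", "VBN"]
predicted = ["NN", "VBN", "NN", "NN", "NN",  "VBN"]

matrix = confusion_matrix(gold, predicted)
accuracy = sum(c for (g, p), c in matrix.items() if g == p) / sum(matrix.values())
print("accuracy:", accuracy)                  # fraction of tokens tagged correctly
print("NN:", precision_recall(matrix, "NN"))  # precision and recall for the NN tag
```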


Error Analysis (textbook)

• Look at a confusion matrix

• See what errors are causing problems
  – Noun (NN) vs. Proper Noun (NNP) vs. Adj (JJ)
  – Past tense (VBD) vs. Past Participle (VBN)


Knowledge-Formalisms Map (next three lectures)

[Diagram relating formalisms to the levels of linguistic knowledge they model:]
• State Machines (and prob. versions) (Finite State Automata, Finite State Transducers, Markov Models) -> Morphology
• Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars) -> Syntax
• Logical formalisms (First-Order Logics) -> Semantics
• AI planners -> Pragmatics, Discourse and Dialogue


Today 1/10

• Finish POS tagging
• English Syntax
• Context-Free Grammar for English
  – Rules
  – Trees
  – Recursion
  – Problems

• Start Parsing


Syntax
Def. The study of how sentences are formed by grouping and ordering words

Example: Ming and Sue prefer morning flights

* Ming Sue flights morning and prefer

Groups behave as a single unit with respect to substitution, movement, and coordination


Syntax: Useful tasks

• Why should you care?
  – Grammar checkers
  – Basis for semantic interpretation
    • Question answering
    • Information extraction
    • Summarization
  – Machine translation
  – ……


Key Constituents – with heads (English)
• Noun phrases: (Det) N (PP)
• Verb phrases: (Qual) V (NP)
• Prepositional phrases: (Deg) P (NP)
• Adjective phrases: (Deg) A (PP)
• Sentences: (NP) (I) (VP)

Some simple specifiers
Category      Typical function       Examples
Determiner    specifier of N         the, a, this, no..
Qualifier     specifier of V         never, often..
Degree word   specifier of A or P    very, almost..

Complements?
(Specifier) X (Complement)


Key Constituents: Examples
• Noun phrases: (Det) N (PP), e.g., the cat on the table
• Verb phrases: (Qual) V (NP), e.g., never eat a cat
• Prepositional phrases: (Deg) P (NP), e.g., almost in the net
• Adjective phrases: (Deg) A (PP), e.g., very happy about it
• Sentences: (NP) (I) (VP), e.g., a mouse -- ate it


Context Free Grammar (Example)

• S -> NP VP
• NP -> Det NOMINAL
• NOMINAL -> Noun
• VP -> Verb
• Det -> a
• Noun -> flight
• Verb -> left

Start-symbol: S. Non-terminals: S, NP, NOMINAL, VP, Det, Noun, Verb. Terminals: a, flight, left.
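Not from the original slides: the toy grammar above can be written down and tested directly, for example with NLTK's CFG and chart parser (a minimal sketch, assuming NLTK is installed).

```python
import nltk

# The toy grammar from the slide, written in NLTK's rule syntax.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det NOMINAL
NOMINAL -> Noun
VP -> Verb
Det -> 'a'
Noun -> 'flight'
Verb -> 'left'
""")

# Parse the one sentence this grammar generates and draw its tree.
parser = nltk.ChartParser(grammar)
for tree in parser.parse("a flight left".split()):
    tree.pretty_print()
```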


CFG: More Complex Example

[Slide shows a larger grammar with an example phrase for each rule, plus a lexicon]


CFGs
• Define a Formal Language (grammatical vs. ungrammatical sentences)
• Generative Formalism
  – Generate strings in the language
  – Reject strings not in the language
  – Impose structures (trees) on strings in the language


CFG: Formal Definitions

• A 4-tuple (non-terminals, terminals, productions, start symbol)

• G = (N, Σ, P, S)

• P is a set of rules A -> β, where A ∈ N and β ∈ (N ∪ Σ)*

• A derivation is the process of rewriting β1 into βm (both strings in (N ∪ Σ)*) by applying a sequence of rules: β1 ⇒* βm

• L(G) = {w | w ∈ Σ* and S ⇒* w}
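Not from the original slides: to make the derivation relation concrete, here is a small Python sketch that prints a leftmost derivation under the toy grammar above (the rule table simply restates that grammar; since each non-terminal has exactly one rule, no choices arise).

```python
# A leftmost derivation S =>* "a flight left" under the toy grammar.
# Each step rewrites the leftmost non-terminal using its production.
RULES = {
    "S": ["NP", "VP"],
    "NP": ["Det", "NOMINAL"],
    "NOMINAL": ["Noun"],
    "VP": ["Verb"],
    "Det": ["a"],
    "Noun": ["flight"],
    "Verb": ["left"],
}

def leftmost_derivation(start="S"):
    sentential_form = [start]
    steps = [" ".join(sentential_form)]
    while any(sym in RULES for sym in sentential_form):
        i = next(i for i, sym in enumerate(sentential_form) if sym in RULES)
        sentential_form = sentential_form[:i] + RULES[sentential_form[i]] + sentential_form[i + 1:]
        steps.append(" ".join(sentential_form))
    return steps

print("\n=> ".join(leftmost_derivation()))
# S => NP VP => Det NOMINAL VP => a NOMINAL VP => ... => a flight left
```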


Derivations as Trees

[Figure: a derivation displayed as a parse tree, with Nominal nodes dominating the word “flight”]

Context Free?


CFG Parsing

• It is completely analogous to running a finite-state transducer with a tape
  – It’s just more powerful

• Chpt. 13

[Figure: the word sequence “I prefer a morning flight” goes into a Parser, which outputs a parse tree with Nominal nodes over “flight”]


Other Options
• Regular languages (FSA): A -> xB or A -> x
  – Too weak (e.g., cannot deal with recursion in a general way – no center-embedding)
• CFGs: A -> γ (also produce more understandable and “useful” structure)
• Context-sensitive: αAβ -> αγβ, with γ ≠ ε
  – Can be computationally intractable
• Turing equivalent: α -> β, with α ≠ ε
  – Too powerful / computationally intractable


Common Sentence-Types
• Declaratives: A plane left
  S -> NP VP
• Imperatives: Leave!
  S -> VP
• Yes-No Questions: Did the plane leave?
  S -> Aux NP VP
• WH Questions: Which flights serve breakfast?
  S -> WH NP VP
  When did the plane leave?
  S -> WH Aux NP VP


NP: more details
NP -> Specifiers N Complements

• NP -> (Predet) (Det) (Card) (Ord) (Quant) (AP) Nom
  e.g., all the other cheap cars

• Nom -> Nom PP (PP) (PP)
  e.g., reservation on BA456 from NY to YVR
• Nom -> Nom GerundVP
  e.g., flight arriving on Monday
• Nom -> Nom RelClause
  RelClause -> (who | that) VP
  e.g., flight that arrives in the evening


Conjunctive Constructions
• S -> S and S
  – John went to NY and Mary followed him
• NP -> NP and NP
  – John went to NY and Boston
• VP -> VP and VP
  – John went to NY and visited MOMA
• …
• In fact the right rule for English is
  X -> X and X


Problems with CFGs

• Agreement

• Subcategorization


Agreement
• In English,
  – Determiners and nouns have to agree in number
  – Subjects and verbs have to agree in person and number
• Many languages have agreement systems that are far more complex than this (e.g., gender).


Agreement

• This dog          *This dogs
• Those dogs        *Those dog
• This dog eats     *This dog eat
• You have it       *You has it
• Those dogs eat    *Those dogs eats


Possible CFG Solution

OLD Grammar
• S -> NP VP
• NP -> Det Nom
• VP -> V NP
• …

NEW Grammar
• SgS -> SgNP SgVP
• PlS -> PlNP PlVP
• SgNP -> SgDet SgNom
• PlNP -> PlDet PlNom
• PlVP -> PlV NP
• SgVP3p -> SgV3p NP
• …

Sg = singular, Pl = plural
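Not from the original slides: a minimal sketch of this “multiply out the categories” fix as an NLTK grammar (assumes NLTK is installed; the rule names and lexicon are illustrative). A sentence that violates agreement simply receives no parse.

```python
import nltk

# Number is compiled into the category names (SgNP, PlNP, ...): still a plain
# CFG, but every NP/VP rule is duplicated for each value of the feature.
grammar = nltk.CFG.fromstring("""
S -> SgNP SgVP | PlNP PlVP
SgNP -> SgDet SgNom
PlNP -> PlDet PlNom
SgNom -> SgN
PlNom -> PlN
SgVP -> SgV
PlVP -> PlV
SgDet -> 'this'
PlDet -> 'those'
SgN -> 'dog'
PlN -> 'dogs'
SgV -> 'eats'
PlV -> 'eat'
""")

parser = nltk.ChartParser(grammar)
print(len(list(parser.parse("this dog eats".split()))))   # 1 -- agreement respected
print(len(list(parser.parse("this dogs eat".split()))))   # 0 -- *this dogs is rejected
```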


CFG Solution for Agreement

• It works and stays within the power of CFGs

• But it doesn’t scale all that well (explosion in the number of rules)


Subcategorization

• *John sneezed the book
• *I prefer United has a flight
• *Give with a flight

• Def. It expresses constraints that a predicate (verb here) places on the number and type of its arguments (see first table)


Subcategorization

• Sneeze: John sneezed

• Find: Please find [a flight to NY]NP

• Give: Give [me]NP [a cheaper fare]NP

• Help: Can you help [me]NP [with a flight]PP

• Prefer: I prefer [to leave earlier]TO-VP

• Told: I was told [United has a flight]S

• …


So?

• So the various rules for VPs overgenerate.
  – They allow strings containing verbs and arguments that don’t go together
  – For example:
    • VP -> V NP, therefore “sneezed the book”
    • VP -> V S, therefore “go she will go there”


Possible CFG Solution

OLD Grammar
• VP -> V
• VP -> V NP
• VP -> V NP PP
• …

NEW Grammar
• VP -> IntransV
• VP -> TransV NP
• VP -> TransPPto NP PPto
• …
• TransPPto -> hand, give, ..

This solution has the same problem as the one for agreement
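Not from the original slides: the same trick, sketched for subcategorization in NLTK (assumes NLTK; the verb classes and lexicon are illustrative). Splitting VP rules by verb class blocks *sneezed the book while still allowing intransitive sneezed.

```python
import nltk

# Verbs are split into subcategories so each combines only with the argument
# pattern it licenses (the category names and lexicon are illustrative).
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'John' | Det N
Det -> 'the'
N -> 'book'
VP -> IntransV
VP -> TransV NP
IntransV -> 'sneezed'
TransV -> 'found'
""")

parser = nltk.ChartParser(grammar)
print(len(list(parser.parse("John sneezed".split()))))           # 1 parse
print(len(list(parser.parse("John sneezed the book".split()))))  # 0 -- *sneezed the book
print(len(list(parser.parse("John found the book".split()))))    # 1 parse
```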


CFG for NLP: summary
• CFGs cover most syntactic structure in English.
• But there are problems (overgeneration)
  – That can be dealt with adequately, although not elegantly, by staying within the CFG framework.
• There are simpler, more elegant solutions that take us out of the CFG framework: LFG, XTAGs… Chpt 15 “Features and Unification”


Dependency Grammars
• Syntactic structure: binary relations between words
• Links: grammatical function or very general semantic relation
• Abstract away from word-order variations (simpler grammars)
• Useful features in many NLP applications (for classification, summarization and NLG)
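Not from the original slides: a quick way to see such word-to-word links is a modern dependency parser, e.g. spaCy (a sketch; assumes spaCy and its en_core_web_sm model are installed).

```python
import spacy

# Each token is linked to its syntactic head by a labelled relation.
nlp = spacy.load("en_core_web_sm")
doc = nlp("I prefer a morning flight")
for token in doc:
    print(f"{token.text:10} --{token.dep_}--> {token.head.text}")
# e.g. "I" is the nominal subject (nsubj) of "prefer",
# and "flight" is its object.
```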


Today 2/10

• English Syntax
• Context-Free Grammar for English
  – Rules
  – Trees
  – Recursion
  – Problems
• Start Parsing (if time left)


Parsing with CFGs

Assign valid trees: covers all and only the elements of the input and has an S at the top

[Figure: a CFG and the word sequence “I prefer a morning flight” go into a Parser, which outputs the valid parse trees]


Parsing as Search

CFG:
• S -> NP VP
• S -> Aux NP VP
• NP -> Det Noun
• VP -> Verb
• Det -> a
• Noun -> flight
• Verb -> left, arrive
• Aux -> do, does

The CFG defines the search space of possible parse trees.

Parsing: find all trees that cover all and only the words in the input


Constraints on Search

[Figure: the CFG (defining the search space) and the word sequence “I prefer a morning flight” go into a Parser, which outputs the valid parse trees]

Search Strategies:
• Top-down or goal-directed
• Bottom-up or data-directed


Top-Down Parsing
• Since we’re trying to find trees rooted with an S (Sentences), start with the rules that give us an S.
• Then work your way down from there to the words.

[Figure: partial trees expanded top-down from S, over the input “flight”]


Next step: Top Down Space

• When POS categories are reached, reject trees whose leaves fail to match all words in the input



Bottom-Up Parsing
• Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way.
• Then work your way up from there.

[Figure: partial trees built bottom-up from the input word “flight”]


Two more steps: Bottom-Up Space

[Figure: two further layers of the bottom-up search space over the input “flight”]


Top-Down vs. Bottom-Up
• Top-down
  – Only searches for trees that can be answers
  – But suggests trees that are not consistent with the words
• Bottom-up
  – Only forms trees consistent with the words
  – Suggests trees that make no sense globally
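Not from the original slides: NLTK ships simple top-down and bottom-up parsers, so the two strategies can be compared directly (a sketch; assumes NLTK is installed; the grammar is a made-up fragment kept free of left recursion, which the naive top-down parser cannot handle).

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> Aux NP VP
S -> NP VP
NP -> Det Noun
VP -> Verb NP
Det -> 'a' | 'this'
Noun -> 'flight' | 'meal'
Verb -> 'include'
Aux -> 'does'
""")

sent = "does this flight include a meal".split()

# Top-down (goal-directed): expand from S and try to reach the words;
# it can propose expansions whose leaves never match the input.
top_down = nltk.RecursiveDescentParser(grammar)
print(list(top_down.parse(sent)))

# Bottom-up (data-directed): shift words and reduce them into constituents;
# it only builds structure licensed by the words, but some pieces may never
# fit into a sentence.
bottom_up = nltk.ShiftReduceParser(grammar)
print(list(bottom_up.parse(sent)))
```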


So Combine Them
• Top-down: control strategy to generate trees
• Bottom-up: to filter out inappropriate parses

Top-down control strategy:
• Depth-first vs. breadth-first
• Which node to try to expand next (left-most)
• Which grammar rule to use to expand a node (textual order)


Top-Down, Depth-First, Left-to-Right Search

Sample sentence: “Does this flight include a meal?”

[Figures, over three slides: successive snapshots of the top-down, depth-first, left-to-right search for “Does this flight include a meal?”]

Adding Bottom-up Filtering

The following sequence was a waste of time because an NP cannot generate a parse tree starting with an Aux

[Figure: the discarded NP expansions, each stranded over the initial Aux]


Bottom-Up Filtering

Category    Left Corners
S           Det, Proper-Noun, Aux, Verb
NP          Det, Proper-Noun
Nominal     Noun
VP          Verb
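Not from the original slides: a sketch of how such a left-corner table can be derived mechanically from a grammar, using NLTK's grammar objects (the grammar fragment is illustrative, so the output differs slightly from the table above and also lists phrasal categories that can stand as left corners).

```python
import nltk
from nltk.grammar import Nonterminal

# An illustrative grammar fragment (not the exact one behind the table above).
grammar = nltk.CFG.fromstring("""
S -> NP VP | Aux NP VP | VP
NP -> Det Nominal | ProperNoun
Nominal -> Noun | Nominal Noun
VP -> Verb | Verb NP
Det -> 'a' | 'this'
Noun -> 'flight' | 'meal'
ProperNoun -> 'Houston'
Verb -> 'include' | 'left'
Aux -> 'does'
""")

def left_corners(grammar):
    """Map each non-terminal to every category that can start it (transitive closure)."""
    corners = {prod.lhs(): set() for prod in grammar.productions()}
    changed = True
    while changed:
        changed = False
        for prod in grammar.productions():
            first = prod.rhs()[0]
            if not isinstance(first, Nonterminal):
                continue  # a terminal first symbol adds no category
            new = {first} | corners.get(first, set())
            if not new <= corners[prod.lhs()]:
                corners[prod.lhs()] |= new
                changed = True
    return corners

for category, corns in sorted(left_corners(grammar).items(), key=lambda kv: str(kv[0])):
    print(f"{category}: {', '.join(sorted(map(str, corns)))}")
```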


Problems with TD-BU-filtering

• Ambiguity
• Repeated Parsing

• SOLUTION: Earley Algorithm (once again dynamic programming!)
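Not from the original slides: NLTK exposes an Earley-style chart parser, so the dynamic-programming solution can be tried directly (a sketch; assumes NLTK is installed; the grammar fragment is illustrative).

```python
import nltk
from nltk.parse import EarleyChartParser

# Illustrative grammar fragment; a chart parser handles the left-recursive
# rule (Nominal -> Nominal Noun) without looping.
grammar = nltk.CFG.fromstring("""
S -> NP VP | Aux NP VP
NP -> Det Nominal
Nominal -> Noun | Nominal Noun
VP -> Verb | Verb NP
Det -> 'a' | 'this'
Noun -> 'flight' | 'meal'
Verb -> 'include'
Aux -> 'does'
""")

# The chart stores every partial constituent (dotted rule) once, so subtrees
# are reused rather than re-parsed on each backtrack -- dynamic programming.
parser = EarleyChartParser(grammar)
for tree in parser.parse("does this flight include a meal".split()):
    print(tree)
```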


For Next Time

• Read Chapter 13 (Parsing)
• Optional: Read Chapter 16 (Features and Unification) – skip algorithms and implementation


Grammars and Constituency
• Of course, there’s nothing easy or obvious about how we come up with the right set of constituents and the rules that govern how they combine...
• That’s why there are so many different theories of grammar and competing analyses of the same data.
• The approach to grammar, and the analyses, adopted here are very generic (and don’t correspond to any modern linguistic theory of grammar).


Syntactic Notions so far...
• N-grams: prob. distr. for next word can be effectively approximated knowing previous n words
• POS categories are based on:
  – distributional properties (what other words can occur nearby)
  – morphological properties (affixes they take)

