04/19/23 CPSC503 Winter 2009 3
Evaluating Taggers
• Accuracy: percentage of tags correct (most current taggers reach 96-97%). Test on unseen data!
• Human ceiling: agreement rate of humans on the same classification (96-97%)
• Unigram baseline: assign each token the class it occurred with most frequently in the training set (race -> NN). (91%)
• What is causing the errors? Build a confusion matrix…
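A minimal sketch of the unigram baseline and the accuracy measure, assuming a toy training set invented for illustration (the function names are hypothetical, and the numbers say nothing about real corpora):

```python
from collections import Counter, defaultdict

def train_unigram(tagged_tokens):
    """Map each word to the tag it carried most often in training."""
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(model, words, default="NN"):
    """Tag each word with its most frequent training tag; unknown words get `default`."""
    return [(w, model.get(w, default)) for w in words]

def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = sum(p == g for (_, p), (_, g) in zip(predicted, gold))
    return correct / len(gold)

# Toy training data (invented for illustration, not from a real corpus).
train = [("the", "DT"), ("race", "NN"), ("is", "VBZ"), ("on", "IN"),
         ("the", "DT"), ("race", "NN"), ("to", "TO"), ("race", "VB")]
model = train_unigram(train)

# Unseen sentence: the first "race" is a verb, so the baseline mis-tags it.
gold = [("to", "TO"), ("race", "VB"), ("the", "DT"), ("race", "NN")]
predicted = tag(model, [w for w, _ in gold])
print(predicted)
print(accuracy(predicted, gold))
```

The baseline ignores context entirely, which is why "to race" is tagged as a noun here.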
Error Analysis (textbook)
• Look at a confusion matrix
• See what errors are causing problems
  – Noun (NN) vs ProperNoun (NNP) vs Adj (JJ)
  – Past tense (VBD) vs Past Participle (VBN)
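One possible way to build such a confusion matrix, sketched with invented gold and predicted tag sequences (the helper names are hypothetical):

```python
from collections import Counter

def confusion_matrix(gold_tags, predicted_tags):
    """Count how often each (gold, predicted) tag pair occurs."""
    return Counter(zip(gold_tags, predicted_tags))

def top_confusions(matrix, n=3):
    """Most frequent off-diagonal cells, i.e. the most common error types."""
    errors = Counter({pair: c for pair, c in matrix.items() if pair[0] != pair[1]})
    return errors.most_common(n)

# Toy tag sequences (invented for illustration).
gold      = ["NN", "NNP", "JJ", "VBD", "VBN", "VBN", "NN", "VBD"]
predicted = ["NN", "NN",  "NN", "VBD", "VBD", "VBD", "NN", "VBN"]

matrix = confusion_matrix(gold, predicted)
print(top_confusions(matrix))
```

In this toy data the largest off-diagonal cell is VBN tagged as VBD, the same kind of confusion the slide lists.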
Knowledge-Formalisms Map (next three lectures)
[Figure: formalisms mapped to levels of linguistic knowledge]
Formalisms:
• State Machines (and prob. versions): Finite State Automata, Finite State Transducers, Markov Models
• Rule systems (and prob. versions): e.g., (Prob.) Context-Free Grammars
• Logical formalisms (First-Order Logics)
• AI planners
Levels of knowledge: Morphology, Syntax, Semantics, Pragmatics, Discourse and Dialogue
Today 1/10
• Finish POS tagging
• English Syntax
• Context-Free Grammar for English
  – Rules
  – Trees
  – Recursion
  – Problems
• Start Parsing
Syntax
Def. The study of how sentences are formed by grouping and ordering words
Example: Ming and Sue prefer morning flights
* Ming Sue flights morning and prefer
Groups behave as a single unit with respect to substitution, movement, and coordination
Syntax: Useful tasks
• Why should you care?
  – Grammar checkers
  – Basis for semantic interpretation
    • Question answering
    • Information extraction
    • Summarization
  – Machine translation
  – …
Key Constituents – with heads (English)
• Noun phrases: (Det) N (PP)
• Verb phrases: (Qual) V (NP)
• Prepositional phrases: (Deg) P (NP)
• Adjective phrases: (Deg) A (PP)
• Sentences: (NP) (I) (VP)
General pattern: (Specifier) X (Complement)
Some simple specifiers
Category      Typical function       Examples
Determiner    specifier of N         the, a, this, no..
Qualifier     specifier of V         never, often..
Degree word   specifier of A or P    very, almost..
Complements?
Key Constituents: Examples
• Noun phrases: (Det) N (PP), e.g., the cat on the table
• Verb phrases: (Qual) V (NP), e.g., never eat a cat
• Prepositional phrases: (Deg) P (NP), e.g., almost in the net
• Adjective phrases: (Deg) A (PP), e.g., very happy about it
• Sentences: (NP) (I) (VP), e.g., a mouse -- ate it
Context Free Grammar (Example)
• S -> NP VP
• NP -> Det NOMINAL
• NOMINAL -> Noun
• VP -> Verb
• Det -> a
• Noun -> flight
• Verb -> left
Terminals: a, flight, left
Non-terminals: S, NP, NOMINAL, VP, Det, Noun, Verb
Start symbol: S
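The example grammar can be written down as data and used generatively. The sketch below is one possible encoding (the dictionary layout and the `generate` helper are assumptions, not part of the slides); it enumerates every string the grammar derives from S:

```python
from itertools import product

# The toy grammar from the slide: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}

def generate(symbol):
    """Yield every terminal string derivable from `symbol`.

    Terminates only because this particular grammar has no recursion.
    """
    if symbol not in GRAMMAR:          # terminal: never on a left-hand side
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol, in order.
        for parts in product(*(list(generate(s)) for s in rhs)):
            yield [word for part in parts for word in part]

sentences = [" ".join(words) for words in generate("S")]
print(sentences)
```

With only one rule per non-terminal, the language of this grammar contains a single sentence.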
CFGs
• Define a Formal Language (un/grammatical sentences)
• Generative Formalism
  – Generate strings in the language
  – Reject strings not in the language
  – Impose structures (trees) on strings in the language
CFG: Formal Definitions
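• 4-tuple (non-terminals, terminals, productions, start)
• $(N, \Sigma, P, S)$
• $P$ is a set of rules $A \rightarrow \alpha$; $A \in N$, $\alpha \in (\Sigma \cup N)^*$
• A derivation is the process of rewriting $\alpha_1$ into $\alpha_m$ (both strings in $(\Sigma \cup N)^*$) by applying a sequence of rules: $\alpha_1 \Rightarrow^* \alpha_m$
• $L_G = \{ w \mid w \in \Sigma^* \text{ and } S \Rightarrow^* w \}$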
CFG Parsing
• It is completely analogous to running a finite-state transducer with a tape
  – It’s just more powerful
• Chpt. 13
[Figure: the sentence "I prefer a morning flight" is fed to the Parser, which outputs a parse tree with Nominal nodes above "flight"]
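Other Options
• Regular languages (FSA): $A \rightarrow xB$ or $A \rightarrow x$
  – Too weak (e.g., cannot deal with recursion in a general way – no center-embedding)
• CFGs: $A \rightarrow \gamma$ (also produce more understandable and “useful” structure)
• Context-sensitive: $\alpha A \beta \rightarrow \alpha \gamma \beta$; $\gamma \neq \epsilon$
  – Can be computationally intractable
• Turing equiv.: $\alpha \rightarrow \beta$; $\alpha \neq \epsilon$
  – Too powerful / Computationally intractable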
Common Sentence-Types
• Declaratives: A plane left
  S -> NP VP
• Imperatives: Leave!
  S -> VP
• Yes-No Questions: Did the plane leave?
  S -> Aux NP VP
• WH Questions: Which flights serve breakfast?
  S -> WH NP VP
  When did the plane leave?
  S -> WH Aux NP VP
NP: more details
NP -> Specifiers N Complements
• NP -> (Predet)(Det)(Card)(Ord)(Quant) (AP) Nom
  e.g., all the other cheap cars
• Nom -> Nom PP (PP) (PP)
  e.g., reservation on BA456 from NY to YVR
• Nom -> Nom GerundVP
  e.g., flight arriving on Monday
• Nom -> Nom RelClause
  RelClause -> (who | that) VP
  e.g., flight that arrives in the evening
Conjunctive Constructions
• S -> S and S
  – John went to NY and Mary followed him
• NP -> NP and NP
  – John went to NY and Boston
• VP -> VP and VP
  – John went to NY and visited MOMA
• …
• In fact the right rule for English is
  X -> X and X
Agreement
• In English,
  – Determiners and nouns have to agree in number
  – Subjects and verbs have to agree in person and number
• Many languages have agreement systems that are far more complex than this (e.g., gender).
Agreement
Grammatical:
• This dog
• Those dogs
• This dog eats
• You have it
• Those dogs eat
Ungrammatical:
• *This dogs
• *Those dog
• *This dog eat
• *You has it
• *Those dogs eats
Possible CFG Solution
OLD Grammar
• S -> NP VP
• NP -> Det Nom
• VP -> V NP
• …
NEW Grammar (Sg = singular, Pl = plural)
• SgS -> SgNP SgVP
• PlS -> PlNP PlVP
• SgNP -> SgDet SgNom
• PlNP -> PlDet PlNom
• PlVP -> PlV NP
• SgVP3p -> SgV3p NP
• …
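To see how quickly this rewriting multiplies rules, here is a rough sketch that copies a rule once per combination of agreement features. The feature values, the `split_rule` helper, and the prefix naming (e.g. `Sg3pS` rather than the slide's `SgVP3p`) are assumptions made for illustration:

```python
from itertools import product

# Agreement features; the value sets are illustrative assumptions.
NUMBER = ["Sg", "Pl"]
PERSON = ["1p", "2p", "3p"]

def split_rule(lhs, rhs, agreeing, features):
    """Copy one rule per feature combination, prefixing the agreeing symbols."""
    rules = []
    for combo in product(*features):
        prefix = "".join(combo)

        def mark(symbol):
            return prefix + symbol if symbol in agreeing else symbol

        rules.append((mark(lhs), [mark(s) for s in rhs]))
    return rules

# NP -> Det Nom: determiner and noun agree in number only.
np_rules = split_rule("NP", ["Det", "Nom"], {"NP", "Det", "Nom"}, [NUMBER])
# S -> NP VP: subject and verb agree in number and person.
s_rules = split_rule("S", ["NP", "VP"], {"S", "NP", "VP"}, [NUMBER, PERSON])

for lhs, rhs in np_rules + s_rules:
    print(lhs, "->", " ".join(rhs))
print(len(np_rules), len(s_rules))
```

Each extra feature multiplies the number of copies, which is the scaling problem in a nutshell.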
CFG Solution for Agreement
• It works and stays within the power of CFGs
• But it doesn’t scale all that well (explosion in the number of rules)
Subcategorization
• *John sneezed the book
• *I prefer United has a flight
• *Give with a flight
• Def. It expresses constraints that a predicate (verb here) places on the number and type of its arguments (see first table)
Subcategorization
• Sneeze: John sneezed
• Find: Please find [a flight to NY]NP
• Give: Give [me]NP[a cheaper fare]NP
• Help: Can you help [me]NP[with a flight]PP
• Prefer: I prefer [to leave earlier]TO-VP
• Told: I was told [United has a flight]S
• …
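The frames above can be read as a small lexicon. The sketch below stores exactly the frames shown on the slide (real verbs allow more) and checks whether a verb licenses a given argument sequence; the `SUBCAT` table and `licensed` function are hypothetical names:

```python
# Subcategorization frames taken from the examples on the slide:
# each verb maps to the argument sequences it accepts.
SUBCAT = {
    "sneeze": [[]],
    "find":   [["NP"]],
    "give":   [["NP", "NP"]],
    "help":   [["NP", "PP"]],
    "prefer": [["TO-VP"]],
    "told":   [["S"]],
}

def licensed(verb, arguments):
    """True if `verb` accepts exactly this sequence of argument categories."""
    return list(arguments) in SUBCAT.get(verb, [])

print(licensed("sneeze", []))       # John sneezed
print(licensed("sneeze", ["NP"]))   # *John sneezed the book
print(licensed("prefer", ["S"]))    # *I prefer United has a flight
print(licensed("give", ["PP"]))     # *Give with a flight
```

Only the first call succeeds; the other three correspond to the starred examples.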
So?
• So the various rules for VPs overgenerate.
  – They allow strings containing verbs and arguments that don’t go together
  – For example:
    • VP -> V NP, therefore: *sneezed the book
    • VP -> V S, therefore: *go she will go there
Possible CFG Solution
OLD Grammar
• VP -> V
• VP -> V NP
• VP -> V NP PP
• …
NEW Grammar
• VP -> IntransV
• VP -> TransV NP
• VP -> TransPPto NP PPto
• …
• TransPPto -> hand, give, ..
This solution has the same problem as the one for agreement
CFG for NLP: summary
• CFGs cover most syntactic structure in English.
• But there are problems (overgeneration)
  – That can be dealt with adequately, although not elegantly, by staying within the CFG framework.
• There are simpler, more elegant, solutions that take us out of the CFG framework: LFG, XTAGS… Chpt 15 “Features and Unification”
Dependency Grammars
• Syntactic structure: binary relations between words
• Links: grammatical function or very general semantic relation
• Abstract away from word-order variations (simpler grammars)
• Useful features in many NLP applications (for classification, summarization and NLG)
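As a rough illustration of "binary relations between words", a dependency analysis can be stored as a list of head-relation-dependent triples. The analysis of the sentence and the relation labels below are assumptions for illustration, not a particular annotation standard:

```python
from collections import namedtuple

# One binary relation between two words: head --relation--> dependent.
Dependency = namedtuple("Dependency", ["head", "relation", "dependent"])

# Hand-built analysis of "I prefer a morning flight" (labels are illustrative).
parse = [
    Dependency("prefer", "subject", "I"),
    Dependency("prefer", "object", "flight"),
    Dependency("flight", "determiner", "a"),
    Dependency("flight", "modifier", "morning"),
]

def dependents(parse, head):
    """All (relation, dependent) pairs attached to `head`."""
    return [(d.relation, d.dependent) for d in parse if d.head == head]

def root(parse):
    """The one word that is a head but never a dependent."""
    heads = {d.head for d in parse}
    deps = {d.dependent for d in parse}
    (only,) = heads - deps
    return only

print(root(parse))
print(dependents(parse, "flight"))
```

Nothing in the triples records word order, which is the sense in which the representation abstracts away from word-order variation.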
Today 2/10
• English Syntax
• Context-Free Grammar for English
  – Rules
  – Trees
  – Recursion
  – Problems
• Start Parsing (if time left)
Parsing with CFGs
Assign valid trees: a valid tree covers all and only the elements of the input and has an S at the top
[Figure: Sequence of words ("I prefer a morning flight") -> Parser (using the CFG) -> Valid parse trees]
Parsing as Search
• S -> NP VP
• S -> Aux NP VP
• NP -> Det Noun
• VP -> Verb
• Det -> a
• Noun -> flight
• Verb -> left, arrive
• Aux -> do, does
The CFG defines the search space of possible parse trees
Parsing: find all trees that cover all and only the words in the input
Constraints on Search
[Figure: Sequence of words ("I prefer a morning flight") -> Parser, with the CFG defining the search space -> Valid parse trees]
Search Strategies:
• Top-down or goal-directed
• Bottom-up or data-directed
Top-Down Parsing
• Since we’re trying to find trees rooted with an S (Sentences), start with the rules that give us an S.
• Then work your way down from there to the words.
[Figure: trees grown downward from S; Input: flight]
Next step: Top Down Space
• When POS categories are reached, reject trees whose leaves fail to match all words in the input
[Figure: next ply of the top-down search space]
Bottom-Up Parsing
• Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way.
• Then work your way up from there.
[Figure: first plies of the bottom-up search space, built above the input word "flight"]
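A naive data-directed search can be sketched as follows, using the grammar from the "Parsing as Search" slide. This is only an illustration of working upward from the words (a breadth-first search over reductions), not an efficient parser; the function names are hypothetical:

```python
# Grammar from the "Parsing as Search" slide, as (lhs, rhs) pairs.
RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")),
    ("NP", ("Det", "Noun")), ("VP", ("Verb",)),
    ("Det", ("a",)), ("Noun", ("flight",)),
    ("Verb", ("left",)), ("Verb", ("arrive",)),
    ("Aux", ("do",)), ("Aux", ("does",)),
]

def reductions(form):
    """Yield each form obtained by replacing one right-hand side with its left-hand side."""
    for lhs, rhs in RULES:
        n = len(rhs)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == rhs:
                yield form[:i] + (lhs,) + form[i + n:]

def bottom_up_recognize(words):
    """Breadth-first search from the words upward; succeed if some path reaches ('S',)."""
    frontier, seen = [tuple(words)], set()
    while frontier:
        next_frontier = []
        for form in frontier:
            if form == ("S",):
                return True
            for new in reductions(form):
                if new not in seen:      # avoid re-exploring the same form
                    seen.add(new)
                    next_frontier.append(new)
        frontier = next_frontier
    return False

print(bottom_up_recognize("a flight left".split()))
print(bottom_up_recognize("does a flight arrive".split()))
print(bottom_up_recognize("flight a left".split()))
```

Every intermediate form is consistent with the words, but many of them (for example a lone VP built over "left" before the NP exists) can never lead to an S.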
Two more steps: Bottom-Up Space
[Figure: two further plies of the bottom-up search space, built above the input word "flight"]
Top-Down vs. Bottom-Up
• Top-down
  – Only searches for trees that can be answers
  – But suggests trees that are not consistent with the words
• Bottom-up
  – Only forms trees consistent with the words
  – But suggests trees that make no sense globally
So Combine Them
• Top-down: control strategy to generate trees
• Bottom-up: to filter out inappropriate parses
Top-down control strategy:
• Depth vs. breadth first
• Which node to try to expand next (left-most)
• Which grammar rule to use to expand a node (textual order)
Top-Down, Depth-First, Left-to-Right Search
Sample sentence: “Does this flight include a meal?”
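A minimal sketch of this control strategy as a backtracking recursive-descent parser. The grammar and lexicon are assumptions chosen to cover the sample sentence (and deliberately free of left recursion, which would make this strategy loop forever):

```python
# Assumed grammar and lexicon, chosen to cover the sample sentence.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
}
LEXICON = {
    "Det": {"this", "a"}, "Noun": {"flight", "meal"},
    "Verb": {"include"}, "Aux": {"does"}, "ProperNoun": {"houston"},
}

def parse(symbol, words, start):
    """Yield (tree, end) for each way `symbol` can span words[start:end].

    Rules are tried in textual order and right-hand sides are expanded
    left to right; the generators give depth-first search with backtracking.
    """
    if symbol in LEXICON:                       # part-of-speech category
        if start < len(words) and words[start] in LEXICON[symbol]:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, start, [])

def expand(symbol, rhs, words, pos, children):
    """Match the remaining right-hand-side symbols starting at `pos`."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for child, end in parse(rhs[0], words, pos):
        yield from expand(symbol, rhs[1:], words, end, children + [child])

def parse_sentence(sentence):
    """Return all S trees that cover every word of the sentence."""
    words = sentence.lower().rstrip("?.!").split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

trees = parse_sentence("Does this flight include a meal?")
print(trees[0])
```

On this sentence the parser first tries S -> NP VP and fails on the very first word before it reaches S -> Aux NP VP, which is the wasted work that bottom-up filtering is meant to avoid.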
Adding Bottom-up Filtering
The following sequence was a waste of time because an NP cannot generate a parse tree starting with an Aux
[Figure: four successive top-down expansions, each attempted over an input that begins with an Aux]
Bottom-Up Filtering
Category   Left Corners
S          Det, Proper-Noun, Aux, Verb
NP         Det, Proper-Noun
Nominal    Noun
VP         Verb
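A left-corner table like the one above can be computed from a grammar by a fixed-point iteration. The grammar below is an assumption, chosen so that the result can be compared with the table; the sketch also assumes there are no empty right-hand sides:

```python
# Assumed grammar over part-of-speech categories (no empty right-hand sides).
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}

def left_corners(grammar):
    """Map each non-terminal to the POS categories that can begin it."""
    table = {nt: set() for nt in grammar}
    changed = True
    while changed:                       # repeat until nothing new is added
        changed = False
        for nt, expansions in grammar.items():
            for rhs in expansions:
                first = rhs[0]
                # A POS category starts itself; a non-terminal contributes
                # whatever is currently known to start it.
                new = table[first] if first in grammar else {first}
                if not new <= table[nt]:
                    table[nt] |= new
                    changed = True
    return table

def can_start(table, category, pos_tag):
    """Filter: can a constituent of `category` begin with a word tagged `pos_tag`?"""
    return pos_tag in table.get(category, {category})

table = left_corners(GRAMMAR)
for category, corners in table.items():
    print(category, sorted(corners))
print(can_start(table, "NP", "Aux"))
```

A top-down parser can consult `can_start` before expanding a node and skip any rule whose first category cannot begin with the next input word.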
Problems with TD-BU-filtering
• Ambiguity
• Repeated Parsing
• SOLUTION: Earley Algorithm (once again dynamic programming!)
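For orientation, here is a compact Earley-style recognizer. It is a simplified sketch, not the textbook's presentation: the grammar and lexicon are assumptions, part-of-speech categories are matched directly against the lexicon, and empty rules are not supported. Each chart entry is stored once, which is how the dynamic programming avoids repeated parsing:

```python
# Assumed grammar (with a left-recursive Nominal rule) and lexicon.
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Nominal", "Noun")],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {"does": {"Aux"}, "this": {"Det"}, "a": {"Det"}, "flight": {"Noun"},
           "meal": {"Noun"}, "morning": {"Noun"}, "include": {"Verb"}}

def earley_recognize(words, start="S"):
    """True if `words` can be derived from `start`."""
    # A state is (lhs, rhs, dot, origin); chart[i] holds states ending at position i.
    chart = [[] for _ in range(len(words) + 1)]

    def add(i, state):
        if state not in chart[i]:        # each state is stored only once
            chart[i].append(state)

    for rhs in GRAMMAR[start]:
        add(0, (start, rhs, 0, 0))

    for i in range(len(words) + 1):
        for state in chart[i]:           # chart[i] may grow while we iterate
            lhs, rhs, dot, origin = state
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:       # predictor: expand the expected non-terminal
                    for expansion in GRAMMAR[nxt]:
                        add(i, (nxt, expansion, 0, i))
                elif i < len(words) and nxt in LEXICON.get(words[i], ()):
                    add(i + 1, (lhs, rhs, dot + 1, origin))   # scanner
            else:                        # completer: advance states waiting for lhs
                for p_lhs, p_rhs, p_dot, p_origin in chart[origin]:
                    if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                        add(i, (p_lhs, p_rhs, p_dot + 1, p_origin))

    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[len(words)])

print(earley_recognize("does this flight include a meal".split()))
print(earley_recognize("this morning flight include a meal".split()))
print(earley_recognize("does include this".split()))
```

Unlike the plain top-down search, the left-recursive rule Nominal -> Nominal Noun causes no infinite loop here, because a prediction already in the chart is never added again.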
For Next Time
• Read Chapter 13 (Parsing)
• Optional: Read Chapter 16 (Features and Unification) – skip algorithms and implementation
Grammars and Constituency
• Of course, there’s nothing easy or obvious about how we come up with the right set of constituents and the rules that govern how they combine...
• That’s why there are so many different theories of grammar and competing analyses of the same data.
• The approach to grammar, and the analyses, adopted here are very generic (and don’t correspond to any modern linguistic theory of grammar).