Natural Language Processing

Artificial Intelligence

CMSC 25000

February 28, 2002

Agenda

• Why NLP?
  – Goals & Applications
• Challenges: Knowledge & Ambiguity
  – Key types of knowledge
    • Morphology, Syntax, Semantics, Pragmatics, Discourse
  – Handling Ambiguity
    • Syntactic Ambiguity: Probabilistic Parsing
    • Semantic Ambiguity: Word Sense Disambiguation

• Conclusions

Why Language?

• Natural Language in Artificial Intelligence
  – Language use as distinctive feature of human intelligence
  – Infinite utterances:
    • Diverse languages with fundamental similarities
    • “Computational linguistics”
  – Communicative acts
    • Inform, request, ...

Why Language? Applications

• Machine Translation

• Question-Answering
  – Database queries to web search

• Spoken language systems

• Intelligent tutoring

Knowledge of Language

• What does it mean to know a language?
  – Know the words (lexicon)
    • Pronunciation, Formation, Conjugation
  – Know how the words form sentences
    • Sentence structure, Compositional meaning
  – Know how to interpret the sentence
    • Statement, question, ...
  – Know how to group sentences
    • Narrative coherence, dialogue

Word-level Knowledge

• Lexicon:
  – List of legal words in a language
  – Part of speech:
    • noun, verb, adjective, determiner
• Example:
  – Noun -> cat | dog | mouse | ball | rock
  – Verb -> chase | bite | fetch | bat
  – Adjective -> black | brown | furry | striped | heavy
  – Determiner -> the | that | a | an

Word-level Knowledge: Issues

• Issue 1: Lexicon Size
  – Potentially HUGE!
  – Controlling factor: morphology
    • Store base forms (roots/stems)
      – Use morphological processes to generate / analyze
      – E.g. dog: dog(s); sing: sings, sang, sung, singing, singer, ...
• Issue 2: Lexical ambiguity
  – rock: N/V; dog: N/V
  – “Time flies like a banana”
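The lexicon and the lexical ambiguity issue can be made concrete with a small sketch. It assumes the lexicon is stored as a mapping from part of speech to a set of words, using only the example words from the lecture; the names `LEXICON`, `possible_tags`, and `ambiguous_words` are illustrative and not from the slides.

```python
# Toy lexicon built from the lecture's example word lists.
# The dictionary layout and helper names are assumptions for illustration.
LEXICON = {
    "Noun": {"cat", "dog", "mouse", "ball", "rock"},
    "Verb": {"chase", "bite", "fetch", "bat"},
    "Adjective": {"black", "brown", "furry", "striped", "heavy"},
    "Determiner": {"the", "that", "a", "an"},
}

# Issue 2: "rock" and "dog" can be nouns or verbs.
LEXICON["Verb"] |= {"rock", "dog"}


def possible_tags(word):
    """Return every part of speech the lexicon lists for a word, sorted."""
    return sorted(pos for pos, words in LEXICON.items() if word in words)


def ambiguous_words():
    """Return the words that carry more than one part of speech, sorted."""
    all_words = set().union(*LEXICON.values())
    return sorted(w for w in all_words if len(possible_tags(w)) > 1)
```

A word with more than one entry, such as "rock", is exactly the case a later parsing or disambiguation step has to resolve from context.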

Sentence-level Knowledge: Syntax

• Language models
  – More than just words: “banana a flies time like”
  – Formal vs natural: Grammar defines language

Chomsky Hierarchy

Class                    Example rule    Example language
Recursively Enumerable   any
Context Sensitive        AB -> BA        a^n b^n c^n
Context Free             A -> aBc        a^n b^n
Regular                  S -> aS         a*b* (regular expression)

Syntactic Analysis: Grammars

• Natural vs Formal languages
  – Natural languages have degrees of acceptability
    • ‘It ain’t hard’; ‘You gave what to whom?’
• Grammar combines words into phrases
  – S -> NP VP
  – NP -> {Det} {Adj} N
  – VP -> V | V NP | V NP PP
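As a rough illustration of how these three rules accept or reject a sentence, the sketch below checks a sequence of part-of-speech tags against them. It is a hand-written recognizer, not a method given in the lecture. It assumes the words have already been tagged, reads `{Det}` and `{Adj}` as optional, and borrows `PP -> P NP` from the later PCFG slide because this slide does not define PP. All function names are hypothetical.

```python
def match_np(tags, i):
    """Match NP -> {Det} {Adj} N starting at index i.

    Returns the index just past the NP, or None if no NP starts here.
    """
    if i < len(tags) and tags[i] == "Det":
        i += 1
    if i < len(tags) and tags[i] == "Adj":
        i += 1
    if i < len(tags) and tags[i] == "N":
        return i + 1
    return None


def match_vp(tags, i):
    """Match VP -> V | V NP | V NP PP, preferring the longest match.

    Assumes PP -> P NP. Returns the index just past the VP, or None.
    """
    if i >= len(tags) or tags[i] != "V":
        return None
    end = i + 1
    np_end = match_np(tags, end)
    if np_end is None:
        return end                      # VP -> V
    end = np_end                        # VP -> V NP
    if end < len(tags) and tags[end] == "P":
        pp_end = match_np(tags, end + 1)
        if pp_end is not None:
            end = pp_end                # VP -> V NP PP
    return end


def is_sentence(tags):
    """Return True if the whole tag sequence is covered by S -> NP VP."""
    np_end = match_np(tags, 0)
    if np_end is None:
        return False
    return match_vp(tags, np_end) == len(tags)
```

For example, `is_sentence(["Det", "Adj", "N", "V", "Det", "Adj", "N"])` accepts the tag pattern of "The black cat chased the furry mouse". Taking the longest match is enough here because these rules are not ambiguous once the tags are fixed.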

Syntactic Analysis: Parsing

• Recover phrase structure from sentence
  – Based on grammar

[S [NP [Det The] [Adj black] [N cat]]
   [VP [V chased]
       [NP [Det the] [Adj furry] [N mouse]]]]

Syntactic Analysis: Parsing

• Issue 1: Complexity
• Solution 1: Chart parser (dynamic programming)
  – O( ): bound in grammar size G and sentence length n (exponents not recoverable from “2Gn”)
• Issue 2: Structural ambiguity
  – ‘I saw the man on the hill with the telescope’
    • Is the telescope on the hill?
• Solution 2 (partial): Probabilistic parsing
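Both issues on this slide can be seen in a small dynamic-programming sketch. It counts how many parse trees a tag sequence has by memoizing the result for each (symbol, start, end) span, so that no span is analyzed twice. This is plain top-down memoization, not a specific chart parsing algorithm from the lecture. The grammar combines the earlier rules with `NP -> NP PP` and `PP -> P NP` from the PCFG slide and leaves out `S -> S conj S` for brevity. The names `RULES` and `count_parses` are illustrative.

```python
from functools import lru_cache

# Nonterminal rules; any symbol not listed here is treated as a
# part-of-speech tag that must match exactly one input position.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("Det", "N"), ("Det", "Adj", "N"), ("NP", "PP")],
    "VP": [("V",), ("V", "NP"), ("V", "NP", "PP")],
    "PP": [("P", "NP")],
}


def count_parses(tags, start="S"):
    """Count distinct parse trees of `start` spanning the whole tag sequence."""
    tags = tuple(tags)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of ways `symbol` can derive tags[i:j].
        if symbol not in RULES:
            return int(j == i + 1 and tags[i] == symbol)
        return sum(count_seq(rhs, i, j) for rhs in RULES[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` can derive tags[i:j].
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        total = 0
        # The first symbol takes tags[i:k]; each remaining symbol needs >= 1 tag.
        for k in range(i + 1, j - len(rhs) + 2):
            left = count(rhs[0], i, k)
            if left:
                total += left * count_seq(rhs[1:], k, j)
        return total

    return count(start, 0, len(tags))
```

With the tags for "I saw the man with the telescope" (`N V Det N P Det N`), `count_parses` returns 2, one tree per attachment of the prepositional phrase. Counting alone does not say which tree to prefer, which is what the probabilistic approach adds.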

Semantic Analysis

• Grammatical ≠ Meaningful
  – “Colorless green ideas sleep furiously”
• Compositional Semantics
  – Meaning of a sentence is composed from the meanings of its subparts
  – Associate semantic interpretation with syntactic structure
  – E.g. Nouns are variables (themselves): cat, mouse
    • Adjectives: unary predicates: Black(cat), Furry(mouse)
    • Verbs: multi-place predicates: VP: λx chased(x, Furry(mouse))
    • Sentence: (λx chased(x, Furry(mouse))) Black(cat)
      – chased(Black(cat), Furry(mouse))
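The composition step can be mimicked with ordinary Python functions. This is a minimal sketch that assumes logical forms are represented as plain strings. The predicate names follow the slide; the variable names `vp` and `sentence` are illustrative.

```python
# Adjectives as unary predicates over a noun.
def Black(x):
    return f"Black({x})"


def Furry(x):
    return f"Furry({x})"


# Verb as a two-place predicate.
def chased(subject, obj):
    return f"chased({subject}, {obj})"


# VP "chased the furry mouse": a function still waiting for its subject,
# corresponding to a lambda abstraction over x.
vp = lambda x: chased(x, Furry("mouse"))

# Sentence: apply the VP meaning to the subject NP meaning.
sentence = vp(Black("cat"))
print(sentence)  # chased(Black(cat), Furry(mouse))
```

Applying the VP function to the subject corresponds to the final reduction step on the slide.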

Semantic Ambiguity

• Examples:
  – I went to the bank
    • of the river
    • to deposit some money
  – He banked
    • at First Union
    • the plane
• Interpretation depends on
  – Sentence (or larger) topic context
  – Syntactic structure

Pragmatics & Discourse

• Interpretation in context
  – Act accomplished by utterance
    • “Do you have the time?”, “Can you pass the salt?”
    • Requests with non-literal meaning
  – Also includes politeness, performatives, etc.
• Interpretation of multiple utterances
  – “The cat chased the mouse. It got away.”
  – Resolve referring expressions

Natural Language Understanding

Input -> Tokenization/Morphology -> Parsing -> Semantic Analysis -> Pragmatics/Discourse -> Meaning

• Key issues:
  – Knowledge
    • How to acquire this knowledge of language?
      – Hand-coded? Automatically acquired?
  – Ambiguity
    • How to determine the appropriate interpretation?
      – Pervasive, preference-based

Handling Syntactic Ambiguity

• Natural language syntax
  – Varied, has DEGREES of acceptability
  – Ambiguous
• Probability: framework for preferences
  – Augment original context-free rules: PCFG
  – Add probabilities to transitions

Rule                Probability
NP -> N             0.2
NP -> Det N         0.65
NP -> Det Adj N     0.10
NP -> NP PP         0.05
VP -> V             0.45
VP -> V NP          0.45
VP -> V NP PP       0.10
S  -> NP VP         0.85
S  -> S conj S      0.15
PP -> P NP          1.0
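One property worth checking in any PCFG is that the probabilities of all rules sharing a left-hand side sum to 1. The sketch below restates the rule probabilities from this slide as a nested dictionary and verifies that property. The data layout and the name `is_normalized` are assumptions for illustration.

```python
import math

# Rule probabilities from the slide, keyed by left-hand side and then by
# right-hand side (a tuple of symbols).
PCFG = {
    "NP": {("N",): 0.2, ("Det", "N"): 0.65,
           ("Det", "Adj", "N"): 0.10, ("NP", "PP"): 0.05},
    "VP": {("V",): 0.45, ("V", "NP"): 0.45, ("V", "NP", "PP"): 0.10},
    "S": {("NP", "VP"): 0.85, ("S", "conj", "S"): 0.15},
    "PP": {("P", "NP"): 1.0},
}


def is_normalized(pcfg, tol=1e-9):
    """Return True if, for every left-hand side, its rule probabilities sum to 1."""
    return all(
        math.isclose(sum(rules.values()), 1.0, abs_tol=tol)
        for rules in pcfg.values()
    )
```

Here `is_normalized(PCFG)` holds: the four NP rules sum to 1, as do the three VP rules and the two S rules.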

PCFGs

• Learning probabilities
  – Strategy 1: Write (manual) CFG
    • Use treebank (collection of parse trees) to find probabilities
  – Strategy 2: Use larger treebank (+ linguistic constraint)
    • Learn rules & probabilities (inside-outside algorithm)
• Parsing with PCFGs
  – Rank parse trees based on probability
  – Provides graceful degradation
    • Can get some parse even for unusual constructions, with a low probability

Parse Ambiguity

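• Two parse trees for “I saw the man with the telescope”

T1 (PP attached to the VP):
[S [NP [N I]]
   [VP [V saw]
       [NP [Det the] [N man]]
       [PP [P with] [NP [Det the] [N telescope]]]]]

T2 (PP attached to the NP):
[S [NP [N I]]
   [VP [V saw]
       [NP [NP [Det the] [N man]]
           [PP [P with] [NP [Det the] [N telescope]]]]]]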

Parse Probabilities

• $P(T, S) = \prod_{n \in T} p(r(n))$
  – T(ree), S(entence), n(ode), r(ule)
  – T1 = 0.85*0.2*0.1*0.65*1*0.65 ≈ 0.0072
  – T2 = 0.85*0.2*0.45*0.05*0.65*1*0.65 ≈ 0.0016
    • Select T1
• Best systems achieve 92-93% accuracy
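The two products can be reproduced mechanically by multiplying the probability of the rule used at each internal node of a tree. The sketch below assumes trees are nested tuples of the form `(label, child, ...)`, with `(tag, word)` pairs as leaves. As on the slide, word-level probabilities are ignored. The names `RULE_PROB`, `tree_prob`, `T1`, and `T2` are illustrative.

```python
# Rule probabilities from the PCFG slide, keyed by (lhs, rhs).
RULE_PROB = {
    ("S", ("NP", "VP")): 0.85,
    ("S", ("S", "conj", "S")): 0.15,
    ("NP", ("N",)): 0.2,
    ("NP", ("Det", "N")): 0.65,
    ("NP", ("Det", "Adj", "N")): 0.10,
    ("NP", ("NP", "PP")): 0.05,
    ("VP", ("V",)): 0.45,
    ("VP", ("V", "NP")): 0.45,
    ("VP", ("V", "NP", "PP")): 0.10,
    ("PP", ("P", "NP")): 1.0,
}


def tree_prob(tree):
    """Multiply the probabilities of the rules used at every internal node."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return 1.0  # (tag, word) leaf: lexical probabilities are ignored
    rhs = tuple(child[0] for child in children)
    p = RULE_PROB[(label, rhs)]
    for child in children:
        p *= tree_prob(child)
    return p


_PP = ("PP", ("P", "with"), ("NP", ("Det", "the"), ("N", "telescope")))
_MAN = ("NP", ("Det", "the"), ("N", "man"))

# T1: the PP attaches to the verb phrase (VP -> V NP PP).
T1 = ("S", ("NP", ("N", "I")), ("VP", ("V", "saw"), _MAN, _PP))
# T2: the PP attaches to the noun phrase (NP -> NP PP).
T2 = ("S", ("NP", ("N", "I")), ("VP", ("V", "saw"), ("NP", _MAN, _PP)))
```

Evaluating `tree_prob(T1)` gives about 0.0072 and `tree_prob(T2)` about 0.0016, so the VP attachment is the preferred reading under these rule probabilities.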

Semantic Ambiguity

• “Plant” ambiguity
  – Botanical vs Manufacturing senses
• Two types of context
  – Local: 1-2 words away
  – Global: several sentence window
• Two observations (Yarowsky 1995)
  – One sense per collocation (local)
  – One sense per discourse (global)

Learn Disambiguators

• Initialize small set of “seed” cases
• Collect local context information
  – “collocations”
    • E.g. 2 words away from “production”, 1 word from “seed”
• Contexts = rules
• Make decision list = rules ranked by mutual info
• Iterate: Labeling via DL, collecting contexts
• Label all entries in discourse with majority sense
  – Repeat

Disambiguate

• For each new unlabeled case,
  – Use decision list to label
• > 95% accurate on a set of highly ambiguous words
  – Also used for accent restoration in e-mail
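A decision list of this kind can be sketched in a few lines. The version below ranks each context word by a smoothed log-likelihood ratio between two senses. This score is swapped in here as a simple stand-in for the ranking criterion, and the sketch does not implement the iterative relabeling loop or the one-sense-per-discourse step. The seed examples are made-up toy data, not results from the lecture, and every name in the code is hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_decision_list(examples, smoothing=0.1):
    """Build rules (score, feature, sense) sorted from strongest to weakest.

    `examples` is an iterable of (set_of_context_words, sense) pairs and
    must contain exactly two distinct senses.
    """
    counts = defaultdict(Counter)
    senses = set()
    for features, sense in examples:
        senses.add(sense)
        for feature in features:
            counts[feature][sense] += 1
    sense_a, sense_b = sorted(senses)
    rules = []
    for feature, c in counts.items():
        # Smoothed log-likelihood ratio: positive favors sense_a.
        llr = math.log((c[sense_a] + smoothing) / (c[sense_b] + smoothing))
        rules.append((abs(llr), feature, sense_a if llr > 0 else sense_b))
    rules.sort(key=lambda rule: (-rule[0], rule[1]))
    return rules


def classify(rules, features, default=None):
    """Label a case with the sense of the highest-ranked rule that matches."""
    for _, feature, sense in rules:
        if feature in features:
            return sense
    return default


# Toy seed cases for "plant" (invented purely for illustration).
TOY_SEEDS = [
    ({"life", "animal"}, "botanical"),
    ({"life", "grow"}, "botanical"),
    ({"manufacturing", "car"}, "manufacturing"),
    ({"manufacturing", "equipment"}, "manufacturing"),
]
TOY_RULES = build_decision_list(TOY_SEEDS)
```

With these toy seeds, `classify(TOY_RULES, {"life", "water"})` returns `"botanical"`, and a context with no known collocation falls through to the default. Only the single strongest matching rule decides the label, which is what distinguishes a decision list from a weighted vote over all matching rules.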

Natural Language Processing

• Goals: Understand and imitate distinctive human capacity

• Myriad applications: MT, Q&A, SLS
• Key Issues:
  – Capturing knowledge of language
    • Automatic acquisition current focus: linguistics + ML
  – Resolving ambiguity, managing preference
    • Apply (probabilistic) knowledge
    • Effective in constrained environment

