1
Natural Language Processing
Information Retrieval · Speech Recognition
Syntactic Parsing · Semantic Interpretation
CSE 592 Applications of AI, Winter 2003
2
Example Applications
• Spelling and grammar checkers
• Finding information on the WWW
• Spoken language control systems: banking, shopping
• Classification systems for messages, articles
• Machine translation tools
3
The Dream
4
Information Retrieval
(Thanks to Adam Carlson)
5
Motivation and Outline
• Background
– Definitions
• The Problem
– 100,000+ pages
• The Solution
– Ranking docs
– Vector space
– Probabilistic approaches
• Extensions
– Relevance feedback, clustering, query expansion, etc.
6
What is Information Retrieval
• Given a large repository of documents, how do I get at the ones that I want?
– Examples: Lexis/Nexis, medical reports, AltaVista
• Different from databases
– Unstructured (or semi-structured) data
– Information is (typically) text
– Requests are (typically) word-based
7
Information Retrieval Task
• Start with a set of documents
• User specifies information need
– Keyword query, Boolean expression, high-level description
• System returns a list of documents
– Ordered according to relevance
• Known as the ad hoc retrieval problem
8
Measuring Performance
• Precision
– Proportion of selected items that are correct: tp / (tp + fp)
• Recall
– Proportion of target items that were selected: tp / (tp + fn)
• Precision-recall curve
– Shows the tradeoff
[Diagram: the set of documents the system returned overlaps the set of actually relevant docs; the overlap is tp, returned-but-irrelevant is fp, relevant-but-missed is fn, everything else is tn]
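These definitions can be checked with a few lines of code; the counts below are hypothetical, chosen only to exercise the formulas.

```python
def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); Recall = tp / (tp + fn)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts: the system returned 50 docs, of which 40 are
# relevant (tp) and 10 are not (fp); 20 relevant docs were missed (fn).
p, r = precision_recall(tp=40, fp=10, fn=20)
print(p)  # 0.8
print(r)  # about 0.667
```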
9
Basic IR System
• Use word overlap to determine relevance
– Word overlap alone is inaccurate
• Rank documents by similarity to query
• Similarity computed using the Vector Space Model
10
Vector Space Model
• Represent documents as a matrix
– Words are rows
– Documents are columns
– Cell i,j contains the number of times word i appears in document j
– Similarity between two documents is the cosine of the angle between the vectors representing those documents
11
Vector Space Example
a: System and human system engineering testing of EPS
b: A survey of user opinion of computer system response time
c: The EPS user interface management system
d: Human machine interface for ABC computer applications
e: Relation of user perceived response time to error measurement
f: The generation of random, binary, ordered trees
g: The intersection graph of paths in trees
h: Graph minors IV: Widths of trees and well-quasi-ordering
i: Graph minors: A survey

           a b c d e f g h i
Interface  0 0 1 0 0 0 0 0 0
User       0 1 1 0 1 0 0 0 0
System     2 1 1 0 0 0 0 0 0
Human      1 0 0 1 0 0 0 0 0
Computer   0 1 0 1 0 0 0 0 0
Response   0 1 0 0 1 0 0 0 0
Time       0 1 0 0 1 0 0 0 0
EPS        1 0 1 0 0 0 0 0 0
Survey     0 1 0 0 0 0 0 0 1
Trees      0 0 0 0 0 1 1 1 0
Graph      0 0 0 0 0 0 1 1 1
Minors     0 0 0 0 0 0 0 1 1
12
Vector Space Example cont.
cos(θ_AB) = (A · B) / (|A| |B|)

[Diagram: documents a, b, and c plotted as vectors along the system, user, and interface axes]

           a b c
Interface  0 0 1
User       0 1 1
System     2 1 1
13
Similarity in Vector Space
cos(θ_AB) = (A · B) / (|A| |B|)

A · B = A_1 B_1 + A_2 B_2 + … + A_n B_n   (measures word overlap)

|A| = sqrt( Σ_{i=1..n} A_i² )   (normalizes for different length vectors)

Other metrics exist
14
Answering a Query Using Vector Space

• Represent query as vector
• Compute distances to all documents
• Rank according to distance
• Example: “computer system”

           Query  a b c d e f g h i
Interface    0    0 0 1 0 0 0 0 0 0
User         0    0 1 1 0 1 0 0 0 0
System       1    2 1 1 0 0 0 0 0 0
Human        0    1 0 0 1 0 0 0 0 0
Computer     1    0 1 0 1 0 0 0 0 0
Response     0    0 1 0 0 1 0 0 0 0
Time         0    0 1 0 0 1 0 0 0 0
EPS          0    1 0 1 0 0 0 0 0 0
Survey       0    0 1 0 0 0 0 0 0 1
Trees        0    0 0 0 0 0 1 1 1 0
Graph        0    0 0 0 0 0 0 1 1 1
Minors       0    0 0 0 0 0 0 0 1 1
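The whole pipeline can be sketched on the running example. The code below trims the vocabulary to documents a–d for brevity (counts copied from the matrix above), builds count vectors, and ranks the documents by cosine similarity to the query “computer system”:

```python
import math

# Term-document counts for documents a-d from the running example
vocab = ["interface", "user", "system", "human", "computer",
         "response", "time", "eps", "survey"]
docs = {
    "a": {"system": 2, "human": 1, "eps": 1},
    "b": {"user": 1, "system": 1, "computer": 1,
          "response": 1, "time": 1, "survey": 1},
    "c": {"interface": 1, "user": 1, "system": 1, "eps": 1},
    "d": {"human": 1, "computer": 1},
}

def vec(counts):
    """Turn a sparse word-count dict into a dense vector over vocab."""
    return [counts.get(w, 0) for w in vocab]

def cosine(u, v):
    """cos of the angle between u and v: dot product over norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

query = vec({"computer": 1, "system": 1})
ranked = sorted(docs, key=lambda d: cosine(query, vec(docs[d])), reverse=True)
print(ranked)
```

On these counts, a and b tie for first place (each has dot product 2 with the query and the same vector length), followed by d, then c.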
15
Common Improvements
• The vector space model
– Doesn’t handle morphology (eat, eats, eating)
– Favors common terms
• Possible fixes
– Stemming: convert each word to a common root form
– Stop lists
– Term weighting
16
Handling Common Terms
• Stop list
– List of words to ignore: “a”, “and”, “but”, “to”, etc.
• Term weighting
– Words which appear everywhere aren’t very good discriminators – give higher weight to rare words
17
tf * idf
w_ik = tf_ik × log(N / n_k)

where, for term T_k and document D_i:
– tf_ik = frequency of term T_k in document D_i
– idf_k = inverse document frequency of term T_k in collection C
– N = total number of documents in collection C
– n_k = number of documents in C that contain T_k
– idf_k = log(N / n_k)
18
Inverse Document Frequency
• IDF provides high values for rare words and low values for common words

For a collection of 10000 documents:
log(10000 / 1)     = 4
log(10000 / 20)    ≈ 2.699
log(10000 / 5000)  ≈ 0.301
log(10000 / 10000) = 0
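The slide’s numbers come out of a base-10 logarithm; a minimal sketch reproducing them:

```python
import math

def tfidf(tf, N, n_k):
    """w_ik = tf_ik * log10(N / n_k): term frequency times inverse
    document frequency (base-10 log, matching the slide's numbers)."""
    return tf * math.log10(N / n_k)

# IDF values for a collection of N = 10000 documents
N = 10000
for n_k in (1, 20, 5000, 10000):
    print(n_k, round(math.log10(N / n_k), 3))
# 1 -> 4.0, 20 -> 2.699, 5000 -> 0.301, 10000 -> 0.0
```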
19
Probabilistic IR
• Vector space model robust in practice
• But mathematically ad hoc
– How to generalize to more complex queries?
(intel OR microsoft) AND (NOT stock)
• Alternative approach: model the problem as finding the documents with the highest probability of being relevant to the query
– Requires making some simplifying assumptions about the underlying probability distributions
– In certain cases can be shown to yield the same results as the vector space model
20
Probability Ranking Principle
For a given query Q, find the documents D that maximize the odds that the document is relevant (r):

O(r | Q, D) = [ P(Q | D, r) · P(r | D) ] / [ P(Q | D, ¬r) · P(¬r | D) ]
21
Probability Ranking Principle
For a given query Q, find the documents D that maximize the odds that the document is relevant (r):

O(r | Q, D) = [ P(Q | D, r) · P(r | D) ] / [ P(Q | D, ¬r) · P(¬r | D) ]

P(r | D): probability of document relevance to any query – i.e., the inherent quality of the document
22
Probability Ranking Principle
For a given query Q, find the documents D that maximize the odds that the document is relevant (r):

O(r | Q, D) = [ P(Q | D, r) · P(r | D) ] / [ P(Q | D, ¬r) · P(¬r | D) ]

P(Q | D, r): probability that if the document is indeed relevant, then the query is in fact Q

But where do we get that number?
23
Bayesian nets for text retrieval
[Diagram: a two-part Bayesian network. The Document Network layers documents (d1, d2) over the words they contain (w1, w2, w3), over concepts (c1, c2, c3). The Query Network combines concepts through query operators (q1, q2 – AND/OR/NOT) up to a single information-need node q0.]
24
Bayesian nets for text retrieval
[Same diagram as the previous slide]

The Document Network is computed once for the entire collection.
25
Bayesian nets for text retrieval
[Same diagram as the previous slide]

The Query Network is computed for each query.
26
Conditional Probability Tables
• P(d) = prior probability document d is relevant
– Uniform model: P(d) = 1 / number of docs
– In general, document quality P(r | d)
• P(w | d) = probability that a random word from document d is w
– Term frequency
• P(c | w) = probability that a given document word w has the same meaning as a query word c
– Thesaurus
• P(q | c1, c2, …) = canonical forms of the operators AND, OR, NOT, etc.
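When the parent beliefs are treated as independent probabilities, the canonical operator forms can be evaluated in closed form: AND multiplies the parents’ probabilities, OR takes the complement of the product of complements, and NOT flips. A sketch with made-up concept beliefs:

```python
from functools import reduce

def p_and(ps):
    """P(AND node true) = product of parent probabilities."""
    return reduce(lambda acc, p: acc * p, ps, 1.0)

def p_or(ps):
    """P(OR node true) = 1 - product of (1 - p) over parents."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), ps, 1.0)

def p_not(p):
    return 1.0 - p

# Made-up concept beliefs: P(reason)=0.8, P(trouble)=0.6, P(double)=0.3
print(round(p_and([0.8, 0.6]), 4))         # 0.48
print(round(p_or([0.6, 0.3]), 4))          # 0.72
print(round(p_and([0.8, p_not(0.3)]), 4))  # 0.56
```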
27
Example
[Diagram: Document Network – documents Hamlet and Macbeth over the words “reason”, “double”, “two”, and “trouble”; Query Network – query concepts combined through OR, NOT, and AND into the user query.]
28
Details
• Set head q0 of user query to “true”
• Compute posterior probability P(D | q0)
• “User information need” doesn’t have to be a query – can be a user profile, e.g., other documents the user has read
• Instead of just words, can include phrases, inter-document links
• Link matrices can be modified over time
– User feedback
– The promise of “personalization”
29
Extensions
• Meet demands of web-based systems
• Modified ranking functions for the web
• Relevance feedback
• Query expansion
• Document clustering
• Latent Semantic Indexing
• Other IR tasks
30
IR on the Web
• Query AltaVista with “Java”
– Almost 10^7 pages found
• Avoiding latency
– User wants (initial) results fast
• Solution
– Rank documents using word overlap
– Use a special data structure: the inverted index
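An inverted index maps each word to the documents containing it, so a keyword query touches only the posting lists it needs instead of scanning every page. A minimal sketch (toy documents with hypothetical ids):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

docs = {
    1: "Java programming tutorial",
    2: "coffee from Java island",
    3: "Python programming tutorial",
}
index = build_inverted_index(docs)
print(sorted(index["java"]))                             # [1, 2]
# Conjunctive queries are posting-list intersections:
print(sorted(index["programming"] & index["tutorial"]))  # [1, 3]
```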
31
Improved Ranking on the Web
• Not just arbitrary documents
• Can use HTML tags and other properties
– Query term in <TITLE></TITLE>
– Query term in <IMG>, <HREF>, etc. tags
– Check date of document (prefer recent docs)
– PageRank (Google)
32
PageRank
• Idea: good pages link to other good pages
– Round 1: count in-links. Problems?
– Round 2: sum weighted in-links
– Round 3: and again, and again…
• Implementation: repeated random walk on a snapshot of the web
– weight ∝ frequency visited
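The random-walk idea can be sketched as power iteration: each page repeatedly passes a damped share of its current weight along its out-links. The damping factor 0.85 and the three-page graph below are illustrative assumptions, not from the slide.

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration sketch of PageRank.
    links: dict mapping page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Tiny web: everyone links to C, so C should rank highest
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # C
```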
33
Relevance Feedback
• System returns initial set of documents
• User identifies relevant documents
• System refines query to get documents more like those identified by the user
– Add words common to relevant docs
– Reposition query vector closer to relevant docs
• Lather, rinse, repeat…
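The standard formula for repositioning the query vector is Rocchio’s update (not named on the slide): move the query toward the mean of the relevant docs and away from the mean of the nonrelevant ones. The weights and vectors below are illustrative.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)."""
    n = len(query)

    def mean(vectors):
        if not vectors:
            return [0.0] * n
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]

    r, s = mean(relevant), mean(nonrelevant)
    return [alpha * query[i] + beta * r[i] - gamma * s[i] for i in range(n)]

# Vocabulary: [boat, ship, car]; user marked one doc relevant, one not.
q = [1.0, 0.0, 0.0]
new_q = rocchio(q, relevant=[[1.0, 1.0, 0.0]], nonrelevant=[[0.0, 0.0, 1.0]])
print(new_q)  # [1.75, 0.75, -0.15] -- "ship" now carries weight
```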
34
Query Expansion
• Given query, add words to improve recall
– Workaround for the synonym problem
• Example
– boat → boat OR ship
• Can involve user feedback or not
• Can use a thesaurus or other online source
– WordNet
35
Document Clustering
• Group similar documents
– Similar means “close in vector space”
• If a document is relevant, return the whole cluster
• Can be combined with relevance feedback
• GROUPER: http://www.cs.washington.edu/research/clustering
36
Clustering Algorithms
• K-means
Initialize k cluster centers
Loop
  Assign each document to the closest center
  Move cluster centers to better fit the assignment
Until little movement

• Hierarchical Agglomerative Clustering
Initialize each document to a singleton cluster
Loop
  Merge the two closest clusters
Until k clusters exist

Many ways to measure distance between clusters
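The K-means loop can be sketched in a few lines. For clarity this uses Euclidean distance on toy 2-d points; a real system would cluster tf-idf vectors, typically with cosine distance.

```python
import random

def dist2(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k, iters=20, seed=0):
    """Assign points to the closest center, move each center to the
    mean of its cluster, repeat for a fixed number of iterations."""
    centers = random.Random(seed).sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda c: dist2(p, centers[c]))].append(p)
        centers = [mean(cl) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters

# Two obvious groups of 2-d "documents"
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, clusters = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```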
37
Latent Semantic Indexing
• Creates a modified vector space
• Captures transitive co-occurrence information
– If docs A & B don’t share any words with each other, but both share lots of words with doc C, then A & B will be considered similar
• Simulates query expansion and document clustering (sort of)
38
Variations on a Theme
• Text categorization
– Assign each document to a category
– Example: automatically put web pages in the Yahoo hierarchy
• Routing & filtering
– Match documents with users
– Example: news service that allows subscribers to specify “send news about high-tech mergers”
39
Speech Recognition
TO BE COMPLETED
40
Syntactic Parsing / Semantic Interpretation
TO BE COMPLETED
41
NLP Research Areas
• Morphology: structure of words
• Syntactic interpretation (parsing): create a parse tree of a sentence
• Semantic interpretation: translate a sentence into the representation language
– Pragmatic interpretation: take the current situation into account
– Disambiguation: there may be several interpretations; choose the most probable
42
Some Difficult Examples
• From the newspapers:
– Squad helps dog bite victim.
– Helicopter powered by human flies.
– Levy won’t hurt the poor.
– Once-sagging cloth diaper industry saved by full dumps.
• Ambiguities:
– Lexical: meanings of ‘hot’, ‘back’.
– Syntactic: I heard the music in my room.
– Referential: The cat ate the mouse. It was ugly.
43
Parsing
• Context-free grammars:
EXPR -> NUMBER
EXPR -> VARIABLE
EXPR -> (EXPR + EXPR)
EXPR -> (EXPR * EXPR)
• (2 + X) * (17 + Y) is in the grammar.
• (2 + (X)) is not.
• Why do we call them context-free?
44
Using CFG’s for Parsing
• Can natural language syntax be captured using a context-free grammar?
– Yes, no, sort of, for the most part, maybe.
• Words:
– Nouns, adjectives, verbs, adverbs
– Determiners: the, a, this, that
– Quantifiers: all, some, none
– Prepositions: in, onto, by, through
– Connectives: and, or, but, while
• Words combine together into phrases: NP, VP
45
An Example Grammar
• S -> NP VP
• VP -> V NP
• NP -> NAME
• NP -> ART N
• ART -> a | the
• V -> ate | saw
• N -> cat | mouse
• NAME -> Sue | Tom
46
Example Parse
• The mouse saw Sue.
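A toy parser that encodes this particular grammar directly can check the parse (a sketch for this grammar only, not a general CFG parser):

```python
# Lexicon from the example grammar on the previous slide
LEXICON = {"a": "ART", "the": "ART", "ate": "V", "saw": "V",
           "cat": "N", "mouse": "N", "Sue": "NAME", "Tom": "NAME"}

def parse_np(words, i):
    """Try NP -> NAME, then NP -> ART N, starting at position i."""
    if i < len(words) and LEXICON.get(words[i]) == "NAME":
        return ("NP", ("NAME", words[i])), i + 1
    if (i + 1 < len(words) and LEXICON.get(words[i]) == "ART"
            and LEXICON.get(words[i + 1]) == "N"):
        return ("NP", ("ART", words[i]), ("N", words[i + 1])), i + 2
    return None, i

def parse_s(words):
    """S -> NP VP, where VP -> V NP; must consume the whole sentence."""
    np, i = parse_np(words, 0)
    if np and i < len(words) and LEXICON.get(words[i]) == "V":
        obj, j = parse_np(words, i + 1)
        if obj and j == len(words):
            return ("S", np, ("VP", ("V", words[i]), obj))
    return None

tree = parse_s("the mouse saw Sue".split())
print(tree)
# ('S', ('NP', ('ART', 'the'), ('N', 'mouse')),
#       ('VP', ('V', 'saw'), ('NP', ('NAME', 'Sue'))))
```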
47
Ambiguity
• S -> NP VP
• VP -> V NP
• VP -> V NP NP
• NP -> N
• NP -> N N
• NP -> Det NP
• Det -> the
• V -> ate | saw | bought
• N -> cat | mouse | biscuits | Sue | Tom
“Sue bought the cat biscuits”
48
Example: Chart Parsing
• Three main data structures: a chart, a key list, and a set of edges
• Chart:
[Diagram: a table indexed by starting point (columns 1–4) and length (rows 1–4); each cell holds the name of the terminal or non-terminal spanning that substring]
49
Key List and Edges
• Key list: push-down stack of chart entries
– “the”, “box”, “floats”
• Edges: rules that can be applied to chart entries to build up larger entries
[Diagram: the chart from the previous slide with entries “the”, “box”, and “floats” at length 1, and a dotted-rule edge “det -> the o” showing progress through a rule]
50
Chart Parsing Algorithm
• Loop while entries in key list
– 1. Remove entry from key list
– 2. If entry already in chart, add edge list and break
– 3. Add entry from key list to chart
– 4. For all rules that begin with entry’s type, add an edge for that rule
– 5. For all edges that need the entry next, add an extended edge (see algorithm on right)
– 6. If the edge is finished, add an entry to the key list with type, start point, length, and edge list
• To extend an edge with chart entry c
– Create a new edge e’
– Set start(e’) to start(e)
– Set end(e’) to end(c)
– Set rule(e’) to rule(e) with “o” moved beyond c
– Set righthandside(e’) to righthandside(e) + c
51
Try it
• S -> NP VP
• VP -> V
• NP -> Det N
• Det -> the
• N -> box
• V -> floats
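A minimal bottom-up chart for this grammar can be filled cell by cell, indexed by starting point and length as on the earlier chart slide. This sketch records only which symbols span each substring, not the edges themselves:

```python
RULES = [("S", ("NP", "VP")), ("VP", ("V",)), ("NP", ("Det", "N"))]
LEXICON = {"the": "Det", "box": "N", "floats": "V"}

def chart_parse(words):
    """chart[start][length] = set of symbols covering words[start:start+length]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n)]
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            cell = chart[start][length]
            if length == 1:
                cell.add(LEXICON[words[start]])
            # Combine two adjacent spans with a binary rule
            for split in range(1, length):
                for lhs, rhs in RULES:
                    if (len(rhs) == 2 and rhs[0] in chart[start][split]
                            and rhs[1] in chart[start + split][length - split]):
                        cell.add(lhs)
            # Close the cell under unit rules like VP -> V
            changed = True
            while changed:
                changed = False
                for lhs, rhs in RULES:
                    if len(rhs) == 1 and rhs[0] in cell and lhs not in cell:
                        cell.add(lhs)
                        changed = True
    return chart

chart = chart_parse("the box floats".split())
print("S" in chart[0][3])  # True: the whole sentence parses as S
```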
52
Semantic Interpretation
• Our goal: translate sentences into a logical form.
• But sentences convey more than true/false:
– It will rain in Seattle tomorrow.
– Will it rain in Seattle tomorrow?
• A sentence can be analyzed by:
– propositional content, and
– speech act: tell, ask, request, deny, suggest
53
Propositional Content
• We develop a logic-like language for representing propositional content:
– Word-sense ambiguity
– Scope ambiguity
• Proper names --> objects (John, Alon)
• Nouns --> unary predicates (woman, house)
• Verbs -->
– transitive: binary predicates (find, go)
– intransitive: unary predicates (laugh, cry)
• Quantifiers: most, some
• Example: “John loves Mary” --> Loves(John, Mary)
54
From Syntax to Semantics
• ADD SLIDES ON SEMANTIC INTERPRETATION
55
Word Sense Disambiguation
• ADD SLIDES!
56
Statistical NLP
• Consider the problem of part-of-speech tagging:
– “The box floats”
– “The” → Det; “box” → N; “floats” → V
• Given a sentence w(1,n), where w(i) is the i-th word, we want to find the tags t(i) assigned to each word w(i)
57
The Equations
• Find the t(1,n) that maximizes
– P[t(1,n) | w(1,n)] = P[w(1,n) | t(1,n)] · P[t(1,n)] / P[w(1,n)]
– So we only need to maximize P[w(1,n) | t(1,n)] · P[t(1,n)]
• Assume that
– A word depends only on its own tag
– A tag depends only on the previous tag
– We have:
• P[w(j) | w(1,j-1), t(1,j)] = P[w(j) | t(j)], and
• P[t(j) | w(1,j-1), t(1,j-1)] = P[t(j) | t(j-1)]
– Thus, we want to maximize
• P[w(n) | t(n)] · P[t(n) | t(n-1)] · P[w(n-1) | t(n-1)] · P[t(n-1) | t(n-2)] · …
58
Example

• “The box floats”: given a corpus (a training set)
– Assignment one: t(1) = Det, t(2) = V, t(3) = V
• P(V | Det) is rather low, and so is P(V | V), so this assignment is less likely
– Assignment two: t(1) = Det, t(2) = N, t(3) = V
• P(N | Det) is high, and P(V | N) is high, so this assignment is more likely!
– In general, can use Hidden Markov Models to find the probabilities
[Diagram: an HMM with states det, N, and V; “the” is emitted from det, “box” from N, and “floats” from V]
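The most likely tag sequence can be computed with the Viterbi algorithm over such an HMM. Every probability below is made up purely for illustration; a real tagger estimates them from a tagged corpus:

```python
# Hypothetical HMM parameters for the "The box floats" example
TAGS = ["det", "N", "V"]
P_TRANS = {  # P(tag | previous tag); "<s>" marks the sentence start
    ("<s>", "det"): 0.8, ("<s>", "N"): 0.15, ("<s>", "V"): 0.05,
    ("det", "N"): 0.9, ("det", "V"): 0.05, ("det", "det"): 0.05,
    ("N", "V"): 0.6, ("N", "N"): 0.3, ("N", "det"): 0.1,
    ("V", "det"): 0.4, ("V", "N"): 0.3, ("V", "V"): 0.3,
}
P_EMIT = {  # P(word | tag); "box" can be a noun or (rarely) a verb
    ("det", "the"): 0.5, ("N", "box"): 0.01, ("V", "box"): 0.002,
    ("V", "floats"): 0.004, ("N", "floats"): 0.001,
}

def viterbi(words, floor=1e-8):
    """Keep, for each tag, the best (probability, path) ending in it."""
    best = {"<s>": (1.0, [])}
    for w in words:
        new = {}
        for t in TAGS:
            e = P_EMIT.get((t, w), floor)
            new[t] = max(
                (p * P_TRANS.get((pt, t), floor) * e, prev + [t])
                for pt, (p, prev) in best.items())
        best = new
    return max(best.values())[1]

print(viterbi(["the", "box", "floats"]))  # ['det', 'N', 'V']
```

With these numbers the det-N-V path wins because P(N | det) and P(V | N) dominate the alternatives, matching the slide’s argument.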
59
Experiments
• Charniak and colleagues did some experiments on a collection of documents called the “Brown Corpus”, where tags are assigned by hand.
• 90% of the corpus is used for training and the other 10% for testing
• They show they can get 95% correctness with HMMs.
• A really simple algorithm – assign to w the tag t with the highest probability P(t | w) – already gets 91% correctness!
60
Natural Language Summary
• Parsing:
– Context-free grammars with features
• Semantic interpretation:
– Translate sentences into a logic-like language
– Use additional domain knowledge for word-sense disambiguation
– Use context to disambiguate references