04/19/23 CPSC503 Winter 2007 1
CPSC 503: Computational Linguistics
Computational Lexical Semantics (Lecture 12)
Giuseppe Carenini
Today 22/10
Three well-defined semantic tasks:
• Word Sense Disambiguation – corpus and thesaurus methods
• Word Similarity – thesaurus and corpus methods
• Semantic Role Labeling
WSD example: table + ?? -> [1-6]
The noun "table" has 6 senses in WordNet:
1. table, tabular array -- (a set of data …)
2. table -- (a piece of furniture …)
3. table -- (a piece of furniture with tableware …)
4. mesa, table -- (flat tableland …)
5. table -- (a company of people …)
6. board, table -- (food or meals …)
WSD methods
• Machine learning
  – Supervised
  – Unsupervised
• Dictionary / thesaurus methods (Lesk)
Supervised ML Approaches to WSD
Training data:
((word + context)1, sense1)
……
((word + context)n, sensen)
-> Machine Learning -> Classifier
The resulting classifier maps (word + context) to a sense.
Training Data Example
"…after the soup she had bass with a big salad…" -> ((word + context), sense)i
context: the words surrounding the target occurrence
sense: for example,
• one of the 8 possible senses for "bass" in WordNet, or
• one of the 2 key distinct senses for "bass" in WordNet
WordNet Bass: music vs. fish
The noun "bass" has 8 senses in WordNet:
1. bass -- (the lowest part of the musical range)
2. bass, bass part -- (the lowest part in polyphonic music)
3. bass, basso -- (an adult male singer with …)
4. sea bass, bass -- (flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass -- (any of various North American lean-fleshed ………)
6. bass, bass voice, basso -- (the lowest adult male singing voice)
7. bass -- (the member with the lowest range of a family of musical instruments)
8. bass -- (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
Representations for Context
• GOAL: an informative characterization of the window of text surrounding the target word
• Supervised ML requires a very simple representation for the training data: vectors of feature/value pairs
• TASK: select the relevant linguistic information and encode it as a feature vector
Relevant Linguistic Information (1)
• Collocational: info about the words that appear in specific positions to the right and left of the target word
• Example text (WSJ):
  – An electric guitar and bass player stand off to one side, not really part of the scene, …
Assume a window of +/-2 around the target. Typically the features are the words and their POS tags:
[word in position -n, POS in position -n, …, word in position +n, POS in position +n]
For the example: [guitar, NN, and, CJC, player, NN, stand, VVB]
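This feature extraction can be sketched in Python. The pre-tagged example sentence and the `<pad>` token for sentence boundaries are illustrative assumptions, not part of the original pipeline:

```python
# Sketch: collocational features for WSD (window of +/-2 around the target).
# Assumes the sentence is already tokenized and POS-tagged; the tags below are
# illustrative BNC-style tags, not the output of a real tagger.

def collocational_features(tagged, target_index, window=2):
    """Return [word-n, POS-n, ..., word+n, POS+n] around the target word."""
    feats = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        i = target_index + offset
        if 0 <= i < len(tagged):
            word, pos = tagged[i]
        else:
            word, pos = "<pad>", "<pad>"  # pad at sentence boundaries
        feats.extend([word, pos])
    return feats

tagged = [("an", "AT0"), ("electric", "AJ0"), ("guitar", "NN"),
          ("and", "CJC"), ("bass", "NN"), ("player", "NN"),
          ("stand", "VVB"), ("off", "AVP")]
print(collocational_features(tagged, 4))
# -> ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']
```

This reproduces the vector on the slide for the target "bass" at index 4.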
Relevant Linguistic Information (2)
• Co-occurrence: info about the words that occur anywhere in the window, regardless of position
• Find the k content words that most frequently co-occur with the target in the corpus (for bass: fishing, big, sound, player, fly, …, guitar, band)
• Vector for one case: [c(fishing), c(big), c(sound), c(player), c(fly), …, c(guitar), c(band)]
• Example text (WSJ):
  – An electric guitar and bass player stand off to one side, not really part of the scene, …
  -> [0,0,0,1,0,0,0,0,0,0,1,0]
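A sketch of building such a co-occurrence vector. Note the slide elides part of the content-word list with "…"; the filler words here (rod, pound, double, runs, playing) are hypothetical placeholders chosen only so the vector has the 12 positions shown:

```python
# Sketch: bag-of-words co-occurrence features. Counts how often each of k
# pre-selected content words appears anywhere in the window around the target.
# Words beyond those named on the slide are invented fillers for illustration.

CONTENT_WORDS = ["fishing", "big", "sound", "player", "fly",
                 "rod", "pound", "double", "runs", "playing",   # placeholders
                 "guitar", "band"]

def cooccurrence_vector(window_words, content_words=CONTENT_WORDS):
    counts = {w: 0 for w in content_words}
    for w in window_words:
        if w in counts:
            counts[w] += 1
    return [counts[w] for w in content_words]

window = "an electric guitar and bass player stand off to one side".split()
print(cooccurrence_vector(window))
# -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
```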
Training Data Examples
Let's assume bass-music is encoded as 0 and bass-fish as 1. Each training case is a collocational vector (left) or a co-occurrence vector (right) with the sense label appended:
[guitar, NN, and, CJC, player, NN, stand, VVB, 0]    [0,0,0,1,0,0,0,0,0,0,1,0,0]
[a, AT0, sea, CJC, to, PRP, me, PNP, 1]              [1,0,0,0,0,0,0,0,0,0,0,0,1]
[play, VVB, the, AT0, with, PRP, others, PNP, 0]     [1,0,0,0,0,0,0,0,0,0,0,1,1]
[………]                                                [………]
• Inputs to the classifier at prediction time carry no sense label:
[guitar, NN, and, CJC, could, VM0, be, VVI]          [1,1,0,0,0,1,0,0,0,0,0,0]
ML for Classifiers
Training data (collocational and/or co-occurrence vectors) -> Machine Learning -> Classifier
Possible classifiers:
• Naïve Bayes
• Decision lists
• Decision trees
• Neural nets
• Support vector machines
• Nearest-neighbor methods
• …
Naïve Bayes

$$\hat{s} = \operatorname*{argmax}_{s \in S} P(s \mid V) = \operatorname*{argmax}_{s \in S} \frac{P(s)\,P(V \mid s)}{P(V)} \approx \operatorname*{argmax}_{s \in S} P(s) \prod_{j=1}^{n} P(v_j \mid s)$$

using the independence assumption

$$P(V \mid s) \approx \prod_{j=1}^{n} P(v_j \mid s)$$
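The argmax above can be sketched directly. The priors and likelihoods below are toy numbers, not estimates from a real sense-tagged corpus, and the smoothing floor for unseen features is an illustrative choice:

```python
# Sketch: naive Bayes sense selection, s_hat = argmax_s P(s) * prod_j P(v_j|s).
# Log-space sums avoid floating-point underflow when there are many features.

import math

def naive_bayes_sense(features, prior, likelihood):
    """prior: {sense: P(s)}; likelihood: {sense: {feature: P(v|s)}}."""
    best_sense, best_score = None, float("-inf")
    for s in prior:
        score = math.log(prior[s])
        for v in features:
            score += math.log(likelihood[s].get(v, 1e-6))  # tiny smoothing floor
        if score > best_score:
            best_sense, best_score = s, score
    return best_sense

# Toy probabilities (invented for illustration)
prior = {"bass-music": 0.6, "bass-fish": 0.4}
likelihood = {
    "bass-music": {"guitar": 0.30, "player": 0.20, "fishing": 0.001},
    "bass-fish":  {"guitar": 0.001, "player": 0.02, "fishing": 0.25},
}
print(naive_bayes_sense(["guitar", "player"], prior, likelihood))
# -> bass-music
```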
Naïve Bayes: Evaluation
Experiment comparing different classifiers [Mooney 96]:
• Naïve Bayes and a neural network achieved the highest performance: 73% in assigning one of six senses to "line"
• Is this good?
• Simplest baseline: "most frequent sense"
• Ceiling: human inter-annotator agreement
  – 75%–80% on refined sense distinctions (WordNet)
  – closer to 90% for binary distinctions
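The most-frequent-sense baseline is trivial to implement, which is exactly why it is the standard sanity check; a sketch with made-up labels:

```python
# Sketch: "most frequent sense" baseline. With sense-labeled training data,
# count which sense appears most often and always predict that sense.

from collections import Counter

def mfs_baseline(labeled_senses):
    return Counter(labeled_senses).most_common(1)[0][0]

# Toy labeled data (invented)
train = ["bass-music", "bass-fish", "bass-music", "bass-music", "bass-fish"]
print(mfs_baseline(train))
# -> bass-music
```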
Bootstrapping
• What if you don't have enough data to train a system?
Start from a few labeled seeds: seeds -> small training data -> Machine Learning -> Classifier -> more classified data, which is fed back in as additional training data.
Bootstrapping: how to pick the seeds
• Hand-labeling (Hearst 1991):
  – likely correct
  – likely to be prototypical
• One sense per collocation (Yarowsky 1995): e.g., for bass, "play" is strongly associated with the music sense, whereas "fish" is strongly associated with the fish sense
• One sense per discourse: multiple occurrences of a word in one discourse tend to have the same sense
Unsupervised Methods [Schütze ’98]
Training data: (word + vector)1 …… (word + vector)n -> Machine Learning (clustering) -> k clusters ci
Hand-label the clusters: (c1, sense1) ……
To disambiguate a new (word + vector): assign the sense of the cluster with the highest vector/cluster similarity.
Agglomerative Clustering
• Assign each instance to its own cluster
• Repeat
  – merge the two clusters that are most similar
• Until the specified number of clusters is reached
• If there are too many training instances -> random sampling
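The loop above can be sketched as follows. For readability this uses 1-D points with single-link distance; real WSD systems would cluster high-dimensional context vectors with a vector similarity measure:

```python
# Sketch: plain agglomerative clustering. Start with singleton clusters and
# repeatedly merge the two closest (most similar) clusters until k remain.

def agglomerative(points, k):
    clusters = [[p] for p in points]           # each instance starts alone
    def dist(a, b):                            # single-link cluster distance
        return min(abs(x - y) for x in a for y in b)
    while len(clusters) > k:
        # find the pair of clusters that are most similar (closest)
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

print(agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], 3))
# -> [[1.0, 1.2], [5.0, 5.1], [9.0]]
```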
Problems
• Given these general ML approaches, how many classifiers do we need to perform WSD robustly?
  – One for each ambiguous word in the language
• How do you decide what set of tags/labels/senses to use for a given word?
  – It depends on the application
WSD: Dictionary and Thesaurus Methods
Most common: the Lesk method
• Choose the sense whose dictionary gloss shares the most words with the target word's neighborhood
• Exclude stop-words
Def: the words in the gloss for a sense are called its signature
Lesk: Example
Two senses for channel:
S1: (n) channel (a passage for water (or other fluids) to flow through) "the fields were crossed with irrigation channels"; "gutters carried off the rainwater into a series of channels under the street"
S2: (n) channel, television channel, TV channel (a television station and its programs) "a satellite TV channel"; "surfing through the channels"; "they offer more than one hundred channels" …
"most streets closed to the TV station were flooded because the main channel was clogged by heavy rain."
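The channel example above can be sketched as a simplified Lesk in a few lines. The glosses are abridged from the WordNet entries quoted on this slide, and the stop-word list is an illustrative assumption (real systems use a standard stop list):

```python
# Sketch: simplified Lesk. Pick the sense whose gloss+examples (its signature)
# shares the most non-stopword tokens with the context sentence.

STOP = {"a", "an", "and", "the", "of", "to", "for", "its", "or", "in",
        "was", "were", "by", "with", "more", "than", "one", "they", "through"}

def simplified_lesk(context, signatures):
    def content_words(text):
        return {w for w in text.lower().split() if w not in STOP}
    ctx = content_words(context)
    return max(signatures,
               key=lambda sense: len(content_words(signatures[sense]) & ctx))

signatures = {  # abridged signatures for the two senses above
    "channel-waterway": "a passage for water or other fluids to flow through "
                        "the fields were crossed with irrigation channels",
    "channel-tv": "a television station and its programs a satellite TV channel "
                  "surfing through the channels",
}
sentence = ("most streets closed to the TV station were flooded because "
            "the main channel was clogged by heavy rain")
print(simplified_lesk(sentence, signatures))
# -> channel-tv
```

Note the outcome: the TV sense wins on overlap ("TV", "station", "channel") even though the sentence is about the waterway sense, which is exactly the kind of error Corpus Lesk (next slide) mitigates.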
Corpus Lesk
Best performer, if a corpus with annotated senses is available.
• For each sense: add all the words in the sentences containing that sense to the signature for that sense
CORPUS: …"most streets closed to the TV station were flooded because the main <S1> channel </S1> was clogged by heavy rain."…
WSD: More Recent Trends
• Better ML techniques (e.g., Combining Classifiers)
• Combining ML and Lesk
• Other Languages
• Building better/larger corpora
Today 22/10
• Word Sense Disambiguation
• Word Similarity
• Semantic Role Labeling
Word Similarity
Actually a relation between two senses.
Similarity vs. relatedness: sun vs. moon – mouth vs. food – hot vs. cold
Applications?
• Thesaurus methods: measure distance in online thesauri (e.g., WordNet)
• Distributional methods: determine whether the two words appear in similar contexts
WS: Thesaurus Methods (1)
• Path-length based similarity on hypernym/hyponym hierarchies:

$$\mathrm{sim}_{\mathrm{path}}(c_1, c_2) = -\log \mathrm{pathlen}(c_1, c_2)$$

• Information-content word similarity (not all edges are equal). The probability of a concept covers the concept and its subsenses, and the information content is its negative log:

$$P(c) = \frac{\sum_{c_i \in \mathrm{subsenses}(c)} \mathrm{count}(c_i)}{N} \qquad \mathrm{IC}(c) = -\log P(c)$$

With LCS(c1, c2) the lowest common subsumer of the two concepts:

$$\mathrm{sim}_{\mathrm{resnik}}(c_1, c_2) = -\log P(\mathrm{LCS}(c_1, c_2))$$
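Resnik similarity can be sketched on a toy hierarchy. The concept counts and the tiny hierarchy below are invented for illustration, and base-2 logs are an arbitrary choice (any base works up to scaling):

```python
# Sketch: Resnik similarity on a toy hierarchy. P(c) is the probability mass
# of the concept and everything below it; sim_resnik = -log2 P(LCS(c1, c2)).

import math

counts = {"entity": 0, "animal": 2, "fish": 4, "bass": 2, "trout": 2,
          "artifact": 6}   # invented per-concept corpus counts
children = {"entity": ["animal", "artifact"], "animal": ["fish"],
            "fish": ["bass", "trout"], "artifact": [], "bass": [], "trout": []}

def subtree_count(c):
    """count(c) plus the counts of all its subsenses."""
    return counts[c] + sum(subtree_count(ch) for ch in children[c])

N = subtree_count("entity")            # total count at the root

def p(c):
    return subtree_count(c) / N

def sim_resnik(lcs):
    """Similarity given the lowest common subsumer of the two concepts."""
    return -math.log2(p(lcs))

# LCS(bass, trout) = fish in this toy hierarchy
print(round(sim_resnik("fish"), 3))
# -> 1.0
```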
WS: Thesaurus Methods (2)
• One of the best performers – the Jiang-Conrath distance:

$$\mathrm{dist}_{JC}(c_1, c_2) = 2\log P(\mathrm{LCS}(c_1, c_2)) - \big(\log P(c_1) + \log P(c_2)\big)$$

• This is a measure of distance; take the reciprocal for similarity.
• See also Extended Lesk.
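Given concept probabilities, the distance and its reciprocal are one line each; the probabilities used here are placeholders, not corpus estimates:

```python
# Sketch: Jiang-Conrath distance from concept probabilities, with similarity
# taken as the reciprocal of the distance.

import math

def jc_dist(p_c1, p_c2, p_lcs):
    return 2 * math.log(p_lcs) - (math.log(p_c1) + math.log(p_c2))

def jc_sim(p_c1, p_c2, p_lcs):
    return 1.0 / jc_dist(p_c1, p_c2, p_lcs)

# Placeholder probabilities: both concepts at 0.25, their LCS at 0.5
print(round(jc_dist(0.25, 0.25, 0.5), 3))
# -> 1.386
```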
WS: Distributional Methods
• You may not have any thesaurus for the target language
• Even if you have a thesaurus, it is still
  – missing domain-specific (e.g., technical) words
  – poor in hyponym knowledge for verbs, with nothing for adjectives and adverbs
  – hard to use when comparing senses from different hierarchies
• Solution: extract similarity from corpora
• Basic idea: two words are similar if they appear in similar contexts
WS Distributional Methods (1)
• Simple context: a feature vector

$$\vec{w} = (f_1, f_2, \ldots, f_N)$$

where f_i is how many times word w_i appeared in the neighborhood of w (after removing words on a stop list)
• More complex context: a feature matrix A = [a_ij], where a_ij is how many times w_i appeared in the neighborhood of w and was related to w by the syntactic relation r_j
WS Distributional Methods (2)
• More informative values (referred to as weights or measures of association in the literature)
• Pointwise mutual information:

$$\mathrm{assoc}_{\mathrm{PMI}}(w, f) = \log_2 \frac{P(w, f)}{P(w)\,P(f)}$$

• t-test:

$$\mathrm{assoc}_{t\text{-}test}(w, f) = \frac{P(w, f) - P(w)\,P(f)}{\sqrt{P(w)\,P(f)}}$$
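Both weights can be computed from raw co-occurrence counts; the counts below are toy values chosen for the example:

```python
# Sketch: PMI and t-test association weights from raw co-occurrence counts.
# `total` is the number of observed (word, feature) pairs in the corpus.

import math

def assoc_pmi(c_wf, c_w, c_f, total):
    p_wf, p_w, p_f = c_wf / total, c_w / total, c_f / total
    return math.log2(p_wf / (p_w * p_f))

def assoc_ttest(c_wf, c_w, c_f, total):
    p_wf, p_w, p_f = c_wf / total, c_w / total, c_f / total
    return (p_wf - p_w * p_f) / math.sqrt(p_w * p_f)

# Toy counts: "bass" seen 100 times, "fishing" 50 times, together 20 times,
# out of 10,000 observed pairs -> strongly associated.
print(round(assoc_pmi(20, 100, 50, 10_000), 3))
# -> 5.322
```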
WS Distributional Methods (3)
• Similarity between vectors
Cosine (not sensitive to extreme values):

$$\mathrm{sim}_{\mathrm{cosine}}(\vec{v}, \vec{w}) = \frac{\vec{v} \cdot \vec{w}}{|\vec{v}|\,|\vec{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\,\sqrt{\sum_{i=1}^{N} w_i^2}}$$

Jaccard (normalized (weighted) number of overlapping features):

$$\mathrm{sim}_{\mathrm{Jaccard}}(\vec{v}, \vec{w}) = \frac{\sum_{i=1}^{N} \min(v_i, w_i)}{\sum_{i=1}^{N} \max(v_i, w_i)}$$
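Both measures are a few lines over count vectors; the two vectors here are toy co-occurrence counts:

```python
# Sketch: cosine and (weighted) Jaccard similarity between feature vectors.

import math

def sim_cosine(v, w):
    dot = sum(vi * wi for vi, wi in zip(v, w))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return dot / (norm_v * norm_w)

def sim_jaccard(v, w):
    num = sum(min(vi, wi) for vi, wi in zip(v, w))  # overlapping mass
    den = sum(max(vi, wi) for vi, wi in zip(v, w))  # total mass
    return num / den

v = [1, 0, 3, 2]   # toy count vectors
w = [1, 1, 2, 2]
print(round(sim_cosine(v, w), 3), round(sim_jaccard(v, w), 3))
# -> 0.93 0.714
```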
WS Distributional Methods (4)
• Best combination overall:
  – t-test for weights
  – Jaccard (or Dice) for vector similarity
Today 22/10
• Word Sense Disambiguation
• Word Similarity
• Semantic Role Labeling
Semantic Role Labeling
Typically framed as a classification problem [Gildea, Jurafsky 2002]:
1. Assign a parse tree to the input
2. Find all predicate-bearing words (PropBank, FrameNet)
3. For each predicate: determine, for each syntactic constituent, which role (if any) it plays with respect to the predicate
Common constituent features: predicate, phrase type, head word and its POS, path, voice, linear position … and many others
Semantic Role Labeling: Example
[issued, NP, Examiner, NNP, NP↑S↓VP↓VBD, active, before, …]
Next Time
• Discourse and Dialog • Overview of Chapters 21 and 24