1
Using Bins to Empirically Estimate Term Weights
for Text Categorization
Carl Sable (Columbia University)
Kenneth W. Church (AT&T)
2
Binning Overview
I. Task and Corpus: Multimedia news documents
II. Related Work:
    – Naïve Bayes
    – Smoothing & Speech Recognition
    – Binning in Information Retrieval
III. Our Proposal:
    – Use bins for Text Categorization
IV. Results and Evaluation:
    – Binning: rarely hurts, sometimes helps
V. Reuters:
    – Standard benchmark evaluation
VI. Conclusions: Robust version of Naïve Bayes
3
[Example photographs: an outdoor scene and an indoor scene.]
4
Clues for Indoor/Outdoor: Text (as opposed to Vision)
Denver Summit of Eight leaders begin their first official meeting in the Denver Public Library, June 21.
Villagers look at the broken tail-end of the Fokker 28 Biman Bangladesh Airlines jet December 23, a day after it crash-landed near the town of Sylhet, in northeastern Bangladesh.
5
Event Categories
Politics, Struggle, Disaster, Crime, Other
6
Manual Categorization Tool
7
Related Work
• Naïve Bayes
• Jelinek, 1998
    – Smoothing techniques for Speech Recognition
    – Deleted Interpolation (binning)
• Umemura and Church, 2000
    – Applied binning to Information Retrieval
c = argmax_{c_j ∈ C} P(c_j) * Π_i P(w_i | c_j)
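A minimal sketch of this decision rule (illustrative counts and helper names, not the authors' code), computed in log space so small probabilities do not underflow:

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, category) pairs."""
    cat_counts = Counter(cat for _, cat in docs)
    word_counts = {cat: Counter() for cat in cat_counts}
    for tokens, cat in docs:
        word_counts[cat].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {cat: n / len(docs) for cat, n in cat_counts.items()}

    def word_prob(w, cat):
        # add-one smoothing so unseen words do not zero out a category
        counts = word_counts[cat]
        return (counts[w] + 1) / (sum(counts.values()) + len(vocab))

    return priors, word_prob

def classify(tokens, priors, word_prob):
    # argmax over c_j of log P(c_j) + sum_i log P(w_i | c_j)
    return max(priors, key=lambda cat: math.log(priors[cat])
               + sum(math.log(word_prob(w, cat)) for w in tokens))

docs = [(["earthquake", "village", "crash"], "outdoor"),
        (["conference", "leaders", "library"], "indoor")]
priors, word_prob = train_nb(docs)
print(classify(["earthquake", "crash"], priors, word_prob))  # -> outdoor
```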
8
Bin System: Naïve Bayes + Smoothing
• Binning: based on smoothing in speech recognition
• Not enough training data to estimate weights (log likelihood ratios) for each word
    – But there would be enough training data if we group words with similar “features” into a common “bin” (see the sketch after this list)
• Estimate a single weight for each bin
    – This weight is assigned to all words in the bin
• Credible estimates even for small counts (zeros)
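A sketch of the grouping step, assuming bins are keyed on quantized IDF, a burstiness flag, and capped per-category document frequencies (the exact feature set and thresholds here are illustrative, not the authors' recipe):

```python
from collections import defaultdict

def bin_key(idf, bursty, indoor_df, outdoor_df):
    # Quantize IDF and cap the raw document frequencies so that rare
    # words with similar statistics share a bin (thresholds illustrative).
    return (round(idf), bursty, min(indoor_df, 5), min(outdoor_df, 5))

# word -> (IDF, burstiness, indoor DF, outdoor DF), values from slide 9
word_stats = {
    "conference": (2.5, 0, 15, 0),
    "bed":        (4.5, 0, 1, 0),
    "airplane":   (5.4, 1, 0, 2),
    "earthquake": (4.6, 1, 0, 4),
}

bins = defaultdict(list)
for word, feats in word_stats.items():
    bins[bin_key(*feats)].append(word)

for key, words in sorted(bins.items()):
    print(key, words)   # every word in a bin gets the bin's single weight
```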
9
Intuition         Word         Indoor Freq   Outdoor Freq   IDF   Burstiness
Clearly Indoor    conference   15            0              2.5   0
                  bed          1             0              4.5   0
Clearly Outdoor   airplane     0             2              5.4   1
                  earthquake   0             4              4.6   1
Unclear           Gore         1             1              4.5   1
                  ceremony     5             6              3.9   0
10
“airplane”
• Sparse data
• First half of training set: “airplane” appears in
    – 2 outdoor documents
    – 0 indoor documents
• Infinitely more likely to be outdoor???
• Assign “airplane” to bins of words with similar features (e.g., IDF, burstiness, counts)
11
Lambdas: Weights
• First half of training set: Assign words to bins
• Second half of training set: Calibrate
    – Average weights over words in bin
P(obs | bin) = (1/|bin|) * Σ_{word ∈ bin} DF(word) / |docs|
    (DF(word) = number of documents containing the word)

λ_bin = log2 P(obs | bin)
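A sketch of the calibration step with made-up counts: bin membership comes from the first half of the training set, document frequencies from the held-out second half, and the per-bin average stays non-zero even when an individual word's count is zero:

```python
import math

def p_obs(bin_words, df, n_docs):
    # P(obs | bin) = (1/|bin|) * sum_{word in bin} DF(word) / |docs|
    return sum(df.get(w, 0) for w in bin_words) / (len(bin_words) * n_docs)

# One bin of rare words with similar features; DF counts over the
# second half's indoor documents (all values invented for illustration)
bin_words = ["airplane", "glider", "cockpit"]
df_indoor = {"airplane": 0, "glider": 1, "cockpit": 0}
n_indoor_docs = 1000

p = p_obs(bin_words, df_indoor, n_indoor_docs)
lam = math.log2(p)   # lambda for this bin: log2 P(obs | bin)
print(p, lam)        # small but non-zero, even though airplane's own count is 0
```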
12
Lambdas for “airplane”: 14 times more likely to be outdoor than indoor

P(obs | indoor bin) = 2.11 × 10^-4
P(obs | outdoor bin) = 2.90 × 10^-3

log2 [ P(obs | outdoor bin) / P(obs | indoor bin) ] = 3.78   (2^3.78 ≈ 14)
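Both numbers on this slide follow directly from the two bin estimates:

```python
import math

p_indoor  = 2.11e-4   # P(obs | indoor bin)
p_outdoor = 2.90e-3   # P(obs | outdoor bin)

print(p_outdoor / p_indoor)             # ~13.7: about 14x more likely outdoor
print(math.log2(p_outdoor / p_indoor))  # ~3.78
```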
13
Binning: Credible Log Likelihood Ratios

Intuition         Word         Lambda   Indoor Freq   Outdoor Freq   IDF   Burstiness
Clearly Indoor    conference   5.9      15            0              2.5   0
                  bed          4.6      1             0              4.5   0
Clearly Outdoor   airplane     -3.8     0             2              5.4   1
                  earthquake   -4.9     0             4              4.6   1
Unclear           Gore         0.7      1             1              4.5   1
                  ceremony     -0.3     5             6              3.9   0
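One plausible way these per-word lambdas feed back into classification (a sketch under the assumption that a document's score is its words' summed log likelihood ratios plus the log prior ratio; the deck does not spell out the scoring code):

```python
lambdas = {"conference": 5.9, "bed": 4.6, "airplane": -3.8,
           "earthquake": -4.9, "Gore": 0.7, "ceremony": -0.3}

def classify(tokens, prior_log_ratio=0.0):
    # positive lambdas vote indoor, negative vote outdoor;
    # unseen words would be scored via their own bins (0.0 here for brevity)
    score = prior_log_ratio + sum(lambdas.get(w, 0.0) for w in tokens)
    return "indoor" if score > 0 else "outdoor"

print(classify(["airplane", "ceremony"]))   # score -4.1 -> outdoor
print(classify(["conference", "Gore"]))     # score  6.6 -> indoor
```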
14
Evaluation
• Mutually exclusive categories
• Performance measured by overall accuracy:
Accuracy = (# correct predictions) / (# total predictions)
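In code, for mutually exclusive categories (one prediction per document; data invented):

```python
def accuracy(predicted, gold):
    # fraction of documents whose single predicted category is correct
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

print(accuracy(["indoor", "outdoor", "indoor"],
               ["indoor", "outdoor", "outdoor"]))  # 2/3 correct -> 0.667
```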
15
Bins: Robust Version of Naïve Bayes
Performance is often similar, but can be much better

[Bar charts: overall accuracy of Bins vs. Naïve Bayes on the Indoor/Outdoor task and on the Events task (Politics, Struggle, Disaster, Crime, Other).]
16
Bins: Robust Version of Naïve Bayes
Performs well against other alternatives

[Bar charts: overall accuracy of Bins, Naïve Bayes, Rocchio 1, KNN, PrInd, SVM, MaxEnt, Rocchio 2, and Density on the Indoor/Outdoor and Events tasks.]
17
Reuters http://www.research.att.com/~lewis/reuters21578.html
• Common corpus for comparing methods
    – Over 10,000 articles, 90 topic categories
• Modified our method to output multiple categories per document
    – One category per document: Indoor/Outdoor and Politics/Struggle/Disaster/Crime/Other
    – Multiple (0 or more) categories per document: Reuters

Doc #5: grain, wheat, corn, barley, oat, sorghum
Doc #9: earn
Doc #448: gold, acq, platinum
18
Evaluation for Reuters: Accuracy vs. Precision/Recall (F)
• Accuracy is misleading when documents are assigned multiple categories
• Use precision & recall instead
• F-measure: combines precision & recall
• Macro-averaging vs. micro-averaging
    – Macro: average over categories
    – Micro: average over documents
• Macro usually lower (illustrated below)
    – Since small categories are hard
Contingency Table:

                   “yes” is correct   “no” is correct
Assigned “yes”     a                  b
Assigned “no”      c                  d

p = a / (a + b)
r = a / (a + c)
F1 = 2 * p * r / (p + r)
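A sketch of the two averaging schemes over per-category contingency tables, with made-up (a, b, c) counts; note how the small category drags the macro average down, as claimed above:

```python
def f1(a, b, c):
    # a = true positives, b = false positives, c = false negatives
    p = a / (a + b) if a + b else 0.0
    r = a / (a + c) if a + c else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# per-category contingency counts (a, b, c), invented for illustration;
# "gold" is a small category with noisy precision and recall
tables = {"earn": (900, 50, 60), "gold": (3, 4, 5)}

macro = sum(f1(*t) for t in tables.values()) / len(tables)
A, B, C = (sum(t[i] for t in tables.values()) for i in range(3))
micro = f1(A, B, C)
print(f"macro-F1 = {macro:.3f}, micro-F1 = {micro:.3f}")  # macro is lower
```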
19
Bins: Robust Version of Naïve Bayes
Performance is often similar, but can be much better

[Bar charts: Micro-F1 and Macro-F1 on Reuters for Naïve Bayes (NB) vs. Bins.]
20
Bins: Robust Version of Naïve Bayes

[Bar charts: Micro-F1 and Macro-F1 on Reuters for SVM, KNN, LSF, NNet, NB, and Bins.]
21
Conclusions
• Binning: Robust version of Naïve Bayes
    – Often helps, rarely hurts
    – Smoothing: borrowed from Speech Recognition
    – Reliable log-likelihood ratios even for small counts:
        • airplane: 2 outdoor docs, 0 indoor docs (14 times more likely to be outdoor than indoor)
• Three Evaluations
    – Indoor vs. Outdoor (mutually exclusive categories)
    – Events (mutually exclusive categories)
    – Reuters (many-to-many)