Rare Category Detection
Jingrui He, Machine Learning Department, Carnegie Mellon University
Joint work with Jaime Carbonell
Machine Learning Lunch, 11/17/2008
What Is Rare Category Detection?
Setting: start de novo with very skewed classes: majority classes, minority classes, and a labeling oracle
Goal: discover the minority classes with only a few label requests
Comparison with Outlier Detection
Rare classes: a group of points, clustered, non-separable from the majority classes
Outliers: a single point, scattered, separable
Comparison with Active Learning
Rare category detection. Initial condition: NO labeled examples. Goal: discover the minority classes with the fewest label requests
Active learning. Initial condition: labeled examples from each class. Goal: improve the performance of the current classifier with the fewest label requests
Applications
Network intrusion detection
Astronomy
Fraud detection
Spam image detection
The Big Picture
Raw data (spatial, relational, temporal) -> feature extraction -> unbalanced, unlabeled data set -> rare category detection -> learning in unbalanced settings -> classifier
Outline
Problem definition
Related work
Rare category detection for spatial data: prior-dependent and prior-free rare category detection
Conclusion
Related Work
Pelleg & Moore 2004: mixture model; different selection criteria
Fine & Mansour 2006: generic consistency algorithm; upper and lower bounds
Papadimitriou et al. 2003: LOCI algorithm for groups of outliers
Separable or Near-Separable
[Figure: three scatter plots illustrating separable and near-separable rare classes]
Notations
Unlabeled examples: S = {x_1, ..., x_n}, x_i in R^d
Classes: y_i in {1, ..., m}; m-1 rare classes with priors p_2, ..., p_m (2 <= c <= m); one majority class with prior p_1, p_1 >> p_c
Goal: find at least ONE example from each rare class by requesting a few labels
Assumptions
The distribution of the majority class is sufficiently smooth
Examples from the minority classes form compact clusters in the feature space
[Figure: one-dimensional density with a smooth majority component and a sharp minority spike]
Overview of the Algorithms
Nearest-neighbor-based methods
Methodology: local density differential sampling
Intuition: select examples according to the change in local density
Two Classes: NNDB
1. Calculate the class-specific radius r'
2. For each x_i in S: NN(x_i, r') = {x : ||x - x_i|| <= r'}, n_i = |NN(x_i, r')|
3. s_i = max_{x_j in NN(x_i, t r')} (n_i - n_j)
4. Query x = argmax_{x_i in S} s_i
5. If x belongs to the rare class, output x and stop; otherwise increase t by 1 and go to step 3
NNDB: Calculate Class-Specific Radius
Number of examples from the minority class: K = n p_2
For each x_i in S, calculate r_i^K, the distance between x_i and its K-th nearest neighbor
The class-specific radius: r' = min_{i=1..n} r_i^K
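The NNDB steps above can be sketched in a few lines of Python. The oracle interface and the data layout are illustrative assumptions, not part of the original algorithm statement; distances are Euclidean.

```python
import numpy as np

def nndb(X, p, oracle, max_queries=50):
    """Sketch of NNDB for one rare class with prior p.
    X: (n, d) array of unlabeled examples.
    oracle(i): True if x_i belongs to the rare class (one label request)."""
    n = len(X)
    K = max(1, int(n * p))                               # expected rare-class size K = n * p
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    r_prime = np.sort(D, axis=1)[:, K].min()             # class-specific radius r'
    counts = (D <= r_prime).sum(axis=1)                  # n_i = |NN(x_i, r')|
    queried, t = set(), 1
    while t * r_prime <= D.max() and len(queried) < max_queries:
        s = np.full(n, -np.inf)
        for i in range(n):                               # s_i = max_{x_j in NN(x_i, t r')} (n_i - n_j)
            s[i] = (counts[i] - counts[D[i] <= t * r_prime]).max()
        s[list(queried)] = -np.inf                       # never query the same point twice
        i_star = int(np.argmax(s))
        queried.add(i_star)
        if oracle(i_star):                               # rare-class example found
            return i_star
        t += 1                                           # otherwise enlarge the neighborhood
    return None
```

On data satisfying the smoothness and compactness assumptions, the density differential n_i - n_j peaks on the boundary of a minority cluster, so the first few queries land there.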
NNDB: Calculate Nearest Neighbors
NN(x_i, r') = {x : ||x - x_i|| <= r'}, n_i = |NN(x_i, r')|
[Figure: neighborhoods of radius r' drawn around each point]
NNDB: Calculate the Scores
s_i = max_{x_j in NN(x_i, t r')} (n_i - n_j); query x = argmax_{x_i in S} s_i
[Figure: neighborhoods of radius t r' and the highest-scoring point]
NNDB: Pick the Next Candidate
Increase t by 1: s_i = max_{x_j in NN(x_i, (t+1) r')} (n_i - n_j); query x = argmax_{x_i in S} s_i
[Figure: enlarged neighborhoods of radius (t+1) r']
Why NNDB Works Theoretically
Theorem 1 [He & Carbonell 2007]: under certain conditions, with high probability, after a few iteration steps, NNDB queries at least one example whose probability of coming from the minority class is at least 1/3
Intuitively, the score s_i measures the change in local density around x_i, which is largest near a minority-class cluster
[Figure: queried points concentrating on the minority-class cluster]
Multiple Classes: ALICE
m-1 rare classes with priors p_2, ..., p_m (2 <= c <= m); one majority class with prior p_1
1. For each rare class c, 2 <= c <= m:
2. If we have already found examples from class c, set c = c + 1 and continue
3. Otherwise, run NNDB with prior p_c
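A self-contained sketch of the ALICE outer loop. The inner NNDB query is re-derived inline so the snippet stands alone; the priors dictionary and the labeling oracle (class 1 = majority) are hypothetical interfaces.

```python
import numpy as np

def alice(X, priors, label_oracle, budget=60):
    """Sketch of ALICE. priors: {rare class c: prior p_c};
    label_oracle(i) returns the class of x_i (class 1 is the majority)."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    discovered, queried = {}, set()

    def nndb_query(p, t):
        K = max(1, int(n * p))
        r = np.sort(D, axis=1)[:, K].min()               # class-specific radius for prior p
        counts = (D <= r).sum(axis=1)                    # n_i = |NN(x_i, r)|
        s = np.array([(counts[i] - counts[D[i] <= t * r]).max()
                      for i in range(n)], dtype=float)
        s[list(queried)] = -np.inf                       # never re-query a point
        return int(np.argmax(s))

    for c, p_c in sorted(priors.items()):                # outer loop over rare classes
        if c in discovered:
            continue                                     # already found via an earlier query
        for t in range(1, n + 1):                        # inner NNDB iterations
            if len(queried) >= budget:
                return discovered
            i = nndb_query(p_c, t)
            queried.add(i)
            y = label_oracle(i)
            if y != 1 and y not in discovered:
                discovered[y] = i                        # new rare class found
            if c in discovered:
                break
    return discovered
```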
Why ALICE Works Theoretically
Theorem 2 [He & Carbonell 2008]: under certain conditions, with high probability, in each outer loop of ALICE, after a few iteration steps in NNDB, ALICE queries at least one example whose probability of coming from one minority class is at least 1/3
Implementation Issues
Problem with ALICE: repeatedly sampling from the same rare class
MALICE solution: relevance feedback, using the class-specific radius to rule out the neighborhood of each labeled example
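One plausible form of the relevance-feedback step; the exact MALICE rule is not spelled out on the slide, so `feedback_exclude` is an illustrative name and the exclusion-by-radius rule is an assumption.

```python
import numpy as np

def feedback_exclude(X, labeled_idx, radius):
    """Assumed MALICE-style relevance feedback: once a rare-class example
    is labeled, every point within its class-specific radius is attributed
    to that class, so later queries are not spent on the same cluster."""
    d = np.linalg.norm(X - X[labeled_idx], axis=1)   # distances to the labeled point
    return np.where(d <= radius)[0]                  # indices to exclude from future queries
```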
Results on Synthetic Data Sets
[Figure: 2-D synthetic data set with the selected examples]
Summary of Real Data Sets
Abalone: 4177 examples, 7-dimensional features, 20 classes; largest class 16.50%, smallest class 0.34%
Shuttle: 4515 examples, 9-dimensional features, 7 classes; largest class 75.53%, smallest class 0.13%
Results on Real Data Sets: Abalone and Shuttle
[Figures: classes discovered vs. number of label requests, comparing MALICE, Interleave, and random sampling]
Imprecise Priors
[Figures: classes discovered vs. number of selected examples on Abalone and Shuttle, with the priors perturbed by -20%, -10%, -5%, 0, +5%, +10%, and +20%]
Overview of the Algorithm
Density-based method (SEDER)
Methodology: specially designed exponential families
Intuition: select examples according to the change in local density
Difference from NNDB and ALICE: NO prior information needed
Specially Designed Exponential Families [Efron & Tibshirani 1996]
A favorable compromise between parametric and nonparametric density estimation
Estimated density: g(x) = g_0(x) exp(beta_0 + beta_1^T t(x)), where g_0(x) is the carrier density, beta_0 is the normalizing parameter, beta_1 is the p x 1 parameter vector, and t(x) is the p x 1 vector of sufficient statistics
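As a minimal illustration of this density form (function and argument names are hypothetical): with beta_0 = 0 and beta_1 = 0 the estimated density reduces to the carrier density, and nonzero parameters tilt the carrier exponentially through the sufficient statistics.

```python
import numpy as np

def sde_density(x, g0, beta0, beta1, t):
    """Evaluate g(x) = g0(x) * exp(beta0 + beta1^T t(x)),
    the specially designed exponential family form."""
    return g0(x) * np.exp(beta0 + beta1 @ t(x))
```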
SEDER Algorithm
Carrier density: kernel density estimator
To decouple the estimation of the different parameters: decompose t(x) = ((x^1)^2, ..., (x^d)^2)^T feature-wise, and relax the normalization constraint so that for each kernel i and each feature j
exp(beta_0^{ij}) Integral exp(beta_j (x^j)^2) (1 / (sqrt(2 pi) sigma_j)) exp(-(x^j - x_i^j)^2 / (2 sigma_j^2)) dx^j = 1
Parameter Estimation
Theorem 3 [To appear]: for j = 1, ..., d, the maximum likelihood estimates beta-hat_0^{ij} and beta-hat_j of beta_0^{ij} and beta_j satisfy the following conditions:
sum_{k=1}^n [ sum_{i=1}^n exp(beta-hat_0^{ij} - (x_k^j - x_i^j)^2 / (2 sigma_j^2)) E_i[(x^j)^2] / sum_{i=1}^n exp(beta-hat_0^{ij} - (x_k^j - x_i^j)^2 / (2 sigma_j^2)) ] = sum_{k=1}^n (x_k^j)^2
where
E_i[(x^j)^2] = Integral (x^j)^2 exp(beta-hat_0^{ij} + beta-hat_j (x^j)^2) (1 / (sqrt(2 pi) sigma_j)) exp(-(x^j - x_i^j)^2 / (2 sigma_j^2)) dx^j
Parameter Estimation (cont.)
Let b_j = 1 / (1 - 2 beta-hat_j sigma_j^2), a positive parameter. Then for j = 1, ..., d:
b-hat_j = (-B + sqrt(B^2 - 4 A C)) / (2 A)
where
A = (1/n) sum_{k=1}^n [ sum_{i=1}^n exp(-(x_k^j - x_i^j)^2 / (2 sigma_j^2)) (x_i^j)^2 / sum_{i=1}^n exp(-(x_k^j - x_i^j)^2 / (2 sigma_j^2)) ]
B = sigma_j^2
C = -(1/n) sum_{k=1}^n (x_k^j)^2
In most cases, b-hat_j >= 1
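The positive root above can be computed directly; in this sketch A, B, and C are assumed to be precomputed per feature as on the slide.

```python
import numpy as np

def solve_b(A, B, C):
    """Positive root of the per-feature quadratic A*b^2 + B*b + C = 0:
    b_hat = (-B + sqrt(B^2 - 4*A*C)) / (2*A)."""
    return (-B + np.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)
```

With A > 0 and C < 0 (as in the slide's definitions), the discriminant exceeds B^2, so this root is always real and positive.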
Scoring Function
The estimated density becomes
g~(x) = (1/n) sum_{i=1}^n prod_{j=1}^d (1 / (sqrt(2 pi b_j) sigma_j)) exp(-(x^j - b_j x_i^j)^2 / (2 b_j sigma_j^2))
Scoring function: the norm of the gradient of the estimated density,
s_k = sqrt( sum_{l=1}^d [ sum_{i=1}^n D_i(x_k) (x_k^l - b_l x_i^l) / (b_l sigma_l^2) ]^2 )
where
D_i(x) = (1/n) prod_{j=1}^d (1 / (sqrt(2 pi b_j) sigma_j)) exp(-(x^j - b_j x_i^j)^2 / (2 b_j sigma_j^2))
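A sketch of the scoring step, assuming the per-feature b_j and kernel bandwidths sigma_j have already been estimated (here they are passed in directly); points where the estimated density changes fastest get the highest scores.

```python
import numpy as np

def seder_scores(X, b, sigma):
    """Score each point by the gradient norm of the modified KDE g~.
    X: (n, d) data; b, sigma: (d,) per-feature parameters."""
    n, d = X.shape
    scores = np.zeros(n)
    for k in range(n):
        # kernel values: prod_j 1/(sqrt(2 pi b_j) sigma_j) exp(-(x_k^j - b_j x_i^j)^2 / (2 b_j sigma_j^2))
        z = (X[k] - b * X) ** 2 / (2.0 * b * sigma ** 2)          # (n, d)
        Di = np.prod(np.exp(-z) / (np.sqrt(2.0 * np.pi * b) * sigma), axis=1)
        # gradient component l: -(1/n) sum_i D_i(x_k) (x_k^l - b_l x_i^l) / (b_l sigma_l^2)
        grad = -(Di[:, None] * (X[k] - b * X) / (b * sigma ** 2)).sum(axis=0) / n
        scores[k] = np.linalg.norm(grad)                          # s_k = ||gradient||
    return scores
```

With b_j = 1 this reduces to scoring by the gradient norm of an ordinary kernel density estimate, which is largest around the edges of a compact minority cluster.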
Results on Synthetic Data Sets
[Figures: 2-D synthetic data sets with the selected examples]
Summary of Real Data Sets
Data Set     n     d   m   Largest Class  Smallest Class
Ecoli        336   7   6   42.56%         2.68%   (moderately skewed)
Glass        214   9   6   35.51%         4.21%   (moderately skewed)
Page Blocks  5473  10  5   89.77%         0.51%   (extremely skewed)
Abalone      4177  7   20  16.50%         0.34%   (extremely skewed)
Shuttle      4515  9   7   75.53%         0.13%   (extremely skewed)
Moderately Skewed Data Sets
[Figures: results on Ecoli and Glass, with MALICE as a baseline]
Extremely Skewed Data Sets
[Figures: results on Page Blocks, Abalone, and Shuttle, with MALICE as a baseline]
Conclusion
Rare category detection: an open challenge with a lack of effective methods
Nearest-neighbor-based methods (NNDB, ALICE): prior-dependent; local density differential sampling
Density-based method (SEDER): prior-free; specially designed exponential families
Thank You!