Kullback-Leibler Boosting
Ce Liu, Heung-Yeung Shum
Microsoft Research Asia
CVPR 2003
Presented by Derek Hoiem
RealBoost Review

Start with some candidate feature set
Initialize training sample weights
Loop:
  - Add the feature that minimizes the error bound
  - Reweight the training examples, giving more weight to misclassified examples
  - Assign a weight to the weak classifier according to the weighted error on the training samples
Exit the loop after N features have been added
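The loop above can be sketched in code. This is a minimal illustration, not the paper's implementation: it assumes histogram-stump weak learners over input columns and uses the Schapire-Singer Z-bound for feature selection; all names are hypothetical.

```python
import numpy as np

def realboost(X, y, n_rounds=10, n_bins=16):
    """Minimal RealBoost sketch: each weak learner is a histogram stump
    on one input column, outputting half the weighted log-likelihood
    ratio per bin; the stump minimizing Z = 2*sum(sqrt(W+ W-)) is added."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                  # initial sample weights
    learners, eps = [], 1e-9
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            edges = np.histogram_bin_edges(X[:, j], bins=n_bins)
            idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, n_bins - 1)
            wp = np.bincount(idx, weights=w * (y == 1), minlength=n_bins)
            wn = np.bincount(idx, weights=w * (y == -1), minlength=n_bins)
            z = 2.0 * np.sum(np.sqrt(wp * wn))   # bound on training error
            if best is None or z < best[0]:
                h = 0.5 * np.log((wp + eps) / (wn + eps))
                best = (z, j, edges, h)
        _, j, edges, h = best
        idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, n_bins - 1)
        w *= np.exp(-y * h[idx])             # upweight misclassified samples
        w /= w.sum()
        learners.append((j, edges, h))
    return learners

def predict(learners, X):
    score = np.zeros(len(X))
    for j, edges, h in learners:
        idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, len(h) - 1)
        score += h[idx]
    return np.sign(score)
```

The exponential reweighting step is exactly the "give more weight to misclassified examples" bullet: samples with y*h(x) < 0 see their weight grow.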
The Basic Idea of KLBoosting
Similar to RealBoost except:
  - Features are general linear projections
  - Generates optimal features
  - Uses KL divergence to select features
  - Finer tuning of the coefficients
Linear Features
KLBoosting: general linear projections (arbitrary weight vectors)
VJ AdaBoost: Haar-like rectangle features (a restricted family of linear projections)
What makes a feature good?
KLBoosting: Maximize the KL divergence between the histograms of the feature's responses on the positive and negative classes
RealBoost: Minimize an upper bound on the classification error
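The KL criterion can be sketched as follows. `kl_score` is a hypothetical helper; the symmetric (J-divergence) form, bin count, and smoothing constant are assumptions rather than the paper's exact definition.

```python
import numpy as np

def kl_score(f_pos, f_neg, n_bins=32, eps=1e-9):
    """Score a candidate feature by the symmetric KL divergence between
    the histograms of its responses on positive and negative examples.
    A larger score means the feature separates the classes better."""
    lo = min(f_pos.min(), f_neg.min())
    hi = max(f_pos.max(), f_neg.max())
    p, _ = np.histogram(f_pos, bins=n_bins, range=(lo, hi))
    q, _ = np.histogram(f_neg, bins=n_bins, range=(lo, hi))
    p = p / p.sum() + eps        # smoothed class-conditional histograms
    q = q / q.sum() + eps
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))
```

A well-separated feature should score far above one whose class-conditional response distributions overlap.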
Creating the feature set
Sequential 1-D Optimization

  - Begin with a large initial set of features (linear projections)
  - Choose the top L features according to KL divergence
  - Initial feature = weighted sum of the L features
  - Search for the optimal feature along the directions of the L features
Example
Initial feature set:
[Scatter plot: training samples and the initial candidate feature directions]
Example
Top two features (by KL-Div):
[Scatter plot: samples with the two selected directions w1 and w2]
Example
Initial feature (weighted combo by KL):
[Scatter plot: directions w1, w2 and the initial combined feature f0]
Example
Optimize over w1
[Scatter plot: updated feature f1 after the 1-D search along w1]

f1 = f0 + β·w1,  with β searched over [-a1, a1]
Example
Optimize over w2
[Scatter plot: updated feature f2 after the 1-D search along w2]

f2 = f1 + β·w2,  with β searched over [-a2, a2]
(and repeat…)
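The procedure the example walks through can be sketched as code. `sequential_1d_opt` and its parameters (the step grid, the search range `a`, the number of sweeps) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def kl_of_projection(f, Xp, Xn, n_bins=32, eps=1e-9):
    """Symmetric KL divergence between the histograms of projection f
    on positive (Xp) and negative (Xn) samples."""
    rp, rn = Xp @ f, Xn @ f
    lo, hi = min(rp.min(), rn.min()), max(rp.max(), rn.max())
    p, _ = np.histogram(rp, bins=n_bins, range=(lo, hi))
    q, _ = np.histogram(rn, bins=n_bins, range=(lo, hi))
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return np.sum((p - q) * np.log(p / q))    # = KL(p||q) + KL(q||p)

def sequential_1d_opt(candidates, Xp, Xn, L=3, steps=21, a=1.0, sweeps=2):
    """Take the top-L candidate directions by KL, start from their
    KL-weighted sum (f0), then repeatedly line-search the combined
    feature along each of the L directions (beta in [-a, a])."""
    scores = np.array([kl_of_projection(w, Xp, Xn) for w in candidates])
    top = np.argsort(scores)[::-1][:L]
    W = np.array([candidates[i] for i in top])
    f = (scores[top] / scores[top].sum()) @ W     # f0: weighted combo
    f /= np.linalg.norm(f)
    for _ in range(sweeps):
        for w in W:                                # 1-D search per direction
            betas = np.linspace(-a, a, steps)      # grid includes beta = 0
            cands = [f + b * w for b in betas]
            cands = [c / np.linalg.norm(c) for c in cands
                     if np.linalg.norm(c) > 1e-8]
            f = max(cands, key=lambda c: kl_of_projection(c, Xp, Xn))
    return f
```

Because β = 0 (keep the current feature) is always on the grid, each 1-D search can only hold or improve the KL score.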
Creating the feature set
First three features
Selecting the first feature
Creating feature set
Classification

F(x) = sign( Σk αk · log[ p(fk(x) | +1) / p(fk(x) | −1) ] − T )

αk = ½ for all k in RealBoost; KLBoosting learns the αk
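A sketch of this classification rule, assuming histogram estimates of the class-conditional feature-response densities; `llr_weak` and `strong_classify` are hypothetical names:

```python
import numpy as np

def llr_weak(responses_pos, responses_neg, n_bins=32, eps=1e-9):
    """Build a histogram log-likelihood-ratio table for one feature
    from its responses on positive and negative training samples."""
    lo = min(responses_pos.min(), responses_neg.min())
    hi = max(responses_pos.max(), responses_neg.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(responses_pos, bins=edges)
    q, _ = np.histogram(responses_neg, bins=edges)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return edges, np.log(p / q)

def strong_classify(x, features, tables, alphas, threshold=0.0):
    """sign( sum_k alpha_k * log[p+(f_k(x)) / p-(f_k(x))] - threshold ).
    With alpha_k = 1/2 for all k this reduces to RealBoost's rule."""
    score = 0.0
    for w, (edges, llr), a in zip(features, tables, alphas):
        r = float(np.dot(w, x))                    # linear feature response
        k = np.clip(np.searchsorted(edges, r) - 1, 0, len(llr) - 1)
        score += a * llr[k]
    return 1 if score >= threshold else -1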
Parameter Learning
With each added feature k:
  - Fix α1..αk−1 at their current optimal values
  - Initialize αk to 0
  - Minimize the recognition error on the training set
Solve using a greedy algorithm
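The greedy coefficient search might look like the following sketch; the coordinate grid, sweep count, and starting point are illustrative assumptions, not the paper's algorithm details:

```python
import numpy as np

def greedy_alphas(S, y, grid=np.linspace(0.0, 2.0, 41), sweeps=3):
    """Greedy coordinate search for the combining coefficients.
    S[i, k] is the log-likelihood-ratio response of feature k on
    sample i; training error is minimized one alpha at a time,
    accepting a grid value only when it strictly reduces the error."""
    n, K = S.shape
    alphas = np.full(K, 0.5)              # RealBoost's fixed choice
    def err(a):
        return np.mean(np.sign(S @ a) != y)
    for _ in range(sweeps):
        for k in range(K):
            best = alphas.copy()
            for g in grid:
                trial = alphas.copy()
                trial[k] = g
                if err(trial) < err(best):
                    best = trial
            alphas = best
    return alphas
```

Since only strict improvements are accepted, the learned coefficients can never do worse on the training set than the fixed α = ½ they start from.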
KLBoost vs AdaBoost
1024 candidate features for AdaBoost
Face detection: candidate features
Candidate feature counts: 52,400; 2,800; 450
Face detection: training samples
  - 8760 faces + mirror images
  - 2484 non-face images → 1.34 billion patches
  - Cascaded classifier allows bootstrapping
Face detection: final features
[Figure: final features, grouped as the top ten; global semantic; global non-semantic; local]
Results
[Results figure; reported values: 8, 85, 853]
Schneiderman (2003)
Test time: 0.4 sec per 320×240 image
Comments
Training time?
Which improves performance:
  - Generating optimal features?
  - KL feature selection?
  - Optimizing the alpha coefficients?