Kullback-Leibler Boosting
Ce Liu, Heung-Yeung Shum
Microsoft Research Asia
CVPR 2003
Presented by Derek Hoiem
RealBoost Review

Start with some candidate feature set
Initialize training sample weights
Loop:
  - Add the feature that minimizes the error bound
  - Reweight the training examples, giving more weight to misclassified examples
  - Assign a weight to the weak classifier according to the weighted error on the training samples
Exit the loop after N features have been added
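The loop above can be sketched in code. This is a minimal illustration, not the paper's implementation: it assumes histogram-stump weak learners over input columns and uses the Schapire-Singer Z-bound for feature selection; all names are hypothetical.

```python
import numpy as np

def realboost(X, y, n_rounds=10, n_bins=16):
    """Minimal RealBoost sketch: each weak learner is a histogram stump
    on one input column, outputting half the weighted log-likelihood
    ratio per bin; the stump minimizing Z = 2*sum(sqrt(W+ W-)) is added."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                  # initial sample weights
    learners, eps = [], 1e-9
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            edges = np.histogram_bin_edges(X[:, j], bins=n_bins)
            idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, n_bins - 1)
            wp = np.bincount(idx, weights=w * (y == 1), minlength=n_bins)
            wn = np.bincount(idx, weights=w * (y == -1), minlength=n_bins)
            z = 2.0 * np.sum(np.sqrt(wp * wn))   # bound on training error
            if best is None or z < best[0]:
                h = 0.5 * np.log((wp + eps) / (wn + eps))
                best = (z, j, edges, h)
        _, j, edges, h = best
        idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, n_bins - 1)
        w *= np.exp(-y * h[idx])             # upweight misclassified samples
        w /= w.sum()
        learners.append((j, edges, h))
    return learners

def predict(learners, X):
    score = np.zeros(len(X))
    for j, edges, h in learners:
        idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, len(h) - 1)
        score += h[idx]
    return np.sign(score)
```

The exponential reweighting step is exactly the "give more weight to misclassified examples" bullet: samples with y*h(x) < 0 see their weight grow.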
The Basic Idea of KLBoosting
Similar to RealBoost except:
  - Features are general linear projections
  - Generates optimal features
  - Uses KL divergence to select features
  - Finer tuning of the coefficients
Linear Features
KLBoosting: general linear projections (arbitrary weight vectors)
VJ AdaBoost: Haar-like rectangle features (a restricted family of linear projections)
What makes a feature good?
KLBoosting: Maximize the KL divergence between the histograms of the feature's responses on the positive and negative classes
RealBoost: Minimize an upper bound on the classification error
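The KL criterion can be sketched as follows. `kl_score` is a hypothetical helper; the symmetric (J-divergence) form, bin count, and smoothing constant are assumptions rather than the paper's exact definition.

```python
import numpy as np

def kl_score(f_pos, f_neg, n_bins=32, eps=1e-9):
    """Score a candidate feature by the symmetric KL divergence between
    the histograms of its responses on positive and negative examples.
    A larger score means the feature separates the classes better."""
    lo = min(f_pos.min(), f_neg.min())
    hi = max(f_pos.max(), f_neg.max())
    p, _ = np.histogram(f_pos, bins=n_bins, range=(lo, hi))
    q, _ = np.histogram(f_neg, bins=n_bins, range=(lo, hi))
    p = p / p.sum() + eps        # smoothed class-conditional histograms
    q = q / q.sum() + eps
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))
```

A well-separated feature should score far above one whose class-conditional response distributions overlap.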
Creating the feature set
Sequential 1-D Optimization

  - Begin with a large initial set of features (linear projections)
  - Choose the top L features according to KL divergence
  - Initial feature = weighted sum of the L features
  - Search for the optimal feature along the directions of the L features
Example
Initial feature set:
[Scatter plot: training samples and the initial candidate feature directions]
Example
Top two features (by KL-Div):
[Scatter plot: samples with the two selected directions w1 and w2]
Example
Initial feature (weighted combo by KL):
[Scatter plot: directions w1, w2 and the initial combined feature f0]
Example
Optimize over w1
[Scatter plot: updated feature f1 after the 1-D search along w1]

f1 = f0 + β·w1,  with β searched over [-a1, a1]
Example
Optimize over w2
[Scatter plot: updated feature f2 after the 1-D search along w2]

f2 = f1 + β·w2,  with β searched over [-a2, a2]
(and repeat…)
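The procedure the example walks through can be sketched as code. `sequential_1d_opt` and its parameters (the step grid, the search range `a`, the number of sweeps) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def kl_of_projection(f, Xp, Xn, n_bins=32, eps=1e-9):
    """Symmetric KL divergence between the histograms of projection f
    on positive (Xp) and negative (Xn) samples."""
    rp, rn = Xp @ f, Xn @ f
    lo, hi = min(rp.min(), rn.min()), max(rp.max(), rn.max())
    p, _ = np.histogram(rp, bins=n_bins, range=(lo, hi))
    q, _ = np.histogram(rn, bins=n_bins, range=(lo, hi))
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return np.sum((p - q) * np.log(p / q))    # = KL(p||q) + KL(q||p)

def sequential_1d_opt(candidates, Xp, Xn, L=3, steps=21, a=1.0, sweeps=2):
    """Take the top-L candidate directions by KL, start from their
    KL-weighted sum (f0), then repeatedly line-search the combined
    feature along each of the L directions (beta in [-a, a])."""
    scores = np.array([kl_of_projection(w, Xp, Xn) for w in candidates])
    top = np.argsort(scores)[::-1][:L]
    W = np.array([candidates[i] for i in top])
    f = (scores[top] / scores[top].sum()) @ W     # f0: weighted combo
    f /= np.linalg.norm(f)
    for _ in range(sweeps):
        for w in W:                                # 1-D search per direction
            betas = np.linspace(-a, a, steps)      # grid includes beta = 0
            cands = [f + b * w for b in betas]
            cands = [c / np.linalg.norm(c) for c in cands
                     if np.linalg.norm(c) > 1e-8]
            f = max(cands, key=lambda c: kl_of_projection(c, Xp, Xn))
    return f
```

Because β = 0 (keep the current feature) is always on the grid, each 1-D search can only hold or improve the KL score.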
Creating the feature set
First three features
Selecting the first feature
Creating feature set
Classification

F(x) = sign( Σk αk · log[ p(fk(x) | +1) / p(fk(x) | −1) ] − T )

αk = ½ for all k in RealBoost; KLBoosting learns the αk
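A sketch of this classification rule, assuming histogram estimates of the class-conditional feature-response densities; `llr_weak` and `strong_classify` are hypothetical names:

```python
import numpy as np

def llr_weak(responses_pos, responses_neg, n_bins=32, eps=1e-9):
    """Build a histogram log-likelihood-ratio table for one feature
    from its responses on positive and negative training samples."""
    lo = min(responses_pos.min(), responses_neg.min())
    hi = max(responses_pos.max(), responses_neg.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(responses_pos, bins=edges)
    q, _ = np.histogram(responses_neg, bins=edges)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return edges, np.log(p / q)

def strong_classify(x, features, tables, alphas, threshold=0.0):
    """sign( sum_k alpha_k * log[p+(f_k(x)) / p-(f_k(x))] - threshold ).
    With alpha_k = 1/2 for all k this reduces to RealBoost's rule."""
    score = 0.0
    for w, (edges, llr), a in zip(features, tables, alphas):
        r = float(np.dot(w, x))                    # linear feature response
        k = np.clip(np.searchsorted(edges, r) - 1, 0, len(llr) - 1)
        score += a * llr[k]
    return 1 if score >= threshold else -1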
Parameter Learning
With each added feature k:
  - Fix α1..αk−1 at their current optimal values
  - Initialize αk to 0
  - Minimize the recognition error on the training set
Solve using a greedy algorithm
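The greedy coefficient search might look like the following sketch; the coordinate grid, sweep count, and starting point are illustrative assumptions, not the paper's algorithm details:

```python
import numpy as np

def greedy_alphas(S, y, grid=np.linspace(0.0, 2.0, 41), sweeps=3):
    """Greedy coordinate search for the combining coefficients.
    S[i, k] is the log-likelihood-ratio response of feature k on
    sample i; training error is minimized one alpha at a time,
    accepting a grid value only when it strictly reduces the error."""
    n, K = S.shape
    alphas = np.full(K, 0.5)              # RealBoost's fixed choice
    def err(a):
        return np.mean(np.sign(S @ a) != y)
    for _ in range(sweeps):
        for k in range(K):
            best = alphas.copy()
            for g in grid:
                trial = alphas.copy()
                trial[k] = g
                if err(trial) < err(best):
                    best = trial
            alphas = best
    return alphas
```

Since only strict improvements are accepted, the learned coefficients can never do worse on the training set than the fixed α = ½ they start from.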
KLBoost vs AdaBoost
1024 candidate features for AdaBoost
Face detection: candidate features
Candidate feature counts: 52,400; 2,800; 450
Face detection: training samples
  - 8760 faces + mirror images
  - 2484 non-face images → 1.34 billion patches
  - Cascaded classifier allows bootstrapping
Face detection: final features
[Figure: final features, grouped as the top ten; global semantic; global non-semantic; local]
Results
[Results figure; reported values: 8, 85, 853]
Schneiderman (2003)
Test time: 0.4 sec per 320×240 image
Comments
Training time?
Which improves performance:
  - Generating optimal features?
  - KL feature selection?
  - Optimizing the alpha coefficients?