Page 1: Boosting: Algorithms and Applications

Boosting: Algorithms and Applications

Lecture 11, ENGN 4522/6520, Statistical Pattern Recognition and Its Applications in Computer Vision

ANU, 2nd Semester, 2008

Chunhua Shen, NICTA/RSISE

Page 2: Boosting: Algorithms and Applications

Boosting

Definition of boosting: Boosting refers to the general problem of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb.

Boosting procedure: given a set of labeled training examples, on each round:

The booster devises a distribution (importance weighting) over the example set.
The booster requests a weak hypothesis/classifier/learner with low error under that distribution.

Upon convergence, the booster combines the weak hypotheses into a single prediction rule, as sketched below.
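A minimal sketch of this loop, assuming a hypothetical weak_learner_factory that returns a classifier with a scikit-learn-style fit(X, y, sample_weight) interface and predictions in {-1, +1}; AdaBoost-style re-weighting is used here for concreteness, since the general framework leaves the update unspecified:

    import numpy as np

    def boost(X, y, weak_learner_factory, n_rounds):
        # Generic boosting loop: maintain a distribution over the examples,
        # repeatedly request a weak hypothesis, then combine the hypotheses.
        n = len(y)
        dist = np.full(n, 1.0 / n)                # start from the uniform distribution
        hypotheses, alphas = [], []
        for _ in range(n_rounds):
            h = weak_learner_factory().fit(X, y, sample_weight=dist)
            pred = h.predict(X)                   # predictions in {-1, +1}
            err = max(dist[pred != y].sum(), 1e-12)
            if err >= 0.5:                        # no better than chance: stop
                break
            alpha = 0.5 * np.log((1.0 - err) / err)
            hypotheses.append(h)
            alphas.append(alpha)
            dist *= np.exp(-alpha * y * pred)     # emphasize the mistakes
            dist /= dist.sum()                    # renormalize to a distribution
        def predict(X_new):                       # final rule: weighted vote
            scores = sum(a * h.predict(X_new) for a, h in zip(alphas, hypotheses))
            return np.sign(scores)
        return predict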

Page 3: Boosting: Algorithms and Applications

Boosting (Freund & Schapire, 1997)

Page 4: Boosting: Algorithms and Applications

Boosting: 1st iteration

Page 5: Boosting: Algorithms and Applications

Boosting: Update Distribution

Page 6: Boosting: Algorithms and Applications

Boosting as Entropy Projection

Minimizing the relative entropy to the last distribution, subject to linear constraints.
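In one standard formulation of this view (a sketch with my own notation, not necessarily the slide's), the next distribution is the relative-entropy projection of the current one onto the set of distributions under which the last weak hypothesis is decorrelated from the labels:

    $d^{(t+1)} = \arg\min_{d} \sum_{i=1}^{n} d_i \ln\frac{d_i}{d_i^{(t)}}
    \quad \text{s.t.} \quad \sum_{i=1}^{n} d_i\, y_i h_t(x_i) = 0, \quad
    \sum_{i=1}^{n} d_i = 1, \quad d_i \ge 0.$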

Page 7: Boosting: Algorithms and Applications

Boosting: 2nd Hypothesis

Page 8: Boosting: Algorithms and Applications

Boosting: 3rd Hypothesis

Page 9: Boosting: Algorithms and Applications

Boosting: 4th Hypothesis

Page 10: Boosting: Algorithms and Applications

All hypotheses

Page 11: Boosting: Algorithms and Applications

AdaBoost

Page 12: Boosting: Algorithms and Applications

Properties of AdaBoost

AdaBoost adapts to the errors of the weak hypotheses returned by the weak learner. Unlike earlier boosting algorithms, a prior bound on the weak hypotheses' error need not be known ahead of time. The update rule reduces the probability assigned to those examples on which the hypothesis makes a good prediction and increases the probability of those on which the prediction is poor.
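Concretely, with labels $y_i \in \{-1, +1\}$, weak hypothesis $h_t$ and weighted error $\epsilon_t$, the AdaBoost update is

    $D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}, \qquad
    \alpha_t = \frac{1}{2} \ln\frac{1 - \epsilon_t}{\epsilon_t},$

where $Z_t$ normalizes $D_{t+1}$ to a distribution: a correct prediction gives $y_i h_t(x_i) = +1$ and multiplies the weight by $e^{-\alpha_t} < 1$, while a mistake multiplies it by $e^{\alpha_t} > 1$. The final rule is $H(x) = \mathrm{sign}\left(\sum_t \alpha_t h_t(x)\right)$.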

Page 13: Boosting: Algorithms and Applications

Multi-class Extensions

The previous discussion is restricted to binary classification problems; the training data could, however, carry any number of labels, giving a multi-class problem. The multi-class extension AdaBoost.M1 requires the accuracy of each weak hypothesis to be greater than 1/2. This condition is stronger in the multi-class case than in the binary case: with k classes, random guessing achieves accuracy only 1/k rather than 1/2.

Page 14: Boosting: Algorithms and Applications

Detecting Pedestrians Using Patterns of Motion and Appearance

Paul Viola, Michael J. Jones, Daniel Snow, IEEE ICCV 2003

Page 15: Boosting: Algorithms and Applications

The System

A pedestrian detection system using image intensity information and motion information, with the detectors trained by AdaBoost. This was the first approach to combine both appearance and motion information in a single detector. Advantages:

High efficiency
High detection rate and low false positive rate

Page 16: Boosting: Algorithms and Applications

Rectangle Filters

Rectangle filters measure the difference between region averages at various scales, orientations and aspect ratios. The information carried by any single filter is limited, however, and needs to be boosted to perform accurate classification.

Page 17: Boosting: Algorithms and Applications

Motion Information

Information about the direction of motion can be extracted from the differences between the first image and shifted versions of the second image in time. The motion filters (direction, shear, magnitude) operate on five images:

$\Delta = \mathrm{abs}(I_t - I_{t+1})$
$U = \mathrm{abs}(I_t - I_{t+1}\uparrow)$
$L = \mathrm{abs}(I_t - I_{t+1}\leftarrow)$
$R = \mathrm{abs}(I_t - I_{t+1}\rightarrow)$
$D = \mathrm{abs}(I_t - I_{t+1}\downarrow)$

where $\uparrow, \downarrow, \leftarrow, \rightarrow$ shift the image by one pixel in the corresponding direction.
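A sketch of these five images in numpy, assuming consecutive grayscale frames I_t and I_t1; np.roll wraps around at the borders and the sign convention of each shift is my assumption:

    import numpy as np

    def shift(img, dy, dx):
        # Shift an image by (dy, dx) pixels. np.roll wraps at the borders,
        # a simplification of the paper's shift operators.
        return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

    def motion_images(I_t, I_t1):
        delta = np.abs(I_t - I_t1)            # Delta: raw temporal difference
        U = np.abs(I_t - shift(I_t1, -1, 0))  # second frame shifted up
        D = np.abs(I_t - shift(I_t1, +1, 0))  # shifted down
        L = np.abs(I_t - shift(I_t1, 0, -1))  # shifted left
        R = np.abs(I_t - shift(I_t1, 0, +1))  # shifted right
        return delta, U, L, R, D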

Page 18: Boosting: Algorithms and Applications

An Example

Page 19: Boosting: Algorithms and Applications

Appearance Filter

The appearance filters are rectangle filters that operate on the first input image:

$f_m = \phi(I_t)$

Page 20: Boosting: Algorithms and Applications

Integral Image

The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:

$ii(x, y) = \sum_{x' \le x,\, y' \le y} i(x', y')$

where $ii(x, y)$ is the integral image and $i(x, y)$ is the original image. It can be computed in a single pass over the image with the recurrences

$s(x, y) = s(x, y - 1) + i(x, y)$
$ii(x, y) = ii(x - 1, y) + s(x, y)$

where $s(x, y)$ is the cumulative row sum.
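The two recurrences amount to a double cumulative sum, so a numpy sketch is a one-liner:

    import numpy as np

    def integral_image(i):
        # Cumulative row sums followed by cumulative column sums implement
        # the pair of recurrences above in a single pass over the image.
        return np.cumsum(np.cumsum(i, axis=0), axis=1)

Once the integral image is available, the sum of pixels over any rectangle needs only four lookups (the values at its corners), which is what makes evaluating many thousands of rectangle filters cheap.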

Page 21: Boosting: Algorithms and Applications

Training Filters

The rectangle filters can have any size, aspect ratio or position as long as they fit in the detection window; consequently there is a very large pool of possible motion and appearance filters, from which a learning algorithm selects to build classifiers.

Page 22: Boosting: Algorithms and Applications

Training Process

The training process uses AdaBoost to select a subset of features F that minimizes the weighted error and to construct the classifier. In each round, the learning algorithm chooses a filter from the pool of motion and appearance filters, picks the optimal threshold t for the feature, and assigns its linear weight. The output of AdaBoost is a linear combination of the selected features; one round of this selection might look like the stump search sketched below.
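A sketch, under these assumptions: features is a hypothetical n_examples x n_filters matrix of precomputed filter responses, y holds labels in {-1, +1}, and w is the current AdaBoost distribution.

    import numpy as np

    def best_stump(features, y, w):
        # Pick the (filter, threshold, polarity) triple that minimizes the
        # weighted classification error, as one round of AdaBoost requires.
        best = (np.inf, None, None, None)   # (error, filter index, theta, polarity)
        for j in range(features.shape[1]):
            for theta in np.unique(features[:, j]):
                for p in (+1, -1):
                    pred = np.where(p * features[:, j] < p * theta, 1, -1)
                    err = w[pred != y].sum()
                    if err < best[0]:
                        best = (err, j, theta, p)
        return best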

Page 23: Boosting: Algorithms and Applications

Training Process

A cascade architecture is used to raise the efficiency of the system. The true and false positives passed by the current stage are used to train the next stage of the cascade. The goal is to drive the false positive rate down much faster than the detection rate; a sketch of the test-time behaviour follows.
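At test time the cascade behaves roughly like the sketch below (names are assumptions, not the paper's API): a window is reported only if it passes every stage, so the cheap early stages reject the bulk of negatives without the expensive later stages ever running.

    def cascade_detect(window, stages):
        # stages: list of (score_fn, threshold) pairs, ordered from the
        # cheapest stage to the most discriminative one.
        for score_fn, threshold in stages:
            if score_fn(window) < threshold:
                return False    # rejected early; later stages never run
        return True             # survived every stage: report a detection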

Page 24: Boosting: Algorithms and Applications

Strong Classifier

[Figure: A strong classifier is a weighted sum of weak classifiers, here with weights 0.9, 0.5, 0.7, 0.3. If the weak classifiers with weights 0.9, 0.7 and 0.3 fire, the score is 0.9 + 0.7 + 0.3 = 1.9 > 1.0 (threshold) and the window is accepted; if only those with weights 0.5 and 0.3 fire, the score is 0.5 + 0.3 = 0.8 < 1.0 (threshold) and it is rejected. A chain Classifier 1 → Classifier 2 → … gives an overview of the cascaded structure.]

Page 25: Boosting: Algorithms and Applications

Experiments

Each classifier in the cascade is trained using the original positive examples plus the same number of false positives from the previous stage (or negative examples, for the first stage). The classifier resulting from the previous stage is used as the input of the current stage, which builds a new classifier with a lower false positive rate. The detection threshold is set using a validation set of image pairs.

Page 26: Boosting: Algorithms and Applications

Training samples

A small sample of positive training examples: a pair of image patterns comprises a single training example.

Page 27: Boosting: Algorithms and Applications

Training the cascade

A large number of motion and appearance filters are used for training the dynamic pedestrian detector; a smaller number of appearance filters are used for training the static pedestrian detector.

Page 28: Boosting: Algorithms and Applications

Training results

The first five filters learned for the dynamic pedestrian detector. The six images used in the motion and appearance representation are shown for each filter.

The first five filters learned for the static pedestrian detector

Page 29: Boosting: Algorithms and Applications

Testing

Detection results of the dynamic detector

Page 30: Boosting: Algorithms and Applications

Testing

Detection results of the static detector

Page 31: Boosting: Algorithms and Applications

Pedestrian Detection Using Boosting and Covariance Features

Sakrapee Paisitkriangkrai, Chunhua Shen, and Jian Zhang, IEEE T-CSVT

Page 32: Boosting: Algorithms and Applications

Covariance Features

The image is divided into small overlapping regions. Each pixel in a region is converted to an eight-dimensional feature vector

$F(x, y) = \left[ x,\; y,\; |I_x|,\; |I_y|,\; \sqrt{I_x^2 + I_y^2},\; |I_{xx}|,\; |I_{yy}|,\; \tan^{-1}\frac{|I_y|}{|I_x|} \right]^{\top}$

The covariance matrix is calculated from

$\mathrm{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \frac{1}{n - 1}\left[ \sum_k X_k Y_k - \frac{1}{n} \sum_k X_k \sum_k Y_k \right]$

To improve the calculation time, a technique employing integral images is applied: in other words, we compute the integral images of $\sum_k X_k$, $\sum_k Y_k$ and $\sum_k X_k Y_k$.
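A sketch of the descriptor for one region, assuming a grayscale numpy image; derivatives use np.gradient, and the full-region np.cov call stands in for the integral-image accumulation of $\sum_k X_k$, $\sum_k Y_k$ and $\sum_k X_k Y_k$ described above:

    import numpy as np

    def region_covariance(img):
        h, w = img.shape
        y, x = np.mgrid[0:h, 0:w].astype(float)
        Iy, Ix = np.gradient(img.astype(float))   # first derivatives
        Iyy = np.gradient(Iy, axis=0)             # second derivatives
        Ixx = np.gradient(Ix, axis=1)
        mag = np.sqrt(Ix**2 + Iy**2)
        # arctan2 of the magnitudes equals arctan(|Iy|/|Ix|) but avoids
        # division by zero where Ix vanishes.
        ang = np.arctan2(np.abs(Iy), np.abs(Ix))
        # Stack the 8 per-pixel features into an (h*w, 8) matrix.
        F = np.stack([x, y, np.abs(Ix), np.abs(Iy), mag,
                      np.abs(Ixx), np.abs(Iyy), ang], axis=-1).reshape(-1, 8)
        return np.cov(F, rowvar=False)            # 8x8 covariance matrix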

Page 33: Boosting: Algorithms and Applications

Experimental Results

[Figure: Feature comparison. Detection rate (0.80-1.00) versus false positive rate (0-0.01) for COV and HoG features, each paired with a linear SVM, a quadratic SVM, and an RBF SVM (g = 0.01).]

Page 34: Boosting: Algorithms and Applications

Remarks

Although covariance features with a non-linear SVM outperform many state-of-the-art techniques, the approach has the following disadvantages:

The block size used in the SVM is fixed (7x7 pixels), so it cannot capture human body parts with other rectangular shapes, e.g. limbs or torso.
The parameter tuning process for the SVM is rather tedious.
The computation time of a non-linear SVM is high.

This motivates building a new, simpler pedestrian detector using covariance features, AdaBoost with weighted Fisher linear discriminant analysis (WLDA) based weak classifiers, and a cascaded structure.

Page 35: Boosting: Algorithms and Applications

Linear Discriminant Analysis (LDA)

Motivation: project the data onto a line ($\mathbb{R}^n \to \mathbb{R}^1$) such that the patterns become well separated (in a least-squares sense). Two-dimensional example.

Page 36: Boosting: Algorithms and Applications

Linear Discriminant Analysis (LDA)

Motivation: project the data onto a line ($\mathbb{R}^n \to \mathbb{R}^1$) such that the patterns become well separated (in a least-squares sense). Two-dimensional example:

Best separation between the two classes.
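A minimal Fisher LDA sketch under these assumptions (two classes given as arrays of n-dimensional points):

    import numpy as np

    def fisher_direction(X1, X2):
        # Direction w maximizing between-class separation relative to the
        # within-class scatter: w is proportional to Sw^{-1} (m1 - m2).
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
        w = np.linalg.solve(Sw, m1 - m2)
        return w / np.linalg.norm(w)

    # Projection onto the line: scores = X @ w, then threshold the scalars.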

Page 37: Boosting: Algorithms and Applications

Covariance features with LDA

Observations: it is possible to achieve a 5% test error rate using either 25 covariance features or 100 Haar-like features.

The descriptor of a region is the upper triangle of the covariance matrix of its pixel feature vectors:

$\begin{bmatrix} \mathrm{var}(x) & \mathrm{cov}(x, y) & \cdots & \mathrm{cov}(x, |I_{yy}|) \\ & \mathrm{var}(y) & \cdots & \mathrm{cov}(y, |I_{yy}|) \\ & & \ddots & \vdots \\ & & & \mathrm{var}(|I_{yy}|) \end{bmatrix}$

Idea: combine covariance features with LDA and compare them against Haar-like features.

Page 38: Boosting: Algorithms and Applications
Page 39: Boosting: Algorithms and Applications
Page 40: Boosting: Algorithms and Applications

Combine multi-dimensional covariance features with weighted LDA.

The new features are trained in an AdaBoost framework for faster speed and higher accuracy. Multiple-layer boosting with heterogeneous features is applied on a cascaded structure.

[Figure: weighted combination of weak classifiers into a strong classifier, as illustrated on Page 24.]

Page 41: Boosting: Algorithms and Applications

Architecture

Architecture of the pedestrian detection system using boosted covariance features:

1. From the training dataset and the complete set of rectangular filters (weak classifiers), calculate the region covariance matrix and stack the upper triangle of the matrix into a vector ($\mathbb{R}^n$).
2. Apply the weighted Fisher linear discriminant ($\mathbb{R}^n \to \mathbb{R}^1$).
3. AdaBoost selects the best weak learner with respect to the weighted error.
4. Update the sample weights.
5. Test the predefined objective (hit rate: 99.5%, false positives: 50%); if it fails (F), return to step 3; if it passes (T), output the strong classifier.

Page 42: Boosting: Algorithms and Applications

Observations – Covariance Features

Each combined covariance feature represents a distinct part of the human body:
1. The 1st covariance feature represents the human legs (two parallel vertical bars).
2. The 2nd covariance feature captures information about the head and the body.

Compare with Haar features:
1. The 1st Haar feature represents the human head/shoulder contour.
2. The 2nd Haar feature represents the human left leg.

Page 43: Boosting: Algorithms and Applications

Experimental Results

The proposed boosted covariance detector achieves roughly ten times faster detection speed than the conventional covariance detector (Tuzel et al. 2007). On a 360 x 288 pixel image, our system processes around 4 frames per second. This is the first real-time covariance-feature-based pedestrian detector.

Page 44: Boosting: Algorithms and Applications

Experimental Results

Page 45: Boosting: Algorithms and Applications

Face Detection Applications

Summary: Viola & Jones' face detector
Uses the integral image for efficient feature extraction
Uses AdaBoost for feature selection
Applies a cascade of classifiers for efficient elimination of non-faces

Pros:
Fast and robust face detector
The system can run in real time

Cons:
The training stage is time consuming (1-2 weeks), depending on the number of training samples and the number of features used
Requires a large number of face training samples

Discussion:
The performance of face detection depends crucially on the features used to represent the objects
Good features not only yield better generalization but also require a smaller training database

Page 46: Boosting: Algorithms and Applications

Face Detection Applications: Proposed Work

Similar to the previous experiment, we apply covariance features to face detection.

The differences between our work and Viola & Jones' framework: we use covariance features, and we adopt the weighted FDA as the weak classifier.

To show the improved classification capability, we trained boosted classifiers on the banana dataset with the multidimensional decision stump and with FDA as weak classifiers.

[Figure: training and test error versus the number of weak classifiers on the banana dataset, for the multidimensional decision stump (up to 1000 weak classifiers) and for Fisher discriminant analysis (up to 200 weak classifiers).]

Page 47: Boosting: Algorithms and Applications

Observations / Experimental Results

ROC curves show that covariance features significantly outperform Haar-like wavelet features when the training database is small. As the number of samples grows, the performance difference between the two techniques shrinks.

[Figure: ROC curves for our algorithm on the MIT+CMU test set, trained with 250 faces (left) and 500 faces (right): correct detection rate (0.65-0.90) versus number of false positives (0-400), for COV features and Haar features.]

Page 48: Boosting: Algorithms and Applications

Experimental Results

Some detection results of our face detector trained using 250 frontal faces, on MIT + CMU test images.

Page 49: Boosting: Algorithms and Applications

Summary

Boosting
AdaBoost
AdaBoost for pedestrian detection using Haar features and dynamic temporal information
AdaBoost for pedestrian detection using new covariance features
Face detection using new covariance features

Page 50: Boosting: Algorithms and Applications

Questions?

