8/8/2019 Boosting and Applications Yuan
Boosting Algorithm and Its
Application
Dan Yuan
Jan 2005
Gambling Strategies
Rules-of-thumb from gambling experts
Maximizing the advantage by using these rules-of-thumb
How to combine these rules-of-thumb into a highly accurate prediction rule?
Boosting
Definition of boosting: boosting refers to the general problem of producing a very accurate prediction rule by combining rough and moderately inaccurate rules-of-thumb.
Boosting procedure
Given a set of labeled training examples (x_i, y_i), i = 1, …, N, where y_i is the label associated with instance x_i
On each round t = 1, …, T:
The booster devises a distribution (importance weighting) D_t over the example set
The booster requests a weak hypothesis (rule-of-thumb) h_t with low error ε_t
After T rounds, the booster combines the weak hypotheses into a single prediction rule.
Conventional Boosting Algorithm
The intuitive idea
Altering the distribution over the domain in a way that increases the probability of the harder parts of the space, thus forcing the weak learner to generate new hypotheses that make fewer mistakes on those parts.
Disadvantages
Requires prior knowledge of the accuracies of the weak hypotheses
The performance bound depends only on the accuracy of the least accurate weak hypothesis
Adaboost
The framework
The learner receives examples (x_i, y_i), i = 1, …, N, chosen randomly according to some fixed but unknown distribution P on X × Y
The learner finds a hypothesis h_f which is consistent with most of the samples, i.e. h_f(x_i) = y_i for most 1 ≤ i ≤ N
The algorithm
Input variables
P: the distribution from which the training examples are sampled
D: the distribution over all the training examples
WeakLearn: a weak learning algorithm to be boosted
T: the specified number of iterations
Adaboost (contd)
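The algorithm itself appears on this slide only as a figure. A minimal sketch of binary AdaBoost, assuming labels in {-1, +1} and decision stumps as the weak learner (the `stump_learner` helper is illustrative, not part of the original deck):

```python
import numpy as np

def adaboost(X, y, T, weak_learn):
    """Binary AdaBoost: y in {-1, +1}; weak_learn(X, y, D) returns a
    hypothesis h trained to have low error under the distribution D."""
    N = len(y)
    D = np.full(N, 1.0 / N)              # initial distribution over examples
    hyps, alphas = [], []
    for t in range(T):
        h = weak_learn(X, y, D)
        eps = D[h(X) != y].sum()         # weighted error of the weak hypothesis
        if eps >= 0.5:                   # no better than chance: stop boosting
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-10))
        hyps.append(h); alphas.append(alpha)
        # Shrink the weight of correctly classified examples, grow the mistakes
        D *= np.exp(-alpha * y * h(X))
        D /= D.sum()
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, hyps)))

def stump_learner(X, y, D):
    """Illustrative weak learner: exhaustive 1-D decision stumps."""
    best = None
    for thr in np.unique(X):
        for sign in (+1, -1):
            err = D[sign * np.where(X <= thr, 1, -1) != y].sum()
            if best is None or err < best[0]:
                best = (err, thr, sign)
    _, thr, sign = best
    return lambda Xq, thr=thr, sign=sign: sign * np.where(Xq <= thr, 1, -1)
```

Run on an interval-shaped concept, which no single stump can represent, a few rounds of reweighting already drive the training error to zero.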
Advantages of adaboost
AdaBoost adapts to the errors of the weak hypotheses returned by WeakLearn.
Unlike the conventional boosting algorithms, the prior error need not be known ahead of time.
The update rule reduces the probability assigned to those examples on which the hypothesis makes a good prediction and increases the probability of the examples on which the prediction is poor.
The error bound
Suppose the weak learning algorithm WeakLearn, when called by AdaBoost, generates hypotheses with errors ε_1, …, ε_T. Then the error ε = Pr_{i∼D}[h_f(x_i) ≠ y_i] of the final hypothesis h_f output by AdaBoost is bounded above by

ε ≤ 2^T ∏_{t=1}^{T} √(ε_t(1 − ε_t))

Note that the errors generated by WeakLearn need not be uniform, and the final error depends on the errors of all of the weak hypotheses. Recall that the errors of the previous boosting algorithms depend only on the maximal error of the weakest hypothesis, ignoring the advantage that can be gained from the hypotheses whose errors are smaller.
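The bound is easy to evaluate numerically; a small illustrative sketch (not from the slides):

```python
import math

def adaboost_error_bound(eps):
    """Upper bound 2^T * prod_t sqrt(eps_t * (1 - eps_t)) on the training
    error of the final hypothesis, given weak-hypothesis errors eps_t."""
    return (2 ** len(eps)) * math.prod(math.sqrt(e * (1 - e)) for e in eps)

# Weak hypotheses only slightly better than chance still drive the bound down:
# each round contributes a factor 2*sqrt(0.4*0.6) = 0.98 < 1.
print(adaboost_error_bound([0.4] * 10))   # ≈ 0.815
```

With ε_t = 0.5 exactly (chance-level learners) every factor equals 1 and the bound stays vacuous, as the formula suggests.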
The error bound (contd)
Alternative formulation: if ε_t = 1/2 − γ_t, then

ε ≤ ∏_{t=1}^{T} √(1 − 4γ_t²) = exp( −Σ_{t=1}^{T} KL(1/2 ‖ 1/2 − γ_t) ) ≤ exp( −2 Σ_{t=1}^{T} γ_t² )

where KL(a ‖ b) = a ln(a/b) + (1 − a) ln((1 − a)/(1 − b)) is the Kullback-Leibler divergence.
Also, if we assume that the errors of all the hypotheses satisfy ε_t ≤ 1/2 − γ, then

ε ≤ exp(−2γ²T)

which means that as the number of iterations goes to infinity, the upper bound on the final hypothesis error approaches zero.
The generalization error
The generalization error, ε_g = Pr_{(x,y)∼P}[h_f(x) ≠ y], is the error of the final hypothesis evaluated outside the training set.
The goal: make the generalization error close to the empirical error on the training set.
One natural way of achieving this is to restrict the weak learner to choose its hypotheses from some class of simple functions, and to restrict T, the number of weak hypotheses.
The generalization error (contd)
The choice of the class of weak hypotheses is specific to the real learning problem, and at the least it should reflect the knowledge about the properties of the unknown concept.
An upper bound on the VC-dimension of the concept class can be used for the choice of T.
Vapnik's Theorem
States how close the empirical error and the generalization error will be.
The generalization error: ε_g = Pr_{(x,y)∼P}[h(x) ≠ y]
The empirical error from N examples: ε̂ = |{i : h(x_i) ≠ y_i}| / N
For any η > 0 we have

Pr[ ∃h ∈ H : |ε_g(h) − ε̂(h)| > O(√(dT/N)) ] ≤ η

where d is the VC-dimension of the weak hypothesis class.
Minimization of generalization error
Let h_f^T be the hypothesis generated by running AdaBoost for T iterations. By combining the observed empirical error of h_f^T with the given bounds, we can compute an upper bound on the generalization error of h_f^T for all T, and then select the hypothesis that minimizes the guaranteed upper bound.
Alternatively, cross-validation can be used for choosing T.
Multi-class Extensions
The previous discussion is restricted to binary classification problems. In general the label set Y can have any number of labels, giving a multi-class problem.
The multi-class case (AdaBoost.M1) requires the accuracy of each weak hypothesis to be greater than 1/2. This condition is stronger in the multi-class case than in the binary classification case.
AdaBoost.M1
The algorithm
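The algorithm slide is a figure in the original deck. A minimal sketch of AdaBoost.M1, following the β_t = ε_t/(1 − ε_t) reweighting and the weighted-vote final hypothesis (the small ε floor is an added numerical guard, not part of the algorithm):

```python
import numpy as np

def adaboost_m1(X, y, labels, T, weak_learn):
    """AdaBoost.M1: multi-class boosting. weak_learn(X, y, D) returns a
    hypothesis h with h(X) yielding one label per example."""
    N = len(y)
    D = np.full(N, 1.0 / N)
    hyps, betas = [], []
    for t in range(T):
        h = weak_learn(X, y, D)
        wrong = h(X) != y
        eps = D[wrong].sum()
        if eps >= 0.5:                    # M1 needs accuracy better than 1/2
            break
        beta = max(eps, 1e-10) / (1 - eps)
        hyps.append(h); betas.append(beta)
        D[~wrong] *= beta                 # shrink correctly classified examples
        D /= D.sum()
    def final(Xq):
        # weighted vote: each h_t casts log(1/beta_t) for its predicted label
        votes = np.stack([sum(np.log(1 / b) * (h(Xq) == l)
                              for h, b in zip(hyps, betas)) for l in labels])
        return np.asarray(labels)[np.argmax(votes, axis=0)]
    return final
```

The final hypothesis weighs each weak learner by log(1/β_t), so more accurate rounds dominate the vote.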
Error Upper Bound of Adaboost.M1
Like the binary classification case, the error of the
final hypothesis is also bounded.
ε ≤ 2^T ∏_{t=1}^{T} √(ε_t(1 − ε_t))
Adaboost.M2
AdaBoost.M2 introduces a degree of belief over all the labels rather than a single-label output.
For instance, h(x, y) measures the degree to which it is believed that y is the correct label associated with x.
The original prediction error is replaced with the pseudo-loss, which can focus the learner on the labels that are hardest to discriminate.
The pseudo-loss
For a fixed training example (x_i, y_i), we use a given hypothesis h to answer k − 1 questions, one for each incorrect label y ≠ y_i:
Which is the label of x_i: y or y_i?
The probability of choosing the incorrect answer y to this question is
(1/2)(1 − h(x_i, y_i) + h(x_i, y))
The weighted average probability (pseudo-loss) of answering all the k − 1 questions incorrectly is
ploss_q(h, i) = (1/2)(1 − h(x_i, y_i) + Σ_{y≠y_i} q(i, y) h(x_i, y))
where q is called the label weighting function and sums to 1 over the incorrect labels.
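The formula above transcribes directly into code; a small sketch (the dictionary-based representation of q(i, ·) is an assumption for illustration):

```python
def pseudo_loss(h, x_i, y_i, q_i, labels):
    """Pseudo-loss of hypothesis h on example (x_i, y_i).
    h(x, y) in [0, 1] is the degree of belief that y is the label of x;
    q_i maps each incorrect label y to its weight q(i, y), summing to 1."""
    wrong = [y for y in labels if y != y_i]
    return 0.5 * (1 - h(x_i, y_i) + sum(q_i[y] * h(x_i, y) for y in wrong))
```

A hypothesis that is confidently correct has pseudo-loss 0, while a uniformly uncertain one has pseudo-loss 1/2, so AdaBoost.M2 only needs weak hypotheses with pseudo-loss slightly below 1/2.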
The pseudo-loss (contd)
The weak learner's goal is to minimize the expected pseudo-loss for a given distribution D and weighting function q:
ploss_{D,q}(h) = E_{i∼D}[ploss_q(h, i)]
As we can see, by manipulating both the distribution over instances and the label weighting function q, the boosting algorithm forces the weak learner to focus not only on the hard instances, but also on the incorrect class labels that are hardest to eliminate.
The algorithm: AdaBoost.M2
Error Upper Bound of Adaboost.M2
Like the previous case, the error of the final hypothesis is also bounded:
ε ≤ (k − 1) 2^T ∏_{t=1}^{T} √(ε_t(1 − ε_t))
where the ε_t are now the pseudo-losses of the weak hypotheses.
Detecting Pedestrians Using Patterns of Motion and Appearance
Paul Viola, Michael J. Jones, Daniel Snow
The System
A pedestrian detection system using image intensity information and motion information, with the detectors trained by AdaBoost.
It is the first approach to combine both the appearance and the motion information in a single detector.
Advantages:
High efficiency
High detection rate and low false positive rate
Rectangle Filters
Rectangle filters measure the difference between region averages at various scales, orientations and aspect ratios.
However, the information each filter carries is limited and needs to be boosted to perform accurate classification.
Motion information
Information about the direction of motion can be extracted from the difference between shifted versions of the second image in time and the first image.
Motion filters (direction, shear, magnitude) operate on 5 images:
Δ = abs(I_t − I_{t+1})
U = abs(I_t − I_{t+1} ↑)
L = abs(I_t − I_{t+1} ←)
R = abs(I_t − I_{t+1} →)
D = abs(I_t − I_{t+1} ↓)
where ↑, ←, → and ↓ denote shifting the image up, left, right and down.
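A minimal sketch of computing the five difference images. The one-pixel shift and the `np.roll` wrap-around boundary handling are simplifying assumptions (a real implementation would pad instead):

```python
import numpy as np

def motion_images(I_t, I_t1, shift=1):
    """The five difference images used by the motion filters:
    abs differences between I_t and shifted copies of I_{t+1}."""
    delta = np.abs(I_t - I_t1)
    U = np.abs(I_t - np.roll(I_t1, -shift, axis=0))  # I_{t+1} shifted up
    D = np.abs(I_t - np.roll(I_t1,  shift, axis=0))  # shifted down
    L = np.abs(I_t - np.roll(I_t1, -shift, axis=1))  # shifted left
    R = np.abs(I_t - np.roll(I_t1,  shift, axis=1))  # shifted right
    return delta, U, L, R, D
```

For a pattern that moved one pixel to the right between frames, shifting I_{t+1} back to the left re-aligns it with I_t, so that one difference image is near zero while the others are not; this asymmetry is what the direction filters exploit.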
An example
Motion Direction and Shear Filters
Motion Direction Filter
f_i = r_i(Δ) − r_i(S),  S ∈ {U, L, R, D}
where r_i is a single-box rectangular sum. These filters extract information related to the likelihood that a particular region is moving in a given direction.
Motion Shear Filter
f_j = φ_j(S),  S ∈ {U, L, R, D}
using the rectangle filters φ_j.
Motion Magnitude Filter and Appearance Filter
Motion Magnitude Filter
f_k = r_k(S),  S ∈ {U, L, R, D}
where r_k is a single-box rectangular sum within the detection window.
Appearance Filter: a rectangle filter that operates on the first input image,
f_m = φ(I_t)
Integral Image
The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:
ii(x, y) = Σ_{x′≤x, y′≤y} i(x′, y′)
where ii(x, y) is the integral image and i(x, y) is the original image. It can be computed in a single pass over the image using
s(x, y) = s(x, y − 1) + i(x, y)
ii(x, y) = ii(x − 1, y) + s(x, y)
where s(x, y) is the cumulative row sum.
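A minimal sketch, using row/column array indexing rather than the (x, y) convention above; the two cumulative sums play the role of the recurrences:

```python
import numpy as np

def integral_image(img):
    """ii[r, c] = sum of img over all pixels above and to the left, inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] from the integral image: four array
    references regardless of the box size."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total
```

This O(1) box sum is what makes evaluating the many rectangle filters cheap enough for a sliding-window detector.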
Scale-invariance
Scale-invariance is achieved during the training process.
A pyramid of different scales is built from a base resolution, and the difference images are computed at each pyramid level l:
Δ_l = abs(I_t^l − I_{t+1}^l)
U_l = abs(I_t^l − I_{t+1}^l ↑)
L_l = abs(I_t^l − I_{t+1}^l ←)
R_l = abs(I_t^l − I_{t+1}^l →)
D_l = abs(I_t^l − I_{t+1}^l ↓)
Training Filters
The rectangle filters can have any size, aspect ratio or position as long as they fit in the detection window; therefore, there is a very large number of possible motion and appearance filters, from which a learning algorithm selects to build classifiers.
The Classifier (contd)
A classifier, C, is a thresholded sum of features:
C(I_t, I_{t+1}) = 1 if Σ_{i=1}^{N} F_i(I_t, I_{t+1}, Δ, U, L, R, D) > θ, and 0 otherwise
A feature, F, is simply a thresholded filter that outputs one of two votes:
F_i(I_t, I_{t+1}, Δ, U, L, R, D) = α if f_i(I_t, I_{t+1}, Δ, U, L, R, D) > t_i, and β otherwise
where t_i is a feature threshold and f_i is one of the motion or appearance filters. The real-valued α and β are computed during AdaBoost learning.
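The feature/classifier structure can be sketched as follows. The dictionary of images and the toy box-sum filter are assumptions for illustration, not the paper's representation:

```python
import numpy as np

def feature(f, t_i, alpha, beta):
    """A feature F: votes alpha if the filter response exceeds its
    threshold t_i, and beta otherwise."""
    return lambda imgs: alpha if f(imgs) > t_i else beta

def classifier(features, theta):
    """A classifier C: thresholded sum of the feature votes."""
    return lambda imgs: 1 if sum(F(imgs) for F in features) > theta else 0

# Toy usage: `imgs` stands in for the (I_t, I_{t+1}, delta, U, L, R, D) set,
# and the filter is a plain box sum over the delta image.
imgs = {"delta": np.ones((2, 2))}                            # box sum = 4
F1 = feature(lambda im: im["delta"].sum(), 3.0, 2.0, -1.0)   # fires: +2.0
F2 = feature(lambda im: im["delta"].sum(), 5.0, 2.0, -1.0)   # silent: -1.0
C = classifier([F1, F2], theta=0.5)
```

The classifier threshold θ is what is later adjusted per cascade stage to trade detection rate against false positives.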
Training Process
The training process uses AdaBoost to select a subset of features F that minimizes the weighted error, in order to construct the classifier.
In each round, the learning algorithm chooses a set of filters from the motion and appearance filters.
It also picks the optimal threshold t_i for each feature, as well as the votes α and β.
The output of AdaBoost is a linear combination of the selected features.
Training Process
A cascade architecture is used to raise the
efficiency of the system.
The true and false positives passed at the
current stage will be used in the next stage of
the cascade. The goal is to reduce the false
positive rate faster than the detection rate.
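The early-rejection behavior of the cascade can be sketched as follows (the `stages` list of stage classifiers and the window representation are illustrative assumptions):

```python
def cascade_detect(stages, window):
    """Evaluate a detection window through a cascade of classifiers.
    A window is rejected as soon as any stage says no, so the many easy
    negatives exit early and never reach the expensive later stages."""
    for stage in stages:
        if stage(window) == 0:
            return 0        # rejected at this stage
    return 1                # survived every stage: report a detection
```

Because each stage only needs to pass nearly all true positives while discarding a fraction of the negatives, the overall false positive rate falls multiplicatively while the detection rate stays high.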
Experiments
Each classifier in the cascade is trained using the original positive examples and the same number of false positives from the previous stage (or negative examples, at the first stage).
The resulting classifier of the previous stage is used as the input of the current stage to build a new classifier with a lower false positive rate.
The detection threshold is set using a validation set of image pairs.
Training samples
A small sample of positive training examples. A pair of image patterns comprises a single example for training.
Training the cascade
A large number of motion and appearance filters is used for training the dynamic pedestrian detector.
A smaller number of appearance filters is used for training the static pedestrian detector.
Training results
The first five filters learned for the dynamic pedestrian detector. The six images used in the motion and appearance representation are shown for each filter.
The first five filters learned for the static pedestrian detector.
Testing
Detection for the dynamic detector
Testing
Detection for the static detector
Thanks