
Boosting and Applications Yuan


Transcript
  • Slide 1/41

    Boosting Algorithm and Its Application

    Dan Yuan

    Jan 2005

  • Slide 2/41

    Gambling Strategies

    Rules-of-thumb from gambling experts

    How can we maximize our advantage using these rules-of-thumb?

    How can we combine these rules-of-thumb into a highly accurate prediction rule?

  • Slide 3/41

    Boosting

    Definition of Boosting: Boosting refers to the general problem of producing a very accurate prediction rule by combining rough and moderately inaccurate rules-of-thumb.

    Boosting procedures

    Given a set of labeled training examples $(x_i, y_i),\ i = 1, \ldots, N$, where $y_i$ is the label associated with instance $x_i$.

    On each round $t = 1, \ldots, T$:

    The booster devises a distribution (importance weighting) $D_t$ over the example set.

    The booster requests a weak hypothesis (rule-of-thumb) $h_t$ with low error $\epsilon_t$.

    After $T$ rounds, the booster combines the weak hypotheses into a single prediction rule.
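    As a rough illustration of this loop, here is a minimal sketch in Python. The `weak_learn`, `reweight`, and `combine` callables are hypothetical placeholders for the pieces the slides leave abstract; this is a generic skeleton, not any specific boosting algorithm.

```python
# Skeleton of the generic boosting loop described above: maintain a
# distribution over examples, repeatedly call a weak learner, and
# finally combine the weak hypotheses into one rule.
import numpy as np

def boost(X, y, weak_learn, reweight, combine, T):
    N = len(X)
    D = np.full(N, 1.0 / N)          # D_1: uniform over the examples
    hypotheses = []
    for t in range(T):               # rounds t = 1, ..., T
        h_t = weak_learn(X, y, D)    # weak hypothesis with low error under D_t
        hypotheses.append(h_t)
        D = reweight(D, h_t, X, y)   # emphasize examples h_t got wrong
        D = D / D.sum()              # renormalize to a distribution
    return combine(hypotheses)       # single combined prediction rule
```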

  • Slide 4/41

    Conventional Boosting Algorithm

    The intuitive idea

    Altering the distribution over the domain in a way that increases the probability of the harder parts of the space, thus forcing the weak learner to generate new hypotheses that make fewer mistakes on these parts.

    Disadvantages

    Requires prior knowledge of the accuracies of the weak hypotheses.

    The performance bound depends only on the accuracy of the least accurate weak hypothesis.

  • Slide 5/41

    AdaBoost

    The framework

    The learner receives examples $(x_i, y_i),\ i = 1, \ldots, N$, chosen randomly according to some fixed but unknown distribution $P$ on $X \times Y$.

    The learner finds a hypothesis $h_f$ which is consistent with most of the samples, i.e. $h_f(x_i) = y_i$ for most $1 \le i \le N$.

    The algorithm

    Input variables

    P: The distribution from which the training examples are sampled

    D: The distribution over all the training samples

    WeakLearn: A weak learning algorithm to be boosted

    T: The specified number of iterations

  • Slide 6/41

    AdaBoost (cont'd)
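    The pseudocode figure from the original slide is not reproduced in this transcript. As a substitute, here is a minimal sketch of binary AdaBoost in the Freund-Schapire formulation (labels and weak predictions in {0, 1}, multiplicative update with $\beta_t = \epsilon_t / (1 - \epsilon_t)$). The `weak_learn` callable is a hypothetical stand-in for the WeakLearn input above.

```python
import numpy as np

def adaboost(X, y, weak_learn, T):
    """Sketch of binary AdaBoost; y is a {0,1} numpy array of length N."""
    N = len(X)
    w = np.full(N, 1.0 / N)                   # weight vector over examples
    hs, betas = [], []
    for t in range(T):
        D = w / w.sum()                       # D_t: normalized distribution
        h = weak_learn(X, y, D)               # weak hypothesis for round t
        pred = np.array([h(x) for x in X])    # its {0,1} predictions
        eps = float(np.sum(D * (pred != y)))  # weighted error under D_t
        if eps <= 0.0 or eps >= 0.5:          # stop if perfect or no better than chance
            break
        beta = eps / (1.0 - eps)
        w = w * beta ** (pred == y)           # shrink weights of correct examples
        hs.append(h)
        betas.append(beta)

    def h_final(x):
        # Weighted majority vote: h_t votes with weight log(1/beta_t).
        total = sum(np.log(1.0 / b) * h(x) for h, b in zip(hs, betas))
        return 1 if total >= 0.5 * sum(np.log(1.0 / b) for b in betas) else 0

    return h_final
```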

  • Slide 7/41

    Advantages of AdaBoost

    AdaBoost adapts to the error rates of the weak hypotheses returned by WeakLearn.

    Unlike the conventional boosting algorithm, the errors of the weak hypotheses need not be known ahead of time.

    The update rule reduces the probability assigned to those examples on which the hypothesis makes a good prediction and increases the probability of the examples on which the prediction is poor.

  • Slide 8/41

    The error bound

    Suppose the weak learning algorithm WeakLearn, when called by AdaBoost, generates hypotheses with errors $\epsilon_1, \ldots, \epsilon_T$. Then the error $\epsilon = \Pr_{i \sim D}[h_f(x_i) \ne y_i]$ of the final hypothesis $h_f$ output by AdaBoost is bounded above by

    $$\epsilon \le \prod_{t=1}^{T} 2\sqrt{\epsilon_t (1 - \epsilon_t)}$$

    Note that the errors generated by WeakLearn are not uniform, and the final error depends on the errors of all of the weak hypotheses. Recall that the errors of the previous boosting algorithms depend only on the maximal error of the weakest hypothesis and ignore the advantage that can be gained from the hypotheses whose errors are smaller.
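    For concreteness, a quick worked instance of this bound (an illustrative calculation, not from the original slides): if every weak hypothesis has error $\epsilon_t = 0.3$, each round contributes a factor of $2\sqrt{0.3 \times 0.7} \approx 0.917$, so $\epsilon \le 0.917^T$, which drops below 1% once $T \ge 53$.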

  • Slide 9/41

    The error bound (cont'd)

    Alternative Formulation

    Writing $\epsilon_t = 1/2 - \gamma_t$, the bound can be restated as

    $$\epsilon \le \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^2} = \exp\left(-\sum_{t=1}^{T} \mathrm{KL}\!\left(\tfrac{1}{2} \,\middle\|\, \tfrac{1}{2} - \gamma_t\right)\right) \le \exp\left(-2\sum_{t=1}^{T} \gamma_t^2\right)$$

    where $\mathrm{KL}(a \| b) = a \ln\frac{a}{b} + (1 - a)\ln\frac{1 - a}{1 - b}$ is the Kullback-Leibler divergence.

    Also, if we assume that the errors of all the hypotheses are equal to $1/2 - \gamma$, then

    $$\epsilon \le \exp(-2T\gamma^2)$$

    which means that as the number of iterations goes towards infinity, the upper bound on the final hypothesis error approaches zero.
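    As a sanity check on this rate (an illustrative calculation, not from the original slides): with a uniform edge of $\gamma = 0.1$ over $T = 500$ rounds, the bound gives $\epsilon \le \exp(-2 \times 500 \times 0.01) = e^{-10} \approx 4.5 \times 10^{-5}$.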

  • Slide 10/41

    The generalization error

    Evaluation of the error of the final hypothesis outside the training set:

    $$\epsilon_g = \Pr_{(x, y) \sim P}[h_f(x) \ne y]$$

    The goal: making the generalization error close to the empirical error on the training set.

    One natural way of achieving this is to restrict the weak learner to choose its hypotheses from some simple class of functions, and to restrict $T$, the number of weak hypotheses.

  • Slide 11/41

    The generalization error (cont'd)

    The choice of the class of weak hypotheses is specific to the learning problem at hand, and should at least reflect prior knowledge about the properties of the unknown concept.

    An upper bound on the VC-dimension of the concept class can be used for the choice of $T$.

  • Slide 12/41

    Vapnik's Theorem

    States how close the empirical error and the generalization error will be.

    The generalization error: $\epsilon_g(h) = \Pr_{(x, y) \sim P}[h(x) \ne y]$

    The empirical error from $N$ examples: $\hat{\epsilon}(h) = \frac{1}{N}\left|\{i : h(x_i) \ne y_i\}\right|$

    For any $\eta > 0$ we have that

    $$\Pr\left[\exists h \in H : \left|\epsilon_g(h) - \hat{\epsilon}(h)\right| > \tilde{O}\!\left(\sqrt{\frac{Td}{N}}\right)\right] \le \eta$$

    where $d$ is the VC-dimension of the weak hypothesis class and $T$ the number of boosting rounds.

  • Slide 13/41

    Minimization of generalization error

    Let $h_f^T$ be the hypothesis generated by running AdaBoost for $T$ iterations. By combining the observed empirical error of $h_f^T$ with the given bounds, we can compute an upper bound on the generalization error of $h_f^T$ for all $T$, and then select the hypothesis $h_f^T$ that minimizes the guaranteed upper bound.

    Cross-validation can also be used for choosing $T$.

  • Slide 14/41

    Multi-class Extensions

    The previous discussion is restricted to binary classification problems. The set $Y$ could have any number of labels, which gives a multi-class problem.

    The multi-class case (AdaBoost.M1) requires the accuracy of each weak hypothesis to be greater than 1/2. This condition is stronger in the multi-class setting than in the binary classification case.

  • Slide 15/41

    AdaBoost.M1

    The algorithm

  • Slide 16/41

    Error Upper Bound of AdaBoost.M1

    Like the binary classification case, the error of the final hypothesis is also bounded:

    $$\epsilon \le \prod_{t=1}^{T} 2\sqrt{\epsilon_t (1 - \epsilon_t)}$$

  • Slide 17/41

    AdaBoost.M2

    Introduces a degree of belief over all the labels rather than a single label output.

    For instance, $h(x, y)$ measures the degree to which it is believed that $y$ is the correct label associated with $x$.

    Replaces the original prediction error with the pseudo-loss, which can focus the learner on the labels that are hardest to discriminate.

  • Slide 18/41

    The pseudo-loss

    For a fixed training example $(x_i, y_i)$, we use a given hypothesis to keep asking $k - 1$ questions, one for each incorrect label $y \ne y_i$: "Which is the label of $x_i$: $y$ or $y_i$?"

    The probability of choosing the incorrect answer $y$ to this question is $\frac{1}{2}\left(1 - h(x_i, y_i) + h(x_i, y)\right)$.

    The weighted average probability (pseudo-loss) of answering all the $k - 1$ questions is

    $$\mathrm{ploss}_q(h, i) = \frac{1}{2}\left(1 - h(x_i, y_i) + \sum_{y \ne y_i} q(i, y)\, h(x_i, y)\right)$$

    where $q$ is called the label weighting function and sums to 1 over the incorrect labels.

  • Slide 19/41

    The pseudo-loss (cont'd)

    The weak learner's goal is to minimize the expected pseudo-loss for a given distribution $D$ and weighting function $q$:

    $$\mathrm{ploss}_{D, q}(h) = \mathbb{E}_{i \sim D}\left[\mathrm{ploss}_q(h, i)\right]$$

    As we can see, by manipulating both the distribution on instances and the label weighting function $q$, the boosting algorithm forces the weak learner to focus not only on the hard instances, but also on the incorrect class labels that are hardest to eliminate.
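    A minimal sketch of these two quantities in Python, assuming $h(x, y)$ is a callable returning a plausibility in $[0, 1]$ and the label weighting is stored per example as a dict over the incorrect labels (all names here are illustrative):

```python
# Pseudo-loss of hypothesis h on one example, and its expectation under D.
def pseudo_loss(h, x_i, y_i, q_i):
    # q_i maps each incorrect label y != y_i to a weight; weights sum to 1.
    wrong = sum(q_i[y] * h(x_i, y) for y in q_i)   # weighted wrong-label belief
    return 0.5 * (1.0 - h(x_i, y_i) + wrong)

def expected_pseudo_loss(h, examples, q, D):
    # examples: list of (x_i, y_i); q[i]: label weighting for example i;
    # D[i]: probability of example i under the boosting distribution.
    return sum(D[i] * pseudo_loss(h, x, y, q[i])
               for i, (x, y) in enumerate(examples))
```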

  • Slide 20/41

    The algorithm: AdaBoost.M2

  • Slide 21/41

    Error Upper Bound of AdaBoost.M2

    Like the previous case, the error of the final hypothesis is also bounded:

    $$\epsilon \le (k - 1) \prod_{t=1}^{T} 2\sqrt{\epsilon_t (1 - \epsilon_t)}$$

    where $k$ is the number of labels and $\epsilon_t$ is the pseudo-loss on round $t$.

  • Slide 22/41

    Detecting Pedestrians Using Patterns of Motion and Appearance

    Paul Viola, Michael J. Jones, Daniel Snow

  • Slide 23/41

    The System

    A pedestrian detection system using image intensity information and motion information, with the detectors trained by AdaBoost.

    The first approach to combine both appearance and motion information in a single detector.

    Advantages:

    High efficiency

    High detection rate and low false positive rate

  • Slide 24/41

    Rectangle Filters

    Measure the difference between region averages at various scales, orientations, and aspect ratios.

    However, this information is limited on its own and needs to be boosted to perform accurate classification.

  • Slide 25/41

    Motion information

    Information about the direction of motion can be extracted from the differences between shifted versions of the second image in a pair and the first image.

    Motion filters (direction, shear, magnitude) operate on 5 images:

    $$\begin{aligned}
    \Delta &= \mathrm{abs}(I_t - I_{t+1}) \\
    U &= \mathrm{abs}(I_t - I_{t+1} \uparrow) \\
    L &= \mathrm{abs}(I_t - I_{t+1} \leftarrow) \\
    R &= \mathrm{abs}(I_t - I_{t+1} \rightarrow) \\
    D &= \mathrm{abs}(I_t - I_{t+1} \downarrow)
    \end{aligned}$$

    where $I_t$ and $I_{t+1}$ are the two images in time, and $\uparrow, \downarrow, \leftarrow, \rightarrow$ denote shifting the image up, down, left, or right by one pixel.
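    A minimal numpy sketch of these five difference images, assuming same-shape grayscale frames as float arrays (cast uint8 frames first to avoid wrap-around in the subtraction); `np.roll` stands in for the one-pixel shift, glossing over border handling:

```python
import numpy as np

def motion_images(I_t, I_t1):
    """Compute the five difference images (Delta, U, L, R, D) for a frame pair."""
    delta = np.abs(I_t - I_t1)                   # plain frame difference
    U = np.abs(I_t - np.roll(I_t1, -1, axis=0))  # second frame shifted up
    D = np.abs(I_t - np.roll(I_t1,  1, axis=0))  # shifted down
    L = np.abs(I_t - np.roll(I_t1, -1, axis=1))  # shifted left
    R = np.abs(I_t - np.roll(I_t1,  1, axis=1))  # shifted right
    return delta, U, L, R, D
```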

  • Slide 26/41

    An example


  • Slide 27/41

    Motion Direction and Shear Filters

    Motion Direction Filters

    $$f_i = r_i(\Delta) - r_i(S), \qquad S \in \{U, L, R, D\}$$

    where $r_i$ is a single box rectangular sum.

    These filters extract information related to the likelihood that a particular region is moving in a given direction.

    Motion Shear Filters

    Using the rectangle filters $\phi_j$:

    $$f_j = \phi_j(S), \qquad S \in \{U, L, R, D\}$$

  • Slide 28/41

    Motion Magnitude Filter and Appearance Filter

    Motion Magnitude Filters

    $$f_k = r_k(S), \qquad S \in \{U, L, R, D\}$$

    where $r_k$ is a single box rectangular sum within the detection window.

    Appearance Filters are rectangle filters that operate on the first input image:

    $$f_m = \phi_m(I_t)$$

  • Slide 29/41

    Integral Image

    The integral image at location $(x, y)$ contains the sum of the pixels above and to the left of $(x, y)$, inclusive:

    $$ii(x, y) = \sum_{x' \le x,\ y' \le y} i(x', y')$$

    where $ii(x, y)$ is the integral image and $i(x, y)$ is the original image. It can be computed in a single pass using the pair of recurrences

    $$s(x, y) = s(x, y - 1) + i(x, y)$$
    $$ii(x, y) = ii(x - 1, y) + s(x, y)$$

    where $s(x, y)$ is the cumulative row sum, with $s(x, -1) = 0$ and $ii(-1, y) = 0$.
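    A minimal numpy sketch of the integral image and the constant-time box sum it enables; any rectangle filter above reduces to a handful of such box sums (the function names are illustrative):

```python
import numpy as np

def integral_image(i):
    """ii(x, y): sum of all pixels above and to the left of (x, y), inclusive."""
    return i.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, top, left, bottom, right):
    """Sum of the original image over rows top..bottom and columns
    left..right (inclusive), via four lookups in the integral image."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total
```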

  • Slide 30/41

    Scale-invariance

    Scale-invariance is achieved during the training

    process.

    A pyramid of different scales is built with a base

    resolution.

    The same five difference images are then computed at each pyramid level $l$:

    $$\begin{aligned}
    \Delta^l &= \mathrm{abs}(I^l_t - I^l_{t+1}) \\
    U^l &= \mathrm{abs}(I^l_t - I^l_{t+1} \uparrow) \\
    L^l &= \mathrm{abs}(I^l_t - I^l_{t+1} \leftarrow) \\
    R^l &= \mathrm{abs}(I^l_t - I^l_{t+1} \rightarrow) \\
    D^l &= \mathrm{abs}(I^l_t - I^l_{t+1} \downarrow)
    \end{aligned}$$

  • Slide 31/41

    Training Filters

    The rectangle filters can have any size, aspect ratio, or position as long as they fit in the detection window; therefore, there are quite a number of possible motion and appearance filters, from which the learning algorithm selects features to build classifiers.

  • Slide 32/41

    The Classifier

    A classifier, $C$, is a thresholded sum of features:

    $$C(I_t, I_{t+1}, \Delta, U, L, R, D) = \begin{cases} 1 & \text{if } \sum_{i=1}^{N} F_i(I_t, I_{t+1}, \Delta, U, L, R, D) > \theta \\ 0 & \text{otherwise} \end{cases}$$

    A feature, $F$, is simply a thresholded filter that outputs one of two votes:

    $$F_i(I_t, I_{t+1}, \Delta, U, L, R, D) = \begin{cases} \alpha & \text{if } f_i(I_t, I_{t+1}, \Delta, U, L, R, D) > t_i \\ \beta & \text{otherwise} \end{cases}$$

    where $t_i$ is a feature threshold and $f_i$ is one of the motion or appearance filters. The real-valued $\alpha$ and $\beta$ are computed during AdaBoost learning.
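    A minimal sketch of this structure in Python, representing each feature as a (filter, threshold, alpha, beta) tuple (the names are illustrative, not from the paper):

```python
# Evaluate a thresholded sum of features. Each feature votes alpha when its
# filter response exceeds the feature threshold t_i, and beta otherwise; the
# classifier fires when the summed votes exceed the classifier threshold theta.
def feature_vote(feature, images):
    filter_fn, t_i, alpha, beta = feature
    return alpha if filter_fn(images) > t_i else beta

def classify(features, theta, images):
    total = sum(feature_vote(f, images) for f in features)
    return 1 if total > theta else 0
```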

  • Slide 33/41

    Training Process

    The training process uses AdaBoost to select the subset of features $F$ that minimizes the weighted error, and thereby constructs the classifier.

    In each round, the learning algorithm chooses from the full set of motion and appearance filters.

    It also picks the optimal threshold $t_i$ for each feature, as well as the votes $\alpha$ and $\beta$.

    The output of AdaBoost is a linear combination of the selected features.

  • Slide 34/41

    Training Process

    A cascade architecture is used to raise the efficiency of the system.

    The true and false positives passed by the current stage are used in the next stage of the cascade. The goal is to reduce the false positive rate much faster than the detection rate.
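    A minimal sketch of cascade evaluation at detection time (illustrative; each stage is a classifier of the thresholded-sum form above). A window is rejected as soon as any stage says no, which is what makes the cascade efficient:

```python
# Run one detection window through the cascade: every stage must accept.
# Most windows are rejected by the cheap early stages, so the expensive
# later stages run on only a small fraction of windows.
def cascade_accepts(stages, images):
    for classify_stage in stages:      # each stage maps images -> 0 or 1
        if classify_stage(images) == 0:
            return False               # early reject
    return True                        # survived all stages: a detection
```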

  • Slide 35/41

    Experiments

    Each classifier in the cascade is trained using the original positive examples and the same number of false positives from the previous stage (or negative examples, at the first stage).

    The resulting classifier of the previous stage is used as the input to the current stage, which builds a new classifier with a lower false positive rate.

    The detection threshold is set using a validation set of image pairs.

  • Slide 36/41

    Training samples

    A small sample of positive training examples. A pair of image patterns comprises a single example for training.

  • Slide 37/41

    Training the cascade

    A large number of motion and appearance filters are used for training the dynamic pedestrian detector.

    A smaller number of appearance filters are used for training the static pedestrian detector.

  • Slide 38/41

    Training results

    The first five filters learned for the dynamic pedestrian detector. The six images used in the motion and appearance representation are shown for each filter.

    The first five filters learned for the static pedestrian detector.

  • Slide 39/41

    Testing

    Detection results for the dynamic detector

  • Slide 40/41

    Testing

    Detection results for the static detector

  • Slide 41/41

    Thanks

