Viola Jones

Date post: 02-Apr-2018
Upload: sumeet-saurav

Transcript
  • 7/27/2019 Viola Jones

    1/29

    Robust Real-time Face Detection by

    Paul Viola and Michael Jones, 2002

    Presentation by Kostantina Palla & Alfredo Kalaitzis

    School of Informatics, University of Edinburgh

    February 20, 2009

  • Overview

    Robust: very high detection rate (true-positive rate) and very low false-positive rate, at all times.

    Real time: for practical applications, at least 2 frames per second must be processed.

    Face detection, not recognition: the goal is to distinguish faces from non-faces (face detection is the first step in the identification process).

  • Three goals & a conclusion

    1. Feature Computation: which features, and how can they be computed as quickly as possible?

    2. Feature Selection: select the most discriminating features

    3. Real-timeliness: focus computation on potentially positive areas (those that contain faces)

    4. Conclusion: presentation of results and discussion of detection issues.

    How did Viola & Jones deal with these challenges?

  • Three solutions

    1. Feature Computation: the Integral Image representation

    2. Feature Selection: the AdaBoost training algorithm

    3. Real-timeliness: a cascade of classifiers

  • Features

    Can a simple feature (i.e. a single value) indicate the existence of a face?

    All faces share some similar properties

    The eyes region is darker than the upper cheeks.

    The nose bridge region is brighter than the eyes.

    That is useful domain knowledge

    Domain knowledge needs to be encoded as:

    Location and size: eyes & nose bridge region

    Value: darker / brighter

    Overview| Integral Image | AdaBoost | Cascade

  • Rectangle features

    Value = (sum of pixels in black area) - (sum of pixels in white area)

    Three types: two-, three- and four-rectangle features (Viola & Jones use all three)

    For example: the difference in brightness between the white & black rectangles over a specific area

    Each feature is tied to a specific location in the sub-window

    Each feature may have any size

    Why features instead of raw pixels?

    Features encode domain knowledge

    Feature-based systems operate faster

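    To see how "any location, any size" adds up, the sketch below counts every placement and integer scaling of five basic shapes inside a 24x24 window. The shape list (horizontal and vertical two- and three-rectangle features, plus the 2x2 four-rectangle feature) is an assumption about the feature set; with it, the total lands close to the ~160,000 quoted on the next slide.

```python
def count_rectangle_features(window=24):
    """Count all positions and sizes of the basic rectangle-feature shapes.

    Each shape is a grid of (cols, rows) equal cells; a feature is that grid
    stretched by integer factors and placed anywhere inside the window.
    """
    # Assumed shapes: two-rectangle (2x1, 1x2), three-rectangle (3x1, 1x3),
    # four-rectangle (2x2)
    shapes = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]
    total = 0
    for cols, rows in shapes:
        for w in range(cols, window + 1, cols):      # widths: multiples of cols
            for h in range(rows, window + 1, rows):  # heights: multiples of rows
                # number of top-left corners where a w x h feature still fits
                total += (window - w + 1) * (window - h + 1)
    return total


print(count_rectangle_features(24))  # 162336
```

    Under this assumption the exact count is 162,336, which is why evaluating every feature on every sub-window is out of the question.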

  • Integral Image Representation (also check back-up slide #1)

    Given a detection resolution of 24x24 (smallest sub-window), the set of different rectangle features is ~160,000!

    Need for speed

    Introducing the Integral Image Representation

    Definition: the integral image at location (x,y) is the sum of the pixels above and to the left of (x,y), inclusive

    The integral image can be computed in a single pass over the image, and only once: every sub-window then reuses it!

    Formal definition:

    $ii(x,y) = \sum_{x' \le x,\ y' \le y} i(x',y')$

    Recursive definition, where $s(x,y)$ is the cumulative row sum, $s(x,-1) = 0$ and $ii(-1,y) = 0$:

    $s(x,y) = s(x,y-1) + i(x,y)$

    $ii(x,y) = ii(x-1,y) + s(x,y)$

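    The recursive definition translates directly into one pass over the pixels. A minimal sketch, assuming a row-major list of rows so that pixel $i(x,y)$ is `img[y][x]` (the function name is illustrative); on the 4x4 example of back-up slide #1 it reproduces the integral image shown there.

```python
def integral_image(img):
    """Integral image via the two recurrences, in a single pass.

    s(x, y)  = s(x, y-1) + i(x, y)   with s(x, -1) = 0
    ii(x, y) = ii(x-1, y) + s(x, y)  with ii(-1, y) = 0
    """
    h, w = len(img), len(img[0])
    s = [[0] * w for _ in range(h)]
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # s[y-1][x] was filled on the previous row, ii[y][x-1] just before
            s[y][x] = (s[y - 1][x] if y > 0 else 0) + img[y][x]
            ii[y][x] = (ii[y][x - 1] if x > 0 else 0) + s[y][x]
    return ii


image = [[0, 1, 1, 1], [1, 2, 2, 3], [1, 2, 1, 1], [1, 3, 1, 0]]
print(integral_image(image))
# [[0, 1, 2, 3], [1, 4, 7, 11], [2, 7, 11, 16], [3, 11, 16, 21]]
```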

  • back-up slide #1

    IMAGE

     0  1  1  1
     1  2  2  3
     1  2  1  1
     1  3  1  0

    INTEGRAL IMAGE

     0  1  2  3
     1  4  7 11
     2  7 11 16
     3 11 16 21
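    With the integral image in hand, the sum over any rectangle costs four lookups, and a two-rectangle feature is just the difference of two such sums, whatever its size. A minimal sketch, assuming inclusive pixel coordinates and letting the left half play the "black" role (both choices, and all names, are hypothetical):

```python
from itertools import accumulate


def integral_image(img):
    # ii[y][x] = sum of img[y'][x'] for all y' <= y, x' <= x
    ii, above = [], [0] * len(img[0])
    for row in img:
        above = [a + r for a, r in zip(above, accumulate(row))]
        ii.append(above)
    return ii


def rect_sum(ii, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x <= x1 and y0 <= y <= y1, using four lookups."""
    def at(x, y):
        return ii[y][x] if x >= 0 and y >= 0 else 0  # outside the image -> 0
    return at(x1, y1) - at(x0 - 1, y1) - at(x1, y0 - 1) + at(x0 - 1, y0 - 1)


def two_rect_feature(ii, x, y, w, h):
    """Hypothetical horizontal two-rectangle feature: left w x h block minus right one."""
    left = rect_sum(ii, x, y, x + w - 1, y + h - 1)
    right = rect_sum(ii, x + w, y, x + 2 * w - 1, y + h - 1)
    return left - right
```

    On the example image above, `rect_sum(ii, 1, 1, 2, 2)` gives 2 + 2 + 2 + 1 = 7, and `two_rect_feature(ii, 0, 0, 2, 4)` (left two columns minus right two columns) gives 11 - 10 = 1.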


  • Three goals

    1. Feature Computation: features must be computed as quickly as possible

    2. Feature Selection: select the most discriminating features

    3. Real-timeliness: must focus on potentially positive image areas (that contain faces)

    How did Viola & Jones deal with these challenges?


  • Feature selection

    Problem: too many features

    In a sub-window (24x24) there are ~160,000 features (all possible combinations of orientation, location and scale of these feature types)

    It is impractical to compute all of them (computationally expensive)

    We have to select a subset of relevant, informative features to model a face

    Hypothesis: a very small subset of features can be combined to form an effective classifier

    How? AdaBoost algorithm

    [Figure: a relevant feature vs. an irrelevant feature]

  • AdaBoost

    Stands for Adaptive Boosting

    Constructs a strong classifier as a linear combination of weighted simple weak classifiers

    [Figure: strong classifier as a weighted sum of weak classifiers applied to an image (labels: strong classifier, weak classifier, weight, image)]

  • AdaBoost - Characteristics

    Features as weak classifiers

    Each single rectangle feature may be regarded as a simple weak classifier

    An iterative algorithm

    AdaBoost performs a series of trials, each time selecting a new weak classifier

    Weights are applied over the set of example images

    During each iteration, each example/image receives a weight determining its importance


  • AdaBoost - Getting the idea

    Given: example images labeled +/-

    Initially, all weights set equally

    Repeat T times

    Step 1: choose the most efficient weak classifier, which becomes a component of the final strong classifier (Problem! Remember the huge number of features)

    Step 2: Update the weights to emphasize the examples which wereincorrectly classified

    This makes the next weak classifier focus on harder examples

    Final (strong) classifier is a weighted combination of the T weak classifiers

    Weighted according to their accuracy

    $h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$

    (pseudo-code at back-up slide #2)
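    The loop can be sketched in a few dozen lines. This is a minimal discrete-AdaBoost sketch, not the authors' code: it assumes a precomputed table `X[i][j]` (value of feature j on example i), labels 1 = face / 0 = non-face, and weak classifiers of the assumed form h(x) = 1 if p·f_j(x) < p·θ else 0, with parity p and threshold θ. Correctly classified examples are multiplied by β_t = ε_t/(1 - ε_t), each weak classifier votes with α_t = log(1/β_t), and the final vote is 1 when Σ α_t h_t(x) ≥ ½ Σ α_t.

```python
import math


def stump(x, j, theta, p):
    """Weak classifier built from feature j: 1 (face) if p * f_j(x) < p * theta."""
    return 1 if p * x[j] < p * theta else 0


def train_adaboost(X, y, T):
    """Return T weak classifiers as (alpha, j, theta, p) tuples."""
    n = len(y)
    w = [1.0 / n] * n                        # initially, all weights set equally
    strong = []
    for _ in range(T):
        total = sum(w)
        w = [wi / total for wi in w]         # normalise to a distribution
        # Step 1: brute-force search for the single-feature stump with lowest error
        best = None
        for j in range(len(X[0])):
            for theta in sorted({x[j] for x in X}):
                for p in (1, -1):
                    err = sum(wi for wi, x, yi in zip(w, X, y)
                              if stump(x, j, theta, p) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, theta, p)
        err, j, theta, p = best
        err = min(max(err, 1e-10), 0.5)      # clamp: avoids log(1/0) for a perfect stump
        beta = err / (1.0 - err)
        # Step 2: shrink weights of correctly classified examples by beta, so the
        # misclassified ones gain relative weight after the next normalisation
        w = [wi * (beta if stump(x, j, theta, p) == yi else 1.0)
             for wi, x, yi in zip(w, X, y)]
        strong.append((math.log(1.0 / beta), j, theta, p))
    return strong


def predict(strong, x):
    """Strong classifier: 1 if sum(alpha_t * h_t(x)) >= 0.5 * sum(alpha_t)."""
    score = sum(a * stump(x, j, theta, p) for a, j, theta, p in strong)
    return 1 if score >= 0.5 * sum(a for a, _, _, _ in strong) else 0
```

    The triple loop in Step 1 is exactly the "huge number of features" problem: each round costs roughly features × thresholds × examples evaluations.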

  • AdaBoost - Feature Selection Problem

    On each round there is a large set of possible weak classifiers (each simple classifier consists of a single feature). Which one to choose?

    Choose the most efficient one (the one that best separates the examples, i.e. has the lowest error)

    The choice of a classifier corresponds to the choice of a feature

    At the end, the strong classifier consists of T features

    Conclusion

    AdaBoost searches for a small number of good classifiers, i.e. features (feature selection)

    It adaptively constructs a final strong classifier, taking into account the failures of each of the chosen weak classifiers (re-weighting of the examples)

    AdaBoost is used both to select a small set of features and to train a strong classifier

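    Finding the lowest-error weak classifier for one feature does not require re-scanning all examples for every candidate threshold: after sorting the examples by feature value, the weighted error of every threshold follows from running totals. A minimal sketch, assuming midpoint thresholds and the weak-classifier form h(x) = 1 if p·f(x) < p·θ (parity p and threshold θ are assumptions, not stated on these slides):

```python
def best_stump(values, labels, weights):
    """Lowest-error threshold and parity for ONE feature, in one sorted pass.

    values[i]: feature value on example i; labels[i]: 1 = face, 0 = non-face;
    weights[i]: current AdaBoost weight. Returns (error, theta, parity) for
    h(x) = 1 if parity * f(x) < parity * theta else 0. Thresholds are tried
    below, between (midpoints of distinct values) and above the sorted values.
    Assumes at least one example.
    """
    data = sorted(zip(values, labels, weights))
    t_pos = sum(w for _, l, w in data if l == 1)   # total face weight
    t_neg = sum(w for _, l, w in data if l == 0)   # total non-face weight
    s_pos = s_neg = 0.0                            # face / non-face weight below theta
    best = (float("inf"), None, None)
    for k in range(len(data) + 1):
        lo = data[k - 1][0] if k > 0 else None
        hi = data[k][0] if k < len(data) else None
        if lo is None or hi is None or lo != hi:   # never split a run of equal values
            if lo is None:
                theta = hi - 1
            elif hi is None:
                theta = lo + 1
            else:
                theta = (lo + hi) / 2
            # parity +1: faces below theta -> errors = non-faces below + faces above
            # parity -1: faces above theta -> errors = faces below + non-faces above
            for err, parity in ((s_neg + (t_pos - s_pos), 1),
                                (s_pos + (t_neg - s_neg), -1)):
                if err < best[0]:
                    best = (err, theta, parity)
        if k < len(data):                          # example k now lies below the next theta
            _, label, w = data[k]
            if label == 1:
                s_pos += w
            else:
                s_neg += w
    return best
```

    Per feature this costs one sort plus one linear scan, which is what makes searching ~160,000 features per round tolerable.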


  • Now we have a good face detector

    We can build a 200-feature classifier!

    Experiments showed that a 200-feature classifier achieves:

    95% detection rate, FP rate of 1 in 14,084 (about 7.1x10^-5)

    Scans all sub-windows of a 384x288 pixel image in 0.7 seconds (on an Intel PIII 700 MHz)

    The more features the better (?)

    Gain in classifier performance

    Loss in CPU time

    Verdict: good & fast, but not enough. Competitors achieve close to a 1 in 1,000,000 FP rate!

    0.7 sec / frame IS NOT real-time.

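    Checked against the goals from the overview slide (at least 2 frames per second) and the competitors' 1-in-a-million FP rate, both shortcomings are quantitative:

```latex
\frac{1}{0.7\ \mathrm{s/frame}} \approx 1.43\ \mathrm{frames/s} < 2\ \mathrm{frames/s},
\qquad
\frac{1}{14084} \approx 7.1 \times 10^{-5} \approx 71 \times 10^{-6}
```

    That is, the 200-feature classifier misses the frame-rate requirement, and its false-positive rate is roughly 70 times higher than 1 in a million.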

  • Three goals

    1. Feature Computation: features must be computed as quickly as possible

    2. Feature Selection: select the most discriminating features

    3. Real-timeliness: must focus on potentially positive image areas (that contain faces)

    How did Viola & Jones deal with these challenges?


  • The attentional cascade

    On average only 0.01% of all sub-windows are positive (are faces)

    Status quo: equal computation time is spent on all sub-windows

    Most of the time must be spent only on potentially positive sub-windows.

    A simple 2-feature classifier can achieve an almost 100% detection rate with a 50% FP rate.

    That classifier can act as the 1st layer of a series, filtering out most negative windows

    A 2nd layer with 10 features can tackle the harder negative windows that survived the 1st layer, and so on

    A cascade of gradually more complex classifiers achieves even better detection rates.

    On average, far fewer features are computed per sub-window (i.e. roughly a 10x speed-up)
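    The saving comes from early rejection: a sub-window reaches layer i only if it passed layers 1..i-1. A minimal sketch of running a cascade and of the expected number of features evaluated per sub-window; the stage callables and function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def run_cascade(stages: Sequence[Callable[[object], bool]], window) -> Tuple[bool, int]:
    """Run the stages in order and reject at the first one that says 'no face'.

    Returns (is_face, number_of_stages_evaluated).
    """
    for k, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, k          # early rejection: later stages never run
    return True, len(stages)


def expected_features(features_per_stage: Sequence[int],
                      pass_rates: Sequence[float]) -> float:
    """Expected number of features evaluated per sub-window.

    Stage i only runs on windows that passed stages 1..i-1, so its feature
    count is weighted by the product of the earlier pass rates.
    """
    expected, reach = 0.0, 1.0
    for n, p in zip(features_per_stage, pass_rates):
        expected += reach * n
        reach *= p
    return expected
```

    For a negative sub-window, the pass rate of each layer is its FP rate. With a 2-feature first layer passing 50% of negatives followed by a 10-feature layer, `expected_features([2, 10], [0.5, 0.2])` is 2 + 0.5 × 10 = 7 features on average instead of 12.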

  • Training a cascade of classifiers

    Strong classifier definition:

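    $h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$

    where $\alpha_t = \log \frac{1}{\beta_t}$ and $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$ ($\epsilon_t$ is the weighted error of weak classifier $h_t$)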

    Keep in mind: competitors achieved a 95% TP rate and a 10^-6 FP rate

    These are the goals. Final cascade must do better!

    Given the goals, to design a cascade we must choose:

    Number of layers in cascade (strong classifiers)

    Number of features of each strong classifier (the T in definition)

    Threshold of each strong classifier (the $\frac{1}{2}\sum_{t=1}^{T} \alpha_t$ in the definition)

    Optimization problem: can we find the optimum combination? A TREMENDOUSLY DIFFICULT PROBLEM
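    The choices interact through one simple relation: a sub-window is accepted only if it passes every layer, so the overall detection rate D and false-positive rate F of a K-layer cascade are the products of the per-layer rates d_i and f_i (each measured on the windows that reach layer i). With illustrative per-layer values (assumed for the arithmetic, not taken from the slides), ten individually weak layers already meet the 95% / 10^-6 goals:

```latex
D = \prod_{i=1}^{K} d_i, \qquad F = \prod_{i=1}^{K} f_i
% e.g. K = 10 layers with d_i = 0.995 and f_i = 0.25:
D = 0.995^{10} \approx 0.951, \qquad F = 0.25^{10} \approx 9.5 \times 10^{-7}
```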

  • A simple framework for cascade training

    Do not despair. Viola & Jones suggested a heuristic algorithm for the cascade training (pseudo-code at backup slide #3): it does not guarantee optimality, but produces an effective cascade that meets the previous goals

    Manual tweaking: the overall training outcome is highly dependent on the user's choices

    select $f_i$ (maximum acceptable false-positive rate per layer)

    select $d_i$ (minimum acceptable true-positive rate per layer)

    select $F_{target}$ (target overall FP rate)

    possibly a repeated trial & error process for a given training set

    Until $F_{target}$ is met, add a new layer:

    Until the $f_i$, $d_i$ rates are met for this layer, increase the number of features and train a new strong classifier with AdaBoost

    Determine the rates of the layer on a validation set

  • backup slide #3

    User selects values for f, the maximum acceptable false positive rate per layer, and d, the minimum acceptable detection rate per layer.

    User selects the target overall false positive rate F_target.

    P = set of positive examples
    N = set of negative examples
    F_0 = 1.0; D_0 = 1.0; i = 0
    while F_i > F_target
        i++
        n_i = 0; F_i = F_{i-1}
        while F_i > f x F_{i-1}
            n_i++
            Use P and N to train a classifier with n_i features using AdaBoost
            Evaluate the current cascaded classifier on the validation set to determine F_i and D_i
            Decrease the threshold for the i-th classifier until the current cascaded classifier has a detection rate of at least d x D_{i-1} (this also affects F_i)
        N = ∅
        if F_i > F_target then evaluate the current cascaded detector on the set of non-face images and put any false detections into the set N
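    The backup-slide loop can be written as a Python skeleton. This is a sketch under stated assumptions: the four callables `train_layer`, `lower_threshold`, `rates` and `false_detections` are hypothetical hooks standing in for AdaBoost training, threshold tuning, validation-set evaluation and hard-negative collection; none of them is a real library API.

```python
def train_cascade(P, N, f, d, F_target, train_layer, lower_threshold, rates,
                  false_detections, max_features=10_000):
    """Skeleton of the cascade-training loop (all callables are hypothetical hooks).

    train_layer(P, N, n)             -> new layer (strong classifier) with n features
    lower_threshold(cascade, D_min)  -> lower the last layer's threshold until the
                                        cascade's detection rate is >= D_min
    rates(cascade)                   -> (F, D) of the cascade on a validation set
    false_detections(cascade)        -> non-face windows the cascade wrongly accepts
    """
    cascade = []
    F_prev, D_prev = 1.0, 1.0                    # F_0 = D_0 = 1.0
    while F_prev > F_target:                     # add layers until the overall goal is met
        n, F_i, D_i, candidate = 0, F_prev, D_prev, cascade
        while F_i > f * F_prev:                  # grow this layer until its FP goal is met
            n += 1
            if n > max_features:
                raise RuntimeError("per-layer goal f not reachable")
            candidate = cascade + [train_layer(P, N, n)]
            lower_threshold(candidate, d * D_prev)
            F_i, D_i = rates(candidate)
        cascade, F_prev, D_prev = candidate, F_i, D_i
        if F_prev > F_target:
            N = false_detections(cascade)        # next layer trains on current mistakes
    return cascade
```

    As a dry run with fake hooks where a layer is just its feature count n and passes a fraction 1/(1+n) of negatives, choosing f = 0.3 and F_target = 0.01 yields four 3-feature layers.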


  • [Figure: classifier cascade framework. Training phase: training set (sub-windows), integral representation, feature computation, AdaBoost feature selection, cascade trainer. Testing phase: strong classifier 1 (cascade stage 1), strong classifier 2 (cascade stage 2), ..., strong classifier N (cascade stage N), FACE IDENTIFIED]

  • Pros

    Extremely fast feature computation

    Efficient feature selection

    Scale and location invariant detector: instead of scaling the image itself (e.g. pyramid filters), we scale the features.

    Such a generic detection scheme can be trained to detect other types of objects (e.g. cars, hands)

    ... and cons

    The detector is most effective only on frontal images of faces

    It can hardly cope with 45° face rotation

    Sensitive to lighting conditions

    We might get multiple detections of the same face, due to overlapping sub-windows.
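    One simple way to handle the last point is to group detections whose boxes overlap and report a single averaged box per group. A minimal sketch, assuming detections are square boxes `(x, y, size)` and that "overlap" means any intersection; the function name and the grouping rule are illustrative assumptions, not taken from the slides.

```python
def merge_detections(boxes):
    """Merge overlapping square detections (x, y, size) into one box per group.

    Boxes are grouped transitively (connected components of the overlap graph)
    and each group is replaced by the average of its members' coordinates.
    """
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving keeps trees shallow
            i = parent[i]
        return i

    def overlap(a, b):
        ax, ay, asz = a
        bx, by, bsz = b
        return ax < bx + bsz and bx < ax + asz and ay < by + bsz and by < ay + asz

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlap(boxes[i], boxes[j]):
                parent[find(i)] = find(j)    # union the two groups

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    return [tuple(sum(m[c] for m in members) / len(members) for c in range(3))
            for members in groups.values()]
```

    For example, two overlapping 24-pixel detections at (10, 10) and (12, 11) collapse into one box (11.0, 10.5, 24.0), while a distant detection is kept as is.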

  • Results (detailed results at back-up slide #4)

  • Results (Cont.)

  • backup slide #4

    Viola & Jones prepared their final detector cascade: 38 layers, 6060 features in total

    1st classifier layer: 2 features, 50% FP rate, 99.9% TP rate

    2nd classifier layer: 10 features, 20% FP rate, 99.9% TP rate

    next 2 layers: 25 features each; next 3 layers: 50 features each; and so on

    Tested on the MIT+CMU test set

    A 384x288 pixel image took about 0.067 seconds on a PC (dated 2001)

    Detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces (Viola & Jones 2002):

    Detector \ False detections   10     31     50     65     78     95     167    422
    Viola-Jones                   76.1%  88.4%  91.4%  92.0%  92.1%  92.9%  93.9%  94.1%
    Rowley-Baluja-Kanade          83.2%  86.0%  -      -      89.2%  89.2%  90.1%  89.9%
    Schneiderman-Kanade           -      -      -      94.4%  -      -      -      -
    Roth-Yang-Ahuja               -      -      -      -      -      -      -      -

  • Thank you for listening!

