CAP5415 Computer Vision - Lecture 20: Face Recognition, Haar Features, Local Binary Patterns, and Boosting
Reminders
• Project deadlines / return
– 6 December, from 1 pm to 4:30 pm
– 8 December, from 1 pm to 4:30 pm
– Location: HEC 221
– 5-minute PowerPoint presentation + demo
– Return: your .ppt/.pdf presentation and code.
Face Detection and Recognition
(Figure: detection locates the face; recognition identifies it as "Sally".)
• Why wasn't the Massachusetts bomber identified by the Massachusetts Department of Motor Vehicles system from the video surveillance images?
• He was enrolled in the MA DMV database!
DMV Face Recognition System?
(Slide credit: Animetrics, Dr. Marc Valliant, VP & CTO)

Controlled Facial Photo
• Today's FR technology will reliably find a controlled facial photo in a mug-shot database of controlled photos.
• However, there are confounding variables in uncontrolled facial photos:
– Resolution (not enough pixels)
– Facial pose (angulated)
– Illumination
– Occluded facial areas
Further Difficulties
Three Goals
Feature Computation
• Features must be computed as quickly as possible.
Feature Selection
• Select the most discriminating features.
Real-timeliness
• Must focus on potentially positive image areas (those that contain faces).
Face Detection
• Before face recognition can be applied to a general image, the locations and sizes of any faces must first be found.
• Rowley, Baluja, and Kanade (1998)
Face Detection/Recognition Using Mobile Devices
• Face detection (the camera automatically adjusts the focus based on detected faces)
• Auto-login with recognized faces
Face Detection
• Feature-based: eyes, mouth, …
• Template-based: AAM, …
• Appearance-based: patches, …
Some of the representative works:
Rectangle (Haar-like) Features
"Rectangle filters"
Value = ∑ (pixels in white area) − ∑ (pixels in black area)
Fast Computation with Integral Images
• This can quickly be computed in one pass through the image.

Formal definition:
  ii(x, y) = ∑_{x′ ≤ x, y′ ≤ y} i(x′, y′)

Recursive definition:
  s(x, y) = s(x, y − 1) + i(x, y)
  ii(x, y) = ii(x − 1, y) + s(x, y)

IMAGE:
  0 1 1 1
  1 2 2 3
  1 2 1 1
  1 3 1 0

INTEGRAL IMAGE:
  0  1  2  3
  1  4  7 11
  2  7 11 16
  3 11 16 21
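To make the recipe concrete, here is a minimal NumPy sketch (function names are ours, not from the slides) that builds the integral image with two cumulative sums, reproduces the 4×4 table above, and evaluates a two-rectangle Haar-like feature with four lookups per rectangle:

```python
import numpy as np

def integral_image(img):
    """ii(x, y) = sum of i(x', y') over x' <= x, y' <= y (one pass: two cumsums)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum over img[r0:r1+1, c0:c1+1] from four integral-image lookups."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

img = np.array([[0, 1, 1, 1],
                [1, 2, 2, 3],
                [1, 2, 1, 1],
                [1, 3, 1, 0]])
ii = integral_image(img)          # reproduces the INTEGRAL IMAGE table above

# Two-rectangle Haar-like feature: sum(white area) - sum(black area).
white = rect_sum(ii, 0, 0, 3, 1)  # left 4x2 half
black = rect_sum(ii, 0, 2, 3, 3)  # right 4x2 half
value = white - black
```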
Feature Selection
• For a 24×24 detection region, the number of possible rectangle features is ~160,000!
• One possible remedy: PCA.
Local Binary Patterns (LBP): Alternative Features
• Gray-scale-invariant texture measure
• Derived from the local neighborhood
• Powerful texture descriptor
• Computationally simple
• Robust against monotonic gray-scale changes
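A minimal sketch of the basic 3×3 LBP operator (our own implementation, not from the slides): each pixel's 8 neighbors are thresholded at the center value and the comparison bits are packed into an 8-bit code. Because only comparisons are used, the codes are unchanged by any monotonic gray-scale transformation, which is exactly the robustness claimed above.

```python
import numpy as np

def lbp_8neighbors(img):
    """Basic 3x3 LBP: threshold the 8 neighbors at the center pixel's value
    and pack the comparison bits into an 8-bit code (0..255 per pixel)."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # Fixed neighbor order (clockwise from top-left); each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dr, dc) in enumerate(offsets):
        neighbor = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes |= (neighbor >= center).astype(np.uint8) << np.uint8(bit)
    return codes

# Texture descriptor for a region: histogram of its LBP codes.
region = np.random.randint(0, 256, size=(24, 24))
hist, _ = np.histogram(lbp_8neighbors(region), bins=256, range=(0, 256))
```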
(Figure: LBP extended to dynamic/video textures.)
Simple FR for Mobile Devices
(using LBP: local binary patterns)
Principal Component Analysis (PCA)
• Mapping from the inputs in the original d-dimensional space to a new (k < d)-dimensional space, with minimum loss of information.
• PCA is an unsupervised method; it does not use output information.
• PCA centers the sample and then rotates the axes to line up with the directions of highest variance.
Principal Component Analysis (PCA)
• The projection of x on the direction of w is z = w^T x.

(Figure: scatter of data points, showing the original axes and the first and second principal components.)

• The principal component is w_1, chosen such that the sample, after projection on w_1, is most spread out, so that the differences between the sample points become most apparent.
• To have a unique solution, require ||w_1|| = 1.
• With z_1 = w_1^T x and Cov(x) = Σ, we get Var(z_1) = w_1^T Σ w_1.
• Seek w_1 such that Var(z_1) is maximized!
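The variance formula follows in one line (a standard derivation, included here for completeness; μ denotes the mean of x):

```latex
\operatorname{Var}(z_1)
  = \mathbb{E}\!\left[\left(\mathbf{w}_1^{\top}\mathbf{x}-\mathbf{w}_1^{\top}\boldsymbol{\mu}\right)^{2}\right]
  = \mathbf{w}_1^{\top}\,\mathbb{E}\!\left[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^{\top}\right]\mathbf{w}_1
  = \mathbf{w}_1^{\top}\Sigma\,\mathbf{w}_1 .
```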
Solution of PCA
• Write it as a Lagrange problem and take derivatives w.r.t. w; then, with
  z = W^T (x − m), where m is the sample mean:
  Cov(z) = W^T S W (= D, diagonal)
  X^T X = W D W^T (spectral decomposition of S)
Solution of PCA
• Say we want to reduce dimensionality to k < d; we take the first k columns of W (those with the highest eigenvalues):
  z_i^t = w_i^T x^t,  i = 1, …, k,  t = 1, …, N
• The w_i are eigenvectors: (X^T X) w_i = λ_i w_i
• Equivalently, via the SVD X = U S V^T:
  U = evec(X X^T), V = evec(X^T X), S² = eval(X X^T)
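A small numerical check of these relations (a sketch with made-up data; variable names are ours): the eigendecomposition of X^T X and the SVD of X yield the same principal directions, and the squared singular values equal the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # N=100 samples (rows), d=5 dimensions
X -= X.mean(axis=0)                      # center the data

# Eigendecomposition route: (X^T X) w_i = lambda_i w_i
evals, W = np.linalg.eigh(X.T @ X)
order = np.argsort(evals)[::-1]          # sort by decreasing eigenvalue
evals, W = evals[order], W[:, order]

# SVD route: X = U S V^T, with V = evec(X^T X) and S^2 the eigenvalues
U, S, Vt = np.linalg.svd(X, full_matrices=False)
assert np.allclose(S ** 2, evals)        # squared singular values = eigenvalues

# Project onto the first k principal directions: z_i^t = w_i^T x^t
k = 2
Z = X @ W[:, :k]                         # (N x k) matrix of projections
```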
Scree Plot: Ability of PCs to Explain Variation in Data
• Keep enough PCs (principal components) that the cumulative variance explained by the PCs is >50-70%.
• Kaiser criterion: keep PCs with eigenvalues > 1.

(Figure: scree plot of eigenvalues λ_1 … λ_N.)
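Both stopping rules are a few lines of NumPy (a sketch; the function name is ours):

```python
import numpy as np

def choose_k(eigenvalues, target=0.7):
    """Two common stopping rules for the number of PCs to keep."""
    ev = np.sort(np.asarray(eigenvalues))[::-1]
    cumulative = np.cumsum(ev) / ev.sum()          # cumulative variance explained
    k_variance = int(np.searchsorted(cumulative, target) + 1)
    # Kaiser criterion: eigenvalues > 1 (meaningful when PCA is run on the
    # correlation matrix, where each original variable has variance 1).
    k_kaiser = int((ev > 1).sum())
    return k_variance, k_kaiser
```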
Recap: PCA Calculations in Cartoon
Steps in PCA, #1: Calculate the adjusted data set
• Data set D: n dimensions × data samples; M holds the mean values.
• Adjusted data set: A = D − M, where M_i is calculated by taking the mean of the values in dimension i.
Steps in PCA, #2: Calculate the covariance matrix C from the adjusted data set A
• C is n × n, with C_ij = cov(i, j).
• Note: since the means of the dimensions in the adjusted data set A are 0, the covariance matrix can simply be written as C = AA^T / (n − 1).
Steps in PCA, #3: Calculate the eigenvectors and eigenvalues of C (matrix E holds the eigenvectors)
• If some eigenvalues are 0 or very small, we can essentially discard those eigenvalues and the corresponding eigenvectors, hence reducing the dimensionality of the new basis.
Steps in PCA, #4: Transform the data set to the new basis
  F = E^T A
where:
• F is the transformed data set,
• E^T is the transpose of the matrix E containing the eigenvectors,
• A is the adjusted data set.
Note that the dimensionality of the new data set F is less than that of A.

To recover A from F:
  (E^T)^{-1} F = (E^T)^{-1} E^T A
  (E^T)^T F = A
  E F = A
(E is orthogonal, therefore E^{-1} = E^T.)
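The four cartoon steps translate directly into NumPy (a sketch on random stand-in data, with dimensions as rows so that C = AA^T / (n − 1) as written above; all names are ours):

```python
import numpy as np

D = np.random.rand(5, 200)                 # data set: 5 dims x 200 samples
M = D.mean(axis=1, keepdims=True)          # M_i: mean of the values in dimension i
A = D - M                                  # step 1: adjusted data set
C = A @ A.T / (A.shape[1] - 1)             # step 2: covariance matrix (5 x 5)
evals, E = np.linalg.eigh(C)               # step 3: eigenvalues/eigenvectors of C
E = E[:, np.argsort(evals)[::-1]][:, :2]   # discard small-eigenvalue directions
F = E.T @ A                                # step 4: transform to the new basis
A_approx = E @ F                           # E F ~= A (exact only if no columns dropped)
```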
Holistic FR: Eigenfaces
• Eigenfaces, Fisherfaces, tensorfaces, …
Gabor Feature-Based FR
• Earlier FR methods are mostly feature-based.
• The most successful feature-based FR is the elastic bunch graph matching system, with Gabor filter coefficients as features.

Gabor Features
(Figure: Gabor filter bank with 5 scales and 8 orientations.)
PCA on Faces: "Eigenfaces"
(Figure: the average face, the first principal component, and other components; for all except the average, "gray" = 0, "white" > 0, "black" < 0.)
Eigenfaces Example
(Figures: training faces; the mean μ; and the top eigenvectors u_1, …, u_k.)
Application to Faces
• Representing faces in this basis
• Face reconstruction (see the sketch below)
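A sketch of what representation and reconstruction amount to, using stand-in data (the orthonormal basis U, mean face mu, and query face x below are placeholders, not a trained model):

```python
import numpy as np

# Stand-in eigenface basis: in practice U's columns are the top-k eigenfaces
# u_1..u_k and mu is the mean face, both learned from training images.
d, k = 64 * 64, 20
U = np.linalg.qr(np.random.randn(d, k))[0]   # placeholder orthonormal basis (d x k)
mu = np.random.rand(d)                       # placeholder mean face
x = np.random.rand(d)                        # placeholder vectorized query face

omega = U.T @ (x - mu)     # representation: k coefficients, one per eigenface
x_hat = mu + U @ omega     # face reconstruction from those coefficients
```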
Simplest Approach to FR
– The simplest approach is to treat recognition as a template-matching problem.
– Problems arise when performing recognition in a high-dimensional space.
– Significant improvements can be achieved by first mapping the data into a lower-dimensional space.
FR Using Eigenfaces
− The distance e_r is called the distance within face space (difs).
− The Euclidean distance can be used to compute e_r:
  ||Ω − Ω_k||² = ∑_{i=1}^{K} (w_i − w_i^k)²
− However, the Mahalanobis distance has been shown to work better.
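In code, the two variants of e_r differ by one weighting term (a sketch; weighting each squared difference by 1/λ_i is the usual simplification of the Mahalanobis distance for PCA coefficients, since their covariance is diagonal):

```python
import numpy as np

def difs(omega, omega_k, eigenvalues=None):
    """Squared distance within face space between coefficient vectors.
    Euclidean by default; passing the PCA eigenvalues lambda_i weights each
    squared difference by 1/lambda_i (diagonal-covariance Mahalanobis)."""
    diff = omega - omega_k
    if eigenvalues is None:
        return float(np.sum(diff ** 2))            # Euclidean
    return float(np.sum(diff ** 2 / eigenvalues))  # Mahalanobis
```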
Face Detection
• (iPhoto)
• Nikon S60 finds 12 faces…
The Viola/Jones Face Detector
• A seminal approach to real-time object detection
• Training is slow, but detection is very fast
• Key ideas:
– Integral images for fast feature evaluation
– Boosting for feature selection
– Attentional cascade for fast rejection of non-face windows

P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.
The Viola/Jones Face Detector: Training
• Initially, weight each training example equally.
• In each boosting round:
– Find the weak learner that achieves the lowest weighted training error.
– Raise the weights of the training examples misclassified by the current weak learner.
• Compute the final classifier as a linear combination of all weak learners (the weight of each learner is directly proportional to its accuracy).
• The exact formulas for re-weighting and combining weak learners depend on the particular boosting scheme (e.g., AdaBoost).
The Viola/Jones Face Detector: Testing
• First two features selected by boosting: this feature combination can yield a 100% detection rate and a 50% false positive rate.
• A 200-feature classifier can yield a 95% detection rate and a false positive rate of 1 in 14,084.
• Not good enough!
Attentional Cascade

IMAGE SUB-WINDOW → Classifier 1 → (T) → Classifier 2 → (T) → Classifier 3 → (T) → … FACE
(an F output from any classifier → NON-FACE)

• We start with simple classifiers which reject many of the negative sub-windows while detecting almost all positive sub-windows.
• A positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on.
• A negative outcome at any point leads to the immediate rejection of the sub-window.
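The control flow of the cascade fits in a few lines (a sketch; `stages` and the classifier interface are our own abstraction, not the Viola/Jones code):

```python
def cascade_predict(window, stages):
    """Attentional-cascade control flow: `stages` holds (classifier, threshold)
    pairs ordered from cheapest/simplest to most complex."""
    for classifier, threshold in stages:
        if classifier(window) < threshold:
            return False      # negative outcome: reject the sub-window immediately
    return True               # positive response from every stage: report a face
```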
(Figure: receiver operating characteristic, % detection (0-100) vs. % false positives (0-50); the trade-off between false positives and false negatives is determined by the classifier thresholds.)
Cascaded Classifiers (Boosting)
(Figure: the input feeds a set of base learners whose combined responses form the output.)
Boosting for FR (illustrated round by round)
• Weak classifier 1 is fit to the equally weighted data.
• Weights are increased for the examples it misclassifies.
• Weak classifier 2 is fit to the re-weighted data.
• Weights are increased again for the remaining mistakes.
• Weak classifier 3 is fit.
• The final classifier is a combination of the weak classifiers.
AdaBoost Algorithm
• Given (x_1, y_1), …, (x_m, y_m), where x_i ∈ X and y_i ∈ Y = {−1, +1}
• Initialize D_1(i) = 1/m, i = 1, …, m
• For t = 1, …, T:
– Find the classifier h_t : X → {−1, +1} that minimizes the error with respect to the distribution D_t:
    h_t = argmin_{h ∈ H} ε_t,  where ε_t = ∑_i D_t(i) [y_i ≠ h_t(x_i)]
  (ε_t is the weighted error rate of classifier h_t)
– If ε_t ≥ 0.5, then stop
– Choose α_t ∈ R, typically α_t = ½ ln((1 − ε_t) / ε_t)
– Update
    D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t
  where Z_t is a normalization factor (chosen so that D_{t+1} sums to 1)
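The algorithm above, transcribed into NumPy (a sketch; the weak-learner pool is abstracted as a list of callables, and the pool is searched exhaustively each round, as Viola/Jones do over rectangle features):

```python
import numpy as np

def adaboost(X, y, weak_learners, T):
    """AdaBoost following the slide: X is (m, d) data, y an array in {-1, +1},
    weak_learners a pool of callables h(X) -> array of {-1, +1} predictions.
    Returns (alpha_t, h_t) pairs defining H(x) = sign(sum_t alpha_t h_t(x))."""
    m = len(y)
    D = np.full(m, 1.0 / m)                          # D_1(i) = 1/m
    ensemble = []
    for _ in range(T):
        # h_t = argmin_h eps, with eps = sum_i D(i) [y_i != h(x_i)]
        errors = [np.sum(D * (h(X) != y)) for h in weak_learners]
        best = int(np.argmin(errors))
        h_t, eps_t = weak_learners[best], errors[best]
        if eps_t >= 0.5:                             # no better than chance: stop
            break
        alpha_t = 0.5 * np.log((1 - eps_t) / eps_t)  # alpha_t = 1/2 ln((1-eps)/eps)
        D *= np.exp(-alpha_t * y * h_t(X))           # up-weight misclassified examples
        D /= D.sum()                                 # Z_t: renormalize so D sums to 1
        ensemble.append((alpha_t, h_t))
    return ensemble

def predict(ensemble, X):
    return np.sign(sum(a * h(X) for a, h in ensemble))
```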
Boosting for FR
• Define weak learners based on rectangle features.
• The final strong classifier is
    H(x) = sign( ∑_{t=1}^{T} α_t h_t(x) )
Boosting & SVM
• Advantages of boosting:
– Integrates classification with feature selection
– Complexity of training is linear, instead of quadratic, in the number of training examples
– Flexibility in the choice of weak learners and boosting scheme
– Testing is fast
– Easy to implement
• Disadvantages:
– Needs many training examples
– Often does not work as well as SVMs
References & Slide Credits
• Animetrics, Dr. Marc Valliant, VP & CTO.
• M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, 1991.
• Y. Freund and R. Schapire, "A Short Introduction to Boosting," Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September 1999.
• S. Li et al., Handbook of Face Recognition, Springer.
• P. A. Viola and M. J. Jones, "Robust Real-Time Face Detection," Intl. J. Computer Vision, 57(2):137-154, 2004 (originally in CVPR 2001).
• Some slides adapted from Bill Freeman, MIT 6.869, April 2005.
• J. Friedman, T. Hastie, and R. Tibshirani, "Additive Logistic Regression: A Statistical View of Boosting," http://www-stat.stanford.edu/~hastie/Papers/boost.ps