Pattern Recognition
Lecture 1 - Overview
Jim Rehg
School of Interactive Computing
Georgia Institute of Technology
Atlanta, Georgia USA
June 12, 2007
J. M. Rehg © 2007
Goal
Learn a function that maps features x to predictions C, given a dataset D = {Ck , xk}
Elements of the problem:
Knowledge about the data-generating process and task
Design of feature space for x based on data
Decision rule f : x → C′
Loss function L(C′, C) for measuring quality of prediction
Learning algorithm for computing f from D
Empirical measurement of classifier performance
Visualization of classifier performance and data properties
Computational cost of classification (and learning)
Example: Skin Detection in Web Images
Images containing people are interesting
Most images with people in them contain visible skin
Skin can be detected in images based on its color
Goal: Automatic detection of "adult" images
DEC Cambridge Research Lab, 1998
Physics of Skin Color
Skin color is due to melanin and hemoglobin.
Hue (normalized color) of skin is largely invariant across the human population.
Saturation of skin color varies with the concentration of melanin and hemoglobin (e.g. lips).
Detailed color models exist for melanoma identification using calibrated illumination.
But observed skin color will be affected by lighting, the image acquisition device, etc.
Skin Classification Via Statistical Inference
Joint work with Michael Jones at DEC CRL
M. Jones and J. M. Rehg, "Statistical Color Models with Application to Skin Detection", IJCV, 2001.
Model the color distribution in the skin and nonskin cases
Estimate p(RGB | skin) and p(RGB | nonskin)
Decision rule: f : RGB → {"skin", "nonskin"}
Pixel is "skin" when p(skin | RGB) > p(nonskin | RGB)
Data set D:
12,000 example photos sampled from a 2 million image set obtained from an AltaVista web crawl
1 billion hand-labeled pixels in the training set
Some Example Photos
Example skin images
Example non-skin images
Manually Labeling Skin and Nonskin
Labeled skin pixels are segmented by hand:
Labeled nonskin pixels are easily obtained from images without people
Skin Color Modeling Using Histograms
Feature space design:
Standard RGB color space - easily available, efficient
Histogram probability model
P(RGB | skin)    P(RGB | nonskin)
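The histogram probability model can be sketched in plain Python. This is a minimal illustration, not the paper's exact configuration: the bin count (32 per channel) and the assumption of 8-bit (r, g, b) tuples are choices made here for the example.

```python
from collections import Counter

def build_histogram(pixels, bins_per_channel=32):
    """Estimate P(RGB | class) with a quantized color histogram.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    Returns a dict mapping a quantized (r, g, b) bin to its probability.
    """
    width = 256 // bins_per_channel  # side length of each cubic bin
    counts = Counter(
        (r // width, g // width, b // width) for r, g, b in pixels
    )
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}

def lookup(hist, pixel, bins_per_channel=32):
    """Return the histogram probability for one (r, g, b) pixel."""
    width = 256 // bins_per_channel
    r, g, b = pixel
    return hist.get((r // width, g // width, b // width), 0.0)
```

One such histogram would be built from the hand-labeled skin pixels and a second from the nonskin pixels; `lookup` then supplies the two likelihoods needed by the decision rule.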
Skin Color Histogram
Segmented skin regions produce a histogram in RGB space showing the distribution of skin colors. Three views of the same skin histogram are shown:
Non-Skin Color Histogram
Three views of the same non-skin histogram showing the distribution of non-skin colors:
Decision Rule
Bayes Rule:

    p(C=1 | RGB) = p(RGB | C=1) p(C=1) / p(RGB)

Class labels: "skin" C=1, "nonskin" C=0

Equivalently:

    f(RGB) = 1 if p(C=1 | RGB) / p(C=0 | RGB) > 1, and 0 otherwise
Likelihood Ratio Test
Expanding the posteriors with Bayes rule:

    p(RGB | C=1) p(C=1) / [ p(RGB | C=0) p(C=0) ]  ><  1    (f = 1 if >, f = 0 otherwise)

Equivalently:

    p(RGB | C=1) / p(RGB | C=0)  ><  p(C=0) / p(C=1) = θ    (f = 1 if >, f = 0 otherwise)

The ratio of class priors is usually treated as a parameter (threshold) θ which is adjusted to trade off between types of errors.
Skin Classifier Architecture
[Diagram: Input Image → histogram lookups P(RGB | skin) and P(RGB | nonskin) → likelihood ratio test p(RGB | C=1) / p(RGB | C=0) >< θ (f = 1 if >, f = 0 otherwise) → Output "skin" mask]
Measuring Classifier Quality

Given a testing set T = {Cj , xj} that was not used for training, apply the classifier to obtain predictions

    Ĉj = f(xj),   j = 1, …, N

The testing set is partitioned into four categories, using the indicator function I[B] = 1 if B is true, 0 otherwise:

    True Positives:   N_TP = Σj I[(Cj = 1) & (Ĉj = 1)]
    False Positives:  N_FP = Σj I[(Cj = 0) & (Ĉj = 1)]
    True Negatives:   N_TN = Σj I[(Cj = 0) & (Ĉj = 0)]
    False Negatives:  N_FN = Σj I[(Cj = 1) & (Ĉj = 0)]
Measuring Classifier Quality
Define:

    N_P = Σj I[Cj = 1]   and   N_N = Σj I[Cj = 0]

Then N_P = N_TP + N_FN and N_N = N_TN + N_FP.

A standard convention is to report:

    Detection Rate:       d_R = N_TP / N_P   (fraction of positive examples classified correctly)
    False Positive Rate:  f_R = N_FP / N_N   (fraction of negative examples classified incorrectly)
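The counts and rates above can be computed directly; a minimal sketch assuming the true labels and predictions are parallel lists of 0/1 values:

```python
def rates(true_labels, predicted):
    """Compute detection rate d_R = N_TP / N_P and
    false positive rate f_R = N_FP / N_N on a test set."""
    n_tp = sum(1 for c, p in zip(true_labels, predicted) if c == 1 and p == 1)
    n_fp = sum(1 for c, p in zip(true_labels, predicted) if c == 0 and p == 1)
    n_p = sum(1 for c in true_labels if c == 1)   # positives: N_P = N_TP + N_FN
    n_n = len(true_labels) - n_p                  # negatives: N_N = N_TN + N_FP
    return n_tp / n_p, n_fp / n_n
```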
Trading Off Types of Errors
Consider a classifier that always outputs f = 1 regardless of input:
All positive examples correct, all negative examples incorrect
d_R = 1 and f_R = 1

Consider a classifier that always outputs f = 0 regardless of input:
All positive examples incorrect, all negative examples correct
d_R = 0 and f_R = 0

These extremes correspond to the limiting settings θ = 0 and θ = ∞ of the likelihood ratio test p(RGB | C=1) / p(RGB | C=0) >< θ.
ROC Curve

[Plot: ROC curve with Detection Rate d_R on the vertical axis and False Positive Rate f_R on the horizontal axis, both ranging from 0 to 1]

Each sample point on the ROC curve is obtained by scoring T with a particular threshold θ.
Generating the ROC curve does not require classifier retraining.
ROC Curve

[Plot: ROC curves of two classifiers, Detection Rate d_R vs. False Positive Rate f_R]

A fair way to compare two classifiers is to show their ROC curves for the same T.
ROC stands for "Receiver Operating Characteristic" and was originally developed for tuning radar receivers.
Scalar Measures of Classifier Performance

[Plot: ROC curve, Detection Rate d_R vs. False Positive Rate f_R, annotated with the Equal Error Rate point and the Area under the ROC curve]
ROC Curve Summary
ROC curve gives “application independent” measure of classifier performance
Performance reports based on a single point on the ROC curve are generally meaningless
Several possible scalar "summaries":
Area under the ROC curve
Equal error rate

Compute the ROC by iterating over values of the threshold θ: compute the detection and false positive rates on the testing set for each value of θ and plot the resulting point.
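The sweep over θ can be sketched as follows, assuming each test example has already been scored with its likelihood ratio. The function name and the idea of drawing thresholds from a supplied list are illustrative choices; in practice the sorted scores themselves are a natural threshold set.

```python
def roc_points(ratios, labels, thresholds):
    """Trace the ROC curve by sweeping the likelihood-ratio threshold theta.

    `ratios` are per-example likelihood ratios, `labels` are in {0, 1}.
    Returns a list of (f_R, d_R) points, one per threshold.
    No retraining is needed: only the threshold comparison changes."""
    n_p = sum(labels)
    n_n = len(labels) - n_p
    points = []
    for theta in thresholds:
        preds = [1 if r > theta else 0 for r in ratios]
        d = sum(p for p, c in zip(preds, labels) if c == 1) / n_p
        f = sum(p for p, c in zip(preds, labels) if c == 0) / n_n
        points.append((f, d))
    return points
```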
Example Results
Skin examples:
Nonskin examples:
Skin Detector Performance
Extremely good results considering that only the color of a single pixel is being used.
Best published results (at the time)
One of the largest datasets used in a vision model (nearly 1 billion labeled pixels).
[Plot: ROC curve of the skin detector, Detection Rate d_R vs. False Positive Rate f_R]

But why does it work so well?
Analyzing the Color Distributions
2D color histogram for photos on the web, projected onto a slice through the 3D histogram. Surface plot of the 2D histogram:
Why does it work so well?
Contour Plots
Full color model (includes skin and non-skin):
Contour Plots Continued
Non-skin model: Skin model:
The skin color distribution is surprisingly well-separated from the background distribution of color in web images.
Adult Image Detection
Observation: Adult images usually contain large areas of skin
The output of the skin detector can be used to create a feature vector for an image
Adult image classifier trained on feature vectors; exploring joint image/text analysis

[Diagram: Image → Skin Detector → Skin Features → Neural net Classifier → "Adult?"; in parallel, HTML → Text Features → Classifier → "Adult?"]
More Examples
Classified as not adult
Classified as not adult
Classified as not adult
Incorrectly classified as adult - close-ups of faces are a failure mode due to large amounts of skin
Performance of Adult Image Detector
Adult Image Detection Results
Two sets of HTML pages collected:
Crawl A: Adult sites (2365 pages, 11323 images)
Crawl B: Non-adult sites (2692 pages, 13973 images)

                                                    image-based   text-based   combined "OR"
                                                    detector      detector     detector
    % of adult images rated correctly (set A):         85.8%         84.9%        93.9%
    % of non-adult images rated correctly (set B):     92.5%         98.9%        92.0%
Computational Cost Analysis
General image properties:
Average width = 301 pixels
Average height = 269 pixels
Time to read an image = 0.078 sec

Skin Color Based Adult Image Detector:
Time to classify = 0.043 sec
Implies 23 images/sec throughput
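The throughput figure follows directly from the classification time quoted above:

```python
time_to_classify = 0.043            # seconds per image (from the slide)
throughput = 1.0 / time_to_classify
print(f"{throughput:.0f} images/sec")  # → 23 images/sec
```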
Summary of Skin Detection Example
What are the factors that made skin detection successful?
A problem which seemed hard a priori but turned out to be easy (classes surprisingly separable).
Low dimensionality makes adequate data collection feasible and classifier design a non-issue.
Intrinsic dimensions are clear a priori
– The concentration of the nonskin model along the grey line is completely predictable from the design of perceptual color spaces
Perspectives on Pattern Recognition
Our goal is to uncover the underlying organization for what often seems to be a laundry list of methods:
Linear and Tree Classifiers
Gaussian Mixture Classifiers
Logistic Regression
Neural Networks
Support Vector Machines
Gaussian Process Classifiers
AdaBoost
…
Statistical Perspective
Statistical Inference Approach:
Probability model p(C, x | θ), where θ is a vector of parameters estimated from D using statistical inference
The decision rule is derived from p(C, x | θ)
Two philosophical schools:
– Frequentist Statistics
– Bayesian Statistics

Learning Theory Approach:
Classifiers with distribution-free performance guarantees
Connections to CS theory, computability, etc.
Examples: PAC learning, structured risk minimization, etc.
Decision Theory Perspective
Three ways to obtain the decision rule f(x)

Generative Modeling:
Model p(x | C) and p(C) using D
Obtain p(C | x) using Bayes Rule
Obtain the decision rule from the posterior
Advantages:
– Use p(x) for novelty detection
– Sample from p(x) to generate synthetic data and assess model quality
– Use p(C | x) to assess confidence in the answer (reject region)
– Easy to compose modules that output posterior probabilities
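As a toy illustration of the generative route, consider a one-dimensional feature with Gaussian class-conditional models. The Gaussian choice and all parameter values are assumptions made for this sketch (the skin detector itself used histograms); what it shows is the pattern: model p(x | C) and p(C), then apply Bayes rule to get p(C | x).

```python
import math

def gaussian_pdf(x, mean, var):
    """Class-conditional density p(x | C) modeled as a 1-D Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_c1(x, mean0, var0, mean1, var1, prior1=0.5):
    """Bayes rule: p(C=1 | x) from the two class models and the prior."""
    joint1 = gaussian_pdf(x, mean1, var1) * prior1
    joint0 = gaussian_pdf(x, mean0, var0) * (1 - prior1)
    return joint1 / (joint0 + joint1)
```

The decision rule then thresholds the posterior, e.g. predict C=1 when posterior_c1(x, ...) > 0.5.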
Decision Rule
Discriminative Modeling:
Obtain the posterior p(C | x) directly from D
Derive the decision rule from the posterior
Advantages:
– The posterior is often much simpler than the likelihood function
– The posterior is more directly related to the classification rule and may yield fewer prediction errors.
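For contrast with the generative route, a toy discriminative sketch: logistic regression fits the posterior p(C=1 | x) = sigmoid(w·x + b) directly from the data, never modeling p(x | C). This minimal gradient-descent implementation and its learning-rate/step settings are illustrative assumptions, not part of the original lecture material.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit p(C=1 | x) = sigmoid(w*x + b) by gradient descent on the
    log loss over a 1-D dataset (xs: features, ys: labels in {0, 1})."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log loss per example
            gw += err * x
            gb += err
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
    return w, b
```

The learned posterior is then thresholded at 0.5 to obtain the decision rule, mirroring the generative case but with far fewer modeling commitments about x.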