Pattern Recognition
Lecture 1 - Overview
Jim Rehg
School of Interactive Computing
Georgia Institute of Technology
Atlanta, Georgia USA
June 12, 2007
J. M. Rehg © 2007
Goal
Learn a function that maps features x to predictions C, given a dataset D = {Ck , xk}
Elements of the problem:
Knowledge about the data-generating process and task
Design of feature space for x based on data
Decision rule f : x → C′
Loss function L(C′, C) for measuring quality of prediction
Learning algorithm for computing f from D
Empirical measurement of classifier performance
Visualization of classifier performance and data properties
Computational cost of classification (and learning)
Example: Skin Detection in Web Images
Images containing people are interesting
Most images with people in them contain visible skin
Skin can be detected in images based on its color
Goal: Automatic detection of "adult" images
DEC Cambridge Research Lab, 1998
Physics of Skin Color
Skin color is due to melanin and hemoglobin.
Hue (normalized color) of skin is largely invariant across the human population.
Saturation of skin color varies with the concentration of melanin and hemoglobin (e.g. lips).
Detailed color models exist for melanoma identification using calibrated illumination.
But observed skin color will be affected by lighting, the image acquisition device, etc.
Skin Classification Via Statistical Inference
Joint work with Michael Jones at DEC CRL
M. Jones and J. M. Rehg, "Statistical Color Models with Application to Skin Detection", IJCV, 2001.
Model the color distribution in the skin and nonskin cases
Estimate p(RGB | skin) and p(RGB | nonskin)
Decision rule: f : RGB → {"skin", "nonskin"}
Pixel is "skin" when p(skin | RGB) > p(nonskin | RGB)
Data set D:
12,000 example photos sampled from a 2 million image set obtained from an AltaVista web crawl
1 billion hand-labeled pixels in the training set
Some Example Photos
Example skin images
Example non-skin images
Manually Labeling Skin and Nonskin
Labeled skin pixels are segmented by hand:
Labeled nonskin pixels are easily obtained from images without people
Skin Color Modeling Using Histograms
Feature space design:
Standard RGB color space - easily available, efficient
Histogram probability model
P(RGB | skin)    P(RGB | nonskin)
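The histogram probability model can be sketched in plain Python. This is a minimal illustration, not the paper's exact configuration: the bin count (32 per channel) and the assumption of 8-bit (r, g, b) tuples are choices made here for the example.

```python
from collections import Counter

def build_histogram(pixels, bins_per_channel=32):
    """Estimate P(RGB | class) with a quantized color histogram.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    Returns a dict mapping a quantized (r, g, b) bin to its probability.
    """
    width = 256 // bins_per_channel  # side length of each cubic bin
    counts = Counter(
        (r // width, g // width, b // width) for r, g, b in pixels
    )
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}

def lookup(hist, pixel, bins_per_channel=32):
    """Return the histogram probability for one (r, g, b) pixel."""
    width = 256 // bins_per_channel
    r, g, b = pixel
    return hist.get((r // width, g // width, b // width), 0.0)
```

One such histogram would be built from the hand-labeled skin pixels and a second from the nonskin pixels; `lookup` then supplies the two likelihoods needed by the decision rule.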
Skin Color Histogram
Segmented skin regions produce a histogram in RGB space showing the distribution of skin colors. Three views of the same skin histogram are shown:
Non-Skin Color Histogram
Three views of the same non-skin histogram showing the distribution of non-skin colors:
Decision Rule
Bayes Rule:

    p(C=1 | RGB) = p(RGB | C=1) p(C=1) / p(RGB)

Class labels: "skin" C=1, "nonskin" C=0

Equivalently:

    f(RGB) = 1 if p(C=1 | RGB) / p(C=0 | RGB) > 1, and 0 otherwise
Likelihood Ratio Test
Expanding the posteriors with Bayes rule:

    p(RGB | C=1) p(C=1) / [ p(RGB | C=0) p(C=0) ]  ><  1    (f = 1 if >, f = 0 otherwise)

Equivalently:

    p(RGB | C=1) / p(RGB | C=0)  ><  p(C=0) / p(C=1) = θ    (f = 1 if >, f = 0 otherwise)

The ratio of class priors is usually treated as a parameter (threshold) θ which is adjusted to trade off between types of errors.
Skin Classifier Architecture
[Diagram: Input Image → histogram lookups P(RGB | skin) and P(RGB | nonskin) → likelihood ratio test p(RGB | C=1) / p(RGB | C=0) >< θ (f = 1 if >, f = 0 otherwise) → Output "skin" mask]
Measuring Classifier Quality

Given a testing set T = {Cj , xj} that was not used for training, apply the classifier to obtain predictions

    Ĉj = f(xj),   j = 1, …, N

The testing set is partitioned into four categories, using the indicator function I[B] = 1 if B is true, 0 otherwise:

    True Positives:   N_TP = Σj I[(Cj = 1) & (Ĉj = 1)]
    False Positives:  N_FP = Σj I[(Cj = 0) & (Ĉj = 1)]
    True Negatives:   N_TN = Σj I[(Cj = 0) & (Ĉj = 0)]
    False Negatives:  N_FN = Σj I[(Cj = 1) & (Ĉj = 0)]
Measuring Classifier Quality
Define:

    N_P = Σj I[Cj = 1]   and   N_N = Σj I[Cj = 0]

Then N_P = N_TP + N_FN and N_N = N_TN + N_FP.

A standard convention is to report:

    Detection Rate:       d_R = N_TP / N_P   (fraction of positive examples classified correctly)
    False Positive Rate:  f_R = N_FP / N_N   (fraction of negative examples classified incorrectly)
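The counts and rates above can be computed directly; a minimal sketch assuming the true labels and predictions are parallel lists of 0/1 values:

```python
def rates(true_labels, predicted):
    """Compute detection rate d_R = N_TP / N_P and
    false positive rate f_R = N_FP / N_N on a test set."""
    n_tp = sum(1 for c, p in zip(true_labels, predicted) if c == 1 and p == 1)
    n_fp = sum(1 for c, p in zip(true_labels, predicted) if c == 0 and p == 1)
    n_p = sum(1 for c in true_labels if c == 1)   # positives: N_P = N_TP + N_FN
    n_n = len(true_labels) - n_p                  # negatives: N_N = N_TN + N_FP
    return n_tp / n_p, n_fp / n_n
```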
Trading Off Types of Errors
Consider a classifier that always outputs f = 1 regardless of input:
All positive examples correct, all negative examples incorrect
d_R = 1 and f_R = 1

Consider a classifier that always outputs f = 0 regardless of input:
All positive examples incorrect, all negative examples correct
d_R = 0 and f_R = 0

These extremes correspond to the limiting settings θ = 0 and θ = ∞ of the likelihood ratio test p(RGB | C=1) / p(RGB | C=0) >< θ.
ROC Curve

[Plot: ROC curve with Detection Rate d_R on the vertical axis and False Positive Rate f_R on the horizontal axis, both ranging from 0 to 1]

Each sample point on the ROC curve is obtained by scoring T with a particular threshold θ.
Generating the ROC curve does not require classifier retraining.
ROC Curve

[Plot: ROC curves of two classifiers, Detection Rate d_R vs. False Positive Rate f_R]

A fair way to compare two classifiers is to show their ROC curves for the same T.
ROC stands for "Receiver Operating Characteristic" and was originally developed for tuning radar receivers.
Scalar Measures of Classifier Performance

[Plot: ROC curve, Detection Rate d_R vs. False Positive Rate f_R, annotated with the Equal Error Rate point and the Area under the ROC curve]
ROC Curve Summary
ROC curve gives “application independent” measure of classifier performance
Performance reports based on a single point on the ROC curve are generally meaningless
Several possible scalar "summaries":
Area under the ROC curve
Equal error rate

Compute the ROC by iterating over values of the threshold θ: compute the detection and false positive rates on the testing set for each value of θ and plot the resulting point.
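The sweep over θ can be sketched as follows, assuming each test example has already been scored with its likelihood ratio. The function name and the idea of drawing thresholds from a supplied list are illustrative choices; in practice the sorted scores themselves are a natural threshold set.

```python
def roc_points(ratios, labels, thresholds):
    """Trace the ROC curve by sweeping the likelihood-ratio threshold theta.

    `ratios` are per-example likelihood ratios, `labels` are in {0, 1}.
    Returns a list of (f_R, d_R) points, one per threshold.
    No retraining is needed: only the threshold comparison changes."""
    n_p = sum(labels)
    n_n = len(labels) - n_p
    points = []
    for theta in thresholds:
        preds = [1 if r > theta else 0 for r in ratios]
        d = sum(p for p, c in zip(preds, labels) if c == 1) / n_p
        f = sum(p for p, c in zip(preds, labels) if c == 0) / n_n
        points.append((f, d))
    return points
```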
Example Results
Skin examples:
Nonskin examples:
Skin Detector Performance
Extremely good results considering that only the color of a single pixel is being used.
Best published results (at the time)
One of the largest datasets used in a vision model (nearly 1 billion labeled pixels).
[Plot: ROC curve of the skin detector, Detection Rate d_R vs. False Positive Rate f_R]

But why does it work so well?
Analyzing the Color Distributions
2D color histogram for photos on the web, projected onto a slice through the 3D histogram. Surface plot of the 2D histogram:
Why does it work so well?
Contour Plots
Full color model (includes skin and non-skin):
Contour Plots Continued
Non-skin model: Skin model:
The skin color distribution is surprisingly well-separated from the background distribution of color in web images.
Adult Image Detection
Observation: Adult images usually contain large areas of skin
The output of the skin detector can be used to create a feature vector for an image
Adult image classifier trained on feature vectors; exploring joint image/text analysis

[Diagram: Image → Skin Detector → Skin Features → Neural net Classifier → "Adult?"; in parallel, HTML → Text Features → Classifier → "Adult?"]
More Examples
Classified as not adult
Classified as not adult
Classified as not adult
Incorrectly classified as adult - close-ups of faces are a failure mode due to large amounts of skin
Performance of Adult Image Detector
Adult Image Detection Results
Two sets of HTML pages collected:
Crawl A: Adult sites (2365 pages, 11323 images)
Crawl B: Non-adult sites (2692 pages, 13973 images)

                                                    image-based   text-based   combined "OR"
                                                    detector      detector     detector
    % of adult images rated correctly (set A):         85.8%         84.9%        93.9%
    % of non-adult images rated correctly (set B):     92.5%         98.9%        92.0%
Computational Cost Analysis
General image properties:
Average width = 301 pixels
Average height = 269 pixels
Time to read an image = 0.078 sec

Skin Color Based Adult Image Detector:
Time to classify = 0.043 sec
Implies 23 images/sec throughput
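The throughput figure follows directly from the classification time quoted above:

```python
time_to_classify = 0.043            # seconds per image (from the slide)
throughput = 1.0 / time_to_classify
print(f"{throughput:.0f} images/sec")  # → 23 images/sec
```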
Summary of Skin Detection Example
What are the factors that made skin detection successful?
A problem which seemed hard a priori but turned out to be easy (classes surprisingly separable).
Low dimensionality makes adequate data collection feasible and classifier design a non-issue.
Intrinsic dimensions are clear a priori
– The concentration of the nonskin model along the grey line is completely predictable from the design of perceptual color spaces
Perspectives on Pattern Recognition
Our goal is to uncover the underlying organization for what often seems to be a laundry list of methods:
Linear and Tree Classifiers
Gaussian Mixture Classifiers
Logistic Regression
Neural Networks
Support Vector Machines
Gaussian Process Classifiers
AdaBoost
…
Statistical Perspective
Statistical Inference Approach:
Probability model p(C, x | θ), where θ is a vector of parameters estimated from D using statistical inference
The decision rule is derived from p(C, x | θ)
Two philosophical schools:
– Frequentist Statistics
– Bayesian Statistics

Learning Theory Approach:
Classifiers with distribution-free performance guarantees
Connections to CS theory, computability, etc.
Examples: PAC learning, structured risk minimization, etc.
Decision Theory Perspective
Three ways to obtain the decision rule f(x)

Generative Modeling:
Model p(x | C) and p(C) using D
Obtain p(C | x) using Bayes Rule
Obtain the decision rule from the posterior
Advantages:
– Use p(x) for novelty detection
– Sample from p(x) to generate synthetic data and assess model quality
– Use p(C | x) to assess confidence in the answer (reject region)
– Easy to compose modules that output posterior probabilities
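As a toy illustration of the generative route, consider a one-dimensional feature with Gaussian class-conditional models. The Gaussian choice and all parameter values are assumptions made for this sketch (the skin detector itself used histograms); what it shows is the pattern: model p(x | C) and p(C), then apply Bayes rule to get p(C | x).

```python
import math

def gaussian_pdf(x, mean, var):
    """Class-conditional density p(x | C) modeled as a 1-D Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_c1(x, mean0, var0, mean1, var1, prior1=0.5):
    """Bayes rule: p(C=1 | x) from the two class models and the prior."""
    joint1 = gaussian_pdf(x, mean1, var1) * prior1
    joint0 = gaussian_pdf(x, mean0, var0) * (1 - prior1)
    return joint1 / (joint0 + joint1)
```

The decision rule then thresholds the posterior, e.g. predict C=1 when posterior_c1(x, ...) > 0.5.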
Decision Rule
Discriminative Modeling:
Obtain the posterior p(C | x) directly from D
Derive the decision rule from the posterior
Advantages:
– The posterior is often much simpler than the likelihood function
– The posterior is more directly related to the classification rule and may yield fewer prediction errors.
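For contrast with the generative route, a toy discriminative sketch: logistic regression fits the posterior p(C=1 | x) = sigmoid(w·x + b) directly from the data, never modeling p(x | C). This minimal gradient-descent implementation and its learning-rate/step settings are illustrative assumptions, not part of the original lecture material.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit p(C=1 | x) = sigmoid(w*x + b) by gradient descent on the
    log loss over a 1-D dataset (xs: features, ys: labels in {0, 1})."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log loss per example
            gw += err * x
            gb += err
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
    return w, b
```

The learned posterior is then thresholded at 0.5 to obtain the decision rule, mirroring the generative case but with far fewer modeling commitments about x.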