Page 1: Pattern Recognition Lecture 1 - Overview

Pattern Recognition

Lecture 1 - Overview

Jim Rehg

School of Interactive Computing
Georgia Institute of Technology
Atlanta, Georgia USA

June 12, 2007

Page 2: Pattern Recognition Lecture 1 - Overview

J. M. Rehg © 2007

Goal

Learn a function that maps features x to predictions C, given a dataset D = {Ck , xk}

Elements of the problem:
- Knowledge about the data-generating process and task
- Design of a feature space for x based on data
- Decision rule f : x → C′
- Loss function L(C′, C) for measuring quality of prediction
- Learning algorithm for computing f from D
- Empirical measurement of classifier performance
- Visualization of classifier performance and data properties
- Computational cost of classification (and learning)

Page 3: Pattern Recognition Lecture 1 - Overview


Example: Skin Detection in Web Images

- Images containing people are interesting
- Most images with people in them contain visible skin
- Skin can be detected in images based on its color
- Goal: automatic detection of "adult" images

DEC Cambridge Research Lab, 1998

Page 4: Pattern Recognition Lecture 1 - Overview


Physics of Skin Color

- Skin color is due to melanin and hemoglobin.
- Hue (normalized color) of skin is largely invariant across the human population.
- Saturation of skin color varies with concentration of melanin and hemoglobin (e.g. lips).
- Detailed color models exist for melanoma identification using calibrated illumination.
- But observed skin color will be affected by lighting, image acquisition device, etc.

Page 5: Pattern Recognition Lecture 1 - Overview


Skin Classification Via Statistical Inference

- Joint work with Michael Jones at DEC CRL: M. Jones and J. M. Rehg, "Statistical Color Models with Application to Skin Detection", IJCV, 2001.
- Model the color distribution in the skin and nonskin cases: estimate p(RGB | skin) and p(RGB | nonskin)
- Decision rule f : RGB → {"skin", "nonskin"}: a pixel is "skin" when p(skin | RGB) > p(nonskin | RGB)
- Data set D: 12,000 example photos sampled from a 2 million image set obtained from an AltaVista web crawl; 1 billion hand-labeled pixels in the training set

Page 6: Pattern Recognition Lecture 1 - Overview


Some Example Photos

Example skin images

Example non-skin images

Page 7: Pattern Recognition Lecture 1 - Overview


Manually Labeling Skin and Nonskin

Labeled skin pixels are segmented by hand:

Labeled nonskin pixels are easily obtained from images without people

Page 8: Pattern Recognition Lecture 1 - Overview


Skin Color Modeling Using Histograms

Feature space design: standard RGB color space (easily available, efficient)

Histogram probability model:

    P(RGB | skin)    P(RGB | nonskin)
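The histogram model can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation; the 32-bins-per-channel choice and the sample pixel values are assumptions.

```python
# Toy sketch of the histogram probability model P(RGB | class):
# quantize each channel into BINS bins, count labeled pixels, normalize.
BINS = 32  # bins per channel (an assumed choice, not from this slide)

def rgb_histogram(pixels):
    """pixels: list of (r, g, b) tuples, each channel in 0..255.
    Returns a dict mapping a bin-index triple to P(bin | class)."""
    counts = {}
    for r, g, b in pixels:
        key = (r * BINS // 256, g * BINS // 256, b * BINS // 256)
        counts[key] = counts.get(key, 0) + 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

# A few made-up "skin" pixels; the first and last fall in the same bin
skin_pixels = [(224, 160, 130), (210, 150, 120), (225, 161, 131)]
p_rgb_given_skin = rgb_histogram(skin_pixels)
```

An unseen pixel is scored by looking up its bin in both the skin and nonskin histograms (in practice with some smoothing for empty bins).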

Page 9: Pattern Recognition Lecture 1 - Overview


Skin Color Histogram

Segmented skin regions produce a histogram in RGB space showing the distribution of skin colors. Three views of the same skin histogram are shown:

Page 10: Pattern Recognition Lecture 1 - Overview


Non-Skin Color Histogram

Three views of the same non-skin histogram showing the distribution of non-skin colors:

Page 11: Pattern Recognition Lecture 1 - Overview


Decision Rule

Class labels: "skin" C = 1, "nonskin" C = 0

Bayes Rule:

    p(C=1 | RGB) = p(RGB | C=1) p(C=1) / p(RGB)

Equivalently:

    f(RGB) = 1 if p(C=1 | RGB) > p(C=0 | RGB), and f(RGB) = 0 otherwise

i.e. f = 1 when p(C=1 | RGB) / p(C=0 | RGB) > 1, and f = 0 otherwise.
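The posterior decision rule on this slide can be written as a small Python sketch; the likelihood and prior values below are illustrative, not from the paper.

```python
# Bayes Rule: p(C=1 | RGB) = p(RGB | C=1) p(C=1) / p(RGB),
# with the evidence p(RGB) summed over both classes.
def posterior_skin(lik_skin, lik_nonskin, prior_skin=0.5):
    evidence = lik_skin * prior_skin + lik_nonskin * (1 - prior_skin)
    return lik_skin * prior_skin / evidence

# f(RGB) = 1 iff p(C=1 | RGB) > p(C=0 | RGB)
def f(lik_skin, lik_nonskin, prior_skin=0.5):
    post = posterior_skin(lik_skin, lik_nonskin, prior_skin)
    return 1 if post > 1 - post else 0

assert f(0.08, 0.02) == 1  # skin likelihood dominates -> "skin"
assert f(0.01, 0.05) == 0  # nonskin likelihood dominates -> "nonskin"
```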

Page 12: Pattern Recognition Lecture 1 - Overview


Likelihood Ratio Test

Substituting Bayes Rule into the posterior comparison gives an equivalent test on the likelihoods:

    f = 1 when p(RGB | C=1) p(C=1) > p(RGB | C=0) p(C=0), and f = 0 otherwise

Rearranging:

    f = 1 when p(RGB | C=1) / p(RGB | C=0) > p(C=0) / p(C=1), and f = 0 otherwise

The ratio of class priors is usually treated as a parameter (threshold) θ which is adjusted to trade off between the types of errors.
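As a sketch, the likelihood ratio test with an adjustable threshold θ looks like this in Python (the values are illustrative, not from the paper):

```python
# Likelihood ratio test: f = 1 when p(RGB|C=1)/p(RGB|C=0) > theta.
# theta plays the role of the prior ratio p(C=0)/p(C=1).
def lrt(lik_skin, lik_nonskin, theta):
    return 1 if lik_skin / lik_nonskin > theta else 0

# Raising theta trades detections for fewer false positives:
assert lrt(0.06, 0.03, theta=1.0) == 1  # ratio 2.0 passes a loose threshold
assert lrt(0.06, 0.03, theta=3.0) == 0  # same pixel rejected at a stricter one
```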

Page 13: Pattern Recognition Lecture 1 - Overview


Skin Classifier Architecture

Input Image → histogram lookups P(RGB | skin) and P(RGB | nonskin) → per-pixel test

    f = 1 when p(RGB | C=1) / p(RGB | C=0) > θ, and f = 0 otherwise

→ Output: "skin" pixels

Page 14: Pattern Recognition Lecture 1 - Overview


Measuring Classifier Quality

Given a testing set T = {Cj , xj} that was not used for training, apply the classifier to obtain predictions { Ĉj = f(xj) ; j = 1, …, N }.

The testing set is partitioned into four categories:

    True Positives:  N_TP = Σ_{j=1}^{N} I[ (Cj = 1) & (Ĉj = 1) ]
    False Positives: N_FP = Σ_{j=1}^{N} I[ (Cj = 0) & (Ĉj = 1) ]
    True Negatives:  N_TN = Σ_{j=1}^{N} I[ (Cj = 0) & (Ĉj = 0) ]
    False Negatives: N_FN = Σ_{j=1}^{N} I[ (Cj = 1) & (Ĉj = 0) ]

Indicator function for boolean B: I[B] = 1 if B is true, 0 otherwise.
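The four counts can be computed directly from label/prediction pairs; a small sketch with made-up labels:

```python
# N_TP, N_FP, N_TN, N_FN via the indicator I[B] (a plain comparison here).
def confusion_counts(C, C_hat):
    pairs = list(zip(C, C_hat))
    N_TP = sum(1 for c, ch in pairs if c == 1 and ch == 1)
    N_FP = sum(1 for c, ch in pairs if c == 0 and ch == 1)
    N_TN = sum(1 for c, ch in pairs if c == 0 and ch == 0)
    N_FN = sum(1 for c, ch in pairs if c == 1 and ch == 0)
    return N_TP, N_FP, N_TN, N_FN

C     = [1, 1, 0, 0, 1, 0]  # true labels (made up)
C_hat = [1, 0, 0, 1, 1, 0]  # predictions (made up)
assert confusion_counts(C, C_hat) == (2, 1, 2, 1)
```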

Page 15: Pattern Recognition Lecture 1 - Overview


Measuring Classifier Quality

Define:

    N_P = Σ_{j=1}^{N} I[Cj = 1]   and   N_N = Σ_{j=1}^{N} I[Cj = 0]

Then N_P = N_TP + N_FN and N_N = N_TN + N_FP.

A standard convention is to report:

    Detection Rate:       d_R = N_TP / N_P   (fraction of positive examples classified correctly)
    False Positive Rate:  f_R = N_FP / N_N   (fraction of negative examples classified incorrectly)
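With toy counts (not from the paper), the two rates work out as follows:

```python
# d_R and f_R from the four counts; note N_P = N_TP + N_FN, N_N = N_TN + N_FP.
N_TP, N_FP, N_TN, N_FN = 8, 3, 17, 2   # illustrative values
N_P = N_TP + N_FN   # 10 positive examples in T
N_N = N_TN + N_FP   # 20 negative examples in T
d_R = N_TP / N_P    # fraction of positives classified correctly
f_R = N_FP / N_N    # fraction of negatives classified incorrectly
assert (d_R, f_R) == (0.8, 0.15)
```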

Page 16: Pattern Recognition Lecture 1 - Overview


Trading Off Types of Errors

- Consider a classifier that always outputs f = 1 regardless of input: all positive examples are correct, all negative examples are incorrect, so d_R = 1 and f_R = 1.
- Consider a classifier that always outputs f = 0 regardless of input: all positive examples are incorrect, all negative examples are correct, so d_R = 0 and f_R = 0.

These are the extreme settings of the threshold θ in the likelihood ratio test p(RGB | C=1) / p(RGB | C=0) > θ: as θ → 0 everything is labeled skin, and as θ grows nothing is.

Page 17: Pattern Recognition Lecture 1 - Overview


ROC Curve

[Plot: Detection Rate d_R (vertical axis, 0 to 1) vs. False Positive Rate f_R (horizontal axis, 0 to 1)]

Each sample point on the ROC curve is obtained by scoring T with a particular threshold θ.

Generating the ROC curve does not require retraining the classifier.

Page 18: Pattern Recognition Lecture 1 - Overview


ROC Curve

[Plot: Detection Rate d_R vs. False Positive Rate f_R]

A fair way to compare two classifiers is to show their ROC curves for the same T.

ROC stands for "Receiver Operating Characteristic" and was originally developed for tuning radar receivers.

Page 19: Pattern Recognition Lecture 1 - Overview


Scalar Measures of Classifier Performance

[Plot: ROC curve (Detection Rate d_R vs. False Positive Rate f_R) annotated with two scalar summaries:]

- Equal Error Rate
- Area under the ROC curve

Page 20: Pattern Recognition Lecture 1 - Overview


ROC Curve Summary

ROC curve gives “application independent” measure of classifier performance

Performance reports based on a single point on the ROC curve are generally meaningless

Several possible scalar "summaries":
- Area under the ROC curve
- Equal error rate

Compute the ROC by iterating over values of the threshold θ: compute the detection and false positive rates on the testing set for each value of θ and plot the resulting point.
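A sketch of this sweep in Python: score the test set once, then vary the threshold. The likelihood ratios and labels below are made up for illustration.

```python
# Each theta gives one (f_R, d_R) point on the ROC curve; no retraining needed.
def roc_points(ratios, labels, thetas):
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for theta in thetas:
        preds = [1 if r > theta else 0 for r in ratios]
        tp = sum(p == 1 and c == 1 for p, c in zip(preds, labels))
        fp = sum(p == 1 and c == 0 for p, c in zip(preds, labels))
        points.append((fp / n_neg, tp / n_pos))  # (f_R, d_R)
    return points

ratios = [0.2, 0.8, 1.5, 3.0, 5.0, 0.5]  # p(x|C=1)/p(x|C=0) per test example
labels = [0,   0,   1,   1,   1,   0  ]
pts = roc_points(ratios, labels, thetas=[0.1, 1.0, 4.0])
assert pts[0] == (1.0, 1.0)     # tiny theta: everything labeled positive
assert pts[-1] == (0.0, 1 / 3)  # strict theta: no false alarms, few detections
```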

Page 21: Pattern Recognition Lecture 1 - Overview


Example Results

Skin examples:

Nonskin examples:

Page 22: Pattern Recognition Lecture 1 - Overview


Skin Detector Performance

Extremely good results considering that only the color of a single pixel is being used.

Best published results (at the time)

One of the largest datasets used in a vision model (nearly 1 billion labeled pixels).

[Plot: ROC curve of the skin detector (Detection Rate d_R vs. False Positive Rate f_R)]

But why does it work so well?

Page 23: Pattern Recognition Lecture 1 - Overview


Analyzing the Color Distributions

2D color histogram for photos on the web, projected onto a slice through the 3D histogram:

Surface plot of the 2D histogram:

Why does it work so well?

Page 24: Pattern Recognition Lecture 1 - Overview


Contour Plots

Full color model (includes skin and non-skin):

Page 25: Pattern Recognition Lecture 1 - Overview


Contour Plots Continued

Non-skin model: Skin model:

Skin color distribution is surprisingly well-separated from the background distribution of color in web images.

Page 26: Pattern Recognition Lecture 1 - Overview


Adult Image Detection

Observation: Adult images usually contain large areas of skin.

- The output of the skin detector can be used to create a feature vector for an image
- An adult image classifier is trained on these feature vectors
- Exploring joint image/text analysis

[Pipeline: Image → Skin Detector → Skin Features → Neural net Classifier; HTML → Text Features → Classifier; outputs combined → Adult?]

Page 27: Pattern Recognition Lecture 1 - Overview


More Examples

Classified as not adult

Classified as not adult

Classified as not adult

Incorrectly classified as adult: close-ups of faces are a failure mode due to large amounts of skin.

Page 28: Pattern Recognition Lecture 1 - Overview


Performance of Adult Image Detector

Page 29: Pattern Recognition Lecture 1 - Overview


Adult Image Detection Results

Two sets of HTML pages were collected:
- Crawl A: adult sites (2365 pages, 11323 images)
- Crawl B: non-adult sites (2692 pages, 13973 images)

                                                 image-based   text-based   combined "OR"
                                                 detector      detector     detector
    % of adult images rated correctly (A):       85.8%         84.9%        93.9%
    % of non-adult images rated correctly (B):   92.5%         98.9%        92.0%

Page 30: Pattern Recognition Lecture 1 - Overview


Computational Cost Analysis

General image properties:
- Average width = 301 pixels
- Average height = 269 pixels
- Time to read an image = 0.078 sec

Skin Color Based Adult Image Detector:
- Time to classify = 0.043 sec
- Implies 23 images/sec throughput
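A quick arithmetic check of the throughput figure, using the timing from the slide:

```python
# 0.043 s per classification -> roughly 23 images per second
time_to_classify = 0.043          # seconds, from the slide
throughput = 1.0 / time_to_classify
assert int(throughput) == 23      # 23.25... images/sec, matching the slide
```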

Page 31: Pattern Recognition Lecture 1 - Overview


Summary of Skin Detection Example

What are the factors that made skin detection successful?

- A problem which seemed hard a priori but turned out to be easy (the classes are surprisingly separable).
- Low dimensionality makes adequate data collection feasible and classifier design a non-issue.
- The intrinsic dimensions are clear a priori: the concentration of the nonskin model along the grey line is completely predictable from the design of perceptual color spaces.

Page 32: Pattern Recognition Lecture 1 - Overview


Perspectives on Pattern Recognition

Our goal is to uncover the underlying organization for what often seems to be a laundry list of methods:

- Linear and Tree Classifiers
- Gaussian Mixture Classifiers
- Logistic Regression
- Neural Networks
- Support Vector Machines
- Gaussian Process Classifiers
- AdaBoost
- …

Page 33: Pattern Recognition Lecture 1 - Overview


Statistical Perspective

Statistical Inference Approach
- Probability model p(C, x | θ), where θ is a vector of parameters estimated from D using statistical inference
- Decision rule is derived from p(C, x | θ)
- Two philosophical schools: Frequentist Statistics and Bayesian Statistics

Learning Theory Approach
- Classifiers with distribution-free performance guarantees
- Connections to CS theory, computability, etc.
- Examples: PAC learning, structural risk minimization, etc.

Page 34: Pattern Recognition Lecture 1 - Overview


Decision Theory Perspective

Three ways to obtain the decision rule f(x).

Generative Modeling
- Model p(x | C) and p(C) using D
- Obtain p(C | x) using Bayes Rule
- Obtain the decision rule from the posterior
- Advantages:
  - Use p(x) for novelty detection
  - Sample from p(x) to generate synthetic data and assess model quality
  - Use p(C | x) to assess confidence in the answer (reject region)
  - Easy to compose modules that output posterior probabilities

Page 35: Pattern Recognition Lecture 1 - Overview


Decision Rule

Discriminative Modeling
- Obtain the posterior p(C | x) directly from D
- Derive the decision rule from the posterior
- Advantages:
  - The posterior is often much simpler than the likelihood function
  - The posterior is more directly related to the classification rule, and may yield fewer prediction errors

