ECSE 6610
Pattern Recognition
Professor Qiang Ji
Spring, 2011
Pattern Recognition Overview
• Feature extraction: extract the most discriminative features to concisely represent the original data, typically involving dimensionality reduction
• Training/Learning: learn a mapping function that maps input features to output values
• Classification/Regression: map the input to a discrete output value for classification, or to a continuous output value for regression
[Figure: pipeline diagram. Training: raw data → feature extraction → training → learned classifier/regressor. Testing: raw data → feature extraction → learned classifier/regressor → output values.]
Pattern Recognition Overview (cont’d)
• Supervised learning• Both input (feature) and output (class labels)
are provided• Unsupervised learning-only input is given
• Clustering• Dimensionality reduction• Density estimation
• Semi-supervised learning-some input has output labels and others do not have
Examples of Pattern Recognition Applications
• Computer/Machine Vision: object recognition, activity recognition, image segmentation, inspection
• Medical Imaging: cell classification
• Optical Character Recognition: machine- or hand-written character/digit recognition
• Brain Computer Interface: classify human brain states from EEG signals
• Speech Recognition: speaker recognition, speech understanding, language translation
• Robotics: obstacle detection, scene understanding, navigation
Computer Vision Example: Facial Expression Recognition
Machine Vision Example
Example: Handwritten Digit Recognition
Probability Calculus
• U is the sample space; an event X is a subset of the outcomes
• P(X ∨ Y) = P(X) + P(Y) − P(X ∧ Y)
• If X and Y are mutually exclusive, i.e., P(X ∧ Y) = 0, then P(X ∨ Y) = P(X) + P(Y)
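A quick numeric check of the rule above; the die events X and Y are made up for illustration:

```python
# Verify P(X or Y) = P(X) + P(Y) - P(X and Y) on a fair six-sided die,
# where each outcome has probability 1/6.
U = {1, 2, 3, 4, 5, 6}          # sample space
X = {2, 4, 6}                   # event: roll is even
Y = {4, 5, 6}                   # event: roll is at least 4

def prob(event):
    """Probability of an event under a uniform distribution on U."""
    return len(event) / len(U)

lhs = prob(X | Y)                       # P(X or Y)
rhs = prob(X) + prob(Y) - prob(X & Y)   # inclusion-exclusion
print(lhs, rhs)   # both 2/3
```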
Probability Calculus (cont’d)
• Conditional independence: X ⊥ Y | Z means
  P(X, Y | Z) = P(X | Z) P(Y | Z), or P(X | Y, Z) = P(X | Z)
• The Chain Rule: given three events A, B, C,
  P(A, B, C) = P(A | B, C) P(B | C) P(C)
The Rules of Probability
• Sum Rule: P(X) = ΣY P(X, Y)
• Product Rule: P(X, Y) = P(Y | X) P(X)
• Conditional sum rule: P(C | A) = ΣB P(C | A, B) P(B | A)
• Combining the sum and product rules yields P(X) = ΣY P(X | Y) P(Y)
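The sum and product rules can be checked numerically on a small joint distribution; the table below is made up for illustration:

```python
import numpy as np

# A made-up joint distribution P(X, Y): rows index X (2 values),
# columns index Y (3 values).
P_XY = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
assert np.isclose(P_XY.sum(), 1.0)

# Sum rule: P(X) = sum over Y of P(X, Y)
P_X = P_XY.sum(axis=1)

# Product rule: P(X, Y) = P(Y | X) P(X)
P_Y_given_X = P_XY / P_X[:, None]
reconstructed = P_Y_given_X * P_X[:, None]
print(np.allclose(reconstructed, P_XY))   # True
```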
Bayes’ Theorem
posterior ∝ likelihood × prior
Bayes Rule
• Based on the definition of conditional probability:
  p(A | B) = p(A, B) / p(B) = p(B | A) p(A) / p(B)
• p(Ai | E) is the posterior probability of Ai given evidence E
• p(Ai) is the prior probability
• p(E | Ai) is the likelihood of the evidence given Ai
• p(E) is the probability of the evidence

p(Ai | E) = p(E | Ai) p(Ai) / p(E) = p(E | Ai) p(Ai) / Σi p(E | Ai) p(Ai)

[Figure: the sample space partitioned into events A1, …, A6, with the evidence E overlapping them.]
Bayesian Rule (cont’d)

P(H | E1, E2) = P(E2 | E1, H) P(H | E1) / P(E2 | E1)
              = P(E2 | E1, H) P(H | E1) / ΣH P(E2 | E1, H) P(H | E1)

Assume E1 and E2 are independent given H; then the above equation may be written as

P(H | E1, E2) = P(E2 | H) P(H | E1) / ΣH P(E2 | H) P(H | E1)

where P(H | E1) is the prior and P(E2 | H) is the likelihood of H given E2.
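A sketch of this sequential update: with E1 and E2 conditionally independent given H, the posterior after E1 serves as the prior for E2, and the result matches updating on both pieces of evidence at once. All the numbers below are made up for illustration:

```python
# Sequential Bayesian updating with conditionally independent evidence.
def normalize(d):
    z = sum(d.values())
    return {h: v / z for h, v in d.items()}

prior = {'h0': 0.5, 'h1': 0.5}            # P(H)
lik_E1 = {'h0': 0.2, 'h1': 0.6}           # P(E1 | H)
lik_E2 = {'h0': 0.3, 'h1': 0.9}           # P(E2 | H)

# P(H | E1) is proportional to P(E1 | H) P(H)
post_E1 = normalize({h: lik_E1[h] * prior[h] for h in prior})
# P(H | E1, E2) is proportional to P(E2 | H) P(H | E1)
post_E12 = normalize({h: lik_E2[h] * post_E1[h] for h in prior})

# Same answer as a single update on both pieces of evidence
joint = normalize({h: lik_E1[h] * lik_E2[h] * prior[h] for h in prior})
print(all(abs(post_E12[h] - joint[h]) < 1e-12 for h in prior))   # True
```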
A Simple Example
Consider two related variables:
1. Drug (D) with values y or n
2. Test (T) with values +ve or −ve
And suppose we have the following probabilities:
P(D = y) = 0.001
P(T = +ve | D = y) = 0.8
P(T = +ve | D = n) = 0.01
These probabilities are sufficient to define a joint probability distribution.
Suppose an athlete tests positive. What is the probability that he has taken the drug?

P(D = y | T = +ve) = P(T = +ve | D = y) P(D = y) / [P(T = +ve | D = y) P(D = y) + P(T = +ve | D = n) P(D = n)]
                   = (0.8 × 0.001) / (0.8 × 0.001 + 0.01 × 0.999)
                   ≈ 0.074
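The same computation done numerically, using only the three probabilities given above:

```python
# Drug-test example: P(D = y | T = +ve) via Bayes rule.
p_d = 0.001            # P(D = y)
p_pos_d = 0.8          # P(T = +ve | D = y)
p_pos_nd = 0.01        # P(T = +ve | D = n)

numer = p_pos_d * p_d                      # P(T = +ve | D = y) P(D = y)
denom = numer + p_pos_nd * (1 - p_d)       # P(T = +ve), by the sum rule
posterior = numer / denom
print(round(posterior, 3))   # 0.074
```

Despite the accurate test, the posterior is small because the prior P(D = y) is so low; this is the classic base-rate effect.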
Expectation (or Mean)
• For a discrete RV X: E(X) = Σx x p(x)
• For a continuous RV X: E(X) = ∫ x p(x) dx
• Conditional expectation: E(X | y) = ∫ x p(x | y) dx
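A sketch of both definitions; the discrete distribution is made up for illustration, and the integral is approximated by a midpoint sum:

```python
import numpy as np

# Discrete: E(X) = sum over x of x p(x)
x = np.array([0.0, 1.0, 2.0])
p = np.array([0.5, 0.3, 0.2])
E_X = np.sum(x * p)
print(E_X)   # ≈ 0.7

# Continuous: E(X) = integral of x p(x) dx, approximated numerically
# for X ~ Uniform(0, 1), whose density is p(x) = 1 on [0, 1].
dx = 1e-4
grid = (np.arange(10_000) + 0.5) * dx      # midpoints of [0, 1]
E_unif = np.sum(grid * 1.0) * dx
print(round(E_unif, 3))   # 0.5
```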
Expectations
• Conditional expectation (discrete): E[f | y] = Σx p(x | y) f(x)
• Approximate expectation (discrete and continuous): E[f] ≈ (1/N) Σn f(xn), where x1, …, xN are samples drawn from p(x)
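The approximate expectation is a Monte Carlo average: draw N samples from p(x) and average f over them. A small sketch, using a standard normal for p and f(x) = x² (so the true value E[f] = Var(x) = 1):

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed for reproducibility
samples = rng.standard_normal(100_000)     # draws from p(x) = N(0, 1)
estimate = np.mean(samples ** 2)           # (1/N) sum of f(x_n)
print(estimate)   # close to 1; the error shrinks like 1/sqrt(N)
```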
Variance
• The variance of a RV X:
  Var(X) = E[(X − E(X))²] = E(X²) − (E(X))²
• Standard deviation: σX = √Var(X)
• Covariance of RVs X and Y:
  σXY = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)
• Chebyshev inequality:
  P(|X − E(X)| ≥ kσX) ≤ 1/k²
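A quick check that both forms of the variance formula agree, and that the Chebyshev bound holds, on a small discrete RV whose values and probabilities are made up for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 5.0])
p = np.array([0.2, 0.5, 0.3])

E_X = np.sum(x * p)
var_def = np.sum((x - E_X) ** 2 * p)       # E[(X - E(X))^2]
var_alt = np.sum(x ** 2 * p) - E_X ** 2    # E(X^2) - (E(X))^2
print(np.isclose(var_def, var_alt))        # True

# Chebyshev: P(|X - E(X)| >= k*sigma) <= 1/k^2, checked for k = 1.5
sigma = np.sqrt(var_def)
k = 1.5
tail = p[np.abs(x - E_X) >= k * sigma].sum()
print(tail <= 1 / k ** 2)                  # True
```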
Independence
• If X and Y are independent, then
  E(XY) = E(X) E(Y)
  Var(X + Y) = Var(X) + Var(Y)
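Both identities can be verified on a joint distribution built as the outer product of two made-up marginals, which enforces independence by construction:

```python
import numpy as np

x_vals, p_x = np.array([0.0, 1.0]), np.array([0.6, 0.4])
y_vals, p_y = np.array([1.0, 3.0]), np.array([0.5, 0.5])
joint = np.outer(p_x, p_y)                 # P(X, Y) = P(X) P(Y)

X, Y = np.meshgrid(x_vals, y_vals, indexing='ij')
E = lambda f: np.sum(f * joint)            # expectation under the joint
var = lambda Z: E(Z ** 2) - E(Z) ** 2

print(np.isclose(E(X * Y), E(X) * E(Y)))           # True
print(np.isclose(var(X + Y), var(X) + var(Y)))     # True
```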
Probability Densities
p(x) is the probability density function, while P(x) is the cumulative distribution function, i.e., p(x) = dP(x)/dx. P(x) is a non-decreasing function.
Transformed Densities
The Gaussian Distribution
N(x | μ, σ²) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))
Gaussian Mean and Variance
E[x] = μ, Var[x] = σ²
The Multivariate Gaussian
N(x | μ, Σ) = (2π)^(−D/2) |Σ|^(−1/2) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ)), where μ is the mean vector and Σ is the covariance matrix
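A minimal sketch of evaluating the multivariate Gaussian density directly from its formula (the helper `mvn_pdf` is written here for illustration, not a library routine):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N(x | mu, Sigma) for a D-dimensional x."""
    D = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (-D / 2) * np.linalg.det(Sigma) ** -0.5
    return norm * np.exp(-0.5 * quad)

mu = np.zeros(2)
Sigma = np.eye(2)
# At the mean, with identity covariance, the density is (2*pi)^(-D/2)
print(np.isclose(mvn_pdf(mu, mu, Sigma), (2 * np.pi) ** -1))   # True
```

Using `np.linalg.solve` instead of explicitly inverting Σ is the usual numerically stabler choice.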
Minimum Misclassification Rate
Two types of mistakes:
• False positive (type 1)
• False negative (type 2)
The resulting probability of error is called the Bayes error. The minimum Bayes error is achieved by placing the decision boundary at x0, the point where the class posteriors are equal.
Generative vs Discriminative
Generative approach: model the class-conditional density p(x | Ck) and the prior p(Ck), then use Bayes’ theorem to obtain the posterior p(Ck | x)
Discriminative approach: model the posterior p(Ck | x) directly
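A minimal sketch of the generative route, assuming 1-D Gaussian class-conditionals; the data points are made up for illustration:

```python
import numpy as np

# Made-up labeled samples for two classes
x0 = np.array([1.0, 1.2, 0.8, 1.1])     # class 0
x1 = np.array([3.0, 2.8, 3.2, 3.1])     # class 1

def gauss(x, mu, var):
    """1-D Gaussian density N(x | mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Generative model: per-class p(x | Ck) (fitted Gaussians) and priors p(Ck)
stats = [(x0.mean(), x0.var()), (x1.mean(), x1.var())]
priors = [len(x0) / 8, len(x1) / 8]

def posterior(x):
    """p(Ck | x) via Bayes' theorem: likelihood x prior, normalized."""
    joint = np.array([gauss(x, m, v) * pr for (m, v), pr in zip(stats, priors)])
    return joint / joint.sum()

print(posterior(1.0))   # heavily favors class 0
```

A discriminative method (e.g. logistic regression) would instead fit the posterior p(Ck | x) directly, never modeling how x is distributed within each class.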