ECSE 6610
Pattern Recognition
Professor Qiang Ji
Spring, 2011
Pattern Recognition Overview
• Feature extraction: extract the most discriminative features to concisely represent the original data, typically involving dimensionality reduction
• Training/Learning: learn a mapping function that maps input features to output values
• Classification/Regression: map the input to a discrete output value for classification, or to a continuous output value for regression
[Figure: pipeline diagram. Training: raw data → feature extraction → training → learned classifier/regressor. Testing: raw data → feature extraction → learned classifier/regressor → output values.]
Pattern Recognition Overview (cont’d)
• Supervised learning• Both input (feature) and output (class labels)
are provided• Unsupervised learning-only input is given
• Clustering• Dimensionality reduction• Density estimation
• Semi-supervised learning-some input has output labels and others do not have
Examples of Pattern Recognition Applications
• Computer/Machine Vision: object recognition, activity recognition, image segmentation, inspection
• Medical Imaging: cell classification
• Optical Character Recognition: machine- or hand-written character/digit recognition
• Brain Computer Interface: classify human brain states from EEG signals
• Speech Recognition: speaker recognition, speech understanding, language translation
• Robotics: obstacle detection, scene understanding, navigation
Computer Vision Example: Facial Expression Recognition
Machine Vision Example
Example: Handwritten Digit Recognition
Probability Calculus
• U is the sample space; an event X is a subset of the outcomes
• P(X ∨ Y) = P(X) + P(Y) − P(X ∧ Y)
• If X and Y are mutually exclusive, i.e., P(X ∧ Y) = 0, then P(X ∨ Y) = P(X) + P(Y)
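A quick numeric check of the rule above; the die events X and Y are made up for illustration:

```python
# Verify P(X or Y) = P(X) + P(Y) - P(X and Y) on a fair six-sided die,
# where each outcome has probability 1/6.
U = {1, 2, 3, 4, 5, 6}          # sample space
X = {2, 4, 6}                   # event: roll is even
Y = {4, 5, 6}                   # event: roll is at least 4

def prob(event):
    """Probability of an event under a uniform distribution on U."""
    return len(event) / len(U)

lhs = prob(X | Y)                       # P(X or Y)
rhs = prob(X) + prob(Y) - prob(X & Y)   # inclusion-exclusion
print(lhs, rhs)   # both 2/3
```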
Probability Calculus (cont’d)
• Conditional independence: X ⊥ Y | Z means
  P(X, Y | Z) = P(X | Z) P(Y | Z), or P(X | Y, Z) = P(X | Z)
• The Chain Rule: given three events A, B, C,
  P(A, B, C) = P(A | B, C) P(B | C) P(C)
The Rules of Probability
• Sum Rule: P(X) = ΣY P(X, Y)
• Product Rule: P(X, Y) = P(Y | X) P(X)
• Conditional sum rule: P(C | A) = ΣB P(C | A, B) P(B | A)
• Combining the sum and product rules yields P(X) = ΣY P(X | Y) P(Y)
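The sum and product rules can be checked numerically on a small joint distribution; the table below is made up for illustration:

```python
import numpy as np

# A made-up joint distribution P(X, Y): rows index X (2 values),
# columns index Y (3 values).
P_XY = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
assert np.isclose(P_XY.sum(), 1.0)

# Sum rule: P(X) = sum over Y of P(X, Y)
P_X = P_XY.sum(axis=1)

# Product rule: P(X, Y) = P(Y | X) P(X)
P_Y_given_X = P_XY / P_X[:, None]
reconstructed = P_Y_given_X * P_X[:, None]
print(np.allclose(reconstructed, P_XY))   # True
```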
Bayes’ Theorem
posterior ∝ likelihood × prior
Bayes Rule
• Based on the definition of conditional probability:
  p(A | B) = p(A, B) / p(B) = p(B | A) p(A) / p(B)
• p(Ai | E) is the posterior probability of Ai given evidence E
• p(Ai) is the prior probability
• p(E | Ai) is the likelihood of the evidence given Ai
• p(E) is the probability of the evidence

p(Ai | E) = p(E | Ai) p(Ai) / p(E) = p(E | Ai) p(Ai) / Σi p(E | Ai) p(Ai)

[Figure: the sample space partitioned into events A1, …, A6, with the evidence E overlapping them.]
Bayesian Rule (cont’d)

P(H | E1, E2) = P(E2 | E1, H) P(H | E1) / P(E2 | E1)
              = P(E2 | E1, H) P(H | E1) / ΣH P(E2 | E1, H) P(H | E1)

Assume E1 and E2 are independent given H; then the above equation may be written as

P(H | E1, E2) = P(E2 | H) P(H | E1) / ΣH P(E2 | H) P(H | E1)

where P(H | E1) is the prior and P(E2 | H) is the likelihood of H given E2.
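A sketch of this sequential update: with E1 and E2 conditionally independent given H, the posterior after E1 serves as the prior for E2, and the result matches updating on both pieces of evidence at once. All the numbers below are made up for illustration:

```python
# Sequential Bayesian updating with conditionally independent evidence.
def normalize(d):
    z = sum(d.values())
    return {h: v / z for h, v in d.items()}

prior = {'h0': 0.5, 'h1': 0.5}            # P(H)
lik_E1 = {'h0': 0.2, 'h1': 0.6}           # P(E1 | H)
lik_E2 = {'h0': 0.3, 'h1': 0.9}           # P(E2 | H)

# P(H | E1) is proportional to P(E1 | H) P(H)
post_E1 = normalize({h: lik_E1[h] * prior[h] for h in prior})
# P(H | E1, E2) is proportional to P(E2 | H) P(H | E1)
post_E12 = normalize({h: lik_E2[h] * post_E1[h] for h in prior})

# Same answer as a single update on both pieces of evidence
joint = normalize({h: lik_E1[h] * lik_E2[h] * prior[h] for h in prior})
print(all(abs(post_E12[h] - joint[h]) < 1e-12 for h in prior))   # True
```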
A Simple Example
Consider two related variables:
1. Drug (D) with values y or n
2. Test (T) with values +ve or −ve
And suppose we have the following probabilities:
P(D = y) = 0.001
P(T = +ve | D = y) = 0.8
P(T = +ve | D = n) = 0.01
These probabilities are sufficient to define a joint probability distribution.
Suppose an athlete tests positive. What is the probability that he has taken the drug?

P(D = y | T = +ve) = P(T = +ve | D = y) P(D = y) / [P(T = +ve | D = y) P(D = y) + P(T = +ve | D = n) P(D = n)]
                   = (0.8 × 0.001) / (0.8 × 0.001 + 0.01 × 0.999)
                   ≈ 0.074
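The same computation done numerically, using only the three probabilities given above:

```python
# Drug-test example: P(D = y | T = +ve) via Bayes rule.
p_d = 0.001            # P(D = y)
p_pos_d = 0.8          # P(T = +ve | D = y)
p_pos_nd = 0.01        # P(T = +ve | D = n)

numer = p_pos_d * p_d                      # P(T = +ve | D = y) P(D = y)
denom = numer + p_pos_nd * (1 - p_d)       # P(T = +ve), by the sum rule
posterior = numer / denom
print(round(posterior, 3))   # 0.074
```

Despite the accurate test, the posterior is small because the prior P(D = y) is so low; this is the classic base-rate effect.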
Expectation (or Mean)
• For a discrete RV X: E(X) = Σx x p(x)
• For a continuous RV X: E(X) = ∫ x p(x) dx
• Conditional expectation: E(X | y) = ∫ x p(x | y) dx
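A sketch of both definitions; the discrete distribution is made up for illustration, and the integral is approximated by a midpoint sum:

```python
import numpy as np

# Discrete: E(X) = sum over x of x p(x)
x = np.array([0.0, 1.0, 2.0])
p = np.array([0.5, 0.3, 0.2])
E_X = np.sum(x * p)
print(E_X)   # ≈ 0.7

# Continuous: E(X) = integral of x p(x) dx, approximated numerically
# for X ~ Uniform(0, 1), whose density is p(x) = 1 on [0, 1].
dx = 1e-4
grid = (np.arange(10_000) + 0.5) * dx      # midpoints of [0, 1]
E_unif = np.sum(grid * 1.0) * dx
print(round(E_unif, 3))   # 0.5
```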
Expectations
• Conditional expectation (discrete): E[f | y] = Σx p(x | y) f(x)
• Approximate expectation (discrete and continuous): E[f] ≈ (1/N) Σn f(xn), where x1, …, xN are samples drawn from p(x)
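The approximate expectation is a Monte Carlo average: draw N samples from p(x) and average f over them. A small sketch, using a standard normal for p and f(x) = x² (so the true value E[f] = Var(x) = 1):

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed for reproducibility
samples = rng.standard_normal(100_000)     # draws from p(x) = N(0, 1)
estimate = np.mean(samples ** 2)           # (1/N) sum of f(x_n)
print(estimate)   # close to 1; the error shrinks like 1/sqrt(N)
```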
Variance
• The variance of a RV X:
  Var(X) = E[(X − E(X))²] = E(X²) − (E(X))²
• Standard deviation: σX = √Var(X)
• Covariance of RVs X and Y:
  σXY = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)
• Chebyshev inequality:
  P(|X − E(X)| ≥ kσX) ≤ 1/k²
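A quick check that both forms of the variance formula agree, and that the Chebyshev bound holds, on a small discrete RV whose values and probabilities are made up for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 5.0])
p = np.array([0.2, 0.5, 0.3])

E_X = np.sum(x * p)
var_def = np.sum((x - E_X) ** 2 * p)       # E[(X - E(X))^2]
var_alt = np.sum(x ** 2 * p) - E_X ** 2    # E(X^2) - (E(X))^2
print(np.isclose(var_def, var_alt))        # True

# Chebyshev: P(|X - E(X)| >= k*sigma) <= 1/k^2, checked for k = 1.5
sigma = np.sqrt(var_def)
k = 1.5
tail = p[np.abs(x - E_X) >= k * sigma].sum()
print(tail <= 1 / k ** 2)                  # True
```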
Independence
• If X and Y are independent, then
  E(XY) = E(X) E(Y)
  Var(X + Y) = Var(X) + Var(Y)
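Both identities can be verified on a joint distribution built as the outer product of two made-up marginals, which enforces independence by construction:

```python
import numpy as np

x_vals, p_x = np.array([0.0, 1.0]), np.array([0.6, 0.4])
y_vals, p_y = np.array([1.0, 3.0]), np.array([0.5, 0.5])
joint = np.outer(p_x, p_y)                 # P(X, Y) = P(X) P(Y)

X, Y = np.meshgrid(x_vals, y_vals, indexing='ij')
E = lambda f: np.sum(f * joint)            # expectation under the joint
var = lambda Z: E(Z ** 2) - E(Z) ** 2

print(np.isclose(E(X * Y), E(X) * E(Y)))           # True
print(np.isclose(var(X + Y), var(X) + var(Y)))     # True
```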
Probability Densities
p(x) is the probability density function, while P(x) is the cumulative distribution function, i.e., p(x) = dP(x)/dx. P(x) is a non-decreasing function.
Transformed Densities
The Gaussian Distribution
N(x | μ, σ²) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))
Gaussian Mean and Variance
E[x] = μ, Var[x] = σ²
The Multivariate Gaussian
N(x | μ, Σ) = (2π)^(−D/2) |Σ|^(−1/2) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ)), where μ is the mean vector and Σ is the covariance matrix
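A minimal sketch of evaluating the multivariate Gaussian density directly from its formula (the helper `mvn_pdf` is written here for illustration, not a library routine):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N(x | mu, Sigma) for a D-dimensional x."""
    D = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (-D / 2) * np.linalg.det(Sigma) ** -0.5
    return norm * np.exp(-0.5 * quad)

mu = np.zeros(2)
Sigma = np.eye(2)
# At the mean, with identity covariance, the density is (2*pi)^(-D/2)
print(np.isclose(mvn_pdf(mu, mu, Sigma), (2 * np.pi) ** -1))   # True
```

Using `np.linalg.solve` instead of explicitly inverting Σ is the usual numerically stabler choice.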
Minimum Misclassification Rate
Two types of mistakes:
• False positive (type 1)
• False negative (type 2)
The resulting probability of error is called the Bayes error. The minimum Bayes error is achieved by placing the decision boundary at x0, the point where the class posteriors are equal.
Generative vs Discriminative
Generative approach: model the class-conditional density p(x | Ck) and the prior p(Ck), then use Bayes’ theorem to obtain the posterior p(Ck | x)
Discriminative approach: model the posterior p(Ck | x) directly
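A minimal sketch of the generative route, assuming 1-D Gaussian class-conditionals; the data points are made up for illustration:

```python
import numpy as np

# Made-up labeled samples for two classes
x0 = np.array([1.0, 1.2, 0.8, 1.1])     # class 0
x1 = np.array([3.0, 2.8, 3.2, 3.1])     # class 1

def gauss(x, mu, var):
    """1-D Gaussian density N(x | mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Generative model: per-class p(x | Ck) (fitted Gaussians) and priors p(Ck)
stats = [(x0.mean(), x0.var()), (x1.mean(), x1.var())]
priors = [len(x0) / 8, len(x1) / 8]

def posterior(x):
    """p(Ck | x) via Bayes' theorem: likelihood x prior, normalized."""
    joint = np.array([gauss(x, m, v) * pr for (m, v), pr in zip(stats, priors)])
    return joint / joint.sum()

print(posterior(1.0))   # heavily favors class 0
```

A discriminative method (e.g. logistic regression) would instead fit the posterior p(Ck | x) directly, never modeling how x is distributed within each class.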