ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Memorial University of Newfoundland
Pattern Recognition - Lecture 3
May 9, 2006
http://www.engr.mun.ca/~charlesr
Office Hours: Tuesdays & Thursdays 8:30 - 9:30 PM
EN-3026
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
9881 Project Deliverable #1
• Report topic due tonight via email!
• Two to three sentences regarding the nature of the
problem you will address in your project
• You will be penalized for missing deadlines
2
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Recap - Univariate Normal Distribution
3
Review:

p(x) = p(x_1, x_2, \ldots, x_n)

mean:  E[x] = \mu

covariance:  E[(x - \mu)(x - \mu)^T] = \Sigma, \quad \Sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)]

Univariate normal distribution:

p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right)
FIGURE 2.7. A univariate normal distribution has roughly 95% of its area in the range |x - \mu| \le 2\sigma, as shown. The peak of the distribution has value p(\mu) = 1/(\sqrt{2\pi}\,\sigma). From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Recap - Multivariate Normal Distributions
4
Multivariate normal distribution:

p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)

where \mu = E[x] is the mean vector and \Sigma = E[(x - \mu)(x - \mu)^T] is the covariance matrix.
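As a concrete illustration of the recap, here is a minimal Python/NumPy sketch (not part of the original slides) that evaluates the multivariate normal density above; the mean vector and covariance matrix are hypothetical values chosen only for demonstration.

import numpy as np

def mvn_density(x, mu, cov):
    """Evaluate the multivariate normal density at x."""
    n = len(mu)
    diff = np.asarray(x, dtype=float) - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** n * np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

# Hypothetical 2-D example (illustration only)
mu = np.array([0.0, 0.0])
cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])
print(mvn_density([1.0, -0.5], mu, cov))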
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Transformations of Random Variables
5
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Distance Based Classification
• Distance-based classification is the most common type of pattern recognition technique
• Its concepts form the basis for many other classification techniques
6
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Distance Based Classification

[Figure: two classes C1 and C2 in the (x1, x2) feature plane, with a new pattern x marked "?" - which class should it be assigned to?]

7
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Distance Based Classification
• First we will look at choosing a class prototype
• A prototype is a sample or pattern which represents the class
• Then we will look at how to calculate the distance from a new pattern (the one we are trying to classify) to the class, using the prototype
8
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Distance Based Classification
We use a pattern-to-class distance measure:

x \in C_1 \iff d(x, C_1) < d(x, C_2)

where d(x, C_1) is the distance from x to class C_1.

To find the distance, use a class prototype z_i (a pattern):

d(x, C_1) = d(x, z_1)
9
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
During training we have labeled samples, such that

C_1 = \{ x_i \mid x_i \in C_1;\ i = 1, 2, \ldots, N_1 \}
C_2 = \{ x_j \mid x_j \in C_2;\ j = 1, 2, \ldots, N_2 \}

Assume that N_1, N_2 \gg d, where d is the number of dimensions (and features!).

Basic rule of thumb - require that N > 10d.
10
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Consider a 2-feature example (d = 2) with two well-separated classes. This is very idealized compared with real practical problems.
[Figure: two well-separated classes C1 and C2 in the (x1, x2) feature plane.]
11
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Choosing a Prototype
1. Sample Mean
For class C_i, the sample mean is

m_i = \frac{1}{N_i} \sum_{x \in C_i} x

The advantage of the mean is that it minimizes the representation error of the class.

The mean probably does not correspond to the location of any collected sample.

[Figure: class C_i with its sample-mean prototype marked.]
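To make this concrete, here is a minimal Python/NumPy sketch (not part of the original slides) that computes the sample-mean prototype of each class; the training sets are those used in the MED example later in the lecture.

import numpy as np

# Labeled training sets (rows are patterns, columns are features)
C1 = np.array([[0.0, 0.0], [4.0, 1.0], [4.0, -1.0], [8.0, 0.0]])
C2 = np.array([[4.0, 2.0], [8.0, 3.0], [8.0, 1.0], [12.0, 2.0]])

# Sample-mean prototype m_i = (1/N_i) * sum of the patterns in C_i
m1 = C1.mean(axis=0)
m2 = C2.mean(axis=0)
print("m1 =", m1)   # [4. 0.]
print("m2 =", m2)   # [8. 2.]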
12
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Minimizing representation error

The mean minimizes the representation error

err = \sum_{x \in C_i} |x - z_i|^2 = \sum_{x \in C_i} (x - z_i)^T (x - z_i)

\frac{\partial\, err}{\partial z_i} = -2 \sum_{x \in C_i} (x - z_i)

To minimize, set the derivative to 0:

\sum_{x \in C_i} (x - z_i) = 0
\sum_{x \in C_i} x - N_i z_i = 0
z_i = \frac{1}{N_i} \sum_{x \in C_i} x

which is the mean.

Note that the mean square representation error is

MSE = \frac{1}{N_i} \sum_{x \in C_i} |x - m_i|^2
    = \frac{1}{N_i} \sum_{x \in C_i} \sum_{j=1}^{d} (x_j - m_{ij})^2
    = \frac{1}{N_i} \sum_{j=1}^{d} \sum_{x \in C_i} (x_j - m_{ij})^2
    = \frac{1}{N_i} \sum_{j=1}^{d} N_i s_j
    = \sum_{j=1}^{d} s_j

where s_j comes from the sample covariance matrix (it is the variance of feature j), so the MSE is the sum of the per-feature variances.

What is the sample covariance matrix?
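As a quick numerical sanity check (an added sketch, not from the original notes), the following Python code compares the representation error of the class mean against each training sample used as the prototype, for one hypothetical class:

import numpy as np

def representation_error(C, z):
    """Sum of squared distances from each pattern in C to the prototype z."""
    diffs = C - z
    return float(np.sum(diffs * diffs))

# Hypothetical class samples
C = np.array([[0.0, 0.0], [4.0, 1.0], [4.0, -1.0], [8.0, 0.0]])
m = C.mean(axis=0)

print("err(mean) =", representation_error(C, m))        # smallest value
for candidate in C:                                      # any actual sample does no better
    print("err(", candidate, ") =", representation_error(C, candidate))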
13
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
The sample covariance matrix is

S = \frac{1}{N} \sum_{k=1}^{N} (x_k - m)(x_k - m)^T

or

S = \frac{1}{N} \sum_{x \in C_i} (x - m)(x - m)^T

with i,j-th entry:

s_{ij} = \frac{1}{N} \sum_{k=1}^{N} (x_{ki} - m_i)(x_{kj} - m_j) = \frac{1}{N} \sum_{k=1}^{N} x_{ki} x_{kj} - m_i m_j

and

s_j \equiv s_{jj} = the feature variances
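A minimal NumPy sketch of this (biased, 1/N) sample covariance matrix, applied to a hypothetical class; the diagonal entries s_jj are the per-feature variances:

import numpy as np

def sample_covariance(C):
    """Biased (1/N) sample covariance matrix of the patterns in C (one per row)."""
    N = C.shape[0]
    diffs = C - C.mean(axis=0)
    return diffs.T @ diffs / N

# Hypothetical class samples
C1 = np.array([[0.0, 0.0], [4.0, 1.0], [4.0, -1.0], [8.0, 0.0]])
S = sample_covariance(C1)
print(S)            # [[8.  0. ] [0.  0.5]]
print(np.diag(S))   # feature variances s_jj: [8.  0.5]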
14
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Choosing a Prototype
2. Most Typical Sample
The sample which is most similar to the class mean.

Choose z_i = x \in C_i such that d(x, m_i) is minimized.

[Figure: class C_i with the most typical sample (the one closest to the mean) marked.]
15
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Choosing a Prototype
3. Nearest Neighbour
Choose z_i = x \in C_i such that d(y, x) is minimized.

Here x is a sample from the class, and y is the new sample we are trying to classify. Thus the prototype depends on the location of the pattern we are classifying.

[Figure: class C_i and a new pattern y, with the class sample nearest to y chosen as the prototype.]

Nearest neighbour prototypes are sensitive to noise and outliers in the training set.
16
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Choosing a Prototype
4. k-Nearest Neighbours
The pattern y is classified in the class of its k nearest neighbours from the training samples. The chosen distance determines how 'near' is defined.
This gives some protection against noise, but is more computationally expensive.
[Figure: new pattern y with its k = 3 nearest training samples, drawn from classes C_i and C_j.]
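A minimal Python sketch of k-nearest-neighbour classification as described above, using Euclidean distance; the training data, labels, and value of k are hypothetical, and ties are broken arbitrarily by this simple version.

import numpy as np
from collections import Counter

def knn_classify(y, X_train, labels, k=3):
    """Classify y by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - y, axis=1)
    nearest = np.argsort(dists)[:k]           # indices of the k closest samples
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]         # note: vote/distance ties broken arbitrarily

# Hypothetical training data with class labels 1 and 2
X_train = np.array([[0.0, 0.0], [4.0, 1.0], [4.0, -1.0], [8.0, 0.0],
                    [4.0, 2.0], [8.0, 3.0], [8.0, 1.0], [12.0, 2.0]])
labels = np.array([1, 1, 1, 1, 2, 2, 2, 2])
print(knn_classify(np.array([4.0, 3.0]), X_train, labels, k=3))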
17
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Summary
18
• First step of distance-based classification is to choose
a prototype
• Options include:
• Sample mean
• Most typical sample
• Nearest neighbour
• k-Nearest neighbours
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Distance Measures
The most familiar distance metric is the Euclidean distance:

d_E(x, z_i) = \left( \sum_{j=1}^{d} (x_j - z_{ij})^2 \right)^{1/2}

There are many possibilities for distance measurements. Another example is the Manhattan distance:

d_M(x, z_i) = \sum_{j=1}^{d} |x_j - z_{ij}|
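Both measures follow directly from the formulas; a minimal Python sketch (the test vectors are hypothetical):

import numpy as np

def euclidean_distance(x, z):
    """d_E(x, z) = (sum_j (x_j - z_j)^2)^(1/2)"""
    return float(np.sqrt(np.sum((x - z) ** 2)))

def manhattan_distance(x, z):
    """d_M(x, z) = sum_j |x_j - z_j|"""
    return float(np.sum(np.abs(x - z)))

# Hypothetical vectors
x = np.array([4.0, 3.0])
z = np.array([8.0, 2.0])
print(euclidean_distance(x, z))   # sqrt(17) ~ 4.123
print(manhattan_distance(x, z))   # 5.0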
19
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Anything that fulfills the following four properties can be a metric:
1. Identity:  d(x, z) = 0 \iff x = z
2. Non-negativity:  d(x, z) \ge 0
3. Symmetry:  d(x, z) = d(z, x)
4. Triangle inequality:  d(x, z) \le d(x, y) + d(y, z)
20
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Clearly the Euclidean distance is a metric, but so is a more general weighted metric:

d_w(x, z) = \left( \sum_{j=1}^{d} \big( w_j (x_j - z_j) \big)^2 \right)^{1/2}

where the difference in the j-th feature is weighted by w_j.
21
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Minimum Euclidean Distance (MED) Classifier
Given classes C1 and C2 with prototypes z1 and z2 respectively:

x \in C_1 \quad \text{if} \quad d_E(x, z_1) < d_E(x, z_2)

Equivalently:

(x - z_1)^T (x - z_1) < (x - z_2)^T (x - z_2)
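A minimal Python sketch of this decision rule (the prototypes and test pattern below are hypothetical; squared distances are compared, which is equivalent):

import numpy as np

def med_classify(x, z1, z2):
    """Assign x to C1 if it is closer (in Euclidean distance) to z1, else to C2."""
    d1 = np.sum((x - z1) ** 2)   # squared distances suffice for the comparison
    d2 = np.sum((x - z2) ** 2)
    return "C1" if d1 < d2 else "C2"

# Hypothetical prototypes and test pattern
z1 = np.array([0.0, 0.0])
z2 = np.array([3.0, 3.0])
print(med_classify(np.array([1.0, 1.0]), z1, z2))   # -> C1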
22
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
[Figure: classes C1 and C2 in the (x1, x2) feature plane, with mean prototypes m1 and m2 and a new pattern y closer to m1.]

d_E(y, m_1) < d_E(y, m_2)

Classify pattern y to class C1.
23
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Decision Boundaries
Given a prototype and a distance metric, it is possible to find the decision boundary between classes.

FIGURE 1.4. The two features of lightness and width for sea bass and salmon. The dark line could serve as a decision boundary of our classifier. Overall classification error on the data shown is lower than if we use only one feature as in Fig. 1.3, but there will still be some errors. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

FIGURE 1.6. The decision boundary shown might represent the optimal tradeoff between performance on the training set and simplicity of classifier, thereby giving the highest accuracy on new patterns. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
Figures 1.4 and 1.6 From Pattern Classification by Duda, Hart, and Stork
24
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
For the MED, the decision boundary between the two classes is a straight line.

[Figure: prototypes z1 and z2 with the decision boundary as the perpendicular bisector of the line joining them.]

In general the decision boundary for the MED is a hyperplane which is a perpendicular bisector of the line joining the class prototypes (z1 and z2). (You will prove this in exercise 3 of the assignment.)
25
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
[Figure: classes C1 and C2 in the (x1, x2) feature plane with mean prototypes m1 and m2 and the MED decision boundary between them.]
26
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
MED Example
C_1 = \{(0, 0), (4, 1), (4, -1), (8, 0)\} \qquad C_2 = \{(4, 2), (8, 3), (8, 1), (12, 2)\}

Classify the point (4, 3) using the class means as prototypes and the MED metric.
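A short worked check in Python (an added sketch, not part of the original slides): compute the class means and apply the MED rule to the point (4, 3).

import numpy as np

C1 = np.array([[0, 0], [4, 1], [4, -1], [8, 0]], dtype=float)
C2 = np.array([[4, 2], [8, 3], [8, 1], [12, 2]], dtype=float)

m1, m2 = C1.mean(axis=0), C2.mean(axis=0)         # m1 = (4, 0), m2 = (8, 2)
x = np.array([4.0, 3.0])

d1 = np.sum((x - m1) ** 2)                        # 9
d2 = np.sum((x - m2) ** 2)                        # 17
print("assigned to", "C1" if d1 < d2 else "C2")   # C1 - see the discussion on the next slide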
27
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
In the example, using the MED and the class mean as a prototype does not give a good classification.

Can we fix this by using a different kind of prototype?

Nearest neighbour:
→ The nearest sample to (4, 3) is (4, 2), which is in class C2, so the pattern is classified as C2.

k-Nearest neighbours:
→ k = 1: same as nearest neighbour (class C2)
→ k = 2: one sample from each class - needs more neighbours...
→ k = 3: two samples are the same distance away - still no decision
→ k = 4: now two samples from each class - still no decision
→ kNN can require a lot of extra computation
28
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
In general, the MED is sub-optimal for classes with unequal feature variances, even if the features are uncorrelated.
One solution is to modify the metric!
29
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Equivariance Feature Weighting
Weight each feature when calculating the distance:

d_w^2(x, z) = \sum_{j=1}^{d} \big( w_j (x_j - z_j) \big)^2

If the variance of feature j is \sigma_j^2, then the variance of w_j x_j is

E[(w_j (x_j - \mu_j))^2] = w_j^2 \sigma_j^2

Thus choose w_j = 1/\sigma_j to scale the features.
30
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Equivariance Feature Weighting
But we only have the sample mean and variance, not the real mean and variance of the class population.

Choose w_j = 1/s_j, where

s_j^2 = \frac{1}{N} \sum_{x \in C} (x_j - m_j)^2

which is the average squared difference from the sample mean.

So the metric then is

d_w^2(x, z) = \sum_{j=1}^{d} \left( \frac{x_j - z_j}{s_j} \right)^2

which is like measuring the distance using standard deviation units (on a per-class basis).
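A minimal Python sketch of this inverse-variance weighting (an added illustration; the class samples are hypothetical): estimate s_j from the training samples, then measure squared distance in standard-deviation units.

import numpy as np

def weighted_sq_distance(x, z, s):
    """d_w^2(x, z) with per-feature weights w_j = 1/s_j."""
    return float(np.sum(((x - z) / s) ** 2))

# Hypothetical class samples; s holds the per-feature sample standard deviations
C = np.array([[0.0, 0.0], [4.0, 1.0], [4.0, -1.0], [8.0, 0.0]])
m = C.mean(axis=0)
s = C.std(axis=0)          # uses the biased (1/N) definition, matching the slide
print(weighted_sq_distance(np.array([4.0, 3.0]), m, s))   # 18.0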
31
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
This weighting is equivalent to a transformation of features:

x' = W x, \quad \text{where } W = \mathrm{diag}(w_1, w_2, \ldots, w_n)

New features are just scaled versions of the original ones, all with unit variance.

Back to the example:

For C_1:
s_1^2 = \frac{1}{4}\left[ (0 - 4)^2 + (4 - 4)^2 + (4 - 4)^2 + (8 - 4)^2 \right] = 8
s_2^2 = \frac{1}{4}\left[ 0^2 + 1^2 + (-1)^2 + 0^2 \right] = \frac{1}{2}

The class C_2 has the same variances as C_1.

d_w^2(x, m_1) = \frac{(4 - 4)^2}{8} + \frac{(3 - 0)^2}{1/2} = 18

d_w^2(x, m_2) = \frac{(4 - 8)^2}{8} + \frac{(3 - 2)^2}{1/2} = 4

\Rightarrow x \in C_2

This is only good if the features are uncorrelated.
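A short numerical check of these values (an added sketch; it uses the biased 1/N variance, as in the slide):

import numpy as np

C1 = np.array([[0, 0], [4, 1], [4, -1], [8, 0]], dtype=float)
C2 = np.array([[4, 2], [8, 3], [8, 1], [12, 2]], dtype=float)
x = np.array([4.0, 3.0])

m1, m2 = C1.mean(axis=0), C2.mean(axis=0)
s2 = C1.var(axis=0)                    # per-feature variances [8, 0.5]; C2 has the same
d1 = np.sum((x - m1) ** 2 / s2)        # 18.0
d2 = np.sum((x - m2) ** 2 / s2)        # 4.0
print(d1, d2, "-> x assigned to C2" if d2 < d1 else "-> x assigned to C1")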
32
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
33
Summary
• Euclidean distance is a common distance measure,
but not the only one.
• Metrics must meet 4 constraints: identity, non-negativity, symmetry, and the triangle inequality
• Between classes there exist decision boundaries
• Minimum Euclidean Distance is not always a good
classifier
• Weighting features by the inverse of the sample variance gives better classification, but only if the features are uncorrelated