Chapter 7
Pattern Classification Algorithms for Face Recognition
7.1 Introduction
The best pattern recognizers in most instances are human beings, yet we do
not completely understand how the brain recognizes patterns. Pattern recog-
nition is the study of how machines can observe the environment, learn to
distinguish patterns of interest from their background, and make sound and
reasonable decisions about the categories of the patterns. Automatic (ma-
chine) recognition, description, classification and grouping of patterns are im-
portant problems in a variety of engineering and scientific disciplines. Pattern
recognition can be viewed as the categorization of input data into identifiable
classes via the extraction of significant features or attributes of the data. Duda
and Hart [Duda & Hart 1973], [Duda et al. 2001] define it as a field concerned
with machine recognition of meaningful regularities in noisy or complex en-
vironments. It encompasses a wide range of information processing problems
of great practical significance, from the recognition of simple patterns such as
characters and speech, to complex problems such as human face
recognition and medical diagnosis. Today, pattern recognition is an integral
part of most intelligent systems built for decision making. Normally, pattern
recognition processes make use of one of the following two classification
strategies.
i. Supervised classification (e.g., discriminant analysis) in which the input pat-
tern is identified as a member of a predefined class.
ii. Unsupervised classification (e.g., clustering and Principal Component Anal-
ysis) in which the pattern is assigned to a hitherto unknown class.
In the present study, various recognition experiments are conducted using
different pattern recognition algorithms in order to assess the credibility of the
proposed feature parameters. The State Space Point Distribution (SSPD)
features extracted from the gray-scale images of human faces, as explained in
chapter 5, and the ALR feature vectors derived from the face images, as
discussed in chapter 6, are used as parameters for the recognition study.
Well-known approaches that are widely used to solve pattern recognition
problems, including a clustering technique (the c-Means algorithm), statistical
pattern classifiers (the k-Nearest Neighbour classifier and the Bayesian
classifier), and a connectionist approach (Artificial Neural Networks), are used
for recognizing human face patterns. The c-Means clustering technique is based
on an unsupervised learning approach, while the k-NN, Bayesian, and artificial
neural network classifiers follow a supervised learning strategy.
This chapter is organized in three sections. Section 7.2 presents the human
face recognition experiments conducted using cluster analysis. Section 7.3
deals with recognition experiments conducted using statistical pattern recogni-
tion strategies: the k-NN and Bayesian classifiers. Section 7.4 describes the
Artificial Neural Network architecture and the simulation experiments con-
ducted for the recognition of human face patterns, along with performance
comparisons of the various classifiers and proposed parameters.
7.2 Cluster analysis for pattern recognition
In real-life pattern recognition tasks we handle a huge amount of perceived
information, and processing every piece of information as a single entity
would be impossible. Hence we tend to categorize entities into clusters, which
are characterized by the common attributes of the entities they contain; the
huge amount of information involved in the process is thereby reduced.
Some of the common definitions proposed for a cluster are given below.
1. A cluster is a set of entities that are alike, and entities from different
clusters are not alike.
2. A cluster is an aggregation of points in the test space such that the
distance between any two points in the cluster is less than the distance
between any point in the cluster and any point not in it.
3. Clusters may be described as connected regions of a p-dimensional space
containing a relatively high density of points, separated from other such
regions by a region containing a relatively low density of points.
Clustering is a major tool used in pattern recognition, generally for data
reduction, hypothesis generation, hypothesis testing and prediction based on
grouping. In several cases, the amount of data available in a problem can be
very large and, as a consequence, its effective processing becomes very
demanding. In this context, data reduction with the help of cluster analysis
can be used to group the data into a reduced number of representative
clusters; each cluster can then be processed as a single entity. In some other
applications, cluster analysis is used to infer hypotheses concerning the
nature of the data. These hypotheses must then be verified using other
data sets; in this context, cluster analysis serves to test the validity of a
specific hypothesis.
Another important application of cluster analysis is prediction based
on grouping. In this case, cluster analysis is applied to the available data set,
and the resulting clusters are characterized based on the characteristics of the
patterns from which they are formed. Consequently, given an unknown
pattern, we can determine the cluster to which it most likely belongs and
characterize it based on the characteristics of that cluster. In
the present study, we are interested in applying cluster analysis for prediction
based on grouping, using the c-Means clustering technique for the recognition
of human face image patterns. The implementation details and experimental
results using this technique are explained in the following section.
7.2.1 c-Means clustering for face recognition
The c-Means algorithm is one of the simplest and best-known clustering
techniques, and it has been applied to a variety of pattern recognition problems.
It is based on the minimization of an objective function, which is defined as
the sum of the squared distances from all points in a cluster domain to the
cluster centre. Determining the prototypes or cluster centres is a major task
in designing a classifier based on clustering, and is normally achieved on the
basis of a minimum-distance approach. Prior to designing pattern clustering
algorithms, we must define a similarity measure by which we decide whether
or not two patterns x and y are members of the same cluster. A similarity
measure \delta(x, y) is usually defined so that \delta(x, y) \to 0 as
x \to y. This is the case, for example, if the patterns are in R^n and we
define

\delta(x, y) = \|x - y\|^2
The c-Means algorithm partitions a collection of n vectors x_j, j = 1, \ldots, n
into m groups G_i, i = 1, \ldots, m, and finds cluster centres c_i, i = 1, \ldots, m
corresponding to each group such that a cost function of the dissimilarity (or
distance) measure is minimized. A generic distance function d(x_k, c_i) can
be applied for a vector x_k in group i; the corresponding cost function is thus
expressed as

J = \sum_{i=1}^{m} J_i = \sum_{i=1}^{m} \Big( \sum_{k,\, x_k \in G_i} d(x_k, c_i) \Big)    (7.2.1)
In this work the Euclidean distance is chosen as the dissimilarity measure
between a vector x_k in group G_i and the corresponding cluster centre c_i. Here
the cost function is defined by

J = \sum_{i=1}^{m} J_i = \sum_{i=1}^{m} \Big( \sum_{k,\, x_k \in G_i} \|x_k - c_i\|^2 \Big)    (7.2.2)

where J_i = \sum_{k,\, x_k \in G_i} \|x_k - c_i\|^2 is the cost function within group i. The value of
J_i depends on the geometrical properties of G_i and the location of c_i.
The collection of partitioned groups can be defined by an m \times n binary
membership matrix U, whose element u_{ij} is 1 if the jth data point x_j
belongs to group i, and 0 otherwise. Once the cluster centres c_i are fixed, the
value of u_{ij} can be computed using the expression

u_{ij} = \begin{cases} 1, & \text{if } \|x_j - c_i\|^2 \le \|x_j - c_k\|^2 \text{ for each } k \ne i \\ 0, & \text{otherwise} \end{cases}    (7.2.3)

where i = 1, \ldots, m, j = 1, \ldots, n and k \le m.
That is, x_j belongs to group i if c_i is the closest centre among all centres.
Since a given data point can belong to only one group, the membership matrix U
has the following properties:

\sum_{i=1}^{m} u_{ij} = 1, \quad \forall j = 1, \ldots, n

and

\sum_{i=1}^{m} \sum_{j=1}^{n} u_{ij} = n
If U is fixed, then the optimal cluster centre c_i that minimizes the cost
function in equation 7.2.2 is the mean of all vectors in group i, given by the
expression

c_i = \frac{1}{|G_i|} \sum_{k,\, x_k \in G_i} x_k    (7.2.4)

where |G_i| is the size of G_i, i.e.

|G_i| = \sum_{j=1}^{n} u_{ij}
For a batch-mode operation, the c-Means algorithm is presented with a
data set x_i, i = 1, \ldots, n; the algorithm determines the cluster centres c_i and
the membership matrix U iteratively using Algorithm 4.
Algorithm 4: Clustering Algorithm

Step 1: Initialize the cluster centres c_i, i = 1, \ldots, m. This is typically achieved by randomly selecting m points from among all of the data points.

Step 2: Determine the membership matrix U using equation 7.2.3.

Step 3: Compute the cost function according to equation 7.2.2. Stop if either it is below a certain tolerance value or its improvement over the previous iteration is below a certain threshold.

Step 4: Update the cluster centres according to equation 7.2.4. Go to Step 2.

Step 5: Finally, in the recognition stage, an unknown pattern x is compared with each of the final cluster centres obtained by applying the above steps. The cluster l with minimum distance from the unknown pattern x is found by the expression

x \in l if \|x - c_l\|^2 < \|x - c_i\|^2, for all i = 1, 2, \ldots, m, i \ne l
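The steps above translate directly into code. The following Python/NumPy fragment is an illustrative sketch only (the experiments in this chapter were simulated in MATLAB, and the function and variable names here are hypothetical): it implements the membership assignment of equation 7.2.3, the centre update of equation 7.2.4, and the minimum-distance recognition rule of Step 5.

```python
import numpy as np

def c_means(X, m, max_iter=100, tol=1e-6, seed=0):
    """Cluster the rows of X (an n-by-d array) into m groups (Algorithm 4 sketch).

    Returns the (m, d) array of final cluster centres.
    """
    rng = np.random.default_rng(seed)
    # Step 1: initialise centres by randomly selecting m data points.
    centres = X[rng.choice(len(X), size=m, replace=False)].astype(float)
    prev_cost = np.inf
    for _ in range(max_iter):
        # Step 2: membership -- assign each point to its nearest centre (eq. 7.2.3).
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 3: cost function J (eq. 7.2.2); stop on small improvement.
        cost = d2[np.arange(len(X)), labels].sum()
        if prev_cost - cost < tol:
            break
        prev_cost = cost
        # Step 4: move each centre to the mean of its group (eq. 7.2.4).
        for i in range(m):
            if np.any(labels == i):
                centres[i] = X[labels == i].mean(axis=0)
    return centres

def classify(x, centres):
    # Step 5: assign an unknown pattern to the nearest cluster centre.
    return int(((centres - x) ** 2).sum(axis=1).argmin())
```

In the recognition stage, `classify` is called once per test pattern against the centres returned by `c_means`.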
The following section describes the simulation of Algorithm 4, along
with the recognition results obtained for human face image patterns.
7.2.2 Simulation experiment and results
The recognition experiment is conducted by simulating Algorithm 4 using
MATLAB. The State Space Point Distribution (SSPD) parameters extracted
from the gray-scale face images as discussed in chapter 5 and the ALR feature
vectors extracted as explained in chapter 6 are used for the recognition study.
The face images of the KNUFDB face database as well as the AT&T face
database are used in the simulation study. The recognition accuracies obtained
with these features (SSPD & ALR) using the c-Means clustering technique on
the KNUFDB and AT&T databases are given in Table 7.1 and Table 7.2
respectively. A graphical representation of these recognition results for the two
feature sets is shown in figure 7.1.
Table 7.1: Classification results using the c-Means algorithm on the KNUFDB database.

Person ID   Correct (ALRFV)   Accuracy % (ALRFV)   Correct (SSPD)   Accuracy % (SSPD)
1    19    31.67    22    36.67
2    21    35.00    23    38.33
3    35    58.33    38    63.33
4    33    55.00    35    58.33
5    50    83.33    40    66.66
6    40    66.66    41    68.33
7    51    85.00    40    66.66
8    42    70.00    42    70.00
9    48    80.00    40    66.66
10    39    65.00    38    63.33
11    37    61.66    39    65.00
12    39    65.00    40    66.66
13    44    73.33    40    66.66
14    40    66.66    40    66.66
15    38    63.33    41    68.33
16    38    63.33    40    66.66
17    39    65.00    40    66.66
18    37    61.66    40    66.66
19    36    60.00    43    71.66
20    36    60.00    43    71.66
21    39    65.00    44    73.33
22    38    63.33    45    75.00
23    50    83.33    39    65.00
24    32    53.33    38    63.33
25    29    48.33    38    63.33
26    29    48.33    37    61.66
27    28    46.66    37    61.66
28    30    50.00    38    63.33
29    40    66.66    39    65.00
30    41    68.33    40    66.66
31    40    66.66    40    66.66
32    28    46.66    35    58.33
33    26    43.33    36    60.00
34    26    43.33    35    58.33
35    35    58.33    37    61.66
36    35    58.33    36    60.00
37    28    46.66    38    63.33
38    31    51.66    37    61.66
39    33    55.00    36    60.00
40    35    58.33    40    66.66
41    31    51.66    38    63.33
42    34    56.66    37    61.66
43    35    58.33    35    58.33
44    36    60.00    36    60.00
45    40    66.66    42    70.00
46    40    66.66    40    66.66
47    43    71.66    43    71.66
48    38    63.33    40    66.66
49    37    61.66    40    66.66
50    36    60.00    42    70.00
51    35    58.33    42    70.00
52    38    63.33    40    66.66
53    40    66.66    41    68.33
54    41    68.33    41    68.33
55    42    70.00    41    68.33
56    38    63.33    42    70.00
57    40    66.66    42    70.00
58    42    70.00    42    70.00
59    40    66.66    40    66.66
60    35    58.33    40    66.66
Overall recognition: 61% (ALRFV), 64.83% (SSPD)
Table 7.2: Classification results using the c-Means algorithm on the AT&T database.

Person ID   Correct (ALRFV)   Accuracy % (ALRFV)   Correct (SSPD)   Accuracy % (SSPD)
1    3    30.00    4    40.00
2    5    50.00    5    50.00
3    6    60.00    6    60.00
4    6    60.00    7    70.00
5    8    80.00    8    80.00
6    6    60.00    7    70.00
7    7    70.00    8    80.00
8    7    70.00    7    70.00
9    6    60.00    6    60.00
10    7    70.00    8    80.00
11    6    60.00    8    80.00
12    7    70.00    6    60.00
13    6    60.00    8    80.00
14    7    70.00    6    60.00
15    7    70.00    7    70.00
16    6    60.00    7    70.00
17    7    70.00    7    70.00
18    6    60.00    6    60.00
19    8    80.00    6    60.00
20    6    60.00    8    80.00
21    7    70.00    6    60.00
22    6    60.00    6    60.00
23    6    60.00    7    70.00
24    7    70.00    6    60.00
25    7    70.00    8    80.00
26    8    80.00    8    80.00
27    7    70.00    8    80.00
28    6    60.00    9    90.00
29    7    70.00    8    80.00
30    6    60.00    7    70.00
31    7    70.00    7    70.00
32    6    60.00    8    80.00
33    7    70.00    7    70.00
34    6    60.00    7    70.00
35    7    70.00    6    60.00
36    6    60.00    7    70.00
37    7    70.00    6    60.00
38    6    60.00    6    60.00
39    6    60.00    7    70.00
40    5    50.00    6    60.00
Overall recognition: 64.25% (ALRFV), 68.75% (SSPD)
The recognition results indicate the credibility of the extracted features on
the basis of clusters that can be formed with the help of an unsupervised
learning process. The cluster centers formed from the training set show that
the extracted features are good enough to distinguish the face patterns from
one another.
[Figure: two panels plotting % Recognition against Person Class for the ALR and SSPD feature vectors. (a) Results on KNUFDB database; (b) Results on AT&T database.]

Figure 7.1: Recognition accuracies for c-Means classifier using SSPD & ALR feature vectors
The overall recognition accuracies obtained using the c-Means clustering tech-
nique with the ALR feature vectors and SSPD features are 61.00% and 64.83%
respectively on the KNUFDB database, and 64.25% & 68.75% on the AT&T
face database. The alternative classifier used in this study is the well-known
nonparametric k-Nearest Neighbour statistical classifier. The following section
describes the recognition experiments performed using the same features and
the k-NN classifier.
7.3 Statistical pattern classification
In the statistical pattern classification process, each pattern is represented by a
d-dimensional feature vector and is viewed as a point in d-dimensional
space. Given a set of training patterns from each class, the objective is to
establish decision boundaries in the feature space which separate patterns be-
longing to different classes. The recognition system operates in two phases:
training (learning) and classification (testing). The following section describes
the pattern recognition experiment conducted for the recognition of human
faces using the k-NN classifier.
7.3.1 k-Nearest Neighbour classifier for face recognition
Pattern classification by distance functions is one of the earliest concepts in
pattern recognition [Tou & Gonzalez 1975], [Friedman & Kandel 1999]. Here
the proximity of an unknown pattern to a class serves as a measure of its
classification. A class can be characterized by a single prototype pattern or by
multiple prototype patterns. The k-Nearest Neighbour method is a well-known
non-parametric classifier, in which the a posteriori probability is estimated from
the frequency of nearest neighbours of the unknown pattern. It considers
multiple prototypes while making a decision and uses a piecewise linear
discriminant function.
Various pattern recognition studies with first-rate performance accuracy have
also been reported based on this classification technique [Ray & Chatterjee 1984],
[Zhang & Srihari 2004], [Pernkopf 2005].
Consider the case of m classes c_i, i = 1, \ldots, m and a set of N sample
patterns y_i, i = 1, 2, \ldots, N whose classification is known a priori. Let x
denote an arbitrary incoming pattern. The nearest neighbour classification
approach assigns x to the pattern class of its nearest neighbour in the set y_i,
i = 1, \ldots, N, i.e.,

if \|x - y_j\|^2 = \min_{1 \le i \le N} \|x - y_i\|^2, then x \in c_j

This scheme can be termed the 1-NN rule, since it employs only one nearest
neighbour of x for classification. It can be extended by considering the k
nearest neighbours of x and using a majority-rule type classifier. Algorithm 5
summarizes the classification process.
Algorithm 5: Minimum-distance k-Nearest Neighbour classifier.

Input:
N - the number of pre-classified patterns
m - the number of pattern classes
(y_i, c_i), 1 \le i \le N - N ordered pairs, where y_i is the ith pre-classified pattern and c_i its class number (1 \le c_i \le m, \forall i)
k - the order of the NN classifier
x - an incoming pattern

Output: L - the number of the class into which x is classified.

Step 1: Set S = {(y_i, c_i)}, i = 1, \ldots, N.

Step 2: Find (y_j, c_j) \in S which satisfies \|x - y_j\|^2 = \min \|x - y_i\|^2, where 1 \le i \le N.

Step 3: If k = 1, set L = c_j and stop; else initialize an m-dimensional vector I: I(i') = 0 for i' \ne c_j, I(c_j) = 1, 1 \le i' \le m, and set S = S - {(y_j, c_j)}.

Step 4: For i_0 = 1, 2, \ldots, k - 1 do Steps 5-6.

Step 5: Find (y_j, c_j) \in S such that \|x - y_j\|^2 = \min \|x - y_i\|^2, where 1 \le i \le N.

Step 6: Set I(c_j) = I(c_j) + 1 and S = S - {(y_j, c_j)}.

Step 7: Set L to the index i' that maximizes I(i'), 1 \le i' \le m, and stop.
In the case of the k-Nearest Neighbour classifier, we compute the distance
between the features of a test sample and the features of every training
sample. The majority class among the k nearest training samples is deemed
the class of the test sample.
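As a concrete illustration of this decision rule, the short Python/NumPy sketch below (hypothetical names; not the MATLAB code used in the experiments) finds the k nearest stored patterns and takes a majority vote over their class numbers, which is equivalent to steps 2-7 of Algorithm 5.

```python
import numpy as np
from collections import Counter

def knn_classify(x, patterns, labels, k=3):
    """Minimum-distance k-NN classifier (Algorithm 5 sketch).

    patterns : (N, d) array of pre-classified feature vectors y_i.
    labels   : length-N sequence of class numbers c_i.
    Returns the majority class among the k nearest neighbours of x.
    """
    # Squared Euclidean distance from x to every stored pattern.
    d2 = ((patterns - x) ** 2).sum(axis=1)
    # Indices of the k nearest neighbours (smallest distances).
    nearest = np.argsort(d2)[:k]
    # Majority vote over the neighbours' class numbers.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to the nearest-neighbour (1-NN) rule described above.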
7.3.2 Simulation experiment and results
The recognition experiment is conducted by simulating the k-NN algorithm
using MATLAB. The State Space Point Distribution (SSPD) parameters ex-
tracted from the gray-scale face images as discussed in chapter 5 and the ALR
feature vectors extracted as explained in chapter 6 are used for the recognition
study. The face images of the KNUFDB face database as well as the AT&T
face database are used in the simulation study. The recognition accuracies
obtained with these features (SSPD & ALR) using the k-NN classifier on the
KNUFDB and AT&T databases are given in Table 7.3 and Table 7.4
respectively. A graphical representation of these recognition results for the two
feature sets is shown in figure 7.2.
Table 7.3: Classification results using the k-NN algorithm on the KNUFDB database.

Person ID   Correct (ALRFV)   Accuracy % (ALRFV)   Correct (SSPD)   Accuracy % (SSPD)
1    18    60.00    16    53.33
2    16    53.33    19    63.33
3    18    60.00    18    60.00
4    18    60.00    18    60.00
5    17    56.67    20    66.67
6    17    56.67    20    66.67
7    18    60.00    20    66.67
8    19    63.33    19    63.33
9    19    63.33    19    63.33
10    19    63.33    20    66.67
11    18    60.00    22    73.33
12    18    60.00    22    73.33
13    19    63.33    20    66.67
14    19    63.33    20    66.67
15    21    70.00    19    63.33
16    17    56.67    19    63.33
17    18    60.00    19    63.33
18    25    83.33    18    60.00
19    20    66.67    19    63.33
20    18    60.00    18    60.00
21    19    63.33    19    63.33
22    16    53.33    21    70.00
23    18    60.00    21    70.00
24    20    66.67    20    66.67
25    21    70.00    22    73.33
26    23    76.67    18    60.00
27    20    66.67    18    60.00
28    20    66.67    18    60.00
29    22    73.33    19    63.33
30    20    66.67    19    63.33
31    22    73.33    22    73.33
32    22    73.33    20    66.67
33    20    66.67    20    66.67
34    20    66.67    20    66.67
35    19    63.33    18    60.00
36    19    63.33    20    66.67
37    19    63.33    19    63.33
38    18    60.00    20    66.67
39    19    63.33    22    73.33
40    18    60.00    19    63.33
41    19    63.33    18    60.00
42    18    60.00    18    60.00
43    19    63.33    19    63.33
44    18    60.00    19    63.33
45    17    56.67    21    70.00
46    18    60.00    17    56.67
47    18    60.00    16    53.33
48    18    60.00    25    83.33
49    16    53.33    20    66.67
50    19    63.33    18    60.00
51    18    60.00    19    63.33
52    19    63.33    20    66.67
53    16    53.33    18    60.00
54    16    53.33    20    66.67
55    20    66.67    20    66.67
56    20    66.67    20    66.67
57    21    70.00    20    66.67
58    20    66.67    22    73.33
59    24    80.00    25    83.33
60    24    80.00    24    80.00
Overall recognition: 65.89% (ALRFV), 67.22% (SSPD)
Table 7.4: Classification results using the k-NN algorithm on the AT&T database.

Person ID   Correct (ALRFV)   Accuracy % (ALRFV)   Correct (SSPD)   Accuracy % (SSPD)
1    2    40.00    2    40.00
2    3    60.00    3    60.00
3    3    60.00    3    60.00
4    3    60.00    4    80.00
5    3    60.00    4    80.00
6    3    60.00    4    80.00
7    2    40.00    3    60.00
8    4    80.00    4    80.00
9    3    60.00    4    80.00
10    4    80.00    4    80.00
11    2    40.00    4    80.00
12    3    60.00    4    80.00
13    3    60.00    4    80.00
14    4    80.00    3    60.00
15    3    60.00    3    60.00
16    4    80.00    4    80.00
17    3    60.00    4    80.00
18    4    80.00    3    60.00
19    3    60.00    4    80.00
20    4    80.00    3    60.00
21    3    60.00    4    80.00
22    3    60.00    3    60.00
23    3    60.00    4    80.00
24    4    80.00    4    80.00
25    4    80.00    4    80.00
26    5    100.00    4    80.00
27    4    80.00    4    80.00
28    3    60.00    4    80.00
29    4    80.00    4    80.00
30    4    80.00    4    80.00
31    4    80.00    4    80.00
32    3    60.00    3    60.00
33    4    80.00    4    80.00
34    5    100.00    4    80.00
35    5    100.00    4    80.00
36    4    80.00    4    80.00
37    4    80.00    4    80.00
38    5    100.00    4    80.00
39    4    80.00    4    80.00
40    4    80.00    4    80.00
Overall recognition: 71.00% (ALRFV), 74.50% (SSPD)
[Figure: two panels plotting % Recognition against Person Class for the ALR and SSPD feature vectors. (a) Results on KNUFDB database; (b) Results on AT&T database.]

Figure 7.2: Recognition accuracies for k-NN classifier using SSPD & ALR feature vectors
The overall recognition accuracies obtained using the k-NN classifier with the
ALR and SSPD feature vectors are 65.89% & 67.22% on the KNUFDB database
and 71.00% & 74.50% on the AT&T face database respectively. These recogni-
tion results are better than those of the previous experiment conducted using
the c-Means clustering technique.
7.3.3 Bayesian classifier for face recognition
In this section we present a probabilistic approach to face recognition. As
is true in most fields that deal with measuring and interpreting physical
events, probability considerations become important in pattern recognition
because of the randomness under which pattern classes are normally gener-
ated [Haykin 2001], [Gonzales & Woods 2002]. It is also possible to derive a
classification approach that is optimal in the sense that, on average, its use
yields the lowest probability of committing classification errors. The proba-
bility that a particular pattern x belongs to class c_i is denoted by p(c_i/x). If
the pattern classifier decides that x is in c_j when it actually belongs to c_i, it
incurs a loss L_{ij}. As pattern x may belong to any one of N classes under
consideration, the average loss incurred in assigning x to class c_j is
r_j(x) = \sum_{k=1}^{N} L_{kj}\, p(c_k/x)    (7.3.1)

This equation can be rewritten as

r_j(x) = \frac{1}{p(x)} \sum_{k=1}^{N} L_{kj}\, p(x/c_k) P(c_k)    (7.3.2)
where p(x/c_k) is the p.d.f. of the patterns from class c_k and P(c_k) is the
probability of occurrence of class c_k. Because 1/p(x) is positive and common
to all the r_j(x), j = 1, 2, \ldots, N, it can be dropped from equation (7.3.2) without
affecting the relative order of these functions from the smallest to the largest
value. The expression for the average loss then reduces to

r_j(x) \simeq \sum_{k=1}^{N} L_{kj}\, p(x/c_k) P(c_k)    (7.3.3)
The classifier has N possible classes to choose from for any given unknown
pattern. If it computes r_1(x), r_2(x), \ldots, r_N(x) for each pattern x and assigns
the pattern to the class with the smallest loss, the total average loss is mini-
mized. The classifier that minimizes the total average loss is called the Bayes
classifier. Thus the Bayes classifier assigns an unknown pattern x to class c_i
if r_i(x) < r_j(x) for j = 1, 2, \ldots, N, j \ne i; i.e., x is assigned to class c_i if

\sum_{k=1}^{N} L_{ki}\, p(x/c_k) P(c_k) < \sum_{q=1}^{N} L_{qj}\, p(x/c_q) P(c_q)    (7.3.4)
for all j \ne i. The loss for a correct decision is generally assigned a value
of zero, and the loss for any incorrect decision is usually assigned the same
non-zero value (say, 1). Under these conditions, the loss function becomes

L_{ij} = 1 - \delta_{ij}    (7.3.5)

where \delta_{ij} = 1 if i = j and \delta_{ij} = 0 if i \ne j. Equation 7.3.5 indicates a loss of
unity for incorrect decisions and a loss of zero for correct decisions. Substituting
equation 7.3.5 into equation 7.3.3 yields

r_j(x) = \sum_{k=1}^{N} (1 - \delta_{kj})\, p(x/c_k) P(c_k) = p(x) - p(x/c_j) P(c_j)    (7.3.6)
The Bayesian classifier then assigns a pattern x to class c_i if, for all j \ne i,

p(x) - p(x/c_i) P(c_i) < p(x) - p(x/c_j) P(c_j)    (7.3.7)

or, equivalently, if

p(x/c_i) P(c_i) > p(x/c_j) P(c_j), \quad j = 1, 2, \ldots, N;\; j \ne i    (7.3.8)

Thus the Bayes classifier for a 0-1 loss function is nothing more than the compu-
tation of decision functions of the form

d_j(x) = p(x/c_j) P(c_j), \quad j = 1, 2, \ldots, N    (7.3.9)

where a pattern vector x is assigned to the class whose decision function yields
the largest numerical value. The decision functions given in equation 7.3.9 are
optimal in the sense that they minimize the average loss in misclassification.
For this optimality to hold, however, the probability density functions of the
patterns in each class, as well as the probability of occurrence of each class,
must be known. The latter requirement is usually not a problem. For instance,
if all classes are equally likely to occur, then P(c_j) = 1/N. We assume
p(x/c_j) to be a Gaussian probability density function. In the n-dimensional
case, the Gaussian density of the vectors in the jth pattern class has the form

p(x/c_j) = \frac{1}{(2\pi)^{n/2} |C_j|^{1/2}}\, e^{-\frac{1}{2}(x - m_j)^T C_j^{-1} (x - m_j)}    (7.3.10)
where the mean vector m_j and covariance matrix C_j are given by

m_j = \frac{1}{N_j} \sum_{x \in c_j} x    (7.3.11)

and

C_j = \frac{1}{N_j} \sum_{x \in c_j} x x^T - m_j m_j^T    (7.3.12)
Since a Gaussian density is assumed, it is more convenient to work with the
natural logarithm of the decision function, i.e., we can use the form

d_j(x) = \ln [p(x/c_j) P(c_j)] = \ln p(x/c_j) + \ln P(c_j)    (7.3.13)

Substituting equation 7.3.10 into equation 7.3.13 yields

d_j(x) = \ln P(c_j) - \frac{n}{2} \ln 2\pi - \frac{1}{2} \ln |C_j| - \frac{1}{2}\left[(x - m_j)^T C_j^{-1} (x - m_j)\right]    (7.3.14)
The term \frac{n}{2} \ln 2\pi is the same for all classes, so it can be eliminated from the
above equation, which then becomes

d_j(x) = \ln P(c_j) - \frac{1}{2} \ln |C_j| - \frac{1}{2}\left[(x - m_j)^T C_j^{-1} (x - m_j)\right]    (7.3.15)

for j = 1, 2, \ldots, N. This equation represents the Bayes decision function for
Gaussian pattern classes under the condition of a 0-1 loss function.
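Equations 7.3.11, 7.3.12 and 7.3.15 translate almost line for line into code. The following Python/NumPy sketch is illustrative only (hypothetical names; it is not the MATLAB implementation used in the experiments, and it assumes each class covariance matrix is invertible): it estimates the class statistics from training samples and evaluates the log-domain decision function d_j(x).

```python
import numpy as np

def train_gaussian_bayes(samples_by_class):
    """Estimate (m_j, C_j) per class, as in eqs. 7.3.11-7.3.12."""
    params = []
    for X in samples_by_class:          # X : (N_j, n) training samples of one class
        m = X.mean(axis=0)              # mean vector m_j
        C = (X.T @ X) / len(X) - np.outer(m, m)  # covariance C_j
        params.append((m, C))
    return params

def bayes_decision(x, params, priors):
    """Assign x to the class maximising d_j(x) of eq. 7.3.15."""
    scores = []
    for (m, C), P in zip(params, priors):
        diff = x - m
        # ln P(c_j) - (1/2) ln|C_j| - (1/2) (x-m_j)^T C_j^{-1} (x-m_j)
        d = (np.log(P)
             - 0.5 * np.log(np.linalg.det(C))
             - 0.5 * diff @ np.linalg.inv(C) @ diff)
        scores.append(d)
    return int(np.argmax(scores))
```

With equal priors P(c_j) = 1/N, the \ln P(c_j) term is the same constant for every class and could also be dropped from the comparison.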
7.3.4 Simulation experiment and results
The recognition experiment is conducted by simulating the Bayesian algo-
rithm using MATLAB. The State Space Point Distribution (SSPD) parame-
ters extracted from the gray-scale face images as discussed in chapter 5 and
the ALR feature vectors extracted as explained in chapter 6 are used for the
recognition study. The face images of the KNUFDB face database as well as
the AT&T face database are used in the simulation study. The recognition
accuracies obtained with these features (SSPD & ALR) using the Bayesian
classifier on the KNUFDB and AT&T databases are given in Table 7.5 and
Table 7.6 respectively. A graphical representation of these recognition results
for the two feature sets is shown in figure 7.3.
Table 7.5: Classification results using the Bayesian classification algorithm on the KNUFDB database.

Person ID   Correct (ALRFV)   Accuracy % (ALRFV)   Correct (SSPD)   Accuracy % (SSPD)
1    16    53.33    16    53.33
2    17    56.67    19    63.33
3    18    60.00    18    60.00
4    18    60.00    18    60.00
5    17    56.67    20    66.67
6    18    60.00    20    66.67
7    18    60.00    20    66.67
8    19    63.33    19    63.33
9    19    63.33    19    63.33
10    19    63.33    20    66.67
11    18    60.00    22    73.33
12    18    60.00    22    73.33
13    19    63.33    20    66.67
14    19    63.33    20    66.67
15    21    70.00    19    63.33
16    17    56.67    19    63.33
17    18    60.00    19    63.33
18    25    83.33    18    60.00
19    20    66.67    19    63.33
20    18    60.00    18    60.00
21    19    63.33    19    63.33
22    16    53.33    21    70.00
23    18    60.00    21    70.00
24    20    66.67    20    66.67
25    21    70.00    22    73.33
26    23    76.67    18    60.00
27    20    66.67    18    60.00
28    20    66.67    18    60.00
29    22    73.33    19    63.33
30    20    66.67    19    63.33
31    22    73.33    22    73.33
32    22    73.33    20    66.67
33    20    66.67    20    66.67
34    20    66.67    20    66.67
35    19    63.33    18    60.00
36    19    63.33    20    66.67
37    19    63.33    19    63.33
38    18    60.00    20    66.67
39    19    63.33    22    73.33
40    18    60.00    19    63.33
41    19    63.33    18    60.00
42    18    60.00    18    60.00
43    19    63.33    19    63.33
44    18    60.00    19    63.33
45    17    56.67    21    70.00
46    18    60.00    17    56.67
47    18    60.00    16    53.33
48    18    60.00    25    83.33
49    16    53.33    20    66.67
50    19    63.33    18    60.00
51    18    60.00    19    63.33
52    19    63.33    20    66.67
53    16    53.33    18    60.00
54    16    53.33    20    66.67
55    20    66.67    20    66.67
56    20    66.67    20    66.67
57    21    70.00    20    66.67
58    20    66.67    22    73.33
59    24    80.00    25    83.33
60    24    80.00    24    80.00
Overall recognition: 63.61% (ALRFV), 65.50% (SSPD)
Table 7.6: Classification results using the Bayesian classification algorithm on the AT&T database.

Person ID   Correct (ALRFV)   Accuracy % (ALRFV)   Correct (SSPD)   Accuracy % (SSPD)
1    1    20.00    2    40.00
2    2    40.00    3    60.00
3    3    60.00    3    60.00
4    3    60.00    3    60.00
5    3    60.00    4    80.00
6    3    60.00    3    60.00
7    3    60.00    4    80.00
8    3    60.00    3    60.00
9    4    80.00    4    80.00
10    4    80.00    3    60.00
11    4    80.00    3    60.00
12    2    40.00    3    60.00
13    3    60.00    5    100.00
14    3    60.00    4    80.00
15    4    80.00    4    80.00
16    3    60.00    3    60.00
17    3    60.00    3    60.00
18    3    60.00    4    80.00
19    3    60.00    3    60.00
20    4    80.00    4    80.00
21    2    40.00    5    100.00
22    3    60.00    3    60.00
23    3    60.00    4    80.00
24    3    60.00    4    80.00
25    4    80.00    4    80.00
26    4    80.00    4    80.00
27    5    100.00    5    100.00
28    4    80.00    4    80.00
29    3    60.00    5    100.00
30    4    80.00    3    60.00
31    4    80.00    4    80.00
32    4    80.00    4    80.00
33    3    60.00    3    60.00
34    2    40.00    4    80.00
35    5    100.00    4    80.00
36    5    100.00    4    80.00
37    5    100.00    3    60.00
38    4    80.00    4    80.00
39    5    100.00    3    60.00
40    5    100.00    4    80.00
Overall recognition: 69.00% (ALRFV), 73.00% (SSPD)
The overall recognition accuracies obtained using the Bayesian classifier with
the ALR and SSPD feature vectors are 63.61% & 65.50% on the KNUFDB
database and 69.00% & 73.00% on the AT&T face database respectively.
These recognition results are better than those of the c-Means clustering
experiment, but not better than those of the k-NN classifier.
[Figure: two panels plotting % Recognition against Person Class for the ALR and SSPD feature vectors. (a) Results on KNUFDB database; (b) Results on AT&T database.]

Figure 7.3: Recognition accuracies for Bayesian classifier using SSPD & ALR feature vectors

These three algorithms do not fully accommodate the small variations in
the extracted features. These results indicate the need to improve the classi-
fication algorithm for large-class pattern classification problems. In the next
section we present a recognition study conducted using a neural network that
is capable of adaptively accommodating the minor variations in the extracted
features.
7.4 Neural network for face recognition
In recent years, neural networks have been successfully applied in many of the
pattern recognition and machine learning systems [Ripley 1996], [Haykin 2001],
[Simpson 1990]. These models are composed of a highly interconnected mesh
of nonlinear computing elements, whose structure is drawn from analogies
with biological neural systems. Since the advent of Feed Forward Multi Layer
Perceptron (FFMLP) and error back propagation training algorithm, great im-
provements in terms of recognition performance and automatic training have
been achieved in the area of pattern recognition [Looney 1997]. In the present study, we use a 3-layer FFMLP architecture as the classification module for the recognition of human face images.
The following sections deal with the recognition experiments conducted
based on the feed-forward neural network for face recognition. A brief description of the diverse use of neural networks in pattern recognition, followed by the general ANN architecture, is presented first. In the next section the
error back propagation algorithm used for training FFMLP is illustrated. The
final section deals with the neural network architecture used for the human
face pattern classification studies followed by the description of simulation
experiments and recognition results.
7.4.1 Neural networks for pattern recognition
Artificial Neural Networks (ANN) can be most adequately characterized as
computational models with particular properties such as the ability to adapt
or learn, to generalize, to cluster or organize data, based on a massively paral-
lel architecture. The history of ANNs starts with the introduction of simplified
neurons in the work of McCulloch and Pitts [McCulloch & Pitts 1943]. These
neurons were presented as models of biological neurons and as conceptual
mathematical neurons, like threshold logic devices, that could perform computational tasks. The work of Hebb further developed the understanding of
the neural model [Hebb 1949]. Hebb proposed a qualitative mechanism de-
scribing the process by which synaptic connections are modified in order to
reflect the learning process undertaken by interconnected neurons, when they
are influenced by some environmental stimuli. Rosenblatt with his percep-
tron model, further enhanced our understanding of artificial learning devices
[Rosenblatt 1959]. However, the analysis by Minsky and Papert in their work
on perceptrons, in which they showed the deficiencies and restrictions existing in these simplified models, caused a major setback in this research area [Minsky & Papert 1969]. ANNs attempt to replicate the computational power (low-level arithmetic processing ability) of biological neural networks and, thereby, hopefully endow machines with some of the (higher-level) cognitive abilities that biological organisms possess. These networks are reputed
to possess the following basic characteristics:
• Adaptiveness: the ability to adjust the connection strengths to new data
or information
• Speed: due to massive parallelism
• Robustness: to missing, confusing, and/or noisy data
• Optimality: regarding the error rates in performance
Several neural network learning algorithms have been developed in the past
years. In these algorithms, a set of rules defines the evolution process under-
taken by the synaptic connections of the networks, thus allowing them to learn
how to perform specified tasks. The following sections provide an overview of
neural network models and discuss in more detail the learning algorithm used in classifying the face images, namely the Back-propagation (BP) learning algorithm.
7.4.2 General ANN architecture
A neural network consists of a set of massively interconnected processing el-
ements called neurons. These neurons are interconnected through a set of
connection weights, or synaptic weights. Every neuron i has Ni inputs, and
one output Y_i. The inputs, labeled s_{i1}, s_{i2}, ..., s_{iN_i}, represent signals coming either from other neurons in the network or from the external world. Neuron i
has N_i synaptic weights, each one associated with one of the neuron's inputs. These synaptic weights are labeled w_{i1}, w_{i2}, ..., w_{iN_i}, and represent real-valued
quantities that multiply the corresponding input signal. Also every neuron i
has an extra input, which is set to a fixed value θ, and is referred to as the
threshold of the neuron that must be exceeded for there to be any activation in
the neuron. Every neuron computes its own internal state, or total activation, according to the following expression:

    x_i = \sum_{j=1}^{N_i} w_{ij} s_{ij} + \theta_i , \qquad i = 1, 2, \ldots, M        (7.4.1)

where M is the total number of neurons and N_i is the number of inputs to neuron i. Figure 7.4 shows a schematic description of the neuron. The total activation is simply the inner product of the input vector S_i = [s_{i0}, s_{i1}, \ldots, s_{iN_i}]^T
by the weight vector W_i = [w_{i0}, w_{i1}, \ldots, w_{iN_i}]^T. Every neuron computes its output according to a function Y_i = f(x_i), also known as the threshold or activation function. The exact nature of f will depend on the neural network model
under study.
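As a concrete sketch, the computation of a single neuron, i.e., the weighted sum of Eq. (7.4.1) passed through the sigmoid activation of Eq. (7.4.2) as the function f, can be written as follows. The function name and the numeric values are illustrative, not taken from the experiments:

```python
import math

def neuron_output(weights, inputs, theta, a=1.0):
    """Total activation x = sum_j w_j * s_j + theta (Eq. 7.4.1),
    passed through the sigmoid S(x) = 1 / (1 + exp(-a*x)) (Eq. 7.4.2)."""
    x = sum(w * s for w, s in zip(weights, inputs)) + theta
    return 1.0 / (1.0 + math.exp(-a * x))

# x = 0.5*1.0 + (-0.3)*2.0 + 0.1 = 0.0, so the output is S(0) = 0.5
print(neuron_output([0.5, -0.3], [1.0, 2.0], 0.1))
```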
Figure 7.4: Simple neuron representation
In the present study, we use the widely applied sigmoid function in the thresholding unit, defined by the expression

    S(x) = \frac{1}{1 + e^{-ax}}        (7.4.2)

This function is also called an S-shaped function. It is a bounded, monotonic,
non-decreasing function that provides a graded non-linear response as shown
in figure 7.5. The network topology used in the present study is the feed-forward network. In this architecture the data flow from input to output units is strictly feed-forward; the data processing can extend over multiple layers of units, but no feedback connections are present. This type of structure incorporates one or more hidden layers, whose computation nodes are correspondingly
Figure 7.5: Sigmoid threshold function
called hidden neurons or hidden nodes. The function of the hidden nodes is to
intervene between the external input and the network output. By adding one
or more layers, the network is able to extract higher-order statistics. The abil-
ity of hidden neurons to extract higher-order statistics is particularly valuable
when the size of the input layer is large. The structural architecture of the
neural network is intimately linked to the learning algorithm used to train the
network. In this study we used Error Back-propagation learning algorithm to
train the input patterns in the multilayer feed forward neural network. The
detailed description of the learning algorithm is given in the following section.
7.4.3 Back-propagation algorithm for training feed-forward multilayer perceptron (FFMLP)
The back-propagation (BP) algorithm is the most popular method for neural network training, and it has been used to solve numerous real-life problems. In a multilayer feed-forward neural network, BP performs iterative minimization
of a cost function by making weight connection adjustments according to the
error between the computed and desired output values. Figure 7.6 shows a
general three layer network.
Figure 7.6: A general three layer network
The following relationships hold for the derivation of back-propagation:

    O_k = \frac{1}{1 + e^{-net_k}}, \qquad net_k = \sum_j w_{jk} O_j

    O_j = \frac{1}{1 + e^{-net_j}}, \qquad net_j = \sum_i w_{ij} O_i        (7.4.3)

where O_k denotes the output of the kth output-layer unit and net_k its activation; O_j and net_j are the corresponding hidden-layer quantities. The cost function (error function)
is defined as the mean square sum of differences between the output values of
the network and the desired target values. The following formula is used for
this error computation:

    E = \frac{1}{2} \sum_p \sum_k (t_{pk} - O_{pk})^2        (7.4.4)
where p indexes the patterns and k the output units; t_{pk} is the target value of output unit k for pattern p, and O_{pk} is the actual output value of output unit k for pattern p. During the training
process a set of feature vectors corresponding to each pattern class is used.
Each training pattern consists of a pair with the input and corresponding
target output. The patterns are presented to the network sequentially, in an
iterative manner. The appropriate weight corrections are performed during the
process to adapt the network to the desired behavior. The iterative procedure
continues until the connection weight values allow the network to perform the
required mapping. Each presentation of the whole pattern set is called an epoch.
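The error of Eq. (7.4.4) can be computed directly from the target and actual output values; a minimal sketch, in which the function name and the data are illustrative:

```python
def mse_cost(targets, outputs):
    """Eq. (7.4.4): E = 1/2 * sum over patterns p and output units k
    of (t_pk - O_pk)^2."""
    return 0.5 * sum(
        (t - o) ** 2
        for t_row, o_row in zip(targets, outputs)
        for t, o in zip(t_row, o_row)
    )

# one pattern, two output units: E = 0.5 * ((1-0.5)^2 + (0-0.5)^2) = 0.25
print(mse_cost([[1.0, 0.0]], [[0.5, 0.5]]))
```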
The minimization of the error function is carried out using the gradient-
descent technique. The necessary corrections to the weights of the network
for each iteration n are obtained by calculating the partial derivative of the
error function in relation to each weight wjk, which gives a direction of steepest
descent. A gradient vector representing the steepest increasing direction in the
weight space is thus obtained. Due to the fact that a minimization is required,
the weight update value ∆wjk uses the negative of the corresponding gradient
vector component for that weight. The delta rule determines the amount of
weight update based on this gradient direction along with a step size,
    \Delta w_{jk} = -\eta \frac{\partial E}{\partial w_{jk}}        (7.4.5)
The parameter η represents the step size and is called the learning rate. The
partial derivative is equal to,
    \frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial O_k} \frac{\partial O_k}{\partial net_k} \frac{\partial net_k}{\partial w_{jk}} = -(t_k - O_k)\, O_k (1 - O_k)\, O_j        (7.4.6)
The error signal \delta_k is defined as

    \delta_k = (t_k - O_k)\, O_k (1 - O_k)        (7.4.7)

so that the delta rule formula becomes:

    \Delta w_{jk} = \eta\, \delta_k O_j        (7.4.8)
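A small numeric instance of Eqs. (7.4.7) and (7.4.8), with illustrative values for the target, the unit outputs, and the learning rate:

```python
# Worked example of the output-layer delta rule (values are illustrative):
t_k, O_k, O_j, eta = 1.0, 0.6, 0.5, 0.5
delta_k = (t_k - O_k) * O_k * (1.0 - O_k)   # Eq. (7.4.7)
dw_jk = eta * delta_k * O_j                 # Eq. (7.4.8)
# delta_k = 0.4 * 0.6 * 0.4 = 0.096; dw_jk = 0.5 * 0.096 * 0.5 = 0.024
print(delta_k, dw_jk)
```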
For the hidden neuron, the weight change of wij is obtained in a similar way.
A change to the weight, wij, changes Oj and this changes the inputs into each
unit k, in the output layer. The change in E with a change in wij is therefore
the sum of the changes to each of the output units. The chain rule produces:

    \frac{\partial E}{\partial w_{ij}} = \sum_k \frac{\partial E}{\partial O_k} \frac{\partial O_k}{\partial net_k} \frac{\partial net_k}{\partial O_j} \frac{\partial O_j}{\partial net_j} \frac{\partial net_j}{\partial w_{ij}}
                                       = \sum_k -(t_k - O_k)\, O_k (1 - O_k)\, w_{jk}\, O_j (1 - O_j)\, O_i
                                       = -O_i\, O_j (1 - O_j) \sum_k \delta_k w_{jk}        (7.4.9)
So that, defining the hidden-layer error signal \delta_j as

    \delta_j = O_j (1 - O_j) \sum_k \delta_k w_{jk}        (7.4.10)

we have the weight change in the hidden layer equal to:

    \Delta w_{ij} = \eta\, \delta_j O_i        (7.4.11)
The δk for the output units can be calculated using directly available values,
since the error measure is based on the difference between the desired output
tk and the actual output Ok. However, that measure is not available for the
hidden neurons. The solution is to back-propagate the δk values, layer by
layer through the network, so that finally the weights are updated. A momen-
tum term was introduced in the back-propagation algorithm by Rumelhart
[Rumelhart et al. 1986]. Here the present weight update is modified by incorporating the influence of the past iterations. The delta rule then becomes

    \Delta w_{ij}(n) = -\eta \frac{\partial E}{\partial w_{ij}} + \alpha\, \Delta w_{ij}(n-1)        (7.4.12)
where α is the momentum parameter and determines the amount of influence
from the previous iteration on the present one. The momentum introduces a
damping effect on the search procedure, thus avoiding oscillations in irregu-
lar areas of the error surface by averaging gradient components with opposite
sign and accelerating the convergence in long flat areas. In some situations it may prevent the search procedure from being trapped in a local minimum, helping it to skip over such regions without performing any minimization there. Momentum may be considered as an approximation to a second-order
method, as it uses information from the previous iterations. In some applica-
tions, it has been shown to improve the convergence of the back propagation
algorithm.
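The update rules derived above can be collected into a compact sketch of one back-propagation iteration. This is an illustrative implementation only: the class name, the omission of bias terms, and the tiny layer sizes are simplifications for exposition, not details of the system used in the experiments.

```python
import math

def sigmoid(x):
    """Eq. (7.4.2) with slope parameter a = 1."""
    return 1.0 / (1.0 + math.exp(-x))

class ThreeLayerMLP:
    """Minimal back-propagation sketch following Eqs. (7.4.3)-(7.4.12)."""

    def __init__(self, w_ih, w_ho):
        self.w_ih = w_ih  # w_ij: input unit i  -> hidden unit j
        self.w_ho = w_ho  # w_jk: hidden unit j -> output unit k
        # previous updates, kept for the momentum term of Eq. (7.4.12)
        self.dw_ih = [[0.0] * len(row) for row in w_ih]
        self.dw_ho = [[0.0] * len(row) for row in w_ho]

    def forward(self, x):
        # Eq. (7.4.3): net_j = sum_i w_ij O_i, O_j = sigmoid(net_j), etc.
        o_h = [sigmoid(sum(self.w_ih[i][j] * x[i] for i in range(len(x))))
               for j in range(len(self.w_ih[0]))]
        o_k = [sigmoid(sum(self.w_ho[j][k] * o_h[j] for j in range(len(o_h))))
               for k in range(len(self.w_ho[0]))]
        return o_h, o_k

    def train_step(self, x, t, eta=0.5, alpha=0.3):
        o_h, o_k = self.forward(x)
        # Eq. (7.4.7): output-layer error signals
        d_k = [(t[k] - o_k[k]) * o_k[k] * (1.0 - o_k[k])
               for k in range(len(o_k))]
        # Eq. (7.4.10): hidden-layer error signals
        d_j = [o_h[j] * (1.0 - o_h[j]) *
               sum(d_k[k] * self.w_ho[j][k] for k in range(len(d_k)))
               for j in range(len(o_h))]
        # Eqs. (7.4.8) and (7.4.12): delta rule with momentum, output layer
        for j in range(len(o_h)):
            for k in range(len(d_k)):
                self.dw_ho[j][k] = eta * d_k[k] * o_h[j] + alpha * self.dw_ho[j][k]
                self.w_ho[j][k] += self.dw_ho[j][k]
        # Eqs. (7.4.11) and (7.4.12): same rule for the input-to-hidden weights
        for i in range(len(x)):
            for j in range(len(d_j)):
                self.dw_ih[i][j] = eta * d_j[j] * x[i] + alpha * self.dw_ih[i][j]
                self.w_ih[i][j] += self.dw_ih[i][j]
        # Eq. (7.4.4), single pattern: error before this update
        return 0.5 * sum((t[k] - o_k[k]) ** 2 for k in range(len(t)))
```

Calling `train_step` repeatedly over the whole pattern set realizes one epoch; the returned value is the single-pattern contribution to E of Eq. (7.4.4), which should decrease as training proceeds.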
The following section describes the simulation experiments and results ob-
tained using ANN classifier, and ALR and SSPD feature parameters discussed
in the previous chapters.
7.4.4 Simulation experiment and results
The present study investigates the recognition capabilities of the FFMLP-based face recognition system explained above. For this purpose the multilayer feed-forward neural network is simulated with the back-propagation learning algorithm. A constant learning rate of 0.00001 is used. The initial weights are
obtained by generating random numbers less than one. The number of nodes
in the input layer is fixed according to the feature vector size. The recognition
experiment is repeated by changing the number of hidden layers and number
Table 7.7: ANN learning parameters for ALR feature vector on KNUFDB & AT&T face databases.

Parameter                 AT&T      KNUFDB
Input nodes               40        40
Output nodes              40        60
Hidden nodes              1         1
Performance function      MSE       MSE
Error goal                0.01      0.01
Transformation function   Sigmoid   Sigmoid
Learning rate             0.00001   0.00001
Number of epochs          19800     21380
Momentum constant         0.3       0.3
of nodes in each hidden layer. By trial-and-error experimentation, the number of hidden layers is fixed at one, and the number of nodes in the hidden layer is set appropriately to obtain a successful architecture in the present study.
The training process is terminated when the MSE falls below ε or the number of epochs exceeds T. The error tolerance ε is fixed at 0.01, and the optimum number of epochs T has been found by trial and error. The network is trained using the SSPD and ALR Feature Vectors separately. We have used a set of 3600 samples of the 110 individuals for iteratively computing the final weight matrix, and a disjoint set of face image patterns of the same size from the KNUFDB database for recognition purposes. The final training parameters after a successful epoch are given in Tables 7.7 and 7.8.
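The termination rule can be sketched as a simple loop. The `run_epoch` callable is hypothetical (one pass over the whole pattern set, returning the current MSE); ε = 0.01 and the epoch limit follow the values in Tables 7.7 and 7.8:

```python
def train_until(run_epoch, epsilon=0.01, max_epochs=21380):
    """Iterate whole-set training epochs until the MSE drops below the
    error goal epsilon or the epoch count reaches max_epochs (T)."""
    mse, epoch = float("inf"), 0
    while mse >= epsilon and epoch < max_epochs:
        mse = run_epoch()  # hypothetical: runs one epoch, returns MSE
        epoch += 1
    return mse, epoch
```

With a stub whose successive MSE values are 0.5, 0.1, and 0.005, the loop stops after three epochs, since the third value falls below the error goal.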
The recognition accuracies obtained for the individual persons in the KNUFDB and AT&T face databases using the above-said features and the artificial neural network are tabulated in Tables 7.9 and 7.10. A graphical representation of these recognition results is shown in figure 7.7.
Table 7.8: ANN learning parameters for SSPD feature vector on KNUFDB & AT&T face databases.

Parameter                 AT&T      KNUFDB
Input nodes               16        16
Output nodes              40        60
Hidden nodes              16        16
Performance function      MSE       MSE
Error goal                0.01      0.01
Transformation function   Sigmoid   Sigmoid
Learning rate             0.00001   0.00001
Number of epochs          14500     20450
Momentum constant         0.3       0.3
Table 7.9: Classification results using ANN on KNUFDB database.

Person   No. samples classified   Accuracy   No. samples classified   Accuracy
ID       correctly (ALRFV)        (%)        correctly (SSPD)         (%)
1        16                       53.33      18                       60.00
2        18                       60.00      19                       63.33
3        18                       60.00      20                       66.67
4        19                       63.33      20                       66.67
5        20                       66.67      22                       73.33
6        20                       66.67      20                       66.67
7        20                       66.67      24                       80.00
8        19                       63.33      20                       66.67
9        19                       63.33      20                       66.67
10       20                       66.67      24                       80.00
11       22                       73.33      21                       70.00
12       22                       73.33      20                       66.67
13       20                       66.67      21                       70.00
14       20                       66.67      24                       80.00
15       24                       80.00      21                       70.00
16       23                       76.67      23                       76.67
17       20                       66.67      22                       73.33
18       20                       66.67      23                       76.67
19       19                       63.33      22                       73.33
20       21                       70.00      23                       76.67
21       19                       63.33      22                       73.33
22       21                       70.00      24                       80.00
23       19                       63.33      25                       83.33
24       24                       80.00      23                       76.67
25       25                       83.33      24                       80.00
26       26                       86.67      25                       83.33
27       23                       76.67      23                       76.67
28       26                       86.67      25                       83.33
29       23                       76.67      24                       80.00
30       25                       83.33      26                       86.67
31       24                       80.00      26                       86.67
32       25                       83.33      25                       83.33
33       26                       86.67      23                       76.67
34       23                       76.67      24                       80.00
35       26                       86.67      26                       86.67
36       23                       76.67      24                       80.00
37       25                       83.33      22                       73.33
38       24                       80.00      26                       86.67
39       25                       83.33      21                       70.00
40       29                       96.67      24                       80.00
41       24                       80.00      26                       86.67
42       25                       83.33      25                       83.33
43       26                       86.67      25                       83.33
44       23                       76.67      25                       83.33
45       26                       86.67      24                       80.00
46       23                       76.67      25                       83.33
47       25                       83.33      28                       93.33
48       24                       80.00      29                       96.67
49       25                       83.33      24                       80.00
50       26                       86.67      25                       83.33
51       24                       80.00      26                       86.67
52       22                       73.33      23                       76.67
53       26                       86.67      26                       86.67
54       21                       70.00      23                       76.67
55       24                       80.00      25                       83.33
56       26                       86.67      24                       80.00
57       25                       83.33      25                       83.33
58       25                       83.33      26                       86.67
59       25                       83.33      28                       93.33
60       25                       83.33      28                       93.33
Overall Recognition               76.17%                              78.83%
Table 7.10: Classification results using ANN on AT&T face database.

Person   No. samples classified   Accuracy   No. samples classified   Accuracy
ID       correctly (ALRFV)        (%)        correctly (SSPD)         (%)
1        2                        40.00      3                        60.00
2        3                        60.00      4                        80.00
3        4                        80.00      5                        100.00
4        5                        100.00     5                        100.00
5        5                        100.00     5                        100.00
6        5                        100.00     5                        100.00
7        5                        100.00     5                        100.00
8        4                        80.00      5                        100.00
9        5                        100.00     5                        100.00
10       5                        100.00     5                        100.00
11       5                        100.00     5                        100.00
12       5                        100.00     5                        100.00
13       5                        100.00     5                        100.00
14       5                        100.00     5                        100.00
15       5                        100.00     5                        100.00
16       5                        100.00     5                        100.00
17       5                        100.00     5                        100.00
18       5                        100.00     5                        100.00
19       5                        100.00     5                        100.00
20       5                        100.00     5                        100.00
21       5                        100.00     5                        100.00
22       5                        100.00     5                        100.00
23       5                        100.00     5                        100.00
24       5                        100.00     5                        100.00
25       5                        100.00     5                        100.00
26       5                        100.00     5                        100.00
27       5                        100.00     5                        100.00
28       5                        100.00     5                        100.00
29       5                        100.00     5                        100.00
30       5                        100.00     5                        100.00
31       5                        100.00     5                        100.00
32       5                        100.00     5                        100.00
33       5                        100.00     5                        100.00
34       5                        100.00     5                        100.00
35       5                        100.00     5                        100.00
36       5                        100.00     5                        100.00
37       5                        100.00     5                        100.00
38       5                        100.00     5                        100.00
39       5                        100.00     5                        100.00
40       5                        100.00     5                        100.00
Overall Recognition               96.50%                              98.50%
[Figure 7.7: Recognition accuracies for ANN classifier using SSPD & ALR feature vectors. (a) Results on KNUFDB database; (b) Results on AT&T database. Each panel plots % Recognition (0-100) against Person Class.]
7.5 Conclusion
Human face recognition studies based on the parameters developed in chapters 5 and 6, using different classifiers, are presented in this chapter. Cluster analysis using the c-Means clustering technique is conducted, and the technique is used for the recognition of face image patterns. The credibility of the extracted parameters is also tested with the k-NN and Bayesian classifiers. A connectionist-model-based recognition system using a neural network is then implemented and tested with the SSPD and ALR Feature Vectors extracted from the face images. The highest recognition accuracy (98.50%) is obtained with the SSPD Feature Vector using the FFMLP classifier on the AT&T database. The results also indicate the need to improve the classification algorithm in order to fully accommodate the small variations present in the extracted features. To this end, the ALR Feature Vector and SSPD parameters are further used to develop a face recognition system based on a Support Vector Machine classifier.