Chapter 7
Pattern Classification Algorithms for Face Recognition
7.1 Introduction
The best pattern recognizers in most instances are human beings, yet we do
not completely understand how the brain recognizes patterns. Pattern recog-
nition is the study of how machines can observe the environment, learn to
distinguish patterns of interest from their background, and make sound and
reasonable decisions about the categories of the patterns. Automatic (ma-
chine) recognition, description, classification and grouping of patterns are im-
portant problems in a variety of engineering and scientific disciplines. Pattern
recognition can be viewed as the categorization of input data into identifiable
classes via the extraction of significant features or attributes of the data. Duda
and Hart [Duda & Hart 1973], [Duda et al. 2001] define it as a field concerned
with machine recognition of meaningful regularities in noisy or complex en-
vironments. It encompasses a wide range of information processing problems
of great practical significance, from the recognition of simple patterns such as
characters and speech, to complex problems such as human face
recognition and medical diagnosis. Today, pattern recognition is an integral
part of most intelligent systems built for decision making. Normally, pattern
recognition processes make use of one of the following two classification
strategies.
i. Supervised classification (e.g., discriminant analysis) in which the input pat-
tern is identified as a member of a predefined class.
ii. Unsupervised classification (e.g., clustering and Principal Component Anal-
ysis) in which the pattern is assigned to a hitherto unknown class.
In the present study, various recognition experiments are conducted using
different pattern recognition algorithms in order to assess the credibility of the
proposed feature parameters. The State Space Point Distribution (SSPD)
features extracted from the gray-scale images of human faces, as explained in
chapter 5, and the ALR feature vectors derived from the face images, as
discussed in chapter 6, are used as parameters for the recognition study.
Well-known approaches that are widely used to solve pattern recognition
problems, including a clustering technique (the c-Means algorithm), statistical
pattern classifiers (the k-Nearest Neighbour classifier and the Bayesian
classifier), and a connectionist approach (Artificial Neural Networks), are used
for recognizing human face patterns. The c-Means clustering technique is based
on an unsupervised learning approach, while the k-NN, Bayesian, and artificial
neural network classifiers follow a supervised learning strategy.
This chapter is organized in three sections. Section 7.2 presents the human
face recognition experiments conducted using cluster analysis. Section 7.3
deals with recognition experiments conducted using statistical pattern recogni-
tion strategies: the k-NN and Bayesian classifiers. Section 7.4 describes the
Artificial Neural Network architecture and the simulation experiments con-
ducted for the recognition of human face patterns, along with performance
comparisons of the various classifiers and proposed parameters.
7.2 Cluster analysis for pattern recognition
In real-life pattern recognition tasks we handle a huge amount of perceived
information, and processing every piece of information as a single entity
would be impossible. Hence we tend to categorize entities into clusters, which
are characterized by the common attributes of the entities they contain; the
huge amount of information involved in the process is thereby reduced.
Some of the common definitions proposed for a cluster are given below.
1. A cluster is a set of entities that are alike, and entities from different
clusters are not alike.
2. A cluster is an aggregation of points in the test space such that the
distance between any two points in the cluster is less than the distance
between any point in the cluster and any point not in it.
3. Clusters may be described as connected regions of a p-dimensional space
containing a relatively high density of points, separated from other such
regions by a region containing a relatively low density of points.
Clustering is a major tool used in pattern recognition, generally for data
reduction, hypothesis generation, hypothesis testing and prediction based on
grouping. In several cases, the amount of data available in a problem can be
very large and, as a consequence, its effective processing becomes very
demanding. In this context, data reduction with the help of cluster analysis
can be used to group the data into a reduced number of representative
clusters; each cluster can then be processed as a single entity. In some other
applications, cluster analysis is used to infer hypotheses concerning the
nature of the data. These hypotheses must then be verified using other
data sets; in this context, cluster analysis serves to test the validity of a
specific hypothesis.
Another important application of cluster analysis is prediction based
on grouping. In this case, cluster analysis is applied to the available data set,
and the resulting clusters are characterized based on the characteristics of the
patterns from which they are formed. Consequently, given an unknown
pattern, we can determine the cluster to which it most likely belongs and
characterize it based on the characteristics of that cluster. In
the present study, we are interested in applying cluster analysis for prediction
based on grouping, using the c-Means clustering technique for the recognition
of human face image patterns. The implementation details and experimental
results using this technique are explained in the following section.
7.2.1 c-Means clustering for face recognition
The c-Means algorithm is one of the simplest and best-known clustering
techniques, and it has been applied to a variety of pattern recognition problems.
It is based on the minimization of an objective function, which is defined as
the sum of the squared distances from all points in a cluster domain to the
cluster centre. Determining the prototypes or cluster centres is a major task
in designing a classifier based on clustering, and is normally achieved on the
basis of a minimum-distance approach. Prior to designing pattern clustering
algorithms, we must define a similarity measure by which we decide whether
or not two patterns x and y are members of the same cluster. A similarity
measure \delta(x, y) is usually defined so that \delta(x, y) \to 0 as
x \to y. This is the case, for example, if the patterns are in R^n and we
define

\delta(x, y) = \|x - y\|^2
The c-Means algorithm partitions a collection of n vectors x_j, j = 1, \ldots, n
into m groups G_i, i = 1, \ldots, m, and finds cluster centres c_i, i = 1, \ldots, m
corresponding to each group such that a cost function of the dissimilarity (or
distance) measure is minimized. A generic distance function d(x_k, c_i) can
be applied for a vector x_k in group i; the corresponding cost function is thus
expressed as

J = \sum_{i=1}^{m} J_i = \sum_{i=1}^{m} \Big( \sum_{k,\, x_k \in G_i} d(x_k, c_i) \Big)    (7.2.1)
In this work the Euclidean distance is chosen as the dissimilarity measure
between a vector x_k in group G_i and the corresponding cluster centre c_i. Here
the cost function is defined by

J = \sum_{i=1}^{m} J_i = \sum_{i=1}^{m} \Big( \sum_{k,\, x_k \in G_i} \|x_k - c_i\|^2 \Big)    (7.2.2)

where J_i = \sum_{k,\, x_k \in G_i} \|x_k - c_i\|^2 is the cost function within group i. The value of
J_i depends on the geometrical properties of G_i and the location of c_i.
The collection of partitioned groups can be defined by an m \times n binary
membership matrix U, whose element u_{ij} is 1 if the jth data point x_j
belongs to group i, and 0 otherwise. Once the cluster centres c_i are fixed, the
value of u_{ij} can be computed using the expression

u_{ij} = \begin{cases} 1, & \text{if } \|x_j - c_i\|^2 \le \|x_j - c_k\|^2 \text{ for each } k \ne i \\ 0, & \text{otherwise} \end{cases}    (7.2.3)

where i = 1, \ldots, m, j = 1, \ldots, n and k \le m.
That is, x_j belongs to group i if c_i is the closest centre among all centres.
Since a given data point can belong to only one group, the membership matrix U
has the following properties:

\sum_{i=1}^{m} u_{ij} = 1, \quad \forall j = 1, \ldots, n

and

\sum_{i=1}^{m} \sum_{j=1}^{n} u_{ij} = n
If U is fixed, then the optimal cluster centre c_i that minimizes the cost
function in equation 7.2.2 is the mean of all vectors in group i, given by the
expression

c_i = \frac{1}{|G_i|} \sum_{k,\, x_k \in G_i} x_k    (7.2.4)

where |G_i| is the size of G_i, i.e.

|G_i| = \sum_{j=1}^{n} u_{ij}
For a batch-mode operation, the c-Means algorithm is presented with a
data set x_i, i = 1, \ldots, n; the algorithm determines the cluster centres c_i and
the membership matrix U iteratively using Algorithm 4.
Algorithm 4: Clustering Algorithm

Step 1: Initialize the cluster centres c_i, i = 1, \ldots, m. This is typically achieved by randomly selecting m points from among all of the data points.

Step 2: Determine the membership matrix U using equation 7.2.3.

Step 3: Compute the cost function according to equation 7.2.2. Stop if either it is below a certain tolerance value or its improvement over the previous iteration is below a certain threshold.

Step 4: Update the cluster centres according to equation 7.2.4. Go to Step 2.

Step 5: Finally, in the recognition stage, an unknown pattern x is compared with each of the final cluster centres obtained by applying the above steps. The cluster l with minimum distance from the unknown pattern x is found by the expression

x \in l if \|x - c_l\|^2 < \|x - c_i\|^2, for all i = 1, 2, \ldots, m, i \ne l
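The steps above translate directly into code. The following Python/NumPy fragment is an illustrative sketch only (the experiments in this chapter were simulated in MATLAB, and the function and variable names here are hypothetical): it implements the membership assignment of equation 7.2.3, the centre update of equation 7.2.4, and the minimum-distance recognition rule of Step 5.

```python
import numpy as np

def c_means(X, m, max_iter=100, tol=1e-6, seed=0):
    """Cluster the rows of X (an n-by-d array) into m groups (Algorithm 4 sketch).

    Returns the (m, d) array of final cluster centres.
    """
    rng = np.random.default_rng(seed)
    # Step 1: initialise centres by randomly selecting m data points.
    centres = X[rng.choice(len(X), size=m, replace=False)].astype(float)
    prev_cost = np.inf
    for _ in range(max_iter):
        # Step 2: membership -- assign each point to its nearest centre (eq. 7.2.3).
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 3: cost function J (eq. 7.2.2); stop on small improvement.
        cost = d2[np.arange(len(X)), labels].sum()
        if prev_cost - cost < tol:
            break
        prev_cost = cost
        # Step 4: move each centre to the mean of its group (eq. 7.2.4).
        for i in range(m):
            if np.any(labels == i):
                centres[i] = X[labels == i].mean(axis=0)
    return centres

def classify(x, centres):
    # Step 5: assign an unknown pattern to the nearest cluster centre.
    return int(((centres - x) ** 2).sum(axis=1).argmin())
```

In the recognition stage, `classify` is called once per test pattern against the centres returned by `c_means`.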
The following section describes the simulation of Algorithm 4, along
with the recognition results obtained for human face image patterns.
7.2.2 Simulation experiment and results
The recognition experiment is conducted by simulating Algorithm 4 using
MATLAB. The State Space Point Distribution (SSPD) parameters extracted
from the gray-scale face images as discussed in chapter 5 and the ALR feature
vectors extracted as explained in chapter 6 are used for the recognition study.
The face images of the KNUFDB face database as well as the AT&T face
database are used in the simulation study. The recognition accuracies obtained
with these features (SSPD & ALR) using the c-Means clustering technique on
the KNUFDB and AT&T databases are given in Table 7.1 and Table 7.2
respectively. A graphical representation of these recognition results for the two
feature sets is shown in figure 7.1.
Table 7.1: Classification results using the c-Means algorithm on the KNUFDB database.

Person ID   Correct (ALRFV)   Accuracy % (ALRFV)   Correct (SSPD)   Accuracy % (SSPD)
1    19    31.67    22    36.67
2    21    35.00    23    38.33
3    35    58.33    38    63.33
4    33    55.00    35    58.33
5    50    83.33    40    66.66
6    40    66.66    41    68.33
7    51    85.00    40    66.66
8    42    70.00    42    70.00
9    48    80.00    40    66.66
10    39    65.00    38    63.33
11    37    61.66    39    65.00
12    39    65.00    40    66.66
13    44    73.33    40    66.66
14    40    66.66    40    66.66
15    38    63.33    41    68.33
16    38    63.33    40    66.66
17    39    65.00    40    66.66
18    37    61.66    40    66.66
19    36    60.00    43    71.66
20    36    60.00    43    71.66
21    39    65.00    44    73.33
22    38    63.33    45    75.00
23    50    83.33    39    65.00
24    32    53.33    38    63.33
25    29    48.33    38    63.33
26    29    48.33    37    61.66
27    28    46.66    37    61.66
28    30    50.00    38    63.33
29    40    66.66    39    65.00
30    41    68.33    40    66.66
31    40    66.66    40    66.66
32    28    46.66    35    58.33
33    26    43.33    36    60.00
34    26    43.33    35    58.33
35    35    58.33    37    61.66
36    35    58.33    36    60.00
37    28    46.66    38    63.33
38    31    51.66    37    61.66
39    33    55.00    36    60.00
40    35    58.33    40    66.66
41    31    51.66    38    63.33
42    34    56.66    37    61.66
43    35    58.33    35    58.33
44    36    60.00    36    60.00
45    40    66.66    42    70.00
46    40    66.66    40    66.66
47    43    71.66    43    71.66
48    38    63.33    40    66.66
49    37    61.66    40    66.66
50    36    60.00    42    70.00
51    35    58.33    42    70.00
52    38    63.33    40    66.66
53    40    66.66    41    68.33
54    41    68.33    41    68.33
55    42    70.00    41    68.33
56    38    63.33    42    70.00
57    40    66.66    42    70.00
58    42    70.00    42    70.00
59    40    66.66    40    66.66
60    35    58.33    40    66.66
Overall recognition: 61% (ALRFV), 64.83% (SSPD)
Table 7.2: Classification results using the c-Means algorithm on the AT&T database.

Person ID   Correct (ALRFV)   Accuracy % (ALRFV)   Correct (SSPD)   Accuracy % (SSPD)
1    3    30.00    4    40.00
2    5    50.00    5    50.00
3    6    60.00    6    60.00
4    6    60.00    7    70.00
5    8    80.00    8    80.00
6    6    60.00    7    70.00
7    7    70.00    8    80.00
8    7    70.00    7    70.00
9    6    60.00    6    60.00
10    7    70.00    8    80.00
11    6    60.00    8    80.00
12    7    70.00    6    60.00
13    6    60.00    8    80.00
14    7    70.00    6    60.00
15    7    70.00    7    70.00
16    6    60.00    7    70.00
17    7    70.00    7    70.00
18    6    60.00    6    60.00
19    8    80.00    6    60.00
20    6    60.00    8    80.00
21    7    70.00    6    60.00
22    6    60.00    6    60.00
23    6    60.00    7    70.00
24    7    70.00    6    60.00
25    7    70.00    8    80.00
26    8    80.00    8    80.00
27    7    70.00    8    80.00
28    6    60.00    9    90.00
29    7    70.00    8    80.00
30    6    60.00    7    70.00
31    7    70.00    7    70.00
32    6    60.00    8    80.00
33    7    70.00    7    70.00
34    6    60.00    7    70.00
35    7    70.00    6    60.00
36    6    60.00    7    70.00
37    7    70.00    6    60.00
38    6    60.00    6    60.00
39    6    60.00    7    70.00
40    5    50.00    6    60.00
Overall recognition: 64.25% (ALRFV), 68.75% (SSPD)
The recognition results indicate the credibility of the extracted features on
the basis of clusters that can be formed with the help of an unsupervised
learning process. The cluster centers formed from the training set show that
the extracted features are good enough to distinguish the face patterns from
one another.
[Figure: two panels plotting % Recognition against Person Class for the ALR and SSPD feature vectors. (a) Results on KNUFDB database; (b) Results on AT&T database.]

Figure 7.1: Recognition accuracies for c-Means classifier using SSPD & ALR feature vectors
The overall recognition accuracies obtained using the c-Means clustering tech-
nique with the ALR feature vectors and SSPD features are 61.00% and 64.83%
respectively on the KNUFDB database, and 64.25% & 68.75% on the AT&T
face database. The alternative classifier used in this study is the well-known
nonparametric k-Nearest Neighbour statistical classifier. The following section
describes the recognition experiments performed using the same features and
the k-NN classifier.
7.3 Statistical pattern classification
In the statistical pattern classification process, each pattern is represented by a
d-dimensional feature vector and is viewed as a point in d-dimensional
space. Given a set of training patterns from each class, the objective is to
establish decision boundaries in the feature space which separate patterns be-
longing to different classes. The recognition system operates in two phases:
training (learning) and classification (testing). The following section describes
the pattern recognition experiment conducted for the recognition of human
faces using the k-NN classifier.
7.3.1 k-Nearest Neighbour classifier for face recognition
Pattern classification by distance functions is one of the earliest concepts in
pattern recognition [Tou & Gonzalez 1975], [Friedman & Kandel 1999]. Here
the proximity of an unknown pattern to a class serves as a measure of its
classification. A class can be characterized by a single prototype pattern or by
multiple prototype patterns. The k-Nearest Neighbour method is a well-known
non-parametric classifier, in which the a posteriori probability is estimated from
the frequency of nearest neighbours of the unknown pattern. It considers
multiple prototypes while making a decision and uses a piecewise linear
discriminant function.
Various pattern recognition studies with first-rate performance accuracy have
also been reported based on this classification technique [Ray & Chatterjee 1984],
[Zhang & Srihari 2004], [Pernkopf 2005].
Consider the case of m classes c_i, i = 1, \ldots, m and a set of N sample
patterns y_i, i = 1, 2, \ldots, N whose classification is known a priori. Let x
denote an arbitrary incoming pattern. The nearest neighbour classification
approach assigns x to the pattern class of its nearest neighbour in the set y_i,
i = 1, \ldots, N, i.e.,

if \|x - y_j\|^2 = \min_{1 \le i \le N} \|x - y_i\|^2, then x \in c_j

This scheme can be termed the 1-NN rule, since it employs only one nearest
neighbour of x for classification. It can be extended by considering the k
nearest neighbours of x and using a majority-rule type classifier. Algorithm 5
summarizes the classification process.
Algorithm 5: Minimum-distance k-Nearest Neighbour classifier.

Input:
N - the number of pre-classified patterns
m - the number of pattern classes
(y_i, c_i), 1 \le i \le N - N ordered pairs, where y_i is the ith pre-classified pattern and c_i its class number (1 \le c_i \le m, \forall i)
k - the order of the NN classifier
x - an incoming pattern

Output: L - the number of the class into which x is classified.

Step 1: Set S = {(y_i, c_i)}, i = 1, \ldots, N.

Step 2: Find (y_j, c_j) \in S which satisfies \|x - y_j\|^2 = \min \|x - y_i\|^2, where 1 \le i \le N.

Step 3: If k = 1, set L = c_j and stop; else initialize an m-dimensional vector I: I(i') = 0 for i' \ne c_j, I(c_j) = 1, 1 \le i' \le m, and set S = S - {(y_j, c_j)}.

Step 4: For i_0 = 1, 2, \ldots, k - 1 do Steps 5-6.

Step 5: Find (y_j, c_j) \in S such that \|x - y_j\|^2 = \min \|x - y_i\|^2, where 1 \le i \le N.

Step 6: Set I(c_j) = I(c_j) + 1 and S = S - {(y_j, c_j)}.

Step 7: Set L to the index i' that maximizes I(i'), 1 \le i' \le m, and stop.
In the case of the k-Nearest Neighbour classifier, we compute the distance
between the features of a test sample and the features of every training
sample. The majority class among the k nearest training samples is deemed
the class of the test sample.
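As a concrete illustration of this decision rule, the short Python/NumPy sketch below (hypothetical names; not the MATLAB code used in the experiments) finds the k nearest stored patterns and takes a majority vote over their class numbers, which is equivalent to steps 2-7 of Algorithm 5.

```python
import numpy as np
from collections import Counter

def knn_classify(x, patterns, labels, k=3):
    """Minimum-distance k-NN classifier (Algorithm 5 sketch).

    patterns : (N, d) array of pre-classified feature vectors y_i.
    labels   : length-N sequence of class numbers c_i.
    Returns the majority class among the k nearest neighbours of x.
    """
    # Squared Euclidean distance from x to every stored pattern.
    d2 = ((patterns - x) ** 2).sum(axis=1)
    # Indices of the k nearest neighbours (smallest distances).
    nearest = np.argsort(d2)[:k]
    # Majority vote over the neighbours' class numbers.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to the nearest-neighbour (1-NN) rule described above.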
7.3.2 Simulation experiment and results
The recognition experiment is conducted by simulating the k-NN algorithm
using MATLAB. The State Space Point Distribution (SSPD) parameters ex-
tracted from the gray-scale face images as discussed in chapter 5 and the ALR
feature vectors extracted as explained in chapter 6 are used for the recognition
study. The face images of the KNUFDB face database as well as the AT&T
face database are used in the simulation study. The recognition accuracies
obtained with these features (SSPD & ALR) using the k-NN classifier on the
KNUFDB and AT&T databases are given in Table 7.3 and Table 7.4
respectively. A graphical representation of these recognition results for the two
feature sets is shown in figure 7.2.
Table 7.3: Classification results using the k-NN algorithm on the KNUFDB database.

Person ID   Correct (ALRFV)   Accuracy % (ALRFV)   Correct (SSPD)   Accuracy % (SSPD)
1    18    60.00    16    53.33
2    16    53.33    19    63.33
3    18    60.00    18    60.00
4    18    60.00    18    60.00
5    17    56.67    20    66.67
6    17    56.67    20    66.67
7    18    60.00    20    66.67
8    19    63.33    19    63.33
9    19    63.33    19    63.33
10    19    63.33    20    66.67
11    18    60.00    22    73.33
12    18    60.00    22    73.33
13    19    63.33    20    66.67
14    19    63.33    20    66.67
15    21    70.00    19    63.33
16    17    56.67    19    63.33
17    18    60.00    19    63.33
18    25    83.33    18    60.00
19    20    66.67    19    63.33
20    18    60.00    18    60.00
21    19    63.33    19    63.33
22    16    53.33    21    70.00
23    18    60.00    21    70.00
24    20    66.67    20    66.67
25    21    70.00    22    73.33
26    23    76.67    18    60.00
27    20    66.67    18    60.00
28    20    66.67    18    60.00
29    22    73.33    19    63.33
30    20    66.67    19    63.33
31    22    73.33    22    73.33
32    22    73.33    20    66.67
33    20    66.67    20    66.67
34    20    66.67    20    66.67
35    19    63.33    18    60.00
36    19    63.33    20    66.67
37    19    63.33    19    63.33
38    18    60.00    20    66.67
39    19    63.33    22    73.33
40    18    60.00    19    63.33
41    19    63.33    18    60.00
42    18    60.00    18    60.00
43    19    63.33    19    63.33
44    18    60.00    19    63.33
45    17    56.67    21    70.00
46    18    60.00    17    56.67
47    18    60.00    16    53.33
48    18    60.00    25    83.33
49    16    53.33    20    66.67
50    19    63.33    18    60.00
51    18    60.00    19    63.33
52    19    63.33    20    66.67
53    16    53.33    18    60.00
54    16    53.33    20    66.67
55    20    66.67    20    66.67
56    20    66.67    20    66.67
57    21    70.00    20    66.67
58    20    66.67    22    73.33
59    24    80.00    25    83.33
60    24    80.00    24    80.00
Overall recognition: 65.89% (ALRFV), 67.22% (SSPD)
Table 7.4: Classification results using the k-NN algorithm on the AT&T database.

Person ID   Correct (ALRFV)   Accuracy % (ALRFV)   Correct (SSPD)   Accuracy % (SSPD)
1    2    40.00    2    40.00
2    3    60.00    3    60.00
3    3    60.00    3    60.00
4    3    60.00    4    80.00
5    3    60.00    4    80.00
6    3    60.00    4    80.00
7    2    40.00    3    60.00
8    4    80.00    4    80.00
9    3    60.00    4    80.00
10    4    80.00    4    80.00
11    2    40.00    4    80.00
12    3    60.00    4    80.00
13    3    60.00    4    80.00
14    4    80.00    3    60.00
15    3    60.00    3    60.00
16    4    80.00    4    80.00
17    3    60.00    4    80.00
18    4    80.00    3    60.00
19    3    60.00    4    80.00
20    4    80.00    3    60.00
21    3    60.00    4    80.00
22    3    60.00    3    60.00
23    3    60.00    4    80.00
24    4    80.00    4    80.00
25    4    80.00    4    80.00
26    5    100.00    4    80.00
27    4    80.00    4    80.00
28    3    60.00    4    80.00
29    4    80.00    4    80.00
30    4    80.00    4    80.00
31    4    80.00    4    80.00
32    3    60.00    3    60.00
33    4    80.00    4    80.00
34    5    100.00    4    80.00
35    5    100.00    4    80.00
36    4    80.00    4    80.00
37    4    80.00    4    80.00
38    5    100.00    4    80.00
39    4    80.00    4    80.00
40    4    80.00    4    80.00
Overall recognition: 71.00% (ALRFV), 74.50% (SSPD)
[Figure: two panels plotting % Recognition against Person Class for the ALR and SSPD feature vectors. (a) Results on KNUFDB database; (b) Results on AT&T database.]

Figure 7.2: Recognition accuracies for k-NN classifier using SSPD & ALR feature vectors
The overall recognition accuracies obtained using the k-NN classifier with the
ALR and SSPD feature vectors are 65.89% & 67.22% on the KNUFDB database
and 71.00% & 74.50% on the AT&T face database respectively. These recogni-
tion results are better than those of the previous experiment conducted using
the c-Means clustering technique.
7.3.3 Bayesian classifier for face recognition
In this section we present a probabilistic approach to face recognition. As
is true in most fields that deal with measuring and interpreting physical
events, probability considerations become important in pattern recognition
because of the randomness under which pattern classes are normally gener-
ated [Haykin 2001], [Gonzales & Woods 2002]. It is also possible to derive a
classification approach that is optimal in the sense that, on average, its use
yields the lowest probability of committing classification errors. The proba-
bility that a particular pattern x belongs to class c_i is denoted by p(c_i/x). If
the pattern classifier decides that x is in c_j when it actually belongs to c_i, it
incurs a loss L_{ij}. As pattern x may belong to any one of N classes under
consideration, the average loss incurred in assigning x to class c_j is
r_j(x) = \sum_{k=1}^{N} L_{kj}\, p(c_k/x)    (7.3.1)

This equation can be rewritten as

r_j(x) = \frac{1}{p(x)} \sum_{k=1}^{N} L_{kj}\, p(x/c_k) P(c_k)    (7.3.2)
where p(x/c_k) is the p.d.f. of the patterns from class c_k and P(c_k) is the
probability of occurrence of class c_k. Because 1/p(x) is positive and common
to all the r_j(x), j = 1, 2, \ldots, N, it can be dropped from equation (7.3.2) without
affecting the relative order of these functions from the smallest to the largest
value. The expression for the average loss then reduces to

r_j(x) \simeq \sum_{k=1}^{N} L_{kj}\, p(x/c_k) P(c_k)    (7.3.3)
The classifier has N possible classes to choose from for any given unknown
pattern. If it computes r_1(x), r_2(x), \ldots, r_N(x) for each pattern x and assigns
the pattern to the class with the smallest loss, the total average loss is mini-
mized. The classifier that minimizes the total average loss is called the Bayes
classifier. Thus the Bayes classifier assigns an unknown pattern x to class c_i
if r_i(x) < r_j(x) for j = 1, 2, \ldots, N, j \ne i; i.e., x is assigned to class c_i if

\sum_{k=1}^{N} L_{ki}\, p(x/c_k) P(c_k) < \sum_{q=1}^{N} L_{qj}\, p(x/c_q) P(c_q)    (7.3.4)
for all j \ne i. The loss for a correct decision is generally assigned a value
of zero, and the loss for any incorrect decision is usually assigned the same
non-zero value (say, 1). Under these conditions, the loss function becomes

L_{ij} = 1 - \delta_{ij}    (7.3.5)

where \delta_{ij} = 1 if i = j and \delta_{ij} = 0 if i \ne j. Equation 7.3.5 indicates a loss of
unity for incorrect decisions and a loss of zero for correct decisions. Substituting
equation 7.3.5 into equation 7.3.3 yields

r_j(x) = \sum_{k=1}^{N} (1 - \delta_{kj})\, p(x/c_k) P(c_k) = p(x) - p(x/c_j) P(c_j)    (7.3.6)
The Bayesian classifier then assigns a pattern x to class c_i if, for all j \ne i,

p(x) - p(x/c_i) P(c_i) < p(x) - p(x/c_j) P(c_j)    (7.3.7)

or, equivalently, if

p(x/c_i) P(c_i) > p(x/c_j) P(c_j), \quad j = 1, 2, \ldots, N;\; j \ne i    (7.3.8)

Thus the Bayes classifier for a 0-1 loss function is nothing more than the compu-
tation of decision functions of the form

d_j(x) = p(x/c_j) P(c_j), \quad j = 1, 2, \ldots, N    (7.3.9)

where a pattern vector x is assigned to the class whose decision function yields
the largest numerical value. The decision functions given in equation 7.3.9 are
optimal in the sense that they minimize the average loss in misclassification.
For this optimality to hold, however, the probability density functions of the
patterns in each class, as well as the probability of occurrence of each class,
must be known. The latter requirement is usually not a problem. For instance,
if all classes are equally likely to occur, then P(c_j) = 1/N. We assume
p(x/c_j) to be a Gaussian probability density function. In the n-dimensional
case, the Gaussian density of the vectors in the jth pattern class has the form

p(x/c_j) = \frac{1}{(2\pi)^{n/2} |C_j|^{1/2}}\, e^{-\frac{1}{2}(x - m_j)^T C_j^{-1} (x - m_j)}    (7.3.10)
where the mean vector m_j and covariance matrix C_j are given by

m_j = \frac{1}{N_j} \sum_{x \in c_j} x    (7.3.11)

and

C_j = \frac{1}{N_j} \sum_{x \in c_j} x x^T - m_j m_j^T    (7.3.12)
Since a Gaussian density is assumed, it is more convenient to work with the
natural logarithm of the decision function, i.e., we can use the form

d_j(x) = \ln [p(x/c_j) P(c_j)] = \ln p(x/c_j) + \ln P(c_j)    (7.3.13)

Substituting equation 7.3.10 into equation 7.3.13 yields

d_j(x) = \ln P(c_j) - \frac{n}{2} \ln 2\pi - \frac{1}{2} \ln |C_j| - \frac{1}{2}\left[(x - m_j)^T C_j^{-1} (x - m_j)\right]    (7.3.14)
The term \frac{n}{2} \ln 2\pi is the same for all classes, so it can be eliminated from the
above equation, which then becomes

d_j(x) = \ln P(c_j) - \frac{1}{2} \ln |C_j| - \frac{1}{2}\left[(x - m_j)^T C_j^{-1} (x - m_j)\right]    (7.3.15)

for j = 1, 2, \ldots, N. This equation represents the Bayes decision function for
Gaussian pattern classes under the condition of a 0-1 loss function.
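Equations 7.3.11, 7.3.12 and 7.3.15 translate almost line for line into code. The following Python/NumPy sketch is illustrative only (hypothetical names; it is not the MATLAB implementation used in the experiments, and it assumes each class covariance matrix is invertible): it estimates the class statistics from training samples and evaluates the log-domain decision function d_j(x).

```python
import numpy as np

def train_gaussian_bayes(samples_by_class):
    """Estimate (m_j, C_j) per class, as in eqs. 7.3.11-7.3.12."""
    params = []
    for X in samples_by_class:          # X : (N_j, n) training samples of one class
        m = X.mean(axis=0)              # mean vector m_j
        C = (X.T @ X) / len(X) - np.outer(m, m)  # covariance C_j
        params.append((m, C))
    return params

def bayes_decision(x, params, priors):
    """Assign x to the class maximising d_j(x) of eq. 7.3.15."""
    scores = []
    for (m, C), P in zip(params, priors):
        diff = x - m
        # ln P(c_j) - (1/2) ln|C_j| - (1/2) (x-m_j)^T C_j^{-1} (x-m_j)
        d = (np.log(P)
             - 0.5 * np.log(np.linalg.det(C))
             - 0.5 * diff @ np.linalg.inv(C) @ diff)
        scores.append(d)
    return int(np.argmax(scores))
```

With equal priors P(c_j) = 1/N, the \ln P(c_j) term is the same constant for every class and could also be dropped from the comparison.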
7.3.4 Simulation experiment and results
The recognition experiment is conducted by simulating the Bayesian algo-
rithm using MATLAB. The State Space Point Distribution (SSPD) parame-
ters extracted from the gray-scale face images as discussed in chapter 5 and
the ALR feature vectors extracted as explained in chapter 6 are used for the
recognition study. The face images of the KNUFDB face database as well as
the AT&T face database are used in the simulation study. The recognition
accuracies obtained with these features (SSPD & ALR) using the Bayesian
classifier on the KNUFDB and AT&T databases are given in Table 7.5 and
Table 7.6 respectively. A graphical representation of these recognition results
for the two feature sets is shown in figure 7.3.
Table 7.5: Classification results using the Bayesian classification algorithm on the KNUFDB database.

Person ID   Correct (ALRFV)   Accuracy % (ALRFV)   Correct (SSPD)   Accuracy % (SSPD)
1    16    53.33    16    53.33
2    17    56.67    19    63.33
3    18    60.00    18    60.00
4    18    60.00    18    60.00
5    17    56.67    20    66.67
6    18    60.00    20    66.67
7    18    60.00    20    66.67
8    19    63.33    19    63.33
9    19    63.33    19    63.33
10    19    63.33    20    66.67
11    18    60.00    22    73.33
12    18    60.00    22    73.33
13    19    63.33    20    66.67
14    19    63.33    20    66.67
15    21    70.00    19    63.33
16    17    56.67    19    63.33
17    18    60.00    19    63.33
18    25    83.33    18    60.00
19    20    66.67    19    63.33
20    18    60.00    18    60.00
21    19    63.33    19    63.33
22    16    53.33    21    70.00
23    18    60.00    21    70.00
24    20    66.67    20    66.67
25    21    70.00    22    73.33
26    23    76.67    18    60.00
27    20    66.67    18    60.00
28    20    66.67    18    60.00
29    22    73.33    19    63.33
30    20    66.67    19    63.33
31    22    73.33    22    73.33
32    22    73.33    20    66.67
33    20    66.67    20    66.67
34    20    66.67    20    66.67
35    19    63.33    18    60.00
36    19    63.33    20    66.67
37    19    63.33    19    63.33
38    18    60.00    20    66.67
39    19    63.33    22    73.33
40    18    60.00    19    63.33
41    19    63.33    18    60.00
42    18    60.00    18    60.00
43    19    63.33    19    63.33
44    18    60.00    19    63.33
45    17    56.67    21    70.00
46    18    60.00    17    56.67
47    18    60.00    16    53.33
48    18    60.00    25    83.33
49    16    53.33    20    66.67
50    19    63.33    18    60.00
51    18    60.00    19    63.33
52    19    63.33    20    66.67
53    16    53.33    18    60.00
54    16    53.33    20    66.67
55    20    66.67    20    66.67
56    20    66.67    20    66.67
57    21    70.00    20    66.67
58    20    66.67    22    73.33
59    24    80.00    25    83.33
60    24    80.00    24    80.00
Overall recognition: 63.61% (ALRFV), 65.50% (SSPD)
Table 7.6: Classification results using the Bayesian classification algorithm on the AT&T database.

Person ID   Correct (ALRFV)   Accuracy % (ALRFV)   Correct (SSPD)   Accuracy % (SSPD)
1    1    20.00    2    40.00
2    2    40.00    3    60.00
3    3    60.00    3    60.00
4    3    60.00    3    60.00
5    3    60.00    4    80.00
6    3    60.00    3    60.00
7    3    60.00    4    80.00
8    3    60.00    3    60.00
9    4    80.00    4    80.00
10    4    80.00    3    60.00
11    4    80.00    3    60.00
12    2    40.00    3    60.00
13    3    60.00    5    100.00
14    3    60.00    4    80.00
15    4    80.00    4    80.00
16    3    60.00    3    60.00
17    3    60.00    3    60.00
18    3    60.00    4    80.00
19    3    60.00    3    60.00
20    4    80.00    4    80.00
21    2    40.00    5    100.00
22    3    60.00    3    60.00
23    3    60.00    4    80.00
24    3    60.00    4    80.00
25    4    80.00    4    80.00
26    4    80.00    4    80.00
27    5    100.00    5    100.00
28    4    80.00    4    80.00
29    3    60.00    5    100.00
30    4    80.00    3    60.00
31    4    80.00    4    80.00
32    4    80.00    4    80.00
33    3    60.00    3    60.00
34    2    40.00    4    80.00
35    5    100.00    4    80.00
36    5    100.00    4    80.00
37    5    100.00    3    60.00
38    4    80.00    4    80.00
39    5    100.00    3    60.00
40    5    100.00    4    80.00
Overall recognition: 69.00% (ALRFV), 73.00% (SSPD)
The overall recognition accuracies obtained using the Bayesian classifier with
the ALR and SSPD feature vectors are 63.61% & 65.50% on the KNUFDB
database and 69.00% & 73.00% on the AT&T face database respectively.
These recognition results are better than those of the c-Means clustering
experiment, but not better than those of the k-NN classifier.
[Figure: two panels plotting % Recognition against Person Class for the ALR and SSPD feature vectors. (a) Results on KNUFDB database; (b) Results on AT&T database.]

Figure 7.3: Recognition accuracies for Bayesian classifier using SSPD & ALR feature vectors

These three algorithms do not fully accommodate the small variations in
the extracted features. These results indicate the need to improve the classi-
fication algorithm for large-class pattern classification problems. In the next
section we present a recognition study conducted using a neural network that
is capable of adaptively accommodating the minor variations in the extracted
features.
7.4 Neural network for face recognition
In recent years, neural networks have been successfully applied in many of the
pattern recognition and machine learning systems [Ripley 1996], [Haykin 2001],
[Simpson 1990]. These models are composed of a highly interconnected mesh
of nonlinear computing elements, whose structure is drawn from analogies
with biological neural systems. Since the advent of Feed Forward Multi Layer
Perceptron (FFMLP) and error back propagation training algorithm, great im-
provements in terms of recognition performance and automatic training have
been achieved in the area of pattern recognition [Looney 1997]. In the present study, we use a 3-layer FFMLP architecture as the classification module for the recognition of human face images.
The following sections deal with the recognition experiments conducted
based on the feed-forward neural network for face recognition. A brief description of the diverse use of neural networks in pattern recognition, followed by the general ANN architecture, is presented first. In the next section the
error back propagation algorithm used for training FFMLP is illustrated. The
final section deals with the neural network architecture used for the human
face pattern classification studies followed by the description of simulation
experiments and recognition results.
7.4.1 Neural networks for pattern recognition
Artificial Neural Networks (ANN) can be most adequately characterized as
computational models with particular properties such as the ability to adapt
or learn, to generalize, to cluster or organize data, based on a massively paral-
lel architecture. The history of ANNs starts with the introduction of simplified
neurons in the work of McCulloch and Pitts [McCulloch & Pitts 1943]. These
neurons were presented as models of biological neurons and as conceptual
mathematical neurons, like threshold logic devices, that could perform computational tasks. The work of Hebb further developed the understanding of
the neural model [Hebb 1949]. Hebb proposed a qualitative mechanism de-
scribing the process by which synaptic connections are modified in order to
reflect the learning process undertaken by interconnected neurons, when they
are influenced by some environmental stimuli. Rosenblatt with his percep-
tron model, further enhanced our understanding of artificial learning devices
[Rosenblatt 1959]. However, the analysis by Minsky and Papert in their work
on perceptrons, in which they showed the deficiencies and restrictions existing in these simplified models, caused a major setback in this research area [Minsky & Papert 1969]. ANNs attempt to replicate the computational power (low-level arithmetic processing ability) of biological neural networks and, thereby, hopefully endow machines with some of the (higher-level) cognitive abilities that biological organisms possess. These networks are reputed
to possess the following basic characteristics:
• Adaptiveness: the ability to adjust the connection strengths to new data
or information
• Speed: due to massive parallelism
• Robustness: to missing, confusing, and/or noisy data
• Optimality: regarding the error rates in performance
Several neural network learning algorithms have been developed in the past
years. In these algorithms, a set of rules defines the evolution process under-
taken by the synaptic connections of the networks, thus allowing them to learn
how to perform specified tasks. The following sections provide an overview of
neural network models and discuss in more detail the learning algorithm used in classifying the face images, namely the Back-propagation (BP) learning algorithm.
7.4.2 General ANN architecture
A neural network consists of a set of massively interconnected processing el-
ements called neurons. These neurons are interconnected through a set of
connection weights, or synaptic weights. Every neuron i has Ni inputs, and
one output Y_i. The inputs, labeled s_{i1}, s_{i2}, ..., s_{iN_i}, represent signals coming either from other neurons in the network or from the external world. Neuron i
has N_i synaptic weights, each one associated with one of the neuron's inputs. These synaptic weights are labeled w_{i1}, w_{i2}, ..., w_{iN_i}, and represent real-valued
quantities that multiply the corresponding input signal. Also every neuron i
has an extra input, which is set to a fixed value θ, and is referred to as the
threshold of the neuron that must be exceeded for there to be any activation in
the neuron. Every neuron computes its own internal state, or total activation, according to the following expression:

    x_i = \sum_{j=1}^{N_i} w_{ij} s_{ij} + \theta_i , \qquad i = 1, 2, \ldots, M        (7.4.1)

where M is the total number of neurons and N_i is the number of inputs to neuron i. Figure 7.4 shows a schematic description of the neuron. The total activation is simply the inner product of the input vector S_i = [s_{i0}, s_{i1}, \ldots, s_{iN_i}]^T
by the weight vector W_i = [w_{i0}, w_{i1}, \ldots, w_{iN_i}]^T. Every neuron computes its output according to a function Y_i = f(x_i), also known as the threshold or activation function. The exact nature of f will depend on the neural network model
under study.
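As a concrete sketch, the computation of a single neuron, i.e., the weighted sum of Eq. (7.4.1) passed through the sigmoid activation of Eq. (7.4.2) as the function f, can be written as follows. The function name and the numeric values are illustrative, not taken from the experiments:

```python
import math

def neuron_output(weights, inputs, theta, a=1.0):
    """Total activation x = sum_j w_j * s_j + theta (Eq. 7.4.1),
    passed through the sigmoid S(x) = 1 / (1 + exp(-a*x)) (Eq. 7.4.2)."""
    x = sum(w * s for w, s in zip(weights, inputs)) + theta
    return 1.0 / (1.0 + math.exp(-a * x))

# x = 0.5*1.0 + (-0.3)*2.0 + 0.1 = 0.0, so the output is S(0) = 0.5
print(neuron_output([0.5, -0.3], [1.0, 2.0], 0.1))
```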
Figure 7.4: Simple neuron representation
In the present study, we use the widely applied sigmoid function in the thresholding unit, defined by the expression

    S(x) = \frac{1}{1 + e^{-ax}}        (7.4.2)

This function is also called an S-shaped function. It is a bounded, monotonic,
non-decreasing function that provides a graded non-linear response as shown
in figure 7.5. The network topology used in the present study is the feed-forward network. In this architecture the data flow from input to output units is strictly feed-forward; the data processing can extend over multiple layers of units, but no feedback connections are present. This type of structure incorporates one or more hidden layers, whose computation nodes are correspondingly
Figure 7.5: Sigmoid threshold function
called hidden neurons or hidden nodes. The function of the hidden nodes is to
intervene between the external input and the network output. By adding one
or more layers, the network is able to extract higher-order statistics. The abil-
ity of hidden neurons to extract higher-order statistics is particularly valuable
when the size of the input layer is large. The structural architecture of the
neural network is intimately linked to the learning algorithm used to train the
network. In this study we used Error Back-propagation learning algorithm to
train the input patterns in the multilayer feed forward neural network. The
detailed description of the learning algorithm is given in the following section.
7.4.3 Back-propagation algorithm for training feed-forward multilayer perceptron (FFMLP)
The back-propagation (BP) algorithm is the most popular method for neural network training, and it has been used to solve numerous real-life problems. In a multilayer feed-forward neural network, BP performs iterative minimization
of a cost function by making weight connection adjustments according to the
error between the computed and desired output values. Figure 7.6 shows a
general three layer network.
Figure 7.6: A general three layer network
The following relationships hold for the derivation of back-propagation:

    O_k = \frac{1}{1 + e^{-net_k}}, \qquad net_k = \sum_j w_{jk} O_j

    O_j = \frac{1}{1 + e^{-net_j}}, \qquad net_j = \sum_i w_{ij} O_i        (7.4.3)

where O_k denotes the output of the kth output-layer unit and net_k its activation; O_j and net_j are the corresponding hidden-layer quantities. The cost function (error function)
is defined as the mean square sum of differences between the output values of
the network and the desired target values. The following formula is used for
this error computation:

    E = \frac{1}{2} \sum_p \sum_k (t_{pk} - O_{pk})^2        (7.4.4)
where p indexes the patterns and k the output units; t_{pk} is the target value of output unit k for pattern p, and O_{pk} is the actual output value of output unit k for pattern p. During the training
process a set of feature vectors corresponding to each pattern class is used.
Each training pattern consists of a pair with the input and corresponding
target output. The patterns are presented to the network sequentially, in an
iterative manner. The appropriate weight corrections are performed during the
process to adapt the network to the desired behavior. The iterative procedure
continues until the connection weight values allow the network to perform the
required mapping. Each presentation of the whole pattern set is called an epoch.
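The error of Eq. (7.4.4) can be computed directly from the target and actual output values; a minimal sketch, in which the function name and the data are illustrative:

```python
def mse_cost(targets, outputs):
    """Eq. (7.4.4): E = 1/2 * sum over patterns p and output units k
    of (t_pk - O_pk)^2."""
    return 0.5 * sum(
        (t - o) ** 2
        for t_row, o_row in zip(targets, outputs)
        for t, o in zip(t_row, o_row)
    )

# one pattern, two output units: E = 0.5 * ((1-0.5)^2 + (0-0.5)^2) = 0.25
print(mse_cost([[1.0, 0.0]], [[0.5, 0.5]]))
```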
The minimization of the error function is carried out using the gradient-
descent technique. The necessary corrections to the weights of the network
for each iteration n are obtained by calculating the partial derivative of the
error function in relation to each weight wjk, which gives a direction of steepest
descent. A gradient vector representing the steepest increasing direction in the
weight space is thus obtained. Due to the fact that a minimization is required,
the weight update value ∆wjk uses the negative of the corresponding gradient
vector component for that weight. The delta rule determines the amount of
weight update based on this gradient direction along with a step size,
    \Delta w_{jk} = -\eta \frac{\partial E}{\partial w_{jk}}        (7.4.5)
The parameter η represents the step size and is called the learning rate. The
partial derivative is equal to,
    \frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial O_k} \frac{\partial O_k}{\partial net_k} \frac{\partial net_k}{\partial w_{jk}} = -(t_k - O_k)\, O_k (1 - O_k)\, O_j        (7.4.6)
The error signal \delta_k is defined as

    \delta_k = (t_k - O_k)\, O_k (1 - O_k)        (7.4.7)

so that the delta rule formula becomes:

    \Delta w_{jk} = \eta\, \delta_k O_j        (7.4.8)
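A small numeric instance of Eqs. (7.4.7) and (7.4.8), with illustrative values for the target, the unit outputs, and the learning rate:

```python
# Worked example of the output-layer delta rule (values are illustrative):
t_k, O_k, O_j, eta = 1.0, 0.6, 0.5, 0.5
delta_k = (t_k - O_k) * O_k * (1.0 - O_k)   # Eq. (7.4.7)
dw_jk = eta * delta_k * O_j                 # Eq. (7.4.8)
# delta_k = 0.4 * 0.6 * 0.4 = 0.096; dw_jk = 0.5 * 0.096 * 0.5 = 0.024
print(delta_k, dw_jk)
```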
For the hidden neuron, the weight change of wij is obtained in a similar way.
A change to the weight, wij, changes Oj and this changes the inputs into each
unit k, in the output layer. The change in E with a change in wij is therefore
the sum of the changes to each of the output units. The chain rule produces:

    \frac{\partial E}{\partial w_{ij}} = \sum_k \frac{\partial E}{\partial O_k} \frac{\partial O_k}{\partial net_k} \frac{\partial net_k}{\partial O_j} \frac{\partial O_j}{\partial net_j} \frac{\partial net_j}{\partial w_{ij}}
                                       = \sum_k -(t_k - O_k)\, O_k (1 - O_k)\, w_{jk}\, O_j (1 - O_j)\, O_i
                                       = -O_i\, O_j (1 - O_j) \sum_k \delta_k w_{jk}        (7.4.9)
So that, defining the hidden-layer error signal \delta_j as

    \delta_j = O_j (1 - O_j) \sum_k \delta_k w_{jk}        (7.4.10)

we have the weight change in the hidden layer equal to:

    \Delta w_{ij} = \eta\, \delta_j O_i        (7.4.11)
The δk for the output units can be calculated using directly available values,
since the error measure is based on the difference between the desired output
tk and the actual output Ok. However, that measure is not available for the
hidden neurons. The solution is to back-propagate the δk values, layer by
layer through the network, so that finally the weights are updated. A momen-
tum term was introduced in the back-propagation algorithm by Rumelhart
[Rumelhart et al. 1986]. Here the present weight update is modified by incorporating the influence of the past iterations. The delta rule then becomes

    \Delta w_{ij}(n) = -\eta \frac{\partial E}{\partial w_{ij}} + \alpha\, \Delta w_{ij}(n-1)        (7.4.12)
where α is the momentum parameter and determines the amount of influence
from the previous iteration on the present one. The momentum introduces a
damping effect on the search procedure, thus avoiding oscillations in irregu-
lar areas of the error surface by averaging gradient components with opposite
sign and accelerating the convergence in long flat areas. In some situations it may prevent the search procedure from being trapped in a local minimum, helping it to skip over such regions without performing any minimization there. Momentum may be considered as an approximation to a second-order
method, as it uses information from the previous iterations. In some applica-
tions, it has been shown to improve the convergence of the back propagation
algorithm.
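The update rules derived above can be collected into a compact sketch of one back-propagation iteration. This is an illustrative implementation only: the class name, the omission of bias terms, and the tiny layer sizes are simplifications for exposition, not details of the system used in the experiments.

```python
import math

def sigmoid(x):
    """Eq. (7.4.2) with slope parameter a = 1."""
    return 1.0 / (1.0 + math.exp(-x))

class ThreeLayerMLP:
    """Minimal back-propagation sketch following Eqs. (7.4.3)-(7.4.12)."""

    def __init__(self, w_ih, w_ho):
        self.w_ih = w_ih  # w_ij: input unit i  -> hidden unit j
        self.w_ho = w_ho  # w_jk: hidden unit j -> output unit k
        # previous updates, kept for the momentum term of Eq. (7.4.12)
        self.dw_ih = [[0.0] * len(row) for row in w_ih]
        self.dw_ho = [[0.0] * len(row) for row in w_ho]

    def forward(self, x):
        # Eq. (7.4.3): net_j = sum_i w_ij O_i, O_j = sigmoid(net_j), etc.
        o_h = [sigmoid(sum(self.w_ih[i][j] * x[i] for i in range(len(x))))
               for j in range(len(self.w_ih[0]))]
        o_k = [sigmoid(sum(self.w_ho[j][k] * o_h[j] for j in range(len(o_h))))
               for k in range(len(self.w_ho[0]))]
        return o_h, o_k

    def train_step(self, x, t, eta=0.5, alpha=0.3):
        o_h, o_k = self.forward(x)
        # Eq. (7.4.7): output-layer error signals
        d_k = [(t[k] - o_k[k]) * o_k[k] * (1.0 - o_k[k])
               for k in range(len(o_k))]
        # Eq. (7.4.10): hidden-layer error signals
        d_j = [o_h[j] * (1.0 - o_h[j]) *
               sum(d_k[k] * self.w_ho[j][k] for k in range(len(d_k)))
               for j in range(len(o_h))]
        # Eqs. (7.4.8) and (7.4.12): delta rule with momentum, output layer
        for j in range(len(o_h)):
            for k in range(len(d_k)):
                self.dw_ho[j][k] = eta * d_k[k] * o_h[j] + alpha * self.dw_ho[j][k]
                self.w_ho[j][k] += self.dw_ho[j][k]
        # Eqs. (7.4.11) and (7.4.12): same rule for the input-to-hidden weights
        for i in range(len(x)):
            for j in range(len(d_j)):
                self.dw_ih[i][j] = eta * d_j[j] * x[i] + alpha * self.dw_ih[i][j]
                self.w_ih[i][j] += self.dw_ih[i][j]
        # Eq. (7.4.4), single pattern: error before this update
        return 0.5 * sum((t[k] - o_k[k]) ** 2 for k in range(len(t)))
```

Calling `train_step` repeatedly over the whole pattern set realizes one epoch; the returned value is the single-pattern contribution to E of Eq. (7.4.4), which should decrease as training proceeds.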
The following section describes the simulation experiments and results ob-
tained using ANN classifier, and ALR and SSPD feature parameters discussed
in the previous chapters.
7.4.4 Simulation experiment and results
The present study investigates the recognition capabilities of the FFMLP-based face recognition system explained above. For this purpose the multilayer feed-forward neural network is simulated with the back-propagation learning algorithm. A constant learning rate of 0.00001 is used. The initial weights are
obtained by generating random numbers less than one. The number of nodes
in the input layer is fixed according to the feature vector size. The recognition
experiment is repeated by changing the number of hidden layers and number
Table 7.7: ANN learning parameters for ALR feature vector on KNUFDB & AT&T face databases.

Parameter                 AT&T      KNUFDB
Input nodes               40        40
Output nodes              40        60
Hidden nodes              1         1
Performance function      MSE       MSE
Error goal                0.01      0.01
Transformation function   Sigmoid   Sigmoid
Learning rate             0.00001   0.00001
Number of epochs          19800     21380
Momentum constant         0.3       0.3
of nodes in each hidden layer. By trial-and-error experimentation, the number of hidden layers is fixed at one, and the number of nodes in the hidden layer is set appropriately to obtain a successful architecture in the present study.
The training process is terminated when the MSE falls below ε or the number of epochs exceeds T. The error tolerance ε is fixed at 0.01, and the optimum number of epochs T has been found by trial and error. The network is trained using the SSPD and ALR Feature Vectors separately. We have used a set of 3600 samples of the 110 individuals for iteratively computing the final weight matrix, and a disjoint set of face image patterns of the same size from the KNUFDB database for recognition purposes. The final training parameters after a successful epoch are given in Tables 7.7 and 7.8.
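The termination rule can be sketched as a simple loop. The `run_epoch` callable is hypothetical (one pass over the whole pattern set, returning the current MSE); ε = 0.01 and the epoch limit follow the values in Tables 7.7 and 7.8:

```python
def train_until(run_epoch, epsilon=0.01, max_epochs=21380):
    """Iterate whole-set training epochs until the MSE drops below the
    error goal epsilon or the epoch count reaches max_epochs (T)."""
    mse, epoch = float("inf"), 0
    while mse >= epsilon and epoch < max_epochs:
        mse = run_epoch()  # hypothetical: runs one epoch, returns MSE
        epoch += 1
    return mse, epoch
```

With a stub whose successive MSE values are 0.5, 0.1, and 0.005, the loop stops after three epochs, since the third value falls below the error goal.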
The recognition accuracies obtained for the individual persons in the KNUFDB and AT&T face databases using the above-said features and the artificial neural network are tabulated in Tables 7.9 and 7.10. A graphical representation of these recognition results is shown in figure 7.7.
Table 7.8: ANN learning parameters for SSPD feature vector on KNUFDB & AT&T face databases.

Parameter                 AT&T      KNUFDB
Input nodes               16        16
Output nodes              40        60
Hidden nodes              16        16
Performance function      MSE       MSE
Error goal                0.01      0.01
Transformation function   Sigmoid   Sigmoid
Learning rate             0.00001   0.00001
Number of epochs          14500     20450
Momentum constant         0.3       0.3
Table 7.9: Classification results using ANN on KNUFDB database.

Person   No. samples classified   Accuracy   No. samples classified   Accuracy
ID       correctly (ALRFV)        (%)        correctly (SSPD)         (%)
1        16                       53.33      18                       60.00
2        18                       60.00      19                       63.33
3        18                       60.00      20                       66.67
4        19                       63.33      20                       66.67
5        20                       66.67      22                       73.33
6        20                       66.67      20                       66.67
7        20                       66.67      24                       80.00
8        19                       63.33      20                       66.67
9        19                       63.33      20                       66.67
10       20                       66.67      24                       80.00
11       22                       73.33      21                       70.00
12       22                       73.33      20                       66.67
13       20                       66.67      21                       70.00
14       20                       66.67      24                       80.00
15       24                       80.00      21                       70.00
16       23                       76.67      23                       76.67
17       20                       66.67      22                       73.33
18       20                       66.67      23                       76.67
19       19                       63.33      22                       73.33
20       21                       70.00      23                       76.67
21       19                       63.33      22                       73.33
22       21                       70.00      24                       80.00
23       19                       63.33      25                       83.33
24       24                       80.00      23                       76.67
25       25                       83.33      24                       80.00
26       26                       86.67      25                       83.33
27       23                       76.67      23                       76.67
28       26                       86.67      25                       83.33
29       23                       76.67      24                       80.00
30       25                       83.33      26                       86.67
31       24                       80.00      26                       86.67
32       25                       83.33      25                       83.33
33       26                       86.67      23                       76.67
34       23                       76.67      24                       80.00
35       26                       86.67      26                       86.67
36       23                       76.67      24                       80.00
37       25                       83.33      22                       73.33
38       24                       80.00      26                       86.67
39       25                       83.33      21                       70.00
40       29                       96.67      24                       80.00
41       24                       80.00      26                       86.67
42       25                       83.33      25                       83.33
43       26                       86.67      25                       83.33
44       23                       76.67      25                       83.33
45       26                       86.67      24                       80.00
46       23                       76.67      25                       83.33
47       25                       83.33      28                       93.33
48       24                       80.00      29                       96.67
49       25                       83.33      24                       80.00
50       26                       86.67      25                       83.33
51       24                       80.00      26                       86.67
52       22                       73.33      23                       76.67
53       26                       86.67      26                       86.67
54       21                       70.00      23                       76.67
55       24                       80.00      25                       83.33
56       26                       86.67      24                       80.00
57       25                       83.33      25                       83.33
58       25                       83.33      26                       86.67
59       25                       83.33      28                       93.33
60       25                       83.33      28                       93.33
Overall Recognition               76.17%                              78.83%
Table 7.10: Classification results using ANN on AT&T face database.

Person   No. samples classified   Accuracy   No. samples classified   Accuracy
ID       correctly (ALRFV)        (%)        correctly (SSPD)         (%)
1        2                        40.00      3                        60.00
2        3                        60.00      4                        80.00
3        4                        80.00      5                        100.00
4        5                        100.00     5                        100.00
5        5                        100.00     5                        100.00
6        5                        100.00     5                        100.00
7        5                        100.00     5                        100.00
8        4                        80.00      5                        100.00
9        5                        100.00     5                        100.00
10       5                        100.00     5                        100.00
11       5                        100.00     5                        100.00
12       5                        100.00     5                        100.00
13       5                        100.00     5                        100.00
14       5                        100.00     5                        100.00
15       5                        100.00     5                        100.00
16       5                        100.00     5                        100.00
17       5                        100.00     5                        100.00
18       5                        100.00     5                        100.00
19       5                        100.00     5                        100.00
20       5                        100.00     5                        100.00
21       5                        100.00     5                        100.00
22       5                        100.00     5                        100.00
23       5                        100.00     5                        100.00
24       5                        100.00     5                        100.00
25       5                        100.00     5                        100.00
26       5                        100.00     5                        100.00
27       5                        100.00     5                        100.00
28       5                        100.00     5                        100.00
29       5                        100.00     5                        100.00
30       5                        100.00     5                        100.00
31       5                        100.00     5                        100.00
32       5                        100.00     5                        100.00
33       5                        100.00     5                        100.00
34       5                        100.00     5                        100.00
35       5                        100.00     5                        100.00
36       5                        100.00     5                        100.00
37       5                        100.00     5                        100.00
38       5                        100.00     5                        100.00
39       5                        100.00     5                        100.00
40       5                        100.00     5                        100.00
Overall Recognition               96.50%                              98.50%
[Figure 7.7: Recognition accuracies for ANN classifier using SSPD & ALR feature vectors. (a) Results on KNUFDB database; (b) Results on AT&T database. Each panel plots % Recognition (0-100) against Person Class.]
7.5 Conclusion
Human face recognition studies based on the parameters developed in chapters 5 and 6, using different classifiers, are presented in this chapter. Cluster analysis using the c-Means clustering technique is conducted, and the technique is used for the recognition of face image patterns. The credibility of the extracted parameters is also tested with the k-NN and Bayesian classifiers. A connectionist-model-based recognition system using a neural network is then implemented and tested with the SSPD and ALR Feature Vectors extracted from the face images. The highest recognition accuracy (98.50%) is obtained with the SSPD Feature Vector using the FFMLP classifier on the AT&T database. The results also indicate the need to improve the classification algorithm in order to fully accommodate the small variations present in the extracted features. To this end, the ALR Feature Vector and SSPD parameters are further used to develop a face recognition system based on a Support Vector Machine classifier.