

Can a linear autoassociator recognize faces from new orientations?

Dominique Valentin*

University of Texas at Dallas, Richardson, Texas 75083-0688

Hervé Abdi

University of Texas at Dallas, Richardson, Texas, and Université de Bourgogne à Dijon, Dijon, France

Received April 5, 1995; revised manuscript received September 20, 1995; accepted October 16, 1995

An often noted limitation of computational models of faces operating on two-dimensional pixel intensity representations is that they cannot handle changes in orientation. We show that this limitation can be overcome by the use of multiple views of a given face instead of a single view to represent the face. Specifically, we show that a linear autoassociator trained to reconstruct multiple views of a set of faces is able to recognize the faces from new view angles. An analysis of the internal representation of the memory (i.e., eigenvectors of the between-unit-connection weight matrix) shows a dissociation between two kinds of perceptual information: orientation and identity information. © 1996 Optical Society of America

1. INTRODUCTION

In recent years quite a number of computational/statistical models of face processing have been proposed in the literature. These models have been successfully applied to a wide range of tasks such as image compression, face detection, categorization, recognition, and identification, as well as feature detection and selection (see Refs. 1 and 2 for reviews). Much of the effort going into these recent models has been concentrated on the processing of single frontal (or nearly frontal) two-dimensional (2D) pixel-based representations of faces. Consequently, their performance is sensitive to substantial variations in lighting conditions, size, and position in the image. To avoid these problems, a preprocessing of the faces is necessary. However, this can be done in a relatively straightforward manner by the use of automatic algorithms for locating the faces in the images and normalizing them for size, lighting, and position (see, e.g., Ref. 3).

Most models in the literature assume that the preprocessing step has already been implemented and focus on the problem of face recognition per se. An additional and more drastic limitation of pixel-based representations is that they are not three-dimensionally invariant, and hence the performance of models operating on this type of representation across large changes in orientation is rather poor. The problem, therefore, is to know whether it is possible to extend existing models so that they can handle depth rotation.

The methodology of this study is based on empirical evidence suggesting that human subjects trained with multiple 2D views of unfamiliar faces handle depth rotation better than human subjects trained with single views of the same faces.4,5 This could be due to several reasons. First, subjects could elaborate a three-dimensional (3D) invariant representation of the faces, such as the object-centered structural description described by Marr and Nishihara6 for object recognition. However, although the object-centered approach seems relevant for the general domain of object recognition, its extension to the specific case of face recognition is problematic. Whereas object recognition requires stimuli to be assigned to broad categories that maximize the physical similarities between exemplars (i.e., the "basic level categories" as defined by Rosch7), face recognition requires the discrimination of stimuli within the basic level category "face." Object-centered models seem more appropriate for assigning objects to basic level categories than for discriminating instances within basic level categories.

Some recent studies in computational theory of 3D object recognition suggest that, for basic level categories, objects might be more efficiently represented by a set of viewer-centered representations than by a single object-centered representation.8–10 These representations depend on the position of the viewer relative to the object to be recognized and therefore are specific to the particular viewpoint from which the object is perceived. When this principle is applied to the problem of face recognition, a face could be represented in memory by a limited set of 2D view-dependent descriptions that correspond to its familiar orientations. Recognition could be achieved by transforming an input representation of a face to the orientation of the nearest stored representation11 or, if the faces are represented by a series of views that are close enough to each other, by interpolating among those views.12–14

In the series of simulations we report here, we trained an autoassociative memory with multiple views of a set of faces, and we tested its ability to generalize to new views of the learned faces. Autoassociative memories are a powerful tool for storing, recognizing, and categorizing faces represented as pixel-intensity images.15–18 Their power stems mainly from the fact that they use extremely efficient and well-explored computational algorithms (i.e., eigenvalue and singular-value decomposition) and are easily analyzable in terms of traditional mathematical concepts and statistical techniques (i.e., least-squares estimation and principal components analysis). Because of these properties, autoassociative memories are also a useful tool for compressing3,19 and analyzing20 the perceptual information in faces.

In the framework of image compression, the principal-components-analysis technique (or Karhunen–Loève transform or singular-value decomposition, see Ref. 21) amounts to computing the eigenvectors of the pixel-covariance matrix of a set of images represented as gray-level vectors. The eigenvectors constitute an orthogonal basis for representing the images (i.e., in the eigenspace). The images can be either perfectly represented by use of the complete eigenspace or estimated by use of a least-squares low-dimensional representation made of the eigenvectors with the largest eigenvalues. Since an object is represented as a point in a multidimensional space, its distance from the other objects in the space can be computed and used as a basis for the recognition process (e.g., nearest-neighbor algorithm). This approach has been applied recently by Murase and Nayar22 to the problem of 3D object recognition and by Pentland et al.23 to 3D face recognition.
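To make this eigenspace approach concrete, here is a minimal numerical sketch in its spirit (it is not the autoassociator studied in this paper, nor code from Refs. 22 or 23): random vectors stand in for gray-level face images, and all sizes and variable names are hypothetical.

```python
import numpy as np

# Minimal sketch of eigenspace-based recognition: project images onto the
# leading eigenvectors of the pixel-covariance matrix and recognize by
# nearest neighbor in that space. Data and sizes are hypothetical.
rng = np.random.default_rng(0)
I, K, d = 64, 20, 10                    # pixels, stored images, eigenspace dim
X = rng.random((I, K))                  # columns are gray-level image vectors

mean = X.mean(axis=1, keepdims=True)
Xc = X - mean                           # center to obtain covariance structure
lam, U = np.linalg.eigh(Xc @ Xc.T)
B = U[:, np.argsort(lam)[::-1][:d]]     # basis: d largest-eigenvalue vectors

stored = B.T @ Xc                       # projection coefficients of the set
probe = X[:, 3] + 0.01 * rng.standard_normal(I)   # noisy version of image 3
p = B.T @ (probe - mean.ravel())        # project the probe into the eigenspace
print(np.argmin(np.linalg.norm(stored - p[:, None], axis=0)))  # expect 3
```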

Murase and Nayar represented a set of objects by using two different types of eigenspaces: a universal eigenspace and an object eigenspace. The universal eigenspace is used to identify the object presented as a prompt. Once the object is identified as a particular object, it is projected onto the appropriate object eigenspace and its orientation is determined.

A somewhat different approach was used by Pentland et al.23 Instead of the universal and object eigenspaces proposed by Murase and Nayar, they used multiple view-based eigenspaces. Each eigenspace is obtained by computing the eigenvectors of an image set of different individuals in a common orientation. The orientation of a face is first determined by computing its distance from each separate eigenspace. The projection coefficients of the face onto the closest eigenspace are then used to identify the face by use of a nearest-neighbor algorithm.

The autoassociative model we present here is somewhat similar to the approach of both Murase and Nayar22 and Pentland et al.23 in that using an autoassociative memory to store and retrieve faces is equivalent to computing the eigendecomposition of the set of faces and representing the faces as a weighted sum of eigenvectors.15 Specifically, since the autoassociative memory is trained with multiple views of a set of faces, it is equivalent to the universal eigenspace representation proposed by Murase and Nayar, with the exception that in our application all the eigenvectors of the set of faces are kept. The rationale for keeping the complete eigenspace is that we are more interested in using the eigenvector representation as a tool for analyzing the perceptual information in faces than for compressing the information. In the second part of the series of simulations reported below, we show that an advantage of having a representation in the complete universal eigenspace is that the information about both the identity and the orientation of the faces is preserved.

This paper is organized as follows: after presenting a brief description of a face autoassociative memory, we report a series of simulations showing that, when multiple 2D views of faces are used to train an autoassociative memory, then the memory can deal with the problem of depth rotation. Next we analyze the perceptual properties of the internal representation developed by the autoassociative memory (i.e., the eigenvectors of the weight matrix).

2. LINEAR AUTOASSOCIATOR

A linear autoassociator (or autoassociative memory) is a network of simple units interconnected by a set of weighted connections. Learning occurs by modifying the weight of the connections. Autoassociative memories are content addressable because, when part of a stimulus is given as a memory key, the memory gives back the complete stimulus, filling in the missing components.

To store a face in an autoassociative memory (see Fig. 1), the face is first digitized as a pixel image and the rows of the image concatenated to form an I × 1 pixel vector x_k (where I is the number of pixels used to represent a face and k indicates a particular face). Each numerical element in x_k is the gray level of the corresponding pixel. For computational convenience the vectors x_k are normalized so that x_k^T x_k = 1. The set of training faces is represented by an I × K matrix X obtained by concatenating the pixel vectors (where K is the number of training faces). The faces are stored in the memory by setting the weights of the connections between cells. These values are stored in an I × I matrix W with the use of Hebbian learning. The Hebbian-learning rule in a linear associator sets the change of the connection weights to be proportional to the product of the input and the output. For an autoassociator the changes are proportional to the product of the input vector by its transpose (i.e., the outer product of the vector and itself). With a proportionality constant equal to 1, W is obtained as

W = XX^T = \sum_{k=1}^{K} x_k x_k^T ,    (1)

where T denotes the transpose operation.

Fig. 1. Illustration of a face autoassociative memory.

Retrieval of a face is done by presenting the face as input to the memory. Specifically, recall of the kth face is achieved as

\hat{x}_k = W x_k ,    (2)

where x̂_k represents the answer of the memory. The quality of this answer can be estimated either by visually comparing the reconstructed face with the original face or, more formally, by computing the cosine of the angle between x̂_k and x_k:

\cos(\hat{x}_k, x_k) = \frac{\hat{x}_k^T x_k}{\|\hat{x}_k\| \, \|x_k\|} ,    (3)

where ‖x_k‖ is the Euclidean norm of vector x_k (i.e., ‖x_k‖ = \sqrt{x_k^T x_k}). A cosine of 1 indicates a perfect reconstruction of the stimulus.
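Eqs. (1)–(3) translate directly into a few lines of linear algebra. The following is a minimal sketch with random vectors standing in for digitized faces; the sizes and variable names are hypothetical.

```python
import numpy as np

# Sketch of Hebbian storage and recall, Eqs. (1)-(3).
rng = np.random.default_rng(0)
I, K = 64, 5                        # hypothetical: 64 pixels, 5 faces
X = rng.random((I, K))              # columns are pixel vectors x_k
X /= np.linalg.norm(X, axis=0)      # normalize so that x_k^T x_k = 1

W = X @ X.T                         # Eq. (1): Hebbian learning, W = X X^T

def cosine(a, b):
    """Eq. (3): cosine between reconstruction and original."""
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

x_hat = W @ X[:, 0]                 # Eq. (2): recall of the first face
print(cosine(x_hat, X[:, 0]))       # 1 would mean perfect reconstruction
```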

Figure 2 (middle row) displays the response of a memory trained with Hebbian learning when a learned face, a new face, and a random pattern are presented as input. Clearly, the memory gives the same response for every stimulus, and hence it is not able to discriminate between faces or even to discriminate a face from a random pattern. The performance of the memory can be improved by using the Widrow–Hoff error-correction learning rule. This is equivalent to using a gradient descent method to adjust the weights of the connections so as to reduce the squared error between the faces and their reconstructions. The values of the weights are first computed by use of Hebbian learning and then iteratively corrected by use of the error signal (i.e., the difference between the input face and the answer of the memory). Specifically, the weight matrix is expressed at time n + 1 as

W^{[n+1]} = W^{[n]} + \eta (X - W^{[n]} X) X^T ,    (4)

where n represents the iteration number and η is a small positive constant (called the learning constant). Note that, as we shall detail below [see Eq. (6)], the Widrow–Hoff learning rule can be implemented more simply by use of the eigendecomposition of W.24,25 After complete Widrow–Hoff learning, all the faces in the training set are perfectly reconstructed, and if new faces are used as memory cues, they are distorted as an inverse function of their similarity with the training faces (see Fig. 2, bottom row).
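As a sketch, the iterative form of Eq. (4) can be run directly on the Hebbian weight matrix. The learning constant below is an arbitrary choice; convergence requires η to be small relative to the largest eigenvalue of XX^T [see Eq. (6)].

```python
import numpy as np

# Sketch of Widrow-Hoff error correction, Eq. (4):
# W[n+1] = W[n] + eta * (X - W[n] X) X^T
rng = np.random.default_rng(0)
I, K = 64, 5
X = rng.random((I, K))
X /= np.linalg.norm(X, axis=0)

eta = 0.3                             # hypothetical learning constant
W = X @ X.T                           # start from the Hebbian estimate
for _ in range(2000):
    W += eta * (X - W @ X) @ X.T      # correct with the error signal

print(np.abs(W @ X - X).max())        # near 0: training faces reconstructed
```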

Previous studies (see Ref. 26 for a review) showed that autoassociative memories constitute a useful tool for recognizing and categorizing faces. Yet the problem with these previous studies is that, because they used single views of the faces (generally a frontal view) as training sets, memory performance was very sensitive to depth rotation. The first objective of the simulations that we present in Section 3 was to determine whether this limitation of autoassociative memories is a definite flaw or whether it can be overcome by the use of multiple views as training sets.

3. SIMULATIONS

The purposes of the series of simulations presented in this section are

• To test the ability of an autoassociative memory to recognize faces from new orientations when different numbers of views of the faces are used as training sets.

• To analyze the perceptual properties of the internal representation elaborated by a memory trained on multiple views of faces.

There are several ways to test the ability of an autoassociative memory to recognize faces. An easy one is to determine whether the memory can distinguish between learned and new faces. This can be done by training the memory to reconstruct a set of target face images. After completion of learning, a new set of face images, composed of an equal number of learned faces and new faces (or distractors), is presented as input to the memory. For each face, target or distractor, the quality of the answer of the memory is estimated [e.g., by computing the cosine between input and output as described by Eq. (3)]. If, on the average, the quality of reconstruction of target faces exceeds the quality of reconstruction of distractor faces, the model is said to distinguish between learned and unlearned faces (i.e., the model recognizes the old faces).

This procedure was used in the first simulations to analyze the ability of the memory to generalize the information learned from a limited set of views of faces to new views of the faces.

Fig. 2. Top row, three stimuli; middle row, responses produced by an autoassociative memory trained with Hebbian learning to these stimuli; bottom row, responses produced by an autoassociative memory trained with Widrow–Hoff learning to the same stimuli. The stimuli are, from left to right, a learned face, a new face, and a random pattern. When Hebbian learning is used, the memory produces the same response for all stimuli. Note, however, that the squared correlation between this answer and the face stimuli (r² = 0.82 for the learned face and r² = 0.42 for the new face) is larger than that between the answer and the random pattern (r² = 0). When Widrow–Hoff learning is used, the learned face is perfectly reconstructed (r² = 1), and the new stimuli are distorted as an inverse function of their similarity to the learned face (r² = 0.62 for the new face and r² = 0 for the random pattern).


Fig. 3. Examples of five stimuli used in the four-view learning condition. The head samples the rotation in depth in 20-deg steps.

A. Recognition Task

1. Stimuli
Thirty female faces were used as stimuli. The faces were represented either by ten views sampling the rotation of the head from full face to profile in approximately 10-deg steps or by five views sampling the rotation of the head in approximately 20-deg steps (Fig. 3). The face images were first digitized with a resolution of 230 × 240 pixels with 256 gray levels per pixel and then compressed by local averaging with a 2 × 2 window to give 115 × 120 pixel images. The images were roughly aligned along the axis of the eyes so that the eyes of all faces were at approximately the same height. None of the pictured faces had major distinguishing characteristics such as clothes, jewelry, or glasses.
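The local-averaging compression is plain block averaging; a sketch of the operation, with a random array standing in for a digitized face, is:

```python
import numpy as np

# Sketch of the 2 x 2 local-averaging step: a 230 x 240 gray-level image
# is reduced to 115 x 120 by averaging non-overlapping 2 x 2 blocks.
img = np.random.default_rng(0).integers(0, 256, (230, 240)).astype(float)
small = img.reshape(115, 2, 120, 2).mean(axis=(1, 3))
print(small.shape)  # (115, 120)
```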

2. Procedure
The procedure included two phases: a learning phase and a testing phase in which we tested the ability of the memory to recognize learned faces presented from new view angles. Half the faces were used as targets and the other half were used as distractors. During the learning phase, the target faces were stored in an autoassociative memory by means of complete Widrow–Hoff learning. The faces were represented by a single view (one-view condition), four views (four-view condition), or nine views (nine-view condition), sampling the rotation of the head from profile to full face. In all learning conditions, 15 faces were used as training faces. In the one-view and four-view conditions, one or four views, respectively, of each training face were randomly chosen from a set of five possible views (0, 22, 45, 66, and 90 deg from full face) to form the training set. In the nine-view condition, nine views of each training face were randomly chosen from a set of ten possible views (0, 10, 20, 30, 40, 50, 60, 70, 80, and 90 deg from full face) to form the training set. In all conditions the remaining views were used as testing views. After learning, all the views of the targets were perfectly reconstructed (i.e., the cosines between the original faces and their reconstructions were equal to 1).

During the testing phase, new views of the target faces and all the views of the distractors were filtered through the memory. For each view a cosine was computed between the original and reconstructed images [see Eq. (3)]. The cosine provides an indication of the familiarity of the model with the face. The higher the cosine is, the more probable it is that the face has been learned. The recognition task was implemented by setting a criterion cosine and by categorizing every face with a cosine greater than the criterion as learned and every face with a cosine smaller than the criterion as new. Different values of the cosine were used as the criterion to generate a receiver operating characteristic27 (ROC) for each view of the faces.
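A sketch of this criterion-sweeping procedure, with simulated cosines standing in for the memory's actual outputs (the two distributions below are invented for illustration only):

```python
import numpy as np

# Sketch of the signal-detection test: sweep a criterion over the cosines,
# record hit and false-alarm rates, and trace out the ROC.
rng = np.random.default_rng(0)
target_cos = rng.normal(0.8, 0.1, 150)      # cosines for learned faces
distractor_cos = rng.normal(0.6, 0.1, 150)  # cosines for new faces

criteria = np.sort(np.concatenate([target_cos, distractor_cos]))[::-1]
hit_rate = np.array([(target_cos >= c).mean() for c in criteria])
fa_rate = np.array([(distractor_cos >= c).mean() for c in criteria])

# Area under the ROC (trapezoidal rule): 0.50 is chance, 1.0 is perfect.
print(np.trapz(hit_rate, fa_rate))
```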

This procedure was repeated until each view of the target faces had been used, in turn, as the new view during testing. Then all the previous target faces were exchanged with the distractors, and the procedure was repeated until each view of the new target faces had been used as the new view during testing. This was done to increase the size of the testing set. All the results reported below were computed on a total of 30 faces, each tested in five view angles (0, 22, 45, 66, and 90 deg rotation from frontal view) for each of the three learning conditions (one-view, four-view, and nine-view conditions).

3. Results and Discussion
Figure 4 presents the quality of reconstruction of the faces as a function of the view orientation for each learning condition. Three major points can be noted from this figure:

1. The greater the number of views used to represent the faces, the better the quality of reconstruction.

2. The views are not all equally well reconstructed. Specifically, the frontal and the profile views are less well reconstructed than the intermediary views.

3. When only a single view of the faces is used to train the memory, there is no major difference in the quality of reconstruction between targets and distractors. When four views are used, the targets are better reconstructed than the distractors. The difference between targets and distractors increases with the number of views used to represent the faces.

Fig. 4. Quality of reconstruction of the targets (solid curves) and distractors (dotted–dashed curves) as a function of learning condition.


Fig. 5. ROC curve as a function of learning condition.

In summary, it seems that using several views of the faces as a training set increases the general reconstruction performance of the memory as well as its ability to discriminate between learned faces presented from a new view angle and new faces. Besides this first approximation, we also need to look at recognition performance of the memory as measured by the area under the ROC (also called area under the curve). This is the signal-detection measure used to assess the discriminability of the new stimuli. The area under the curve provides an unbiased estimate of the proportion of correct classification, where chance performance is 0.50 (Ref. 27).

Figure 5 displays recognition performance averaged across view angles. Clearly, in the one-view condition the hit rate is equivalent to the false-alarm rate for all values of the criterion, and therefore the memory is not able to recognize the faces on which it has been trained when those faces are presented in novel orientations (area under ROC = 0.51). In the four-view condition the performance of the memory is slightly better (hit rate systematically greater than false-alarm rate) but still not very impressive (area under ROC = 0.56). In contrast, in the nine-view condition the memory is clearly able to discriminate between learned and new faces (area under ROC = 0.79).

This performance is somewhat poorer than the performance reported in previous work by Pentland et al.23 Using a set of nine view-based eigenspaces to estimate first the orientation of the face and then to recognize it, they obtained an average performance of 86% correct recognition. The model proposed by Pentland et al. differs in two ways from the autoassociative model described here. First, in the model of Pentland et al., separate eigendecompositions are used for different orientations of the faces, whereas a single eigendecomposition (universal eigenspace) is used in our model. Although the view-based solution is likely to be more accurate than the universal solution, it has the disadvantages of (1) assuming that the orientation of the faces is known at the time the eigendecomposition is computed and (2) being computationally intensive, since it requires as many eigendecompositions as orientations. The advantages of the autoassociative model, besides its simplicity, are that (1) it does not assume any prior knowledge about the orientation of the faces and (2) (as we shall see in Subsection 3.B) it dissociates spontaneously the information relative to the orientation of the faces from the information relative to their identity.

The second difference between the model of Pentland et al. and the autoassociative memory resides in the algorithm used to recognize the faces. Pentland et al. used a nearest-neighbor algorithm to decide whether a face is recognized or not, whereas we used a signal-detection method. In this framework the nearest-neighbor algorithm searches for the smallest Euclidean distance between the projection onto the eigenspace of the face image to be recognized and the projection of the learned faces. If this distance is smaller than a certain threshold, the face is recognized and classified as its nearest neighbor. If the smallest distance is above the threshold, then it is characterized as unknown.

The main difference between the signal-detection and nearest-neighbor methods is that the nearest-neighbor one keeps an explicit record of the faces to be classified, whereas the signal-detection method is based only on the responses of the memory. Although we can expect these two algorithms to yield roughly comparable performance when the same information is used, further work is needed to assess the comparative merits of these methods quantitatively.

Figure 6 displays the recognition performance of the memory when a frontal, a profile, and a 3/4 view of the faces were presented as input for the testing phase. Again, it appears that not all the views of the faces are equally easy to recognize. Although recognition performance is always greater in the nine-view condition than in the two other view conditions for all view angles, it is for the 3/4 view that this difference becomes very clear. In other words, it seems that 3/4 views are easier to recognize than either profile or frontal views.

In summary, the results reported in this section indicate that when enough views of the faces are used as training sets, an autoassociative memory is able to handle the problem of depth rotation.

B. Eigenvector Representation
One advantage of using an autoassociative memory to store faces is that, since the weight matrix W is a cross-product matrix (and therefore is positive semidefinite), it can be analyzed in terms of its eigendecomposition16,28 as

W = XX^T = U \Lambda U^T , \quad U^T U = I ,    (5)

where U is the matrix of eigenvectors of W and Λ is the diagonal matrix of eigenvalues.

Fig. 6. ROC curve as a function of learning condition and view angle. Dotted curves, one-view condition; dotted–dashed curves, four-view condition; solid curves, nine-view condition.

The eigendecomposition of W can be used also to analyze the Widrow–Hoff learning rule. Specifically, Eq. (4) can be rewritten as (see Ref. 24)

W^{[n+1]} = U \Phi^{[n]} U^T , \quad \Phi^{[n]} = I - (I - \eta \Lambda)^n ,    (6)

which shows that Widrow–Hoff learning affects only the eigenvalues of W. In particular, if η is properly chosen,25 Φ^{[n]} converges toward the identity matrix, and W^{[n]} from Eq. (6) converges toward

W^{[\infty]} = U U^T ,    (7)

which is equivalent to saying that the weight matrix is sphericized24 (i.e., all of its eigenvalues are equal to 1).
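In code, Eqs. (5)–(7) mean that complete Widrow–Hoff learning never has to be iterated: one eigendecomposition of the Hebbian matrix gives the weights at any step n in closed form. A sketch, with random stand-in data and hypothetical constants:

```python
import numpy as np

# Sketch of Eqs. (5)-(7): Widrow-Hoff learning only rescales the
# eigenvalues of W, so it can be computed in closed form.
rng = np.random.default_rng(0)
I, K = 64, 5
X = rng.random((I, K))
X /= np.linalg.norm(X, axis=0)

lam, U = np.linalg.eigh(X @ X.T)        # Eq. (5): W = U Lambda U^T
eta, n = 0.3, 2000                      # hypothetical learning constant, steps

phi = 1.0 - (1.0 - eta * lam) ** n      # Eq. (6): Phi = I - (I - eta*Lambda)^n
W_n = (U * phi) @ U.T                   # weight matrix after n iterations

# Eq. (7): in the limit, W = U U^T over the nonzero-eigenvalue eigenvectors.
Uk = U[:, lam > 1e-10]                  # numerical rank of the face set
W_inf = Uk @ Uk.T
print(np.abs(W_inf @ X - X).max())      # training faces perfectly recalled
```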

Along the same lines, retrieval of a given face by the memory can be expressed as a weighted linear combination of the eigenvectors of W. This is shown by combining Eqs. (2) and (5) as

\hat{x}_k = W x_k = U \Lambda U^T x_k = \sum_{l=1}^{L} \lambda_l u_l u_l^T x_k ,    (8)

where the scalar u_l^T x_k corresponds to the projection of the kth face onto the lth eigenvector. If complete Widrow–Hoff learning is used [see Eq. (7)], Eq. (8) reduces to

\hat{x}_k = \sum_{l=1}^{L} u_l u_l^T x_k .    (9)

Interestingly, since the weight matrix is a pixel cross-product matrix, its eigenvectors can be graphically displayed and visually analyzed. Previous studies using this type of analysis showed that

• The eigenvectors of a face autoassociative memory are facelike and can be interpreted as some kinds of macrofeatures or building blocks from which the faces are made.5,19

• A face can be approximated with a small number of eigenvectors.3,19

• Different ranges of eigenvectors convey different types of information. Specifically, eigenvectors with large eigenvalues convey mostly information useful for categorizing the faces along general semantic dimensions such as sex or race. Conversely, eigenvectors with low eigenvalues convey information useful for identifying specific faces.20,21

Using this eigendecomposition technique, we examined the perceptual information conveyed by different ranges of eigenvectors extracted from a cross-product face matrix built from 5 views of 30 female faces. As indicated above, the five views sampled the rotation of the head in approximately 20-deg steps from full face to profile (see Fig. 3). By visually examining faces reconstructed with use of either the first 10 eigenvectors (the ones with the largest eigenvalues), a middle range of eigenvectors (11 to 50), or the last 100 eigenvectors (the ones with the smallest eigenvalues) and by looking at the projections of the faces onto individual eigenvectors, we found that when multiple views of faces are used as input, the eigenvectors with large eigenvalues capture information relative to the orientation and general shape of the faces. These eigenvectors are useful in detecting the particular poses of the faces. In contrast, eigenvectors with intermediate eigenvalues contain information specific to small sets of faces across orientations. Eigenvectors with the smallest eigenvalues are specific to particular faces in particular orientations (or in a limited range of orientations). The dissociation between orientation and identity information is illustrated in Fig. 7.
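This range-by-range analysis is easy to reproduce in outline: reconstruct a stimulus from a chosen band of eigenvectors only, as in a truncated form of Eq. (9). A sketch with random stand-in data (the real matrix would hold 5 views × 30 faces = 150 columns):

```python
import numpy as np

# Sketch: reconstruct a face from selected ranges of eigenvectors
# (Eq. (9) restricted to a band of the basis).
rng = np.random.default_rng(0)
I, K = 400, 150                       # hypothetical pixel count, 150 images
X = rng.random((I, K))
X /= np.linalg.norm(X, axis=0)

lam, U = np.linalg.eigh(X @ X.T)
U = U[:, np.argsort(lam)[::-1]]       # columns sorted by decreasing eigenvalue

def band_reconstruct(x, lo, hi):
    """Project x onto eigenvectors lo..hi-1 and back to pixel space."""
    B = U[:, lo:hi]
    return B @ (B.T @ x)

x = X[:, 0]
first10 = band_reconstruct(x, 0, 10)    # orientation and general shape
middle = band_reconstruct(x, 10, 50)    # small sets of faces
last100 = band_reconstruct(x, 50, 150)  # face- and view-specific details
```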

An additional examination of individual eigenvectors showed that for our sample of faces, the first eigenvector represents some kind of average across faces and poses. The second and third eigenvectors oppose profile views to frontal views for all the faces. The projections of a face onto these eigenvectors could be used as a way to detect the orientation of a face presented as input to the memory. A face with a strong negative value on the second eigenvector would be categorized as a profile view and a face with a strong positive value as a frontal view. Likewise, a face with an intermediate negative value would be categorized as an intermediate orientation between profile and 3/4 view, and a face with a positive value as an intermediate orientation between frontal and 3/4 view. Finally, a face with a zero value would be likely to be a 3/4 view. As an illustration of the role of the second eigenvector in coding the orientation of the faces, Fig. 8 shows that when the first two eigenvectors are combined by addition, a profile view of a face appears. Inversely, if we subtract the second eigenvector from the first, we create a frontal view.

Fig. 7. Frontal and profile views of a face reconstructed (from top to bottom) with all the eigenvectors, the first ten eigenvectors, eigenvectors 11–50, and the last 100 eigenvectors of a face autoassociative memory.

Fig. 8. Illustration of the role of the second eigenvector in coding the orientation of the faces. Adding the second eigenvector to the first one gives rise to a profile view; subtracting it gives rise to a frontal view.
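The Fig. 8 manipulation is, in outline, just adding or subtracting the second eigenvector from the first. A sketch with random stand-in data (on real face data, reshaping these vectors to the image size would display the profile-like and frontal-like faces):

```python
import numpy as np

# Sketch of the Fig. 8 manipulation: combine the first two eigenvectors
# of the weight matrix by addition and subtraction. Random data stands in
# for the real 150-image face matrix.
rng = np.random.default_rng(0)
I, K = 400, 150
X = rng.random((I, K))
X /= np.linalg.norm(X, axis=0)

lam, U = np.linalg.eigh(X @ X.T)
U = U[:, np.argsort(lam)[::-1]]       # sort by decreasing eigenvalue

profile_like = U[:, 0] + U[:, 1]      # adding the second eigenvector
frontal_like = U[:, 0] - U[:, 1]      # subtracting the second eigenvector
# On real face data these vectors, reshaped to the image size (115 x 120
# in the paper), would display profile-like and frontal-like faces.
```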

From a theoretical point of view, the dissociation between orientation-specific and identity-specific eigenvectors can be analyzed in terms of both the eigenvalues associated with the eigenvectors and the visual properties of face images. First, the eigenvalues of the face autoassociative memory correspond to the variance of the projections of the faces onto the eigenvectors. The eigenvector with the largest eigenvalue accounts for the largest proportion of variance in the face set, the eigenvector with the second-largest eigenvalue accounts for the second-largest proportion of variance, and so on. Therefore the eigenvectors with large eigenvalues capture the perceptual information that is common to many faces, and those with small eigenvalues capture the perceptual information shared by only a few faces. Second, because faces are highly similar objects—they all share the same features arranged in roughly the same configuration—the correlation between two faces in the same orientation (e.g., two frontal views) is larger than the correlation between two views of a given face (e.g., full face and profile). Therefore the information relative to the orientation of the faces accounts for more variance in the pixel-by-pixel autoassociative memory than the information relative to the identity of the faces. Consequently, it is captured by the eigenvectors with the largest eigenvalues. Finally, since the eigenvectors are orthogonal, the eigenvectors with smaller eigenvalues capture the residual information, which is the highly detailed information useful for discriminating among individual faces.

4. CONCLUSION

The study reported here supports the proposition that it is not necessary to use an explicit 3D representation of faces for a simple linear autoassociator to recognize faces from new view angles. Faces can be represented by a limited set of 2D pixel-based images. The only constraint is that the views need to be spaced closely enough for the model to interpolate between two views and transfer to new views.

An interesting byproduct of using 2D pictures of faces in conjunction with an autoassociative memory is a spontaneous dissociation of information relative to the position of the face and the identity of the face. This dissociation is reminiscent of previous dissociations found between semantic components of faces (e.g., gender, race) and identity (see Ref. 20). The method of displaying eigenvectors and combining them, as shown in this paper, can in principle be applied to any eigenvector-based method and to materials other than faces.

A last point worth noting is that this study is by nature exploratory, and so we did not try to optimize the performance of the model. Further studies are needed to examine whether the performance of the model can be improved by use of more faces or different sets of multiple views. We can imagine, for example, that the use of a set of views sampling the rotation of the head with unequal steps (smaller steps near the critical orientations of 0 and 90 deg and greater steps for intermediate orientations) might help to increase the memory's ability to generalize to new views. Also, taking into account the symmetrical aspect of faces might help to improve the generalization performance of the model. Finally, it is possible that the problem of recognizing faces from a wide range of orientations is not a linear problem and that therefore a nonlinear algorithm (or a classification network operating on an eigenvector representation of the faces) would yield better recognition performance than either the signal-detection approach used here or the nearest-neighbor algorithm used by Pentland et al.23

*Correspondence should be addressed to Dominique Valentin, School of Human Development, The University of Texas at Dallas, Richardson, Texas 75083-0688; e-mail, [email protected].

REFERENCES

1. H. Abdi and D. Valentin, "Modèles neuronaux, connexionnistes et numériques pour la mémoire des visages," Psychol. Fr. 39, 375–391 (1994).

2. D. Valentin, H. Abdi, A. J. O'Toole, and G. W. Cottrell, "Connectionist models of face processing: a survey," Pattern Recog. 27, 1209–1230 (1994).

3. M. Turk and A. Pentland, “Eigenfaces for recognition,”J. Cogn. Neurosci. 3, 71–86 (1991).

4. W. D. Dukes and W. Bevan, "Stimulus variation and repetition in the acquisition of naming responses," J. Exp. Psychol. 74, 178–181 (1967).

5. J. C. Bartlett and J. E. Leslie, "Aging and memory for faces versus single views of faces," Mem. Cognit. 14, 371–381 (1986).

6. D. Marr and H. K. Nishihara, "Representation and recognition of the spatial organization of three-dimensional shapes," Proc. R. Soc. London B 200, 269–294 (1978).

7. E. Rosch, "Principles of categorization," in E. Rosch and B. Lloyd, eds., Cognition and Categorization (Erlbaum, Hillsdale, N.J., 1978), pp. 27–48.

8. S. Edelman and H. Bülthoff, "Orientation dependence in the recognition of familiar and novel views of three-dimensional objects," Vision Res. 32, 2385–2400 (1992).

9. D. G. Lowe, "Three-dimensional object recognition from single two-dimensional images," Artif. Intell. 31, 355–395 (1987).

10. S. Ullman, "Aligning pictorial descriptions: an approach to object recognition," Cognition 32, 97–136 (1989).

11. M. J. Tarr and S. Pinker, "Mental rotation and orientation dependence in shape recognition," Cogn. Psychol. 21, 233–282 (1990).

12. S. Edelman and D. Weinshall, "A self-organizing multiple-view representation of 3-D objects," Biol. Cybern. 64, 209–219 (1991).

13. H. Bülthoff and S. Edelman, "Psychophysical support for a two-dimensional view interpolation theory of object recognition," Proc. Nat. Acad. Sci. USA 89, 60–64 (1992).

14. T. Poggio and S. Edelman, "A network that learns to recognize three-dimensional objects," Nature 343, 263–266 (1990).

15. H. Abdi, "A generalized approach for connectionist auto-associative memories: interpretation, implications and illustration for face processing," in J. Demongeot, ed., Artificial Intelligence and Cognitive Sciences (Manchester U. Press, Manchester, UK, 1988).

16. T. Kohonen, Associative Memory: A System-Theoretic Approach (Springer-Verlag, Berlin, 1977).

17. A. J. O'Toole, R. B. Millward, and J. A. Anderson, "A physical system approach to recognition memory for spatially transformed faces," Neural Networks 1, 179–199 (1988).

18. A. J. O'Toole, H. Abdi, K. A. Deffenbacher, and J. C. Bartlett, "Classifying faces by race and sex using an autoassociative memory trained for recognition," in K. J. Hammond and D. Gentner, eds., Proceedings of the 13th Annual Conference of the Cognitive Science Society (Erlbaum, Hillsdale, N.J., 1991), pp. 847–851.

19. L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of human faces," J. Opt. Soc. Am. A 4, 519–524 (1987).

20. A. J. O'Toole, H. Abdi, K. A. Deffenbacher, and D. Valentin, "A low-dimensional representation of faces in higher dimensions of the face space," J. Opt. Soc. Am. A 10, 405–411 (1993).

21. H. Abdi, D. Valentin, B. Edelman, and A. J. O'Toole, "More about the difference between men and women: evidence from linear neural networks and the principal component approach," Perception 24, 539–562 (1995).

22. H. Murase and S. K. Nayar, "Learning and recognition of 3D objects from appearance," presented at the IEEE 2nd Qualitative Vision Workshop, New York, June 1993.

23. A. Pentland, B. Moghaddam, and T. Starner, "View-based and modular eigenspaces for face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, New York, 1994), pp. 84–91.

24. H. Abdi, Les Réseaux de Neurones (Presses Universitaires de Grenoble, Grenoble, France, 1994).

25. H. Abdi, "A neural network primer," J. Biol. Syst. 2, 247–281 (1994).

26. D. Valentin, H. Abdi, and A. J. O'Toole, "Categorization and identification of human face images by neural networks: a review of the linear autoassociative and principal component approaches," J. Biol. Syst. 2, 413–429 (1994).

27. D. M. Green and J. A. Swets, Signal Detection Theory andPsychophysics (Wiley, New York, 1966).

28. J. A. Anderson, J. W. Silverstein, S. A. Ritz, and R. S. Jones, "Distinctive features, categorical perception, and probability learning: some applications of a neural model," Psychol. Rev. 84, 413–451 (1977).

