
Neurocomputing 82 (2012) 238–249

Letters

Orthogonal tensor rank one differential graph preserving projections with its application to facial expression recognition

Shuai Liu a,b,*, Qiuqi Ruan a,b, Yi Jin a,b

a Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China
b Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044, China

Article info

Article history:

Received 8 August 2011

Received in revised form 29 October 2011

Accepted 9 December 2011

Communicated by Tao Mei

Available online 27 December 2011

Keywords:

Dimensionality reduction

Orthogonal Tensor Rank-one Differential Graph Preserving Projections (OTR1DGPP)

Rank-one basis tensors

Tensor rank-one decomposition

Facial expression recognition

0925-2312/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.neucom.2011.12.011

* Corresponding author at: Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China. Tel.: +86 10 51688402; fax: +86 10 51688402. E-mail addresses: [email protected], 06112050@bjtu.edu.cn (S. Liu).

Abstract

In this paper, a new tensor dimensionality reduction algorithm is proposed based on the graph preserving criterion and tensor rank-one projections. In the algorithm, a novel, effective and convergent orthogonalization process is given based on a differential-form objective function. A set of orthogonal rank-one basis tensors is obtained to preserve the intra-class local manifolds and enhance the inter-class margins. The algorithm is evaluated by applying it to basic facial expression recognition.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

It is often important to obtain a compact representation of high dimensional data in many computer science fields such as pattern recognition and computer vision, so dimensionality reduction has always been a hot research topic over the past decades. The predominant idea is to find projections of the original data into a low dimensional subspace. For example, Principal Component Analysis (PCA) [1] and Linear Discriminant Analysis (LDA) [2], the classical dimensionality reduction algorithms, respectively find the projections with maximal variance and the projections optimal for classification. Both of them presume that the data distribution is linear and Gaussian (i.e. the mean and the variance reflect the main characteristics of the distribution), so they may not work for non-linear distributions. Recently, manifold learning algorithms have been developed to resolve such problems, the most famous ones being Neighborhood Preserving Embedding (NPE) [3] and Locality Preserving Projection (LPP) [4]. NPE finds projections maintaining a linear relationship of the original points within a small local range, while LPP finds projections preserving the relative distances between local points. Furthermore, Yan et al. [5] propose a general framework that explains most dimensionality reduction algorithms in terms of geometric graphs and a graph preserving criterion. However, all the above dimensionality reduction algorithms work in the vector space, and thus they need to transform the data into vectors beforehand. This leads to some serious problems for intrinsically non-vector data, e.g., images and videos. First, it usually generates a very high dimensional input space, which brings the curse of dimensionality; second, the number of training samples is usually too small vis-à-vis the high dimensionality of the transformed vectors, resulting in the Small Sample Size (SSS) problem. Third, the inner spatial redundancy of each non-vector datum is not exploited, e.g. the redundancy among the columns and the rows of images. To overcome these problems, tensor representation [8] has recently been introduced into the traditional dimensionality reduction algorithms. With the tensor representation, we can directly project the original multi-dimensional data (high order tensors) onto a set of basis tensors (each basis tensor has the same dimensions and spatial structure as the original datum) [12]. For example, given a group of images of size m×n, an m-dimensional and an n-dimensional basis vector are first estimated from the column space and the row space, respectively, and their tensor product (refer to Section 2.1) constructs a basis tensor; then the projection of an image is obtained by the inner product (refer to Section 2.1) between the image and the basis tensor. The dimension of the column or row space of the multi-dimensional data is usually not very high, so the curse of dimensionality and the SSS problem can be avoided. Moreover, the spatial


redundancy information of the original multi-dimensional data is embodied in the spatial structure of the basis tensors. Tensor rank-one decomposition [9–11] is an effective technique for obtaining the basis tensors. In [9], a collection of matrices (second order tensors) is approximated by a linear combination of rank-one matrices and the corresponding coefficients. In [10], LDA is extended to handle high order tensors based on tensor rank-one decomposition, where the Differential Scatter Discriminant Criterion (DSDC) (a generalization of the Fisher discriminant criterion) is adopted to ensure convergence during the iterative training process. Furthermore, Hua et al. [11] pursue orthogonal rank-one basis tensors when performing discriminant analysis based on the graph embedding framework [5], where the orthogonalization of the rank-one basis tensors can bring more discriminative power (similar to orthogonal projection vectors [13]). However, there exist some problems with the orthogonalization strategy they adopted: first, the number of orthogonal rank-one basis tensors is severely confined by the dimension of the column space (e.g., for 32×32-sized images the number is at most 32); second, the orthogonal rank-one basis tensors are resolved based on a ratio-form objective function in an iterative manner, which is hard to make converge. Besides tensor rank-one decomposition, the Tucker tensor projection [6,7,21–24] is also widely used to find the basis tensors, where a group of transformation matrices is obtained and the tensor products of their columns serve as the basis tensors. We have summarized this kind of method and investigated its orthogonalization in [12].

In this paper, we propose a new tensor dimensionality reduction algorithm, Orthogonal Tensor Rank One Differential Graph Preserving Projections (OTR1DGPP), based on tensor rank-one decomposition and a differential-form graph preserving criterion (a variation of the ratio-form criterion in [5]). Following [10], we use the differential criterion to construct an objective function that preserves the local intra-class manifold and simultaneously enhances the pairwise inter-class margins. Different from [11], we give a novel and effective orthogonalization process to pursue the orthogonal rank-one basis tensors based on the differential-form objective function, which is more flexible and is theoretically and experimentally shown to converge during the iterative training procedure. In recent years, automatic facial expression recognition has attracted great attention due to its wide range of applications in human–computer interaction and robotics. Some dimensionality reduction methods based on manifold learning have been applied to facial expression recognition [17–20], because the variation of facial expressions forms low dimensional manifolds embedded in the high dimensional space. However, all of them are performed on vector representations, so much intrinsic structural information (e.g. the row and column structure of images), which may be helpful for discovering the facial expression manifolds, is discarded. In the experiments, we explore the potential of our algorithm for basic facial expression recognition, and compare it with some related state-of-the-art dimensionality reduction algorithms, including the Tucker-tensor-projection-based discriminant algorithms [21–24] and our former orthogonal Tucker tensor projection algorithm [12]. The proposed algorithm is expected to more effectively preserve and distinguish the facial expression manifolds through the orthogonal tensor rank-one projections. However, the aim of this paper is not facial expression recognition per se; we just want to evaluate our algorithm and verify the advantage of the orthogonal rank-one basis tensors obtained by our algorithm for discovering the manifolds hidden in a high dimensional tensor space (especially the facial expression manifolds in the face image space). For clarity, in this paper the variables' fonts obey the following rules: scalars are denoted by normal symbols such as i, l, h, N; vectors are denoted by italic bold lowercase symbols such as x, y, u; matrices are denoted by italic uppercase symbols such as U, W, X, S; and tensors are denoted by bold uppercase symbols such as A, B, X.

The rest of the paper is organized as follows. In Section 2 we introduce some background knowledge, and in Section 3 we present our algorithm; we give the experimental results in Section 4, and discuss and conclude the paper in Section 5.

2. The background knowledge

2.1. Basic tensor algebra

An $n$th-order tensor is represented as $\mathbf{A} \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_n}$; it has $n$ modes, and the dimension of the $k$th mode is $m_k$ ($1 \le k \le n$). The element of $\mathbf{A}$ is denoted by $A_{i_1 i_2 \cdots i_n}$, where $1 \le i_k \le m_k$, $1 \le k \le n$. The inner product of two tensors $\mathbf{A}, \mathbf{B} \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_n}$ is $\langle \mathbf{A}, \mathbf{B} \rangle = \sum_{i_1=1}^{m_1} \cdots \sum_{i_n=1}^{m_n} A_{i_1 \cdots i_n} B_{i_1 \cdots i_n}$, and the norm of tensor $\mathbf{A}$ is $\|\mathbf{A}\| = \sqrt{\langle \mathbf{A}, \mathbf{A} \rangle}$. The tensor product of $\mathbf{A} \in \mathbb{R}^{m_1 \times \cdots \times m_n}$ and $\mathbf{B} \in \mathbb{R}^{p_1 \times \cdots \times p_q}$ is a new $(n+q)$th-order tensor $\mathbf{C} = \mathbf{A} \otimes \mathbf{B} \in \mathbb{R}^{m_1 \times \cdots \times m_n \times p_1 \times \cdots \times p_q}$, where $C_{i_1 i_2 \cdots i_n j_1 j_2 \cdots j_q} = A_{i_1 i_2 \cdots i_n} B_{j_1 j_2 \cdots j_q}$ ($1 \le i_k \le m_k$, $1 \le k \le n$, $1 \le j_s \le p_s$, $1 \le s \le q$). The mode-$k$ product between a tensor $\mathbf{A} \in \mathbb{R}^{m_1 \times \cdots \times m_n}$ and a matrix $U \in \mathbb{R}^{m_k \times m'_k}$ is a new tensor $\bar{\mathbf{A}} = \mathbf{A} \times_k U \in \mathbb{R}^{m_1 \times \cdots \times m'_k \times \cdots \times m_n}$, where $\bar{A}_{i_1, \ldots, i_{k-1}, q, i_{k+1}, \ldots, i_n} = \sum_{p=1}^{m_k} A_{i_1, \ldots, i_{k-1}, p, i_{k+1}, \ldots, i_n} U_{p,q}$, $q = 1, \ldots, m'_k$.

If a tensor $\mathbf{U} \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_n}$ is equal to the tensor product of $n$ vectors $u^k \in \mathbb{R}^{m_k}$ ($k = 1, \ldots, n$), i.e. $\mathbf{U} = u^1 \otimes u^2 \otimes \cdots \otimes u^n$, then we say it is a rank-one tensor, and $u^k$ is called its $k$th component vector ($1 \le k \le n$). The projection of a tensor $\mathbf{A} \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_n}$ onto a rank-one tensor $\mathbf{U} = u^1 \otimes u^2 \otimes \cdots \otimes u^n$ is the scalar $a = \langle \mathbf{A}, \mathbf{U} \rangle$. Given two rank-one tensors $\mathbf{U}_1 = u_1^1 \otimes u_1^2 \otimes \cdots \otimes u_1^n$ and $\mathbf{U}_2 = u_2^1 \otimes u_2^2 \otimes \cdots \otimes u_2^n$, if $\langle \mathbf{U}_1, \mathbf{U}_2 \rangle = 0$ then they are orthogonal (denoted by $\mathbf{U}_1 \perp \mathbf{U}_2$), and it can be inferred that at least one pair of their corresponding component vectors is orthogonal, i.e. $\langle u_1^j, u_2^j \rangle = 0$ for some index $j$ between 1 and $n$ (refer to [12] for a detailed explanation).
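To make these operations concrete, the following is a minimal NumPy sketch of the inner product, norm, rank-one (outer) tensor product and mode-k product defined above; the function names, shapes and example data are ours, not taken from the paper.

```python
# A minimal NumPy sketch of the tensor operations of Section 2.1.
import numpy as np

def inner(A, B):
    """Tensor inner product <A, B>: sum of elementwise products."""
    return float(np.sum(A * B))

def norm(A):
    """Tensor norm ||A|| = sqrt(<A, A>)."""
    return np.sqrt(inner(A, A))

def mode_k_product(A, U, k):
    """Mode-k product A x_k U for a matrix U of shape (m_k, m_k')."""
    A_k = np.moveaxis(A, k, 0)                     # bring mode k to the front
    out = np.tensordot(U.T, A_k, axes=([1], [0]))  # contract over mode k
    return np.moveaxis(out, 0, k)                  # move the new mode back

def rank_one(vectors):
    """Rank-one tensor u^1 o u^2 o ... o u^n from its component vectors."""
    T = vectors[0]
    for v in vectors[1:]:
        T = np.multiply.outer(T, v)
    return T

# Example: the projection <X, U> of a 3rd-order tensor onto a rank-one tensor
# equals the chain of mode-k products X x_1 u^1 x_2 u^2 x_3 u^3 (cf. Eq. (5)).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
us = [rng.standard_normal(m) for m in (4, 5, 6)]
U = rank_one(us)

p1 = inner(X, U)
P = X
for k, u in enumerate(us):
    P = mode_k_product(P, u.reshape(-1, 1), k)     # each product shrinks mode k to size 1
p2 = float(P.squeeze())
assert np.isclose(p1, p2)
```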

2.2. The graph preserving criterion

Given $N$ original high dimensional vector samples $x_1, x_2, \ldots, x_N \in \mathbb{R}^m$ (the training samples), dimensionality reduction aims to find effective low dimensional representations $y_1, y_2, \ldots, y_N \in \mathbb{R}^{m'}$, where $m' \ll m$. The graph embedding framework [5] explains most dimensionality reduction algorithms by the following graph preserving criterion:

$$Y = \arg\min_{\sum_i \|y_i\|^2 A_{ii} = d \;\text{or}\; \sum_{i \neq j} \|y_i - y_j\|^2 S^p_{ij} = d} \; \sum_{i \neq j} \|y_i - y_j\|^2 S_{ij} = \arg\min \frac{\mathrm{tr}(Y L Y^T)}{\mathrm{tr}(Y B Y^T)} \qquad (1)$$

In Eq. (1), $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{m' \times N}$; $S$ is the similarity matrix of an intrinsic graph $G$ constructed from the original samples, whose elements represent the similarities between the corresponding samples that are to be preserved; $A$ is a diagonal matrix for scale normalization; $S^p$ is the similarity matrix of a penalty graph $G^p$, whose elements represent the similarities to be suppressed; $L$ is the Laplacian matrix of $S$, i.e. $L = D - S$, where $D$ is a diagonal matrix with $D_{ii} = \sum_{j \neq i} S_{ij}$, $\forall i$; $B$ is either $A$ or the Laplacian matrix of $S^p$, i.e. $B = L^p = D^p - S^p$, where $D^p$ is a diagonal matrix with $D^p_{ii} = \sum_{j \neq i} S^p_{ij}$, $\forall i$; $d$ is a constant; and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. The principle of the graph preserving criterion can be explained as follows: "For larger (positive) similarity between samples $x_i$ and $x_j$, the distance between $y_i$ and $y_j$ should be smaller to minimize the objective function. Likewise, smaller (negative) similarity between $x_i$ and $x_j$ should lead to larger distances between $y_i$ and $y_j$ for minimization" [5]. According to the graph preserving criterion, one can preserve different geometric or statistical properties of the original samples by designing different $S$ and $S^p$. However, the objective function (1) can only find the low dimensional representations of the training samples. To extend it to any sample, linearization is commonly used, which finds the low dimensional representation $y$ by mapping an original sample $x$ onto a set of projection vectors $w_1, w_2, \ldots, w_{m'}$, i.e. $y_i = x_i^T W$ ($i = 1, \ldots, N$), where $W = [w_1, w_2, \ldots, w_{m'}]$. Let $X = [x_1, x_2, \ldots, x_N]$; then $W$ can be obtained by optimizing the following objective function:

$$W = \arg\min_W \frac{\mathrm{tr}(W^T X L X^T W)}{\mathrm{tr}(W^T X L^p X^T W)} \qquad (2)$$
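As an illustration of the linearized criterion (2), the sketch below builds the two Laplacian matrices from given similarity matrices and solves the widely used ratio-trace relaxation of (2) as a generalized eigenvalue problem; the relaxation, the regularization term and all names are our assumptions rather than the paper's implementation.

```python
# Hedged sketch of the linearized criterion (2) via the common ratio-trace
# relaxation: solve a generalized eigenvalue problem instead of the exact
# trace-ratio optimization.
import numpy as np
from scipy.linalg import eigh

def laplacian(S):
    """L = D - S with D_ii = sum_{j != i} S_ij (diagonal of S ignored)."""
    S = S - np.diag(np.diag(S))
    return np.diag(S.sum(axis=1)) - S

def linear_graph_preserving_projections(X, S, Sp, m_out, reg=1e-6):
    """X: (m, N) data matrix; returns W of shape (m, m_out)."""
    L, Lp = laplacian(S), laplacian(Sp)
    A = X @ L @ X.T                              # intrinsic scatter (to be minimized)
    B = X @ Lp @ X.T + reg * np.eye(X.shape[0])  # penalty scatter (regularized)
    vals, vecs = eigh(A, B)                      # A w = lambda B w, ascending order
    return vecs[:, :m_out]                       # directions with smallest ratios

# Illustrative usage with random data and random symmetric similarity graphs.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 50))                # 50 samples in R^20
S = rng.random((50, 50)); S = (S + S.T) / 2
Sp = rng.random((50, 50)); Sp = (Sp + Sp.T) / 2
W = linear_graph_preserving_projections(X, S, Sp, m_out=5)
Y = W.T @ X                                      # low dimensional representations
```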

3. The algorithm

3.1. The overview of the algorithm

Here we propose a tensor dimensionality reduction algorithm according to the above graph preserving criterion, which directly handles multi-dimensional tensors without transforming them into vectors. In contrast with the linear projection for vector samples as in (2), given $N$ tensor samples $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_N \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_n}$, we aim to find a set of orthogonal rank-one tensors $\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_R$ (i.e. $\mathbf{U}_1 \perp \mathbf{U}_2 \perp \cdots \perp \mathbf{U}_R$) for projection, and we call them the orthogonal rank-one basis tensors. Considering the one-dimensional case, suppose $\mathbf{U}_r = u_r^1 \otimes u_r^2 \otimes \cdots \otimes u_r^n$ ($1 \le r \le R$) is any rank-one tensor; then the projection of the tensor $\mathbf{X}_i$ ($1 \le i \le N$) onto $\mathbf{U}_r$ is represented as the inner product between them, i.e. $y_{i,r} = \langle \mathbf{X}_i, \mathbf{U}_r \rangle$, which is a one-dimensional representation of $\mathbf{X}_i$. Then according to the graph preserving criterion, the objective function (1) becomes

$$\mathbf{U}_r = \arg\min_{\sum_{i \neq j} \|y_{i,r} - y_{j,r}\|^2 S^p_{ij} = d} \; \sum_{i \neq j} \|y_{i,r} - y_{j,r}\|^2 S_{ij} = \arg\min_{\sum_{i \neq j} (\langle \mathbf{X}_i, \mathbf{U}_r \rangle - \langle \mathbf{X}_j, \mathbf{U}_r \rangle)^2 S^p_{ij} = d} \; \sum_{i \neq j} (\langle \mathbf{X}_i, \mathbf{U}_r \rangle - \langle \mathbf{X}_j, \mathbf{U}_r \rangle)^2 S_{ij}. \qquad (3)$$

Moreover, to obtain the orthogonal rank-one basis tensors, we add the orthogonality constraints $\mathbf{U}_r \perp \mathbf{U}_{r-1}, \mathbf{U}_r \perp \mathbf{U}_{r-2}, \ldots, \mathbf{U}_r \perp \mathbf{U}_1$ for $r > 1$ to the objective function (3). And motivated by the Differential Scatter Discriminant Criterion adopted in [10], which is more likely to converge, we adapt (3) into the differential form

$$\mathbf{U}_r = \arg\max \Big\{ \sum_{i \neq j} (\langle \mathbf{X}_i, \mathbf{U}_r \rangle - \langle \mathbf{X}_j, \mathbf{U}_r \rangle)^2 S^p_{ij} - h \sum_{i \neq j} (\langle \mathbf{X}_i, \mathbf{U}_r \rangle - \langle \mathbf{X}_j, \mathbf{U}_r \rangle)^2 S_{ij} \Big\}, \qquad (4)$$

s.t. $\mathbf{U}_r \perp \mathbf{U}_{r-1}, \mathbf{U}_r \perp \mathbf{U}_{r-2}, \ldots, \mathbf{U}_r \perp \mathbf{U}_1$ ($r > 1$), where $h$ is a balancing parameter.

Furthermore, it can be proved that (see Appendix A)

$$\langle \mathbf{X}_i, \mathbf{U}_r \rangle = \mathbf{X}_i \times_1 u_r^1 \times_2 u_r^2 \cdots \times_n u_r^n. \qquad (5)$$

Thus from (4) and (5), we finally obtain the following objective function

$$[u_r^1, \ldots, u_r^n] = \arg\max_{u_r^1, \ldots, u_r^n} \Big\{ \sum_{i \neq j} \| (\mathbf{X}_i - \mathbf{X}_j) \times_1 u_r^1 \cdots \times_n u_r^n \|^2 S^p_{ij} - h \sum_{i \neq j} \| (\mathbf{X}_i - \mathbf{X}_j) \times_1 u_r^1 \cdots \times_n u_r^n \|^2 S_{ij} \Big\} \qquad (6)$$

s.t. $(u_r^{k_{r-1}})^T u_{r-1}^{k_{r-1}} = (u_r^{k_{r-2}})^T u_{r-2}^{k_{r-2}} = \cdots = (u_r^{k_1})^T u_1^{k_1} = 0$ for $r > 1$, and $(u_r^k)^T u_r^k = 1$ for all $r$, where $k = 1, \ldots, n$, $1 \le k_1, \ldots, k_{r-1} \le n$, and $u_i^j$ is the $j$th component vector of the $i$th rank-one basis tensor ($1 \le i \le R$, $1 \le j \le n$).

Here in our algorithm, $S$ is the similarity matrix of the intrinsic graph constructed by NPE [3], which is defined as

$$S_{ij} = \begin{cases} (M + M^T - M^T M)_{ij}, & \text{if } i \neq j; \\ 0, & \text{otherwise}. \end{cases} \qquad (7)$$

In (7), $M$ is resolved by optimizing the following objective function:

$$\min \sum_i \Big\| \mathbf{X}_i - \sum_{j \in N_{k_1}(i)} M_{ij} \mathbf{X}_j \Big\|^2 \quad \text{s.t.} \quad \sum_{j \in N_{k_1}(i)} M_{ij} = 1, \qquad (8)$$

where $N_{k_1}(i)$ represents the nearest $k_1$ neighbors of $\mathbf{X}_i$ (refer to [3] for details). And $S^p$ is defined as

$$S^p_{ij} = \begin{cases} \exp(-\|\mathbf{X}_i - \mathbf{X}_j\|^2 / t^2)/c, & \text{if } (\mathbf{X}_i, \mathbf{X}_j) \in N_{k_2}(l_i, l_j), \\ 0, & \text{otherwise}, \end{cases} \qquad (9)$$

where $l_i$ indicates the class that $\mathbf{X}_i$ belongs to, and $N_{k_2}(l_i, l_j)$ represents the nearest $k_2$ pairs of marginal points between class $l_i$ and class $l_j$. In our algorithm, $S$ reflects the intra-class local manifold structure, while $S^p$ describes the boundaries between pairwise different classes. There are two constraints on (6): the first ensures that at least one pair of corresponding component vectors of the rank-one basis tensors is orthogonal, where each of the superscript indexes $k_1, \ldots, k_{r-1}$ ($1 < r \le R$) is an arbitrary number between 1 and $n$ (this constraint is equivalent to $\mathbf{U}_r \perp \mathbf{U}_{r-1}, \mathbf{U}_r \perp \mathbf{U}_{r-2}, \ldots, \mathbf{U}_r \perp \mathbf{U}_1$); the second constraint regulates the magnitude of the component vectors of the rank-one basis tensors. Overall, the aim of the objective function (6) of our algorithm is to find a set of orthogonal rank-one basis tensors that enhance the inter-class margins and meanwhile keep the local intra-class manifold structure.

3.2. The detailed procedure of the algorithm

There is no closed-form solution for (6), so we use an iterative resolving procedure as in [10,11]. First, we consider obtaining the first rank-one basis tensor, i.e. $\mathbf{U}_1 = u_1^1 \otimes u_1^2 \otimes \cdots \otimes u_1^n$. Suppose only one component vector $u_1^q$ is unknown; then the objective function (6) becomes

$$u_1^q = \arg\max_{u_1^q} \Big( \sum_{i \neq j} \| (x_i^{(q,1)} - x_j^{(q,1)}) \times_q u_1^q \|^2 S^p_{ij} - h \sum_{i \neq j} \| (x_i^{(q,1)} - x_j^{(q,1)}) \times_q u_1^q \|^2 S_{ij} \Big)$$
$$= \arg\max_{u_1^q} \Big( \sum_{i \neq j} \| (x_i^{(q,1)} - x_j^{(q,1)})^T u_1^q \|^2 S^p_{ij} - h \sum_{i \neq j} \| (x_i^{(q,1)} - x_j^{(q,1)})^T u_1^q \|^2 S_{ij} \Big)$$
$$= \arg\max_{u_1^q} \big\{ (u_1^q)^T [ X^{(q,1)} L^p (X^{(q,1)})^T - h X^{(q,1)} L (X^{(q,1)})^T ] u_1^q \big\} \qquad (10)$$

s.t. $(u_1^q)^T u_1^q = 1$.

In (10), $x_i^{(q,1)} = \mathbf{X}_i \times_1 u_1^1 \cdots \times_{q-1} u_1^{q-1} \times_{q+1} u_1^{q+1} \cdots \times_n u_1^n \in \mathbb{R}^{m_q}$, $X^{(q,1)} = [x_1^{(q,1)}, x_2^{(q,1)}, \ldots, x_N^{(q,1)}] \in \mathbb{R}^{m_q \times N}$, $L^p$ and $L$ are respectively the Laplacian matrices of $S^p$ and $S$, and there is no orthogonality constraint on the component vectors of the first rank-one basis tensor. The proof that $x_i^{(q,1)} \times_q u_1^q = (x_i^{(q,1)})^T u_1^q$ is easy, and the reader can refer to [10] for details. It is well known that the optimal $u_1^q$ for (10) can be obtained as the eigenvector of $X^{(q,1)} L^p (X^{(q,1)})^T - h X^{(q,1)} L (X^{(q,1)})^T$ associated with the largest eigenvalue. After that, $u_1^q$ is normalized as $u_1^q = u_1^q / \|u_1^q\|$. This procedure is repeated over $q = 1, \ldots, n$ one by one until all of $u_1^1, u_1^2, \ldots, u_1^n$ converge, and then the first rank-one basis tensor is obtained as $\mathbf{U}_1 = u_1^1 \otimes u_1^2 \otimes \cdots \otimes u_1^n$.
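For 2nd-order tensors (matrices), the alternating eigen-solve for the first rank-one basis tensor described above can be sketched as follows; the Laplacians are assumed to be precomputed from (7) and (9), and the fixed iteration count stands in for the convergence test of Section 3.3.

```python
# Hedged sketch: alternating eigen-solve of Eq. (10) for the first rank-one
# basis tensor U1 = u1 (x) u2 in the 2nd-order (matrix) case.
import numpy as np
from scipy.linalg import eigh

def leading_eigvec(G):
    """Unit-norm eigenvector of (symmetrized) G with the largest eigenvalue."""
    vals, vecs = eigh((G + G.T) / 2)
    v = vecs[:, -1]
    return v / np.linalg.norm(v)

def first_rank_one_tensor(Xs, L, Lp, h=0.1, n_iter=10):
    """Xs: list of N matrices of shape (m1, m2); L, Lp: (N, N) Laplacians."""
    m1, m2 = Xs[0].shape
    u1, u2 = np.ones(m1) / np.sqrt(m1), np.ones(m2) / np.sqrt(m2)
    for _ in range(n_iter):
        # Fix u2, solve for u1: the columns of X^(1) are X_i u2 (Eq. (10), q = 1).
        X1 = np.stack([Xi @ u2 for Xi in Xs], axis=1)          # (m1, N)
        u1 = leading_eigvec(X1 @ Lp @ X1.T - h * X1 @ L @ X1.T)
        # Fix u1, solve for u2: the columns of X^(2) are X_i^T u1 (q = 2).
        X2 = np.stack([Xi.T @ u1 for Xi in Xs], axis=1)        # (m2, N)
        u2 = leading_eigvec(X2 @ Lp @ X2.T - h * X2 @ L @ X2.T)
    return u1, u2   # U1 = np.outer(u1, u2)
```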

With regard to the $r$th ($r > 1$) rank-one basis tensor $\mathbf{U}_r$, it should be orthogonal to the previous $r-1$ ones $\mathbf{U}_{r-1}, \ldots, \mathbf{U}_1$. However, the orthogonality constraint in (6) is too complicated and unclear to be realized directly. To make it tractable, we modify it as follows. First we sequentially partition the total $R$ rank-one basis tensors into $n$ sets: $\{\mathbf{U}_{s_0+1} \sim \mathbf{U}_{s_1}\}, \{\mathbf{U}_{s_1+1} \sim \mathbf{U}_{s_2}\}, \ldots, \{\mathbf{U}_{s_{n-1}+1} \sim \mathbf{U}_{s_n}\}$, where $s_0 = 0$, $s_n = R$, and the size of the $k$th set is less than $m_k$. If $\mathbf{U}_r$ lies in the $k$th set, then the orthogonality constraint in (6) is replaced by

$$\begin{cases} (u_r^1)^T u_1^1 = (u_r^1)^T u_2^1 = \cdots = (u_r^1)^T u_{s_1}^1 = 0, \\ (u_r^2)^T u_{s_1+1}^2 = (u_r^2)^T u_{s_1+2}^2 = \cdots = (u_r^2)^T u_{s_2}^2 = 0, \\ \quad\vdots \\ (u_r^k)^T u_{s_{k-1}+1}^k = (u_r^k)^T u_{s_{k-1}+2}^k = \cdots = (u_r^k)^T u_{r-1}^k = 0. \end{cases} \qquad (11)$$

First, the constraint in (11) is stronger than that in (6), thus it is enough to ensure the orthogonality of $\mathbf{U}_r$. Second, the constraint is not as strong as that in [11], thus we can get more orthogonal rank-one basis tensors than the algorithm in [11]: in [11], at most $R = \max(m_1, m_2, \ldots, m_n)$ orthogonal rank-one basis tensors can be obtained, while we can get as many as $m_1 + m_2 + \cdots + m_n$ by (11). To make this orthogonality constraint more understandable, we illustrate it with an example. Suppose there are in total 10 rank-one basis tensors $\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_{10} \in \mathbb{R}^{m_1 \times m_2 \times m_3}$, where $\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_8$ are already known and $\mathbf{U}_9$ is the current one to be resolved. We partition them into three sets, $\{\mathbf{U}_1 \sim \mathbf{U}_3\}$, $\{\mathbf{U}_4 \sim \mathbf{U}_6\}$, $\{\mathbf{U}_7 \sim \mathbf{U}_{10}\}$; then the orthogonality constraints for the three component vectors of $\mathbf{U}_9$, i.e. $u_9^1$, $u_9^2$ and $u_9^3$, are shown in Fig. 1.

Now we resolve $\mathbf{U}_r$ according to the objective function (6) and the constraint (11). Suppose that $u_r^1, \ldots, u_r^{q-1}, u_r^{q+1}, \ldots, u_r^n$ are known; then we analyze how to fix $u_r^q$ as follows. First, if $q < k$, $u_r^q$ needs to be orthogonal to $u_{s_{q-1}+1}^q, \ldots, u_{s_q}^q$ according to (11), thus the objective function becomes

$$u_r^q = \arg\max_{u_r^q} \big\{ (u_r^q)^T [ X^{(q,r)} L^p (X^{(q,r)})^T - h X^{(q,r)} L (X^{(q,r)})^T ] u_r^q \big\}$$
$$\text{s.t. } (u_r^q)^T u_{s_{q-1}+1}^q = (u_r^q)^T u_{s_{q-1}+2}^q = \cdots = (u_r^q)^T u_{s_q}^q = 0, \quad (u_r^q)^T u_r^q = 1. \qquad (12)$$

In (12), $X^{(q,r)} = [x_1^{(q,r)}, x_2^{(q,r)}, \ldots, x_N^{(q,r)}]$, and $x_i^{(q,r)} = \mathbf{X}_i \times_1 u_r^1 \cdots \times_{q-1} u_r^{q-1} \times_{q+1} u_r^{q+1} \cdots \times_n u_r^n$ ($i = 1, \ldots, N$).

Let $G = X^{(q,r)} L^p (X^{(q,r)})^T - h X^{(q,r)} L (X^{(q,r)})^T$; then we construct the Lagrange function as

$$f(u_r^q) = (u_r^q)^T G u_r^q - \lambda_{s_{q-1}+1} (u_r^q)^T u_{s_{q-1}+1}^q - \lambda_{s_{q-1}+2} (u_r^q)^T u_{s_{q-1}+2}^q - \cdots - \lambda_{s_q} (u_r^q)^T u_{s_q}^q - \lambda \big( (u_r^q)^T u_r^q - 1 \big). \qquad (13)$$

Fig. 1. An example of the orthogonality constraint in (11). There are in total $R = 10$ rank-one basis tensors ($\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_{10} \in \mathbb{R}^{m_1 \times m_2 \times m_3}$), where the first 8 of them are already known and $\mathbf{U}_9$ is the current one to be resolved.

The optimization of (12) is performed by setting the partial derivative $df(u_r^q)/du_r^q$ to zero, i.e.

$$2 G u_r^q - \lambda_{s_{q-1}+1} u_{s_{q-1}+1}^q - \lambda_{s_{q-1}+2} u_{s_{q-1}+2}^q - \cdots - \lambda_{s_q} u_{s_q}^q - 2\lambda u_r^q = 0. \qquad (14)$$

Multiplying (14) by $(u_r^q)^T$ we have

$$2 (u_r^q)^T G u_r^q - 2\lambda = 0, \quad \text{i.e.} \quad \lambda = (u_r^q)^T G u_r^q. \qquad (15)$$

Comparing (15) with (12), we find that $\lambda$ exactly represents the objective function to be maximized. Again multiplying (14) successively by $(u_{s_{q-1}+1}^q)^T, (u_{s_{q-1}+2}^q)^T, \ldots, (u_{s_q}^q)^T$, we have $2 (u_{s_{q-1}+1}^q)^T G u_r^q - \lambda_{s_{q-1}+1} = 0, \ldots, 2 (u_{s_q}^q)^T G u_r^q - \lambda_{s_q} = 0$, i.e.

$$\lambda_{s_{q-1}+1} = 2 (u_{s_{q-1}+1}^q)^T G u_r^q, \quad \ldots, \quad \lambda_{s_q} = 2 (u_{s_q}^q)^T G u_r^q. \qquad (16)$$

According to (14) and (16), we obtain $2 G u_r^q - 2 u_{s_{q-1}+1}^q (u_{s_{q-1}+1}^q)^T G u_r^q - \cdots - 2 u_{s_q}^q (u_{s_q}^q)^T G u_r^q - 2\lambda u_r^q = 0$, i.e.

$$\big( I - u_{s_{q-1}+1}^q (u_{s_{q-1}+1}^q)^T - \cdots - u_{s_q}^q (u_{s_q}^q)^T \big) G u_r^q = \lambda u_r^q. \qquad (17)$$

In (17), $I$ is the identity matrix. We know that maximizing the objective function (12) is equivalent to maximizing $\lambda$ in (17); thus $u_r^q$ can be calculated as the eigenvector of $\big( I - u_{s_{q-1}+1}^q (u_{s_{q-1}+1}^q)^T - \cdots - u_{s_q}^q (u_{s_q}^q)^T \big) G$ associated with the largest eigenvalue.
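Before turning to the remaining cases, here is a compact sketch of this projected eigen-step; using the symmetric form P G P with the projector P = I − Σ u uᵀ is a choice of this sketch (it yields the same constrained maximizer as (I − Σ u uᵀ)G when the top eigenvalue is positive), not the paper's exact implementation.

```python
# Hedged sketch of the constrained eigen-step behind Eq. (17): maximize u^T G u
# subject to u being orthogonal to a list of already-fixed component vectors.
import numpy as np
from scipy.linalg import eigh

def constrained_leading_eigvec(G, fixed_vectors):
    """Top unit eigenvector of G restricted to the complement of fixed_vectors."""
    m = G.shape[0]
    P = np.eye(m)
    for u in fixed_vectors:
        u = u / np.linalg.norm(u)
        P -= np.outer(u, u)                 # deflate the already-used direction
    M = P @ ((G + G.T) / 2) @ P             # symmetric projected problem
    vals, vecs = eigh(M)
    v = vecs[:, -1]
    return v / np.linalg.norm(v)
```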

Second, if $q = k$, the constraint (11) becomes

$$(u_r^q)^T u_{s_{q-1}+1}^q = (u_r^q)^T u_{s_{q-1}+2}^q = \cdots = (u_r^q)^T u_{r-1}^q = 0, \quad (u_r^q)^T u_r^q = 1. \qquad (18)$$

The resolving procedure for the objective function (12) under constraint (18) is the same as when $q < k$, and it can be inferred that $u_r^q$ is the eigenvector of $\big( I - u_{s_{q-1}+1}^q (u_{s_{q-1}+1}^q)^T - \cdots - u_{r-1}^q (u_{r-1}^q)^T \big) G$ associated with the largest eigenvalue in the case $q = k$. Finally, if $q > k$, there is no orthogonality constraint on $u_r^q$ according to (11), thus $u_r^q$ is calculated as the eigenvector of $X^{(q,r)} L^p (X^{(q,r)})^T - h X^{(q,r)} L (X^{(q,r)})^T$ associated with the largest eigenvalue (just like $u_1^q$ in (10)). Then we normalize $u_r^q$ by $u_r^q = u_r^q / \|u_r^q\|$. The above procedure (starting from (12)) is repeated for $q = 1, \ldots, n$ one by one until all of $u_r^1, u_r^2, \ldots, u_r^n$ converge, and finally we get the $r$th orthogonal rank-one basis tensor as $\mathbf{U}_r = u_r^1 \otimes u_r^2 \otimes \cdots \otimes u_r^n$ ($r > 1$). Let $r = r + 1$ and continue the above procedure until all the $R$ orthogonal rank-one basis tensors $\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_R$ are resolved. Then for any tensor $\mathbf{X} \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_n}$, we can get its low dimensional representation as $y = (\langle \mathbf{X}, \mathbf{U}_1 \rangle, \langle \mathbf{X}, \mathbf{U}_2 \rangle, \ldots, \langle \mathbf{X}, \mathbf{U}_R \rangle) \in \mathbb{R}^R$. It is worth noting that the orthogonalization process from (12) to (17) is newly devised based on the differential-form objective function, which is different from that in [13]. The complete procedure of the algorithm is given in pseudo-code in Table 1.
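The resulting feature extraction is simply a set of inner products with the learned basis tensors; a minimal sketch (assuming the basis tensors are already available as arrays) is:

```python
# Hedged sketch: low dimensional representation y = (<X, U_1>, ..., <X, U_R>)
# for a new tensor X, given learned rank-one basis tensors U_1, ..., U_R.
import numpy as np

def project_onto_basis_tensors(X, basis_tensors):
    """Return the R-dimensional feature vector of tensor X."""
    return np.array([float(np.sum(X * U)) for U in basis_tensors])

# Example: three random rank-one basis tensors for 8x8 samples.
rng = np.random.default_rng(3)
basis = [np.outer(rng.standard_normal(8), rng.standard_normal(8)) for _ in range(3)]
y = project_onto_basis_tensors(rng.standard_normal((8, 8)), basis)   # shape (3,)
```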

3.3. The convergence analysis of the algorithm

For simplicity we just analyze the 2nd-order case; the higher-order case can be analyzed in a similar way. In the 2nd-order case, the original tensors become matrices $X_1, X_2, \ldots, X_N \in \mathbb{R}^{m_1 \times m_2}$, and the $r$th rank-one tensor becomes $\mathbf{U}_r = u_r^1 \otimes u_r^2 \in \mathbb{R}^{m_1 \times m_2}$, $r = 1, \ldots, R$. According to Table 1, the objective function with respect to $u_{r,t}^1$ is

$$f(u_{r,t}^1) = f(u_{r,t}^1, u_{r,t-1}^2) = (u_{r,t}^1)^T G^{1,t,r} u_{r,t}^1, \qquad (19)$$

where $G^{1,t,r} = X^{(1,t,r)} L^p (X^{(1,t,r)})^T - h X^{(1,t,r)} L (X^{(1,t,r)})^T$, $X^{(1,t,r)} = [x_1^{(1,t,r)}, x_2^{(1,t,r)}, \ldots, x_N^{(1,t,r)}]$ and $x_i^{(1,t,r)} = X_i \times_2 u_{r,t-1}^2 = X_i u_{r,t-1}^2$ ($u_{r,t-1}^2$ is already known). And the objective function with respect to $u_{r,t}^2$ is

$$f(u_{r,t}^2) = f(u_{r,t}^1, u_{r,t}^2) = (u_{r,t}^2)^T G^{2,t,r} u_{r,t}^2, \qquad (20)$$

where $G^{2,t,r} = X^{(2,t,r)} L^p (X^{(2,t,r)})^T - h X^{(2,t,r)} L (X^{(2,t,r)})^T$, $X^{(2,t,r)} = [x_1^{(2,t,r)}, x_2^{(2,t,r)}, \ldots, x_N^{(2,t,r)}]$ and $x_i^{(2,t,r)} = X_i \times_1 u_{r,t}^1 = (X_i)^T u_{r,t}^1$ ($u_{r,t}^1$ is already known).

Now we want to prove that, for each $r$, the $u_{r,t}^q$ computed by the algorithm in Table 1 makes $f(u_{r,t}^q)$ monotonically increase as $t$ increases, i.e. we need to prove the following inequality:

$$f(u_{r,t}^1) \le f(u_{r,t}^2) \le f(u_{r,t+1}^1). \qquad (21)$$

First, according to (19) and (20) we can infer the following two equations (refer to Appendix B for the detailed proof):

$$f(u_{r,t}^1) = (u_{r,t}^1)^T G^{1,t,r} u_{r,t}^1 = (u_{r,t-1}^2)^T G^{2,t,r} u_{r,t-1}^2, \qquad (22)$$

$$f(u_{r,t}^2) = (u_{r,t}^2)^T G^{2,t,r} u_{r,t}^2 = (u_{r,t}^1)^T G^{1,t+1,r} u_{r,t}^1. \qquad (23)$$

According to Table 1, the range $[1 \sim R]$ is partitioned into two sections, $[1 \sim s_1]$ and $[s_1+1 \sim R]$, in the 2nd-order case. For $r = 1$, $u_{1,t}^2 = \arg\max_u \{ u^T G^{2,t,1} u \}$ and $u_{1,t+1}^1 = \arg\max_u \{ u^T G^{1,t+1,1} u \}$ by the algorithm, so according to (22) and (23) we have $f(u_{1,t}^1) = (u_{1,t-1}^2)^T G^{2,t,1} u_{1,t-1}^2 \le (u_{1,t}^2)^T G^{2,t,1} u_{1,t}^2 = f(u_{1,t}^2)$ and $f(u_{1,t}^2) = (u_{1,t}^1)^T G^{1,t+1,1} u_{1,t}^1 \le (u_{1,t+1}^1)^T G^{1,t+1,1} u_{1,t+1}^1 = f(u_{1,t+1}^1)$. Thus (21) is proved for $r = 1$.

Table 1. The complete procedure of the algorithm.

Input: N training tensors $\mathbf{X}_1, \ldots, \mathbf{X}_N \in \mathbb{R}^{m_1 \times \cdots \times m_n}$.
Output: R orthogonal rank-one basis tensors $\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_R$ ($R \le m_1 + m_2 + \cdots + m_n$).

Step 1. Construct the similarity matrices $S$ and $S^p$ by (7) and (9), respectively, and compute the corresponding Laplacian matrices $L$ and $L^p$; initialize $u_{r,0}^q = [1, \ldots, 1]$, $u_{r,0}^q = u_{r,0}^q / \|u_{r,0}^q\|$, $1 \le r \le R$, $1 \le q \le n$; partition the range $[1 \sim R]$ into $n$ sections $[s_0 \sim s_1], [s_1+1 \sim s_2], \ldots, [s_{n-1}+1 \sim s_n]$, where $s_0 = 1$, $s_n = R$, and ensure that the size of the $k$th section $[s_{k-1}+1 \sim s_k]$ is less than $m_k$ ($1 \le k \le n$).
Loop 2. For $r = 1$ to $R$ {
  Loop 3. For $t = 1$ to $T$ {
    Loop 4. For $q = 1$ to $n$ {
      Step 5. Let $x_i^{(q,t,r)} = \mathbf{X}_i \times_1 u_{r,t}^1 \cdots \times_{q-1} u_{r,t}^{q-1} \times_{q+1} u_{r,t-1}^{q+1} \cdots \times_n u_{r,t-1}^n$, $X^{(q,t,r)} = [x_1^{(q,t,r)}, x_2^{(q,t,r)}, \ldots, x_N^{(q,t,r)}]$, and $G^{q,t,r} = X^{(q,t,r)} L^p (X^{(q,t,r)})^T - h X^{(q,t,r)} L (X^{(q,t,r)})^T$.
      Step 6. If $r = 1$, let $O^{q,t,r} = G^{q,t,r}$; else if $r$ lies in the $k$th section, i.e. $s_{k-1}+1 \le r \le s_k$, then:
        (i) if $q < k$, let $O^{q,t,r} = \big( I - u_{s_{q-1}+1}^q (u_{s_{q-1}+1}^q)^T - \cdots - u_{s_q}^q (u_{s_q}^q)^T \big) G^{q,t,r}$;
        (ii) if $q = k$, let $O^{q,t,r} = \big( I - u_{s_{q-1}+1}^q (u_{s_{q-1}+1}^q)^T - \cdots - u_{r-1}^q (u_{r-1}^q)^T \big) G^{q,t,r}$;
        (iii) if $q > k$, let $O^{q,t,r} = G^{q,t,r}$.
      Step 7. Update $u_{r,t}^q$ as the eigenvector of $O^{q,t,r}$ associated with the largest eigenvalue; normalize it by $u_{r,t}^q = u_{r,t}^q / \|u_{r,t}^q\|$.
    } End Loop 4.
  } End Loop 3.
  Step 8. Output the $r$th orthogonal rank-one basis tensor $\mathbf{U}_r = u_{r,T}^1 \otimes u_{r,T}^2 \otimes \cdots \otimes u_{r,T}^n$.
} End Loop 2.
End.

Next, for $1 < r \le s_1$, $u_{r,t+1}^1 = \arg\max_u u^T \{ (I - u_1^1 (u_1^1)^T - \cdots - u_{r-1}^1 (u_{r-1}^1)^T) G^{1,t+1,r} \} u$ and $(u_{r,t+1}^1)^T u_1^1 = \cdots = (u_{r,t+1}^1)^T u_{r-1}^1 = 0$, so according to (23) we have $f(u_{r,t}^2) = (u_{r,t}^1)^T G^{1,t+1,r} u_{r,t}^1 = (u_{r,t}^1)^T \{ (I - u_1^1 (u_1^1)^T - \cdots - u_{r-1}^1 (u_{r-1}^1)^T) G^{1,t+1,r} \} u_{r,t}^1 \le (u_{r,t+1}^1)^T \{ (I - u_1^1 (u_1^1)^T - \cdots - u_{r-1}^1 (u_{r-1}^1)^T) G^{1,t+1,r} \} u_{r,t+1}^1 = (u_{r,t+1}^1)^T G^{1,t+1,r} u_{r,t+1}^1 = f(u_{r,t+1}^1)$. On the other hand, according to Table 1 we get $u_{r,t}^2 = \arg\max_u \{ u^T G^{2,t,r} u \}$ in this case, so according to (22) we have $f(u_{r,t}^1) = (u_{r,t-1}^2)^T G^{2,t,r} u_{r,t-1}^2 \le (u_{r,t}^2)^T G^{2,t,r} u_{r,t}^2 = f(u_{r,t}^2)$. Thus (21) is proved for $1 < r \le s_1$.

In the final case, for $s_1+1 \le r \le R$, we get $u_{r,t}^2 = \arg\max_u u^T \{ (I - u_{s_1+1}^2 (u_{s_1+1}^2)^T - \cdots - u_{r-1}^2 (u_{r-1}^2)^T) G^{2,t,r} \} u$ and $(u_{r,t}^2)^T u_{s_1+1}^2 = \cdots = (u_{r,t}^2)^T u_{r-1}^2 = 0$, so according to (22), $f(u_{r,t}^1) = (u_{r,t-1}^2)^T G^{2,t,r} u_{r,t-1}^2 = (u_{r,t-1}^2)^T \{ (I - u_{s_1+1}^2 (u_{s_1+1}^2)^T - \cdots - u_{r-1}^2 (u_{r-1}^2)^T) G^{2,t,r} \} u_{r,t-1}^2 \le (u_{r,t}^2)^T \{ (I - u_{s_1+1}^2 (u_{s_1+1}^2)^T - \cdots - u_{r-1}^2 (u_{r-1}^2)^T) G^{2,t,r} \} u_{r,t}^2 = (u_{r,t}^2)^T G^{2,t,r} u_{r,t}^2 = f(u_{r,t}^2)$. On the other hand, in this case we have $u_{r,t+1}^1 = \arg\max_u u^T \{ (I - u_1^1 (u_1^1)^T - \cdots - u_{s_1}^1 (u_{s_1}^1)^T) G^{1,t+1,r} \} u$ and $(u_{r,t+1}^1)^T u_1^1 = \cdots = (u_{r,t+1}^1)^T u_{s_1}^1 = 0$, so according to (23) we have $f(u_{r,t}^2) = (u_{r,t}^1)^T G^{1,t+1,r} u_{r,t}^1 = (u_{r,t}^1)^T \{ (I - u_1^1 (u_1^1)^T - \cdots - u_{s_1}^1 (u_{s_1}^1)^T) G^{1,t+1,r} \} u_{r,t}^1 \le (u_{r,t+1}^1)^T \{ (I - u_1^1 (u_1^1)^T - \cdots - u_{s_1}^1 (u_{s_1}^1)^T) G^{1,t+1,r} \} u_{r,t+1}^1 = (u_{r,t+1}^1)^T G^{1,t+1,r} u_{r,t+1}^1 = f(u_{r,t+1}^1)$. Thus (21) is proved for $s_1+1 \le r \le R$.

Summing up the above, for each $r$ we have $f(u_{r,t}^1) \le f(u_{r,t}^2) \le f(u_{r,t+1}^1)$ in the 2nd-order case, and it is not hard to extend the result to the higher-order case. Obviously, the objective function $f(u_{r,t}^q)$ is continuous with respect to $u_r^q$ and it is bounded above for normalized $u_r^q$ (i.e. $(u_r^q)^T u_r^q = 1$, $q = 1, 2$). This means that the algorithm in Table 1 converges to a local optimum in computing each component vector. Here we judge the convergence of $u_{r,t}^q$ by checking the similarity of $u_{r,t}^q$ and $u_{r,t-1}^q$, i.e. $|(u_{r,t}^q)^T u_{r,t-1}^q|$: if it is close enough to 1, we say $u_{r,t}^q$ has converged. And if $u_{r,t}^q$ converges for every $q$, we say the $r$th rank-one tensor $\mathbf{U}_r$ converges.
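In code, the convergence test described above amounts to checking that successive estimates of a component vector agree up to sign; a tiny sketch with an assumed tolerance:

```python
# Hedged sketch of the convergence test |(u_t)^T u_{t-1}| ~ 1 used above.
import numpy as np

def has_converged(u_t, u_prev, tol=1e-4):
    """True if the unit vectors u_t and u_prev agree up to sign within tol."""
    return abs(float(np.dot(u_t, u_prev))) > 1.0 - tol
```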

4. Experiments

In this paper we use second order tensors, i.e. gray-level images, to evaluate our algorithm. Specifically, we apply it to basic facial expression recognition (there are 6 basic expressions: anger, disgust, fear, happiness, sadness and surprise), where two kinds of data are used: the original face images and their GLOCAL [16] representations. The original images are from two widely used facial expression databases, the JAFFE database [14] and the Cohn–Kanade database [15], and all the face images are properly cropped and normalized to 64×64 pixels. The GLOCAL representation of an m1×m2-sized image I is a new m′1×m′2-sized matrix I′ (the glocal matrix) with the elements of I rearranged. It first partitions I into m′2 non-overlapping s1×s2-sized blocks B1, B2, ..., Bm′2, where m′2 = (m1×m2)/(s1×s2). Then each block Bi is scanned into a vector vi ∈ R^{m′1} (m′1 = s1×s2) by sequentially concatenating the columns of Bi, and vi is put into the ith column of the glocal matrix I′ (see Fig. 2). The GLOCAL representation has the following meaning: the column space of the glocal matrix describes the local image features within each block (at the pixel level), while the row space describes the global features across all blocks (at the appearance level). Thus the local and global features of an image can both be conveniently explored. In the experiments, we obtain the GLOCAL representations of the original images with 8×4-sized blocks (see Fig. 4), and we call them the GLOCAL(8,4) image representations.
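A minimal sketch of the GLOCAL rearrangement described above is given below; the row-major ordering of the blocks is our assumption, since the text does not specify the block scan order.

```python
# Hedged sketch of the GLOCAL transform: rearrange an (m1, m2) image into an
# (s1*s2, (m1*m2)//(s1*s2)) "glocal" matrix whose i-th column is the i-th
# s1 x s2 block scanned column-wise.
import numpy as np

def glocal(image, s1, s2):
    m1, m2 = image.shape
    assert m1 % s1 == 0 and m2 % s2 == 0, "block size must tile the image"
    cols = []
    for bi in range(m1 // s1):            # block rows (ordering is our assumption)
        for bj in range(m2 // s2):        # block columns
            block = image[bi*s1:(bi+1)*s1, bj*s2:(bj+1)*s2]
            cols.append(block.flatten(order="F"))   # concatenate block columns
    return np.stack(cols, axis=1)

# Example matching the paper's setting: a 64x64 image with 8x4 blocks
# becomes a 32x128 glocal matrix.
img = np.arange(64 * 64, dtype=float).reshape(64, 64)
G = glocal(img, 8, 4)
assert G.shape == (32, 128)
```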

Fig. 2. The GLOCAL transform from I to I′ with 2×2-sized blocks. Here I is of size 6×6, and I′ is of size 4×9.

Fig. 3. Samples of three persons in the JAFFE database; the expressions from left to right are: anger, disgust, fear, happiness, sadness, surprise and the neutral face.

The proposed OTR1DGPP trained with the GLOCAL(8,4) image representations is denoted by OTR1DGPP(8,4). We compare the performance of OTR1DGPP and OTR1DGPP(8,4) with some related state-of-the-art dimensionality reduction algorithms, including LDA, NPE, LPP, OLPP [13], DATER [21], Tensor ANMM (TANMM) [22], Tensor LRWMMC (TLRWMMC) [23], LTDA [24] and our formerly published OTNPE [12]. All the algorithms are performed in the supervised mode (i.e. the class labels of all the training samples are known), and the nearest neighbor classifier based on the Euclidean distance is used.

4.1. Experimental results on JAFFE database

The JAFFE database [14] contains 213 face images captured from 10 Japanese females, including the six basic expressions plus the neutral face (see Fig. 3). The GLOCAL(8,4) representations of the images are illustrated in Fig. 4, where each original 64×64-sized image is transformed into a 32×128-sized glocal matrix. In the experiment, we use three-fold cross validation to evaluate our algorithm. More specifically, we partition the whole data set (either the original images or the GLOCAL(8,4) image representations) into 3 non-overlapping groups; each group contains roughly 70 images, including about one image per person per expression. Each time we take two groups for training and leave the remaining group for testing. The average recognition rates over all three possible choices are calculated. The average recognition rates versus dimensions of OTR1DGPP and OTR1DGPP(8,4) compared with the other algorithms are shown in Fig. 5. For our algorithm and the vector-based algorithms (i.e. LDA, NPE, LPP and OLPP), we take the first 80 dimensions of the obtained low dimensional vectors. For the algorithms based on the Tucker tensor projection (i.e. OTNPE, DATER, TANMM, TLRWMMC, LTDA), we take the first 150 dimensions unfolded from the resulting tensors (the Tucker tensor projection produces low dimensional tensors of the same order as the original ones). Here for our algorithm, we partition the range [1–80] into two sections for OTR1DGPP, where [s0–s1] = [1–40] and [s1+1–s2] = [41–80] (refer to the algorithm in Table 1). Thus the first component vectors of $\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_{40}$, i.e. $u_1^1, u_2^1, \ldots, u_{40}^1 \in \mathbb{R}^{64}$, are orthogonalized, and the second component vectors $u_1^2, u_2^2, \ldots, u_{40}^2 \in \mathbb{R}^{64}$ have no orthogonality constraint. For $\mathbf{U}_{41}, \mathbf{U}_{42}, \ldots, \mathbf{U}_{80}$, the first component vectors $u_{41}^1, u_{42}^1, \ldots, u_{80}^1 \in \mathbb{R}^{64}$ have no orthogonality constraint, and the second component vectors $u_{41}^2, u_{42}^2, \ldots, u_{80}^2 \in \mathbb{R}^{64}$ are orthogonalized. For OTR1DGPP(8,4), the second component vectors of all 80 rank-one basis tensors $\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_{80}$, i.e. $u_1^2, u_2^2, \ldots, u_{80}^2 \in \mathbb{R}^{128}$, are orthogonalized, while all the first component vectors have no orthogonality constraint.


From Fig. 5 we can see that, first, when trained with the original images our algorithm tends to perform better than the others; second, using the GLOCAL(8,4) image representations, OTR1DGPP(8,4) achieves a higher recognition rate than OTR1DGPP. This is because the GLOCAL image representations reflect both global and local face information, which is useful for facial expression manifold discovery and is effectively explored and preserved by our algorithm. We also notice that TANMM and TLRWMMC give almost identical curves as the dimension increases (because they use the very same local neighborhood model), and they perform better than DATER and LTDA here. Corresponding to Fig. 5, the top recognition rates and the associated dimensions are given in Table 2. It shows that OTR1DGPP(8,4) obtains the highest top recognition rate (0.959), which is slightly higher than that of OTR1DGPP (0.945).

Fig. 4. Examples of the GLOCAL representations of facial expression images from three persons in the JAFFE database. In each row, the left one is the original image, and the right one is the corresponding glocal representation with 8×4-sized blocks.

Fig. 5. Recognition rates vs. dimensions on the JAFFE database.

OLPP, OTNPE, TANMM and TLRWMMC achieve comparable top rates to OTR1DGPP, but the corresponding dimensions (except for OLPP) are much higher. The non-orthogonal vector algorithms (LDA, NPE and LPP) give the poorest results. To fully show the advantage of the orthogonalization process in our algorithm, we design the following experiment. We keep the graph preserving model of OTR1DGPP unchanged and, respectively, adopt the orthogonalization processes in OLPP [13] and OTNPE [12] to obtain orthogonal vector bases and orthogonal Tucker tensor projection bases. Here we have to adopt the ratio-form objective function, since the orthogonalization processes in [13] and [12] are both based on it. Then by the vector projection and the Tucker tensor projection we respectively get a top recognition rate of 0.937 with 37 dimensions and a top recognition rate of 0.940 with 74 dimensions. Comparing these results with those obtained by OTR1DGPP (in Table 2), we see that OTR1DGPP achieves a higher top recognition rate (0.945) with fewer dimensions (29).

Furthermore, we check the convergence of our algorithm. In Fig. 6, the values of the objective function with respect to $u_{r,t}^1$ and $u_{r,t}^2$, i.e. $f(u_{r,t}^1)$ by (19) and $f(u_{r,t}^2)$ by (20), are both given for each $t$. We see that the objective function for a given $r$ and the average objective function over all $r = 1, \ldots, 80$ monotonically increase as $t$ increases, and each of them steadily tends to an upper limit, which exactly fits the theoretical analysis in Section 3.3.


Table 2. The top recognition rates on the JAFFE database.

Algorithm        Top recognition rate    Dimensions
OTR1DGPP(8,4)    0.959                   50
OTR1DGPP         0.945                   29
LDA              0.922                   13
LPP              0.913                   13
NPE              0.926                   22
OLPP             0.935                   30
OTNPE            0.935                   136
DATER            0.918                   136
TANMM            0.939                   63
TLRWMMC          0.935                   77
LTDA             0.922                   108

Fig. 6. The values of the objective function vs. the iteration times. The blue points correspond to $f(u_{r,t}^1)$ and the red points correspond to $f(u_{r,t}^2)$, where $f(u_{r,t}^1)$ and $f(u_{r,t}^2)$ are, respectively, defined in (19) and (20). (a) The objective function when r = 1, (b) r = 10, (c) r = 41 and (d) the average objective function values over r = 1, 2, ..., 80. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article. Note that, in all these figures, the blue points and the red points appear alternately: first the blue one, then the red one, and so on.)


In Fig. 7, we give the average similarity between $u_{r,t}^q$ and $u_{r,t-1}^q$ ($q = 1, 2$), i.e. $\sum_{r=1}^{80} |(u_{r,t}^q)^T u_{r,t-1}^q|$ (indicated by the vertical axis), as the iteration count $t$ increases (indicated by the horizontal axis). Fig. 7(a) and (b) show the average similarity obtained by OTR1DGPP, while Fig. 7(c) and (d) show that obtained by OTR1DGPP(8,4). We can see that all of them converge very well after a few iterations (the average similarity is steadily close to 1), which testifies to the convergence of our algorithm. Moreover, we visualize the converged orthogonal rank-one basis tensors obtained by OTR1DGPP and OTR1DGPP(8,4) in Fig. 8.

The parameters involved in the above experiments are as follows. For our algorithm, we set k1 = 3 in (8), k2 = 10, t = 22, c = 70 in (9), and h = 0.1 in (10); accordingly the number of neighbors for NPE and OTNPE is set to 3. For LPP and OLPP, a complete supervised neighboring graph is constructed, i.e. all the intra-class points are connected to each other, and the weight between each two connected points is set to 1. For the local discriminant tensor algorithms, i.e. TANMM, TLRWMMC and LTDA, the numbers of intra-class neighbors and inter-class neighbors are uniformly set to 10 (which is suitable according to the experiments reported in those works).

4.2. Experimental results on Cohn–Kanade database

The Cohn–Kanade database [15] consists of face image sequences changing from neutral faces to emotional faces. The six basic expressions are included in these sequences (Fig. 9). We first choose the persons who pose more than three of the basic expressions (41 such persons in total), and then within each sequence we select 5–10 static frames to form a sample set containing 960 samples (160 samples per expression). Next, we randomly choose half of the samples per expression for training and leave the rest for testing. We calculate the average recognition rates over 10 such random choices, which are shown in Fig. 10. Correspondingly, the top recognition rates and the associated dimensions are given in Table 3. It can be seen that our OTR1DGPP also obtains a higher recognition rate than the other algorithms. Except for the non-orthogonal vector algorithms, which produce obviously poor results, all the other compared algorithms get near the top recognition rate. The performance of OLPP, OTNPE, LTDA and DATER gets close to OTR1DGPP as the dimension increases, but they are not as satisfying in the low dimensional region. Also, TANMM and TLRWMMC display the very same performance, which is in accordance with the experimental results in Section 4.1. On the other hand, OTR1DGPP(8,4) achieves its top recognition rate more quickly than OTR1DGPP, and the absolute value is higher, although it tends to become worse than OTR1DGPP as the dimension increases (it is always better than the other algorithms). Again, with the same graph preserving model as OTR1DGPP and adopting the orthogonalization processes in OLPP [13] and OTNPE [12], we respectively obtain a top recognition rate of 0.954 with 24 dimensions and a top recognition rate of 0.956 with 32 dimensions. Compared with the result obtained by OTR1DGPP (in Table 3), our method still gives a better recognition rate with fewer dimensions. Here we set k2 = 20, t = 35, c = 120 in (9), and set the other parameters the same as in Section 4.1. The orthogonalization strategy for our algorithm is also the same as in Section 4.1.

Fig. 7. The convergence characteristic of the algorithm. (a) and (b), respectively, measure the average convergence of the first and second component vectors over the 80 obtained rank-one basis tensors when trained with the original images; (c) and (d) measure the average convergence when trained with the GLOCAL(8,4) image representations. Here the vertical axis represents the average similarity between $u_{r,t}^q$ and $u_{r,t-1}^q$ ($q = 1, 2$) over $r = 1, \ldots, 80$, where $t$ is the iteration count indicated by the horizontal axis (refer to the algorithm procedure in Table 1).

Fig. 8. The first 12 converged orthogonal rank-one basis tensors (sequentially from left to right and top to bottom) obtained by OTR1DGPP (a) and OTR1DGPP(8,4) (b).


Fig. 9. The six basic facial expressions of two persons in the Cohn–Kanade database, from left to right are: anger, disgust, fear, happiness, sadness and surprise.

Fig. 10. Recognition rates vs. dimensions on the Cohn–Kanade database.

Table 3. The top recognition rates on the Cohn–Kanade database.

Algorithm        Top recognition rate    Dimensions
OTR1DGPP(8,4)    0.964                   12
OTR1DGPP         0.96                    19
LDA              0.943                   6
LPP              0.932                   24
NPE              0.944                   42
OLPP             0.953                   62
OTNPE            0.955                   34
DATER            0.951                   36
TANMM            0.954                   43
TLRWMMC          0.954                   42
LTDA             0.955                   46


5. Discussions and conclusions

The main contribution of this paper is a novel and effective method for orthogonalizing the rank-one basis tensors based on a differential-form objective function, and the method converges according to both the theoretical analysis and the experimental results. From the above experiments, we verify the following. First, as a dimensionality reduction algorithm, the proposed OTR1DGPP seems able to extract more effective low dimensional features from facial expression images for recognition compared with some related state-of-the-art algorithms. This is probably because the orthogonal rank-one basis tensors obtained by OTR1DGPP provide more discriminative ability and manifold preserving ability (resulting from the effective orthogonalization method acting on the effective objective function). In addition to the traditional discriminant idea shared by the former algorithms [2,21–24] (pushing the samples of different classes apart and pulling the samples of the same class together), our algorithm also uses LLE [17] to describe the intra-class local manifold structure, which is suitable for facial expression data [17,19]. Second, in contrast with the former orthogonal vector projection [13] and the orthogonal Tucker tensor projection [12], our orthogonalization process is based on the tensor rank-one projection and the differential-form objective function (different from the process in [11], which adopted the tensor rank-one projection and the ratio-form objective function), and it tends to achieve a higher recognition rate with fewer dimensions, which shows the advantage of our process. Third, using the GLOCAL image representations, OTR1DGPP can explore more useful face features for discovering the facial expression manifolds and thus distinguishing the facial expressions.

It should be noted that the main purpose of this paper is not to propose an algorithm specifically for facial expression recognition. The facial expression recognition experiments are performed only to evaluate our algorithm and verify its effectiveness for extracting low dimensional features from tensor samples, especially when they form manifolds in a high dimensional space (facial expression images are such samples). The algorithm can also be applied to other similar fields such as face recognition (2D or 3D), palm print recognition, iris recognition and so on.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 60973060), the Specialized Research Fund for the Doctoral Program of Higher Education (Grant no. 200800040008), the Beijing Program (Grant no. YB20081000401), the Postdoctoral Foundation of China (Grant no. 20100470197) and the Fundamental Research Funds for the Central Universities (Grant no. 2011JBM022).

Appendix A

Proof of $\langle \mathbf{X}_i, \mathbf{U}_r \rangle = \mathbf{X}_i \times_1 u_r^1 \times_2 u_r^2 \cdots \times_n u_r^n$.

Suppose that the $k$th component vector of $\mathbf{U}_r$ is $u_r^k = [u_{r,1}^k, u_{r,2}^k, \ldots, u_{r,m_k}^k]^T \in \mathbb{R}^{m_k}$ ($k = 1, \ldots, n$); then from the definition of the mode-$k$ product (refer to Section 2.1) we have

$$\mathbf{X}_i \times_1 u_r^1 \times_2 u_r^2 \cdots \times_n u_r^n = \Big( \sum_{q_1=1}^{m_1} (\mathbf{X}_i)_{q_1 q_2 \cdots q_n} u_{r,q_1}^1 \Big) \times_2 u_r^2 \cdots \times_n u_r^n = \sum_{q_n=1}^{m_n} \cdots \Big( \sum_{q_2=1}^{m_2} \Big( \sum_{q_1=1}^{m_1} (\mathbf{X}_i)_{q_1 q_2 \cdots q_n} u_{r,q_1}^1 \Big) u_{r,q_2}^2 \Big) \cdots u_{r,q_n}^n = \sum_{q_1=1, q_2=1, \ldots, q_n=1}^{m_1, m_2, \ldots, m_n} (\mathbf{X}_i)_{q_1 q_2 \cdots q_n} u_{r,q_1}^1 u_{r,q_2}^2 \cdots u_{r,q_n}^n.$$

And according to the definition of the tensor product (see Section 2.1) we have

$$(\mathbf{U}_r)_{q_1 q_2 \cdots q_n} = u_{r,q_1}^1 u_{r,q_2}^2 \cdots u_{r,q_n}^n.$$

From the above and the definition of the inner product (see Section 2.1), we have

$$\sum_{q_1=1, \ldots, q_n=1}^{m_1, \ldots, m_n} (\mathbf{X}_i)_{q_1 q_2 \cdots q_n} u_{r,q_1}^1 u_{r,q_2}^2 \cdots u_{r,q_n}^n = \sum_{q_1=1, \ldots, q_n=1}^{m_1, \ldots, m_n} (\mathbf{X}_i)_{q_1 q_2 \cdots q_n} (\mathbf{U}_r)_{q_1 q_2 \cdots q_n} = \langle \mathbf{X}_i, \mathbf{U}_r \rangle,$$

i.e. $\langle \mathbf{X}_i, \mathbf{U}_r \rangle = \mathbf{X}_i \times_1 u_r^1 \times_2 u_r^2 \cdots \times_n u_r^n$. $\square$

Appendix B

Proof of $f(u_{r,t}^1) = (u_{r,t}^1)^T G^{1,t,r} u_{r,t}^1 = (u_{r,t-1}^2)^T G^{2,t,r} u_{r,t-1}^2$ and $f(u_{r,t}^2) = (u_{r,t}^2)^T G^{2,t,r} u_{r,t}^2 = (u_{r,t}^1)^T G^{1,t+1,r} u_{r,t}^1$.

According to the definitions in (19) and (20), we have

$$f(u_{r,t}^1) = (u_{r,t}^1)^T G^{1,t,r} u_{r,t}^1 = (u_{r,t}^1)^T [X_1 u_{r,t-1}^2, \ldots, X_N u_{r,t-1}^2] (L^p - hL) [X_1 u_{r,t-1}^2, \ldots, X_N u_{r,t-1}^2]^T u_{r,t}^1$$
$$= [(u_{r,t}^1)^T X_1 u_{r,t-1}^2, \ldots, (u_{r,t}^1)^T X_N u_{r,t-1}^2] (L^p - hL) [(u_{r,t}^1)^T X_1 u_{r,t-1}^2, \ldots, (u_{r,t}^1)^T X_N u_{r,t-1}^2]^T$$
$$= [(u_{r,t-1}^2)^T (X_1)^T u_{r,t}^1, \ldots, (u_{r,t-1}^2)^T (X_N)^T u_{r,t}^1] (L^p - hL) [(u_{r,t-1}^2)^T (X_1)^T u_{r,t}^1, \ldots, (u_{r,t-1}^2)^T (X_N)^T u_{r,t}^1]^T$$
$$= (u_{r,t-1}^2)^T [(X_1)^T u_{r,t}^1, \ldots, (X_N)^T u_{r,t}^1] (L^p - hL) [(X_1)^T u_{r,t}^1, \ldots, (X_N)^T u_{r,t}^1]^T u_{r,t-1}^2 = (u_{r,t-1}^2)^T G^{2,t,r} u_{r,t-1}^2,$$

$$f(u_{r,t}^2) = (u_{r,t}^2)^T G^{2,t,r} u_{r,t}^2 = (u_{r,t}^2)^T [(X_1)^T u_{r,t}^1, \ldots, (X_N)^T u_{r,t}^1] (L^p - hL) [(X_1)^T u_{r,t}^1, \ldots, (X_N)^T u_{r,t}^1]^T u_{r,t}^2$$
$$= [(u_{r,t}^1)^T X_1 u_{r,t}^2, \ldots, (u_{r,t}^1)^T X_N u_{r,t}^2] (L^p - hL) [(u_{r,t}^1)^T X_1 u_{r,t}^2, \ldots, (u_{r,t}^1)^T X_N u_{r,t}^2]^T$$
$$= (u_{r,t}^1)^T [X_1 u_{r,t}^2, \ldots, X_N u_{r,t}^2] (L^p - hL) [X_1 u_{r,t}^2, \ldots, X_N u_{r,t}^2]^T u_{r,t}^1 = (u_{r,t}^1)^T G^{1,t+1,r} u_{r,t}^1. \quad \square$$

References

[1] M. Turk, A.P. Pentland, Face recognition using eigenfaces, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1991, pp. 586–591.

[2] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 711–720.

[3] X.F. He, D. Cai, S.C. Yan, H.J. Zhang, Neighborhood preserving embedding, in: Proceedings of the IEEE International Conference on Computer Vision, 2005, pp. 1208–1213.

[4] X.F. He, P. Niyogi, Locality preserving projections, in: Proceedings of the Conference on Advances in Neural Information Processing Systems, 2003, pp. 153–160.

[5] S. Yan, et al., Graph embedding and extension: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2007) 40–51.

[6] X.F. He, D. Cai, P. Niyogi, Tensor subspace analysis, in: Proceedings of the Neural Information Processing Systems, 2005, pp. 499–506.

[7] G. Dai, D. Yeung, Tensor embedding methods, in: Proceedings of the National Conference on Artificial Intelligence, 2006, pp. 330–335.

[8] L. Lathauwer, B. Moor, J. Vandewalle, On the best rank-1 and rank-(R1, R2, ..., RN) approximation of high-order tensors, SIAM J. Matrix Anal. Appl. 21 (2000) 1324–1342.

[9] A. Shashua, A. Levin, Linear image coding for regression and classification using the tensor-rank principle, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001, pp. 42–49.

[10] D. Tao, X. Li, X. Wu, S. Maybank, Tensor rank one discriminant analysis—a convergent method for discriminative multilinear subspace selection, Neurocomputing 71 (2008) 1866–1882.

[11] G. Hua, P.A. Viola, S.M. Drucker, Face recognition using discriminatively trained orthogonal rank one tensor projections, in: Proceedings of the Computer Vision and Pattern Recognition, 2007, pp. 1–8.

[12] S. Liu, Q.Q. Ruan, Orthogonal tensor neighborhood preserving embedding for facial expression recognition, Pattern Recognition 44 (2011) 1497–1513.

[13] D. Cai, X.F. He, J.W. Han, H.J. Zhang, Orthogonal laplacianfaces for face recognition, IEEE Trans. Image Process. 15 (2006) 3608–3614.

[14] M. Lyons, S. Akamatsu, M. Kamachi, J. Gyoba, Coding facial expressions with Gabor wavelets, in: Proceedings of the 3rd IEEE Conference on Automatic Face and Gesture Recognition, Nara, Japan, 1998, pp. 200–205.

[15] T. Kanade, J. Cohn, Y. Tian, Comprehensive database for facial expression analysis, in: Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, 2000, pp. 46–53.

[16] H.T. Chen, T.L. Liu, C.S. Fuh, Learning effective image metrics from few pairwise examples, in: Proceedings of the IEEE International Conference on Computer Vision, 2005, pp. 1371–1378.

[17] D. Liang, J. Yang, Z.L. Zheng, Y.C. Chang, A facial expression recognition system based on supervised locally linear embedding, Pattern Recognition Lett. 26 (2005) 2374–2389.

[18] C. Shan, S. Gong, P.W. McOwan, Appearance manifold of facial expression, in: Proceedings of the ICCV Workshop on HCI, 2005.

[19] Y. Chang, C. Hu, M. Turk, Manifold of facial expression, in: Proceedings of the Int'l Workshop on Analysis and Modeling of Faces and Gestures, 2003.

[20] H. Wang, K.Q. Wang, Affective interaction based on person-independent facial expression space, Neurocomputing 71 (2008) 1889–1901.

[21] S.C. Yan, D. Xu, Q. Yang, L. Zhang, X.O. Tang, H.J. Zhang, Discriminant analysis with tensor representation, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, pp. 526–532.

[22] F. Wang, C.S. Zhang, Feature extraction by maximizing the average neighborhood margin, in: Proceedings of Computer Vision and Pattern Recognition, 2007, pp. 1–8.

[23] Q.Q. Gu, J. Zhou, Local relevance weighted maximum margin criterion for text classification, in: Proceedings of the 9th SIAM International Conference on Data Mining, 2009, pp. 1129–1140.

[24] F.P. Nie, S.M. Xiang, Y.Q. Song, C.S. Zhang, Extracting the optimal dimensionality for local tensor discriminant analysis, Pattern Recognition 42 (2009) 105–114.

Shuai Liu received the B.S. degree in computer science from Beijing Jiaotong University, PR China, in 2006. He is currently a Ph.D. student in the Institute of Information Science at Beijing Jiaotong University. His research interests include pattern recognition, image processing and machine learning.

Qiuqi Ruan was born in 1944. He received the B.S. and M.S. degrees from Northern Jiaotong University, PR China, in 1969 and 1981, respectively. From January 1987 to May 1990, he was a visiting scholar at the University of Pittsburgh and the University of Cincinnati. Subsequently, he has been a visiting professor in the USA several times. He has published 2 books and more than 100 papers, and holds a national patent. He is now a professor and doctoral supervisor, and a senior member of the IEEE. His main research interests include digital signal processing, computer vision, pattern recognition and virtual reality.

Yi Jin was born in Hebei, China, in 1982 and received the Ph.D. degree in Signal and Information Processing from Beijing Jiaotong University, China, in 2010. She is currently a postdoctoral fellow in the School of Computer Science and Information Technology, Beijing Jiaotong University. Her research interests include image processing, pattern recognition, computer vision and machine learning.

