
Research Article
Kernel Entropy Component Analysis with Nongreedy L1-Norm Maximization

Haijin Ji¹,² and Song Huang¹

¹Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China
²School of Computer Science and Technology, Huaiyin Normal University, Huaian 223300, China

Correspondence should be addressed to Song Huang; hs0317@163.com

Received 13 April 2018; Revised 26 August 2018; Accepted 9 September 2018; Published 14 October 2018

Academic Editor: Pedro Antonio Gutierrez

Copyright © 2018 Haijin Ji and Song Huang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Kernel entropy component analysis (KECA) is a newly proposed dimensionality reduction (DR) method, which has shown superiority in many pattern analysis issues previously solved by principal component analysis (PCA). The optimized KECA (OKECA) is a state-of-the-art variant of KECA and can return projections retaining more expressive power than KECA. However, OKECA is sensitive to outliers and suffers from high computational complexity because of its inherent dependence on the L2-norm. To handle these two problems, we develop a new extension to KECA, namely, KECA-L1, for DR or feature extraction. KECA-L1 aims to find a more robust kernel decomposition matrix such that the extracted features retain as much information potential as possible, measured by the L1-norm. Accordingly, we design a nongreedy iterative algorithm which converges much faster than OKECA's. Moreover, a general semisupervised classifier is developed for KECA-based methods and employed for data classification. Extensive experiments on data classification and software defect prediction demonstrate that our new method is superior to most existing KECA- and PCA-based approaches. Code has also been made publicly available.

1. Introduction

Curse of dimensionality is one of the major issues in machine learning and pattern recognition [1]. It has motivated many scholars from different areas to properly implement dimensionality reduction (DR) to simplify the input space without degrading the performance of learning algorithms. Various efficient methods associated with DR have been developed, such as independent component analysis (ICA) [2], linear discriminant analysis [3], principal component analysis (PCA) [4], and projection pursuit [5], to name a few. Among these robust algorithms, PCA has been one of the most used techniques for feature extraction (or DR). PCA implements a linear data transformation according to the projection matrix, which aims to maximize the second-order statistics of input datasets [6]. To extend PCA to nonlinear spaces, Schölkopf et al. [7] proposed kernel PCA, the so-called KPCA method. The key of KPCA is to find the nonlinear relation between the input data and the kernel feature space (KFS) using the kernel matrix, which is derived from a positive semidefinite kernel function computing inner products. Both PCA and KPCA perform data transformation by selecting the eigenvectors corresponding to the top eigenvalues of the projection matrix and the kernel matrix, respectively. All of them (including their variants) have experienced great success in different areas [8-12], such as image reconstruction [13], face recognition [14-17], and image processing [18, 19], to name a few. However, as suggested by Zhang and Hancock [20], DR should be performed from the perspective of information theory to obtain more acceptable results.

To improve the performance of the aforementioned approaches to DR, Jenssen [6] developed a new and completely different data transformation algorithm, namely, kernel entropy component analysis (KECA). The main difference between KECA and PCA or KPCA is that the optimal eigenvectors (also called entropic components) derived from KECA compress the most Renyi entropy of the input


data instead of being associated with the top eigenvalues. The procedure of selecting the eigenvectors related to the Renyi entropy of the input space starts with a Parzen window kernel-based estimator [21]. Then only the eigenvectors contributing the most entropy of the input datasets are selected to perform DR. This distinguishing characteristic helps KECA achieve better performance than the classical PCA and KPCA in face recognition and clustering [6]. In recent years, Izquierdo-Verdiguier et al. [21] employed the rotation matrix from ICA [2] to optimize KECA and proposed the optimized KECA (OKECA). OKECA not only shows superiority in classification of both synthetic and real datasets but can obtain an acceptable kernel density estimate (KDE) using very few entropic components (just one or two) compared with KECA [21]. However, OKECA is sensitive to outliers because of the inherent properties of the L2-norm. In other words, if the input space follows a normal distribution and is contaminated by nonnormally distributed outliers, the DR performance of OKECA may degrade. Additionally, OKECA is very time-consuming when handling large-scale input datasets (Section 4).

Therefore, the main purpose of this paper is to propose a new variant of KECA that improves on the outlier-proneness and efficiency of OKECA. The L1-norm is well known for its robustness to outliers [22]. Additionally, Nie et al. [23] established a fast iterative process to handle the general L1-norm maximization problem with a nongreedy algorithm. Hence, we take advantage of OKECA and propose a new L1-norm version of KECA (denoted as KECA-L1). KECA-L1 uses an efficient convergence procedure motivated by Nie et al.'s method [23] to search for the entropic components contributing the most Renyi entropy of the input data. To evaluate the efficiency and effectiveness of KECA-L1, we design and conduct a series of experiments in which the data vary from single class to multiattribute and from small to large size. The classical KECA and OKECA are also included for comparison.

The remainder of this paper is organized as follows. Section 2 reviews the general L1-norm maximization problem, KECA, and OKECA. Section 3 presents KECA with nongreedy L1-norm maximization and the semisupervised-learning-based classifier. Section 4 validates the performance of the new method on different datasets. Section 5 ends this paper with some conclusions.

2. Preliminaries

2.1. An Efficient Algorithm for Solving the General L1-Norm Maximization Problem. The general L1-norm maximization problem was first raised by Nie et al. [23]. This problem, based on the hypothesis that there exists an upper bound for the objective function, can be generally formulated as [23]

\max_{v \in C} f(v) + \sum_i |g_i(v)|,  (1)

where both f(v) and g_i(v) for each i denote arbitrary functions, and v ∈ C represents an arbitrary constraint.

Then a sign function sign(·) is defined as

\mathrm{sign}(x) = \begin{cases} 1, & \text{if } x \ge 0, \\ -1, & \text{if } x < 0, \end{cases}  (2)

and employed to transform the maximization problem (1) as follows:

\max_{v \in C} f(v) + \sum_i \alpha_i g_i(v),  (3)

where α_i = sign(g_i(v)). Nie et al. [23] proposed a fast iterative process to solve problem (3), which is shown in Algorithm 1. It can be seen from Algorithm 1 that α_i is determined by the current solution v^t, and the next solution v^{t+1} is updated according to the current α_i. The iterative process is repeated until the procedure converges [23, 24]. The convergence of Algorithm 1 has been demonstrated, and the associated details can be found in [23].
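To make Algorithm 1 concrete, below is a minimal NumPy sketch (ours, not the authors' code) of the special case f(v) ≡ 0 and g_i(v) = a_i^T v under the constraint ‖v‖₂ = 1, for which the inner argmax has the closed form v = m/‖m‖₂ with m = Σ_i α_i a_i; the function and variable names are illustrative.

    import numpy as np

    def l1_max_direction(A, n_iter=100, seed=0):
        # Rows of A are the vectors a_i; we maximize sum_i |a_i^T v| over ||v||_2 = 1.
        rng = np.random.default_rng(seed)
        v = rng.standard_normal(A.shape[1])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            alpha = np.sign(A @ v)         # alpha_i^t = sign(g_i(v^t)), Equation (2)
            alpha[alpha == 0] = 1.0        # the sign convention (2) assigns +1 at zero
            m = A.T @ alpha
            v_new = m / np.linalg.norm(m)  # closed-form inner argmax on the unit sphere
            if np.allclose(v_new, v):      # the objective can no longer increase
                break
            v = v_new
        return v

    # Usage: v_star = l1_max_direction(np.random.default_rng(1).normal(size=(50, 3)))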

2.2. Kernel Entropy Component Analysis. KECA is characterized by its entropic components instead of the principal or variance-based components of PCA or KPCA, respectively. Hence, we first describe the concept of the Renyi quadratic entropy. Given the input dataset X = [x_1, ..., x_N] (x_i ∈ R^D), the Renyi entropy of X is defined as [6]

H(p) = -\log \int p^2(x)\,dx, \quad x \in X,  (4)

where p(x) is a probability density function. Based on the monotonic property of the logarithmic function, Equation (4) can be rewritten as

V(p) = \int p^2(x)\,dx.  (5)

We can estimate Equation (5) using the kernel k_σ(x, x_t) of a Parzen window density estimator determined by the bandwidth coefficient σ [6], such that

V(p) \approx \hat{V}(p) = \frac{1}{N} \sum_{x_i \in X} \hat{p}(x_i) = \frac{1}{N} \sum_{x_i \in X} \frac{1}{N} \sum_{x_j \in X} k_\sigma(x_i, x_j) = \frac{1}{N^2} \mathbf{1}^T K \mathbf{1},  (6)

where K_{ij} = k_σ(x_i, x_j) constitutes the kernel matrix K, and 1 represents an N-dimensional vector containing all ones. With the help of the kernel decomposition [6]

K = A A^T = (E D^{1/2})(D^{1/2} E^T),  (7)

Equation (6) is transformed as follows:

\hat{V}(p) = \frac{1}{N^2} \sum_{i=1}^{N} \left( \sqrt{\lambda_i}\, \mathbf{1}^T e_i \right)^2,  (8)

where the diagonal matrix D and the matrix E consist of the eigenvalues λ_1, ..., λ_N and the corresponding eigenvectors


e_1, ..., e_N, respectively. It can be observed from Equation (7) that the entropy estimator V̂(p) consists of projections onto all the KFS axes, because

K_{ij} = k_\sigma(x_i, x_j) = \phi(x_i)^T \phi(x_j),  (9)

where the function ϕ(·) maps the two samples x_i and x_j into the KFS. Additionally, only an entropic component e_i meeting the criteria λ_i ≠ 0 and 1^T e_i ≠ 0 can contribute to the entropy estimate [21]. In a word, KECA implements DR by projecting ϕ(X) onto a subspace E_l spanned not by the eigenvectors associated with the top eigenvalues but by the entropic components contributing most to the Renyi entropy estimator V̂(p) [25].
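As a reading aid, the following is a short NumPy sketch of the KECA selection rule just described, assuming a Gaussian Parzen kernel; the function name and interface are ours, not part of the paper's toolbox.

    import numpy as np

    def keca(X, m, sigma):
        # Gaussian Parzen kernel matrix K_ij = k_sigma(x_i, x_j), as in Equation (6)
        sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        K = np.exp(-sq / (2.0 * sigma ** 2))
        lam, E = np.linalg.eigh(K)               # eigenpairs of K, Equation (7)
        score = lam * np.sum(E, axis=0) ** 2     # entropy terms lambda_i (1^T e_i)^2, Eq. (8)
        idx = np.argsort(score)[::-1][:m]        # keep the m most entropic components
        lam_m = np.clip(lam[idx], 0.0, None)     # guard against small negative round-off
        return E[:, idx] * np.sqrt(lam_m)        # rows are the m-dimensional projections

    Z = keca(np.random.default_rng(0).normal(size=(100, 5)), m=2, sigma=1.0)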

2.3. Optimized Kernel Entropy Component Analysis. Due to the fact that KECA is sensitive to different bandwidth coefficients σ [21], OKECA was proposed to fill this gap and improve the performance of KECA on DR. Motivated by the fast ICA method [2], an extra rotation matrix W is applied to the kernel decomposition (Equation (7)) in KECA to maximize the information potential (the entropy values in Equation (8)) [21]:

\max_{w_k \in W} J(w) = \left( \mathbf{1}_N^T E D^{1/2} w \right)^2, \quad \text{s.t. } W W^T = I, \; \|w\|_2 = 1,  (10)

where ‖·‖₂ is the L2-norm and w denotes a column vector (N × 1) of W. Izquierdo-Verdiguier et al. [21] utilized a gradient-ascent approach to handle the maximization problem (10):

w(t) = w(t-1) + \tau \frac{\partial J}{\partial w(t)},  (11)

where τ is the step size, and ∂J/∂w(t) can be obtained by a Lagrangian multiplier:

\frac{\partial J}{\partial w(t)} = \frac{\partial L(w)}{\partial w} = 2 \left( \mathbf{1}_N^T E D^{1/2} w \right) \left( \mathbf{1}_N^T E D^{1/2} \right)^T.  (12)

The entropic components multiplied by the rotation matrix can capture more (or equal) information potential than those of KECA, even using fewer components [21]. Moreover, OKECA shows the capability of being robust to the bandwidth coefficient. However, there are two main limitations of OKECA. First, the new entropic components derived from OKECA are sensitive to outliers because of the inherent properties of the L2-norm (Equation (10)). Second, although a very simple stopping criterion is designed to avoid additional iterations, OKECA still has high computational complexity, for its computational cost is O(N³ + 4tN²) [21], where t is the number of iterations for finding the optimal rotation matrix, compared with O(N³) for KECA [21].
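For illustration, here is a minimal sketch of the gradient-ascent update (11)-(12) for a single rotation direction w, with renormalization standing in for the unit-norm constraint; the full OKECA builds an orthonormal W column by column, which we omit, and all names are ours. For this one-column objective the maximizer is w = ±b/‖b‖₂ in closed form; the iterative update simply mirrors how [21] handles the full orthogonally constrained problem.

    import numpy as np

    def okeca_direction(E, lam, tau=0.05, n_iter=500, seed=0):
        # b = 1_N^T E D^{1/2}, so the objective (10) for one column is J(w) = (b w)^2
        b = np.sum(E, axis=0) * np.sqrt(np.clip(lam, 0.0, None))
        rng = np.random.default_rng(seed)
        w = rng.standard_normal(len(b))
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            w = w + tau * 2.0 * (b @ w) * b   # ascent step with the gradient (12)
            w /= np.linalg.norm(w)            # re-impose ||w||_2 = 1
        return w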

3. KECA with Nongreedy L1-Norm Maximization

3.1. Algorithm. In order to alleviate the problems existing in OKECA, this section presents how to extend KECA to its nongreedy L1-norm version. For readers' easy understanding, the definition of the L1-norm is first introduced as follows.

Definition 1. Given an arbitrary vector x ∈ R^{N×1}, the L1-norm of the vector x is

\|x\|_1 = \sum_{j=1}^{N} |x_j|,  (13)

where ‖·‖₁ is the L1-norm and x_j denotes the jth element of x.

Then, motivated by OKECA, we attempt to develop a new objective function that maximizes the information potential (Equations (8) and (10)) based on the L1-norm:

\max J(W) = \left\| W^T E D^{1/2} \right\|_1 = \sum_{j=1}^{N} \mathrm{sign}\left( a_j^T W \right) W^T a_j, \quad \text{s.t. } W^T W = I,  (14)

where (a_1, ..., a_N) = A = ED^{1/2} and N is the number of samples. The rotation matrix is denoted as W ∈ R^{DIM×m}, where DIM and m are the dimension of the input data and the dimension of the selected entropic components (or number of projections), respectively. It is difficult to solve problem (14) directly, but we may regard it as a special case of problem (1) with f(v) ≡ 0. Therefore, Algorithm 1 can be employed to solve (14). Next, we show the details of how to find the optimal solution of problem (14) based on the proposals from References [23, 24]. Let

M = \sum_{j=1}^{N} a_j\, \mathrm{sign}\left( a_j^T W \right).  (15)

Thus, problem (14) can be simplified as

\max_{W^T W = I} J(W) = \mathrm{Tr}\left( W^T M \right).  (16)

By singular value decomposition (SVD),

M = U \Lambda V^T,  (17)

where U ∈ R^{DIM×DIM}, Λ ∈ R^{DIM×m}, and V ∈ R^{m×m}. Then we obtain

Initialize v^1 ∈ C, t = 1
While not converged:
    For each i, compute α_i^t = sign(g_i(v^t))
    v^{t+1} = argmax_{v ∈ C} f(v) + Σ_i α_i^t g_i(v)
    t = t + 1
end
Output: v^{t+1}

Algorithm 1: Fast iterative approach to solving the general L1-norm maximization problem (3).


\mathrm{Tr}\left( W^T M \right) = \mathrm{Tr}\left( W^T U \Lambda V^T \right) = \mathrm{Tr}\left( \Lambda V^T W^T U \right) = \mathrm{Tr}(\Lambda Z) = \sum_i \lambda_{ii} z_{ii},  (18)

where Z = V^T W^T U ∈ R^{m×DIM}, and λ_{ii} and z_{ii} denote the (i, i)th elements of the matrices Λ and Z, respectively. Due to the property of SVD, we have λ_{ii} ≥ 0. Additionally, Z has orthonormal rows [23], such that z_{ii} ≤ 1. Therefore, Tr(W^T M) can reach its maximum only if Z = [I_m  0_{m×(DIM−m)}], where I_m denotes the m × m identity matrix and 0_{m×(DIM−m)} is an m × (DIM − m) matrix of zeros. Considering that Z = V^T W^T U, the solution to problem (16) is

W = U \begin{bmatrix} I_m \\ 0_{(DIM-m) \times m} \end{bmatrix} V^T.  (19)

Algorithm 2 (a MATLAB implementation of the algorithm is available in the Supporting Document for interested readers) shows how to utilize the nongreedy L1-norm maximization described in Algorithm 1 to compute Equation (19). Since problem (16) is a special case of problem (1), we immediately obtain that the optimal solution W* of Equation (19) is a local maximum point of ‖W^T ED^{1/2}‖₁, based on Theorem 2 in Reference [23]. Moreover, Phase 1 of Algorithm 2 spends O(N³) on the eigendecomposition. Thus, the total computational cost of KECA-L1 is O(N³ + Nt), where t is the number of iterations until convergence. Considering that the computational complexity of OKECA is O(N³ + 4tN²), we can safely conclude that KECA-L1 has much faster convergence than OKECA.
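To summarize Section 3.1, the following NumPy sketch mirrors Algorithm 2 under the paper's notation (Phase 1: eigendecomposition; Phase 2: nongreedy L1 iterations); it is our illustrative reading, not the authors' MATLAB toolbox, and the thin-SVD product U Vᵀ realizes Equation (19).

    import numpy as np

    def keca_l1(K, m, n_iter=100, seed=0):
        # Phase 1: eigendecomposition of K, sorted, and A = E D^{1/2} (columns a_j)
        lam, E = np.linalg.eigh(K)
        order = np.argsort(lam)[::-1]
        A = E[:, order] * np.sqrt(np.clip(lam[order], 0.0, None))
        DIM = A.shape[0]
        rng = np.random.default_rng(seed)
        W, _ = np.linalg.qr(rng.standard_normal((DIM, m)))  # initialize with W^T W = I
        # Phase 2: nongreedy L1-norm maximization
        for _ in range(n_iter):
            S = np.sign(A.T @ W)                  # row j holds sign(a_j^T W)
            S[S == 0] = 1.0
            M = A @ S                             # M = sum_j a_j sign(a_j^T W), Eq. (15)
            U, _, Vt = np.linalg.svd(M, full_matrices=False)
            W_new = U @ Vt                        # thin-SVD form of Equation (19)
            if np.allclose(W_new, W):             # objective (14) has stopped growing
                break
            W = W_new
        return W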

3.2. The Convergence Analysis. This subsection demonstrates the convergence of Algorithm 2 in the following theorem.

Theorem 1. The above KECA-L1 procedure converges.

Proof. Motivated by References [23, 24], we first show that the objective function (14) of KECA-L1 monotonically increases in each iteration t. Let g_i(u^t) = W^T a_i and α_i^t = sign(a_i^T W); then (14) can be simplified to

\max J(W) = \sum_{j=1}^{N} \mathrm{sign}\left( a_j^T W \right) W^T a_j = \sum_i \alpha_i^t g_i(u^t), \quad \text{s.t. } W^T W = I.  (20)

Obviously, α_i^{t+1} is parallel to g_i(u^{t+1}), but α_i^t need not be. Therefore,

\left| g_i(u^{t+1}) \right| = \alpha_i^{t+1} g_i(u^{t+1}) \ge \alpha_i^t g_i(u^{t+1}) \;\Rightarrow\; \left| g_i(u^{t+1}) \right| - \alpha_i^t g_i(u^{t+1}) \ge 0.  (21)

Considering that |g_i(u^t)| = α_i^t g_i(u^t), we have

\left| g_i(u^t) \right| - \alpha_i^t g_i(u^t) = 0.  (22)

Substituting (22) into (21), we obtain

\left| g_i(u^{t+1}) \right| - \alpha_i^t g_i(u^{t+1}) \ge \left| g_i(u^t) \right| - \alpha_i^t g_i(u^t).  (23)

According to Step 3 in Algorithm 2 and the theory of SVD, for each iteration t we have

\sum_{i=1}^{N} \alpha_i^t g_i(u^{t+1}) \ge \sum_{i=1}^{N} \alpha_i^t g_i(u^t).  (24)

Combining (23) and (24) for every i, we have

\sum_{i=1}^{N} \left( \left| g_i(u^{t+1}) \right| - \alpha_i^t g_i(u^{t+1}) \right) \ge \sum_{i=1}^{N} \left( \left| g_i(u^t) \right| - \alpha_i^t g_i(u^t) \right) \;\Rightarrow\; \sum_{i=1}^{N} \left| g_i(u^{t+1}) \right| \ge \sum_{i=1}^{N} \left| g_i(u^t) \right|,  (25)

Table 1: UCI datasets description.

Database      N      DIM  Nc  Ntrain   Ntest
Ionosphere    351    33   2   30 × 2   175
Letter        20000  16   26  35 × 26  3870
Pendigits     10992  16   9   60 × 9   3500
Pima-Indians  768    8    2   100 × 2  325
WDBC          569    30   2   35 × 2   345
Wine          178    12   3   30 × 3   80

N: number of samples; DIM: number of dimensions; Nc: number of classes; Ntrain: number of training data; Ntest: number of testing data.

Input: K and m
Initialize W_1 ∈ R^{DIM×m} such that W^T W = I, t = 1
Phase 1:
(1) Eigendecomposition: [E, D] ← eig(K); D ← sort(D); E ← sort(E); A = ED^{1/2}
Phase 2:
While not converged:
    (2) M = Σ_{j=1}^{N} a_j sign(a_j^T W)
    (3) Compute the SVD of M as M = UΛV^T; let W_{t+1} = U[I_m; 0_{(DIM−m)×m}]V^T
    (4) t = t + 1
end
Output: W_{t+1} ∈ R^{DIM×m}

Algorithm 2: KECA-L1.


which means that Algorithm 2 is monotonically increasing. Additionally, considering that the objective function (14) of KECA-L1 has an upper bound, the KECA-L1 procedure converges within limited iterations. □

3.3. The Semisupervised Classifier. Jenssen [26] established a semisupervised learning (SSL) algorithm for classification using KECA. This SSL-based classifier is trained with both labeled and unlabeled data to build the kernel matrix, such that it can map the data to the KFS appropriately [26]. Additionally, it is based on a general modelling scheme and is applicable to other variants of KECA, such as OKECA and KECA-L1.

More specifically, we are given N pairs of training data {x_i, y_i}_{i=1}^N, with samples x_i ∈ R^D and associated labels y_i. In addition, there are M unlabeled data points for testing. Let X_u = [x_u^1, ..., x_u^M] and X_l = [x_l^1, ..., x_l^N] denote the testing data and the training data without labels, respectively;

[Figure 1: Overall accuracy (OA, %) versus the number of projections obtained by PCA-L1, KPCA-L1, KECA, OKECA, and KECA-L1 on different UCI databases with different numbers of extracted features: (a) Ionosphere, (b) Letter, (c) Pendigits, (d) Pima-Indians, (e) WDBC, and (f) Wine.]


thus we can obtain an overall matrix X = [X_u, X_l]. Then we construct the kernel matrix K derived from X using (6), K ∈ R^{(N+M)×(N+M)}, which serves as the input of Algorithm 2. After the iterative procedure of nongreedy L1-norm maximization, we obtain a projection of X → Ẽ = [Ẽ_u, Ẽ_l] ∈ R^{m×(M+N)} onto m orthogonal axes, where Ẽ_u = [ẽ_u^1, ..., ẽ_u^M] and Ẽ_l = [ẽ_l^1, ..., ẽ_l^N]. In other words, ẽ_u^i and ẽ_l^j are the low-dimensional representations of the testing data point x_u^i and the training one x_l^j, respectively. Assume that x_u^* is an arbitrary data point to be tested. If it satisfies

\left\| \tilde{e}_u^* - \tilde{e}_l^j \right\|_2 = \min_{h=1,\dots,N} \left\| \tilde{e}_u^* - \tilde{e}_l^h \right\|_2,  (26)

then x_u^* is assigned to the same class as the jth data point of X_l.
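A compact sketch of this classification pipeline, reusing the keca_l1 function from the sketch in Section 3.1 and reading the projection as Z = AW (the transpose of Ẽ), might look as follows; this is one plausible NumPy rendering, not the authors' implementation, and y_l is assumed to be a NumPy array of labels.

    import numpy as np

    def ssl_classify(X_l, y_l, X_u, m, sigma):
        X = np.vstack([X_u, X_l])                     # overall matrix from [Xu, Xl]
        sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        K = np.exp(-sq / (2.0 * sigma ** 2))          # kernel on all M + N points, Eq. (6)
        W = keca_l1(K, m)                             # rotation from the Section 3.1 sketch
        lam, E = np.linalg.eigh(K)
        order = np.argsort(lam)[::-1]
        A = E[:, order] * np.sqrt(np.clip(lam[order], 0.0, None))
        Z = A @ W                                     # rows: m-dimensional representations
        Z_u, Z_l = Z[: len(X_u)], Z[len(X_u):]
        d = np.linalg.norm(Z_u[:, None, :] - Z_l[None, :, :], axis=-1)
        return y_l[np.argmin(d, axis=1)]              # nearest-neighbour rule, Equation (26)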

4. Experiments

This section shows the performance of the proposed KECA-L1 compared with the classical KECA [6] and OKECA [21] for real-world data classification using the SSL-based classifier described in Section 3.3. Several recent techniques, such as PCA-L1 [27] and KPCA-L1 [28], are also included for comparison. The rationale for selecting these methods is that previous studies related to DR found that they can produce impressive results [27-29]. We implement the experiments on a wide range of real-world datasets: (1) six different datasets from the University of California, Irvine (UCI) Machine Learning Repository (available at http://archive.ics.uci.edu/ml/datasets.html) and (2) 9 different software projects with 34 releases from the PROMISE data repository (available at http://openscience.us/repo). The MATLAB source code for running KECA and OKECA, uploaded by Izquierdo-Verdiguier et al. [21], is available at http://isp.uv.es/soft_feature.html. The coefficient settings for PCA-L1 and KPCA-L1 are the same as in [27, 28]. All experiments are performed in MATLAB R2012a on a PC with an Intel Core i5 CPU, 4 GB memory, and the Windows 7 operating system.

4.1. Experiments on UCI Datasets. The experiments are conducted on six datasets from the UCI repository: the Ionosphere dataset is a binary classification problem of whether the radar signal describes the structure of free electrons in the ionosphere or not; the Letter dataset is to assign each black-and-white rectangular pixel display to one of the 26 capital letters of the English alphabet; Pendigits handles the recognition of pen-based handwritten digits; the Pima-Indians dataset constitutes a clinical problem of diabetes diagnosis in patients from clinical variables; the WDBC dataset is another clinical problem, the diagnosis of breast cancer into malignant or benign classes; and the Wine dataset is the result of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars. Table 1 shows their details. In the subsequent experiments, we utilize the simplest linear classifier [30]. Maximum likelihood (ML) [31] is selected as the rule for choosing the bandwidth coefficient, as suggested in [21].

Table 2: Descriptions of data attributes.

Attribute  Description
WMC        Weighted methods per class
AMC        Average method complexity
AVG_CC     Mean values of methods in the same class
CA         Afferent couplings
CAM        Cohesion among methods of class
CBM        Coupling between methods
CBO        Coupling between object classes
CE         Efferent couplings
DAM        Data access metric
DIT        Depth of inheritance tree
IC         Inheritance coupling
LCOM       Lack of cohesion in methods
LCOM3      Normalized version of LCOM
LOC        Lines of code
MAX_CC     Maximum values of methods in the same class
MFA        Measure of function abstraction
MOA        Measure of aggregation
NOC        Number of children
NPM        Number of public methods
RFC        Response for a class
Bug        Number of bugs detected in the class

Table 3: Descriptions of software data.

Release      Classes  FP   % FP
Ant-1.3      125      20   0.160
Ant-1.4      178      40   0.225
Ant-1.5      293      32   0.109
Ant-1.6      351      92   0.262
Ant-1.7      745      166  0.223
Camel-1.0    339      13   0.038
Camel-1.2    608      216  0.355
Camel-1.4    872      145  0.166
Camel-1.6    965      188  0.195
Ivy-1.1      111      63   0.568
Ivy-1.4      241      16   0.066
Ivy-2.0      352      40   0.114
Jedit-3.2    272      90   0.331
Jedit-4.0    306      75   0.245
Lucene-2.0   195      91   0.467
Lucene-2.2   247      144  0.583
Lucene-2.4   340      203  0.597
Poi-1.5      237      141  0.595
Poi-2.0      314      37   0.118
Poi-2.5      385      248  0.644
Poi-3.0      442      281  0.636
Synapse-1.0  157      16   0.102
Synapse-1.1  222      60   0.270
Synapse-1.2  256      86   0.336
Synapse-1.4  196      147  0.750
Synapse-1.5  214      142  0.664
Synapse-1.6  229      78   0.341
Xalan-2.4    723      110  0.152
Xalan-2.5    803      387  0.482
Xalan-2.6    885      411  0.464
Xerces-init  162      77   0.475
Xerces-1.2   440      71   0.161
Xerces-1.3   453      69   0.152
Xerces-1.4   588      437  0.743


The implementation of KECA-L1 and the other methods is repeated 10 times on all the selected datasets with respect to different numbers of components. We have utilized the overall classification accuracy (OA) to evaluate the performance of the different algorithms: OA is defined as the total number of samples correctly assigned, in percentage terms; it lies within [0, 1] and indicates better quality with larger values. Figure 1 presents the average OA curves obtained by the aforementioned algorithms for these six real datasets. It can be observed from Figure 1 that OKECA is superior to KECA, PCA-L1, and KPCA-L1 except on the Letter problem. This is probably because the DR performed by OKECA not only reveals the structure related to the most Renyi entropy of the original data but also considers the rotational invariance property [21]. In addition, KECA-L1 outperforms the other methods besides OKECA. This may be attributed to the robustness of the L1-norm to outliers compared with that of the L2-norm. In Figure 1(c), OKECA seems to obtain nearly the same results as KECA-L1. However, the average running time (in hours) of OKECA on Pendigits is 3.7384 times more than that of KECA-L1, 1.339.

4.2. Experiments on Software Projects. In software engineering, it is usually difficult to test a software project completely and thoroughly with limited resources [32]. Software defect prediction (SDP) may provide a relatively acceptable solution to this problem. It can allocate the limited test resources effectively by categorizing the software modules into two classes, nonfault-prone (NFP) or fault-prone (FP), according to 21 software metrics (Table 2).

This section aims to employ the KECA-based methods to reduce the dimensions of the selected software data (Table 3) and then utilize the SSL-based classifier, combined with a support vector machine [33], to classify each software module as NFP or FP. The bandwidth coefficient is still set by the ML rule. PCA-L1 and KPCA-L1 are involved as a benchmarking yardstick. There are 34 groups of tests, one for each release in Table 3. The most suitable releases [34] from different software projects are selected as training data. We evaluate the performance of the different selected methods on SDP in terms of recall (R), precision (P), and F-measure (F) [35, 36]. The F-measure is defined as

F = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}},  (27)

where

\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}.  (28)

In (28), FN (i.e., false negatives) means that buggy classes are wrongly classified as nonfaulty, while FP (i.e., false positives) means that nonbuggy classes are wrongly classified as faulty; TP (i.e., true positives) refers to correctly classified buggy classes [34]. Values of Recall, Precision, and F-measure range from 0 to 1, and higher values indicate better classification results.
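As a quick numeric check of Equations (27) and (28), the following tiny Python helper (ours, with made-up counts) computes the three metrics from confusion-matrix entries:

    def f_measure(tp, fp, fn):
        precision = tp / (tp + fp)                         # Equation (28)
        recall = tp / (tp + fn)
        f = 2 * precision * recall / (precision + recall)  # Equation (27)
        return precision, recall, f

    # e.g., 90 correctly flagged buggy classes, 30 false alarms, 45 misses:
    p, r, f = f_measure(tp=90, fp=30, fn=45)   # p = 0.75, r ≈ 0.667, F ≈ 0.706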

[Figure 2: Standardized boxplots of the R, P, and F performance achieved by PCA-L1, KPCA-L1, KECA, OKECA, and KECA-L1, respectively. From the bottom to the top of a standardized boxplot: minimum, first quartile, median, third quartile, and maximum.]

Figure 2 shows the results using boxplot analysis. From Figure 2, considering the minimum, maximum, median, first quartile, and third quartile of the boxes, we find that KECA-L1 performs better than the other methods in general. Specifically, KECA-L1 obtains acceptable results in the SDP experiments compared with the benchmarks proposed in Reference [34], since the median values of the boxes with respect to R and F are close to 0.7 and more than 0.5, respectively. On the contrary, neither KECA and OKECA nor PCA-L1 and KPCA-L1 can meet these criteria. Therefore, all of the results validate the robustness of KECA-L1.

5. Conclusions

This paper proposes a new extension of the OKECA approach for dimensionality reduction. The new method (i.e., KECA-L1) employs the L1-norm and a rotation matrix to maximize the information potential of the input data. In order to find the optimal entropic kernel components, motivated by Nie et al.'s algorithm [23], we design a nongreedy iterative process which converges much faster than OKECA's. Moreover, a general semisupervised learning algorithm has been established for classification using KECA-L1. Compared with several recently proposed KECA- and PCA-based approaches, this SSL-based classifier can remarkably improve performance on real-world data classification and software defect prediction.

Although KECA-L1 has achieved impressive success on real examples, several problems should still be considered and solved in future research. The efficiency of KECA-L1 has to be optimized, for it is relatively time-consuming compared with most existing PCA-based methods. Additionally, KECA-L1 is expected to be applicable in every pattern analysis algorithm previously based on PCA approaches.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61702544) and the Natural Science Foundation of Jiangsu Province of China (Grant no. BK20160769).

Supplementary Materials

The MATLAB toolbox of KECA-L1 is available (Supplementary Materials).

References

[1] Q. Gao, S. Xu, F. Chen, C. Ding, X. Gao, and Y. Li, "R₁-2-DPCA and face recognition," IEEE Transactions on Cybernetics, vol. 99, pp. 1-12, 2018.
[2] A. Hyvärinen and E. Oja, "Independent component analysis: algorithms and applications," Neural Networks, vol. 13, no. 4-5, pp. 411-430, 2000.
[3] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," European Conference on Computer Vision, vol. 1, pp. 43-58, 1996.
[4] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[5] J. H. Friedman and J. W. Tukey, "A projection pursuit algorithm for exploratory data analysis," IEEE Transactions on Computers, vol. 23, no. 9, pp. 881-890, 1974.
[6] R. Jenssen, "Kernel entropy component analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, pp. 847-860, 2010.
[7] B. Schölkopf, A. Smola, and K. R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, no. 5, pp. 1299-1319, 1998.
[8] S. Mika, A. Smola, and M. Scholz, "Kernel PCA and de-noising in feature spaces," Conference on Advances in Neural Information Processing Systems II, vol. 11, pp. 536-542, 1999.
[9] J. Yang, D. Zhang, A. F. Frangi, and J. Y. Yang, "Two-dimensional PCA: a new approach to appearance-based face representation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131-137, 2004.
[10] K. Nishino, S. K. Nayar, and T. Jebara, "Clustered blockwise PCA for representing visual data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1675-1679, 2005.
[11] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 506-513, Washington, DC, USA, June-July 2004.
[12] A. d'Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. G. Lanckriet, "A direct formulation for sparse PCA using semidefinite programming," SIAM Review, vol. 49, no. 3, pp. 434-448, 2007.
[13] M. Luo, F. Nie, X. Chang, Y. Yang, A. Hauptmann, and Q. Zheng, "Avoiding optimal mean robust PCA/2DPCA with non-greedy L1-norm maximization," in Proceedings of International Joint Conference on Artificial Intelligence, pp. 1802-1808, New York, NY, USA, July 2016.
[14] Q. Yu, R. Wang, X. Yang, B. N. Li, and M. Yao, "Diagonal principal component analysis with non-greedy L1-norm maximization for face recognition," Neurocomputing, vol. 171, pp. 57-62, 2016.
[15] B. N. Li, Q. Yu, R. Wang, K. Xiang, M. Wang, and X. Li, "Block principal component analysis with nongreedy L1-norm maximization," IEEE Transactions on Cybernetics, vol. 46, no. 11, pp. 2543-2547, 2016.
[16] F. Nie and H. Huang, "Non-greedy L21-norm maximization for principal component analysis," 2016, http://arxiv.org/abs/1603.08293v1.
[17] F. Nie, J. Yuan, and H. Huang, "Optimal mean robust principal component analysis," in Proceedings of International Conference on Machine Learning, pp. 1062-1070, Beijing, China, June 2014.
[18] R. Wang, F. Nie, R. Hong, X. Chang, X. Yang, and W. Yu, "Fast and orthogonal locality preserving projections for dimensionality reduction," IEEE Transactions on Image Processing, vol. 26, no. 10, pp. 5019-5030, 2017.
[19] C. Zhang, F. Nie, and S. Xiang, "A general kernelization framework for learning algorithms based on kernel PCA," Neurocomputing, vol. 73, no. 4-6, pp. 959-967, 2010.
[20] Z. Zhang and E. R. Hancock, "Kernel entropy-based unsupervised spectral feature selection," International Journal of Pattern Recognition and Artificial Intelligence, vol. 26, no. 5, article 1260002, 2012.
[21] E. Izquierdo-Verdiguier, V. Laparra, R. Jenssen, L. Gómez-Chova, and G. Camps-Valls, "Optimized kernel entropy components," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 6, pp. 1466-1472, 2017.
[22] X. Li, Y. Pang, and Y. Yuan, "L1-norm-based 2DPCA," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 40, no. 4, pp. 1170-1175, 2010.
[23] F. Nie, H. Huang, C. Ding, D. Luo, and H. Wang, "Robust principal component analysis with non-greedy L1-norm maximization," in Proceedings of International Joint Conference on Artificial Intelligence, pp. 1433-1438, Barcelona, Catalonia, Spain, July 2011.
[24] R. Wang, F. Nie, X. Yang, F. Gao, and M. Yao, "Robust 2DPCA with non-greedy L1-norm maximization for image analysis," IEEE Transactions on Cybernetics, vol. 45, no. 5, pp. 1108-1112, 2015.
[25] B. H. Shekar, M. Sharmila Kumari, L. M. Mestetskiy, and N. F. Dyshkant, "Face recognition using kernel entropy component analysis," Neurocomputing, vol. 74, no. 6, pp. 1053-1057, 2011.
[26] R. Jenssen, "Kernel entropy component analysis: new theory and semi-supervised learning," in Proceedings of IEEE International Workshop on Machine Learning for Signal Processing, pp. 1-6, Beijing, China, September 2011.
[27] N. Kwak, "Principal component analysis based on L1-norm maximization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 9, pp. 1672-1680, 2008.
[28] Y. Xiao, H. Wang, W. Xu, and J. Zhou, "L1 norm based KPCA for novelty detection," Pattern Recognition, vol. 46, no. 1, pp. 389-396, 2013.
[29] Y. Xiao, H. Wang, and W. Xu, "Parameter selection of Gaussian kernel for one-class SVM," IEEE Transactions on Cybernetics, vol. 45, no. 5, pp. 941-953, 2015.
[30] W. Krzanowski, Principles of Multivariate Analysis, vol. 23, Oxford University Press (OUP), Oxford, UK, 2000.
[31] R. P. W. Duin, "On the choice of smoothing parameters for Parzen estimators of probability density functions," IEEE Transactions on Computers, vol. 25, no. 11, pp. 1175-1179, 1976.
[32] W. Liu, S. Liu, Q. Gu, J. Chen, X. Chen, and D. Chen, "Empirical studies of a two-stage data preprocessing approach for software fault prediction," IEEE Transactions on Reliability, vol. 65, no. 1, pp. 38-53, 2016.
[33] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, "Benchmarking classification models for software defect prediction: a proposed framework and novel findings," IEEE Transactions on Software Engineering, vol. 34, no. 4, pp. 485-496, 2008.
[34] Z. He, F. Shu, Y. Yang, M. Li, and Q. Wang, "An investigation on the feasibility of cross-project defect prediction," Automated Software Engineering, vol. 19, no. 2, pp. 167-199, 2012.
[35] P. He, B. Li, X. Liu, J. Chen, and Y. Ma, "An empirical study on software defect prediction with a simplified metric set," Information and Software Technology, vol. 59, pp. 170-190, 2015.
[36] Y. Wu, S. Huang, H. Ji, C. Zheng, and C. Bai, "A novel Bayes defect predictor based on information diffusion function," Knowledge-Based Systems, vol. 144, pp. 1-8, 2018.


Page 2: KernelEntropyComponentAnalysiswithNongreedy L1 ...downloads.hindawi.com/journals/cin/2018/6791683.pdf · ResearchArticle KernelEntropyComponentAnalysiswithNongreedy L1-NormMaximization

data instead of being associated with top eigenvalues eprocedure of selecting the eigenvectors related to the Renyientropy of the input space is started with a Parzen windowkernel-based estimator [21] en only the eigenvectorscorresponding to the most entropy of the input datasets areselected to perform DR is distinguished characteristichelps KECA achieve better performances than the classicalPCA and KPCA in face recognition and clustering [6] Inrecent years Izquierdo-Verdiguier et al [21] employed therotation matrix from ICA [2] to optimize KECA and pro-posed the optimized KECA (OKECA) OKECA not onlyshows superiority in classification of both synthetic and realdatasets but can obtain acceptable kernel density estimation(KDE) just using very fewer entropic components (just oneor two) compared with KECA [21] However OKECA issensitive to outliers for its inherent properties of L2-normIn other words if the input space follows normal distri-bution and is contaminated by nonnormal distributedoutliers this may lead to the downgrade of its performanceon DR in terms of OKECA Additionally OKECA is verytime-consuming when handling large-scale input datasets(Section 4)

erefore the main purpose of this paper is to proposea new variant of KECA and improve the proneness tooutliers and efficiency of OKECA L1-norm is well knownfor its robustness to outliers [22] Additionally Nie et al [23]established a fast iteration process to handle the general L1-normmaximization issue with nongreedy algorithm Hencewe take advantages of OKECA and propose a new L1-normversion of KECA (denoted as KECA-L1) KECA-L1 uses anefficient convergence procedure motivated by Nie et alrsquosmethod [23] to search for the entropic components con-tributing to the most Renyi entropy of input data Toevaluate the efficiency and effectiveness of KECA-L1 wedesign and conduct a series of experiments in which the datavary from single class to multiattribute and from small tolarge size e classical KECA and OKECA are also includedfor comparison

e remainder of this paper is organized as followsSection 2 reviews the general L1-norm maximization issueKECA and OKECA Section 3 presents KECA with non-greedy L1-norm maximization and semisupervised-learning-based classifier Section 4 validates the perfor-mance of the new method on different data sets Section 5ends this paper with some conclusions

2 Preliminaries

21 An Efficient Algorithm to Solving the General L1-Norm Maximization Issue e general L1-norm maxi-mization problem is first raised by Nie et al [23] isissue based on a hypothesis that there exists an upperbound for the objective function can be generally for-mulated as [23]

max]isinC

f(]) + 1113944i

gi(])1113868111386811138681113868

1113868111386811138681113868 (1)

where both f(]) and gi(]) for each i denote arbitraryfunctions and ] isin C represents an arbitrary constraint

en a sign function sign(middot) is defined as

sign(x) 1 if xge 0

minus1 if xlt 01113896 (2)

and employed to transform the maximization problem (1) asfollows

max]isinC

f(]) + 1113944i

αigi(]) (3)

where αi sign(gi(])) Nie et al [23] proposed a fast it-eration process to solve problem (3) which is shown inAlgorithm 1 It can be seen from Algorithm 1 that αi isdetermined by current solution ]t and the next solution ]t+1

is updated according to the current αi e iterative processis repeated until the procedure converges [23 24] econvergence of the Algorithm 1 has been demonstrated andthe associated details can also be read in [23]

22 Kernel Entropy Component Analysis KECA is charac-terized by its entropic components instead of the principal orvariance-based components in PCA or KPCA respectivelyHence we firstly describe the concept of the Renyi quadraticentropy Given the input dataset X [x1 xN](xi isin RD)the Renyi entropy of X is defined as [6]

H(p) minuslog1113946 p2(x)dx x isin X (4)

where p(x) is a probability density function Based on themonotonic property of logarithmic function Equation (4)can be rewritten as

V(p) 1113946 p2(x)dx (5)

We can estimate Equation (5) using the kernel kσ(x xt)

of Parzen window density estimator determined by thebandwidth coefficient σ [6] such that

V(p) asymp 1113954V(p)

1N

1113944xisinX

p(x)

1N

1113944xiisinX

1N

1113944xjisinX

kσ xi xj1113872 1113873

1

N21TK1

(6)

where Kij kσ(xi xj) constitutes the kernel matrix K and 1represents an N-dimensional vector containing all onesWith the help of the kernel decomposition [6]

K AAT ED12

1113872 1113873 D12ET1113872 1113873 (7)

Equation (6) is transformed as follows

1113954V(p) 1

N2 1113944

N

i1

λi

1113969

1Tei1113874 11138752 (8)

where the diagonal matrix D and the matrix E consist ofeigenvalues λ1 λN and the corresponding eigenvectors

2 Computational Intelligence and Neuroscience

e1 eN respectively It can be observed from Equation (7)that the entropy estimator 1113954V(p) consists of projections ontoall the KFS axes because

Kij kσ xi xj1113872 1113873 ϕ xi( 1113857Tϕ xj1113872 1113873 (9)

where the function of ϕ(middot) is to map the two samples xi andxj into the KFS Additionally only an entropic component ei

meeting the criteria of λi ne 0 and 1Tei ne 0 can contribute tothe entropy estimate [21] In a word KECA implements DRby projecting ϕ(X) into a subspace El spanned not by theeigenvectors associated with the top eigenvalues but byentropic components contributing most to the Renyi en-tropy estimator 1113954V(p) [25]

23 Optimized Kernel Entropy Component Analysis Due tothe fact that KECA is sensitive to different bandwidth co-efficients σ [21] OKECA is proposed to fill this gap andimprove performances of KECA on DR Motivated by thefast ICA method [2] an extra rotation matrix (applying W)is employed to the kernel decomposition (Equation (7)) inKECA for maximizing the information potential (the en-tropy values in Equation (8)) [21]

maxwkisinW

J(w) 1TNED

12w( 11138572

st WWT I w2 1

⎧⎪⎨

⎪⎩(10)

where middot 2 is the L2-norm and w denotes a column vector(N times 1) in W Izquierdo-Verdiguier et al [21] utilizeda gradient-ascent approach to handle the maximizationproblem (10)

w(t) w(tminus 1) + τzJ

zw(t) (11)

where τ is the step size zJzw(t) can be obtained by La-grangian multiplier

zJ

zw(t)

zL(w)

zw 2 1T

NED12w1113872 1113873 1T

NED12

1113872 1113873T (12)

e entropic components multiplied by the rotationmatrix can obtain more (or equal) information potentialthan that of the KECA even using fewer components [21]Moreover OKECA shows the capability of being robust tothe bandwidth coefficient However there exist two mainlimitations for OKECA First the new entropic componentsderived from OKECA are sensible to outliers since its

inherent properties of L2-norm (Equation (10)) Secondalthough a very simple stopping criterion is designed toavoid additional iterations OKECA is still of high com-putational complexities for its computational cost is O(N3 +

4tN2) [21] where t is the number of iterations for finding theoptimal rotation matrix compared with that the one ofKECA is O(N3) [21]

3 KECA with NongreedyL1-Norm Maximization

31 Algorithm In order to alleviate the problems existingin OKECA this section presents how to extend KECA toits nongreedy L1-norm version For readersrsquo easy under-standing the definition of L1-norm is firstly introduced asfollows

Definition 1 Given an arbitrary vector x isin RNtimes1 theL1-norm of the vector x is

x1 1113944N

j1xj

11138681113868111386811138681113868

11138681113868111386811138681113868 (13)

where middot 1 is the L1-norm and xj denotes the jth elementof x

en motivated by OKECA we attempt to developa new objective function to maximize the information po-tential (Equations (8) and (10)) based on the L1-norm

max J(W) WTED12

1 1113944N

j1sign aT

j W1113872 1113873WTaj

st WTW I

⎧⎪⎪⎪⎨

⎪⎪⎪⎩

(14)

where (a1 aN) A ED12 N is the size of samplese rotation matrix is denoted as W isin RDIMtimesm where DIMand m are the dimension of input data and dimension of theselected entropic components (or number of projection)respectively It is difficult to directly solve problem (14) butwe may regard it as a special case of problem (1) whenf(]) equiv 0 erefore the Algorithm 1 can be employed tosolve (14) Next we show the details about how to find theoptimal solution of problem (14) based on the proposal fromReferences [23 24] Let

M 1113944N

j1ajsign aT

j W1113872 1113873 (15)

us problem (14) can be simplified as

maxWTWI

J(w) Tr WTM1113872 1113873 (16)

By singular value decomposition (SVD) then

M UΛVT (17)

where U isin RDIMtimesDIM Λ isin RDIMtimesm and V isin Rmtimesm en weobtain

Initialize ]1 isin C t 1While not converge

For each i compute αti sign(gi(]t))

]t+1 argmax]isinC

f(]t) + 1113936iαigi(]t)

t t + 1endOutput ]t+1

ALGORITHM 1 Fast iteration approach to solving the generalL1-Norm maximization problem (3)

Computational Intelligence and Neuroscience 3

Tr WTM1113872 1113873 Tr WTUΛVT1113872 1113873 Tr ΛVTWTU1113872 1113873

Tr(ΛZ) 1113944i

λiizii(18)

where Z isin Rmtimesm λii and zii denote the (i i)minus th element ofmatrixΛ and Z respectively Due to the property of SVD wehave λii ge 0 Additionally Z is an orthonormal matrix [23]such that zii le 1 erefore Tr(WTM) can reach the max-imum only if Z [Im 0mtimes(DIMminusm)] where Im denotes them times m identity matrix and 0mtimes(DIMminusm) is a m times (DIMminusm)

matrix of zeros Considering that Z VTWTU thus thesolution to problem (16) is

W U Im 0(DIMminusm)timesm1113960 1113961VT (19)

Algorithm 2 (A MATLAB implementation of the al-gorithm is available at the Supporting Document for theinterested readers) shows how to utilize the nongreedy L1-norm maximization described in Algorithm 1 to computeEquation (19) Since problem (16) is a special case ofproblem (1) we can obviously obtain that the optimalsolution Wlowast to Equation (19) is a local maximum point forWTED121 based on eorem 2 in Reference [23]Moreover the Phase 1 of the Algorithm 2 spends O(N3) onthe eigen decomposition us the total of computationalcost of KECA-L1 is O(N3 + Nt) where t is the number ofiterations for convergence Considering that the compu-tational complexity of OKECA is O(N3 + 4tN2) we cansafely conclude that KECA-L1 has much faster convergencethan OKECArsquos

32 8e Convergence Analysis is subsection attempts todemonstrate the convergence of the Algorithm 2 in thefollowing theorem

Theorem 1 8e above KECA-L1 procedure can converge

Proof Motivated by References [23 24] first we showthe objective function (9) of KECA-L1 will monotonicallyincrease in each iteration t Let gi(ut) WTaj andαt

i sign(aTj W) then (9) can be simplified to

max J(W) 1113944N

j1sign aT

j W1113872 1113873WTaj 1113944N

j1αt

i gi ut

1113872 1113873

st WTW I

⎧⎪⎪⎨

⎪⎪⎩

(20)

Obviously αt+1i is parallel to gi(ut+1) but neither is αt

i erefore

gi ut+1

1113872 111387311138681113868111386811138681113868

11138681113868111386811138681113868 αt+1i gi u

t+11113872 1113873ge αt

i gi ut+1

1113872 1113873

rArr gi ut+1

1113872 111387311138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t+11113872 1113873ge 0

(21)

Considering that |gi(ut)| αtigi(ut) thus

gi ut

1113872 111387311138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t1113872 1113873 0 (22)

Substituting (22) in (21) it can be obtained

gi ut+1

1113872 111387311138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t+11113872 1113873ge gi u

t1113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t1113872 1113873 (23)

According to the Step 3 in Algorithm 2 and the theory ofSVD for each iteration t we have

1113944

N

i1αt

igi ut+1

1113872 1113873ge 1113944N

i1αt

igi ut

1113872 1113873 (24)

Combining (23) and (24) for every i we have

1113944

N

i1gi u

t+11113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t+11113872 11138731113874 1113875ge 1113944

N

i1gi u

t1113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t1113872 11138731113874 1113875

rArr1113944N

i1gi u

t+11113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868ge 1113944N

i1gi u

t1113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868

(25)

Table 1 UCI datasets description

Database N DIM Nc Ntrain Ntest

Ionosphere 351 33 2 30 times 2 175Letter 20000 16 26 35 times 26 3870Pendigits 10992 16 9 60 times 9 3500Pima-Indians 768 8 2 100 times 2 325WDBC 569 30 2 35 times 2 345Wine 178 12 3 30 times 3 80N number of samples DIM number of dimensions Nc number of classesNtrain number of training data and Ntest number of testing data

Input K and m

Initialize W1 isin RDIMtimesm such that WTW I t 1--------------------------------Phase 1------------------------------

(1) Eigen decomposition [ED]⟵eig(K) D⟵sort(D) E⟵sort(E) A ED12--------------------------------Phase 2------------------------------While not converge

(2) M 1113936Nj1ajsign(aT

j W)(3) Compute the SVD of M as M UΛVT Let Wt+1 U[Im 0(DIMminusm)timesm]VT(4) t t + 1

endOutput Wt+1 isin RDIMtimesm

ALGORITHM 2 KECA-L1

4 Computational Intelligence and Neuroscience

which means that Algorithm 2 is monotonically increasingAdditionally considering that objective function (14) ofKECA-L1 has an upper bound within the limited iterationsthe KECA-L1 procedure will converge

33 e Semisupervised Classier Jenssen [26] establisheda semisupervised learning (SSL) algorithm for classicationusing KECA is SSL-based classier was trained by both

labeled and unlabeled data to build the kernel matrix such thatit can map the data to KFS appropriately [26] Additionally itis based on a general modelling scheme and applicable forother variants of KECA such as OKECA and KECA-L1

More specically we are given N pairs of training dataxi yi Ni1 with samples xi isin RD and the associated labels yiIn addition there are M unlabeled data points for testingLet Xu [x1u xMu ] and Xl [x1l xNl ] denote thetesting data and training data without labels respectively

[Figure 1: six panels of line plots showing overall accuracy OA (%) versus number of projections (2-10) for PCA-L1, KPCA-L1, KECA, OKECA, and KECA-L1.]

Figure 1: Overall accuracy obtained by PCA-L1, KPCA-L1, KECA, OKECA, and KECA-L1 using different UCI databases with different numbers of extracted features: (a) Ionosphere, (b) Letter, (c) Pendigits, (d) Pima-Indians, (e) WDBC, and (f) Wine.


Thus, we can obtain an overall matrix X = [X_u, X_l]. Then we construct the kernel matrix K derived from X using (6), K in R^((N+M) x (N+M)), which serves as the input of Algorithm 2. After the iterative procedure of nongreedy L1-norm maximization, we obtain a projection of X onto m orthogonal axes, X -> Ẽ = [Ẽ_u, Ẽ_l] in R^(m x (M+N)), where Ẽ_u = [ẽ_u^1, ..., ẽ_u^M] and Ẽ_l = [ẽ_l^1, ..., ẽ_l^N]. In other words, ẽ_u^i and ẽ_l^j are the low-dimensional representations of each testing data point x_u^i and each training data point x_l^j, respectively. Assume that x_u^* is an arbitrary data point to be tested. If it satisfies

$$\left\| \tilde{e}_u^{\ast} - \tilde{e}_l^{\,j} \right\|_2 = \min_{h = 1, \ldots, N} \left\| \tilde{e}_u^{\ast} - \tilde{e}_l^{\,h} \right\|_2, \tag{26}$$

then x_u^* is assigned to the same class as the jth data point of X_l.
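The whole train-then-assign pipeline can be written compactly. Below is a hedged MATLAB sketch assuming an RBF kernel and the keca_l1 function sketched in Section 3.1; the name ssl_knn_keca, the iteration cap, and the projection step Ẽ = W^T E D^(1/2) follow our reading of the notation rather than the paper's toolbox.

function yPred = ssl_knn_keca(Xl, yl, Xu, sigma, m)
%SSL_KNN_KECA  Sketch of the SSL classifier built on rule (26).
%   Xl : D-by-N labeled training data,   yl : N-by-1 labels
%   Xu : D-by-M unlabeled testing data
X = [Xu, Xl];                                  % overall matrix X = [Xu, Xl]
sq = sum(X.^2, 1);
D2 = bsxfun(@plus, sq', sq) - 2 * (X' * X);    % pairwise squared distances
K = exp(-D2 / (2 * sigma^2));                  % (M+N)-by-(M+N) kernel matrix
W = keca_l1(K, m, 100);                        % rotation from Algorithm 2
[E, D] = eig((K + K') / 2);                    % recompute A = E*D^(1/2)
[d, idx] = sort(diag(D), 'descend');
A = E(:, idx) * diag(sqrt(max(d, 0)));
Etil = W' * A;                                 % m-by-(M+N) embeddings
M = size(Xu, 2);
Eu = Etil(:, 1:M);                             % embeddings of testing points
El = Etil(:, M + 1:end);                       % embeddings of training points
yPred = zeros(M, 1);
for i = 1:M                                    % nearest-neighbor rule (26)
    d2 = sum(bsxfun(@minus, El, Eu(:, i)).^2, 1);
    [~, j] = min(d2);
    yPred(i) = yl(j);                          % inherit the jth training label
end
end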

4. Experiments

This section shows the performance of the proposed KECA-L1 compared with the classical KECA [6] and OKECA [21] for real-world data classification, using the SSL-based classifier illustrated in Section 3.3. Several recent techniques, such as PCA-L1 [27] and KPCA-L1 [28], are also included for comparison. The rationale for selecting these methods is that previous studies related to DR found that they can produce impressive results [27-29]. We implement the experiments on a wide range of real-world datasets: (1) six different datasets from the University of California, Irvine (UCI) Machine Learning Repository (available at http://archive.ics.uci.edu/ml/datasets.html) and (2) 9 different software projects with 34 releases from the PROMISE data repository (available at http://openscience.us/repo). The MATLAB source code for running KECA and OKECA, uploaded by Izquierdo-Verdiguier et al. [21], is available at http://isp.uv.es/soft_feature.html. The coefficient settings for PCA-L1 and KPCA-L1 are the same as in [27, 28]. All experiments are performed with MATLAB R2012a on a PC with an Intel Core i5 CPU, 4 GB memory, and the Windows 7 operating system.

4.1. Experiments on UCI Datasets. The experiments are conducted on six datasets from the UCI repository: the Ionosphere dataset is a binary classification problem of whether the radar signal can describe the structure of free electrons in the ionosphere or not; the Letter dataset is to assign each black-and-white rectangular pixel display to one of the 26 capital letters in the English alphabet; Pendigits handles the recognition of pen-based handwritten digits; the Pima-Indians dataset constitutes a clinical problem of diabetes diagnosis in patients from clinical variables; the WDBC dataset is another clinical problem, the diagnosis of breast cancer into malignant or benign classes; and the Wine dataset is the result of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars. Table 1 shows their details. In the subsequent experiments, we utilize the simplest linear classifier [30]. The maximum likelihood (ML) criterion [31] is selected as the rule for choosing the bandwidth coefficient, as suggested in [21].
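A plausible reading of the ML rule of [31] is leave-one-out maximization of the Parzen-window log-likelihood over a grid of candidate bandwidths. The sketch below implements that reading; the function name select_sigma_ml and the grid-search strategy are our own assumptions, not necessarily the exact procedure used in [21].

function sigmaBest = select_sigma_ml(X, candidates)
%SELECT_SIGMA_ML  Leave-one-out ML choice of a Gaussian bandwidth.
%   X          : D-by-N data matrix
%   candidates : vector of trial bandwidths
[Dim, N] = size(X);
sq = sum(X.^2, 1);
D2 = bsxfun(@plus, sq', sq) - 2 * (X' * X);    % pairwise squared distances
best = -inf;
sigmaBest = candidates(1);
for s = candidates(:)'                          % scan the candidate grid
    P = exp(-D2 / (2 * s^2)) / (2 * pi * s^2)^(Dim / 2);
    P(1:N + 1:end) = 0;                         % leave-one-out: drop self-terms
    ll = sum(log(sum(P, 1) / (N - 1) + eps));   % LOO Parzen log-likelihood
    if ll > best
        best = ll;
        sigmaBest = s;                          % keep the best bandwidth so far
    end
end
end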

Table 2: Descriptions of data attributes.

Attribute   Description
WMC         Weighted methods per class
AMC         Average method complexity
AVG_CC      Mean values of methods in the same class
CA          Afferent couplings
CAM         Cohesion among methods of class
CBM         Coupling between methods
CBO         Coupling between object classes
CE          Efferent couplings
DAM         Data access metric
DIT         Depth of inheritance tree
IC          Inheritance coupling
LCOM        Lack of cohesion in methods
LCOM3       Normalized version of LCOM
LOC         Lines of code
MAX_CC      Maximum values of methods in the same class
MFA         Measure of functional abstraction
MOA         Measure of aggregation
NOC         Number of children
NPM         Number of public methods
RFC         Response for a class
Bug         Number of bugs detected in the class

Table 3: Descriptions of software data.

Releases       Classes   FP    % FP
Ant-1.3        125       20    0.160
Ant-1.4        178       40    0.225
Ant-1.5        293       32    0.109
Ant-1.6        351       92    0.262
Ant-1.7        745       166   0.223
Camel-1.0      339       13    0.038
Camel-1.2      608       216   0.355
Camel-1.4      872       145   0.166
Camel-1.6      965       188   0.195
Ivy-1.1        111       63    0.568
Ivy-1.4        241       16    0.066
Ivy-2.0        352       40    0.114
Jedit-3.2      272       90    0.331
Jedit-4.0      306       75    0.245
Lucene-2.0     195       91    0.467
Lucene-2.2     247       144   0.583
Lucene-2.4     340       203   0.597
Poi-1.5        237       141   0.595
Poi-2.0        314       37    0.118
Poi-2.5        385       248   0.644
Poi-3.0        442       281   0.636
Synapse-1.0    157       16    0.102
Synapse-1.1    222       60    0.270
Synapse-1.2    256       86    0.336
Synapse-1.4    196       147   0.750
Synapse-1.5    214       142   0.664
Synapse-1.6    229       78    0.341
Xalan-2.4      723       110   0.152
Xalan-2.5      803       387   0.482
Xalan-2.6      885       411   0.464
Xerces-init    162       77    0.475
Xerces-1.2     440       71    0.161
Xerces-1.3     453       69    0.152
Xerces-1.4     588       437   0.743

FP: number of fault-prone classes; % FP: proportion of fault-prone classes.


The implementation of KECA-L1 and the other methods is repeated 10 times on all the selected datasets, with respect to different numbers of components. We have utilized the overall classification accuracy (OA) to evaluate the performance of the different algorithms on classification. OA is defined as the percentage of samples correctly assigned; larger values indicate better quality. Figure 1 presents the average OA curves obtained by the aforementioned algorithms on these six real datasets. It can be observed from Figure 1 that OKECA is superior to KECA, PCA-L1, and KPCA-L1, except on the Letter problem. This is probably because DR performed by OKECA not only reveals the structure related to most of the Renyi entropy of the original data but also considers the rotational invariance property [21]. In addition, KECA-L1 outperforms the other methods besides OKECA. This may be attributed to the robustness of the L1-norm to outliers compared with that of the L2-norm. In Figure 1(c), OKECA seems to obtain nearly the same results as KECA-L1. However, the average running time (in hours) of OKECA on Pendigits is 3.7384, far more than that of KECA-L1 (1.339).
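In code, OA reduces to one line (variable names illustrative):

OA = 100 * mean(yPred(:) == yTrue(:));   % overall accuracy in percent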

4.2. Experiments on Software Projects. In software engineering, it is usually difficult to test a software project completely and thoroughly with limited resources [32]. Software defect prediction (SDP) may provide a relatively acceptable solution to this problem. It can allocate the limited test resources effectively by categorizing the software modules into two classes, nonfault-prone (NFP) or fault-prone (FP), according to 21 software metrics (Table 2).

This section aims to employ KECA-based methods to reduce the dimensions of the selected software data (Table 3) and then utilize the SSL-based classifier, combined with the support vector machine [33], to classify each software module as NFP or FP. The bandwidth coefficient setting is still restricted to the ML rule. PCA-L1 and KPCA-L1 are involved as benchmarking yardsticks. There are 34 groups of tests, one for each release in Table 3. The most suitable releases [34] from different software projects are selected as training data. We evaluate the performance of the different selected methods on SDP in terms of recall (R), precision (P), and F-measure (F) [35, 36]. The F-measure is defined as

$$F = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \tag{27}$$

where

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}. \tag{28}$$

In (28), FN (i.e., false negative) means that buggy classes are wrongly classified as nonfaulty, while FP (i.e., false positive) means that nonbuggy classes are wrongly classified as faulty; TP (i.e., true positive) refers to correctly classified buggy classes [34].
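These measures are straightforward to compute from binary predictions; below is a small sketch in which the encoding 1 = fault-prone and 0 = nonfault-prone is our assumption.

function [R, P, F] = recall_precision_f(yTrue, yPred)
%RECALL_PRECISION_F  Recall, precision, and F-measure of Eqs. (27)-(28).
TP = sum(yPred == 1 & yTrue == 1);   % buggy classes correctly flagged
FP = sum(yPred == 1 & yTrue == 0);   % clean classes wrongly flagged buggy
FN = sum(yPred == 0 & yTrue == 1);   % buggy classes missed
P = TP / (TP + FP);                  % precision, Eq. (28)
R = TP / (TP + FN);                  % recall, Eq. (28)
F = 2 * P * R / (P + R);             % harmonic mean, Eq. (27)
end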

[Figure 2: five standardized box plots of R, P, and F for PCA-L1, KPCA-L1, KECA, OKECA, and KECA-L1, each on a 0-1 scale.]

Figure 2: The standardized boxplots of the performance achieved by PCA-L1, KPCA-L1, KECA, OKECA, and KECA-L1, respectively. From the bottom to the top of a standardized box plot: minimum, first quartile, median, third quartile, and maximum.


Values of Recall, Precision, and F-measure range from 0 to 1, and higher values indicate better classification results.

Figure 2 shows the results using box-plot analysis. From Figure 2, considering the minimum, maximum, median, first quartile, and third quartile of the boxes, we find that KECA-L1 performs better than the other methods in general. Specifically, KECA-L1 obtains acceptable results in the SDP experiments compared with the benchmarks proposed in Reference [34], since the median values of the boxes with respect to R and F are close to 0.7 and above 0.5, respectively. In contrast, neither KECA and OKECA nor PCA-L1 and KPCA-L1 can meet these criteria. Therefore, all of the results validate the robustness of KECA-L1.

5. Conclusions

This paper proposes a new extension to the OKECA approach for dimensionality reduction. The new method (i.e., KECA-L1) employs the L1-norm and a rotation matrix to maximize the information potential of the input data. In order to find the optimal entropic kernel components, motivated by Nie et al.'s algorithm [23], we design a nongreedy iterative process which has much faster convergence than OKECA's. Moreover, a general semisupervised learning algorithm has been established for classification using KECA-L1. Compared with several recently proposed KECA- and PCA-based approaches, this SSL-based classifier can remarkably promote performance on real-world dataset classification and software defect prediction.

Although KECA-L1 has achieved impressive success on real examples, several problems should still be considered and solved in future research. The efficiency of KECA-L1 has to be optimized, for it is relatively time-consuming compared with most existing PCA-based methods. Additionally, the utilization of KECA-L1 is expected to appear in each pattern analysis algorithm previously based on PCA approaches.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61702544) and the Natural Science Foundation of Jiangsu Province of China (Grant no. BK20160769).

Supplementary Materials

The MATLAB toolbox of KECA-L1 is available (Supplementary Materials).

References

[1] Q. Gao, S. Xu, F. Chen, C. Ding, X. Gao, and Y. Li, "R₁-2-DPCA and face recognition," IEEE Transactions on Cybernetics, vol. 99, pp. 1-12, 2018.
[2] A. Hyvarinen and E. Oja, "Independent component analysis: algorithms and applications," Neural Networks, vol. 13, no. 4-5, pp. 411-430, 2000.
[3] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," European Conference on Computer Vision, vol. 1, pp. 43-58, 1996.
[4] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[5] J. H. Friedman and J. W. Tukey, "A projection pursuit algorithm for exploratory data analysis," IEEE Transactions on Computers, vol. 23, no. 9, pp. 881-890, 1974.
[6] R. Jenssen, "Kernel entropy component analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, pp. 847-860, 2010.
[7] B. Scholkopf, A. Smola, and K. R. Muller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, no. 5, pp. 1299-1319, 1998.
[8] S. Mika, A. Smola, and M. Scholz, "Kernel PCA and de-noising in feature spaces," Conference on Advances in Neural Information Processing Systems II, vol. 11, pp. 536-542, 1999.
[9] J. Yang, D. Zhang, A. F. Frangi, and J. Y. Yang, "Two-dimensional PCA: a new approach to appearance-based face representation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131-137, 2004.
[10] K. Nishino, S. K. Nayar, and T. Jebara, "Clustered blockwise PCA for representing visual data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1675-1679, 2005.
[11] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 506-513, Washington, DC, USA, June-July 2004.
[12] A. d'Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. G. Lanckriet, "A direct formulation for sparse PCA using semidefinite programming," SIAM Review, vol. 49, no. 3, pp. 434-448, 2007.
[13] M. Luo, F. Nie, X. Chang, Y. Yang, A. Hauptmann, and Q. Zheng, "Avoiding optimal mean robust PCA/2DPCA with non-greedy L1-norm maximization," in Proceedings of International Joint Conference on Artificial Intelligence, pp. 1802-1808, New York, NY, USA, July 2016.
[14] Q. Yu, R. Wang, X. Yang, B. N. Li, and M. Yao, "Diagonal principal component analysis with non-greedy L1-norm maximization for face recognition," Neurocomputing, vol. 171, pp. 57-62, 2016.
[15] B. N. Li, Q. Yu, R. Wang, K. Xiang, M. Wang, and X. Li, "Block principal component analysis with nongreedy L1-norm maximization," IEEE Transactions on Cybernetics, vol. 46, no. 11, pp. 2543-2547, 2016.
[16] F. Nie and H. Huang, "Non-greedy L21-norm maximization for principal component analysis," 2016, http://arxiv.org/abs/1603.08293v1.
[17] F. Nie, J. Yuan, and H. Huang, "Optimal mean robust principal component analysis," in Proceedings of International Conference on Machine Learning, pp. 1062-1070, Beijing, China, June 2014.
[18] R. Wang, F. Nie, R. Hong, X. Chang, X. Yang, and W. Yu, "Fast and orthogonal locality preserving projections for dimensionality reduction," IEEE Transactions on Image Processing, vol. 26, no. 10, pp. 5019-5030, 2017.
[19] C. Zhang, F. Nie, and S. Xiang, "A general kernelization framework for learning algorithms based on kernel PCA," Neurocomputing, vol. 73, no. 4-6, pp. 959-967, 2010.
[20] Z. Zhang and E. R. Hancock, "Kernel entropy-based unsupervised spectral feature selection," International Journal of Pattern Recognition and Artificial Intelligence, vol. 26, no. 5, article 1260002, 2012.
[21] E. Izquierdo-Verdiguier, V. Laparra, R. Jenssen, L. Gomez-Chova, and G. Camps-Valls, "Optimized kernel entropy components," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 6, pp. 1466-1472, 2017.
[22] X. Li, Y. Pang, and Y. Yuan, "L1-norm-based 2DPCA," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 40, no. 4, pp. 1170-1175, 2010.
[23] F. Nie, H. Huang, C. Ding, D. Luo, and H. Wang, "Robust principal component analysis with non-greedy L1-norm maximization," in Proceedings of International Joint Conference on Artificial Intelligence, pp. 1433-1438, Barcelona, Catalonia, Spain, July 2011.
[24] R. Wang, F. Nie, X. Yang, F. Gao, and M. Yao, "Robust 2DPCA with non-greedy L1-norm maximization for image analysis," IEEE Transactions on Cybernetics, vol. 45, no. 5, pp. 1108-1112, 2015.
[25] B. H. Shekar, M. Sharmila Kumari, L. M. Mestetskiy, and N. F. Dyshkant, "Face recognition using kernel entropy component analysis," Neurocomputing, vol. 74, no. 6, pp. 1053-1057, 2011.
[26] R. Jenssen, "Kernel entropy component analysis: new theory and semi-supervised learning," in Proceedings of IEEE International Workshop on Machine Learning for Signal Processing, pp. 1-6, Beijing, China, September 2011.
[27] N. Kwak, "Principal component analysis based on L1-norm maximization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 9, pp. 1672-1680, 2008.
[28] Y. Xiao, H. Wang, W. Xu, and J. Zhou, "L1 norm based KPCA for novelty detection," Pattern Recognition, vol. 46, no. 1, pp. 389-396, 2013.
[29] Y. Xiao, H. Wang, and W. Xu, "Parameter selection of gaussian kernel for one-class SVM," IEEE Transactions on Cybernetics, vol. 45, no. 5, pp. 941-953, 2015.
[30] W. Krzanowski, Principles of Multivariate Analysis, vol. 23, Oxford University Press (OUP), Oxford, UK, 2000.
[31] R. P. W. Duin, "On the choice of smoothing parameters for Parzen estimators of probability density functions," IEEE Transactions on Computers, vol. 25, no. 11, pp. 1175-1179, 1976.
[32] W. Liu, S. Liu, Q. Gu, J. Chen, X. Chen, and D. Chen, "Empirical studies of a two-stage data preprocessing approach for software fault prediction," IEEE Transactions on Reliability, vol. 65, no. 1, pp. 38-53, 2016.
[33] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, "Benchmarking classification models for software defect prediction: a proposed framework and novel findings," IEEE Transactions on Software Engineering, vol. 34, no. 4, pp. 485-496, 2008.
[34] Z. He, F. Shu, Y. Yang, M. Li, and Q. Wang, "An investigation on the feasibility of cross-project defect prediction," Automated Software Engineering, vol. 19, no. 2, pp. 167-199, 2012.
[35] P. He, B. Li, X. Liu, J. Chen, and Y. Ma, "An empirical study on software defect prediction with a simplified metric set," Information and Software Technology, vol. 59, pp. 170-190, 2015.
[36] Y. Wu, S. Huang, H. Ji, C. Zheng, and C. Bai, "A novel Bayes defect predictor based on information diffusion function," Knowledge-Based Systems, vol. 144, pp. 1-8, 2018.

Computational Intelligence and Neuroscience 9

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 3: KernelEntropyComponentAnalysiswithNongreedy L1 ...downloads.hindawi.com/journals/cin/2018/6791683.pdf · ResearchArticle KernelEntropyComponentAnalysiswithNongreedy L1-NormMaximization

e1 eN respectively It can be observed from Equation (7)that the entropy estimator 1113954V(p) consists of projections ontoall the KFS axes because

Kij kσ xi xj1113872 1113873 ϕ xi( 1113857Tϕ xj1113872 1113873 (9)

where the function of ϕ(middot) is to map the two samples xi andxj into the KFS Additionally only an entropic component ei

meeting the criteria of λi ne 0 and 1Tei ne 0 can contribute tothe entropy estimate [21] In a word KECA implements DRby projecting ϕ(X) into a subspace El spanned not by theeigenvectors associated with the top eigenvalues but byentropic components contributing most to the Renyi en-tropy estimator 1113954V(p) [25]

23 Optimized Kernel Entropy Component Analysis Due tothe fact that KECA is sensitive to different bandwidth co-efficients σ [21] OKECA is proposed to fill this gap andimprove performances of KECA on DR Motivated by thefast ICA method [2] an extra rotation matrix (applying W)is employed to the kernel decomposition (Equation (7)) inKECA for maximizing the information potential (the en-tropy values in Equation (8)) [21]

maxwkisinW

J(w) 1TNED

12w( 11138572

st WWT I w2 1

⎧⎪⎨

⎪⎩(10)

where middot 2 is the L2-norm and w denotes a column vector(N times 1) in W Izquierdo-Verdiguier et al [21] utilizeda gradient-ascent approach to handle the maximizationproblem (10)

w(t) w(tminus 1) + τzJ

zw(t) (11)

where τ is the step size zJzw(t) can be obtained by La-grangian multiplier

zJ

zw(t)

zL(w)

zw 2 1T

NED12w1113872 1113873 1T

NED12

1113872 1113873T (12)

e entropic components multiplied by the rotationmatrix can obtain more (or equal) information potentialthan that of the KECA even using fewer components [21]Moreover OKECA shows the capability of being robust tothe bandwidth coefficient However there exist two mainlimitations for OKECA First the new entropic componentsderived from OKECA are sensible to outliers since its

inherent properties of L2-norm (Equation (10)) Secondalthough a very simple stopping criterion is designed toavoid additional iterations OKECA is still of high com-putational complexities for its computational cost is O(N3 +

4tN2) [21] where t is the number of iterations for finding theoptimal rotation matrix compared with that the one ofKECA is O(N3) [21]

3 KECA with NongreedyL1-Norm Maximization

31 Algorithm In order to alleviate the problems existingin OKECA this section presents how to extend KECA toits nongreedy L1-norm version For readersrsquo easy under-standing the definition of L1-norm is firstly introduced asfollows

Definition 1 Given an arbitrary vector x isin RNtimes1 theL1-norm of the vector x is

x1 1113944N

j1xj

11138681113868111386811138681113868

11138681113868111386811138681113868 (13)

where middot 1 is the L1-norm and xj denotes the jth elementof x

en motivated by OKECA we attempt to developa new objective function to maximize the information po-tential (Equations (8) and (10)) based on the L1-norm

max J(W) WTED12

1 1113944N

j1sign aT

j W1113872 1113873WTaj

st WTW I

⎧⎪⎪⎪⎨

⎪⎪⎪⎩

(14)

where (a1 aN) A ED12 N is the size of samplese rotation matrix is denoted as W isin RDIMtimesm where DIMand m are the dimension of input data and dimension of theselected entropic components (or number of projection)respectively It is difficult to directly solve problem (14) butwe may regard it as a special case of problem (1) whenf(]) equiv 0 erefore the Algorithm 1 can be employed tosolve (14) Next we show the details about how to find theoptimal solution of problem (14) based on the proposal fromReferences [23 24] Let

M 1113944N

j1ajsign aT

j W1113872 1113873 (15)

us problem (14) can be simplified as

maxWTWI

J(w) Tr WTM1113872 1113873 (16)

By singular value decomposition (SVD) then

M UΛVT (17)

where U isin RDIMtimesDIM Λ isin RDIMtimesm and V isin Rmtimesm en weobtain

Initialize ]1 isin C t 1While not converge

For each i compute αti sign(gi(]t))

]t+1 argmax]isinC

f(]t) + 1113936iαigi(]t)

t t + 1endOutput ]t+1

ALGORITHM 1 Fast iteration approach to solving the generalL1-Norm maximization problem (3)

Computational Intelligence and Neuroscience 3

Tr WTM1113872 1113873 Tr WTUΛVT1113872 1113873 Tr ΛVTWTU1113872 1113873

Tr(ΛZ) 1113944i

λiizii(18)

where Z isin Rmtimesm λii and zii denote the (i i)minus th element ofmatrixΛ and Z respectively Due to the property of SVD wehave λii ge 0 Additionally Z is an orthonormal matrix [23]such that zii le 1 erefore Tr(WTM) can reach the max-imum only if Z [Im 0mtimes(DIMminusm)] where Im denotes them times m identity matrix and 0mtimes(DIMminusm) is a m times (DIMminusm)

matrix of zeros Considering that Z VTWTU thus thesolution to problem (16) is

W U Im 0(DIMminusm)timesm1113960 1113961VT (19)

Algorithm 2 (A MATLAB implementation of the al-gorithm is available at the Supporting Document for theinterested readers) shows how to utilize the nongreedy L1-norm maximization described in Algorithm 1 to computeEquation (19) Since problem (16) is a special case ofproblem (1) we can obviously obtain that the optimalsolution Wlowast to Equation (19) is a local maximum point forWTED121 based on eorem 2 in Reference [23]Moreover the Phase 1 of the Algorithm 2 spends O(N3) onthe eigen decomposition us the total of computationalcost of KECA-L1 is O(N3 + Nt) where t is the number ofiterations for convergence Considering that the compu-tational complexity of OKECA is O(N3 + 4tN2) we cansafely conclude that KECA-L1 has much faster convergencethan OKECArsquos

32 8e Convergence Analysis is subsection attempts todemonstrate the convergence of the Algorithm 2 in thefollowing theorem

Theorem 1 8e above KECA-L1 procedure can converge

Proof Motivated by References [23 24] first we showthe objective function (9) of KECA-L1 will monotonicallyincrease in each iteration t Let gi(ut) WTaj andαt

i sign(aTj W) then (9) can be simplified to

max J(W) 1113944N

j1sign aT

j W1113872 1113873WTaj 1113944N

j1αt

i gi ut

1113872 1113873

st WTW I

⎧⎪⎪⎨

⎪⎪⎩

(20)

Obviously αt+1i is parallel to gi(ut+1) but neither is αt

i erefore

gi ut+1

1113872 111387311138681113868111386811138681113868

11138681113868111386811138681113868 αt+1i gi u

t+11113872 1113873ge αt

i gi ut+1

1113872 1113873

rArr gi ut+1

1113872 111387311138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t+11113872 1113873ge 0

(21)

Considering that |gi(ut)| αtigi(ut) thus

gi ut

1113872 111387311138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t1113872 1113873 0 (22)

Substituting (22) in (21) it can be obtained

gi ut+1

1113872 111387311138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t+11113872 1113873ge gi u

t1113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t1113872 1113873 (23)

According to the Step 3 in Algorithm 2 and the theory ofSVD for each iteration t we have

1113944

N

i1αt

igi ut+1

1113872 1113873ge 1113944N

i1αt

igi ut

1113872 1113873 (24)

Combining (23) and (24) for every i we have

1113944

N

i1gi u

t+11113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t+11113872 11138731113874 1113875ge 1113944

N

i1gi u

t1113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t1113872 11138731113874 1113875

rArr1113944N

i1gi u

t+11113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868ge 1113944N

i1gi u

t1113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868

(25)

Table 1 UCI datasets description

Database N DIM Nc Ntrain Ntest

Ionosphere 351 33 2 30 times 2 175Letter 20000 16 26 35 times 26 3870Pendigits 10992 16 9 60 times 9 3500Pima-Indians 768 8 2 100 times 2 325WDBC 569 30 2 35 times 2 345Wine 178 12 3 30 times 3 80N number of samples DIM number of dimensions Nc number of classesNtrain number of training data and Ntest number of testing data

Input K and m

Initialize W1 isin RDIMtimesm such that WTW I t 1--------------------------------Phase 1------------------------------

(1) Eigen decomposition [ED]⟵eig(K) D⟵sort(D) E⟵sort(E) A ED12--------------------------------Phase 2------------------------------While not converge

(2) M 1113936Nj1ajsign(aT

j W)(3) Compute the SVD of M as M UΛVT Let Wt+1 U[Im 0(DIMminusm)timesm]VT(4) t t + 1

endOutput Wt+1 isin RDIMtimesm

ALGORITHM 2 KECA-L1

4 Computational Intelligence and Neuroscience

which means that Algorithm 2 is monotonically increasingAdditionally considering that objective function (14) ofKECA-L1 has an upper bound within the limited iterationsthe KECA-L1 procedure will converge

33 e Semisupervised Classier Jenssen [26] establisheda semisupervised learning (SSL) algorithm for classicationusing KECA is SSL-based classier was trained by both

labeled and unlabeled data to build the kernel matrix such thatit can map the data to KFS appropriately [26] Additionally itis based on a general modelling scheme and applicable forother variants of KECA such as OKECA and KECA-L1

More specically we are given N pairs of training dataxi yi Ni1 with samples xi isin RD and the associated labels yiIn addition there are M unlabeled data points for testingLet Xu [x1u xMu ] and Xl [x1l xNl ] denote thetesting data and training data without labels respectively

102 4 6 8

OA

()

60

80

100

Number of projection

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(a)

102 4 6 8

OA

()

20

40

60

80

100

Number of projection

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(b)

102 4 6 8

OA

()

20

40

60

80

100

Number of projection

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(c)

102 4 6 8

OA

()

55

60

65

70

Number of projection

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(d)

102 4 6 8Number of projection

OA

()

75

80

85

90

95

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(e)

102 4 6 8

OA

()

Number of projection

40

60

80

100

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(f )

Figure 1 Overall accuracy obtained by the PCA-L1 KPCA-L1 KECA OKECA and KECA-L1 using dierent UCI databases with dierentnumbers of extracted features (a) Ionosphere (b) Letter (c) Pendigits (d) Pima-Indians (e) WDBC and (f) Wine

Computational Intelligence and Neuroscience 5

thus we can obtain an overall matrix X [Xu Xl] en weconstruct the kernel matrix K derived from X using (6)K isin R(N+M)times(N+M) which plays as the input of Algorithm 2After the iteration procedure of nongreedy L1-norm maxi-mization we obtain a projection ofX⟶ 1113957E [1113957Eu

1113957El]mtimes(M+N)

onto m orthogonal axes where 1113957Eu [1113957eu1 1113957eu

M] and1113957El [1113957el

1 1113957elN] In other words 1113957eu

i and 1113957elj are the low-

dimensional representations of each testing data point xiu

and the training one xj

l respectively Assume that xlowastu is anarbitrary data point to be tested If it satisfies

1113957elowastu minus 1113957e

j

l

2 min

h1N1113957elowastu minus 1113957e

hl

2 (26)

then xlowastu is assigned to the same class with the jth data pointof Xl

4 Experiments

is section shows the performance of the proposed KECA-L1 compared with the classical KECA [6] and OKECA [21]for real-world data classification using the SSL-based clas-sifier illustrated in Section 33 Several recent techniquessuch as PCA-L1 [27] and KPCA-L1 [28] are also included forcomparison e rationale to select these methods is thatprevious studies related to DR found that they can produceimpressive results [27ndash29] We implement the experimentson a wide range of real-world datasets (1) six differentdatasets from the University California Irvine (UCI)Machine Learning Repository (available at httparchiveicsuciedumldatasetshtml) and (2) 9 different software pro-jects with 34 releases from the PROMISE data repository(available at httpopenscienceusrepo) e MATLABsource code for running KECA and OKECA uploaded byIzquierdo-Verdiguier et al [21] is available at httpispuvessoft_featurehtml e coefficients set for PCA-L1 andKPCA-L1 is the same with [27 28] All of the experimentsare all performed by MATLAB R2012a on a PC with InterCore i5 CPU 4 GB memory and Windows 7 operatingsystem

41 Experiments on UCI Datasets e experiments areconducted on six datasets from the UCI the Inonospheredataset is a binary classification problem of whether theradar signal can describe the structure of free electrons in theionosphere or not the Letter dataset is to assign each black-and-white rectangular pixel display to one of the 26 capitalletters in the English alphabet the Pendigits handles therecognition of pen-based handwritten digits the Pima-Indians data set constitutes a clinical problem of diabetesdiagnosis in patients from clinical variables the WDBCdataset is another clinical problem for the diagnosis of breastcancer in malignant or benign classes and the Wine datasetis the result of a chemical analysis of wines grown in thesame region in Italy but derived from three different cul-tivars Table 1 shows the details of them In the subsequentexperiments we just utilized the simplest linear classifier[30] e theory of maximizing maximum likelihood (ML)[31] is selected as the rule for selecting bandwidth coefficientas suggested in [21]

Table 2 Descriptions of data attributes

Attribute DescriptionWMC Weighted methods per classAMC Average method ComplexityAVG_CC Mean values of methods in the same classCA Afferent couplingsCAM Cohesion among methods of classCBM Coupling between MethodsCBO Coupling between object classesCE Efferent couplingsDAM Data access MetricDIT Depth of inheritance treeIC Inheritance CouplingLCOM Lack of cohesion in MethodsLCOM3 Normalized version of LCOMLOC Lines of codeMAX_CC Maximum values of methods in the same classMFA Measure of function AbstractionMOA Measure of AggregationNOC Number of ChildrenNPM Number of public MethodsRFC Response for a classBug Number of bugs detected in the class

Table 3 Descriptions of software data

Releases Classes FP FPAnt-13 125 20 0160Ant-14 178 40 0225Ant-15 293 32 0109Ant-16 351 92 0262Ant-17 745 166 0223Camel-10 339 13 0038Camel-12 608 216 0355Camel-14 872 145 0166Camel-16 965 188 0195Ivy-11 111 63 0568Ivy-14 241 16 0066Ivy-20 352 40 0114Jedit-32 272 90 0331Jedit-40 306 75 0245Lucene-20 195 91 0467Lucene-22 247 144 0583Lucene-24 340 203 0597Poi-15 237 141 0595Poi-20 314 37 0118Poi-25 385 248 0644Poi-30 442 281 0636Synapse-10 157 16 0102Synapse-11 222 60 0270Synapse-12 256 86 0336Synapse-14 196 147 0750Synapse-15 214 142 0664Synapse-16 229 78 0341Xalan-24 723 110 0152Xalan-25 803 387 0482Xalan-26 885 411 0464Xerces-init 162 77 0475Xerces-12 440 71 0161Xerces-13 453 69 0152Xerces-14 588 437 0743

6 Computational Intelligence and Neuroscience

e implementation of KECA-L1 and other methods isrepeated using all the selected datasets with respect to dif-ferent numbers of components for 10 times We have uti-lized the overall classication accuracy (OA) to evaluate theperformance of dierent algorithms on the classicationOA is dened as the total number of samples correctlyassigned in percentage terms which is within [0 1] andindicates better quality with larger values Figure 1 presentsthe average OA curves obtained by the aforementionedalgorithms for these six real datasets It can be observed fromFigure 1 that OKECA is superior to KECA PCA-L1 andKPCA-L1 except for solving Letter issue is is probablybecause DR performed by OKECA not only can reveal thestructure related to the most Renyi entropy of the originaldata but also consider the rotational invariance property[21] In addition KECA-L1 outperforms the other methodsbesides of OKECA is may be attributed to the robustnessof L1-norm to outliers compared with that of the L2-normIn Figure 1(c) OKECA seems to obtain nearly the sameresults with KECA-L1rsquos However the average running time(in hours) of OKECA in the Pendigits is 37384 times morethan that of KECA-L1 1339

42 Experiments on Software Projects In software engi-neering it is usually dipoundcult to test a software projectcompletely and thoroughly with the limited resources [32]Software defect prediction (SDP) may provide a relativelyacceptable solution to this problem It can allocate the

limited test resources eectively by categorizing the softwaremodules into two classes nonfault-prone (NFP) or fault-prone (FP) according to 21 software metrics (Table 2)

is section aims to employ KECA-based methods toreduce the selected software data (Table 3) dimensions andthen utilize the SSL-based classier combined with thesupport vector machine [33] to classify each softwaremodule as NFP or FP e bandwidth coepoundcient set is stillrestricted to the rule of ML PCA-L1 and KPCA-L1 areinvolved as a benchmarking yardstick ere are 34 groupsof tests for each release in Table 3 e most suitable releases[34] from dierent software projects are selected as trainingdata We evaluate the performance of dierent selectedmethods on SDP in terms of recall (R) precision (P) andF-measure (F) [35 36] e F-measure is dened as

F 2 times precsion times recallprecsion + recall

(27)

where

Precsion TP

TP + FP

Recall TP

TP + FN

(28)

In (28) FN (ie false negative) means that buggyclasses are wrongly classied to be nonfaulty while FP(ie false positive) means nonbuggy classes are wronglyclassied to be faulty TP (ie true positive) refer to

R F0

01

02

03

04

05

06

07

08

09

1PCA-L1

P F0

01

02

03

04

05

06

07

08

09

1KPCA-L1

R P F0

01

02

03

04

05

06

07

08

09

1KECA

R P F0

01

02

03

04

05

06

07

08

09

1OKECA

R P F0

01

02

03

04

05

06

07

08

09

1KECA-L1

R P

Figure 2 e standardized boxplots of the performance achieved by PCA-L1 KPCA-L1 KECA OKECA and KECA-L1 respectivelyFrom the bottom to the top of a standardized box plot minimum rst quartile median third quartile and maximum

Computational Intelligence and Neuroscience 7

correctly classified buggy classes [34] Values of RecallPrecision and F-measure range from 0 to 1 and highervalues indicate better classification results

Figure 2 shows the results using box-plot analysis FromFigure 2 considering theminimummaximummedian firstquartile and third quartile of the boxes we find that KECA-L1 performs better than the other methods in generalSpecifically KECA-L1 can obtain acceptable results in ex-periments for SDP compared with the benchmarks proposedin Reference [34] since the median values of the boxes withrespect to R and F are close to 07 and more than 05 re-spectively On the contrary not only KECA and OKECA butPCA-L1 and KPCA-L1 cannot meet these criteriaereforeall of the results validate the robustness of KECA-L1

5 Conclusions

is paper proposes a new extension to the OKECA ap-proach for dimensional reduction e new method(ie KECA-L1) employs L1-norm and a rotation matrix tomaximize information potential of the input data In orderto find the optimal entropic kernel components motivatedby Nie et alrsquos algorithm [23] we design a nongreedy iterativeprocess which has much faster convergence than OKECArsquosMoreover a general semisupervised learning algorithm hasbeen established for classification using KECA-L1 Com-pared with several recently proposed KECA- and PCA-basedapproaches this SSL-based classifier can remarkably pro-mote the performance on real-world datasets classificationand software defect prediction

Although KECA-L1 has achieved impressive success onreal examples several problems still should be consideredand solved in the future researche efficiency of KECA-L1has to be optimized for it is relatively time-consumingcompared with most existing PCA-based methods Addi-tionally the utilization of KECA-L1 is expected to appear ineach pattern analysis algorithm previously based on PCAapproaches

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

is work was supported by the National Natural ScienceFoundation of China (Grant no 61702544) and NaturalScience Foundation of Jiangsu Province of China (Grant noBK20160769)

Supplementary Materials

e MATLAB toolbox of KECA-L1 is available (Supple-mentary Materials)

References

[1] Q Gao S Xu F Chen C Ding X Gao and Y Li ldquoR₁-2-DPCA and face recognitionrdquo IEEE Transactions on Cyber-netics vol 99 pp 1ndash12 2018

[2] A Hyvarinen and E Oja ldquoIndependent component analysisalgorithms and applicationsrdquo Neural Networks vol 13 no 4-5 pp 411ndash430 2000

[3] P N Belhumeur J P Hespanha and D J KriegmanldquoEigenfaces vs Fisherfaces recognition using class specificlinear projectionrdquo European Conference on Computer Visionvol 1 pp 43ndash58 1996

[4] M Turk and A Pentland ldquoEigenfaces for recognitionrdquoJournal of Cognitive Neuroscience vol 3 no 1 pp 71ndash861991

[5] J H Friedman and J W Tukey ldquoA projection pursuit al-gorithm for exploratory data analysisrdquo IEEE Transactions onComputers vol 23 no 9 pp 881ndash890 1974

[6] R Jenssen ldquoKernel entropy component analysisrdquo IEEETransactions on Pattern Analysis and Machine Intelligencevol 32 no 5 pp 847ndash860 2010

[7] B Scholkopf A Smola and K R Muller ldquoNonlinear com-ponent analysis as a kernel eigenvalue problemrdquo NeuralComputation vol 10 no 5 pp 1299ndash1319 1998

[8] S Mika A Smola and M Scholz ldquoKernel PCA and de-noising in feature spacesrdquo Conference on Advances inNeural Information Processing Systems II vol 11pp 536ndash542 1999

[9] J Yang D Zhang A F Frangi and J Y Yang ldquoTwo-dimensional PCA a new approach to appearance-basedface representation and recognitionrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 26 no 1pp 131ndash137 2004

[10] K Nishino S K Nayar and T Jebara ldquoClustered blockwisePCA for representing visual datardquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 27 no 10pp 1675ndash1679 2005

[11] Y Ke and R Sukthankar ldquoPCA-SIFT a more distinctiverepresentation for local image descriptorsrdquo in Proceedings ofIEEE Computer Society Conference on Computer Vision andPattern Recognition pp 506ndash513 Washington DC USAJune-July 2004

[12] A DrsquoAspremont L El Ghaoui M I Jordan andG R G Lanckriet ldquoA direct formulation for sparse PCA usingsemidefinite programmingrdquo SIAM Review vol 49 no 3pp 434ndash448 2007

[13] M Luo F Nie X Chang Y Yang A Hauptmann andQ Zheng ldquoAvoiding optimal mean robust PCA2DPCA withnon-greedy L1-norm maximizationrdquo in Proceedings of In-ternational Joint Conference on Artificial Intelligencepp 1802ndash1808 New York NY USA July 2016

[14] Q Yu R Wang X Yang B N Li and M Yao ldquoDiagonalprincipal component analysis with non-greedy L1-normmaximization for face recognitionrdquo Neurocomputingvol 171 pp 57ndash62 2016

[15] B N Li Q Yu RWang K XiangMWang and X Li ldquoBlockprincipal component analysis with nongreedy L1-normmaximizationrdquo IEEE Transactions on Cybernetics vol 46no 11 pp 2543ndash2547 2016

[16] F Nie and H Huang ldquoNon-greedy L21-norm maximizationfor principal component analysisrdquo 2016 httparxivorgabs160308293v1

[17] F Nie J Yuan and H Huang ldquoOptimal mean robustprincipal component analysisrdquo in Proceedings of International

8 Computational Intelligence and Neuroscience

Conference on Machine Learning pp 1062ndash1070 BeijingChina June 2014

[18] R Wang F Nie R Hong X Chang X Yang and W YuldquoFast and orthogonal locality preserving projections for di-mensionality reductionrdquo IEEE Transactions on Image Pro-cessing vol 26 no 10 pp 5019ndash5030 2017

[19] C Zhang F Nie and S Xiang ldquoA general kernelizationframework for learning algorithms based on kernel PCArdquoNeurocomputing vol 73 no 4ndash6 pp 959ndash967 2010

[20] Z Zhang and E R Hancock ldquoKernel entropy-based un-supervised spectral feature selectionrdquo International Journal ofPattern Recognition and Artificial Intelligence vol 26 no 5article 1260002 2012

[21] E Izquierdo-Verdiguier V Laparra R Jenssen L Gomez-Chova and G Camps-Valls ldquoOptimized kernel entropycomponentsrdquo IEEE Transactions on Neural Networks andLearning Systems vol 28 no 6 pp 1466ndash1472 2017

[22] X Li Y Pang and Y Yuan ldquoL1-norm-based 2DPCArdquo IEEETransactions on Systems Man and Cybernetics Part B vol 40no 4 pp 1170ndash1175 2010

[23] F Nie H Huang C Ding D Luo and H Wang ldquoRobustprincipal component analysis with non-greedy L1-normmaximizationrdquo in Proceedings of International Joint Confer-ence on Artificial Intelligence pp 1433ndash1438 BarcelonaCatalonia Spain July 2011

[24] R Wang F Nie X Yang F Gao and M Yao ldquoRobust2DPCA with non-greedy L1-norm maximization for imageanalysisrdquo IEEE Transactions on Cybernetics vol 45 no 5pp 1108ndash1112 2015

[25] B H Shekar M Sharmila Kumari L M Mestetskiy andN F Dyshkant ldquoFace recognition using kernel entropycomponent analysisrdquo Neurocomputing vol 74 no 6pp 1053ndash1057 2011

[26] R Jenssen ldquoKernel entropy component analysis new theoryand semi-supervised learningrdquo in Proceedings of IEEE In-ternational Workshop on Machine Learning for Signal Pro-cessing pp 1ndash6 Beijing China September 2011

[27] N Kwak ldquoPrincipal component analysis based on L1-normmaximizationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 30 no 9 pp 1672ndash1680 2008

[28] Y Xiao HWang W Xu and J Zhou ldquoL1 norm based KPCAfor novelty detectionrdquo Pattern Recognition vol 46 no 1pp 389ndash396 2013

[29] Y Xiao H Wang and W Xu ldquoParameter selection ofgaussian kernel for one-class SVMrdquo IEEE Transactions onCybernetics vol 45 no 5 pp 941ndash953 2015

[30] W Krzanowski Principles of Multivariate Analysis Vol 23Oxford University Press (OUP) Oxford UK 2000

[31] Duin ldquoOn the choice of smoothing parameters for Parzenestimators of probability density functionsrdquo IEEE Trans-actions on Computers vol 25 no 11 pp 1175ndash1179 1976

[32] W Liu S Liu Q Gu J Chen X Chen and D ChenldquoEmpirical studies of a two-stage data preprocessing approachfor software fault predictionrdquo IEEE Transactions on Re-liability vol 65 no 1 pp 38ndash53 2016

[33] S Lessmann B Baesens C Mues and S Pietsch ldquoBench-marking classification models for software defect predictiona proposed framework and novel findingsrdquo IEEE Trans-actions on Software Engineering vol 34 no 4 pp 485ndash4962008

[34] Z He F Shu Y Yang M Li and Q Wang ldquoAn in-vestigation on the feasibility of cross-project defect pre-dictionrdquo Automated Software Engineering vol 19 no 2pp 167ndash199 2012

[35] P He B Li X Liu J Chen and YMa ldquoAn empirical study onsoftware defect prediction with a simplified metric setrdquo In-formation and Software Technology vol 59 pp 170ndash190 2015

[36] Y Wu S Huang H Ji C Zheng and C Bai ldquoA novel Bayesdefect predictor based on information diffusion functionrdquoKnowledge-Based Systems vol 144 pp 1ndash8 2018

Computational Intelligence and Neuroscience 9

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 4: KernelEntropyComponentAnalysiswithNongreedy L1 ...downloads.hindawi.com/journals/cin/2018/6791683.pdf · ResearchArticle KernelEntropyComponentAnalysiswithNongreedy L1-NormMaximization

Tr WTM1113872 1113873 Tr WTUΛVT1113872 1113873 Tr ΛVTWTU1113872 1113873

Tr(ΛZ) 1113944i

λiizii(18)

where Z isin Rmtimesm λii and zii denote the (i i)minus th element ofmatrixΛ and Z respectively Due to the property of SVD wehave λii ge 0 Additionally Z is an orthonormal matrix [23]such that zii le 1 erefore Tr(WTM) can reach the max-imum only if Z [Im 0mtimes(DIMminusm)] where Im denotes them times m identity matrix and 0mtimes(DIMminusm) is a m times (DIMminusm)

matrix of zeros Considering that Z VTWTU thus thesolution to problem (16) is

W U Im 0(DIMminusm)timesm1113960 1113961VT (19)

Algorithm 2 (A MATLAB implementation of the al-gorithm is available at the Supporting Document for theinterested readers) shows how to utilize the nongreedy L1-norm maximization described in Algorithm 1 to computeEquation (19) Since problem (16) is a special case ofproblem (1) we can obviously obtain that the optimalsolution Wlowast to Equation (19) is a local maximum point forWTED121 based on eorem 2 in Reference [23]Moreover the Phase 1 of the Algorithm 2 spends O(N3) onthe eigen decomposition us the total of computationalcost of KECA-L1 is O(N3 + Nt) where t is the number ofiterations for convergence Considering that the compu-tational complexity of OKECA is O(N3 + 4tN2) we cansafely conclude that KECA-L1 has much faster convergencethan OKECArsquos

32 8e Convergence Analysis is subsection attempts todemonstrate the convergence of the Algorithm 2 in thefollowing theorem

Theorem 1 8e above KECA-L1 procedure can converge

Proof Motivated by References [23 24] first we showthe objective function (9) of KECA-L1 will monotonicallyincrease in each iteration t Let gi(ut) WTaj andαt

i sign(aTj W) then (9) can be simplified to

max J(W) 1113944N

j1sign aT

j W1113872 1113873WTaj 1113944N

j1αt

i gi ut

1113872 1113873

st WTW I

⎧⎪⎪⎨

⎪⎪⎩

(20)

Obviously αt+1i is parallel to gi(ut+1) but neither is αt

i erefore

gi ut+1

1113872 111387311138681113868111386811138681113868

11138681113868111386811138681113868 αt+1i gi u

t+11113872 1113873ge αt

i gi ut+1

1113872 1113873

rArr gi ut+1

1113872 111387311138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t+11113872 1113873ge 0

(21)

Considering that |gi(ut)| αtigi(ut) thus

gi ut

1113872 111387311138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t1113872 1113873 0 (22)

Substituting (22) in (21) it can be obtained

gi ut+1

1113872 111387311138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t+11113872 1113873ge gi u

t1113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t1113872 1113873 (23)

According to the Step 3 in Algorithm 2 and the theory ofSVD for each iteration t we have

1113944

N

i1αt

igi ut+1

1113872 1113873ge 1113944N

i1αt

igi ut

1113872 1113873 (24)

Combining (23) and (24) for every i we have

1113944

N

i1gi u

t+11113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t+11113872 11138731113874 1113875ge 1113944

N

i1gi u

t1113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868minus αtigi u

t1113872 11138731113874 1113875

rArr1113944N

i1gi u

t+11113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868ge 1113944N

i1gi u

t1113872 1113873

11138681113868111386811138681113868

11138681113868111386811138681113868

(25)

Table 1 UCI datasets description

Database N DIM Nc Ntrain Ntest

Ionosphere 351 33 2 30 times 2 175Letter 20000 16 26 35 times 26 3870Pendigits 10992 16 9 60 times 9 3500Pima-Indians 768 8 2 100 times 2 325WDBC 569 30 2 35 times 2 345Wine 178 12 3 30 times 3 80N number of samples DIM number of dimensions Nc number of classesNtrain number of training data and Ntest number of testing data

Input K and m

Initialize W1 isin RDIMtimesm such that WTW I t 1--------------------------------Phase 1------------------------------

(1) Eigen decomposition [ED]⟵eig(K) D⟵sort(D) E⟵sort(E) A ED12--------------------------------Phase 2------------------------------While not converge

(2) M 1113936Nj1ajsign(aT

j W)(3) Compute the SVD of M as M UΛVT Let Wt+1 U[Im 0(DIMminusm)timesm]VT(4) t t + 1

endOutput Wt+1 isin RDIMtimesm

ALGORITHM 2 KECA-L1

4 Computational Intelligence and Neuroscience

which means that Algorithm 2 is monotonically increasingAdditionally considering that objective function (14) ofKECA-L1 has an upper bound within the limited iterationsthe KECA-L1 procedure will converge

33 e Semisupervised Classier Jenssen [26] establisheda semisupervised learning (SSL) algorithm for classicationusing KECA is SSL-based classier was trained by both

labeled and unlabeled data to build the kernel matrix such thatit can map the data to KFS appropriately [26] Additionally itis based on a general modelling scheme and applicable forother variants of KECA such as OKECA and KECA-L1

More specically we are given N pairs of training dataxi yi Ni1 with samples xi isin RD and the associated labels yiIn addition there are M unlabeled data points for testingLet Xu [x1u xMu ] and Xl [x1l xNl ] denote thetesting data and training data without labels respectively

102 4 6 8

OA

()

60

80

100

Number of projection

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(a)

102 4 6 8

OA

()

20

40

60

80

100

Number of projection

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(b)

102 4 6 8

OA

()

20

40

60

80

100

Number of projection

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(c)

102 4 6 8

OA

()

55

60

65

70

Number of projection

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(d)

102 4 6 8Number of projection

OA

()

75

80

85

90

95

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(e)

102 4 6 8

OA

()

Number of projection

40

60

80

100

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(f )

Figure 1 Overall accuracy obtained by the PCA-L1 KPCA-L1 KECA OKECA and KECA-L1 using dierent UCI databases with dierentnumbers of extracted features (a) Ionosphere (b) Letter (c) Pendigits (d) Pima-Indians (e) WDBC and (f) Wine

Computational Intelligence and Neuroscience 5

thus we can obtain an overall matrix X [Xu Xl] en weconstruct the kernel matrix K derived from X using (6)K isin R(N+M)times(N+M) which plays as the input of Algorithm 2After the iteration procedure of nongreedy L1-norm maxi-mization we obtain a projection ofX⟶ 1113957E [1113957Eu

1113957El]mtimes(M+N)

onto m orthogonal axes where 1113957Eu [1113957eu1 1113957eu

M] and1113957El [1113957el

1 1113957elN] In other words 1113957eu

i and 1113957elj are the low-

dimensional representations of each testing data point xiu

and the training one xj

l respectively Assume that xlowastu is anarbitrary data point to be tested If it satisfies

1113957elowastu minus 1113957e

j

l

2 min

h1N1113957elowastu minus 1113957e

hl

2 (26)

then xlowastu is assigned to the same class with the jth data pointof Xl

4 Experiments

is section shows the performance of the proposed KECA-L1 compared with the classical KECA [6] and OKECA [21]for real-world data classification using the SSL-based clas-sifier illustrated in Section 33 Several recent techniquessuch as PCA-L1 [27] and KPCA-L1 [28] are also included forcomparison e rationale to select these methods is thatprevious studies related to DR found that they can produceimpressive results [27ndash29] We implement the experimentson a wide range of real-world datasets (1) six differentdatasets from the University California Irvine (UCI)Machine Learning Repository (available at httparchiveicsuciedumldatasetshtml) and (2) 9 different software pro-jects with 34 releases from the PROMISE data repository(available at httpopenscienceusrepo) e MATLABsource code for running KECA and OKECA uploaded byIzquierdo-Verdiguier et al [21] is available at httpispuvessoft_featurehtml e coefficients set for PCA-L1 andKPCA-L1 is the same with [27 28] All of the experimentsare all performed by MATLAB R2012a on a PC with InterCore i5 CPU 4 GB memory and Windows 7 operatingsystem

41 Experiments on UCI Datasets e experiments areconducted on six datasets from the UCI the Inonospheredataset is a binary classification problem of whether theradar signal can describe the structure of free electrons in theionosphere or not the Letter dataset is to assign each black-and-white rectangular pixel display to one of the 26 capitalletters in the English alphabet the Pendigits handles therecognition of pen-based handwritten digits the Pima-Indians data set constitutes a clinical problem of diabetesdiagnosis in patients from clinical variables the WDBCdataset is another clinical problem for the diagnosis of breastcancer in malignant or benign classes and the Wine datasetis the result of a chemical analysis of wines grown in thesame region in Italy but derived from three different cul-tivars Table 1 shows the details of them In the subsequentexperiments we just utilized the simplest linear classifier[30] e theory of maximizing maximum likelihood (ML)[31] is selected as the rule for selecting bandwidth coefficientas suggested in [21]

Table 2 Descriptions of data attributes

Attribute DescriptionWMC Weighted methods per classAMC Average method ComplexityAVG_CC Mean values of methods in the same classCA Afferent couplingsCAM Cohesion among methods of classCBM Coupling between MethodsCBO Coupling between object classesCE Efferent couplingsDAM Data access MetricDIT Depth of inheritance treeIC Inheritance CouplingLCOM Lack of cohesion in MethodsLCOM3 Normalized version of LCOMLOC Lines of codeMAX_CC Maximum values of methods in the same classMFA Measure of function AbstractionMOA Measure of AggregationNOC Number of ChildrenNPM Number of public MethodsRFC Response for a classBug Number of bugs detected in the class

Table 3: Descriptions of software data (FP rate = FP / Classes).

Release       Classes   FP    FP rate
Ant-1.3       125       20    0.160
Ant-1.4       178       40    0.225
Ant-1.5       293       32    0.109
Ant-1.6       351       92    0.262
Ant-1.7       745       166   0.223
Camel-1.0     339       13    0.038
Camel-1.2     608       216   0.355
Camel-1.4     872       145   0.166
Camel-1.6     965       188   0.195
Ivy-1.1       111       63    0.568
Ivy-1.4       241       16    0.066
Ivy-2.0       352       40    0.114
Jedit-3.2     272       90    0.331
Jedit-4.0     306       75    0.245
Lucene-2.0    195       91    0.467
Lucene-2.2    247       144   0.583
Lucene-2.4    340       203   0.597
Poi-1.5       237       141   0.595
Poi-2.0       314       37    0.118
Poi-2.5       385       248   0.644
Poi-3.0       442       281   0.636
Synapse-1.0   157       16    0.102
Synapse-1.1   222       60    0.270
Synapse-1.2   256       86    0.336
Synapse-1.4   196       147   0.750
Synapse-1.5   214       142   0.664
Synapse-1.6   229       78    0.341
Xalan-2.4     723       110   0.152
Xalan-2.5     803       387   0.482
Xalan-2.6     885       411   0.464
Xerces-init   162       77    0.475
Xerces-1.2    440       71    0.161
Xerces-1.3    453       69    0.152
Xerces-1.4    588       437   0.743


The implementation of KECA-L1 and the other methods is repeated 10 times on all the selected datasets with respect to different numbers of components. We have utilized the overall classification accuracy (OA) to evaluate the performance of the different algorithms on classification. OA is defined as the fraction of samples correctly assigned, in percentage terms; it lies within [0, 1] and indicates better quality with larger values. Figure 1 presents the average OA curves obtained by the aforementioned algorithms for these six real datasets. It can be observed from Figure 1 that OKECA is superior to KECA, PCA-L1, and KPCA-L1, except on the Letter dataset. This is probably because the DR performed by OKECA can not only reveal the structure retaining the most Renyi entropy of the original data but also consider the rotational invariance property [21]. In addition, KECA-L1 outperforms all the other methods besides OKECA. This may be attributed to the robustness of the L1-norm to outliers compared with that of the L2-norm. In Figure 1(c), OKECA seems to obtain nearly the same results as KECA-L1. However, the average running time (in hours) of OKECA on Pendigits is 3.7384, nearly three times that of KECA-L1 (1.339).
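The evaluation protocol above can be sketched as follows, given a dataset X with labels y; split_data, extract_keca_l1, and classify_linear are hypothetical placeholders standing in for the random train/test split, the feature extraction step, and the linear classifier [30].

% Average OA over 10 repetitions for each number m of components.
reps = 10; ms = 1:10;
OA = zeros(numel(ms), 1);
for k = 1:numel(ms)
    acc = zeros(reps, 1);
    for r = 1:reps
        [Xl, yl, Xu, yu] = split_data(X, y);        % hypothetical split
        [El, Eu] = extract_keca_l1(Xl, Xu, ms(k));  % hypothetical extractor
        acc(r) = mean(classify_linear(El, yl, Eu) == yu);
    end
    OA(k) = mean(acc);                              % average OA in [0, 1]
end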

4.2. Experiments on Software Projects

In software engineering, it is usually difficult to test a software project completely and thoroughly with limited resources [32]. Software defect prediction (SDP) may provide a relatively acceptable solution to this problem: it can allocate the limited test resources effectively by categorizing the software modules into two classes, nonfault-prone (NFP) or fault-prone (FP), according to 21 software metrics (Table 2).

This section aims to employ KECA-based methods to reduce the dimensions of the selected software data (Table 3) and then utilize the SSL-based classifier combined with the support vector machine [33] to classify each software module as NFP or FP. The bandwidth coefficient is still selected by the ML rule. PCA-L1 and KPCA-L1 are involved as benchmarking yardsticks. There are 34 groups of tests, one for each release in Table 3. The most suitable releases [34] from different software projects are selected as training data. We evaluate the performance of the selected methods on SDP in terms of recall (R), precision (P), and F-measure (F) [35, 36]. The F-measure is defined as

$$F = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}, \qquad (27)$$

where

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}. \qquad (28)$$

In (28), FN (i.e., false negatives) means that buggy classes are wrongly classified as nonfaulty, while FP (i.e., false positives) means that nonbuggy classes are wrongly classified as faulty; TP (i.e., true positives) refers to correctly classified buggy classes [34]. Values of recall, precision, and F-measure range from 0 to 1, and higher values indicate better classification results.

Figure 2: The standardized boxplots of the performance (R, P, and F) achieved by PCA-L1, KPCA-L1, KECA, OKECA, and KECA-L1, respectively. From the bottom to the top of a standardized box plot: minimum, first quartile, median, third quartile, and maximum.
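Read concretely, (27) and (28) amount to the following minimal sketch, assuming binary label vectors in which 1 marks a fault-prone class and 0 a nonfault-prone one.

function [P, R, F] = prf(yTrue, yPred)
% Precision, recall, and F-measure as in (27)-(28).
TP  = sum(yTrue == 1 & yPred == 1);  % buggy classes predicted buggy
FPs = sum(yTrue == 0 & yPred == 1);  % nonbuggy classes predicted buggy
FN  = sum(yTrue == 1 & yPred == 0);  % buggy classes predicted nonbuggy
P = TP / (TP + FPs);
R = TP / (TP + FN);
F = 2 * P * R / (P + R);
end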

Figure 2 shows the results using box-plot analysis. From Figure 2, considering the minimum, maximum, median, first quartile, and third quartile of the boxes, we find that KECA-L1 performs better than the other methods in general. Specifically, KECA-L1 obtains acceptable results in the SDP experiments compared with the benchmarks proposed in [34], since the median values of the boxes with respect to R and F are close to 0.7 and more than 0.5, respectively. On the contrary, not only KECA and OKECA but also PCA-L1 and KPCA-L1 cannot meet these criteria. Therefore, all of the results validate the robustness of KECA-L1.

5. Conclusions

This paper proposes a new extension to the OKECA approach for dimensionality reduction. The new method (i.e., KECA-L1) employs the L1-norm and a rotation matrix to maximize the information potential of the input data. In order to find the optimal entropic kernel components, motivated by Nie et al.'s algorithm [23], we design a nongreedy iterative process which has much faster convergence than OKECA's. Moreover, a general semisupervised learning algorithm has been established for classification using KECA-L1. Compared with several recently proposed KECA- and PCA-based approaches, this SSL-based classifier can remarkably improve performance on real-world data classification and software defect prediction.

Although KECA-L1 has achieved impressive success on real examples, several problems should still be considered and solved in future research. The efficiency of KECA-L1 has to be optimized, for it is relatively time-consuming compared with most existing PCA-based methods. Additionally, the utilization of KECA-L1 is expected to appear in each pattern analysis algorithm previously based on PCA approaches.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61702544) and the Natural Science Foundation of Jiangsu Province of China (Grant no. BK20160769).

Supplementary Materials

The MATLAB toolbox of KECA-L1 is available. (Supplementary Materials)

References

[1] Q. Gao, S. Xu, F. Chen, C. Ding, X. Gao, and Y. Li, "R₁-2-DPCA and face recognition," IEEE Transactions on Cybernetics, vol. 99, pp. 1-12, 2018.
[2] A. Hyvarinen and E. Oja, "Independent component analysis: algorithms and applications," Neural Networks, vol. 13, no. 4-5, pp. 411-430, 2000.
[3] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," European Conference on Computer Vision, vol. 1, pp. 43-58, 1996.
[4] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[5] J. H. Friedman and J. W. Tukey, "A projection pursuit algorithm for exploratory data analysis," IEEE Transactions on Computers, vol. 23, no. 9, pp. 881-890, 1974.
[6] R. Jenssen, "Kernel entropy component analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, pp. 847-860, 2010.
[7] B. Scholkopf, A. Smola, and K. R. Muller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, no. 5, pp. 1299-1319, 1998.
[8] S. Mika, A. Smola, and M. Scholz, "Kernel PCA and de-noising in feature spaces," Conference on Advances in Neural Information Processing Systems II, vol. 11, pp. 536-542, 1999.
[9] J. Yang, D. Zhang, A. F. Frangi, and J. Y. Yang, "Two-dimensional PCA: a new approach to appearance-based face representation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131-137, 2004.
[10] K. Nishino, S. K. Nayar, and T. Jebara, "Clustered blockwise PCA for representing visual data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1675-1679, 2005.
[11] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 506-513, Washington, DC, USA, June-July 2004.
[12] A. D'Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. G. Lanckriet, "A direct formulation for sparse PCA using semidefinite programming," SIAM Review, vol. 49, no. 3, pp. 434-448, 2007.
[13] M. Luo, F. Nie, X. Chang, Y. Yang, A. Hauptmann, and Q. Zheng, "Avoiding optimal mean robust PCA/2DPCA with non-greedy L1-norm maximization," in Proceedings of International Joint Conference on Artificial Intelligence, pp. 1802-1808, New York, NY, USA, July 2016.
[14] Q. Yu, R. Wang, X. Yang, B. N. Li, and M. Yao, "Diagonal principal component analysis with non-greedy L1-norm maximization for face recognition," Neurocomputing, vol. 171, pp. 57-62, 2016.
[15] B. N. Li, Q. Yu, R. Wang, K. Xiang, M. Wang, and X. Li, "Block principal component analysis with nongreedy L1-norm maximization," IEEE Transactions on Cybernetics, vol. 46, no. 11, pp. 2543-2547, 2016.
[16] F. Nie and H. Huang, "Non-greedy L21-norm maximization for principal component analysis," 2016, http://arxiv.org/abs/1603.08293v1.
[17] F. Nie, J. Yuan, and H. Huang, "Optimal mean robust principal component analysis," in Proceedings of International Conference on Machine Learning, pp. 1062-1070, Beijing, China, June 2014.
[18] R. Wang, F. Nie, R. Hong, X. Chang, X. Yang, and W. Yu, "Fast and orthogonal locality preserving projections for dimensionality reduction," IEEE Transactions on Image Processing, vol. 26, no. 10, pp. 5019-5030, 2017.
[19] C. Zhang, F. Nie, and S. Xiang, "A general kernelization framework for learning algorithms based on kernel PCA," Neurocomputing, vol. 73, no. 4-6, pp. 959-967, 2010.
[20] Z. Zhang and E. R. Hancock, "Kernel entropy-based unsupervised spectral feature selection," International Journal of Pattern Recognition and Artificial Intelligence, vol. 26, no. 5, article 1260002, 2012.
[21] E. Izquierdo-Verdiguier, V. Laparra, R. Jenssen, L. Gomez-Chova, and G. Camps-Valls, "Optimized kernel entropy components," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 6, pp. 1466-1472, 2017.
[22] X. Li, Y. Pang, and Y. Yuan, "L1-norm-based 2DPCA," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 40, no. 4, pp. 1170-1175, 2010.
[23] F. Nie, H. Huang, C. Ding, D. Luo, and H. Wang, "Robust principal component analysis with non-greedy L1-norm maximization," in Proceedings of International Joint Conference on Artificial Intelligence, pp. 1433-1438, Barcelona, Catalonia, Spain, July 2011.
[24] R. Wang, F. Nie, X. Yang, F. Gao, and M. Yao, "Robust 2DPCA with non-greedy L1-norm maximization for image analysis," IEEE Transactions on Cybernetics, vol. 45, no. 5, pp. 1108-1112, 2015.
[25] B. H. Shekar, M. Sharmila Kumari, L. M. Mestetskiy, and N. F. Dyshkant, "Face recognition using kernel entropy component analysis," Neurocomputing, vol. 74, no. 6, pp. 1053-1057, 2011.
[26] R. Jenssen, "Kernel entropy component analysis: new theory and semi-supervised learning," in Proceedings of IEEE International Workshop on Machine Learning for Signal Processing, pp. 1-6, Beijing, China, September 2011.
[27] N. Kwak, "Principal component analysis based on L1-norm maximization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 9, pp. 1672-1680, 2008.
[28] Y. Xiao, H. Wang, W. Xu, and J. Zhou, "L1 norm based KPCA for novelty detection," Pattern Recognition, vol. 46, no. 1, pp. 389-396, 2013.
[29] Y. Xiao, H. Wang, and W. Xu, "Parameter selection of Gaussian kernel for one-class SVM," IEEE Transactions on Cybernetics, vol. 45, no. 5, pp. 941-953, 2015.
[30] W. Krzanowski, Principles of Multivariate Analysis, Vol. 23, Oxford University Press (OUP), Oxford, UK, 2000.
[31] R. P. W. Duin, "On the choice of smoothing parameters for Parzen estimators of probability density functions," IEEE Transactions on Computers, vol. 25, no. 11, pp. 1175-1179, 1976.
[32] W. Liu, S. Liu, Q. Gu, J. Chen, X. Chen, and D. Chen, "Empirical studies of a two-stage data preprocessing approach for software fault prediction," IEEE Transactions on Reliability, vol. 65, no. 1, pp. 38-53, 2016.
[33] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, "Benchmarking classification models for software defect prediction: a proposed framework and novel findings," IEEE Transactions on Software Engineering, vol. 34, no. 4, pp. 485-496, 2008.
[34] Z. He, F. Shu, Y. Yang, M. Li, and Q. Wang, "An investigation on the feasibility of cross-project defect prediction," Automated Software Engineering, vol. 19, no. 2, pp. 167-199, 2012.
[35] P. He, B. Li, X. Liu, J. Chen, and Y. Ma, "An empirical study on software defect prediction with a simplified metric set," Information and Software Technology, vol. 59, pp. 170-190, 2015.
[36] Y. Wu, S. Huang, H. Ji, C. Zheng, and C. Bai, "A novel Bayes defect predictor based on information diffusion function," Knowledge-Based Systems, vol. 144, pp. 1-8, 2018.

Computational Intelligence and Neuroscience 9

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 5: KernelEntropyComponentAnalysiswithNongreedy L1 ...downloads.hindawi.com/journals/cin/2018/6791683.pdf · ResearchArticle KernelEntropyComponentAnalysiswithNongreedy L1-NormMaximization

which means that Algorithm 2 is monotonically increasingAdditionally considering that objective function (14) ofKECA-L1 has an upper bound within the limited iterationsthe KECA-L1 procedure will converge

33 e Semisupervised Classier Jenssen [26] establisheda semisupervised learning (SSL) algorithm for classicationusing KECA is SSL-based classier was trained by both

labeled and unlabeled data to build the kernel matrix such thatit can map the data to KFS appropriately [26] Additionally itis based on a general modelling scheme and applicable forother variants of KECA such as OKECA and KECA-L1

More specically we are given N pairs of training dataxi yi Ni1 with samples xi isin RD and the associated labels yiIn addition there are M unlabeled data points for testingLet Xu [x1u xMu ] and Xl [x1l xNl ] denote thetesting data and training data without labels respectively

102 4 6 8

OA

()

60

80

100

Number of projection

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(a)

102 4 6 8

OA

()

20

40

60

80

100

Number of projection

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(b)

102 4 6 8

OA

()

20

40

60

80

100

Number of projection

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(c)

102 4 6 8

OA

()

55

60

65

70

Number of projection

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(d)

102 4 6 8Number of projection

OA

()

75

80

85

90

95

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(e)

102 4 6 8

OA

()

Number of projection

40

60

80

100

PCA-L1KPCA-L1KECA

OKECAKECA-L1

(f )

Figure 1 Overall accuracy obtained by the PCA-L1 KPCA-L1 KECA OKECA and KECA-L1 using dierent UCI databases with dierentnumbers of extracted features (a) Ionosphere (b) Letter (c) Pendigits (d) Pima-Indians (e) WDBC and (f) Wine

Computational Intelligence and Neuroscience 5

thus we can obtain an overall matrix X [Xu Xl] en weconstruct the kernel matrix K derived from X using (6)K isin R(N+M)times(N+M) which plays as the input of Algorithm 2After the iteration procedure of nongreedy L1-norm maxi-mization we obtain a projection ofX⟶ 1113957E [1113957Eu

1113957El]mtimes(M+N)

onto m orthogonal axes where 1113957Eu [1113957eu1 1113957eu

M] and1113957El [1113957el

1 1113957elN] In other words 1113957eu

i and 1113957elj are the low-

dimensional representations of each testing data point xiu

and the training one xj

l respectively Assume that xlowastu is anarbitrary data point to be tested If it satisfies

1113957elowastu minus 1113957e

j

l

2 min

h1N1113957elowastu minus 1113957e

hl

2 (26)

then xlowastu is assigned to the same class with the jth data pointof Xl

4 Experiments

is section shows the performance of the proposed KECA-L1 compared with the classical KECA [6] and OKECA [21]for real-world data classification using the SSL-based clas-sifier illustrated in Section 33 Several recent techniquessuch as PCA-L1 [27] and KPCA-L1 [28] are also included forcomparison e rationale to select these methods is thatprevious studies related to DR found that they can produceimpressive results [27ndash29] We implement the experimentson a wide range of real-world datasets (1) six differentdatasets from the University California Irvine (UCI)Machine Learning Repository (available at httparchiveicsuciedumldatasetshtml) and (2) 9 different software pro-jects with 34 releases from the PROMISE data repository(available at httpopenscienceusrepo) e MATLABsource code for running KECA and OKECA uploaded byIzquierdo-Verdiguier et al [21] is available at httpispuvessoft_featurehtml e coefficients set for PCA-L1 andKPCA-L1 is the same with [27 28] All of the experimentsare all performed by MATLAB R2012a on a PC with InterCore i5 CPU 4 GB memory and Windows 7 operatingsystem

41 Experiments on UCI Datasets e experiments areconducted on six datasets from the UCI the Inonospheredataset is a binary classification problem of whether theradar signal can describe the structure of free electrons in theionosphere or not the Letter dataset is to assign each black-and-white rectangular pixel display to one of the 26 capitalletters in the English alphabet the Pendigits handles therecognition of pen-based handwritten digits the Pima-Indians data set constitutes a clinical problem of diabetesdiagnosis in patients from clinical variables the WDBCdataset is another clinical problem for the diagnosis of breastcancer in malignant or benign classes and the Wine datasetis the result of a chemical analysis of wines grown in thesame region in Italy but derived from three different cul-tivars Table 1 shows the details of them In the subsequentexperiments we just utilized the simplest linear classifier[30] e theory of maximizing maximum likelihood (ML)[31] is selected as the rule for selecting bandwidth coefficientas suggested in [21]

Table 2 Descriptions of data attributes

Attribute DescriptionWMC Weighted methods per classAMC Average method ComplexityAVG_CC Mean values of methods in the same classCA Afferent couplingsCAM Cohesion among methods of classCBM Coupling between MethodsCBO Coupling between object classesCE Efferent couplingsDAM Data access MetricDIT Depth of inheritance treeIC Inheritance CouplingLCOM Lack of cohesion in MethodsLCOM3 Normalized version of LCOMLOC Lines of codeMAX_CC Maximum values of methods in the same classMFA Measure of function AbstractionMOA Measure of AggregationNOC Number of ChildrenNPM Number of public MethodsRFC Response for a classBug Number of bugs detected in the class

Table 3 Descriptions of software data

Releases Classes FP FPAnt-13 125 20 0160Ant-14 178 40 0225Ant-15 293 32 0109Ant-16 351 92 0262Ant-17 745 166 0223Camel-10 339 13 0038Camel-12 608 216 0355Camel-14 872 145 0166Camel-16 965 188 0195Ivy-11 111 63 0568Ivy-14 241 16 0066Ivy-20 352 40 0114Jedit-32 272 90 0331Jedit-40 306 75 0245Lucene-20 195 91 0467Lucene-22 247 144 0583Lucene-24 340 203 0597Poi-15 237 141 0595Poi-20 314 37 0118Poi-25 385 248 0644Poi-30 442 281 0636Synapse-10 157 16 0102Synapse-11 222 60 0270Synapse-12 256 86 0336Synapse-14 196 147 0750Synapse-15 214 142 0664Synapse-16 229 78 0341Xalan-24 723 110 0152Xalan-25 803 387 0482Xalan-26 885 411 0464Xerces-init 162 77 0475Xerces-12 440 71 0161Xerces-13 453 69 0152Xerces-14 588 437 0743

6 Computational Intelligence and Neuroscience

e implementation of KECA-L1 and other methods isrepeated using all the selected datasets with respect to dif-ferent numbers of components for 10 times We have uti-lized the overall classication accuracy (OA) to evaluate theperformance of dierent algorithms on the classicationOA is dened as the total number of samples correctlyassigned in percentage terms which is within [0 1] andindicates better quality with larger values Figure 1 presentsthe average OA curves obtained by the aforementionedalgorithms for these six real datasets It can be observed fromFigure 1 that OKECA is superior to KECA PCA-L1 andKPCA-L1 except for solving Letter issue is is probablybecause DR performed by OKECA not only can reveal thestructure related to the most Renyi entropy of the originaldata but also consider the rotational invariance property[21] In addition KECA-L1 outperforms the other methodsbesides of OKECA is may be attributed to the robustnessof L1-norm to outliers compared with that of the L2-normIn Figure 1(c) OKECA seems to obtain nearly the sameresults with KECA-L1rsquos However the average running time(in hours) of OKECA in the Pendigits is 37384 times morethan that of KECA-L1 1339

42 Experiments on Software Projects In software engi-neering it is usually dipoundcult to test a software projectcompletely and thoroughly with the limited resources [32]Software defect prediction (SDP) may provide a relativelyacceptable solution to this problem It can allocate the

limited test resources eectively by categorizing the softwaremodules into two classes nonfault-prone (NFP) or fault-prone (FP) according to 21 software metrics (Table 2)

is section aims to employ KECA-based methods toreduce the selected software data (Table 3) dimensions andthen utilize the SSL-based classier combined with thesupport vector machine [33] to classify each softwaremodule as NFP or FP e bandwidth coepoundcient set is stillrestricted to the rule of ML PCA-L1 and KPCA-L1 areinvolved as a benchmarking yardstick ere are 34 groupsof tests for each release in Table 3 e most suitable releases[34] from dierent software projects are selected as trainingdata We evaluate the performance of dierent selectedmethods on SDP in terms of recall (R) precision (P) andF-measure (F) [35 36] e F-measure is dened as

F 2 times precsion times recallprecsion + recall

(27)

where

Precsion TP

TP + FP

Recall TP

TP + FN

(28)

In (28) FN (ie false negative) means that buggyclasses are wrongly classied to be nonfaulty while FP(ie false positive) means nonbuggy classes are wronglyclassied to be faulty TP (ie true positive) refer to

R F0

01

02

03

04

05

06

07

08

09

1PCA-L1

P F0

01

02

03

04

05

06

07

08

09

1KPCA-L1

R P F0

01

02

03

04

05

06

07

08

09

1KECA

R P F0

01

02

03

04

05

06

07

08

09

1OKECA

R P F0

01

02

03

04

05

06

07

08

09

1KECA-L1

R P

Figure 2 e standardized boxplots of the performance achieved by PCA-L1 KPCA-L1 KECA OKECA and KECA-L1 respectivelyFrom the bottom to the top of a standardized box plot minimum rst quartile median third quartile and maximum

Computational Intelligence and Neuroscience 7

correctly classified buggy classes [34] Values of RecallPrecision and F-measure range from 0 to 1 and highervalues indicate better classification results

Figure 2 shows the results using box-plot analysis FromFigure 2 considering theminimummaximummedian firstquartile and third quartile of the boxes we find that KECA-L1 performs better than the other methods in generalSpecifically KECA-L1 can obtain acceptable results in ex-periments for SDP compared with the benchmarks proposedin Reference [34] since the median values of the boxes withrespect to R and F are close to 07 and more than 05 re-spectively On the contrary not only KECA and OKECA butPCA-L1 and KPCA-L1 cannot meet these criteriaereforeall of the results validate the robustness of KECA-L1

5 Conclusions

is paper proposes a new extension to the OKECA ap-proach for dimensional reduction e new method(ie KECA-L1) employs L1-norm and a rotation matrix tomaximize information potential of the input data In orderto find the optimal entropic kernel components motivatedby Nie et alrsquos algorithm [23] we design a nongreedy iterativeprocess which has much faster convergence than OKECArsquosMoreover a general semisupervised learning algorithm hasbeen established for classification using KECA-L1 Com-pared with several recently proposed KECA- and PCA-basedapproaches this SSL-based classifier can remarkably pro-mote the performance on real-world datasets classificationand software defect prediction

Although KECA-L1 has achieved impressive success onreal examples several problems still should be consideredand solved in the future researche efficiency of KECA-L1has to be optimized for it is relatively time-consumingcompared with most existing PCA-based methods Addi-tionally the utilization of KECA-L1 is expected to appear ineach pattern analysis algorithm previously based on PCAapproaches

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

is work was supported by the National Natural ScienceFoundation of China (Grant no 61702544) and NaturalScience Foundation of Jiangsu Province of China (Grant noBK20160769)

Supplementary Materials

e MATLAB toolbox of KECA-L1 is available (Supple-mentary Materials)

References

[1] Q Gao S Xu F Chen C Ding X Gao and Y Li ldquoR₁-2-DPCA and face recognitionrdquo IEEE Transactions on Cyber-netics vol 99 pp 1ndash12 2018

[2] A Hyvarinen and E Oja ldquoIndependent component analysisalgorithms and applicationsrdquo Neural Networks vol 13 no 4-5 pp 411ndash430 2000

[3] P N Belhumeur J P Hespanha and D J KriegmanldquoEigenfaces vs Fisherfaces recognition using class specificlinear projectionrdquo European Conference on Computer Visionvol 1 pp 43ndash58 1996

[4] M Turk and A Pentland ldquoEigenfaces for recognitionrdquoJournal of Cognitive Neuroscience vol 3 no 1 pp 71ndash861991

[5] J H Friedman and J W Tukey ldquoA projection pursuit al-gorithm for exploratory data analysisrdquo IEEE Transactions onComputers vol 23 no 9 pp 881ndash890 1974

[6] R Jenssen ldquoKernel entropy component analysisrdquo IEEETransactions on Pattern Analysis and Machine Intelligencevol 32 no 5 pp 847ndash860 2010

[7] B Scholkopf A Smola and K R Muller ldquoNonlinear com-ponent analysis as a kernel eigenvalue problemrdquo NeuralComputation vol 10 no 5 pp 1299ndash1319 1998

[8] S Mika A Smola and M Scholz ldquoKernel PCA and de-noising in feature spacesrdquo Conference on Advances inNeural Information Processing Systems II vol 11pp 536ndash542 1999

[9] J Yang D Zhang A F Frangi and J Y Yang ldquoTwo-dimensional PCA a new approach to appearance-basedface representation and recognitionrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 26 no 1pp 131ndash137 2004

[10] K Nishino S K Nayar and T Jebara ldquoClustered blockwisePCA for representing visual datardquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 27 no 10pp 1675ndash1679 2005

[11] Y Ke and R Sukthankar ldquoPCA-SIFT a more distinctiverepresentation for local image descriptorsrdquo in Proceedings ofIEEE Computer Society Conference on Computer Vision andPattern Recognition pp 506ndash513 Washington DC USAJune-July 2004

[12] A DrsquoAspremont L El Ghaoui M I Jordan andG R G Lanckriet ldquoA direct formulation for sparse PCA usingsemidefinite programmingrdquo SIAM Review vol 49 no 3pp 434ndash448 2007

[13] M Luo F Nie X Chang Y Yang A Hauptmann andQ Zheng ldquoAvoiding optimal mean robust PCA2DPCA withnon-greedy L1-norm maximizationrdquo in Proceedings of In-ternational Joint Conference on Artificial Intelligencepp 1802ndash1808 New York NY USA July 2016

[14] Q Yu R Wang X Yang B N Li and M Yao ldquoDiagonalprincipal component analysis with non-greedy L1-normmaximization for face recognitionrdquo Neurocomputingvol 171 pp 57ndash62 2016

[15] B N Li Q Yu RWang K XiangMWang and X Li ldquoBlockprincipal component analysis with nongreedy L1-normmaximizationrdquo IEEE Transactions on Cybernetics vol 46no 11 pp 2543ndash2547 2016

[16] F Nie and H Huang ldquoNon-greedy L21-norm maximizationfor principal component analysisrdquo 2016 httparxivorgabs160308293v1

[17] F Nie J Yuan and H Huang ldquoOptimal mean robustprincipal component analysisrdquo in Proceedings of International

8 Computational Intelligence and Neuroscience

Conference on Machine Learning pp 1062ndash1070 BeijingChina June 2014

[18] R Wang F Nie R Hong X Chang X Yang and W YuldquoFast and orthogonal locality preserving projections for di-mensionality reductionrdquo IEEE Transactions on Image Pro-cessing vol 26 no 10 pp 5019ndash5030 2017

[19] C Zhang F Nie and S Xiang ldquoA general kernelizationframework for learning algorithms based on kernel PCArdquoNeurocomputing vol 73 no 4ndash6 pp 959ndash967 2010

[20] Z Zhang and E R Hancock ldquoKernel entropy-based un-supervised spectral feature selectionrdquo International Journal ofPattern Recognition and Artificial Intelligence vol 26 no 5article 1260002 2012

[21] E Izquierdo-Verdiguier V Laparra R Jenssen L Gomez-Chova and G Camps-Valls ldquoOptimized kernel entropycomponentsrdquo IEEE Transactions on Neural Networks andLearning Systems vol 28 no 6 pp 1466ndash1472 2017

[22] X Li Y Pang and Y Yuan ldquoL1-norm-based 2DPCArdquo IEEETransactions on Systems Man and Cybernetics Part B vol 40no 4 pp 1170ndash1175 2010

[23] F Nie H Huang C Ding D Luo and H Wang ldquoRobustprincipal component analysis with non-greedy L1-normmaximizationrdquo in Proceedings of International Joint Confer-ence on Artificial Intelligence pp 1433ndash1438 BarcelonaCatalonia Spain July 2011

[24] R Wang F Nie X Yang F Gao and M Yao ldquoRobust2DPCA with non-greedy L1-norm maximization for imageanalysisrdquo IEEE Transactions on Cybernetics vol 45 no 5pp 1108ndash1112 2015

[25] B H Shekar M Sharmila Kumari L M Mestetskiy andN F Dyshkant ldquoFace recognition using kernel entropycomponent analysisrdquo Neurocomputing vol 74 no 6pp 1053ndash1057 2011

[26] R Jenssen ldquoKernel entropy component analysis new theoryand semi-supervised learningrdquo in Proceedings of IEEE In-ternational Workshop on Machine Learning for Signal Pro-cessing pp 1ndash6 Beijing China September 2011

[27] N Kwak ldquoPrincipal component analysis based on L1-normmaximizationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 30 no 9 pp 1672ndash1680 2008

[28] Y Xiao HWang W Xu and J Zhou ldquoL1 norm based KPCAfor novelty detectionrdquo Pattern Recognition vol 46 no 1pp 389ndash396 2013

[29] Y Xiao H Wang and W Xu ldquoParameter selection ofgaussian kernel for one-class SVMrdquo IEEE Transactions onCybernetics vol 45 no 5 pp 941ndash953 2015

[30] W Krzanowski Principles of Multivariate Analysis Vol 23Oxford University Press (OUP) Oxford UK 2000

[31] Duin ldquoOn the choice of smoothing parameters for Parzenestimators of probability density functionsrdquo IEEE Trans-actions on Computers vol 25 no 11 pp 1175ndash1179 1976

[32] W Liu S Liu Q Gu J Chen X Chen and D ChenldquoEmpirical studies of a two-stage data preprocessing approachfor software fault predictionrdquo IEEE Transactions on Re-liability vol 65 no 1 pp 38ndash53 2016

[33] S Lessmann B Baesens C Mues and S Pietsch ldquoBench-marking classification models for software defect predictiona proposed framework and novel findingsrdquo IEEE Trans-actions on Software Engineering vol 34 no 4 pp 485ndash4962008

[34] Z He F Shu Y Yang M Li and Q Wang ldquoAn in-vestigation on the feasibility of cross-project defect pre-dictionrdquo Automated Software Engineering vol 19 no 2pp 167ndash199 2012

[35] P He B Li X Liu J Chen and YMa ldquoAn empirical study onsoftware defect prediction with a simplified metric setrdquo In-formation and Software Technology vol 59 pp 170ndash190 2015

[36] Y Wu S Huang H Ji C Zheng and C Bai ldquoA novel Bayesdefect predictor based on information diffusion functionrdquoKnowledge-Based Systems vol 144 pp 1ndash8 2018

Computational Intelligence and Neuroscience 9

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 6: KernelEntropyComponentAnalysiswithNongreedy L1 ...downloads.hindawi.com/journals/cin/2018/6791683.pdf · ResearchArticle KernelEntropyComponentAnalysiswithNongreedy L1-NormMaximization

thus we can obtain an overall matrix X [Xu Xl] en weconstruct the kernel matrix K derived from X using (6)K isin R(N+M)times(N+M) which plays as the input of Algorithm 2After the iteration procedure of nongreedy L1-norm maxi-mization we obtain a projection ofX⟶ 1113957E [1113957Eu

1113957El]mtimes(M+N)

onto m orthogonal axes where 1113957Eu [1113957eu1 1113957eu

M] and1113957El [1113957el

1 1113957elN] In other words 1113957eu

i and 1113957elj are the low-

dimensional representations of each testing data point xiu

and the training one xj

l respectively Assume that xlowastu is anarbitrary data point to be tested If it satisfies

1113957elowastu minus 1113957e

j

l

2 min

h1N1113957elowastu minus 1113957e

hl

2 (26)

then xlowastu is assigned to the same class with the jth data pointof Xl

4 Experiments

is section shows the performance of the proposed KECA-L1 compared with the classical KECA [6] and OKECA [21]for real-world data classification using the SSL-based clas-sifier illustrated in Section 33 Several recent techniquessuch as PCA-L1 [27] and KPCA-L1 [28] are also included forcomparison e rationale to select these methods is thatprevious studies related to DR found that they can produceimpressive results [27ndash29] We implement the experimentson a wide range of real-world datasets (1) six differentdatasets from the University California Irvine (UCI)Machine Learning Repository (available at httparchiveicsuciedumldatasetshtml) and (2) 9 different software pro-jects with 34 releases from the PROMISE data repository(available at httpopenscienceusrepo) e MATLABsource code for running KECA and OKECA uploaded byIzquierdo-Verdiguier et al [21] is available at httpispuvessoft_featurehtml e coefficients set for PCA-L1 andKPCA-L1 is the same with [27 28] All of the experimentsare all performed by MATLAB R2012a on a PC with InterCore i5 CPU 4 GB memory and Windows 7 operatingsystem

41 Experiments on UCI Datasets e experiments areconducted on six datasets from the UCI the Inonospheredataset is a binary classification problem of whether theradar signal can describe the structure of free electrons in theionosphere or not the Letter dataset is to assign each black-and-white rectangular pixel display to one of the 26 capitalletters in the English alphabet the Pendigits handles therecognition of pen-based handwritten digits the Pima-Indians data set constitutes a clinical problem of diabetesdiagnosis in patients from clinical variables the WDBCdataset is another clinical problem for the diagnosis of breastcancer in malignant or benign classes and the Wine datasetis the result of a chemical analysis of wines grown in thesame region in Italy but derived from three different cul-tivars Table 1 shows the details of them In the subsequentexperiments we just utilized the simplest linear classifier[30] e theory of maximizing maximum likelihood (ML)[31] is selected as the rule for selecting bandwidth coefficientas suggested in [21]

Table 2 Descriptions of data attributes

Attribute DescriptionWMC Weighted methods per classAMC Average method ComplexityAVG_CC Mean values of methods in the same classCA Afferent couplingsCAM Cohesion among methods of classCBM Coupling between MethodsCBO Coupling between object classesCE Efferent couplingsDAM Data access MetricDIT Depth of inheritance treeIC Inheritance CouplingLCOM Lack of cohesion in MethodsLCOM3 Normalized version of LCOMLOC Lines of codeMAX_CC Maximum values of methods in the same classMFA Measure of function AbstractionMOA Measure of AggregationNOC Number of ChildrenNPM Number of public MethodsRFC Response for a classBug Number of bugs detected in the class

Table 3 Descriptions of software data

Releases Classes FP FPAnt-13 125 20 0160Ant-14 178 40 0225Ant-15 293 32 0109Ant-16 351 92 0262Ant-17 745 166 0223Camel-10 339 13 0038Camel-12 608 216 0355Camel-14 872 145 0166Camel-16 965 188 0195Ivy-11 111 63 0568Ivy-14 241 16 0066Ivy-20 352 40 0114Jedit-32 272 90 0331Jedit-40 306 75 0245Lucene-20 195 91 0467Lucene-22 247 144 0583Lucene-24 340 203 0597Poi-15 237 141 0595Poi-20 314 37 0118Poi-25 385 248 0644Poi-30 442 281 0636Synapse-10 157 16 0102Synapse-11 222 60 0270Synapse-12 256 86 0336Synapse-14 196 147 0750Synapse-15 214 142 0664Synapse-16 229 78 0341Xalan-24 723 110 0152Xalan-25 803 387 0482Xalan-26 885 411 0464Xerces-init 162 77 0475Xerces-12 440 71 0161Xerces-13 453 69 0152Xerces-14 588 437 0743

6 Computational Intelligence and Neuroscience

e implementation of KECA-L1 and other methods isrepeated using all the selected datasets with respect to dif-ferent numbers of components for 10 times We have uti-lized the overall classication accuracy (OA) to evaluate theperformance of dierent algorithms on the classicationOA is dened as the total number of samples correctlyassigned in percentage terms which is within [0 1] andindicates better quality with larger values Figure 1 presentsthe average OA curves obtained by the aforementionedalgorithms for these six real datasets It can be observed fromFigure 1 that OKECA is superior to KECA PCA-L1 andKPCA-L1 except for solving Letter issue is is probablybecause DR performed by OKECA not only can reveal thestructure related to the most Renyi entropy of the originaldata but also consider the rotational invariance property[21] In addition KECA-L1 outperforms the other methodsbesides of OKECA is may be attributed to the robustnessof L1-norm to outliers compared with that of the L2-normIn Figure 1(c) OKECA seems to obtain nearly the sameresults with KECA-L1rsquos However the average running time(in hours) of OKECA in the Pendigits is 37384 times morethan that of KECA-L1 1339

42 Experiments on Software Projects In software engi-neering it is usually dipoundcult to test a software projectcompletely and thoroughly with the limited resources [32]Software defect prediction (SDP) may provide a relativelyacceptable solution to this problem It can allocate the

limited test resources eectively by categorizing the softwaremodules into two classes nonfault-prone (NFP) or fault-prone (FP) according to 21 software metrics (Table 2)

is section aims to employ KECA-based methods toreduce the selected software data (Table 3) dimensions andthen utilize the SSL-based classier combined with thesupport vector machine [33] to classify each softwaremodule as NFP or FP e bandwidth coepoundcient set is stillrestricted to the rule of ML PCA-L1 and KPCA-L1 areinvolved as a benchmarking yardstick ere are 34 groupsof tests for each release in Table 3 e most suitable releases[34] from dierent software projects are selected as trainingdata We evaluate the performance of dierent selectedmethods on SDP in terms of recall (R) precision (P) andF-measure (F) [35 36] e F-measure is dened as

F 2 times precsion times recallprecsion + recall

(27)

where

Precsion TP

TP + FP

Recall TP

TP + FN

(28)

In (28) FN (ie false negative) means that buggyclasses are wrongly classied to be nonfaulty while FP(ie false positive) means nonbuggy classes are wronglyclassied to be faulty TP (ie true positive) refer to

R F0

01

02

03

04

05

06

07

08

09

1PCA-L1

P F0

01

02

03

04

05

06

07

08

09

1KPCA-L1

R P F0

01

02

03

04

05

06

07

08

09

1KECA

R P F0

01

02

03

04

05

06

07

08

09

1OKECA

R P F0

01

02

03

04

05

06

07

08

09

1KECA-L1

R P

Figure 2 e standardized boxplots of the performance achieved by PCA-L1 KPCA-L1 KECA OKECA and KECA-L1 respectivelyFrom the bottom to the top of a standardized box plot minimum rst quartile median third quartile and maximum

Computational Intelligence and Neuroscience 7

correctly classified buggy classes [34] Values of RecallPrecision and F-measure range from 0 to 1 and highervalues indicate better classification results

Figure 2 shows the results using box-plot analysis FromFigure 2 considering theminimummaximummedian firstquartile and third quartile of the boxes we find that KECA-L1 performs better than the other methods in generalSpecifically KECA-L1 can obtain acceptable results in ex-periments for SDP compared with the benchmarks proposedin Reference [34] since the median values of the boxes withrespect to R and F are close to 07 and more than 05 re-spectively On the contrary not only KECA and OKECA butPCA-L1 and KPCA-L1 cannot meet these criteriaereforeall of the results validate the robustness of KECA-L1

5 Conclusions

is paper proposes a new extension to the OKECA ap-proach for dimensional reduction e new method(ie KECA-L1) employs L1-norm and a rotation matrix tomaximize information potential of the input data In orderto find the optimal entropic kernel components motivatedby Nie et alrsquos algorithm [23] we design a nongreedy iterativeprocess which has much faster convergence than OKECArsquosMoreover a general semisupervised learning algorithm hasbeen established for classification using KECA-L1 Com-pared with several recently proposed KECA- and PCA-basedapproaches this SSL-based classifier can remarkably pro-mote the performance on real-world datasets classificationand software defect prediction

Although KECA-L1 has achieved impressive success onreal examples several problems still should be consideredand solved in the future researche efficiency of KECA-L1has to be optimized for it is relatively time-consumingcompared with most existing PCA-based methods Addi-tionally the utilization of KECA-L1 is expected to appear ineach pattern analysis algorithm previously based on PCAapproaches

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

is work was supported by the National Natural ScienceFoundation of China (Grant no 61702544) and NaturalScience Foundation of Jiangsu Province of China (Grant noBK20160769)

Supplementary Materials

e MATLAB toolbox of KECA-L1 is available (Supple-mentary Materials)

References

[1] Q Gao S Xu F Chen C Ding X Gao and Y Li ldquoR₁-2-DPCA and face recognitionrdquo IEEE Transactions on Cyber-netics vol 99 pp 1ndash12 2018

[2] A Hyvarinen and E Oja ldquoIndependent component analysisalgorithms and applicationsrdquo Neural Networks vol 13 no 4-5 pp 411ndash430 2000

[3] P N Belhumeur J P Hespanha and D J KriegmanldquoEigenfaces vs Fisherfaces recognition using class specificlinear projectionrdquo European Conference on Computer Visionvol 1 pp 43ndash58 1996

[4] M Turk and A Pentland ldquoEigenfaces for recognitionrdquoJournal of Cognitive Neuroscience vol 3 no 1 pp 71ndash861991

[5] J H Friedman and J W Tukey ldquoA projection pursuit al-gorithm for exploratory data analysisrdquo IEEE Transactions onComputers vol 23 no 9 pp 881ndash890 1974

[6] R Jenssen ldquoKernel entropy component analysisrdquo IEEETransactions on Pattern Analysis and Machine Intelligencevol 32 no 5 pp 847ndash860 2010

[7] B Scholkopf A Smola and K R Muller ldquoNonlinear com-ponent analysis as a kernel eigenvalue problemrdquo NeuralComputation vol 10 no 5 pp 1299ndash1319 1998

[8] S Mika A Smola and M Scholz ldquoKernel PCA and de-noising in feature spacesrdquo Conference on Advances inNeural Information Processing Systems II vol 11pp 536ndash542 1999

[9] J Yang D Zhang A F Frangi and J Y Yang ldquoTwo-dimensional PCA a new approach to appearance-basedface representation and recognitionrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 26 no 1pp 131ndash137 2004

[10] K Nishino S K Nayar and T Jebara ldquoClustered blockwisePCA for representing visual datardquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 27 no 10pp 1675ndash1679 2005

[11] Y Ke and R Sukthankar ldquoPCA-SIFT a more distinctiverepresentation for local image descriptorsrdquo in Proceedings ofIEEE Computer Society Conference on Computer Vision andPattern Recognition pp 506ndash513 Washington DC USAJune-July 2004

[12] A DrsquoAspremont L El Ghaoui M I Jordan andG R G Lanckriet ldquoA direct formulation for sparse PCA usingsemidefinite programmingrdquo SIAM Review vol 49 no 3pp 434ndash448 2007

[13] M Luo F Nie X Chang Y Yang A Hauptmann andQ Zheng ldquoAvoiding optimal mean robust PCA2DPCA withnon-greedy L1-norm maximizationrdquo in Proceedings of In-ternational Joint Conference on Artificial Intelligencepp 1802ndash1808 New York NY USA July 2016

[14] Q Yu R Wang X Yang B N Li and M Yao ldquoDiagonalprincipal component analysis with non-greedy L1-normmaximization for face recognitionrdquo Neurocomputingvol 171 pp 57ndash62 2016

[15] B N Li Q Yu RWang K XiangMWang and X Li ldquoBlockprincipal component analysis with nongreedy L1-normmaximizationrdquo IEEE Transactions on Cybernetics vol 46no 11 pp 2543ndash2547 2016

[16] F Nie and H Huang ldquoNon-greedy L21-norm maximizationfor principal component analysisrdquo 2016 httparxivorgabs160308293v1

[17] F Nie J Yuan and H Huang ldquoOptimal mean robustprincipal component analysisrdquo in Proceedings of International

8 Computational Intelligence and Neuroscience

Conference on Machine Learning pp 1062ndash1070 BeijingChina June 2014

[18] R Wang F Nie R Hong X Chang X Yang and W YuldquoFast and orthogonal locality preserving projections for di-mensionality reductionrdquo IEEE Transactions on Image Pro-cessing vol 26 no 10 pp 5019ndash5030 2017

[19] C Zhang F Nie and S Xiang ldquoA general kernelizationframework for learning algorithms based on kernel PCArdquoNeurocomputing vol 73 no 4ndash6 pp 959ndash967 2010

[20] Z Zhang and E R Hancock ldquoKernel entropy-based un-supervised spectral feature selectionrdquo International Journal ofPattern Recognition and Artificial Intelligence vol 26 no 5article 1260002 2012

[21] E Izquierdo-Verdiguier V Laparra R Jenssen L Gomez-Chova and G Camps-Valls ldquoOptimized kernel entropycomponentsrdquo IEEE Transactions on Neural Networks andLearning Systems vol 28 no 6 pp 1466ndash1472 2017

[22] X Li Y Pang and Y Yuan ldquoL1-norm-based 2DPCArdquo IEEETransactions on Systems Man and Cybernetics Part B vol 40no 4 pp 1170ndash1175 2010

[23] F Nie H Huang C Ding D Luo and H Wang ldquoRobustprincipal component analysis with non-greedy L1-normmaximizationrdquo in Proceedings of International Joint Confer-ence on Artificial Intelligence pp 1433ndash1438 BarcelonaCatalonia Spain July 2011

[24] R Wang F Nie X Yang F Gao and M Yao ldquoRobust2DPCA with non-greedy L1-norm maximization for imageanalysisrdquo IEEE Transactions on Cybernetics vol 45 no 5pp 1108ndash1112 2015

[25] B H Shekar M Sharmila Kumari L M Mestetskiy andN F Dyshkant ldquoFace recognition using kernel entropycomponent analysisrdquo Neurocomputing vol 74 no 6pp 1053ndash1057 2011

[26] R Jenssen ldquoKernel entropy component analysis new theoryand semi-supervised learningrdquo in Proceedings of IEEE In-ternational Workshop on Machine Learning for Signal Pro-cessing pp 1ndash6 Beijing China September 2011

[27] N Kwak ldquoPrincipal component analysis based on L1-normmaximizationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 30 no 9 pp 1672ndash1680 2008

[28] Y Xiao HWang W Xu and J Zhou ldquoL1 norm based KPCAfor novelty detectionrdquo Pattern Recognition vol 46 no 1pp 389ndash396 2013

[29] Y Xiao H Wang and W Xu ldquoParameter selection ofgaussian kernel for one-class SVMrdquo IEEE Transactions onCybernetics vol 45 no 5 pp 941ndash953 2015

[30] W Krzanowski Principles of Multivariate Analysis Vol 23Oxford University Press (OUP) Oxford UK 2000

[31] Duin ldquoOn the choice of smoothing parameters for Parzenestimators of probability density functionsrdquo IEEE Trans-actions on Computers vol 25 no 11 pp 1175ndash1179 1976

[32] W Liu S Liu Q Gu J Chen X Chen and D ChenldquoEmpirical studies of a two-stage data preprocessing approachfor software fault predictionrdquo IEEE Transactions on Re-liability vol 65 no 1 pp 38ndash53 2016

[33] S Lessmann B Baesens C Mues and S Pietsch ldquoBench-marking classification models for software defect predictiona proposed framework and novel findingsrdquo IEEE Trans-actions on Software Engineering vol 34 no 4 pp 485ndash4962008

[34] Z He F Shu Y Yang M Li and Q Wang ldquoAn in-vestigation on the feasibility of cross-project defect pre-dictionrdquo Automated Software Engineering vol 19 no 2pp 167ndash199 2012

[35] P He B Li X Liu J Chen and YMa ldquoAn empirical study onsoftware defect prediction with a simplified metric setrdquo In-formation and Software Technology vol 59 pp 170ndash190 2015

[36] Y Wu S Huang H Ji C Zheng and C Bai ldquoA novel Bayesdefect predictor based on information diffusion functionrdquoKnowledge-Based Systems vol 144 pp 1ndash8 2018

Computational Intelligence and Neuroscience 9

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 7: KernelEntropyComponentAnalysiswithNongreedy L1 ...downloads.hindawi.com/journals/cin/2018/6791683.pdf · ResearchArticle KernelEntropyComponentAnalysiswithNongreedy L1-NormMaximization

e implementation of KECA-L1 and other methods isrepeated using all the selected datasets with respect to dif-ferent numbers of components for 10 times We have uti-lized the overall classication accuracy (OA) to evaluate theperformance of dierent algorithms on the classicationOA is dened as the total number of samples correctlyassigned in percentage terms which is within [0 1] andindicates better quality with larger values Figure 1 presentsthe average OA curves obtained by the aforementionedalgorithms for these six real datasets It can be observed fromFigure 1 that OKECA is superior to KECA PCA-L1 andKPCA-L1 except for solving Letter issue is is probablybecause DR performed by OKECA not only can reveal thestructure related to the most Renyi entropy of the originaldata but also consider the rotational invariance property[21] In addition KECA-L1 outperforms the other methodsbesides of OKECA is may be attributed to the robustnessof L1-norm to outliers compared with that of the L2-normIn Figure 1(c) OKECA seems to obtain nearly the sameresults with KECA-L1rsquos However the average running time(in hours) of OKECA in the Pendigits is 37384 times morethan that of KECA-L1 1339

42 Experiments on Software Projects In software engi-neering it is usually dipoundcult to test a software projectcompletely and thoroughly with the limited resources [32]Software defect prediction (SDP) may provide a relativelyacceptable solution to this problem It can allocate the

limited test resources eectively by categorizing the softwaremodules into two classes nonfault-prone (NFP) or fault-prone (FP) according to 21 software metrics (Table 2)

is section aims to employ KECA-based methods toreduce the selected software data (Table 3) dimensions andthen utilize the SSL-based classier combined with thesupport vector machine [33] to classify each softwaremodule as NFP or FP e bandwidth coepoundcient set is stillrestricted to the rule of ML PCA-L1 and KPCA-L1 areinvolved as a benchmarking yardstick ere are 34 groupsof tests for each release in Table 3 e most suitable releases[34] from dierent software projects are selected as trainingdata We evaluate the performance of dierent selectedmethods on SDP in terms of recall (R) precision (P) andF-measure (F) [35 36] e F-measure is dened as

F 2 times precsion times recallprecsion + recall

(27)

where

Precsion TP

TP + FP

Recall TP

TP + FN

(28)

In (28) FN (ie false negative) means that buggyclasses are wrongly classied to be nonfaulty while FP(ie false positive) means nonbuggy classes are wronglyclassied to be faulty TP (ie true positive) refer to

R F0

01

02

03

04

05

06

07

08

09

1PCA-L1

P F0

01

02

03

04

05

06

07

08

09

1KPCA-L1

R P F0

01

02

03

04

05

06

07

08

09

1KECA

R P F0

01

02

03

04

05

06

07

08

09

1OKECA

R P F0

01

02

03

04

05

06

07

08

09

1KECA-L1

R P

Figure 2 e standardized boxplots of the performance achieved by PCA-L1 KPCA-L1 KECA OKECA and KECA-L1 respectivelyFrom the bottom to the top of a standardized box plot minimum rst quartile median third quartile and maximum

Computational Intelligence and Neuroscience 7

correctly classified buggy classes [34] Values of RecallPrecision and F-measure range from 0 to 1 and highervalues indicate better classification results

Figure 2 shows the results using box-plot analysis FromFigure 2 considering theminimummaximummedian firstquartile and third quartile of the boxes we find that KECA-L1 performs better than the other methods in generalSpecifically KECA-L1 can obtain acceptable results in ex-periments for SDP compared with the benchmarks proposedin Reference [34] since the median values of the boxes withrespect to R and F are close to 07 and more than 05 re-spectively On the contrary not only KECA and OKECA butPCA-L1 and KPCA-L1 cannot meet these criteriaereforeall of the results validate the robustness of KECA-L1

5. Conclusions

This paper proposes a new extension to the OKECA approach for dimensionality reduction. The new method (i.e., KECA-L1) employs the L1-norm and a rotation matrix to maximize the information potential of the input data. In order to find the optimal entropic kernel components, motivated by Nie et al.'s algorithm [23], we design a nongreedy iterative process which has much faster convergence than OKECA's. Moreover, a general semisupervised learning algorithm has been established for classification using KECA-L1. Compared with several recently proposed KECA- and PCA-based approaches, this SSL-based classifier can remarkably improve the performance on real-world dataset classification and software defect prediction.

Although KECA-L1 has achieved impressive success on real examples, several problems should still be considered and solved in future research. The efficiency of KECA-L1 has to be optimized, for it is relatively time-consuming compared with most existing PCA-based methods. Additionally, KECA-L1 is expected to be applicable in every pattern analysis algorithm previously based on PCA approaches.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61702544) and the Natural Science Foundation of Jiangsu Province of China (Grant no. BK20160769).

Supplementary Materials

The MATLAB toolbox of KECA-L1 is available (Supplementary Materials).

References

[1] Q. Gao, S. Xu, F. Chen, C. Ding, X. Gao, and Y. Li, "R₁-2-DPCA and face recognition," IEEE Transactions on Cybernetics, vol. 99, pp. 1–12, 2018.

[2] A. Hyvärinen and E. Oja, "Independent component analysis: algorithms and applications," Neural Networks, vol. 13, no. 4-5, pp. 411–430, 2000.

[3] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," European Conference on Computer Vision, vol. 1, pp. 43–58, 1996.

[4] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.

[5] J. H. Friedman and J. W. Tukey, "A projection pursuit algorithm for exploratory data analysis," IEEE Transactions on Computers, vol. 23, no. 9, pp. 881–890, 1974.

[6] R. Jenssen, "Kernel entropy component analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, pp. 847–860, 2010.

[7] B. Schölkopf, A. Smola, and K. R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998.

[8] S. Mika, A. Smola, and M. Scholz, "Kernel PCA and de-noising in feature spaces," Conference on Advances in Neural Information Processing Systems II, vol. 11, pp. 536–542, 1999.

[9] J. Yang, D. Zhang, A. F. Frangi, and J. Y. Yang, "Two-dimensional PCA: a new approach to appearance-based face representation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131–137, 2004.

[10] K. Nishino, S. K. Nayar, and T. Jebara, "Clustered blockwise PCA for representing visual data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1675–1679, 2005.

[11] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 506–513, Washington, DC, USA, June-July 2004.

[12] A. D'Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. G. Lanckriet, "A direct formulation for sparse PCA using semidefinite programming," SIAM Review, vol. 49, no. 3, pp. 434–448, 2007.

[13] M. Luo, F. Nie, X. Chang, Y. Yang, A. Hauptmann, and Q. Zheng, "Avoiding optimal mean robust PCA/2DPCA with non-greedy L1-norm maximization," in Proceedings of International Joint Conference on Artificial Intelligence, pp. 1802–1808, New York, NY, USA, July 2016.

[14] Q. Yu, R. Wang, X. Yang, B. N. Li, and M. Yao, "Diagonal principal component analysis with non-greedy L1-norm maximization for face recognition," Neurocomputing, vol. 171, pp. 57–62, 2016.

[15] B. N. Li, Q. Yu, R. Wang, K. Xiang, M. Wang, and X. Li, "Block principal component analysis with nongreedy L1-norm maximization," IEEE Transactions on Cybernetics, vol. 46, no. 11, pp. 2543–2547, 2016.

[16] F. Nie and H. Huang, "Non-greedy L21-norm maximization for principal component analysis," 2016, http://arxiv.org/abs/1603.08293v1.

[17] F. Nie, J. Yuan, and H. Huang, "Optimal mean robust principal component analysis," in Proceedings of International Conference on Machine Learning, pp. 1062–1070, Beijing, China, June 2014.

[18] R. Wang, F. Nie, R. Hong, X. Chang, X. Yang, and W. Yu, "Fast and orthogonal locality preserving projections for dimensionality reduction," IEEE Transactions on Image Processing, vol. 26, no. 10, pp. 5019–5030, 2017.

[19] C. Zhang, F. Nie, and S. Xiang, "A general kernelization framework for learning algorithms based on kernel PCA," Neurocomputing, vol. 73, no. 4–6, pp. 959–967, 2010.

[20] Z. Zhang and E. R. Hancock, "Kernel entropy-based unsupervised spectral feature selection," International Journal of Pattern Recognition and Artificial Intelligence, vol. 26, no. 5, article 1260002, 2012.

[21] E. Izquierdo-Verdiguier, V. Laparra, R. Jenssen, L. Gomez-Chova, and G. Camps-Valls, "Optimized kernel entropy components," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 6, pp. 1466–1472, 2017.

[22] X. Li, Y. Pang, and Y. Yuan, "L1-norm-based 2DPCA," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 40, no. 4, pp. 1170–1175, 2010.

[23] F. Nie, H. Huang, C. Ding, D. Luo, and H. Wang, "Robust principal component analysis with non-greedy L1-norm maximization," in Proceedings of International Joint Conference on Artificial Intelligence, pp. 1433–1438, Barcelona, Catalonia, Spain, July 2011.

[24] R. Wang, F. Nie, X. Yang, F. Gao, and M. Yao, "Robust 2DPCA with non-greedy L1-norm maximization for image analysis," IEEE Transactions on Cybernetics, vol. 45, no. 5, pp. 1108–1112, 2015.

[25] B. H. Shekar, M. Sharmila Kumari, L. M. Mestetskiy, and N. F. Dyshkant, "Face recognition using kernel entropy component analysis," Neurocomputing, vol. 74, no. 6, pp. 1053–1057, 2011.

[26] R. Jenssen, "Kernel entropy component analysis: new theory and semi-supervised learning," in Proceedings of IEEE International Workshop on Machine Learning for Signal Processing, pp. 1–6, Beijing, China, September 2011.

[27] N. Kwak, "Principal component analysis based on L1-norm maximization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 9, pp. 1672–1680, 2008.

[28] Y. Xiao, H. Wang, W. Xu, and J. Zhou, "L1 norm based KPCA for novelty detection," Pattern Recognition, vol. 46, no. 1, pp. 389–396, 2013.

[29] Y. Xiao, H. Wang, and W. Xu, "Parameter selection of Gaussian kernel for one-class SVM," IEEE Transactions on Cybernetics, vol. 45, no. 5, pp. 941–953, 2015.

[30] W. Krzanowski, Principles of Multivariate Analysis, Vol. 23, Oxford University Press (OUP), Oxford, UK, 2000.

[31] Duin, "On the choice of smoothing parameters for Parzen estimators of probability density functions," IEEE Transactions on Computers, vol. 25, no. 11, pp. 1175–1179, 1976.

[32] W. Liu, S. Liu, Q. Gu, J. Chen, X. Chen, and D. Chen, "Empirical studies of a two-stage data preprocessing approach for software fault prediction," IEEE Transactions on Reliability, vol. 65, no. 1, pp. 38–53, 2016.

[33] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, "Benchmarking classification models for software defect prediction: a proposed framework and novel findings," IEEE Transactions on Software Engineering, vol. 34, no. 4, pp. 485–496, 2008.

[34] Z. He, F. Shu, Y. Yang, M. Li, and Q. Wang, "An investigation on the feasibility of cross-project defect prediction," Automated Software Engineering, vol. 19, no. 2, pp. 167–199, 2012.

[35] P. He, B. Li, X. Liu, J. Chen, and Y. Ma, "An empirical study on software defect prediction with a simplified metric set," Information and Software Technology, vol. 59, pp. 170–190, 2015.

[36] Y. Wu, S. Huang, H. Ji, C. Zheng, and C. Bai, "A novel Bayes defect predictor based on information diffusion function," Knowledge-Based Systems, vol. 144, pp. 1–8, 2018.
