
Available online at www.sciencedirect.com
www.elsevier.com/locate/eswa

Expert Systems with Applications 36 (2009) 838–843
doi:10.1016/j.eswa.2007.10.009

Genetic nearest feature plane

Loris Nanni *, Alessandra Lumini

DEIS-University of Bologna, viale Risorgimento 2, 40126 Bologna, Italy

* Corresponding author. E-mail address: [email protected] (L. Nanni).

Abstract

The problem addressed in this paper concerns the complexity reduction of the nearest feature plane classifier, so that it may also be applied to datasets where the training set contains many patterns. To classify a test pattern, this classifier considers the subspaces created by each combination of three training patterns; the main problem is that on datasets of high cardinality this method is unfeasible.

A genetic algorithm is used here to divide the training patterns into several clusters, whose centroids are used to build the feature planes that classify the test set.

The performance improvement with respect to other nearest neighbor based classifiers is validated through experiments with several benchmark datasets.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Nearest feature plane classifier; Clustering; Genetic algorithm

1. Introduction

The nearest neighbor (NN) (Cover & Hart, 1967) classification rule is the most popular pattern classification method, but its performance, in several problems, is not comparable to that of state-of-the-art classification methods (Parades & Vidal, 2006). To improve the performance of NN, several modified nearest neighbor based classifiers have recently been proposed in the literature (e.g. Li & Lu, 1999; Parades & Vidal, 2006; Zhou, Zhang, & Wang, 2004).

In Parades and Vidal (2006) the authors propose a method named learning prototypes and distances (LPD). Starting from a random selection of a subset of the training patterns, LPD iteratively adjusts both the position of these patterns and the local metric in order to minimize a given fitness function. In Parades and Vidal (2006) it is shown that LPD outperforms a number of popular prototype-based techniques, such as the learning vector quantization methods (LVQ1, LVQ2 and LVQ3; Kohonen, 1990, 2001).

In Li and Lu (1999) the nearest feature line (NFL) classifier is proposed; it classifies a test sample by using the feature lines that link each pair of training patterns belonging to the same class. Its main weakness is the computation time with large datasets (Zheng, Zhao, & Zou, 2004; Zhou et al., 2004). To reduce the computation cost of NFL, the center-based nearest neighbor (CNN) is proposed in Gao and Wang (in press). CNN classifies a test pattern using only the feature lines that link the training patterns of a given class with the centroid of that class. The authors show that CNN has performance comparable to NFL but requires much lower computation time.

The nearest feature plane (NFP) classifier was proposed in Chien and Wu (2002). NFP assumes that at least three linearly independent prototype points are available for each class. The feature planes are built using three training patterns that belong to the same class, and each test pattern is projected onto each feature plane. If we suppose that each class c contains $n_c$ patterns, then the number of pattern projections is

$$\sum_{c} \frac{n_c (n_c - 1)(n_c - 2)}{6}$$

Fig. 1. System proposed: the training set is clustered by the GA, feature planes are generated from the cluster centroids, and the resulting classifiers are combined by the vote rule.

Table 1. Characteristics of the datasets used in the experimentations: number of attributes (#A), number of examples (#E), number of classes (#c)

Dataset      #A            #E     #c
Vehicle      18            840    3
Ionosphere   34            351    2
Breast        9            700    2
Pima          8            768    2
HIV          50            362    2
CROMO        32            198    3
Face         (see below)   1800   10


For example, in a dataset of 3 classes with 25 patterns for each class, the number of pattern projections needed to classify a test pattern is 6900.
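As a quick check of this count, a minimal Python sketch (the function name is ours, not from the paper):

```python
from math import comb

def nfp_projections(class_sizes):
    """Number of feature-plane projections in standard NFP: one plane
    per unordered triple of same-class training patterns."""
    return sum(comb(n, 3) for n in class_sizes)  # n*(n-1)*(n-2)/6 per class

print(nfp_projections([25, 25, 25]))  # 3 classes of 25 patterns -> 6900
```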

It is clear that with a large training set it is not feasible to use this classifier; in fact, this classifier is mainly used in face recognition (Orozco-Alzate & Castellanos-Domínguez, 2006), where few patterns for each individual are available.

In this work, we propose a feature plane based method, named genetic feature plane classifier, which makes it possible to apply NFP to any dataset. Instead of considering the feature planes built using each triple of training patterns belonging to the same class, we consider the feature planes built using the reduced set of prototypes obtained by a genetic-based condensing algorithm (Ramon Cano, Herrera, & Lozano, 2003).

The paper is organized as follows: in Section 2 our approach is presented, in Section 3 the experimental results are reported, and in Section 4 some concluding remarks are given.

1 Implemented as in GAOT MATLAB TOOLBOX www.ie.ncsu.edu/mirage/GAToolBox/gaot/.

2. Proposed algorithm

In this section we present our method: we use a genetic algorithm to divide the training patterns into K groups, and the centroids of each group are used to build the feature planes.

We first divide the training set T into c subsets according to the class of the samples (c is the total number of classes); then each subset is separately divided into clusters (Bezdek, 1981) by the genetic algorithm.

We test two different fitness functions in the genetic algorithm:

– the fitness function is the error rate on the training set; we name this method A;

– the fitness function is the error rate obtained using a 10-fold cross validation on the training set; we name this method B.

Obviously fitness function B is more reliable than fitness function A, but its computation time is higher. Please note that the whole training set is used for training the classifier that classifies the test data; the 10-fold cross-validation on the training set is used only for finding the parameters of the classifiers.
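A minimal sketch of the two fitness functions, under the assumption of a scikit-learn-style estimator; make_nfp is a hypothetical helper (not from the paper) that turns a chromosome into an unfitted classifier:

```python
from sklearn.model_selection import cross_val_score

def fitness_A(chromosome, X, y, make_nfp):
    """Fitness A: error rate measured on the training set itself."""
    clf = make_nfp(chromosome).fit(X, y)
    return 1.0 - clf.score(X, y)

def fitness_B(chromosome, X, y, make_nfp):
    """Fitness B: error rate estimated by 10-fold cross-validation on
    the training set (more reliable than A, but slower to compute)."""
    accuracy = cross_val_score(make_nfp(chromosome), X, y, cv=10).mean()
    return 1.0 - accuracy
```

The genetic algorithm then minimizes the chosen fitness value.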

The prototype set P = {p1, . . . , pK} used to build the feature planes is generated by selecting each prototype pi as the centroid of the patterns belonging to the ith cluster. Given a test pattern z, the distance between z and the feature plane built using the prototypes p^c_i, p^c_j and p^c_m, which belong to class c, is given by the following equations:

$$d(z, F^{c}_{ijm}) = \lVert z - p^{c}_{ijm} \rVert$$

$$p^{c}_{ijm} = P^{c}_{ijm}\,\big((P^{c}_{ijm})^{T} P^{c}_{ijm}\big)^{-1}\,(P^{c}_{ijm})^{T}\, z$$

$$P^{c}_{ijm} = [\, p^{c}_{i} \;\; p^{c}_{j} \;\; p^{c}_{m} \,]$$
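A minimal numpy sketch of these equations (the function name and variable names are ours); the projection is computed directly from the formula above:

```python
import numpy as np

def plane_distance(z, p_i, p_j, p_m):
    """Distance between a test pattern z and the feature plane built
    from the three same-class prototypes p_i, p_j, p_m."""
    P = np.column_stack([p_i, p_j, p_m])        # P^c_ijm = [p_i p_j p_m]
    coeffs = np.linalg.solve(P.T @ P, P.T @ z)  # (P^T P)^-1 P^T z
    projection = P @ coeffs                     # p^c_ijm, projection of z
    return np.linalg.norm(z - projection)       # d(z, F^c_ijm)
```

The test pattern is then assigned to the class owning the nearest feature plane.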

To improve the performance, in the training phase the prototype generation is performed N times; each test pattern is classified by each of these N genetic feature plane classifiers, and finally the N classifiers are combined by the ''vote rule'' (Kittler, Hatef, Duin, & Matas, 1998). As pre-processing (as in Orozco-Alzate & Castellanos-Domínguez, 2006), the dataset is projected onto a principal component subspace (PC), where the preserved variance is 1; please note that we use PC not to reduce the dimensionality of the dataset but only to decorrelate the feature set. In Fig. 1 our system is reported.
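A minimal sketch of the vote-rule combination, assuming each of the N trained classifiers exposes a predict method returning a class label (names are ours):

```python
from collections import Counter

def vote_rule(classifiers, z):
    """Majority vote over the N genetic feature plane classifiers."""
    votes = [clf.predict(z) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]
```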

2.1. Genetic algorithm (Nanni & Maio, in press)1

Genetic algorithms are a class of optimization methods inspired by the process of natural evolution. These algorithms operate iteratively on a population of chromosomes, each of which represents a candidate solution to the problem. In our encoding scheme, the chromosome C is a string whose length is determined by the number of patterns in the training set. Each value in the chromosome specifies to which cluster the corresponding training pattern belongs. Each value in this array can be 0, 1, 2, . . . , NK; a value of 0 implies that the pattern does not belong to any cluster. The initial population is a randomly generated set of chromosomes. Moreover, we add a chromosome that encodes the partition of the training set obtained by the K-centres clustering algorithm (Pekalska, Duin, & Paclík, 2006). The basic operators used to guide this search are selection, crossover and mutation.
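A minimal sketch of how such a chromosome can be decoded into prototypes, assuming the training patterns of one class are stored row-wise in a numpy array (names are ours):

```python
import numpy as np

def decode_chromosome(chromosome, X, n_clusters):
    """Decode a chromosome (one cluster index per training pattern,
    0 = not assigned to any cluster) into the cluster centroids."""
    chromosome = np.asarray(chromosome)
    centroids = []
    for k in range(1, n_clusters + 1):
        members = X[chromosome == k]
        if len(members) > 0:                 # skip empty clusters
            centroids.append(members.mean(axis=0))
    return centroids
```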

Selection: Our selection strategy is cross-generational. Assuming a population of size D (in this paper D = 50), the offspring is double the size of the population, and we select the best D individuals from the combined parent–offspring population. Moreover, we run the genetic algorithm for 10 generations.

Fig. 2. Examples of chromosomes.

Crossover: Uniform crossover is used here. The crossover probability used in our experiments was 0.96.

Mutation: The mutation probability used here was 0.02.
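A minimal sketch of one generation with these operators and parameter values; this is our illustration, not the GAOT implementation referenced above (fitness evaluation and population initialization are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_crossover(a, b, p_cross=0.96):
    """Uniform crossover: with probability p_cross, each gene of the
    child is taken at random from one of the two parents."""
    if rng.random() > p_cross:
        return a.copy()
    mask = rng.random(a.size) < 0.5
    return np.where(mask, a, b)

def mutate(chromosome, n_clusters, p_mut=0.02):
    """Mutation: each gene is reset with probability p_mut to a random
    cluster index in {0, 1, ..., NK} (0 = unassigned)."""
    child = chromosome.copy()
    flips = rng.random(child.size) < p_mut
    child[flips] = rng.integers(0, n_clusters + 1, size=int(flips.sum()))
    return child

def next_generation(population, fitness, n_clusters, D=50):
    """Cross-generational selection: generate 2*D offspring, then keep
    the best D individuals from the combined parent/offspring pool."""
    offspring = []
    for _ in range(2 * D):
        i, j = rng.choice(len(population), size=2, replace=False)
        child = mutate(uniform_crossover(population[i], population[j]), n_clusters)
        offspring.append(child)
    pool = list(population) + offspring
    pool.sort(key=fitness)            # lower fitness (error rate) is better
    return pool[:D]
```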

3. Experiments

We perform experiments to compare the classification performance of our approaches with that of other nearest neighbor based classifiers. The experiments were conducted on four benchmark datasets from the UCI Repository2 (Vehicle (VE), Pima Indians Diabetes (PI), Ionosphere (IO), and Breast Cancer (BR)), on the HIV protease dataset3 (HIV) (Rognvaldsson & You, 2004), on the human chromosomes dataset (CROMO) (Moradi & Setarehdan, 2006; Nanni, 2006b), and on a face dataset (FACE). In Table 1 the characteristics of the datasets used in the experimentations are reported.

HIV: The dataset contains 362 octamer protein sequences, each of which needs to be classified as an HIV protease cleavable site or uncleavable site. An octamer protein sequence is a peptide (small protein) denoted by P = P4 P3 P2 P1 P1′ P2′ P3′ P4′, where each Pi is an amino-acid belonging to the amino-acid alphabet {A, C, D, . . . , V, W, Y}. Please note that P3 and P4′ are not used. The scissile bond is located between positions P1 and P1′. The features are the orthonormal representation projected onto a lower 50-dimensional Karhunen–Loève space (Lumini & Nanni, 2006a).

CROMO: Karyotyping is a standard method for presenting pictures of the human chromosomes for diagnostic purposes. Chromosome classification can be viewed as a pattern recognition problem; as feature vector we have used the normalized density profile (Moradi & Setarehdan, 2006), and we have used the same dataset used in Moradi and Setarehdan (2006) and Nanni (2006b). The images used were produced in the Cytogenetic Laboratory of the Cancer Institute, Imam Hospital, Tehran, Iran. The images were acquired by a conventional photography system using a light microscope (Leitz, Ortholux). The chromosomes were segmented from the pictures by an expert in the Cytogenetic Laboratory and then scanned by a scanner (Microtek, ScanPlus 6) with a resolution of 300 dpi. The gray scale resolution of the resulting digitized pictures was set to 256 levels. The dataset includes 201 chromosomes. In Fig. 2 some examples from this dataset are shown.

2 http://www.ics.uci.edu/mlearn/MLRepository.html.
3 http://idelnx81.hh.se/bioinf/data.html; as feature extraction we use the orthonormal encoding and then we project the data onto a lower 50-dimensional Karhunen–Loève space (for details see Nanni, 2006a).

Face dataset: The face dataset was collected in our Biometric Systems Lab (Cappelli, Maio, & Maltoni, 2002) and contains 10 individuals and 180 items per individual (1800 images). Images have been captured through a gray-scale wall-mounted camera; the face image is automatically located as described in Cappelli et al. (2002). After cropping, the face-image size is 92 × 112 pixels, and the images are pre-processed using the method used in Connie, Jin, Ong, Ngo, and Ling (2005). The features are extracted by principal component analysis (as in Lumini & Nanni, 2006b), where the preserved variance is 0.98. In this database M = 60 images per individual have been captured in 3 distinct sessions at least two weeks apart. The training set is composed of session 1; the remaining sessions compose the test set. In Fig. 3 some examples from this dataset are shown.

Fig. 3. Samples from the face dataset: face images of three individuals acquired over the 3 sessions covering a period of 4 weeks.

To minimize the possible bias caused by the training data, the results on each dataset have been averaged over ten experiments. For each experiment we randomly resampled the learning and the test sets (containing, respectively, half of the patterns), maintaining the distribution of the patterns in the classes. In all the UCI datasets the features have been linearly normalized between 0 and 1 using the training data.
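A minimal sketch of this normalization step (statistics are computed on the training split only, then applied to both splits):

```python
import numpy as np

def minmax_fit(X_train):
    """Per-feature minimum and range, computed on the training data."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return lo, span

def minmax_apply(X, lo, span):
    """Linearly map every feature to [0, 1] using training statistics."""
    return (X - lo) / span
```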

Fig. 4 reports the performance obtained by the methods tested in this paper. In Fig. 4 the following notation is used:

– NN, the nearest neighbor classifier;
– CNN, the center-based nearest neighbor (Gao & Wang, in press);
– LPD, the learning prototypes and distances classifier (Parades & Vidal, 2006);
– R-NFP(NK), NFP run using NK patterns (for each class) randomly extracted from the training set to build the feature planes;
– C-NFP(NK), NFP run using the NK centroids (for each class) of the clusters of the training set (the clusters are found by the K-centres clustering algorithm (Pekalska et al., 2006)) to build the feature planes;
– G-NFP(NK), our method with N = 1, where the GA uses fitness function A;
– G10P-NFP(NK), our method with N = 5, where the GA uses fitness function B.

Fig. 4. Error rate obtained by the methods tested in this paper (one panel per dataset: HIV, Vehicle, Pima, Face, Breast, Ionosphere and CROMO).

Table 2. Comparison among CNN, LPD and G10P-NFP(5)

Dataset              CNN     LPD     G10P-NFP(5)
Vehicle              29.5    29.7    27.2
Pima                 31.2    28.5    27.85
HIV                  13.1    16.7    11.3
Ionosphere           11.9    15.1    12.8
Breast                5.2     3.7     5.1
CROMO                 7.1     7       5.05
Face                 22      24      23
Average error rate   18.82   19.61   17.87


The comparison results show that G10P-NFP(NK) obtains the best performance among the tested methods.

Please note that in a dataset of 3 classes and 25 patterns per class, the number of pattern projections needed to classify a test pattern in G10P-NFP(5) is 150 (while in standard NFP it is 6900). Moreover, it is interesting to note that the genetic clustering outperforms the K-centres clustering (G-NFP outperforms C-NFP), confirming the results of Ramon Cano et al. (2003).

Our results on the face dataset show that our method can also be used as a template selection method (Lumini & Nanni, 2006b). Please note that our methods (e.g. C-NFP) obtain, using only 5 templates per user, the same performance obtained by the methods trained using 60 templates per user.

In Table 2 we compare the performance of CNN, LPD and G10P-NFP(5).

As a further experiment, we ran the Wilcoxon signed-rank test (Demsar, 2006) to compare the results of CNN, LPD and G10P-NFP(5). The null hypothesis is that there is no difference between the accuracies of the two classifiers. We reject the null hypothesis (level of significance 0.05) and accept that the two classifiers have significantly different accuracies. Also according to this test, G10P-NFP(5) outperforms LPD and CNN.
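A sketch of how such a comparison can be run on the per-dataset error rates of Table 2, using scipy; this illustrates the kind of test described, it is not the authors' original analysis code:

```python
from scipy.stats import wilcoxon

# Error rates from Table 2 (Vehicle, Pima, HIV, Ionosphere, Breast, CROMO, Face).
cnn      = [29.5, 31.2, 13.1, 11.9, 5.2, 7.1, 22.0]
lpd      = [29.7, 28.5, 16.7, 15.1, 3.7, 7.0, 24.0]
g10p_nfp = [27.2, 27.85, 11.3, 12.8, 5.1, 5.05, 23.0]

# Null hypothesis: no difference between the accuracies of the two classifiers.
for name, rates in [("CNN", cnn), ("LPD", lpd)]:
    stat, p = wilcoxon(g10p_nfp, rates)
    print(f"G10P-NFP(5) vs {name}: W = {stat}, p = {p:.3f}")
```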

4. Conclusions

In this paper, we have proposed to use a prototype generation method, based on a genetic algorithm, to obtain a fast feature plane based classifier. The prototypes are generated by dividing the training set into K clusters; the centroids of the patterns that belong to each cluster are selected as prototypes. Our tests show that our method is faster than the original NFP and that its performance is better than that obtained by other nearest neighbor based classifiers (NN, CNN and LPD).

Acknowledgements

This work has been supported by the European Commission IST-2002-507634 Biosecure NoE project. The authors would like to thank T. Rognvaldsson for sharing the HIV dataset and M. Moradi for providing the chromosome images and the features used in this study.

References

Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum.

Cappelli, R., Maio, D., & Maltoni, D. (2002). Subspace classification for face recognition. In Proceedings of workshop on biometric authentication – ECCV'02 (BIOW2002), Copenhagen (pp. 133–141).

Chien, J. T., & Wu, C. C. (2002). Discriminant wavelet faces and nearest feature classifiers for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 1644–1649.

Connie, T., Jin, A. T. B., Ong, M. G. K., Ngo, D., & Ling, C. (2005). An automated palmprint recognition system. Image and Vision Computing, 23, 501–515.

Cover, T. M., & Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(January), 21–27.

Demsar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.

Gao, Q.-B., & Wang, Z.-Z. (2007). Center-based nearest neighbour classifier. Pattern Recognition, 40(1), 346–349.

Kittler, J., Hatef, M., Duin, R., & Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226–239.

Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78, 1464–1480.

Kohonen, T. (2001). Self-organizing maps (3rd ed.). Springer.

Li, S. Z., & Lu, J. (1999). Face recognition using the nearest feature line method. IEEE Transactions on Neural Networks, 10, 439–443.

Lumini, A., & Nanni, L. (2006a). Machine learning for HIV-1 protease cleavage site prediction. Pattern Recognition Letters, 27(12), 1390–1396.

Lumini, A., & Nanni, L. (2006b). A clustering method for automatic biometric template selection. Pattern Recognition, 39(3), 495–497.

Moradi, M., & Setarehdan, S. K. (2006). New features for automatic classification of human chromosomes: A feasibility study. Pattern Recognition Letters, 27(1), 19–28.

Nanni, L. (2006a). Comparison among feature extraction methods for HIV-1 protease cleavage site prediction. Pattern Recognition, 39(4), 711–713.

Nanni, L. (2006b). A reliable method for designing an automatic karyotyping system. Neurocomputing, 69(13–15), 1739–1742.

Nanni, L., & Maio, D. (2007). Weighted sub-Gabor for face recognition. Pattern Recognition Letters, 28(4), 487–492.

Orozco-Alzate, M., & Castellanos-Domínguez, C. G. (2006). Comparison of the nearest feature classifiers for face recognition. Machine Vision and Applications, 17(5), 279–285.

Parades, R., & Vidal, E. (2006). Learning prototypes and distances: A prototype reduction technique based on nearest neighbor error minimization. Pattern Recognition, 39, 180–188.

Pekalska, E., Duin, R. P. W., & Paclík, P. (2006). Prototype selection for dissimilarity-based classifiers. Pattern Recognition, 39, 189–208.

Ramon Cano, J., Herrera, F., & Lozano, M. (2003). Using evolutionary algorithms as instance selection for data reduction in KDD: An experimental study. IEEE Transactions on Evolutionary Computation, 7(6), 561–575.

Rognvaldsson, T., & You, L. (2004). Why neural networks should not be used for HIV-1 protease cleavage site prediction. Bioinformatics, 20(11), 1702–1709.

Zheng, W., Zhao, L., & Zou, C. (2004). Locally nearest neighbor classifiers for pattern classification. Pattern Recognition, 37, 1307–1309.

Zhou, Y., Zhang, C., & Wang, J. (2004). Tunable nearest neighbour classifier. Lecture Notes in Computer Science, 3175, 204–211.

