International Journal of Advance Foundation and Research in Computer (IJAFRC)
Volume 1, Issue 5, May 2014. ISSN 2348 - 4853
24 | 2014, IJAFRC All Rights Reserved www.ijafrc.org
Enhancing Performance of KNN Classifier by Means of Genetic Algorithm and Particle Swarm Optimization
Asha Gowda Karegowda*, Kishore B.
Department of MCA, Siddaganga Institute of Technology, Tumkur, India
ABSTRACT
KNN is susceptible to noise because it classifies a test sample based on its distance to the training samples. Feature weighting and the selection of significant features can overcome this limitation of the KNN classifier. This paper proposes three methods for enhancing the performance of the KNN classifier: a binary encoded Genetic Algorithm (GA) for identifying significant features, and a real encoded GA and Particle Swarm Optimization (PSO) for identifying feature weights. The proposed methods performed better than KNN with weights provided by the information gain, gain ratio and Relief methods. Their results were also superior to those of prominent classifiers such as radial basis function, support vector machine, decision tree, Bayesian and Naïve Bayes classifiers. The significant features identified by binary encoded GA and the weights provided by real encoded GA and PSO also improved the performance of the fuzzy KNN classifier. Computational experiments were carried out on seven datasets obtained from the UCI Machine Learning Repository.
Index Terms: Crisp KNN, Fuzzy KNN, Genetic Algorithm, Particle swarm optimization, feature
subset selection, feature weights
I. INTRODUCTION
Classification is a supervised learning task that maps (classifies) a data item into one of several predefined classes. Data classification is a two-step process. In the first step, a model is built that describes a predetermined set of data classes or concepts; the learned model is typically represented as classification rules, decision trees, or mathematical formulae. In the second step, the model is used for classification.
Classifiers are of two types: instance-based (lazy) learners and eager learners. Given a training set, eager learners (decision tree, Bayesian classifier, SVM, back-propagation neural network) construct a classifier model and use it to classify test samples, i.e. previously unseen samples. In contrast, instance-based or lazy learners (the k-nearest neighbor classifier and the case-based reasoning classifier) store all the training samples and do not build a classifier until a new, unlabeled sample needs to be classified. A lazy learner therefore does less work when the training samples are presented and more work when making a classification or prediction for a test sample [1].
Feature subset selection is of immense importance in data mining. Mining on a reduced set of attributes not only reduces computation time but also makes the discovered patterns easier to understand. The wrapper approach uses the classification method itself to measure the significance of a feature set; hence, the selected features depend on the classifier used. In contrast, the filter approach is independent of the learning algorithm [2]. In this paper, binary encoded
Genetic Algorithm (GA) is used for feature subset selection and is wrapped with the KNN classifier. In addition to feature subset selection, the performance of the KNN classifier can be enhanced by finding a weight for each feature that measures the relevance of that feature to the classification task [3]. Feature subset selection is a special case of feature weighting in which a weight of one is assigned to each significant feature and a weight of zero to each non-significant feature. Binary encoded GA has previously been used to identify the significant features for k-means clustering [4]. In this paper, real encoded GA [5] and Particle Swarm Optimization (PSO) [6] are used to find feature weights that enhance the accuracy of the KNN classifier. For the sake of completeness, crisp and fuzzy KNN classifiers are briefly described in Section II. Section III discusses the proposed binary encoded GA for feature selection and the real encoded GA for feature weighting, and Section IV describes the proposed PSO for feature weighting. The computational results are presented in Section V, followed by conclusions and future enhancements in Section VI.
II. CRISP AND FUZZY K-NEAREST NEIGHBOR ALGORITHM
Crisp KNN is a simple supervised classification technique that belongs to the instance-based, or lazy, learning family of methods [1, 7]. It delays modeling the training data until a test sample needs to be classified. The training samples are described by n-dimensional numeric attributes and are stored in an n-dimensional space. When a test sample with an unknown class label is given, the k-nearest neighbor classifier searches for the k training samples closest to it and then applies majority voting to classify it. Closeness is usually defined in terms of Euclidean distance. The Euclidean distance between two points $P = (p_1, p_2, \ldots, p_n)$ and $Q = (q_1, q_2, \ldots, q_n)$ is given by equation (1).

$$ d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \qquad \text{eq. (1)} $$
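A minimal Python sketch of crisp KNN as described above: equation (1) for distance, then majority voting over the k nearest training samples. The function names and toy data are illustrative, not from the paper; breaking vote ties in favor of the label seen first among the sorted neighbors is one possible convention, not something the paper specifies.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Equation (1): Euclidean distance between two n-dimensional points."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(train_X, train_y, x, k=3):
    """Crisp KNN: majority vote among the k training samples closest to x."""
    # Indices of training samples, nearest first
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    # most_common keeps first-seen order on ties, i.e. the label of the nearer neighbor wins
    return votes.most_common(1)[0][0]


train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["a", "a", "a", "b", "b", "b"]
print(euclidean([0, 0], [3, 4]))                  # 5.0
print(knn_predict(train_X, train_y, [0.2, 0.3]))  # a
```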
Despite its simplicity, KNN suffers from several drawbacks: it requires memory proportional to the size of the training set; it has a high computation cost, since it must compute the distance from each test instance to all training samples; its accuracy is low on multidimensional datasets with irrelevant features; and there is no rule of thumb for choosing the value of the parameter k (the number of nearest neighbors).
The accuracy of the KNN classifier can be improved by identifying the optimal value of k, in addition to identifying the significant inputs. Golden-section search has been used in combination with Akaike's Information Criterion (AIC) to find the optimal number k of nearest neighbors [8]. In addition, prototype generation and prototype selection are used to enhance the nearest neighbor classifier through data reduction [9, 10]. KNN performance can further be improved by identifying the significant features and finding feature weights. Weighted KNN is an extension of the KNN classifier that incorporates a weight for each attribute, in contrast to the standard KNN classifier, which assumes equal weights for all attributes. The authors have previously used six methods, namely information gain, gain ratio, One Rule classifier, Significance Feature Evaluator, Relief and KNNFP with one attribute, to assign weights that enhance the performance of the KNN classifier [11].
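The paper does not spell out the weighted distance, so the sketch below assumes the common form $d_w(P, Q) = \sqrt{\sum_i w_i (p_i - q_i)^2}$. With all weights equal to one it reduces to equation (1), and a zero weight removes a feature entirely, which is why feature subset selection is a special case of feature weighting. Function names and data are illustrative.

```python
import math
from collections import Counter


def weighted_euclidean(p, q, w):
    # ASSUMED form: each squared feature difference is scaled by its weight w_i
    return math.sqrt(sum(wi * (pi - qi) ** 2 for pi, qi, wi in zip(p, q, w)))


def weighted_knn_predict(train_X, train_y, x, w, k=3):
    """Weighted KNN: majority vote among the k nearest samples under weights w."""
    order = sorted(range(len(train_X)),
                   key=lambda i: weighted_euclidean(train_X[i], x, w))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


train_X = [[0, 0], [1, 1], [0, 2], [10, 9], [9, 8], [10, 10]]
train_y = ["a", "a", "a", "b", "b", "b"]
x = [3, 9]
print(weighted_knn_predict(train_X, train_y, x, w=[1, 1]))  # b (equal weights)
print(weighted_knn_predict(train_X, train_y, x, w=[1, 0]))  # a (second feature ignored)
```

On this toy data, dropping the second feature (weight zero) changes the decision, illustrating how the weights steer which neighbors count as "near".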
With the KNN classifier, once an input vector is assigned to a class, there is no indication of the strength of its membership in that class [12]. The fuzzy KNN algorithm assigns class memberships to the test record rather than assigning the test record to a single class. The memberships are based on the record's distances from its k nearest neighbors and on those neighbors' memberships in the possible classes. For a membership matrix of size c × n, where c is the number of class labels and n is the number of training samples, equations (2) and (3) must hold, where $\mu_{ik}$ is the membership of the kth training record in the ith class.
$$ \sum_{i=1}^{c} \mu_{ik} = 1, \quad k = 1, \ldots, n \qquad \text{eq. (2)} $$

$$ \mu_{ik} \in [0, 1] \qquad \text{eq. (3)} $$
The fuzzy K-nearest neighbor algorithm [11] works as follows:
For each test sample x, repeat steps a-d:
a) Compute the distance between the test sample x and each training sample.
b) Find the k nearest neighbors of the test sample x.
c) Compute the membership $\mu_i(x)$ of the test sample x in each class $c_i$ (i = 1 to the number of class labels) using equation (4).
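$$ \mu_i(x) = \frac{\sum_{j=1}^{k} \mu_{ij} \left( 1 / \lVert x - x_j \rVert^{2/(m-1)} \right)}{\sum_{j=1}^{k} \left( 1 / \lVert x - x_j \rVert^{2/(m-1)} \right)} \qquad \text{eq. (4)} $$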
where $\mu_{ij}$ is the membership of the jth neighbor of the test sample x in class $c_i$, and m is the fuzzifier, usually set to 2.
d) The result of the fuzzy classification of test sample x is converted to a crisp partition by assigning the test sample to the class of maximum membership.
End for
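Steps a-d can be sketched in Python as below. `train_U[j][i]` holds the membership $\mu_{ij}$ of training sample j in class i (for example, crisp 0/1 labels). The function name and the small `eps` guarding against a zero distance are assumptions; equation (4) is undefined when the test sample coincides with a neighbor.

```python
import math


def fuzzy_knn(train_X, train_U, x, k=3, m=2.0, eps=1e-12):
    """Return (predicted class index, memberships mu_i(x)) following equation (4)."""
    # Steps a-b: distances to all training samples, keep the k nearest
    nearest = sorted((math.dist(xj, x), j) for j, xj in enumerate(train_X))[:k]
    n_classes = len(train_U[0])
    num = [0.0] * n_classes
    den = 0.0
    for d, j in nearest:
        # Inverse-distance factor 1 / ||x - x_j||^(2/(m-1)); eps is an ASSUMED guard
        inv = 1.0 / max(d, eps) ** (2.0 / (m - 1.0))
        den += inv
        for i in range(n_classes):
            num[i] += train_U[j][i] * inv
    mu = [v / den for v in num]                            # step c
    return max(range(n_classes), key=mu.__getitem__), mu   # step d: maximum membership


train_X = [[0, 0], [0, 1], [5, 5], [5, 6]]
train_U = [[1, 0], [1, 0], [0, 1], [0, 1]]  # crisp memberships
label, mu = fuzzy_knn(train_X, train_U, [0, 0.5])
print(label)  # 0
```

Because the memberships are a weighted average of the neighbors' memberships, $\mu_i(x)$ sums to one over the classes whenever each row of `train_U` does.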
There are three basic techniques for assigning the training-sample memberships $\mu_{ij}$ used in equation (4) [12]. The first method uses crisp labeling: each training sample is given complete membership (one) in its known class and zero membership in all other classes. The second technique works only on two-class datasets; it assigns membership in the known class based on the sample's distance from the mean of its class. The third method assigns memberships to the training samples according to a K-nearest neighbor rule: the K nearest neighbors of each training sample x (say x belongs to class $c_i$) are found, and its membership in each class $c_j$ is assigned using equation (5).
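$$ \mu_j(x) = \begin{cases} 0.51 + (n_j / k) \times 0.49, & \text{if } j = i \\ (n_j / k) \times 0.49, & \text{if } j \neq i \end{cases} \qquad (j = 1, \ldots, c) \qquad \text{eq. (5)} $$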
where $n_j$ is the number of neighbors belonging to class $c_j$ and k is the total number of neighbors of the training sample x.
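A sketch of the third (K-nearest neighbor) membership assignment in equation (5). The function name is illustrative, and excluding a sample from its own neighbor list is an assumption. Because the $n_j$ sum to k, every row satisfies equations (2) and (3).

```python
import math
from collections import Counter


def knn_memberships(train_X, train_y, classes, k=3):
    """Equation (5): membership of each training sample in every class."""
    U = []
    for s, (xs, ys) in enumerate(zip(train_X, train_y)):
        # ASSUMPTION: the sample itself is not counted among its own neighbors
        others = [j for j in range(len(train_X)) if j != s]
        nearest = sorted(others, key=lambda j: math.dist(train_X[j], xs))[:k]
        n = Counter(train_y[j] for j in nearest)  # n_j for each class c_j
        row = [(n[c] / k) * 0.49 + (0.51 if c == ys else 0.0) for c in classes]
        U.append(row)
    return U


train_X = [[0, 0], [0, 1], [1, 0], [5, 5]]
train_y = ["a", "a", "a", "b"]
U = knn_memberships(train_X, train_y, classes=["a", "b"], k=3)
print(U[3])  # [0.49, 0.51]: the isolated "b" sample keeps 0.51 in its own class
```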
III. GA FOR IDENTIFYING SIGNIFICANT FEATURES AND FEATURE WEIGHTS FOR CRISP AND FUZZY KNN CLASSIFIER
GA is a stochastic general-purpose search method capable of effectively exploring large search spaces. GAs follow Charles Darwin's principle of "survival of the fittest": reproduction, crossover and mutation operations are applied to parent chromosomes to generate the offspring of the next generation. The authors have previously used GA for optimizing the connection weights of a feed-forward neural network, for finding the significant features for various classifiers using both the filter and wrapper approaches, and for finding the optimal centroids for k-means and fuzzy k-means clustering [4, 13-16].
The authors have applied binary encoded GA to identify the significant features. In binary encoded GA, each gene of a chromosome takes the value 0 or 1: a 1 indicates that the corresponding feature is significant and a 0 that it is not. In the proposed method, the chromosome length equals the total number of features, F. For example, the diabetes dataset has F = 8 features, so the chromosome length is 8. With the binary encoded chromosome 10011001, the 1st, 4th, 5th and 8th features are significant and the 2nd, 3rd, 6th and 7th features are not. The binary encoded GA for finding the significant feature subset for the K-nearest neighbor classifier works as follows:
i. Initialize the chromosome population randomly using binary encoding (the length of each chromosome equals the total number of features F of the dataset).
ii. Repeat steps a-d until the terminating condition (maximum number of generations) is reached.
a. Apply the K-nearest neighbor classifier using the significant features represented by each chromosome, and take the classification accuracy as the fitness of that chromosome.
b. Select the chromosome giving the highest classification accuracy of the KNN classifier as the fittest chromosome, and replace the least fit chromosome with it (reproduction).
c. Select any two chromosomes randomly and apply the one-point crossover operation.
d. Apply the mutation operation by selecting a chromosome at random and flipping a randomly chosen bit (1 to 0, or 0 to 1).
The positions of the 1 bits in the best-fit chromosome are taken as the significant attributes for both the fuzzy and crisp K-nearest neighbor classifiers.
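A compact sketch of the binary encoded GA wrapper in steps i-ii. Assumptions not fixed by the paper: fitness is crisp KNN accuracy on a separate validation split, the best chromosome seen in any generation is returned (simple elitism), and the population size, generation count and seed are illustrative defaults. All names are hypothetical.

```python
import math
import random
from collections import Counter


def knn_accuracy(train_X, train_y, val_X, val_y, mask, k=3):
    """Fitness: crisp KNN accuracy using only the features whose bit in mask is 1."""
    feats = [f for f, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0  # an empty feature subset is treated as useless
    def dist(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in feats))
    correct = 0
    for x, y in zip(val_X, val_y):
        nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
        correct += Counter(train_y[i] for i in nearest).most_common(1)[0][0] == y
    return correct / len(val_y)


def binary_ga(train_X, train_y, val_X, val_y, pop_size=10, generations=30, k=3, seed=0):
    rng = random.Random(seed)
    F = len(train_X[0])  # chromosome length = number of features (F >= 2 assumed)
    pop = [[rng.randint(0, 1) for _ in range(F)] for _ in range(pop_size)]  # step i
    best_mask, best_fit = None, -1.0
    for gen in range(generations + 1):
        scores = [knn_accuracy(train_X, train_y, val_X, val_y, c, k) for c in pop]  # a)
        hi = max(range(pop_size), key=scores.__getitem__)
        if scores[hi] > best_fit:
            best_mask, best_fit = pop[hi][:], scores[hi]
        if gen == generations:
            break  # final population evaluated; no further variation
        lo = min(range(pop_size), key=scores.__getitem__)
        pop[lo] = pop[hi][:]                              # b) reproduction
        a, b = rng.sample(range(pop_size), 2)             # c) one-point crossover
        cut = rng.randint(1, F - 1)
        pop[a], pop[b] = pop[a][:cut] + pop[b][cut:], pop[b][:cut] + pop[a][cut:]
        c, g = rng.randrange(pop_size), rng.randrange(F)  # d) bit-flip mutation
        pop[c][g] = 1 - pop[c][g]
    return best_mask, best_fit


train_X = [[0, 0], [1, 1], [0, 2], [10, 9], [9, 8], [10, 10]]
train_y = ["a", "a", "a", "b", "b", "b"]
val_X, val_y = [[3, 9], [8, 1]], ["a", "b"]
mask, acc = binary_ga(train_X, train_y, val_X, val_y)
print(mask, acc)
```

The positions of the 1 bits in `mask` give the selected feature subset.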
In addition to selecting significant features with binary encoded GA, the authors have applied real encoded GA to find feature weights for the KNN classifier. With real encoded GA, chromosomes represent the weights of the features, in contrast to binary encoded GA, which finds a subset of significant features from the original feature set. Binary encoded GA can be considered a special case of finding feature weights in which non-significant features are assigned weight zero and significant features weight one. A repair algorithm is used with the real encoded GA to guarantee the feasibility of the chromosomes (i.e. the weights of all features must sum to one). The repair is done by computing the sum of all feature weights and dividing each feature weight by this total.
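The repair step amounts to a single normalization. A sketch, assuming non-negative weights (the paper implies but does not state this); the name `repair` and the all-zero fallback are illustrative assumptions.

```python
def repair(weights):
    """Rescale feature weights so that they sum to one."""
    total = sum(weights)
    if total == 0:
        # ASSUMPTION: the paper does not cover an all-zero chromosome; use equal weights
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


print(repair([2.0, 1.0, 1.0]))  # [0.5, 0.25, 0.25]
```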
The real encoded GA for finding the feature weights for the K-nearest neighbor classifier works as follows:
i. Initialize the chromosome population randomly using real encoding (the length of each chromosome equals the total number of features F).
ii. Repeat steps a-d until the terminating condition (maximum number of generations) is reached.
a) Apply the weighted K-nearest neighbor classifier using the feature weights represented by each chromosome, and take the classification accuracy as the fitness of that chromosome.
b) Select the chromosome giving the highest classification accuracy of the weighted KNN as the fittest chromosome, and replace the least fit chromosome with it.
c) Select any two chromosomes randomly and apply the one-point crossover operation (call the repair algorithm if the sum of the weights of all features exceeds one).
d) Apply the mutation operation by selecting a chromosome at random and multiplying one of its weights by a random real number (call the repair algorithm if the sum of the weights of all features exceeds one).
iii. The weights in the fittest chromosome are taken as the feature weights for both the crisp and fuzzy K-nearest neighbor classifiers.
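A sketch of the real encoded GA loop above. Assumptions beyond the text: the weighted distance $\sqrt{\sum_f w_f (a_f - b_f)^2}$, fitness measured on a validation split, the best chromosome ever seen is returned, and mutation multiplies one weight by a factor drawn uniformly from [0, 2]. Following the text, repair is applied only when the weights sum to more than one. All names are hypothetical.

```python
import math
import random
from collections import Counter


def weighted_knn_accuracy(train_X, train_y, val_X, val_y, w, k=3):
    """Fitness: weighted KNN accuracy (ASSUMED distance sqrt(sum w_f * diff_f^2))."""
    def dist(a, b):
        return math.sqrt(sum(wf * (af - bf) ** 2 for af, bf, wf in zip(a, b, w)))
    correct = 0
    for x, y in zip(val_X, val_y):
        nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
        correct += Counter(train_y[i] for i in nearest).most_common(1)[0][0] == y
    return correct / len(val_y)


def repair_if_needed(w):
    s = sum(w)
    return [wi / s for wi in w] if s > 1 else w  # divide each weight by the total


def real_ga(train_X, train_y, val_X, val_y, pop_size=10, generations=30, k=3, seed=0):
    rng = random.Random(seed)
    F = len(train_X[0])  # F >= 2 assumed for one-point crossover
    pop = [repair_if_needed([rng.random() for _ in range(F)]) for _ in range(pop_size)]
    best_w, best_fit = None, -1.0
    for gen in range(generations + 1):
        scores = [weighted_knn_accuracy(train_X, train_y, val_X, val_y, w, k) for w in pop]
        hi = max(range(pop_size), key=scores.__getitem__)
        if scores[hi] > best_fit:
            best_w, best_fit = pop[hi][:], scores[hi]
        if gen == generations:
            break
        lo = min(range(pop_size), key=scores.__getitem__)
        pop[lo] = pop[hi][:]                               # b) reproduction
        a, b = rng.sample(range(pop_size), 2)              # c) one-point crossover
        cut = rng.randint(1, F - 1)
        child_a = pop[a][:cut] + pop[b][cut:]
        child_b = pop[b][:cut] + pop[a][cut:]
        pop[a], pop[b] = repair_if_needed(child_a), repair_if_needed(child_b)
        c, g = rng.randrange(pop_size), rng.randrange(F)   # d) mutation
        pop[c][g] *= rng.uniform(0.0, 2.0)                 # ASSUMED factor range
        pop[c] = repair_if_needed(pop[c])
    return best_w, best_fit


train_X = [[0, 0], [1, 1], [0, 2], [10, 9], [9, 8], [10, 10]]
train_y = ["a", "a", "a", "b", "b", "b"]
val_X, val_y = [[3, 9], [8, 1]], ["a", "b"]
weights, acc = real_ga(train_X, train_y, val_X, val_y)
print(weights, acc)
```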
IV. PSO FOR FINDING FEATURE WEIGHTS FOR CRISP AND FUZZY KNN CLASSIFIER
PSO is a population-based search method inspired by the social behavior of a flock of migrating birds. A particle is analogous to a chromosome (population member) in GA. In GA, next-generation chromosomes are generated from parent chromosomes through crossover, mutation and reproduction. In contrast, the evolutionary process in PSO does not create new birds from parent ones; instead, each particle flies through the search space with a velocity adjusted by its own knowledge, Pbest (local search), and by the best knowledge among its companions, Gbest (global search) [6]. The authors have previously applied PSO to optimize the connection weights of a feed-forward neural network [16] and to find k-means centroids [17].
This paper proposes a real encoded PSO algorithm to find the feature weights for the KNN classifier. The dimension of each particle equals the total number of features of the dataset. The real encoded PSO for finding the feature weights for the K-nearest neighbor classifier works as follows:
i. Initialize the particle population randomly using real encoding (the length of each particle equals the total number of features F).
ii. Repeat steps a-e until the terminating condition (maximum number of iterations) is reached.
a) Apply the weighted K-nearest neighbor classifier using the feature weights represented by each particle, and find the classification accuracy.
b) Find the local best (Pbest) of each particle, i.e. its position with the best accuracy so far.
c) Find the global best (Gbest) of the population, i.e. the position with the best accuracy among all particles.
d) Compute the new velocity of each particle from its local best and the global best using equation (6).
$$ v_{ij}(t+1) = w \, v_{ij}(t) + c_1 R_1 \left( pbest_{ij} - x_{ij}(t) \right) + c_2 R_2 \left( gbest_{ij} - x_{ij}(t) \right) \qquad \text{eq. (6)} $$
e) Update the position of each particle from its old position and new velocity using equation (7).
$$ x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1) \qquad \text{eq. (7)} $$
(Call the repair algorithm described in Section III if the sum of the weights of all features of a particle exceeds one.)
iii. The weights represented by the global best particle are taken as the final feature weights for both the crisp and fuzzy K-nearest neighbor classifiers.
In equation (6), $v_{ij}(t+1)$ is the new velocity, w is the inertia weight, $v_{ij}(t)$ is the old velocity, $c_1$ and $c_2$ are constants usually set to 2, $R_1$ and $R_2$ are random numbers, $pbest_{ij}$ is the particle's local best, $gbest_{ij}$ is the population's global best, and $x_{ij}(t)$ is the particle's old position.
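A sketch of the real encoded PSO loop with equations (6) and (7). Assumptions beyond the text: inertia weight w = 0.7, $R_1$ and $R_2$ drawn uniformly from [0, 1) per dimension, positions clipped to [0, 1] before the Section III repair (equation (7) can produce negative weights), fitness on a validation split, the weighted distance $\sqrt{\sum_f w_f (a_f - b_f)^2}$, and the global best updated as soon as any particle improves on it (an asynchronous variant). All names are hypothetical.

```python
import math
import random
from collections import Counter


def weighted_knn_accuracy(train_X, train_y, val_X, val_y, w, k=3):
    """Fitness: weighted KNN accuracy (ASSUMED distance sqrt(sum w_f * diff_f^2))."""
    def dist(a, b):
        return math.sqrt(sum(wf * (af - bf) ** 2 for af, bf, wf in zip(a, b, w)))
    correct = 0
    for x, y in zip(val_X, val_y):
        nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
        correct += Counter(train_y[i] for i in nearest).most_common(1)[0][0] == y
    return correct / len(val_y)


def repair(p):
    p = [min(max(v, 0.0), 1.0) for v in p]      # ASSUMED clipping to [0, 1]
    s = sum(p)
    return [v / s for v in p] if s > 1 else p  # Section III repair


def pso_weights(train_X, train_y, val_X, val_y, n_particles=10, iterations=30,
                inertia=0.7, c1=2.0, c2=2.0, k=3, seed=0):
    rng = random.Random(seed)
    F = len(train_X[0])  # particle dimension = number of features

    def fitness(p):
        return weighted_knn_accuracy(train_X, train_y, val_X, val_y, p, k)

    X = [repair([rng.random() for _ in range(F)]) for _ in range(n_particles)]  # step i
    V = [[0.0] * F for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_fit = [fitness(x) for x in X]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(F):
                r1, r2 = rng.random(), rng.random()
                V[i][j] = (inertia * V[i][j]
                           + c1 * r1 * (pbest[i][j] - X[i][j])
                           + c2 * r2 * (gbest[j] - X[i][j]))  # eq. (6)
                X[i][j] += V[i][j]                            # eq. (7)
            X[i] = repair(X[i])
            f = fitness(X[i])
            if f > pbest_fit[i]:                              # step b: local best
                pbest[i], pbest_fit[i] = X[i][:], f
                if f > gbest_fit:                             # step c: global best
                    gbest, gbest_fit = X[i][:], f
    return gbest, gbest_fit  # step iii


train_X = [[0, 0], [1, 1], [0, 2], [10, 9], [9, 8], [10, 10]]
train_y = ["a", "a", "a", "b", "b", "b"]
val_X, val_y = [[3, 9], [8, 1]], ["a", "b"]
weights, acc = pso_weights(train_X, train_y, val_X, val_y)
print(weights, acc)
```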
V. EXPERIMENTAL RESULTS
Experiments have been carried out on seven datasets, namely Heart-statlog, Diabetes, Wine, Indian Liver, Vehicle, Iris and Ionosphere, obtained from the UCI Machine Learning Repository. The data were partitioned with the holdout method in a 60:40 ratio into training and test sets, and normalized using min-max normalization. To enhance the performance of the KNN classifier, binary encoded GA was used to discover the significant features as explained in Section III. In addition, real encoded GA and real encoded PSO were used to identify the feature weights for the KNN classifier. Figure 1 compares the results of crisp KNN using the significant features identified by binary encoded GA and the weights identified by GA and PSO with the results using weights identified by the information gain, gain ratio and Relief methods. Experiments with GA and PSO were carried out by varying the population size and the number of generations, as well as the value of K (1-10) for the crisp and fuzzy KNN classifiers.
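A sketch of the preprocessing described above. The paper does not say whether the split is stratified or whether normalization statistics come from the training part only; this version shuffles without stratification and, for simplicity, normalizes the whole dataset before splitting. Constant features are mapped to 0. Function names are illustrative.

```python
import random


def min_max_normalize(X):
    """Rescale each feature to [0, 1] via (v - min) / (max - min)."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in X]


def holdout_split(X, y, train_frac=0.6, seed=0):
    """Random holdout split: 60% training and 40% test by default."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


print(min_max_normalize([[1, 10], [3, 10], [2, 10]]))  # [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]]
```

Computing the minimum and maximum on the training part only, and reusing them for the test part, would avoid leaking test information into preprocessing.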
Figure 1 shows that the classification accuracy of crisp KNN is improved by the GA identified features and by the GA and PSO identified weights for all seven datasets. For the Heart-statlog and Vehicle datasets, crisp KNN performed best with the weights identified by real encoded GA. PSO identified weights gave the best crisp KNN results for the Diabetes and Indian Liver datasets. For the Iris dataset, the binary encoded GA features gave the best crisp KNN result. For the Wine and Ionosphere datasets, the binary encoded GA features and the real encoded GA and PSO weights all resulted in the same, highest accuracy of the crisp KNN classifier.
The GA identified features and the GA and PSO identified weights have also been used to enhance the performance of the fuzzy KNN classifier. For the fuzzy KNN classifier, the memberships of the training samples are computed using two methods: (a) the crisp assignment method (each training sample is assigned complete membership in its known class and zero membership in all other classes), named F1WKNN, and (b) equation (5), named F2WKNN. The results of the F1WKNN classifier (crisp assignment) and the F2WKNN classifier (assignment using equation (5)) using the significant features identified by GA and the weights identified by GA and PSO, compared with the weights identified by the information gain, gain ratio and Relief methods, are shown in Figure 2 and Figure 3 respectively, with the fuzzifier m equal to 2. With the F1WKNN classifier, the weights identified by real encoded GA proved best for the Heart-statlog, Wine, Indian Liver and Iris datasets. For the Diabetes dataset, F1WKNN achieved 2-3% higher accuracy with the weights identified by the gain ratio method than with the proposed method. For the Vehicle and Ionosphere datasets, the weights given by information gain proved somewhat better for F1WKNN than the proposed method.
The classification accuracy of F2WKNN was highest with the real encoded GA identified weights for the Heart-statlog, Indian Liver, Vehicle and Iris datasets when compared to the other methods. The binary encoded GA features and the real encoded GA weights resulted in the same classification accuracy of F2WKNN for the Wine and Ionosphere datasets. For the Diabetes dataset, the binary encoded GA features resulted in better F2WKNN accuracy than the weights given by real encoded GA and PSO; however, the weights identified by information gain and gain ratio gave slightly better performance on this dataset than the GA identified features. The results of F2WKNN were found to be better than those of F1WKNN.
In addition, the results of the proposed methods for improving crisp and fuzzy KNN accuracy using GA identified features and PSO and GA identified weights are compared with five well-known classifiers available in the WEKA tool, namely radial basis function network (RBF), support vector machine (SVM), decision tree (C4.5), Bayesian and Naïve Bayes classifiers, as shown in Figures 4 to 6.
Figure 4 shows that the binary encoded GA features and the real encoded GA and PSO weights resulted in the same classification accuracy of the crisp KNN classifier for the Wine and Ionosphere datasets. For the Iris dataset, crisp KNN with GA identified features achieved the highest accuracy among all classifiers. For crisp KNN, PSO identified weights proved best for both the Diabetes and Indian Liver datasets, and GA identified weights proved best for the Heart-statlog and Vehicle datasets, when compared to the other classifiers.
Figure 5 shows that PSO identified weights proved best for F1WKNN on the Diabetes dataset when compared to the other classifiers, while SVM outperformed F1WKNN on the Heart-statlog dataset. GA identified weights with F1WKNN performed better than the other classifiers on the Wine, Indian Liver, Vehicle and Iris datasets. The binary encoded GA features and the real encoded GA and PSO weights resulted in the same classification accuracy of F1WKNN on the Ionosphere dataset. However, the GA identified features reduced the performance of the F1WKNN classifier on both the Vehicle and Iris datasets.
Figure 6 shows that GA identified weights for F2WKNN proved best for all datasets except Diabetes when compared with the other classifiers. For the Diabetes dataset, the Bayesian classifier showed a negligible improvement over F2WKNN with GA identified features. On the other hand, the GA identified features reduced the performance of the F2WKNN classifier on the Iris dataset. GA and PSO identified weights both performed well with F2WKNN and resulted in the same accuracy on the Ionosphere dataset, whereas the GA identified features and weights gave the highest accuracy with F2WKNN on the Wine dataset when compared to the other classifiers.
VI. CONCLUSIONS
This paper proposed binary encoded GA for identifying significant features, and real encoded GA and PSO for finding feature weights, to enhance the performance of both crisp and fuzzy KNN classifiers. Among the three proposed methods, no single method is best for all seven datasets. Overall, the two evolutionary methods, GA and PSO, enhanced the performance of the KNN classifier when compared with well-known classifiers such as radial basis function, support vector machine, decision tree, Bayesian and Naïve Bayes classifiers, as well as with feature weights identified by the information gain, gain ratio and Relief methods, for all seven datasets. As future work, the authors would like to extend the work on significant feature selection using binary PSO and binary cuckoo search algorithms. In addition to PSO and GA, feature weights identified by the basic and modified versions of the cuckoo search algorithm can be applied to further improve the performance of the KNN classifier.
VII. REFERENCES
[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques, San Francisco: Morgan Kaufmann Publishers, 2001.
[2] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, "Feature Subset Selection Problem using Wrapper Approach in Supervised Learning", International Journal on Computer Applications (IJCA), Vol. 1, pp. 13-17, 2010.
[3] S. Cost, S. Salzberg, "A weighted nearest neighbor algorithm for learning with symbolic features", Machine Learning, Vol. 10, No. 1, pp. 57-78, Jan. 1993.
[4] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, Vidya T, Shama, "Genetic Algorithm based Dimensionality Reduction for Improving Performance of k-means and fuzzy k-means clustering: A Case study for Categorization of Medical dataset", Proceedings of the Seventh International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA 2012), Gwalior, India, Advances in Intelligent Systems and Computing, Vol. 201, pp. 169-180, 2013.
[5] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.
[6] R. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory", Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, pp. 39-43, 1995.
[7] T.M. Cover, P.E. Hart, "Nearest neighbor pattern classification", IEEE Transactions on Information Theory, Vol. IT-13, pp. 21-27, 1967.
[8] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, "Combining Akaike's Information Criterion (AIC) and the Golden-Section Search Technique to find Optimal Numbers of Nearest Neighbors", International Journal of Computer Applications, Vol. 2, pp. 80-67, May 2010.
[9] Isaac Triguero, Joaquín Derrac, Salvador García, Francisco Herrera, "A Taxonomy and Experimental Study on Prototype Generation for Nearest Neighbor Classification", IEEE Transactions on Systems, Man, and Cybernetics, Part C, Vol. 42(1), pp. 86-100, 2012.
[10] Salvador Garcia, Joaquin Derrac, Jose Ramon Cano, Francisco Herrera, "Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34(3), pp. 417-435, 2012.
[11] Asha Gowda Karegowda, Rakesh Kumar Singh, M.A. Jayaram, A.S. Manjunath, "Improving Weighted K-Nearest Neighbor Feature Projections Performance with Different Weights Assigning Methods", International Conference on Computational Intelligence (ICCI 2010), Coimbatore, India, December 9-11, 2010.
[12] James M. Keller, Michael R. Gray, James A. Givens Jr., "A fuzzy K-nearest neighbor algorithm", IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-15, No. 4, pp. 580-585, 1985.
[13] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, "Application of Genetic Algorithm Optimized Neural Network connection weights for medical diagnosis of PIMA Indian diabetes", International Journal of Soft Computing, Vol. 2, No. 2, pp. 15-22, May 2011.
[14] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, "Feature subset selection using cascaded GA and CFS: a filter approach in supervised learning", International Journal on Computer Applications, Vol. 23(2), pp. 1-10, 2011.
[15] Asha Gowda Karegowda, Shama, Vidya T, M.A. Jayaram, A.S. Manjunath, "Improving Performance of K-Means Clustering By Initializing Cluster Centers Using Genetic Algorithm and Entropy Based Fuzzy Clustering for Categorization Of Diabetic Patients", Proceedings of International Conference on Advances in Computing, MSRIT, Bangalore, Advances in Intelligent Systems and Computing, Vol. 174, pp. 899-904, July 4-6, 2012.
[16] Asha Gowda Karegowda, M.A. Jayaram, "Significant Feature Set Driven, Optimized FFN for Enhanced Classification", International Journal of Computational Intelligence and Informatics, ISSN: 2231-0258, Vol. 2, No. 4, Mar. 2013.
[17] Asha Gowda Karegowda, Seema Kumari, "Particle Swarm Optimization Algorithm Based k-means and Fuzzy c-means clustering", International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 3, Issue 7, pp. 448-451, July 2013.
[Figure 1: bar chart of classification accuracy (%) for the Heart-statlog, Diabetes, Wine, Indian Liver, Vehicle, Iris and Ionosphere datasets; series: KNN, Binary GA-WKNN, Real GA-WKNN, Real PSO-WKNN, Information Gain-WKNN, Gain Ratio-WKNN, Relief-WKNN.]
Figure 1. Comparative performance of crisp KNN classifier using binary encoded GA identified features and real encoded GA and PSO identified feature weights vs. information gain, gain ratio and Relief method identified weights for different datasets
[Figure 2: bar chart of classification accuracy (%) for the Heart-statlog, Diabetes, Wine, Indian Liver, Vehicle, Iris and Ionosphere datasets; series: F1KNN, Binary GA-F1WKNN, Real GA-F1WKNN, Real PSO-F1KNN, Information Gain-F1WKNN, Gain Ratio-F1KNN, Relief-F1WKNN.]
Figure 2. Comparative performance of F1WKNN classifier (with crisp method for membership assignment) using binary encoded GA identified features and real encoded GA and PSO identified feature weights vs. information gain, gain ratio and Relief method for different datasets
[Figure 3: bar chart of classification accuracy (%) for the same seven datasets; series: F2KNN, Binary GA-F2WKNN, Real GA-F2WKNN, Real PSO-F2WKNN, Information Gain-F2WKNN, Gain Ratio-F2WKNN, Relief-F2WKNN.]
Figure 3. Comparative performance of F2WKNN classifier (with equation 5 for membership assignment) using binary encoded GA identified features and real encoded GA and PSO identified feature weights vs. information gain, gain ratio and Relief method for different datasets
[Figure 4: bar chart of classification accuracy (%) for the Heart-statlog, Diabetes, Wine, Indian Liver, Vehicle, Iris and Ionosphere datasets; series: KNN, Binary GA-WKNN, Real GA-WKNN, Real PSO-WKNN, RBF, SVM, DT, Bayes, Naive Bayes.]
Figure 4. Comparative performance of crisp KNN classifier using binary encoded GA identified features and real encoded GA and PSO identified feature weights vs. SVM, RBF, Decision Tree, Bayesian and Naïve Bayes classifiers for different datasets
[Figure 5: bar chart of classification accuracy (%) for the same seven datasets; series: F1KNN, Binary GA-F1WKNN, Real GA-F1WKNN, Real PSO-F1KNN, RBF, SVM, Decision Tree, Bayesian, Naive Bayes.]
Figure 5. Comparative performance of F1WKNN classifier (with crisp method for membership assignment) using binary encoded GA identified features and real encoded GA and PSO identified weights vs. SVM, RBF, Decision Tree, Bayesian and Naïve Bayes classifiers for different datasets
[Figure 6: bar chart of classification accuracy (%) for the Heart-statlog, Diabetes, Wine, Indian Liver, Vehicle, Iris and Ionosphere datasets; series: F2KNN, Binary GA-F2WKNN, Real GA-F2WKNN, Real PSO-F2WKNN, RBF, SVM, Decision Tree, Bayesian, Naive Bayes.]
Figure 6. Comparative performance of F2WKNN classifier (with equation 5 for membership assignment) using binary encoded GA identified features and real encoded GA and PSO identified weights vs. SVM, RBF, Decision Tree, Bayesian and Naïve Bayes classifiers for different datasets