Real-World Data Clustering Using a Hybrid of Normalized Particle Swarm
Optimization and Density Sensitive K-means Algorithm
Temitayo M. Fagbola1, Surendra C. Thakur2, Oludayo O. Olugbara3
1,2,3 ICT and Society Research Group, Durban University of Technology, P.O. Box 1334, Durban 4000, South Africa.
Abstract K-means is one of the most widely used classical partitional clustering algorithms owing to its speed of
convergence, adaptability to sparse data and simplicity of implementation. However, it only
guarantees convergence of its sum-of-squares objective function to a local minimum, while convergence to
the global optimum appears NP-hard on large, noisy and non-convex structures. This, in turn,
widens its error margin. Most existing improvements on K-means adopt techniques that
introduce additional challenges of their own, including inaccurate clustering results, high time and space
complexities and, sometimes, premature convergence. Yet high accuracy on large
datasets, robustness to noisy data, low clustering time and low sum-of-squared error are sought-after
capabilities of good clustering algorithms. In this paper, a hybrid Normalized-Particle-Swarm-Optimized-
Density-Sensitive (NPSO-DS) K-means algorithm is proposed to manage the aforementioned limitations
of K-means. The proposed NPSO-DS K-means algorithm combines the global consistency property of a
normalized Particle Swarm Optimization (PSO) technique, which incorporates Min-Max normalization and
clustering error as the objective function, with the stable properties of density-sensitive K-means to realize
convergence of particles to the global optimum on large and noisy real-world datasets. Using clustering
accuracy, sum-of-squared error and clustering time as evaluation metrics, the experimental results obtained
when the proposed algorithm was tested on Educational Process Mining (EPM) and Wine real-world
datasets show that it consistently yields high-quality results. Furthermore, the
proposed NPSO-DS K-means algorithm can identify non-convex clustering structures and offers
appreciable robustness to noisy data, thus generalizing the application areas of the conventional K-means
algorithm.
Keywords – K-means, Normalized Particle Swarm Optimization, Clustering, Real World
Dataset, Density-Sensitive Distance Metric, Min-Max Normalization
1. Introduction
Clustering is a data mining technique that groups a set of data objects
into multiple groups, called clusters, such that the objects in a cluster share very similar
attributes that distinguish them from the objects in other clusters (Joshi and Kaur, 2013;
Neelamadhab, Pragnyaban and Rasmita, 2012). The attribute values of each object are distinctive
characteristics used to assess the level of dissimilarities and similarities that uniquely differentiate
an object from other objects. Many applications arising from a wide range of problems including
exploratory data analysis, image segmentation, pattern recognition, medical image analysis, web
handling and mathematical programming have been developed using clustering algorithms (Chen
and Zhang, 2017; Jiawei and Micheline, 2006; Rauber, 2000). Owing to the huge amount of data
collected in databases, cluster analysis has recently become a major research area of interest to
many researchers. There are several applications where it is necessary to cluster a large collection
of patterns. For example, in document retrieval, millions of instances with dimensionality
beyond 100 have to be clustered to achieve data abstraction (Adigun, Omidiora,
Olabiyisi, Adetunji and Adedeji, 2012). In the same vein, the vagueness that characterizes the
borders of regions in most real-world data makes accurate clustering very difficult. Therefore,
clustering algorithms are expected to produce high quality results especially with large and noisy
real world datasets.
K-means is one of the most widely used classical partitional clustering algorithms due to
its ease of interpretation, simplicity of implementation, speed of convergence on considerably
small volumes of clean data and adaptability to sparse data (Sharfuddin, Mohammad, Dip and
Mashiour, 2015). It uses the Euclidean distance dissimilarity measure within a non-convex objective
function, which often fails to recover the correct clusters for data points with non-convex
distributions (Ling, Liefeng and Licheng, 2012). Since global consistency of data is crucial to
accurate clustering, Euclidean distance measure (EDM) is highly undesirable especially when
clusters have such complex structure and random distributions (Su and Chou, 2001).
Consequently, the error gap in K-means performance widens, as K-means can only
converge to local minima under its associated EDM. In addition, K-means has a strong sensitivity
to noisy data (Adigun et al., 2012). If there is a certain amount of noise associated with a dataset,
the final clustering results produced by K-means become automatically impaired by errors (Zhou,
Bousquet, Lal, Weston and Scholkopf, 2004). In the same vein, potential errors that may evolve
when K-means is used to cluster certain real-world critical datasets emerging from medical,
security and finance sectors can be highly disastrous. This makes K-means less suitable for
clustering large and noisy real-world datasets (Li, Lei, Bo, Yue and Jin, 2015; Amita and Ashwani,
2014).
Most existing improvements on K-means adopt techniques including
genetic algorithm (Jenn-Long, Yu-Tzu and Chih-Lung, 2012), principal component analysis
(Chetna and Garima, 2013), expectation maximization (Adigun et al., 2012), MapReduce and grid
(Li, Lei, Bo, Yue and Jin, 2015) to optimize the performance of K-means. However, these adopted
techniques often induce some additional performance drawbacks including longer steps before
convergence, curse of dimensionality, inaccurate clustering results, high time and space
complexities as well as premature convergence. In the same vein, most of these works were tested
only on noise-free or small datasets. Emphatically, insensitivity to noisy data, high accuracy
obtainable from large datasets, low clustering time and low sum-of-squared error are sought-after
capabilities of good clustering algorithms (Fagbola, Babatunde and Oyeleye, 2013). As a result,
obtaining an improved K-means that could guarantee convergence to global optimum with quality
results in the face of large and noisy real-world datasets still largely remains an open problem.
To overcome this problem, Particle Swarm Optimization (PSO) is considered to be a
leading and effective metaheuristic method that could offer improved precision, runtime efficiency
and robustness of results (Olugbara, Adetiba and Oyewole, 2015; Shinde and Gunjal, 2012) due
to its robustness to noise and its ability to efficiently find an optimal set of feature weights in large-
dimensional complex features (Oloyede, Fagbola, Olabiyisi, Omidiora and Oladosu, 2016) via a
global search. It is an evolutionary algorithm that mimics the schooling and the flocking social
behaviors of fishes and birds respectively (Kennedy and Eberhart, 1995). Characteristically, it is
fast, simple to implement and understand, requires very few parameter settings and is
computationally efficient. Furthermore, it has been widely adopted to optimize the performance of
other algorithms for solving clustering problems (Chen and Zhang, 2017; Qiang and Xinjian, 2011;
Sun, Xu and Ye, 2006), scheduling problems (Xia, Wu, Zhang and Yang, 2004; Koay and
Srinivasan, 2003), medical imaging (Jagdeep and Jatinder, 2017; Fazel and Wail, 2006) and
anomaly detection problems (Karami and Guerrero-Zapata, 2015; Abimbola, Temitayo and
Adekanmi, 2014) among others. In this study, a Normalized PSO (NPSO) based on Min-Max
technique, with clustering error as the objective function, was developed to pre-process
large, complex and noisy datasets before final clustering by K-means. The Euclidean
distance measure in K-means was replaced with a density-sensitive distance metric to maximize
the speed and improve the tendency of K-means to converge to a global optimum. Finally, a hybrid
Normalized-Particle-Swarm-Optimized-Density-Sensitive (NPSO-DS) K-means algorithm is
proposed as a major improvement over the conventional K-means and its existing modifications.
The three major contributions of this paper are mentioned as follows:
(1). Propose a modified Particle Swarm Optimization (PSO) algorithm based on Min-Max
normalization technique and termed Normalized PSO (NPSO) that uses clustering
error as the objective function. This algorithm can serve as a dimensionality reduction
technique capable of eliminating noise, managing the inherent curse of dimensionality
associated with most real-world datasets and evaluating particles’ fitness for optimal
feature subset selection in classical data mining problems domain.
(2). Propose a hybrid algorithm composed of NPSO and density-sensitive K-means. This
algorithm can be easily adapted to solve any feature selection and dimensionality
reduction problem characterized by large and noisy data with complex structures. It
can also be integrated seamlessly with any classification system to improve its quality.
(3). Evaluate the proposed hybrid Normalized-Particle-Swarm-Optimized-Density-Sensitive
(NPSO-DS) K-means algorithm quantitatively on the public
Educational Process Mining (EPM) and Wine real-world datasets using clustering
accuracy also known as Rand index, sum of squared error and clustering time as
metrics.
The rest of this paper is presented as follows: in section 2, K-means clustering algorithm, feature
selection for clustering, density-sensitive distance metric, Min-Max data normalization and trends
of improvement on K-means clustering algorithm are discussed. In section 3, real-world dataset
acquisition, development of a normalized particle swarm optimization algorithm and the hybrid
NPSO-density sensitive K-means algorithm are discussed. The results obtained are presented in
section 4, while the conclusion and future work are presented in section 5.
2. Literature Review
Clustering is a common approach for statistical machine learning-based data analytics that
has been widely employed in a number of challenging domains like pattern recognition, medical
imaging, bioinformatics, social media analytics and so on (Su and Chou, 2001). Clustering is
an unsupervised learning approach that attempts to group a finite set of
closely related samples into one group called a cluster. Given an unlabeled dataset, the task is to
put like-samples in a cluster such that each cluster possesses maximum intracluster and minimum
intercluster similarities based on some indices (Joshi and Kaur, 2013). However, finding clusters
in high-dimensional spaces is computationally expensive and may degrade the learning
performance of most learning systems.
2.1 K-means clustering algorithm
K-means is one of the most commonly used algorithms in the field of data mining,
introduced to solve various clustering problems. K-means is a partitioning clustering technique in
which clusters are formed with the help of centroids. On the basis of these centroids, clusters can
vary from one another with different iterations (Nasser, Alkhaldi and Vert, 2004). Moreover, data
elements can vary from one cluster to another, as clusters are based on the random numbers known
as initial centroids. The clusters are fully dependent on the selection of the initial clusters centroids.
K data elements are selected as initial centers; then distances of all data elements are calculated by
squared Euclidean distance measure. Data elements having less distance to centroids are moved to
the appropriate cluster. The process is continued until no more changes occur in clusters. The
clusters generated by K-means are non-hierarchical in nature (Twinkle et al., 2014). It requires a
huge initial set to start the clustering and does not guarantee convergence. It is easy to implement
and debug, and its objective function optimizes intra-cluster similarity. K-means is applicable
only when the mean is defined, and it terminates at a local optimum because it depends on a
gradient-descent-style algorithm (Hai, Yunlong, Li and Zhu, 2010). This makes it incapable of handling noise
and outliers. The pseudocode description of K-means is presented in Algorithm 1. By using
Euclidean distance as a measure of dissimilarity, K-means algorithm has a good performance on
the data with compact super-sphere distributions but tends to fail with data characterized by more
complex and unknown shapes, which indicates that this dissimilarity measure is undesirable when
clusters have random distributions. In this case, there arises the need for a more intuitive objective
function in K-means, on the one hand to realize high intra-cluster (within-cluster) similarity and low
inter-cluster (between-cluster) similarity and on the other hand, for robustness to large, complex
and noisy datasets with arbitrary shaped clusters.
Input: Number of desired clusters K; a set D = {d1, d2, …, dn} of data objects
Output: A set of K clusters
(1) Specify the number of clusters (k) for D
(2) Randomly select k centroids in the data space, D, or select first k instances
(3) Calculate the distance of all data points to the centroids in D
(4) Assign each data point to the nearest cluster using the shortest Euclidean distance
(5) Re-compute new cluster centers by averaging the observations assigned to a cluster
(6) Repeat steps 3, 4 and 5 until no more changes occur or convergence criterion is satisfied
(7) Stop
Algorithm 1: Conventional K-means (Azhar, Arthur and Vassilvitskii, 2012)
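For illustration, Algorithm 1 can be realized in a few lines of NumPy, as sketched below; the function name, the exact convergence test and the random seed are illustrative choices rather than part of the source.

import numpy as np

def kmeans(D, k, max_iter=100, seed=0):
    """Minimal sketch of Algorithm 1: random initial centroids, squared-Euclidean
    assignment (steps 3-4) and mean-based centroid update (step 5)."""
    rng = np.random.default_rng(seed)
    centroids = D[rng.choice(len(D), size=k, replace=False)]     # step (2)
    for _ in range(max_iter):                                    # step (6)
        dists = ((D[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)                            # step (4)
        new_centroids = np.array([D[labels == j].mean(axis=0)
                                  if np.any(labels == j) else centroids[j]
                                  for j in range(k)])            # step (5)
        if np.allclose(new_centroids, centroids):                # no more changes
            break
        centroids = new_centroids
    return labels, centroids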
2.2 Feature Selection for Clustering
Feature selection is a problem pervasive in all domains of application of machine learning
and data mining including but not limited to product image classification, robotics and pattern
recognition, text categorization and medical applications especially for diagnosis, prognosis and
drug discovery (Fagbola, Olabiyisi and Adigun, 2012; Isabelle, 2008). Feature subset selection is
often a randomized or probabilistic selection of inputs which can further be formulated as an
optimization problem towards searching the solution space of subsets for an optimal or near-
optimal subset of features based on some specified criteria. Basically, feature selection algorithms
select subset of highly discriminant features. In other words, features that are capable of
discriminating samples that belong to different classes are identified and selected. This is a major
step to realizing effective utilization of computational resources and some cost savings. It often
provides better understanding of the data, the model and prediction performance (Fagbola et al.,
2012). Feature selection algorithms search for the best feature subset that reduces the feature space
dimensionality with the smallest change in classification accuracy. In other words, given a set of
D features, the algorithm chooses a subset of size d < D, which has the greatest ability to
discriminate between classes. The selection of the optimal subset out of all possible subsets is an
NP-hard problem. Thus, with large input spaces, the high computational overhead of optimal
methods necessitates the use of heuristic techniques to find near-optimal subsets in relatively
reduced computational times.
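To make the intractability of optimal subset selection concrete, the exhaustive wrapper search below scores every non-empty subset of the features; the scoring function is a hypothetical placeholder supplied by the caller, and with 2^D - 1 candidates the loop is already infeasible for modest D, which is precisely what motivates the heuristic searches surveyed next.

from itertools import combinations

def exhaustive_feature_search(features, score):
    """Score every non-empty feature subset and return the best one.
    With D features there are 2**D - 1 candidates (over a million at D = 20),
    which is why heuristics replace exhaustive search as D grows."""
    best_subset, best_score = None, float("-inf")
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score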
An exploratory study of some widely used feature selection algorithms including variants
of Sequential Forward / Backward Selection (SFS/SBS), Branch and Bound, and relaxed Branch
and Bound was carried out by Kudo and Sklansky (2000). Other approaches include genetic
algorithms (Siedlecki and Sklansky, 1988), floating search (Pudil et al., 1994), the Tabu search
metaheuristic (Zhang and Sun, 2006), simulated annealing (Siedlecki and Sklansky, 1988) and
particle swarm optimization (Shinde and Gunjal, 2012). However, Particle Swarm Optimization
(PSO) emerged as a leading metaheuristic method for feature selection and multi-thresholding
because of its ability to effectively find an optimal set of feature weights that improve precision,
runtime efficiency and robustness of results (Olugbara, Adetiba and Oyewole, 2015; Shinde and
Gunjal, 2012).
By description, PSO is a stochastic, population-based evolutionary algorithm for devising
efficient solutions to numerous general optimization problems. PSO simulates the shared behavior
happening among the flocking birds and schooling fishes (Mohammed, Pavlik, Cen, Wu and
Koedinger, 2009). It is computationally cheap due to its low memory and CPU requirements and
can easily be implemented (Qinghai, 2010; Eberhart, Simpson and Dobbins, 1996). Additionally,
it does not suffer from the problem of overfitting often encountered by other evolutionary
computation techniques (Kennedy and Eberhart, 1995). The search can be carried out by the speed
of the particle. It also depends on a population of individuals to discover favorable regions of the
search space. Every member of the population is called a particle and the group of all particles is
called a swarm. The aim of the PSO is to find the particle position that results in the best evaluation
of a given objective (fitness) function. The flow diagram illustrating the PSO algorithm is
presented in Figure 1. PSO searches the problem domain by manipulating the trajectories of
moving points in a multidimensional space. The movement of each particle towards the optimal
solution is governed by the position and velocity of each individual, own previous best
performance and that of their neighbors. All particles receive the broadcast of the best position
encountered by all swarm particles. The relationships among the particles are often conceptualized
as a graph G = {V, E}, where each vertex in V depicts a swarm particle and each edge in E connects a
pair of particles. Generally, the basic PSO algorithm consists of three steps, which are the generation of
particle’s positions and velocities, velocity update and position update (Olaleye, Olabiyisi,
Olaniyan and Fagbola, 2014). First, the positions 𝑥𝑖𝑑 and velocities 𝑉𝑖𝑑 of the initial swarm of
particles are initialized randomly and generated using upper and lower bounds on the search
variables values, LB and UB, expressed as:
$X_{id} = LB + \mathrm{rand} \cdot (UB - LB)$ (1)
$V_{id} = \dfrac{LB + \mathrm{rand} \cdot (UB - LB)}{\Delta t}$ (2)
In equations (1 and 2), rand is a uniformly distributed random variable that takes a value between
0 and 1. This initialization process allows the swarm particles to be randomly distributed across
the search space. Afterwards, the swarm updates its best value at every cycle in order to find the
optimized solution after several iterations using (Eberhart and Shi, 2001):
$V_{id}(t+1) \leftarrow w \cdot V_{id}(t) + c_1 r_1 \big(p_{id}(t) - x_{id}(t)\big) + c_2 r_2 \big(p_{gd}(t) - x_{id}(t)\big)$ (3)
and
$x_{id}(t+1) \leftarrow x_{id}(t) + V_{id}(t+1)$ (4)
6
where $V_{id}(t)$ is the velocity of particle $i$ at time $t$ along dimension $d$ of the search space,
$p_{id}(t)$ is the best position previously found by the particle (pbest), $x_{id}(t)$ is the current
position of particle $i$ in the search space, $r_1$ and $r_2$ are randomly generated numbers in the
range $[0, 1]$, $p_{gd}(t)$ is the overall best position found by any particle (gbest), $c_1$ and $c_2$
are acceleration parameters and $w$ is the inertia weight, whose value is decreased linearly over time
from 0.9 to 0.4. Furthermore, $x_{id}(t+1)$ is the new position to which the particle must move and
$V_{id}(t+1)$ is the new velocity that determines this new position (Mohammed et al., 2009). The three
steps of velocity update, position update and fitness calculation are repeated until a desired
convergence criterion is met.
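As a compact sketch of equations (1)-(4), one PSO generation can be coded as below; the bounds LB and UB, the time step, and the acceleration parameters c1 = c2 = 2 are illustrative assumptions, while w decays linearly from 0.9 to 0.4 as stated above.

import numpy as np

rng = np.random.default_rng(42)
n_particles, dim = 30, 5
LB, UB, dt = -1.0, 1.0, 1.0   # assumed search bounds and time step

# Equations (1) and (2): random initialization of positions and velocities
X = LB + rng.random((n_particles, dim)) * (UB - LB)
V = (LB + rng.random((n_particles, dim)) * (UB - LB)) / dt

def pso_step(X, V, pbest, gbest, t, t_max, c1=2.0, c2=2.0):
    """One PSO generation: velocity update (equation 3), position update (equation 4)."""
    w = 0.9 - (0.9 - 0.4) * t / t_max          # inertia weight, decayed from 0.9 to 0.4
    r1, r2 = rng.random(X.shape), rng.random(X.shape)
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)   # equation (3)
    return X + V, V                                             # equation (4)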
Figure 1: Flow Diagram Illustrating the Behaviour of Particle Swarm Optimization Algorithm
(Olaleye et al., 2014)
2.3 Density-Sensitive Distance Metric
Given a density-adjusted length of line segment defined as (Ling et al., 2012):
$L(x_i, x_j) = \rho^{dist(x_i, x_j)} - 1$ (5)
where 𝑑𝑖𝑠𝑡(𝑥𝑖, 𝑥𝑗) is the Euclidean distance between xi and xj whilst ρ > 1 is the flexing factor, the
length of line segment between two points is elongated or shortened by adjusting the flexing factor
$\rho$. To describe the global consistency of data points, let the data points be the nodes of a graph
G = (V, E), and let $p \in V^l$ be a path of length $l = |p|$ connecting the nodes $p_1$ and $p_{|p|}$,
in which $(p_k, p_{k+1}) \in E$ for $1 \leq k < |p|$. With $P_{ij}$ denoting the set of all paths
connecting nodes $x_i$ and $x_j$, the density-sensitive distance metric between two points can be
defined as (Ling, Liefeng and Licheng, 2012):
$D_{ij} = \min_{p \in P_{ij}} \sum_{k=1}^{|p|-1} L(p_k, p_{k+1})$ (6)
such that $D_{ij}$ satisfies the four conditions for a metric, that is,
$D_{ij} = D_{ji}$; $D_{ij} \geq 0$; $D_{ij} \leq D_{ik} + D_{kj}$ for all $x_i, x_j, x_k$; and $D_{ij} = 0$ iff $x_i = x_j$.
With these conditions satisfied, the density-sensitive distance metric can measure the geodesic
distance along the manifold, which results in any two points in the same region of high density
being connected by a lot of shorter edges while any two points in different regions of high density
are connected by a longer edge through a region of low density. That is, the distance between a
pair of points is measured by finding the shortest path in the graph G. This achieves the aim of
elongating the distance among data points in different regions of high density and simultaneously
shortening that in the same region of high density (Ling, Liefeng and Licheng, 2012). Hence, this
distance metric can help converge complex and unstructured data to global optimum.
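A small sketch of equations (5) and (6) follows, using SciPy's shortest-path routine over a fully connected graph; connecting every pair of points is a simplifying assumption of ours (a sparser neighbourhood graph would serve equally well).

import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def density_sensitive_distances(X, rho=2.0):
    """All-pairs density-sensitive distances D_ij: edge lengths from equation (5),
    then the minimal path length of equation (6) via Dijkstra's algorithm."""
    assert rho > 1.0                      # flexing factor must exceed 1
    L = rho ** cdist(X, X) - 1.0          # equation (5) applied to every edge
    np.fill_diagonal(L, 0.0)
    return shortest_path(L, method="D")   # equation (6): shortest paths in G

Because long Euclidean edges are stretched exponentially, a path made of many short hops through a dense region ends up shorter than one long hop across a sparse region, which is exactly the behaviour described above.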
2.4 Min-Max Data Normalization
Normalization is employed to standardize the features of a dataset using a specified
predefined criterion so that redundant and noisy objects can be eliminated and only valid and
reliable data, which can improve the quality of results, are used. Normalization is sometimes
used to enhance specific feature measurement methods rather than to fix problems. Data
normalization techniques include Min-Max, Z-Score and decimal scaling (Vaishali and Rupa,
2011). However, Min-Max technique is chosen for this study because of its robustness to noise
(Xiaoyan and Yanping, 2016). Min-max normalization performs a linear transformation on the
original data. Suppose that 𝑚𝑖𝑛𝑎 and 𝑚𝑎𝑥𝑎 are the minimum and the maximum values for attribute
$A$. Min-Max normalization maps a value $v$ of $A$ to $v'$ in the range $[0, 1]$ by computing:
$v' = \dfrac{v - min_a}{max_a - min_a}$ (7)
where $v'$ is the normalized value, $v$ is the original value of the attribute, $min_a$ is the minimum
value of attribute $A$ in the dataset and $max_a$ is the maximum.
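Equation (7) reduces to the column-wise one-liner sketched below; the guard against constant attributes is our own addition.

import numpy as np

def min_max_normalize(X):
    """Column-wise Min-Max normalization, equation (7): v' = (v - min_a) / (max_a - min_a)."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)   # avoid division by zero on constant columns
    return (X - mins) / span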
2.5 Trends of Improvement on K-means Clustering Algorithm
Over time, several significant improvements have been made to K-means. Ming-Chuan,
Jungpin, Jin-Hua and Don-Lin (2005) modified the K-means clustering algorithm using a simple
partitioning method. The authors highlighted that most K-means methods require expensive
distance calculations of centroids to achieve convergence. In their work, binary splitting was used
to partition the original dataset into blocks. Each block unit (UB) containing at least one pattern
has its centroid (CUB) determined via a simple calculation in order to form a reduced dataset that
represents the original dataset. The reduced dataset was then used to compute the final centroid of
the original dataset. Each UB was examined on the boundary of candidate clusters to find the
closest final centroid for every pattern in the UB. In this manner, the time for calculating final
converged centroids was dramatically reduced. It was claimed that the algorithm showed
significant improvement in performance in terms of total execution time, the number of distance
calculations and the efficiency of clustering than other K-means algorithms. However, the
modified K-means needs more iterations to reach the k centroids and sometimes fails to converge
even after spending the maximum number of iterations.
Levent, Seyda, Ding and Lee (2007) developed a K-SVMeans for multi-type interrelated
datasets that combines K-means clustering with Support Vector Machines. In a bid to eliminate
the need for labeled training instances for SVM learning, the cluster assignments of K-means are
used to train an online SVM on the secondary data type, and the SVM influences the clustering
decisions of K-means in the primary clustering space. This heterogeneous clustering process
effectively increases the clustering performance compared to clustering using a single
homogeneous data source. The authors reported results for Euclidean and spherical K-means
averaged over ten runs. Euclidean K-means makes cluster assignment decisions based on the
Euclidean distances between the document vectors while the spherical K-means uses the cosine
distances between documents as the similarity metric. The experimental results on analysis of
citeseer and newsgroup datasets which are real-world web-based datasets show that K-SVMeans
can successfully discover topical clusters of documents and achieve better clustering solutions than
the homogeneous K-means algorithm. However, K-SVMeans suffers from high computational effort
and can give inaccurate results if the initial dataset used to train the SVM is very large. The
computational demands of SVM parameter settings and training are also a great
challenge.
Mary and Raja (2009) used the Ant Colony Optimization (ACO) to improve K-means
clustering performance. The authors improved the cluster quality after grouping via a two-phased
process. The resultant technique uses Euclidean distance and remains highly sensitive to the
changes in the value of the initial k. This makes it less applicable for clustering real world datasets.
Qian and Xinjian (2011) developed an improved K-means algorithm in gene expression data
analysis based on the Kruskal algorithm. Firstly, the minimum spanning tree (MST) of the
objects to be clustered was obtained by the Kruskal algorithm. Then, the K-1 edges of largest weight
were deleted. Finally, the average values of the objects in each of the K connected subgraphs
resulting from the last two steps were regarded as the initial clustering centers. Results
showed that this method is less sensitive to the initial choice of K than the conventional K-means
algorithm and increased the stability and accuracy of clusters. However, the developed K-means
algorithm failed when tested on large, complex and real-time datasets and suffers from high
time complexity. In addition, the developed technique is difficult to program.
Adigun, Omidiora, Olabiyisi, Adetunji and Adedeji (2012) developed a hybrid K-means –
Expectation Maximization (KEM) algorithm to address the limitations of K-means and
Expectation Maximization (EM) algorithms. K-means converges only to local minima after a large
number of trials while EM converges prematurely. The hybrid KEM algorithm was developed via
the initialization stage and the iterative stage. In the initialization stage, the weighted average
variation of the K-means algorithm was used to classify the data into the number of clusters
desired. At the iterative stage, a large number M, of uniformly distributed random cluster point
vectors for the cluster centers were selected. Any cluster point vectors that are too close to other
cluster point vectors were eliminated and M was reduced accordingly until the number of clusters
produced equalled the number of desired clusters. This was achieved by computing the distances between all
the clusters and eliminating the clusters with distances lesser than a predefined threshold value.
Assigning each of the feature vectors to the nearest random cluster point vector was the next step
achieved by computing the distance between each feature vector and all other cluster point vectors.
The feature vector was assigned to the cluster point vector such that the distance between them is
the shortest. The hybrid algorithm improved on K-means and EM in accuracy and in computational
efficiency and was tested on a real-world educational dataset. However,
the hybrid KEM still converges to local minima because its K-means component uses the Euclidean
distance metric, and it is as such not suitable for clustering large real-world datasets. The hybrid KEM
was also not developed to handle the noise that characterizes real-world datasets.
Momin and Yelmar (2012) developed a Rough–Fuzzy Possibilistic K-means (RFPKM).
The membership function of the fuzzy sets enables overlapping clusters, and the concept of lower and
upper approximations from rough sets handles uncertainty, vagueness and incompleteness.
Possibilistic membership functions generate memberships which are compatible with the center of
the class and not coupled with centers of other classes. RFPKM can cluster categorical data by
using the probability distribution of categorical values. The evaluation results obtained showed that
RFPKM gives a lower objective function value for categorical data clustering than the
traditional K-means and the variants considered. However, it produced inaccurate classification with
noisy and large datasets. Furthermore, Jenn-Long, Yu-Tzu and Chih-Lung (2012) developed a
hybrid method based on the genetic algorithm (GA) and the K-means algorithm, termed GAKM.
function of GAKM is to determine the optimal weights of the attributes and centers of clusters that
are needed to classify the dataset. GA generates an optimal solution by means of reproduction,
crossover and mutation operators. In GAKM, the result of K-means algorithm was used to adjust
the GA parameters. If fitness value is satisfied, the best solution is obtained, otherwise, the GA
parameters are recombined and re-evaluated to generate an optimal number of clusters. The work
did not present any evaluation result. However, it was reported that the developed GAKM
performed better than K-means on categorical data. This improvement comes at the expense of
additional computational overhead, because GA requires longer execution steps to obtain the
optimal number of clusters. Overfitting is also a challenge of the developed GAKM because the
K-means module was implemented using Euclidean distance.
Shanmugapriya and Punithavalli (2012) developed a modified projected K-means
clustering algorithm with effective distance measure that continuously optimizes a comprehensive
objective function. In the objective function of this developed algorithm, an effective distance
measure makes use of local and non-local information to provide better clustering results in high
dimensional data. In order to avoid the value of the objective function from decreasing as a
consequence of the exclusion of dimensions, virtual dimensions were incorporated with the
objective function. It only works efficiently in principle as the developed algorithm was not
evaluated. Mohamed and Wesam (2013) addressed the problems of random initialization of
prototypes and the requirement of pre-defined number of clusters in the dataset for classical K-
means. Randomly initialized prototypes reportedly produce results that converge to local rather
than global optimum. Based on this rationale, an improved K-means clustering algorithm called
Efficient Data Clustering Algorithm (EDCA) was developed. This algorithm uses a density
computation of data points based on the K-Nearest Neighbor method to determine the
initial number of clusters. Furthermore, noise and outliers which affect K-means strongly were
detected. Their result showed slight improvement over the conventional K-means algorithm.
EDCA is able to detect clusters with different non-convex shapes, different sizes and densities.
This solution suffers from the high computational overhead incurred by the K-Nearest Neighbor
method and is not suitable for highly complex data.
Nidhi and Ujjwal (2013) developed an incremental K-means clustering algorithm that
assigns any random data object to the first cluster of a given set of data objects. After selecting the
next random object, the distance between selected object and centroids of existing clusters was
determined. This distance was compared with the threshold limit so as to be able to group the
object into existing cluster or form a new cluster with that object. Experimental results revealed
that the developed algorithm produced clusters in less computation time, but only with small and
noise-free datasets. It cannot handle large, noisy datasets in a computationally efficient manner due
to the rigid nature of the incremental approach used. Chunfei and Zhiyi (2013) modified the
traditional K-means algorithm by improving on the initial focal point and process of determining
the K value. The cluster center was initialized and adjusted. The Euclidean distance of various data
objects from each cluster center was calculated and the square error criterion function was
determined to ascertain whether convergence had been reached. The improved clustering algorithm added
data-point weights to the cluster centers so as to reduce or even avoid the impact of noisy data
in the dataset. The final clustering result of the modified K-means showed improved
performance over some variants when evaluated. However, the developed technique was tested on
small-sized datasets with low noise levels and is thus not appropriate for real-world data clustering.
Chetna and Garima (2013) developed a linear Principal Component Analysis (PCA)-based
hybrid K-means PSO algorithm for clustering large dataset. PCA module was executed to convert
high dimensional data to low dimension using covariance matrix. Then, the K-means clustering
algorithm was made to search for the clusters’ centroid locations using the Euclidean distance
similarity metric. This information was passed to the PSO module for the generation of the final
optimal clustering solution as the result. In general, PSO conducts a global search for the optimal
clustering but requires more iterations. The PSO was assisted by K-means to start with
good initial cluster centroids that converge faster, thereby yielding a more compact result. The result
from the K-means module was treated as the initial seed for the PSO module to discover the
optimal solution by a globalized search to avoid high computational time complexity. Better
clustering results were obtained with the PCA-based HYBRID (K-PSO) algorithm when compared with
PSO only. The hybrid system is complex, converged to local minima given clusters with wide
variation in size and shape, incurred high computational overhead and was not evaluated with other
improved K-means variants.
Furthermore, Li, Lei, Bo, Yue and Jin (2015) developed an improved K-means algorithm
based on MapReduce and grid. The improved method divides the data space into uniform grids
according to the range of the data points' attribute values and assigns each point to its corresponding grid. It
counts the number of data points in each grid, selects the M (M > K) grids containing the maximum
numbers of data points and calculates their central points. These M central points serve as input data
to determine the K value based on the clustering results. Among the M points, it finds the K points farthest
from each other and uses those K center points as the initial cluster centers of the K-means algorithm. At the
same time, the maximum value in M was included in K. If the number of data points in a grid is less
than a threshold, those points were considered noise points and were removed. In order
to make the improved algorithm adapt to handle large data, the improved K-means algorithm was
paralleled and combined with the MapReduce framework. Theoretical analysis and experimental
results show that the improved algorithm compared to the traditional K-means clustering algorithm
has high quality results, less iteration and good stability.
Sharfuddin, Mohammad, Dip and Mashiour (2015) argued that the current minimum
distance in traditional K-means is not always the correct minimum distance because the distance
between a cluster center and each data point is measured in every iteration. This makes the
algorithm more complex and increases the number of computations. In the modified version of K-
means algorithm developed by the authors, a checkpoint value was added to store the center point
of the distance of two cluster centers and was used to determine the cluster an object is going to
be assigned to. This checkpoint value reduced the possibility of error during the clustering process.
The authors reported that the modified K-means requires less computation and has enhanced
accuracy than the traditional K-means algorithm as well as some modified variants of it. However,
a shortage of available resources and time limited the work, and the developed K-means algorithm
was not tested on large, complex and real-world datasets.
Min, Tommy and Rosa (2015) clustered heterogeneous data with K-means by mutual
information-based Unsupervised Feature Transformation (UFT). The work addressed the
computational complexities of K-means algorithm for datasets with large sample sizes and its
sensitivity to outliers. To address these challenges, the mutual information-based unsupervised
feature transformation which could transform non-numerical features into numerical features was
integrated with the conventional K-means to cluster the heterogeneous data. Simulation results
showed that the integrated UFT-K-means improved over other clustering algorithms with
reasonable clusters for one modified real-world dataset and five real-world benchmark datasets.
However, the developed algorithm is parameter-dependent and computationally highly inefficient.
In summary, most existing improvements on K-means do not sufficiently improve its ability to cluster
noisy and large data accurately and in a computationally efficient manner, while some others suffer
from high computational overhead. Consequently, clustering large and noisy datasets with K-means
in a computationally efficient and accurate manner still remains largely an open problem, which is
addressed in this study.
3. Materials and Method
The experimental architecture for the hybrid NPSO-DS K-means algorithm is presented in
3 developmental stages:
i. Real-World Dataset Acquisition
ii. Development of a Normalized Particle Swarm Optimization
iii. Integration of NPSO into a Density-Sensitive K-means algorithm
3.1 Real-World Dataset Acquisition
UCI Educational Process Mining (EPM) and wine datasets are the most widely used real
world datasets in literatures. These datasets can be accessed and downloaded from
https://archive.ics.uci.edu.ml/datasets. However, the description of the datasets is
presented in Table 1. However, sample EPM and Wine datasets are shown in Figures (2
and 3) respectively.
i. UCI Educational process mining (EPM) Dataset: This is a publicly-available learning
analytics dataset from smartlab located in Italy. It was collected in 2015 and contains the
time series of students’ activities during 6 laboratory sessions of a course on digital
electronics. There are 6 folders containing the students’ data per session. Each folder
contains up to 99 CSV files, one per student log for that session. The number
of files in each folder varies with the number of students present in each session.
In all, the dataset contains 230318 instances with 13 integer attributes.
ii. UCI Wine dataset: The data are the results of a chemical analysis of wines grown
in Italy but derived from three different cultivars. The analysis determined the quantities
of 13 constituents found in each of the three types of wines.
Table 1
Description of the EPM and Wine Real-World Datasets

UCI Dataset   Instances   Number of Attributes   Type of Attributes
EPM           230318      13                     Integer
Wine          178         13                     Integer and real
Figure 2: Sample Data of Educational Process Mining Dataset
Figure 3: Sample Data of Wine Dataset
3.2 Development of a Normalized Particle Swarm Optimization Algorithm
The particle swarm optimization (PSO) technique was applied to reduce the dimension and
number of the particles to be clustered by K-means. The conventional PSO was modified such that
it incorporates the clustering error measure (CE) as the objective function. The clustering error
(CE) is defined as (Ling, Liefeng and Licheng, 2012):
$CE(\Delta, \Delta^{true}) = \dfrac{1}{n} \sum_{i=1}^{k^{true}} \sum_{j=1,\, j \neq i}^{k} \mathrm{Confusion}(i, j)$ (8)
where the clustering produced, $\Delta$, is given by
$\Delta = \{C_1, C_2, \ldots, C_k\}$, (9)
the true clustering, $\Delta^{true}$, is expressed as
$\Delta^{true} = \{C_1^{true}, C_2^{true}, \ldots, C_{k^{true}}^{true}\}$, (10)
and $n$ is the total number of data points. Thus, for all $i \in [1, \ldots, k^{true}]$ and
$j \in [1, \ldots, k]$, $\mathrm{Confusion}(i, j)$ denotes the number of data points shared by the true
cluster $C_i^{true}$ and the produced cluster $C_j$. However, there exists a renumbering problem; for
example, cluster 1 in the true clustering might be assigned cluster 3 in the clustering produced. To
counter this, the CE is computed for all possible renumberings of the clustering produced and the
minimum of all those is taken; the best clustering is the one with the smallest CE.
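The clustering error of equation (8), including the minimization over renumberings, can be sketched as below; using SciPy's Hungarian solver to find the best renumbering is our shortcut, equivalent to (but far cheaper than) enumerating every permutation.

import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_error(true_labels, pred_labels):
    """Equation (8): fraction of points falling outside their true cluster,
    minimized over all renumberings of the clusters produced.
    Labels are assumed to be integers numbered from 0."""
    k_true, k = true_labels.max() + 1, pred_labels.max() + 1
    confusion = np.zeros((k_true, k))
    for t, p in zip(true_labels, pred_labels):
        confusion[t, p] += 1                   # Confusion(i, j): shared points
    # The best renumbering maximizes the matched mass, hence minimizes CE
    rows, cols = linear_sum_assignment(confusion, maximize=True)
    return 1.0 - confusion[rows, cols].sum() / len(true_labels)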
The flowchart and use case diagram of the proposed NPSO are presented in Figures (4 and 5)
respectively. Given a set of data points as input, the normalized PSO is expected to return a reduced
set of discriminant particles. In this study, five (5) major steps were carried out to develop the NPSO
technique for a given set of inputs, namely $n$ data points $\{x_i\}_{i=1}^{n}$, a maximum iteration
number $t_{max}$ and a stop threshold $e$.
Step 1: Particles are initialized via random generation to form an initial population where each
particle represents a feasible cluster solution. The number of particles is taken as the product of the
dataset dimension and the number of clusters to be generated. The dataset represents a swarm and its
constituent elements represent the particles. Analytically, the swarm is composed of a set of particles:
$p = \{p_1, p_2, p_3, \ldots, p_n\}$ (11)
where $n$ is the dimension of the dataset.
Step 2: The position and velocity of the particles are initialized, such that at any time step $t$ the
particle $p_i$ has two associated vectors, a position $X_i(t)$ and a velocity $V_i(t)$. Each candidate
solution possesses a position, which represents the solution in the search space, and a velocity, which
moves the particle in search of the global optimal solution. The particles' positions and velocities were
initialized using equations (1 and 2) respectively.
Step 3: Evaluation of particles' fitness: the fitness value of each particle was computed using the
clustering error described in equation (8). At each generation, the best fitness values were updated
using (Gursharan and Harpreet, 2014):
$P_i(t+1) = \begin{cases} P_i(t) & \text{if } f(X_i(t+1)) \leq f(X_i(t)) \\ X_i(t+1) & \text{if } f(X_i(t+1)) > f(X_i(t)) \end{cases}$ (12)
where $f$ denotes the fitness function (clustering error), $P_i(t)$ stores the best fitness value and the
coordinates at which it was attained, $X_i(t)$ is the current position and $t$ denotes the generation step.
Step 4: Position and velocity update: the search for the global optimal solution was carried out through
a dynamic update of the particles in the swarm. Equation (3) is used to update the velocity as a function
of the previous velocity, the particle's own best performance and the swarm's best performance. The
position is updated by adding the incremental change in position at each step using equation (4). At this
step in the conventional PSO, some particles usually move out of the search space boundary, which leads
to errors and in turn degrades the overall output accuracy. This is usually due to the presence of noisy
data in the dataset (Gursharan and Harpreet, 2014).
[Flowchart summary: input the dataset $\{x_i\}_{i=1}^{n}$, $t_{max}$ and $e$; generate the initial particle population; initialize positions and velocities via equations (1 and 2); evaluate fitness with the clustering error of equation (8); update best fitness values via equation (12); update positions and velocities via equations (3 and 4); normalize particles with the Min-Max rule of equation (7); repeat while $t \leq t_{max}$; output the final global best particle population $\{x_i\}_{i=1}^{m}$ with $m < n$.]
Figure 4: Flowchart of the Proposed NPSO
[Use case summary: the User supplies the real-world dataset; the PSO generates the initial population, initializes and updates particle positions and velocities, evaluates and updates best fitness values at each generation using the clustering error function, adopts the Min-Max algorithm to normalize particles and minimize noise, and generates the final population of particles.]
Figure 5: Use Cases Diagram for the NPSO
In this study, the devastating impact of noisy data was addressed by forcing straying particles
to remain within the boundary, or resetting them to the boundary value, using the Min-Max
normalization function defined in equation (7).
Step 5: Steps 2-4 are repeated until one of the following termination conditions is satisfied:
a. The maximum number of iterations is reached.
b. The mean change in centroid vectors is less than a predetermined value.
After the completion of step 5, the expected output is $m$ data points $\{x_i\}_{i=1}^{m}$ with $m \ll n$.
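Putting Steps 1-5 together, a high-level sketch of the NPSO loop might look as follows; the fitness callback (mapping a particle to its clustering error), the acceleration constants and the use of Min-Max rescaling to pull stray particles back into [LB, UB] are stated assumptions, and pbest is updated whenever fitness improves, a common reading of equation (12).

import numpy as np

def npso(X0, fitness, t_max=100, seed=0):
    """Sketch of the NPSO of Steps 1-5. `fitness` is assumed to map a particle
    position to its clustering error (equation 8); lower is better."""
    rng = np.random.default_rng(seed)
    LB, UB = X0.min(), X0.max()                                  # assumed search bounds
    X, V = X0.copy(), LB + rng.random(X0.shape) * (UB - LB)      # equations (1, 2)
    pbest, pbest_f = X.copy(), np.array([fitness(x) for x in X])
    for t in range(t_max):                                       # Step 5(a): iteration cap
        gbest = pbest[pbest_f.argmin()]
        w = 0.9 - (0.9 - 0.4) * t / t_max
        r1, r2 = rng.random(X.shape), rng.random(X.shape)
        V = w * V + 2.0 * r1 * (pbest - X) + 2.0 * r2 * (gbest - X)   # equation (3)
        X = X + V                                                      # equation (4)
        # Min-Max step (equation 7): rescale particles back into [LB, UB]
        span = np.where(np.ptp(X, axis=0) > 0, np.ptp(X, axis=0), 1.0)
        X = LB + (X - X.min(axis=0)) / span * (UB - LB)
        f = np.array([fitness(x) for x in X])
        improved = f < pbest_f                    # improvement variant of equation (12)
        pbest[improved], pbest_f[improved] = X[improved], f[improved]
    return pbest[np.argsort(pbest_f)]             # best particles first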
3.3 The Hybrid NPSO-Density Sensitive K-means Algorithm
The hybrid NPSO-density sensitive K-means is the product of integrating the NPSO
algorithm into a density-sensitive K-means algorithm. The corresponding conceptual flow and use
case diagrams are shown in Figures (6 and 7) respectively. As presented in Algorithm 2, with DS-
K-means, a density-sensitive distance is incorporated into K-means to replace the Euclidean
distance. The justification for this step is borne out of the fact that poor assignment of particles to
clusters is inevitable especially where the particle has equal minimum Euclidean distance to a
number of clusters.
Figure 6: The flow diagram of the Hybrid NPSO-DS K-means algorithm
Figure 7: Use Cases Diagram of the Hybrid NPSO-DS K-means algorithm
Input: $m$ data points $\{x_i\}_{i=1}^{m}$ obtained from NPSO; cluster number $k$; maximum
iteration number $t_{max}$; stop threshold $e$.
Output: Partition of the dataset $C_1, \ldots, C_k$.
(1) randomly choose $k$ data points from the $k$ best position particles of PSO to initialize $k$
cluster centers;
(2) for any two data points 𝑥𝑖 and 𝑥𝑗 do
(3) compute the density-sensitive distance using equations (5 and 6);
(4) assign each particle to the closest centroid calculated by the minimum density-sensitive
distance;
(5) if all particles have not been assigned, then go to (4) else go to (6)
(6) recalculate new centroid for each cluster
(7) end for
(8) if centroids move or the maximum number of iterations, 𝑡𝑚𝑎𝑥, has not been reached, then
go to (2) else go to (9)
(9) stop
Algorithm 2: The hybrid NPSO-DS K-means Algorithm
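A sketch of Algorithm 2 follows, reusing density_sensitive_distances from Section 2.3; precomputing the full m x m distance matrix and recomputing each center as the cluster medoid (the member point with minimal total density-sensitive distance to its cluster) are our simplifying assumptions, the latter because arithmetic means are not defined under a graph metric.

import numpy as np

def ds_kmeans(X, k, t_max=100, rho=2.0, seed=0):
    """Sketch of Algorithm 2: K-means-style clustering under the
    density-sensitive metric of equations (5) and (6)."""
    rng = np.random.default_rng(seed)
    D = density_sensitive_distances(X, rho)                # steps (2)-(3)
    centers = rng.choice(len(X), size=k, replace=False)    # step (1)
    for _ in range(t_max):                                 # step (8): iteration cap
        labels = D[:, centers].argmin(axis=1)              # steps (4)-(5): assignment
        new_centers = centers.copy()
        for j in range(k):                                 # step (6): new center per cluster
            members = np.flatnonzero(labels == j)
            if len(members):
                new_centers[j] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(new_centers, centers):           # centers no longer move
            break
        centers = new_centers
    return labels, centers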
Consequently, the centroids are forced to converge to local minima and as such would be unable
to typify any group of data as desired (Olugbara, Adetiba and Oyewole, 2015). However,
employing a density-based objective function makes convergence to the global optimum possible even
with arbitrarily and non-convexly shaped clusters (Joshi and Kaur, 2013). Clusters can easily be
formed by data points located in dense regions while the low density regions separate data points
from different clusters.
3.4 Performance Evaluation Metrics
The performance of the developed NPSO-DS K-means algorithm was evaluated using the
following metrics:
i. Clustering time: This represents the time required to cluster all data points. This
parameter depends on the platform where the clustering is implemented and will dictate
if real-time functionality is available or not.
ii. Sum-of-Squared Error (SSE): This is the sum of squares of the departure from the
average for each calculated value of data (Jiming and Yu, 2005)
$SSE = \sum_{i=1}^{n} (x_i - \bar{x})^2$ (13)
where $n$ denotes the number of particles, $x_i$ represents the actual value of the $i$th particle and
$\bar{x}$ is the mean of all values.
iii. Clustering Accuracy: This is also known as the Rand Index (RI), a measure that
describes the actual percentage of documents that are correctly assigned to their
corresponding clusters. It is defined as (Rand, 1971):
$\text{Clustering Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN} \times 100\%$ (14)
where 𝑇𝑃, 𝑇𝑁, 𝐹𝑃 and 𝐹𝑁 represent the true positive, true negative, the false positive
and the false negative values respectively. In this study, 𝑇𝑃 defines two close particles
that are correctly assigned to the same cluster, a 𝑇𝑁 correctly assigns two contrasting
particles in different clusters. Similarly, 𝐹𝑃 defines two contrasting particles that are
wrongly placed in the same cluster while the 𝐹𝑁 wrongly assigns two close particles
in different clusters.
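The two metrics of equations (13) and (14) reduce to a few lines, sketched below; the pair-counting reading of TP, TN, FP and FN given above is implemented directly, which costs O(n^2) pairs and is acceptable for evaluation purposes.

import numpy as np
from itertools import combinations

def sse(x):
    """Equation (13): sum of squared departures from the mean."""
    x = np.asarray(x, dtype=float)
    return ((x - x.mean()) ** 2).sum()

def clustering_accuracy(true_labels, pred_labels):
    """Equation (14): Rand index as a percentage, counting agreeing pairs."""
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1          # two close particles correctly placed together
        elif not same_true and not same_pred:
            tn += 1          # two contrasting particles correctly separated
        elif not same_true and same_pred:
            fp += 1          # contrasting particles wrongly placed together
        else:
            fn += 1          # close particles wrongly separated
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)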
4. Results
In this study, a hybrid NPSO-DS K-means algorithm was developed and benchmarked with
three (3) variants which are K-means, PCA-based HYBRID (K-PSO) and UFT-K-means. All the
algorithms were implemented using MATLAB 7.7.0 (R2008b) on a Windows 7 Ultimate 32-bit
operating system with an AMD Athlon(tm) X2 Dual-Core QL-66 central processing unit running at
2.2 GHz, 2 GB of random access memory and a 320 GB hard disk drive. We tested for values of K = 2,
3, 4 and the results obtained for educational process mining (EPM) and wine datasets are presented
in Tables (2 and 3) respectively. In all the evaluations, results were obtained for the three (3)
effective metrics for evaluating a good clustering algorithm, which include clustering accuracy,
clustering time and sum of squared error (Hai et al., 2010). In Figures (8a and 8b), the sample
visual outputs of NPSO-DS and PCA-based HYBRID (K-PSO) K-means Algorithm with EPM
Dataset are respectively shown for K = 2, 3 and 4.
Table 2
Evaluation results of the clustering algorithms using the EPM dataset

Number of Clusters   Algorithm                   Clustering accuracy (%)   Clustering time (s)   Sum of Squared Error
2                    K-means                     64.8                      83.2                  0.48
2                    PCA-based HYBRID (K-PSO)    72.1                      99.4                  0.39
2                    UFT-K-means                 77.7                      87.2                  0.33
2                    Developed NPSO-DS K-means   80.2                      85.7                  0.28
3                    K-means                     67.3                      78.7                  0.42
3                    PCA-based HYBRID (K-PSO)    76.4                      91.8                  0.32
3                    UFT-K-means                 79.1                      84.7                  0.27
3                    Developed NPSO-DS K-means   83.6                      80.5                  0.21
4                    K-means                     69.2                      74.4                  0.36
4                    PCA-based HYBRID (K-PSO)    83.9                      87.2                  0.28
4                    UFT-K-means                 87.3                      80.4                  0.22
4                    Developed NPSO-DS K-means   92.4                      74.8                  0.13
Table 3
Evaluation results of the clustering algorithms using the wine dataset

Number of Clusters   Algorithm                   Clustering accuracy (%)   Clustering time (s)   Sum of Squared Error
2                    K-means                     88.5                      16.7                  0.164
2                    PCA-based HYBRID (K-PSO)    89.4                      24.3                  0.160
2                    UFT-K-means                 91.1                      21.7                  0.156
2                    Developed NPSO-DS K-means   93.6                      17.2                  0.133
3                    K-means                     91.3                      14.3                  0.148
3                    PCA-based HYBRID (K-PSO)    92.2                      23.8                  0.126
3                    UFT-K-means                 92.9                      19.9                  0.113
3                    Developed NPSO-DS K-means   94.8                      15.1                  0.098
4                    K-means                     92.8                      12.1                  0.119
4                    PCA-based HYBRID (K-PSO)    94.1                      21.7                  0.106
4                    UFT-K-means                 95.6                      18.4                  0.097
4                    Developed NPSO-DS K-means   96.3                      13.9                  0.082
Figure 8a: Sample Output of NPSO-DS K-means Algorithm with EPM Dataset (panels: K = 2, K = 3, K = 4)
Figure 8b: Sample Output of PCA-based HYBRID (K-PSO) with EPM Dataset (panels: K = 2, K = 3, K = 4)
4.1 Clustering Accuracy (Rand Index)
As shown in Figure 9, the clustering accuracies produced by the original K-means, PCA-
based HYBRID (K-PSO), UFT-K-means and the developed NPSO-DS K-means for 2 clusters (K
= 2) using EPM dataset are 64.8%, 72.1%, 77.7% and 80.2% respectively. For 3 clusters (K = 3)
using EPM dataset, the accuracies obtained by the original K-means, PCA-based HYBRID (K-
PSO), UFT-K-means and the developed NPSO-DS K-means are 67.3%, 76.4%, 79.1% and 83.6%
respectively. When cluster number was increased to 4 (K = 4), the original K-means, PCA-based
HYBRID (K-PSO), UFT-K-means and the developed NPSO-DS K-means yielded accuracies of
69.2%, 83.9%, 87.3% and 92.4% respectively on EPM dataset. However, in Figure 10, the
clustering accuracies produced by the original K-means, PCA-based HYBRID (K-PSO), UFT-K-
means and the developed NPSO-DS K-means for 2 clusters (K = 2) using wine dataset are 88.5%,
89.4%, 91.1% and 93.6% respectively. In the same vein, the accuracies produced by the original
K-means, PCA-based HYBRID (K-PSO), UFT-K-means and the developed NPSO-DS K-means
are 91.3%, 92.2%, 92.9% and 94.8% respectively for 3 clusters (K = 3) with wine dataset. When
cluster number was increased to 4 (K = 4), the original K-means, PCA-based HYBRID (K-PSO),
UFT-K-means and the developed NPSO-DS K-means yielded accuracies of 92.8%, 94.1%, 95.6%
and 96.3% respectively on wine dataset.
Figure 9: Accuracy of the Clustering Algorithms on EPM Dataset
Figure 10: Accuracy of the Clustering Algorithms on wine Dataset
4.2 Clustering Time
The execution time of the clustering algorithms obtained on EPM dataset is presented in
Figure 11. The original K-means, PCA-based HYBRID (K-PSO), UFT-K-means and the
developed NPSO-DS K-means converged approximately in 83.2s, 99.4s, 87.2s and 85.7s
respectively when the number of clusters was 2. Similarly, the original K-means, PCA-based
HYBRID (K-PSO), UFT-K-means and the developed NPSO-DS K-means converged in
approximately 78.7s, 91.8s, 84.7s and 80.5s respectively for 3 clusters. When cluster number was
increased to 4 (K = 4), the original K-means, PCA-based HYBRID (K-PSO), UFT-K-means and
the developed NPSO-DS K-means converged at approximate clustering time of 74.4s, 87.2s, 80.4s
and 74.8s respectively. Furthermore, the execution times of the clustering algorithms on the wine
dataset are presented in Figure 12.
Figure 11: Execution time of the Clustering Algorithms on EPM Dataset
Figure 12: Execution time of the Clustering Algorithms on Wine Dataset
The original K-means, PCA-based HYBRID (K-PSO), UFT-K-means and the developed NPSO-
DS K-means converged approximately in 16.7s, 24.3s, 21.7s and 17.2s respectively when the
number of clusters was 2. Similarly, the original K-means, PCA-based HYBRID (K-PSO), UFT-
K-means and the developed NPSO-DS K-means converged in approximately 14.3s, 23.8s, 19.9s
and 15.1s respectively for 3 clusters. When cluster number was increased to 4 (K = 4), the original
K-means, PCA-based HYBRID (K-PSO), UFT-K-means and the developed NPSO-DS K-means
converged at approximate clustering time of 12.1s, 21.7s, 18.4s and 13.9s respectively.
4.3 Sum of Squared Error (SSE)
The SSE incurred by the clustering algorithms on the EPM dataset is shown in Figure 13.
The original K-means, PCA-based HYBRID (K-PSO), UFT-K-means and the developed NPSO-
DS K-means incurred errors of 0.48, 0.39, 0.33 and 0.28 respectively when the number of clusters
was 2. Similarly, the original K-means, PCA-based HYBRID (K-PSO), UFT-K-means and the
developed NPSO-DS K-means yielded errors of 0.42, 0.32, 0.27 and 0.21 respectively for 3 clusters.
When the cluster number was increased to 4 (K = 4), the original K-means, PCA-based HYBRID
(K-PSO), UFT-K-means and the developed NPSO-DS K-means had errors of 0.36, 0.28, 0.22 and
0.13 respectively. In Figure 14, the errors obtained by the clustering algorithms on the Wine dataset
are presented.
Figure 13: Sum of Squared Error obtained from the Clustering Algorithms on EPM Dataset
Figure 14: Sum of Squared Error obtained from the Algorithms on Wine Dataset
The original K-means, PCA-based HYBRID (K-PSO), UFT-K-means and the developed NPSO-
DS K-means incurred errors of 0.164, 0.16, 0.156 and 0.133 respectively when the number of
clusters was 2. In addition, the original K-means, PCA-based HYBRID (K-PSO), UFT-K-means
and the developed NPSO-DS K-means yielded errors of 0.148, 0.126, 0.113 and 0.098 respectively
for 3 clusters. However, when the cluster number was increased to 4 (K = 4), the original K-means,
PCA-based HYBRID (K-PSO), UFT-K-means and the developed NPSO-DS K-means had errors
of 0.119, 0.106, 0.097 and 0.082 respectively.
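For reference, SSE is the total squared Euclidean distance of each point from its assigned centroid; the small magnitudes reported above are consistent with normalized features, though that scaling is an assumption here. A minimal sketch follows.

import numpy as np

def sum_of_squared_error(X, labels, centroids):
    # labels is an integer array assigning each row of X to a centroid.
    residuals = X - centroids[labels]
    # Total squared Euclidean distance over all points.
    return float(np.sum(residuals ** 2))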
4.4 Discussion
The developed NPSO-DS K-means algorithm outperformed the conventional K-means, UFT-K-
means and PCA-based HYBRID (K-PSO) clustering algorithms on both the EPM and the Wine
real-world datasets, especially in terms of clustering accuracy and sum of squared error. The
consistently lowest accuracies produced by K-means in all the evaluations conducted on the EPM
dataset indicate that K-means is not a good candidate for clustering large real-world datasets such
as EPM, which contains 230318 instances. As the number of clusters was increased, K-means
showed some improvement in accuracy, but its accuracy remained the lowest among the algorithms
considered: for cluster numbers 2, 3 and 4, K-means obtained accuracies of 64.8%, 67.3% and
69.2% respectively. This indicates that the higher the number of clusters, the better the clustering
accuracy of the K-means algorithm, a behaviour common to all the algorithms evaluated.
The results obtained for K-means corroborate the assertion of Li et al. (2015) that K-means
can fail on large and noisy datasets because it converges only to local minima and is limited by its
default Euclidean distance similarity metric. However, on the Wine dataset, which contains only
178 instances, K-means improved markedly, with accuracies of 88.5%, 91.3% and 92.8% for
cluster numbers 2, 3 and 4 respectively. This implies that K-means is well suited to small datasets,
as stated by Twinkle et al. (2014). It is worth mentioning that K-means is the most computationally
efficient algorithm, producing the lowest clustering time in all the evaluations conducted on the
EPM and Wine datasets, followed by the developed NPSO-DS K-means, UFT-K-means and the
PCA-based HYBRID (K-PSO) algorithm in that order. In all the evaluations conducted on the
Wine and EPM datasets, the developed NPSO-DS K-means algorithm was the most accurate and
had the least sum of squared error, followed by UFT-K-means, PCA-based HYBRID (K-PSO)
and the original K-means in that order. This superior performance is attributable to the data
normalization and relevant particle selection procedures, as well as the globally converging
density-sensitive distance measure, incorporated into the developed NPSO-DS K-means algorithm.
Olaleye et al. (2014) and Fagbola et al. (2012) stated that improvements in feature selection and
data normalization procedures invariably impact the effectiveness of data mining algorithms,
which justifies the results obtained for the NPSO-DS K-means algorithm.
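As an illustration of the Min-Max normalization step credited above, the following minimal sketch rescales each feature to [0, 1]; the guard against constant-valued features is an added assumption.

import numpy as np

def min_max_normalize(X):
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Avoid division by zero on features with a single constant value.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span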
5. Conclusion and Future Works
This research work presents an NPSO-DS K-means algorithm based on relevant optimal
particle selection and a density-sensitive distance measure. The results reveal that the developed
NPSO-DS K-means algorithm outperforms the conventional K-means, UFT-K-means and
PCA-based HYBRID (K-PSO) algorithms, especially in terms of clustering accuracy.
This superior performance is attributable to the relevant particle selection procedure and the
globally converging density-sensitive distance measure incorporated into the developed algorithm.
Olaleye et al. (2014) and Fagbola et al. (2012) stated that improvements in efficient and effective
feature selection procedures invariably impact the effectiveness of clustering algorithms, which
justifies the results obtained for the NPSO-DS K-means algorithm. On the other hand, the lowest
accuracies produced by K-means in all the evaluations corroborate the assertion of Li et al. (2015)
that K-means is not a good candidate for clustering large real-world datasets. The developed
NPSO-DS K-means can identify non-convex clustering structures, thus generalizing the
application area of the conventional K-means algorithm, and can be applied in situations where
the distributions of data points are not compact super-spheres. The experimental results on the
EPM real-world dataset, which contains 230318 instances, validate the effectiveness of the
developed algorithm. Based on the results obtained, the developed NPSO-DS K-means clustering
algorithm performed best in all the evaluations conducted on the EPM and Wine datasets in terms
of clustering accuracy and sum of squared error. However, it yielded a higher clustering time than
the conventional K-means only, possibly due to the time required to normalize data and select
relevant features at each generation of the NPSO technique before the final clustering of the
resultant particles by DS-K-means. Although it is more computationally efficient than
UFT-K-means and PCA-based HYBRID (K-PSO), its clustering time can be further investigated
for possible improvements, and future research can be directed along this direction.
References
1. Abimbola Adebisi Adigun, Temitayo Matthew Fagbola and Adekanmi Adegun (2014).
Swarmdroid: Swarm Optimized Intrusion Detection System for the Android Mobile
Enterprise. International Journal of Computer Science Issues (IJCSI), Mauritius,
11(3): 62-69.
2. Adigun A.A, Omidiora E.O, Olabiyisi S.O, Adetunji A.B, Adedeji O.T (2012): “Development
of a Hybrid K-means-Expectation Maximization Clustering Algorithm”, Journal of
Computations & Modeling, 2(4): 55-65.
3. Amita V. and Ashwani K., (2014): “Performance Enhancement of K-means Clustering
Algorithms for High Dimensional Data sets”, International Journal of Advanced Research in
Computer Science and Software Engineering, 4(1), ISSN: 2277 128.
4. Azhar T., Arthur D. and S. Vassilvitskii (2012): “K-means++: The advantages of careful
seeding”, Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms
(SODA ’07), PA, USA, 1027-1035.
5. Chen J. Y. and H. Y. Zhang (2017). "Research on application of clustering algorithm based
on PSO for the web usage pattern," International Conference on Wireless Communications,
Networking and Mobile Computing, pp.3705-3708.
6. Chetna Sethi and Garima Mishra (2013): “A Linear PCA based hybrid K-means PSO
algorithm for clustering large dataset”, International Journal of Scientific & Engineering
Research, 4(6), 1559-1566.
7. Chunfei Zhang and Zhiyi Fang (2013): “An Improved K-means Clustering Algorithm”,
Journal of Information & Computational Science, 10(1), 193–199.
8. Eberhart R. C. and Shi Y. (2001): “Particle swarm optimization: Developments, Applications
and Resources”, IEEE Proceedings of the Congress on Evolutionary Computation, 27-30.
9. Eberhart R.C., Simpson P. and Dobbins R. (1996): “Computational Intelligence PC Tools”,
A Book of Intelligent Systems, Academic Press.
10. Fagbola Temitayo Matthew, Babatunde Ronke Seyi and Oyeleye Akinwale (2013). Image
Clustering Using a Hybrid GA-FCM Algorithm. International Journal of Engineering and
Technology, UK, 3(2): pp 99-107.
11. Fagbola Temitayo, Olabiyisi Stephen and Adigun Abimbola (2012): “Hybrid GA-SVM for
Efficient Feature Selection in E-mail Classification”, Computer Engineering and Intelligent
Systems, 3(3): 17-28.
12. Fazel Keshtkar and Wail Gueaieb (2006). Segmentation of Dental Radiographs using a
Swarm Intelligence Approach, in proceedings of IEEE Canadian Conference on Electrical
and Computer Engineering, Ottawa, Canada, pp. 328– 331. DOI:
10.1109/CCECE.2006.277656.
13. Gursharan Saini and Harpreet Kaur (2014): “A Novel Approach towards K-Mean Clustering
Algorithm with PSO”, International Journal of Computer Science and Information
Technologies, 5 (4), 5978-5986.
14. Hai Shen, Yunlong Zhu, Li Jin and Zhu Zhu (2010). Hybridization of Particle Swarm
Optimization with the K-Means Algorithm for Clustering Analysis. pp. 531-535, 978-1-4244-
6439-5/10/IEEE.
15. Isabelle Guyon (2008). Practical Feature Selection: from Correlation to Causality. Pattern
Recognition. Letter, 28(12):1438–1444.
16. Jagdeep Kaur and Jatinder Singh Bal (2017). A Study of Particle Swarm Optimization based
K-means Clustering for Detection of Dental Caries in Dental X-ray Images, International
Journal of Advanced Research in Computer Science, 8(4).
17. Jenn-Long, Yu-Tzu H. and Chih-Lung, G. (2012): Mining Student Behavior Models in
Learning-by-Teaching Environments. In Proceedings of the 1st International Conference on
Educational Data Mining, 127-136.
18. Jiawei Han and Micheline Kamber (2006): “Data Mining Concepts and Techniques”, Morgan
Kaufmann Publishers, 2nd Revised edition.
19. Jiming Peng and Yu Xia (2005). A Cutting Algorithm for the Minimum Sum-of-Squared
Error Clustering. In Proceedings of the 2005 SIAM International Conference on Data
Mining, Society for Industrial and Applied Mathematics, ISBN: 978-0-89871-593-4.
20. Joshi A., and Kaur R. (2013): A review: Comparative Study of Various Clustering Techniques
in Data Mining. International Journal of Advanced Research in Computer Science and
Software Engineering, 3(3), 55-57.
21. Karami A. and Guerrero-Zapata M. (2015). “A Fuzzy Anomaly Detection System based on
Hybrid PSO-K-means Algorithm in Content-centric Networks,” Neurocomputing, Vol. 149,
pp. 1253–1269.
22. Kennedy J. and Eberhart R.C. (1995): “Particle Swarm Optimization”, Proceedings IEEE
International Conference on Neural Networks, IV, p. 1942-1948.
23. Koay C. A. and D. Srinivasan (2003). “Particle Swarm Optimization-based Approach for
Generator Maintenance Scheduling,” in Proceedings of the IEEE Swarm Intelligence
Symposium (SIS ’03), pp. 167–173, IEEE.
24. Levent Bolelli, Ertekin Seyda, Zhou Ding and Clyde Lee (2007): A Clustering Method for
Web Data with Multi-type Interrelated Components. In: Proceedings of the International
Conference on the World Wide Web, 1121-1122.
http://doi.acm.org/10.1145/1242572.1242725.
25. Li Zheng, Lei Tao, Yue Yin and Jin Ding (2015): “A Framework for Hierarchical Ensemble
Clustering”, ACM Transactions on Knowledge Discovery from Data (ACM TKDD), 9(2): 9-
16.
26. Ling Wang, Liefeng Bo, Licheng Jiao (2012): “A Modified K-means Clustering with a
Density-Sensitive Distance Metric”, Technical report, University of California, Department
of Information and Computer Science, Irvine, CA.
27. Mary C. Immaculate and Raja Kasmir (2009): “A Modified Ant-based Clustering for Medical
Data”, International Journal on Computer Science and Engineering, 2(7), 2253-2257.
28. Min Wei, Tommy W. S. Chow and Rosa H. M. Chan (2015): “Clustering Heterogeneous Data
with K-means by Mutual Information-Based Unsupervised Feature Transformation”, Entropy
2015, 17(3), 1535-1548; doi:10.3390/e17031535.
29. Ming-Chuan Hung, Jungpin Wu, Jin-Hua Chang and Don-Lin Yang (2005): “An Efficient K-
means Clustering Algorithm Using Simple Partitioning”, Journal of Information Science and
Engineering, 21, 1157-1177.
30. Mohammed T. H. Elbatta and Wesam M. Ashou (2013): “A Dynamic Method for Discovering
Density Varied Clusters”, International Journal of Signal Processing, Image Processing and
Pattern Recognition, 6(1), 123-134.
31. Mohammed Tiri, Pavlik, P., Cen, H., Wu, L. and Koedinger, K. (2009): Using Item-type
Performance Covariance to Improve the Skill Model of an Existing Tutor. In Proceedings of
the 1st International Conference on Educational Data Mining: 77-86.
32. Momin, B.F. and Yelmar, P.M. (2012). “Modifications in K-means Clustering Algorithm”,
International Journal of Soft Computing and Engineering, 2(3), pp. 6297-6316.
33. Nasser S., Alkhaldi R. and Vert G. (2004): Semi-supervised learning literature survey,
University of Wisconsin-Madison.
34. Neelamadhab Padhy, Pragnyaban Mishra and Rasmita Panigrahi (2012): The Survey of Data
Mining Applications and Feature Scope. CoRR abs/1211.5723.
35. Nidhi Gupta and Ujjwal R. L. (2013), "An Efficient Incremental Clustering Algorithm" in
World of Computer Science and Information Technology Journal (WCSIT), 3(5),97-99.
36. Olaleye Oludare, Olabiyisi Stephen, Olaniyan Ayodele and Fagbola Temitayo (2014): “An
Optimized Feature Selection Technique for Email Classification”, International Journal of
Scientific and Technology Research, 3(10): 286-293.
37. Oloyede Ayodele, Fagbola Temitayo, Olabiyisi Stephen, Omidiora Elijah and Oladosu John
(2016): Development of a Modified Local Binary Pattern-Gabor Wavelet Transform Aging
Invariant Face Recognition System. In Proceedings of ACM International Conference on
Computing Research & Innovations, University of Ibadan, Nigeria, pp. 108-114, 7-9
September 2016.
38. Oludayo O. Olugbara, Emmanuel Adetiba and Stanley A. Oyewole (2015). Pixel Intensity
Clustering Algorithm for Multilevel Image Segmentation, Mathematical Problems in
Engineering, Volume 2015, pp. 1-19, http://dx.doi.org/10.1155/2015/649802.
39. Pudil P., Ferri F. J., Novovicova J. and Kittler J (1994). Floating Search Methods for Feature
Selection with Nonmonotonic Criterion Functions. Proceedings of the 12th IAPR International
Conference on Computer Vision and Image Processing, Pattern Recognition Letters, volume
15(11), pp. 1119-1125.
40. Qiang Niu and Xinjian Huang (2011): “An improved fuzzy C-means clustering algorithm
based on PSO”, Journal of Software. 6(5), 873-879.
41. Qinghai Bai (2010). “Analysis of Particle Swarm Optimization Algorithm”, Computer and
Information Science (CCSE), 3(1), 180-184.
42. Rand W. M. (1971). Objective Criteria for the Evaluation of Clustering Methods. Journal of
the American Statistical Association, 66:846-850.
43. Rauber (2000): Educational Data Mining: A Survey from 1995 to 1999. Expert Systems with
Applications; 33; 125-146.
44. Shanmugapriya B. and Punithavalli M. (2012): “A Modified Projected K-means Clustering
Algorithm with Effective Distance Measure”, International Journal of Computer Applications
44(8):32-36.
45. Sharfuddin Mahmood, Mohammad Saiedur Rahaman, Dip Nandi, Mashiour Rahmann
(2015): “A Proposed Modification of K-means Algorithm”, IJMECS, 7(6), 37-42.
46. Shinde P. V. and Gunjal B. L. (2012): “Particle Swarm Optimization - Best Feature Selection
method for Face Images”, International Journal of Scientific & Engineering Research, 3(8):
1-5.
47. Siedlecki W. and Sklansky J. (1988). “On Automatic Feature Selection”, Int. J. Patt. Recog.
Art. Intell. 2(2): 197-220.
48. Su M.C. and Chou C.H. (2001). A Modified Version of the K-means Algorithm with a
Distance based on Cluster Symmetry. IEEE Trans. Pattern Anal. Machine Intel. 23 (6), 674–
680.
49. Sun J., W. B. Xu and B. Ye (2006). "Quantum-behaved particle swarm optimization clustering
algorithm," Lecture Notes in Computer Science, Vol.4093, pp. 340-347.
50. Twinkle G., Lofter F. and Arun M. (2014): Survey on various enhanced K-means Algorithms,
International Journal of Advanced Research in Computer and Communication Engineering,
3(2):43-61.
51. Vaishali R. Patel and Rupa G. Mehta (2011). Impact of Outlier Removal and Normalization
Approach in Modified k-Means Clustering Algorithm, IJCSI International Journal of
Computer Science Issues, 8(5):2, pp. 331-336.
52. Xia W., Z. Wu, W. Zhang, and G. Yang (2004). “A New Hybrid Optimization Algorithm for
the Job-Shop Scheduling Problem,” in Proceedings of the American Control Conference
(AAC ’04), pp. 5552–5557, IEEE, Boston, Mass, USA.
53. Xiaoyan Wang and Yanping Bai (2016). A Modified MinMax k-Means Algorithm Based on
PSO. Computational Intelligence and Neuroscience, pp. 1-13,
http://dx.doi.org/10.1155/2016/4606384.
54. Zhang H. and Sun G. (2006): “Feature Selection Using Tabu Search Method,” Pattern
Recognition, 35(3): pp. 701-711.
55. Zhou D., Bousquet O., Lal T.N., Weston J. and Scholkopf B. (2004): Learning with Local and
Global Consistency. In: Thrun, S., Saul, L., Scholkopf B, Eds., Advances in Neural
Information Processing Systems 16. MIT Press, Cambridge, MA, USA, 321-328.