European Journal of Scientific Research
ISSN 1450-216X / 1450-202X Vol. 151 No 1 December, 2018, pp. 5-21
http://www.europeanjournalofscientificresearch.com
An Improved Partitioned Clustering Technique for Identifying
Optimum Number of Dissimilar Groups in Multimedia Dataset
Sreedhar Kumar S
Corresponding Author, Department of CSE
Dr.T.Thimmaiah Institute of Technology, KGF
Karnataka-563120, India
E-mail: [email protected]
Tel: +91-9538293406
Madheswaran M
Department of ECE, Mahendra Engineering College
Namakkal-637503, Tamilnadu, India; 2
E-mail: [email protected]
Abstract
This paper presents an improved partitioned clustering technique called Optimum
N-Means (ONM), which aims to automatically identify the optimum number of dissimilar
groups in a large multimedia (gray-scale image) dataset based on a distinct number of
centroid objects, without a predetermined number of clusters. It involves two stages. In
the first stage, a method called Search Distinct Centroid Objects (SDCO) is introduced,
which automatically identifies the optimum number of centroid objects in the input
image dataset based on the rate of object repetition. The second stage then divides the
image dataset into an appropriate number of dissimilar clusters based on the distinct
centroid objects obtained in the SDCO stage, and finally the clustering result is
validated by a cluster validation scheme. Experimental results show that the ONM
technique is more efficient at automatically identifying the optimum number of
dissimilar clusters over gray-scale images, with higher intra thickness and lower intra
separation, compared to the existing K-Means technique.
Keywords: Centroid Object, Distinct Clusters, Optimum N-Means, Intra Thickness, Intra
Separation, Multimedia Spatial Dataset, Search Distinct Centroid Objects.
1. Introduction
Clustering is the process of partitioning a volume of data into a number of dissimilar
groups for deeper investigation and analysis [1-6]. In general, unsupervised clustering
schemes are classified into two major categories: agglomerative and partitional. The
agglomerative technique starts from singleton clusters and builds larger clusters through a
sequence of merging operations. K-Means is a well-known partitional clustering technique;
it is an iterative procedure that directly decomposes the dataset into a number of disjoint
clusters by minimizing a criterion function (e.g., the sum-of-square error) [7-8].
Generally, the K-Means scheme is a backbone of various data-analysis research fields, namely
data mining, medical image processing, big data analysis, machine learning, etc., where it helps to
identify distinct patterns over large volumes of real-time observations for deeper investigation.
However, a limitation of the K-Means technique is that it fails to automatically identify the optimum
number of distinct clusters over a large volume of data, and the quality of the entire result depends on
the number of different centroid objects in the dataset, which must be predetermined, often arbitrarily,
by the user [9-12]. To overcome this, this paper presents an enhanced partitioned clustering scheme
called ONM. It separates a large multimedia dataset into discrete clusters by automatically
identifying the distinct number of centroid objects in the dataset using the SDCO method.
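To make the fixed-k limitation concrete, the behaviour of plain K-Means can be illustrated with a minimal sketch. This is not the paper's method; the data and the deterministic initialization below are illustrative only:

```python
import numpy as np

def kmeans(data, k, iters=20):
    """Plain k-means: k must be chosen in advance, which is the
    limitation the ONM scheme is designed to remove."""
    # deterministic initialization: k evenly spaced data points
    centroids = data[np.linspace(0, len(data) - 1, k).astype(int)]
    for _ in range(iters):
        # assign each point to its nearest centroid (Euclidean)
        labels = np.argmin(
            np.linalg.norm(data[:, None] - centroids[None], axis=2), axis=1)
        # recompute each centroid as the mean of its members
        centroids = np.array([data[labels == c].mean(axis=0)
                              for c in range(k)])
    # sum-of-square error, the criterion function K-Means minimizes
    sse = float(sum(np.sum((data[labels == c] - centroids[c]) ** 2)
                    for c in range(k)))
    return labels, centroids, sse

# two well-separated 1-D groups; note k=2 is supplied by the user
pts = np.array([[1.0], [1.2], [0.9], [8.0], [8.3], [7.9]])
labels, cents, sse = kmeans(pts, k=2)
```

The sketch recovers the two groups only because k=2 was chosen correctly in advance; with the wrong k, K-Means still returns exactly k clusters, which motivates deriving the cluster count from the data itself.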
2. Previous Research
In the recent decade, a few techniques have been reported to solve specific issues in the traditional K-
Means technique. Bin Zhang et al. [13] reported a clustering technique called K-Harmonic Means
(KHM). It is a center-based clustering technique that uses the harmonic averages of the distances
from each data object to the centers as components. They claimed that KHM is insensitive to the
initialization of the centers and that it improves the quality of clustering results compared to the
K-Means and Expectation Maximization (EM) techniques. The drawback of the KHM technique is
that it can become trapped in local optima.
Wei-Chang et al. [14] reported a hybrid technique, RMSSOKHM, based on KHM and
Modified Simplified Swarm Optimization (MSSO). It consists of two strategies: a Rapid Centralized
Strategy (RCS) and a Minimum Movement Strategy (MMS). RCS is used to increase convergence
speed, and MMS searches for better solutions without becoming trapped in local optima. In
addition, the RMSSOKHM technique employs the Taguchi method to optimize its parameter
settings. The authors reported that RMSSOKHM produced outstanding results compared to the
KHM, PSOKHM and IGSAKHM techniques.
Anup Bhattacharya et al. [15] reported a k-means++ seeding technique used to find the
initial k centers over any multi-dimensional dataset. This technique uses a sampling procedure to
pick the centroid data points: initially it chooses the first centroid data point at random, then it
picks each subsequent data point to be the i-th centroid with probability proportional to the squared
Euclidean distance from that data point to its closest previously chosen centroid. Faliu and Inkyu
[16] presented another extended K-Means scheme that adjusts the number of data points in each
cluster based on a greedy algorithm.
Sebastian et al. [17] reported a Kernel Penalized K-Means (KPKM) technique, an
unsupervised learning technique for embedded feature selection in combination with kernel
K-Means. Its contribution is that it minimizes the violation of the initial cluster structure while
simultaneously penalizing the use of features. Another enhanced K-Means scheme was presented by
Fabregas et al. [18] to identify the initial centroid data points from the computed highest and lowest
pairs of the weighted average mean, instead of selecting them randomly.
Haiyang Li et al. [19] reported a hybrid technique called Dynamic Particle Swarm
Optimization K-Means (DPSOK) to improve the global search capability of k-means clustering. This
technique combines two existing schemes, Dynamic Particle Swarm Optimization (DPSO) and
k-means clustering, and the calculation methods of its inertia weight and learning factors have been
improved to keep the optimization capability in equilibrium. The authors claimed that DPSOK
produces better visual results in image segmentation than the K-Means and PSOK clustering
techniques. However, all the existing techniques reported above fail to automatically identify the
distinct number of clusters in a large dataset without a predetermined number of cluster centroids. A
brief discussion of the methods and steps involved in the proposed ONM approach is presented in the
next section.
3. Proposed Clustering Approach
In this section, the proposed ONM clustering approach is presented in detail. It consists of two stages:
SDCO and clustering. In the SDCO stage, the ONM approach identifies the distinct number of
centroid objects over the input image dataset based on the rate of repetition of objects in the dataset.
In the clustering stage, it partitions the input image dataset into the optimum number of dissimilar
clusters based on the distinct centroid objects. Finally, each individual cluster in the resulting cluster
set is validated by an improved Effective Cluster Validation Measure (ECVM) scheme. The stages
involved in the ONM approach are discussed in the subsections below.
3.1. SDCO Stage
In this subsection, a detailed description of the SDCO method is presented; it aims to trace the
optimum number of distinct centroid objects over the input image dataset. Initially, the SDCO
method divides the gray-scale image $I$ into non-overlapping spatial blocks of size (2*2) or (3*3),
so that the image yields $n$ spatial data blocks (data objects), each with $m$ data pixels (data
points). This is written as $X = \{x_i\}$, $x_i = \{x_{if}\}$, $\forall x_i \subseteq I$, for
$i = 1,2,\ldots,n$, $f = 0,1,\ldots,m$, where $X$ represents the $n$ spatial data objects of the
multimedia gray-scale image $I$ and $x_{if}$ is the $f$-th data point (pixel) of the $i$-th data
object in $X$. Next, it measures the rate of repetition $rr(x_i)$ of each spatial object over the
image dataset $X = \{x_i\}$, for $i = 0,\ldots,n$, defined in Equation (1) as:

$$
rr(x_i) = \sum_{j=0}^{n} g_{ij}, \qquad
g_{ij} =
\begin{cases}
1 & \text{if } \sum_{f=0}^{m} s_{ijf} \ge m/2 \\[2pt]
0 & \text{if } \sum_{f=0}^{m} s_{ijf} < m/2
\end{cases}, \qquad
s_{ijf} =
\begin{cases}
1 & \text{if } |x_{if} - x_{jf}| < T \\[2pt]
0 & \text{if } |x_{if} - x_{jf}| > T
\end{cases}
\quad \forall x_i, x_j \in X,\; i \ne j
\tag{1}
$$
where $|x_{if} - x_{jf}|$ denotes the difference between the $f$-th data points (pixels) of the
$i$-th and $j$-th data objects belonging to the input multimedia dataset $X$, $n$ denotes the size
of $X$, $m$ is the number of data pixels (data points) in a data object (block), and $T$ is the
threshold that limits the similarity distance between the $i$-th and $j$-th objects. If the
difference between the $i$-th and $j$-th objects is less than $T$, the data point of the $j$-th
object is considered similar to the corresponding data point of the $i$-th object in the input
dataset $X$. Next, the method finds the distinct number of centroid objects $Y$ over the input
dataset $X$ based on the object repetition $rr(x_i)$, computed by
$$
Y = \{y_l\} = \{\, x_i \mid rr_i \ge CC,\; \forall rr_i \in rr,\; i = 1,\ldots,n \,\}
\tag{2}
$$
Here, $rr_i$ denotes the frequency of repetition of the $i$-th object in $X$, and $CC$ represents
the control-centroid threshold, which controls the increase or decrease of the distinct number of
centroid objects with maximum occurrence over the dataset $X$; the selected objects are collected
in $Y = \{y_l\}$ for $l = 1,\ldots,N$. For example, if $CC$ is set to a small number of
occurrences, a large number of dissimilar spatial centroid objects will be generated. On the other
hand, if $CC$ is set to a high number of occurrences, only a few distinct spatial centroid objects
will be produced.
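The SDCO stage can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the block size, the thresholds T and CC, and the toy image are illustrative, and Equation (1) is read as a majority-of-pixels similarity vote between blocks:

```python
import numpy as np

def sdco(image, block=2, T=10, CC=2):
    """Sketch of the SDCO stage: split a gray-scale image into
    non-overlapping block x block objects, count how often each
    object repeats elsewhere in the image (Eq. (1)), and keep the
    objects whose repetition rate reaches the control-centroid
    threshold CC (Eq. (2))."""
    h, w = image.shape
    objs = [image[r:r + block, c:c + block].ravel().astype(float)
            for r in range(0, h - block + 1, block)
            for c in range(0, w - block + 1, block)]
    m = block * block
    centroids = []
    for i, xi in enumerate(objs):
        rr = 0
        for j, xj in enumerate(objs):
            if i == j:
                continue
            # object j counts as a repeat of object i when a majority
            # of its pixels lie within T of object i's pixels
            if np.sum(np.abs(xi - xj) < T) >= m / 2:
                rr += 1
        if rr >= CC:                     # Eq. (2): keep as centroid object
            centroids.append(xi)
    return centroids

# toy image: the 10-valued block repeats three times, the 200-valued once
img = np.array([[10, 10, 10, 10],
                [10, 10, 10, 10],
                [10, 10, 200, 200],
                [10, 10, 200, 200]], dtype=np.uint8)
cents = sdco(img, block=2, T=10, CC=2)
```

With CC=2 only the frequently repeating 10-valued blocks survive as centroid objects, which mirrors the role of CC described above: a lower CC admits more centroid objects, a higher CC admits fewer.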
3.2. Clustering Stage
In this stage, the ONM approach partitions the input spatial multimedia dataset into the optimum
number of discrete clusters based on the distinct centroid objects obtained in the SDCO stage.
Primarily, the clustering stage maps each individual spatial object in $X$ to one of the $N$
dissimilar centroid objects in $Y = \{y_l\}$, for $l = 0,\ldots,N$, based on the Euclidean
distances $D(x_i, Y)$, where $Y$ denotes the centroid object set, defined in Equation (3) as:

$$
D(x_i, Y) = \{\, d(x_i, y_l) \mid l = 0,1,\ldots,N,\; x_i \in X,\; y_l \in Y \,\}
\tag{3}
$$
where $d(x_i, y_l)$ represents the Euclidean distance between the $i$-th object $x_i \in X$ and
the $l$-th centroid object $y_l \in Y$, computed by

$$
d(x_i, y_l) = \left( \sum_{f=0}^{m} (x_{if} - y_{lf})^2 \right)^{1/2}
\tag{4}
$$
Next, it determines which of the distinct centroid objects $Y = \{y_l\}$, $l = 1,2,\ldots,N$, the
$i$-th object $x_i$ is closest to, based on the minimum Euclidean distance between the $i$-th
object $x_i$ and the centroid objects $\{y_1,\ldots,y_N\}$, as expressed in Equation (5):

$$
c_l = \mathrm{Min}\{\, d(x_i, y_l) \mid l = 0,1,\ldots,N,\; \forall\, d(x_i, y_l) \in D(x_i, Y) \,\}
\tag{5}
$$
where $D(x_i, Y)$ represents the list of distances between the $i$-th spatial object $x_i$ and
the $N$ centroid objects $Y = \{y_l\}$, $l = 0,1,\ldots,N$. The $i$-th object $x_i$ is then
placed into its closest $l$-th cluster in the cluster set $C = \{c_l\}$, i.e.,
$c_l \leftarrow x_i$. The above clustering steps are repeated until the $n$ objects in $X$,
$i = 0,1,\ldots,n$, are segregated into $N$ distinct clusters $C = \{c_l\}$,
$l = 0,1,\ldots,N$, according to the different centroid objects. Afterwards, the centroid
$y'_l$ of each individual cluster in the cluster set $C = \{c_l\}$, $l = 0,\ldots,N$, is
updated by the standard arithmetic mean operation, where $y'_l$ is the modified centroid of the
$l$-th cluster in cluster set $C$, defined in Equation (6) as:

$$
y'_l = \frac{1}{R_l} \sum_{i=0}^{R_l} \sum_{f=0}^{m} c_{lif}, \qquad
\forall c_{lif} \in c_{li},\; \forall c_{li} \in c_l,\; \forall c_l \in C
\tag{6}
$$
Here, $c_{lif}$ denotes the $f$-th data point of the $i$-th object in the $l$-th cluster of
cluster set $C$, and $R_l$ is the number of similar objects in the $l$-th cluster $c_l$, for
$i = 0,\ldots,R_l$. Finally, the partitioned result of the multimedia dataset $X$ is obtained as
the cluster set $C$ with $N$ distinct clusters. For visual representation, the contents (pixels)
of each data object (block) $x_{if} \in x_i \in X$ in each individual cluster $x_i \in c_l$ of
the cluster set $c_l \in C$ are heightened by standard arithmetic operations (addition and
subtraction), yielding the improved cluster set $\forall c'_l \in X'$ in the output gray-scale
image $X' \rightarrow I'$, where $I'$ denotes the output clustered gray-scale image, $X'$
represents the improved spatial image object (block) set, and $c'_l$ denotes a cluster whose
content (spatially similar image blocks) has been improved. The stages involved in the ONM
approach are presented as an algorithm hereunder.
3.3. ONM Algorithm
Input: image dataset $X = \{x_0,\ldots,x_n\}$ of the original gray-scale image $I$, control centroid $CC$
Output: clustered gray-scale image $I'$ and distinct clusters $\{c_1, c_2, \ldots, c_N\}$
Begin
1. Measure the rate of occurrence $rr(x_i)$ of each object in $X$ as described in Equation (1).
2. Select the distinct number of centroid objects $Y = \{y_l\}$, $l = 0,\ldots,N$, on the input
dataset $X$ based on the object repetition $rr(x_i)$ as described in Equation (2).
3. Measure the distance of each individual spatial object in $X$ to the $N$ distinct centroid
objects $Y = \{y_l\}$, $l = 0,\ldots,N$, using the Euclidean distance in Equation (4).
4. REPEAT (over $i$)
5.   REPEAT (over $l$)
6.     Compute the distance of spatial object $x_i \in X$ to each distinct centroid object in
$Y = \{y_l\}$, $l = 0,\ldots,N$, using Equation (4).
7.   UNTIL $l > N$
8.   Find the closest ($l$-th) centroid object to the $i$-th spatial object $x_i$ in $X$ with
the highest similarity using Equation (5).
9.   Place the $i$-th object $x_i$ of $X$ into the $l$-th cluster.
10. UNTIL $i > n$
11. Update the centroid $Y' = \{y'_l\}$ of each individual cluster in cluster set
$C = \{c_l\}$, $l = 0,1,\ldots,N$, using Equation (6).
12. Improve the content of the spatial data objects $x_{if} \in x_i$, $x_i \in c_l$, in each
individual cluster of cluster set $c_l \in C$, $\forall c_l \subseteq X$, $l = 0,1,\ldots,N$,
using standard arithmetic operations.
13. Obtain the improved cluster set $\forall c'_l \subseteq X'$ as the spatial clustered
gray-scale image $I' \leftarrow X'$.
End
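The assignment and centroid-update steps of the algorithm (Equations (4)-(6)) can be sketched as below. This is a minimal single-pass illustration, not the authors' code: the objects and centroids are toy values, and the centroid refresh is taken as the standard per-element arithmetic mean:

```python
import numpy as np

def cluster_stage(objects, centroids):
    """Sketch of one pass of the ONM clustering stage: each object is
    assigned to its nearest centroid object by Euclidean distance
    (Eqs. (4)-(5)), then every centroid is refreshed as the arithmetic
    mean of its members (per Eq. (6))."""
    clusters = [[] for _ in centroids]
    for x in objects:
        d = [np.linalg.norm(x - y) for y in centroids]   # Eq. (4)
        clusters[int(np.argmin(d))].append(x)            # Eq. (5)
    new_centroids = [np.mean(c, axis=0) if c else y      # Eq. (6)
                     for c, y in zip(clusters, centroids)]
    return clusters, new_centroids

# toy 2-pixel objects and two centroid objects (e.g. from SDCO)
objs = [np.array([10., 10.]), np.array([12., 12.]),
        np.array([200., 200.]), np.array([198., 202.])]
cents = [np.array([11., 11.]), np.array([199., 199.])]
clusters, new_cents = cluster_stage(objs, cents)
```

Repeating this pass until assignments stop changing reproduces the REPEAT/UNTIL loop of steps 4-10; note that, unlike plain K-Means, the number of centroids here was supplied by the SDCO stage rather than by the user.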
4. Complexity Analysis
This section discusses the computational complexity of the proposed ONM approach in detail. The
first stage of the ONM approach requires time $O(n - N)$ to automatically identify the distinct
number of centroid objects $Y = \{y_l\}$, $l = 1,2,\ldots,N$, over the input image dataset
$X = \{x_i\}$, $i = 1,2,\ldots,n$, where $X$ is the input image dataset with $n$ data objects,
$x_i$ represents the $i$-th data object with $m$ data pixels in $X$, and $N$ denotes the number
of centroid objects. The second stage of the ONM approach consumes time $O(nN)$ to partition the
input image dataset $X$ into the optimum number of distinct clusters $C = \{c_l\}$,
$l = 1,2,\ldots,N$, where $C$ denotes the distinct clusters belonging to the input image dataset
$X$, $c_l$ denotes the $l$-th cluster in the resulting cluster set $C$, and $N$ is the number of
clusters in $C$. Overall, the proposed clustering scheme (ONM) requires time $O((n-N) + nN)$ to
partition the input dataset $X$ into the optimum number $N$ of distinct, highly related clusters.
5. Cluster Validation
This section measures the association and divergence among the data objects (blocks) of each
individual cluster in the clustering result of the image dataset, based on the Effective Cluster
Validation Measure (ECVM) (Krishnamoorthy 2016; Madheswaran 2017). The ECVM technique is
slightly enhanced to measure the intra thickness and intra separation between the spatial data
objects or data blocks (sets of pixels) in each individual cluster in the clustered gray-scale
image of the unsupervised clustering scheme. The validation method contains two measures: intra
association and intra divergence. The two validation measures are described in the following
subsections.
5.1. Intra Association
This measure estimates the intra thickness among the spatial data objects in each individual
cluster in the cluster set of the multimedia (image) dataset, and is expressed in Equation (7):

$$
A(C) = \frac{1}{N} \sum_{l=1}^{N} IT(c_l)
\tag{7}
$$

where $N$ represents the number of clusters in the resulting cluster set $C$, $l = 1,\ldots,N$,
and $IT(c_l)$ is the intra thickness measure of the $l$-th individual cluster $c_l \in C$,
defined in Equation (8):
$$
IT(c_l) = \frac{1}{R_l} \sum_{j=1}^{R_l} \beta_{lj} \times 100, \qquad
\beta_{lj} =
\begin{cases}
1 & \text{if } |c_{lj} - y'_l| < SL \\[2pt]
0 & \text{if } |c_{lj} - y'_l| > SL
\end{cases},
\quad \forall c_{lj} \in c_l
\tag{8}
$$
Here, $c_{lj}$ denotes the $j$-th object or block (set of pixels) in the $l$-th cluster of $C$,
$R_l$ represents the size of the $l$-th cluster in the cluster set $C$, $SL$ is the
similarity-limit threshold that bounds the similarity difference among the objects in cluster
$c_l$ and is determined by the user based on the nature of the spatial dataset, and $y'_l$ is the
centroid point of the $l$-th cluster computed by Equation (6).
5.2. Intra Divergence
This measure calculates the intra separation between the spatial objects in each individual
cluster in the cluster set $C$ of the ONM scheme, and is expressed in Equation (9):

$$
D(C) = \frac{1}{N} \sum_{l=0}^{N} IS(c_l)
\tag{9}
$$
where $IS(c_l)$ is the intra separation measure of the $l$-th individual cluster $c_l \in C$,
defined in Equation (10):

$$
IS(c_l) = \frac{1}{R_l} \sum_{j=0}^{R_l} \gamma_{lj} \times 100, \qquad
\gamma_{lj} =
\begin{cases}
1 & \text{if } |c_{lj} - y'_l| > SL \\[2pt]
0 & \text{if } |c_{lj} - y'_l| < SL
\end{cases},
\quad \forall c_{lj} \in c_l
\tag{10}
$$
Here, $c_{lj}$ denotes the $j$-th object or block (set of pixels) in the $l$-th cluster of $C$,
$SL$ is the similarity-limit threshold that separates the lower-similarity objects in cluster
$c_l$, and $y'_l$ is the centroid point of the $l$-th cluster belonging to $C$.
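The two ECVM measures can be sketched together as follows. This is a minimal illustration under assumptions: each object is reduced to its distance from the cluster centroid, the indicator conditions of Equations (8) and (10) are applied per object, and the toy clusters, centroids and SL value are illustrative:

```python
import numpy as np

def ecvm(clusters, centroids, SL=15.0):
    """Sketch of the ECVM measures: an object counts toward intra
    thickness when it lies within SL of its cluster centroid
    (Eq. (8)) and toward intra separation when it lies beyond SL
    (Eq. (10)); Eqs. (7) and (9) average the per-cluster percentages."""
    it, isep = [], []
    for c, y in zip(clusters, centroids):
        near = [1 if np.linalg.norm(obj - y) < SL else 0 for obj in c]
        far = [1 if np.linalg.norm(obj - y) > SL else 0 for obj in c]
        it.append(100.0 * sum(near) / len(c))    # Eq. (8), IT(c_l) in %
        isep.append(100.0 * sum(far) / len(c))   # Eq. (10), IS(c_l) in %
    A = sum(it) / len(it)                        # Eq. (7), intra association A(C)
    D = sum(isep) / len(isep)                    # Eq. (9), intra divergence D(C)
    return A, D

# two toy 1-D clusters; the second is perfectly tight, the first holds
# one outlier (60) beyond SL of its centroid
clusters = [[np.array([10.]), np.array([12.]), np.array([60.])],
            [np.array([200.]), np.array([202.])]]
centroids = [np.array([11.]), np.array([201.])]
A, D = ecvm(clusters, centroids, SL=15.0)
```

As in Tables 2 and 3, tight clusters score 100% thickness and 0% separation, while each outlying object trades thickness for separation, so A(C) and D(C) are complementary whenever no distance equals SL exactly.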
6. Results and Discussion
This section presents the performance of the proposed ONM technique, experimented on multimedia
gray-scale images. The multimedia dataset consists of 100 2-D gray-scale images of different sizes,
such as (256*256), (240*240) and (480*480), with gray values in the range 0-255. A subset of this
dataset containing eleven standard sample images, viz. Lena, Airplane, Fruit, House, Tree, Gems,
Milk, Baboon, Cameraman, Texture and Clock, is reported as representative in this subsection, as
these images are used in many research experiments [22-25].
Figure 1: Original Gray Scale Images, panels (a)-(k)
Table 1: Distinct Clusters Identified by ONM Scheme when (CC=75) over the Sample Gray-Scale
Images in Figure 1
Multimedia Datasets | N-Centroid Objects | Number of Clusters Identified (N)
Lena 58 58
Airplane 44 44
Fruit 49 49
House 33 33
Tree 54 54
Gems 33 33
Milk 49 49
Baboon 65 65
Cameraman 36 36
Texture 50 50
Clock 47 47
Figure 2: Clustering result of the proposed ONM scheme with (CC=75), showing the dissimilar
clusters identified on the eleven sample gray-scale images of Figure 1, panels (a)-(k)
In this experiment, each block of size (2*2) or (3*3) is considered an object; hence the sample
images contain 16384, 14400, 14400, 14400, 14400, 14400, 14400, 25600, 16384, 16384 and 16384
objects, respectively. The eleven sample image datasets are illustrated in Figure 1. The search
for the optimum number of distinct centroid objects over the input dataset is carried out as
described in subsection 3.1 (the SDCO stage). The SDCO method automatically traced the finest
number of dissimilar centroid objects for the eleven multimedia datasets with threshold (CC=75),
obtaining 58, 44, 49, 33, 54, 33, 49, 65, 36, 50 and 47, respectively, as shown in Table 1. The
clustering process then partitioned each dataset into the finest number of distinct clusters
based on the distance metric described in subsection 3.2. For the sample multimedia datasets
illustrated in Figure 1, the ONM clustering scheme produced 58, 44, 49, 33, 54, 33, 49, 65, 36,
50 and 47 clusters. These results are incorporated in Table 1. Subsequently, the contents of each
individual cluster in the results are improved by simple arithmetic operations and visually
demonstrated in Figure 2.
Table 2: Result of Intra Cluster Validation Obtained with ECVM Scheme on Result of ONM Approach when
(CC=75)
Multimedia Datasets | Number of Clusters | Intra Thickness Measure Among Each Cluster IT(c_l) in % | Intra Separation Measure Among Each Cluster IS(c_l) in %
Lena 58
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 76.77, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
89.54, 100, 100, 100, 100,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
23.28, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 10.45, 0.0, 0.0,
0.0, 0.0,
Airplane 43
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 48.117, 100, 100, 100,
100, 100, 100, 100, 100, 100, 98.251
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 51.88, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.74
Fruit 49
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 58.13, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 90.82, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 92.22,
100, 100, 100, 100, 100
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 41.86,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 9.17, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 7.77, 0.0, 0.0, 0.0, 0.0, 0.0
House 33
83.05, 71.29, 100, 100, 75.07, 100, 71.05, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 67.62, 100, 100, 100, 100,
100, 67.62, 100, 100, 65.23, 100, 100,
16.94, 28.90, 0.0, 0.0, 24.92, 0.0, 28.94,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 32.37, 0.0, 0.0,
0.0, 0.0, 0.0, 67.62, 0.0, 0.0, 34.7, 0.0,
0.0,
Tree 54
100, 100, 100, 100, 100, 100, 100, 87.79, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.20,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0,
Gems 33
66.75, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 70.55, 71.61,
100, 100, 100, 66.30, 100, 100, 87.27
33.24, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 29.44,
28.38, 0.0, 0.0, 0.0, 33.69, 0.0, 0.0,
12.72
Milk 49
100, 95.75, 100, 100, 74.45, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 74.44, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 59.25, 85.0, 100, 100, 100,
100, 62.33, 100, 100,
0.0, 4.24, 0.0, 0.0, 25.54, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 25.55, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 40.74,
15.0, 0.0, 0.0, 0.0, 0.0, 37.66, 0.0, 0.0,
Baboon 65
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 62.26, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 81.61
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 37.73, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 18.35
Cameraman 36
100, 100, 100, 100, 100, 100, 100, 100, 76.132,
100, 100, 100, 100, 100, 77.33, 100, 100, 100,
90.75, 100, 100, 100, 100, 83.20, 100, 100,
100, 100, 90.37, 85.78, 96.103, 100, 100, 100,
100, 100,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
76.132, 0.0, 0.0, 0.0, 0.0, 0.0, 22.66,
0.0, 0.0, 0.0, 9.411, 0.0, 0.0, 0.0, 0.0,
16.79, 0.0, 0.0, 0.0, 0.0, 9.62, 14.21,
3.89, 0.0, 0.0, 0.0, 0.0, 0.0,
Texture 50
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 57.90, 100, 100, 100, 100,
100, 100, 100, 100, 73.08,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 42.09, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 26.91,
Clock 47
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 85.98, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 96.55,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 14.012, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 3.44,
Table 3: Result of Overall Cluster Validation Obtained with ECVM Scheme on Cluster Set of ONM
Approach when (CC=75)
Multimedia Datasets | Number of Clusters Identified (N) | Intra Association A(C) in (%) | Intra Divergence D(C) in (%)
Lena 58 98.41 1.5
Airplane 44 98.78 1.21
Fruit 49 98.79 1.20
House 34 94.94 5.05
Tree 54 99.7 0.22
Gems 33 95.83 4.16
Milk 49 96.96 3.035
Baboon 65 98.9 1.00
Cameraman 36 97.2 2.7
Texture 50 98.6 1.38
Clock 47 99.6 0.31
The clusters produced by the ONM approach are validated by the ECVM scheme, as described in Equations (7) and (9).
Initially, the intra thickness IT(c_l) and intra separation IS(c_l) are calculated among the data blocks of each
individual cluster in the clustering results of the eleven multimedia (image) datasets using the improved ECVM
technique, and the calculated results are incorporated in Table 2. Subsequently, the proposed ONM scheme gives an
overall intra association A(C) of 98.41, 98.78, 98.79, 94.94, 99.7, 95.83, 96.96, 98.9, 97.2, 98.6 and 99.6 for the
clustered results of the gray-scale images Lena, Airplane, Fruit, House, Tree, Gems, Milk, Baboon, Cameraman,
Texture and Clock, respectively, as recorded in Table 3. Next, the overall intra divergence D(C) is calculated on the
results of the same image datasets as expressed in Equation (9), and is found to be 1.5, 1.21, 1.20, 5.05, 0.22, 4.16,
3.035, 1.00, 2.7, 1.38 and 0.31 for the same datasets. These results are presented in Table 3.
Table 4: Result of ONM Scheme Identified Dissimilar Clusters on Image Dataset when (CC=50)
Multimedia Datasets | N-Centroid Objects | Number of Clusters Identified (N)
Lena 60 60
Airplane 44 44
Fruit 64 64
House 42 42
Tree 59 59
Gems 49 49
Milk 66 66
Baboon 73 73
Cameraman 42 42
Texture 54 54
Clock 48 48
Table 5: Intra Thickness and Separation Validation Obtained with ECVM Scheme on Cluster Set of
ONM Approach when (CC=50)
Multimedia Datasets | Number of Clusters Identified | Intra Thickness Measure Among Each Cluster IT(c_l) in % | Intra Separation Measure Among Each Cluster IS(c_l) in %
Lena 60 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 76.717, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 89.54, 100, 100,
100, 100, 100, 100, 100, 100,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
23.28, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 10.45, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0,
Airplane 44 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
48.117, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 98.251
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 51.88, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
1.74
Fruit 64 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 66.43, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 33.56, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
House 42 100, 80.80, 100, 100, 75.70, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 67.62, 100, 100, 100,
100, 100, 100, 100, 80.13, 100, 100, 100,
100, 100, 100,
0.0, 19.2, 0.0, 0.0, 24.92, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 32.37, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 19.86, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0,
Tree 59 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 90.90, 100, 100, 100,
100, 100, 100, 100,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 9.09, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0,
Gems 49 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 85.84,
100, 92.15, 100, 100, 100, 100, 100,
100, 100,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 76.717,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 14.15, 0.0, 7.84, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
Milk 66 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100,
78.531, 97.84, 100, 77.51, 95.12, 100,
100 , 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 21.46, 2.150, 0.0, 22.5,
4.807, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
Baboon 73 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 89.54, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100,
81.94, 57.89, 100
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 89.54, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 18.05, 42.105, 0.0
Cameraman 42 100, 100, 100, 100, 100, 100, 100, 100,
76.132, 100, 100, 100, 100, 100, 100,
100, 100, 100, 93.75, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 48.117, 100, 100, 100, 100, 100,
100, 100, 100, 100,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
23.86, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 6.25, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
48.117, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0,
Texture 54 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 64.28, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 75.15
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 35.71, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 24.84
Clock 48 100, 100, 100, 100, 100, 100, 100, 100,
100, 100,100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 85.98,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100,
98.97,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 14.012, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 1.02,
Table 6: Performance Measures of Overall Intra Cluster Validation Obtained with ECVM Technique Estimated on Result of ONM Scheme when (CC=50)

Multimedia Datasets | Number of Clusters Identified (N) | Intra Association CA (%) | Intra Divergence CD (%)
Lena 60 99.41 0.5
Airplane 44 98.78 1.21
Fruit 64 99.4 0.52
House 42 97.11 2.88
Tree 59 99.84 0.15
Gems 49 99.5 0.44
Milk 66 99.2 0.771
Baboon 73 99.13 0.86
Cameraman 42 99.2 0.71
Texture 54 98.87 1.121
Clock 48 99.6 0.37
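The overall measures in Table 6 appear to aggregate the per-cluster values of Table 5. The ECVM aggregation formula is not restated in this section, so the plain averaging below is only an illustrative assumption, and `overall_validation` is a hypothetical helper name, not the paper's method.

```python
# Hypothetical aggregation of per-cluster validation scores into overall
# measures in the spirit of Table 6. Plain averaging is an assumption;
# the ECVM technique may aggregate differently.

def overall_validation(intra_thickness, intra_separation):
    """Average per-cluster IT/IS percentages into overall CA/CD (%)."""
    n = len(intra_thickness)
    ca = round(sum(intra_thickness) / n, 2)   # overall intra association
    cd = round(sum(intra_separation) / n, 2)  # overall intra divergence
    return ca, cd

# Clock dataset from Table 5: 48 clusters, 46 with IT = 100 / IS = 0.0
it = [100.0] * 46 + [85.98, 98.97]
sep = [0.0] * 46 + [14.012, 1.02]
ca, cd = overall_validation(it, sep)
print(ca, cd)
```

Under this averaging assumption, the Clock dataset lands near, though not exactly at, the CA = 99.6 and CD = 0.37 reported in Table 6, which suggests ECVM may weight clusters differently.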
The experimentation was extended by testing the proposed clustering scheme with another control centroid value (CC=50) on the same eleven image datasets shown in Figure 1. Based on (CC=50), the SDCO method identified N distinct centroid objects for the eleven multimedia datasets as 60, 44, 64, 42, 59, 49, 66, 73, 42, 54 and 48, respectively. Table 4 indicates that, from these N centroid objects, the proposed clustering scheme produced the optimum numbers of highly related clusters, namely 60, 44, 64, 42, 59, 49, 66, 73, 42, 54 and 48. The contents of each cluster are shown visually in Figure 3. Next, these clustering results were evaluated through the cluster validation measures, and the estimated results are presented in Tables 5 and 6. Figures 2 and 3 clearly demonstrate that the control centroid (CC) acts as a major key factor in the proposed clustering scheme and directly affects the performance of the clustering result.
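The paragraph above describes SDCO as selecting centroid objects from the repetition rate of objects in the image, steered by the control centroid (CC). A minimal sketch of that idea follows; interpreting CC as a minimum repetition count is an assumption for illustration, and `search_distinct_centroids` is a hypothetical name standing in for the paper's SDCO procedure.

```python
# Illustrative sketch of the SDCO idea: treat gray values that repeat
# often enough in the image as distinct centroid objects. The selection
# rule (count >= cc) is assumed, not taken from the paper.
from collections import Counter

def search_distinct_centroids(pixels, cc=50):
    """Return gray values repeated at least `cc` times, sorted."""
    freq = Counter(pixels)
    return sorted(v for v, count in freq.items() if count >= cc)

# Toy "image": a flat list of gray values
pixels = [10] * 160 + [200] * 60 + [55] * 3
centroids = search_distinct_centroids(pixels, cc=50)
print(centroids)  # [10, 200]
```

Under this reading, lowering CC admits more gray values as centroids, which matches the observation that CC directly controls how many clusters ONM produces.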
Figure 3: Result of ONM with (CC=50) tested on eleven sample gray scale images shown in Figure 1
(a)-(k)
6.1. Comparison with Existing Technique
For comparison purposes, the traditional K-Means scheme [7-9] was implemented and tested on the same eleven multimedia (gray scale image) datasets. Initially, the k centroid objects of the eleven multimedia datasets were arbitrarily predetermined as 15, 15, 18, 14, 15, 18, 16, 34, 20, 22 and 17, respectively, as presented in Table 7. Next, the K-Means scheme divided each image dataset into k distinct clusters according to its k centroid objects; the results are reported in Table 7 and illustrated in Figure 4. The performance of the K-Means scheme was then measured, and the validation results are presented in Table 7. Table 7 clearly indicates that the existing technique identified only a limited number of distinct clusters over the multimedia datasets, governed by the external parameter (k), with lower intra association and higher intra divergence. Tables 3, 6 and 7 show that, compared to the existing K-Means scheme, the ONM scheme spontaneously identified the optimum number of separate clusters over the eleven multimedia (gray scale image) datasets without prior knowledge, with higher intra association and lesser intra divergence. The experiment clearly demonstrates that the ONM scheme produced superior clustering results compared to the existing K-Means technique. All these techniques were run on a Dell T4500 machine with 2 GB RAM running Windows 7.
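The baseline compared against above can be sketched for one-dimensional gray values. This is a generic K-Means outline, not the paper's implementation; the random initialisation mirrors the arbitrarily predetermined k centroids, and `kmeans_1d` is a hypothetical helper name.

```python
# Minimal 1-D K-Means on gray values, sketching the baseline scheme the
# paper compares against. Initial centroids are sampled arbitrarily,
# mirroring the user-predetermined k-centroids of the experiment.
import random

def kmeans_1d(values, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # arbitrary initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:               # assign each value to nearest centroid
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # recompute centroids; keep the old one if a cluster went empty
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:           # converged
            break
        centroids = new
    return sorted(centroids)

values = [12, 14, 13, 200, 205, 198, 90, 95, 92]
print(kmeans_1d(values, k=3))
```

Note that k must be supplied up front here, which is exactly the external-parameter dependence that ONM is designed to remove.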
Table 7: Clustering Result of Existing K-Means Scheme Tested on Eleven Gray Scale Images in Figure 1

Multimedia Datasets | User Defined K-Centroids | Number of Clusters Identified (k) | Intra Association CA (%) | Intra Divergence CD (%)
Lena 15 15 83.30 16.69
Airplane 15 15 84.09 15.90
Fruit 18 18 81.34 18.65
House 14 14 80.41 19.58
Tree 15 15 81.02 18.92
Gems 18 18 90.0 10.0
Milk 16 16 86.33 13.66
Baboon 34 34 92.46 7.53
Cameraman 20 20 90.54 9.45
Texture 22 22 91.416 8.58
Clock 17 17 90.45 9.52
Figure 4: Result of existing K-Means technique tested on eleven sample gray scale images shown in Figure 1
(a)-(k)
7. Conclusion

A simple two-stage ONM approach that automatically produces distinct clusters for a large image dataset is presented in this paper. In the first stage, the ONM scheme instinctively identifies the optimum number of centroid objects over the multimedia dataset using the SDCO method. In the next stage, a centroid-based clustering process automatically partitions the image dataset into the finest number of discrete clusters, based on the distinct centroid objects obtained by the SDCO method, without prior knowledge. The individuality of the ONM is its automatic production of the finest number of separate clusters, in contrast to existing schemes, where the number of centroid data points is arbitrarily predetermined by the user. For the investigation, the ONM clustering scheme was tested on eleven standard benchmark gray scale images (Lena, Airplane, Fruit, House, Tree, Gems, Milk, Baboon, Cameraman, Texture and Clock) and the clustered results were compared with the existing K-Means technique. The ONM could be well utilized as a pre-process to determine the prime number of dissimilar clusters, and it outperforms existing schemes. However, the ONM scheme consumes a little more time than the existing K-Means technique.
References

[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, CA, 2006.
[2] D. R. Cutting, D. R. Karger, J. O. Pedersen and J. W. Tukey, “Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections,” Proc. of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1992, pp. 318-329.
[3] Michael Steinbach, George Karypis and Vipin Kumar, “A Comparison of Document Clustering
techniques,” KDD Workshop on Text Mining, 2000, pp. 1-2.
[4] Martin Ester, Alexander Frommelt, Hans-Peter Kriegel and Jörg Sander, “Spatial Data Mining: Database Primitives, Algorithms and Efficient DBMS Support,” Data Mining and Knowledge Discovery, vol. 4, no. 2-3, pp. 193-216, July 2000.
[5] I. Cadez, P. Smyth and H. Mannila, “Probabilistic modeling of transactional data with applications to profiling, visualization and prediction,” Proc. of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, 2001, pp. 37-46.
[6] A. Foss, W. Wang and O. R. Zaïane, “A non-parametric approach to web log analysis,” Proc. of the First SIAM ICDM Workshop on Web Mining, Chicago, 2001, pp. 41-50.
[7] M. K. Pakhira, “A Modified k-means Algorithm to Avoid Empty Clusters,” International Journal of Recent Trends in Engineering, vol. 1, no. 1, pp. 1-8, May 2009.
[8] Alireza Norouzi, Mohd Shafry Mohd Rahim, Ayman Altameem, Tanzila Saba, Abdolvahab Ehsani Rad, Amjad Rehman and Mueen Uddin, “Medical Image Segmentation Methods, Algorithms, and Applications,” IETE Technical Review, vol. 31, no. 3, pp. 199-213, Jun. 2014.
[9] A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern Recognition Letters, vol. 31, no. 8, pp. 651-666, June 2010.
[10] A. K. Jain, M. N. Murty and P. J. Flynn, “Data Clustering: A Review,” ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, Sept. 1999.
[11] Preeti Arora, Deepali and Shipra Varshney, “Analysis of k-means and k-medoids algorithm for
big data,” Procedia Computer Science, vol. 78, pp. 507-512, 2015.
[12] Sepideh Yazdani, Rubiyah Yusof, Alireza Karimian, Mohsen Pashna, and Amirshahram
Hematian, “Image Segmentation Methods and Application in MRI Brain Images,” IETE
Technical Review, vol. 32, no. 6, pp. 1-15, Jul. 2015.
[13] B. Zhang, M. Hsu and U. Dayal, K-harmonic means - a data clustering algorithm, Technical
Report, HPL-1999-124, Hewlett-Packard Laboratories, 1999.
[14] Wei-Chang Yeh, Chyh-Ming Lai and Kuei-Hu Chang, “A novel hybrid clustering approach based on k-harmonic means using robust design,” Neurocomputing, vol. 173, no. P3, pp. 1720-1732, Jan. 2016.
[15] Anup Bhattacharya, Ragesh Jaiswal and Nir Ailon, “Tight lower bound instances for k-means++ in two dimensions,” Theoretical Computer Science, vol. 634, no. C, pp. 55-66, June 2016.
[16] Faliu Yi and Inkyu Moon, “Extended K-Means Algorithm,” Proc. of the 5th International Conference on Intelligent Human-Machine Systems and Cybernetics, 2013, pp. 263-219.
[17] Sebastián Maldonado, Emilio Carrizosa and Richard Weber, “Kernel Penalized K-means: A feature selection method based on kernel K-means,” Information Sciences, vol. 322, pp. 150-160, Nov. 2015.
[18] A.C. Fabregas, B. D. Gerardo and B. C. Tanguilig III, “Enhanced Initial Centroids for K-means
Algorithm,” International Journal of Information Technology and Computer Science, vol. 1,
pp. 26-33, Jan. 2017.
[19] Haiyang Li, Hongzhou He and Yongge Wen, “Dynamic Particle Swarm Optimization and k-means clustering algorithm for image segmentation,” Optik - International Journal for Light and Electron Optics, vol. 126, no. 24, pp. 4817-4822, Dec. 2015.
[20] R. Krishnamoorthy and S. Sreedhar Kumar, “An Improved Agglomerative Clustering Algorithm for Outlier Detection,” Applied Mathematics and Information Sciences, vol. 10, no. 3, pp. 1141-1154, May 2016.
[21] Madheswaran M and Sreedhar Kumar S, “An Improved Frequency Based Agglomerative
Clustering Algorithm for Detecting Distinct Clusters on Two Dimensional Dataset,” J. Eng.
Technol. Res. (Academic Journal), vol. 9, no. 4, pp. 30-41, Dec. 2017.
[22] USC-SIPI Image Database, http://www.sipi.usc.edu
[23] J. Z. C. Lai and T. J. Huang, “An agglomerative clustering algorithm using a dynamic k-nearest-neighbor list,” Information Sciences, vol. 181, no. 9, pp. 1722-1734, May 2011.
[24] Qi Yu, Xumin Liu, Xiangmin Zhou and Andy Song, “Efficient agglomerative hierarchical clustering,” Expert Systems with Applications, vol. 42, no. 5, pp. 2785-2797, Apr. 2015.
[25] Yong Yang and Shuying Huang, “Image Segmentation by Fuzzy C-Means Clustering
Algorithm with a novel penalty term,” Computing and Informatics, vol. 26, no. 1, pp. 17-31,
2007.