
European Journal of Scientific Research

ISSN 1450-216X / 1450-202X Vol. 151 No 1 December, 2018, pp. 5-21

http://www.europeanjournalofscientificresearch.com

An Improved Partitioned Clustering Technique for Identifying

Optimum Number of Dissimilar Groups in Multimedia Dataset

Sreedhar Kumar S

Corresponding Author, Department of CSE

Dr.T.Thimmaiah Institute of Technology, KGF

Karnataka-563120, India

E-mail: [email protected]

Tel: +91-9538293406

Madheswaran M

Department of ECE, Mahendra Engineering College

Namakkal-637503, Tamilnadu, India

E-mail: [email protected]

Abstract

This paper presents an improved partitioned clustering technique called Optimum N-Means (ONM), which aims to automatically identify the optimum number of dissimilar groups in a large multimedia (gray-scale image) dataset, based on a distinct number of centroid objects, without a predetermined number of clusters. It involves two stages. In the first stage, a method called Search Distinct Centroid Objects (SDCO) is introduced; it automatically identifies the optimum number of centroid objects in the input image dataset based on the rate of object repetition. In the second stage, ONM divides the image dataset into an appropriate number of unrelated clusters based on the distinct centroid objects obtained by SDCO, and finally the clustering result is validated by a cluster validation scheme. Experimental results show that the ONM technique is more efficient at automatically identifying the optimum number of dissimilar clusters over gray-scale images, with higher intra thickness and lower intra separation, compared to the existing K-Means technique.

Keywords: Centroid Object, Distinct Clusters, Optimum N-Means, Intra Thickness, Intra Separation, Multimedia Spatial Dataset, Search Distinct Centroid Objects.

1. Introduction

Clustering is the process of partitioning a dataset into a number of dissimilar groups for deeper investigation and analysis [1-6]. In general, unsupervised clustering schemes are classified into two major categories: agglomerative and partitional. The agglomerative technique starts from individual data elements and builds larger clusters through a sequence of merging operations. The K-Means technique is a well-known partitional clustering technique: an iterative procedure that directly decomposes the dataset into a number of disjoint clusters by minimizing a criterion function (e.g., the sum-of-squared-error) [7-8].

Generally, the K-Means scheme is a backbone of various data analysis research fields, namely data mining, medical image processing, big data analysis, machine learning, etc., helping to identify distinct patterns over large volumes of real-time observations for deeper investigation. However, there is a

limitation in the K-Means technique: it fails to automatically identify the optimum number of distinct clusters over a large volume of data, and the overall result quality depends on the number of different centroid objects, which must be predetermined arbitrarily by the user [9-12]. To overcome this, this paper presents an enhanced partitioned clustering scheme called ONM. It separates a large multimedia dataset into discrete clusters by automatically identifying the distinct number of centroid objects in the dataset using the SDCO method.

2. Previous Research

In the recent decade, a few techniques have been reported to solve specific issues in the traditional K-Means technique. Bin Zhang et al. [13] reported a clustering technique called K-Harmonic Means (KHM). It is a center-based clustering technique that uses the harmonic averages of the distances from each data object to the centers as components. They claimed that the KHM technique is insensitive to the initialization of the centers and that it improves the quality of clustering results compared to the K-Means and Expectation Maximization (EM) techniques. The drawback of the KHM technique is that it suffers from being trapped in local optima.

Wei-Chang et al. [14] reported a hybrid technique named RMSSOKHM based on KHM and Modified Simplified Swarm Optimization (MSSO). It consists of two strategies: a Rapid Centralized Strategy (RCS) and a Minimum Movement Strategy (MMS). The RCS is used to increase the convergence speed, and the MMS to search for better solutions without being trapped in local optima. In addition, the RMSSOKHM technique uses the Taguchi method to optimize the parameter settings. They reported that RMSSOKHM produced outstanding results compared to the KHM, PSOKHM, and IGSAKHM techniques.

Anup Bhattacharya et al. [15] reported a k-means++ seeding technique that is used to find the initial k centers in any multi-dimensional dataset. This technique uses a sampling procedure to pick the centroid data points: initially, it chooses the first centroid data point at random from the dataset; it then picks each data point to be the i-th centroid with probability proportional to the squared Euclidean distance of that data point from its closest previously chosen centroid. Faliu and Inkyu [16] presented another extended K-Means scheme that adjusts the number of data points in each cluster based on a greedy algorithm.

Sebastian et al. [17] reported a Kernel Penalized K-Means (KPKM) technique, an unsupervised learning technique for embedded feature selection in combination with kernel K-Means. It minimizes the violation of the initial cluster structure while simultaneously penalizing the use of features. Another enhanced K-Means scheme was presented by Fabregas et al. [18] to identify the initial centroid data points based on the computed highest and lowest pairs of the weighted average mean, instead of selecting them randomly.

Haiyang Li et al. [19] reported a hybrid technique called Dynamic Particle Swarm Optimization K-Means (DPSOK) to improve the global search capability of K-Means clustering. This technique combines two schemes: an improved Dynamic Particle Swarm Optimization (DPSO) and K-Means clustering. The calculation methods of its inertia weight and learning factors were improved to ensure that the technique keeps an equilibrium optimization capability. They claimed that DPSOK produced better visual results in image segmentation compared to the K-Means and PSOK clustering techniques. However, the existing techniques reported above fail to automatically identify the distinct number of clusters in a large dataset without a predetermined number of cluster centroids. A brief discussion of the methods and steps involved in the proposed ONM approach is presented in the next section.


3. Proposed Clustering Approach

In this section, the proposed ONM clustering approach is presented in detail. It consists of two stages: SDCO and clustering. In the SDCO stage, the ONM approach identifies the distinct number of centroid objects over the input image dataset based on the rate of repetition of objects in the dataset. In the clustering stage, it partitions the input image dataset into the optimum number of dissimilar clusters based on the distinct centroid objects. Finally, each individual cluster in the resulting cluster set of the ONM approach is validated by the improved Effective Cluster Validation Method (ECVM) scheme. The stages involved in the ONM approach are discussed in the subsections below.

3.1. SDCO Stage

In this subsection, a detailed description of the SDCO method is presented; it aims to trace the optimum number of distinct centroid objects over the input image dataset. Initially, the SDCO method divides the gray-scale image $I$ into non-overlapping spatial blocks of size (2×2) or (3×3); the image then contains $n$ spatial data blocks (data objects), each with $m$ data pixels (data points), defined as $X = \{x_i\}$, $x_i = \{x_{if}\}$, $\forall x_i \subseteq I$, for $i = 1, 2, .., n$, $f = 0, 1, .., m$, where $X$ represents the $n$ spatial data objects of the multimedia gray-scale image $I$ and $x_{if}$ is the $f$-th data point (pixel) of the $i$-th data object in $X$. Next, it measures the rate of repetition $rr(x_i)$ of each spatial object over the image dataset $X$, for $i = 0, .., n$, as defined in Equation (1):

$$rr(x_i) = \sum_{j=0}^{n} \delta_{ij}, \qquad \delta_{ij} = \begin{cases} 1, & \text{if } \sum_{f=0}^{m} s_{ijf} \ge m/2 \\ 0, & \text{if } \sum_{f=0}^{m} s_{ijf} < m/2 \end{cases}, \qquad s_{ijf} = \begin{cases} 1, & \text{if } \left| x_{if} - x_{jf} \right| < T \\ 0, & \text{if } \left| x_{if} - x_{jf} \right| > T \end{cases}, \quad \forall x_i, x_j \in X,\ i \ne j \tag{1}$$

where $x_{if} - x_{jf}$ denotes the difference between the $f$-th data points (pixels) of the $i$-th and $j$-th data objects belonging to the input multimedia dataset $X$, $n$ denotes the size of $X$, $m$ is the number of data pixels (data points) per data object (block), and $T$ is the threshold that limits the similarity distance between the $i$-th and $j$-th objects. If the difference between the $i$-th and $j$-th objects is less than $T$, the data point of the $j$-th object is considered similar to the corresponding data point of the $i$-th object in the input dataset $X$. Next, SDCO finds the distinct number of centroid objects $Y$ over the input dataset $X$ based on the object repetition $rr(x_i)$, computed by

$$Y = \left\{\, y_l = x_i \;\middle|\; rr_i \ge CC,\ i = 1, .., n \,\right\} \tag{2}$$

Here, $rr_i$ denotes the frequency of repetition of the $i$-th object in $X$, and $CC$ represents the control centroid threshold, which controls the increase or decrease of the distinct number of centroid objects with maximum occurrence over the dataset $X$; the result is stored in $Y = \{y_l\}$ for $l = 1, .., N$. For example, if $CC$ is set to a smaller number of occurrences, a large number of dissimilar spatial centroid objects is generated; on the other hand, if $CC$ is set to a higher number of occurrences, only a few distinct spatial centroid objects are produced.
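The SDCO stage described above can be sketched as follows. This is a minimal Python/NumPy sketch under our reading of Equations (1) and (2): a pixel pair counts as a match when its absolute difference is below $T$, an object counts as a repetition when at least half of its pixel pairs match, and objects whose repetition rate reaches $CC$ are kept as centroid objects. The function name `sdco` and all parameter defaults are illustrative, not from the paper; deduplication of near-identical centroids and the $i \ne j$ restriction are omitted for brevity.

```python
import numpy as np

def sdco(image, block=2, T=10, CC=75):
    """Sketch of the SDCO stage: split a gray-scale image into
    non-overlapping (block x block) objects, count how often each
    object repeats in the image (Eq. 1), and keep the objects whose
    repetition rate reaches the control centroid threshold CC (Eq. 2)."""
    h, w = image.shape
    # Divide the image into non-overlapping spatial blocks (objects).
    X = (image[:h - h % block, :w - w % block]
         .reshape(h // block, block, -1, block)
         .swapaxes(1, 2)
         .reshape(-1, block * block)
         .astype(np.int32))
    n, m = X.shape
    # Eq. (1): a pixel pair "matches" when its absolute difference is
    # below T; object j counts as a repetition of object i when at
    # least half of their m pixel pairs match.
    diffs = np.abs(X[:, None, :] - X[None, :, :])   # shape (n, n, m)
    matches = (diffs < T).sum(axis=2)               # matching pixels per pair
    rr = (matches >= m / 2).sum(axis=1)             # repetition rate rr(x_i)
    # Eq. (2): objects repeated at least CC times become centroid objects.
    Y = X[rr >= CC]
    return X, Y
```

Because the pairwise comparison materializes an $(n, n, m)$ array, this sketch is only practical for small images; datasets on the scale used in the paper (up to 25600 objects) would need a blockwise loop instead.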


3.2. Clustering Stage

In this stage, the ONM approach partitions the input spatial multimedia dataset into the optimum number of discrete clusters based on the distinct centroid objects obtained in the SDCO stage. First, the clustering stage maps each individual spatial object in $X$ to the $N$ dissimilar centroid objects $Y = \{y_l\}$, $l = 0, .., N$, based on the Euclidean distance $D(x_i, Y)$, where $Y$ denotes the centroid object set, as defined in Equation (3):

$$D(x_i, Y) = \left\{\, d(x_i, y_l) \;\middle|\; l = 0, 1, .., N,\ x_i \in X,\ y_l \in Y \,\right\} \tag{3}$$

where $d(x_i, y_l)$ represents the Euclidean distance between the $i$-th object $x_i \in X$ and the $l$-th centroid object $y_l \in Y$, computed by

$$d(x_i, y_l) = \left( \sum_{f=0}^{m} \left( x_{if} - y_{lf} \right)^2 \right)^{1/2} \tag{4}$$

Next, it finds which of the distinct centroid objects $Y = \{y_l\}$, $l = 1, 2, .., N$, the $i$-th object $x_i$ is closest to, based on the minimum Euclidean distance between the object $x_i$ and the centroid objects $\{y_1, .., y_N\}$, as expressed in Equation (5):

$$c_l = \min_{l = 0, 1, .., N} \left\{\, d(x_i, y_l) \,\right\}, \qquad \forall\, d(x_i, y_l) \in D(x_i, Y) \tag{5}$$

where $D(x_i, Y)$ represents the list of distances between the $i$-th spatial object $x_i$ and the $N$ centroid objects $Y = \{y_l\}$, $l = 0, 1, .., N$. The $i$-th object $x_i$ is then placed into its closest $l$-th cluster in the cluster set $C = \{c_l\}$, i.e., $c_l \leftarrow x_i$. These clustering steps are repeated until the $n$ objects in $X$, $i = 0, 1, .., n$, are segregated into $N$ distinct clusters $C = \{c_l\}$, $l = 0, 1, .., N$, based on the different centroid objects. Afterward, the centroid $y'_l$ of each individual cluster in the cluster set $C = \{c_l\}$, $l = 0, .., N$, is modified by the standard arithmetic mean operation, where $y'_l$ is the modified centroid of the $l$-th cluster in the cluster set $C$, as defined in Equation (6):

$$y'_{lf} = \frac{1}{R_l} \sum_{i=0}^{R_l} c_{lif}, \quad f = 0, .., m, \qquad \forall\, c_{lif} \in c_{li},\ c_{li} \in c_l,\ c_l \in C \tag{6}$$

Here, $c_{lif}$ denotes the $f$-th data point of the $i$-th object in the $l$-th cluster of the cluster set $C$, and $R_l$ is the number of similar objects in the $l$-th cluster $c_l$, $i = 0, .., R_l$. Finally, the partitioned result of the multimedia dataset $X$ is obtained in the cluster set $C$ with $N$ distinct clusters. For visual representation, the contents (pixels) of each data object (block) $x_{if} \in x_i$, $x_i \in X$, in each individual cluster $x_i \in c_l$ of the cluster set $c_l \in C$ are heightened by standard arithmetic operations (addition and subtraction), yielding the improved cluster set $c'_l \in X'$ in the output gray-scale image $X' \rightarrow I'$, where $I'$ denotes the output clustered gray-scale image, $X'$ represents the improved spatial image object (block) set, and $c'_l$ describes the cluster whose content (spatially similar image blocks) has been improved. The stages involved in the ONM approach are presented as an algorithm hereunder.
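The assignment and centroid-update steps above can be sketched compactly. This is an illustrative Python/NumPy sketch of the clustering stage (Equations (3)-(6)), not the authors' implementation: each object joins the cluster of its nearest centroid by Euclidean distance, and each centroid is then replaced by the arithmetic mean of its member objects. The function name `onm_cluster` and the fixed iteration count are assumptions.

```python
import numpy as np

def onm_cluster(X, Y, iters=10):
    """Sketch of the ONM clustering stage: assign each data object to
    its nearest centroid object by Euclidean distance (Eqs. 3-5), then
    refresh every centroid with the arithmetic mean of its members
    (Eq. 6). X is an (n, m) array of data objects; Y is an (N, m)
    array of initial centroid objects from the SDCO stage."""
    Y = Y.astype(float).copy()
    for _ in range(iters):
        # Eqs. (3)-(4): pairwise Euclidean distances D(x_i, Y).
        D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
        # Eq. (5): each object joins the cluster of its closest centroid.
        labels = D.argmin(axis=1)
        # Eq. (6): modified centroid y'_l = mean of the objects in c_l.
        for l in range(Y.shape[0]):
            members = X[labels == l]
            if len(members):
                Y[l] = members.mean(axis=0)
    return labels, Y
```

Unlike plain K-Means, the number of centroids $N$ is not user-supplied here; it is whatever the SDCO stage produced.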


3.3. ONM Algorithm

Input: Image dataset $X = \{x_0, ..., x_n\}$ of the original gray-scale image $I$, and the control centroid threshold $CC$.
Output: Clustered gray-scale image $I'$ and distinct clusters $\{c_1, c_2, .., c_N\}$.

Begin
1. Measure the rate of occurrence $rr(x_i)$ of each object in $X$, as described in Equation (1).
2. Select the distinct centroid objects $Y = \{y_l\}$, $l = 0, .., N$, over the input dataset $X$ based on the object repetition $rr(x_i)$, as described in Equation (2).
3. REPEAT (over each object $x_i$ in $X$)
4.    REPEAT (over each centroid $y_l$ in $Y$)
5.       Compute the distance of the spatial object $x_i \in X$ to the centroid object $y_l$ using Equation (4).
6.    UNTIL $l > N$
7.    Find the closest ($l$-th) centroid object to the spatial object $x_i$, i.e., the one with the highest similarity, using Equation (5).
8.    Place the object $x_i$ into the $l$-th cluster.
9. UNTIL $i > n$
10. Update the centroid $Y' = \{y'_l\}$ of each individual cluster in the cluster set $C = \{c_l\}$, $l = 0, 1, .., N$, using Equation (6).
11. Improve the content of the spatial data objects $x_{if} \in x_i$, $x_i \in c_l$, in each cluster $c_l \in C$, $l = 0, 1, .., N$, using standard arithmetic operations.
12. Obtain the improved cluster set $\forall c'_l \subseteq X'$ in the spatial clustered gray-scale image $I' \leftarrow X'$.
End

4. Complexity Analysis

This section discusses the computational complexity of the proposed ONM approach. The first stage of the ONM approach requires $O(n - N)$ time to automatically identify the distinct centroid objects $Y = \{y_l\}$, $l = 1, 2, .., N$, over the input image dataset $X = \{x_i\}$, $i = 1, 2, .., n$, where $X$ is the input image dataset with $n$ data objects, $x_i$ represents the $i$-th data object with $m$ data pixels, and $N$ denotes the number of centroid objects. The second stage consumes $O(nN)$ time to partition the input image dataset $X$ into the optimum number of distinct clusters $C = \{c_l\}$, $l = 1, 2, .., N$, where $c_l$ denotes the $l$-th cluster in the resulting cluster set $C$ and $N$ is the number of clusters in $C$. Overall, the proposed clustering scheme (ONM) requires $O((n - N) + nN)$ time to partition the input dataset $X$ into the optimum number $N$ of distinct, highly related clusters.

5. Cluster Validation

This section measures the association and divergence among the data objects (blocks) of each individual cluster in the clustering result of the image dataset, based on the Effective Cluster Validation Measure (ECVM) (Krishnamoorthy 2016; Madheswaran 2017). The ECVM technique is slightly enhanced here to measure the intra thickness and intra separation between the spatial data objects (blocks of pixels) in each individual cluster of the gray-scale image clustered by the unsupervised clustering scheme. The validation method contains two measures, intra association and intra divergence, described in the following subsections.

5.1. Intra Association

This measure estimates the intra thickness among the spatial data objects in each individual cluster of the cluster set of the multimedia (image) dataset; it is expressed in Equation (7):

$$CA(C) = \frac{1}{N} \sum_{l=1}^{N} IT(c_l) \tag{7}$$

where $N$ represents the number of clusters in the resulting cluster set $C$, $l = 1, .., N$, and $IT(c_l)$ is the intra thickness measure of the $l$-th individual cluster $c_l \in C$, defined in Equation (8):

$$IT(c_l) = \frac{1}{R_l} \sum_{j=1}^{R_l} \delta_{lj} \times 100, \qquad \delta_{lj} = \begin{cases} 1, & \text{if } \left| c_{lj} - y'_l \right| < SL \\ 0, & \text{if } \left| c_{lj} - y'_l \right| > SL \end{cases}, \quad \forall\, c_{lj} \in c_l \tag{8}$$

Here, $c_{lj}$ denotes the $j$-th object or block (set of pixels) in the $l$-th cluster of $C$, $R_l$ represents the size of the $l$-th cluster in the cluster set $C$, $SL$ is the similarity limit threshold that bounds the similarity difference among the objects in cluster $c_l$ and is determined by the user based on the nature of the spatial dataset, and $y'_l$ is the centroid of the $l$-th cluster computed by Equation (6).

5.2. Intra Divergence

This measure calculates the intra separation between the spatial objects in each individual cluster of the cluster set $C$ of the ONM scheme; it is expressed in Equation (9):

$$CD(C) = \frac{1}{N} \sum_{l=1}^{N} IS(c_l) \tag{9}$$

where $IS(c_l)$ is the intra separation measure of the $l$-th individual cluster $c_l \in C$, defined in Equation (10):

$$IS(c_l) = \frac{1}{R_l} \sum_{j=1}^{R_l} \delta_{lj} \times 100, \qquad \delta_{lj} = \begin{cases} 1, & \text{if } \left| c_{lj} - y'_l \right| > SL \\ 0, & \text{if } \left| c_{lj} - y'_l \right| < SL \end{cases}, \quad \forall\, c_{lj} \in c_l \tag{10}$$

Here, $c_{lj}$ denotes the $j$-th object or block (set of pixels) in the $l$-th cluster of $C$, $SL$ is the similarity limit threshold that identifies the less similar objects in cluster $c_l$, and $y'_l$ is the centroid of the $l$-th cluster belonging to $C$.
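Putting Equations (7)-(10) together, the two ECVM measures can be sketched as below. This Python/NumPy sketch is an interpretation, not the authors' code: for each cluster it counts the fraction of member objects whose Euclidean distance to the cluster centroid stays below the similarity limit $SL$ (intra thickness) and the complementary fraction that exceeds $SL$ (intra separation), then averages both over all clusters. The function name `ecvm` and the default $SL$ are illustrative.

```python
import numpy as np

def ecvm(X, labels, centroids, SL=50.0):
    """Sketch of the ECVM measures: per-cluster intra thickness IT
    (Eq. 8) and intra separation IS (Eq. 10), averaged into the
    overall intra association CA (Eq. 7) and intra divergence CD
    (Eq. 9). X is (n, m) data objects, labels the cluster index of
    each object, centroids the (N, m) cluster centroids y'_l."""
    IT, IS = [], []
    for l in range(centroids.shape[0]):
        members = X[labels == l]
        if len(members) == 0:      # safeguard: skip empty clusters
            continue
        # Distance of every member object to its cluster centroid y'_l.
        dist = np.linalg.norm(members - centroids[l], axis=1)
        close = (dist < SL).mean() * 100.0   # Eq. (8): % within SL
        IT.append(close)
        IS.append(100.0 - close)             # Eq. (10): % beyond SL
    # Eqs. (7) and (9): overall intra association / divergence.
    CA = float(np.mean(IT))
    CD = float(np.mean(IS))
    return IT, IS, CA, CD
```

By construction, each cluster's IT and IS sum to 100%, matching the complementary value pairs reported in Table 2.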

6. Results and Discussion

The performance of the proposed ONM technique, experimented on multimedia gray-scale images, is presented in this section. The multimedia dataset consists of 100 2-D gray-scale images of different sizes, such as (256×256), (240×240), and (480×480), with gray values in the range 0-255. A subset of this dataset containing eleven standard sample images, viz. Lena, Airplane, Fruit, House, Tree, Gems, Milk, Baboon, Cameraman, Texture, and Clock, is reported as representative in this subsection, as these images are used in many research experiments, as reported in [22-25].


Figure 1: Original Gray Scale Images


Table 1: Distinct Clusters Identified by the ONM Scheme when (CC=75) over the Sample Gray-Scale Images in Figure 1

Multimedia Datasets    N-Centroid Objects    Number of Clusters Identified (N)

Lena 58 58

Airplane 44 44

Fruit 49 49

House 33 33

Tree 54 54

Gems 33 33

Milk 49 49

Baboon 65 65

Cameraman 36 36

Texture 50 50

Clock 47 47


Figure 2: Clustering result of the proposed ONM scheme with (CC=75): dissimilar clusters identified on the eleven sample gray-scale images shown in Figure 1

In this experiment, each block of size (2×2) or (3×3) is considered an object; hence, the sample images contain 16384, 14400, 14400, 14400, 14400, 14400, 14400, 25600, 16384, 16384, and 16384 objects, respectively. The eleven sample image datasets are illustrated in Figure 1. The search for the optimum number of distinct centroid objects over the input dataset is described in subsection 3.1 (SDCO stage). The SDCO method automatically traced the optimum number of dissimilar centroid objects for the eleven multimedia datasets based on the threshold (CC=75), obtaining 58, 44, 49, 33, 54, 33, 49, 65, 36, 50, and 47 centroids, respectively, as shown in Table 1. The clustering process then partitioned each dataset into the optimum number of distinct clusters based on the distance metric described in subsection 3.2. For the sample multimedia datasets illustrated in Figure 1, the ONM clustering scheme produced 58, 44, 49, 33, 54, 33,

49, 65, 36, 50, and 47 clusters. These results are incorporated in Table 1. Subsequently, the contents of each individual cluster in the results are improved by simple arithmetic operations and visually demonstrated in Figure 2.

Table 2: Result of Intra Cluster Validation Obtained with the ECVM Scheme on the Result of the ONM Approach when (CC=75). For each dataset, the number of clusters is followed by the per-cluster intra thickness measures $IT(c_l)$ in %, and then by the per-cluster intra separation measures $IS(c_l)$ in %.

Lena 58

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 76.77, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100,

89.54, 100, 100, 100, 100,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

23.28, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 10.45, 0.0, 0.0,

0.0, 0.0,

Airplane 43

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 48.117, 100, 100, 100,

100, 100, 100, 100, 100, 100, 98.251

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 51.88, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.74

Fruit 49

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 58.13, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 90.82, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 92.22,

100, 100, 100, 100, 100

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 41.86,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 9.17, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 7.77, 0.0, 0.0, 0.0, 0.0, 0.0

House 33

83.05, 71.29, 100, 100, 75.07, 100, 71.05, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 67.62, 100, 100, 100, 100,

100, 67.62, 100, 100, 65.23, 100, 100,

16.94, 28.90, 0.0, 0.0, 24.92, 0.0, 28.94,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 32.37, 0.0, 0.0,

0.0, 0.0, 0.0, 67.62, 0.0, 0.0, 34.7, 0.0,

0.0,

Tree 54

100, 100, 100, 100, 100, 100, 100, 87.79, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.20,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0,

Gems 33

66.75, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 70.55, 71.61,

100, 100, 100, 66.30, 100, 100, 87.27

33.24, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 29.44,

28.38, 0.0, 0.0, 0.0, 33.69, 0.0, 0.0,

12.72

Milk 49

100, 95.75, 100, 100, 74.45, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 74.44, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 59.25, 85.0, 100, 100, 100,

100, 62.33, 100, 100,

0.0, 4.24, 0.0, 0.0, 25.54, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 25.55, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 40.74,

15.0, 0.0, 0.0, 0.0, 0.0, 37.66, 0.0, 0.0,

Baboon 65

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 62.26, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 81.61

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 37.73, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 18.35


Cameraman 36

100, 100, 100, 100, 100, 100, 100, 100, 76.132,

100, 100, 100, 100, 100, 77.33, 100, 100, 100,

90.75, 100, 100, 100, 100, 83.20, 100, 100,

100, 100, 90.37, 85.78, 96.103, 100, 100, 100,

100, 100,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

76.132, 0.0, 0.0, 0.0, 0.0, 0.0, 22.66,

0.0, 0.0, 0.0, 9.411, 0.0, 0.0, 0.0, 0.0,

16.79, 0.0, 0.0, 0.0, 0.0, 9.62, 14.21,

3.89, 0.0, 0.0, 0.0, 0.0, 0.0,

Texture 50

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 57.90, 100, 100, 100, 100,

100, 100, 100, 100, 73.08,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 42.09, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 26.91,

Clock 47

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 85.98, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 100, 100, 100, 100, 100, 100, 100, 100,

100, 96.55,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 14.012, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

0.0, 0.0, 0.0, 3.44,

Table 3: Result of Overall Cluster Validation Obtained with the ECVM Scheme on the Cluster Set of the ONM Approach when (CC=75)

Multimedia Datasets    Number of Clusters Identified (N)    Intra Association CA (%)    Intra Divergence CD (%)

Lena 58 98.41 1.5

Airplane 44 98.78 1.21

Fruit 49 98.79 1.20

House 34 94.94 5.05

Tree 54 99.7 0.22

Gems 33 95.83 4.16

Milk 49 96.96 3.035

Baboon 65 98.9 1.00

Cameraman 36 97.2 2.7

Texture 50 98.6 1.38

Clock 47 99.6 0.31

The clustering result of the ONM approach is validated by the ECVM scheme as described in Equations (6) and (9). Initially, the intra thickness IT(c_l) and intra separation IS(c_l) are calculated among the data blocks of each individual cluster in the clustering results of the eleven multimedia (image) datasets using the improved ECVM technique, and the calculated values are given in Table 2. The proposed ONM scheme gives an overall intra association CA of 98.41, 98.78, 98.79, 94.94, 99.7, 95.83, 96.96, 98.9, 97.2, 98.6 and 99.6 for the clustered results of the gray scale images Lena, Airplane, Fruit, House, Tree, Gems, Milk, Baboon, Cameraman, Texture and Clock, respectively. Next, the overall intra divergence CD is calculated on the results of the same image datasets as expressed in Equation (9) and is found to be 1.5, 1.21, 1.20, 5.05, 0.22, 4.16, 3.035, 1.0, 2.7, 1.38 and 0.31 for the same datasets. These results are presented in Table 3.
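The relation between the per-cluster and overall measures can be sketched as follows. This is an illustrative reading of Tables 2 and 3, not the exact ECVM formulas of Equations (6) and (9): it assumes the overall intra association CA is the mean of the per-cluster intra thickness values IT(c_l), and the overall intra divergence CD is the mean of the per-cluster intra separation values, with IS(c_l) = 100 - IT(c_l).

```python
def overall_validation(intra_thickness):
    """Overall validation measures from per-cluster intra thickness (in %).

    Assumed reading of ECVM: CA = mean of IT(c_l), CD = mean of IS(c_l),
    with IS(c_l) = 100 - IT(c_l), so CA + CD = 100 up to rounding.
    """
    n = len(intra_thickness)
    ca = sum(intra_thickness) / n                        # overall intra association
    cd = sum(100.0 - it for it in intra_thickness) / n   # overall intra divergence
    return round(ca, 2), round(cd, 2)

# Three clusters: two fully associated, one at 90% intra thickness.
print(overall_validation([100.0, 100.0, 90.0]))  # -> (96.67, 3.33)
```

Under this reading, a dataset whose clusters are almost all fully associated (IT = 100) yields a CA close to 100 and a CD close to 0, which matches the pattern of Tables 2 and 3.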

Table 4: Result of ONM Scheme Identified Dissimilar Clusters on Image Dataset when (CC=50)

Multimedia Datasets   N-Centroid Objects   Number of Clusters Identified (N)
Lena                  60                   60
Airplane              44                   44
Fruit                 64                   64
House                 42                   42
Tree                  59                   59


15 Sreedhar Kumar S and Madheswaran M

Gems                  49                   49
Milk                  66                   66
Baboon                73                   73
Cameraman             42                   42
Texture               54                   54
Clock                 48                   48

Table 5: Intra Thickness and Intra Separation Validation Obtained with ECVM Scheme on Cluster Set of ONM Approach when (CC=50)

Per-cluster intra thickness IT(c_l) and intra separation IS(c_l) in %, where c_l denotes the l-th cluster; clusters not listed have IT(c_l) = 100 and IS(c_l) = 0.0.

Lena (60 clusters):
  IT(c_l): c26 = 76.717, c52 = 89.54
  IS(c_l): c26 = 23.28, c52 = 10.45

Airplane (44 clusters):
  IT(c_l): c33 = 48.117, c44 = 98.251
  IS(c_l): c33 = 51.88, c44 = 1.74

Fruit (64 clusters):
  IT(c_l): c37 = 66.43
  IS(c_l): c37 = 33.56

House (42 clusters):
  IT(c_l): c2 = 80.80, c5 = 75.70, c28 = 67.62, c36 = 80.13
  IS(c_l): c2 = 19.2, c5 = 24.92, c28 = 32.37, c36 = 19.86

Tree (59 clusters):
  IT(c_l): c52 = 90.90
  IS(c_l): c52 = 9.09

Gems (49 clusters):
  IT(c_l): c40 = 85.84, c42 = 92.15
  IS(c_l): c26 = 76.717, c40 = 14.15, c42 = 7.84

Milk (66 clusters):
  IT(c_l): c48 = 78.531, c49 = 97.84, c51 = 77.51, c52 = 95.12
  IS(c_l): c48 = 21.46, c49 = 2.150, c51 = 22.5, c52 = 4.807

Baboon (73 clusters):
  IT(c_l): c52 = 89.54, c71 = 81.94, c72 = 57.89
  IS(c_l): c52 = 89.54, c71 = 18.05, c72 = 42.105

Cameraman (42 clusters):
  IT(c_l): c9 = 76.132, c19 = 93.75, c33 = 48.117
  IS(c_l): c9 = 23.86, c19 = 6.25, c33 = 48.117

Texture (54 clusters):
  IT(c_l): c30 = 64.28, c54 = 75.15
  IS(c_l): c30 = 35.71, c54 = 24.84

Clock (48 clusters):
  IT(c_l): c24 = 85.98, c48 = 98.97
  IS(c_l): c24 = 14.012, c48 = 1.02

Table 6: Performance Measures of Overall Intra Cluster Validation Obtained with ECVM Technique Estimated on Result of ONM Scheme when (CC=50)

Multimedia Datasets   Number of Clusters Identified (N)   Intra Association (CA) in %   Intra Divergence (CD) in %
Lena                  60                                  99.41                         0.5
Airplane              44                                  98.78                         1.21
Fruit                 64                                  99.4                          0.52
House                 42                                  97.11                         2.88
Tree                  59                                  99.84                         0.15
Gems                  49                                  99.5                          0.44
Milk                  66                                  99.2                          0.771
Baboon                73                                  99.13                         0.86
Cameraman             42                                  99.2                          0.71
Texture               54                                  98.87                         1.121
Clock                 48                                  99.6                          0.37


The experiment was extended by testing the proposed clustering scheme with another control centroid value (CC=50) on the same eleven image datasets shown in Figure 1. Based on (CC=50), the SDCO method identified the numbers of distinct centroid objects over the eleven multimedia datasets as 60, 44, 64, 42, 59, 49, 66, 73, 42, 54 and 48, respectively. Table 4 shows that, from these N centroid objects, the proposed clustering scheme produced the same optimum numbers of highly related clusters: 60, 44, 64, 42, 59, 49, 66, 73, 42, 54 and 48. The contents of each cluster are enhanced and shown visually in Figure 3. Next, these clustering results were evaluated through the cluster validation measures, and the estimated results are presented in Tables 5 and 6. Figures 2 and 3 clearly demonstrate that the control centroid (CC) acts as a key factor in the proposed clustering scheme and directly affects the performance of the clustering result.
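The role of the control centroid can be illustrated with a toy sketch. The paper describes SDCO as selecting distinct centroid objects from the rate of object repetition in the image dataset; the minimal version below treats CC as a repetition-count threshold. Both the function name `search_distinct_centroids` and this exact selection rule are illustrative assumptions, not the published algorithm.

```python
from collections import Counter

def search_distinct_centroids(objects, cc):
    """Hypothetical SDCO sketch: an object (e.g. a gray-level value) becomes
    a distinct centroid object when its repetition count reaches the
    control-centroid threshold CC. A lower CC admits more centroid objects,
    and hence more clusters, which mirrors the behaviour reported for the
    two experimental settings."""
    counts = Counter(objects)
    return sorted(value for value, n in counts.items() if n >= cc)

# Toy gray-level data: value 5 repeats 60 times, 9 repeats 40 times, 7 repeats 10 times.
data = [5] * 60 + [9] * 40 + [7] * 10
print(search_distinct_centroids(data, 50))  # -> [5]
print(search_distinct_centroids(data, 30))  # -> [5, 9]
```

Under this assumed rule, each surviving value would seed one cluster, so the number of clusters follows directly from CC and the repetition profile of the dataset.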

Figure 3: Result of ONM with (CC=50) tested on the eleven sample gray scale images shown in Figure 1 (panels (a)-(k))


6.1. Comparison with Existing Technique

For comparison purposes, the traditional K-Means scheme [7-9] was implemented and tested on the same eleven multimedia (gray scale image) datasets. First, the k centroid objects of the eleven multimedia datasets were predetermined arbitrarily as 15, 15, 18, 14, 15, 18, 16, 34, 20, 22 and 17, respectively, as presented in Table 7. Next, the K-Means scheme partitioned each image dataset into k distinct clusters according to its k centroid objects; the resulting clusters are illustrated in Figure 4. The performance of the K-Means scheme was then measured, and the validation results are presented in Table 7. Table 7 clearly indicates that the existing technique identifies only a limited number of distinct clusters over the multimedia datasets, governed by the external parameter k, with lower intra association and higher intra divergence. It is evident from Tables 3, 6 and 7 that the ONM scheme automatically identified the optimum number of separate clusters, with higher intra association and lower intra divergence, over the eleven multimedia (gray scale image) datasets without prior knowledge, compared to the existing K-Means scheme. The experiment thus demonstrates that the ONM scheme produces better clustering results than the existing K-Means technique. All techniques were run on a Dell T4500 machine with 2 GB RAM under Windows 7.
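For reference, the baseline can be sketched as standard K-Means on one-dimensional gray-level data. The snippet below is a minimal generic illustration, not the compared implementation; it uses deterministic evenly spaced seeding for reproducibility, whereas K-Means as used here starts from user-defined (arbitrary) k centroids.

```python
def kmeans_1d(data, k, iters=100):
    """Minimal K-Means for scalar data with a fixed, user-defined k
    (assumes 1 <= k <= len(data) and iters >= 1)."""
    data = sorted(data)
    # Deterministic seeding for reproducibility: k evenly spaced samples.
    centroids = ([data[0]] if k == 1 else
                 [data[i * (len(data) - 1) // (k - 1)] for i in range(k)])
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Update step: move each centroid to its cluster mean.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, clusters

cents, groups = kmeans_1d([1, 2, 3, 10, 11, 12, 20, 21, 22], k=3)
print(cents)  # -> [2.0, 11.0, 21.0]
```

The key contrast with ONM is visible in the signature: k must be supplied by the user, so an ill-chosen k merges or splits natural groups, which is what drives the lower intra association reported for K-Means in Table 7.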

Table 7: Clustering Result of Existing K-Means Scheme Tested on Eleven Gray Scale Images in Figure 1

Multimedia Datasets   User-Defined K-Centroids   Number of Clusters Identified (k)   Intra Association (CA) in %   Intra Divergence (CD) in %
Lena                  15                         15                                  83.30                         16.69
Airplane              15                         15                                  84.09                         15.90
Fruit                 18                         18                                  81.34                         18.65
House                 14                         14                                  80.41                         19.58
Tree                  15                         15                                  81.02                         18.92
Gems                  18                         18                                  90.0                          10.0
Milk                  16                         16                                  86.33                         13.66
Baboon                34                         34                                  92.46                         7.53
Cameraman             20                         20                                  90.54                         9.45
Texture               22                         22                                  91.416                        8.58
Clock                 17                         17                                  90.45                         9.52


Figure 4: Result of the existing K-Means technique tested on the eleven sample gray scale images shown in Figure 1 (panels (a)-(k))

7. Conclusion

A simple two-stage ONM approach that automatically produces distinct clusters for a large image dataset is presented in this paper. In the first stage, the ONM scheme automatically identifies the optimum number of centroid objects over the multimedia dataset using the SDCO method. In the second stage, a centroid-based clustering process partitions the image dataset into the finest number of discrete clusters based on the distinct centroid objects obtained by the SDCO method, without prior knowledge. The distinguishing feature of ONM is its automatic production of the finest number of separate clusters, in contrast to existing schemes, where the number of centroid data points is predetermined arbitrarily by the user. For the investigation, the ONM clustering scheme was tested on eleven standard benchmark gray scale images (Lena, Airplane, Fruit, House, Tree, Gems, Milk, Baboon, Cameraman, Texture and Clock), and the clustered results were compared to the existing K-Means technique. ONM can also serve as a pre-processing step to determine the optimum number of dissimilar clusters, and it achieves better results than existing schemes. However, the ONM scheme consumes a little more time than the existing K-Means technique.

References

[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, CA, 2006.

[2] D. R. Cutting, D. R. Karger, J. O. Pedersen and J. W. Tukey, "Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections," Proc. of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1992, pp. 318-329.

[3] Michael Steinbach, George Karypis and Vipin Kumar, "A Comparison of Document Clustering Techniques," KDD Workshop on Text Mining, 2000, pp. 1-2.

[4] Martin Ester, Alexander Frommelt, Hans-Peter Kriegel and Jörg Sander, "Spatial Data Mining: Database Primitives, Algorithms and Efficient DBMS Support," Data Mining & Knowledge Discovery, vol. 4, no. 2-3, pp. 193-216, July 2000.

[5] I. Cadez, P. Smyth and H. Mannila, "Probabilistic modeling of transactional data with applications to profiling, visualization and prediction," Proc. of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, 2001, pp. 37-46.

[6] A. Foss, W. Wang and O. Zaïane, "A non-parametric approach to web log analysis," First SIAM ICDM Workshop on Web Mining, Chicago, 2001, pp. 41-50.

[7] P. K. Malay, "A Modified K-Means Algorithm to Avoid Empty Clusters," International Journal of Recent Trends in Engineering, vol. 1, no. 1, pp. 1-8, May 2009.

[8] Alireza Norouzi, Mohd Shafry Mohd Rahim, Ayman Altameem, Tanzila Saba, Abdolvahab Ehsani Rad, Amjad Rehman and Mueen Uddin, "Medical Image Segmentation Methods, Algorithms, and Applications," IETE Technical Review, vol. 31, no. 3, pp. 199-213, Jun. 2014.

[9] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651-666, June 2010.

[10] A. K. Jain, M. N. Murty and P. J. Flynn, "Data Clustering: A Review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, Sept. 1999.

[11] Preeti Arora, Deepali and Shipra Varshney, "Analysis of k-means and k-medoids algorithm for big data," Procedia Computer Science, vol. 78, pp. 507-512, 2015.

[12] Sepideh Yazdani, Rubiyah Yusof, Alireza Karimian, Mohsen Pashna and Amirshahram Hematian, "Image Segmentation Methods and Applications in MRI Brain Images," IETE Technical Review, vol. 32, no. 6, pp. 1-15, Jul. 2015.

[13] B. Zhang, M. Hsu and U. Dayal, "K-Harmonic Means: A Data Clustering Algorithm," Technical Report HPL-1999-124, Hewlett-Packard Laboratories, 1999.

[14] Wei-Chang Yeh, Chyh-Ming Lai and Kuei-Hu Chang, "A novel hybrid clustering approach based on k-harmonic means using robust design," Neurocomputing, vol. 173, no. P3, pp. 1720-1732, Jan. 2015.

[15] Anup Bhattacharya, Ragesh Jaiswal and Nir Ailon, "Tight lower bound instances for k-means++ in two dimensions," Theoretical Computer Science, vol. 634, no. C, pp. 55-56, June 2016.

[16] Faliu Yi and Inkyu Moon, "Extended K-Means Algorithm," Proc. of the 5th International Conference on Intelligent Human-Machine Systems and Cybernetics, 2013, pp. 263-219.

[17] Sebastián Maldonado, Emilio Carrizosa and Richard Weber, "Kernel Penalized K-means: A feature selection method based on kernel K-means," Information Sciences, vol. 322, no. 20, pp. 150-160, Nov. 2015.

[18] A. C. Fabregas, B. D. Gerardo and B. C. Tanguilig III, "Enhanced Initial Centroids for K-means Algorithm," International Journal of Information Technology and Computer Science, vol. 1, pp. 26-33, Jan. 2017.

[19] Haiyang Li, Hongzhou He and Yongge Wen, "Dynamic Particle Swarm Optimization and K-means clustering algorithm for image segmentation," Optik - International Journal for Light and Electron Optics, vol. 126, no. 24, pp. 4817-4822, Dec. 2015.

[20] R. Krishnamoorthy and S. Sreedhar Kumar, "An Improved Agglomerative Clustering Algorithm for Outlier Detection," Applied Mathematics and Information Sciences, vol. 10, no. 3, pp. 1141-1154, May 2016.

[21] Madheswaran M and Sreedhar Kumar S, "An Improved Frequency Based Agglomerative Clustering Algorithm for Detecting Distinct Clusters on Two Dimensional Dataset," Journal of Engineering and Technology Research, vol. 9, no. 4, pp. 30-41, Dec. 2017.

[22] The USC-SIPI Image Database, http://www.sipi.usc.edu

[23] J. Z. C. Lai and T. J. Huang, "An agglomerative clustering algorithm using a dynamic k-nearest-neighbor list," Information Sciences, vol. 181, no. 9, pp. 1722-1734, May 2011.

[24] Qi Yu, Xumin Liu, Xiangmin Zhou and Andy Song, "Efficient agglomerative hierarchical clustering," Expert Systems with Applications, vol. 42, no. 5, pp. 2785-2797, Apr. 2015.

[25] Yong Yang and Shuying Huang, "Image Segmentation by Fuzzy C-Means Clustering Algorithm with a Novel Penalty Term," Computing and Informatics, vol. 26, no. 1, pp. 17-31, 2007.


Recommended