
João Ferreira Nunes

Instituto Politécnico de Viana do Castelo

Faculdade de Engenharia da Universidade do Porto

DSIE’10 – 2010.01.29

Binary Images Clustering with k-means


• Introduction

• The Process

– Dataset

– Pre-processing

– Clustering Analysis

• Results

• Conclusions and Future Work

• Develop a method to group binary images with respect to their content by means of an unsupervised learning technique, k-means;

• Use a set of clustering quality criteria to validate the clustering and to assist in selecting the best number of clusters.

Image Clustering Process

Data Collection → Pre-processing → Features Extraction → Clustering Analysis → Image Clusters



MPEG-7 Core Experiment CE-Shape-1

Dataset characteristics

• Binary silhouette images that represent objects;

• Their shape may change due to:

– a change of viewpoint with respect to the objects;

– non-rigid object motion;

– noise resulting from segmentation or digitization;

• Some images have holes, while others do not;

• Some images have undergone a number of transformations, such as scaling, distortion, cuts and rotation;

• The size of the images is not constant.

Image samples

Experimental Dataset

apple-15 bell-3 heart-10 device3-12 cup-2

spoon-5 bone-13 guitar-7 key-16 hammer-13

horseshoe-5 horse-4 fork-7

Pre-processing


• Extraction of the ROI (region of interest)

– cropping the images to their bounding box

• Noise reduction

– morphological closing filter
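
The two pre-processing steps can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used in the work: it assumes images are lists of rows of 0/1 pixels with 1 marking the object, and a 3x3 structuring element, neither of which is stated on the slides. The names crop_to_bounding_box and close_image are hypothetical.

```python
from typing import Callable, List

Image = List[List[int]]  # rows of 0/1 pixels; 1 = object (assumed convention)


def crop_to_bounding_box(img: Image) -> Image:
    """Return the smallest sub-image that contains every object pixel."""
    rows = [r for r, row in enumerate(img) if any(row)]
    if not rows:  # no object pixels: nothing to keep
        return []
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def _filter_3x3(img: Image, op: Callable[[List[int]], int], outside: int) -> Image:
    """Apply op to every 3x3 neighbourhood; pixels beyond the border read as `outside`."""
    h, w = len(img), len(img[0])
    return [
        [
            op([
                img[rr][cc] if 0 <= rr < h and 0 <= cc < w else outside
                for rr in (r - 1, r, r + 1)
                for cc in (c - 1, c, c + 1)
            ])
            for c in range(w)
        ]
        for r in range(h)
    ]


def close_image(img: Image) -> Image:
    """Morphological closing: dilation (max) followed by erosion (min)."""
    dilated = _filter_3x3(img, max, outside=0)
    # Treat the outside as object during erosion so the image border is not eaten away.
    return _filter_3x3(dilated, min, outside=1)
```

Closing fills small holes and gaps in the silhouette without shrinking it, and cropping discards the empty margin around the object.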


Features Extraction

• Solidity: Area / Convex Hull Area

• Axis Ratio: ratio between the Minor Axis Length and the Major Axis Length

• Filled Ratio: ratio between the Area and the Filled Area

• Perimeter-Area Ratio: Perimeter / Area

• Eccentricity: eccentricity of the ellipse

• Extent: Area / Bounding Box Area

• Invariant moment (skew invariant)
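
Two of the listed features, Extent and Perimeter-Area Ratio, can be sketched in pure Python as below. The sketch assumes that Extent is the object area divided by the bounding box area, that the area is the number of object pixels, and that the perimeter is the number of object pixels touching the background or the image border (4-connectivity). The perimeter convention in particular is an assumption, since the slides do not define it, and all function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of 0/1 pixels; 1 = object (assumed convention)


def object_pixels(img: Image) -> List[Tuple[int, int]]:
    """Coordinates (row, col) of every object pixel."""
    return [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]


def extent(img: Image) -> float:
    """Object area divided by the area of its bounding box."""
    pts = object_pixels(img)
    rows = [r for r, _ in pts]
    cols = [c for _, c in pts]
    box_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return len(pts) / box_area


def perimeter(img: Image) -> int:
    """Number of object pixels with a 4-neighbour that is background or off-image."""
    h, w = len(img), len(img[0])
    count = 0
    for r, c in object_pixels(img):
        neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        if any(not (0 <= rr < h and 0 <= cc < w) or img[rr][cc] == 0
               for rr, cc in neighbours):
            count += 1
    return count


def perimeter_area_ratio(img: Image) -> float:
    """Perimeter divided by area, both measured in pixels."""
    return perimeter(img) / len(object_pixels(img))
```

Both functions expect at least one object pixel. Extent does not change when the object is scaled, which matters because the image size is not constant across the dataset; the Perimeter-Area Ratio as written here is measured in pixels and does depend on scale.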

Experimental Dataset Features Representation

apple-15 bell-3 heart-10 device3-12 cup-2

spoon-5 bone-13 guitar-7 key-16 hammer-13

horseshoe-5 horse-4 fork-7

K-means Clustering

• Images are allocated into K different sets, according to their level of similarity (Euclidean distance);

• Minimizes the intra-cluster distance and maximizes the inter-cluster distance;

• The value of K “should” be known in advance;
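
The allocation described above can be sketched with the standard k-means iteration (Lloyd's algorithm): assign each feature vector to its nearest centroid by Euclidean distance, recompute the centroids, and repeat until the assignment stops changing. This is a minimal illustration under stated assumptions, not the implementation used in the work: the random initialisation, the seed and the function names are hypothetical choices.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]  # one feature vector per image


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Return (labels, centroids) after at most `iters` Lloyd iterations."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

A call such as kmeans(features, k=6) would return one cluster label per image, where features is a hypothetical list holding one feature vector per image.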

K-means Clustering – Getting the best K

• Several clustering runs were conducted, varying k from 3 to 20;

• Internal criteria (which need no a priori information) were computed to validate each clustering solution:

– Silhouette index

– Calinski-Harabasz index

– C index

– weighted inter-intra index
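
As an illustration of how one of these criteria scores a clustering solution, the sketch below computes the silhouette index in pure Python. For each point, a is the mean distance to the other members of its cluster and b is the lowest mean distance to the members of any other cluster; the point's score is (b - a) / max(a, b), and the index is the mean over all points. The use of Euclidean distance, the score of 0 for single-member clusters and the function names are assumptions of this sketch.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_index(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette over all points; needs at least two clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:  # single-member cluster: score taken as 0 (assumed convention)
            scores.append(0.0)
            continue
        a = sum(euclidean(p, q) for q in own) / len(own)
        b = min(
            sum(euclidean(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Values close to 1 indicate compact, well separated clusters, so one way to pick the number of clusters is to run the clustering for each candidate k and keep the k with the highest index.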

Internal criteria
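
The Calinski-Harabasz index listed among the criteria can be sketched in the same way. It is the ratio of the between-cluster dispersion B, divided by k - 1, to the within-cluster dispersion W, divided by n - k, where n is the number of points and k the number of clusters. The function name is hypothetical, and the sketch assumes 1 < k < n and a non-zero within-cluster dispersion.

```python
from typing import List, Sequence

Point = Sequence[float]


def calinski_harabasz(points: List[Point], labels: List[int]) -> float:
    """(B / (k - 1)) / (W / (n - k)) using squared Euclidean distances."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    dims = range(len(points[0]))
    overall = [sum(p[d] for p in points) / n for d in dims]
    between = 0.0  # B: weighted squared distances of centroids to the overall mean
    within = 0.0   # W: squared distances of points to their own centroid
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in dims]
        between += len(members) * sum((centroid[d] - overall[d]) ** 2 for d in dims)
        within += sum((p[d] - centroid[d]) ** 2 for p in members for d in dims)
    return (between / (k - 1)) / (within / (n - k))
```

Higher values correspond to clusters that are tight and far from one another.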

Results




Class        Non-zero counts (in column order, C1 to C6)    Total
apple        20                                             20
bell         20                                             20
bone         18, 2                                          20
cup          12, 8                                          20
device3      15, 5                                          20
fork         15, 5                                          20
guitar       1, 19                                          20
hammer       8, 12                                          20
heart        19                                             20
horse        20, 1                                          20
horseshoe    20                                             20
key          19, 1                                          20
spoon        8, 12                                          20

Cluster totals (C1 to C6): 28, 48, 54, 57, 25, 48


Conclusions and Future Work

• The results achieved are encouraging and suggest that the selected features are adequate;

• Future Work

– Explore new features

– Consider weighting features

– Enlarge the dataset

– Compare with other clustering methods

– Develop a CBIR system using a supervised method (e.g., k-nearest neighbor)

