Intelligent Database Systems Lab
Presenter : BEI-YI JIANG
Authors : Hongwu Qin, Xiuqin Ma, Tutut Herawan,
Jasni Mohamad Zain
2014. KBS
MGR: An information theory based hierarchical divisive clustering algorithm for categorical data
Outlines
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• Many algorithms for clustering categorical data have low
clustering accuracy, while others have high
computational complexity.
Objectives
• Propose a new hierarchical divisive clustering
algorithm for categorical data, termed MGR, based on
information theory.
• Achieve better clustering accuracy and efficiency.
Methodology
1. Information system
2. Mean gain ratio and entropy of cluster
3. Algorithm
4. Computational complexity
Methodology
• Information system
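The slide body did not survive extraction, so the following is only a minimal sketch of the usual idea: an information system is a table of objects described by categorical attributes, and each attribute induces a partition of the objects into equivalence classes. The attribute names, object ids, values, and the `partition` helper are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy information system: each object id maps to its values
# on the categorical attributes listed in ATTRIBUTES (illustrative data only).
ATTRIBUTES = ["color", "shape", "size"]
OBJECTS = {
    1: ("red", "round", "small"),
    2: ("red", "square", "small"),
    3: ("blue", "round", "large"),
    4: ("blue", "round", "small"),
}

def partition(objects, attr_index):
    """Group object ids by their value on one attribute.

    The result is the partition of the object set induced by that
    attribute: one equivalence class per distinct attribute value.
    """
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        classes[values[attr_index]].add(obj_id)
    return dict(classes)

by_color = partition(OBJECTS, ATTRIBUTES.index("color"))
```

Here `by_color` groups objects 1 and 2 under "red" and objects 3 and 4 under "blue"; a divisive method can treat such classes as candidate clusters.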
Methodology
• Mean gain ratio and entropy of cluster
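The formulas on these slides were lost, so the sketch below is an assumption rather than the authors' exact definitions. It uses the C4.5-style gain ratio (information gain divided by the entropy of the splitting attribute) and averages it over all other attributes to get a mean gain ratio for one attribute. All function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(column):
    """Shannon entropy (base 2) of a list of categorical values."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())

def conditional_entropy(target, given):
    """Entropy of `target` remaining after splitting the rows by `given`."""
    n = len(target)
    groups = {}
    for t, g in zip(target, given):
        groups.setdefault(g, []).append(t)
    return sum(len(ts) / n * entropy(ts) for ts in groups.values())

def gain_ratio(split, target):
    """Assumed C4.5-style ratio: information gain / entropy of `split`."""
    h_split = entropy(split)
    if h_split == 0:  # a single-valued attribute carries no split information
        return 0.0
    gain = entropy(target) - conditional_entropy(target, split)
    return gain / h_split

def mean_gain_ratio(columns, i):
    """Average gain ratio of attribute i with respect to every other attribute."""
    others = [c for j, c in enumerate(columns) if j != i]
    return sum(gain_ratio(columns[i], c) for c in others) / len(others)

# Illustrative data: column 0 fully determines column 1 and says nothing about column 2.
columns = [
    ["a", "a", "b", "b"],
    ["x", "x", "y", "y"],
    ["p", "q", "p", "q"],
]
score = mean_gain_ratio(columns, 0)
```

Under these assumptions, an attribute whose values predict the other attributes well gets a high score, which makes it a natural candidate for the attribute to split on.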
Methodology
• Algorithm
Methodology
• Example
Methodology
• Comparisons with MMR
Experiments
• manually label
• randomly select 100 English articles from Wikipedia
• labeled 3072 concepts that belong to 29044 categories (7780 relevant categories)
Conclusions
• MGR has better clustering accuracy and stability.
• MGR has better clustering efficiency and scalability.
Comments
• Advantages
– better clustering accuracy and stability
– without specifying the number of clusters
• Applications
– Categorical data
– Clustering