
Wine Clustering Ling Lin. Contents ❏ Motivation ❏ Data ❏ Dimensionality Reduction-MDS, Isomap...

Date post: 06-Jan-2018
Upload: alexandra-sparks
Transcript
Page 1: Wine Clustering Ling Lin. Contents ❏ Motivation ❏ Data ❏ Dimensionality Reduction-MDS, Isomap ❏ Clustering-Kmeans, Ncut, Ratio Cut, SCC ❏ Conclustion.

Wine Clustering

Ling Lin


Contents
❏ Motivation
❏ Data
❏ Dimensionality Reduction: MDS, Isomap
❏ Clustering: Kmeans, Ncut, Ratio Cut, SCC
❏ Conclusion
❏ Reference


Motivation
• Clustering is a main task of exploratory data mining. Typical uses:

    Market segmentation to guide marketing strategies
    Document clustering
    Targeting appropriate treatment to patients with similar response patterns
    Image segmentation

• Apply clustering methods to a real data set


Data
➢ Wine data

Source of the data set: "Machine Learning Repository", University of California, Irvine.

Data set size: 14 variables (the class label plus the 13 attributes listed below) and 178 observations in 3 classes, each class a different cultivar.

Variables:

1) Alcohol
2) Malic acid
3) Ash
4) Alcalinity of ash
5) Magnesium
6) Total phenols
7) Flavanoids
8) Nonflavanoid phenols
9) Proanthocyanins
10) Color intensity
11) Hue
12) OD280/OD315 of diluted wines
13) Proline


MDS

Can I separate the objects better? ---> Change the way the distances are computed.


[Figure panels: City-block (L1) Distance, Chebychev Distance, Cosine Distance, Mahalanobis Distance]


Distances
• Euclidean Distance: the straight-line distance between two points.

• City-block Distance (L1 Distance): the sum of the absolute differences between two points over all coordinate dimensions.


Distances
• Chebychev Distance (Chessboard Distance): the greatest absolute difference between two points along any single coordinate dimension.

• Cosine Distance: one minus the cosine of the angle between two vectors.


Distances
• Mahalanobis Distance: the dissimilarity of two vectors $x$ and $y$, $d(x, y) = \sqrt{(x - y)^\top S^{-1} (x - y)}$, where $S$ is the covariance matrix.

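For two points whose coordinate differences are a and b (with a ≥ b) and whose straight-line separation is c:

Euclidean Distance = c

City-block Distance = a + b

Chebychev Distance = max(a, b) = a

Cosine Distance = 1 − cos(θ), where θ is the angle between the two vectors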
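The five distances on these slides can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function names, the two sample points, and the identity covariance matrix are hypothetical and do not come from the slides. The cosine distance follows the common convention of one minus the cosine of the angle.

```python
import numpy as np

def euclidean(x, y):
    # Straight-line distance
    return float(np.sqrt(np.sum((x - y) ** 2)))

def cityblock(x, y):
    # Sum of absolute coordinate differences (L1)
    return float(np.sum(np.abs(x - y)))

def chebychev(x, y):
    # Largest absolute coordinate difference
    return float(np.max(np.abs(x - y)))

def cosine_distance(x, y):
    # One minus the cosine of the angle between x and y
    cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(1.0 - cos_theta)

def mahalanobis(x, y, S):
    # sqrt((x - y)^T S^{-1} (x - y)); solve() avoids forming S^{-1} explicitly
    d = x - y
    return float(np.sqrt(d @ np.linalg.solve(S, d)))

# Hypothetical points whose coordinate differences are a = 4 and b = 3
x = np.array([1.0, 1.0])
y = np.array([5.0, 4.0])
S = np.eye(2)  # assumed covariance; with the identity, Mahalanobis equals Euclidean

print(euclidean(x, y), cityblock(x, y), chebychev(x, y))
print(cosine_distance(x, y), mahalanobis(x, y, S))
```

With the identity matrix as S the Mahalanobis distance reduces to the Euclidean one; a covariance estimated from the data rescales and decorrelates the variables first.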


MDS in 3D


MDS in 2D
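The slides do not say which MDS variant produced the 3D and 2D plots. As a minimal sketch, assuming classical (Torgerson) MDS, a distance matrix can be embedded as follows. The random matrix is a stand-in for the wine attributes, and `pairwise_cityblock` and `classical_mds` are hypothetical helper names.

```python
import numpy as np

def pairwise_cityblock(X):
    # n x n matrix of city-block distances between the rows of X
    return np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)

def classical_mds(D, n_components=2):
    # Double-centre the squared distances, then keep the top eigenvectors
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Non-Euclidean distances can give negative eigenvalues; clip them to zero
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 13))  # stand-in data: 30 observations, 13 attributes
D = pairwise_cityblock(X)

Y3 = classical_mds(D, n_components=3)  # coordinates for a 3D plot
Y2 = classical_mds(D, n_components=2)  # coordinates for a 2D plot
print(Y3.shape, Y2.shape)
```

Swapping `pairwise_cityblock` for another distance changes only the input matrix `D`; the embedding step stays the same, which is what makes it easy to compare distances on the same data.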


Isomap

[Figure panels: Cosine, Mahalanobis]


Isomap

[Figure panels: Cosine, Mahalanobis]
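For readers unfamiliar with Isomap, the usual recipe is: build a nearest-neighbour graph from the chosen distances, replace each distance by the shortest path through that graph, and embed the result with classical MDS. The sketch below follows that recipe under assumptions: the neighbourhood size, the helper name `isomap`, and the toy helix (standing in for the wine data) are all hypothetical, and the slides do not state which settings were used.

```python
import numpy as np

def isomap(D, n_neighbors=4, n_components=2):
    n = D.shape[0]
    # Rank neighbours by distance, ignoring each point's distance to itself
    masked = D.astype(float).copy()
    np.fill_diagonal(masked, np.inf)
    nearest = np.argsort(masked, axis=1)[:, :n_neighbors]
    # Neighbourhood graph: inf means "no edge"; symmetrise so edges are undirected
    G = np.full((n, n), np.inf)
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nearest.ravel()
    G[rows, cols] = D[rows, cols]
    G = np.minimum(G, G.T)
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall shortest paths approximate geodesic distances
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    if np.isinf(G).any():
        raise ValueError("graph is disconnected; increase n_neighbors")
    # Classical MDS on the geodesic distances
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

# Toy helix: a curved 1D structure sitting in 3D
t = np.linspace(0.0, 3.0 * np.pi, 60)
P = np.column_stack([np.cos(t), np.sin(t), t])
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)

Y = isomap(D, n_neighbors=4, n_components=1)  # "unrolled" 1D coordinate
print(Y.shape)
```

The function accepts any symmetric distance matrix, so a cosine or Mahalanobis matrix can be passed in place of the Euclidean one used here.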


Kmeans Clustering

Error rate = 0.03
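An error rate for a clustering is usually computed after matching cluster labels to class labels, because cluster numbering is arbitrary. The sketch below shows one way to do this: Lloyd's K-means followed by a search over label permutations. It is an illustration under assumptions, not the computation behind the 0.03 above: the synthetic blobs stand in for the wine data, and the farthest-first initialisation is chosen only to keep the example deterministic.

```python
import numpy as np
from itertools import permutations

def farthest_first_init(X, k):
    # Assumed initialisation: start at X[0], then repeatedly add the point
    # farthest from all centres chosen so far
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    centers = farthest_first_init(X, k)
    for _ in range(n_iter):
        # Assign each point to its nearest centre
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def error_rate(true, pred, k):
    # Try every relabelling of the clusters and keep the best agreement
    best = 0
    for perm in permutations(range(k)):
        mapped = np.array(perm)[pred]
        best = max(best, int(np.sum(mapped == true)))
    return 1.0 - best / len(true)

# Synthetic stand-in: three well-separated groups of 20 points each
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
true = np.repeat(np.arange(3), 20)
X = means[true] + rng.normal(scale=0.3, size=(60, 2))

pred = kmeans(X, 3)
print(error_rate(true, pred, 3))
```

Trying all permutations is fine for 3 clusters (6 relabellings); for many clusters an assignment solver would be the usual replacement.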


Clustering Comparison

[Figure panels: True Labeled, Kmeans Clustering, Normalized Cut, Ratio Cut, SCC]


Conclusion
• Dimensionality Reduction:

Different methods for calculating distances and reducing dimension ---> Wine data

            V                  X
3D MDS      Cosine Distance    Mahalanobis
2D MDS      Cosine Distance    Mahalanobis

Isomap gives a better display of the Mahalanobis distance.


Conclusion
• Clustering:

Kmeans = Rcut → SCC → Ncut

Ncut and Rcut consider both inter-cluster and intra-cluster connections.

However, in this dataset, the intra-cluster connections are weak.
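To make the role of inter-cluster and intra-cluster connections concrete, the two cut objectives can be evaluated directly for any partition of a weighted graph. The sketch below uses the standard textbook definitions, where each cluster's cut weight is divided by its size (Ratio Cut) or by its total degree (Normalized Cut); some texts add a factor of 1/2. The 4-node weight matrix and the name `cut_objectives` are hypothetical.

```python
import numpy as np

def cut_objectives(W, labels):
    # W: symmetric non-negative similarity matrix; labels: cluster index per node
    degrees = W.sum(axis=1)
    ratio_cut, ncut = 0.0, 0.0
    for c in np.unique(labels):
        inside = labels == c
        # Total weight of edges leaving cluster c (inter-cluster connections)
        cut = W[inside][:, ~inside].sum()
        ratio_cut += cut / inside.sum()       # normalise by cluster size
        ncut += cut / degrees[inside].sum()   # normalise by cluster volume
    return float(ratio_cut), float(ncut)

# Hypothetical graph: two tight pairs joined by one weak edge
W = np.array([
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

good = np.array([0, 0, 1, 1])  # cuts only the weak edge
bad = np.array([0, 1, 0, 1])   # cuts both strong edges

print(cut_objectives(W, good))
print(cut_objectives(W, bad))
```

The volume in the Normalized Cut denominator includes a cluster's internal edge weights, so when those intra-cluster weights are weak the normalisation has less to work with.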

