Unsupervised Learning (Examples)
Javier Bejar (CC BY-NC-SA)
Term 2010/2011
Javier Bejar cbea Unsupervised Learning (Examples) Term 2010/2011 1 / 25
Outline
1 Iris
2 Voting Records
3 Mushroom
4 Image Segmentation
Iris
Differentiate among three species of flowers (Iris)
4 continuous attributes
Attributes: measurements of the flowers (sepal and petal length and width)
150 instances
3 classes
96 % accuracy for supervised learning
Iris - Expectation/maximization
We use the EM algorithm looking for 3 clusters
Clusters are relatively clear; accuracy is slightly lower than with supervised learning
  0  1  2  <-- assigned to cluster
  0 50  0 | Iris-setosa
 50  0  0 | Iris-versicolor
 14  0 36 | Iris-virginica
Cluster 0 <-- Iris-versicolor
Cluster 1 <-- Iris-setosa
Cluster 2 <-- Iris-virginica
Incorrectly clustered instances : 14.0 9.3333 %
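The error figure above can be reproduced from the class-by-cluster table alone. The following is a minimal sketch, assuming one cluster per class and an exhaustive search over cluster-to-class mappings; the function name `classes_to_clusters_error` is illustrative and not taken from any tool.

```python
from itertools import permutations

def classes_to_clusters_error(matrix):
    """matrix[i][j] = number of instances of class i assigned to cluster j.

    Tries every one-to-one mapping of clusters to classes and keeps the one
    with the fewest misclustered instances. Assumes a square matrix
    (as many clusters as classes).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    best_mapping, best_correct = None, -1
    # perm[j] is the index of the class assigned to cluster j
    for perm in permutations(range(n)):
        correct = sum(matrix[perm[j]][j] for j in range(n))
        if correct > best_correct:
            best_mapping, best_correct = perm, correct
    errors = total - best_correct
    return best_mapping, errors, 100.0 * errors / total

# Table from the slide (rows: classes, columns: clusters 0, 1, 2)
iris_em = [
    [0, 50, 0],   # Iris-setosa
    [50, 0, 0],   # Iris-versicolor
    [14, 0, 36],  # Iris-virginica
]
mapping, errors, pct = classes_to_clusters_error(iris_em)
print(mapping, errors, round(pct, 4))
```

For this table the best mapping sends cluster 0 to Iris-versicolor, cluster 1 to Iris-setosa and cluster 2 to Iris-virginica, leaving the 14 Iris-virginica instances in cluster 0 as errors.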
Iris - K-means
K-means algorithm looking for 3 clusters
Clusters are relatively clear, but the overlap between clusters affects the assignment
  0  1  2  <-- assigned to cluster
  0 50  0 | Iris-setosa
 47  0  3 | Iris-versicolor
 14  0 36 | Iris-virginica
Cluster 0 <-- Iris-versicolor
Cluster 1 <-- Iris-setosa
Cluster 2 <-- Iris-virginica
Incorrectly clustered instances : 17.0 11.3333 %
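As a reminder of what K-means does at each iteration, here is a minimal sketch of Lloyd's algorithm in plain Python. The points and the initial centroids are hypothetical toy values, not the Iris data, and the initialization (one seed point per group) is an assumption made to keep the run deterministic.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on tuples of floats, starting from given centroids."""
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing: converged
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

# Hypothetical 2-D points forming three well-separated groups
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9),
]
assignment, centroids = kmeans(points, [points[0], points[3], points[6]])
print(assignment)
```

Because every point is forced into exactly one cluster, instances lying in the region where two groups overlap end up on whichever side has the nearer centroid, which is consistent with the extra errors seen between Iris-versicolor and Iris-virginica.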
Voting Records
Classify members of the US House of Representatives by their votes
16 binary attributes
Attributes: Vote of each representative on different proposals (budget, immigration, taxes, military aid, ...)
435 instances
2 classes
96.3 % accuracy for supervised learning
Visualization of the data set is very difficult (binary attributes!)
Voting Records - PCA
PCA is used to obtain a new set of attributes
The data set does not satisfy the assumptions of PCA (non-Gaussian data)
The first 3 components explain 60 % of the variance (the first one alone explains 45 %); all components are needed to reach 95 % of the variance
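The percentages quoted for PCA come from the eigenvalues of the covariance matrix: each component's share is its eigenvalue divided by the sum of all eigenvalues. The sketch below shows that arithmetic; the eigenvalues are hypothetical and are not those of the voting data, and both function names are illustrative.

```python
def explained_variance(eigenvalues):
    """Fraction of total variance carried by each component, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [value / total for value in ordered]

def components_needed(eigenvalues, target):
    """Smallest number of leading components whose cumulative share
    of variance reaches `target` (a fraction between 0 and 1)."""
    cumulative = 0.0
    ratios = explained_variance(eigenvalues)
    for count, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        if cumulative >= target:
            return count
    return len(ratios)

# Hypothetical eigenvalues of a covariance matrix (not the voting data)
eigenvalues = [5.0, 2.0, 1.5, 1.0, 0.5]
print(explained_variance(eigenvalues))
print(components_needed(eigenvalues, 0.60))
print(components_needed(eigenvalues, 0.90))
```

A spectrum that decays slowly, as reported for the voting records, means that a 2-D or 3-D projection discards a large part of the variance.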
Voting Records - Expectation-maximization
EM algorithm is applied looking for 2 clusters
Clusters are not well separated and the error is large
   0   1  <-- assigned to cluster
  44 223 | democrat
 159   9 | republican
Cluster 0 <-- republican
Cluster 1 <-- democrat
Incorrectly clustered instances : 53.0 12.1839 %
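The slides do not say which mixture model EM was run with. For binary attributes such as votes, one natural choice is a mixture of independent Bernoulli distributions, and the sketch below assumes exactly that. The data, the initial parameters and the add-one smoothing are all hypothetical choices made for illustration.

```python
import math

def bernoulli_mixture_em(data, theta, weights, iterations=50):
    """EM for a mixture of independent Bernoulli attributes.

    data:    list of 0/1 vectors
    theta:   theta[c][d] = initial P(attribute d = 1 | cluster c)
    weights: initial mixing proportions, one per cluster
    Returns (weights, theta, responsibilities).
    """
    n, dims, k = len(data), len(data[0]), len(weights)
    theta = [row[:] for row in theta]
    weights = weights[:]
    resp = []
    for _ in range(iterations):
        # E-step: posterior probability of each cluster for each instance.
        resp = []
        for x in data:
            logp = [
                math.log(weights[c]) + sum(
                    math.log(theta[c][d] if x[d] else 1.0 - theta[c][d])
                    for d in range(dims)
                )
                for c in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            z = sum(unnorm)
            resp.append([u / z for u in unnorm])
        # M-step: re-estimate parameters from the soft assignments.
        for c in range(k):
            mass = sum(r[c] for r in resp)
            weights[c] = mass / n
            for d in range(dims):
                ones = sum(r[c] * x[d] for r, x in zip(resp, data))
                # Add-one smoothing keeps theta strictly inside (0, 1).
                theta[c][d] = (ones + 1.0) / (mass + 2.0)
    return weights, theta, resp

# Hypothetical voting vectors: 4 proposals, 8 voters in two blocs
votes = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1],
]
init_theta = [[0.6, 0.6, 0.4, 0.4], [0.4, 0.4, 0.6, 0.6]]
weights, theta, resp = bernoulli_mixture_em(votes, init_theta, [0.5, 0.5])
hard = [max(range(2), key=lambda c: r[c]) for r in resp]
print(hard)
```

Unlike K-means, each instance keeps a probability of belonging to every cluster; the hard assignment is only taken at the end by picking the most probable cluster.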
Voting Records - K-means
K-means algorithm is applied looking for 2 clusters
The error is larger than with EM because of the overlap between clusters
   0   1  <-- assigned to cluster
  50 217 | democrat
 157  11 | republican
Cluster 0 <-- republican
Cluster 1 <-- democrat
Incorrectly clustered instances : 61.0 14.023 %
Mushroom
Distinguish between poisonous and edible mushrooms
22 attributes, binary and nominal
Attributes: Visible characteristics of the mushrooms
About 8000 instances
2 classes
100 % accuracy for supervised learning
Visualization using the original attributes is difficult (binary and nominal attributes!)
Mushroom - PCA
PCA is used to obtain a new set of attributes
The data set does not satisfy the assumptions of PCA (non-Gaussian data)
The first 10 components explain only 50 % of the variance. All components are needed to explain 95 % of the variance (PCA has 59 components).
Mushroom - Expectation/maximization
EM algorithm is applied looking for 2 clusters
Clusters are not well separated and the error is large
It is probably more interesting to look for more clusters and analyze them (the data set has more structure than the supervised labels show)
    0    1  <-- assigned to cluster
 4208    0 | e
  836 3080 | p
Cluster 0 <-- e
Cluster 1 <-- p
Incorrectly clustered instances : 836.0 10.2905 %
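One simple way to start analyzing the clusters is to look at how pure each one is with respect to the class labels. The sketch below computes, for each cluster in the table above, its majority class, its size and the fraction of instances belonging to that class; the helper name `cluster_purity` is illustrative.

```python
def cluster_purity(matrix, class_names):
    """matrix[i][j] = instances of class i in cluster j.
    Returns, per cluster, a tuple (majority class, size, purity)."""
    report = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        size = sum(column)
        top = max(range(len(column)), key=lambda i: column[i])
        purity = column[top] / size if size else 0.0  # guard for empty clusters
        report.append((class_names[top], size, purity))
    return report

# Table from the slide (rows: classes e and p, columns: clusters 0 and 1)
mushroom_em = [
    [4208, 0],    # e (edible)
    [836, 3080],  # p (poisonous)
]
report = cluster_purity(mushroom_em, ["e", "p"])
for name, size, purity in report:
    print(name, size, round(purity, 4))
```

Cluster 1 contains only poisonous mushrooms, while cluster 0 mixes all the edible ones with 836 poisonous ones, so cluster 0 is the obvious candidate for further splitting when more clusters are requested.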
Mushroom - Expectation/maximization + attribute selection
We are cheating :-) (attribute selection uses the class labels)
A wrapper using decision trees is used to find the relevant attributes (5 relevant attributes)
EM algorithm is applied looking for 2 clusters
    0    1  <-- assigned to cluster
 4000  208 | e
  528 3388 | p
Cluster 0 <-- e
Cluster 1 <-- p
Incorrectly clustered instances : 736.0 9.0596 %
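A wrapper selects attributes by repeatedly training a classifier on candidate subsets and keeping the subset that scores best, which is why it needs the class labels. The sketch below illustrates the idea with greedy forward selection. Note the substitutions: instead of a decision tree it uses a simple lookup classifier scored on the training data, and the rows, attribute values and labels are hypothetical.

```python
from collections import Counter, defaultdict

def training_accuracy(rows, labels, attrs):
    """Accuracy of a lookup classifier that predicts the majority label
    for each combination of values of the chosen attributes.
    (Stand-in for the decision tree mentioned on the slide.)"""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row[a] for a in attrs)][label] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(rows)

def forward_selection(rows, labels, score=training_accuracy):
    """Greedy wrapper: keep adding the attribute that most improves the score."""
    selected, best = [], score(rows, labels, [])
    remaining = list(range(len(rows[0])))
    while remaining:
        # (score, -index) so that ties are broken in favor of the lowest index
        candidates = [(score(rows, labels, selected + [a]), -a) for a in remaining]
        top_score, neg_attr = max(candidates)
        if top_score <= best:
            break  # no remaining attribute improves the score
        best = top_score
        selected.append(-neg_attr)
        remaining.remove(-neg_attr)
    return selected, best

# Hypothetical nominal data: the label depends only on attribute 1
rows = [
    ("x", "n", "w"), ("x", "f", "w"), ("b", "n", "w"),
    ("b", "f", "y"), ("x", "n", "y"), ("b", "f", "w"),
]
labels = ["e", "p", "e", "p", "e", "p"]
selected, best = forward_selection(rows, labels)
print(selected, best)
```

Clustering only on the selected attributes is no longer purely unsupervised, since the labels have already shaped the attribute space the clusters are sought in.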
Mushroom - K-means
K-means algorithm is applied looking for 2 clusters
The result is awful: the overlap between classes is large and there is no good partition of the data
    0    1  <-- assigned to cluster
 1234 2974 | e
 2093 1823 | p
Cluster 0 <-- p
Cluster 1 <-- e
Incorrectly clustered instances: 3057.0 37.6292 %
Image Segmentation
Clustering for Image Processing