1
Running Clustering Algorithm in Weka
Presented by Rachsuda Jiamthapthaksin
Computer Science Department
University of Houston
2
What is Weka?
• Data mining software in Java
  – Supervised learning (classification)
  – Unsupervised learning (clustering)
• Tools
  – Exploration
  – Visualization
  – Experiment
  – Statistical summary
3
Download Weka
• http://www.cs.waikato.ac.nz/ml/weka/
  – Windows (weka-3-5-6jre.exe)
  – Linux
4
Getting Started
5
Memory Limitation in Weka
• Run the GUI Chooser from the DOS prompt to increase the memory available to Weka
• C:\> java -Xmx128m -classpath .;/progra~1/weka-3-5/weka.jar weka.gui.GUIChooser
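To check that the -Xmx option took effect, the JVM can be asked for its heap limit. The following is a minimal sketch using only the standard Runtime API; the class name HeapCheck is made up for illustration and is not part of Weka.

```java
// Hypothetical helper (not part of Weka): reports the JVM heap limit.
class HeapCheck {
    // Maximum amount of memory the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

When started with java -Xmx128m HeapCheck, the reported value should be roughly 128 MB; the exact figure can be slightly lower depending on the JVM.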
6
Weka GUI
7
Explorer
8
Open Files (.csv, .arff)
9
Dataset’s Description
Attributes
Dataset’s statistics
10
Remove Class Attribute
Non-class attributes
11
Select A Clustering Algorithm
12
Select A Clustering Algorithm
13
Select A Clustering Algorithm
14
Parameter Settings
15
Run A Clustering Algorithm
16
DBSCAN Results
=== Run information ===
Scheme: weka.clusterers.DBScan -E 0.9 -M 6 -I weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase -D weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Relation: iris-weka.filters.unsupervised.attribute.Remove-R5
Instances: 150
Attributes: 4
  sepallength
  sepalwidth
  petallength
  petalwidth
Test mode: evaluate on training data
=== Model and evaluation on training set ===
DBScan clustering results
========================================================================================
Clustered DataObjects: 150
Number of attributes: 4
Epsilon: 0.9; minPoints: 6
Index: weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase
Distance-type: weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Number of generated clusters: 1
Elapsed time: .06
(  0.) 5.1,3.5,1.4,0.2 --> 0
(  1.) 4.9,3,1.4,0.2 --> 0
(  2.) 4.7,3.2,1.3,0.2 --> 0
(  3.) 4.6,3.1,1.5,0.2 --> 0
(  4.) 5,3.6,1.4,0.2 --> 0
…
(146.) 6.3,2.5,5,1.9 --> 0
(147.) 6.5,3,5.2,2 --> 0
(148.) 6.2,3.4,5.4,2.3 --> 0
(149.) 5.9,3,5.1,1.8 --> 0
Clustered Instances
0  150 (100%)
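The roles of Epsilon and minPoints in the run above can be illustrated with a small, self-contained DBSCAN sketch. This is not Weka's DBScan class: the class and method names are hypothetical, the distance is plain Euclidean distance on raw values, and a point counts itself among its neighbors (an assumption of this sketch), so the labels are not expected to match Weka's output for the same parameters.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Illustrative DBSCAN sketch (hypothetical class, not Weka's DBScan).
class DbscanSketch {
    static final int UNVISITED = -2;
    static final int NOISE = -1;

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Indices of all points within eps of points[p], including p itself.
    static List<Integer> neighbors(double[][] points, int p, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int q = 0; q < points.length; q++) {
            if (distance(points[p], points[q]) <= eps) {
                result.add(q);
            }
        }
        return result;
    }

    // Returns one label per point: a cluster number (0, 1, ...) or NOISE.
    static int[] cluster(double[][] points, double eps, int minPoints) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int nextCluster = 0;
        for (int p = 0; p < points.length; p++) {
            if (labels[p] != UNVISITED) continue;
            List<Integer> seeds = neighbors(points, p, eps);
            if (seeds.size() < minPoints) {
                labels[p] = NOISE; // may later be claimed as a border point
                continue;
            }
            labels[p] = nextCluster; // p is a core point: start a new cluster
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int q = queue.poll();
                if (labels[q] == NOISE) labels[q] = nextCluster; // border point
                if (labels[q] != UNVISITED) continue;
                labels[q] = nextCluster;
                List<Integer> qNeighbors = neighbors(points, q, eps);
                if (qNeighbors.size() >= minPoints) {
                    queue.addAll(qNeighbors); // q is also a core point: keep expanding
                }
            }
            nextCluster++;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Small made-up sample: two dense groups and one isolated point.
        double[][] pts = {{1.4, 0.2}, {1.5, 0.2}, {1.3, 0.2},
                          {5.0, 1.9}, {5.1, 1.8}, {5.2, 2.0}, {9.0, 9.0}};
        System.out.println(Arrays.toString(cluster(pts, 0.3, 3)));
        // prints [0, 0, 0, 1, 1, 1, -1]
    }
}
```

A larger epsilon or a smaller minPoints makes it easier for neighborhoods to qualify as dense, which tends to merge points into fewer, larger clusters.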
17
Simplify A Tested Dataset
18
Simplify A Tested Dataset
19
Parameter Settings
20
DBSCAN Clustering Results
=== Run information ===
Scheme: weka.clusterers.DBScan -E 0.3 -M 50 -I weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase -D weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Relation: iris-weka.filters.unsupervised.attribute.Remove-R1-2,5
Instances: 150
Attributes: 2
  petallength
  petalwidth
Test mode: evaluate on training data
=== Model and evaluation on training set ===
DBScan clustering results
========================================================================================
Clustered DataObjects: 150
Number of attributes: 2
Epsilon: 0.3; minPoints: 50
Index: weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase
Distance-type: weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Number of generated clusters: 2
Elapsed time: .03
(  0.) 1.4,0.2 --> 0
(  1.) 1.4,0.2 --> 0
(  2.) 1.3,0.2 --> 0
(  3.) 1.5,0.2 --> 0
…
(146.) 5,1.9 --> 1
(147.) 5.2,2 --> 1
(148.) 5.4,2.3 --> 1
(149.) 5.1,1.8 --> 1
Clustered Instances
0   50 ( 33%)
1  100 ( 67%)
21
Run k-Means in Weka
22
Parameter Settings
23
k-Means Clustering Results
=== Run information ===
Scheme: weka.clusterers.SimpleKMeans -N 2 -S 10
Relation: iris-weka.filters.unsupervised.attribute.Remove-R1-2,5
Instances: 150
Attributes: 2
  petallength
  petalwidth
Test mode: evaluate on training data
=== Model and evaluation on training set ===
kMeans
======
Number of iterations: 6
Within cluster sum of squared errors: 5.179687509974782
Cluster centroids:
Cluster 0
  Mean/Mode: 4.906 1.676
  Std Devs: 0.8256 0.4248
Cluster 1
  Mean/Mode: 1.464 0.244
  Std Devs: 0.1735 0.1072
Clustered Instances
0  100 ( 67%)
1   50 ( 33%)
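The quantities reported above (iterations, centroids, within-cluster sum of squared errors) can be illustrated with a minimal k-means sketch. This is not Weka's SimpleKMeans: the class name, the fixed starting centroids, and the sample points are assumptions made for illustration, and the sketch works on raw attribute values, so its numbers are not expected to match the Weka output.

```java
import java.util.Arrays;

// Illustrative k-means (Lloyd's algorithm) sketch; hypothetical class, not Weka's SimpleKMeans.
class KMeansSketch {
    // Squared Euclidean distance between two points.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Alternates assignment and centroid update until assignments stop changing.
    // The centroids array is updated in place.
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) break;
            // Recompute each centroid as the mean of its assigned points.
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[centroids[c].length];
                int count = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assignment[i] != c) continue;
                    for (int d = 0; d < sum.length; d++) sum[d] += points[i][d];
                    count++;
                }
                if (count == 0) continue; // leave an empty cluster's centroid unchanged
                for (int d = 0; d < sum.length; d++) centroids[c][d] = sum[d] / count;
            }
        }
        return assignment;
    }

    // Within-cluster sum of squared errors for a given assignment.
    static double sumOfSquaredErrors(double[][] points, double[][] centroids, int[] assignment) {
        double sse = 0.0;
        for (int i = 0; i < points.length; i++) {
            sse += squaredDistance(points[i], centroids[assignment[i]]);
        }
        return sse;
    }

    public static void main(String[] args) {
        // Made-up (petallength, petalwidth)-style sample points.
        double[][] pts = {{1.4, 0.2}, {1.3, 0.2}, {1.5, 0.2}, {5.0, 1.9}, {5.2, 2.0}, {5.4, 2.3}};
        double[][] centroids = {{1.4, 0.2}, {5.0, 1.9}}; // hypothetical starting centroids
        int[] assignment = cluster(pts, centroids, 100);
        System.out.println("Assignment: " + Arrays.toString(assignment));
        System.out.println("SSE: " + sumOfSquaredErrors(pts, centroids, assignment));
    }
}
```

A lower sum of squared errors means the points sit closer to their cluster centroids, which is the quantity k-means tries to reduce at each iteration.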
24
ArffViewer: Convert Dataset’s Extension
25
Open A Dataset’s file
26
Select A Dataset’s File
27
View the Dataset
28
Manipulate the Dataset (Optional)
29
Save As .arff File
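For reference, the conversion that ArffViewer performs through its menus can be approximated in a few lines of plain Java. This is only a sketch under strong assumptions: the class name is hypothetical, the CSV has a header row and no quoted fields, and every column is treated as numeric. It does not use Weka and does not handle nominal attributes or missing values.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative CSV-to-ARFF sketch (hypothetical class, not Weka's ArffViewer).
class CsvToArffSketch {
    // Assumes a header row, comma-separated values without quoting,
    // and that every column is numeric.
    static String convert(String relationName, List<String> csvLines) {
        StringBuilder arff = new StringBuilder();
        arff.append("@relation ").append(relationName).append("\n\n");
        // One @attribute declaration per CSV header column.
        for (String name : csvLines.get(0).split(",")) {
            arff.append("@attribute ").append(name.trim()).append(" numeric\n");
        }
        arff.append("\n@data\n");
        // Data rows are copied as-is, skipping blank lines.
        for (String row : csvLines.subList(1, csvLines.size())) {
            if (!row.trim().isEmpty()) {
                arff.append(row.trim()).append("\n");
            }
        }
        return arff.toString();
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList("petallength,petalwidth", "1.4,0.2", "5.1,1.8");
        System.out.print(convert("iris", csv));
    }
}
```

The output starts with an @relation line, lists one @attribute line per column, and places the rows after @data, which is the basic layout of an .arff file.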
30
Weka Documentation