Machine Learning with WEKA
Eibe Frank
Department of Computer Science,
University of Waikato, New Zealand
WEKA: A Machine Learning Toolkit
The Explorer
• Classification and Regression
• Clustering
• Association Rules
• Attribute Selection
• Data Visualization
The Experimenter
The Knowledge Flow GUI
Conclusions
2/22/2011 University of Waikato 3
WEKA: the software
Machine learning/data mining software written in
Java (distributed under the GNU General Public License)
Used for research, education, and applications
Complements “Data Mining” by Witten & Frank
Main features:
Comprehensive set of data pre-processing tools,
learning algorithms and evaluation methods
Graphical user interfaces (incl. data visualization)
Environment for comparing learning algorithms
WEKA: versions
There are several versions of WEKA:
WEKA 3.0: “book version” compatible with
description in data mining book
WEKA 3.2: “GUI version” adds graphical user
interfaces (book version is command-line only)
WEKA 3.3: “development version” with lots of
improvements
This talk is based on the latest snapshot of WEKA
3.3 (soon to be WEKA 3.4)
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
WEKA only deals with “flat” files
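To make the flat-file layout concrete, here is a minimal Python sketch that reads the ARFF subset shown above (`@relation`, `@attribute`, `@data`, with `?` for a missing value). It is an illustration only, not WEKA's own Java loader: the function name `parse_arff` is hypothetical, and quoted values, sparse rows, and date or string attributes are not handled.

```python
SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Parse a small ARFF subset into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":              # "?" marks a missing value
                    row[name] = None
                elif kind == "numeric":
                    row[name] = float(value)
                else:                         # nominal: keep the label as text
                    row[name] = value
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):          # nominal: list of allowed labels
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE)
```

Each row becomes one dictionary keyed by attribute name, which is what "flat" means here: one instance per line, one value per attribute, no nested or relational structure.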
Explorer: pre-processing the data
Data can be imported from a file in various
formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL
database (using JDBC)
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for:
Discretization, normalization, resampling, attribute
selection, transforming and combining attributes, …
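As an illustration of what two of these filters compute, the sketch below normalizes a numeric attribute to [0, 1] and discretizes it into equal-width bins. It is written in plain Python and is not WEKA's filter code; the function names are hypothetical, equal-width binning is only one possible discretization strategy, and missing values are assumed to have been removed beforehand.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant attribute: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_equal_width(values, bins):
    """Map each value to a bin index in 0..bins-1, all bins equally wide."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum falls exactly on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Cholesterol values of the three complete instances in the ARFF example.
cholesterol = [233.0, 286.0, 229.0]
scaled = normalize(cholesterol)
binned = discretize_equal_width(cholesterol, 3)
```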
Explorer: building “classifiers”
Classifiers in WEKA are models for predicting
nominal or numeric quantities
Implemented learning schemes include:
Decision trees and lists, instance-based classifiers,
support vector machines, multi-layer perceptrons,
logistic regression, Bayes’ nets, …
“Meta”-classifiers include:
Bagging, boosting, stacking, error-correcting output
codes, locally weighted learning, …
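To give a feel for the simplest of these schemes, here is a sketch of an instance-based (1-nearest-neighbour) classifier in plain Python. It is not WEKA's implementation: the function name is hypothetical, Euclidean distance on unscaled attributes is an assumption, and the query instance is made up for illustration. The training values (age, cholesterol, class) are the three complete instances from the ARFF example.

```python
import math


def nearest_neighbour(train, query):
    """Return the class label of the training instance closest to query."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)   # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# (age, cholesterol) -> class
train = [
    ((63.0, 233.0), "not_present"),
    ((67.0, 286.0), "present"),
    ((67.0, 229.0), "present"),
]

# Hypothetical unseen patient: age 66, cholesterol 280.
prediction = nearest_neighbour(train, (66.0, 280.0))
```

In practice the attributes would be normalized first, since otherwise the attribute with the largest numeric range dominates the distance.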
Explorer: clustering data
WEKA contains “clusterers” for finding groups of
similar instances in a dataset
Implemented schemes are:
k-Means, EM, Cobweb, X-means, FarthestFirst
Clusters can be visualized and compared to “true”
clusters (if given)
Evaluation based on log-likelihood if the clustering
scheme produces a probability distribution
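The k-Means scheme named above can be sketched in a few lines of Python. This is a simplified illustration, not WEKA's clusterer: the function name is hypothetical, the first k points are used as initial centres (an assumption made to keep the run deterministic), and the data points are invented.

```python
import math


def k_means(points, k, iterations=20):
    """Alternate assignment and update steps until the centres stop moving."""
    centroids = list(points[:k])      # assumption: first k points as initial centres
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:                     # empty cluster: keep the old centre
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, assignment = k_means(points, 2)
```

If "true" cluster labels are available, the returned assignment can be compared against them, which is the kind of comparison the slide refers to.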
Explorer: finding associations
WEKA contains an implementation of the Apriori
algorithm for learning association rules
Works only with discrete data
Can identify statistical dependencies between
groups of attributes:
milk, butter ⇒ bread, eggs (with confidence 0.9 and
support 2000)
Apriori can compute all rules that have a given
minimum support and exceed a given confidence
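A minimal Python sketch of the idea behind Apriori: find all itemsets that reach a minimum support level by level, then derive rules whose confidence reaches a threshold. It is a simplified illustration rather than WEKA's implementation; the function names and the five shopping baskets are invented, support is counted as a number of transactions, and a rule is kept when its confidence is at least the threshold.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: only frequent sets are extended by one more item."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    while current:
        kept = [s for s in current if support(s, transactions) >= min_support]
        for s in kept:
            frequent[s] = support(s, transactions)
        # Candidates: unions of two kept sets that are exactly one item larger.
        size = len(kept[0]) + 1 if kept else 0
        current = list({a | b for a in kept for b in kept if len(a | b) == size})
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                # confidence = support(whole rule) / support(antecedent)
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, supp, confidence))
    return found


transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread"},
]
frequent = frequent_itemsets(transactions, min_support=2)
strong_rules = rules(frequent, min_confidence=0.9)
```

Looking up `frequent[antecedent]` is safe because every subset of a frequent itemset is itself frequent, which is the property that lets Apriori prune the search.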
Explorer: attribute selection
Panel that can be used to investigate which
(subsets of) attributes are the most predictive ones
Attribute selection methods contain two parts:
A search method: best-first, forward selection,
random, exhaustive, genetic algorithm, ranking
An evaluation method: correlation-based, wrapper,
information gain, chi-squared, …
Very flexible: WEKA allows (almost) arbitrary
combinations of these two
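One such combination, a ranking search with information gain as the evaluation method, can be sketched as follows. This is plain Python rather than WEKA's code, the function names are hypothetical, and the data are the nominal attributes of the four instances in the ARFF example, far too few for a meaningful ranking but enough to show the computation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Class entropy minus the expected entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def rank_attributes(rows, attributes, target):
    """Evaluate each attribute on its own and sort by score, best first."""
    scored = [(information_gain(rows, a, target), a) for a in attributes]
    return sorted(scored, reverse=True)


rows = [
    {"sex": "male", "chest_pain_type": "typ_angina",
     "exercise_induced_angina": "no", "class": "not_present"},
    {"sex": "male", "chest_pain_type": "asympt",
     "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "male", "chest_pain_type": "asympt",
     "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "female", "chest_pain_type": "non_anginal",
     "exercise_induced_angina": "no", "class": "not_present"},
]
ranking = rank_attributes(
    rows, ["sex", "chest_pain_type", "exercise_induced_angina"], "class"
)
```

Swapping `information_gain` for another scoring function, or the sort for a subset search, gives a different combination of evaluation and search method.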
Explorer: data visualization
Visualization very useful in practice: e.g. helps to
determine difficulty of the learning problem
WEKA can visualize single attributes (1-d) and
pairs of attributes (2-d)
To do: rotating 3-d visualizations (Xgobi-style)
Color-coded class values
“Jitter” option to deal with nominal attributes (and
to detect “hidden” data points)
“Zoom-in” function
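The jitter idea can be sketched in Python: nominal values plotted as integer codes land on top of each other, so a small random offset is added to each coordinate before plotting. This is an illustration only, not WEKA's plotting code; the function name, the uniform noise, and the fixed seed are assumptions.

```python
import random


def jitter(values, amount, seed=0):
    """Add uniform noise in [-amount, amount] to each plotting coordinate."""
    rng = random.Random(seed)     # fixed seed so the plot is reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


# Five instances with a nominal attribute coded as 0 or 1:
# without jitter they would be drawn as only two visible points.
codes = [0, 0, 1, 1, 1]
jittered = jitter(codes, 0.2)
```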
Conclusion: try it yourself!
WEKA is available at
http://www.cs.waikato.ac.nz/ml/weka
The site also lists projects based on WEKA
WEKA contributors:
Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard
Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger,
Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg,
Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert,
Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy,
Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang