Machine Learning with WEKA

Eibe Frank
Department of Computer Science, University of Waikato, New Zealand

• WEKA: A Machine Learning Toolkit
• The Explorer
  - Classification and Regression
  - Clustering
  - Association Rules
  - Attribute Selection
  - Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions

10/19/2007 University of Waikato 2

WEKA: the bird

Copyright: Martin Kramer

WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  - Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  - Graphical user interfaces (incl. data visualization)
  - Environment for comparing learning algorithms

WEKA: versions
• There are several versions of WEKA:
  - WEKA 3.0: “book version” compatible with description in data mining book
  - WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
  - WEKA 3.3: “development version” with lots of improvements
• This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...

WEKA only deals with “flat” files
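
To make the structure of the ARFF example concrete, here is a minimal sketch of a reader for it. It is not WEKA's own loader: the function name `parse_arff` is hypothetical, and it assumes only numeric and nominal attributes, unquoted values, and `?` for missing values. The elided `...` row from the slide is left out of the sample text.

```python
import re

ARFF_TEXT = """\
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

def parse_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows).

    Hypothetical helper: handles numeric and nominal attributes only.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):    # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row[name] = None             # missing value
                elif kind == "numeric":
                    row[name] = float(value)
                else:
                    row[name] = value            # nominal label
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            match = re.fullmatch(r"\{(.*)\}", spec.strip())
            if match:                            # nominal: list of labels
                kind = [label.strip() for label in match.group(1).split(",")]
            else:
                kind = spec.strip().lower()      # e.g. "numeric"
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows

relation, attributes, rows = parse_arff(ARFF_TEXT)
```

Each row becomes one flat record keyed by attribute name, which is the sense in which the file is “flat”: one table, one instance per line, no nested or linked structures.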

Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  - Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
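
As an illustration of what two of these filters do, the sketch below applies min-max normalization and equal-width discretization to the cholesterol values from the ARFF example. The function names are hypothetical and the behaviour is an assumption about the general techniques, not a description of WEKA's filter classes or their options.

```python
def normalize(values):
    """Min-max scale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant attribute: nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def discretize_equal_width(values, bins):
    """Map each value to a bin index 0..bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum lies on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]

cholesterol = [233, 286, 229]          # known values from the example data
scaled = normalize(cholesterol)
binned = discretize_equal_width(cholesterol, bins=3)
```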

Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  - Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include:
  - Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
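
To show what predicting a nominal quantity looks like in the simplest case, here is a sketch of an instance-based (1-nearest-neighbour) classifier. It is a generic illustration rather than WEKA's implementation; the function name is hypothetical, and the training pairs reuse the (age, cholesterol) values and class labels from the ARFF example.

```python
import math

def nearest_neighbour_predict(train, query):
    """Return the label of the training instance closest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

train = [
    ((63.0, 233.0), "not_present"),
    ((67.0, 286.0), "present"),
    ((67.0, 229.0), "present"),
]
prediction = nearest_neighbour_predict(train, (65.0, 280.0))
```

In practice the attributes would usually be normalized first, since cholesterol spans a much wider range than age and would otherwise dominate the distance.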

Explorer: clustering data
• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are:
  - k-Means, EM, Cobweb, X-means, FarthestFirst
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation based on log-likelihood if clustering scheme produces a probability distribution
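
Of the schemes listed, k-means is the easiest to sketch. The following is a plain version of Lloyd's algorithm, not WEKA's clusterer. As an assumption made to keep the example deterministic, the starting centroids are passed in explicitly instead of being chosen at random; the points are made-up toy data.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) from given starting centroids."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centroids.append(old)    # keep an empty cluster's centroid
        if new_centroids == centroids:       # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, assignment

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignment = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```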

Explorer: finding associations
• WEKA contains an implementation of the Apriori algorithm for learning association rules
  - Works only with discrete data
• Can identify statistical dependencies between groups of attributes:
  - milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
• Apriori can compute all rules that have a given minimum support and exceed a given confidence
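
The two measures attached to a rule can be computed directly. The sketch below covers only support (as a transaction count, matching the slide's “support 2000”) and confidence; it does not implement Apriori's candidate generation. The function names and the five toy transactions are assumptions for illustration.

```python
def support(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) divided by support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0

transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "bread"},
    {"butter", "eggs"},
]
# Rule: milk, butter => bread, eggs
rule_support = support(transactions, {"milk", "butter", "bread", "eggs"})
rule_confidence = confidence(transactions, {"milk", "butter"}, {"bread", "eggs"})
```

Here three transactions contain both milk and butter, and two of those also contain bread and eggs, so the rule has support 2 and confidence 2/3.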

Explorer: attribute selection
• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two
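
One such combination, ranking as the search method with information gain as the evaluation method, can be sketched as follows. This is a generic illustration for nominal attributes, not WEKA's code; the function names are hypothetical, and the rows are the four instances from the ARFF example restricted to two attributes and the class.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Class entropy minus the expected entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

rows = [
    {"sex": "male", "exercise_induced_angina": "no", "class": "not_present"},
    {"sex": "male", "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "male", "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "female", "exercise_induced_angina": "no", "class": "not_present"},
]
# Evaluation method scores each attribute; the "search" is a simple ranking.
ranking = sorted(
    ["sex", "exercise_induced_angina"],
    key=lambda a: information_gain(rows, a, "class"),
    reverse=True,
)
```

On these four rows, exercise_induced_angina separates the classes perfectly and ranks first. With so few instances that says nothing about the real dataset.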

Explorer: data visualization
• Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  - To do: rotating 3-d visualizations (Xgobi-style)
• Color-coded class values
• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
• “Zoom-in” function

Performing experiments
• Experimenter makes it easy to compare the performance of different learning schemes
• For classification and regression problems
• Results can be written into file or database
• Evaluation options: cross-validation, learning curve, hold-out
• Can also iterate over different parameter settings
• Significance-testing built in!
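
The cross-validation option can be sketched in a few lines. This is a simplified illustration, not the Experimenter's procedure: it assumes unshuffled, unstratified folds, uses a majority-class baseline as the learner, and omits significance testing. All function names and the toy labels are hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    for fold in range(k):
        test = list(range(fold, n, k))            # every k-th instance
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test

def cross_validate(instances, labels, train_fn, k):
    """Accuracy pooled over all k test folds."""
    correct = 0
    for train, test in k_fold_indices(len(instances), k):
        model = train_fn([instances[i] for i in train], [labels[i] for i in train])
        correct += sum(1 for i in test if model(instances[i]) == labels[i])
    return correct / len(instances)

def majority_class(train_x, train_y):
    """Baseline learner: always predict the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return lambda x: most_common

labels = ["present"] * 6 + ["not_present"] * 3
instances = list(range(len(labels)))              # placeholder feature values
accuracy = cross_validate(instances, labels, majority_class, k=3)
```

Every instance is tested exactly once by a model that never saw it during training, which is what makes the resulting accuracy a fair basis for comparing learning schemes.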

The Knowledge Flow GUI
• New graphical user interface for WEKA
• Java-Beans-based interface for setting up and running machine learning experiments
• Data sources, classifiers, etc. are beans and can be connected graphically
• Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
• Layouts can be saved and loaded again later

Conclusion: try it yourself!
• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
• Also has a list of projects based on WEKA
• WEKA contributors:
  Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang

