WEKA: Waikato Environment for Knowledge Analysis

Page 1: WEKA, Waikato Environment for Knowledge Analysis

Page 2: Goals of the workshop

• Acquisition of functional knowledge about the WEKA platform

• Ability to process (your own) data in WEKA

• Writing the seminar work:

– identify a problem

– transform it into data

– choose an appropriate DM technique

– apply it to the data

– evaluate & interpret the results

Page 3: What is WEKA?

Some basic facts about WEKA:

• WEKA (1) = a flightless bird with an inquisitive nature (found only on the islands of New Zealand)

• WEKA (2) = a software ‘workbench’ incorporating several standard ML/DM techniques

• Authors = Ian H. Witten, Eibe Frank (et al.)

• Programming language = Java

• Origin = The University of Waikato, New Zealand

• Literature = Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999

• Homepage = http://www.cs.waikato.ac.nz/~ml/weka

Page 4: Objectives of WEKA

• make ML/DM techniques generally available

• apply them to practical problems (in agriculture)

• develop new ML/DM algorithms

• contribute to the theoretical framework of the field (ML/DM)

Page 5: Versions of WEKA

• There are several versions of WEKA:

– WEKA 3.0: “book version”, compatible with the description in the data mining book

– WEKA 3.2: “GUI version”, adds graphical user interfaces (the book version is command-line only)

– WEKA 3.4: “development version”, with lots of improvements

• This workshop is based on WEKA 3.4(.3)

Page 6: The input to WEKA

ARFF format (“flat” files):

• example: Play-tennis domain

% this is an example of a knowledge
% domain in ARFF format

@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
. . .
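To make the structure of the ARFF example above concrete, here is a minimal Python sketch that reads such a file into a relation name, an attribute list and the data rows. It is not WEKA's own reader: the function name `parse_arff` is made up for this illustration, and the sketch assumes unquoted attribute names, one instance per line and no sparse-format data.

```python
def parse_arff(text):
    """Parse a simple ARFF document into (relation, attributes, rows).

    Illustrative only: assumes unquoted names and dense, one-line instances.
    """
    relation = None
    attributes = []  # (name, type); a nominal type is stored as a list of values
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        rest = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "@relation":
            relation = rest
        elif keyword == "@attribute":
            name, kind = rest.split(None, 1)
            kind = kind.strip()
            if kind.startswith("{") and kind.endswith("}"):
                kind = [value.strip() for value in kind[1:-1].split(",")]
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True  # everything after this line is an instance
    return relation, attributes, rows


WEATHER = """\
% this is an example of a knowledge
% domain in ARFF format
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
"""

relation, attributes, rows = parse_arff(WEATHER)
print(relation, len(attributes), len(rows))
```

All values are kept as strings; converting the `real` columns to numbers is left out to keep the sketch short.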

Conversion to the ARFF format?

• example: converting from MS-EXCEL to ARFF
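One possible route from a spreadsheet to ARFF is to save the sheet as CSV and generate the ARFF header from the column contents. The Python sketch below illustrates that idea under stated assumptions: the first CSV row holds the attribute names, a column counts as `real` when every value parses as a number, and any other column becomes nominal. The function name `csv_to_arff` is hypothetical, and this is not the conversion procedure shown in the workshop.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text (header row first) into ARFF text."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = ["@relation " + relation, ""]
    for col, name in enumerate(header):
        values = [row[col] for row in data]
        if all(is_number(v) for v in values):
            lines.append("@attribute %s real" % name)
        else:
            # nominal attribute: distinct values in order of first appearance
            distinct = list(dict.fromkeys(values))
            lines.append("@attribute %s {%s}" % (name, ", ".join(distinct)))
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


sheet = (
    "outlook,temperature,humidity,windy,play\n"
    "sunny,85,85,FALSE,no\n"
    "overcast,83,86,FALSE,yes\n"
    "rainy,70,96,FALSE,yes\n"
)
print(csv_to_arff(sheet, "weather"))
```

Because the nominal value sets are derived from the data, a value that never occurs in the sheet (here `TRUE` for `windy`) is missing from the generated header and has to be added by hand.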

Page 7: Starting WEKA – the GUI

Page 8: A quick tour of the “explorer”

• Preprocess panel

– Domain info. panel

– Attributes panel

– Status bar

– Filters panel

– Attribute info. panel

– Log file

– Attribute visualization panel

Page 9: A quick tour of the “explorer”

• Classify panel

– Classifier panel

– Class attribute

– Output panel

– Test options panel

– Result panel

Page 10: A quick tour of the “explorer”

• Visualize panel

Page 11: The command line

• example:

C:\Temp>java weka.classifiers.trees.J48

Weka exception: No training file and no object input file given.

General options:

-t <name of training file> Sets training file.

-T <name of test file> Sets test file. If missing, a cross-validation will be performed on the training data.

-c <class index> Sets index of class attribute (default: last).

-x <number of folds> Sets number of folds for cross-validation (default: 10).

-s <random number seed> Sets random number seed for cross-validation (default: 1).

-m <name of file with cost matrix> Sets file with cost matrix.

-l <name of input file> Sets model input file.

-d <name of output file> Sets model output file.

-v Outputs no statistics for training data.

-o Outputs statistics only, not the classifier.

-i Outputs detailed information-retrieval statistics for each class.

-k Outputs information-theoretic statistics.

-p Only outputs predictions for test instances.

-r Only outputs cumulative margin distribution.

-z <class name> Only outputs the source representation of the classifier, giving it the supplied name.

-g Only outputs the graph representation of the classifier.

Options specific to weka.classifiers.trees.J48:

-U Use unpruned tree.

-C <pruning confidence> Set confidence threshold for pruning. (default 0.25)

-M <minimum number of instances> Set minimum number of instances per leaf. (default 2)

-R Use reduced error pruning.

-N <number of folds> Set number of folds for reduced error pruning. One fold is used as pruning set. (default 3)

-B Use binary splits only.

-S Don't perform subtree raising.

-L Do not clean up after the tree has been built.
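The options listed above can be combined into a single call. As a hedged illustration, the following Python sketch assembles such a command line from a few of them (`-t`, `-T`, `-x`, `-U`, `-C`, `-M`). The helper name `j48_command` and its parameters are invented for this example, and actually running the command assumes that Java is installed and the WEKA classes are on the classpath.

```python
def j48_command(train_file, test_file=None, folds=10, confidence=0.25,
                min_instances=2, unpruned=False):
    """Build the argument list for a J48 run from the options listed above."""
    cmd = ["java", "weka.classifiers.trees.J48", "-t", train_file]
    if test_file is not None:
        cmd += ["-T", test_file]      # evaluate on a separate test file
    else:
        cmd += ["-x", str(folds)]     # otherwise cross-validate on the training data
    if unpruned:
        cmd.append("-U")              # a pruning confidence is not passed for an unpruned tree
    else:
        cmd += ["-C", str(confidence)]
    cmd += ["-M", str(min_instances)]
    return cmd


# Print the commands for two hypothetical data files instead of running them.
for arff in ["weather.arff", "other.arff"]:
    print(" ".join(j48_command(arff)))
```

Passing the returned list to `subprocess.run` would execute the call; looping over several files in this way is one simple form of batch processing.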

Page 12: GUI vs. command line

GUI (+):

• visualisation of data and (some) models

GUI (-):

• not all the parameters can be set (reduced functionality)

Command line (+):

• full functionality (‘saving the model’)

• batch processing

Command line (-):

• only textual visualisation of models

• awkward to use

Page 13: PROs & CONs of WEKA

PROs:

• open source (GNU licence)

• platform-independent (Java)

• easy to use

• (relatively) easy to modify

CONs:

• relatively slow (Java)

• ‘incomplete’ documentation (some GUI features could be explained better)

• some features available only from the command line

Page 14: Let’s go to work