
Hands-on Weka, 2014/11/11

Petra Kralj Novak

http://kt.ijs.si/petra_kralj/dmkd.html

Data Mining Tools

• Weka http://www.cs.waikato.ac.nz/ml/weka/

• Orange http://orange.biolab.si/

• Knime http://www.knime.org/

• Taverna http://www.taverna.org.uk/

• Rapid Miner http://rapid-i.com/content/view/181/196/

• ClowdFlows http://clowdflows.org/

Weka (Waikato Environment for Knowledge Analysis)

• Collection of machine learning algorithms for data mining tasks

• The algorithms

– Can be applied directly to a dataset

– Can be called from Java code (library)

• Weka contains tools for

– Data pre-processing

– Classification

– Regression

– Clustering

– Association rules

– Visualization

• Weka is open source software issued under the GNU General Public License

Exercise 1: ID3 in Weka

1. Build a decision tree with the ID3 algorithm on the lenses dataset and evaluate it on a separate test set

Weka: Install

Download version 3.6

http://www.cs.waikato.ac.nz/ml/weka/

Weka: Run Explorer

Choose Explorer

Exercise 1: ID3 in Weka

• In the Weka data mining tool, induce a decision tree for the lenses dataset with the ID3 algorithm.

• Data:

– lensesTrain.arff

– lensesTest.arff

• Compare the outcome with the manually obtained results.
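
For the manual comparison, recall that ID3 splits each node on the attribute with the highest information gain. The following is a minimal Python sketch of that criterion, not Weka's implementation. The function names and the four-row toy table are made up for illustration and are not the lenses data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """Attribute ID3 would split on first: the one with the highest gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy examples, NOT the lenses dataset.
toy = [
    {"tears": "reduced", "age": "young", "lenses": "no"},
    {"tears": "reduced", "age": "old", "lenses": "no"},
    {"tears": "normal", "age": "young", "lenses": "yes"},
    {"tears": "normal", "age": "old", "lenses": "yes"},
]

print(best_attribute(toy, ["age", "tears"], "lenses"))
```

In this toy table the hypothetical attribute "tears" separates the classes perfectly while "age" carries no information, so the root split falls on "tears". Applying the same functions to the rows of lensesTrain.arff gives the gains to check against the hand calculation.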

Load the data

Load the data - 2

lensesTrain.arff

The data are loaded

Target variable

Choose “Classify”

Choose algorithm

trees > Id3

[Screenshot: steps 1 to 5 for evaluating the tree on the separate test set, lensesTest.arff]

Decision tree

Classification accuracy

Confusion matrix

Exercise 2: CAR dataset

• 1728 examples

• 6 attributes

– 6 nominal

– 0 numeric

• Nominal target variable

– 4 classes: unacc, acc, good, v-good

– Distribution of classes: unacc (70%), acc (22%), good (4%), v-good (4%)

• No missing values
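
Because the classes are so unevenly distributed, it helps to know what a trivial classifier would score before judging any tree. The sketch below uses only the rounded percentages from the slide (the function name is illustrative) and computes the accuracy of always predicting the most frequent class.

```python
# Class shares as stated on the slide (rounded percentages).
class_share = {"unacc": 0.70, "acc": 0.22, "good": 0.04, "v-good": 0.04}
n_examples = 1728


def majority_baseline(shares):
    """Return the most frequent class and the accuracy of always predicting it."""
    label = max(shares, key=shares.get)
    return label, shares[label]


label, baseline = majority_baseline(class_share)
print(f"Always predicting '{label}' is right about {baseline:.0%} of the time "
      f"(roughly {round(baseline * n_examples)} of {n_examples} examples).")
```

Any classification accuracy reported for this dataset should be read against that baseline of about 70%.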

Preparing the data for WEKA - 1

Data in a spreadsheet

(e.g. MS Excel)

- Rows are examples

- Columns are attributes

- The last column is the target variable

Preparing the data for WEKA - 2

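Save as “.csv”

- Careful with dots “.”, commas “,” and semicolons “;”! Weka expects comma-separated fields, while some spreadsheet locales export semicolons as separators and commas as decimal marks.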

Load the data

Target variable

Car.csv

Choose algorithm J48

Building and evaluating the tree

Classified as (columns of the confusion matrix)

Actual values (rows of the confusion matrix)

Classification accuracy
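
The relation between the confusion matrix and the classification accuracy can be sketched in a few lines of Python. The layout follows the labels above (rows are actual values, columns are what the examples were classified as). The counts are hypothetical and are not results from this exercise.

```python
def accuracy(confusion):
    """Share of correctly classified examples: diagonal sum over total."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_recall(confusion, labels):
    """Recall per actual class: diagonal cell divided by its row sum."""
    return {
        label: row[i] / sum(row)
        for i, (label, row) in enumerate(zip(labels, confusion))
    }


# Hypothetical counts for illustration only.
labels = ["unacc", "acc"]
matrix = [
    [90, 10],  # actual unacc: 90 classified as unacc, 10 as acc
    [5, 45],   # actual acc: 5 classified as unacc, 45 as acc
]

print(accuracy(matrix))
print(per_class_recall(matrix, labels))
```

Looking at the rows separately shows how well each class is recognized, which the single accuracy figure can hide when one class dominates the data.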

Right mouse click

Tree pruning

Set the minimal number of objects per leaf (minNumObj) to 15

Parameters of the algorithm (right mouse click)

Reduced number of leaves and nodes

Easier to interpret

Lower classification accuracy

Naïve Bayes classifier

Summary

• Weka

• ID3, separate test set

• Data preparation

• J48 (C4.5), cross-validation, tree pruning

• Naïve Bayes

