Chapter 5: Decision Tree Induction Using Frequency Tables for Attribute Selection

Posted on 09-Jul-2015


Nguyễn Dương Trung Dũng


Contents

1. Calculating Entropy in Practice
2. Gini Index of Diversity
3. Inductive Bias
4. Using Gain Ratio for Attribute Selection

Calculating Entropy in Practice

age  specRx  astig  tears  Class
1    1       1      1      3
1    1       1      2      2
1    1       2      1      3
1    1       2      2      1
1    2       1      1      3
1    2       1      2      2
1    2       2      1      3
1    2       2      2      1

Training Set 1 (age=1) for lens24
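A frequency table is built by counting, for each attribute value, how many instances fall into each class. As a minimal sketch, the counts for the age=1 column can be obtained by tallying the class labels in Training Set 1. The names `training_set_1` and `class_counts` are our own illustrative choices; the age=2 and age=3 columns would come from the corresponding lens24 subsets, which are not reproduced here.

```python
from collections import Counter

# Training Set 1 (age=1) for lens24, one tuple per instance:
# (age, specRx, astig, tears, Class)
training_set_1 = [
    (1, 1, 1, 1, 3),
    (1, 1, 1, 2, 2),
    (1, 1, 2, 1, 3),
    (1, 1, 2, 2, 1),
    (1, 2, 1, 1, 3),
    (1, 2, 1, 2, 2),
    (1, 2, 2, 1, 3),
    (1, 2, 2, 2, 1),
]


def class_counts(rows, classes=(1, 2, 3)):
    """Count the rows in each class; the class label is the last field."""
    counts = Counter(row[-1] for row in rows)
    # Counter returns 0 for classes that never occur.
    return [counts[c] for c in classes]


age1_column = class_counts(training_set_1)
print(age1_column, sum(age1_column))  # prints: [2, 2, 4] 8
```

The last printed value is the column sum, i.e. the number of instances with age=1.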

              age=1  age=2  age=3
Class 1           2      1      1
Class 2           2      2      1
Class 3           4      5      6
Column Sum        8      8      8

Frequency Table for Attribute age for lens24

The cells of this table show the number of occurrences of each combination of class and attribute value in the training set.
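Because every cell is a count, the entropy after splitting on an attribute can be computed straight from its frequency table. Writing f for a cell value, s for a column sum and N for the number of instances, the weighted average of the column entropies rearranges to E_new = (-Σ f log2 f + Σ s log2 s) / N over all cells and columns, taking 0 log2 0 as 0. A minimal Python sketch of this for the age table follows; the function names are illustrative, and the starting entropy is computed from the class totals (4, 5 and 15) so that the information gain can be read off.

```python
import math

# Frequency table for attribute age in lens24.
# Rows are classes 1, 2, 3; columns are age=1, age=2, age=3.
freq = [
    [2, 1, 1],  # Class 1
    [2, 2, 1],  # Class 2
    [4, 5, 6],  # Class 3
]


def flog2f(x):
    """Return x * log2(x), with the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def entropy_after_split(table):
    """E_new = (-sum over cells of f log2 f + sum over columns of s log2 s) / N."""
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(col_sums)
    cell_term = sum(flog2f(f) for row in table for f in row)
    col_term = sum(flog2f(s) for s in col_sums)
    return (-cell_term + col_term) / n


def entropy_before_split(table):
    """Starting entropy of the training set, from the class (row) totals."""
    row_sums = [sum(row) for row in table]
    n = sum(row_sums)
    return -sum((r / n) * math.log2(r / n) for r in row_sums if r > 0)


e_start = entropy_before_split(freq)
e_new = entropy_after_split(freq)
print(round(e_start, 4), round(e_new, 4), round(e_start - e_new, 4))
# prints: 1.3261 1.2867 0.0394
```

Only the cell values and column sums are needed, which is why the frequency table is a convenient working form for attribute selection.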


Gini Index of Diversity



We can now calculate the new value of the Gini Index, i.e. its value after splitting on an attribute, from that attribute's frequency table as follows:

(a) For each non-empty column, form the sum of the squares of the values in the body of the table and divide by the column sum.
(b) Add the values obtained for all the columns and divide by N (the number of instances).
(c) Subtract the total from 1.
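In symbols (our own notation: f_{ij} is the count for class i and attribute value j, s_j is the column sum and N is the number of instances), steps (a) to (c) compute the weighted average of the Gini index of each column, which simplifies as follows:

$$
\mathrm{Gini}_{\mathrm{new}}
= \sum_{j:\,s_j>0} \frac{s_j}{N}\left(1 - \sum_i \frac{f_{ij}^2}{s_j^2}\right)
= 1 - \frac{1}{N}\sum_{j:\,s_j>0} \frac{\sum_i f_{ij}^2}{s_j}
$$

The first equality uses the fact that the column sums of the non-empty columns add up to N.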

              age=1  age=2  age=3
Class 1           2      1      1
Class 2           2      2      1
Class 3           4      5      6
Column Sum        8      8      8
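Applied to the age table, step (a) gives (4 + 4 + 16)/8 = 3, (1 + 4 + 25)/8 = 3.75 and (1 + 1 + 36)/8 = 4.75; step (b) gives 11.5/24; and step (c) gives a new Gini index of about 0.5208. A minimal Python sketch of the same arithmetic is shown here. The function names are illustrative; the sketch also computes the Gini index of the whole training set from the class totals (4, 5 and 15), so the reduction obtained by splitting on age can be seen.

```python
# Frequency table for attribute age in lens24.
# Rows are classes 1, 2, 3; columns are age=1, age=2, age=3.
freq = [
    [2, 1, 1],  # Class 1
    [2, 2, 1],  # Class 2
    [4, 5, 6],  # Class 3
]


def gini_start(table):
    """Gini index of the whole training set: 1 - sum of squared class proportions."""
    row_sums = [sum(row) for row in table]
    n = sum(row_sums)
    return 1 - sum((r / n) ** 2 for r in row_sums)


def gini_new(table):
    """Gini index after splitting, following steps (a) to (c)."""
    columns = list(zip(*table))
    n = sum(sum(col) for col in columns)
    total = 0.0
    for col in columns:
        s = sum(col)
        if s == 0:
            continue  # (a) only non-empty columns contribute
        total += sum(f * f for f in col) / s  # (a) sum of squares / column sum
    return 1 - total / n  # (b) divide by N, then (c) subtract from 1


g0 = gini_start(freq)
g1 = gini_new(freq)
print(round(g0, 4), round(g1, 4), round(g0 - g1, 4))
# prints: 0.5382 0.5208 0.0174
```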


Inductive Bias

Inductive bias:

- A preference for one choice rather than another
- Determined by external factors such as our preferences, simplicity and familiarity
- Any formula we use for attribute selection introduces an inductive bias


Using Gain Ratio for Attribute Selection

              age=1  age=2  age=3
Class 1           2      1      1
Class 2           2      2      1
Class 3           4      5      6
Column Sum        8      8      8
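Gain ratio divides the information gain of an attribute by its split information, -Σ (s/N) log2(s/N) taken over the column sums, which penalizes attributes that split the data into many values. A minimal Python sketch assuming that standard definition (function names are our own) follows. For age, each column holds 8 of the 24 instances, so the split information is log2 3 ≈ 1.585 and the gain ratio is about 0.0394 / 1.585 ≈ 0.0249.

```python
import math

# Frequency table for attribute age in lens24.
# Rows are classes 1, 2, 3; columns are age=1, age=2, age=3.
freq = [
    [2, 1, 1],  # Class 1
    [2, 2, 1],  # Class 2
    [4, 5, 6],  # Class 3
]


def xlog2x(x):
    """Return x * log2(x), with the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def information_gain(table):
    """E_start (from class totals) minus E_new (from the frequency table)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    e_start = -sum((r / n) * math.log2(r / n) for r in row_sums if r > 0)
    e_new = (-sum(xlog2x(f) for row in table for f in row)
             + sum(xlog2x(s) for s in col_sums)) / n
    return e_start - e_new


def split_information(table):
    """-sum over non-empty columns of (s/N) log2(s/N); uses column sums only."""
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(col_sums)
    return -sum((s / n) * math.log2(s / n) for s in col_sums if s > 0)


gain = information_gain(freq)
split = split_information(freq)
print(round(gain, 4), round(split, 4), round(gain / split, 4))
# prints: 0.0394 1.585 0.0249
```

Like entropy and the Gini index, the split information needs nothing beyond the frequency table, in this case just its column sums.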
