
Artificial Intelligence
7. Decision trees

Japan Advanced Institute of Science and Technology (JAIST)
Yoshimasa Tsuruoka

Outline
• What is a decision tree?
• How to build a decision tree
• Entropy
• Information Gain
• Overfitting
• Generalization performance
• Pruning

• Lecture slides
  – http://www.jaist.ac.jp/~tsuruoka/lectures/

Decision trees
Chapter 3 of Mitchell, T., Machine Learning (1997)

• Decision Trees
  – Disjunction of conjunctions
  – Successfully applied to a broad range of tasks
    • Diagnosing medical cases
    • Assessing credit risk of loan applications

• Nice characteristics
  – Understandable to humans
  – Robust to noise

• Concept: PlayTennis

A decision tree

Outlook
  Sunny    → Humidity
               High   → No
               Normal → Yes
  Overcast → Yes
  Rain     → Wind
               Strong → No
               Weak   → Yes

Classification by a decision tree

• Instance: <Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong>
• Following the tree above: Outlook = Sunny → Humidity = High → No, so the instance is classified as PlayTennis = No.
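This classification can also be read as code. Below is a minimal sketch, assuming a dictionary-based instance encoding and a function name of my own choosing (neither appears in the lecture), that expresses the tree as nested if statements and classifies the instance shown:

```python
def play_tennis(instance):
    """Classify an instance with the PlayTennis decision tree shown above."""
    if instance["Outlook"] == "Sunny":
        # Sunny days are decided by humidity
        return "Yes" if instance["Humidity"] == "Normal" else "No"
    if instance["Outlook"] == "Overcast":
        # Overcast days are always positive
        return "Yes"
    # Remaining case: Outlook == "Rain", decided by wind
    return "Yes" if instance["Wind"] == "Weak" else "No"

instance = {"Outlook": "Sunny", "Temperature": "Hot",
            "Humidity": "High", "Wind": "Strong"}
print(play_tennis(instance))   # -> No
```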

Disjunction of conjunctions

The tree above represents the concept:

(Outlook = Sunny ^ Humidity = Normal)
v (Outlook = Overcast)
v (Outlook = Rain ^ Wind = Weak)

Problems suited to decision trees

• Instances are represented by attribute-value pairs
• The target function has discrete target values
• Disjunctive descriptions may be required
• The training data may contain errors
• The training data may contain missing attribute values

Training data

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

Which attribute should be tested at each node?

• We want to build a small decision tree

• Information gain
  – How well a given attribute separates the training examples according to their target classification
  – Reduction in entropy
• Entropy
  – (im)purity of an arbitrary collection of examples

Entropy

• If there are only two classes:

  Entropy(S) = -p_+ \log_2 p_+ - p_- \log_2 p_-

  Example: Entropy([9+, 5-]) = -(9/14) \log_2 (9/14) - (5/14) \log_2 (5/14) = 0.940

• In general (c classes):

  Entropy(S) = \sum_{i=1}^{c} -p_i \log_2 p_i
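As a quick sketch (the function name and interface are my own, not part of the lecture), the entropy of a collection can be computed from its class counts:

```python
import math

def entropy(counts):
    """Entropy of a collection of examples given its class counts, e.g. [9, 5]."""
    total = sum(counts)
    result = 0.0
    for n in counts:
        if n > 0:                      # treat 0 * log2(0) as 0
            p = n / total
            result -= p * math.log2(p)
    return result

print(entropy([9, 5]))   # ~0.940, matching Entropy([9+, 5-]) above
```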

Information Gain

• The expected reduction in entropy achieved by splitting the training examples on attribute A:

  Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)
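A possible implementation of this definition, again with an interface of my own choosing (each example as a (features, label) pair), is sketched below:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, attribute):
    """Gain(S, A): entropy reduction from splitting `examples` on `attribute`.
    Each example is a (features_dict, label) pair."""
    labels = [label for _, label in examples]
    groups = defaultdict(list)
    for features, label in examples:
        groups[features[attribute]].append(label)
    remainder = sum(len(subset) / len(examples) * entropy(subset)
                    for subset in groups.values())
    return entropy(labels) - remainder
```

Passing the 14 training examples and "Wind" to this function should reproduce the Gain(S, Wind) ≈ 0.048 computed in the example that follows.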

Example

Values(Wind) = {Weak, Strong}

S        = [9+, 5-]
S_Weak   = [6+, 2-]
S_Strong = [3+, 3-]

Gain(S, Wind) = Entropy(S) - \sum_{v \in \{Weak, Strong\}} \frac{|S_v|}{|S|} Entropy(S_v)
              = Entropy(S) - (8/14) Entropy(S_Weak) - (6/14) Entropy(S_Strong)
              = 0.940 - (8/14) × 0.811 - (6/14) × 1.00
              = 0.048

Computing Information Gain

Humidity (S = [9+, 5-], Entropy = 0.940):
  High   → S_High   = [3+, 4-], Entropy = 0.985
  Normal → S_Normal = [6+, 1-], Entropy = 0.592

  Gain(S, Humidity) = 0.940 - (7/14) × 0.985 - (7/14) × 0.592 = 0.151

Wind (S = [9+, 5-], Entropy = 0.940):
  Weak   → S_Weak   = [6+, 2-], Entropy = 0.811
  Strong → S_Strong = [3+, 3-], Entropy = 1.00

  Gain(S, Wind) = 0.940 - (8/14) × 0.811 - (6/14) × 1.00 = 0.048
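These two gains can be checked directly from the class counts on this slide. A small verification sketch (the helper name is mine, not from the lecture):

```python
import math

def entropy2(pos, neg):
    """Entropy of a collection with `pos` positive and `neg` negative examples."""
    result = 0.0
    for n in (pos, neg):
        if n > 0:
            p = n / (pos + neg)
            result -= p * math.log2(p)
    return result

e_S = entropy2(9, 5)   # ~0.940

gain_humidity = e_S - (7/14) * entropy2(3, 4) - (7/14) * entropy2(6, 1)
gain_wind     = e_S - (8/14) * entropy2(6, 2) - (6/14) * entropy2(3, 3)

print(f"{gain_humidity:.3f}")   # 0.152 (0.151 on the slide, which rounds entropies first)
print(f"{gain_wind:.3f}")       # 0.048
```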

Which attribute is the best classifier?

• Information gain of each attribute:

  Gain(S, Outlook)     = 0.246
  Gain(S, Humidity)    = 0.151
  Gain(S, Wind)        = 0.048
  Gain(S, Temperature) = 0.029

• Outlook has the largest information gain, so it is chosen as the attribute tested at the root.
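As an end-to-end sketch (the data layout, helper names, and output format are my own choices, not from the lecture), all four gains can be computed from the 14 training examples to confirm that Outlook is the best attribute:

```python
import math
from collections import Counter, defaultdict

# PlayTennis training data: (Outlook, Temperature, Humidity, Wind, PlayTennis)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain(rows, attr_index):
    labels = [row[-1] for row in rows]
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row[-1])
    return entropy(labels) - sum(len(g) / len(rows) * entropy(g) for g in groups.values())

for i, name in sorted(enumerate(ATTRIBUTES), key=lambda x: -gain(DATA, x[0])):
    print(f"Gain(S, {name}) = {gain(DATA, i):.3f}")
# Outlook comes out on top (~0.247 here vs. 0.246 on the slide, a rounding difference),
# so it is selected as the root, matching the split shown on the next slide.
```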

Splitting training data with Outlook

{D1, D2, …, D14}  [9+, 5-]

Outlook
  Sunny    → {D1, D2, D8, D9, D11}   [2+, 3-]  → ?
  Overcast → {D3, D7, D12, D13}      [4+, 0-]  → Yes
  Rain     → {D4, D5, D6, D10, D14}  [3+, 2-]  → ?

Overfitting

• Growing each branch of the tree deeply enough to perfectly classify the training examples is not a good strategy.
  – The resulting tree may overfit the training data
• Overfitting
  – The tree can explain the training data very well but performs poorly on new data

Alleviating the overfitting problem

• Several approaches
  – Stop growing the tree earlier
  – Post-prune the tree
• How can we evaluate the classification performance of the tree for new data?
  – The available data are separated into two sets of examples: a training set and a validation (development) set

Validation (development) set

• Use a portion of the original training data to estimate the generalization performance.

(Figure: the original training set is split into a smaller training set and a validation set; the test set is kept separate throughout.)
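A minimal sketch of this split, assuming an 80/20 ratio and names of my own choosing (the lecture does not specify these), might look like:

```python
import random

def train_validation_split(examples, validation_fraction=0.2, seed=0):
    """Hold out part of the original training data as a validation set."""
    rows = list(examples)
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * validation_fraction)
    return rows[n_val:], rows[:n_val]   # (training set, validation set)

# Example with the 14 PlayTennis days:
days = [f"D{i}" for i in range(1, 15)]
train, val = train_validation_split(days)
print(len(train), len(val))   # 12 2
```

The tree would then be grown (or post-pruned) using the training portion, and its generalization performance estimated on the held-out validation portion.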