Iterative Dichotomiser 3 (ID3) Algorithm
Medha Pradhan, CS 157B, Spring 2007
Agenda
- Basics of Decision Trees
- Introduction to ID3
- Entropy and Information Gain
- Two Examples
Basics
What is a decision tree?
A tree in which each branching (decision) node represents a choice between two or more alternatives, and every branching node lies on a path to a leaf node.
- Decision node: specifies a test of some attribute
- Leaf node: indicates the classification of an example
ID3
Invented by J. Ross Quinlan.
Employs a top-down, greedy search through the space of possible decision trees. The search is greedy because there is no backtracking: at each step the algorithm commits to the locally best choice.
At each node it selects the attribute that is most useful for classifying the examples, i.e., the attribute with the highest information gain.
Entropy
Entropy measures the impurity of an arbitrary collection of examples. For a collection S containing positive and negative examples:

Entropy(S) = -(p+)log2(p+) - (p-)log2(p-)

where p+ is the proportion of positive examples and p- is the proportion of negative examples.

Entropy(S) = 0 if all members of S belong to the same class; Entropy(S) = 1 (the maximum) when S contains equal numbers of positive and negative examples.
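As a quick illustration, here is a minimal Python sketch of the two-class entropy formula above (the function name and signature are my own, not part of ID3 itself):

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a collection with pos positive and neg negative examples."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:                 # the term 0 * log2(0) is taken to be 0
            p = count / total
            e -= p * log2(p)
    return e

print(entropy(3, 3))   # 1.0        -> maximum impurity: an equal split
print(entropy(6, 0))   # 0.0        -> all members in one class
print(entropy(4, 2))   # 0.918296   -> the value used in Example 1 below
```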
Information Gain
Information gain measures the expected reduction in entropy caused by partitioning the examples according to an attribute; the higher the gain, the greater the expected reduction in entropy:

Gain(S, A) = Entropy(S) - Σ_{v in Values(A)} (|Sv| / |S|) * Entropy(Sv)

where Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which attribute A has value v.
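A minimal sketch of this computation, assuming each example is stored as an attribute-to-value dict with class labels in a parallel list (names and data layout are my own illustration):

```python
from math import log2

def entropy(labels):
    """Entropy of a list of class labels (works for any number of classes)."""
    return -sum(labels.count(c) / len(labels) * log2(labels.count(c) / len(labels))
                for c in set(labels))

def information_gain(examples, labels, attribute):
    """Gain(S, A): entropy of S minus the weighted entropy of each subset Sv."""
    gain = entropy(labels)
    for v in set(ex[attribute] for ex in examples):
        sv = [lab for ex, lab in zip(examples, labels) if ex[attribute] == v]
        gain -= len(sv) / len(labels) * entropy(sv)
    return gain
```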
Example 1
Sample training data to determine whether an animal lays eggs. Warm-blooded, Feathers, Fur, and Swims are the condition (independent) attributes; Lays Eggs is the decision (dependent) attribute.

Animal    | Warm-blooded | Feathers | Fur | Swims | Lays Eggs
Ostrich   | Yes          | Yes      | No  | No    | Yes
Crocodile | No           | No       | No  | Yes   | Yes
Raven     | Yes          | Yes      | No  | No    | Yes
Albatross | Yes          | Yes      | No  | No    | Yes
Dolphin   | Yes          | No       | No  | Yes   | No
Koala     | Yes          | No       | Yes | No    | No
S = [4Y, 2N]
Entropy(S) = -(4/6)log2(4/6) - (2/6)log2(2/6) = 0.91829
Now we compute the information gain for each of the four attributes: Warm-blooded, Feathers, Fur, and Swims.
For attribute 'Warm-blooded':
Values(Warm-blooded) = [Yes, No], S = [4Y, 2N]
S_Yes = [3Y, 2N], E(S_Yes) = 0.97095
S_No = [1Y, 0N], E(S_No) = 0 (all members belong to the same class)
Gain(S, Warm-blooded) = 0.91829 - [(5/6)*0.97095 + (1/6)*0] = 0.10916

For attribute 'Feathers':
Values(Feathers) = [Yes, No], S = [4Y, 2N]
S_Yes = [3Y, 0N], E(S_Yes) = 0
S_No = [1Y, 2N], E(S_No) = 0.91829
Gain(S, Feathers) = 0.91829 - [(3/6)*0 + (3/6)*0.91829] = 0.45914
For attribute 'Fur':
Values(Fur) = [Yes, No], S = [4Y, 2N]
S_Yes = [0Y, 1N], E(S_Yes) = 0
S_No = [4Y, 1N], E(S_No) = 0.72193
Gain(S, Fur) = 0.91829 - [(1/6)*0 + (5/6)*0.72193] = 0.31670
For attribute 'Swims':
Values(Swims) = [Yes, No], S = [4Y, 2N]
S_Yes = [1Y, 1N], E(S_Yes) = 1 (equal members in both classes)
S_No = [3Y, 1N], E(S_No) = 0.81128
Gain(S, Swims) = 0.91829 - [(2/6)*1 + (4/6)*0.81128] = 0.04411
Gain(S, Warm-blooded) = 0.10916
Gain(S, Feathers) = 0.45914
Gain(S, Fur) = 0.31670
Gain(S, Swims) = 0.04411

Gain(S, Feathers) is maximum, so Feathers is chosen as the root node; the sketch below reproduces these values.
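The four gains can be checked mechanically. The following self-contained sketch re-encodes the Example 1 table as dicts (the encoding is my own illustration); differences in the last digit versus the slides are rounding:

```python
from math import log2

def entropy(labels):
    return -sum(labels.count(c) / len(labels) * log2(labels.count(c) / len(labels))
                for c in set(labels))

def information_gain(rows, labels, attr):
    g = entropy(labels)
    for v in set(r[attr] for r in rows):
        sv = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        g -= len(sv) / len(labels) * entropy(sv)
    return g

animals = [
    {"Warm-blooded": "Yes", "Feathers": "Yes", "Fur": "No",  "Swims": "No"},   # Ostrich
    {"Warm-blooded": "No",  "Feathers": "No",  "Fur": "No",  "Swims": "Yes"},  # Crocodile
    {"Warm-blooded": "Yes", "Feathers": "Yes", "Fur": "No",  "Swims": "No"},   # Raven
    {"Warm-blooded": "Yes", "Feathers": "Yes", "Fur": "No",  "Swims": "No"},   # Albatross
    {"Warm-blooded": "Yes", "Feathers": "No",  "Fur": "No",  "Swims": "Yes"},  # Dolphin
    {"Warm-blooded": "Yes", "Feathers": "No",  "Fur": "Yes", "Swims": "No"},   # Koala
]
lays_eggs = ["Yes", "Yes", "Yes", "Yes", "No", "No"]

for attr in ("Warm-blooded", "Feathers", "Fur", "Swims"):
    print(attr, round(information_gain(animals, lays_eggs, attr), 5))
# Warm-blooded 0.10917, Feathers 0.45915, Fur 0.31669, Swims 0.04411
```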
Feathers
  Yes -> [Ostrich, Raven, Albatross] : Lays Eggs
  No  -> [Crocodile, Dolphin, Koala] : ?
The 'Yes' descendant contains only positive examples, so it becomes a leaf node with classification 'Lays Eggs'. We now repeat the procedure on the 'No' subset:
S = [Crocodile, Dolphin, Koala] = [1Y, 2N]
Entropy(S) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.91829
Animal    | Warm-blooded | Feathers | Fur | Swims | Lays Eggs
Crocodile | No           | No       | No  | Yes   | Yes
Dolphin   | Yes          | No       | No  | Yes   | No
Koala     | Yes          | No       | Yes | No    | No
For attribute 'Warm-blooded':
Values(Warm-blooded) = [Yes, No], S = [1Y, 2N]
S_Yes = [0Y, 2N], E(S_Yes) = 0
S_No = [1Y, 0N], E(S_No) = 0
Gain(S, Warm-blooded) = 0.91829 - [(2/3)*0 + (1/3)*0] = 0.91829
For attribute 'Fur':
Values(Fur) = [Yes, No], S = [1Y, 2N]
S_Yes = [0Y, 1N], E(S_Yes) = 0
S_No = [1Y, 1N], E(S_No) = 1
Gain(S, Fur) = 0.91829 - [(1/3)*0 + (2/3)*1] = 0.25162
For attribute 'Swims':
Values(Swims) = [Yes, No], S = [1Y, 2N]
S_Yes = [1Y, 1N], E(S_Yes) = 1
S_No = [0Y, 1N], E(S_No) = 0
Gain(S, Swims) = 0.91829 - [(2/3)*1 + (1/3)*0] = 0.25162
Gain(S, Warm-blooded) is maximum, so Warm-blooded becomes the next decision node.
The final decision tree is:

Feathers
  Yes -> Lays Eggs
  No  -> Warm-blooded
           Yes -> Does Not Lay Eggs
           No  -> Lays Eggs
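Putting the pieces together, here is a compact recursive sketch of the whole top-down procedure (my own illustration, not Quinlan's original code): split on the highest-gain attribute, recurse on each value's subset, and stop when a subset is pure.

```python
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    return -sum(labels.count(c) / len(labels) * log2(labels.count(c) / len(labels))
                for c in set(labels))

def information_gain(rows, labels, attr):
    """Gain(S, A) = Entropy(S) - sum over v of (|Sv|/|S|) * Entropy(Sv)."""
    gain = entropy(labels)
    for v in set(r[attr] for r in rows):
        sv = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        gain -= len(sv) / len(labels) * entropy(sv)
    return gain

def id3(rows, labels, attributes):
    """Recursively grow a decision tree as nested dicts; leaves are labels."""
    if len(set(labels)) == 1:              # pure subset -> leaf node
        return labels[0]
    if not attributes:                     # no tests left -> majority leaf
        return max(set(labels), key=labels.count)
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for v in set(r[best] for r in rows):
        branch = [(r, lab) for r, lab in zip(rows, labels) if r[best] == v]
        tree[best][v] = id3([r for r, _ in branch],
                            [lab for _, lab in branch],
                            [a for a in attributes if a != best])
    return tree
```

Calling id3(animals, lays_eggs, ["Warm-blooded", "Feathers", "Fur", "Swims"]) with the Example 1 data from the earlier sketch returns, up to key order, {'Feathers': {'Yes': 'Yes', 'No': {'Warm-blooded': {'Yes': 'No', 'No': 'Yes'}}}}, matching the tree above.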
Example 2
Factors affecting sunburn.
Name  | Hair   | Height  | Weight  | Lotion | Sunburned
Sarah | Blonde | Average | Light   | No     | Yes
Dana  | Blonde | Tall    | Average | Yes    | No
Alex  | Brown  | Short   | Average | Yes    | No
Annie | Blonde | Short   | Average | No     | Yes
Emily | Red    | Average | Heavy   | No     | Yes
Pete  | Brown  | Tall    | Heavy   | No     | No
John  | Brown  | Average | Heavy   | No     | No
Katie | Blonde | Short   | Light   | Yes    | No
S = [3+, 5-]
Entropy(S) = -(3/8)log2(3/8) - (5/8)log2(5/8) = 0.95443
Find IG for all 4 attributes: Hair, Height, Weight, Lotion
For attribute 'Hair':
Values(Hair) = [Blonde, Brown, Red], S = [3+, 5-]
S_Blonde = [2+, 2-], E(S_Blonde) = 1
S_Brown = [0+, 3-], E(S_Brown) = 0
S_Red = [1+, 0-], E(S_Red) = 0
Gain(S, Hair) = 0.95443 - [(4/8)*1 + (3/8)*0 + (1/8)*0] = 0.45443
For attribute 'Height':
Values(Height) = [Average, Tall, Short], S = [3+, 5-]
S_Average = [2+, 1-], E(S_Average) = 0.91829
S_Tall = [0+, 2-], E(S_Tall) = 0
S_Short = [1+, 2-], E(S_Short) = 0.91829
Gain(S, Height) = 0.95443 - [(3/8)*0.91829 + (2/8)*0 + (3/8)*0.91829] = 0.26571

For attribute 'Weight':
Values(Weight) = [Light, Average, Heavy], S = [3+, 5-]
S_Light = [1+, 1-], E(S_Light) = 1
S_Average = [1+, 2-], E(S_Average) = 0.91829
S_Heavy = [1+, 2-], E(S_Heavy) = 0.91829
Gain(S, Weight) = 0.95443 - [(2/8)*1 + (3/8)*0.91829 + (3/8)*0.91829] = 0.01571

For attribute 'Lotion':
Values(Lotion) = [Yes, No], S = [3+, 5-]
S_Yes = [0+, 3-], E(S_Yes) = 0
S_No = [3+, 2-], E(S_No) = 0.97095
Gain(S, Lotion) = 0.95443 - [(3/8)*0 + (5/8)*0.97095] = 0.34759
Gain(S, Hair) = 0.45443
Gain(S, Height) = 0.26571
Gain(S, Weight) = 0.01571
Gain(S, Lotion) = 0.34759

Gain(S, Hair) is maximum, so it is chosen as the root node; the sketch below reproduces these values.
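As with Example 1, the gains can be checked mechanically. This sketch uses the same entropy and information_gain helpers defined earlier (the data encoding is my own illustration):

```python
people = [
    {"Hair": "Blonde", "Height": "Average", "Weight": "Light",   "Lotion": "No"},   # Sarah
    {"Hair": "Blonde", "Height": "Tall",    "Weight": "Average", "Lotion": "Yes"},  # Dana
    {"Hair": "Brown",  "Height": "Short",   "Weight": "Average", "Lotion": "Yes"},  # Alex
    {"Hair": "Blonde", "Height": "Short",   "Weight": "Average", "Lotion": "No"},   # Annie
    {"Hair": "Red",    "Height": "Average", "Weight": "Heavy",   "Lotion": "No"},   # Emily
    {"Hair": "Brown",  "Height": "Tall",    "Weight": "Heavy",   "Lotion": "No"},   # Pete
    {"Hair": "Brown",  "Height": "Average", "Weight": "Heavy",   "Lotion": "No"},   # John
    {"Hair": "Blonde", "Height": "Short",   "Weight": "Light",   "Lotion": "Yes"},  # Katie
]
sunburned = ["Yes", "No", "No", "Yes", "Yes", "No", "No", "No"]

for attr in ("Hair", "Height", "Weight", "Lotion"):
    print(attr, round(information_gain(people, sunburned, attr), 5))
# Hair 0.45443, Height 0.26571, Weight 0.01571, Lotion 0.34759
```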
Hair
  Blonde -> [Sarah, Dana, Annie, Katie] : ?
  Red    -> [Emily] : Sunburned
  Brown  -> [Alex, Pete, John] : Not Sunburned
Repeating the procedure on the Blonde subset:
S = [Sarah, Dana, Annie, Katie] = [2+, 2-]
Entropy(S) = 1

We compute the information gain for the remaining three attributes: Height, Weight, and Lotion.

For attribute 'Height':
Values(Height) = [Average, Tall, Short], S = [2+, 2-]
S_Average = [1+, 0-], E(S_Average) = 0
S_Tall = [0+, 1-], E(S_Tall) = 0
S_Short = [1+, 1-], E(S_Short) = 1
Gain(S, Height) = 1 - [(1/4)*0 + (1/4)*0 + (2/4)*1] = 0.5
Name  | Hair   | Height  | Weight  | Lotion | Sunburned
Sarah | Blonde | Average | Light   | No     | Yes
Dana  | Blonde | Tall    | Average | Yes    | No
Annie | Blonde | Short   | Average | No     | Yes
Katie | Blonde | Short   | Light   | Yes    | No
For attribute 'Weight':
Values(Weight) = [Average, Light], S = [2+, 2-]
S_Average = [1+, 1-], E(S_Average) = 1
S_Light = [1+, 1-], E(S_Light) = 1
Gain(S, Weight) = 1 - [(2/4)*1 + (2/4)*1] = 0
For attribute 'Lotion':
Values(Lotion) = [Yes, No], S = [2+, 2-]
S_Yes = [0+, 2-], E(S_Yes) = 0
S_No = [2+, 0-], E(S_No) = 0
Gain(S, Lotion) = 1 - [(2/4)*0 + (2/4)*0] = 1
Gain(S, Lotion) is maximum, so Lotion becomes the next decision node.
The final decision tree is:
Hair
  Blonde -> Lotion
              Yes -> Not Sunburned
              No  -> Sunburned
  Red    -> Sunburned
  Brown  -> Not Sunburned
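To show how the finished tree is used, here is a tiny classifier that hard-codes the tree above as nested dicts (the encoding and function name are my own illustration):

```python
tree = {"Hair": {
    "Red":    "Sunburned",
    "Brown":  "Not Sunburned",
    "Blonde": {"Lotion": {"Yes": "Not Sunburned", "No": "Sunburned"}},
}}

def classify(tree, example):
    while isinstance(tree, dict):
        attr = next(iter(tree))            # attribute tested at this node
        tree = tree[attr][example[attr]]   # follow the branch for its value
    return tree

print(classify(tree, {"Hair": "Blonde", "Lotion": "No"}))  # Sunburned (e.g. Sarah)
```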
References
- "Machine Learning", Tom Mitchell, McGraw-Hill, 1997
- "Building Decision Trees with the ID3 Algorithm", Andrew Colin, Dr. Dobb's Journal, June 1996
- http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/dt_prob1.html
- Professor Sin-Min Lee, SJSU, http://cs.sjsu.edu/~lee/cs157b/cs157b.html