Page 1

Decision Trees

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.

Page 2

Function Approximation Problem Setting
• Set of possible instances X
• Set of possible labels Y
• Unknown target function f : X → Y
• Set of function hypotheses H = { h | h : X → Y }

Input: Training examples {⟨x_i, y_i⟩}_{i=1}^n = {⟨x_1, y_1⟩, …, ⟨x_n, y_n⟩} of unknown target function f

Output: Hypothesis h ∈ H that best approximates f

Based on slide by Tom Mitchell

Page 3

Sample Dataset
• Columns denote features X_i
• Rows denote labeled instances ⟨x_i, y_i⟩
• Class label denotes whether a tennis game was played

Page 4

Decision Tree
• A possible decision tree for the data:
• Each internal node: test one attribute X_i
• Each branch from a node: selects one value for X_i
• Each leaf node: predict Y (or p(Y | x ∈ leaf))

Based on slide by Tom Mitchell

Page 5

Decision Tree
• A possible decision tree for the data:
• What prediction would we make for <outlook=sunny, temperature=hot, humidity=high, wind=weak>?

Based on slide by Tom Mitchell
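To make the question concrete, here is a minimal Python sketch of how such a tree could be stored and queried. The tree itself appears only as an image in the original slides, so the structure below assumes the standard PlayTennis tree (Outlook at the root, Humidity under sunny, Wind under rain); it is an illustration, not a transcription of the slide.

# Minimal sketch (assumed tree, not transcribed from the slide).
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny":    {"attribute": "humidity",
                     "branches": {"high": "No", "normal": "Yes"}},
        "overcast": "Yes",
        "rain":     {"attribute": "wind",
                     "branches": {"strong": "No", "weak": "Yes"}},
    },
}

def predict(node, x):
    """Walk from the root to a leaf, following the branch that matches x."""
    while isinstance(node, dict):                 # internal node: test one attribute
        node = node["branches"][x[node["attribute"]]]
    return node                                   # leaf node: the predicted label

x = {"outlook": "sunny", "temperature": "hot", "humidity": "high", "wind": "weak"}
print(predict(tree, x))   # -> "No" under the assumed tree (sunny + high humidity)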

Page 6

Decision Tree
• If features are continuous, internal nodes can test the value of a feature against a threshold

Page 7

Decision Tree Learning

Problem Setting:
• Set of possible instances X
  – each instance x in X is a feature vector x = <x1, x2, …, xn>
  – e.g., <Humidity=low, Wind=weak, Outlook=rain, Temp=hot>
• Unknown target function f : X → Y
  – Y is discrete-valued
• Set of function hypotheses H = { h | h : X → Y }
  – each hypothesis h is a decision tree
  – trees sort x to a leaf, which assigns y

Input:
• Training examples {<x(i), y(i)>} of unknown target function f

Output:
• Hypothesis h ∈ H that best approximates target function f

Slide by Tom Mitchell

Page 8

Stages of (Batch) Machine Learning
Given: labeled training data X, Y = {⟨x_i, y_i⟩}_{i=1}^n
• Assumes each x_i ∼ D(X) with y_i = f_target(x_i)

Train the model: model ← classifier.train(X, Y)

Apply the model to new data:
• Given: new unlabeled instance x ∼ D(X)
  y_prediction ← model.predict(x)

[Diagram: X, Y → learner → model;  x → model → y_prediction]
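This train/apply interface maps directly onto the fit/predict pattern of common libraries. A minimal sketch follows; the choice of scikit-learn's DecisionTreeClassifier and the toy data are mine, not something the slides prescribe.

# Sketch of the two stages above using scikit-learn (illustrative choice).
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]   # training instances (feature vectors)
Y = [0, 1, 1, 1]                        # labels; here y = x1 OR x2

model = DecisionTreeClassifier().fit(X, Y)   # "train the model": classifier.train(X, Y)
print(model.predict([[1, 0]]))               # "apply the model": model.predict(x) -> [1]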

Page 9

Example Application: A Tree to Predict Caesarean Section Risk

Decision Trees
Suppose X = <X1, …, Xn> where the Xi are boolean variables.

How would you represent Y = X2 X5 ?   Y = X2 ∨ X5 ?

How would you represent Y = X2X5 ∨ X3X4(¬X1) ?

Based on example by Tom Mitchell
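As a quick illustration of the first question: Y = X2 ∨ X5 is a two-level tree (test X2; only if it is 0, test X5). The nested-dict encoding below is a minimal sketch, with names and layout of my own choosing rather than from the slides.

# Sketch (not from the slides): Y = X2 OR X5 as a decision tree.
or_tree = {
    "attribute": "X2",
    "branches": {
        1: "Yes",                              # X2 = 1  =>  Y = Yes
        0: {"attribute": "X5",                 # X2 = 0  =>  Y depends on X5
            "branches": {1: "Yes", 0: "No"}},
    },
}

# Check it against the truth table: each row corresponds to one root-to-leaf path.
for x2 in (0, 1):
    for x5 in (0, 1):
        node = or_tree
        while isinstance(node, dict):
            node = node["branches"][{"X2": x2, "X5": x5}[node["attribute"]]]
        assert (node == "Yes") == bool(x2 or x5)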

Page 10

Decision Tree Induced Partition

[Figure: a decision tree with Color at the root (red / green / blue branches), internal tests on Shape (round / square) and Size (big / small), and +/− labels at the leaves]

Page 11

Decision Tree – Decision Boundary
• Decision trees divide the feature space into axis-parallel (hyper-)rectangles
• Each rectangular region is labeled with one label
  – or a probability distribution over labels

[Figure: decision boundary]

Page 12

Expressiveness
• Decision trees can represent any boolean function of the input attributes
• In the worst case, the tree will require exponentially many nodes

Truth table row → path to leaf

Page 13

Expressiveness
Decision trees have a variable-sized hypothesis space
• As the #nodes (or depth) increases, the hypothesis space grows
  – Depth 1 ("decision stump"): can represent any boolean function of one feature
  – Depth 2: any boolean function of two features; some involving three features (e.g., (x1 ∧ x2) ∨ (¬x1 ∧ ¬x3); see the sketch below)
  – etc.

Based on slide by Pedro Domingos
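A sketch of the depth-2 example above: (x1 ∧ x2) ∨ (¬x1 ∧ ¬x3) mentions three features, but any root-to-leaf path tests at most two of them. The encoding and helper below are illustrative assumptions, not from the slides.

# Sketch: a depth-2 tree for (x1 AND x2) OR (NOT x1 AND NOT x3).
from itertools import product

depth2_tree = {
    "attribute": "x1",
    "branches": {
        1: {"attribute": "x2", "branches": {1: 1, 0: 0}},   # x1 = 1: output is x2
        0: {"attribute": "x3", "branches": {1: 0, 0: 1}},   # x1 = 0: output is NOT x3
    },
}

def predict(node, x):
    while isinstance(node, dict):
        node = node["branches"][x[node["attribute"]]]
    return node

# Verify against the formula over all 8 assignments.
for x1, x2, x3 in product((0, 1), repeat=3):
    want = (x1 and x2) or (not x1 and not x3)
    assert predict(depth2_tree, {"x1": x1, "x2": x2, "x3": x3}) == int(want)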

Page 14

Another Example: Restaurant Domain (Russell & Norvig)

Model a patron's decision of whether to wait for a table at a restaurant

~7,000 possible cases

Page 15

A Decision Tree from Introspection

Is this the best decision tree?

Page 16

Preference bias: Ockham's Razor
• Principle stated by William of Ockham (1285-1347)
  – "non sunt multiplicanda entia praeter necessitatem"
  – entities are not to be multiplied beyond necessity
  – AKA Occam's Razor, Law of Economy, or Law of Parsimony

Idea: The simplest consistent explanation is the best

• Therefore, the smallest decision tree that correctly classifies all of the training examples is best
• Finding the provably smallest decision tree is NP-hard
• ...So instead of constructing the absolute smallest tree consistent with the training examples, construct one that is pretty small

Page 17

Basic Algorithm for Top-Down Induction of Decision Trees

[ID3, C4.5 by Quinlan]

node = root of decision tree
Main loop:
1. A ← the "best" decision attribute for the next node.
2. Assign A as decision attribute for node.
3. For each value of A, create a new descendant of node.
4. Sort training examples to leaf nodes.
5. If training examples are perfectly classified, stop.
   Else, recurse over new leaf nodes.

How do we choose which attribute is best?

Page 18

Choosing the Best Attribute
Key problem: choosing which attribute to split a given set of examples

• Some possibilities are:
  – Random: Select any attribute at random
  – Least-Values: Choose the attribute with the smallest number of possible values
  – Most-Values: Choose the attribute with the largest number of possible values
  – Max-Gain: Choose the attribute that has the largest expected information gain
    • i.e., the attribute that results in the smallest expected size of the subtrees rooted at its children

• The ID3 algorithm uses the Max-Gain method of selecting the best attribute

Page 19

Choosing an Attribute
Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative"

Which split is more informative: Patrons? or Type?

Based on slide from M. desJardins & T. Finin

Page 20

ID3-Induced Decision Tree

Based on slide from M. desJardins & T. Finin

Page 21

Compare the Two Decision Trees

Based on slide from M. desJardins & T. Finin

Page 22

Information Gain
Which test is more informative?

Split over whether Balance exceeds 50K  (branches: Over 50K / Less or equal 50K)

Split over whether applicant is employed  (branches: Employed / Unemployed)

Based on slide by Pedro Domingos

Page 23

Information Gain
Impurity / Entropy (informal)
– Measures the level of impurity in a group of examples

Based on slide by Pedro Domingos

Page 24

Impurity

[Figure: three groups of examples — a very impure group, a less impure group, and a group with minimum impurity]

Based on slide by Pedro Domingos

Page 25

Entropy: a common way to measure impurity

Entropy H(X) of a random variable X:
  H(X) = − Σ_{i=1}^{n} P(X = i) log2 P(X = i)   (sum over the # of possible values for X)

H(X) is the expected number of bits needed to encode a randomly drawn value of X (under the most efficient code)

Why? Information theory:
• The most efficient code assigns −log2 P(X = i) bits to encode the message X = i
• So the expected number of bits to code one random X is the sum above

Slide by Tom Mitchell


Page 27

Example: Huffman code
• In 1952, MIT student David Huffman devised, in the course of doing a homework assignment, an elegant coding scheme which is optimal in the case where all symbols' probabilities are integral powers of 1/2.
• A Huffman code can be built in the following manner:
  – Rank all symbols in order of probability of occurrence
  – Successively combine the two symbols of the lowest probability to form a new composite symbol; eventually we will build a binary tree where each node is the probability of all nodes beneath it
  – Trace a path to each leaf, noticing the direction at each node

Based on slide from M. desJardins & T. Finin

Page 28

Huffman code example

M    P
A    .125
B    .125
C    .25
D    .5

[Figure: the Huffman tree built from these probabilities, with 0/1 labels on the branches]

M    code    length    prob     length × prob
A    000     3         0.125    0.375
B    001     3         0.125    0.375
C    01      2         0.250    0.500
D    1       1         0.500    0.500
                        average message length:  1.750

If we use this code to send many messages (A, B, C, or D) with this probability distribution, then, over time, the average bits/message should approach 1.75

Based on slide from M. desJardins & T. Finin
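As a sanity check of the table above, the sketch below builds a Huffman code for these four symbols with Python's heapq module and recovers the 1.75 bits/message figure. The exact 0/1 labels depend on tie-breaking, so individual codewords may differ from the slide even though the lengths match.

# Sketch (not from the slides): Huffman coding via a min-heap.
import heapq
from itertools import count

probs = {"A": 0.125, "B": 0.125, "C": 0.25, "D": 0.5}

tie = count()
# Each heap entry: (probability, tie-breaker, {symbol: code-so-far}).
heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
heapq.heapify(heap)
while len(heap) > 1:
    p0, _, c0 = heapq.heappop(heap)     # the two lowest-probability nodes
    p1, _, c1 = heapq.heappop(heap)
    merged = {s: "0" + code for s, code in c0.items()}
    merged.update({s: "1" + code for s, code in c1.items()})
    heapq.heappush(heap, (p0 + p1, next(tie), merged))

codes = heap[0][2]
avg_len = sum(probs[s] * len(c) for s, c in codes.items())
print(codes)     # code lengths: A and B get 3 bits, C gets 2, D gets 1
print(avg_len)   # -> 1.75, matching the table above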

Page 29

2-Class Cases:

Entropy:  H(X) = − Σ_{i=1}^{n} P(X = i) log2 P(X = i)

• What is the entropy of a group in which all examples belong to the same class?
  – entropy = −1 · log2(1) = 0
  – minimum impurity: not a good training set for learning

• What is the entropy of a group with 50% in either class?
  – entropy = −0.5 · log2(0.5) − 0.5 · log2(0.5) = 1
  – maximum impurity: a good training set for learning

Based on slide by Pedro Domingos
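A minimal sketch of this entropy measure in Python, reproducing the two 2-class cases above (the function name and input format are my own):

# Sketch (not from the slides): entropy of a class distribution.
from math import log2

def entropy(counts):
    """H = -sum p_i log2 p_i over the classes with non-zero count."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([10, 0]))   # all examples in one class  -> 0.0 (minimum impurity)
print(entropy([5, 5]))    # 50% in either class        -> 1.0 (maximum impurity)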

Page 30

Sample Entropy

Entropy H(X) of a random variable X

Specific conditional entropy H(X | Y = v) of X given Y = v:

Conditional entropy H(X | Y) of X given Y:

Mutual information (aka Information Gain) of X and Y:

Slide by Tom Mitchell
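The formulas on this slide appear only as images in the transcript. For reference, the standard information-theoretic definitions they correspond to (added here, not transcribed from the slide) are:

H(X) = -\sum_{i} P(X = i) \log_2 P(X = i)
H(X \mid Y = v) = -\sum_{i} P(X = i \mid Y = v) \log_2 P(X = i \mid Y = v)
H(X \mid Y) = \sum_{v \in \mathrm{values}(Y)} P(Y = v) \, H(X \mid Y = v)
I(X; Y) = H(X) - H(X \mid Y)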

Page 31

Information Gain
• We want to determine which attribute in a given set of training feature vectors is most useful for discriminating between the classes to be learned.
• Information gain tells us how important a given attribute of the feature vectors is.
• We will use it to decide the ordering of attributes in the nodes of a decision tree.

Based on slide by Pedro Domingos

Page 32

From Entropy to Information Gain

Sample Entropy

Entropy H(X) of a random variable X

Specific conditional entropy H(X | Y = v) of X given Y = v:

Conditional entropy H(X | Y) of X given Y:

Mutual information (aka Information Gain) of X and Y:

Slide by Tom Mitchell


Page 36

Information Gain

Information Gain is the mutual information between input attribute A and target variable Y

Information Gain is the expected reduction in entropy of target variable Y for data sample S, due to sorting on variable A

Slide by Tom Mitchell
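The gain formula itself is shown only as an image on the slide; the standard definition it refers to (added here for reference) is

Gain(S, A) \equiv H_S(Y) - H_S(Y \mid A) = H_S(Y) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \, H_{S_v}(Y)

where S_v denotes the subset of S for which attribute A has value v.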

Page 37

Calculating Information Gain

Information Gain = entropy(parent) − [weighted average entropy(children)]

Parent entropy (entire population, 30 instances):
  impurity = −(14/30)·log2(14/30) − (16/30)·log2(16/30) = 0.996

Child entropy (left child, 17 instances):
  impurity = −(13/17)·log2(13/17) − (4/17)·log2(4/17) = 0.787

Child entropy (right child, 13 instances):
  impurity = −(1/13)·log2(1/13) − (12/13)·log2(12/13) = 0.391

(Weighted) Average Entropy of Children = (17/30)·0.787 + (13/30)·0.391 = 0.615

Information Gain = 0.996 − 0.615 = 0.38

Based on slide by Pedro Domingos
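A short sketch (not from the slides) that reproduces this arithmetic in Python:

# Reproduce the information-gain calculation above.
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

parent = entropy([14, 16])                         # 0.996...
left, right = entropy([13, 4]), entropy([1, 12])   # 0.787..., 0.391...
children = (17 / 30) * left + (13 / 30) * right    # 0.615...
print(round(parent - children, 2))                 # information gain -> 0.38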

Page 38

Entropy-Based Automatic Decision Tree Construction

Training Set X:
  x1 = (f11, f12, …, f1m)
  x2 = (f21, f22, …, f2m)
  …
  xn = (fn1, fn2, …, fnm)

Node 1: What feature should be used? What values?

Quinlan suggested information gain in his ID3 system and later the gain ratio, both based on entropy.

Based on slide by Pedro Domingos

Page 39

Using Information Gain to Construct a Decision Tree

Choose the attribute A with the highest information gain for the full training set X at the root of the tree.

Construct child nodes for each value of A (v1, v2, …, vk). Each has an associated subset of vectors in which A has a particular value, e.g., X′ = {x ∈ X | value(A) = v1}.

Repeat recursively — till when?

Disadvantage of information gain (see the gain-ratio formula below):
• It prefers attributes with a large number of values that split the data into small, pure subsets
• Quinlan's gain ratio uses normalization to improve this

Based on slide by Pedro Domingos
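The gain-ratio formula is not spelled out in the transcript; Quinlan's standard definition (added here for reference) divides the gain by the entropy of the split itself, so many-valued attributes are penalized:

\mathrm{SplitInformation}(S, A) = -\sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}
\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInformation}(S, A)}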

Page 40: Decision Trees - Penn Engineeringcis519/fall2017/lectures/02_DecisionT… · Decision trees have a variable-sized hypothesis space • As the #nodes (or depth) increases, the hypothesis

12

Information Gain is the mutual information between

input attribute A and target variable Y

Information Gain is the expected reduction in entropy

of target variable Y for data sample S, due to sorting

on variable A

Page 41

[Slide shown as an image in the original deck]

Slide by Tom Mitchell


Page 44

Decision Tree Applet

http://webdocs.cs.ualberta.ca/~aixplore/learning/DecisionTrees/Applet/DecisionTreeApplet.html

Page 45

Decision Tree Learning Applet
• http://www.cs.ualberta.ca/%7Eaixplore/learning/DecisionTrees/Applet/DecisionTreeApplet.html

Which Tree Should We Output?
• ID3 performs heuristic search through the space of decision trees
• It stops at the smallest acceptable tree. Why?

Occam's razor: prefer the simplest hypothesis that fits the data

Slide by Tom Mitchell

Page 46

The ID3 algorithm builds a decision tree, given a set of non-categorical attributes C1, C2, …, Cn, the class attribute C, and a training set S of records:

function ID3(R: input attributes, C: class attribute, S: training set) returns decision tree;
  If S is empty, return a single node with value Failure;
  If every example in S has the same value for C, return a single node with that value;
  If R is empty, then return a single node with the most frequent of the values of C found in the examples of S;  # causes errors -- improperly classified records
  Let D be the attribute with largest Gain(D, S) among the attributes in R;
  Let {dj | j = 1, 2, …, m} be the values of attribute D;
  Let {Sj | j = 1, 2, …, m} be the subsets of S consisting of records with value dj for attribute D;
  Return a tree with root labeled D and arcs labeled d1, …, dm going to the trees ID3(R − {D}, C, S1), …, ID3(R − {D}, C, Sm);

Based on slide from M. desJardins & T. Finin
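A minimal runnable Python sketch of this pseudocode follows. The function and variable names and the dict-based data layout are my own assumptions, and it expects discrete-valued attributes; it is an illustration of the recursion, not the authors' implementation.

# Sketch (not from the slides): ID3 over dict-encoded examples.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(examples, labels, attr):
    """Information gain of splitting (examples, labels) on attribute attr."""
    n = len(labels)
    remainder = 0.0
    for v in set(x[attr] for x in examples):
        sub = [y for x, y in zip(examples, labels) if x[attr] == v]
        remainder += len(sub) / n * entropy(sub)
    return entropy(labels) - remainder

def id3(attrs, examples, labels):
    if not examples:
        return "Failure"
    if len(set(labels)) == 1:                      # all examples share one class
        return labels[0]
    if not attrs:                                  # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(examples, labels, a))
    tree = {"attribute": best, "branches": {}}
    for v in set(x[best] for x in examples):       # one subtree per value of best
        sub = [(x, y) for x, y in zip(examples, labels) if x[best] == v]
        sub_x, sub_y = [x for x, _ in sub], [y for _, y in sub]
        tree["branches"][v] = id3([a for a in attrs if a != best], sub_x, sub_y)
    return tree

# Tiny usage example: learn y = x1 OR x2 from four records.
X = [{"x1": 0, "x2": 0}, {"x1": 0, "x2": 1}, {"x1": 1, "x2": 0}, {"x1": 1, "x2": 1}]
Y = ["No", "Yes", "Yes", "Yes"]
print(id3(["x1", "x2"], X, Y))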

Page 47

How well does it work?
Many case studies have shown that decision trees are at least as accurate as human experts.
– A study for diagnosing breast cancer had humans correctly classifying the examples 65% of the time; the decision tree classified 72% correctly
– British Petroleum designed a decision tree for gas-oil separation for offshore oil platforms that replaced an earlier rule-based expert system
– Cessna designed an airplane flight controller using 90,000 examples and 20 attributes per example

Based on slide from M. desJardins & T. Finin

