Date post: 21-Dec-2015
Decision Tree Pruning

Problem Statement

• We would like to output a small decision tree (model selection).

• The tree is built until it reaches zero training error.

• Option 1: Stop early, when the decrease in the index function is small. Cons: may miss structure.

• Option 2: Prune after building.

Pruning

• Input: tree T

• Sample: S

• Output: Tree T’

• Basic pruning: T’ is a sub-tree of T; inner nodes can only be replaced by leaves.

• More advanced: replace an inner node by one of its children.

Reduced Error Pruning

• Split the sample into two parts, S1 and S2.

• Use S1 to build a tree.

• Use S2 to decide whether to prune.

• Process every inner node v after all its children have been processed: compute the observed error of T_v and of leaf(v) on S2.

If leaf(v) has fewer errors, replace T_v by leaf(v).
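The bottom-up procedure above can be sketched in Python. This is a minimal illustration rather than the lecture's own implementation: the `Node` class, the binary features, and the majority `label` stored at every node (used when the node is turned into a leaf) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: int                      # prediction if this node is used as a leaf
    feature: Optional[int] = None   # index of the binary feature tested; None for a leaf
    left: Optional["Node"] = None   # child followed when the feature is 0
    right: Optional["Node"] = None  # child followed when the feature is 1

    def is_leaf(self) -> bool:
        return self.feature is None


def predict(node, x):
    # Follow the path from the root down to a leaf.
    while not node.is_leaf():
        node = node.right if x[node.feature] else node.left
    return node.label


def errors(node, sample):
    # Number of points in `sample` misclassified by the tree rooted at `node`.
    return sum(predict(node, x) != y for x, y in sample)


def reduced_error_prune(node, s2):
    # Leaves cannot be pruned further.
    if node.is_leaf():
        return node
    # Children first, each on the part of S2 that reaches it.
    s_left = [(x, y) for x, y in s2 if not x[node.feature]]
    s_right = [(x, y) for x, y in s2 if x[node.feature]]
    left = reduced_error_prune(node.left, s_left)
    right = reduced_error_prune(node.right, s_right)
    subtree = Node(node.label, node.feature, left, right)
    leaf = Node(node.label)
    # Replace T_v by leaf(v) only if the leaf makes strictly fewer errors on S2.
    return leaf if errors(leaf, s2) < errors(subtree, s2) else subtree


# Example: the right subtree misclassifies one point of S2, its leaf none.
tree = Node(1, 0, Node(0), Node(1, 1, Node(1), Node(0)))
s2 = [((0, 0), 0), ((1, 0), 1), ((1, 1), 1)]
pruned = reduced_error_prune(tree, s2)
```

In this example the right child of the root is replaced by a leaf, while the root itself is kept because turning it into a leaf would add an error on S2.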

Reduced Error Pruning: Example

Pruning: CV & SRM

• For each pruning size, compute the minimum-error pruning. There are at most m different sub-trees.

• Select between the prunings using cross validation, structural risk minimization, or any other index method.

Finding the minimum pruning

• Procedure Compute

• Inputs: k: number of errors; T: tree; S: sample

• Output: P: pruned tree; size: size of P

Procedure compute

• IF IsLeaf(T)
    IF Errors(T) ≤ k THEN size = 1
    ELSE size = ∞
    P = T; return

• IF Errors(root(T)) ≤ k
    size = 1; P = root(T); return

Procedure compute

• FOR i = 0 TO k DO
    Call Compute(i, T[0], S0, size_{i,0}, P_{i,0})
    Call Compute(k-i, T[1], S1, size_{i,1}, P_{i,1})

• size = min_i { size_{i,0} + size_{i,1} + 1 }

• I = arg min_i { size_{i,0} + size_{i,1} + 1 }

• P = MakeTree(root(T), P_{I,0}, P_{I,1})

• What is the time complexity?
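The Compute procedure can be written as a direct recursion. The sketch below is one possible reading of it, under assumptions that are not in the slides: a hypothetical `Node` class with binary features, and a `label` stored at every node that serves as the prediction of root(T) when the node is replaced by a leaf. Infeasible cases return `math.inf` as the size.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: int                      # prediction if this node is used as a leaf
    feature: Optional[int] = None   # binary feature tested; None for a leaf
    left: Optional["Node"] = None   # T[0]
    right: Optional["Node"] = None  # T[1]


def leaf_errors(node, sample):
    # Errors(root(T)): errors made when `node` predicts node.label for all of S.
    return sum(y != node.label for _, y in sample)


def compute(k, node, sample):
    """Return (size, P): a smallest pruning P of `node` with at most k errors."""
    if node.feature is None:  # IsLeaf(T)
        size = 1 if leaf_errors(node, sample) <= k else math.inf
        return size, node
    if leaf_errors(node, sample) <= k:  # the root alone is good enough
        return 1, Node(node.label)
    s0 = [(x, y) for x, y in sample if not x[node.feature]]
    s1 = [(x, y) for x, y in sample if x[node.feature]]
    best_size, best_tree = math.inf, node
    # Try every split of the error budget k between the two children.
    for i in range(k + 1):
        size0, p0 = compute(i, node.left, s0)
        size1, p1 = compute(k - i, node.right, s1)
        if size0 + size1 + 1 < best_size:
            best_size = size0 + size1 + 1
            best_tree = Node(node.label, node.feature, p0, p1)
    return best_size, best_tree


# Example tree with 5 nodes that fits the sample exactly.
tree = Node(1, 0, Node(0), Node(1, 1, Node(1), Node(0)))
sample = [((0, 0), 0), ((1, 0), 1), ((1, 1), 0), ((1, 0), 1)]
sizes = [compute(k, tree, sample)[0] for k in range(3)]
```

This direct recursion solves the same (sub-tree, budget) pairs repeatedly; caching results per pair is one way to avoid the repeated work.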

Cross Validation

• Split the sample into S1 and S2

• Build a tree using S1

• Compute the candidate prunings

• Select using S2

• Output the tree with smallest error on S2
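The selection step can be sketched as follows. The representation is an assumption made for illustration: each candidate pruning is given as a plain prediction function, and `select_by_validation` is a hypothetical helper name.

```python
def validation_errors(predict, s2):
    # Number of points in the held-out sample S2 that the candidate misclassifies.
    return sum(predict(x) != y for x, y in s2)


def select_by_validation(candidates, s2):
    # Return the candidate pruning with the smallest error on S2
    # (the first such candidate if there is a tie).
    return min(candidates, key=lambda predict: validation_errors(predict, s2))


# Example with three toy "prunings" over one binary feature.
candidates = [lambda x: 0, lambda x: x[0], lambda x: 1]
s2 = [((0,), 0), ((1,), 1), ((1,), 1)]
best = select_by_validation(candidates, s2)
```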

SRM

• Build a tree T using S

• Compute the candidate prunings

• k_d: the size of the pruning with d errors

• Select using the SRM formula

$$\min_d \left\{ \mathrm{obs}(T_d) + \sqrt{\frac{k_d}{m}} \right\}$$
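A small sketch of SRM-style selection among the candidate prunings. It assumes that the observed error of the pruning with d errors is d/m and that the complexity penalty is sqrt(k_d/m); both are readings of the slide's formula rather than something the transcript states explicitly, and `srm_select` is a hypothetical name.

```python
import math


def srm_select(sizes, m):
    """sizes maps d (number of errors on S) to k_d (size of the pruning).

    Returns the d minimizing d/m + sqrt(k_d/m); both terms are assumptions.
    """
    def penalized_error(d):
        return d / m + math.sqrt(sizes[d] / m)

    return min(sizes, key=penalized_error)


# Example: a large exact tree, a medium pruning, and a single leaf.
sizes = {0: 49, 5: 4, 30: 1}
chosen = srm_select(sizes, m=100)
```

With these numbers the medium pruning is selected: the exact tree pays a large size penalty, and the single leaf pays a large observed error.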

Drawbacks

• Running time: since |T| = O(m), the running time is O(m^2), with many passes over the data.

• This is a significant drawback for large data sets.

Linear Time Pruning

• Single bottom-up pass, in linear time

• Use an SRM-like formula (local soundness)

• Competitive with any pruning

Algorithm

• Process a node after processing its children.

• Local parameters:
    T_v: current sub-tree at v, of size size_v
    S_v: sample reaching v, of size m_v
    l_v: length of the path leading to v

• Local test: obs(T_v, S_v) + a(m_v, size_v, l_v, δ) > obs(root(T_v), S_v)
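The bottom-up pass with the local test can be sketched as below. Several points are assumptions for the sake of a runnable example: the hypothetical `Node` class with binary features, obs taken as the fraction of S_v that is misclassified, pruning T_v to a leaf whenever the local test holds, pruning nodes that no sample point reaches, and the penalty a() passed in as a function `penalty(m_v, size_v, l_v)` with δ already fixed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: int                      # prediction if this node is used as a leaf
    feature: Optional[int] = None   # binary feature tested; None for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def prune_bottom_up(node, sample, penalty, depth=0):
    """Return (pruned node, size_v, number of errors on S_v)."""
    m_v = len(sample)
    leaf_err = sum(y != node.label for _, y in sample)
    if node.feature is None:
        return node, 1, leaf_err
    s0 = [(x, y) for x, y in sample if not x[node.feature]]
    s1 = [(x, y) for x, y in sample if x[node.feature]]
    # Children are processed before their parent.
    left, size0, err0 = prune_bottom_up(node.left, s0, penalty, depth + 1)
    right, size1, err1 = prune_bottom_up(node.right, s1, penalty, depth + 1)
    size_v, tree_err = size0 + size1 + 1, err0 + err1
    if m_v == 0:
        # Assumption: with no data at v there is no evidence for the sub-tree.
        return Node(node.label), 1, 0
    # Local test: obs(T_v, S_v) + a(m_v, size_v, l_v, delta) > obs(root(T_v), S_v)
    if tree_err / m_v + penalty(m_v, size_v, depth) > leaf_err / m_v:
        return Node(node.label), 1, leaf_err
    return Node(node.label, node.feature, left, right), size_v, tree_err


tree = Node(1, 0, Node(0), Node(1, 1, Node(1), Node(0)))
sample = [((0, 0), 0), ((1, 0), 1), ((1, 1), 0), ((1, 0), 1)]
kept = prune_bottom_up(tree, sample, lambda m, size, l: 0.0)
collapsed = prune_bottom_up(tree, sample, lambda m, size, l: 10.0)
```

With a zero penalty the exact tree is kept; with a very large penalty every inner node fails the test and the tree collapses to a single leaf.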

The function a()

• Parameters: paths(H,l): set of paths of length l over H. trees(H,s): set of trees of size s over H.

• Formula:

$$a(m, size, l, \delta) = c \sqrt{\frac{\log|paths(H,l)| + \log|trees(H,size)| + \log(m/\delta)}{m}}$$

The function a()

• Finite class H: |paths(H,l)| < |H|^l and |trees(H,s)| < (4|H|)^s.

• Formula:

$$a(m, size, l, \delta) = c \sqrt{\frac{(l + size)\log|H| + \log(m/\delta)}{m}}$$

• Infinite classes: VC-dim
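For a finite class the penalty is easy to evaluate numerically. The sketch below assumes the form a(m, size, l, δ) = c·sqrt(((l + size)·log|H| + log(m/δ)) / m); the default c = 1, the natural logarithm, and the name `a_finite` are choices made for illustration, since the slides leave the constant unspecified.

```python
import math


def a_finite(m, size, l, delta, h_size, c=1.0):
    # Penalty for a sub-tree of `size` nodes at path length `l`,
    # reached by `m` sample points, over a finite class with |H| = h_size.
    r = (l + size) * math.log(h_size) + math.log(m / delta)
    return c * math.sqrt(r / m)


# The penalty shrinks as more data reaches the node and grows with the sub-tree size.
small_sample = a_finite(m=100, size=3, l=3, delta=0.05, h_size=10)
large_sample = a_finite(m=10000, size=3, l=3, delta=0.05, h_size=10)
bigger_tree = a_finite(m=100, size=15, l=3, delta=0.05, h_size=10)
```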

Example

[Figure: example tree with a node v at path length l_v = 3, annotated with size_v, m_v, m, and a(m_v, size_v, l_v, δ)]

Local uniform convergence

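• Sample S: S_c = { x ∈ S | c(x) = 1 }, m_c = |S_c|

• Finite classes C and H: e(h|c) = Pr[ h(x) ≠ f(x) | c(x) = 1 ], and obs(h|c) is the corresponding observed error on S_c

• Lemma: with probability 1 − δ,

$$|e(h|c) - \mathrm{obs}(h|c)| \le \sqrt{\frac{\log|C| + \log|H| + \log(1/\delta)}{m_c}}$$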

Global Analysis

• Notation: T: original tree (depends on S); T*: pruned tree; T_opt: optimal tree

r_v = (l_v + size_v) log|H| + log(m_v/δ)

a(m_v, size_v, l_v, δ) = O( sqrt{ r_v/m_v } )

Sub-Tree Property

• Lemma: with probability 1 − δ, T* is a sub-tree of T_opt.

• Proof: Assume that all the local lemmas hold. Each pruning reduces the error. Assume T* has a subtree outside T_opt.

Adding that subtree to T_opt will improve it, contradicting the optimality of T_opt.

Comparing T* and Topt

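• Additional pruned nodes: V = {v_1, … , v_t}

• Additional error:

$$e(T^*) - e(T_{opt}) = \sum_{i=1}^{t} \left( e(v_i) - e(T^*_{v_i}) \right) \Pr[v_i]$$

• Claim: With high probability

$$e(T^*) - e(T_{opt}) \le 4 \sum_{i=1}^{t} \sqrt{\frac{r_{v_i}}{m_{v_i}}} \, \Pr[v_i]$$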

Analysis

• Lemma: With probability 1 − δ, if Pr[v_i] > 12(l_opt log|H| + log(1/δ))/m = b, then Pr[v_i] < 2 obs(v_i).

• Proof: Relative Chernoff bound. Union bound over |H|^l paths.

• V’ = { v_i ∈ V | Pr[v_i] > b }

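Analysis of the sum

• Split the sum over V into the nodes outside V’ and the nodes in V’:

$$\sum_{v_i \in V} \sqrt{\frac{r_{v_i}}{m_{v_i}}} \Pr[v_i] = \sum_{v_i \in V - V'} \sqrt{\frac{r_{v_i}}{m_{v_i}}} \Pr[v_i] + \sum_{v_i \in V'} \sqrt{\frac{r_{v_i}}{m_{v_i}}} \Pr[v_i]$$

• The sum over V − V’ is bounded by size_opt · b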

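Analysis of the sum over V’

• Sum of m_v < l_opt size_opt

• Sum of r_v < size_opt (l_opt log|H| + log(m/δ))

• Putting it all together

$$\sum_{v_i \in V'} \sqrt{\frac{r_{v_i}}{m_{v_i}}} \Pr[v_i] \le \frac{2}{m} \sqrt{\Big( \sum_{v_i \in V'} r_{v_i} \Big) \Big( \sum_{v_i \in V'} m_{v_i} \Big)}$$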
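The intermediate steps behind the bound on the sum over V’ can be written out as a short derivation. It is a sketch under two assumptions that the transcript does not spell out: obs(v_i) is the fraction m_{v_i}/m of the sample reaching v_i, and for nodes in V’ the probability is controlled by Pr[v_i] ≤ 2 obs(v_i). The last step is the Cauchy-Schwarz inequality.

```latex
\begin{align*}
\sum_{v_i \in V'} \sqrt{\frac{r_{v_i}}{m_{v_i}}}\,\Pr[v_i]
  &\le \sum_{v_i \in V'} \sqrt{\frac{r_{v_i}}{m_{v_i}}}\cdot\frac{2\,m_{v_i}}{m}
     && \text{(assumed: } \Pr[v_i] \le 2\,\mathrm{obs}(v_i) = 2m_{v_i}/m\text{)} \\
  &= \frac{2}{m} \sum_{v_i \in V'} \sqrt{r_{v_i}\, m_{v_i}} \\
  &\le \frac{2}{m} \sqrt{\Big(\sum_{v_i \in V'} r_{v_i}\Big)\Big(\sum_{v_i \in V'} m_{v_i}\Big)}
     && \text{(Cauchy-Schwarz)}
\end{align*}
```

The two bullet bounds on the sum of m_v and the sum of r_v can then be substituted into the final square root.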

