
Today’s Topics

• Dealing with Noise

• Overfitting (the key issue in all of ML)

• A ‘Greedy’ Algorithm for Pruning D-Trees

• Generating IF-THEN Rules from D-Trees

• Rule Pruning

9/22/15 CS 540 - Fall 2015 (© Jude Shavlik), Lecture 6, Week 3


Noise: Major Issue in ML

Worst Case of Noise

+, - at same point in feature space

Causes of Noise

1. Too few features (“hidden variables”) or too few possible values

2. Incorrectly reported/measured/judged feature values

3. Mis-classified instances


Noise - Major Issue in ML (cont.)

Overfitting

Producing an ‘awkward’ concept because of a few ‘noisy’ points

[Figure: two concepts fit to the same + / - training points. One boundary contorts around a few noisy points (bad performance on future examples?); the other ignores them (better performance?).]


Overfitting Viewed in Terms of Function-Fitting (can exactly fit N points with an N-1 degree polynomial)

Data = red line + noise

[Figure: data points (+) plotted as f(x) versus x, generated as a smooth underlying curve (the red line) plus noise, together with fitted model curves. Which of the fits underfit? Which overfit?]

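To make the function-fitting picture concrete, here is a minimal sketch (the sine curve, noise level, and sample size are illustrative assumptions, not from the slide) showing that a degree-(N-1) polynomial fits all N training points exactly while generalizing poorly:

```python
import numpy as np

rng = np.random.default_rng(0)

# N noisy samples of an assumed underlying function (the "red line").
N = 8
x = np.linspace(0, 1, N)
y_true = np.sin(2 * np.pi * x)               # assumed true curve
y = y_true + rng.normal(scale=0.2, size=N)   # observed data = curve + noise

x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 3, N - 1):                 # underfit, reasonable, exact fit
    coeffs = np.polyfit(x, y, degree)        # degree N-1 passes through all N points
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The degree-(N-1) fit drives training error to (essentially) zero while its test error blows up, which is exactly the overfitting end of the curve.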

Definition of Overfitting

Assuming a test set large enough to be representative, concept C overfits the training data if there exists a simpler concept S such that

  Training-set accuracy of C  >  Training-set accuracy of S

but

  Test-set accuracy of C  <  Test-set accuracy of S

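Stated in code, the definition is just a comparison of four accuracies. A tiny sketch (the function name and numbers are illustrative; the 100%/50% versus 60%/66% figures echo the noise-free example later in this lecture):

```python
def overfits(acc_train_C, acc_test_C, acc_train_S, acc_test_S):
    """Strict definition: C overfits if some simpler concept S does worse
    on the training set but better on a representative test set."""
    return acc_train_C > acc_train_S and acc_test_C < acc_test_S

# Illustrative numbers only.
print(overfits(acc_train_C=1.00, acc_test_C=0.50,
               acc_train_S=0.60, acc_test_S=0.66))   # True
```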

Remember!

• It is easy to learn/fit the training data

• What’s hard is generalizing well to future (‘test set’) data!

• Overfitting avoidance (reduction, really) is the key issue in ML

• Easy to think ‘spurious correlations’ are meaningful signals


See a Pattern?


The first 10 digits of Pi: 3.14159265

What comes next in Pi?

3 (already used)

After that? 5

“35” rounds to “4” (in fractional part of number)

“4” has since been added!

Picture taken (by me) June 2015 in Lambeau Field Atrium, Green Bay, WI

Presumably a ‘spurious correlation’


Can One Underfit?

• Sure, if not fully fitting the training set
Eg, just return the majority category (+ or -) in the trainset as the learned model (see the sketch below)

• But also if there is not enough data to illustrate important distinctions
Eg, color may be important, but all examples seen are red, so there is no reason to include color and make the model more complex

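A minimal sketch of that majority-category baseline (the label list and example dict are made up for illustration):

```python
from collections import Counter

def majority_baseline(train_labels):
    """Return a classifier that ignores features and always predicts
    the most common label in the training set."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda example: majority

# Illustrative labels: 3 positives, 2 negatives.
predict = majority_baseline(['+', '+', '+', '-', '-'])
print(predict({'A': True, 'B': False}))   # '+' regardless of the features
```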

Overfitting + Noise

Using the strict definition of overfitting presented earlier, is it possible to overfit noise-free data?

(Remember: overfitting is the key ML issue, not just a decision-tree topic)


Example of Overfitting Noise-Free Data

Let
– Correct concept = A ∧ B
– Feature C be true 50% of the time, for both + and - examples
– Prob(pos example) = 0.66
– Training set
  +: A B C D E,  A B C ¬D E,  A B C D ¬E
  -: A ¬B ¬C D ¬E,  ¬A B ¬C ¬D E


Example (concluded)

Tree                                    Trainset Accuracy    Testset Accuracy
Full tree (splits on C: T → +, F → -)         100%                 50%
Pruned tree (single leaf: +)                   60%                 66%

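A small sketch that reproduces these numbers from the setup on the previous slide; the five example dictionaries are just a transcription of that training set:

```python
# Training set from the previous slide: (feature values, label).
train = [
    ({'A': 1, 'B': 1, 'C': 1, 'D': 1, 'E': 1}, '+'),
    ({'A': 1, 'B': 1, 'C': 1, 'D': 0, 'E': 1}, '+'),
    ({'A': 1, 'B': 1, 'C': 1, 'D': 1, 'E': 0}, '+'),
    ({'A': 1, 'B': 0, 'C': 0, 'D': 1, 'E': 0}, '-'),
    ({'A': 0, 'B': 1, 'C': 0, 'D': 0, 'E': 1}, '-'),
]

full_tree = lambda ex: '+' if ex['C'] else '-'   # unpruned tree: split on C
pruned    = lambda ex: '+'                       # pruned tree: single '+' leaf

for name, tree in [('full  ', full_tree), ('pruned', pruned)]:
    acc = sum(tree(ex) == label for ex, label in train) / len(train)
    print(name, 'train accuracy:', acc)          # 1.0 and 0.6

# Expected test accuracy, since C is independent of the true class (A and B):
p_pos = 0.66
print('full   test accuracy:', p_pos * 0.5 + (1 - p_pos) * 0.5)   # 0.50
print('pruned test accuracy:', p_pos)                             # 0.66
```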

ID3 & Noisy Data

To avoid overfitting, could allow splitting to stop before all ex’s are of one class

– Early stopping was Quinlan’s original idea: stop if further splitting is not justified by a statistical test (just skim the text’s material on the χ² test); a rough sketch follows below

– But post-pruning is now seen as better: it is more robust to the weaknesses of greedy algorithms (eg, post-pruning benefits from seeing the full tree; a node may look bad while building the tree, but not in hindsight)

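As a rough illustration of the early-stopping idea (not Quinlan’s exact procedure), one can ask whether the class counts in a candidate split’s branches differ significantly. A sketch using SciPy’s chi-squared test of independence, with made-up counts:

```python
from scipy.stats import chi2_contingency

# Rows = branches of the candidate split, columns = (# positives, # negatives).
# These counts are invented for illustration.
observed = [[18, 2],    # left branch
            [5, 15]]    # right branch

chi2, p_value, dof, expected = chi2_contingency(observed)

# Only split if the association between branch and class looks significant.
if p_value < 0.05:
    print(f"split looks justified (p = {p_value:.3f})")
else:
    print(f"stop here: split not justified (p = {p_value:.3f})")
```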

ID3 & Noisy Data (cont.)

Recap: Build complete tree, then use some ‘spare’ (tuning) examples to decide which parts of tree can be pruned

- called Reduced [tuneset] Error Pruning


ID3 & Noisy Data (cont.)

• See which dropped subtree leads to the highest tune-set accuracy

• Repeat (ie, another greedy algo)

[Figure: the full tree with one subtree marked for possible discarding; pruning asks whether discarding it gives better tune-set accuracy.]


Greedily Pruning D-Trees

Sample (Hill-Climbing) Search Space

[Figure: the search space of successively pruned trees explored by hill climbing, expanding the best child at each step; stop if a node’s best child is not an improvement.]

Note that in pruning we are reversing the tree-building process.


Greedily Pruning D-trees - Pseudocode

1. Run ID3 to fully fit the TRAIN’ set; measure accuracy on TUNE

2. Consider all subtrees where ONE interior node is removed and replaced by a leaf
– label the leaf with the majority category in the pruned subtree
IF there is progress on TUNE, choose the best such subtree
ELSE (ie, if no improvement) quit

3. Go to 2 (a sketch in code follows below)

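A minimal sketch of this reduced-error pruning loop, assuming a simple dict-based tree representation (interior node = {'feature': ..., 'branches': {value: subtree}}, leaf = a class label); the helper names are invented for illustration:

```python
import copy

def classify(node, example):
    """Follow branches until a leaf (a plain label) is reached."""
    while isinstance(node, dict):
        node = node['branches'][example[node['feature']]]
    return node

def accuracy(tree, examples):
    return sum(classify(tree, ex) == y for ex, y in examples) / len(examples)

def interior_paths(node, path=()):
    """Yield the branch-value path to every interior node."""
    if isinstance(node, dict):
        yield path
        for value, child in node['branches'].items():
            yield from interior_paths(child, path + (value,))

def prune_at(tree, path, leaf_label):
    """Return a copy of tree with the node at `path` replaced by a leaf."""
    if not path:
        return leaf_label
    new_tree = copy.deepcopy(tree)
    node = new_tree
    for value in path[:-1]:
        node = node['branches'][value]
    node['branches'][path[-1]] = leaf_label
    return new_tree

def reduced_error_prune(tree, tune, leaf_label='+'):
    # For brevity every pruned node becomes the same leaf; the slide instead
    # labels it with the majority category of that subtree's training examples.
    best_acc = accuracy(tree, tune)
    while True:
        candidates = [prune_at(tree, p, leaf_label) for p in interior_paths(tree)]
        scored = [(accuracy(t, tune), t) for t in candidates]
        if not scored or max(a for a, _ in scored) <= best_acc:
            return tree                      # no single pruning improves TUNE
        best_acc, tree = max(scored, key=lambda s: s[0])
```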

Train/Tune/Test Accuracies (same sort of curves for other tuned param’s in other algo’s)

[Figure: Train, Tune, and Test accuracy plotted against the amount of pruning (accuracy axis up to 100%); the "ideal tree to choose" is marked near the peak of the Test curve, while the "chosen pruned tree" is the one at the peak of the Tune curve.]


The General Tradeoff in Greedy Algorithms (more later)

Efficiency vs. Optimality

[Figure: an initial tree rooted at R with interior nodes A, B, C, D, E, F. Assuming the true best cuts are to discard C’s and F’s subtrees, the single best greedy cut instead discards B’s subtrees, an irrevocable choice.]

Greedy Search: a powerful, general-purpose trick of the trade


Generating IF-THEN Rules from Trees

• Antecedent: Conjunction of all decisions leading to terminal node

• Consequent: Label of terminal node

[Figure: a decision tree that first tests COLOR. Green → -; Blue → +; Red → test SIZE (Big → +, Small → -).]


Generating Rules (cont)

Previous slide’s tree generates these rules

If Color=Green Then Output = -

If Color=Blue Then Output = +

If Color=Red and Size=Big Then Output = +

If Color=Red and Size=Small Then Output = -

Note

1. Can ‘clean up’ the rule set (next slide)

2. Decision trees learn disjunctive concepts

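A small sketch of the antecedent/consequent extraction described above, reusing the dict-based tree representation assumed in the pruning sketch:

```python
def tree_to_rules(node, conditions=()):
    """Return (antecedent, consequent) pairs: one rule per path to a leaf."""
    if not isinstance(node, dict):               # leaf: emit the accumulated rule
        return [(conditions, node)]
    rules = []
    for value, child in node['branches'].items():
        test = f"{node['feature']}={value}"
        rules.extend(tree_to_rules(child, conditions + (test,)))
    return rules

# The COLOR/SIZE tree from the previous slide.
tree = {'feature': 'Color',
        'branches': {'Green': '-',
                     'Blue': '+',
                     'Red': {'feature': 'Size',
                             'branches': {'Big': '+', 'Small': '-'}}}}

for antecedent, consequent in tree_to_rules(tree):
    print('IF', ' and '.join(antecedent) or 'TRUE', 'THEN', consequent)
```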

Rule Post-Pruning (Another Greedy Algorithm)

1. Induce a decision tree

2. Convert to rules (see earlier slide)

3. Consider dropping any one rule antecedent

– Delete the one that improves tuning set accuracy the most

– Repeat as long as progress is being made (see the sketch below)

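A minimal sketch of step 3, assuming rules are (antecedent-tuple, label) pairs like those produced by tree_to_rules above, with first-matching-rule classification (and an arbitrary default label) standing in for a proper conflict-resolution scheme:

```python
def rule_matches(antecedent, example):
    """True if every 'Feature=value' test in the antecedent holds for example."""
    return all(example[f] == v for f, v in (t.split('=') for t in antecedent))

def classify(rules, example, default='+'):
    # First matching rule wins; the default label is an arbitrary stand-in.
    for antecedent, label in rules:
        if rule_matches(antecedent, example):
            return label
    return default

def rule_set_accuracy(rules, examples):
    return sum(classify(rules, ex) == y for ex, y in examples) / len(examples)

def prune_rules(rules, tune):
    """Greedily delete the single antecedent test whose removal most improves
    tune-set accuracy; stop when no deletion helps."""
    best_acc = rule_set_accuracy(rules, tune)
    while True:
        candidates = []
        for i, (antecedent, label) in enumerate(rules):
            for j in range(len(antecedent)):
                shorter = antecedent[:j] + antecedent[j + 1:]
                candidates.append(rules[:i] + [(shorter, label)] + rules[i + 1:])
        scored = [(rule_set_accuracy(c, tune), c) for c in candidates]
        if not scored or max(a for a, _ in scored) <= best_acc:
            return rules
        best_acc, rules = max(scored, key=lambda s: s[0])
```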

Rule Post-Pruning (cont)

Advantages
– Allows an intermediate node to be pruned from some rules but retained in others
– Can correct poor early decisions in tree construction
– Final concept more understandable

Also applicable to ML algo’s that directly learn rules (eg, ILP, MLNs)


But note that the final rules will overlap one another – so need a ‘conflict resolution’ scheme


Training with Noisy Data

If we can clean up the training data, should we do so?

– No (assuming one can’t clean up the testing data when the learned concept will be used)

– Better to train with the same type of data as will be experienced when the result of learning is put into use

– Recall the story that hadBankruptcy was the best indicator of “good candidate for a credit card”!


Aside: A Rose by Any Other Name …

Tuning sets also called

– Pruning sets (in d-tree algorithms)

– Validation sets (in general), but sometimes in the literature (eg, the stats community) AI’s test sets are called validation sets (and AI’s tuning sets are called test sets!)


