
Date post: 05-Jan-2016
Presentation Title Department of Computer Science A More Principled Approach to Machine Learning Michael R. Smith Brigham Young University Department of Computer Science 2 February 2015
Transcript
Page 1: Presentation Title Department of Computer Science A More Principled Approach to Machine Learning Michael R. Smith Brigham Young University Department of.

Presentation TitleDepartment of Computer Science

A More Principled Approach to Machine Learning

Michael R. Smith
Brigham Young University
Department of Computer Science
2 February 2015

Page 2:

Machine Learning

Learn from past experience
Change their behavior without being explicitly programmed
Optimization techniques
  Maximize accuracy
  Minimize error
Mine data

Page 3:

Machine Learning Example: I, Robot

Page 4:

Machine Learning

Page 5:

Machine Learning

Training Data
Weight  Height  Blood Press  Temp   Has Disease
205     78      good         98.2   yes
157     65      bad          100.7  yes
185     71      mod          99.5   no

Learning Algorithm

Weight  Height  Blood Press  Temp   Has Disease
172     67      bad          100.1  ?
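
The slide's setup (training data in, a prediction for an unlabeled instance out) can be sketched with a 1-nearest-neighbor rule. This is only an illustration: the slide does not name a learning algorithm, and the ordinal encoding of blood pressure and the helper names are assumptions.

```python
import math

# Training data from the slide. The ordinal scale for blood pressure is an assumption.
BLOOD_PRESSURE = {"good": 0, "mod": 1, "bad": 2}

training_data = [
    ((205, 78, "good", 98.2), "yes"),
    ((157, 65, "bad", 100.7), "yes"),
    ((185, 71, "mod", 99.5), "no"),
]

def encode(instance):
    """Turn (weight, height, blood pressure, temp) into a numeric vector."""
    weight, height, blood_pressure, temp = instance
    return (weight, height, BLOOD_PRESSURE[blood_pressure], temp)

def nearest_neighbor_predict(training, query):
    """Return the label of the training instance closest to the query."""
    _, label = min(
        training,
        key=lambda pair: math.dist(encode(pair[0]), encode(query)),
    )
    return label

prediction = nearest_neighbor_predict(training_data, (172, 67, "bad", 100.1))
```

With unscaled features the weight column dominates the distance, so this toy rule answers "no" for the new instance. A different algorithm or a different scaling could answer differently, which is the point the later slides build on.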

Page 6:

Machine Learning

Learning Algorithm
Training Data
Hyper-parameters
Hypothesis/model

Page 7:

Machine Learning

Training Data
Weight  Height  Blood Press  Temp   Has Disease
205     78      good         98.2   Yes
157     65      bad          100.7  Yes
185     71      mod          99.5   No

Learning Algorithm
Hyper-parameters

Weight  Height  Blood Press  Temp   Has Disease
172     67      bad          100.1  ?

Meta-data
Data Set  # Features  # Classes  Entropy  …  # Nodes  Learning Rate  …  Accuracy
Disease   4           2          0.24     …  3        0.1            …  83.4
Iris      4           3          0.76     …  7        0.2            …  97.4
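
A minimal sketch of how a row of meta-data could be derived from a data set. The slide does not say which entropy it reports, so the class-label entropy used here is an assumption, and the function names are hypothetical. The sketch is not expected to reproduce the values in the table.

```python
import math
from collections import Counter

def class_entropy(labels):
    """Shannon entropy (in bits) of the class-label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def meta_features(instances, labels):
    """Summarize a data set with a few simple meta-features."""
    return {
        "n_features": len(instances[0]),
        "n_classes": len(set(labels)),
        "class_entropy": class_entropy(labels),
    }

# The three training instances from the slide.
instances = [
    (205, 78, "good", 98.2),
    (157, 65, "bad", 100.7),
    (185, 71, "mod", 99.5),
]
labels = ["yes", "yes", "no"]
summary = meta_features(instances, labels)
```

Each run of a learning algorithm would then add its hyper-parameter settings and resulting accuracy to the same row.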

Page 8:

Meta-Learning

Learning Algorithm

Meta-Data
Data Set  # Features  # Classes  Entropy  …  # Nodes  Learning Rate  …  Accuracy
Disease   4           2          0.24     …  3        0.1            …  83.4
Iris      4           3          0.76     …  7        0.2            …  97.4

Meta-features
Data Set  # Features  # Classes  Entropy  …
Ecology   17          3          0.5      …
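
One simple way to read this slide is as a nearest-neighbor lookup over the meta-data: find the previously seen data set whose meta-features are most similar and reuse its settings. The sketch below assumes that reading; the slide does not specify the meta-learner, and the dictionary keys are hypothetical.

```python
import math

# Rows from the slide's meta-data table: (# features, # classes, entropy).
meta_data = [
    {"name": "Disease", "meta": (4, 2, 0.24), "n_nodes": 3, "learning_rate": 0.1},
    {"name": "Iris", "meta": (4, 3, 0.76), "n_nodes": 7, "learning_rate": 0.2},
]

def recommend(rows, query_meta):
    """Reuse the hyper-parameters of the most similar known data set."""
    nearest = min(rows, key=lambda row: math.dist(row["meta"], query_meta))
    return {
        "n_nodes": nearest["n_nodes"],
        "learning_rate": nearest["learning_rate"],
    }

# Meta-features of the new "Ecology" data set from the slide.
recommendation = recommend(meta_data, (17, 3, 0.5))
```

Here the Ecology row is closest to Iris only because both have three classes, which hints at why a handful of coarse meta-features may not predict performance well.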

Page 9:

Meta-Learning

Learning how to learn
  Learn from previous experiments
  Data-driven approach
Given a data set, automatically:
  Preprocess the data
    Select features
    Create new features
    Select/discard instances
  Select a learning algorithm
  Set the hyper-parameters for the learning algorithm

Page 10:

Meta-Learning Difficulties

Large space of possibilities
  Can have an infinite number of choices for the hyper-parameters
  Large search space for the data (instances, features, etc.)
Cannot select the hyper-parameters until the learning algorithm is chosen
The performance of the learning algorithm is dependent on the data and the hyper-parameters
Meta-features are not predictive of performance
Getting data is computationally expensive

Page 11:

Previous Work

Model Selection
  Predict the learning algorithm with fixed hyper-parameters
Hyper-parameter Selection
  Predict the hyper-parameters for a given learning algorithm
Hyper-parameter Optimization
  Search the space of hyper-parameters
    Grid Search
    Random Search
    Bayesian Optimization
  No learning from previous experiments
OpenML.org
  Store results from previous experiments
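
To make the difference between two of the search strategies concrete, here is a small sketch of grid search and random search over two hyper-parameters. The `validation_score` function is a hypothetical stand-in for training a model and measuring its accuracy, and its optimum at a learning rate of 0.2 and 7 nodes is an arbitrary choice.

```python
import itertools
import random

def validation_score(learning_rate, n_nodes):
    """Hypothetical stand-in for 'train a model, return validation accuracy'."""
    return 1.0 - (learning_rate - 0.2) ** 2 - 0.001 * (n_nodes - 7) ** 2

def grid_search(learning_rates, node_counts):
    """Evaluate every combination on a fixed grid and keep the best."""
    return max(
        itertools.product(learning_rates, node_counts),
        key=lambda setting: validation_score(*setting),
    )

def random_search(n_trials, seed=0):
    """Evaluate randomly sampled settings and keep the best."""
    rng = random.Random(seed)
    candidates = [
        (rng.uniform(0.01, 1.0), rng.randint(1, 20)) for _ in range(n_trials)
    ]
    return max(candidates, key=lambda setting: validation_score(*setting))

best_grid = grid_search([0.05, 0.1, 0.2, 0.4], [3, 5, 7, 9])
best_random = random_search(n_trials=16)
```

Both strategies start from scratch for every new data set, which matches the slide's remark that they do not learn from previous experiments.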

Page 12:

Instance Hardness

Learning algorithms are generally evaluated at the data set level
  Are some instances intrinsically hard to classify?
  Why are some instances misclassified?
  Are there instances which are misclassified that should not be?
  Are some instances misclassified by all learning algorithms? If so, why?

Page 13:

Data Set

Page 14:

Overfit

Page 15:

Linear Classifier

Page 16:

Detrimental Instances

Page 17:

Instance Hardness

Better intuition of learning algorithms and why instances are misclassified
  Can learning algorithms be improved? Where?
Informed analysis of learning algorithm performance
  Is the classification reasonable?
Where can the quality of the data be improved?
Empirical analysis of the classification of 57 data sets by 9 learning algorithms
  10-fold cross-validation
  178,109 instances
  5,310 models were created

Page 18:

Instance Hardness

Measure how difficult an instance is to classify correctly
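
The transcript does not preserve the formula on this slide. As a rough sketch under that caveat, hardness can be approximated as the fraction of learning algorithms that misclassify an instance. This definition and the function name are assumptions, not necessarily the measure used in the talk.

```python
def instance_hardness(true_label, predictions):
    """Fraction of learning algorithms whose prediction differs from the label."""
    misses = sum(1 for predicted in predictions if predicted != true_label)
    return misses / len(predictions)

# Predictions for one instance from four hypothetical learning algorithms.
hardness = instance_hardness("yes", ["yes", "no", "no", "yes"])
```

A value of 0 means every algorithm classifies the instance correctly, and a value of 1 means every algorithm misclassifies it.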

Page 19:

Instance Hardness

9 learning algorithms
  C4.5, MLP, RIPPER, NNge, Ridor, 5NN, Random Forest, LWL, Naïve Bayes
Unsupervised Meta-learning
  Cluster learning algorithms based on diversity
  Intuition for all of the algorithms in the cluster

Page 20:

Existence of Instance Hardness

53% correctly classified by all algorithms
5% misclassified by all algorithms
Learning algorithms disagree on 42% of the instances
15% misclassified by the majority of algorithms

Page 21:

Modeling Detrimental Instances

Each instance is composed of:
  x: the input features
  y: the true unobserved class label
  ŷ: the observed class label
True class label is generally ignored
  Regularization
  Validation sets
  Pruning

Page 22:

Modeling Detrimental Instances

$p(\hat{y} \mid x)\,p(x)$

$p(\hat{y} \mid x, y)\,p(y \mid x)\,p(x)$

How can the true class label be taken into account?
  Filtering
  Data polishing
  Specific learning algorithm
  Boosting
  Weight by
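
The two expressions on this slide are related by marginalizing over the true class label. Assuming the second expression is the joint distribution over the input features, the true label, and the observed label, summing it over every possible true label recovers the first:

```latex
p(\hat{y} \mid x)\,p(x) \;=\; \sum_{y} p(\hat{y} \mid x, y)\,p(y \mid x)\,p(x)
```

Read this way, a model that ignores the true label is implicitly averaging over it, whereas the techniques listed above try to account for it explicitly.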

Page 23:

Instance Quality Learning

Incorporate the quality into the learning process
  Maximize the quality of the data rather than just the data
Detrimental instances should have less of an effect on the induced model
Inequality Learning
  Do not treat all of the instances “equally”
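
A minimal sketch of one way to give detrimental instances less influence: scale each instance's contribution to the gradient by a quality weight. The logistic model, the 1-D data, and the weight values are all assumptions made for illustration; the slides do not show how quality was integrated into each learning algorithm.

```python
import math

def train_weighted_logistic(xs, ys, quality, epochs=200, lr=0.5):
    """Batch gradient descent on a quality-weighted log loss (1-D inputs)."""
    w, b = 0.0, 0.0
    total = sum(quality)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y, q in zip(xs, ys, quality):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            # Each instance's gradient is scaled by its quality weight q.
            grad_w += q * (p - y) * x
            grad_b += q * (p - y)
        w -= lr * grad_w / total
        b -= lr * grad_b / total
    return w, b

def predict(model, x):
    w, b = model
    return 1 if w * x + b > 0 else 0

# The last instance (x = 1.5, label 0) looks mislabeled, so it gets a low weight.
xs = [-2.0, -1.0, 1.0, 2.0, 1.5]
ys = [0, 0, 1, 1, 0]
quality = [1.0, 1.0, 1.0, 1.0, 0.1]
model = train_weighted_logistic(xs, ys, quality)
```

Because the weight is continuous, the suspicious instance still contributes a little instead of being removed outright, which is the contrast with filtering drawn later in the talk.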

Page 24:

Inequality Learning

Page 25:

Inequality Learning

[Figure: example annotated with the values 0.00019, 0.678, and 0.054]

Page 26:

Results: Original

        MLP       C4.5      5-NN      LWL       NB        NNge      RandF     Ridor     Rip
Orig    80.7      80.1      79        69.4      75.7      79.4      81.6      76.6      77.8
QW-L    83.8      80.1      80        70.4      77.2      79.4      83.3      78.6      79.7
p-val   < 0.001   0.045     0.015     0.014     < 0.001   0.788     < 0.001   0.036     < 0.001
g,e,l   47,0,5    32,0,20   35,1,16   28,10,14  35,1,16   20,1,27   33,1,18   31,1,19   38,0,14
QW-B    84.6      82.3      80.3      68.2      75.2      79.4      83.5      78.6      78.8
p-val   < 0.001   < 0.001   0.016     0.590     0.858     0.877     < 0.001   0.013     < 0.001
g,e,l   49,0,3    37,1,14   32,0,20   22,12,18  19,1,32   21,1,26   32,2,18   34,1,16   37,3,12
Filter  82.9      81.8      82.3      70.0      77.3      82.4      83.2      79.5      79.7
p-val   < 0.001   < 0.001   < 0.001   0.032     < 0.001   < 0.001   < 0.001   < 0.001   < 0.001
g,e,l   39,0,13   38,3,11   38,4,10   26,12,14  36,1,15   40,0,12   33,1,18   35,3,14   40,2,10
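
The "g,e,l" rows are not defined in the transcript. Assuming they count the data sets on which a method's accuracy is greater than, equal to, or less than the original accuracy, the tally could be computed as below. The per-data-set accuracies are made up for the example.

```python
def greater_equal_less(method_accuracies, baseline_accuracies):
    """Count data sets where the method beats, ties, or trails the baseline."""
    greater = equal = less = 0
    for method, baseline in zip(method_accuracies, baseline_accuracies):
        if method > baseline:
            greater += 1
        elif method == baseline:
            equal += 1
        else:
            less += 1
    return greater, equal, less

# Hypothetical accuracies on three data sets (not taken from the slides).
counts = greater_equal_less([83.8, 80.1, 70.0], [80.7, 80.1, 72.5])
```

Such counts complement the average accuracy, since an average can hide whether a gain comes from many data sets or from a few large improvements.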

Page 27:

Results: Original

Page 28:

Inequality Learning

Increases the accuracy for all of the investigated learning algorithms
  Advantage to using a continuous value rather than binary
Most effective in global learning algorithms such as backpropagation
  Could be a side effect of how we integrated instance quality into the learning algorithm (future work)
Focusing on the data, how does it compare with hyper-parameter optimization (HPO)?

Page 29:

Comparison of HPO and Filtering

HPO and filtering can both be expensive; which has the greatest benefit?
Standard Approach
  Perspective for current state
  Based on the performance on a validation set
Optimistic Approach
  Perspective for the potential of a technique
  Based on the performance on 10-fold cross-validation

Page 30:

K-Fold Cross-Validation

Create K partitions of the data set
For each partition, use it for testing and the remaining K-1 partitions for training
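
The partitioning described on this slide can be sketched in a few lines. The round-robin assignment of instances to partitions is an arbitrary choice for the example; in practice the data would usually be shuffled first.

```python
def k_fold_splits(n_instances, k):
    """Yield (train_indices, test_indices) once for each of the k partitions."""
    folds = [list(range(start, n_instances, k)) for start in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_splits(n_instances=10, k=5))
```

Every instance is used for testing exactly once and for training K-1 times, so the averaged test accuracy uses all of the data.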

Page 31:

K-Fold Cross-Validation

Use a validation set to determine which set of hyper-parameters to use

[Figure: validation examples]
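
As a small illustration of choosing hyper-parameters with a validation set, the sketch below picks the number of neighbors k for a 1-D nearest-neighbor classifier. The classifier, the data, and the candidate values are assumptions chosen to keep the example short.

```python
def knn_predict(train, x, k):
    """Majority vote of the k nearest 1-D training instances (labels 0 or 1)."""
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if votes * 2 > k else 0

def accuracy(train, examples, k):
    hits = sum(1 for x, y in examples if knn_predict(train, x, k) == y)
    return hits / len(examples)

def select_k(train, validation, candidates):
    """Return the candidate k with the highest accuracy on the validation set."""
    return max(candidates, key=lambda k: accuracy(train, validation, k))

# The instance (1.5, 1) is a noisy point inside the class-0 region.
train = [(0.0, 0), (1.0, 0), (2.0, 0), (1.5, 1), (4.0, 1), (5.0, 1), (6.0, 1)]
validation = [(1.4, 0), (1.6, 0), (0.2, 0), (5.5, 1)]
best_k = select_k(train, validation, candidates=[1, 3, 7])
```

The validation examples are held out from training, so the chosen setting reflects how well each candidate generalizes instead of how well it memorizes the training data.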

Page 32:

Experimental Methodology

Hyper-parameter optimization
  Bayesian Optimization (more than 512 hyper-parameter settings explored for most learning algorithms)
  Standard uses the accuracy on a validation set
  Optimistic uses the 10-fold cross-validation accuracy
Filtering
  Ensemble Filter (L-Filter)
    Removes instances that are misclassified by the majority of a set of learning algorithms
  Adaptive Filter (A-Filter)
    Greedy search among candidate learning algorithms
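
The ensemble filter's rule can be sketched as follows. The sketch assumes the predictions for each instance were already obtained from models that did not train on that instance (for example via cross-validation), and it treats "majority" as a strict majority; both points are assumptions, as is the function name.

```python
def ensemble_filter(labels, predictions_per_learner):
    """Return indices of instances that are NOT misclassified by a strict majority."""
    n_learners = len(predictions_per_learner)
    kept = []
    for i, label in enumerate(labels):
        misses = sum(
            1 for predictions in predictions_per_learner if predictions[i] != label
        )
        if misses * 2 <= n_learners:
            kept.append(i)
    return kept

labels = ["yes", "yes", "no", "no"]
predictions_per_learner = [
    ["yes", "no", "no", "yes"],   # hypothetical learner 1
    ["yes", "no", "no", "no"],    # hypothetical learner 2
    ["yes", "yes", "no", "yes"],  # hypothetical learner 3
]
kept = ensemble_filter(labels, predictions_per_learner)
```

The learning algorithm of interest is then trained only on the kept instances. An adaptive filter would additionally search over which learning algorithms belong in the ensemble.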

Page 33:

Results-Standard Approach

[Chart: accuracy (axis from 75 to 93) of Orig, L-Filter, and HPO for MLP, C4.5, kNN, NB, RF, and RIP]

VS Orig  L-Filter  HPO
MLP      44,1,7    47,0,5
C4.5     45,1,6    39,0,13
kNN      44,2,6    41,2,9
NB       42,0,10   42,1,9
RF       38,3,11   37,2,13
RIP      50,0,2    47,1,4

Page 34:

Results-Optimistic Approach

[Chart: accuracy (axis from 75 to 93) of HPO, L-Filter, and A-Filter for MLP, C4.5, kNN, NB, RF, and RIP]

No single filtering approach is best for all data sets and learning algorithms

VS HPO  L-Filter  A-Filter
MLP     27,3,22   45,0,7
C4.5    33,4,15   48,2,2
kNN     30,2,20   51,0,1
NB      22,2,28   34,0,18
RF      27,1,24   46,0,6
RIP     34,1,17   48,0,4

Page 35:

Why does filtering have such a significant effect?

Recall: Maximize the probability of the hypothesis given the data

$$p(h \mid D) = \frac{p(D \mid h)\,p(h)}{P(D)}$$

At the instance-level:

$$p(h \mid D) = \frac{\prod_{i=1}^{|D|} p(\langle x_i, y_i \rangle \mid h)\,p(h)}{P(D)}$$
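
Because the likelihood is a product over instances, a single instance that the hypothesis explains poorly can dominate the result. The per-instance probabilities below are made up to illustrate that effect.

```python
import math

def likelihood(instance_probabilities):
    """Product of the per-instance probabilities p(<x_i, y_i> | h)."""
    return math.prod(instance_probabilities)

# Hypothetical values: the last instance is poorly explained by the hypothesis.
probabilities = [0.9, 0.8, 0.95, 0.05]
with_all_instances = likelihood(probabilities)
without_last_instance = likelihood(probabilities[:3])
```

Removing any factor below 1 raises the product, so the two numbers are not a fair comparison of hypotheses. They only show how strongly one low-probability instance pulls on the quantity being maximized, which suggests why removing such instances before training can change the induced hypothesis so much.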

Page 36:

Example Data Set

Page 37:

A Need for Better Understanding

Filter has a much higher potential than HPO
No principled examination

HPO
  Pros
    Significantly increases accuracy
    One pass
    Uses all of the instances
      Exceptions vs. noise
  Cons
    Uses all of the instances
      Noisy instances are used to induce the model

Filtering
  Pros
    Significantly increases accuracy
    Noisy instances are not used to induce the model
  Cons
    Requires multiple passes through the training set
      Find noisy instances
      Train the learning algorithm
    Can remove good instances

Page 38:

The Need for a Repository

Page 39:

The Need for a Repository

Page 40:

The Need for a Repository

Page 41:

Benefits of a Repository

Better science
  Reproducible/saved results
Save time
Build reputation
Easier to compare with other work
Gives a snapshot of current state
  Overall
  Specific data set
Meta-learning
  Provide data set

Page 42:

Machine Learning Results Repository

Page 43:

Machine Learning Results Repository

Data Set-Level
Learning Algorithm-Level
Instance-Level

Page 44:

Future Directions and Projects

MLRR
  Data quality
  Linking with papers
  Creating user profiles
  Anonymous postings for supplemental material
Meta-learning
  Combine learning with optimization techniques
  Meta-features
    Deep learning
    Collaborative filtering
Automate machine learning

Page 45:

Future Directions and Projects

Incorporate information into the learning process
Use cases of machine learning
  How is machine learning actually used?
  How can it be made easier to use?
Collaboration/application to other fields
  Bioinformatics
  Social media
  Sports statistics

Page 46:

Thank you