TAMING THE LEARNING ZOO
Page 1

TAMING THE LEARNING ZOO

Page 2

SUPERVISED LEARNING ZOO

Bayesian learning (find parameters of a probabilistic model)
  Maximum likelihood
  Maximum a posteriori

Classification
  Decision trees (discrete attributes, few relevant)
  Support vector machines (continuous attributes)

Regression
  Least squares (known structure, easy to interpret)
  Neural nets (unknown structure, hard to interpret)

Nonparametric approaches
  k-Nearest-Neighbors
  Locally-weighted averaging / regression

Page 3

VERY APPROXIMATE “CHEAT-SHEET” FOR TECHNIQUES DISCUSSED IN CLASS

Technique                   Task  Attributes  N scalability  D scalability  Capacity
Bayes nets                  C     D           Good           Good           Good
Naïve Bayes                 C     D           Excellent      Excellent      Low
Decision trees              C     D,C         Excellent      Excellent      Fair
Linear least squares        R     C           Excellent      Excellent      Low
Nonlinear LS                R     C           Poor           Poor           Good
Neural nets                 R     C           Poor           Good           Good
SVMs                        C     C           Good           Good           Good
Nearest neighbors           C     D,C         L:E, E:P       Poor           Excellent*
Locally-weighted averaging  R     C           L:E, E:P       Poor           Excellent*
Boosting                    C     D,C         ?              ?              Excellent*

(Task: C = classification, R = regression. Attributes: D = discrete, C = continuous. L:E, E:P = excellent at learning time, poor at evaluation time.)

Page 4

VERY APPROXIMATE “CHEAT-SHEET” FOR TECHNIQUES DISCUSSED IN CLASS (table repeated from the previous page)

Note: we have looked at a limited subset of existing techniques in this class (typically, the “classical” versions).

Most techniques extend to:
• Both C/R tasks (e.g., support vector regression)
• Both continuous and discrete attributes
• Better scalability for certain types of problems

* With “sufficiently large” data sets

* With “sufficiently diverse” weak learners

Page 5

AGENDA

Quantifying learner performance
  Cross-validation
  Error vs. loss
  Precision & recall

Model selection

Page 6

CROSS-VALIDATION

Page 7

ASSESSING PERFORMANCE OF A LEARNING ALGORITHM

Samples from X are typically unavailable, so take out some of the training set:
  Train on the remaining training set
  Test on the excluded instances
This is cross-validation.

Page 8

CROSS-VALIDATION

Split the original set of examples and train on one part.

[Figure: examples D, a set of + and − instances; a subset is used as the training set, and a hypothesis is fit from hypothesis space H]

Page 9

CROSS-VALIDATION

Evaluate hypothesis on testing set

[Figure: the learned hypothesis from hypothesis space H is applied to the held-out testing set of + and − instances]

Page 10

CROSS-VALIDATION

Evaluate hypothesis on testing set

[Figure: the learned hypothesis (from hypothesis space H) produces predicted labels for each instance of the testing set]

Page 11

CROSS-VALIDATION

Compare true concept against prediction

[Figure: the predicted labels on the testing set compared against the true concept]

9/13 correct

Page 12

COMMON SPLITTING STRATEGIES

k-fold cross-validation

[Figure: the dataset divided into k blocks; in each round one block serves as the test set and the remaining blocks form the training set]

Page 13

COMMON SPLITTING STRATEGIES

k-fold cross-validation

Leave-one-out (n-fold cross-validation)

[Figure: in leave-one-out, each round holds out a single datapoint as the test set and trains on the remaining n − 1]

Page 14

COMPUTATIONAL COMPLEXITY

k-fold cross-validation requires:
  k training steps, on n(k−1)/k datapoints each
  k testing steps, on n/k datapoints each

(There are efficient ways of computing leave-one-out estimates for some nonparametric techniques, e.g., nearest neighbors.)

Average results over the k folds are reported.
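As a concrete illustration of these costs, here is a minimal sketch of the k-fold procedure in Python (the names `k_fold_cross_validation`, `train_fn`, and `error_fn` are illustrative, not from the slides):

```python
import random

def k_fold_cross_validation(examples, k, train_fn, error_fn, seed=0):
    """Split examples into k folds; train on the other k-1 folds and
    test on the held-out fold, so each of the k training steps sees
    n(k-1)/k datapoints and each test step sees n/k."""
    data = examples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        h = train_fn(train)
        errors.append(error_fn(h, test))
    return sum(errors) / k  # average result reported

# Toy usage: "learn" the mean of 1-D data; error = mean squared deviation.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
avg_err = k_fold_cross_validation(
    data, k=3,
    train_fn=lambda d: sum(d) / len(d),
    error_fn=lambda h, t: sum((x - h) ** 2 for x in t) / len(t))
```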

Page 15

BOOTSTRAPPING

Similar technique for estimating the confidence in the model parameters

Procedure:
1. Draw k hypothetical datasets from the original data, either via cross-validation or by sampling with replacement.
2. Fit the model on each dataset to compute parameters θ1,…,θk.
3. Return the standard deviation of θ1,…,θk (or a confidence interval).

Can also estimate confidence in a prediction y = f(x).

Page 16

SIMPLE EXAMPLE: AVERAGE OF N NUMBERS

Data D = {x(1),…,x(N)}; the model is a constant θ.
Learning: minimize E(θ) = Σi (x(i) − θ)² => compute the average.

Repeat for j = 1,…,k:
  Randomly sample a subset x(1)′,…,x(N)′ from D
  Learn θj = 1/N Σi x(i)′

Return a histogram of θ1,…,θk.

[Plot: average with lower and upper range (y axis, roughly 0.44 to 0.56) vs. dataset size |D| (x axis, 10 to 10000)]
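The average-of-N example above can be sketched directly; function names and the uniform toy data are illustrative:

```python
import random
import statistics

def bootstrap_mean(data, k=1000, seed=0):
    """Resample the dataset with replacement k times, recompute the
    average each time, and summarize the spread of the k estimates."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [statistics.mean(rng.choices(data, k=n)) for _ in range(k)]
    estimates.sort()
    # Mean of the estimates plus an empirical 95% interval.
    return (statistics.mean(estimates),
            estimates[int(0.025 * k)],
            estimates[int(0.975 * k)])

rng = random.Random(1)
data = [rng.random() for _ in range(100)]  # ~Uniform(0,1), true mean 0.5
avg, lo, hi = bootstrap_mean(data)
```

As on the plot, rerunning this with larger datasets shrinks the gap between the lower and upper range.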

Page 17

BEYOND ERROR RATES

Page 18

BEYOND ERROR RATE

Predicting security risk: predicting “low risk” for a terrorist is far worse than predicting “high risk” for an innocent bystander (but maybe not for 5 million of them).

Searching for images: returning irrelevant images is worse than omitting relevant ones.

Page 19

BIASED SAMPLE SETS

Often there are orders of magnitude more negative examples than positive ones.

E.g., consider finding all images of Kris on Facebook: if I classify every image as “not Kris” I’ll have >99.99% accuracy.

Examples of Kris should count much more than non-Kris examples!

Page 20

FALSE POSITIVES

[Figure: 2D attribute space (x1, x2) showing the true concept and the learned concept]

Page 21

FALSE POSITIVES

[Figure: attribute space (x1, x2) with the true and learned concepts; a new query falls where the two disagree: an example incorrectly predicted to be positive]

Page 22

FALSE NEGATIVES

[Figure: attribute space (x1, x2) with the true and learned concepts; a new query falls where the two disagree: an example incorrectly predicted to be negative]

Page 23

PRECISION VS. RECALL

Precision: # of relevant documents retrieved / # of total documents retrieved
Recall: # of relevant documents retrieved / # of total relevant documents

Both are numbers between 0 and 1.

Page 24

PRECISION VS. RECALL

Precision: # of true positives / (# true positives + # false positives)
Recall: # of true positives / (# true positives + # false negatives)

A precise classifier is selective; a classifier with high recall is inclusive.
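These definitions can be sketched as a small Python helper (treating the undefined 0/0 case as 1.0 is a convention chosen here, not something the slides specify):

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 1.0  # selective classifier
    recall = tp / (tp + fn) if tp + fn else 1.0     # inclusive classifier
    return precision, recall

# 3 true positives, 1 false positive, 2 false negatives:
p, r = precision_recall([1, 1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1, 0])
# p = 3/4 = 0.75, r = 3/5 = 0.6
```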

Page 25

REDUCING FALSE POSITIVE RATE

[Figure: attribute space (x1, x2); the learned concept adjusted relative to the true concept to reduce false positives]

Page 26

REDUCING FALSE NEGATIVE RATE

[Figure: attribute space (x1, x2); the learned concept adjusted relative to the true concept to reduce false negatives]

Page 27

PRECISION-RECALL CURVES

Measure precision vs. recall as the classification boundary is tuned.

[Plot: precision (y axis) vs. recall (x axis); a perfect classifier sits at precision = recall = 1, while actual performance traces a curve below that point]

Page 28

PRECISION-RECALL CURVES

Measure precision vs. recall as the classification boundary is tuned.

[Plot: a precision-recall curve with marked operating points: one that penalizes false negatives, one that penalizes false positives, and one giving them equal weight]

Page 29

PRECISION-RECALL CURVES

Measure precision vs. recall as the classification boundary is tuned.

[Plot: a precision-recall curve traced as the boundary is tuned]

Page 30

PRECISION-RECALL CURVES

Measure precision vs. recall as the classification boundary is tuned.

[Plot: two precision-recall curves; the curve closer to the top-right corner shows better learning performance]

Page 31

OPTION 1: CLASSIFICATION THRESHOLDS

Many learning algorithms (e.g., linear models, neural nets, Bayes nets, SVMs) give a real-valued output v(x) that needs thresholding for classification:
  v(x) > t => positive label given to x
  v(x) < t => negative label given to x

We may want to tune the threshold t to get fewer false positives or fewer false negatives.
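Sweeping the threshold t over the observed scores is also how the precision-recall curves above are traced. A sketch (the scores and labels are made-up data):

```python
def pr_curve(scores, labels):
    """Sweep the threshold t over all observed scores v(x); at each t,
    label x positive iff v(x) > t, then record (recall, precision)."""
    points = []
    for t in sorted(set(scores)):
        preds = [1 if s > t else 0 for s in scores]
        tp = sum(p and l for p, l in zip(preds, labels))
        fp = sum(p and not l for p, l in zip(preds, labels))
        fn = sum((not p) and l for p, l in zip(preds, labels))
        if tp + fp == 0:
            continue  # no positives predicted at this threshold
        points.append((tp / (tp + fn), tp / (tp + fp)))  # (recall, precision)
    return points

# Illustrative classifier scores v(x) and true labels:
curve = pr_curve([0.1, 0.4, 0.35, 0.8, 0.7], [0, 0, 1, 1, 1])
# Raising t trades recall for precision along the curve.
```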

Page 32

OPTION 2: LOSS FUNCTIONS & WEIGHTED DATASETS

General learning problem: “Given data D and loss function L, find the hypothesis from hypothesis class H that minimizes L”

Loss functions: L may contain weights to favor accuracy on positive or negative examples, e.g., L = 10·E+ + 1·E−, where E+ and E− are the errors on positive and negative examples respectively.

Weighted datasets: attach a weight w to each example to indicate how important it is, or construct a resampled dataset D′ in which each example is duplicated in proportion to its weight w.
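The resampled-dataset construction of D′ can be sketched as follows; duplicating each example round(w / min(w)) times is one simple scheme among many, and the names are illustrative:

```python
import random

def resample_by_weight(examples, weights, seed=0):
    """Construct D' in which each example appears in proportion to
    its weight: duplicate example i round(w_i / min(w)) times."""
    scale = min(weights)
    resampled = []
    for x, w in zip(examples, weights):
        resampled.extend([x] * round(w / scale))
    random.Random(seed).shuffle(resampled)
    return resampled

# Positives weighted 10x to counteract a biased sample set:
D = [("x1", 1), ("x2", 0), ("x3", 0)]
w = [10.0, 1.0, 1.0]
D_prime = resample_by_weight(D, w)
# D' contains the positive example ten times and each negative once.
```

A learner that only counts errors on D′ then behaves as if it minimized the weighted loss on D.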

Page 33

MODEL SELECTION

Page 34

COMPLEXITY VS. GOODNESS OF FIT

More complex models can fit the data better, but can overfit

Model selection: enumerate several possible hypothesis classes of increasing complexity, stop when cross-validated error levels off

Regularization: explicitly define a metric of complexity and penalize it in addition to loss

Page 35

MODEL SELECTION WITH K-FOLD CROSS-VALIDATION

Parameterize the learner by a complexity level C.

Model selection pseudocode:
  For increasing levels of complexity C:
    errT[C], errV[C] = Cross-Validate(Learner, C, examples)
    If errT has converged, stop
  Find the value Cbest that minimizes errV[C]
  Return Learner(Cbest, examples)
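A sketch of this loop, using polynomial degree as the complexity level C; the synthetic data is illustrative, and the early-stopping test on errT is omitted for brevity:

```python
import numpy as np

def cross_validate(degree, x, y, k=5):
    """Return (training error, validation error) for a polynomial fit
    of the given degree, averaged over k folds."""
    folds = np.array_split(np.arange(len(x)), k)
    errT, errV = [], []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], degree)
        errT.append(np.mean((np.polyval(coeffs, x[train]) - y[train]) ** 2))
        errV.append(np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2))
    return np.mean(errT), np.mean(errV)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = 1 + 2 * x - 3 * x ** 2 + rng.normal(0, 0.1, 60)  # true degree is 2

errs = {C: cross_validate(C, x, y) for C in range(1, 7)}
C_best = min(errs, key=lambda C: errs[C][1])  # minimize validation error
```

Validation error drops sharply up to the true complexity and then flattens or rises, which is what the stopping rule exploits.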

Page 36

REGULARIZATION

Minimize: Cost(h) = Loss(h) + Complexity(h)

Example with linear models y = θ^T x:
  L2 error: Loss(θ) = Σi (y(i) − θ^T x(i))²
  Lq regularization: Complexity(θ) = Σj |θj|^q

L2 and L1 are the most popular in linear regularization.

L2 regularization leads to a simple computation of the optimal θ.

L1 is more complex to optimize, but produces sparse models in which many coefficients are 0!
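For L2 regularization the optimal θ has a closed form, θ = (XᵀX + λI)⁻¹ Xᵀy; a sketch with synthetic data (names and the choice λ = 0.1 are illustrative):

```python
import numpy as np

def ridge_fit(X, y, lam=0.1):
    """L2-regularized least squares: minimize
    sum_i (y_i - theta^T x_i)^2 + lam * sum_j theta_j^2,
    solved in closed form as theta = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + rng.normal(0, 0.01, 50)

theta = ridge_fit(X, y)  # close to theta_true, shrunk slightly toward 0
```

No such closed form exists for L1; it requires iterative optimization, which is the extra complexity the slide mentions.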

Page 37

DATA DREDGING

As the number of attributes increases, so does the likelihood that a learner picks up on patterns that arise purely by chance.

In the extreme case where there are more attributes than datapoints (e.g., pixels in a video), even very simple hypothesis classes, such as linear classifiers, can overfit.

Many opportunities for charlatans in the big data age!
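This effect is easy to reproduce: with more attributes than datapoints, a plain least-squares linear fit matches purely random labels exactly (a synthetic demonstration, not a claim about any real dataset):

```python
import numpy as np

# With more attributes (columns) than datapoints (rows), a linear
# model can fit pure noise perfectly on the training set.
rng = np.random.default_rng(0)
n, d = 20, 50                    # 20 examples, 50 random attributes
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], n)   # labels are pure chance
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
train_acc = np.mean(np.sign(X @ theta) == y)  # perfect "fit" to noise
```

Held-out accuracy on fresh random labels would of course hover around 50%, which is exactly the dredging trap.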

Page 38

OTHER TOPICS IN MACHINE LEARNING

Unsupervised learning
  Dimensionality reduction
  Clustering

Reinforcement learning
  An agent that acts in an environment and learns how to act by observing rewards

Learning from demonstration
  An agent that learns how to act in an environment by observing demonstrations from an expert

Page 39

ISSUES IN PRACTICE

The distinctions between learning algorithms diminish when you have a lot of data

The web has made it much easier to gather large-scale datasets than in the early days of ML.

Understanding data with many more attributes than examples is still a major challenge! Do humans just have really great priors?

Page 40

NEXT LECTURES

Temporal sequence models (R&N 15)
Decision-theoretic planning
Reinforcement learning
Applications of AI
