Modern Trends in Data Mining
Trevor Hastie, Stanford University
November 2012
"How IBM Built Watson, Its Jeopardy-Playing Supercomputer" by Dawn Kawamoto, DailyFinance, 02/08/2011

Learning from its mistakes: According to David Ferrucci (PI of Watson DeepQA technology for IBM Research), Watson's software is wired for more than handling natural language processing.

"Machine learning allows the computer to become smarter as it tries to answer questions - and to learn as it gets them right or wrong."
"For Today's Graduate, Just One Word: Statistics" by Steve Lohr, New York Times, published August 5, 2009

MOUNTAIN VIEW, Calif. - At Harvard, Carrie Grimes majored in anthropology and archaeology and ventured to places like Honduras, where she studied Mayan settlement patterns by mapping where artifacts were found. But she was drawn to what she calls "all the computer and math stuff" that was part of the job.

"People think of field archaeology as Indiana Jones, but much of what you really do is data analysis," she said.

Now Ms. Grimes does a different kind of digging. She works at Google, where she uses statistical analysis of mounds of data to come up with ways to improve its search engine.

Ms. Grimes is an Internet-age statistician, one of many who are changing the image of the profession as a place for dronish number nerds. They are finding themselves increasingly in demand - and even cool.

"I keep saying that the sexy job in the next 10 years will be statisticians," said Hal Varian, chief economist at Google. "And I'm not kidding."

[Photo: Thor Swift for The New York Times. Carrie Grimes, senior staff engineer at Google, uses statistical analysis of data to help improve the company's search engine.]
Data Mining for Prediction

• We have a collection of data pertaining to our business, industry, production process, monitoring device, etc.
• Often the goals of data mining are vague, such as "look for patterns in the data" - not too helpful.
• In many cases a response or outcome can be identified as a good and useful target for prediction.
• Accurate prediction of this target can help the company make better decisions and save a lot of money.
• Data mining is particularly good at building such prediction models - an area known as supervised learning.
Example: Credit Risk Assessment

• Customers apply to a bank for a loan or credit card.
• They supply the bank with information such as age, income, employment history, education, bank accounts, existing debts, etc.
• The bank does further background checks to establish the credit history of the customer.
• Based on this information, the bank must decide whether to make the loan or issue the credit card.
Example continued: Credit Risk Assessment

• The bank has a large database of existing and past customers. Some of these defaulted on loans; others frequently made late payments, etc. An outcome variable Status is defined, taking value "good" or "default". Each of the past customers is scored with a value for Status.
• Background information is available for all the past customers.
• Using supervised learning techniques, we can build a risk prediction model that takes as input the background information, and outputs a risk estimate (probability of default) for a prospective customer.

The California-based company Fair Isaac uses a generalized additive model + boosting methods in the construction of their credit risk scores.
Example: Churn Prediction

• When a customer switches to another provider, we call this "churn". Examples are cell-phone service and credit card providers.
• Based on customer information and usage patterns, we can predict
  - the probability of churn;
  - the retention probability (as a function of time).
• This information can be used to evaluate
  - prospective customers, to decide on acceptance;
  - present customers, to decide on an intervention strategy.

Risk assessment and survival models are used by US cell-phone companies such as AT&T to manage churn.
Netflix Prize leaderboard (Grand Prize - RMSE <= 0.8563)

Rank  Team Name                              Best Score  % Improvement  Last Submit Time
 1    The Ensemble                           0.8553      10.10          2009-07-26 18:38:22
 2    BellKor's Pragmatic Chaos              0.8554      10.09          2009-07-26 18:18:28
 3    Grand Prize Team                       0.8571       9.91          2009-07-24 13:07:49
 4    Opera Solutions and Vandelay United    0.8573       9.89          2009-07-25 20:05:52
 5    Vandelay Industries!                   0.8579       9.83          2009-07-26 02:49:53
 6    PragmaticTheory                        0.8582       9.80          2009-07-12 15:09:53
 7    BellKor in BigChaos                    0.8590       9.71          2009-07-26 12:57:25
 8    Dace                                   0.8603       9.58          2009-07-24 17:18:43
 9    Opera Solutions                        0.8611       9.49          2009-07-26 18:02:08
10    BellKor                                0.8612       9.48          2009-07-26 17:19:11
Grand Prize: one million dollars, if you beat Netflix's RMSE by 10%.
Competition ends Sep 21, 2009, after 3 years: two leaders, 41,305 teams. Ultimate winner is BellKor's Pragmatic Chaos.
Netflix Challenge

Netflix users rate movies from 1-5. Based on a history of ratings, predict the rating a viewer will give to a new movie.
• Training data: sparse 400K (users) by 18K (movies) rating matrix, with 98.7% missing. About 100M movie/rater pairs.
• Quiz set of about 1.4M movie/viewer pairs, for which predictions of ratings are required (Netflix held them back).
• Probe set of about 1.4 million movie/rater pairs, similar in composition to the quiz set, for which the ratings are known.
• Both winning teams used ensemble methods to achieve their results.
The Supervised Learning Problem

Starting point:
• Outcome measurement Y (also called dependent variable, response, target, output).
• Vector of p predictor measurements X (also called inputs, regressors, covariates, features, independent variables).
• In the regression problem, Y is quantitative (e.g. price, blood pressure, rating).
• In classification, Y takes values in a finite, unordered set (default yes/no, churn/retain, spam/email).
• We have training data (x1, y1), ..., (xN, yN). These are observations (examples, instances) of these measurements.
Objectives

On the basis of the training data we would like to:
• Accurately predict unseen test cases, for which we know X but do not know Y.
• In the case of classification, predict the probability of an outcome.
• Understand which inputs affect the outcome, and how.
• Assess the quality of our predictions and inferences.
More Examples

• Predict whether someone will have a heart attack on the basis of demographic, diet, and clinical measurements.
• Determine whether an incoming email is spam, based on frequencies of key words in the message.
• Identify the numbers in a handwritten zip code, from a digitized image.
• Estimate the probability that an insurance claim is fraudulent, based on client demographics, client history, and the amount and nature of the claim.
• Predict the type of cancer in a tissue sample using DNA expression values.
Email or Spam?

• Data: from 4601 emails sent to an individual (named George, at HP Labs, before 2000). Each is labeled as "spam" or "email".
• Goal: build a customized spam filter.
• Input features: relative frequencies of 57 of the most commonly occurring words and punctuation marks in these email messages.

          george   you    hp   free     !    edu   remove
  spam      0.00   2.26  0.02  0.52   0.51  0.01    0.28
  email     1.27   1.27  0.90  0.07   0.11  0.29    0.01

Average percentage of words or characters in an email message equal to the indicated word or character. We have chosen the words and characters showing the largest difference between spam and email.
Handwritten Digit Identification

A sample of segmented and normalized handwritten digits, scanned from zip codes on envelopes. Each image has 16 x 16 pixels of grayscale values ranging from 0 to 255.

[Figure: grid of sample handwritten digit images.]
Microarray Cancer Data

Expression matrix of 6830 genes (rows) and 64 samples (columns) for the human tumor data (100 randomly chosen rows shown). The display is a heat map, ranging from bright green (under expressed) to bright red (over expressed).

Goal: predict cancer class based on expression values.

[Figure: heat map of the expression matrix.]
Shameless self-promotion

All of the topics in this lecture are covered in the 2009 second edition of our 2001 book, "The Elements of Statistical Learning: Data Mining, Inference, and Prediction". The book blends traditional linear methods with contemporary nonparametric methods, and many between the two.
Ideal Bayes Predictions

• For a quantitative output Y, the best prediction we can make when the input vector X = x is
    f(x) = Ave(Y | X = x).
  - This is the conditional expectation - deliver the Y-average of all those examples having X = x.
  - This is best if we measure errors by average squared error, Ave(Y - f(X))^2.
• For a qualitative output Y taking values 1, 2, ..., M, compute
  - Pr(Y = m | X = x) for each value of m. This is the conditional probability of class m at X = x.
  - Classify C(x) = j if Pr(Y = j | X = x) is the largest - the majority vote classifier.
Implementation with Training Data

The ideal prediction formulas suggest a data implementation. To predict at X = x, gather all the training pairs (xi, yi) having xi = x; then:
• For regression, use the mean of their yi to estimate f(x) = Ave(Y | X = x).
• For classification, compute the relative proportions of each class among these yi to estimate Pr(Y = m | X = x). Classify the new observation by majority vote.

Problem: in the training data there may be NO observations having xi = x.
Nearest Neighbor Averaging

• Estimate Ave(Y | X = x) by averaging those yi whose xi are in a neighborhood of x.
• E.g. define the neighborhood to be the set of k observations having values xi closest to x in Euclidean distance ||xi - x||.
• For classification, compute the class proportions among these k closest points.
• Nearest neighbor methods often outperform all other methods - about one in three times - especially for classification. (A minimal code sketch follows below.)
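As a concrete illustration, here is a minimal sketch of k-nearest-neighbor averaging with scikit-learn; the simulated data and the choice k = 15 are purely illustrative, not from the talk.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic regression data: y is a smooth function of x plus noise.
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(4 * X[:, 0]) + rng.normal(scale=0.3, size=200)

# Estimate Ave(Y | X = x) by averaging the y's of the k nearest training points.
knn = KNeighborsRegressor(n_neighbors=15).fit(X, y)
print(knn.predict([[0.5]]))          # estimated conditional mean at x = 0.5

# For classification: class proportions among the k nearest neighbors.
y_class = (y > 0).astype(int)
knn_c = KNeighborsClassifier(n_neighbors=15).fit(X, y_class)
print(knn_c.predict_proba([[0.5]]))  # estimated Pr(Y = m | X = 0.5)
```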
Kernel Smoothing*

• Smooth version of nearest-neighbor averaging.
• At each point x, the function f(x) = Ave(Y | X = x) is estimated by the weighted average of the y's.
• The weights die down smoothly with distance from the target point x (indicated in the figure by the shaded orange region).

[Figure: kernel smooth of simulated data on (0, 1), with the weighting region around a target point highlighted.]

*Not to be confused with kernel methods, as in SVMs.
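A minimal Nadaraya-Watson-style sketch of kernel smoothing in plain NumPy; the Gaussian kernel and the bandwidth value are arbitrary choices for illustration.

```python
import numpy as np

def kernel_smooth(x0, x, y, bandwidth=0.1):
    """Weighted average of the y's, with weights dying off smoothly with |x - x0|."""
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)   # Gaussian kernel weights
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 300)
y = np.sin(4 * x) + rng.normal(scale=0.3, size=300)

# Estimate f(x) = Ave(Y | X = x) on a grid of target points.
grid = np.linspace(0, 1, 11)
fhat = [kernel_smooth(x0, x, y, bandwidth=0.1) for x0 in grid]
print(np.round(fhat, 2))
```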
Structured Models

• When we have a lot of predictor variables, NN methods often fail because of the curse of dimensionality: it is hard to find nearby points in high dimensions.
• Near-neighbor models offer little interpretation.
• We can overcome these problems by assuming some structure for the regression function Ave(Y | X = x) or the probability function Pr(Y = k | X = x). Typical structural assumptions:
  - Linear models
  - Additive models
  - Low-order interaction models
  - Restrict attention to a subset of predictors
  - and many more.
Linear Models

• Linear models assume
    f(x) = β0 + β1 x1 + β2 x2 + ... + βp xp.
• For two-class classification problems, linear logistic regression has the form
    Pr(Y = +1 | X = x) = exp(β0 + β1 x1 + ... + βp xp) / [1 + exp(β0 + β1 x1 + ... + βp xp)].
• This translates to
    log[ Pr(Y = +1 | X = x) / Pr(Y = -1 | X = x) ] = β0 + β1 x1 + β2 x2 + ... + βp xp.

Chapters 3 and 4 of the book deal with linear models.
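A hedged sketch of fitting a linear logistic regression with scikit-learn; the three word-frequency features and the simulated coefficients are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Hypothetical data: 500 emails, 3 word-frequency features.
X = rng.exponential(scale=1.0, size=(500, 3))
logit = X @ np.array([1.5, 1.0, -2.0]) - 0.5        # true linear log-odds (simulation only)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))       # 1 = spam, 0 = email

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.intercept_, model.coef_)                 # estimates of beta_0 and beta_1..beta_p
print(model.predict_proba(X[:3])[:, 1])              # Pr(Y = +1 | X = x) for three emails
```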
Linear Model Complexity Control

With many inputs, linear regression can overfit the training data, leading to poor predictions on future data. Two general remedies are available:
• Variable selection: reduce the number of inputs in the model. For example, stepwise selection or best-subset selection.
• Regularization: leave all the variables in the model, but when fitting the model, restrict their coefficients.
  - Ridge: Σ_{j=1}^{p} βj^2 <= s. All the coefficients are non-zero, but are shrunk toward zero (and each other).
  - Lasso: Σ_{j=1}^{p} |βj| <= s. Some coefficients drop out of the model; others are shrunk toward zero.
Best Subset Selection

[Figure: residual sum-of-squares versus subset size s, with one point per candidate model.]

Each point corresponds to a linear model involving a subset of the variables, and shows the residual sum-of-squares on the training data. The red models are the candidates, and we need to choose s.
Ridge and Lasso Coefficient Paths

[Figure: coefficient profiles for the predictors lcavol, lweight, svi, pgg45, lbph, gleason, age and lcp, plotted against the shrinkage factor s, for ridge (left) and lasso (right).]

Both ridge and lasso coefficient paths can be computed very efficiently for all values of s.
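For instance, with scikit-learn one can trace out a lasso coefficient path over a grid of penalties and fit a ridge model; this is a rough sketch on simulated data, not the data shown above, and the penalty values are arbitrary.

```python
import numpy as np
from sklearn.linear_model import lasso_path, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 8))
beta = np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0])   # sparse true coefficients
y = X @ beta + rng.normal(size=100)

# Lasso: the entire coefficient path, computed over a grid of penalties.
alphas, coefs, _ = lasso_path(X, y, n_alphas=50)
print(coefs.shape)            # (n_features, n_alphas): one column of coefficients per penalty

# Ridge: all coefficients kept, but shrunk toward zero.
ridge = Ridge(alpha=10.0).fit(X, y)
print(np.round(ridge.coef_, 2))
```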
Overfitting and Model Assessment

• In all the cases above, the larger s, the better we will fit the training data. Often we overfit the training data.
• Overfit models can perform poorly on test data (high variance).
• Underfit models can perform poorly on test data (high bias).

Model assessment aims to:
1. Choose a value for the tuning parameter s for a technique.
2. Estimate the future prediction ability of the chosen model.

• For both of these purposes, the best approach is to evaluate the procedure on an independent test set, if one is available.
• If possible, one should use different test data for (1) and (2) above: a validation set for (1), and a test set for (2). (A small sketch of such a split follows below.)
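A minimal sketch of this two-stage split using scikit-learn; the made-up dataset and the 60/20/20 proportions are just illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 20))
y = 2 * X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(size=500)

# 60% train, 20% validation (for choosing the tuning parameter), 20% test (for final assessment).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# (1) Choose the tuning parameter on the validation set.
alphas = [0.01, 0.1, 1.0, 10.0]
val_err = [mean_squared_error(y_val, Lasso(alpha=a).fit(X_train, y_train).predict(X_val))
           for a in alphas]
best = alphas[int(np.argmin(val_err))]

# (2) Assess the chosen model on the untouched test set.
final = Lasso(alpha=best).fit(X_train, y_train)
print(best, mean_squared_error(y_test, final.predict(X_test)))
```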
K-Fold Cross-Validation

Primarily a method for estimating a tuning parameter s when data are scarce; we illustrate for the regularized linear regression models.
• Divide the data into K roughly equal parts (K = 5 or 10), e.g.
    1: Train | 2: Train | 3: Validation | 4: Train | 5: Train
• For each k = 1, 2, ..., K, fit the model with parameter s to the other K - 1 parts, giving β̂_{-k}(s), and compute its error in predicting the kth part:
    E_k(s) = Σ_{i in kth part} (y_i - x_i^T β̂_{-k}(s))^2.
• This gives the overall cross-validation error
    CV(s) = (1/K) Σ_{k=1}^{K} E_k(s).
• Do this for many values of s, and choose the value of s that makes CV(s) smallest.
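As a sketch, cross-validation over a grid of penalties might look like this with scikit-learn's LassoCV, which runs the K-fold loop internally; the data are simulated and the grid size is arbitrary.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 30))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(size=300)

# 10-fold CV: for each candidate penalty, fit on 9 parts and compute the error on the held-out part.
model = LassoCV(cv=10, n_alphas=100).fit(X, y)
print(model.alpha_)              # value of the tuning parameter minimizing CV(s)
print(model.mse_path_.shape)     # (n_alphas, n_folds): per-fold validation errors
```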
Cross-Validation Error Curve

• 10-fold CV error curve, using the lasso on some diabetes data (64 inputs, 442 samples).
• Thick curve is the CV error curve; the shaded region indicates the standard error of the CV estimate.
• The curve shows the effect of overfitting - errors start to increase above s = 0.2.
• This shows a trade-off between bias and variance.

[Figure: CV error versus the tuning parameter s.]
Modern Structured Models in Data Mining

The following is a list of some of the more important and currently popular prediction models in data mining:
• Linear Models (often heavily regularized)
• Generalized Additive Models
• Neural Networks
• Hierarchical Bayesian Prediction Models
• Trees, Random Forests and Boosted Tree Models - hot!
• Support Vector and Kernel Machines - hot!
Generalized Additive Models

Allow a compromise between linear models and more flexible local models (kernel estimates) when there are many inputs X = (X1, X2, ..., Xp).
• Additive models for regression:
    Ave(Y | X = x) = α + f1(x1) + f2(x2) + ... + fp(xp).
• Additive models for classification:
    log[ Pr(Y = +1 | X = x) / Pr(Y = -1 | X = x) ] = α + f1(x1) + f2(x2) + ... + fp(xp).
Each of the functions fj(xj) (one for each input variable) can be a smooth function (e.g. kernel estimate), linear, or omitted. (A rough code sketch follows below.)
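Dedicated GAM software exists (e.g. the gam and mgcv packages in R). As a rough additive-in-spirit sketch in Python, one can expand each input in a spline basis and fit a logistic regression to the expanded features; the data, knot count, and other settings below are assumptions for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 4))
logit = np.sin(X[:, 0]) + X[:, 1] ** 2 - 1 + 0.5 * X[:, 2]   # nonlinear additive truth (simulation only)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Each column gets its own smooth f_j(x_j), represented by a spline basis;
# the logistic regression then adds the fitted functions together.
gam_like = make_pipeline(
    SplineTransformer(n_knots=6, degree=3),
    LogisticRegression(max_iter=1000),
)
gam_like.fit(X, y)
print(gam_like.predict_proba(X[:3])[:, 1])    # Pr(Y = +1 | X = x)
```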
GAM fit to SPAM data

[Figure: fitted functions f(x_j) for the most important predictors, including our, over, remove, internet, free, business, hp, hpl, george, 1999, re, edu, ch!, ch$, CAPMAX and CAPTOT.]

• Shown are the most important predictors.
• Many show nonlinear behavior.
• Overall error rate: 5.3%.
• Functions can be re-parametrized (e.g. log terms, quadratic, step-functions) and then fit by a linear model.
• Produces a prediction per email: Pr(SPAM | X = x).
Neural Networks

[Figure: single (hidden) layer perceptron, with an input layer, a hidden layer, and an output layer.]

• Like a complex regression or logistic regression model - more flexible, but less interpretable: a black box.
• Hidden units Z1, Z2, ..., Zm (4 here): Zj = σ(α0j + αj^T X), where σ(z) = e^z / (1 + e^z) is the logistic sigmoid activation function.
• Output is a linear regression or logistic regression model in the Zj.
• Complexity is controlled by m, ridge regularization, and early stopping of the backpropagation algorithm for fitting the neural network.
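A minimal single-hidden-layer sketch with scikit-learn's MLPClassifier; the hyperparameters are illustrative (alpha is the ridge-style penalty, and early_stopping halts training based on a held-out split).

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# One hidden layer with m = 4 units, logistic sigmoid activations,
# ridge regularization (alpha) and early stopping to control complexity.
net = MLPClassifier(hidden_layer_sizes=(4,), activation="logistic",
                    alpha=1e-3, early_stopping=True, max_iter=2000,
                    random_state=0)
net.fit(X, y)
print(net.predict_proba(X[:3]))
```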
Support Vector Machines

• Maximize the gap (margin) between the two classes on the training data.
• If not separable:
  - enlarge the feature space via basis expansions (e.g. polynomials);
  - use a "soft margin" (allow limited overlap).
• The solution depends on a small number of points ("support vectors") - 3 here.

[Figure: two-class data with a linear decision boundary and the margin on either side.]
Support Vector Machines (continued)

• Maximize the soft margin, subject to a bound on the total overlap: Σi ξi <= B.
• Even if the data are separable, a wider soft margin is more stable.
• Primarily used for classification problems. Builds a linear classifier f(X) = β0 + β^T X, with decision boundary x^T β + β0 = 0. If f(X) > 0 classify as +1; else if f(X) < 0 classify as -1.
• Generalizations use kernels: f(X) = α0 + Σi αi K(X, xi).
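A brief sketch of a soft-margin SVM and a kernelized version in scikit-learn; the cost parameter C plays a role analogous to the overlap budget above (smaller C allows more overlap), and all values here are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Linear soft-margin classifier.
linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
print(linear_svm.n_support_)          # number of support vectors per class

# Kernel generalization: f(X) = a0 + sum_i a_i K(X, x_i), with a radial basis kernel.
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(rbf_svm.predict(X[:5]))
```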
Classification and Regression Trees

✓ Can handle huge datasets.
✓ Can handle mixed predictors - quantitative and qualitative.
✓ Easily ignore redundant variables.
✓ Handle missing data elegantly.
✓ Small trees are easy to interpret.
✗ Large trees are hard to interpret.
✗ Often prediction performance is poor.
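A small illustrative tree in scikit-learn; limiting the depth keeps the tree easy to interpret, and the dataset and depth are hypothetical choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
names = [str(f) for f in load_breast_cancer().feature_names]

# A small tree (depth 3) trades some accuracy for interpretability.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=names))   # text rendering of the splits
```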
Tree fit to SPAM data

[Figure: classification tree for the SPAM data, with splits on features such as ch$, remove, hp, george, CAPAVE, 1999, free, business, receive, edu and our; terminal nodes are labeled "email" or "spam".]
Ensemble Methods and Boosting

Classification trees can be simple, but often produce noisy (bushy) or weak (stunted) classifiers.
• Bagging (Breiman, 1996): Fit many large trees to bootstrap-resampled versions of the training data, and classify by majority vote.
• Random Forests (Breiman, 1999): Improvements over bagging.
• Boosting (Freund & Schapire, 1996): Fit many smallish trees to reweighted versions of the training data. Classify by weighted majority vote.

In general: Boosting ≻ Random Forests ≻ Bagging ≻ Single Tree. (A brief comparison sketch follows below.)
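A hedged sketch comparing these ensembles on a synthetic problem with scikit-learn; the numbers of trees and other settings are arbitrary, and results will vary with the data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "bagging": BaggingClassifier(n_estimators=200, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "boosting": GradientBoostingClassifier(n_estimators=200, max_depth=2, random_state=0),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    print(name, round(1 - accuracy_score(y_te, m.predict(X_te)), 3))   # test error
```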
Spam Data

[Figure: test error versus number of trees (up to 2500) for Bagging, Random Forest, and Gradient Boosting (5-node trees).]
Modern Gradient Boosting (Friedman, 2001)

• Fits an additive model
    f(X) = T1(X) + T2(X) + ... + Tm(X),
  where each of the Tj(X) is a tree in X.
• Can be used for regression, logistic regression, and more. For example, gradient boosting for regression works by repeatedly fitting trees to the residuals:
  1. Fit a small tree T1(X) to Y.
  2. Fit a small tree T2(X) to the residual Y - T1(X).
  3. Fit a small tree T3(X) to the residual Y - T1(X) - T2(X), and so on.
• m is the tuning parameter, which must be chosen using a validation set (m too big will overfit).
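The residual-fitting recipe above can be written out directly; here is a minimal sketch for squared-error regression using small scikit-learn trees. The data, m, and the tree depth are illustrative, and real implementations also shrink each tree by a learning rate.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=500)

m, trees = 50, []
residual = y.copy()
for _ in range(m):
    t = DecisionTreeRegressor(max_depth=2).fit(X, residual)  # small tree T_j fit to the current residual
    trees.append(t)
    residual = residual - t.predict(X)                        # update residual: Y - T_1(X) - ... - T_j(X)

# Prediction: the sum of the fitted trees.
f_hat = sum(t.predict(X) for t in trees)
print(np.mean((y - f_hat) ** 2))   # training error shrinks as m grows (m too big will overfit)
```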
Software

• R is free software for statistical modeling, graphics, and a general programming environment. Works on PCs, Macs, and Linux/Unix platforms. All the models here can be fit in R. R grew from its predecessor S-PLUS, and both implement the S language developed at Bell Labs in the 80s.
• SAS and their Enterprise Miner can fit most of the models mentioned in this talk, with good data-handling capabilities and high-end user interfaces.
• Salford Systems has commercial versions of trees, random forests, and gradient boosting.
• SVM software is all over, but beware of patent infringements if put to commercial use.
• There are many free versions of neural network software; Google will find them.
Summary

• Many amazing tools are available, from the simplest linear models to complex boosting algorithms.
• Avoid unwarranted complexity: if linear models perform well, they are easier to manage than more complex models.
• Boosting provides a good benchmark for what performance might be achievable.
• A good software environment is essential; if R can manage your problem size, it's a great environment.