Receiver Operating Characteristic (ROC) Curves
Evaluating a classifier and predictive performance
[Figure: ROC curve. x-axis: 1-Specificity (i.e. % of true negatives incorrectly declared positive), the false positive rate; y-axis: Sensitivity (% of true positives declared positive), the true positive rate.]
Classification (supervised learning)
• In supervised learning, we are interested in building a model to predict the class (or outcome) of a new observation based on observable predictors using the train-set/test-set framework.
– Which students are not likely to return for their second year of college?
– Which animals admitted to a veterinary hospital are likely to survive (or not survive)?
– Which bank customers are likely to default on a loan?
Some classification procedures:
• Logistic regression (0-1 response)
– Choose variables, fit model, calculate p̂ (i.e. ŷ)
– p̂ ≥ 0.5 predicted class 1, p̂ < 0.5 predicted class 0
• Classification tree
– Use ‘greedy’ algorithm to choose variables and create decision tree (prediction at tree leaves)
• Linear Discriminant Analysis (LDA)
– Built on multivariate normal (spheroids)
– Classify new observation to nearest centroid
Assumptions:
• Logistic regression
– Predictors are related to response with a sigmoidal mean structure
– Conditional Bernoulli given the mean
• Classification tree (maybe?)
– The data can be described by features
– There exists some tree that isn’t too big that can predict reasonably well
• Linear discriminant analysis
– Data follow multivariate Gaussian
– Equal covariance matrices
Pros/Cons of these classifiers:
• Logistic regression
– Pros: Interpretation of covariates. More robust to deviations from assumptions than LDA.
– Cons: Computationally complex iterative algorithm that may not converge. Depends on sigmoidal mean structure and conditional binomial distributions.
• Trees
– Pros: Fewer assumptions of data, more flexible algorithm, easy interpretation of certain characteristics.
– Cons: Instability (random forests can help), lack of smoothness, uses a greedy algorithm.
• LDA
– Pros: Dimensionality reduction; estimation under assumptions uses maximum likelihood.
– Cons: Not flexible when the data deviate from the multivariate normal assumptions.
NOTE: In simulation studies, we have found that each of these classifiers performs better than the others when the data were simulated under the given model assumptions.
What makes a good classifier?
• Your predictions are correct.
– True positives were correctly classified
– True negatives were correctly classified
• Example: Disease diagnostic test
– Patients are given the diagnostic test
– Test gives those with the disease a ‘positive’: high sensitivity (correctly classifying true positives)
– Test gives those without the disease a ‘negative’: high specificity (correctly classifying true negatives)
Misclassification rate
• Misclassification
– The observation belongs to one class, but the model classifies it as a member of a different class.
• A classifier that makes no errors would be perfect
– Not likely to find such a classifier in the real world
– Noise
– Other relevant variables not in the available dataset
Misclassification rate
• We can present the accuracy of the results with a confusion matrix.
            Predict as 1   Predict as 0
Actual 1         20               5
Actual 0         18             957
• Overall error rate = (18+5)/1000 = 2.3%
• NOTE: The prediction value is often calculated through leave-one-out cross-validation in logistic regression.
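As a minimal sketch in R, the overall error rate can be read off the confusion matrix by counting the off-diagonal cells. The object names `conf.mat`, `n.wrong` and `error.rate` are illustrative assumptions, not part of the original analysis.

```r
## Confusion matrix from the slide: rows = actual class, columns = predicted class.
conf.mat = matrix(c(20,   5,
                    18, 957),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(actual = c("1", "0"),
                                  predicted = c("1", "0")))

## Misclassified cases are the off-diagonal cells (5 + 18).
n.wrong = sum(conf.mat) - sum(diag(conf.mat))

## Overall error rate = misclassified / total.
error.rate = n.wrong / sum(conf.mat)
error.rate   # 0.023, i.e. 2.3%
```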
Beating the naïve classifier
• In this dataset, the 0’s were much more prevalent than the 1’s.
• What if we just classified ALL observations as 0? How well do we do?
• Overall error rate = (25)/1000 = 2.5% (all 20+5 = 25 actual 1’s are misclassified, and no actual 0’s are)
            Predict as 1   Predict as 0
Actual 1         20               5
Actual 0         18             957
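The comparison with the naïve rule can be sketched in a few lines of R. This uses only the counts from the confusion matrix; the variable names are illustrative assumptions.

```r
## Class totals taken from the confusion matrix.
n.actual.1 = 20 + 5      # actual 1's
n.actual.0 = 18 + 957    # actual 0's
n.total    = n.actual.1 + n.actual.0

## Naive rule: predict the majority class (0) for every case,
## so every actual 1 is wrong and every actual 0 is right.
naive.error = n.actual.1 / n.total   # 0.025

## Fitted classifier: the two off-diagonal cells are wrong.
model.error = (18 + 5) / n.total     # 0.023

## How much the fitted classifier improves on the naive rule.
improvement = naive.error - model.error
```

With classes this unbalanced, the fitted classifier beats the naïve rule by only 0.2 percentage points, which is why the overall error rate alone is not a satisfying summary.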
Distinguishing between the two types of mistakes to be made
• Sensitivity = (# of true positives declared ‘positive’) / (# of true positives)
Is your disease diagnostic tool getting the ones you want?
• Specificity = (# of true negatives not declared ‘positive’) / (# of true negatives)
Is your disease diagnostic tool NOT getting the ones you DON’T want?
• For a given classifier, we would like both to be high.
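As a worked example, the two definitions can be applied to the 1000-case confusion matrix from the misclassification example (20, 5, 18, 957). This is a sketch; the names `TP`, `FN`, `FP`, `TN` are the usual shorthand and are assumptions, not notation from the slides.

```r
## Cells of the confusion matrix, with class 1 as 'positive'.
TP = 20    # actual 1, predicted 1
FN = 5     # actual 1, predicted 0
FP = 18    # actual 0, predicted 1
TN = 957   # actual 0, predicted 0

## Sensitivity: share of true positives that were declared positive.
sensitivity = TP / (TP + FN)        # 20/25 = 0.8

## Specificity: share of true negatives that were not declared positive.
specificity = TN / (TN + FP)        # 957/975, about 0.9815

## The ROC x-axis is 1 - specificity, the false positive rate.
false.pos.rate = 1 - specificity
```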
Receiver Operating Characteristic (ROC) curve
• To create an ROC curve, we first order the predicted probabilities from highest to lowest.
• Highest probabilities are predicted to have the disease (we’ll want to classify those as ‘disease’).
• Lowest probabilities are predicted to not have the disease (we’ll want to classify those as ‘not disease’).
• Probability = 0.5? Flip of the coin.
ROC curve
• To create an ROC curve, we start at the highest predicted probability and work our way down the list (i.e. highest prob to lowest prob). We stop at each position C (i.e. potential cutoff), use it as the classifier (i.e. C or above = 1, below C = 0), and ask “How well does this classifier do? Sensitivity? Specificity?”
preds
[1,] 0.9299   ← Most likely to have disease
[2,] 0.9033
[3,] 0.8918
[4,] 0.8851
[5,] 0.8687
..
For example, if we classify cases with C = 0.8918 and higher as ‘1’ and probabilities less than C = 0.8918 as ‘0’, how well do we do?
– Classified as ‘1’: rows [1,] to [3,]
– Classified as ‘0’: rows [4,] and below
ROC curve
• NOTE: If we start at the top of the list and we don’t go very far down the list for C, we’ll probably get mostly true positives (i.e. low false positive rate, good), but there are still lots of true positives that we missed (low true positive rate, bad).
The ROC curve shows us what happens to sensitivity and specificity as we move our C (threshold for classifier) down the list.
• C at high predicted probability (very few cases declared ‘disease’). Low false positive rate (good). Low true positive rate (bad).
• C at low predicted probability (almost all cases declared ‘disease’). High false positive rate (bad). High true positive rate (good).
[Figure: ROC curve. x-axis: False positive rate; y-axis: True positive rate. The start of the curve, near (0,0), is marked as the sensitivity and specificity info from the “first” C threshold; the end of the curve, near (1,1), as the info from the “last” C threshold.]
Each move down the list of probabilities coincides with either a move up on the ROC curve (correct choice as ‘disease’) or to the right (incorrect choice when declared ‘disease’). Thus, we see a ‘stairstep’ phenomenon in the ROC curve. For small datasets, this is very apparent.
We essentially start at the point (0,0). If the first case (highest probability) is correctly classified when declared ‘disease’, we move vertically up (true positive). If the first case (highest probability) is incorrectly classified as ‘disease’, we move to the right (false positive).
As we start moving down the list, we want to move up on the ROC curve, not to the right.
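The walk down the list can be sketched in base R on a small made-up dataset. The eight probabilities and labels below are hypothetical and chosen only to make the stairsteps visible; they are not from the Provost’s Office data.

```r
## Hypothetical toy data: predicted probabilities and true 0-1 labels.
toy.preds  = c(0.93, 0.90, 0.89, 0.70, 0.55, 0.40, 0.30, 0.10)
toy.labels = c(1,    1,    0,    1,    0,    1,    0,    0)

## Order the cases from highest to lowest predicted probability.
ord = order(toy.preds, decreasing = TRUE)
lab = toy.labels[ord]

## Start at (0,0). Each true 1 moves the curve up by 1/(# positives);
## each true 0 moves it right by 1/(# negatives).
tpr = c(0, cumsum(lab) / sum(lab))              # sensitivity
fpr = c(0, cumsum(1 - lab) / sum(1 - lab))      # 1 - specificity

## Draw the stairstep curve and the 'choosing at random' diagonal.
plot(fpr, tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curve (toy data)")
abline(0, 1, lty = 2)
```

Because no two toy probabilities are tied, every step changes exactly one of the two coordinates, and the curve ends at (1,1) once every case has been declared ‘disease’.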
See animation of ROC curve creation:
http://homepage.stat.uiowa.edu/~rdecook/stat6220/ROC_animated1.html
What ROC shape says it’s a good classifier?
• We want it to go up vertically very quickly
– i.e. As we move down the list of predicted probabilities, we’re getting all ‘diseased’ cases and no ‘non-diseased’.
• We know in the end (when everyone is classified as a ‘disease’ case) all the negatives will be misclassified and all the positives will be correctly classified. So, we’ll always end at (1,1).
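Comparing classifiers with ROC analysis
[Figure: several ROC curves on one plot, labeled “Best of these” and “Worst of these”, with the diagonal labeled “Choosing at random”.]
AUC or area under the curve compares the classifiers too.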
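One way to see what the AUC measures: it equals the proportion of (positive, negative) pairs in which the positive case receives the higher predicted probability, with ties counted as one half. The sketch below checks this on hypothetical toy data; the names `toy.preds`, `toy.labels` and `auc.pairs` are illustrative assumptions.

```r
## Hypothetical toy data: predicted probabilities and true 0-1 labels.
toy.preds  = c(0.93, 0.90, 0.89, 0.70, 0.55, 0.40, 0.30, 0.10)
toy.labels = c(1,    1,    0,    1,    0,    1,    0,    0)

pos = toy.preds[toy.labels == 1]   # scores of the true positives
neg = toy.preds[toy.labels == 0]   # scores of the true negatives

## Compare every positive with every negative (4 x 4 = 16 pairs).
wins = outer(pos, neg, ">")
ties = outer(pos, neg, "==")

## AUC = share of pairs ranked correctly (ties count one half).
auc.pairs = mean(wins + 0.5 * ties)
auc.pairs   # 0.8125 (13 of the 16 pairs are ranked correctly)
```

A classifier that chooses at random would rank about half of the pairs correctly, which matches the area of 0.5 under the diagonal.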
ROC curve: Provost’s Office Client
• Example: Predict which students will not return for their second year of college.
– We use the predicted probabilities from the logistic regression for classification.
– If the probability of a case being in class 1 (not retained) is equal to or greater than 0.5, that case is classified as a 1.
– Any case with an estimated probability of less than 0.5 would be classified as a 0 (retained).
Provost’s Office Example
• Logistic regression variables
– Stafford loan
– Live on campus
– First generation college student
– RAI
– Selective program of study
– High School GPA
Provost’s Office Example
• The predicted probability of not being retained is used to classify each case.
• Students with a very high probability are expected to not return.
• Students with a very low probability are expected to return in their 2nd year.
Provost’s Office Example
[Figure: ROC from logistic regression classifier. x-axis: False positive rate; y-axis: True positive rate.]
The ROC curve from the given classifier: logistic regression predicted probabilities… meh
*Plot generated from ROCR package in R.
Provost’s Office Example
The blue point represents the classifier based on probability = 0.5 cutoff.
• Sensitivity: 41/859 = 4.77%
• Specificity: 4762/4799 = 99.23%
• Overall misclassification: 855/5658 = 0.1511
[Figure: ROC from logistic regression classifier, with the 0.5-cutoff classifier marked as a blue point. x-axis: False positive rate; y-axis: True positive rate.]
*Point calculated by hand and added.
Provost’s Office Example
We can also look at accuracy (1 - misclassification rate) vs. the cutoff. The best cutoff for highest accuracy is 0.4623. (Pretty close to our 0.5 threshold)
• Sensitivity: 66/859 = 7.68%
• Specificity: 4750/4799 = 98.98%
• Overall misclassification: 842/5658 = 0.1488
[Figure: Accuracy plotted against cutoff. x-axis: Cutoff; y-axis: Accuracy.]
*Plot generated from ROCR package in R.
Provost’s Office Example
The green point represents the optimal classifier based on equal importance of sensitivity and specificity (as max of “sensitivity + specificity”), which is the prob = 0.191 cutoff.
• Sensitivity: 55.2%
• Specificity: 77.8%
• Overall misclassification: 1456/5658 = 0.2573
[Figure: ROC from logistic regression classifier, with the optimal classifier marked as a green point. x-axis: False positive rate; y-axis: True positive rate.]
*Point determined from Epi package.
Provost’s Office Example
Optimal cutoffs can be found from software or calculated.
[Figure: Sensitivity, Specificity, Sens+Spec, and Dist from top left to ROC curve, each plotted against cutoff (prob). Marked cutoffs: max sens+spec cutoff (0.191); min Euclid distance cutoff (0.146); our 0.5 cutoff; cutoff where everyone declared positive; cutoff where no one declared positive.]
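Both cutoff criteria named above can be calculated directly once sensitivity and specificity are known at each candidate cutoff. The sketch below does this in base R on hypothetical toy data; it is not the Epi package’s procedure, and all object names are illustrative assumptions.

```r
## Hypothetical toy data: predicted probabilities and true 0-1 labels.
toy.preds  = c(0.93, 0.90, 0.89, 0.70, 0.55, 0.40, 0.30, 0.10)
toy.labels = c(1,    1,    0,    1,    0,    1,    0,    0)

## Every observed probability is a candidate cutoff C (C or above = 1).
cutoffs = sort(unique(toy.preds), decreasing = TRUE)
sens = sapply(cutoffs, function(C) mean(toy.preds[toy.labels == 1] >= C))
spec = sapply(cutoffs, function(C) mean(toy.preds[toy.labels == 0] <  C))

## Criterion 1: maximize sensitivity + specificity.
## (which.max returns the first cutoff if several tie.)
best.sum = cutoffs[which.max(sens + spec)]

## Criterion 2: minimize the Euclidean distance from the ROC point
## (1 - spec, sens) to the top-left corner (0, 1).
dist.top.left = sqrt((1 - spec)^2 + (1 - sens)^2)
best.dist = cutoffs[which.min(dist.top.left)]

data.frame(cutoffs, sens, spec, sum = sens + spec, dist = dist.top.left)
```

The two criteria need not pick the same cutoff, which is consistent with the different values (0.191 and 0.146) reported for the retention data.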
Comparing classifiers with ROCs
The curves may visually “look” different, but are they really? If we collected data again, would they look this different?
Bootstrapping the ROC
Can use bootstrapping to create a confidence band for the sensitivity (given specificity) for the ROC curve.
[Figure: ROC curve with 95% CIs. x-axis: Specificity (%), running from 100 to 0; y-axis: Sensitivity (%). AUC: 71.8% (69.9%–73.7%).]
*Plot generated from pROC package in R.
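To illustrate the resampling idea, here is a minimal base-R sketch of a percentile bootstrap interval for the AUC. It uses simulated data and plain case resampling; it is not claimed to reproduce the procedure or defaults of the pROC package, and every name in it (`sim.labels`, `sim.preds`, `auc`, `boot.auc`) is an illustrative assumption.

```r
set.seed(1)  # make the resampling reproducible

## Hypothetical simulated data: cases with label 1 tend to get higher scores.
n = 200
sim.labels = rbinom(n, 1, 0.3)
sim.preds  = plogis(rnorm(n, mean = ifelse(sim.labels == 1, 1, 0)))

## AUC as the share of (positive, negative) pairs ranked correctly.
auc = function(p, y) {
  pos = p[y == 1]
  neg = p[y == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

## Resample the cases with replacement and recompute the AUC each time.
B = 1000
boot.auc = replicate(B, {
  idx = sample(n, replace = TRUE)
  auc(sim.preds[idx], sim.labels[idx])
})

auc.hat = auc(sim.preds, sim.labels)                        # point estimate
ci = quantile(boot.auc, c(0.025, 0.975), na.rm = TRUE)      # 95% percentile interval
auc.hat
ci
```

The spread of `boot.auc` gives a sense of how different the curve could look if the data were collected again; with few cases the interval becomes wide.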
Bootstrapping the ROC
But the bootstrap results don’t always look so nice.
[Figure: ROC curve with 95% CIs. x-axis: Specificity (%), running from 100 to 0; y-axis: Sensitivity (%). AUC: 83.0% (60.4%–100.0%).]
[Figure: ROC curve. x-axis: 1-Specificity (i.e. % of true negatives incorrectly declared positive); y-axis: Sensitivity (% of true positives declared positive).]
*Plot generated from pROC package in R.
Bootstrapping ROC curves for comparison
Semi-transparent coloring in R can come in handy.
# semi-transparent red: col="#ff000030"
# semi-transparent blue: col="#0000ff20"
[Figure: ROC curve with 95% Bootstrapped CIs. x-axis: Specificity (%), running from 100 to 0; y-axis: Sensitivity (%).]
*Plot generated from pROC package in R.
###########################################
## 1a) Manually scripting the ROC curve: ##
###########################################

## 'NotRetained' is the 0-1 response variable.
## 'preds' are the leave-one-out predicted probabilities.

## Use 0.5 as the cutoff for classifying (0.5 or above = 1).
## round(preds) would send a probability of exactly 0.5 to class 0.
pred.outcome=as.numeric(preds>=0.5)

## Gather the data for the ROC curve:
ROC.info.LogReg=data.frame(NotRetained,preds,pred.outcome)
## Put in order by predicted probability, largest first:
ROC.info.LogReg=ROC.info.LogReg[rev(order(ROC.info.LogReg$preds)),]

## A 'positive' will be considered as someone who was not retained.
(num.true.pos=sum(NotRetained==1))
(num.true.neg=sum(NotRetained==0))

## Empty plotting frame on the unit square:
plot(c(0,1),c(0,1),type="n",
     xlab="1-Specificity (i.e. % of true neg's incorrectly declared positive)",
     ylab="Sensitivity (% of true pos's declared positive)",
     main="ROC curve (logistic regression)",
     sub="False Positive Rate")

## Running false positive rate (x) and true positive rate (y)
## as the cutoff moves down the ordered list:
x=cumsum(1-ROC.info.LogReg$NotRetained)/num.true.neg
y=cumsum(ROC.info.LogReg$NotRetained)/num.true.pos
lines(x,y)
abline(0,1)   # reference line: choosing at random
###################################
## 1b) Manually calculating AUC: ##
###################################

## Calculate AUC thinking as geometric trapezoid shapes:
# http://stats.stackexchange.com/questions/145566/how-to-calculate-area-under-the-curve-auc-or-the-c-statistic-by-hand

## Same x,y labeling as previous slide:
x=cumsum(1-ROC.info.LogReg$NotRetained)/num.true.neg
y=cumsum(ROC.info.LogReg$NotRetained)/num.true.pos

n = length(y)                  # number of points on the curve
height = (y[-1]+y[-n])/2       # average height of each trapezoid
width = diff(x)                # width of each trapezoid
sum(height*width)              # AUC
########################################################
## 1c) Manually calculating sensitivity, specificity, ##
##     misclassification rate for 0.5 threshold:      ##
########################################################

## All the needed info is in the following table
## (rows = NotRetained, columns = pred.outcome):
table(NotRetained,pred.outcome)
#      0    1
# 0 4762   37
# 1  818   41
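The three quantities named in the 1c) heading follow from the four counts in that table. The sketch below re-enters the counts by hand so it runs on its own; with the original objects available, `tab = table(NotRetained, pred.outcome)` would serve instead. The names `tab`, `sens`, `spec` and `misclass` are illustrative assumptions.

```r
## Counts copied from the table (rows = NotRetained, columns = pred.outcome).
tab = matrix(c(4762, 37,
                818, 41),
             nrow = 2, byrow = TRUE,
             dimnames = list(NotRetained = c("0", "1"),
                             pred.outcome = c("0", "1")))

## A 'positive' is a student who was not retained (class 1).
sens = tab["1", "1"] / sum(tab["1", ])                   # 41/859
spec = tab["0", "0"] / sum(tab["0", ])                   # 4762/4799
misclass = (tab["0", "1"] + tab["1", "0"]) / sum(tab)    # 855/5658

round(c(sensitivity = sens, specificity = spec,
        misclassification = misclass), 4)
```

These reproduce the values quoted for the 0.5 cutoff: sensitivity 4.77%, specificity 99.23%, and overall misclassification 0.1511.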
#############################################
## 2) ROCR package for creating ROC curve: ##
#############################################
library(ROCR)

pred=prediction(preds,NotRetained)
perf=performance(pred,"tpr","fpr")
plot(perf,main="ROC from logistic regression classifier")

## Area under the curve:
(AUC.ROCR=performance(pred,"auc")@y.values[[1]])

## Plot accuracy (1-misclassification rate) vs. cutoff:
acc = performance(pred, "acc")
(ac.val = max(unlist(acc@y.values)))
#[1] 0.8511842

## Cutoff(s) at which the maximum accuracy is reached:
th = unlist(acc@x.values)[unlist(acc@y.values) == ac.val]

plot(acc)
abline(v=th, col='grey', lty=2)
th
#[1] 0.462324
###########################################
## 3) pROC package for bootstrapped ROC: ##
###########################################
library(pROC)

## 'NotRetained' is the 0-1 response variable.
## 'preds' are the leave-one-out predicted probabilities.

## Use 0.5 as the cutoff for classifying (0.5 or above = 1):
pred.outcome=as.numeric(preds>=0.5)

rocobj=plot.roc(NotRetained, preds, main="ROC curve with 95% CIs",
                percent=TRUE, ci=TRUE, print.auc=TRUE)

## Calculate CI of sensitivity at select set of
## specificities and form a 'band' (might take a bit):
ciobj=ci.se(rocobj,specificities=seq(0, 100, 5))
plot(ciobj, type="shape", col="#1c61b6AA") # blue band

## Use bootstrap to add CI in both directions at "best" cutoff:
plot(ci(rocobj, of="thresholds", thresholds="best"),col="yellow",lwd=2)

## Overlay Bootstrapped ROC's:
rocobj=plot.roc(NotRetained, preds, main="ROC curve with 95% Bootstrapped CIs",
                percent=TRUE, ci=TRUE, print.auc=FALSE)
ciobj=ci.se(rocobj,specificities=seq(0, 100, 5))
plot(ciobj, type="shape", col="#ff000030") # semi-transparent red color

## Gather info on second classifier
## ('preds.2' and 'pred.outcome.2' are its predicted probabilities and classes):
ROC.info.LogReg.2=data.frame(NotRetained,preds.2,pred.outcome.2)

## Overlay the second ROC curve onto the first:
rocobj2=plot.roc(ROC.info.LogReg.2$NotRetained, ROC.info.LogReg.2$preds.2,
                 percent=TRUE, ci=TRUE, print.auc=FALSE, add=TRUE)
ciobj2=ci.se(rocobj2,specificities=seq(0, 100, 5))
plot(ciobj2, type="shape", col="#0000ff20") # semi-transparent blue color
Some references
• Flach, P.A. (2016). ROC Analysis. Chapter in Encyclopedia of Machine Learning and Data Mining.
– https://research-information.bristol.ac.uk/files/94977288/Peter_Flach_ROC_Analysis.pdf
• James, G., Witten, D., Hastie, T. and Tibshirani, R. (2015). An Introduction to Statistical Learning with Applications in R.
– http://www-bcf.usc.edu/~gareth/ISL/
– Click ‘Download the book PDF’
• Fawcett, T. (2006). An Introduction to ROC Analysis. Pattern Recognition Letters 27, pp. 861-874.