Post on 13-Jan-2020
transcript
Spervised Classification: an Exploration with RErwan Le Pennec and Ana Karina Fermin
Fall 2015
We will use the twoClass dataset from Applied Predictive Modeling, the book of M. Kuhn and K. Johnson toillustrate the most classical supervised classification algorithms. We will use some advanced R packages: theggplot2 package for the figures and the caret package for the learning part. caret that provides an unifiedinterface to many other packages. We will also use h2o, a package dedicated to in memory learning of largedata set, for its deep learning algorithm.
library("plyr")library("dplyr")library("ggplot2")library("grid")library("gridExtra")library("caret")library("h2o")library("doMC")registerDoMC(cores = 3)
The twoClass dataset
We read first the dataset and use ggplot2 to display it.
library(AppliedPredictiveModeling)library(RColorBrewer)data(twoClassData)twoClass=cbind(as.data.frame(predictors),classes)twoClassColor <- brewer.pal(3,'Set1')[1:2]names(twoClassColor) <- c('Class1','Class2')ggplot(data = twoClass,aes(x = PredictorA, y = PredictorB)) +
geom_point(aes(color = classes), size = 6, alpha = .5) +scale_colour_manual(name = 'classes', values = twoClassColor) +scale_x_continuous(expand = c(0,0)) +scale_y_continuous(expand = c(0,0))
1
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
We create a few functions that will be useful to display our classifiers.
nbp <- 250;PredA <- seq(min(twoClass$PredictorA), max(twoClass$PredictorA), length = nbp)PredB <- seq(min(twoClass$PredictorB), max(twoClass$PredictorB), length = nbp)Grid <- expand.grid(PredictorA = PredA, PredictorB = PredB)
PlotGrid <- function(pred,title) {surf <- (ggplot(data = twoClass, aes(x = PredictorA, y = PredictorB,
color = classes)) +geom_tile(data = cbind(Grid, classes = pred), aes(fill = classes)) +scale_fill_manual(name = 'classes', values = twoClassColor) +ggtitle("Decision region") + theme(legend.text = element_text(size = 10)) +
scale_colour_manual(name = 'classes', values = twoClassColor)) +scale_x_continuous(expand = c(0,0)) +scale_y_continuous(expand = c(0,0))pts <- (ggplot(data = twoClass, aes(x = PredictorA, y = PredictorB,
color = classes)) +geom_contour(data = cbind(Grid, classes = pred), aes(z = as.numeric(classes)),
color = "red", level = 1.5) +geom_point(size = 4, alpha = .5) +ggtitle("Decision boundary") +theme(legend.text = element_text(size = 10)) +scale_colour_manual(name = 'classes', values = twoClassColor)) +
scale_x_continuous(expand = c(0,0)) +scale_y_continuous(expand = c(0,0))
grid.arrange(surf, pts, main = textGrob(title, gp = gpar(fontsize = 20)), ncol = 2)}
2
As explained in the introduction, we will use caret for the learning part. This package provides a unifiedinterface to a huge number of classifier available in R. It is a very powerful tool when exploring the differentmodels. In particular, it proposes to compute a resampling accuracy estimate and gives the user the choice ofthe specific methodology. We will use a repeated V-fold strategy with 10 folds and 2 repetitions. We willreuse the same seed for each model in order to be sure that the same folds are used.
library("caret")V <- 10T <- 4TrControl <- trainControl(method = "repeatedcv",
number = V,repeats = T)
Seed <- 345
Finally, we provide a function that will store the accuracies for every resample and every model. We will alsocompute an accuracy based on the same data than the one used to learn in order to show the over-fittingphenomenon.
ErrsCaret <- function(Model, Name) {Errs <- data.frame(t(postResample(predict(Model, newdata = twoClass), twoClass[["classes"]])),
Resample = "None", model = Name)rbind(Errs, data.frame(Model$resample, model = Name))
}
Errs <- data.frame()
We are now ready to define a function that take in input the current collection of accuracies, a name of amodel, the corresponding formula and methods, as well as more parameters used to specify the model, andcomputes the trained model, displays its prediction in a figure and add the errors in the collection.
CaretLearnAndDisplay <- function (Errs, Name, Formula, Method, ...) {set.seed(Seed)Model <- train(as.formula(Formula), data = twoClass, method = Method, trControl = TrControl, ...)Pred <- predict(Model, newdata = Grid)PlotGrid(Pred, Name)Errs <- rbind(Errs, ErrsCaret(Model, Name))
}
We can apply this function to any model available in caret. We will pick a few models and sort themdepending on the heuristic used to define them. We will distinguish models coming from a statistical pointof view in which one try to estimate the conditional law and plug it into the Bayes classifier and from anoptimization point of view in which one try to enforce a small training error by minimizing a relaxed criterion.
Statistical point of view
Generative Modeling
In the generative modeling approach, one propose to estimate the joint law of the covariates and the labeland to derive the conditional law using the Bayes formula. The generative models differ by the choice of thedensity estimator (often specified by an intra class model). We consider here the LDA and QDA methods, inwhich a Gaussian model is used, and two variants of the Naive Bayes method, in which all the features areassumed to be independent.
3
Errs <- CaretLearnAndDisplay(Errs, "Linear Discrimant Analysis", "classes ~ .", "lda");
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
Linear Discrimant AnalysisErrs <- CaretLearnAndDisplay(Errs, "Quadratic Discrimant Analysis", "classes ~ . ", "qda");
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
Quadratic Discrimant AnalysisErrs <- CaretLearnAndDisplay(Errs, "Naive Bayes with Gaussian model", "classes ~ .", "nb",
tuneGrid = data.frame(usekernel = c(FALSE), fL = c(0)))
4
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
Naive Bayes with Gaussian modelErrs <- CaretLearnAndDisplay(Errs, "Naive Bayes with kernel density estimates", "classes ~ .", "nb",
tuneGrid = data.frame(usekernel = c(TRUE), fL = c(0)))
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
Naive Bayes with kernel density estimates
Logistic regression
The most classical model is probably the logistic model, which is the canonical example of a parametricconditional regression estimate.
Errs <- CaretLearnAndDisplay(Errs, "Logistic", "classes ~ .", "glm")
5
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
LogisticWe are not restricted to the use of two features but may use any transformation of them. For instance, wemay use all the possible monomials up to degree 2.
Errs <- CaretLearnAndDisplay(Errs, "Quadratic Logistic", "classes ~ PredictorA + PredictorB ++ I(PredictorA^2) + I(PredictorB^2)+ I(PredictorA*PredictorB)", "glm")
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
Quadratic Logistic
Kernel Methods
In the nearest neighbor method, we need to supply a parameter: k the number of neighbors used to definethe kernel. We will compare visually the solution obtained with a few k values.
6
ErrsKNN <- data.frame()KNNKS <- c(1, 5, 9, 13, 17, 21, 25, 29)for (k in KNNKS) {
ErrsKNN <- CaretLearnAndDisplay(ErrsKNN, sprintf("k-NN with k=%i", k),"classes ~ .","knn", tuneGrid = data.frame(k = c(k)))
}
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
k−NN with k=1
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
k−NN with k=5
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
k−NN with k=9
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
k−NN with k=13
7
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
k−NN with k=17
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
k−NN with k=21
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
k−NN with k=25
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
k−NN with k=29Errs <- rbind(Errs, ErrsKNN)
caret provides an automatic way to select the best k, or any method parameter, using the chosen resamplingscheme.
Errs <- CaretLearnAndDisplay(Errs, "k-NN with CV choice", "classes ~ .", "knn",tuneGrid = data.frame(k = KNNKS))
8
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
k−NN with CV choiceWe will look more closely at this choice at the end of this document.
Optimization point of view
SVM
Here is first the optimal hyperplan using the plain vanilla SVM.
Errs <- CaretLearnAndDisplay(Errs, "Support Vector Machine", "classes ~ .", "svmLinear")
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
Support Vector MachineIf we use a polynomial kernel (with automatic degree selection), we can obtain a more complex classifier.
9
Errs <- CaretLearnAndDisplay(Errs, "Support Vector Machine with polynomial kernel","classes ~ .", "svmPoly")
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
Support Vector Machine with polynomial kernelWe can also try a Gaussian kernel.
Errs <- CaretLearnAndDisplay(Errs, "Support Vector Machine with Gaussian kernel","classes ~ .", "svmRadial")
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
Support Vector Machine with Gaussian kernel
NN
Neural networks are also present in caret.
10
Errs <- CaretLearnAndDisplay(Errs, "Neural Network", "classes ~ .", "mlp")
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
Neural NetworkA promising, but not adapted to this too small dataset, is the h2o package which is an interface to the highlyscalable h2o in memory learning library.
Name <- "H2O NN"
Localh2o <- h2o.init()twoClass.h2o <- as.h2o(Localh2o, twoClass)Model.h2o <- h2o.deeplearning(x = 1:2, y = 3,
training_frame = twoClass.h2o,activation = "Tanh",hidden = c(4,3),loss = 'CrossEntropy')
Grid.h2o <- as.h2o(Localh2o, Grid)Pred.h2o <- h2o.predict(Model.h2o, newdata = Grid.h2o)Pred <- factor(as.matrix(Pred.h2o$predict),
levels = c('Class1','Class2'))PlotGrid(Pred, Name)
11
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
H2O NNPred.h2o <- h2o.predict(Model.h2o, newdata = twoClass.h2o)Pred <- factor(as.matrix(Pred.h2o$predict),
levels = c('Class1','Class2'))Errs <- rbind(Errs, data.frame(t(postResample(Pred, twoClass[["classes"]])),
Resample = "None", model = Name))
set.seed(Seed)Folds <- caret::createMultiFolds(twoClass[["classes"]], k = V, times = T)
for (t in 1:T) {for (v in 1:V) {
twoClassTrain <- slice(twoClass, Folds[[(t-1)*V+v]])twoClassTest <- slice(twoClass, -Folds[[(t-1)*V+v]])twoClassTrain.h2o <- as.h2o(Localh2o, twoClassTrain)Model.h2o <- h2o.deeplearning(x = 1:2, y = 3,
training_frame = twoClassTrain.h2o,activation = "Tanh",hidden = c(4,3),
loss = 'CrossEntropy')twoClassTest.h2o <- as.h2o(Localh2o, twoClassTest)Pred.h2o <- h2o.predict(Model.h2o, newdata = twoClassTest.h2o)Pred <- factor(as.matrix(Pred.h2o$predict),
levels = c('Class1','Class2'))Errs <- rbind(Errs, data.frame(t(postResample(Pred, twoClassTest[["classes"]])),
Resample = sprintf("Fold%d.Rep%d",v,t), model = Name))}
}
Trees
We conclude this model exploration by some tree base methods: plain tree, bagging, random forest, adaboost,C5.0. . .
12
Errs <- CaretLearnAndDisplay(Errs, "CART", "classes ~ .", "rpart",control = rpart.control(minsplit = 5, cp = 0), tuneGrid = data.frame(cp = .02))
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
CARTlibrary(rpart.plot)Tree <- train(classes ~ ., data = twoClass, method = "rpart", control = rpart.control(minsplit = 5, cp = 0),
tuneGrid = data.frame(cp = .02), trControl = TrControl)Tree$finalModel
## n= 208#### node), split, n, loss, yval, (yprob)## * denotes terminal node#### 1) root 208 97 Class1 (0.5336538 0.4663462)## 2) PredictorB>=0.19795 121 30 Class1 (0.7520661 0.2479339)## 4) PredictorA>=0.12705 114 25 Class1 (0.7807018 0.2192982)## 8) PredictorA< 0.31135 56 6 Class1 (0.8928571 0.1071429) *## 9) PredictorA>=0.31135 58 19 Class1 (0.6724138 0.3275862)## 18) PredictorB>=0.28795 43 9 Class1 (0.7906977 0.2093023)## 36) PredictorA< 0.61615 41 7 Class1 (0.8292683 0.1707317) *## 37) PredictorA>=0.61615 2 0 Class2 (0.0000000 1.0000000) *## 19) PredictorB< 0.28795 15 5 Class2 (0.3333333 0.6666667) *## 5) PredictorA< 0.12705 7 2 Class2 (0.2857143 0.7142857)## 10) PredictorB>=0.3196 2 0 Class1 (1.0000000 0.0000000) *## 11) PredictorB< 0.3196 5 0 Class2 (0.0000000 1.0000000) *## 3) PredictorB< 0.19795 87 20 Class2 (0.2298851 0.7701149) *
rpart.plot(Tree$finalModel)
13
PredictorB >= 0.2
PredictorA >= 0.13
PredictorA < 0.31
PredictorB >= 0.29
PredictorA < 0.62
PredictorB >= 0.32
Class1
Class1 Class2
Class2
Class1 Class2
Class2
yes no
prp(Tree$finalModel, type = 2, extra = 104, nn = TRUE, fallen.leaves = TRUE,box.col = twoClassColor[Tree$finalModel$frame$yval])
PredictorB >= 0.2
PredictorA >= 0.13
PredictorA < 0.31
PredictorB >= 0.29
PredictorA < 0.62
PredictorB >= 0.32
Class1.53 .47100%
Class1.75 .25
58%
Class1.78 .22
55%
Class1.89 .11
27%
Class1.67 .33
28%
Class1.79 .21
21%
Class1.83 .17
20%
Class2.00 1.00
1%
Class2.33 .67
7%
Class2.29 .71
3%
Class11.00 .00
1%
Class2.00 1.00
2%
Class2.23 .77
42%
yes no
1
2
4
8
9
18
36 37 19
5
10 11 3
Errs <- CaretLearnAndDisplay(Errs, "Bagging", "classes ~ .", "treebag",control = rpart.control(minsplit = 5))
14
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
BaggingErrs <- CaretLearnAndDisplay(Errs, "Random Forest", "classes ~ .", "rf",
tuneLength = 1,control = rpart.control(minsplit = 5))
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
Random ForestErrs <- CaretLearnAndDisplay(Errs, "AdaBoost", "classes ~ .", "ada",
control = rpart.control(minsplit = 5))
15
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
AdaBoostErrs <- CaretLearnAndDisplay(Errs, "C5.0", "classes ~ .", "C5.0")
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision region
0.2
0.4
0.6
0.2 0.4 0.6PredictorA
Pre
dict
orB classes
Class1
Class2
Decision boundary
C5.0
Model comparison
We will now compare our model according to 4 criterion: - Accuracy: the naive empirical accuracy computedon the same dataset than the one used to learn the classifier, - AccuracyCV: the mean of the accuraciesobtained by the resampling scheme, - AccuracyInf : a lowest bound on the mean accuracy obtained bysubstracting to the mean two times the standard deviation divided by the square root of the number ofresamples. - AccuracyPAC: a highly probably bound on the accuracy obtained by substracting the standarddeviation.
16
Global comparison
ErrCaretAccuracy <- function(Errs) {Errs <- group_by(Errs, model)cbind(dplyr::summarize(Errs, mAccuracy = mean(Accuracy, na.rm = TRUE), mKappa = mean(Kappa, na.rm = TRUE),
sdAccuracy = sd(Accuracy, na.rm = TRUE), sdKappa = sd(Kappa, na.rm = TRUE)))}
ErrAndPlotErrs <- function(Errs) {ErrCV <- ErrCaretAccuracy(dplyr::filter(Errs, !(Resample == "None")))ErrCV <- transmute(ErrCV, AccuracyCV = mAccuracy, AccuracyCVInf = mAccuracy - 2 * sdAccuracy/sqrt(T * V),
AccuracyCVPAC = mAccuracy - sdAccuracy,model = model)
ErrEmp <- dplyr::select(dplyr::filter(Errs, (Resample == "None")), Accuracy, model)Err <- dplyr::left_join(ErrCV, ErrEmp)print(ggplot(data = reshape2::melt(Err, "model"),
aes(x = model, y = value, color = variable)) +geom_point(size = 5) +theme(axis.text.x = element_text(angle = 45, hjust = 1),
plot.margin = grid::unit(c(1, 1, .5, 1.5), "lines")))Err}
Err <- ErrAndPlotErrs(Errs)
0.6
0.7
0.8
0.9
1.0
Linea
r Disc
riman
t Ana
lysis
Quadr
atic
Discrim
ant A
nalys
is
Naive
Bayes
with
Gau
ssian
mod
el
Naive
Bayes
with
kern
el de
nsity
esti
mat
es
Logis
tic
Quadr
atic
Logis
tic
k−NN w
ith k=
1
k−NN w
ith k=
5
k−NN w
ith k=
9
k−NN w
ith k=
13
k−NN w
ith k=
17
k−NN w
ith k=
21
k−NN w
ith k=
25
k−NN w
ith k=
29
k−NN w
ith C
V choic
e
Suppo
rt Ve
ctor M
achin
e
Suppo
rt Ve
ctor M
achin
e with
poly
nom
ial ke
rnel
Suppo
rt Ve
ctor M
achin
e with
Gau
ssian
kern
el
Neura
l Net
work
H2O N
NCART
Baggin
g
Rando
m F
ores
t
AdaBoo
stC5.
0
model
valu
e
variable
AccuracyCV
AccuracyCVInf
AccuracyCVPAC
Accuracy
17
FindBestErr <- function(Err) {for (name in names(Err)[!(names(Err) == "model")]) {
ind <- which.max(Err[, name])writeLines(strwrap(paste("Best method according to", name, ":", Err[ind, "model"])))}
}
FindBestErr(Err)
## Best method according to AccuracyCV : k-NN with k=25## Best method according to AccuracyCVInf : k-NN with k=25## Best method according to AccuracyCVPAC : Support Vector Machine## with Gaussian kernel## Best method according to Accuracy : k-NN with k=1
An interesting plot to assess the variability of the Cross Validation result is the following parallel plot inwhich an error is computed for each resample in addition to the mean.
PlotParallels <- function(Errs) {pl <- ggplot(data = dplyr::filter(Errs, Resample != "None"),
aes(x = model, y = Accuracy, color = Resample)) +geom_line(aes(x = as.numeric(model))) +scale_x_discrete(labels = unique(Errs$model)) +scale_color_grey(guide = FALSE)
Err <- dplyr::summarize(group_by(dplyr::filter(Errs, Resample != "None"), model),Accuracy = mean(Accuracy), Resample = "CV")
pl <- pl + geom_line(data = Err, aes(x = as.numeric(model)), size = 3) +theme(axis.text.x = element_text(angle = 45, hjust = 1),
plot.margin = grid::unit(c(1, 1, .5, 1.5), "lines"))print(pl)
}
PlotParallels(Errs)
18
0.5
0.6
0.7
0.8
0.9
Linea
r Disc
riman
t Ana
lysis
Quadr
atic
Discrim
ant A
nalys
is
Naive
Bayes
with
Gau
ssian
mod
el
Naive
Bayes
with
kern
el de
nsity
esti
mat
es
Logis
tic
Quadr
atic
Logis
tic
k−NN w
ith k=
1
k−NN w
ith k=
5
k−NN w
ith k=
9
k−NN w
ith k=
13
k−NN w
ith k=
17
k−NN w
ith k=
21
k−NN w
ith k=
25
k−NN w
ith k=
29
k−NN w
ith C
V choic
e
Suppo
rt Ve
ctor M
achin
e
Suppo
rt Ve
ctor M
achin
e with
poly
nom
ial ke
rnel
Suppo
rt Ve
ctor M
achin
e with
Gau
ssian
kern
el
Neura
l Net
work
H2O N
NCART
Baggin
g
Rando
m F
ores
t
AdaBoo
stC5.
0
model
Acc
urac
y
K-NN Comparison
We focus here on the choice of the neighborhood in the k Nearest Neighbor method.
ErrKNN <- ErrAndPlotErrs(ErrsKNN)
19
0.6
0.7
0.8
0.9
1.0
k−NN w
ith k=
1
k−NN w
ith k=
5
k−NN w
ith k=
9
k−NN w
ith k=
13
k−NN w
ith k=
17
k−NN w
ith k=
21
k−NN w
ith k=
25
k−NN w
ith k=
29
model
valu
e
variable
AccuracyCV
AccuracyCVInf
AccuracyCVPAC
Accuracy
FindBestErr(ErrKNN)
## Best method according to AccuracyCV : k-NN with k=25## Best method according to AccuracyCVInf : k-NN with k=25## Best method according to AccuracyCVPAC : k-NN with k=25## Best method according to Accuracy : k-NN with k=1
20