8/7/2019 Stat 586 Discriminant Analysis
Discriminant Analysis
An Introduction

Problem description
We wish to predict group membership for a number of subjects from a set of predictor variables.
The criterion variable (also called the grouping variable) is the object of classification. It is ALWAYS a categorical variable!
Simple case: two groups and p predictor variables.

Example
We want to know whether somebody has lung cancer. Hence, we wish to predict a yes or no outcome.
Possible predictor variables: number of cigarettes smoked a day, coughing frequency and intensity, etc.

Approach (1)
Linear discriminant analysis constructs one or more discriminant equations D_i (linear combinations of the predictor variables X_k) such that the different groups differ as much as possible on D.
Discriminant function:
$$D_i = b_0 + \sum_{k=1}^{p} b_k X_k$$

Approach (2)
More precisely, the weights of the discriminant function are calculated in such a way that the ratio (between-groups SS)/(within-groups SS) is as large as possible.
Number of discriminant functions = min(number of groups - 1, p).

Definitions
Suppose we have a set of g classes.
Let W denote the within-class covariance matrix, that is, the covariance matrix of the variables centered on the class mean.
Let B denote the between-classes covariance matrix, that is, the covariance matrix of the predictions by the class means.
The sample covariances are:
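In one common textbook notation (an assumption here, since the symbols are not visible on the slide), let X be the n × p data matrix, G the n × g matrix of class indicators, M the g × p matrix of class means, and x̄ the 1 × p row vector of overall means. The two sample covariances could then be written as:

```latex
W = \frac{(X - GM)^{T}(X - GM)}{n - g},
\qquad
B = \frac{(GM - \mathbf{1}\bar{x})^{T}(GM - \mathbf{1}\bar{x})}{g - 1}
```

Under this notation, the first discriminant direction is the vector a that maximizes the ratio a^T B a / a^T W a, which matches the between/within criterion described under Approach (2).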

Interpretation
The first discriminant function D_1 distinguishes the first group from groups 2, 3, ..., N.
The second discriminant function D_2 distinguishes the second group from groups 3, 4, ..., N.
Etc.

Visualization (two outcomes)

Visualization (3 outcomes)

Approach (3)
To calculate the optimal weights, a training set is used containing the correct classification for a group of subjects.
EXAMPLE (lung cancer): We need data about persons for whom we know for sure whether they had lung cancer (e.g. established by means of an operation, scan, or X-rays)!

Approach (4)
For a new group of subjects for whom we do not yet know the group they belong to, we can use the previously calculated discriminant weights to obtain their discriminant scores.
We call this classification.

Technical details
The calculation of optimal discriminant weights involves some mathematics.

Example (1)
The famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris.
The species are Iris setosa, versicolor, and virginica.

Obs  S.Length  S.Width  P.Length  P.Width  Species
1    5.1       3.5      1.4       0.2      setosa
2    4.9       3.0      1.4       0.2      setosa
3    4.7       3.2      1.3       0.2      setosa
4    4.6       3.1      1.5       0.2      setosa
5    5.0       3.6      1.4       0.2      setosa
6    5.4       3.9      1.7       0.4      setosa
7    4.6       3.4      1.4       0.3      setosa
8    5.0       3.4      1.5       0.2      setosa
9    4.4       2.9      1.4       0.2      setosa
Fragment of data set

Example (2)
Dependent variable?
Predictor variables?
Number of discriminant functions?
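One way to check the answers for the iris data is sketched below, assuming R's built-in `iris` data frame (the variable names `g`, `p` and `n.functions` are illustrative, not from the slides). It applies the rule min(number of groups - 1, p) from Approach (2).

```r
# Dependent (grouping) variable: Species; predictors: the four measurements.
g <- nlevels(iris$Species)      # number of groups
p <- ncol(iris) - 1             # number of predictor variables
n.functions <- min(g - 1, p)    # number of discriminant functions
n.functions
```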

Step 1: Analyze data
The idea is to start with analyzing the data.
We start with linear discriminant analysis.
Do the predictors vary sufficiently over the different groups?
If not, they will be bad predictors.
Formal test for this: Wilks test.
This test assesses whether the predictors vary enough to distinguish different groups.

Step 1a: Sample statistics
Call:
iris.lda
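The call on this slide is cut off in the source. A minimal sketch of how such a model might be fit, assuming the `lda` function from the MASS package and the formula `Species ~ .` (neither is visible in the original), is:

```r
library(MASS)  # assumption: lda() from MASS is the function behind iris.lda

# Fit a linear discriminant model with Species as the grouping variable
# and the four measurements as predictors.
iris.lda <- lda(Species ~ ., data = iris)

iris.lda          # prints the call, priors, group means and coefficients
iris.lda$counts   # number of training subjects per group
iris.lda$means    # sample mean of each predictor within each group
```

The group means are the sample statistics to inspect first: predictors whose means barely differ between species are unlikely to discriminate well.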

Visualization
plot(iris.lda)

Step 1b: Formal test
X
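The code on this slide is cut off after `X`. One possible way to run a Wilks test in R is sketched below, assuming that `X` holds the four predictors as a matrix and that `manova` from the base stats package is used (both are assumptions, not recovered from the slide).

```r
# Assumed definition: X is the matrix of the four predictor variables.
X <- as.matrix(iris[, 1:4])

# One-way MANOVA of the predictors on the grouping variable.
iris.manova <- manova(X ~ Species, data = iris)

# Wilks' lambda: values near 0 mean the group means differ strongly.
iris.wilks <- summary(iris.manova, test = "Wilks")
iris.wilks
```

A small Wilks' lambda with a small p-value suggests that the predictors vary enough over the groups to be worth using in a discriminant model.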

Step 2: Discriminant function (1)
Look at the coefficients of the standardized (!) discriminant functions to see what predictors play an important role.
The larger the coefficient of a predictor in the standardized discriminant function, the more important its role in the discriminant function.

Step 2: Discriminant function (2)
The coefficients are partial coefficients, analogous to standardized regression weights: each gives the contribution of a variable to the discriminant function in the context of the other predictor variables in the model.
Limitations: with more than two outcomes, they are more difficult to interpret.

Step 2: Getting discr. functions
Call:
iris.lda
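The output on this slide is not preserved. As a sketch under stated assumptions (a model fit with `MASS::lda` and the formula `Species ~ .`, and standardization by the pooled within-group standard deviations, which is one common convention), the raw and standardized coefficients could be obtained like this:

```r
library(MASS)
iris.lda <- lda(Species ~ ., data = iris)   # assumed model specification

# Raw discriminant coefficients (one column per discriminant function).
iris.lda$scaling

# Pooled within-group covariance matrix of the predictors.
X <- as.matrix(iris[, 1:4])
centered <- X - iris.lda$means[as.character(iris$Species), ]
W <- crossprod(centered) / (nrow(X) - nlevels(iris$Species))

# Standardized coefficients: raw coefficient times within-group SD.
std.coef <- iris.lda$scaling * sqrt(diag(W))
std.coef
```

Because the predictors are put on a common scale, the magnitudes in `std.coef` can be compared across variables, which is not true of the raw coefficients.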

Step 3: Comparing discr. funcs
Which discriminant function has the most discriminating power?
Look at the eigenvalues, also called the characteristic roots (R reports the related singular values). Each discriminant function has such a value. They reflect the amount of variance explained in the grouping variable by the predictors in a discriminant function.
Always look at the ratio of the eigenvalues to assess the relative importance of a discriminant function.

Step 3: Getting eigenvalues
iris.lda$svd
> iris.lda$svd
[1] 48.642644  4.579983
The first value belongs to D1, the second to D2.
svd: the singular values, which give the ratio of the between- and within-group standard deviations on the linear discriminant variables.

Step 4: More interpretation
Trace
Useful plots
Group centroids

Step 4a: Trace
Call:
iris.lda

Step 4a: Trace interpretation
The first trace number indicates the percentage of between-group variance that the first discriminant function is able to explain from the total amount of between-group variance.
High trace number = discriminant function plays an important role!
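The trace proportions can be recomputed from the singular values, since the squared singular values are proportional to the between-group variance captured by each function. A minimal sketch, assuming a model fit with `MASS::lda` and the formula `Species ~ .` (the name `prop.trace` is illustrative):

```r
library(MASS)
iris.lda <- lda(Species ~ ., data = iris)   # assumed model specification

# Share of the between-group variance explained by each discriminant function.
prop.trace <- iris.lda$svd^2 / sum(iris.lda$svd^2)
round(prop.trace, 4)
```

For the iris data the first function accounts for almost all of the between-group variance, so the second adds little discriminating power.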

Step 4b: Useful plots
Take e.g. the first and second discriminant function. Plot the discriminant function values of the objects in a scatter plot, with predicted groups. Does the discriminant function discriminate well between the different groups?
Combine the plot with group centroids (average values of the discriminant functions for each group).

Step 4c: R code for plot
# Plot
LD1
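The plotting code is cut off after `LD1` in the source. The following is a sketch of what such a plot might look like, assuming the discriminant scores come from `predict()` on a `MASS::lda` fit; the names `pred` and `centroids` and all plotting choices are illustrative.

```r
library(MASS)
iris.lda <- lda(Species ~ ., data = iris)   # assumed model specification
pred <- predict(iris.lda)

# Discriminant scores of every flower on the two functions.
LD1 <- pred$x[, 1]
LD2 <- pred$x[, 2]

# Scatter plot: colour = predicted group, symbol = observed group.
plot(LD1, LD2,
     col = as.integer(pred$class),
     pch = as.integer(iris$Species),
     xlab = "First discriminant function",
     ylab = "Second discriminant function")

# Group centroids: mean discriminant score per observed group.
centroids <- apply(pred$x, 2, function(s) tapply(s, iris$Species, mean))
points(centroids, pch = 8, cex = 2)
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 1:3)
```

Points whose colour and symbol disagree are misclassified flowers; well-separated centroids indicate that the functions discriminate well.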

Step 5: Prediction (1)
Using the estimated discriminant model, classify new subjects.
Various ways to do this.
We consider the following approach:
Calculate the probability that a subject belongs to a certain group using the estimated discriminant model.
Do this for all groups.
Classification rule: a subject is assigned to the group it has the highest probability to fall into.

Step 5: Bayes rule
Formula used to calculate the probability that a subject belongs to a group:
$$P(G_i \mid x) = \frac{P(x \mid G_i)\,P(G_i)}{\sum_{k=1}^{N} P(x \mid G_k)\,P(G_k)}$$
Here x denotes the subject's observed predictor values, and the P(G_i) are the priors.

Step 5: Prediction (2)
To determine these probabilities, a prior probability is required. These priors represent the probability that a subject belongs to a particular group.
Usually, we set them equal to the fraction of subjects in a particular group.
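A short sketch of how priors appear in R, assuming the model is fit with `MASS::lda`, whose `prior` argument defaults to the group fractions in the training data. The alternative prior values below are made up purely for illustration.

```r
library(MASS)

# Default priors: the fraction of training subjects in each group.
iris.lda <- lda(Species ~ ., data = iris)
iris.lda$prior
prop.table(table(iris$Species))   # the same fractions, computed directly

# Hypothetical alternative: user-specified priors, in the order of the
# factor levels (setosa, versicolor, virginica); they must sum to 1.
iris.lda.custom <- lda(Species ~ ., data = iris, prior = c(0.5, 0.25, 0.25))
iris.lda.custom$prior
```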

Step 5: Prediction (3)
Prediction on the training set: to assess how well the discriminant model predicts.
Prediction on a new data set: to predict the group a new object belongs to.

Step 5: Prediction in R
iris.predict
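The assignment to `iris.predict` is cut off in the source. A plausible sketch, assuming it stores the result of `predict()` applied to a `MASS::lda` fit and the four predictor columns (an assumption, not the recovered original), is:

```r
library(MASS)
iris.lda <- lda(Species ~ ., data = iris)   # assumed model specification

# Predict group membership from the predictor columns only.
iris.predict <- predict(iris.lda, newdata = iris[, 1:4])

head(iris.predict$class)                 # predicted group per subject
head(round(iris.predict$posterior, 3))   # posterior probability per group
head(iris.predict$x)                     # discriminant scores

# The classification rule: pick the group with the highest posterior.
best <- apply(iris.predict$posterior, 1, which.max)
manual.class <- colnames(iris.predict$posterior)[best]
```

Here `manual.class` reproduces `iris.predict$class`, which shows that the predicted class is simply the group with the largest posterior probability.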

Step 5: Quality of prediction (1)
To assess the quality of a prediction, make a prediction table.
Rows with observed categories of the dependent variable, columns with forecasted categories.
Ideally, the off-diagonal elements should be zero.

Step 5: Quality of prediction (2)
The percentage of correctly classified objects is usually compared to:
the random classification: (100/N)% probability of assignment to each group i = 1, ..., N;
the probability matching classification: the probability of assigning group i = 1, ..., N to an object is equal to the fraction of objects in class i.

Step 5: Quality of prediction (3)
the probability maximizing method: put all subjects in the most likely category (i.e. the category with the highest fraction of objects in it).
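The three benchmarks can be turned into expected hit rates, as sketched below for the iris data. The variable names are illustrative, and the probability matching rate is computed as the sum of squared group fractions, which is the expected share of correct assignments under that rule.

```r
# Fraction of objects in each group.
p.i <- as.numeric(prop.table(table(iris$Species)))
N <- length(p.i)

random.rate     <- 1 / N        # random classification
matching.rate   <- sum(p.i^2)   # probability matching classification
maximizing.rate <- max(p.i)     # probability maximizing method

c(random = random.rate, matching = matching.rate, maximizing = maximizing.rate)
```

With three equally sized groups all three benchmarks coincide at one third; with unequal groups they differ, and the maximizing rate is the hardest to beat.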

Step 5: Get table in R
table(Original = iris$Species, Predicted = predict(iris.lda)$class)
            Predicted
Original     setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         2
  virginica       0          1        49
Original: the grouping variable. Predicted: the predicted classes.
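The percentage of correctly classified objects follows from the diagonal of this table. A minimal sketch, assuming a model fit with `MASS::lda` and the formula `Species ~ .` (the names `tab` and `accuracy` are illustrative):

```r
library(MASS)
iris.lda <- lda(Species ~ ., data = iris)   # assumed model specification

tab <- table(Original = iris$Species, Predicted = predict(iris.lda)$class)

# Correctly classified objects sit on the diagonal.
accuracy <- sum(diag(tab)) / sum(tab)
accuracy
```

With the table shown above this gives 147/150 = 0.98, far above the one-third benchmark for three equally sized groups. Note that this is a prediction on the training set, so it tends to be optimistic.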

Step 6: Structure coefficients
Correlations between predictors and discriminant values indicate which predictor is most related to the discriminant function (not corrected for the other variables).
Example: cor(iris[,1],LD1)
(Note the difference with discriminant coefficients!)
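The example can be extended to all predictors and both functions at once. This sketch assumes that `LD1` holds the scores on the first discriminant function obtained from `predict()` on a `MASS::lda` fit; the name `structure.coef` is illustrative.

```r
library(MASS)
iris.lda <- lda(Species ~ ., data = iris)   # assumed model specification
pred <- predict(iris.lda)
LD1 <- pred$x[, 1]

# Structure coefficient of sepal length on the first function.
cor(iris[, 1], LD1)

# All predictors against both discriminant functions.
structure.coef <- cor(iris[, 1:4], pred$x)
round(structure.coef, 3)
```

Unlike the discriminant coefficients, each entry here looks at one predictor in isolation, so a predictor can have a large structure coefficient and still a small discriminant coefficient when it overlaps with other predictors.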

Assumptions underlying LDA
Independent subjects.
Normality: the predictors are multivariate normal within each group, and the variance-covariance matrix of the predictors is the same in all groups.
If the latter assumption is violated: use quadratic discriminant analysis in the same manner as linear discriminant analysis.
ALWAYS CHECK YOUR ASSUMPTIONS.

Quadratic discriminant analysis
Call qda:
result
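The call is cut off after `result` in the source. A minimal sketch of a quadratic discriminant analysis, assuming the `qda` function from the MASS package and the same formula as for the linear model (both assumptions), is:

```r
library(MASS)

# Quadratic discriminant analysis: each group gets its own covariance matrix.
result <- qda(Species ~ ., data = iris)
result

# Prediction table on the training set, as for the linear model.
qda.tab <- table(Original = iris$Species, Predicted = predict(result)$class)
qda.tab
```

The workflow mirrors the linear case: fit, predict, and inspect the prediction table. The difference is that the equal-covariance assumption is dropped.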