7/31/2019 Chapter11 Slides
1/20
Welcome to PowerPoint slides for
Chapter 11
Discriminant Analysis for Classification and Prediction
Marketing Research: Text and Cases
by Rajendra Nargundkar
Application Areas
1. The major application area for this technique is where we want to be able to distinguish between two or three sets of objects or people, based on the knowledge of some of their characteristics.
2. Examples include the selection process for a job, the admission process of an educational programme in a college, or dividing a group of people into potential buyers and non-buyers.
3. Discriminant analysis can be, and in fact is, used by credit rating agencies to rate individuals and classify them into good lending risks or bad lending risks. The detailed example discussed later tells you how to do that.
4. To summarise, we can use linear discriminant analysis when we have to classify objects into two or more groups based on the knowledge of some variables (characteristics) related to them. Typically, these groups would be users versus non-users, potentially successful salesmen versus potentially unsuccessful salesmen, high risk versus low risk consumers, or on similar lines.
Slide 1
Methods, Data etc.
1. Discriminant analysis is very similar to the multiple regression technique. The form of the equation in a two-variable discriminant analysis is:

Y = a + k1 x1 + k2 x2

2. This is called the discriminant function. Also, as in a regression analysis, Y is the dependent variable, and x1 and x2 are the independent variables. k1 and k2 are the coefficients of the independent variables, and a is a constant. In practice, there may be any number of x variables.
3. Please note that Y in this case is a categorical variable (unlike in regression analysis, where it is continuous). x1 and x2 are, however, continuous (metric) variables. k1 and k2 are determined by appropriate algorithms in the computer package used, but the underlying objective is that these two coefficients should maximise the separation or differences between the two groups of the Y variable.
4. Y will have 2 possible values in a 2-group discriminant analysis, 3 values in a 3-group discriminant analysis, and so on.
Slide 2
5. k1 and k2 are also called the unstandardised discriminant function coefficients.
6. As mentioned above, Y is a classification into 2 or more groups and is therefore a grouping variable, in the terminology of discriminant analysis. That is, groups are formed on the basis of existing data, and coded as 1 and 2, similar to dummy variable coding.
7. The independent (x) variables are continuous scale variables, and are used as predictors of the group to which the objects will belong. Therefore, to be able to use discriminant analysis, we need to have some data on Y and the x variables from experience and/or past records.
Slide 2 contd...
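The scoring step implied by the discriminant function above can be sketched in a few lines of Python; the coefficient values here are hypothetical placeholders, used only to show the arithmetic:

```python
# Discriminant score for a two-variable function: Y = a + k1*x1 + k2*x2.
# The default coefficients below are illustrative, not estimated values.
def discriminant_score(x1, x2, a=1.0, k1=0.5, k2=-0.3):
    """Return the discriminant score Y for one case."""
    return a + k1 * x1 + k2 * x2

# Each case is then assigned to a group by comparing Y with a cut-off score.
score = discriminant_score(x1=4.0, x2=2.0)
print(score)  # 1.0 + 0.5*4.0 - 0.3*2.0 = 2.4
```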
Building a Model for Prediction/Classification
Assuming we have data on both the Y and x variables of interest, we estimate the coefficients of the model, which is a linear equation of the form shown earlier, and use the coefficients to calculate the Y value (discriminant score) for any new data points that we want to classify into one of the groups. A decision rule is formulated for this process to determine the cut-off score, which is usually the midpoint of the mean discriminant scores of the two groups.
Accuracy of Classification:
Then, the classification of the existing data points is done using the equation, and the accuracy of the model is determined. This output is given by the classification matrix (also called the confusion matrix), which tells us what percentage of the existing data points is correctly classified by this model.
Slide 3
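The cut-off decision rule described above (compare a case's discriminant score with the midpoint of the two group mean scores) can be sketched as follows; the group means in the example calls are hypothetical:

```python
# Decision rule: the cut-off is the midpoint of the two groups' mean
# discriminant scores; cases fall on one side or the other.
def classify(score, mean_g1, mean_g2, labels=(1, 2)):
    """Assign a case to the group whose side of the cut-off its score falls on."""
    cutoff = (mean_g1 + mean_g2) / 2.0
    # A case goes to group 1 when it lies on the same side as group 1's mean.
    if (score < cutoff) == (mean_g1 < mean_g2):
        return labels[0]
    return labels[1]

# Hypothetical group means of -1.4 and +1.4 give a cut-off of 0.
print(classify(-0.8, mean_g1=-1.4, mean_g2=1.4))  # 1
print(classify(0.5, mean_g1=-1.4, mean_g2=1.4))   # 2
```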
This percentage is somewhat analogous to the R2 in regression analysis (percentage of variation in the dependent variable explained by the model). Of course, the actual predictive accuracy of the discriminant model may be less than the figure obtained by applying it to the data points on which it was based.
Stepwise / Fixed Model:
Just as in regression, we have the option of entering one variable at a time (stepwise) into the discriminant equation, or entering all the variables which we plan to use. Depending on the correlations between the independent variables, and the objective of the study (exploratory or predictive/confirmatory), the choice is left to the student.
Slide 3 contd...
Slide 4
Relative Importance of Independent Variables
1. Suppose we have two independent variables, x1 and x2. How do we know which one is more important in discriminating between the groups?
2. The coefficients of x1 and x2 are the ones which provide the answer, but not the raw (unstandardised) coefficients. To overcome the problem of different measurement units, we must obtain the standardised discriminant coefficients. These are available from the computer output.
3. The higher the standardised discriminant coefficient
of a variable, the higher its discriminating power.
Slide 5
A Priori Probability of Classification into Groups
The discriminant analysis algorithm requires us to
assign an a priori (before analysis) probability of a
given case belonging to one of the groups. There are
two ways of doing this.
• We can assign an equal probability of assignment to all groups. Thus, in a 2-group discriminant analysis, we can assign 0.5 as the probability of a case being assigned to any group.
• We can formulate any other rule for the assignment of probabilities. For example, the probabilities could be proportional to the group sizes in the sample data. If two thirds of the sample is in one group, the a priori probability of a case being in that group would be 0.66 (two thirds).
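Both assignment schemes can be expressed as a small helper; this is a plain-Python sketch of the idea (any statistical package implements its own version of this):

```python
from collections import Counter

def priors(group_labels, scheme="equal"):
    """A priori probability of each group under the two schemes above."""
    counts = Counter(group_labels)
    if scheme == "equal":
        # Equal probability for every group: 0.5 each in a 2-group analysis.
        return {g: 1.0 / len(counts) for g in counts}
    # "proportional": each group's prior is its share of the sample.
    n = len(group_labels)
    return {g: counts[g] / n for g in counts}

# 12 of 18 cases in group 1 -> proportional prior of two thirds for group 1.
labels = [1] * 12 + [2] * 6
print(priors(labels, "equal"))         # {1: 0.5, 2: 0.5}
print(priors(labels, "proportional"))  # {1: 0.666..., 2: 0.333...}
```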
Slide 6
We will now turn to a complete worked example which will clarify many of the concepts explained earlier. We will begin with the problem statement and input data.
Problem
Suppose State Bank of Bhubaneswar wants to start a credit card division. They want to use discriminant analysis to set up a system to screen applicants and classify them as either low risk or high risk (risk of default on credit card bill payments), based on information collected from their applications for a credit card.
Suppose SBB has managed to get from SBI, its sister bank, some data on SBI's credit card holders who turned out to be low risk (no default) and high risk (defaulting on payments) customers. These data on 18 customers are given in fig. 1.
Slide 7
Fig. 1

S.No   RISK   AGE   INC    YRSM
1      1      35    4000   8
2      1      33    4500   6
3      1      29    3600   5
4      2      22    3200   0
5      2      26    3000   1
6      1      28    3500   6
7      2      30    3100   7
8      2      23    2700   2
9      1      32    4800   6
10     2      24    1200   4
11     2      26    1500   3
12     1      38    2500   7
13     1      40    2000   5
14     2      32    1800   4
15     1      36    2400   3
16     2      31    1700   5
17     2      28    1400   3
18     1      33    1800   6
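As a cross-check on the computer output discussed later, Fisher's two-group discriminant rule can be computed directly from the fig. 1 data with numpy. This is a sketch of the same technique, so the coefficients it produces may differ in scale (though not in substance) from the package output shown in the following slides:

```python
import numpy as np

# The 18 SBI customers from fig. 1: RISK (1 = low, 2 = high), AGE, INC, YRSM.
data = np.array([
    [1, 35, 4000, 8], [1, 33, 4500, 6], [1, 29, 3600, 5], [2, 22, 3200, 0],
    [2, 26, 3000, 1], [1, 28, 3500, 6], [2, 30, 3100, 7], [2, 23, 2700, 2],
    [1, 32, 4800, 6], [2, 24, 1200, 4], [2, 26, 1500, 3], [1, 38, 2500, 7],
    [1, 40, 2000, 5], [2, 32, 1800, 4], [1, 36, 2400, 3], [2, 31, 1700, 5],
    [2, 28, 1400, 3], [1, 33, 1800, 6],
])
risk, X = data[:, 0], data[:, 1:].astype(float)
X1, X2 = X[risk == 1], X[risk == 2]

# Fisher's rule: coefficients k = Sw^-1 (mean1 - mean2), where Sw is the
# pooled within-group covariance matrix; this maximises group separation.
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
Sw = (np.cov(X1, rowvar=False) * (len(X1) - 1) +
      np.cov(X2, rowvar=False) * (len(X2) - 1)) / (len(X) - 2)
k = np.linalg.solve(Sw, m1 - m2)

# Score every case and classify with the midpoint cut-off between group means.
scores = X @ k
cutoff = (m1 @ k + m2 @ k) / 2
predicted = np.where(scores > cutoff, 1, 2)  # group 1 projects above the cut-off
print((predicted == risk).mean())  # proportion of the 18 cases classified correctly
```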
Slide 8
We will perform a discriminant analysis and advise SBB on how to set up its system to screen potential good customers (low risk) from bad customers (high risk). In particular, we will build a discriminant function (model) and find out:
• The percentage of customers that it is able to classify correctly.
• The statistical significance of the discriminant function.
• Which variables (age, income, or years of marriage) are relatively better at discriminating between low risk and high risk applicants.
• How to classify a new credit card applicant into one of the two groups, low risk or high risk, by building a decision rule and a cut-off score.
Slide 9
Input data are given in fig. 1.
Interpretation of Computer Output:
We will now find answers to all the four questions we have raised earlier.
Q1. How good is the model? How many of the 18 data points does it classify correctly?
To answer this question, we look at the computer output labelled fig. 3. This is a part of the discriminant analysis output from any computer package such as SPSS, SYSTAT, STATISTICA, SAS etc. (There could be minor variations in the exact numbers obtained, and major variations could occur if the options chosen by the student are different. For example, if the a priori probabilities chosen for the classification into the two groups are equal, as we have assumed while generating this output, then you will very likely see similar numbers in your output.)

Fig. 3 : Classification Matrix

Group    Percent Correct    G_1    G_2
G_1      100.0000           9      0
G_2      88.8889            1      8
Total    94.4444            10     8
Slide 10
This output (fig. 3) is called the classification matrix (also known as the confusion matrix), and it indicates that the discriminant function we have obtained is able to classify 94.44 percent of the 18 objects correctly. This figure is in the percent correct column of the classification matrix. More specifically, it also says that out of 10 cases predicted to be in group 1, 9 were observed to be in group 1 and 1 in group 2 (from column G_1). Similarly, from the column G_2, we understand that out of 8 cases predicted to be in group 2, all 8 were found to be in group 2. Thus, on the whole, only 1 case out of 18 was misclassified by the discriminant model, thus giving us a classification (or prediction) accuracy level of (18-1)/18, or 94.44 percent.
As mentioned earlier, this level of accuracy may not hold for all future classification of new cases. But it is still a pointer towards the model being a good one, assuming the input data were relevant and scientifically collected. There are ways of checking the validity of the model, but these will be discussed separately.
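The arithmetic behind fig. 3 can be reproduced directly from observed and predicted group labels; the label vectors below are constructed to match the counts reported in the matrix (9 correct cases in each group, with 1 group-2 case misclassified into group 1):

```python
# Observed vs predicted labels matching fig. 3: 9 + 8 correct, 1 misclassified.
observed  = [1] * 9 + [2] * 9
predicted = [1] * 9 + [1] + [2] * 8  # one group-2 case lands in group 1

# Classification (confusion) matrix: rows = observed group, cols = predicted group.
matrix = [[0, 0], [0, 0]]
for obs, pred in zip(observed, predicted):
    matrix[obs - 1][pred - 1] += 1
print(matrix)  # [[9, 0], [1, 8]]

# Overall accuracy is the diagonal total over the sample size: (18-1)/18.
accuracy = (matrix[0][0] + matrix[1][1]) / len(observed)
print(round(accuracy * 100, 2))  # 94.44
```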
Slide 11
Statistical Significance
Q2. How significant, statistically speaking, is the discriminant function?
This question is answered by looking at the Wilks' Lambda and the probability value for the F test given in the computer output, as a part of fig. 3 (shown below).

Discriminant Function Analysis Results
Number of variables in the model: 3
Wilks' Lambda: .3188764
approx. F (3, 14) = 9.968056    p < .00089

The value of Wilks' Lambda is 0.318. This value is between 0 and 1, and a low value (closer to 0) indicates better discriminating power of the model. Thus, 0.318 is an indicator of the model being good. The probability value of the F test indicates that the discrimination between the two groups is highly significant. This is because p (< .00089) is far below the conventional significance level of 0.05.
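For readers who want to see where the statistic comes from, Wilks' Lambda is the ratio det(W)/det(T) of the within-group to the total sums-of-squares-and-cross-products matrices. Below is a minimal numpy sketch on made-up two-group data (not the SBB data):

```python
import numpy as np

def wilks_lambda(X, groups):
    """Wilks' Lambda = det(W) / det(T): near 0 = strong separation, 1 = none."""
    centered = X - X.mean(axis=0)
    T = centered.T @ centered                    # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Xg = X[groups == g]
        Cg = Xg - Xg.mean(axis=0)
        W += Cg.T @ Cg                           # pooled within-group SSCP
    return np.linalg.det(W) / np.linalg.det(T)

# Made-up, well-separated two-group sample: Lambda should come out small.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
groups = np.array([1] * 20 + [2] * 20)
lam = wilks_lambda(X, groups)
print(lam)  # a value between 0 and 1
```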
Slide 12
Q3. We have 3 independent (or predictor) variables: Age, Income and No. of Years Married. Which of these is a better predictor of a person being a low credit risk or a high credit risk?
To answer this question, we look at the standardised coefficients in the output. These are given in fig. 5 (shown below).

Fig. 5 : Standardized Coefficients

Variable      Root 1
AGE           -.923955
INC           -.77
YRSM          -.15
Eigenvalue    2.136012

This output shows that Age is the best predictor, with a coefficient of 0.92, followed by Income, with a coefficient of 0.77; Years of Marriage is last, with a coefficient of 0.15. Please recall that the absolute value of the standardised coefficient of each variable indicates its relative importance.
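A standardised coefficient is obtained by multiplying the raw coefficient by the pooled within-group standard deviation of its variable, which removes the effect of measurement units. The sketch below uses hypothetical raw coefficients and standard deviations, chosen only so that the resulting magnitudes roughly echo those quoted above:

```python
# Hypothetical raw coefficients and pooled within-group standard deviations.
raw = {"AGE": -0.25, "INC": -0.0001, "YRSM": -0.08}
pooled_sd = {"AGE": 3.7, "INC": 7700.0, "YRSM": 1.9}

# Standardising puts all variables on a common, unit-free scale.
standardised = {v: raw[v] * pooled_sd[v] for v in raw}

# Rank variables by discriminating power: larger |coefficient| = more important.
ranking = sorted(standardised, key=lambda v: abs(standardised[v]), reverse=True)
print(standardised)  # {'AGE': -0.925, 'INC': -0.77, 'YRSM': -0.152}
print(ranking)       # ['AGE', 'INC', 'YRSM']
```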
Slide 13
Q4. How do we classify a new credit card applicant into either a high risk or a low risk category, and make a decision on accepting or refusing him a credit card?
This is the most important question to be answered. Please remember why we started out with the discriminant analysis in this problem: State Bank of Bhubaneswar wished to have a decision model for screening credit card applicants.
The way to do this is to use the outputs in fig. 4 (raw or unstandardised coefficients in the discriminant function) and fig. 6 (means of canonical variables). Fig. 6, the means of canonical variables, gives us the new means for the transformed group centroids.

Fig. 6 : Means of Canonical Variables

Group     Root 1
G_1:1     -1.37793
G_2:2     +1.37792
Thus, the new mean for group 1 (low risk) is -1.37793, and the new mean for group 2 (high risk) is +1.37792. This means that the midpoint of these two is 0. This is clear when we plot the two means on a straight line and locate their midpoint, as shown below:

-1.37 ------------ 0 ------------ +1.37
Mean of Group 1          Mean of Group 2
(Low Risk)               (High Risk)
Slide 13 contd...
Slide 14
This also gives us a decision rule for classifying any new case. If the discriminant score of an applicant falls to the right of the midpoint, we classify him as high risk, and if the discriminant score of an applicant falls to the left of the midpoint, we classify him as low risk. In this case, the midpoint is 0. Therefore, any positive (greater than 0) value of the discriminant score will lead to classification as high risk, and any negative (less than 0) value of the discriminant score will lead to classification as low risk. But how do we compute the discriminant score of an applicant?
We use the applicant's Age, Income and Years of Marriage (from his application) and plug these into the unstandardised discriminant function. This gives us his discriminant score.
Fig. 4 : Raw Coefficients

Variable     Root 1
AGE          -.24560
INC          -.00008
YRSM         -.08465
Constant     10.00335
Eigenvalue   2.13601

From fig. 4 (reproduced above), the unstandardised (or raw) discriminant function is

Y = 10.0036 - Age (.24560) - Income (.00008) - Yrs. Married (.08465)

where Y would give us the discriminant score of any person whose Age, Income and Yrs. Married were known.
Slide 14 contd...
Slide 15
Let us take the example of a credit card applicant to SBB who is aged 40, has an income of Rs. 25,000 per month and has been married for 15 years. Plugging these values into the discriminant function or model above, we find his discriminant score Y to be

Y = 10.0036 - 40 (.24560) - 25000 (.00008) - 15 (.08465)
  = 10.0036 - 9.8240 - 2.0000 - 1.26975
  = -3.09015

According to our decision rule, any discriminant score to the left of the midpoint of 0 leads to a classification in the low risk group. Therefore, we should give this person a credit card, as he is a low risk customer. The same process is to be followed for any new applicant. If his discriminant score is to the right of the midpoint of 0, he should be denied a credit card, as he is a high risk customer.
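The scoring and screening steps of this worked example translate directly into code, using the raw coefficients from fig. 4:

```python
# Raw (unstandardised) discriminant function from fig. 4.
def discriminant_score(age, income, yrs_married):
    return 10.0036 - 0.24560 * age - 0.00008 * income - 0.08465 * yrs_married

def screen_applicant(age, income, yrs_married, cutoff=0.0):
    """SBB's decision rule: scores left of the cut-off (negative) are low risk."""
    score = discriminant_score(age, income, yrs_married)
    return "low risk" if score < cutoff else "high risk"

# The applicant from the example: aged 40, Rs. 25,000 income, married 15 years.
score = discriminant_score(40, 25000, 15)
print(round(score, 5))                  # -3.09015
print(screen_applicant(40, 25000, 15))  # low risk -> grant the credit card
```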
We have completed answering the four questions
raised by State Bank of Bhubaneswar.