7/31/2019 Chapter11 Slides
1/20
Welcome to PowerPoint slides for
Chapter 11
Discriminant Analysis for Classification and Prediction
Marketing Research: Text and Cases
by Rajendra Nargundkar
Application Areas
1. The major application area for this technique is where we want to be able to distinguish between two or three sets of objects or people, based on the knowledge of some of their characteristics.
2. Examples include the selection process for a job, the admission process of an educational programme in a college, or dividing a group of people into potential buyers and non-buyers.
3. Discriminant analysis can be, and in fact is, used by credit rating agencies to rate individuals and classify them into good lending risks or bad lending risks. The detailed example discussed later tells you how to do that.
4. To summarise, we can use linear discriminant analysis when we have to classify objects into two or more groups based on the knowledge of some variables (characteristics) related to them. Typically, these groups would be users versus non-users, potentially successful salesmen versus potentially unsuccessful salesmen, high risk versus low risk consumers, or on similar lines.
Slide 1
Methods, Data etc.
1. Discriminant analysis is very similar to the multiple regression technique. The form of the equation in a two-variable discriminant analysis is:

Y = a + k1 x1 + k2 x2

2. This is called the discriminant function. Also, as in a regression analysis, Y is the dependent variable, and x1 and x2 are the independent variables. k1 and k2 are the coefficients of the independent variables, and a is a constant. In practice, there may be any number of x variables.
3. Please note that Y in this case is a categorical variable (unlike in regression analysis, where it is continuous). x1 and x2 are, however, continuous (metric) variables. k1 and k2 are determined by appropriate algorithms in the computer package used, but the underlying objective is that these two coefficients should maximise the separation or differences between the two groups of the Y variable.
4. Y will have 2 possible values in a 2-group discriminant analysis, 3 values in a 3-group discriminant analysis, and so on.
Slide 2
5. k1 and k2 are also called the unstandardised discriminant function coefficients.
6. As mentioned above, Y is a classification into 2 or more groups and is therefore a grouping variable, in the terminology of discriminant analysis. That is, groups are formed on the basis of existing data, and coded as 1 and 2, similar to dummy variable coding.
7. The independent (x) variables are continuous scale variables, and are used as predictors of the group to which the objects will belong. Therefore, to be able to use discriminant analysis, we need to have some data on Y and the x variables from experience and/or past records.
Slide 2 contd...
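The scoring step implied by the discriminant function above can be sketched in a few lines of Python; the coefficient values here are hypothetical placeholders, used only to show the arithmetic:

```python
# Discriminant score for a two-variable function: Y = a + k1*x1 + k2*x2.
# The default coefficients below are illustrative, not estimated values.
def discriminant_score(x1, x2, a=1.0, k1=0.5, k2=-0.3):
    """Return the discriminant score Y for one case."""
    return a + k1 * x1 + k2 * x2

# Each case is then assigned to a group by comparing Y with a cut-off score.
score = discriminant_score(x1=4.0, x2=2.0)
print(score)  # 1.0 + 0.5*4.0 - 0.3*2.0 = 2.4
```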
Building a Model for Prediction/Classification
Assuming we have data on both the Y and x variables of interest, we estimate the coefficients of the model, which is a linear equation of the form shown earlier, and use the coefficients to calculate the Y value (discriminant score) for any new data points that we want to classify into one of the groups. A decision rule is formulated for this process to determine the cut-off score, which is usually the midpoint of the mean discriminant scores of the two groups.
Accuracy of Classification:
Then, the classification of the existing data points is done using the equation, and the accuracy of the model is determined. This output is given by the classification matrix (also called the confusion matrix), which tells us what percentage of the existing data points is correctly classified by this model.
Slide 3
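The cut-off decision rule described above (compare a case's discriminant score with the midpoint of the two group mean scores) can be sketched as follows; the group means in the example calls are hypothetical:

```python
# Decision rule: the cut-off is the midpoint of the two groups' mean
# discriminant scores; cases fall on one side or the other.
def classify(score, mean_g1, mean_g2, labels=(1, 2)):
    """Assign a case to the group whose side of the cut-off its score falls on."""
    cutoff = (mean_g1 + mean_g2) / 2.0
    # A case goes to group 1 when it lies on the same side as group 1's mean.
    if (score < cutoff) == (mean_g1 < mean_g2):
        return labels[0]
    return labels[1]

# Hypothetical group means of -1.4 and +1.4 give a cut-off of 0.
print(classify(-0.8, mean_g1=-1.4, mean_g2=1.4))  # 1
print(classify(0.5, mean_g1=-1.4, mean_g2=1.4))   # 2
```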
This percentage is somewhat analogous to the R2 in regression analysis (percentage of variation in the dependent variable explained by the model). Of course, the actual predictive accuracy of the discriminant model may be less than the figure obtained by applying it to the data points on which it was based.
Stepwise / Fixed Model:
Just as in regression, we have the option of entering one variable at a time (stepwise) into the discriminant equation, or entering all the variables which we plan to use. Depending on the correlations between the independent variables, and the objective of the study (exploratory or predictive/confirmatory), the choice is left to the student.
Slide 3 contd...
Slide 4
Relative Importance of Independent Variables
1. Suppose we have two independent variables, x1 and x2. How do we know which one is more important in discriminating between the groups?
2. The coefficients of x1 and x2 are the ones which provide the answer, but not the raw (unstandardised) coefficients. To overcome the problem of different measurement units, we must obtain the standardised discriminant coefficients. These are available from the computer output.
3. The higher the standardised discriminant coefficient
of a variable, the higher its discriminating power.
Slide 5
A Priori Probability of Classification into Groups
The discriminant analysis algorithm requires us to
assign an a priori (before analysis) probability of a
given case belonging to one of the groups. There are
two ways of doing this.
• We can assign an equal probability of assignment to all groups. Thus, in a 2-group discriminant analysis, we can assign 0.5 as the probability of a case being assigned to any group.
• We can formulate any other rule for the assignment of probabilities. For example, the probabilities could be proportional to the group sizes in the sample data. If two thirds of the sample is in one group, the a priori probability of a case being in that group would be 0.66 (two thirds).
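Both assignment schemes can be expressed as a small helper; this is a plain-Python sketch of the idea (any statistical package implements its own version of this):

```python
from collections import Counter

def priors(group_labels, scheme="equal"):
    """A priori probability of each group under the two schemes above."""
    counts = Counter(group_labels)
    if scheme == "equal":
        # Equal probability for every group: 0.5 each in a 2-group analysis.
        return {g: 1.0 / len(counts) for g in counts}
    # "proportional": each group's prior is its share of the sample.
    n = len(group_labels)
    return {g: counts[g] / n for g in counts}

# 12 of 18 cases in group 1 -> proportional prior of two thirds for group 1.
labels = [1] * 12 + [2] * 6
print(priors(labels, "equal"))         # {1: 0.5, 2: 0.5}
print(priors(labels, "proportional"))  # {1: 0.666..., 2: 0.333...}
```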
Slide 6
We will now turn to a complete worked example which will clarify many of the concepts explained earlier. We will begin with the problem statement and input data.
Problem
Suppose State Bank of Bhubaneswar wants to start a credit card division. They want to use discriminant analysis to set up a system to screen applicants and classify them as either low risk or high risk (risk of default on credit card bill payments), based on information collected from their applications for a credit card.
Suppose SBB has managed to get from SBI, its sister bank, some data on SBI's credit card holders who turned out to be low risk (no default) and high risk (defaulting on payments) customers. These data on 18 customers are given in fig. 1.
Slide 7
Fig. 1

S.No   RISK   AGE   INC    YRSM
1      1      35    4000   8
2      1      33    4500   6
3      1      29    3600   5
4      2      22    3200   0
5      2      26    3000   1
6      1      28    3500   6
7      2      30    3100   7
8      2      23    2700   2
9      1      32    4800   6
10     2      24    1200   4
11     2      26    1500   3
12     1      38    2500   7
13     1      40    2000   5
14     2      32    1800   4
15     1      36    2400   3
16     2      31    1700   5
17     2      28    1400   3
18     1      33    1800   6
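As a cross-check on the computer output discussed later, Fisher's two-group discriminant rule can be computed directly from the fig. 1 data with numpy. This is a sketch of the same technique, so the coefficients it produces may differ in scale (though not in substance) from the package output shown in the following slides:

```python
import numpy as np

# The 18 SBI customers from fig. 1: RISK (1 = low, 2 = high), AGE, INC, YRSM.
data = np.array([
    [1, 35, 4000, 8], [1, 33, 4500, 6], [1, 29, 3600, 5], [2, 22, 3200, 0],
    [2, 26, 3000, 1], [1, 28, 3500, 6], [2, 30, 3100, 7], [2, 23, 2700, 2],
    [1, 32, 4800, 6], [2, 24, 1200, 4], [2, 26, 1500, 3], [1, 38, 2500, 7],
    [1, 40, 2000, 5], [2, 32, 1800, 4], [1, 36, 2400, 3], [2, 31, 1700, 5],
    [2, 28, 1400, 3], [1, 33, 1800, 6],
])
risk, X = data[:, 0], data[:, 1:].astype(float)
X1, X2 = X[risk == 1], X[risk == 2]

# Fisher's rule: coefficients k = Sw^-1 (mean1 - mean2), where Sw is the
# pooled within-group covariance matrix; this maximises group separation.
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
Sw = (np.cov(X1, rowvar=False) * (len(X1) - 1) +
      np.cov(X2, rowvar=False) * (len(X2) - 1)) / (len(X) - 2)
k = np.linalg.solve(Sw, m1 - m2)

# Score every case and classify with the midpoint cut-off between group means.
scores = X @ k
cutoff = (m1 @ k + m2 @ k) / 2
predicted = np.where(scores > cutoff, 1, 2)  # group 1 projects above the cut-off
print((predicted == risk).mean())  # proportion of the 18 cases classified correctly
```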
Slide 8
We will perform a discriminant analysis and advise SBB on how to set up its system to screen potential good customers (low risk) from bad customers (high risk). In particular, we will build a discriminant function (model) and find out:
• The percentage of customers that it is able to classify correctly.
• The statistical significance of the discriminant function.
• Which variables (age, income, or years of marriage) are relatively better at discriminating between low risk and high risk applicants.
• How to classify a new credit card applicant into one of the two groups, low risk or high risk, by building a decision rule and a cut-off score.
Slide 9
Input data are given in fig. 1.
Interpretation of Computer Output:
We will now find answers to all the four questions we have raised earlier.
Q1. How good is the model? How many of the 18 data points does it classify correctly?
To answer this question, we look at the computer output labelled fig. 3. This is a part of the discriminant analysis output from any computer package such as SPSS, SYSTAT, STATISTICA, SAS etc. (There could be minor variations in the exact numbers obtained, and major variations could occur if the options chosen by the student are different. For example, if the a priori probabilities chosen for the classification into the two groups are equal, as we have assumed while generating this output, then you will very likely see similar numbers in your output.)

Fig. 3 : Classification Matrix

Group    Percent Correct    G_1    G_2
G_1      100.0000           9      0
G_2      88.8889            1      8
Total    94.4444            10     8
Slide 10
This output (fig. 3) is called the classification matrix (also known as the confusion matrix), and it indicates that the discriminant function we have obtained is able to classify 94.44 percent of the 18 objects correctly. This figure is in the percent correct column of the classification matrix. More specifically, it also says that out of 10 cases predicted to be in group 1, 9 were observed to be in group 1 and 1 in group 2 (from column G_1). Similarly, from the column G_2, we understand that out of 8 cases predicted to be in group 2, all 8 were found to be in group 2. Thus, on the whole, only 1 case out of 18 was misclassified by the discriminant model, thus giving us a classification (or prediction) accuracy level of (18-1)/18, or 94.44 percent.
As mentioned earlier, this level of accuracy may not hold for all future classification of new cases. But it is still a pointer towards the model being a good one, assuming the input data were relevant and scientifically collected. There are ways of checking the validity of the model, but these will be discussed separately.
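The arithmetic behind fig. 3 can be reproduced directly from observed and predicted group labels; the label vectors below are constructed to match the counts reported in the matrix (9 correct cases in each group, with 1 group-2 case misclassified into group 1):

```python
# Observed vs predicted labels matching fig. 3: 9 + 8 correct, 1 misclassified.
observed  = [1] * 9 + [2] * 9
predicted = [1] * 9 + [1] + [2] * 8  # one group-2 case lands in group 1

# Classification (confusion) matrix: rows = observed group, cols = predicted group.
matrix = [[0, 0], [0, 0]]
for obs, pred in zip(observed, predicted):
    matrix[obs - 1][pred - 1] += 1
print(matrix)  # [[9, 0], [1, 8]]

# Overall accuracy is the diagonal total over the sample size: (18-1)/18.
accuracy = (matrix[0][0] + matrix[1][1]) / len(observed)
print(round(accuracy * 100, 2))  # 94.44
```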
Slide 11
Statistical Significance
Q2. How significant, statistically speaking, is the discriminant function?
This question is answered by looking at the Wilks' Lambda and the probability value for the F test given in the computer output, as a part of fig. 3 (shown below).

Discriminant Function Analysis Results
Number of variables in the model: 3
Wilks' Lambda: .3188764
approx. F (3, 14) = 9.968056    p < .00089

The value of Wilks' Lambda is 0.318. This value is between 0 and 1, and a low value (closer to 0) indicates better discriminating power of the model. Thus, 0.318 is an indicator of the model being good. The probability value of the F test indicates that the discrimination between the two groups is highly significant. This is because p (< .00089) is far below the conventional significance level of 0.05.
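For readers who want to see where the statistic comes from, Wilks' Lambda is the ratio det(W)/det(T) of the within-group to the total sums-of-squares-and-cross-products matrices. Below is a minimal numpy sketch on made-up two-group data (not the SBB data):

```python
import numpy as np

def wilks_lambda(X, groups):
    """Wilks' Lambda = det(W) / det(T): near 0 = strong separation, 1 = none."""
    centered = X - X.mean(axis=0)
    T = centered.T @ centered                    # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Xg = X[groups == g]
        Cg = Xg - Xg.mean(axis=0)
        W += Cg.T @ Cg                           # pooled within-group SSCP
    return np.linalg.det(W) / np.linalg.det(T)

# Made-up, well-separated two-group sample: Lambda should come out small.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
groups = np.array([1] * 20 + [2] * 20)
lam = wilks_lambda(X, groups)
print(lam)  # a value between 0 and 1
```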
Slide 12
Q3. We have 3 independent (or predictor) variables: Age, Income and No. of Years Married. Which of these is a better predictor of a person being a low credit risk or a high credit risk?
To answer this question, we look at the standardised coefficients in the output. These are given in fig. 5 (shown below).

Fig. 5 : Standardized Coefficients

Variable      Root 1
AGE           -.923955
INC           -.77
YRSM          -.15
Eigenvalue    2.136012

This output shows that Age is the best predictor, with a coefficient of 0.92, followed by Income, with a coefficient of 0.77; Years of Marriage is last, with a coefficient of 0.15. Please recall that the absolute value of the standardised coefficient of each variable indicates its relative importance.
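A standardised coefficient is obtained by multiplying the raw coefficient by the pooled within-group standard deviation of its variable, which removes the effect of measurement units. The sketch below uses hypothetical raw coefficients and standard deviations, chosen only so that the resulting magnitudes roughly echo those quoted above:

```python
# Hypothetical raw coefficients and pooled within-group standard deviations.
raw = {"AGE": -0.25, "INC": -0.0001, "YRSM": -0.08}
pooled_sd = {"AGE": 3.7, "INC": 7700.0, "YRSM": 1.9}

# Standardising puts all variables on a common, unit-free scale.
standardised = {v: raw[v] * pooled_sd[v] for v in raw}

# Rank variables by discriminating power: larger |coefficient| = more important.
ranking = sorted(standardised, key=lambda v: abs(standardised[v]), reverse=True)
print(standardised)  # {'AGE': -0.925, 'INC': -0.77, 'YRSM': -0.152}
print(ranking)       # ['AGE', 'INC', 'YRSM']
```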
Slide 13
Q4. How do we classify a new credit card applicant into either a high risk or a low risk category, and make a decision on accepting or refusing him a credit card?
This is the most important question to be answered. Please remember why we started out with the discriminant analysis in this problem: State Bank of Bhubaneswar wished to have a decision model for screening credit card applicants.
The way to do this is to use the outputs in fig. 4 (raw or unstandardised coefficients in the discriminant function) and fig. 6 (means of canonical variables). Fig. 6, the means of canonical variables, gives us the new means for the transformed group centroids.

Fig. 6 : Means of Canonical Variables

Group     Root 1
G_1:1     -1.37793
G_2:2     +1.37792
Thus, the new mean for group 1 (low risk) is -1.37793, and the new mean for group 2 (high risk) is +1.37792. This means that the midpoint of these two is 0. This is clear when we plot the two means on a straight line and locate their midpoint, as shown below:

-1.37 ------------ 0 ------------ +1.37
Mean of Group 1          Mean of Group 2
(Low Risk)               (High Risk)
Slide 13 contd...
Slide 14
This also gives us a decision rule for classifying any new case. If the discriminant score of an applicant falls to the right of the midpoint, we classify him as high risk, and if the discriminant score of an applicant falls to the left of the midpoint, we classify him as low risk. In this case, the midpoint is 0. Therefore, any positive (greater than 0) value of the discriminant score will lead to classification as high risk, and any negative (less than 0) value of the discriminant score will lead to classification as low risk. But how do we compute the discriminant score of an applicant?
We use the applicant's Age, Income and Years of Marriage (from his application) and plug these into the unstandardised discriminant function. This gives us his discriminant score.
Fig. 4 : Raw Coefficients

Variable     Root 1
AGE          -.24560
INC          -.00008
YRSM         -.08465
Constant     10.00335
Eigenvalue   2.13601

From fig. 4 (reproduced above), the unstandardised (or raw) discriminant function is

Y = 10.0036 - Age (.24560) - Income (.00008) - Yrs. Married (.08465)

where Y would give us the discriminant score of any person whose Age, Income and Yrs. Married were known.
Slide 14 contd...
Slide 15
Let us take the example of a credit card applicant to SBB who is aged 40, has an income of Rs. 25,000 per month and has been married for 15 years. Plugging these values into the discriminant function or model above, we find his discriminant score Y to be

Y = 10.0036 - 40 (.24560) - 25000 (.00008) - 15 (.08465)
  = 10.0036 - 9.8240 - 2.0000 - 1.26975
  = -3.09015

According to our decision rule, any discriminant score to the left of the midpoint of 0 leads to a classification in the low risk group. Therefore, we should give this person a credit card, as he is a low risk customer. The same process is to be followed for any new applicant. If his discriminant score is to the right of the midpoint of 0, he should be denied a credit card, as he is a high risk customer.
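The scoring and screening steps of this worked example translate directly into code, using the raw coefficients from fig. 4:

```python
# Raw (unstandardised) discriminant function from fig. 4.
def discriminant_score(age, income, yrs_married):
    return 10.0036 - 0.24560 * age - 0.00008 * income - 0.08465 * yrs_married

def screen_applicant(age, income, yrs_married, cutoff=0.0):
    """SBB's decision rule: scores left of the cut-off (negative) are low risk."""
    score = discriminant_score(age, income, yrs_married)
    return "low risk" if score < cutoff else "high risk"

# The applicant from the example: aged 40, Rs. 25,000 income, married 15 years.
score = discriminant_score(40, 25000, 15)
print(round(score, 5))                  # -3.09015
print(screen_applicant(40, 25000, 15))  # low risk -> grant the credit card
```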
We have completed answering the four questions
raised by State Bank of Bhubaneswar.