Nonparametric Correlation Techniques: Techniques for Correlating Nominal & Ordinal Variables
Nonparametric Correlation Techniques: Charles M. Friel PhD, Criminal Justice Center, Sam Houston State University
KEY CONCEPTS
Nonparametric Correlation Techniques
Scales of measurement
Nominal scale
Ordinal scale
Interval scale
Ratio scale
Metric vs. nonmetric variables
Spearman Rank-Order Correlation Coefficient: Rho (ρ)
Rho assumptions
Null hypothesis in rho
One- and two-tailed hypotheses
Reducing metric variables to ordinal scales of measurement
Resolving the problem of tied ranks
Goodman’s & Kruskal’s Gamma (γ)
Gamma assumptions
Null hypothesis in gamma
The concepts of consistency & inconsistency in gamma
Using Z to determine the significance of gamma
The Phi Coefficient (φ)
Phi assumptions
Null hypothesis in phi
The relationship between phi and chi-square
The Contingency Coefficient (C)
C assumptions
Null hypothesis in C
The relationship between C and chi-square
The relationship between C and phi
Limitation in the values that C can take
Cramér’s V
V assumptions
Null hypothesis in V
The relationship between V and chi-square
Guttman’s Lambda (λ)
Lambda assumptions
Null hypothesis in lambda
Lambda as an asymmetrical correlation coefficient
The concept of the reduction of the error in prediction
PRE: Proportionate reduction of error
Lecture Outline
What are nonparametric correlation techniques, and what kinds of research problems are they designed to solve?
Spearman Rank-Order Correlation Coefficient: Rho (ρ)
Goodman’s & Kruskal’s Gamma (γ)
The Phi Coefficient (φ)
Contingency Coefficient (C)
Cramér’s V
Guttman’s Coefficient of Predictability: Lambda (λ)
Nonparametric Correlation Techniques
If the variables X and Y are metric (i.e. interval or ratio measures) and they are to be correlated, then the appropriate technique is Pearson’s Product-Moment Correlation Coefficient:
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$
where $x = X - \bar{X}$ and $y = Y - \bar{Y}$ are deviations from the means.
Q: What if X and/or Y is nonmetric (i.e. nominal or ordinal measures)? How can they be correlated?
A: By use of one of a variety of nonparametric correlational techniques.
Nonparametric correlational techniques are designed to estimate the correlation or association between variables measured on nominal and/or ordinal scales, or metric variables that have been reduced to nominal and/or ordinal scales.
Spearman Rank-Order Correlation Coefficient: ρ (rho)
$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
A technique for determining the correlation between two ordinal variables, or metric variables reduced to an ordinal scale.
Assumptions
The two variables are ordinal, or metric variables that have been reduced to an ordinal scale of measurement,
The correlation between the variables is linear, and
If a test of significance is applied, the sample has been selected randomly from the population.
An Example
A prosecutor received 10 felony cases filed by an interagency organized crime task force and ranked the cases by seriousness (serious=X) and relative prosecutability (prosecute=Y).*
Case    X Serious    Y Prosecute     D     D²
A            6             3          3      9
B            1            10         -9     81
C            4             7         -3      9
D            7             5          2      4
E           10             1          9     81
F            3             8         -5     25
G            8             2          6     36
H            9             4          5     25
I            5             6         -1      1
J            2             9         -7     49
Total                                      320
*(Rankings: 1 = the highest and 10 = the lowest)
D = the difference between the rank position of each case on X and Y.
N = the number of paired observations (cases).
Calculation of Rho (ρ)
$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{(6)(320)}{10(10^2 - 1)} = 1 - \frac{1920}{10(99)} = 1 - \frac{1920}{990} = -0.939$$
Interpretation
The correlation is negative and the magnitude is high.
As the seriousness of the crime increases, its prosecutability decreases.
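The calculation above can be reproduced with a short Python sketch. The function name `spearman_rho` and the list names are illustrative assumptions, not part of the lecture; the code applies the shortcut formula, which presumes there are no tied ranks.

```python
def spearman_rho(x_ranks, y_ranks):
    """Shortcut formula: rho = 1 - 6*sum(D^2) / (N*(N^2 - 1)). Assumes no tied ranks."""
    n = len(x_ranks)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Ranks for cases A-J from the prosecutor example (1 = highest, 10 = lowest).
serious = [6, 1, 4, 7, 10, 3, 8, 9, 5, 2]
prosecute = [3, 10, 7, 5, 1, 8, 2, 4, 6, 9]

rho = spearman_rho(serious, prosecute)
print(f"rho = {rho:.3f}")
```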
Spearman’s Rho SPSS Results
Rho = -0.939
Two-tailed level of significance: p ≤ 0.001
Reducing a Metric Variable to an Ordinal Scale of Measurement
What is the correlation between …
The rank-ordered seriousness of 8 offences (ordinal variable), and
The length of sentences received by their perpetrators (ratio variable)?
Case    Seriousness    Sentence Length (Years)    Rank of Sentence     D     D²
A            5                    6                       5             0     0
B            2                    3                       2             0     0
C            7                    7                       6            +1     1
D            1                    2                       1             0     0
E            6                    8                       7            -1     1
F            3                    5                       4            -1     1
G            8                   10                       8             0     0
H            4                    4                       3            +1     1
Total                                                                         4
Seriousness of offence is rank-ordered from least serious (rank = 1) to most serious (rank = 8).
The length of sentence is rank-ordered from lowest (rank = 1) to highest (rank = 8).
Computation of rho
$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{(6)(4)}{8(8^2 - 1)} = +0.952$$
Interpretation
The relationship is positive and the magnitude of the correlation is high.
As the seriousness of the offence increases, the length of sentence increases as well.
The Problem of Tied Ranks
In converting a metric variable to an ordinal scale of measurement, some cases may have tied values. (Tied scores are marked with *.)
Case    Seriousness    Sentence Length (Years)    Sentence Rank Position    Rank: Sentence      D       D²
A            5                   6*                         4                    4.5           0.5     0.25
B            2                   2*                         1                    1.5           0.5     0.25
C            7                   7                          6                    6             1.0     1.00
D            1                   2*                         2                    1.5          -0.5     0.25
E            6                   8                          7                    7            -1.0     1.00
F            3                   6*                         5                    4.5          -1.5     2.25
G            8                  10                          8                    8             0.0     0.00
H            4                   4                          3                    3            +1.0     1.00
Total                                                                                                  6.00
Cases B & D have tied sentences (2 years), as do cases A & F (6 years).
In a rank ordering, cases B & D occupy rank positions 1 & 2, while cases A & F occupy rank positions 4 & 5.
To determine the appropriate rank of tied cases, add the rank positions and divide by the number of tied cases.
For cases B & D: (1+2) / 2 = 1.5
1.5 is the rank assigned to cases B & D
For cases A & F: (4+5) / 2 = 4.5
4.5 is the rank assigned to cases A & F
Computation of rho
$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{(6)(6)}{8(8^2 - 1)} = +0.929$$
Interpretation
The relationship is positive and the magnitude of the correlation is high.
As the seriousness of the offence increases, the length of sentence increases.
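The tie-resolution rule can be sketched in Python as follows. The helper names `average_ranks` and `pearson_r` are illustrative assumptions. The sketch also applies Pearson's r directly to the averaged ranks. With ties present the two approaches differ slightly, which may be why software output does not match the shortcut formula exactly.

```python
from math import sqrt


def average_ranks(values):
    """Rank from lowest (1) to highest; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = ((start + 1) + (end + 1)) / 2  # rank positions are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Cases A-H from the tied-ranks example.
seriousness = [5, 2, 7, 1, 6, 3, 8, 4]
sentence_years = [6, 2, 7, 2, 8, 6, 10, 4]

sentence_ranks = average_ranks(sentence_years)
n = len(seriousness)
sum_d2 = sum((x - y) ** 2 for x, y in zip(seriousness, sentence_ranks))
rho_shortcut = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))
rho_pearson = pearson_r(seriousness, sentence_ranks)

print(sentence_ranks)
print(f"shortcut formula: {rho_shortcut:.3f}, Pearson r on ranks: {rho_pearson:.3f}")
```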
Spearman’s Rho With Tied Ranks SPSS Results
Rho with tied ranks = +0.928
Two-tailed level of significance p= 0.001
Significance of Rho
In testing the significance of rho, the null hypothesis H0 states …
That the value of rho in the population from which the sample was drawn is 0.0
Therefore, the statistical question becomes …
What is the probability that the obtained value of rho in the sample could have come from such a population?
Given a sample size of N cases, a statistical table can be used to answer this question.
Table for Determining the Significance of Rho
Critical Values of Rho in Testing Significance
Consider the three previous examples involving:
The prosecutor ranking the seriousness & prosecutability of criminal cases (N = 10),
The correlation of offence seriousness and sentence length (N = 8), and
The correlation of offence seriousness and sentence length involving tied cases (N = 8)
Example        N     Rho       Critical Value (0.05)    Critical Value (0.01)
Prosecutor    10    -0.939            0.648                    0.794
Sentence       8    +0.952            0.738                    0.881
Tied ranks     8    +0.929            0.738                    0.881
In all three samples the absolute value of rho exceeds the critical value at the p = 0.01 level of significance. Therefore, we are more than 99% confident in rejecting each of these null hypotheses (H0).
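The comparison against critical values can be sketched as a simple lookup. The dictionary below holds only the four critical values quoted in this example; it is not a full table, and the names `CRITICAL_RHO` and `rho_is_significant` are illustrative assumptions.

```python
# Critical values of rho copied from the example above (only N = 8 and N = 10).
CRITICAL_RHO = {
    8: {0.05: 0.738, 0.01: 0.881},
    10: {0.05: 0.648, 0.01: 0.794},
}


def rho_is_significant(rho, n, alpha):
    """Reject H0 (population rho = 0) when |rho| exceeds the tabled critical value."""
    return abs(rho) > CRITICAL_RHO[n][alpha]


examples = {
    "Prosecutor": (10, -0.939),
    "Sentence": (8, 0.952),
    "Tied ranks": (8, 0.929),
}

results = {name: rho_is_significant(rho, n, 0.01) for name, (n, rho) in examples.items()}
for name, significant in results.items():
    print(f"{name}: significant at 0.01 = {significant}")
```

The absolute value is used because the sign of rho gives the direction of the relationship, while the critical values are tabled as magnitudes.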
Derivation of the Spearman Rank-Order Correlation Coefficient (ρ)
Spearman’s rank-order correlation coefficient (ρ) can be derived from Pearson’s correlation coefficient (r).
$$r = \rho = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$$
If X and Y are ordinal variables ranked 1, 2, …, N, then
$$\sum X = \sum Y = \frac{N(N+1)}{2}$$
And
$$\sum X^2 = \sum Y^2 = \frac{N(N+1)(2N+1)}{6}$$
Given that
$$\sum x^2 = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{N}$$
And
$$\sum y^2 = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{N}$$
Then for ordinal variables X & Y
$$\sum x^2 = \frac{N(N+1)(2N+1)}{6} - \frac{[N(N+1)/2]^2}{N}$$
$$\sum x^2 = \frac{N(N+1)(2N+1)}{6} - \frac{1}{N}\left[\frac{N(N+1)}{2}\right]\left[\frac{N(N+1)}{2}\right]$$
This can be reduced as follows
$$\sum x^2 = \frac{N(2N^2+N+2N+1)}{6} - \frac{1}{N}\left[\frac{(N^2+N)(N^2+N)}{4}\right]$$
$$\sum x^2 = \frac{2N^3+N^2+2N^2+N}{6} - \frac{1}{N}\left[\frac{N^4+N^3+N^3+N^2}{4}\right]$$
$$\sum x^2 = \frac{2N^3+3N^2+N}{6} - \frac{1}{N}\left[\frac{N^4+2N^3+N^2}{4}\right]$$
$$\sum x^2 = \frac{2N^3+3N^2+N}{6} - \frac{N^4+2N^3+N^2}{4N}$$
$$\sum x^2 = \frac{2N^3+3N^2+N}{6} - \frac{N^3+2N^2+N}{4}$$
Substituting the common denominator 12
$$\sum x^2 = \frac{2(2N^3+3N^2+N)}{12} - \frac{3(N^3+2N^2+N)}{12}$$
$$\sum x^2 = \frac{4N^3+6N^2+2N - 3N^3 - 6N^2 - 3N}{12}$$
$$\sum x^2 = \frac{N^3 - N}{12}$$
And by the same logic
$$\sum y^2 = \frac{N^3 - N}{12}$$
Now let d = (x - y)
$$d^2 = (x - y)^2 = x^2 - 2xy + y^2$$
$$\sum d^2 = \sum x^2 + \sum y^2 - 2\sum xy$$
Since $r = \rho$ and
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$
And given that
$$\sum d^2 = \sum x^2 + \sum y^2 - 2\sum xy$$
Multiply the last term on the right side of the equation by 1
$$1 = \frac{\sqrt{\sum x^2 \sum y^2}}{\sqrt{\sum x^2 \sum y^2}}$$
$$\sum d^2 = \sum x^2 + \sum y^2 - 2\left[\sum xy\right]\left[\frac{\sqrt{\sum x^2 \sum y^2}}{\sqrt{\sum x^2 \sum y^2}}\right]$$
Since
$$r = \rho = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$
Then by substitution
$$\sum d^2 = \sum x^2 + \sum y^2 - 2\rho\sqrt{\sum x^2 \sum y^2}$$
Solving for ρ
$$\sum d^2 - \sum x^2 - \sum y^2 = -2\rho\sqrt{\sum x^2 \sum y^2}$$
$$-\sum d^2 + \sum x^2 + \sum y^2 = 2\rho\sqrt{\sum x^2 \sum y^2}$$
$$\rho = \frac{\sum x^2 + \sum y^2 - \sum d^2}{2\sqrt{\sum x^2 \sum y^2}}$$
Recall that when X & Y are ranks
$$\sum x^2 = \sum y^2 = \frac{N^3 - N}{12}$$
Then by substitution
$$\rho = \frac{(N^3 - N)/12 + (N^3 - N)/12 - \sum d^2}{2\sqrt{[(N^3 - N)/12]\,[(N^3 - N)/12]}}$$
$$\rho = \frac{2[(N^3 - N)/12] - \sum d^2}{2[(N^3 - N)/12]}$$
$$\rho = 1 - \frac{\sum d^2}{2(N^3 - N)/12} = 1 - \frac{6\sum d^2}{N^3 - N}$$
$$r = \rho = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$$
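As a numerical sanity check on the derivation, the Python sketch below verifies two of its claims for untied ranks: that the sum of squared deviations of the ranks 1, 2, …, N equals (N³ − N)/12, and that Pearson's r applied to ranks agrees with the shortcut formula. The function names are illustrative assumptions, and the check uses the prosecutor example's ranks.

```python
from fractions import Fraction
from math import sqrt


def sum_sq_dev_of_ranks(n):
    """Exact sum of squared deviations of the ranks 1..n about their mean."""
    mean = Fraction(n + 1, 2)
    return sum((rank - mean) ** 2 for rank in range(1, n + 1))


# Claim 1: sum of x^2 equals (N^3 - N) / 12 for untied ranks.
identity_holds = all(
    sum_sq_dev_of_ranks(n) == Fraction(n ** 3 - n, 12) for n in range(2, 51)
)


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rho_shortcut(x, y):
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Claim 2: Pearson's r on untied ranks equals the shortcut formula.
serious = [6, 1, 4, 7, 10, 3, 8, 9, 5, 2]
prosecute = [3, 10, 7, 5, 1, 8, 2, 4, 6, 9]
difference = abs(pearson_r(serious, prosecute) - rho_shortcut(serious, prosecute))

print(identity_holds, difference < 1e-12)
```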
Goodman’s & Kruskal’s Gamma (γ)
$$\gamma = \frac{N_a - N_i}{N_a + N_i}$$
A technique for determining the correlation between two ordinal variables used to define a two-way cross-classification table.
Assumptions
The two variables are ordinal, or metric variables that have been reduced to an ordinal scale of measurement,
The correlation between the variables is linear,
If a test of significance is applied, the sample has been selected randomly from the population,
The columns in the table are ranked in decreasing order from left to right, and
The rows in the table are ranked in decreasing order from top to bottom.
An Example
A survey was conducted to measure the perceptions of citizens concerning:
Faith in the fairness of the criminal justice system (X), and
Their attitude toward the death penalty (Y).
Survey Results (N = 105)
                                    Death Penalty
Faith in Fairness    Very Favorable    Favorable    Opposed    Very Opposed    Totals
High                       15              12           6            5            38
Medium                     12               8          10            8            38
Low                         4               6           9           10            29
Totals                     31              26          25           23           105
Is the perceived fairness of the justice system correlated with attitudes about the death penalty?
Calculation of Gamma
Gamma measures the degree of agreement (Na) and disagreement (Ni) between the two variables. To calculate Na, begin with the frequency in the upper left-hand cell (i.e. 15) and multiply it by the sum of the frequencies in all cells below and to the right of it.
Calculation of Na
15(8+10+8+6+9+10) = 15(51) = 765
Now do the same for all frequencies that have cells that fall below and to the right.
12(10+8+9+10) = 12(37) = 444
6(8+10) = 6(18) = 108
12(6+9+10) = 12(25) = 300
8(9+10) = 8(19) = 152
10(10) = 100
Na = sum of these computations
Na = (765+444+108+300+152+100) = 1869
Calculation of Ni, Degree of Inconsistency
The process for determining Ni is similar to that of calculating Na. Begin with the frequency in the upper right-hand cell (i.e. 5) and multiply it by the sum of the frequencies in all cells below and to the left of it.
5(12+8+10+4+6+9) = 5(49) = 245
Now do the same for all frequencies that have cells that fall below and to the left.
6(12+8+4+6) = 6(30) = 180
12(12+4) = 12(16) = 192
8(4+6+9) = 8(19) = 152
10(4+6) = 10(10) = 100
8(4) = 32
Ni = the sum of these calculations
Ni = (245+180+192+152+100+32) = 901
$$\gamma = \frac{N_a - N_i}{N_a + N_i} = \frac{1869 - 901}{1869 + 901} = \frac{968}{2770} = +0.35$$
Interpretation
There is a positive correlation between the perception of fairness and attitudes towards the death penalty.
As faith in the fairness of the justice system increases, people become more favorably disposed toward the death penalty.
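The cell-by-cell procedure for Na and Ni can be sketched in Python as follows. The function name `agreement_counts` is an illustrative assumption. The table is entered with rows and columns in the ranked order shown in the survey results.

```python
def agreement_counts(table):
    """Return (Na, Ni) for a two-way table whose rows and columns are in ranked order.

    Na: each cell times the sum of all cells below and to the right of it.
    Ni: each cell times the sum of all cells below and to the left of it.
    """
    n_rows, n_cols = len(table), len(table[0])
    na = ni = 0
    for i in range(n_rows):
        for j in range(n_cols):
            below_right = sum(
                table[r][c] for r in range(i + 1, n_rows) for c in range(j + 1, n_cols)
            )
            below_left = sum(
                table[r][c] for r in range(i + 1, n_rows) for c in range(j)
            )
            na += table[i][j] * below_right
            ni += table[i][j] * below_left
    return na, ni


# Rows: faith in fairness (High, Medium, Low).
# Columns: death penalty attitude (Very Favorable, Favorable, Opposed, Very Opposed).
survey = [
    [15, 12, 6, 5],
    [12, 8, 10, 8],
    [4, 6, 9, 10],
]

na, ni = agreement_counts(survey)
gamma = (na - ni) / (na + ni)
print(f"Na = {na}, Ni = {ni}, gamma = {gamma:.3f}")
```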
Gamma Coefficient SPSS Results
Gamma = 0.349
Level of significance: p ≤ 0.001
Determining the Significance of Gamma
Gamma may be converted into a Z score and evaluated at 1.96 (p = 0.05) or 2.58 (p = 0.01).
H0: The correlation in the population from which the sample was drawn is 0.0.
Conversion
$$Z = \gamma\sqrt{\frac{N_a + N_i}{N(1 - \gamma^2)}}$$
$$Z = +0.35\sqrt{\frac{1869 + 901}{105(1 - 0.35^2)}}$$
$$Z = +0.35\sqrt{\frac{2770}{105(0.8775)}}$$
$$Z = +0.35\sqrt{30.06}$$
$$Z = +0.35(5.483) = 1.92$$
Since 1.92 is less than 1.96, we fail to reject H0 at the 0.05 level: by this test the correlation is not significant, i.e. the sample does not provide sufficient evidence that the correlation in the population differs from 0.0.
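A minimal Python sketch of the Z conversion, using the Na, Ni and N values from the example. The function name `gamma_z` is an illustrative assumption. The sketch uses the unrounded gamma, so intermediate values differ slightly from the hand calculation.

```python
from math import sqrt


def gamma_z(na, ni, n):
    """Z = gamma * sqrt((Na + Ni) / (N * (1 - gamma^2)))."""
    gamma = (na - ni) / (na + ni)
    return gamma * sqrt((na + ni) / (n * (1 - gamma ** 2)))


z = gamma_z(na=1869, ni=901, n=105)
print(f"Z = {z:.2f}, exceeds 1.96: {z > 1.96}")
```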
The Phi Coefficient (φ)
$$\phi = \sqrt{\frac{\chi^2}{N}}$$
Phi is a derivative of chi-square (χ²). It is a technique for correlating two nominal variables, or variables that have been reduced to a nominal scale of measurement.
Assumptions
The two variables are nominal,
The data consist of frequencies cast in a 2x2 cross-tabulation table, and
The sample has been randomly selected from the population if the phi coefficient is tested for significance.
An Example
A study was conducted to determine if there is a relationship between race and the sentences received by 960 misdemeanant offenders.
The Results (2x2 table)
Sentence                             White    Non-White    Total
Probation/Deferred Adjudication       314        196        510
Jail                                  210        240        450
Totals                                524        436        960
Step 1 Calculate the expected frequencies
Sentence                             White    Non-White    Total
Probation/Deferred Adjudication      278.4      231.6       510
Jail                                 245.6      204.4       450
Totals                                524        436        960
Step 2 Calculate chi-square
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
$$\chi^2 = \frac{(314-278.4)^2}{278.4} + \frac{(196-231.6)^2}{231.6} + \frac{(210-245.6)^2}{245.6} + \frac{(240-204.4)^2}{204.4}$$
$$\chi^2 = 21.38$$
Step 3 Convert the chi-square to a phi coefficient
$$\phi = \sqrt{\frac{\chi^2}{N}} = \sqrt{\frac{21.38}{960}} = 0.149$$
The correlation between race and sentence type is low, i.e. 0.149.
Q: Is the obtained correlation statistically significant?
A: The significance of phi is tested the same way as the significance of chi-square.
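Steps 1 to 3 can be sketched in Python as follows. The function name `chi_square` is an illustrative assumption. The sketch keeps the expected frequencies unrounded, so its chi-square (about 21.42) differs slightly from the hand-calculated 21.38, which uses expected frequencies rounded to one decimal; phi is 0.149 either way.

```python
from math import sqrt


def chi_square(table):
    """Return (chi-square, N) for a two-way frequency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # Step 1
            chi2 += (observed - expected) ** 2 / expected  # Step 2
    return chi2, n


# Rows: Probation/Deferred Adjudication, Jail. Columns: White, Non-White.
observed = [
    [314, 196],
    [210, 240],
]

chi2, n = chi_square(observed)
phi = sqrt(chi2 / n)  # Step 3
print(f"chi-square = {chi2:.2f}, phi = {phi:.3f}")
```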
Testing the Significance of the Phi Coefficient
If the chi-square statistic is significant at 1 df, so is the phi coefficient.
The critical values of chi-square for 1 df are:
3.841 at the p = 0.05 level, and
6.635 at the p = 0.01 level
Since χ² = 21.38 is greater than 6.635, it is significant at p ≤ 0.01.
Therefore φ = 0.149 is also significant at p ≤ 0.01.
Interpretation
There is a significant association between race and sentence type in the population, of the order of 0.149.
Chi-Square Table
Critical values of χ² at 1 df: 3.841 at p = 0.05, and 6.635 at p = 0.01
Phi Coefficient SPSS Results
Phi = 0.149, significance: p ≤ 0.001
Contingency Coefficient (C)
$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}$$
A technique for determining the correlation between two nominal variables cast in a frequency table of size 2x2 or larger.
Assumptions
The two variables are nominal, or variables that have been reduced to a nominal scale of measurement,
The data have been cast in a 2x2 or larger frequency table, and
The sample has been drawn randomly from the population if the significance of C is to be tested.
An Example
A study was conducted on 960 misdemeanor cases to determine if there is a correlation between race and type of sentence.
The Results (3x3 frequency table)
Sentence                  White    Black    Other    Totals
Deferred Adjudication      122       61       21       204
Probation                  192       96       18       306
Jail                       210       83      157       450
Totals                     524      240      196       960
Step 1 Calculate the expected frequencies
Sentence                  White     Black     Other    Totals
Deferred Adjudication     111.35     51.00    41.65      204
Probation                 167.03     76.50    62.48      306
Jail                      245.63    112.50    91.88      450
Totals                     524       240      196        960
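Step 2 Calculate χ²
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
$$\chi^2 = \frac{(122-111.35)^2}{111.35} + \frac{(61-51)^2}{51} + \frac{(21-41.65)^2}{41.65} + \frac{(192-167.03)^2}{167.03} + \frac{(96-76.5)^2}{76.5} + \frac{(18-62.48)^2}{62.48} + \frac{(210-245.63)^2}{245.63} + \frac{(83-112.5)^2}{112.5} + \frac{(157-91.88)^2}{91.88}$$
$$\chi^2 = 1.02 + 1.96 + 10.24 + 3.73 + 4.97 + 31.67 + 5.17 + 7.74 + 46.15$$
$$\chi^2 = 112.65$$
Step 3 Calculate the contingency coefficient
$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} = \sqrt{\frac{112.65}{960 + 112.65}} = \sqrt{0.105} = 0.32$$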
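The three steps for the 3x3 table can be checked with a short Python sketch. The helper name `chi_square` is an illustrative assumption, and the expected frequencies are kept unrounded.

```python
from math import sqrt


def chi_square(table):
    """Return (chi-square, N) for a two-way frequency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (observed - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, observed in enumerate(row)
    )
    return chi2, n


# Rows: Deferred Adjudication, Probation, Jail. Columns: White, Black, Other.
observed = [
    [122, 61, 21],
    [192, 96, 18],
    [210, 83, 157],
]

chi2, n = chi_square(observed)
contingency_c = sqrt(chi2 / (n + chi2))
print(f"chi-square = {chi2:.2f}, C = {contingency_c:.3f}")
```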
Contingency Coefficient SPSS Results
Contingency Coefficient = 0.324
Level of significance: p ≤ 0.001
Testing the Significance of the Contingency Coefficient C
The significance of C is tested by testing the significance of χ² for (r – 1)(c – 1) df.
H0: the correlation between race and type of sentence in the population is 0.0.
χ² = 112.65, df = (3 – 1)(3 – 1) = 4
The critical values of χ² at 4 df are 9.488 (p = 0.05) and 13.277 (p = 0.01).
Since 112.65 is greater than 13.277, C is significant at p ≤ 0.01.
Interpretation
The correlation between race and type of sentence in the population is estimated to be 0.32.
The direction of the correlation is meaningless since the variables are nominal.
Application of the Contingency Coefficient C to a 2x2 Table
Consider the previous example in which the phi coefficient was used to determine the correlation between race and type of sentence in a 2x2 table.
Sentence                             White    Non-White    Total
Probation/Deferred Adjudication       314        196        510
Jail                                  210        240        450
Totals                                524        436        960
χ² = 21.38 and φ = 0.149
The contingency coefficient for the same table would be
$$C = \sqrt{\frac{21.38}{960 + 21.38}} = 0.148$$
In a 2x2 table, C equals φ within rounding error:
$$\phi = \sqrt{\frac{21.38}{960}} = 0.149$$
Contingency Coefficient: 2x2 Table SPSS Results
Phi = Contingency Coefficient = 0.149
Level of significance: p ≤ 0.001
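The Relationship Between the Phi Coefficient and the Contingency Coefficient in a 2x2 Table
Contingency Coefficient                        Phi Coefficient
$C = \sqrt{\chi^2 / (N + \chi^2)}$             $\phi = \sqrt{\chi^2 / N}$
$C^2 = \chi^2 / (N + \chi^2)$                  $\phi^2 = \chi^2 / N$
$C^2 (N + \chi^2) = \chi^2$                    $\phi^2 N = \chi^2$
$C^2 (N + \chi^2) - \chi^2 = 0$                $\phi^2 N - \chi^2 = 0$
Therefore
$$C^2 (N + \chi^2) - \chi^2 = \phi^2 N - \chi^2$$
$$C^2 (N + \chi^2) = \phi^2 N$$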
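The identity can be checked numerically with the values from the 2x2 example (χ² = 21.38, N = 960). The Python sketch below also rewrites the identity as C = φ / √(1 + φ²), which follows by dividing both sides by (N + χ²) and substituting χ² = φ²N. The variable names are illustrative assumptions.

```python
from math import sqrt

chi2 = 21.38  # chi-square from the 2x2 race-by-sentence example
n = 960       # number of cases

phi = sqrt(chi2 / n)
c = sqrt(chi2 / (n + chi2))

# Both sides of C^2 (N + chi^2) = phi^2 N reduce to chi-square itself.
left_side = c ** 2 * (n + chi2)
right_side = phi ** 2 * n

# Equivalent form: C expressed directly in terms of phi.
c_from_phi = phi / sqrt(1 + phi ** 2)

print(f"phi = {phi:.3f}, C = {c:.3f}")
print(f"left = {left_side:.4f}, right = {right_side:.4f}")
```

Because φ² is small in this example, dividing by √(1 + φ²) changes the value very little, which is why C and φ agree to within rounding here.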
Limitation of the Contingency Coefficient C
A correlation coefficient is designed to range on a scale from 0.0 to 1.0, where 0.0 indicates no linear correlation and 1.0 indicates a perfect linear correlation.
While a contingency coefficient may not exceed 1.0, it can be limited to less than 1.0 if the frequency table is non-symmetric, i.e. the number of rows differs from the number of columns.
Examples of symmetric frequency tables: 2x2, 3x3, 5x5, etc.
Examples of non-symmetric frequency tables: 2x3, 4x7, 5x6, etc.
Cramér’s V Correlation Coefficient
$$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$$
where k is the lesser of the number of rows or columns.
A technique for determining the correlation between two nominal variables.
An alternative to the Contingency Coefficient C if the data are cast in a non-symmetric frequency table.
Assumptions
The two variables are nominal, or variables that have been reduced to a nominal scale of measurement,
The data have been cast in a 2x2 or larger frequency table, and
The sample has been drawn randomly from the population if the significance of V is to be tested.
An Example
Consider the previous examples on the correlation between race and type of sentence, but this time the data have been cast in a 2x3 frequency table.
Sentence                             White    Black    Other    Totals
Probation/Deferred Adjudication       314      157       39      510
Jail                                  210       83      157      450
Totals                                524      240      196      960
Step 1 Calculate the expected frequencies
Sentence                             White     Black     Other      Totals
Probation/Deferred Adjudication      278.38    127.5     104.125     510
Jail                                 245.62    112.5      91.875     450
Totals                                524       240       196        960
Step 2 Calculate chi-square
$$\chi^2 = \frac{(314-278.38)^2}{278.38} + \frac{(157-127.5)^2}{127.5} + \frac{(39-104.125)^2}{104.125} + \frac{(210-245.62)^2}{245.62} + \frac{(83-112.5)^2}{112.5} + \frac{(157-91.875)^2}{91.875}$$
$$\chi^2 = 111.19$$
Cramér’s V
$$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$$
k = 2, the lesser of the columns or rows, therefore (k – 1) = (2 – 1) = 1
$$V = \sqrt{\frac{111.19}{960(2 - 1)}} = 0.340$$
The Contingency Coefficient C for the same data is as follows:
$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} = \sqrt{\frac{111.19}{960 + 111.19}} = 0.322$$
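The V and C calculations for the 2x3 table can be sketched in Python as follows. The function names are illustrative assumptions. Expected frequencies are kept unrounded, so chi-square comes out near 111.18 rather than the hand-calculated 111.19; V and C agree to three decimals.

```python
from math import sqrt


def chi_square(table):
    """Return (chi-square, N) for a two-way frequency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """V = sqrt(chi^2 / (N * (k - 1))), with k the lesser of the row and column counts."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return sqrt(chi2 / (n * (k - 1)))


# Rows: Probation/Deferred Adjudication, Jail. Columns: White, Black, Other.
observed = [
    [314, 157, 39],
    [210, 83, 157],
]

chi2, n = chi_square(observed)
v = cramers_v(observed)
c = sqrt(chi2 / (n + chi2))
print(f"chi-square = {chi2:.2f}, V = {v:.3f}, C = {c:.3f}")
```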
Cramér’s V SPSS Results
Cramér’s V = 0.340
Level of significance: p ≤ 0.001
Lambda (λ): Guttman’s Coefficient of Predictability
$$\lambda = \frac{F_{iv} - M_{dv}}{N - M_{dv}}$$
A technique to determine the extent to which the error in the prediction of one nominal variable can be reduced by knowledge of another nominal variable.
Assumptions
The two variables cast in the frequency table are assumed to be nominal variables, or variables reduced to a nominal scale of measurement.
If the significance of lambda is to be tested, the sample must be selected on a random basis from the population.
An Example
A survey of 195 people was conducted on attitudes toward the death penalty. The results were cast in a two-way frequency table by gender and attitude.
The Results
                        Attitude
Gender     Favorable    Mixed    Unfavorable    Totals
Male           60         20          15           95
Female         50         10          40          100
Totals        110         30          55          195
Q: To what extent can the error in the prediction of gender be reduced by knowledge of the person’s attitude toward the death penalty?
Q: To what extent can the error in the prediction of attitude toward the death penalty be reduced by knowledge of the person’s gender?
Predicting Gender from Attitude
Let attitude serve as the independent variable (IV) and gender serve as the dependent variable (DV)
Therefore, the columns in the frequency table are the categories of the IV and the rows the categories of the DV.
Calculating lambda
$$\lambda = \frac{F_{iv} - M_{dv}}{N - M_{dv}}$$
Fiv = Sum of the largest cell frequencies within each category of the IV, attitude
Mdv = The largest marginal total in the categories of the DV, gender
N = The total number of cases
                        Attitude (IV)
Gender (DV)    Favorable    Mixed    Unfavorable    Totals
Male               60         20          15           95
Female             50         10          40          100
Totals            110         30          55          195
Calculation of Fiv, the sum of the largest cell frequencies in each category of the IV, attitude
Fiv = (60+20+40) = 120
Calculation of Mdv, the largest marginal total in the categories of the DV, gender
Mdv = 100
Calculation of lambda
$$\lambda = \frac{F_{iv} - M_{dv}}{N - M_{dv}} = \frac{(60+20+40) - 100}{195 - 100} = 0.21$$
Interpretation
The error in predicting gender is reduced by 0.21 (21%) by knowledge of attitude toward the death penalty.
Lambda: SPSS Results
Lambda (gender) = 0.211, Lambda (attitude) = 0.00
Significance: 0.089 for lambda (gender); not available (NA) for lambda (attitude)
Predicting Attitude from Gender
Let gender serve as the independent variable (IV) and attitude serve as the dependent variable (DV)
Therefore, the columns in the frequency table are the categories of the DV and the rows the categories of the IV.
                        Attitude (DV)
Gender (IV)    Favorable    Mixed    Unfavorable    Totals
Male               60         20          15           95
Female             50         10          40          100
Totals            110         30          55          195
Calculating lambda
$$\lambda = \frac{F_{iv} - M_{dv}}{N - M_{dv}}$$
Fiv = (60+50) = 110, the sum of the largest cell frequencies within each category of the IV, gender
Mdv = 110, the largest marginal total in the categories of the DV, attitude
$$\lambda = \frac{(60+50) - 110}{195 - 110} = 0.0$$
Interpretation
Since λ = 0.0, the reduction in the error in predicting attitude toward the death penalty by a knowledge of a person’s gender would be 0%, or none at all.
Lambda is asymmetrical
Let one variable = X and the other = Y.
The reduction in error in predicting X from Y will not necessarily be the same as the reduction of error in predicting Y from X.
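The asymmetry can be demonstrated with a short Python sketch that computes lambda in both directions from the same table. The function name `guttman_lambda` and its `dv` argument are illustrative assumptions.

```python
def guttman_lambda(table, dv="rows"):
    """lambda = (Fiv - Mdv) / (N - Mdv) for a two-way frequency table (list of rows).

    dv="rows":    rows are the DV categories, columns are the IV categories.
    dv="columns": columns are the DV categories, rows are the IV categories.
    """
    columns = [list(col) for col in zip(*table)]
    if dv == "rows":
        iv_categories, dv_totals = columns, [sum(row) for row in table]
    elif dv == "columns":
        iv_categories, dv_totals = table, [sum(col) for col in columns]
    else:
        raise ValueError("dv must be 'rows' or 'columns'")
    n = sum(dv_totals)
    f_iv = sum(max(category) for category in iv_categories)  # largest cell per IV category
    m_dv = max(dv_totals)                                     # largest DV marginal total
    return (f_iv - m_dv) / (n - m_dv)


# Rows: gender (Male, Female). Columns: attitude (Favorable, Mixed, Unfavorable).
survey = [
    [60, 20, 15],
    [50, 10, 40],
]

lambda_gender = guttman_lambda(survey, dv="rows")       # predicting gender from attitude
lambda_attitude = guttman_lambda(survey, dv="columns")  # predicting attitude from gender
print(f"lambda(gender) = {lambda_gender:.3f}, lambda(attitude) = {lambda_attitude:.3f}")
```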
The Concept of the Reduction of Error in Prediction
                        Attitude (IV)
Gender (DV)    Favorable    Mixed    Unfavorable    Totals
Male               60         20          15           95
Female             50         10          40          100
Totals            110         30          55          195
Consider the problem of predicting gender from attitude. If we knew nothing about attitude, the best guess about a person’s gender would be the modal category, female (n = 100).
The number of errors would therefore be 95 (the males).
Taking attitude into consideration, we would predict
Male if favorable (error = 50 females)
Male if mixed attitude (error = 10 females)
Female if unfavorable (error = 15 males)
Total errors = (50+10+15) = 75
The proportionate reduction in error (PRE) would be as follows
$$PRE = \frac{\text{errors without IV} - \text{errors with IV}}{\text{errors without IV}}$$
$$PRE = \frac{95 - 75}{95} = 0.21$$
NB: The PRE is identical to the previously calculated value of lambda for predicting gender.
λ = PRE = 0.21
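The error-counting logic behind PRE can be sketched in Python as follows, using the same gender-by-attitude table with gender as the DV. The variable names are illustrative assumptions.

```python
# Rows: gender (Male, Female), the DV. Columns: attitude (Favorable, Mixed, Unfavorable), the IV.
survey = [
    [60, 20, 15],
    [50, 10, 40],
]

n = sum(sum(row) for row in survey)
row_totals = [sum(row) for row in survey]
columns = [list(col) for col in zip(*survey)]

# Without the IV: always guess the modal DV category; everyone else is an error.
errors_without_iv = n - max(row_totals)

# With the IV: within each attitude category guess its modal gender; the rest are errors.
errors_with_iv = sum(sum(col) - max(col) for col in columns)

pre = (errors_without_iv - errors_with_iv) / errors_without_iv
print(f"errors without IV = {errors_without_iv}, with IV = {errors_with_iv}, PRE = {pre:.2f}")
```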