Date post: | 23-Nov-2014 |
Upload: | kathiresh-nataraj |
Correlation
Scatterplots & Correlation
Deviation & Computational Equations
Testing Significance
Intercorrelation Matrix
Partial Correlations
Key Concepts
Correlation
Correlation coefficient
Interpretation of the concepts of magnitude and direction
Use of a scatterplot to diagnose correlation
Deviation score formula for the Pearson Product-Moment Correlation Coefficient
Karl Pearson (1857-1936)
Concepts of:
  Sum of cross products
  Sums of squares of X & Y
Computational formula for the Pearson Product-Moment Correlation Coefficient
t-test for determining the significance of r and df
Null hypothesis in determining the significance of r
Coefficient of determination
Coefficient of nondetermination
Assumptions for the Pearson r
  Linear relationship
  X & Y are metric variables
  Randomly drawn sample
  X & Y are normally distributed in the population
The concept of a nonlinear relationship
Intercorrelation matrix
Caveats in interpreting an intercorrelation matrix
Interpretation of a partial correlation
Zero-order correlation
1st, 2nd, etc. order correlations
Correlation: Charles M. Friel Ph.D., Criminal Justice Center,Sam Houston State University
Lecture Outline
The concept of correlation
Using a scatterplot to identify a correlation
Pearson Product-Moment Correlation Coefficient
Coefficients of correlation, determination & nondetermination
Intercorrelation matrix
Partial correlation coefficient
The Problem of Determining Relationships
Science attempts to find the "causes" of phenomena.
Q Why do some prisoners attempt to escape from prison and others do not?
Q Why do some judges have a constant backlog of cases while others run an efficient docket?
Q What factors account for the fact that some countries have a higher rate of violent crime than others?
Q Is violence in the media related to violent behavior in society?
Q Why do some police officers become "rogue cops"?
The Problem in Determining Causality
How can one determine if one variable is the “cause” of another?
Example
Do liberal laws on the purchase and possession of firearms cause increases in the incidence of violent crimes involving weapons?
Principles of causality
1st Are the two variables in question related ( X & Y)? Is there a covariance between them?
2nd Is there a replicable time sequence between the two variables, the variable thought to be causal (X) always preceding the variable thought to be the effect (Y)?
The Problem in Determining Causality (cont.)
3rd Having eliminated or controlled for all other extraneous variables, can it be demonstrated that when X occurs Y always follows?
Correlation and Causality
The first step in determining whether one variable (X) is correlated with another variable (Y) involves …
Determining if the two variables covary
The concept of covariance
Is it true that as X increases …
Y also increases, and to what extent?
Or, is it true that as X increases …
Y decreases, and to what extent?
Caveat
The mere fact that two variables covary (i.e. correlate) is no proof that one is the cause of the other.
Correlation does not necessarily prove causation.
A Correlation Coefficient
A correlation coefficient is an index number that measures …
The magnitude and
The direction of the relationship between two variables
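It is designed to range in value between -1.0 and +1.0. The magnitude runs from 0.0 (no relationship) to 1.0 (perfect relationship); the sign gives the direction.
-1.0 -0.8 -0.6 -0.4 -0.2 0.0 +0.2 +0.4 +0.6 +0.8 +1.0
Negative relationship (below 0.0): as X increases, Y decreases
No relationship (0.0)
Positive relationship (above 0.0): as X increases, Y increases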
Varieties of Correlational Statistics
Statisticians have developed many techniques for determining the correlation between two or more variables.
The primary difference among these techniques is a function of the types of variables being correlated (i.e. nonmetric: nominal, ordinal, or metric: interval or ratio)
Metric with metric
Metric with nonmetric
Nonmetric with nonmetric
A partial list of correlational techniques
Pearson product-moment correlation coefficient (metric with metric)
Spearman's rank-difference coefficient (rho) (ordinal with ordinal)
Varieties of Correlational Statistics (cont.)
Biserial coefficient (a metric variable with a metric variable that has been artificially reduced to two categories)
Point biserial coefficient (a metric variable with a truly dichotomous variable)
Tetrachoric correlation coefficient (two metric variables that have been artificially reduced to dichotomous categories)
Phi Coefficient (two truly dichotomous variables)
Partial correlation (two metric variables with the intercorrelation with a third variable removed from both of them)
Kendall coefficient of concordance (three or more ordinal variables)
Multiple correlation (one metric variable with two or more metric and/or nonmetric variables)
The Scatterplot
A useful tool for visually identifying the presence of a possible relationship between two metric variables.
[Scatterplot, horizontal axis: AGE] Correlation r = +0.8257 (p < 0.001)
The Scatterplot (cont.)
[Scatterplot, horizontal axis: AGE AT FIRST ARREST] Correlation r = -0.4174 (p < 0.001)
[Scatterplot, horizontal axis: TIME TO DISPOSITION IN DAYS] Correlation r = -0.0841 (p = 0.489)
An Example: Is There a Correlation Between Homicide & Rape?
The incidence of homicide and rape per 100,000 population in a sample of seven medium-size cities
City      Homicide (X)   Rape (Y)
A              4            16
B              6            29
C             10            43
D              5            20
E              1             3
F              2             4
G              3             6
Totals        31           121
Scatterplot of Homicide vs. Rape
Do the two variables appear to be related?
What is the magnitude of the relationship on a scale of 0.0 to 1.0?
What is the direction of the relationship, positive or negative?
[Scatterplot of the seven cities, horizontal axis: RAPE]
Pearson Product-Moment Correlation Coefficient
Karl Pearson (1857-1936): British mathematician and statistician who also developed the Chi-Square Test
$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} $$
Incidence of Homicide (X) and Rape (Y)
City      X     Y    (X - X̄)²    (Y - Ȳ)²    (X - X̄)(Y - Ȳ)
A         4    16      0.1849       1.6641        0.5547
B         6    29      2.4649     137.1241       18.3847
C        10    43     31.0249     661.0041      143.2047
D         5    20      0.3249       7.3441        1.5447
E         1     3     11.7649     204.2041       49.0147
F         2     4      5.9049     176.6241       32.2947
G         3     6      2.0449     127.4641       16.1447
Totals   31   121     53.7143    1315.4287      261.1429
Mean number of homicides & rapes
X̄ = 4.43 and Ȳ = 17.29
Pearson Product-Moment Correlation Coefficient (cont.)
Sum of squared deviations in homicides = 53.71
Sum of squared deviations in rapes = 1315.43
Sum of cross products (SP) = 261.14
Calculation of the Pearson r
$$ r = \frac{261.14}{\sqrt{(53.71)(1315.43)}} = \frac{261.14}{\sqrt{70651.75}} = \frac{261.14}{265.80} = +0.982 $$
Interpretation
The magnitude of the correlation between homicide and rape = 0.982.
The direction of the relationship is positive. As the incidence of homicide increases so does the incidence of rape.
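The deviation score calculation can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `pearson_r_deviation` is an assumption, not part of the lecture. It uses unrounded means, so its intermediate sums can differ slightly from a hand calculation based on means rounded to two decimals.

```python
import math

# Homicide (X) and rape (Y) rates per 100,000 for cities A-G, from the example table.
homicide = [4, 6, 10, 5, 1, 2, 3]
rape = [16, 29, 43, 20, 3, 4, 6]

def pearson_r_deviation(x, y):
    """Pearson r from deviation scores: SP / sqrt(SSx * SSy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations for X and Y.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    # Sum of cross products of the deviations.
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sp / math.sqrt(ss_x * ss_y)

r = pearson_r_deviation(homicide, rape)
print(f"r = {r:+.3f}")
```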
An Alternative Way to Calculate a Pearson Correlation
The previous equation is called a deviation score equation since the mean of each variable is subtracted from each respective case.
An alternative computational equation is given below. It will yield the same result within rounding error.
$$ r = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{[N \sum X^2 - (\sum X)^2]\,[N \sum Y^2 - (\sum Y)^2]}} $$
An Alternative Way to Calculate a Pearson Correlation (cont.)
City      X     Y     X²     Y²     XY
A         4    16     16    256     64
B         6    29     36    841    174
C        10    43    100   1849    430
D         5    20     25    400    100
E         1     3      1      9      3
F         2     4      4     16      8
G         3     6      9     36     18
Totals   31   121    191   3407    797
$$ r = \frac{7(797) - (31)(121)}{\sqrt{[7(191) - (31)^2]\,[7(3407) - (121)^2]}} = \frac{1828}{\sqrt{(376)(9208)}} = \frac{1828}{1860.70} $$
r = 0.982
This agrees with the value from the deviation score equation within rounding error.
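The computational equation works directly from raw sums, which makes it easy to check against the column totals. The sketch below is illustrative; the function name `pearson_r_computational` is an assumption introduced here.

```python
import math

# Homicide (X) and rape (Y) rates per 100,000 for cities A-G.
homicide = [4, 6, 10, 5, 1, 2, 3]
rape = [16, 29, 43, 20, 3, 4, 6]

def pearson_r_computational(x, y):
    """Pearson r from raw sums (no deviation scores needed)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r_computational(homicide, rape)
print(f"r = {r:.3f}")
```

Because no means are rounded along the way, this form is less exposed to rounding error than a hand calculation with deviation scores.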
Determining the Significance of a Correlation Coefficient
The problem
Imagine a population in which X and Y are not related, i.e. the population correlation ρ (rho) = 0.0.
Is it possible to draw a random sample from that population and find that the correlation between X & Y in the sample is not 0.0?
Of course this is possible, but what is the probability of that happening?
A t-test
A t-test can be used to answer this question.
A t-Test for the Significance of a Correlation Coefficient
$$ t = \frac{r \sqrt{N - 2}}{\sqrt{1 - r^2}} $$
df = (N – 2)
The null hypothesis H0
In the population, the correlation between X & Y is ρ = 0.0
What is the probability, therefore, that the correlation obtained in the sample came from a population where the parameter ρ = 0.0?
A t-Test for the Significance of a Correlation Coefficient (cont.)
For the correlation between homicide and rape
r = 0.982
$$ t = \frac{0.982 \sqrt{7 - 2}}{\sqrt{1 - (0.982)^2}} $$
t = (2.196) / (0.1889) = 11.63
df = (N - 2) = (7 cities - 2) = 5
The critical value of t for df = 5 and α = 0.05 is t = 2.571
Interpretation
Since 11.63 > 2.571, r is significant at p < 0.05
Decision: Reject the null hypothesis and affirm that the two variables are positively related in the population.
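As a rough check on the test, the sketch below recomputes r at full precision from the column totals of the homicide and rape table and then forms t. The function name `t_for_correlation` is an assumption, and the critical value is simply the tabled two-tailed value quoted for df = 5 and α = 0.05. Since r is not rounded first, t can differ slightly from a hand calculation.

```python
import math

def t_for_correlation(r, n):
    """t statistic and degrees of freedom for testing H0: rho = 0."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Column totals for the seven cities: N, sum X, sum Y, sum X^2, sum Y^2, sum XY.
n, sum_x, sum_y, sum_x2, sum_y2, sum_xy = 7, 31, 121, 191, 3407, 797
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

t, df = t_for_correlation(r, n)

# Tabled two-tailed critical value of t for df = 5, alpha = 0.05.
T_CRITICAL = 2.571
reject_null = abs(t) > T_CRITICAL
print(f"r = {r:.3f}, t = {t:.2f}, df = {df}, reject H0: {reject_null}")
```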
Coefficients of Determination & Nondetermination
r = the correlation between X and Y
e.g. r = 0.982
r² = the coefficient of determination
r² = (0.982)² = 0.96
This is the proportion of variance in Y that can be explained by X, in percentage terms 96%
1 - r² = the coefficient of nondetermination
1 - r² = 1 - (0.982)² = 0.04
This is the proportion of variance in Y that cannot be explained by X, in percentage terms 4%
Some Examples of SPSS Correlational Output
Given a random sample of 70 felony cases. Q Is there a correlation between the age of the offender and the length of sentence?
Correlation r = 0.826, p < 0.001
Correlations
                               AGE       SENTENCE
AGE       Pearson Correlation  1.000     .826**
          Sig. (2-tailed)      .         .000
          N                    70        70
SENTENCE  Pearson Correlation  .826**    1.000
          Sig. (2-tailed)      .000      .
          N                    70        70
**. Correlation is significant at the 0.01 level (2-tailed).
[Scatterplot, horizontal axis: AGE]
Some Examples of SPSS Correlational Output (cont.)
Q Is there a correlation between the age of first arrest and the length of sentence?
Correlation r = -0.417, p < 0.001
Correlations
                               AGE_FIRS  SENTENCE
AGE_FIRS  Pearson Correlation  1.000     -.417**
          Sig. (2-tailed)      .         .000
          N                    70        70
SENTENCE  Pearson Correlation  -.417**   1.000
          Sig. (2-tailed)      .000      .
          N                    70        70
**. Correlation is significant at the 0.01 level (2-tailed).
[Scatterplot, horizontal axis: AGE AT FIRST ARREST]
Some Examples of SPSS Correlational Output (cont.)
Q Is there a correlation between the time to case disposition and the length of sentence?
Correlation r = -0.084, p = 0.489
Correlations
                               TM_DISP   SENTENCE
TM_DISP   Pearson Correlation  1.000     -.084
          Sig. (2-tailed)      .         .489
          N                    70        70
SENTENCE  Pearson Correlation  -.084     1.000
          Sig. (2-tailed)      .489      .
          N                    70        70
[Scatterplot, horizontal axis: TIME TO DISPOSITION IN DAYS]
Pearson Correlation Assumptions
That the relationship between X and Y can be represented by a straight line, i.e. it is linear.
That X and Y are metric variables, measured on an interval or ratio scale of measurement.
In using a t distribution to test the significance of the correlation coefficient …
That the sample was randomly drawn from the population, and
That X and Y are normally distributed in the population. This assumption is less important as the sample size increases
An Intercorrelation Matrix
Multiple correlations and their significance can be computed simultaneously and reported in an intercorrelation matrix
Example
Intercorrelation of age, age at first arrest, number of prior arrests and convictions, and length of sentence
SPSS Intercorrelation Results
Correlations
                               AGE       AGE_FIRS  PR_ARRST  PR_CONV   SENTENCE
AGE       Pearson Correlation  1.000     -.312**   .179      .302*     .826**
          Sig. (2-tailed)      .         .009      .138      .011      .000
          N                    70        70        70        70        70
AGE_FIRS  Pearson Correlation  -.312**   1.000     -.315**   -.358**   -.417**
          Sig. (2-tailed)      .009      .         .008      .002      .000
          N                    70        70        70        70        70
PR_ARRST  Pearson Correlation  .179      -.315**   1.000     .795**    .246*
          Sig. (2-tailed)      .138      .008      .         .000      .040
          N                    70        70        70        70        70
PR_CONV   Pearson Correlation  .302*     -.358**   .795**    1.000     .400**
          Sig. (2-tailed)      .011      .002      .000      .         .001
          N                    70        70        70        70        70
SENTENCE  Pearson Correlation  .826**    -.417**   .246*     .400**    1.000
          Sig. (2-tailed)      .000      .000      .040      .001      .
          N                    70        70        70        70        70
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
Caveats in Interpreting an Intercorrelation Matrix
Are all the relationships linear?
Has each variable been checked for outliers that might lead to a Type I or II error?
Has each pair of variables (X & Y) been checked for bivariate outliers that might lead to a Type I or II error?
Can it be assumed that each variable is normally distributed in the population?
Can it be assumed that each pair of variables is a random sample from the population?
What Is the Meaning of a Linear Relationship?
The Pearson correlation assumes that the two variables are linearly related. What does this mean?
Example
Age and length of sentence
Notice that the straight line is a "fair" representation of the relationship.
The cases are about evenly distributed above and below the line.
This is called homogeneity of the variance of Y over levels of X.
[Scatterplot with fitted straight line, horizontal axis: AGE]
What Is the Meaning of a Linear Relationship? (cont.)
Example
Age at first arrest and length of sentence
Notice that the straight line is not a "fair" representation of the relationship.
The cases are not evenly distributed above and below the line.
This is called heterogeneity of the variance of Y over levels of X.
[Scatterplot with fitted straight line, horizontal axis: AGE AT FIRST ARREST]
The Problem of Multiple Intercorrelation
Imagine three variables that are interrelated with each other.
How can the correlation between two of them be computed …
Eliminating the intercorrelation that both have with the third variable?
Example
Age
Age at first arrest
Length of sentence
Correlations
                               AGE       AGE_FIRS  SENTENCE
AGE       Pearson Correlation  1.000     -.312**   .826**
          Sig. (2-tailed)      .         .009      .000
          N                    70        70        70
AGE_FIRS  Pearson Correlation  -.312**   1.000     -.417**
          Sig. (2-tailed)      .009      .         .000
          N                    70        70        70
SENTENCE  Pearson Correlation  .826**    -.417**   1.000
          Sig. (2-tailed)      .000      .000      .
          N                    70        70        70
**. Correlation is significant at the 0.01 level (2-tailed).
The Problem of Multiple Intercorrelation (cont.)
Q What is the correlation between age and sentence …
Eliminating the intercorrelation of both variables with age at first arrest?
A This problem can be solved by computing the partial correlation between age and sentence, controlling for age at first arrest.
Q How is a partial correlation computed?
Partial Correlation Coefficient
$$ r_{XY.Z} = \frac{r_{XY} - (r_{XZ})(r_{YZ})}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}} $$
What is the correlation of X and Y taking out the intercorrelation of both variables with Z?
[Diagram of X, Y and Z]
rXY.Z = the partial correlation between X and Y, partialling out the inter-relationship between X and Z, and Y and Z
Partial Correlation Coefficient (cont.)
Example
What is the correlation between age (X) and length of sentence (Y), partialling out or controlling for age at first arrest (Z)?
From the intercorrelation matrix of age, age at first arrest and sentence: r_XY = .826, r_XZ = -.312, r_YZ = -.417
$$ r_{XY.Z} = \frac{.826 - (-.312)(-.417)}{\sqrt{[1 - (-.312)^2]\,[1 - (-.417)^2]}} $$
$$ r_{XY.Z} \approx 0.806 $$
Notice the difference between the correlation (0.826) and the partial correlation (0.806) when controlled for age at first arrest.
Partial Correlation Coefficient (cont.)
Example
What is the correlation between age at first arrest (X) and length of sentence (Y), partialling out or controlling for age (Z)?
From the intercorrelation matrix of age, age at first arrest and sentence: r_XY = -.417, r_XZ = -.312, r_YZ = .826
$$ r_{XY.Z} = \frac{-.417 - (-.312)(.826)}{\sqrt{[1 - (-.312)^2]\,[1 - (.826)^2]}} $$
$$ r_{XY.Z} \approx -0.298 $$
Notice the difference between the correlation (-0.417) and the partial correlation (-0.298) when controlled for age.
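Both hand calculations can be reproduced with a small helper. This is a sketch under stated assumptions: the name `partial_r` and the variable names are introduced here for illustration, and the inputs are the three-decimal correlations from the SPSS matrix, so the third decimal may differ from a result computed on unrounded correlations.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Zero-order correlations from the SPSS matrix (rounded to three decimals).
r_age_sentence = 0.826
r_age_agefirst = -0.312
r_agefirst_sentence = -0.417

# Age and sentence, controlling for age at first arrest.
age_sentence = partial_r(r_age_sentence, r_age_agefirst, r_agefirst_sentence)

# Age at first arrest and sentence, controlling for age.
agefirst_sentence = partial_r(r_agefirst_sentence, r_age_agefirst, r_age_sentence)

print(f"age, sentence | age at first arrest: {age_sentence:.3f}")
print(f"age at first arrest, sentence | age: {agefirst_sentence:.3f}")
```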
SPSS Partial Correlation Results
Age and sentence controlling for age at first arrest
- - - P A R T I A L   C O R R E L A T I O N   C O E F F I C I E N T S - - -
Controlling for..    AGE_FIRS
               AGE         SENTENCE
AGE            1.0000      .8055
               (    0)     (   67)
               P= .        P= .000
SENTENCE       .8055       1.0000
               (   67)     (    0)
               P= .000     P= .
(Coefficient / (D.F.) / 2-tailed Significance)
" . " is printed if a coefficient cannot be computed
Age at first arrest and sentence controlling for age
- - - P A R T I A L   C O R R E L A T I O N   C O E F F I C I E N T S - - -
Controlling for..    AGE
               SENTENCE    AGE_FIRS
SENTENCE       1.0000      -.2980
               (    0)     (   67)
               P= .        P= .013
AGE_FIRS       -.2980      1.0000
               (   67)     (    0)
               P= .013     P= .
(Coefficient / (D.F.) / 2-tailed Significance)
" . " is printed if a coefficient cannot be computed
Multiple Partial Correlation
$$ r_{XY.Z_1 Z_2 \ldots} $$
More than one variable can be partialled out of a bivariate correlation.
[Diagram of X and Y with more than one control variable Z]
Example
What is the correlation between age (X) and sentence (Y) …
Partialling out prior arrests, time to disposition, prior convictions, drug use and the seriousness of the offense?
Multiple Partial Correlation (cont.)
The bivariate correlations among the seven variables.
Correlations
                               AGE       SENTENCE  PR_ARRST  TM_DISP   PR_CONV   DR_SCORE  SER_INDX
AGE       Pearson Correlation  1.000     .826**    .179      .048      .302*     .252*     .609**
          Sig. (2-tailed)      .         .000      .138      .692      .011      .036      .000
          N                    70        70        70        70        70        70        70
SENTENCE  Pearson Correlation  .826**    1.000     .246*     -.084     .400**    .346**    .744**
          Sig. (2-tailed)      .000      .         .040      .489      .001      .003      .000
          N                    70        70        70        70        70        70        70
PR_ARRST  Pearson Correlation  .179      .246*     1.000     -.072     .795**    -.003     .502**
          Sig. (2-tailed)      .138      .040      .         .556      .000      .979      .000
          N                    70        70        70        70        70        70        70
TM_DISP   Pearson Correlation  .048      -.084     -.072     1.000     -.066     -.024     .032
          Sig. (2-tailed)      .692      .489      .556      .         .589      .841      .794
          N                    70        70        70        70        70        70        70
PR_CONV   Pearson Correlation  .302*     .400**    .795**    -.066     1.000     .056      .578**
          Sig. (2-tailed)      .011      .001      .000      .589      .         .645      .000
          N                    70        70        70        70        70        70        70
DR_SCORE  Pearson Correlation  .252*     .346**    -.003     -.024     .056      1.000     .279*
          Sig. (2-tailed)      .036      .003      .979      .841      .645      .         .019
          N                    70        70        70        70        70        70        70
SER_INDX  Pearson Correlation  .609**    .744**    .502**    .032      .578**    .279*     1.000
          Sig. (2-tailed)      .000      .000      .000      .794      .000      .019      .
          N                    70        70        70        70        70        70        70
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
Multiple Partial Correlation (cont.)
The partial correlation of age and sentence, controlling for five other variables.
- - - P A R T I A L   C O R R E L A T I O N   C O E F F I C I E N T S - - -
Controlling for..    PR_ARRST  TM_DISP  PR_CONV  DR_SCORE  SER_INDX
               AGE         SENTENCE
AGE            1.0000      .7044
               (    0)     (   63)
               P= .        P= .000
SENTENCE       .7044       1.0000
               (   63)     (    0)
               P= .000     P= .
(Coefficient / (D.F.) / 2-tailed Significance)
" . " is printed if a coefficient cannot be computed
Notice the difference between the correlation and the partial correlation between age and sentence.
Correlation = +0.826
Partial correlation = +0.7044
The correlation is lower when the intercorrelation with the other five variables is removed.
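A higher-order partial correlation can be built up by applying the first-order formula repeatedly, removing one control variable at a time. The sketch below illustrates that recursion; it is not a description of how SPSS computes its result, and the function name `partial_corr` is an assumption. The matrix holds the three-decimal bivariate correlations among the seven variables, so the output can differ slightly from the SPSS value.

```python
import math

NAMES = ["AGE", "SENTENCE", "PR_ARRST", "TM_DISP", "PR_CONV", "DR_SCORE", "SER_INDX"]

# Bivariate correlations among the seven variables (rounded to three decimals).
R = [
    [1.000, 0.826, 0.179, 0.048, 0.302, 0.252, 0.609],
    [0.826, 1.000, 0.246, -0.084, 0.400, 0.346, 0.744],
    [0.179, 0.246, 1.000, -0.072, 0.795, -0.003, 0.502],
    [0.048, -0.084, -0.072, 1.000, -0.066, -0.024, 0.032],
    [0.302, 0.400, 0.795, -0.066, 1.000, 0.056, 0.578],
    [0.252, 0.346, -0.003, -0.024, 0.056, 1.000, 0.279],
    [0.609, 0.744, 0.502, 0.032, 0.578, 0.279, 1.000],
]

def partial_corr(matrix, i, j, controls):
    """Partial correlation of variables i and j, controlling for the listed indices."""
    if not controls:
        return matrix[i][j]  # zero-order correlation
    k, rest = controls[-1], controls[:-1]
    # Partial out `rest` first, then remove variable k with the first-order formula.
    r_ij = partial_corr(matrix, i, j, rest)
    r_ik = partial_corr(matrix, i, k, rest)
    r_jk = partial_corr(matrix, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

age = NAMES.index("AGE")
sentence = NAMES.index("SENTENCE")
controls = [NAMES.index(name) for name in
            ("PR_ARRST", "TM_DISP", "PR_CONV", "DR_SCORE", "SER_INDX")]

r_partial = partial_corr(R, age, sentence, controls)
print(f"zero-order r = {R[age][sentence]:.3f}, partial r = {r_partial:.4f}")
```

The recursion makes three calls per control variable, which is fine for a handful of controls; with many controls a matrix-inversion approach would be the more usual choice.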