Measuring Association
September 10, 2001
Statistics for Psychosocial Research
Lecture 2
Today’s Topics
• Covariance
• Pearson correlation
• Spearman correlation
• Association with non-linear data
– tetrachoric / polychoric correlation
– odds ratios
• Association matrices
Measuring Associations
• Goal: Evaluate associations between pairs of variables being used to measure a construct of interest
• Examples:
– depression: sleeping problems ~ guilt?
– disability: time to walk 10 m ~ self-reported difficulty walking 10 m?
– schizophrenia: social class ~ schizophrenia?
– SES: education ~ income?
Associations in Psychosocial Research
• Crucial to the process of defining a construct
(1) “too” associated?
(2) not associated?
• not appropriately describing “construct”
• measuring different dimensions of “construct” (e.g. mood versus somatic symptoms of depression)
Associations between variables affect….
• Reliability
• Validity
• Factor Analysis
• Latent Class Analysis
• Structural Equation Models
Measurement Issue
Variance and Covariance
• Variance: Measures variability in one variable, X.
• Covariance: Measures how two variables, X and Y, covary.
$$s_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$$
$$s_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$$
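The two formulas can be checked by hand on a small data set. The sketch below is one possible plain-Python version; the function names and the five data points are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_variance(x):
    """s_x^2: sum of squared deviations from the mean, divided by N - 1."""
    xbar = mean(x)
    return sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1)


def sample_covariance(x, y):
    """s_xy: sum of cross-products of deviations, divided by N - 1."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)


# Hypothetical data, not taken from the lecture.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

var_x = sample_variance(x)
var_y = sample_variance(y)
cov_xy = sample_covariance(x, y)
print(var_x, var_y, cov_xy)
```

Note that the covariance of a variable with itself is its variance, so `sample_covariance(x, x)` returns the same value as `sample_variance(x)`.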
Examples of Variance
[Figure: density plots of X and Y, each with a horizontal axis from -10 to 15 and a density axis from 0.0 to 0.4]
Examples of Covariance
[Figure: two scatterplots against X, each with a vertical axis from -10 to 10]
Correlation, r
Correlation is a scaled version of covariance
-1 ≤ r ≤ 1
r = 1 perfect positive correlation
r = -1 perfect negative correlation
r = 0 uncorrelated
$$r_{xy} = \frac{s_{xy}}{\sqrt{s_x^2 s_y^2}}$$
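To illustrate what "scaled version of covariance" means, here is a minimal sketch that computes r from the sample variances and covariance and then rescales one variable. The helper names and the data are hypothetical. Multiplying x by 100 multiplies the covariance by 100 but leaves r unchanged.

```python
import math


def sample_covariance(x, y):
    """Sample covariance with an N - 1 denominator."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)


def pearson_r(x, y):
    """r_xy = s_xy / sqrt(s_x^2 * s_y^2)."""
    s_xy = sample_covariance(x, y)
    s_xx = sample_covariance(x, x)  # variance of x
    s_yy = sample_covariance(y, y)  # variance of y
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data, not taken from the lecture.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_rescaled = [100.0 * xi for xi in x]  # e.g. metres converted to centimetres

r = pearson_r(x, y)
r_rescaled = pearson_r(x_rescaled, y)
cov_ratio = sample_covariance(x_rescaled, y) / sample_covariance(x, y)
print(r, r_rescaled, cov_ratio)
```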
Covariance and Correlation
• When are they appropriate measures of association?
• What type of association do they describe?
• Transformations
• Scatterplots
• Outliers
Spearman Correlation
• Use when:
– skewed data
– outliers
– sparse data
• Effect:
– downweights outliers
– smooths a curve to a straight line
Spearman Correlation
• Method:
– sort x and y
– replace data with ranks
– calculate Pearson correlation on ranks.
     data           ranks
  x      y        x*    y*
 0.1    0.4       1     1
 0.3    0.6       2     3
 0.5    0.5       3     2
 0.6    0.9       4     4
 0.8    1.8       5     6
 1.0    1.2       6     5
 r = 0.79         r = 0.89
[Figure: scatterplot of y (0.4 to 1.6) against x (0.2 to 1.0)]
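The method can be sketched in a few lines of Python using the six (x, y) pairs from the table. The function names are hypothetical. Giving tied values their average rank is a common convention assumed here; the slide does not discuss ties.

```python
import math


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


x = [0.1, 0.3, 0.5, 0.6, 0.8, 1.0]
y = [0.4, 0.6, 0.5, 0.9, 1.8, 1.2]

print(ranks(x), ranks(y))
print(round(pearson_r(x, y), 2), round(spearman_r(x, y), 2))
```

On these data the ranks match the x* and y* columns, and the two coefficients round to the values shown under the table.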
Spearman Correlation
[Figure: two scatterplots]
Pearson r = 0.72 Spearman r = 0.59
Problems with Correlation/Covariance between variables
What if one (or both) variables is (are) not really continuous?
e.g. number of pregnancies and education level
r = -0.6
[Figure: histograms of Number of Pregnancies (0 to 8) and Education Level (1 to 4)]
Is correlation appropriate?
[Figure: scatterplot with Education Level (1 to 4) on the horizontal axis and a vertical axis from 0 to 8]
Other issues
• Highly skewed or “floor” or “ceiling” effects
– e.g. number of hospital admissions, daily percent humidity in Baltimore in July, mini-mental exam score
• Ordinal: Takes a finite number of values
– e.g. on a scale of 1 to 5
• Binary: r = 0.35
[Figure: plot against x (0.0 to 1.0) for the binary case]
Binary Example: Disability
• Two types of association
– redundancy: b and c cells are close to 0
– hierarchy: either b OR c is close to 0, but the other is not.
• Pearson correlation mixes up association and similarity of “marginal” distributions
• Consequences: If hierarchy is relevant, you get low reliability and consistency, and misleading internal validity, by using Pearson correlation.
                                  Difficulty Walking 1/4 Mile
                                  No      Yes     Total
Difficulty Walking 1 Mile  No     40        0       40
                           Yes    40       20       60
                           Total  80       20      100
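For two 0/1 variables, the Pearson correlation can be computed directly from the four cell counts (this form is often called the phi coefficient). The sketch below applies it to the table above. The function name is hypothetical, and the cells are assumed to be labeled a, b in the first row and c, d in the second, so that b is the empty cell here.

```python
import math


def phi_coefficient(a, b, c, d):
    """Pearson correlation of two binary variables from 2x2 cell counts.

    Assumed layout: first row (a, b), second row (c, d).
    """
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return (a * d - b * c) / math.sqrt(row1 * row2 * col1 * col2)


# Counts from the walking table: b = 0, so the items are perfectly hierarchical.
r_hierarchy = phi_coefficient(40, 0, 40, 20)
print(round(r_hierarchy, 2))
```

Even though nobody reports difficulty with 1/4 mile without also reporting difficulty with 1 mile, the coefficient comes out at about 0.41 because the two margins differ (60% versus 20% "Yes").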
Alternative Measures
• Tetrachoric Correlation
– binary variables
• Polychoric Correlation
– ordinal variables
• Odds Ratio
– binary variables
Tetrachoric Correlation
• Estimates what the correlation between two binary variables would be if the “ratings” were made on a continuous scale.
• Example: difficulty walking up 10 steps and difficulty lifting 10 lbs.
[Figure: continuous Level of Difficulty scale divided into “no difficulty” and “difficulty”]
Tetrachoric Correlation
• Assumes that both “traits” are normally distributed
• Correlation, r, measures how narrow the ellipse is.
• a, b, c, d are the proportions in each quadrant
[Figure: ellipse divided into four quadrants labeled a, b, c, d]
Tetrachoric Correlation
For the odds ratio α = ad/bc,
Approximation 1:
$$Q = \frac{\alpha - 1}{\alpha + 1}$$
Approximation 2 (Digby):
$$Q = \frac{\alpha^{3/4} - 1}{\alpha^{3/4} + 1}$$
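Both approximations need only the four cell counts, so they are easy to sketch in code. The function names below are hypothetical, and the counts (40, 10, 20, 30) are used purely as an illustration. Returning 1 when b or c is zero is an assumption about how to treat an infinite odds ratio (the limit of both formulas), not something stated on the slide.

```python
import math


def odds_ratio(a, b, c, d):
    """ad / bc, returned as infinity when b or c is zero (assumed convention)."""
    if b * c == 0:
        return math.inf
    return (a * d) / (b * c)


def approx_1(a, b, c, d):
    """Approximation 1: (OR - 1) / (OR + 1)."""
    ratio = odds_ratio(a, b, c, d)
    if math.isinf(ratio):
        return 1.0  # limiting value as the odds ratio grows without bound
    return (ratio - 1) / (ratio + 1)


def approx_digby(a, b, c, d):
    """Approximation 2 (Digby): (OR^(3/4) - 1) / (OR^(3/4) + 1)."""
    ratio = odds_ratio(a, b, c, d)
    if math.isinf(ratio):
        return 1.0  # limiting value as the odds ratio grows without bound
    powered = ratio ** 0.75
    return (powered - 1) / (powered + 1)


q1 = approx_1(40, 10, 20, 30)
q2 = approx_digby(40, 10, 20, 30)
print(round(q1, 2), round(q2, 2))
```

For these counts the odds ratio is 6, and the two approximations give roughly 0.71 and 0.59. Both are approximations and need not agree with a tetrachoric correlation estimated by fitting the bivariate normal model.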
Tetrachoric Correlation
• Example:
– Tetrachoric correlation = 0.61
– Pearson correlation = 0.41
– Odds ratio = 6
• Interpretation?
– Same as Pearson correlation.
                                  Difficulty Walking Up 10 Steps
                                  No      Yes     Total
Difficulty Lifting 10 lb.  No     40       10       50
                           Yes    20       30       50
                           Total  60       40      100
Odds Ratio
• Measure of association between two binary variables
• Risk associated with x given y.
• Example: odds of difficulty walking up 10 steps among those with difficulty lifting 10 lb, relative to the odds among those without difficulty lifting 10 lb:
$$OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{ad}{bc} = \frac{(40)(30)}{(20)(10)} = 6$$
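The equality between the ratio of two odds and the cross-product ad/bc can be verified numerically. In this sketch the function names are hypothetical, and the cell layout (a, b in the "No" row for lifting, c, d in the "Yes" row) is an assumption that matches the example counts.

```python
import math


def odds(p):
    """Odds corresponding to a proportion p."""
    return p / (1 - p)


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / bc for a 2x2 table with rows (a, b) and (c, d).

    Returns infinity when b or c is zero and ad > 0, and NaN when both the
    numerator and the denominator are zero (assumed conventions).
    """
    if b * c == 0:
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


a, b, c, d = 40, 10, 20, 30

# Proportion with difficulty walking, within each lifting group.
p_lift_yes = d / (c + d)  # 30 of 50
p_lift_no = b / (a + b)   # 10 of 50

ratio_of_odds = odds(p_lift_yes) / odds(p_lift_no)
cross_product = odds_ratio(a, b, c, d)
print(ratio_of_odds, cross_product)
```

A table with a zero in cell b or c has no finite odds ratio, which is why the function returns infinity in that case rather than raising a division error.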
Odds Ratio
$$OR = \frac{ad}{bc} = \frac{(40)(20)}{(40)(0)} = \infty$$
                                  Difficulty Walking 1/4 Mile
                                  No      Yes     Total
Difficulty Walking 1 Mile  No     40        0       40
                           Yes    40       20       60
                           Total  80       20      100
Pros and Cons
• Tetrachoric correlation
– same interpretation as Spearman and Pearson correlation
– “difficult” to calculate
• Odds Ratio
– easy to understand, but “perfect” association has no manageable value (the odds ratio is infinite)
– easy to calculate
– not comparable to correlations
• May give you different results/inference!
Association Matrices
• Age, income, education
• Correlation Matrix
          grade   income      age
grade      1.00     0.45    -0.25
income     0.45     1.00    -0.13
age       -0.25    -0.13     1.00
• Covariance Matrix
          grade   income      age
grade      6.61    28.18    -5.77
income    28.18   592.69   -29.10
age       -5.77   -29.10    81.23
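The two matrices carry the same association information on different scales: dividing each covariance by the product of the two standard deviations (the square roots of the diagonal entries) gives the correlation. A minimal sketch using the covariance matrix above, with a hypothetical function name:

```python
import math

names = ["grade", "income", "age"]
cov = [
    [6.61, 28.18, -5.77],
    [28.18, 592.69, -29.10],
    [-5.77, -29.10, 81.23],
]


def cov_to_corr(matrix):
    """Convert a covariance matrix to a correlation matrix."""
    size = len(matrix)
    sd = [math.sqrt(matrix[i][i]) for i in range(size)]  # standard deviations
    return [
        [matrix[i][j] / (sd[i] * sd[j]) for j in range(size)]
        for i in range(size)
    ]


corr = cov_to_corr(cov)
for name, row in zip(names, corr):
    print(f"{name:<7}" + "".join(f"{value:8.2f}" for value in row))
```

Rounded to two decimals, the result reproduces the correlation matrix shown above.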
Association Matrices
• Depression: depressed mood, sleep problems, fatigue
• Odds Ratio Matrix
depress sleep fatigue
depress --- 8.17 10.91
sleep 8.17 --- 16.12
fatigue 10.91 16.12 ---
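An odds ratio matrix like this one is built by computing the odds ratio for every pair of binary items. The sketch below shows one way to do that. The ten responses per item are hypothetical and are not the data behind the matrix above, and the function names are illustrative only.

```python
import math
from itertools import combinations


def odds_ratio(x, y):
    """Odds ratio ad / bc for two equal-length lists of 0/1 responses."""
    pairs = list(zip(x, y))
    a = sum(1 for xi, yi in pairs if xi == 0 and yi == 0)
    b = sum(1 for xi, yi in pairs if xi == 0 and yi == 1)
    c = sum(1 for xi, yi in pairs if xi == 1 and yi == 0)
    d = sum(1 for xi, yi in pairs if xi == 1 and yi == 1)
    if b * c == 0:
        # No finite value when an off-diagonal cell is empty.
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


def odds_ratio_matrix(items):
    """Symmetric matrix of pairwise odds ratios; the diagonal is left as None."""
    names = list(items)
    matrix = {row: {col: None for col in names} for row in names}
    for row, col in combinations(names, 2):
        value = odds_ratio(items[row], items[col])
        matrix[row][col] = value
        matrix[col][row] = value
    return matrix


# Hypothetical responses for ten people (1 = symptom present).
items = {
    "depress": [1, 1, 1, 0, 0, 0, 1, 0, 0, 1],
    "sleep":   [1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
    "fatigue": [1, 1, 1, 0, 0, 1, 1, 0, 0, 0],
}

matrix = odds_ratio_matrix(items)
for name in items:
    print(name, matrix[name])
```

The diagonal is left empty, as in the matrix above, because the odds ratio of an item with itself has a zero in both off-diagonal cells.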