Correlation and Correlational Research
Chapter 5
The Two Disciplines of Scientific Psychology
• Lee Cronbach
• APA Presidential Address
Fundamentals of Correlation
• correlations reveal the degree of statistical association between two variables
• used in both experimental and non-experimental research designs
• Correlational/Non-Experimental research
• establishes whether naturally occurring variables are statistically related
Correlational Research
• in correlational research, variables are measured rather than manipulated
• manipulation is the hallmark of experimentation which enables researchers to draw causal inferences
• distinction between measurement and manipulation drives the oft-cited mantra “correlation does not equal causation”
Direction of Relationship
Positive
• two variables tend to increase or decrease together
• higher scores on one variable on average are associated with higher scores on the other variable
• lower scores on one variable on average are associated with lower scores on the other variable
• e.g., relationship between job satisfaction and income
Direction of Relationship
Negative
• two variables tend to move in opposite directions
• higher scores on one variable are on average associated with lower scores on the other variable
• lower scores on one variable are on average associated with higher scores on the other variable
• e.g., relationship between hours video game playing and hours reading
Hypothetical Data
Participant   Weekly Hours of TV Watched (X)   Perceived Crime Risk % (Y1)   Trust in Other People (Y2)
Wilma                  2                              10                            22
Jacob                  2                              40                            11
Carlos                 4                              20                            18
Shonda                 4                              30                            14
Alex                   5                              30                            10
Rita                   6                              50                            12
Mike                   9                              70                             7
Kyoko                 11                              60                             9
Robert                11                              80                            10
Deborah               19                              70                             6
Graphing Bivariate Relationships
a two-dimensional graph
• values of one of the variables are plotted on the horizontal axis (labelled as X and known as the abscissa)
• values of the other observations are plotted on the vertical axis (often labelled as Y and known as the ordinate)
Scatterplots/Scattergram
[Scatterplots: positive (direct) relationship vs. negative (inverse) relationship]
Calculating Correlations
Depends on scale of measurement
Pearson product-moment correlation coefficient
• Pearson's r
• variables measured on interval or ratio scale
Spearman's rank-order correlation coefficient
• Spearman's rho
• one or both variables measured on ordinal scale
Pearson’s r
• based on a ratio that involves the covariance and standard deviations of the two variables (X and Y)
• the covariance is a number that reflects the degree to which two variables vary together
• as with variance, the covariance calculation differs for populations and samples
• here we deal with population calculations
Interval or Ratio Scales
Pearson’s r
Covariance: Definitional Formula
σ_XY = Σ(X − μ_X)(Y − μ_Y) / N
Standard Deviation: Definitional Formula
σ_X = √( Σ(X − μ_X)² / N )
σ_Y = √( Σ(Y − μ_Y)² / N )
Pearson's r
r_XY = σ_XY / (σ_X σ_Y)
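The definitional formulas above can be sketched directly in code. This is an illustrative implementation (population formulas, no library dependencies), applied to the chapter's hypothetical TV-watching data:

```python
# Pearson's r built from the population definitional formulas:
# covariance divided by the product of the two standard deviations.
def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    # population covariance: average cross-product of deviations
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def std_dev(values):
    # population standard deviation
    m = mean(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5

def pearson_r(x, y):
    return covariance(x, y) / (std_dev(x) * std_dev(y))

# the hypothetical data from the earlier slide
tv_hours   = [2, 2, 4, 4, 5, 6, 9, 11, 11, 19]        # X
crime_risk = [10, 40, 20, 30, 30, 50, 70, 60, 80, 70]  # Y1
trust      = [22, 11, 18, 14, 10, 12, 7, 9, 10, 6]     # Y2

print(round(pearson_r(tv_hours, crime_risk), 2))  # positive r
print(round(pearson_r(tv_hours, trust), 2))       # negative r
```

The two print statements recover the directions visible in the data: TV hours and perceived crime risk rise together, while TV hours and trust move in opposite directions.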
Spearman Rank-Ordered Correlation
Based on Ranks for Each of the Two Variables
• if there are no tied ranks, can use the simplified formula
r_Spearman = 1 − 6ΣD² / (N(N² − 1))
where D is the difference between the two ranks for each participant
• equivalently, Pearson's r computed on the ranks:
r_Spearman = σ_XY / (σ_X σ_Y)
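A minimal sketch of the simplified (no-ties) Spearman formula, assuming the data contain no tied values:

```python
# Spearman's rho via the simplified no-ties formula:
# rho = 1 - 6*sum(D^2) / (N*(N^2 - 1)), where D is the rank difference.
def ranks(values):
    # rank 1 = smallest value (assumes no tied values)
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# a perfectly inverted ordering yields rho = -1
print(spearman_rho([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
```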
Interpreting Magnitude of Correlations
• In addition to considering the direction of the relationship (i.e., positive or negative), we need to attend to the strength of the relationship.
• correlation only takes on a limited range of values
−1.00 ≤ 𝑟 ≤ +1.00
• absolute value reflects strength/degree of relationship between two variables
Interpreting Magnitude of Correlations
• square of the correlation coefficient
• 𝑟2
• aka coefficient of determination
• proportion of variability in one variable that can be accounted for through the linear relationship with the other variable
• thus r = .80 gives r² = (.80)² = .64, as does r = −.80
Interpreting Magnitude of Correlations
• Is the relationship between two variables weak? Moderate? Strong?
Cohen’s Guidelines
Guidelines from Cohen (1988)
Absolute value of r
Weak: .10 – .29
Moderate: .30 – .49
Strong: ≥ .50
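Cohen's cutoffs translate into a simple classifier. Note the "negligible" label for |r| < .10 is our own addition for completeness, not part of Cohen's guidelines:

```python
# Labeling correlation strength using Cohen's (1988) guidelines.
# "negligible" (below .10) is an added catch-all, not Cohen's term.
def cohen_strength(r):
    magnitude = abs(r)  # sign indicates direction, not strength
    if magnitude >= 0.50:
        return "strong"
    if magnitude >= 0.30:
        return "moderate"
    if magnitude >= 0.10:
        return "weak"
    return "negligible"

print(cohen_strength(0.33))   # moderate
print(cohen_strength(-0.62))  # strong
```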
Interpreting Magnitude of Correlation
• If a psychological researcher reports a correlation of .33 between integrity and job performance, can one say that the two variables are 33% related?
• No
• r² (coefficient of determination) reveals how much of the differences in Y scores are attributable to differences in X scores
• .33² = .1089
• so only about 11% of the variability is accounted for
Coefficient of determination
Nonlinear Relationships
• magnitude of the correlation coefficient is influenced by the degree of non-linearity
[Figure: inverted-U relationship between alertness (sleepy, alert, panicked) and test performance; r = 0]
• can assess the strength of non-linear relationships with alternative statistical procedures such as ε²
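An inverted-U pattern like the alertness example can produce a Pearson r of exactly zero even though the variables are strongly related. A sketch with symmetric illustrative data:

```python
# With a symmetric inverted-U, positive and negative cross-products
# cancel, so Pearson's r is 0 despite a clear curvilinear relationship.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)

alertness   = [-2, -1, 0, 1, 2]   # low ... high (illustrative units)
performance = [0, 3, 4, 3, 0]     # peaks in the middle

print(pearson_r(alertness, performance))  # 0.0
```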
Range Restriction
• restricting the range of scores on either variable typically attenuates (reduces) the observed correlation
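Range restriction's attenuating effect on a correlation can be sketched with made-up data: sampling only a narrow slice of X weakens the observed r even though the underlying X-Y relationship is unchanged.

```python
# Illustrative data: Y is roughly linear in X. Computing r on the full
# range and then on a restricted slice (4 <= X <= 7) shows attenuation.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)

x = list(range(1, 11))
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]   # roughly linear in x

full_r = pearson_r(x, y)

# keep only cases in a narrow band of X
pairs = [(a, b) for a, b in zip(x, y) if 4 <= a <= 7]
restricted_r = pearson_r([a for a, _ in pairs], [b for _, b in pairs])

print(round(full_r, 2), round(restricted_r, 2))
```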
Correlation And Causation
• Bidirectionality Issue
• Third Variable Problem
Bidirectionality Problem
Correlation between Religiosity and GPA
• Religiosity causes GPA (Religiosity → GPA)
• GPA causes Religiosity (GPA → Religiosity)
Third-Variable Problem
Correlation between Religiosity and GPA
• a third variable, Parenting Style, may cause both
• spurious relationship: an apparent association between X and Y produced entirely by the third variable
Strategies to Reduce Causal Ambiguity in Correlational Research
Statistical approaches
• measure and statistically control for a third variable
• partial correlation analysis
• e.g., relationship between right-hand palm size (X) and verbal ability (Y): r_XY = 0.70
• perhaps a spurious relationship caused by a common third variable: age (Z)
r_XZ = 0.90, r_YZ = 0.80
• controlling for age: r_XY·Z = −0.076
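The first-order partial correlation can be computed from the three pairwise correlations with the standard formula, which reproduces the palm-size / verbal-ability result:

```python
# Partial correlation of X and Y controlling for Z:
# r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

# palm size (X), verbal ability (Y), age (Z)
r = partial_r(0.70, 0.90, 0.80)
print(round(r, 3))  # -0.076
```

Controlling for age, the strong .70 correlation essentially vanishes, consistent with a spurious relationship.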
Research Designs
• Cross-Sectional Designs
• bidirectionality potential problem
• Prospective Longitudinal Design
• X measured at Time 1
• Y measured at Time 2
• rules out bidirectionality problem
• Cross-Lagged Panel Design
• measure X and Y at Time 1
• repeat X and Y measurement at Time 2
• examine pattern of relationships (i.e., cross-lagged correlations) across variables and time
Cross-Lagged Panel Design (Eron et al., 1972)
Drawing Causal Conclusions
• How do we rule out all plausible third variables
(confounds) using correlational research designs?
• We can't: only the control afforded by rigorous experimentation provides strong tests of causation
• as noted by some recent researchers employing such designs:
“longitudinal correlational research can be used to compare
the relative plausibility of alternative causal perspectives” but
they “do not provide a strong test of causation”
Correlation/Regression and Prediction
• A goal of science is to forecast future events
• In simple linear regression, scores on X can be used to predict scores on Y assuming a meaningful relationship (r) has been established between X and Y in past research
Linear Regression
• interest in predicting scores on one variable (Y) based upon linear relationship with another variable (X)
• X is the predictor; Y is the criterion
Regression Equation
• based on formula for straight line
Ŷ = a + bX
where Ŷ is the predicted value of Y for a given value of X
a is the Y-intercept (i.e., Ŷ for X = 0)
b is the slope of the regression line
• can be plotted on scatterplot
Regression Equation - Calculation
• need to calculate values for
• a: the Y-intercept, and
• b: the slope
b = Covariance_XY / Variance_X = r × (SD_Y / SD_X)
a = Ȳ − b X̄
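The slope and intercept formulas can be sketched directly. With illustrative perfectly linear data, the fit recovers the line exactly:

```python
# Least-squares slope and intercept:
# b = covariance(X, Y) / variance(X), a = mean(Y) - b * mean(X)
def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    b = cov / var_x       # slope
    a = my - b * mx       # Y-intercept
    return a, b

# illustrative data generated from Y = 1 + 2X
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```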
Interpreting Regression Equation
For example, assume we're looking at the relationship between the number of years a couple has been married (X) and how many children they have (Y). From a sample we calculate the following:
Ŷ = −0.84 + 1.21X
thus, if a couple is married for 0 years we would predict that they would have −0.84 of a child
for each year they're married we'd expect the couple to have an additional 1.21 children
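Plugging values into the fitted equation generates predictions; a quick sketch using the coefficients from the children/years-married example:

```python
# Prediction from the sample regression equation Y-hat = -0.84 + 1.21*X
def predict_children(years_married):
    return -0.84 + 1.21 * years_married

# e.g., a couple married 3 years
print(round(predict_children(3), 2))  # 2.79
```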
Multiple (Linear) Regression
• Multiple predictors are used to predict a criterion measure
• ideally want as little overlap as possible between predictors (X’s)
• i.e., want each predictor to account for unique variance in criterion (Y)
Ŷ = a + b₁X₁ + b₂X₂ + … + b_k X_k
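A useful consequence of uncorrelated predictors: each multiple-regression slope equals its simple-regression slope, so each predictor accounts for unique variance in Y. A sketch with illustrative orthogonal data (two predictors constructed so their covariance is zero):

```python
# With orthogonal predictors, b_j = cov(X_j, Y) / var(X_j) for each j,
# exactly as in simple regression; this recovers the generating model.
def simple_slope(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    return cov / var_x

x1 = [-1, 1, -1, 1]      # orthogonal predictors: cov(x1, x2) = 0
x2 = [-1, -1, 1, 1]
y  = [0, 4, 6, 10]       # generated from Y = 5 + 2*X1 + 3*X2

b1 = simple_slope(x1, y)
b2 = simple_slope(x2, y)
a  = sum(y) / len(y) - b1 * sum(x1) / len(x1) - b2 * sum(x2) / len(x2)
print(a, b1, b2)  # 5.0 2.0 3.0
```

When predictors are correlated (multicollinearity), this shortcut breaks down and the slopes must be solved jointly, which is one reason overlapping predictors weaken prediction.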
Multiple Regression
[Venn diagrams: overlap of predictors (General CAT, Structured Interview, Work Sample) with the Criterion]
• ideally want to avoid multicollinearity in order to maximize prediction
Example: One Criterion (Y) and Three Predictors (X's)
[Panels: here predictors are uncorrelated; here predictors are correlated]
Benefits of Correlational Research
• prediction in everyday life
• test validation
• broad range of applications
• establishing relationship
• convergence with experiments