Date post: | 23-Dec-2015 |
Upload: | roy-carpenter |
Correlation and Regression
Quantitative Methods in HPELS
440:210
Agenda
  Introduction
  The Pearson Correlation
  Hypothesis Tests with the Pearson Correlation
  Regression
  Instat
  Nonparametric versions
Introduction
Correlation: Statistical technique used to measure and describe a relationship between two variables
  Direction of relationship: positive or negative
  Form of relationship: linear, quadratic, . . .
  Degree of relationship: from -1.0 through 0.0 to +1.0
Uses of Correlations
  Prediction
  Validity
  Reliability
The Pearson Correlation
Statistical Notation (recall for ANOVA):
  r = Pearson correlation
  SP = sum of products of deviations
  Mx = mean of X scores
  SSx = sum of squares of X scores
Pearson Correlation
Formula Considerations (recall for ANOVA):
  SP = Σ(X - Mx)(Y - My)   (definitional formula)
  SP = ΣXY - (ΣX)(ΣY) / n   (computational formula)
  SSx = Σ(X - Mx)²
  SSy = Σ(Y - My)²
  r = SP / √(SSx · SSy)
Pearson Correlation
  Step 1: Calculate SP
  Step 2: Calculate SS for X and Y values
  Step 3: Calculate r
Step 1: SP
Definitional formula:
  SP = Σ(X - Mx)(Y - My)
  SP = (-6)(-1) + (4)(1) + (-2)(-1) + (2)(0) + (2)(1)
  SP = 6 + 4 + 2 + 0 + 2
  SP = 14
Computational formula:
  ΣX = 30, ΣY = 10
  ΣXY = (0)(1) + (10)(3) + (4)(1) + (8)(2) + (8)(3)
  ΣXY = 0 + 30 + 4 + 16 + 24
  ΣXY = 74
  SP = ΣXY - (ΣX)(ΣY) / n
  SP = 74 - [30(10)] / 5
  SP = 74 - 60
  SP = 14
Step 2: SSx and SSy
  SSx = Σ(X - Mx)² = 36 + 16 + 4 + 4 + 4 = 64
  SSy = Σ(Y - My)² = 1 + 1 + 1 + 0 + 1 = 4
Step 3: r
  r = SP / √(SSx · SSy)
  r = 14 / √((64)(4))
  r = 14 / √256
  r = 14 / 16
  r = 0.875
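The three steps can be checked with a short script. This is a minimal sketch, not part of the original slides: the function name `pearson_r` is hypothetical, and the X and Y scores are the ones recovered from the products listed in Step 1.

```python
import math

def pearson_r(xs, ys):
    """Return (SP, SSx, SSy, r) using the definitional formulas."""
    n = len(xs)
    mx = sum(xs) / n  # Mx, mean of the X scores
    my = sum(ys) / n  # My, mean of the Y scores
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # Step 1
    ssx = sum((x - mx) ** 2 for x in xs)                   # Step 2
    ssy = sum((y - my) ** 2 for y in ys)                   # Step 2
    r = sp / math.sqrt(ssx * ssy)                          # Step 3
    return sp, ssx, ssy, r

# Scores recovered from the Step 1 products (assumed to be the slide's data)
x_scores = [0, 10, 4, 8, 8]
y_scores = [1, 3, 1, 2, 3]

sp, ssx, ssy, r = pearson_r(x_scores, y_scores)
print(sp, ssx, ssy, r)  # 14.0 64.0 4.0 0.875

# The computational formula for SP gives the same value
n = len(x_scores)
sp_comp = sum(x * y for x, y in zip(x_scores, y_scores)) - sum(x_scores) * sum(y_scores) / n
print(sp_comp)  # 14.0
```

Both forms of SP agree, which is a quick way to catch arithmetic slips when working by hand.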
Interpretation of r
  Correlation ≠ causality
  Restricted range: if the data do not represent the full range of scores, be wary
  Outliers can have a dramatic effect (Figure 16.9)
  Correlation and variability: coefficient of determination (r²)
The Process
Step 1: State hypotheses
  Non-directional:
    H0: ρ = 0 (no population correlation)
    H1: ρ ≠ 0 (population correlation exists)
  Directional:
    H0: ρ ≤ 0 (no positive population correlation)
    H1: ρ > 0 (positive population correlation exists)
Step 2: Set criteria
  α = 0.05
Step 3: Collect data and calculate statistic
  r
Step 4: Make decision
  Accept or reject H0
Example
Researchers are interested in determining if leg strength is related to jumping ability
Researchers measure leg strength with 1RM squat (lbs) and vertical jump height (inches) in 5 subjects (n = 5)
Step 1: State Hypotheses
Non-Directional
H0: ρ = 0
H1: ρ ≠ 0
Step 2: Set Criteria
Alpha (α) = 0.05
Critical Value:
Use Critical Values for Pearson Correlation Table
Appendix B.6 (p 697)
Information Needed:
  df = n - 2 = 5 - 2 = 3
  Alpha (α) = 0.05
  Directional or non-directional? (non-directional here)
Critical value = 0.878
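The table value can be cross-checked against the t distribution. This is a minimal sketch under stated assumptions: it uses the standard identity r = t / √(t² + df), and the two-tailed t critical value of 3.182 for df = 3 at α = 0.05 comes from a standard t table, not from these slides. The function name `critical_r` is hypothetical.

```python
import math

def critical_r(t_crit, df):
    """Convert a t critical value into the matching critical value of r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Assumed inputs: n = 5 subjects, so df = n - 2 = 3;
# t critical (two-tailed, alpha = 0.05, df = 3) = 3.182 from a t table
n = 5
df = n - 2
r_crit = critical_r(3.182, df)
print(round(r_crit, 3))  # 0.878
```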
Step 3: Collect Data and Calculate Statistic
Data (X = 1RM squat in lbs, Y = vertical jump in inches):
  X      Y     XY
  200    25    5000
  180    22    3960
  225    27    6075
  300    27    8100
  160    25    4000
  ΣX = 1065   ΣY = 126   ΣXY = 27135
Calculate SP
  SP = ΣXY - (ΣX)(ΣY) / n
  SP = 27135 - [1065(126)] / 5
  SP = 27135 - 26838
  SP = 297
Calculate SSx
  X      X-Mx    (X-Mx)²
  200    -13     169
  180    -33     1089
  225     12     144
  300     87     7569
  160    -53     2809
  Mx = 213       SSx = 11780
Calculate SSy
  Y      Y-My    (Y-My)²
  25     -0.2    0.04
  22     -3.2    10.24
  27      1.8    3.24
  27      1.8    3.24
  25     -0.2    0.04
  My = 25.2      SSy = 16.8
Calculate r
  r = SP / √(SSx · SSy)
  r = 297 / √(11780 × 16.8)
  r = 297 / √197904
  r = 297 / 444.86
  r = 0.667
Step 4: Make Decision
  r = 0.667 < critical value 0.878
  Accept or reject H0?
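The whole example, from raw scores to the decision, can be reproduced as follows. This is a sketch for checking the hand calculation; the variable names are hypothetical, and the critical value 0.878 is the table value quoted in Step 2.

```python
import math

# Data from the slides: 1RM squat (lbs) and vertical jump height (inches)
squat_lbs = [200, 180, 225, 300, 160]
jump_in = [25, 22, 27, 27, 25]
n = len(squat_lbs)

# SP via the computational formula
sum_x = sum(squat_lbs)
sum_y = sum(jump_in)
sum_xy = sum(x * y for x, y in zip(squat_lbs, jump_in))
sp = sum_xy - sum_x * sum_y / n

# SSx and SSy via deviations from the means
mx = sum_x / n
my = sum_y / n
ssx = sum((x - mx) ** 2 for x in squat_lbs)
ssy = sum((y - my) ** 2 for y in jump_in)

r = sp / math.sqrt(ssx * ssy)

# Non-directional test: reject H0 only if |r| exceeds the critical value
critical_value = 0.878
reject_h0 = abs(r) > critical_value

print(sp, ssx, round(ssy, 1))
print(reject_h0)  # False
```

The unrounded r is about 0.6676, which the slides report as 0.667. Because it falls below 0.878, H0 is not rejected, which matches the reported conclusion that strength and jumping ability were not significantly related.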
Regression
Recall several uses of correlation:
  Prediction
  Validity
  Reliability
Regression attempts to predict one variable based on information about the other variable
Line of best fit
Regression
Line of best fit can be described with the following linear equation:
Y = bX + a, where:
  Y = predicted Y value
  b = slope of line
  X = any X value
  a = intercept
Y = bX + a, where:
  Y = cost (?)
  b = cost per hour ($5)
  X = number of hours (?)
  a = membership cost ($25)
Y = 5X + 25
For X = 10 hours:
  Y = 5(10) + 25
  Y = 50 + 25 = 75
For X = 30 hours:
  Y = 5(30) + 25
  Y = 150 + 25 = 175
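The cost example is a direct use of Y = bX + a. A minimal sketch, with a hypothetical function name and parameter names, using the $5 hourly rate and $25 membership cost from the slide:

```python
def predicted_cost(hours, rate_per_hour=5, membership_fee=25):
    """Y = bX + a with b = rate_per_hour (slope) and a = membership_fee (intercept)."""
    return rate_per_hour * hours + membership_fee

print(predicted_cost(10))  # 75
print(predicted_cost(30))  # 175
```

The slope is the change in cost for each additional hour, and the intercept is the cost at zero hours.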
Line of best fit minimizes the total squared vertical distance of the points from the line
Calculation of the Regression Line
Regression line = line of best fit = linear equation
  SP = Σ(X - Mx)(Y - My)
  SSx = Σ(X - Mx)²
  b = SP / SSx
  a = My - bMx
Example 16.14, p 557
  Mx = 5, My = 6
  SP = Σ(X - Mx)(Y - My)
  SP = 16
  SSx = Σ(X - Mx)²
  SSx = 10
  b = SP / SSx
  b = 16 / 10 = 1.6
  a = My - bMx
  a = 6 - 1.6(5) = -2
  Y = bX + a
  Y = 1.6(X) - 2
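The slope and intercept follow mechanically from the summary values. This is a sketch only: the function name `regression_line` is hypothetical, and it works from the summary statistics given for Example 16.14 (SP = 16, SSx = 10, Mx = 5, My = 6) because the raw scores are not reproduced in these slides.

```python
def regression_line(sp, ssx, mx, my):
    """Return (b, a) for the line Y = bX + a."""
    b = sp / ssx      # slope
    a = my - b * mx   # intercept
    return b, a

b, a = regression_line(sp=16, ssx=10, mx=5, my=6)
print(b, a)  # 1.6 -2.0

def predict(x):
    """Predicted Y for a given X on the fitted line."""
    return b * x + a

# The regression line always passes through the point (Mx, My)
print(predict(5))  # 6.0
```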
Instat - Correlation
Type data from sample into a column.
Label column appropriately:
  Choose “Manage”
  Choose “Column Properties”
  Choose “Name”
Choose “Statistics”
  Choose “Regression”
  Choose “Correlation”
Choose the appropriate variables to be correlated
Click OK
Interpret the p-value
Instat - Regression
Type data from sample into a column.
Label column appropriately:
  Choose “Manage”
  Choose “Column Properties”
  Choose “Name”
Choose “Statistics”
  Choose “Regression”
  Choose “Simple”
Choose appropriate variables for:
  Response (Y)
  Explanatory (X)
Check “significance test”
Check “ANOVA table”
Check “Plots”
Click OK
Interpret p-value
Reporting Correlation Results
Information to include:
  Value of the r statistic
  Sample size
  p-value
Example: A correlation of the data revealed that strength and jumping ability were not significantly related (r = 0.667, n = 5, p > 0.05).
Correlation matrices are used when interrelationships of several variables are tested (Table 1, p 541)
Nonparametric Versions
  Spearman rho: when at least one of the data sets is ordinal
  Point biserial correlation: when one set of data is ratio/interval and the other is dichotomous
    Male vs. female
    Success vs. failure
  Phi coefficient: when both data sets are dichotomous
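Spearman rho can be computed as the Pearson correlation of the ranked scores. The sketch below is an illustration, not something the slides work through: it reuses the squat and jump data from the earlier example, assigns tied scores the average of their ranks (the usual convention), and uses hypothetical function names.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson r = SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)

squat_lbs = [200, 180, 225, 300, 160]
jump_in = [25, 22, 27, 27, 25]

rank_x = average_ranks(squat_lbs)
rank_y = average_ranks(jump_in)
print(rank_x)  # [3.0, 2.0, 4.0, 5.0, 1.0]
print(rank_y)  # [2.5, 1.0, 4.5, 4.5, 2.5]

rho = pearson(rank_x, rank_y)
print(round(rho, 3))  # 0.791
```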
Violation of Assumptions
Nonparametric Version: Friedman Test (not covered)
When to use the Friedman Test:
  Related-samples design with three or more groups
  Scale of measurement assumption violation: ordinal data
  Normality assumption violation: regardless of scale of measurement
Textbook Assignment
Problems: 5, 7, 10, 23 (with post hoc)