Date post: | 03-Jan-2016 |
Category: |
Documents |
Upload: | lambert-carroll |
Correlation
What is a correlation?
• A correlation examines the relationship between two measured variables.
  – No manipulation by the experimenter; the variables are just observed.
  – E.g., look at the relationship between height and weight.
• You can correlate any two variables as long as they are numerical (no nominal variables).
• Is there a relationship between the height and weight of the students in this room?
  – Of course! Taller students tend to weigh more.
1) Strength of Relationships
• 2 aspects of the relationship: Strength and Direction.
• The relationship between any 2 variables is rarely a perfect correlation.
• Perfect correlation: +1.00 OR -1.00
  – Strongest possible relationship
  – Tough to find.
• No correlation: 0.00 (no relationship).
  – E.g., height and social security #.
2) Direction of the Relationship
• Positive relationship – Variables change in the same direction.
  • As X is increasing, Y is increasing
  • As X is decreasing, Y is decreasing
  – E.g., As height increases, so does weight.
• Negative relationship – Variables change in opposite directions.
  • As X is increasing, Y is decreasing
  • As X is decreasing, Y is increasing
  – E.g., As TV time increases, grades decrease.
• Indicated by the sign: (+) or (-).
Scatter Plots and Types of Correlation
• Positive correlation: as x increases, y increases
  [Scatter plot: x = Math SAT score (300 to 800), y = GPA (1.50 to 4.00)]
• Negative correlation: as x increases, y decreases
  [Scatter plot: x = hours of training (0 to 20), y = number of accidents (0 to 60)]
• No linear correlation
  [Scatter plot: x = height (60 to 80), y = IQ (80 to 160)]
Correlation Coefficient Interpretation
Coefficient Range    Strength of Relationship
0.00 - 0.20          Very Low
0.20 - 0.40          Low
0.40 - 0.60          Moderate
0.60 - 0.80          High Moderate
0.80 - 1.00          Very High
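The table can be read as a simple lookup on the absolute value of r. The sketch below is one possible way to encode it. The function name `strength_label` is hypothetical, and because the table's ranges share their endpoints, the code assumes that a boundary value (such as 0.20) belongs to the higher band.

```python
def strength_label(r: float) -> str:
    """Map a correlation coefficient to the verbal label in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1.00 and +1.00")
    size = abs(r)  # strength is judged on absolute size; the sign only gives direction
    # Assumption (not stated in the table): a boundary value falls in the higher band.
    if size < 0.20:
        return "Very Low"
    if size < 0.40:
        return "Low"
    if size < 0.60:
        return "Moderate"
    if size < 0.80:
        return "High Moderate"
    return "Very High"


for r in (0.10, -0.45, 0.80, -0.80):
    print(f"r = {r:+.2f}: {strength_label(r)}")
```

Note that +0.80 and -0.80 receive the same label, since only the direction differs.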
Direction
• Positive relationship: r = +.80
  [Scatter plot of height and weight]
• Negative relationship: r = -.80
  [Scatter plot of TV watching per week and exam score]
Interpreting correlations - Summary
• Absolute size shows strength of relationship
• The higher the absolute number, the stronger the relationship
  – A correlation of -.80 reflects as powerful a relationship as one of +.80
• A correlation of 0.00 means no relationship
  – E.g., can't predict GPA from ID number
• All correlations range from -1.00 to +1.00
Strength of relationship
• Perfect correlation: r = -1.0 [scatter plot of TV watching per week and exam score]
• Strong correlation: r = +0.8 [scatter plot of quality of breakfast and exam score]
• Moderate correlation: r = +0.4 [scatter plot of shoe size and weight]
• Weak correlation (negative): r = -0.2 [scatter plot of shoe size and weight]
• No correlation (horizontal line): r = 0.0 [scatter plot of height and IQ]
One more example
[Diagram: correlations with Exam Grade]
• Amount of study time: +.80
• # of classes missed: -.60
• Social Security number: .00
More examples
• Positive relationships:
  – water consumption and temperature.
  – study time and grades.
  – time spent in jail and severity of offense.
  – What else??
• Negative relationships:
  – alcohol consumption and driving ability.
  – # of hateful remarks and # of friends.
  – What else??
Why used: 1) Prediction; 2) Validity (does something measure what it's supposed to measure?); 3) Reliability (does something produce a consistent score?).
*** Easier to do than experiments ***
Pearson correlation coefficient
• r = the Pearson coefficient
• r measures the amount that the two variables (X and Y) vary together (i.e., covary), relative to how much they vary separately
• Pearson’s r is the most common correlation coefficient; there are others.
Computing the Pearson correlation coefficient
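• To put it another way:

$$ r = \frac{\text{degree to which } X \text{ and } Y \text{ vary together}}{\text{degree to which } X \text{ and } Y \text{ vary separately}} $$

• Or

$$ r = \frac{\text{covariability of } X \text{ and } Y}{\text{variability of } X \text{ and } Y \text{ separately}} $$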
Sum of Products of Deviations
• Measuring X and Y individually (the denominator):
  – compute the sums of squares for each variable
• Measuring X and Y together: Sum of Products
  – Definitional formula:

$$ SP = \sum (X - \bar{X})(Y - \bar{Y}) $$

  – Computational formula:

$$ SP = \sum XY - \frac{\sum X \sum Y}{n} $$

• n is the number of (X, Y) pairs
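As a quick check that the definitional and computational formulas for SP agree, here is a minimal sketch. The function names and the five (X, Y) pairs are made up for illustration and are not from the slides.

```python
def sp_definitional(xs, ys):
    """SP as the sum of products of deviations from each mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def sp_computational(xs, ys):
    """SP as sum(XY) minus (sum(X) * sum(Y)) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


# Illustrative data only (assumed, not taken from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(sp_definitional(xs, ys))   # 6.0
print(sp_computational(xs, ys))  # 6.0
```

The computational form avoids computing the means first, which is why it is usually preferred for hand calculation.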
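Correlation Coefficient:
• the equation for Pearson's r:

$$ r = \frac{SP}{\sqrt{SS_X \, SS_Y}} $$

• expanded form:

$$ r = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{n}\right)}} $$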
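The expanded form translates almost line for line into code. The following is a sketch under stated assumptions: the helper names (`sum_of_squares`, `sum_of_products`, `pearson_r`) and the data are illustrative, not part of the slides.

```python
import math


def sum_of_squares(values):
    """SS = sum(X^2) - (sum(X))^2 / n."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n


def sum_of_products(xs, ys):
    """SP = sum(XY) - (sum(X) * sum(Y)) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def pearson_r(xs, ys):
    """Pearson's r = SP / sqrt(SS_X * SS_Y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (X, Y) pairs of equal length")
    ss_x = sum_of_squares(xs)
    ss_y = sum_of_squares(ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sum_of_products(xs, ys) / math.sqrt(ss_x * ss_y)


# Illustrative data only (assumed): SP = 6, SS_X = 10, SS_Y = 6.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(f"r = {r:.3f}")  # r = 0.775
```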
Limitations of Pearson’s r
1. Correlation does not mean causation!!
  • Third Variable problem – there's always the possibility of a third factor causing the relationship.
  • E.g., Moderate, positive relationship between viewing violent TV and engaging in aggressive behaviors.
Possibilities
• Viewing violent television causes a tendency to engage in aggressive behaviors
• A tendency to engage in aggressive behaviors causes viewing of violent television
• A third factor (e.g., a genetic tendency to like violence) causes both viewing violent television and a tendency to engage in aggressive behaviors
2. Restriction of range
  – A restricted range of measured values can lead to inaccurate conclusions about the data
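To see how a restricted range can mislead, the sketch below builds a made-up data set with a clear upward trend and then recomputes r using only a narrow band of X values. The data, the band 8 to 11, and the helper name `pearson_r` are all assumptions chosen for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the definitional formulas for SP and SS."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Made-up data: Y rises with X, with an alternating +3 / -3 wobble.
xs = list(range(20))
ys = [x + (3 if x % 2 == 0 else -3) for x in xs]

# Full range of X: the upward trend dominates the wobble.
r_full = pearson_r(xs, ys)

# Restricted range: keep only the pairs with X between 8 and 11.
pairs = [(x, y) for x, y in zip(xs, ys) if 8 <= x <= 11]
r_restricted = pearson_r([x for x, _ in pairs], [y for _, y in pairs])

print(f"full range:       r = {r_full:+.2f}")
print(f"restricted range: r = {r_restricted:+.2f}")
```

With the full range, r is strongly positive; within the narrow band the wobble swamps the trend and r falls close to zero.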
3. Outliers (extreme scores)
  – Scores with an extreme X and/or Y value can drastically affect Pearson's r
4. Ambiguity of the strength of the relationship
  – Pearson's r does not give a directly interpretable strength of the relationship between X and Y
5. Requires interval or ratio data.
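The effect of an outlier (limitation 3) is easy to demonstrate. In this sketch, five made-up points have no linear relationship at all, and a single extreme point is then added. The data and the helper name `pearson_r` are assumptions for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the definitional formulas for SP and SS."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Made-up data with no linear relationship (SP works out to exactly 0).
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 2, 1, 3]
r_without = pearson_r(xs, ys)

# The same data plus one extreme score on both X and Y.
r_with = pearson_r(xs + [20], ys + [20])

print(f"without outlier: r = {r_without:+.2f}")
print(f"with outlier:    r = {r_with:+.2f}")
```

One extreme pair is enough to move r from zero to above +0.9 here, which is why it is worth inspecting a scatter plot before trusting the coefficient.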
Coefficient of Determination
• r² = proportion of variance in Y accounted for by X (often reported as a percentage)
• Calculated by squaring r (the Pearson correlation coefficient)
• Ranges from 0 to 1 (never negative)
• This number is a meaningful proportion (unlike Pearson's r).
Coefficient of Determination: An example
• Example:
  – What percentage of variance in Y is accounted for by X with a Pearson r = 0.50?
  – r² = (0.50)² = 0.25 = 25%
• The number is never negative
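The same arithmetic can be sketched in a few lines of code. The function name `coefficient_of_determination` is hypothetical, and the extra r values are illustrative.

```python
def coefficient_of_determination(r: float) -> float:
    """Square Pearson's r to get the proportion of variance accounted for."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1.00 and +1.00")
    return r * r


for r in (0.50, -0.50, 0.80):
    r_squared = coefficient_of_determination(r)
    print(f"r = {r:+.2f}: r^2 = {r_squared:.2f} ({r_squared:.0%} of variance)")
```

Because squaring removes the sign, r = -0.50 and r = +0.50 both account for 25% of the variance.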