MATH& 146
Lesson 36
Section 5.2
Correlation
1
Correlation
If the ordered pairs (x,y) tend to follow a straight-
line path, there is linear correlation, or
correlation for short.
2
Positive Correlation
A positive correlation is a relationship between two
variables where if one variable increases, the other
one also increases.
• The more time you spend running on a treadmill, the
more calories you will tend to burn.
• Taller people tend to have larger shoe sizes.
• The longer your hair grows, the more shampoo you
will tend to need.
3
Negative Correlation
A negative correlation is an inverse relationship
between two variables: when one variable increases,
the other decreases, and vice versa.
• A student who has many absences tends to have
lower grades.
• As weather gets colder, air conditioning costs tend
to decrease.
• The older a man gets, the less hair he tends to
have.
4
Correlation Coefficient
If you suspect a correlation exists between two
variables, it can be measured with the correlation
coefficient.
The correlation coefficient, r or R, is a single
number between –1 and +1 that measures the strength
and direction of the linear pattern in a scatter plot.
5
Example 1
Using the 8 graphs below, describe the relationship
between a scatterplot and its correlation coefficient.
6
The Correlation Coefficient
Calculating R involves multiplying the z-scores for
the x- and y-coordinates of each ordered pair,
adding up the products, and dividing by n – 1.
$$R = \frac{\sum z_x z_y}{n - 1}$$
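The z-score recipe described above can be sketched in Python as follows. This is a minimal illustration, not the lesson's own method: the function name `correlation_from_z_scores` and the four data points are assumptions made up for the example, and the z-scores use the sample standard deviation (dividing by n – 1).

```python
from statistics import mean, stdev


def correlation_from_z_scores(xs, ys):
    """Return R = sum(z_x * z_y) / (n - 1), using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # stdev divides by n - 1
    # Multiply the two z-scores of each ordered pair.
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    # Add up the products and divide by n - 1.
    return sum(products) / (n - 1)


# Illustrative data (not from the lesson): points lying exactly on y = 2x.
r_perfect = correlation_from_z_scores([1, 2, 3, 4], [2, 4, 6, 8])
print(round(r_perfect, 4))  # 1.0
```

Because the points fall exactly on an increasing line, the result is +1, the largest value R can take.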
7
The Correlation Coefficient
Fortunately, while we can compute the correlation
using the formula, we will usually perform the
calculations on a computer or calculator.
For graphing calculators, the correlation coefficient
can also be computed with the LinRegTTest
command.
8
Example 2
Use the formula to find the correlation coefficient.
The mean for the x-values is 2 and y-values is 5.
The standard deviation for both is 1.
9
Example 3
Calculate the linear correlation coefficient, R, for
the following set of data.
10
x y
2 5
8 7
5 6
3 4
6 8
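One way to check a hand or calculator result for this table is a short Python sketch. It uses the deviation form of the formula, which is algebraically the same as averaging the z-score products because the n – 1 factors cancel. The variable names are illustrative.

```python
from math import sqrt

# Data from the table above.
xs = [2, 8, 5, 3, 6]
ys = [5, 7, 6, 4, 8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of products and squares of the deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 4))  # 0.7947
```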
Correlation Coefficient
One way to interpret R is in terms of focus. Values
close to +1 or –1 will have very clear pictures. Values
close to 0 will often look like a vague swarm of dots.
11
[Figure: scatterplot panels labeled Unfocused, Focused, Focused]
No Correlation
However, a correlation coefficient close to zero can
mean other things, such as a nonlinear pattern.
It can't be emphasized enough that the scatter plot
must be analyzed first before making any conclusions
about correlation.
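To see how a strong nonlinear pattern can still give a correlation of zero, consider the sketch below. The data are made up for illustration: seven points lying exactly on the parabola y = x², placed symmetrically around x = 0.

```python
from math import sqrt

# Illustrative data (not from the lesson): a perfect parabola y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

r = s_xy / sqrt(s_xx * s_yy)
print(r)  # 0.0
```

Here y is completely determined by x, yet R is 0 because the relationship is not linear. Only the scatter plot reveals the pattern.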
12
Example 4
Match the calculated correlations to the corresponding
scatterplot.
a) R = 0.49
b) R = – 0.48
c) R = – 0.03
d) R = – 0.85
13
Example 5
It appears no straight line would fit any of the
datasets represented in the graphs. Try drawing
nonlinear curves on each plot. Once you create a
curve for each, describe what is important in your
fit.
14
Possible Explanations for
Correlation
"Correlation does not imply causation" is a phrase
used in science and statistics to emphasize that a
correlation between two variables does not necessarily
imply that one causes the other.
For example, every person who learned math in the
17th century is dead. However, learning math does not
necessarily cause death!
Correlations can help us search for cause-and-effect
relationships. But causality is not the only possible
explanation for a correlation.
15
Possible Explanations for
Correlation
For example, children with bigger feet tend to read
better than children with smaller feet, but bigger feet
will not cause a child to be a better reader. In this
case, there is an underlying cause: children with
bigger feet also tend to be older and have been in
school longer.
For another example, the more firemen fighting a fire,
the bigger the fire is observed to be. However, sending
more firemen does not cause the fire to grow.
16
Possible Explanations for
Correlation
1) The correlation may be a coincidence.
2) Both variables might be directly influenced by
some common underlying cause.
3) One of the correlated variables may actually be
a cause of the other. Note that, even in this
case, we may have identified only one of
several causes.
17
Coincidence
A coincidence is a sequence of events that, although
accidental, seem to have been planned, arranged, or
correlated. Some examples include:
• As stock prices go up, skirt lengths get shorter.
• As Internet Explorer's market share has decreased,
so has the U.S. murder rate.
• As the number of pirates in the world has decreased,
global temperatures have increased.
18
Underlying Cause
An underlying cause (sometimes referred to as a "lurking
variable" or "confounding factor") is an overlooked
variable that influences both observed variables and
makes them appear to be correlated. Some examples include:
• Ice cream sales reflect the number of shark attacks
on swimmers.
• Sleeping with one's shoes on is strongly correlated
with waking up with a headache.
19
Causation
We say one event causes a second event when the
second event is a consequence of the first. Some
examples include:
• Flossing regularly reduces gingivitis.
• Outdoor temperature determines how fast a cricket
chirps.
• Flattering your statistics instructor will give you
better grades.
20
Example 6
Describe a possible explanation for each
correlation.
a) When I exercise regularly, I tend to lose weight.
b) Cities with more homicides also tend to have
more churches.
c) With a decrease in the wearing of hats, there
has been an increase in global warming over
the same period.
21
Coefficient of Determination
If we are convinced that the association we are
examining is linear, then the regression line provides
the best numerical summary of the relationship. But
how good is "best"?
A common way to explain the strength of a linear fit is
the coefficient of determination, R². Literally, it is
the correlation squared, and it describes the proportion
of variation in the response that is explained by the
least squares line.
22
Example 7
At a concert, the number of tattoos, X, and the number of piercings, Y, that each person had were recorded:
Find the regression line for the data. What are R and R²?
23
X Y
2 5
8 7
5 6
3 4
6 8
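A hedged sketch of how the regression line, R, and R² for this table could be computed in Python. It uses the standard least-squares formulas (slope = Sxy / Sxx, intercept = ȳ – slope · x̄). The variable names are illustrative, and a graphing calculator's regression command should give matching values.

```python
from math import sqrt

# Data from the table above.
xs = [2, 8, 5, 3, 6]
ys = [5, 7, 6, 4, 8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

# Least-squares regression line: y-hat = intercept + slope * x.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

r = s_xy / sqrt(s_xx * s_yy)
r_squared = r ** 2

print(round(slope, 4), round(intercept, 4))  # 0.5263 3.4737
print(round(r, 4), round(r_squared, 4))      # 0.7947 0.6316
```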
Coefficient of Determination
Because R is always between –1 and 1, R-
squared is always between 0 and 1. Often, R-
squared is written as a percent.
A value of 100% means the relationship is
perfectly linear and the regression line perfectly
predicts the observations.
A value of 0% means there is no linear relationship
and the regression line does a very poor job.
24
Coefficient of Determination
In the previous example, the correlation between x
and y was R = 0.7947.
So the coefficient of determination is 0.7947² ≈
0.6316, which we report as 63.2%.
25
Coefficient of Determination
What does this value of 63.2% mean?
A useful interpretation of R-squared is that it
measures how much of the variation in the
response variable is explained by the explanatory
variable. For example, 63.2% of the variation in
the y-values can be explained by the x-values.
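The "variation explained" reading can be checked directly. The sketch below is an illustration with made-up variable names: it fits the least-squares line to the tattoo and piercing data, then compares the variation left over around the line with the total variation in y. One minus that ratio reproduces R².

```python
# Data from the tattoo and piercing example.
xs = [2, 8, 5, 3, 6]
ys = [5, 7, 6, 4, 8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Predicted y-values from the regression line.
predicted = [intercept + slope * x for x in xs]

# Total variation in y, and variation left unexplained by the line.
ss_total = sum((y - y_bar) ** 2 for y in ys)
ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 4))  # 0.6316
```

About 36.8% of the variation in y remains in the residuals, so the line accounts for the other 63.2%.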
26
Example 8
If a linear model has a strong negative relationship
with a correlation of –0.87, how much of the
variation in the response is explained by the
explanatory variable?
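A worked sketch of the arithmetic, assuming the question asks for R² expressed as a percent:

```latex
R^2 = (-0.87)^2 = 0.7569 \approx 75.7\%
```

Squaring removes the sign, so a correlation of +0.87 would explain the same share of the variation.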
27