
MATH& 146

Lesson 36

Section 5.2

Correlation

Correlation

If the ordered pairs (x, y) tend to follow a straight-line

path, there is linear correlation, or

correlation for short.

Positive Correlation

A positive correlation is a relationship between two

variables where if one variable increases, the other

one also increases.

• The more time you spend running on a treadmill, the

more calories you will tend to burn.

• Taller people tend to have larger shoe sizes.

• The longer your hair grows, the more shampoo you

will tend to need.

Negative Correlation

A negative correlation means that there is an inverse

relationship between two variables: when one variable

decreases, the other increases.

• A student who has many absences tends to have

lower grades.

• As the weather gets warmer, heating costs tend

to decrease.

• The older a man gets, the less hair he tends to

have.

Correlation Coefficient

If you suspect a correlation exists between two

variables, it can be measured with the correlation

coefficient.

The correlation coefficient, r or R, is a measure

of the strength and direction of the linear pattern in

a scatter plot, all in one number between –1 and +1.

Example 1

Using the 8 graphs below, describe the relationship

between a scatterplot and its correlation coefficient.

The Correlation Coefficient

Calculating R involves multiplying the z-scores for

the x- and y-coordinates of each ordered pair,

adding up the products, and dividing by n – 1.

$$R = \frac{\sum z_x z_y}{n - 1}$$
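
For reference, the z-scores in this calculation are presumably the usual standardized values. The slide does not define them, so the definitions below are an assumption; in particular, dividing by n – 1 at the end only gives the correlation if the standard deviations are the sample versions (also computed with n – 1):

$$z_x = \frac{x - \bar{x}}{s_x}, \qquad z_y = \frac{y - \bar{y}}{s_y}, \qquad s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}, \qquad s_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$$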

The Correlation Coefficient

Fortunately, while we can compute the correlation

using the formula, we will usually perform the

calculations on a computer or calculator.

For graphing calculators, the correlation coefficient

can also be computed with the LinRegTTest

command.

Example 2

Use the formula to find the correlation coefficient.

The mean of the x-values is 2 and the mean of the y-values is 5.

The standard deviation for both is 1.

Example 3

Calculate the linear correlation coefficient, R, for

the following set of data.

x   y
2   5
8   7
5   6
3   4
6   8
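
As a check on a hand or calculator result, the z-score recipe (multiply the z-scores of each pair, add the products, divide by n – 1) can be sketched in a few lines of Python. The function name `correlation` is just an illustrative choice, and the sketch assumes the sample standard deviation (n – 1) is used for the z-scores.

```python
import math

def correlation(xs, ys):
    """Return R as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (assumption: n - 1 in the denominator).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Multiply the z-scores of each ordered pair and add up the products.
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# The five ordered pairs from the table above.
x = [2, 8, 5, 3, 6]
y = [5, 7, 6, 4, 8]
r = correlation(x, y)
print(round(r, 4))  # 0.7947
```

A graphing calculator or spreadsheet should report the same value, up to rounding.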

Correlation Coefficient

One way to interpret R is in terms of focus. Values

close to +1 or –1 will have very clear pictures. Values

close to 0 will often look like a vague swarm of dots.

[Figure: scatter plots labeled Unfocused, Focused, and Focused]

No Correlation

However, a correlation coefficient close to zero can

mean other things, such as a nonlinear pattern.

It can't be emphasized enough that the scatter plot

must be analyzed first before making any conclusions

about correlation.
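
To see how a strong nonlinear pattern can hide behind a correlation of zero, here is a small sketch with made-up data (not from the lesson): every y-value is exactly the square of its x-value, so the points fall perfectly on a parabola. The helper `correlation` is an illustrative name; it uses a formula that is algebraically equivalent to the z-score version.

```python
import math

def correlation(xs, ys):
    """Pearson correlation, written with sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is completely determined by x, but along a curve.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = correlation(x, y)
print(round(r, 4))
```

The positive and negative products cancel, so R comes out as zero even though x predicts y perfectly. Only the scatter plot reveals the pattern.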

Example 4

Match the calculated correlations to the corresponding

scatterplot.

a) R = 0.49

b) R = –0.48

c) R = –0.03

d) R = –0.85

Example 5

It appears no straight line would fit any of the

datasets represented in the graphs. Try drawing

nonlinear curves on each plot. Once you create a

curve for each, describe what is important in your

fit.

Possible Explanations for Correlation

"Correlation does not imply causation" is a phrase

used in science and statistics to emphasize that a

correlation between two variables does not necessarily

imply that one causes the other.

For example, every person who learned math in the

17th century is dead. However, learning math does not

necessarily cause death!

Correlations can help us search for cause-and-effect

relationships. But causality is not the only possible

explanation for a correlation.


Possible Explanations for Correlation

For example, children with bigger feet tend to read

better than children with smaller feet, but bigger feet

will not cause a child to be a better reader. In this

case, there is an underlying cause: children with

bigger feet also tend to be older and have been in

school longer.

For another example, the more firemen fighting a fire,

the bigger the fire is observed to be. However, sending more

firemen does not cause the fire to grow; a bigger fire simply draws more firemen.


Possible Explanations for Correlation

1) The correlation may be a coincidence.

2) Both variables might be directly influenced by

some common underlying cause.

3) One of the correlated variables may actually be

a cause of the other. Note that, even in this

case, we may have identified only one of

several causes.

Coincidence

A coincidence is a sequence of events that, although

accidental, seems to have been planned, arranged, or

correlated. Some examples include:

• As stock prices go up, skirt lengths get shorter.

• As Internet Explorer's market share decreased, so

did the U.S. murder rate.

• As the number of pirates in the world decreased,

global temperatures increased.

Underlying Cause

An underlying cause (sometimes referred to as a "lurking

variable" or "confounding factor") is an overlooked

variable that influences both of the observed variables,

making them appear to be directly related. Some examples include:

• Ice cream sales rise and fall with the number of shark attacks

on swimmers.

• Sleeping with one's shoes on is strongly correlated

with waking up with a headache.

Causation

We say one event causes a second event when the

second event is a consequence of the first. Some

examples include:

• Flossing regularly reduces gingivitis.

• Outdoor temperature determines how fast a cricket

chirps.

• Flattering your statistics instructor will give you

better grades.

Example 6

Describe a possible explanation for each

correlation.

a) When I exercise regularly, I tend to lose weight.

b) Cities with more homicides also tend to have

more churches.

c) With a decrease in the wearing of hats, there

has been an increase in global warming over

the same period.

Coefficient of Determination

If we are convinced that the association we are

examining is linear, then the regression line provides

the best numerical summary of the relationship. But

how good is "best"?

A common way to explain the strength of a linear fit is

the coefficient of determination, R². Literally, it is

the correlation squared, and it describes the proportion of

the variation in the response that is explained by the least

squares line.

Example 7

At a concert, the number of tattoos, X, and the number of piercings, Y, that a person had was recorded:

Find the regression line for the data. What are R and R²?

X   Y
2   5
8   7
5   6
3   4
6   8
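
One way to check the answer is a short Python sketch. It assumes the standard least-squares formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x), which this lesson does not state explicitly. It also verifies that R² matches the share of the variation in Y that the line explains.

```python
import math

x = [2, 8, 5, 3, 6]  # number of tattoos (X)
y = [5, 7, 6, 4, 8]  # number of piercings (Y)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and cross-products.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

# Least-squares regression line: y-hat = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Correlation and coefficient of determination.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Cross-check: 1 - (unexplained variation / total variation).
predicted = [intercept + slope * a for a in x]
sse = sum((b - p) ** 2 for b, p in zip(y, predicted))
explained = 1 - sse / syy

print(f"y-hat = {intercept:.4f} + {slope:.4f}x")
print(f"R = {r:.4f}, R^2 = {r_squared:.4f}")
```

With these five points the sketch gives a line of roughly y-hat = 3.4737 + 0.5263x, with R ≈ 0.7947 and R² ≈ 0.6316.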


Because R is always between –1 and 1, R-squared

is always between 0 and 1. Often, R-squared

is written as a percent.

A value of 100% means the relationship is

perfectly linear and the regression line perfectly

predicts the observations.

A value of 0% means there is no linear relationship

and the regression line does a very poor job.


In the previous example, the correlation between x

and y was R = 0.7947.

So the coefficient of determination is 0.7947² =

0.6316, which we report as 63.2%.


What does this value of 63.2% mean?

A useful interpretation of R-squared is that it

measures how much of the variation in the

response variable is explained by the explanatory

variable. For example, 63.2% of the variation in

the y-values can be explained by the x-values.

Example 8

If a linear model has a strong negative relationship

with a correlation of –0.87, how much of the

variation in the response is explained by the

explanatory variable?
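
One way to work this out is to square the correlation, as in the 63.2% calculation earlier. The negative sign drops out when squaring, so the direction of the relationship does not affect the answer:

$$R^2 = (-0.87)^2 = 0.7569 \approx 75.7\%$$

Under this reading, about 75.7% of the variation in the response is explained by the explanatory variable.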

Date post:	27-Jun-2020