
Chapter 14

Describing Relationships: Scatterplots and Correlation

Date post: 22-Dec-2015


Correlation

Objective: Analyze a collection of paired data (sometimes called bivariate data).

A correlation exists between two variables when there is a relationship (or an association) between them.

We will consider only linear relationships: when graphed, the points approximate a straight-line pattern.


Scatterplot

A scatterplot is a graph in which paired (x, y) data (usually collected on the same individuals) are plotted with one variable represented on a horizontal (x-) axis and the other variable represented on a vertical (y-) axis. Each individual pair (x, y) is plotted as a single point.


Examining a Scatterplot

You can describe the overall pattern of a scatterplot by its:

- Form: linear or non-linear (quadratic, exponential, etc.), or no correlation at all.
- Direction: positive or negative.
- Strength: very strong, strong, moderately strong, weak, etc.

Look for outliers and how they affect the correlation.


Scatterplot

Example: Draw a scatterplot for the data below. What is the nature of the relationship between x and y?

x:   1   2   3   4   5
y:  -4  -2   1   0   2

Answer: strong, positive, and linear.
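As a worked check on this answer, the correlation coefficient r for the five points can be computed by hand with the z-score formula for r that appears later in this chapter (sample standard deviations, dividing by n - 1). The arithmetic below is a sketch of that calculation; a value near 0.92 is consistent with calling the relationship strong and positive.

```latex
\bar{x} = 3, \qquad \bar{y} = -0.6, \qquad
s_x = \sqrt{10/4} \approx 1.581, \qquad s_y = \sqrt{23.2/4} \approx 2.408

\sum (x - \bar{x})(y - \bar{y})
  = (-2)(-3.4) + (-1)(-1.4) + (0)(1.6) + (1)(0.6) + (2)(2.6) = 14

r = \frac{1}{n-1} \sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)
  = \frac{1}{4} \cdot \frac{14}{\sqrt{2.5}\,\sqrt{5.8}} \approx 0.92
```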


Examining a Scatterplot

Two variables are positively correlated when high values of one variable tend to occur with high values of the other, and low values tend to occur with low values. The scatterplot slopes upwards from left to right.

Two variables are negatively correlated when high values of one of the variables tend to occur with low values of the other and vice versa. The scatterplot slopes downwards from left to right.


Types of Correlation

Figure: four scatterplots of y versus x: Negative Linear Correlation (as x increases, y tends to decrease); No Correlation; Positive Linear Correlation (as x increases, y tends to increase); Non-linear Correlation.


Examples of Relationships

Figure: four scatterplots: Health Status Measure versus Income; Health Status Measure versus Age; Education Level versus Age; Mental Health Score versus Physical Health Score.


Thought Question 1

What type of association would the following pairs of variables have: positive, negative, or none?

1. Temperature during the summer and electricity bills

2. Temperature during the winter and heating costs

3. Number of years of education and height

4. Frequency of brushing and number of cavities

5. Number of churches and number of bars in cities

6. Height of husband and height of wife


Thought Question 2

Consider the two scatterplots below. How does the outlier affect the correlation in each plot? Does the outlier increase the correlation, decrease it, or have no impact?


Measuring Strength & Direction of a Linear Relationship

How closely does a non-horizontal straight line fit the points of a scatterplot?

The correlation coefficient (often referred to as just the correlation), r, is a:

- measure of the strength of the relationship: the stronger the relationship, the larger the magnitude of r.
- measure of the direction of the relationship: positive r indicates a positive relationship, negative r indicates a negative relationship.


Correlation Coefficient

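$$ r = \frac{1}{n-1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right) = \frac{1}{n-1} \sum z_x z_y $$

Here Σ (Greek capital letter sigma) denotes summation or addition, x̄ and ȳ are the sample means, s_x and s_y are the sample standard deviations, and z_x = (x - x̄)/s_x and z_y = (y - ȳ)/s_y are the z-scores of each observation.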
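The formula translates almost line for line into code. The sketch below is a minimal Python version, assuming the data arrive as two plain lists of numbers; the function name `pearson_r` is hypothetical, and it relies on the standard library's `statistics.stdev`, which divides by n - 1 just as the formula does.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Hypothetical helper: r = (1/(n-1)) * sum(z_x * z_y), using sample standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # stdev() divides by n - 1
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    # Multiply each pair of z-scores, add them up, divide by n - 1
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(z_products) / (len(xs) - 1)

# Data from the scatterplot example earlier in this chapter
x = [1, 2, 3, 4, 5]
y = [-4, -2, 1, 0, 2]
print(round(pearson_r(x, y), 3))  # 0.919
```

The guard against a zero standard deviation reflects the formula itself: if every x (or every y) is the same, the z-scores involve division by zero and r is not defined.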


Correlation Coefficient

The range of the correlation coefficient is -1 to 1.

If r = -1, there is a perfect negative correlation.

If r = 1, there is a perfect positive correlation.

If r is close to 0, there is no linear correlation.


Linear Correlation

Figure: four scatterplots of y versus x:

- Strong negative correlation: r = -0.91
- Strong positive correlation: r = 0.88
- Weak positive correlation: r = 0.42
- Non-linear correlation: r = 0.07


Correlation Coefficient

Special values for r:

- a perfect positive linear relationship has r = +1
- a perfect negative linear relationship has r = -1
- if there is no linear relationship, or if the scatterplot points are best fit by a horizontal line, then r = 0

Note: r must be between -1 and +1, inclusive.

r > 0: as one variable changes, the other variable tends to change in the same direction

r < 0: as one variable changes, the other variable tends to change in the opposite direction
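A quick numerical check of these special values is sketched below. It uses a hypothetical helper `pearson_r` (the z-score formula for r with sample standard deviations) on made-up points: one set lying exactly on a rising line, one exactly on a falling line, and one symmetric set whose least-squares line is horizontal. The printed values should be 1, -1 and 0 up to floating-point rounding, and every r stays inside [-1, 1].

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Hypothetical helper: r = (1/(n-1)) * sum(z_x * z_y) with sample standard deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]    # exactly on an upward-sloping line
falling = [10 - 3 * v for v in x]  # exactly on a downward-sloping line
flat_fit = [3, 1, 4, 1, 3]         # symmetric about x = 3, so the best-fit line is horizontal

for name, ys in [("rising", rising), ("falling", falling), ("flat_fit", flat_fit)]:
    r = pearson_r(x, ys)
    assert -1 - 1e-12 <= r <= 1 + 1e-12  # r always lies in [-1, 1], up to rounding
    print(name, round(r, 6))
```

Points lying exactly on a horizontal line are a special case: then s_y = 0, the formula divides by zero, and r is undefined rather than 0.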


Examples of Correlations

- Husband’s versus wife’s ages: r = .94
- Husband’s versus wife’s heights: r = .36
- Professional golfers’ putting success, distance of putt in feet versus percent success: r = -.94


Correlation Coefficient

Because r uses the z-scores for the observations, it does not change when we change the units of measurement of x, y, or both.

Correlation ignores the distinction between explanatory and response variables.

r measures the strength of linear association only.

A large value of r does not by itself show that the relationship is linear: a curved pattern can also produce a large r, so always look at the scatterplot.

When r is close to 0, it does not mean that there is no relationship between the variables; it means there is no linear relationship.

Outliers can inflate or deflate correlations.
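Three of these properties are easy to check numerically. The sketch below assumes a hypothetical helper `pearson_r` (the z-score formula for r) and uses made-up ages, incomes and points purely for illustration. It shows that converting years to months and dollars to thousands of dollars leaves r unchanged (any positive linear rescaling does; a negative scale factor would flip the sign), that swapping x and y gives the same r, and that a single extreme point can turn a weak correlation into a strong one.

```python
import math
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Hypothetical helper: r = (1/(n-1)) * sum(z_x * z_y) with sample standard deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Made-up ages and incomes, for illustration only
age_years = [25, 32, 41, 48, 57, 63]
income_dollars = [31000, 42000, 39000, 55000, 61000, 58000]

r_original = pearson_r(age_years, income_dollars)
# Change units: years -> months, dollars -> thousands of dollars
r_new_units = pearson_r([12 * a for a in age_years], [d / 1000 for d in income_dollars])
# Swap the roles of explanatory and response variable
r_swapped = pearson_r(income_dollars, age_years)
print(math.isclose(r_original, r_new_units), math.isclose(r_original, r_swapped))  # True True

# Outlier effect: a weak pattern plus one extreme point
xs, ys = [1, 2, 3, 4, 5], [3, 1, 4, 2, 3]
r_without = pearson_r(xs, ys)
r_with = pearson_r(xs + [20], ys + [20])
print(round(r_without, 2), round(r_with, 2))  # 0.14 0.97
```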


Not all Relationships are Linear: Miles per Gallon versus Speed

Curved relationship (r is misleading)

Speed chosen for each subject varies from 20 mph to 60 mph.

MPG varies from trial to trial, even at the same speed.

Statistical relationship

Figure: scatterplot of miles per gallon versus speed, r = -0.06.
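The same effect appears even without any trial-to-trial variation. The sketch below uses hypothetical, noise-free data in which mileage peaks at 40 mph and falls off symmetrically on either side, and a hypothetical helper `pearson_r` (the z-score formula for r). The correlation with speed itself is about 0, yet mileage is an exact function of speed: its correlation with the squared distance from 40 mph is about -1.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Hypothetical helper: r = (1/(n-1)) * sum(z_x * z_y) with sample standard deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical, noise-free data: mpg = 30 - (speed - 40)**2 / 50
speed = [20, 30, 40, 50, 60]
mpg = [22, 28, 30, 28, 22]

# Linear correlation with speed itself: about 0, despite a perfect curved relationship
print(round(pearson_r(speed, mpg), 6))

# Against the squared distance from 40 mph the relationship is exactly linear: about -1
squared_distance = [(s - 40) ** 2 for s in speed]
print(round(pearson_r(squared_distance, mpg), 6))
```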


Common Errors Involving Correlation

1. Causation: It is wrong to conclude that correlation implies causality.

2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.

3. Linearity: There may be some relationship between x and y even when there is no linear correlation.
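Error 2 (averages) can be illustrated with a small made-up data set: three individuals at each value of x, with a lot of spread within each group. The sketch below, using a hypothetical helper `pearson_r` (the z-score formula for r), computes r once on the individual points and once after replacing each group by its average y. Averaging removes the within-group variation, so the correlation of the averages comes out much higher than the correlation for individuals.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Hypothetical helper: r = (1/(n-1)) * sum(z_x * z_y) with sample standard deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Made-up individual-level data: three people at each value of x
x_ind = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y_ind = [1, 4, 7, 3, 6, 9, 4, 7, 10]

# Replace each group by its average y (group means are 4, 6, 7)
x_grp = [1, 2, 3]
y_grp = [mean([y for x, y in zip(x_ind, y_ind) if x == g]) for g in x_grp]

print(round(pearson_r(x_ind, y_ind), 2))  # 0.45 (individuals)
print(round(pearson_r(x_grp, y_grp), 2))  # 0.98 (group averages)
```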


Correlation and Causation

The fact that two variables are strongly correlated does not in itself imply a cause-and-effect relationship between the variables.

If there is a significant correlation between two variables, you should consider the following possibilities.

1. Is there a direct cause-and-effect relationship between the variables? Does x cause y?


Correlation and Causation

2. Is there a reverse cause-and-effect relationship between the variables? Does y cause x?

3. Is it possible that the relationship between the variables can be caused by a third variable or by a combination of several other variables?

4. Is it possible that the relationship between two variables may be a coincidence?


Example

A survey of the world’s nations in 2004 shows a strong positive correlation between the percentage of each country’s population using cell phones and life expectancy at birth (in years).

a) Does this mean that cell phones are good for your health?

No. It simply means that in countries where cell phone use is high, the life expectancy tends to be high as well.

b) What might explain the strong correlation?

The economy could be a lurking variable. Richer countries generally have more cell phone use and better health care.


Example

The correlation between Age and Income as measured on 100 people is r = 0.75. Explain whether or not each of these conclusions is justified.

a) When Age increases, Income increases as well.

b) The form of the relationship between Age and Income is linear.

c) There are no outliers in the scatterplot of Income vs. Age.

d) Whether we measure Age in years or months, the correlation will still be 0.75.


Example

Explain the mistakes in the statements below:

a) “My correlation of -0.772 between GDP and Infant Mortality Rate shows that there is almost no association between GDP and Infant Mortality Rate”.

b) “There was a correlation of 0.44 between GDP and Continent”

c) “There was a very strong correlation of 1.22 between Life Expectancy and GDP”.


Warnings about Statistical Significance

“Statistical significance” does not imply the relationship is strong enough to be considered “practically important.”

Even weak relationships may be labeled statistically significant if the sample size is very large.

Even very strong relationships may not be labeled statistically significant if the sample size is very small.
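One common way to test whether a sample correlation is statistically significant (H0: no linear correlation in the population, under the usual normality assumptions) is the statistic t = r·sqrt(n - 2)/sqrt(1 - r²), compared against a t distribution with n - 2 degrees of freedom. The sketch below applies it to two illustrative cases; the function name `t_statistic` is hypothetical, and the two critical values are copied from a standard two-sided 5% t table rather than computed. A weak r = 0.1 with n = 1000 gives t ≈ 3.18, well past the cutoff, while a strong r = 0.8 with n = 4 gives t ≈ 1.89, short of it.

```python
import math

def t_statistic(r, n):
    """Hypothetical helper: t = r * sqrt(n - 2) / sqrt(1 - r**2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-sided 5% critical values taken from a t table (assumed, not computed here)
T_CRIT_DF_998 = 1.962  # df = 1000 - 2
T_CRIT_DF_2 = 4.303    # df = 4 - 2

weak_big = t_statistic(0.1, 1000)   # weak relationship, very large sample
strong_small = t_statistic(0.8, 4)  # strong relationship, very small sample

print(round(weak_big, 2), weak_big > T_CRIT_DF_998)         # 3.18 True  -> significant
print(round(strong_small, 2), strong_small > T_CRIT_DF_2)   # 1.89 False -> not significant
```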


Key Concepts

- Strength of Linear Relationship
- Direction of Linear Relationship
- Correlation Coefficient
- Problems with Correlations

r can only be calculated for quantitative data.

