Chapter 4 Describing Relationships: Scatterplots and Correlation

Date post: 19-Dec-2015
Transcript
Page 1

Chapter 4

Describing Relationships: Scatterplots and Correlation

Page 2

Objectives (BPS chapter 4)

Relationships: Scatterplots and correlation

Explanatory and response variables

Displaying relationships: scatterplots

Interpreting scatterplots

Adding categorical variables to scatterplots

Measuring linear association (correlation)

Facts about correlation

Page 3

Scatterplot

A scatterplot is a graph in which paired (x, y) data (usually collected on the same individuals) are plotted with one variable represented on a horizontal (x-) axis and the other variable represented on a vertical (y-) axis. Each individual pair (x, y) is plotted as a single point.

Example:

Page 4

Student   Number of Beers   Blood Alcohol Level
1         5                 0.1
2         2                 0.03
3         9                 0.19
4         8                 0.12
5         3                 0.04
6         7                 0.095
7         3                 0.07
8         5                 0.06
9         3                 0.02
10        5                 0.05
11        4                 0.07
12        6                 0.1
13        5                 0.085
14        7                 0.09
15        1                 0.01
16        4                 0.05

Here we have two quantitative variables for each of 16 students:

1. How many beers they drank, and

2. Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Page 5

Scatterplots

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
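To make the plotting step concrete, here is a minimal sketch that places the 16 (beers, BAC) pairs on a rough character grid, with beers on the horizontal axis and BAC on the vertical axis. The helper `text_scatter` and its grid size are assumptions made for illustration. In practice a plotting library would normally be used.

```python
# (beers, BAC) pairs for students 1 to 16, taken from the table in this chapter.
data = [
    (5, 0.10), (2, 0.03), (9, 0.19), (8, 0.12),
    (3, 0.04), (7, 0.095), (3, 0.07), (5, 0.06),
    (3, 0.02), (5, 0.05), (4, 0.07), (6, 0.10),
    (5, 0.085), (7, 0.09), (1, 0.01), (4, 0.05),
]

def text_scatter(points, rows=10, x_max=9, y_max=0.20):
    """Return a rough character-grid scatterplot (hypothetical helper).

    x runs left to right, y runs bottom to top; each point becomes a '*'.
    Points that fall in the same grid cell share one '*'.
    """
    grid = [[" "] * (x_max + 1) for _ in range(rows + 1)]
    for x, y in points:
        row = round(y / y_max * rows)   # 0 = bottom row, rows = top row
        grid[rows - row][x] = "*"       # flip so large y is printed first
    lines = []
    for i, cells in enumerate(grid):
        y_label = (rows - i) * y_max / rows
        lines.append(f"{y_label:4.2f} |" + " ".join(cells))
    lines.append("     +" + "-" * (2 * x_max + 1))
    lines.append("      " + " ".join(str(x) for x in range(x_max + 1)))
    return "\n".join(lines)

plot = text_scatter(data)
print(plot)
```

The printed grid rises from lower left to upper right, which is the pattern the later slides describe as a positive association.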

Page 6

Explanatory and response variables

A response variable measures or records an outcome of a study. An explanatory variable explains changes in the response variable.

Typically, the explanatory or independent variable is plotted on the x axis and the response or dependent variable is plotted on the y axis.

Explanatory (independent) variable, x: number of beers

Response (dependent) variable, y: blood alcohol content

Page 7

Some plots don’t have clear explanatory and response variables.

Do calories explain sodium amounts?

Does percent return on Treasury bills explain percent return on common stocks?

Page 8

Examining a Scatterplot

You can describe the overall pattern of a scatterplot by its

Form – linear, non-linear (quadratic, exponential, etc.) or no clear pattern.

Direction – positive or negative.

Strength – very strong, strong, moderately strong, weak, etc.

Look for outliers and how they affect the correlation.

Page 9

Scatterplot

Example: Draw a scatterplot for the data below. What is the nature of the relationship between x and y?

x    1    2    3    4    5
y   -4   -2    1    0    2

[Figure: scatterplot of the five (x, y) points]

Answer: Strong, positive and linear.

Page 10

Examining a Scatterplot

Two variables are positively correlated when high values of the variables tend to occur together and low values of the variables tend to occur together. The scatterplot slopes upwards from left to right.

Two variables are negatively correlated when high values of one of the variables tend to occur with low values of the other and vice versa. The scatterplot slopes downwards from left to right.

Page 11

Types of Correlation

[Figure: four scatterplot panels, each with x on the horizontal axis and y on the vertical axis]

Negative Linear Correlation: As x increases, y tends to decrease.

Positive Linear Correlation: As x increases, y tends to increase.

No Correlation

Non-linear Correlation

Page 12

Examples of Relationships

[Figure: four scatterplots: Health Status Measure versus Income, Health Status Measure versus Age, Education Level versus Age, and Mental Health Score versus Physical Health Score]

Page 13

Caution: Relationships require that both variables be quantitative (thus the order of the data points is defined entirely by their value).

Correspondingly, relationships between categorical data are meaningless.

Example: Beetles trapped on boards of different colors

[Figure: two charts of the beetle data, one with board color ordered Blue, White, Green, Yellow and one ordered Blue, Green, White, Yellow]

What association? What relationship?

Describe one category at a time.

Page 14

Thought Question 1

What type of association would the following pairs of variables have – positive, negative, or none?

1. Temperature during the summer and electricity bills

2. Temperature during the winter and heating costs

3. Number of years of education and height (Elementary School)

4. Frequency of brushing and number of cavities

5. Number of churches and number of bars in cities

6. Height of husband and height of wife

Page 15

Thought Question 2

Consider the two scatterplots below. How does the outlier impact the correlation for each plot?

– does the outlier increase the correlation, decrease the correlation, or have no impact?

Page 16

Strength of the association

The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form.

With a strong relationship, you can get a pretty good estimate of y if you know x.

With a weak relationship, for any x you might get a wide range of y values.

Page 17

How to scale a scatterplot

Using an inappropriate scale for a scatterplot can give an incorrect impression.

Both variables should be given a similar amount of space:

• Plot roughly square

• Points should occupy all the plot space (no blank space)

Same data in all four plots

Page 18

Adding categorical variables to scatterplots

Often, things are not simple and one-dimensional. We need to group the data into categories to reveal trends.

What may look like a positive linear relationship is in fact a series of negative linear associations.

Plotting different habitats in different colors allowed us to make that important distinction.
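The habitat effect can be reproduced with a small made-up data set. In the sketch below, the three habitats and all of their values are invented for illustration (they are not the data behind the slide), and `pearson_r` is a hypothetical helper that computes the correlation coefficient r defined later in this chapter.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r; the (n - 1) factors cancel in this form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up (x, y) points, grouped by a categorical variable (habitat).
habitats = {
    "A": [(1, 5), (2, 4), (3, 3)],
    "B": [(4, 8), (5, 7), (6, 6)],
    "C": [(7, 11), (8, 10), (9, 9)],
}

# Correlation inside each habitat: y falls as x rises.
within = {
    name: pearson_r([x for x, _ in pts], [y for _, y in pts])
    for name, pts in habitats.items()
}

# Correlation when the habitat label is ignored and all points are pooled.
pooled = [pt for pts in habitats.values() for pt in pts]
overall = pearson_r([x for x, _ in pooled], [y for _, y in pooled])

print(within)
print(overall)
```

Each habitat on its own has a negative association, yet the pooled points have a positive one. This is why the grouping variable has to be shown on the plot.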

Page 19

Comparison of men’s and women’s racing records over time. Each group shows a very strong negative linear relationship that would not be apparent without the gender categorization.

Relationship between lean body mass and metabolic rate in men and women. While both men and women follow the same positive linear trend, women show a stronger association. As a group, males typically have larger values for both variables.

Page 20

Measuring Strength & Direction of a Linear Relationship

How closely does a non-horizontal straight line fit the points of a scatterplot?

The correlation coefficient (often referred to as just correlation): r

– measure of the strength of the relationship: the stronger the relationship, the larger the magnitude of r.

– measure of the direction of the relationship: positive r indicates a positive relationship, negative r indicates a negative relationship.

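Page 21

Correlation Coefficient

$$ r = \frac{1}{n-1} \sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right) = \frac{1}{n-1} \sum \frac{(x - \bar{x})(y - \bar{y})}{s_x s_y} $$

$\Sigma$ (Greek capital letter sigma) denotes summation or addition.

Here n is the number of (x, y) pairs, $\bar{x}$ and $\bar{y}$ are the means, and $s_x$ and $s_y$ are the standard deviations of x and y.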

Page 22

Example: Find the correlation between x and y

x    1    2    3    4    5
y   -4   -2    1    0    2

x    x - x̄    y    y - ȳ    (x - x̄)(y - ȳ)
1    -2       -4   -3.4     6.8
2    -1       -2   -1.4     1.4
3     0        1    1.6     0
4     1        0    0.6     0.6
5     2        2    2.6     5.2

The products in the last column sum to 14.

$$ \bar{x} = 3, \quad \bar{y} = -0.6 $$

$$ s_x = 1.58, \quad s_y = 2.41 $$

$$ r = \frac{14}{4 \times 1.58 \times 2.41} = 0.9192 $$
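The hand calculation can be checked with a few lines of code. This is a minimal sketch that follows the formula for r step by step. The variable names are chosen here for illustration.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [-4, -2, 1, 0, 2]
n = len(x)

# Means of x and y.
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample standard deviations (divide by n - 1).
s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Products of deviations, one per (x, y) pair: the last column of the table.
products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]

# r = sum of products / ((n - 1) * s_x * s_y)
r = sum(products) / ((n - 1) * s_x * s_y)

print(x_bar, y_bar)
print(round(s_x, 2), round(s_y, 2))
print(round(r, 3))
```

Carrying full precision gives r of about 0.919. The fourth decimal place on the slide depends on using the rounded values 1.58 and 2.41.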

Page 23

Correlation Coefficient

The range of the correlation coefficient is -1 to 1.

If r = -1 there is a perfect negative correlation

If r = 1 there is a perfect positive correlation

If r is close to 0 there is little or no linear correlation
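The three cases can be illustrated with tiny made-up data sets. In this sketch the y values are invented so that the points lie exactly on a rising line, exactly on a falling line, or in a symmetric pattern with no linear trend. `pearson_r` is a hypothetical helper implementing the formula for r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r; the (n - 1) factors cancel in this form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]

rising = [2 * x + 1 for x in xs]     # points exactly on a line with positive slope
falling = [10 - 3 * x for x in xs]   # points exactly on a line with negative slope
no_trend = [2, 5, 1, 5, 2]           # symmetric about x = 3, no linear trend

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_no_trend = pearson_r(xs, no_trend)

print(r_rising, r_falling, r_no_trend)
```

Up to floating-point rounding, the three results are 1, -1 and 0.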

Page 24

Linear Correlation

[Figure: four scatterplot panels of y versus x]

Strong negative correlation: r = -0.91

Strong positive correlation: r = 0.88

Weak positive correlation: r = 0.42

Non-linear correlation: r = 0.07

Page 25

Correlation Coefficient

Special values for r:

– a perfect positive linear relationship would have r = +1

– a perfect negative linear relationship would have r = -1

– if there is no linear relationship, or if the scatterplot points are best fit by a horizontal line, then r = 0

Note: r must be between -1 and +1, inclusive

r > 0: as one variable changes, the other variable tends to change in the same direction

r < 0: as one variable changes, the other variable tends to change in the opposite direction

Page 26

Correlation Coefficient

Because r uses the z-scores for the observations, it does not change when we change the units of measurement of x, y or both.

Correlation ignores the distinction between explanatory and response variables.

r measures the strength of only linear association between variables.

A large value of r does not necessarily mean that there is a strong linear relationship between the variables – the relationship might not be linear; always look at the scatterplot.

When r is close to 0, it does not mean that there is no relationship between the variables, it means there is no linear relationship.

Outliers can inflate or deflate correlations.
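Three of these facts are easy to check numerically. The sketch below reuses the five-point data set from the worked example. The rescaling factors and the added outlier (10, -10) are made up for illustration, and `pearson_r` is a hypothetical helper implementing the formula for r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r; the (n - 1) factors cancel in this form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [-4, -2, 1, 0, 2]
r = pearson_r(x, y)

# Fact 1: changing units (here x * 100 and y / 1000) leaves r unchanged.
r_rescaled = pearson_r([100 * xi for xi in x], [yi / 1000 for yi in y])

# Fact 2: swapping the roles of x and y leaves r unchanged.
r_swapped = pearson_r(y, x)

# Fact 3: one made-up outlier far from the pattern can change r drastically.
r_outlier = pearson_r(x + [10], y + [-10])

print(r, r_rescaled, r_swapped, r_outlier)
```

With this particular outlier the correlation drops from strongly positive to negative. An outlier that lies far along the existing trend would push r the other way.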

Page 27

Not all Relationships are Linear

Miles per Gallon versus Speed

Curved relationship (r is misleading)

Speed chosen for each subject varies from 20 mph to 60 mph

MPG varies from trial to trial, even at the same speed

Statistical relationship

[Figure: scatterplot of miles per gallon versus speed, r = -0.06]
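A small made-up example shows how a clear curved relationship can produce a correlation near zero. The speeds and MPG values below are invented for illustration and are not the data behind the plot. `pearson_r` is a hypothetical helper implementing the formula for r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r; the (n - 1) factors cancel in this form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented values: MPG rises up to 40 mph, then falls symmetrically.
speed = [20, 30, 40, 50, 60]
mpg = [24, 28, 30, 28, 24]

r_curved = pearson_r(speed, mpg)
print(r_curved)
```

Here MPG clearly depends on speed, but the rise and the fall cancel in the sum of products, so r is 0 up to rounding. The scatterplot reveals the pattern that r hides.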

Page 28

Common Errors Involving Correlation

1. Causation: It is wrong to conclude that correlation implies causality.

2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.

3. Linearity: There may be some relationship between x and y even when there is no linear correlation.
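The second error, averages inflating the correlation, can be seen in a small made-up data set. In the sketch below, each x value has three individual y values with a lot of scatter. All numbers are invented for illustration, and `pearson_r` is a hypothetical helper implementing the formula for r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r; the (n - 1) factors cancel in this form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data: x value -> individual y values observed at that x.
groups = {1: [0, 2, 4], 2: [1, 3, 5], 3: [2, 4, 6], 4: [3, 5, 7]}

# Correlation using every individual observation.
ind_x = [x for x, ys in groups.items() for _ in ys]
ind_y = [y for ys in groups.values() for y in ys]
r_individuals = pearson_r(ind_x, ind_y)

# Correlation using only the average y at each x.
avg_x = list(groups)
avg_y = [sum(ys) / len(ys) for ys in groups.values()]
r_averages = pearson_r(avg_x, avg_y)

print(round(r_individuals, 3), round(r_averages, 3))
```

The individual observations give a moderate correlation (about 0.56), while the four averages fall exactly on a line and give r = 1. Averaging removed the individual scatter.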

Page 29

Example

A survey of the world’s nations in 2004 shows a strong positive correlation between the percentage of each country’s population using cell phones and life expectancy in years at birth.

a) Does this mean that cell phones are good for your health?

No. It simply means that in countries where cell phone use is high, the life expectancy tends to be high as well.

b) What might explain the strong correlation?

The economy could be a lurking variable. Richer countries generally have more cell phone use and better health care.

Page 30

Example

The correlation between Age and Income as measured on 100 people is r = 0.75. Explain whether or not each of these conclusions is justified.

a) When Age increases, Income increases as well.

b) The form of the relationship between Age and Income is linear.

c) There are no outliers in the scatterplot of Income vs. Age.

d) Whether we measure Age in years or months, the correlation will still be 0.75.

Page 31

Example

Explain the mistakes in the statements below:

a) “My correlation of -0.772 between GDP and Infant Mortality Rate shows that there is almost no association between GDP and Infant Mortality Rate”.

b) “There was a correlation of 0.44 between GDP and Continent”.

c) “There was a very strong correlation of 1.22 between Life Expectancy and GDP”.

Page 32

Key Concepts

Strength of Linear Relationship

Direction of Linear Relationship

Correlation Coefficient

Common Problems with Correlations

r can only be calculated for quantitative data.

