
BPS - 3rd Ed. Chapter 4: Scatterplots and Correlation

Date post: 29-Dec-2015
Upload: alexia-morton
Transcript
Page 1: BPS - 3rd Ed. Chapter 4: Scatterplots and Correlation

BPS - 3rd Ed. Chapter 4 1

Chapter 4

Scatterplots and Correlation

Page 2:

Variable (X) and Variable (Y)

Prior chapters: one variable at a time

This chapter: relationship between two variables

One variable is an “outcome”: response variable (Y)

The other variable is a “predictor”: explanatory variable (X)

Are X and Y related? X → Y?

Page 3:

Question

A study investigates whether there is a relationship between gross domestic product and life expectancy:

Which is the explanatory variable (X)?

Which is the response variable (Y)?

All other variables that may influence life expectancy are “lurking” and may confound the relation between X and Y. Are there lurking variables in this analysis?

Page 4:

Scatterplot

This chapter considers the case in which both X and Y are quantitative variables

Bivariate data points $(x_i, y_i)$ are plotted on graph paper to form a scatterplot

Page 5:

Example of a scatterplot

X = percent of students taking SAT

Y = mean SAT verbal score

What is the relationship between X and Y?

Page 6:

Interpreting scatterplots

Form: Can the data be described by a straight line? [Linearity]

Direction: Does the line slope upward or downward?

Positive association = above-average values of Y accompany above-average values of X (and vice versa)

Negative association = above-average values of Y accompany below-average values of X (and vice versa)

Strength: Do the data points adhere to an imaginary line?
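
The definition of direction can be checked numerically. The sketch below is only an illustration: the data are made up, and the helper name `deviation_signs` is an assumption, not something from the slides. It counts how many points have X and Y on the same side of their means and how many have them on opposite sides.

```python
from statistics import mean

def deviation_signs(xs, ys):
    """Count points whose deviations from the means agree or disagree in sign."""
    mean_x, mean_y = mean(xs), mean(ys)
    same = opposite = 0
    for x, y in zip(xs, ys):
        product = (x - mean_x) * (y - mean_y)
        if product > 0:
            same += 1       # both above average or both below average
        elif product < 0:
            opposite += 1   # one above average, the other below
    return same, opposite

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [12, 15, 14, 19, 21, 24]

same, opposite = deviation_signs(xs, ys)
print(same, opposite)
```

When most points fall in the "same" count the association is positive; when most fall in the "opposite" count it is negative.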

Page 7:

Form [discuss]

[Figure: four scatterplots. Health Status Measure versus Income; Health Status Measure versus Age; Education Level versus Age; Mental Health Score versus Physical Health Score.]

Page 8:

Strength and direction

Direction: positive, negative or flat

Strength: How closely does a non-horizontal straight line fit the points of a scatterplot?

Close fitting: strong

Loose fitting: weak

Page 9:

Strength cannot be reliably judged visually

These two scatterplots are of the same data (they have the exact same correlation)

The second scatterplot appears to show a stronger correlation, but this is an artifact of the axis scaling

Page 10:

Correlation coefficient (r)

Let r denote the correlation coefficient

r is always between -1 and +1, inclusive

Sign of r denotes direction of association

Special values for r:

r = +1: all points on upward sloping line

r = -1: all points on downward sloping line

r = 0: no linear association (the best-fitting line is horizontal)

The closer r is to +1 or -1, the better the fit of points to the line
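
The special values can be illustrated with a few small data sets. This is a minimal sketch under stated assumptions: the data are invented, the function name `pearson_r` is hypothetical, and the function uses the formula for r that appears on a later slide.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sum of products of standardized deviations, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]

r_up = pearson_r(xs, [2 * x + 1 for x in xs])      # points on an upward line
r_down = pearson_r(xs, [10 - 3 * x for x in xs])   # points on a downward line
r_none = pearson_r(xs, [4, 1, 0, 1, 4])            # symmetric U shape, no linear trend

print(round(r_up, 6), round(r_down, 6), round(r_none, 6))
```

If every y value were identical, `stdev(ys)` would be zero and the division would fail, so r is not defined for points lying exactly on a horizontal line.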

Page 11:

Examples of Correlations

Husband’s versus Wife’s ages: r = .94

Husband’s versus Wife’s heights: r = .36

Professional Golfer’s Putting Success (distance of putt in feet versus percent success): r = -.94

Page 12:

Correlation Coefficient r

Data on variables X and Y for n individuals: $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$

Each variable has a mean and std dev: $(\bar{x}, s_x)$ and $(\bar{y}, s_y)$ (see ch. 2)

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$
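
Substituting the usual definitions of the sample standard deviations, $s_x = \sqrt{\sum (x_i-\bar{x})^2/(n-1)}$ and likewise for $s_y$, shows that the factors of $n-1$ cancel. The rewriting below is a standard algebraic identity, not something stated on the slide:

```latex
r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)
  = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{(n-1)\,s_x\,s_y}
  = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
```

The last form makes it clear that r depends only on the deviations from the two means.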

Page 13:

Correlation coefficient r

The formula for r can be understood by converting data points to standardized scores:

$$r = \frac{1}{n-1}\sum_{i=1}^{n} z_X z_Y \quad \text{where} \quad z_X = \frac{x_i-\bar{x}}{s_x}, \quad z_Y = \frac{y_i-\bar{y}}{s_y}$$

Page 14:

Illustrative example (gdp_life.sav)

Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe

Does GDP predict life expectancy?

Page 15:

Illustrative example (gdp_life.sav)

Country          Per Capita GDP (X)   Life Expectancy (Y)
Austria          21.4                 77.48
Belgium          23.2                 77.53
Finland          20.0                 77.32
France           22.7                 78.63
Germany          20.8                 77.17
Ireland          18.6                 76.39
Italy            21.5                 78.51
Netherlands      22.0                 78.15
Switzerland      23.8                 78.99
United Kingdom   21.2                 77.37

Page 16:

Illustrative example (gdp_life.sav)

[Figure: scatterplot of LIFE_EXP (76.0 to 79.5) against GDP (18 to 24)]

Page 17:

Illustrative example (gdp_life.sav)

Standardized scores $z_X = (x_i - \bar{x})/s_x$ and $z_Y = (y_i - \bar{y})/s_y$, and their products:

x       y        z_X       z_Y       z_X z_Y
21.4    77.48    -0.078    -0.345     0.027
23.2    77.53     1.097    -0.282    -0.309
20.0    77.32    -0.992    -0.546     0.542
22.7    78.63     0.770     1.102     0.849
20.8    77.17    -0.470    -0.735     0.345
18.6    76.39    -1.906    -1.716     3.271
21.5    78.51    -0.013     0.951    -0.012
22.0    78.15     0.313     0.498     0.156
23.8    78.99     1.489     1.555     2.315
21.2    77.37    -0.209    -0.483     0.101

$\bar{x} = 21.52$, $\bar{y} = 77.754$, sum of products = 7.285

$s_x = 1.532$, $s_y = 0.795$

Page 18:

Illustrative example (gdp_life.sav)

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) = \frac{1}{10-1}(7.285) = 0.809$$
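
The hand calculation can be reproduced in a few lines. The sketch below uses the GDP and life expectancy values from the table on the earlier slide; the variable names are assumptions chosen for illustration, and the original analysis presumably used the gdp_life.sav data file rather than code like this.

```python
from statistics import mean, stdev

# Per capita GDP (X) and life expectancy (Y), copied from the slide's table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life_exp = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
mean_x, mean_y = mean(gdp), mean(life_exp)
s_x, s_y = stdev(gdp), stdev(life_exp)   # sample standard deviations (divide by n - 1)

# Standardized scores and their products, one per country.
z_x = [(x - mean_x) / s_x for x in gdp]
z_y = [(y - mean_y) / s_y for y in life_exp]
products = [zx * zy for zx, zy in zip(z_x, z_y)]

total = sum(products)
r = total / (n - 1)

print(f"mean_x = {mean_x:.2f}, s_x = {s_x:.3f}")
print(f"mean_y = {mean_y:.3f}, s_y = {s_y:.3f}")
print(f"sum of products = {total:.3f}, r = {r:.3f}")
```

Rounded to three decimals, the sum of products and r agree with the slide's values of 7.285 and 0.809.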

Page 19:

Interpretation of r

Direction of association: positive or negative

Strength of association: the closer |r| is to 1, the stronger the correlation. Here are guidelines:

0.0 ≤ |r| < 0.3: weak correlation

0.3 ≤ |r| < 0.7: moderate correlation

0.7 ≤ |r| < 1.0: strong correlation

|r| = 1.0: perfect correlation
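
These guidelines translate directly into a small lookup. The function below is a sketch only: the name `describe_strength` is hypothetical, and the cutoffs are the rule-of-thumb values from this slide rather than a universal standard. The sample values of r are the ones quoted elsewhere in the slides.

```python
def describe_strength(r):
    """Label |r| using the slide's cutoffs of 0.3, 0.7, and 1.0."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 1.0:
        return "perfect"
    if size >= 0.7:
        return "strong"
    if size >= 0.3:
        return "moderate"
    return "weak"

# Correlations quoted in the slides: spouses' ages, spouses' heights,
# putting distance versus success, and GDP versus life expectancy.
for r in (0.94, 0.36, -0.94, 0.809):
    direction = "positive" if r > 0 else "negative"
    print(r, describe_strength(r), direction)
```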

Page 20:

Interpretation of r

For the GDP / life expectancy example, r = 0.809. This indicates a strong positive correlation.

[Figure: scatterplot of LIFE_EXP (76.0 to 79.5) against GDP (18 to 24)]

Page 21:

Problems with Correlations

Not all relations are linear

Outliers can have large influence on r

Lurking variables confound relations

Page 22:

Not all Relationships are Linear

Miles per Gallon versus Speed

r ≈ 0 (flat line), but there is a non-linear relation

[Figure: scatterplot of miles per gallon (0 to 35) against speed (0 to 100), with fitted line y = -0.013x + 26.9 and r = -0.06]

Page 23:

Not all Relationships are Linear

Miles per Gallon versus Speed

[Figure: scatterplot of miles per gallon (0 to 35) against speed (0 to 100)]

Curved relationship. r was misleading.
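
A small numeric illustration of how a curved relationship can produce r near zero. The speeds and fuel-economy values below are hypothetical, generated from an inverted-U formula chosen for illustration; they are not the data behind the slide's graph, and `pearson_r` is a hypothetical helper name.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sum of products of standardized deviations, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data: fuel economy peaks at a middle speed and falls off on both sides.
speeds = [10, 20, 30, 40, 50, 60, 70, 80, 90]
mpg = [30 - 0.01 * (s - 50) ** 2 for s in speeds]

r = pearson_r(speeds, mpg)
print(round(r, 3))
```

Here y is completely determined by x, yet r is essentially zero because the rising and falling halves of the curve cancel. This is why a scatterplot should be inspected before r is interpreted.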

Page 24:

Outliers and Correlation

The outlier in the above graph decreases r

If we remove the outlier, the relation is strong
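
The effect of a single outlier can be sketched with made-up numbers. The data below are hypothetical (they are not the points in the slide's graph), and `pearson_r` is a hypothetical helper name. Eight points lie close to a line, and one extra point sits far below it.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sum of products of standardized deviations, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data, invented for illustration only.
base_x = [1, 2, 3, 4, 5, 6, 7, 8]
base_y = [2.0, 2.9, 4.1, 5.0, 6.2, 6.8, 8.1, 9.0]
outlier_x, outlier_y = 8, 1.0   # a point far below the trend

r_without = pearson_r(base_x, base_y)
r_with = pearson_r(base_x + [outlier_x], base_y + [outlier_y])

print(round(r_without, 3), round(r_with, 3))
```

With these numbers the correlation is close to 1 without the outlier and much lower with it, which matches the pattern the slide describes.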

Page 25:

Exercise 4.15: Calories and sodium content of hot dogs

(a) What are the lowest and highest calorie counts? What are the lowest and highest sodium levels?

(b) Positive or negative association?

(c) Any outliers? If we ignore the outlier, is the relation still linear? Does the correlation become stronger?

Page 26:

Exercise 4.13: IQ and school grades

(a) Positive or negative association?

(b) Is the form linear? Does it appear strong?

(c) What are the IQ and GPA of the outlier at the bottom?

