+ All Categories
Home > Documents > Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the...

Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the...

Date post: 17-Dec-2015
Category:
Upload: donald-leonard
View: 217 times
Download: 0 times
Share this document with a friend
Popular Tags:
27
Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions -- Any model has predicted values and residuals. (Do we always want a model with small residuals ? ) -- The “regression effect” (Why did Galton call these things “regressions” ? ) -- Pitfalls: Outliers -- Pitfalls: Extrapolation -- Conditions for a good regression
Transcript
Page 1: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Correlations and scatterplots-- Optical illusion ?-- Finding the marginal distributions in the scatterplots

(shoe size vs. hours of TV)

Regressions-- Any model has predicted values and residuals.

(Do we always want a model with small residuals ? )-- The “regression effect”

(Why did Galton call these things “regressions” ? )-- Pitfalls: Outliers-- Pitfalls: Extrapolation-- Conditions for a good regression

Page 2: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Which looks like a stronger relationship?

-1

1

3

-0.43 0.32 1.07 1.82

X

Y

-6

-4

-2

0

2

4

6

8

-4 -3 -2 -1 0 1 2 3 4 5

X

Y

Page 3: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Mortality vs. Education

9

9.5

10

10.5

11

11.5

12

12.5

13

800 850 900 950 1000 1050 1100 1150

Education

Mortality vs. Education

9

9.5

10

10.5

11

11.5

12

12.5

13

800 850 900 950 1000 1050 1100 1150

Education

Optical Illusion ?

Page 4: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

-1

0

1

2

-1 0 1 2

X

Y

correlation = .97

-1

0

1

2

-2 0 2

X

Y

correlation = .71

Page 5: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Shoe size vs. hours of TV…

Page 6: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Linear models and non-linear models

Model A: Model B:

y = a + bx + error y = a x1/2 + error

Model B has smaller errors. Is it a better model?

Page 7: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

aa opas asl poasie ;aaslkf 4-9043578

y = 453209)_(*_n &*^(*LKH l;j;)(*&)(*& + error

This model has even smaller errors. In fact, zero errors.

Tradeoff: Small errors vs. complexity.

(We’ll only consider linear models.)

Page 8: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

The “Regression” Effect

A preschool program attempts to boost children’s reading scores.

Children are given a pre-test and a post-test.

Pre-test: mean score ≈ 100, SD ≈ 10Post-test: mean score ≈ 100, SD ≈ 10

The program seems to have no effect.

Page 9: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

A closer look at the data shows a surprising result:

Children who were below average on the pre-test tended to gain about 5-10 points on the post-test

Children who were above average on the pre-test tended to lose about 5-10 points on the post-test.

Page 10: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

A closer look at the data shows a surprising result:

Children who were below average on the pre-test tended to gain about 5-10 points on the post-test

Children who were above average on the pre-test tended to lose about 5-10 points on the post-test.

Maybe we should provide the program only for children whose pre-test scores are below average?

Page 11: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Fact:In most test–retest and analogous situations, the

bottom group on the first test will on average tend to improve, while the top group on the first test will on average tend to do worse.

Other examples:• Students who score high on the midterm tend on

average to score high on the final, but not as high.

• An athlete who has a good rookie year tends to slump in his or her second year. (“Sophomore jinx”, "Sports Illustrated Jinx")

• Tall fathers tend to have sons who are tall, but not as tall. (Galton’s original example!)

Page 12: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

80

90

100

110

120

130

80 90 100 110 120

pre-test

post-test

Page 13: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

JPM (vertical axis) vs. DJI (horizontal axis) daily changes

-10.0000

-8.0000

-6.0000

-4.0000

-2.0000

0.0000

2.0000

4.0000

6.0000

8.0000

10.0000

-6 -4 -2 0 2 4 6

DJI

JPM

Page 14: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

JPM (vertical axis) vs. DJI (horizontal axis) daily changes

-10.0000

-8.0000

-6.0000

-4.0000

-2.0000

0.0000

2.0000

4.0000

6.0000

8.0000

10.0000

-6 -4 -2 0 2 4 6

DJI

JPM

Page 15: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

It works the other way, too:

• Students who score high on the final tend to have scored high on the midterm, but not as high.

• Tall sons tend to have fathers who are tall, but not as tall.

• Students who did well on the post-test showed improvements, on average, of 5-10 points, while students who did poorly on the post-test dropped an average of 5-10 points.

Page 16: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Students can do well on the pretest…-- because they are good readers, or-- because they get lucky.

The good readers, on average, do exactly as well on the post-test. The lucky group, on average, score lower.

Students can get unlucky, too, but fewer of that group are among the high-scorers on the pre-test.

So the top group on the pre-test, on average, tends to score a little lower on the post-test.

Page 17: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Outliers

http://www.whfreeman.com/scc/con_index.htm?99spt

(W. H. Freeman, publishers)

Page 18: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Extrapolation

Interpolation: Using a model to estimate Yfor an X value within the range on which the model was based.

Extrapolation: Estimating based on an X value outside the range.

Page 19: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Extrapolation

Interpolation: Using a model to estimate Yfor an X value within the range on which the model was based.

Extrapolation: Estimating based on an X value outside the range.

Interpolation Good, Extrapolation Bad.

Page 20: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Nixon’s Graph:Economic Growth

Page 21: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Nixon’s Graph:Economic Growth

Start ofNixon Adm.

Page 22: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Nixon’s Graph:Economic Growth

Now

Start ofNixon Adm.

Page 23: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Nixon’s Graph:Economic Growth

Now

Start ofNixon Adm. Projectio

n

Page 24: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Conditions for regression

“Straight enough” condition (linearity)

Errors are mostly independent of X

Errors are mostly independent of anything else you can think of

Errors are more-or-less normally distributed

Page 25: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

How to test the quality of a regression—

Plot the residuals.Pattern bad, no pattern good

R2

How sure are you of the coefficients ?

Page 26: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

6

8

10

4 6 8 10 12

x1

y1

4.5

6.0

7.5

9.0

4 6 8 10 12

x2

y2

6

8

10

12

4 6 8 10 12

x3

y3

6

8

10

12

10.0 12.5 15.0 17.5

x4

y4

Page 27: Correlations and scatterplots -- Optical illusion ? -- Finding the marginal distributions in the scatterplots (shoe size vs. hours of TV) Regressions --

Computing correlation…

1. Replace each variable with its standardized version.

2. Take an “average” of ( xi’ times yi’ ):

' ( ) /

' ( ) /i i x

i i y

x x x s

y y y s

' '

1i ix y

rn


Recommended