
1 Regression & Correlation (1) 1.A relationship between 2 variables X and Y 2.The relationship seen...

Date post: 13-Dec-2015
Upload: tyrone-poole

Regression & Correlation (1)

1. A relationship between 2 variables X and Y
2. The relationship seen as a straight line
3. Two problems
4. How can we tell if our regression line is useful?
5. Test of hypothesis about the slope, β1
6. Correlation
7. Useful features of r
8. Test of hypothesis about ρ
9. Examples

A relationship between two variables X & Y

We often have pairs of scores for a given set of cases. For example, we might have:

* # of years of education and annual income, or
* IQ and GPA, or
* income and # of books in the household

More generally, we have any X and Y, and our question is, does knowing something about X tell us anything about Y?

For example, knowing how many years of education a person has, could you usefully estimate their annual income, or the number of cigarettes they smoke in a year?

Often, the answer to that question is, Yes – there is a relationship between the X and Y scores you have measured.

* On average, as number of years of education goes up (across a set of people), number of cigarettes smoked per year goes down.

In the graph on the next slide, we see two things:

1. Y goes down as X goes up.

2. At each value of X, there is some variability in Y – but substantially less than there is in Y overall.

[Figure: scatterplot of Y = cigarettes per year against X = years of education.]

Note that the range of the Y values for this value of X is small, compared to the whole range of Y in the data set.

The relationship seen as a straight line

The relationship between an X and a Y can be described using the equation for a straight line.

Y = β0 + β1X + ε

where β0 is the Y-intercept, β1 is the slope, and ε is the error.

Note: this is the (theoretical) population equation relating Y to X.

Two problems

Y = β0 + β1X + ε

In principle, this equation would let us predict the value of Y for a given X without error IF

A. X were the only variable that influenced Y
* Usually, it isn’t

B. We knew the population values of β0 and β1
* Usually, we don’t

Be sure to distinguish between

A. Actual values of Y in the population.
B. Values of Y we would predict using Y = β0 + β1X + ε if we had the population values for β0 and β1.
C. Values of Y we predict on the basis of the X-Y relationship in our sample data:

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$

Why no ε here?

When we predict Y on the basis of X for a given case, two things can cause the predicted values to be different from the values we would find if we actually measured Y for that case:

1. We don’t know the population values of β0 and β1 – only the sample values $\hat{\beta}_0$ and $\hat{\beta}_1$.

Note that if we did know β0 & β1, this source of error would disappear.

2. In the population, Y is not uniquely determined by X. As a result, for each value of X, there is a distribution of Y values.
* Relative to our predicted Y for a given value of X, the observed values of Y will sometimes be higher and sometimes be lower.
* These “errors” are random – over the long term, they will cancel each other out.
* But even if we knew β0 and β1, this source of error would still exist.

In other words

1. We don’t have population values for the slope and the intercept of the line relating X to Y. That’s one problem.

2. Even if we had population values for the slope and the intercept, the equation relating X to Y would still not perfectly predict Y. That’s the other problem.
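The two problems can be illustrated with a small simulation. This is only a sketch: the population values `BETA0`, `BETA1` and `SIGMA` are hypothetical numbers invented for the illustration (they are not from the slides), the errors are assumed to be normally distributed, and the sample line is fitted by ordinary least squares.

```python
import random

# Hypothetical population values, chosen only for illustration.
BETA0, BETA1, SIGMA = 9000.0, -400.0, 800.0

def draw_sample(n, seed):
    """Draw n (X, Y) pairs from Y = BETA0 + BETA1*X + random error."""
    rng = random.Random(seed)
    xs = [rng.uniform(8, 20) for _ in range(n)]
    ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Least-squares intercept and slope computed from a sample."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1

def mean_abs_error(xs, ys, b0, b1):
    """Average absolute gap between observed Y and the line b0 + b1*X."""
    return sum(abs(y - (b0 + b1 * x)) for x, y in zip(xs, ys)) / len(xs)

# Problem 1: every sample gives its own estimates of the intercept and slope.
for seed in (1, 2, 3):
    xs, ys = draw_sample(30, seed)
    b0_hat, b1_hat = fit_line(xs, ys)
    print(f"sample {seed}: intercept estimate {b0_hat:.1f}, slope estimate {b1_hat:.1f}")

# Problem 2: even the true population line does not hit the observed Y values.
xs, ys = draw_sample(30, seed=1)
print("mean absolute error using the true line:",
      round(mean_abs_error(xs, ys, BETA0, BETA1), 1))
```

The first loop shows the estimates moving from sample to sample; the last line shows that the error does not vanish even when the population intercept and slope are used.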

How can we tell if our regression line is useful?

The line is useful if the predicted values of Y are close to the observed values of Y (in the sample).

We use our sample X and Y values to compute the regression line, $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$.

We then use this line to predict the same Y values, and compare our predicted values with the observed values in the sample data. If the prediction is good, we can then use the regression line to predict Y for values of X not in our sample.

$$Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i) \qquad (\text{since } \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i)$$

Therefore, the sum of the squared deviations of predicted Y values from actual Y values is:

$$SSE = \sum \left[ Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i) \right]^2$$

Now $\hat{\beta}_0$ and $\hat{\beta}_1$ are the “least squares estimators” of β0 and β1 – giving smaller SSE than any other values of $\hat{\beta}_0$ and $\hat{\beta}_1$ would.
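The least-squares property can be checked numerically. In the sketch below the five data points are hypothetical (invented for the illustration), and the values 2.2 and 0.6 are the least-squares intercept and slope for those points. The code compares the SSE of that line with the SSE of a grid of nearby lines.

```python
def sse(xs, ys, b0, b1):
    """Sum of squared deviations of observed Y from the line b0 + b1*X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical sample, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares estimates for this sample:
# slope = SSxy / SSxx = 6 / 10, intercept = mean(Y) - slope * mean(X) = 4 - 0.6 * 3.
b0_hat, b1_hat = 2.2, 0.6
best = sse(xs, ys, b0_hat, b1_hat)

# Try a grid of nearby intercepts and slopes; none should give a smaller SSE.
shifts = [-0.5, -0.1, 0.0, 0.1, 0.5]
others = [
    sse(xs, ys, b0_hat + d0, b1_hat + d1)
    for d0 in shifts
    for d1 in shifts
    if (d0, d1) != (0.0, 0.0)
]

print("SSE at the least-squares estimates:", round(best, 4))
print("smallest SSE among the perturbed lines:", round(min(others), 4))
```

A grid search is used here only to make the point visible; it is not how the estimates are computed in practice.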

[Figure: plot of Y against X, with a line at the mean of Y.]

When there is no relation between X and Y, the best estimator of the Y value for any case is the mean, $\bar{Y}$.

Notice that the slope of this line is zero!

If X is completely unrelated to Y, the best estimate we could make of Y would be the mean, $\bar{Y}$, for any value of X.

We find out whether our regression line is useful by asking whether its slope is different from 0.

H0: β1 = 0 [Why not $\hat{\beta}_1$?]

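To test that null hypothesis, we use the fact that $\hat{\beta}_1$ is one slope taken from the sampling distribution of $\hat{\beta}_1$.

$$\hat{\beta}_1 = \frac{SS_{XY}}{SS_{XX}} \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

where

$$SS_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n}$$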
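As a rough sketch, the slope and intercept formulas translate directly into code. The function names and the five data points below are hypothetical, chosen only for illustration. Both forms of SSXY are computed so that their agreement can be checked.

```python
def ss_xy_definitional(xs, ys):
    """SSxy as the sum of cross-products of deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

def ss_xy_computational(xs, ys):
    """SSxy from raw sums: sum(XY) - sum(X) * sum(Y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

def least_squares(xs, ys):
    """Return (intercept estimate, slope estimate) for the sample."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1_hat = ss_xy_definitional(xs, ys) / ss_xx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat

# Hypothetical sample, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print("SSxy (definitional):  ", ss_xy_definitional(xs, ys))
print("SSxy (computational): ", ss_xy_computational(xs, ys))
print("intercept, slope:     ", least_squares(xs, ys))
```

The computational form needs only running totals, which is why the worked examples start from ΣX, ΣY, ΣX², ΣY² and ΣXY.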

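$$SS_{XX} = \sum (X_i - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$

(n = sample size)

For the sampling distribution of $\hat{\beta}_1$:

The mean = β1, and the standard deviation is

$$\sigma_{\hat{\beta}_1} = \frac{\sigma}{\sqrt{SS_{XX}}}$$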

We estimate $\sigma_{\hat{\beta}_1}$ by

$$s_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{XX}}}$$

where

$$s = \sqrt{\frac{SSE}{n-2}}$$
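A minimal sketch of this estimate in code, assuming SSE, n and SSXX have already been computed. The function names and the input numbers (SSE = 2.4, n = 5, SSXX = 10) are hypothetical and serve only as an illustration.

```python
import math

def residual_std_dev(sse, n):
    """s = sqrt(SSE / (n - 2)), the estimate of the error standard deviation."""
    if n <= 2:
        raise ValueError("need more than 2 cases to estimate s")
    return math.sqrt(sse / (n - 2))

def slope_standard_error(sse, n, ss_xx):
    """Estimated standard deviation of the sampling distribution of the slope."""
    return residual_std_dev(sse, n) / math.sqrt(ss_xx)

# Hypothetical inputs, invented for illustration.
print("s              =", round(residual_std_dev(2.4, 5), 4))
print("s of the slope =", round(slope_standard_error(2.4, 5, 10.0), 4))
```

Dividing by n - 2 rather than n reflects the two quantities (intercept and slope) already estimated from the same sample.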

Test of hypothesis about the slope, β1

Since σ is unknown, we use t to test H0:

One-tailed test:
H0: β1 = 0
HA: β1 < 0 (or β1 > 0)

Two-tailed test:
H0: β1 = 0
HA: β1 ≠ 0

Test statistic:

$$t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}}$$

Rejection region:

One-tailed test: $t_{obt} < -t_{\alpha}$ (or $t_{obt} > t_{\alpha}$ when HA: β1 > 0)

Two-tailed test: $|t_{obt}| > t_{\alpha/2}$

tcrit is based on n-2 degrees of freedom.
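The decision rule can be sketched as two small functions. The names are hypothetical, and the critical value is assumed to be looked up in a t table for n-2 degrees of freedom (at α for a one-tailed test, α/2 for a two-tailed test) and passed in as a positive number.

```python
def slope_t(b1_hat, s_b1, beta1_null=0.0):
    """Test statistic: (estimated slope - hypothesized slope) / standard error."""
    return (b1_hat - beta1_null) / s_b1

def reject_h0(t_obt, t_crit, alternative="two-sided"):
    """Apply the rejection region; t_crit is the positive table value."""
    if alternative == "two-sided":   # HA: slope != 0
        return abs(t_obt) > t_crit
    if alternative == "greater":     # HA: slope > 0
        return t_obt > t_crit
    if alternative == "less":        # HA: slope < 0
        return t_obt < -t_crit
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

# Hypothetical numbers: slope estimate 0.6, standard error 0.2828, n = 5.
# With 3 degrees of freedom the two-tailed table value at alpha = .05 is 3.182.
t_obt = slope_t(0.6, 0.2828)
print("t obtained:", round(t_obt, 3))
print("reject H0 (two-tailed):", reject_h0(t_obt, 3.182))
```

Keeping the table lookup outside the function avoids depending on a statistics library; a library routine for t quantiles could replace it.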

Correlation

The Pearson Correlation coefficient r is a numerical, descriptive measure of the strength and direction of relationship between two variables X and Y.

$$r = \frac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}}$$

r gives much the same information as $\hat{\beta}_1$. However, r is “scale-less” and −1 ≤ r ≤ 1.
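To see what “scale-less” means in practice, the sketch below computes r and the slope for a small hypothetical data set (invented for illustration), then re-expresses X in different units. The function names are assumptions of this example.

```python
import math

def sums_of_squares(xs, ys):
    """Return (SSxy, SSxx, SSyy) from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy, ss_xx, ss_yy

def pearson_r(xs, ys):
    ss_xy, ss_xx, ss_yy = sums_of_squares(xs, ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

def slope(xs, ys):
    ss_xy, ss_xx, _ = sums_of_squares(xs, ys)
    return ss_xy / ss_xx

# Hypothetical sample: X in years, then the same X expressed in months.
xs_years = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
xs_months = [12 * x for x in xs_years]

print("years : r =", round(pearson_r(xs_years, ys), 4),
      " slope =", round(slope(xs_years, ys), 4))
print("months: r =", round(pearson_r(xs_months, ys), 4),
      " slope =", round(slope(xs_months, ys), 4))
```

Changing the units of X changes the slope by the same factor but leaves r untouched.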

Useful features of r

r indexes the X-Y relationship:

r > 0 means Y increases as X increases

r < 0 means Y decreases as X increases

r = 0 means there is no linear relationship between X & Y

r is the sample correlation coefficient. We can use it to estimate rho (ρ), the population correlation coefficient, and use r to test H0: ρ = 0

Test of hypothesis about ρ

One-tailed test:
H0: ρ = 0
HA: ρ < 0 (or ρ > 0)

Two-tailed test:
H0: ρ = 0
HA: ρ ≠ 0

Test statistic:

$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

tcrit has n-2 degrees of freedom.
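A short sketch of the test statistic as a function, with ρ set to 0 as stated in H0. The function name and the input values (r = 0.6, n = 18) are hypothetical, picked so the arithmetic is easy to follow.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 cases")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return (r - 0.0) / math.sqrt((1 - r ** 2) / (n - 2))

# Hypothetical values: (1 - 0.36) / 16 = 0.04, whose square root is 0.2.
print("t =", round(correlation_t(0.6, 18), 3))
```

For a fixed r, a larger n makes the denominator smaller and so makes t larger, which is why the same correlation can be significant in a large sample and not in a small one.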

Example 1

H0: ρ = 0

HA: ρ ≠ 0

Test statistic:

$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

tcrit = t(5, α/2 = .025) = 2.571.

Example 1 – Sum formulas

First, calculations involving X:

ΣX = 74    (ΣX)² = 5476    ΣX² = 922

Then, analogous calculations involving Y:

ΣY = 82    (ΣY)² = 6724    ΣY² = 1076

Then, calculations involving X and Y:

ΣXY = 976

Example 1 – Sums of squares formulas

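$$SS_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n}$$

$$SS_{XX} = \sum (X_i - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$

$$SS_{YY} = \sum (Y_i - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}$$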

Example 1 – calculate r

SSXY = 109.143

SSXX = 139.71

SSYY = 115.429

$$r = \frac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}} = .859$$
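For readers who want the intermediate arithmetic, the values above can be reproduced from the sums given earlier in the example. The sample size is taken to be n = 7, which is an inference from the 5 degrees of freedom used for tcrit rather than something stated in the example itself.

$$SS_{XY} = 976 - \frac{(74)(82)}{7} = 976 - 866.857 = 109.143$$

$$SS_{XX} = 922 - \frac{5476}{7} = 922 - 782.286 = 139.714$$

$$SS_{YY} = 1076 - \frac{6724}{7} = 1076 - 960.571 = 115.429$$

$$r = \frac{109.143}{\sqrt{(139.714)(115.429)}} = \frac{109.143}{126.992} = .859$$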

Example 1 – do t-test

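$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

$$t = \frac{.859 - 0}{\sqrt{\dfrac{1 - .738}{5}}} = \frac{.859}{.229} = 3.751$$

Since t = 3.751 exceeds tcrit = 2.571, reject H0: a significant correlation exists.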

Example 2

H0: ρ = 0

HA: ρ > 0

Test statistic:

$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

tcrit = t(7-2 = 5, α = .05) = 2.015

Note: the symbol ρ in these formulas is the Greek letter rho, NOT the English letter P.

Example 2 – Sum formulas

First, calculations involving X:

ΣX = 4.2    (ΣX)² = 17.64    ΣX² = 2.86

Then, analogous calculations involving Y:

ΣY = 32    (ΣY)² = 1024    ΣY² = 161.5

Then, calculations involving X and Y:

ΣXY = 21.35

Example 2 – calculate r

$$SS_{XY} = 21.35 - \frac{(4.2)(32)}{7} = 2.15$$

$$SS_{XX} = 2.86 - \frac{17.64}{7} = .34$$

$$SS_{YY} = 161.5 - \frac{1024}{7} = 15.2143$$

$$r = \frac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}} = \frac{2.15}{\sqrt{(.34)(15.2143)}} = .945$$

Example 2 – do t-test

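$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

$$t = \frac{.945 - 0}{\sqrt{\dfrac{1 - .893}{5}}} = \frac{.945}{.146} = 6.48$$

Since t = 6.48 exceeds tcrit = 2.015, reject H0: a significant correlation exists.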
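Both examples can be re-run from their sums with a few lines of code. This is a sketch rather than the author's procedure: the function name is hypothetical, n = 7 is assumed for Example 1 (implied by its 5 degrees of freedom), and the critical values 2.571 and 2.015 are the ones quoted in the examples. Because the code carries full precision instead of rounding r first, its t for Example 1 comes out near 3.76 rather than the hand-computed 3.751.

```python
import math

def r_and_t_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Compute r and the t statistic for H0: rho = 0 from raw sums."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    t = r / math.sqrt((1 - r ** 2) / (n - 2))
    return r, t

# Example 1: two-tailed test, critical value 2.571 (n = 7 is assumed).
r1, t1 = r_and_t_from_sums(7, 74, 82, 922, 1076, 976)
print(f"Example 1: r = {r1:.3f}, t = {t1:.3f}, reject H0: {abs(t1) > 2.571}")

# Example 2: upper-tailed test (HA: rho > 0), critical value 2.015.
r2, t2 = r_and_t_from_sums(7, 4.2, 32, 2.86, 161.5, 21.35)
print(f"Example 2: r = {r2:.3f}, t = {t2:.3f}, reject H0: {t2 > 2.015}")
```

The conclusions match the hand calculations in both cases; only the third decimal of t is sensitive to when the rounding happens.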

