Objective Find the line of regression. Use the Line of Regression to Make Predictions.

Date post: 20-Jan-2016
Upload: alberta-marshall
Page 1: Objective Find the line of regression. Use the Line of Regression to Make Predictions.

Objective: Find the line of regression. Use the line of regression to make predictions.

Page 2

Relevance: To be able to find a model that best represents quantitative data with 2 variables and use it to make predictions.

Page 3

2-Variable Statistics

A better alternative to "storing" numbers!

Page 4

2-Variable Statistics

Now that we have used one-variable statistics to "store" our necessary numbers, let's learn another way that's even better.

Page 5

Find the mean and standard deviation of the x's and y's using 2-var stats.

x     y
21    6
18    9
30    3
35    4

Page 6

Find the mean and standard deviation of the x's and y's using 2-var stats.

x     y
21    6
18    9
30    3
35    4

Use this when using your lists to find r.
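For readers without a calculator at hand, the 2-var stats summary for this small data set can be sketched in Python. This is only an illustration: the variable names are made up here, and it assumes the sample standard deviation (dividing by n - 1) is the one wanted.

```python
from statistics import mean, stdev

# Data from the slide
x = [21, 18, 30, 35]
y = [6, 9, 3, 4]

# Means of each variable
mean_x, mean_y = mean(x), mean(y)

# Sample standard deviations (statistics.stdev divides by n - 1)
sd_x, sd_y = stdev(x), stdev(y)

print(f"x: mean = {mean_x}, s = {sd_x:.3f}")
print(f"y: mean = {mean_y}, s = {sd_y:.3f}")
```

These four numbers are the ingredients needed later for z-scores, r, and the slope of the regression line.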

Page 7

Find the correlation coefficient:

x     y
4     6
8     15
15    22
19    18
22    27

$r = \frac{3.599887921}{4} = 0.900$

Page 8

Find the correlation coefficient:

x     y
32    27
40    82
30    34
18    14
15    1
25    22

$r = \frac{4.558674571}{5} = 0.912$

Page 9

Find the correlation coefficient:

x     y
2     72
8     60
10    64
14    52
28    43
32    40
18    32

$r = \frac{-4.868894211}{6} = -0.811$
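The three correlation exercises on Pages 7 to 9 all follow the same recipe: standardize x and y, multiply the z-scores pairwise, add the products, and divide by n - 1. A minimal Python sketch of that recipe, with a hypothetical helper name `correlation`, might look like this:

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Return r = sum(z_x * z_y) / (n - 1), using sample standard deviations."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Data sets from the three slides
datasets = {
    "Page 7": ([4, 8, 15, 19, 22], [6, 15, 22, 18, 27]),
    "Page 8": ([32, 40, 30, 18, 15, 25], [27, 82, 34, 14, 1, 22]),
    "Page 9": ([2, 8, 10, 14, 28, 32, 18], [72, 60, 64, 52, 43, 40, 32]),
}

results = {name: correlation(xs, ys) for name, (xs, ys) in datasets.items()}
for name, r in results.items():
    print(f"{name}: r = {r:.3f}")
```

The numerators shown on the slides (3.5998..., 4.5586..., -4.8688...) are the sums of the z-score products before dividing by n - 1.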

Page 10

A student wonders if tall women tend to date taller men than do short women. She measures herself, her dormitory roommate, and the women in the adjoining rooms. Then she measures the next man each woman dates. Draw & discuss the scatterplot and calculate the correlation coefficient.

Women (x)   Men (y)
66          72
64          68
66          70
65          68
70          71
65          65

Page 11

A student wonders if tall women tend to date taller men than do short women. She measures herself, her dormitory roommate, and the women in the adjoining rooms. Then she measures the next man each woman dates. Draw & discuss the scatterplot and calculate the correlation coefficient.

Women (x)   Men (y)   z_x        z_y        z_x * z_y
66          72        0          1.1859     0
64          68        -0.9535    -0.3953    0.3769
66          70        0          0.3953     0
65          68        -0.4767    -0.3953    0.1884
70          71        1.9069     0.7906     1.5076
65          65        -0.4767    -1.581     0.7538

$r = \frac{2.826668855}{5} = 0.565$
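The z-score table for the height data can be rebuilt step by step. The sketch below is an illustration only (the names `rows`, `zx`, and `zy` are invented here); it standardizes each height, forms the product column, and divides the column total by n - 1.

```python
from statistics import mean, stdev

women = [66, 64, 66, 65, 70, 65]  # x, heights in inches
men = [72, 68, 70, 68, 71, 65]    # y, heights in inches

mean_w, sd_w = mean(women), stdev(women)
mean_m, sd_m = mean(men), stdev(men)

# One row per couple: x, y, z_x, z_y, z_x * z_y
rows = []
for w, m in zip(women, men):
    zx = (w - mean_w) / sd_w
    zy = (m - mean_m) / sd_m
    rows.append((w, m, zx, zy, zx * zy))

for w, m, zx, zy, prod in rows:
    print(f"{w:3d} {m:3d} {zx:8.4f} {zy:8.4f} {prod:8.4f}")

# Correlation: sum of the product column divided by n - 1
product_sum = sum(row[4] for row in rows)
r = product_sum / (len(rows) - 1)
print(f"sum of products = {product_sum:.4f}, r = {r:.3f}")
```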

Page 12

Linear Regression

Page 13

Guess the correlation coefficient

http://istics.net/stat/Correlations/

Page 14

Can we make a Line of Best Fit?

Want:
1) The distances to the line to be the same.
2) The smallest distances.

Page 15

Regression Line

When a scatterplot shows a linear relationship, we'd like to summarize the overall pattern by drawing a line on the scatterplot.

A regression line summarizes the relationship between two variables, but only in a specific setting: when one of the variables helps explain or predict the other.

Regression, unlike scatter plots, REQUIRES that we have an explanatory variable and a response variable.

Page 16

Regression Line

This is a line that describes how a response variable (y) changes as an explanatory variable (x) changes.

It's used to predict the value of (y) for a given value of (x).

The regression line is a model for the data.

Page 17

Let's try some!

http://illuminations.nctm.org/ActivityDetail.aspx?ID=146

Page 18

Regression Line

When given the response variable (y) and the explanatory variable (x), the regression line relating y to x has an equation of the following form:

$\hat{y} = b_0 + b_1 x$

Predicted value: ($\hat{y}$) - the predicted value of y for a given value of x.

y-intercept: ($b_0$) - the predicted value of y when x is 0.

Slope: ($b_1$) - the amount by which y is predicted to change when x increases by 1 unit.

Page 19

The following data shows the number of miles driven and advertised price for 11 used Honda CR-Vs from the 2002-2006 model years (prices found at www.carmax.com). The scatterplot below shows a strong, negative linear association between number of miles and advertised cost. The correlation is -0.874. The line on the plot is the regression line for predicting advertised price based on number of miles.

Thousand Miles Driven   Cost (dollars)
22                      17998
29                      16450
35                      14998
39                      13998
45                      14599
49                      14988
55                      13599
56                      14599
69                      11998
70                      14450
86                      10998

[Scatterplot: Cost versus ThousandMilesDriven, with the fitted line Cost = 1.88e+04 - 86.2 ThousandMilesDriven]

Page 20

Use the regression line to answer the following.

$\widehat{\text{cost}} = 18773 - 86.18(\text{thousand miles driven})$

Slope: The predicted price of the car decreases by $86.18 for every additional thousand miles driven.

y-intercept: The predicted cost ($18,773) of a used 2002-2006 Honda CR-V with 0 miles.

Page 21

Predict the price for a Honda with 50,000 miles. (Use 50 in equation!)

$\widehat{\text{cost}} = 18773 - 86.18(50) = 14464$

The predicted price is $14,464.

Page 22

Extrapolation

This refers to using a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line.

Such predictions are not usually very accurate.

Should we predict the asking price for a used 2002-2006 Honda CR-V with 250,000 miles?

No! We only have data for cars with between 22,000 and 86,000 miles. We don't know if the linear pattern will continue beyond these values. In fact, if we did predict the asking price for a car with 250 thousand miles, it would be -$2772!
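The two CR-V predictions can be checked with a few lines of Python. This is a sketch under the assumption that the fitted line is exactly cost = 18773 - 86.18 * (thousand miles), as stated on the slides; the function name and the range check are illustrative additions.

```python
# Range of the explanatory variable in the data (thousands of miles)
MIN_MILES, MAX_MILES = 22, 86


def predicted_cost(thousand_miles):
    """Predicted advertised price in dollars from the slide's regression line."""
    return 18773 - 86.18 * thousand_miles


for miles in (50, 250):
    inside = MIN_MILES <= miles <= MAX_MILES
    note = "" if inside else "  <- extrapolation, outside 22 to 86"
    print(f"{miles} thousand miles: ${predicted_cost(miles):,.0f}{note}")
```

The negative price at 250 thousand miles is the signal that the line is being used far outside the data it was fitted to.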

Page 23

Slope: 40. The rat's predicted weight increases by 40 grams per week.

Y-int: 100. The rat is predicted to weigh 100 grams at birth.

Predict weight after 16 wk:

$\hat{y} = 100 + 40(16) = 740 \text{ grams}$

Predict weight at 2 years: The prediction is unreasonable and is a result of extrapolation.
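To see why the 2-year prediction is unreasonable, the line can be evaluated at both times. The sketch assumes the line is weight = 100 + 40 * weeks (from the slope and intercept on this slide) and that 2 years is taken as 104 weeks; both the function name and that conversion are assumptions made here.

```python
def predicted_weight(weeks):
    """Predicted rat weight in grams: intercept 100 g, slope 40 g per week."""
    return 100 + 40 * weeks


weight_16_weeks = predicted_weight(16)
weight_2_years = predicted_weight(2 * 52)  # assumption: 2 years = 104 weeks

print(f"16 weeks: {weight_16_weeks} grams")
print(f"2 years:  {weight_2_years} grams")
```

A predicted weight of several kilograms for a rat shows what happens when the line is extended far beyond the weeks that were actually observed.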

Page 24

Residual

A residual is the difference between an observed value of the response variable and the value predicted by the regression line.

residual = observed y - predicted y

residual = $y - \hat{y}$

Page 25

Example

The equation of the least-squares regression line for the sprint time and long-jump distance data is

$\widehat{\text{distance}} = 304.56 - 27.63(\text{sprint time})$

Find and interpret the residual for the student who had a sprint time of 8.09 seconds.

$\widehat{\text{distance}} = 304.56 - 27.63(8.09) = 81.03 \text{ inches}$

This student jumped 69.97 inches farther than we expected based on his sprint time.

Page 26

Regression

Let's see how a regression line is calculated.

Page 27

Fat vs Calories in Burgers

Fat (g)   Calories
19        410
31        580
34        590
35        570
39        640
39        680
43        660

Page 28

Let's standardize the variables

Fat   Cal   z - x's   z - y's
19    410   -1.959    -2
31    580   -0.42     -0.1
34    590   -0.036    0
35    570   0.09      -0.2
39    640   0.6       0.56
39    680   0.6       1
43    660   1.12      0.78

The line must contain the point $(\bar{x}, \bar{y})$ and pass through the origin. (In z-score units, the point $(\bar{x}, \bar{y})$ is the origin.)

Page 29

Let's clarify a little. (Just watch & listen)

The equation for a line that passes through the origin can be written with just a slope & no intercept:

$y = mx$

But we're using z-scores, so our equation should reflect this, and thus it's

$\hat{z}_y = m z_x$

Many lines with different slopes pass through the origin. Which one fits our data the best? That is, which slope determines the line that minimizes the sum of the squared residuals?

Page 30

Line of Best Fit: Least Squares Regression Line

It's the line for which the sum of the squared residuals is smallest. We want to find the mean squared residual.

Focus on the vertical deviations from the line.

Residual = Observed - Predicted

Page 31

Let's find it. (just watch & soak it in)

Note: MSR is "Mean Squared Residual"

$$
\begin{aligned}
MSR &= \frac{\sum (z_y - \hat{z}_y)^2}{n-1} \\
    &= \frac{\sum (z_y - m z_x)^2}{n-1} && \text{since } \hat{z}_y = m z_x \\
    &= \frac{\sum (z_y^2 - 2 m z_x z_y + m^2 z_x^2)}{n-1} \\
    &= \frac{\sum z_y^2}{n-1} - 2m \frac{\sum z_x z_y}{n-1} + m^2 \frac{\sum z_x^2}{n-1} \\
    &= 1 - 2mr + m^2
\end{aligned}
$$

St. Dev of z scores is 1 so variance is 1 also: $\frac{\sum z_y^2}{n-1} = \frac{\sum z_x^2}{n-1} = 1$.

The middle term $\frac{\sum z_x z_y}{n-1}$: This is r!

$MSR = 1 - 2rm + m^2$

Page 32

Continue...

$MSR = 1 - 2rm + m^2$

Since this is a parabola, it reaches its minimum at $x = -\frac{b}{2a}$.

This gives us

$m = \frac{-(-2r)}{2(1)} = r$

Hence, the slope of the best fit line for z-scores is the correlation coefficient, r.
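The result that the best slope for z-scores equals r can be checked numerically on the burger data. The sketch below is an illustration, not the derivation itself: it standardizes fat and calories, evaluates the mean squared residual for a grid of candidate slopes (the grid spacing of 0.001 is an arbitrary choice made here), and compares the best slope with r.

```python
from statistics import mean, stdev

fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]


def z_scores(values):
    """Standardize values using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


zx, zy = z_scores(fat), z_scores(calories)
n = len(zx)

# Correlation as the average product of z-scores
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)


def msr(m):
    """Mean squared residual of the line z_y-hat = m * z_x."""
    return sum((b - m * a) ** 2 for a, b in zip(zx, zy)) / (n - 1)


# Try slopes 0.000, 0.001, ..., 2.000 and keep the one with the smallest MSR
candidates = [i / 1000 for i in range(0, 2001)]
best_m = min(candidates, key=msr)

print(f"r = {r:.4f}, best slope on grid = {best_m:.3f}")
print(f"MSR(0.5) = {msr(0.5):.6f}, formula 1 - 2rm + m^2 = {1 - 2 * r * 0.5 + 0.25:.6f}")
```

The grid search lands on the grid value closest to r, and the directly computed MSR agrees with the closed form 1 - 2rm + m^2.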

Page 33

Slope: rise over run

A slope of r for z-scores means that for every increase of 1 standard deviation in $z_x$ there is an increase of r standard deviations in $z_y$. "Over 1 and up r."

Translate back to x & y values: "over one standard deviation in x, up r standard deviations in y."

Slope of the regression line is:

$b_1 = r \frac{s_y}{s_x}$

Page 34

Why is correlation "r"?

Because it was calculated from the regression of y on x after standardizing the variables, just like we have just done. Thus he used r to stand for (standardized) regression.

Page 35

Let's Write the Equation

Fat (g)   Calories
19        410
31        580
34        590
35        570
39        640
39        680
43        660

From algebra: $y = mx + b$

For regression: $\hat{y} = b_0 + b_1 x$, where $b_0$ is the y-intercept and $b_1$ is the slope.

Slope:

$b_1 = r \frac{s_y}{s_x} = \frac{0.961(89.815)}{7.804} = 11.056$

(The value 11.056 comes from using the unrounded value of r.)

Explain the slope: The predicted calories increase by 11.056 for every additional gram of fat.

Page 36

Now for the final part: the equation!

y-intercept: Remember, the line has to pass through the point $(\bar{x}, \bar{y})$.

$\bar{y} = b_0 + b_1 \bar{x}$

Solve for the y-intercept:

$b_0 = \bar{y} - b_1 \bar{x}$

Find the value of the y-intercept:

$b_0 = \bar{y} - b_1 \bar{x} = 210.954$

Page 37

Put the parts together to form the equation of the regression line. Now it can be used to predict.

How many calories do I expect to find in a hamburger that has 25 grams of fat?
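Putting the parts together for the burger data can be sketched in Python. The code follows the two formulas from the slides, b1 = r * s_y / s_x and b0 = y-bar - b1 * x-bar; the variable names are illustrative, and the printed prediction is simply what those formulas give for 25 grams of fat.

```python
from statistics import mean, stdev

fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]

x_bar, y_bar = mean(fat), mean(calories)
s_x, s_y = stdev(fat), stdev(calories)
n = len(fat)

# Correlation from z-score products
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(fat, calories)) / (n - 1)

b1 = r * s_y / s_x        # slope
b0 = y_bar - b1 * x_bar   # y-intercept (line passes through (x_bar, y_bar))

predicted_25 = b0 + b1 * 25

print(f"calories-hat = {b0:.3f} + {b1:.3f} * fat")
print(f"Predicted calories at 25 g of fat: {predicted_25:.1f}")
```

Since 25 grams lies inside the observed range of 19 to 43 grams, this prediction is not an extrapolation.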

Page 38

Try another problem

Mean call-to-shock time   Survival rate
2                         90
6                         45
7                         30
9                         5
12                        2

$r = \frac{-3.84030233}{4} = -0.960$

$b_1 = r \frac{s_y}{s_x} = -9.2956$

$b_0 = \bar{y} - b_1 \bar{x} = 101.3285$

Page 39

Interpret the slope: The predicted survival rate decreases by 9.2956 for every additional minute of call-to-shock time.

Interpret the y-intercept: The predicted survival rate is 101.3285 when there is NO call-to-shock time.

Predict the survival rate for a 10 min. call-to-shock time:

$\widehat{\text{survival rate}} = 101.3285 - 9.2956(10) = 8.3725$

Predict the survival rate for a 20 min. call-to-shock time:

$\widehat{\text{survival rate}} = 101.3285 - 9.2956(20) = -84.5835$ (Extrapolation)

Page 40

Try another problem

SAT Math   SAT Verbal
600        650
720        800
540        600
450        500
620        620

$r = \frac{3.853}{4} = 0.963$

$b_1 = 1.05$

$b_0 = 20.7$

$\widehat{\text{SAT Verbal}} = 20.7 + 1.05(\text{SAT Math})$

Page 41

$\widehat{\text{Verbal Score}} = 20.7 + 1.05(\text{Math})$

Interpret the slope: The predicted verbal score increases by 1.05 points for every additional point in math.

Interpret the y-intercept: The predicted verbal score (20.7) for a math score of 0.

Predict the verbal score for a math score of 400:

$\widehat{\text{Verbal Score}} = 20.7 + 1.05(400) = 439.33$ (Extrapolated!)

Predict the verbal score for a math score of 500:

$\widehat{\text{Verbal Score}} = 20.7 + 1.05(500) = 543.99$

(The values 439.33 and 543.99 come from the unrounded slope and intercept.)
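The SAT predictions can be checked the same way, and the check also shows the effect of rounding. This is a sketch with illustrative names: it fits the line from the five data points, flags any prediction outside the observed math range, and compares the result with what the rounded equation 20.7 + 1.05x would give.

```python
from statistics import mean, stdev

sat_math = [600, 720, 540, 450, 620]
sat_verbal = [650, 800, 600, 500, 620]

x_bar, y_bar = mean(sat_math), mean(sat_verbal)
s_x, s_y = stdev(sat_math), stdev(sat_verbal)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(sat_math, sat_verbal)) / (len(sat_math) - 1)

b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

predictions = {}
for math_score in (400, 500):
    predictions[math_score] = b0 + b1 * math_score
    outside = not (min(sat_math) <= math_score <= max(sat_math))
    label = " (extrapolated)" if outside else ""
    print(f"math {math_score}: verbal-hat = {predictions[math_score]:.2f}{label}")

# Using the rounded coefficients instead shifts the answer by more than a point
rounded_400 = 20.7 + 1.05 * 400
print(f"rounded equation at 400: {rounded_400:.1f}")
```

Carrying the unrounded slope and intercept through the calculation, as a calculator does, is what produces the values on the slide.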

Page 42

That's...all...Folks!

Homework: p. 191 (27-32, 35, 37, 39, 41, 47)

