MVS 250: V. Katch

Date post: 21-Jan-2016
Page 1: MVS 250: V. Katch

MVS 250: V. Katch

STATISTICS

Chapter 5 Regression

Page 2: MVS 250: V. Katch

Regression

Definition

Regression Equation: Given a collection of paired data, the regression equation

$\hat{y} = mx + b$

algebraically describes the relationship between the two variables.

Regression Line (line of best fit or least-squares line): the graph of the regression equation.

Page 3: MVS 250: V. Katch

Regression Line Plotted on Scatter Plot

Page 4: MVS 250: V. Katch

Regression Line

Page 5: MVS 250: V. Katch

Two different lines, one to predict X and one to predict Y.

Page 6: MVS 250: V. Katch

The Regression Equation

x is the independent variable (predictor variable)

y is the dependent variable (response variable)

$\hat{y} = mx + b$

m = slope

b = y-intercept

Page 7: MVS 250: V. Katch

Assumptions

1. We are investigating only linear relationships.

2. For each x value, y is a random variable having a normal (bell-shaped) distribution. All of these y distributions have the same variance. Also, for a given value of x, the distribution of y-values has a mean that lies on the regression line. (Results are not seriously affected if departures from normal distributions and equal variances are not too extreme.)

Page 8: MVS 250: V. Katch

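Formula for y-intercept and slope

Formula 1 (slope)

$m = \dfrac{(\sum xy/n) - (\sum x/n)(\sum y/n)}{(\sum x^2/n) - (\sum x/n)^2}$

Formula 2 (y-intercept)

$b = \dfrac{(\sum y/n)(\sum x^2/n) - (\sum x/n)(\sum xy/n)}{(\sum x^2/n) - (\sum x/n)^2}$

The denominator in both formulas is $SD_x^2$, the variance of x.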
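As a minimal sketch, the slope and y-intercept formulas on this page can be written in Python using the averaged sums. The function name `slope_intercept` is an assumption made for illustration, and the four sample points are the ones used in the residuals example later in these slides.

```python
def slope_intercept(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n                                # sum(x)/n
    mean_y = sum(ys) / n                                # sum(y)/n
    mean_x2 = sum(x * x for x in xs) / n                # sum(x^2)/n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n    # sum(xy)/n

    var_x = mean_x2 - mean_x ** 2                       # SD_x^2, the shared denominator
    m = (mean_xy - mean_x * mean_y) / var_x             # slope
    b = (mean_y * mean_x2 - mean_x * mean_xy) / var_x   # y-intercept
    return m, b


m, b = slope_intercept([1, 2, 4, 5], [4, 24, 8, 32])
print(m, b)  # 4.0 5.0
```

The result matches the line y-hat = 5 + 4x quoted for these four points in the residuals example.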

Page 9: MVS 250: V. Katch

If you find r, then

Formula 3

slope = $m = r \, s_y / s_x$

Formula 4

Intercept = $b = \bar{y} - m\bar{x}$

where $\bar{y}$ is the mean of the y-values, $\bar{x}$ is the mean of the x-values, and m is the slope
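The route through r can be sketched the same way. The function name `slope_intercept_from_r` is hypothetical, and the sketch assumes sample standard deviations (n - 1 in the denominator). The ratio s_y / s_x is the same if n is used instead, as long as both use the same divisor.

```python
import math


def slope_intercept_from_r(xs, ys):
    """Return (r, m, b) using m = r * s_y / s_x and b = mean_y - m * mean_x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sample standard deviations of x and y.
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))

    # Pearson correlation coefficient.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    r = cov / (s_x * s_y)

    m = r * s_y / s_x          # slope from r
    b = mean_y - m * mean_x    # intercept from the means and the slope
    return r, m, b


r, m, b = slope_intercept_from_r([1, 2, 4, 5], [4, 24, 8, 32])
print(round(r, 3), round(m, 3), round(b, 3))
```

For these four points the slope and intercept come out the same as with the sum-based formulas, which is a quick way to check a hand calculation.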

Page 10: MVS 250: V. Katch

Rounding the y-intercept and the slope

Round to three significant digits.

If you use Formulas 1, 2, and 3, try not to round intermediate values.

Page 11: MVS 250: V. Katch

The regression line fits the sample points best.

Page 12: MVS 250: V. Katch

Residuals and the Least-Squares Property

Definitions

Residual: for a sample of paired (x, y) data, the difference $(y - \hat{y})$ between an observed sample y-value and the value of $\hat{y}$, which is the value of y that is predicted by using the regression equation.

Least-Squares Property: A straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.

Page 13: MVS 250: V. Katch

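Residuals and the Least-Squares Property

x    1    2    4    5
y    4   24    8   32

$\hat{y} = 5 + 4x$

[Scatter plot of the four points with the regression line; x-axis from 1 to 5, y-axis from 0 to 32.]

Residuals $(y - \hat{y})$ marked on the plot:

x = 1: Residual = -5

x = 2: Residual = 11

x = 4: Residual = -13

x = 5: Residual = 7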
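The residuals on this slide can be reproduced with a few lines of Python. This is a sketch only: the helper names are assumptions, and the second line (slope 5, intercept 2) is a hypothetical competitor chosen just to show that it produces a larger sum of squares.

```python
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]


def residuals(xs, ys, m, b):
    """Observed y minus predicted y-hat = m*x + b, for each point."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def sum_of_squares(values):
    return sum(v * v for v in values)


best = residuals(xs, ys, m=4, b=5)       # the regression line y-hat = 5 + 4x
print(best)                              # [-5, 11, -13, 7]
print(sum_of_squares(best))              # 364

other = residuals(xs, ys, m=5, b=2)      # hypothetical competing line
print(sum_of_squares(other))             # 374
```

No other straight line gives a sum of squared residuals below 364 for these four points, which is what the least-squares property states.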

Page 14: MVS 250: V. Katch

Predictions

In predicting a value of y based on some given value of x ...

1. If there is not a significant linear correlation, the best predicted y-value is $\bar{y}$.

2. If there is a significant linear correlation, the best predicted y-value is found by substituting the x-value into the regression equation.

Page 15: MVS 250: V. Katch

Predicting the Value of a Variable

Start

Calculate the value of r and test the hypothesis that $\rho = 0$.

Is there a significant linear correlation?

Yes: Use the regression equation to make predictions. Substitute the given value in the regression equation.

No: Given any value of one variable, the best predicted value of the other variable is its sample mean.
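The decision procedure above can be sketched in Python. The function name `best_prediction` and the parameter `r_critical` are assumptions: the caller is expected to look up the critical value of r in a table for the chosen significance level and sample size. The value 0.707 used here is the usual tabled value for n = 8 at the 0.05 level, and the data are the Garbage Project values that appear later in these slides.

```python
import math


def best_prediction(x0, xs, ys, r_critical):
    """Predict y at x0: use the regression line only if |r| exceeds r_critical."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    r = sxy / math.sqrt(sxx * syy)       # linear correlation coefficient
    if abs(r) > r_critical:              # significant linear correlation
        m = sxy / sxx                    # slope
        b = mean_y - m * mean_x          # y-intercept
        return m * x0 + b
    return mean_y                        # otherwise fall back to the sample mean


plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]
print(round(best_prediction(0.50, plastic, household, r_critical=0.707), 1))  # 1.3
```

With a stricter, purely hypothetical threshold such as `r_critical=0.9`, the same call returns the sample mean of y (3.25) instead.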

Page 16: MVS 250: V. Katch

Guidelines for Using the Regression Equation

1. If there is no significant linear correlation, don’t use the regression equation to make predictions.

2. When using the regression equation for predictions, stay within the scope of the available sample data.

3. A regression equation based on old data is not necessarily valid now.

4. Don’t make predictions about a population that is different from the population from which the sample data were drawn.

Page 17: MVS 250: V. Katch

Example

X     Y
1     34
2     36
3     37
4     39
5     41
10    50
15    59
18    64
20    68
30    86

Compute r, the slope, the intercept, and the regression equation.

What is this equation used for?

Page 19: MVS 250: V. Katch

Data from the Garbage Project

x  Plastic (lb)    0.27   1.41   2.19   2.83   2.19   1.81   0.85   3.05
y  Household       2      3      3      6      4      2      1      5

What is the best predicted size of a household that discards 0.50 lb of plastic?

Using a calculator:

b = 0.549

m = 1.48

$\hat{y} = 0.549 + 1.48(0.50)$

$\hat{y} = 1.3$

A household that discards 0.50 lb of plastic has approximately one person.

Page 20: MVS 250: V. Katch

Definitions

Marginal Change: the amount a variable changes when the other variable changes by exactly one unit

Outlier: a point lying far away from the other data points

Influential Points: points which strongly affect the graph of the regression line

Page 21: MVS 250: V. Katch

Example 5.4 Height and Foot Length (cont)

Regression equation

uncorrected data: 15.4 + 0.13 height

corrected data: -3.2 + 0.42 height

Correlation

uncorrected data: r = 0.28

corrected data: r = 0.69

Three outliers were data entry errors.

Page 22: MVS 250: V. Katch

Example 5.10 Earthquakes in US

Correlation

all data: r = 0.73

without San Francisco (SF): r = -0.96

San Francisco earthquake of 1906.

Page 23: MVS 250: V. Katch

Example: Predict the quiz score of a student who spends 30 hours a week watching television.

One more step...

Page 24: MVS 250: V. Katch

Compute the Standard Error of the Estimate

$S_{Y \cdot X} = SD_Y \sqrt{1 - r^2}$

$S_{Y \cdot X} = 13.83 \sqrt{1 - (-0.817)^2}$

$S_{Y \cdot X} = \pm 7.978$

The predicted score is 56.56 points ± 7.978 points.
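A short Python sketch of this calculation follows. It assumes r = -0.817, since a correlation must lie between -1 and 1 and the printed -8.17 cannot be one. The function name `standard_error_of_estimate` is hypothetical.

```python
import math


def standard_error_of_estimate(sd_y, r):
    """S_{Y.X} = SD_Y * sqrt(1 - r^2)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return sd_y * math.sqrt(1.0 - r * r)


see = standard_error_of_estimate(13.83, -0.817)   # r assumed to be -0.817
predicted = 56.56                                 # predicted quiz score from the slide
print(predicted - see, predicted + see)           # lower and upper bounds of the estimate
```

With r rounded to three digits the result is about 7.975, which agrees with the slide's 7.978 to within rounding of r.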

Page 25: MVS 250: V. Katch

Multiple Regression

Definition

Multiple Regression Equation: A linear relationship between a dependent variable y and two or more independent variables ($x_1, x_2, x_3, \ldots, x_k$)

$\hat{y} = m_0 + m_1 x_1 + m_2 x_2 + \ldots + m_k x_k$
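Evaluating a multiple regression equation is a weighted sum, as this small sketch shows. The function name `predict` and the coefficient and input values are hypothetical, chosen only to make the arithmetic easy to follow. They do not come from the slides.

```python
def predict(coeffs, xs):
    """Evaluate y-hat = m0 + m1*x1 + ... + mk*xk.

    coeffs is [m0, m1, ..., mk]; xs is [x1, ..., xk].
    """
    if len(coeffs) != len(xs) + 1:
        raise ValueError("need one more coefficient than independent variables")
    m0, slopes = coeffs[0], coeffs[1:]
    return m0 + sum(m * x for m, x in zip(slopes, xs))


# Hypothetical coefficients m0 = 2.0, m1 = 0.5, m2 = -1.0 and inputs x1 = 4, x2 = 3.
print(predict([2.0, 0.5, -1.0], [4.0, 3.0]))  # 1.0
```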

Page 26: MVS 250: V. Katch

Generic Models

Linear: $y = a + bx$

Quadratic: $y = ax^2 + bx + c$

Logarithmic: $y = a + b \ln x$

Exponential: $y = ab^x$

Power: $y = ax^b$

Logistic: $y = \dfrac{c}{1 + ae^{-bx}}$
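For reference, the six generic models can be written as plain Python functions. This sketch assumes the usual reading of the formulas (exponential as a times b to the power x, power as a times x to the power b, logistic as c over 1 + a e^(-bx)). The function names are chosen for illustration.

```python
import math


def linear(x, a, b):
    return a + b * x


def quadratic(x, a, b, c):
    return a * x ** 2 + b * x + c


def logarithmic(x, a, b):
    return a + b * math.log(x)          # natural log; requires x > 0


def exponential(x, a, b):
    return a * b ** x


def power(x, a, b):
    return a * x ** b


def logistic(x, a, b, c):
    return c / (1 + a * math.exp(-b * x))


print(linear(2, 1, 3), exponential(3, 2, 2), power(3, 2, 2))  # 7 16 18
```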

Page 33: MVS 250: V. Katch

Development of a Good Mathematical Model

Look for a Pattern in the Graph: Examine the graph of the plotted points and compare the basic pattern to the known generic graphs.

Find and Compare Values of $R^2$: Select functions that result in larger values of $R^2$, because such larger values correspond to functions that better fit the observed points.

Think: Use common sense. Don’t use a model that leads to predicted values known to be totally unrealistic.
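Comparing values of R-squared can be sketched as follows. The helper name `r_squared` is an assumption, and the four data points are the ones from the residuals example earlier in these slides. The "flat" model, which always predicts the mean of y, is included only as a baseline.

```python
def r_squared(ys, predicted):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

line = [5 + 4 * x for x in xs]           # least-squares line y-hat = 5 + 4x
flat = [sum(ys) / len(ys)] * len(ys)     # baseline: always predict the mean of y

print(round(r_squared(ys, line), 3))     # 0.305
print(r_squared(ys, flat))               # 0.0
```

The same function can be applied to the predictions of any candidate model, and the candidate with the larger value fits the observed points more closely.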

