Date post: 12-Jan-2016
Upload: suzanna-king
Page 1: Regression Analysis Introduction to Regression Analysis (RA) Regression Analysis is used to estimate a function f ( ) that describes the relationship.

Regression Analysis


Introduction to Regression Analysis (RA)

• Regression Analysis is used to estimate a function f( ) that describes the relationship between a continuous dependent variable and one or more independent variables.

\[ Y = f(X_1, X_2, X_3, \ldots, X_n) + \varepsilon \]

Note:
• f( ) describes the systematic variation in the relationship.
• \(\varepsilon\) represents the unsystematic variation (or random error) in the relationship.

• In other words, the observations we are interested in can be separated into two parts:

\[ Y = f(X_1, X_2, X_3, \ldots, X_n) + \varepsilon \]

Observations = Model + Error

Observations = Signal + Noise

Ideally, the noise should be very small compared with the signal (the model).

Signal to Noise

What we observe can be divided into signal and noise.

[Figure: what we see = signal + noise]

Model specification

If the true function is:

\[ y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i \]

And we fit:

\[ y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + e_i \]

Our model is exactly specified and we obtain an unbiased and efficient estimate.

Model specification

And finally, if the true function is:

\[ y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i Z_i + \beta_4 Z_i^2 \]

And we fit:

\[ y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + e_i \]

Our model is underspecified: we excluded some necessary terms, and we obtain a biased estimate.

Model specification

On the other hand, if the true function is:

\[ y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i \]

And we fit:

\[ y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i Z_i + e_i \]

Our model is overspecified: we included some unnecessary terms, and we obtain an inefficient estimate.

Model specification

• If you specify the model exactly, there is no bias.
• If you overspecify the model (add more terms than needed), the result is unbiased but inefficient.
• If you underspecify the model (omit one or more necessary terms), the result is biased.
• Overall Strategy
– The best option is to exactly specify the true function.
– We would prefer to err by overspecifying our model because that only leads to inefficiency.
– Therefore, start with a likely overspecified model and reduce it.
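
The three cases can be illustrated with a small simulation. This is only a sketch and is not taken from the slides: the true coefficients (1, 2, 3), the correlation between X and Z, the sample size and the random seed are all assumed values chosen for illustration, and NumPy's least-squares solver stands in for whatever software is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)   # assumed seed, for repeatability
n = 5000                         # assumed sample size

# Assumed true function: y = 1 + 2*X + 3*Z + noise, with X and Z correlated.
X = rng.normal(size=n)
Z = 0.8 * X + 0.6 * rng.normal(size=n)
y = 1 + 2 * X + 3 * Z + rng.normal(size=n)

def fit(*columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    A = np.column_stack([np.ones(n), *columns])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

exact = fit(X, Z)          # exactly specified
under = fit(X)             # underspecified: Z is omitted
over = fit(X, Z, X * Z)    # overspecified: unnecessary X*Z term

print("exact slope on X:", exact[1])   # close to the true value 2
print("under slope on X:", under[1])   # pulled toward 2 + 3*0.8 = 4.4 (biased)
print("over  slope on X:", over[1])    # still close to 2
print("over  X*Z coef  :", over[3])    # close to 0
```

In this setup the omitted Z is correlated with X, so its effect is absorbed into the slope on X; the extra X*Z term costs precision but does not bias the other estimates.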

An Example

• Consider the relationship between advertising (X1) and sales (Y) for a company.
• There probably is a relationship: as advertising increases, sales should increase.
• But how would we measure and quantify this relationship?

A Scatter Plot of the Data

[Figure: scatter plot of Sales (in $1,000s, axis 0 to 600) against Advertising (in $1,000s, axis 20 to 100)]

The Nature of a Statistical Relationship

[Figure: regression curve passing through the probability distributions for Y at different levels of X]

A Simple Linear Regression Model

• The scatter plot shows a linear relation between advertising and sales.
• So the following regression model is suggested by the data:

\[ Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i \]

This refers to the true relationship between the entire population of advertising and sales values.

• The estimated regression function (based on our sample) will be represented as:

\[ \hat{Y}_i = b_0 + b_1 X_{1i} \]

where \(\hat{Y}_i\) is the estimated (or fitted) value of Y at a given level of X.

Determining the Best Fit

• Numerical values must be assigned to b0 and b1.
• The method of “least squares” selects the values that minimize:

\[ \mathrm{ESS} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \left( Y_i - (b_0 + b_1 X_{1i}) \right)^2 \]

• If ESS = 0 our estimated function fits the data perfectly.
• We could solve this problem using Solver...
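
For a single independent variable the minimization also has a well-known closed-form solution, which the sketch below applies. The data are hypothetical (the slides show the advertising data only as a plot), and the function name `least_squares` is an illustrative choice.

```python
def least_squares(x, y):
    """Return (b0, b1, ess) for the line y = b0 + b1*x that minimizes ESS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept
    ess = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, ess

# Hypothetical data, not the advertising data from the slides.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_exact = [2 + 3 * xi for xi in x]        # lies exactly on a line
y_noisy = [5.2, 7.9, 11.1, 13.8, 17.3]    # roughly linear

print(least_squares(x, y_exact))   # b0 = 2, b1 = 3, ESS = 0 up to rounding
print(least_squares(x, y_noisy))   # ESS > 0: the line does not fit perfectly
```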

Estimation – Linear Regression

Formula for a straight line:

\[ y = b_0 + b_1 x + e \]

We want to solve for:
• b0 = intercept
• b1 = slope (the change in y per unit change in x)

[Figure: straight line with x = program and y = outcome]

The Estimated Regression Function

• The estimated regression function is:

\[ \hat{Y}_i = 36.342 + 5.550 X_{1i} \]

Evaluating the “Fit”

[Figure: fitted regression line over the scatter plot of Sales (in $000s, axis 0 to 600) against Advertising (in $000s, axis 20 to 100); R² = 0.9691]

The R² Statistic

• The R² statistic indicates how well an estimated regression function fits the data.
• 0 ≤ R² ≤ 1
• It measures the proportion of the total variation in Y around its mean that is accounted for by the estimated regression equation.
• To understand this better, consider the following graph...

Error Decomposition

[Figure: Y plotted against X with the fitted line \(\hat{Y} = b_0 + b_1 X\) and the mean \(\bar{Y}\). For an observation i, the deviation of the actual value \(Y_i\) from the mean, \(Y_i - \bar{Y}\), is split at the estimated value \(\hat{Y}_i\) into \(Y_i - \hat{Y}_i\) and \(\hat{Y}_i - \bar{Y}\).]

Partition of the Total Sum of Squares

\[ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \]

or, TSS = ESS + RSS

\[ R^2 = \frac{\mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{ESS}}{\mathrm{TSS}} \]
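
The partition can be checked numerically. In the sketch below the data are hypothetical, and the variable names follow this deck's convention, in which ESS is the error sum of squares and RSS is the regression sum of squares.

```python
# Hypothetical data, not taken from the slides.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.2, 7.9, 11.1, 13.8, 17.3]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                  # total
ess = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))     # error
rss = sum((fi - y_bar) ** 2 for fi in y_hat)              # regression

r2 = rss / tss
print(tss, ess + rss)        # the two agree up to rounding
print(r2, 1 - ess / tss)     # two equivalent forms of R^2
```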

Degree of Linear Correlation

• R² = 1: perfect linear correlation; R² = 0: no correlation.
• High R² = good fit only if a linear model is appropriate; always check with a scatterplot.
• Correlation does not prove causation; x and y may both be correlated to a third (possibly unidentified) variable.
• A more popular (but less meaningful) measure is the “correlation coefficient”:

\[ r = \frac{\mathrm{cov}(x, y)}{s_x s_y} = b_1 \frac{s_x}{s_y} \]

R² = RSQ([y-range],[x-range])

r = CORREL([y-range],[x-range])

[Figure: four scatter plots, each with R² = 0.67]

Testing for Significance: F Test

Hypotheses

\[ H_0: \beta_1 = 0 \]
\[ H_a: \beta_1 \neq 0 \]

Test Statistic

\[ F = \frac{\mathrm{RSS}/1}{\mathrm{ESS}/(n-2)} \]

Rejection Rule

Reject \(H_0\) if \(F > F_\alpha\),

where \(F_\alpha\) is based on an F distribution with 1 d.f. in the numerator and n - 2 d.f. in the denominator.

Some Cautions about the Interpretation of Significance Tests

• Rejecting \(H_0: \beta_1 = 0\) and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.

• Just because we are able to reject \(H_0: \beta_1 = 0\) and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.

An Example of Inappropriate Interpretation

• A study shows that, in elementary schools, students with larger feet are better at spelling.

Can we conclude that foot size influences spelling ability?

Or is there another factor that influences both foot size and spelling ability?

Making Predictions

• Suppose we want to estimate the average level of sales expected if $65,000 is spent on advertising.

\[ \hat{Y}_i = 36.342 + 5.550 X_{1i} \]

• Estimated Sales = 36.342 + 5.550 * 65 = 397.092

• So when $65,000 is spent on advertising, we expect the average sales level to be $397,092.

The Standard Error

• The standard error measures the scatter in the actual data around the estimated regression line.

\[ S_e = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - k - 1}} \]

where k = the number of independent variables

• For our example, Se = 20.421

• This is helpful in making predictions...

An Approximate Prediction Interval

• An approximate 95% prediction interval for a new value of Y when \(X_1 = X_{1h}\) is given by

\[ \hat{Y}_h \pm 2 S_e \]

where:

\[ \hat{Y}_h = b_0 + b_1 X_{1h} \]

• Example: If $65,000 is spent on advertising:

95% lower prediction limit = 397.092 - 2*20.421 = 356.250
95% upper prediction limit = 397.092 + 2*20.421 = 437.934

• If we spend $65,000 on advertising we are approximately 95% confident actual sales will be between $356,250 and $437,934.
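
The arithmetic above can be reproduced in a few lines. The coefficients and the standard error are the values quoted on the slides; the variable names are illustrative.

```python
b0, b1 = 36.342, 5.550   # estimated regression coefficients from the slides
se = 20.421              # standard error Se from the slides
x_h = 65                 # advertising, in $1,000s

y_h = b0 + b1 * x_h      # point prediction, in $1,000s
lower = y_h - 2 * se     # approximate 95% lower limit
upper = y_h + 2 * se     # approximate 95% upper limit

print(round(y_h, 3), round(lower, 3), round(upper, 3))
```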

An Exact Prediction Interval

• A 100(1-α)% prediction interval for a new value of Y when \(X_1 = X_{1h}\) is given by

\[ \hat{Y}_h \pm t_{(1-\alpha/2,\; n-2)} \, S_p \]

where:

\[ \hat{Y}_h = b_0 + b_1 X_{1h} \]

\[ S_p = S_e \sqrt{1 + \frac{1}{n} + \frac{(X_{1h} - \bar{X})^2}{\sum_{i=1}^{n} (X_{1i} - \bar{X})^2}} \]

Example

• If $65,000 is spent on advertising:

95% lower prediction limit = 397.092 - 2.306*21.489 = 347.556

95% upper prediction limit = 397.092 + 2.306*21.489 = 446.666

• If we spend $65,000 on advertising we are 95% confident actual sales will be between $347,556 and $446,666.

• This interval is only about $20,000 wider than the approximate one calculated earlier but was much more difficult to create.

• The greater accuracy is not always worth the trouble.

Comparison of Prediction Interval Techniques

[Figure: Sales (axis 125 to 575) against Advertising Expenditures (axis 25 to 95), showing the regression line, the prediction intervals created using the standard error Se, and the prediction intervals created using the standard prediction error Sp]

Confidence Intervals for the Mean

• A 100(1-α)% confidence interval for the true mean value of Y when \(X_1 = X_{1h}\) is given by

\[ \hat{Y}_h \pm t_{(1-\alpha/2,\; n-2)} \, S_a \]

where:

\[ \hat{Y}_h = b_0 + b_1 X_{1h} \]

\[ S_a = S_e \sqrt{\frac{1}{n} + \frac{(X_{1h} - \bar{X})^2}{\sum_{i=1}^{n} (X_{1i} - \bar{X})^2}} \]


A Note About Extrapolation

• Predictions made using an estimated regression function may have little or no validity for values of the independent variables that are substantially different from those represented in the sample.

What Does “Regression” Mean?

[Figure: scatter plot of Height of Daughter against Height of Mother, both axes from 55 to 75]


What Does “Regression” Mean?

1. Draw “best-fit” line free hand

2. Find mother’s height = 60”, find average daughter’s height

3. Repeat for mother’s height = 62”, 64”… 70”; draw “best-fit” line for these points

4. Draw line daughter’s height = mother’s height

5. For a given mother’s height, daughter’s height tends to be between mother’s height and mean height: “regression toward the mean”

What Does “Regression” Mean?

[Figure: scatter plot of Height of Daughter against Height of Mother, both axes from 55 to 75, with fitted line y = 0.45x + 36.29 and R² = 0.21]

Residual Analysis

• Residual for Observation i

\[ y_i - \hat{y}_i \]

• Standardized Residual for Observation i

\[ \frac{y_i - \hat{y}_i}{s_{y_i - \hat{y}_i}} \]

where:

\[ s_{y_i - \hat{y}_i} = s \sqrt{1 - h_i} \]

Residual Analysis

• Detecting Outliers

– An outlier is an observation that is unusual in comparison with the other data.

– Minitab classifies an observation as an outlier if its standardized residual value is < -2 or > +2.

– This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier.

– This rule’s shortcoming can be circumvented by using studentized deleted residuals.

– The |i th studentized deleted residual| will be larger than the |i th standardized residual|.
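
A minimal sketch of the standardized residual rule for simple linear regression follows. The data are hypothetical, with the sixth observation deliberately placed well above the others. The leverage formula \(h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2\) is the standard one for a single independent variable and is supplied here as an assumption, since the slides do not define \(h_i\).

```python
import math

def standardized_residuals(x, y):
    """Return (residuals, leverages, standardized residuals) for a simple linear fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))   # standard error of the estimate
    h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]     # leverage of each observation
    std = [e / (s * math.sqrt(1 - hi)) for e, hi in zip(resid, h)]
    return resid, h, std

# Hypothetical data; the sixth point sits well above the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 18.0, 14.2, 15.9]

resid, h, std = standardized_residuals(x, y)
flagged = [i for i, r in enumerate(std) if abs(r) > 2]   # the < -2 or > +2 rule
print(flagged)   # zero-based positions of the flagged observations
```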

Multiple Regression Analysis

• Most regression problems involve more than one independent variable.

• If each independent variable varies in a linear manner with Y, the estimated regression function in this case is:

\[ \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki} \]

• The optimal values for the bi can again be found by minimizing the ESS.

• The resulting function fits a hyperplane to our sample data.

Example Regression Surface for Two Independent Variables

[Figure: regression plane fitted to data points plotted on the axes Y, X1 and X2]

Multiple Regression Example: Real Estate Appraisal

• A real estate appraiser wants to develop a model to help predict the fair market values of residential properties.

• Three independent variables will be used to estimate the selling price of a house:
– total square footage
– number of bedrooms
– size of the garage


Selecting the Model

• We want to identify the simplest model that adequately accounts for the systematic variation in the Y variable.

• Arbitrarily using all the independent variables may result in overfitting.

• A sample reflects characteristics:
– representative of the population
– specific to the sample

• We want to avoid fitting sample specific characteristics -- or overfitting the data.

Models with One Independent Variable

• With simplicity in mind, suppose we fit three simple linear regression functions:

\[ \hat{Y}_i = b_0 + b_1 X_{1i} \]
\[ \hat{Y}_i = b_0 + b_2 X_{2i} \]
\[ \hat{Y}_i = b_0 + b_3 X_{3i} \]

• Key regression results are:

Variables in the Model   R²      Adjusted R²   Se       Parameter Estimates
X1                       0.870   0.855         10.299   b0=9.503, b1=56.394
X2                       0.759   0.731         14.030   b0=78.290, b2=28.382
X3                       0.793   0.770         12.982   b0=16.250, b3=27.607

• The model using X1 accounts for 87% of the variation in Y, leaving 13% unaccounted for.


Important Software Note

When using more than one independent variable, all variables for the X-range must be in one contiguous block of cells (that is, in adjacent columns).

Models with Two Independent Variables

• Now suppose we fit the following models with two independent variables:

\[ \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} \]
\[ \hat{Y}_i = b_0 + b_1 X_{1i} + b_3 X_{3i} \]

• Key regression results are:

Variables in the Model   R²      Adjusted R²   Se       Parameter Estimates
X1                       0.870   0.855         10.299   b0=9.503, b1=56.394
X1 & X2                  0.939   0.924         7.471    b0=27.684, b1=38.576, b2=12.875
X1 & X3                  0.877   0.847         10.609   b0=8.311, b1=44.313, b3=6.743

• The model using X1 and X2 accounts for 93.9% of the variation in Y, leaving 6.1% unaccounted for.

The Adjusted R² Statistic

• As additional independent variables are added to a model:
– The R² statistic can only increase.
– The adjusted-R² statistic can increase or decrease.

\[ R_a^2 = 1 - \left( \frac{\mathrm{ESS}}{\mathrm{TSS}} \right) \left( \frac{n - 1}{n - k - 1} \right) \]

• The R² statistic can be artificially inflated by adding any independent variable to the model.

• We can compare adjusted-R² values as a heuristic to tell if adding an additional independent variable really helps.
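
Because ESS/TSS = 1 - R², the adjusted statistic can be computed from R², n and k alone, as sketched below. The slides do not state the sample size for the real estate example; n = 11 is an assumed value that happens to reproduce the tabulated adjusted values to within rounding. The function name is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 from R^2, sample size n and number of independent variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 11  # ASSUMED sample size; not given in the slides

print(adjusted_r2(0.870, n, 1))   # X1 only      (slide table: 0.855)
print(adjusted_r2(0.939, n, 2))   # X1 and X2    (slide table: 0.924)
print(adjusted_r2(0.877, n, 2))   # X1 and X3    (slide table: 0.847)
```

Under this assumption, adding X3 to the X1 model raises R² from 0.870 to 0.877 but lowers the adjusted value, which is the pattern the heuristic is meant to catch.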

A Comment On Multicollinearity

• It should not be surprising that adding X3 (# of bedrooms) to the model with X1 (total square footage) did not significantly improve the model.

• Both variables represent the same (or very similar) things -- the size of the house.

• These variables are highly correlated (or collinear).

• Multicollinearity should be avoided.

Testing for Significance: Multicollinearity

• The term multicollinearity refers to the correlation among the independent variables.

• When the independent variables are highly correlated (say, |r| > .7), it is not possible to determine the separate effect of any particular independent variable on the dependent variable.

• If the estimated regression equation is to be used only for predictive purposes, multicollinearity is usually not a serious problem.

• Every attempt should be made to avoid including independent variables that are highly correlated.

Model with Three Independent Variables

• Now suppose we fit the following model with three independent variables:

\[ \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i} \]

• Key regression results are:

Variables in the Model   R²      Adjusted R²   Se       Parameter Estimates
X1                       0.870   0.855         10.299   b0=9.503, b1=56.394
X1 & X2                  0.939   0.924         7.471    b0=27.684, b1=38.576, b2=12.875
X1, X2 & X3              0.943   0.918         7.762    b0=26.440, b1=30.803, b2=12.567, b3=4.576

• The model using X1 and X2 appears to be best:
– Highest adjusted-R²
– Lowest Se (most precise prediction intervals)

Making Predictions

• Let’s estimate the average selling price of a house with 2,100 square feet and a 2-car garage:

\[ \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} \]

\[ \hat{Y}_i = 27.684 + 38.576 \times 2.1 + 12.875 \times 2 = 134.444 \]

• The estimated average selling price is $134,444.

• A 95% prediction interval for the actual selling price is approximately:

\[ \hat{Y}_h \pm 2 S_e \]

95% lower prediction limit = 134.444 - 2*7.471 = 119.502, or $119,502

95% upper prediction limit = 134.444 + 2*7.471 = 149.386, or $149,386
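
The same calculation in code, using the coefficients and standard error of the X1 & X2 model from the slides. Square footage is entered in thousands (2.1 for 2,100 square feet), which is how the slide arithmetic works out; the variable names are illustrative.

```python
b0, b1, b2 = 27.684, 38.576, 12.875   # X1 & X2 model from the slides
se = 7.471                            # standard error Se of that model

sqft_thousands = 2.1                  # 2,100 square feet
garage_cars = 2                       # 2-car garage

y_hat = b0 + b1 * sqft_thousands + b2 * garage_cars   # in $1,000s
lower = y_hat - 2 * se
upper = y_hat + 2 * se

print(round(y_hat, 3), round(lower, 3), round(upper, 3))
```

The limits differ from the slide values only in the last digit, because the slide subtracts from the already rounded 134.444.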

Binary Independent Variables

• Other types of non-quantitative factors could be included in the analysis using binary variables.

• Example: The presence (or absence) of a swimming pool,

\[ X_{p_i} = \begin{cases} 1, & \text{if house has a pool} \\ 0, & \text{otherwise} \end{cases} \]

• Example: Whether the roof is in good, average or poor condition,

\[ X_{r_i} = \begin{cases} 1, & \text{if the roof of house is in good condition} \\ 0, & \text{otherwise} \end{cases} \]

\[ X_{r+1_i} = \begin{cases} 1, & \text{if the roof of house is in average condition} \\ 0, & \text{otherwise} \end{cases} \]
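
The coding scheme can be sketched as follows. The function name, the dictionary keys and the example houses are hypothetical; the point is that three roof conditions need only two binary variables, with "poor" represented by both being 0.

```python
def encode_roof(condition):
    """Return (good, average) binary indicators; 'poor' is the baseline (0, 0)."""
    if condition not in ("good", "average", "poor"):
        raise ValueError(f"unknown roof condition: {condition!r}")
    return (1 if condition == "good" else 0,
            1 if condition == "average" else 0)

# Hypothetical houses, for illustration only.
houses = [
    {"pool": True, "roof": "good"},
    {"pool": False, "roof": "average"},
    {"pool": False, "roof": "poor"},
]

# One row per house: (pool, roof_good, roof_average)
rows = [(int(h["pool"]), *encode_roof(h["roof"])) for h in houses]
print(rows)
```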

Polynomial Regression

• Sometimes the relationship between a dependent and independent variable is not linear.

[Figure: scatter plot of Selling Price (axis $50 to $175) against Square Footage (axis 0.900 to 2.400)]

• This graph suggests a quadratic relationship between square footage (X) and selling price (Y).

The Regression Model

• An appropriate regression function in this case might be,

\[ \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{1i}^2 \]

or equivalently,

\[ \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} \]

where,

\[ X_{2i} = X_{1i}^2 \]
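
The "equivalently" step means a quadratic model can be fitted with ordinary multiple regression once the squared column is added. The sketch below uses NumPy and hypothetical data generated from an exact quadratic with assumed coefficients (50, 10, 20), so the fit should recover them; it is not the slide data.

```python
import numpy as np

# Hypothetical data from an exact quadratic (assumed coefficients 50, 10, 20).
x1 = np.array([0.9, 1.2, 1.5, 1.8, 2.1, 2.4])
y = 50 + 10 * x1 + 20 * x1 ** 2

x2 = x1 ** 2                                      # the derived variable X2 = X1^2
A = np.column_stack([np.ones_like(x1), x1, x2])   # columns: intercept, X1, X2

b, *_ = np.linalg.lstsq(A, y, rcond=None)         # least-squares estimates b0, b1, b2
print(b)
```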


Implementing the Model

Graph of Estimated Quadratic Regression Function

[Figure: estimated quadratic regression curve over the scatter plot of Selling Price (axis $50 to $175) against Square Footage (axis 0.900 to 2.400)]

Fitting a Third Order Polynomial Model

• We could also fit a third order polynomial model,

\[ \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{1i}^2 + b_3 X_{1i}^3 \]

or equivalently,

\[ \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i} \]

where,

\[ X_{2i} = X_{1i}^2 \]

\[ X_{3i} = X_{1i}^3 \]

Graph of Estimated Third Order Polynomial Regression Function

[Figure: estimated third order polynomial regression curve over the scatter plot of Selling Price (axis $50 to $175) against Square Footage (axis 0.900 to 2.400)]


Overfitting

• When fitting polynomial models, care must be taken to avoid overfitting.

• The adjusted-R2 statistic can be used for this purpose here also.

Example: Programmer Salary Survey

A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm’s programmer aptitude test. The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for the 20 programmers are shown on the next slide.

Example: Programmer Salary Survey

Exper.   Score   Salary      Exper.   Score   Salary
4        78      24          9        88      38
7        100     43          2        73      26.6
1        86      23.7        10       75      36.2
5        82      34.3        5        81      31.6
8        86      35.8        6        74      29
10       84      38          8        87      34
0        75      22.2        4        79      30.1
1        80      23.1        6        94      33.9
6        83      30          3        70      28.2
6        91      33          3        89      30

Example: Programmer Salary Survey

• Multiple Regression Model

Suppose we believe that salary (y) is related to the years of experience (x1) and the score on the programmer aptitude test (x2) by the following regression model:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \]

where

y = annual salary ($000)

x1 = years of experience

x2 = score on programmer aptitude test

Example: Programmer Salary Survey

• Multiple Regression Equation

Using the assumption \(E(\varepsilon) = 0\), we obtain

\[ E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \]

• Estimated Regression Equation

b0, b1, b2 are the least squares estimates of \(\beta_0, \beta_1, \beta_2\).

Thus

\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 \]

Example: Programmer Salary Survey

• Solving for the Estimates of \(\beta_0, \beta_1, \beta_2\)

[Figure: the input data (columns x1, x2, y; rows 4 78 24, 7 100 43, ..., 3 89 30) are passed to a computer package for solving multiple regression problems, which produces the least squares output (b0, b1, b2, R², etc.)]

Example: Programmer Salary Survey

• Data Analysis Output

The regression is

Salary = 3.17 + 1.40 Exper + 0.251 Score

Predictor      Coef     Stdev   t-ratio      p
Constant      3.174     6.156       .52   .613
Exper        1.4039     .1986      7.07   .000
Score        .25089    .07735      3.24   .005

s = 2.419   R-sq = 83.4%   R-sq(adj) = 81.5%
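
The output above can be reproduced from the 20 observations listed earlier. The sketch below uses NumPy's least-squares solver rather than the statistics package that produced the slide output, so only the coefficients, s and R-sq are recomputed; the variable names are illustrative.

```python
import numpy as np

# The 20 observations from the data slide (left column block, then right).
exper = np.array([4, 7, 1, 5, 8, 10, 0, 1, 6, 6,
                  9, 2, 10, 5, 6, 8, 4, 6, 3, 3], dtype=float)
score = np.array([78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
                  88, 73, 75, 81, 74, 87, 79, 94, 70, 89], dtype=float)
salary = np.array([24, 43, 23.7, 34.3, 35.8, 38, 22.2, 23.1, 30, 33,
                   38, 26.6, 36.2, 31.6, 29, 34, 30.1, 33.9, 28.2, 30])

A = np.column_stack([np.ones_like(exper), exper, score])   # intercept, x1, x2
b, *_ = np.linalg.lstsq(A, salary, rcond=None)             # b0, b1, b2

resid = salary - A @ b
n, k = len(salary), 2
ess = float(resid @ resid)                                 # error sum of squares
tss = float(((salary - salary.mean()) ** 2).sum())         # total sum of squares
r2 = 1 - ess / tss
s = (ess / (n - k - 1)) ** 0.5                             # standard error

print(b)        # compare with Coef: 3.174, 1.4039, .25089
print(s, r2)    # compare with s = 2.419 and R-sq = 83.4%
```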

Example: Programmer Salary Survey

• Computer Output (continued)

Analysis of Variance

SOURCE        DF       SS       MS       F       P
Regression     2   500.33   250.16   42.76   0.000
Error         17    99.46     5.85
Total         19   599.79
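
The entries of the ANOVA table are tied together by simple arithmetic, which the sketch below checks using only the sums of squares and degrees of freedom printed above. The variable names are illustrative.

```python
ss_reg, ss_err, ss_tot = 500.33, 99.46, 599.79   # SS column
df_reg, df_err, df_tot = 2, 17, 19               # DF column

ms_reg = ss_reg / df_reg          # mean square for regression
ms_err = ss_err / df_err          # mean square for error
f_stat = ms_reg / ms_err          # F statistic

r2 = ss_reg / ss_tot                                   # R-sq
r2_adj = 1 - (ss_err / df_err) / (ss_tot / df_tot)     # R-sq(adj)
s = ms_err ** 0.5                                      # standard error s

print(round(f_stat, 2), round(r2, 3), round(r2_adj, 3), round(s, 3))
```

These agree with the reported F = 42.76, R-sq = 83.4%, R-sq(adj) = 81.5% and s = 2.419.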

