The Ordinary Least Squares (OLS) Estimator
Regression Analysis
• Regression Analysis: a statistical technique for
investigating and modeling the relationship
between variables.
• Applications: Engineering, the physical and
chemical sciences, economics, management, the
life and biological sciences, and the social sciences
• Regression analysis may be the most widely used
statistical technique
• Example 1: delivery time vs. delivery
volume
– Suspect that the time required by a route
deliveryman to load and service a machine is
related to the number of cases of product
delivered
– 25 randomly chosen retail outlets
– The in-outlet delivery time and the volume of
product delivery
– Scatter diagram: display a relationship between
delivery time and delivery volume
[Figures: scatter diagram of delivery time vs. delivery volume]
• Y: delivery time, x: delivery volume
Y = β₀ + β₁x + ε
• Error, ε:
– The difference between y and β₀ + β₁x
– A statistical error, i.e. a random variable
– The effects of the other variables on delivery
time, measurement errors, …
• Simple linear regression model:
Y = β₀ + β₁x + ε
– x: independent (predictor, regressor) variable
– Y: dependent (response) variable
– ε : error
• If x is fixed, Y is determined by ε.
• Suppose that E(ε) = 0 and Var(ε) = σ².
Then
E(Y|x) = E(β₀ + β₁x + ε) = β₀ + β₁x
Var(Y|x) = Var(β₀ + β₁x + ε) = σ²
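The two identities above can be checked by simulation. A minimal sketch, with made-up values β₀ = 3.5, β₁ = 2, σ = 1 and x fixed at 4: the simulated Y values should average to β₀ + β₁x and have variance σ².

```python
import random

# Hypothetical parameters chosen for illustration only.
random.seed(0)
b0, b1, sigma, x = 3.5, 2.0, 1.0, 4.0

# Draw many realizations of Y = b0 + b1*x + eps with x held fixed.
ys = [b0 + b1 * x + random.gauss(0.0, sigma) for _ in range(200_000)]
mean_y = sum(ys) / len(ys)
var_y = sum((y - mean_y) ** 2 for y in ys) / (len(ys) - 1)
print(round(mean_y, 2), round(var_y, 2))  # close to E(Y|x) = 11.5 and Var(Y|x) = 1.0
```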
• The true regression line is a line of mean
values: the height of the regression line at
any x is the expected value of Y for that x.
• The slope, β₁: the change in the mean of Y
for a unit change in x
• The variability of Y at x is determined by
the variance of the error
• Example:
– E(Y|x) = 3.5 + 2x, and Var(Y|x) = σ²
– Y|x ~ N(β₀ + β₁x, σ²)
– σ² small: the observed values will fall close to
the line.
– σ² large: the observed values may deviate
considerably from the line.
• The regression equation is only an
approximation to the true functional
relationship between the variables.
• Regression model: Empirical model
• Valid only over the region of the regressor
variables contained in the observed data!
• Multiple linear regression model:
Y = β₀ + β₁x₁ + … + βₖxₖ + ε
• Linear: the model is linear in the
parameters β₀, β₁, …, βₖ, not because Y is a
linear function of the x’s.
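The "linear in the parameters" point can be made concrete with a sketch on made-up data: the model Y = β₀ + β₁·log(x) is nonlinear in x, yet it is still a linear regression model, because substituting z = log(x) lets the simple-linear-regression formulas apply unchanged.

```python
import math

# Made-up exact data following y = 1 + 2*log(x), for illustration only.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [1.0 + 2.0 * math.log(x) for x in xs]

# Transform the regressor; the model is linear in the parameters.
zs = [math.log(x) for x in xs]
n = len(zs)
zbar, ybar = sum(zs) / n, sum(ys) / n
Szz = sum((z - zbar) ** 2 for z in zs)
Szy = sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
b1_hat = Szy / Szz
b0_hat = ybar - b1_hat * zbar
print(b0_hat, b1_hat)  # recovers b0 = 1 and b1 = 2 (up to floating point)
```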
• Two important objectives:
– Estimate the unknown parameters (fitting
the model to the data): The method of least
squares.
– Model adequacy checking: An iterative
procedure to choose an appropriate regression
model to describe the data.
• Remarks:
– Regression does not imply a cause-and-effect
relationship between the variables
– It can aid in confirming a cause-and-effect
relationship, but it is not the sole basis!
– Part of a broader data-analysis approach
The Least Squares Estimator
• Y = β₀ + β₁x + ε
– x: regressor variable
– Y: response variable
– β₀: the intercept, unknown
– β₁: the slope, unknown
– ε: error with E(ε) = 0 and Var(ε) = σ²
(unknown)
• The errors are uncorrelated.
• Given x,
E(Y|x) = E(β₀ + β₁x + ε) = β₀ + β₁x
Var(Y|x) = Var(β₀ + β₁x + ε) = σ²
• Responses are also uncorrelated.
• Regression coefficients: β₀, β₁
– β₁: the change in E(Y|x) for a unit change in x
– β₀: E(Y|x=0)
Least-squares Estimation of the Parameters
Estimation of β₀ and β₁
• Data: n pairs: (yi, xi), i = 1, …, n
• Method of least squares: Minimize
S(β₀, β₁) = Σᵢ₌₁ⁿ [yᵢ − (β₀ + β₁xᵢ)]²
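The least-squares criterion can be sketched numerically on made-up data: compute the closed-form estimates, then check that no small perturbation of them decreases S(β₀, β₁).

```python
# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

def S(b0, b1):
    """Sum of squared deviations of the data from the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
Sxx = sum((x - xbar) ** 2 for x in xs)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1_hat = Sxy / Sxx                 # closed-form slope estimate
b0_hat = ybar - b1_hat * xbar      # closed-form intercept estimate

# Perturbing the estimates in any direction should not decrease S.
deltas = (-0.5, -0.1, 0.1, 0.5)
assert all(S(b0_hat + d0, b1_hat + d1) >= S(b0_hat, b1_hat)
           for d0 in deltas for d1 in deltas)
print(b0_hat, b1_hat)
```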
• Setting ∂S/∂β₀ = 0 and ∂S/∂β₁ = 0 at β̂₀, β̂₁ gives
• Least-squares normal equations:
n β̂₀ + β̂₁Σᵢxᵢ = Σᵢyᵢ
β̂₀Σᵢxᵢ + β̂₁Σᵢxᵢ² = Σᵢxᵢyᵢ
• The least-squares estimators:
β̂₁ = Sxy/Sxx, where Sxx = Σᵢ(xᵢ − x̄)² and Sxy = Σᵢ(xᵢ − x̄)(yᵢ − ȳ)
β̂₀ = ȳ − β̂₁x̄
• The fitted simple regression model: ŷ = β̂₀ + β̂₁x
– A point estimate of the mean of Y for a
particular x
• Residual: eᵢ = yᵢ − ŷᵢ
– Plays an important role in investigating the
adequacy of the fitted regression model and in
detecting departures from the underlying
assumptions!
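A useful consequence of the normal equations, sketched here on made-up data: the residuals of a least-squares fit always satisfy Σᵢeᵢ = 0 and Σᵢxᵢeᵢ = 0.

```python
# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 2.9, 3.1, 4.8, 6.0]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
Sxx = sum((x - xbar) ** 2 for x in xs)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1_hat = Sxy / Sxx
b0_hat = ybar - b1_hat * xbar

# Residuals e_i = y_i - yhat_i of the fitted line.
residuals = [y - (b0_hat + b1_hat * x) for x, y in zip(xs, ys)]

# Both identities follow from the normal equations.
print(abs(sum(residuals)) < 1e-9)                             # True
print(abs(sum(x * e for x, e in zip(xs, residuals))) < 1e-9)  # True
```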
• Example 2: The Rocket Propellant Data
– Shear strength is related to the age in weeks of
the batch of sustainer propellant.
– 20 observations
– From scatter diagram, there is a strong
relationship between shear strength (Y) and
propellant age (x).
– Assumption
Y = β₀ + β₁x + ε
[Figure: scatter plot of shear strength vs. propellant age]
• Sxx = Σᵢxᵢ² − n x̄² = 1106.56
• Sxy = Σᵢxᵢyᵢ − n x̄ȳ = −41112.65
• The least-squares fit:
β̂₁ = Sxy/Sxx = −37.15
β̂₀ = ȳ − β̂₁x̄ = 2627.82
ŷ = 2627.82 − 37.15x
• How well does this equation fit the data?
• Is the model likely to be useful as a
predictor?
• Are any of the basic assumptions violated,
and if so, how serious is this?
Properties of the Least-Squares Estimators and the Fitted Regression Model
• β̂₀ and β̂₁ are linear combinations of the yᵢ:
β̂₁ = Σᵢcᵢyᵢ, where cᵢ = (xᵢ − x̄)/Sxx
β̂₀ = ȳ − β̂₁x̄
• β̂₀ and β̂₁ are unbiased estimators.
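The weights cᵢ = (xᵢ − x̄)/Sxx used in the unbiasedness proof satisfy Σᵢcᵢ = 0 and Σᵢcᵢxᵢ = 1; a sketch on made-up data verifies both identities and that β̂₁ really is the corresponding linear combination of the yᵢ.

```python
# Made-up data for illustration only.
xs = [2.0, 4.0, 5.0, 7.0, 9.0]
ys = [3.1, 5.2, 6.0, 8.3, 10.1]
n = len(xs)
xbar = sum(xs) / n
Sxx = sum((x - xbar) ** 2 for x in xs)

# Weights c_i = (x_i - xbar)/Sxx.
cs = [(x - xbar) / Sxx for x in xs]

assert abs(sum(cs)) < 1e-12                                 # sum of c_i is 0
assert abs(sum(c * x for c, x in zip(cs, xs)) - 1.0) < 1e-12  # sum of c_i*x_i is 1

# The slope estimator is a linear combination of the responses.
b1_hat = sum(c * y for c, y in zip(cs, ys))
print(b1_hat)
```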
• Unbiasedness:
E(β̂₁) = E(Σᵢcᵢyᵢ) = Σᵢ cᵢE(yᵢ) = Σᵢ cᵢ(β₀ + β₁xᵢ) = β₁
(using Σᵢcᵢ = 0 and Σᵢcᵢxᵢ = 1)
E(β̂₀) = E(ȳ − β̂₁x̄) = β₀ + β₁x̄ − β₁x̄ = β₀
• Variances:
Var(β̂₁) = Var(Σᵢcᵢyᵢ) = Σᵢ cᵢ²Var(yᵢ) = σ²Σᵢcᵢ² = σ²/Sxx
Var(β̂₀) = σ²(1/n + x̄²/Sxx)
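The moment results above can be checked by Monte Carlo on a made-up design: across many simulated samples, the average of β̂₁ should be near β₁ and its sample variance near σ²/Sxx.

```python
import random

# Hypothetical design points and parameters for illustration only.
random.seed(1)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b0, b1, sigma = 2.0, 0.5, 1.0
n = len(xs)
xbar = sum(xs) / n
Sxx = sum((x - xbar) ** 2 for x in xs)   # 42.0 for these x's

# Refit the slope on many independent samples from the true model.
estimates = []
for _ in range(20_000):
    ys = [b0 + b1 * x + random.gauss(0.0, sigma) for x in xs]
    Sxy = sum((x - xbar) * y for x, y in zip(xs, ys))
    estimates.append(Sxy / Sxx)

mean_b1 = sum(estimates) / len(estimates)
var_b1 = sum((e - mean_b1) ** 2 for e in estimates) / (len(estimates) - 1)
print(round(mean_b1, 2), round(var_b1, 3))  # near b1 = 0.5 and sigma^2/Sxx ≈ 0.024
```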
Classical Linear Regression Assumptions
• 1. Regression is linear in parameters
• 2. Error term has zero population mean
• 3. Error term is not correlated with X’s
• 4. No serial correlation
• 5. No heteroskedasticity
• 6. No perfect multicollinearity
• and we usually add:
• 7. Error term is normally distributed
(*Note: normality was not used in deriving the OLS estimator; the derivation is distribution-free. A good property.)
Gauss-Markov Theorem
• Given OLS assumptions 1 through 6, the OLS
estimator of βk is the minimum variance estimator
from the set of all linear unbiased estimators of βk
for k=0,1,2,…,K. That is, the OLS is the BLUE
(Best Linear Unbiased Estimator)
* Furthermore, by adding assumption 7 (normality),
one can show that OLS = MLE and is the BUE (Best
Unbiased Estimator) also called the UMVUE.
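The BLUE property can be illustrated (not proved) by simulation on a made-up setup: the "endpoint" slope (yₙ − y₁)/(xₙ − x₁) is also a linear unbiased estimator of β₁, but its sampling variance exceeds that of the OLS slope, exactly as the theorem guarantees.

```python
import random

# Hypothetical design points and parameters for illustration only.
random.seed(2)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b0, b1, sigma = 2.0, 0.5, 1.0
n = len(xs)
xbar = sum(xs) / n
Sxx = sum((x - xbar) ** 2 for x in xs)

# Compare OLS with a competing linear unbiased estimator over many samples.
ols, endpoint = [], []
for _ in range(20_000):
    ys = [b0 + b1 * x + random.gauss(0.0, sigma) for x in xs]
    ols.append(sum((x - xbar) * y for x, y in zip(xs, ys)) / Sxx)
    endpoint.append((ys[-1] - ys[0]) / (xs[-1] - xs[0]))

def var(sample):
    m = sum(sample) / len(sample)
    return sum((s - m) ** 2 for s in sample) / (len(sample) - 1)

print(var(ols) < var(endpoint))  # True: OLS has the smaller variance
```

Here the theoretical variances are σ²/Sxx = 1/42 for OLS versus 2σ²/(x₈ − x₁)² = 2/49 for the endpoint estimator, so the gap is visible even in a modest simulation.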
• Can you prove this theorem?
• This is your Quiz 2.
• Last but not least, we thank colleagues
who have uploaded their lecture notes to
the internet!