Lecture 1.1
Diploma in Statistics: Introduction to Regression 2010

Diploma in Statistics
Introduction to Regression

Lecturer: Prof John Haslett

Department of Statistics

Lloyd Building, Room 146

email: [email protected]


Lecture 1.1

1. Introduction to course
– Course outline
– Learning objectives
– References

2. Illustrative examples
– Scatter plots

3. Objectives of regression

3. Introduction to SLR
– Case study: Productivity in mail sorting

4. Exercises and Readings

M.Stuart


1. Introduction to Course

Course Outline

Introduction:
• case studies
• graphical analysis, scatter plots

Review of Simple Linear Regression
• Initial analysis
• Linear model
• Prediction formula
• Estimation and Testing
• Correlation

Non-linear relationships
• the log transformation,
• and others


Course Outline

Multiple regression analysis
• Initial data analysis
• Least squares fit and interpretation
• Diagnostic analysis of residuals
• Application

Special topics
• indicator variables
• correlated explanatory variables
• model selection


Course Outline

Issues in interpreting regression
– "cause and effect"
– control of study environment
– observational studies vs controlled experiments

Specialisations and extensions
– a list!

Statistical computing laboratories


Introduction to Course

Learning Objectives

Be able to

• use standard methods in straightforward applications

• explain the results of applying the methods

• outline the basis for the methods

• describe and check the assumptions underlying the methods

• identify sources of further information regarding more detailed understanding and more extensive methodology


Introduction to Course

References

Stuart, M., Introduction to Statistical Analysis for Business and Industry, Arnold, 2003, particularly Chapters 6, 8, ISBN 0340808446, Hamilton 519.5 N33, Lecky LEN 658.072 P32

Mullins, E., Statistics for the Quality Control Chemistry Laboratory, Royal Society of Chemistry, 2003, particularly Chapter 6, ISBN 0854046712, Hamilton, 543 P51

Hamilton, L.C., Regression with Graphics, Duxbury Press, 1992, ISBN 0534159001, Hamilton 519.5 N27

Ryan, T.P. Modern Regression Methods (2nd ed) Wiley, 2009, ISBN 9780470081860 (Santry JL- 7-204 1st ed)


2. Illustrative Examples

Example 1: Ambient Temperature and Gas Consumption

Weekly household gas consumption (in 1000 cubic feet) and the average outside temperature (in degrees Celsius) were recorded for 56 weeks.

The house thermostat was set at 20°C throughout.


Ambient Temperature and Gas Consumption

Week  Temperature  Gas      Week  Temperature  Gas
   1     -0.8      7.2        27     -0.7      4.8
   2     -0.7      6.9        28      0.8      4.6
   3      0.4      6.4        29      1.0      4.7
   4      2.5      6.0        30      1.4      4.0
   5      2.9      5.8        31      1.5      4.2
   6      3.2      5.8        32      1.6      4.2
   7      3.6      5.6        33      2.3      4.1
   8      3.9      4.7        34      2.5      4.0
   9      4.2      5.8        35      2.5      3.5
  10      4.3      5.2        36      3.1      3.2
  11      5.4      4.9        37      3.9      3.9
  12      6.0      4.9        38      4.0      3.5
  13      6.0      4.3        39      4.0      3.7
  14      6.0      4.4        40      4.2      3.5
  15      6.2      4.5        41      4.3      3.5
  16      6.3      4.6        42      4.6      3.7
  17      6.9      3.7        43      4.7      3.5
  18      7.0      3.9        44      4.9      3.4
  19      7.4      4.2        45      4.9      3.7
  20      7.5      4.0        46      4.9      4.0
  21      7.5      3.9        47      5.0      3.6
  22      7.6      3.5        48      5.3      3.7
  23      8.0      4.0        49      6.2      2.8
  24      8.5      3.6        50      7.1      3.0
  25      9.1      3.1        51      7.2      2.8
  26     10.2      2.6        52      7.5      2.6
                              53      8.0      2.7
                              54      8.7      2.8
                              55      8.8      1.3
                              56      9.7      1.5


Example 1

Ambient Temperature and Gas Consumption


Example 1

Ambient Temperature and Gas Consumption

Regression Analysis: Gas versus Temperature

The regression equation is

Gas = 5.49 - 0.290 Temperature

Predictor Coef SE Coef T P

Constant 5.4862 0.2357 23.28 0.000

Temperature -0.29021 0.04220 -6.88 0.000

S = 0.860608
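As a rough check on the output above, the line can be refitted from the 56 weeks in the data table. This is only a sketch: it uses NumPy's `polyfit` rather than Minitab, and the variable names are illustrative, not from the slides.

```python
import numpy as np

# Temperature (deg C) and gas consumption (1000 cubic feet) for weeks 1 to 56,
# transcribed from the data table for Example 1.
temperature = np.array([
    -0.8, -0.7, 0.4, 2.5, 2.9, 3.2, 3.6, 3.9, 4.2, 4.3, 5.4, 6.0, 6.0,
    6.0, 6.2, 6.3, 6.9, 7.0, 7.4, 7.5, 7.5, 7.6, 8.0, 8.5, 9.1, 10.2,
    -0.7, 0.8, 1.0, 1.4, 1.5, 1.6, 2.3, 2.5, 2.5, 3.1, 3.9, 4.0, 4.0,
    4.2, 4.3, 4.6, 4.7, 4.9, 4.9, 4.9, 5.0, 5.3, 6.2, 7.1, 7.2, 7.5,
    8.0, 8.7, 8.8, 9.7,
])
gas = np.array([
    7.2, 6.9, 6.4, 6.0, 5.8, 5.8, 5.6, 4.7, 5.8, 5.2, 4.9, 4.9, 4.3,
    4.4, 4.5, 4.6, 3.7, 3.9, 4.2, 4.0, 3.9, 3.5, 4.0, 3.6, 3.1, 2.6,
    4.8, 4.6, 4.7, 4.0, 4.2, 4.2, 4.1, 4.0, 3.5, 3.2, 3.9, 3.5, 3.7,
    3.5, 3.5, 3.7, 3.5, 3.4, 3.7, 4.0, 3.6, 3.7, 2.8, 3.0, 2.8, 2.6,
    2.7, 2.8, 1.3, 1.5,
])

# Degree-1 polynomial fit = least squares straight line.
# polyfit returns the highest-degree coefficient first: [slope, intercept].
slope, intercept = np.polyfit(temperature, gas, 1)
print(f"Gas = {intercept:.2f} + ({slope:.3f}) Temperature")
```

If the table has been transcribed faithfully, the printed coefficients should be close to the Minitab values 5.49 and -0.290.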


Example 1: Ambient Temperature and Gas Consumption


3. Objectives of Simple Linear Regression

From Base Module

• Detailing a known quantitative relation for Y and X
  – Spot strength of welds and diameter
• Is there a relation?
  – Dietary fat and prostate cancer levels
• Reliable prediction bounds
  – Shelf life and potency
• Precision in instrument calibration
  – Peak area and Dye concentration

Also:
• Simple summary
• Checking to confirm there is no (evidence of) a relationship


Galton’s Data

Galton’s heights data - 1078 pairs; Corr = 0.501

Y=Offspring

X=Mid-parent

Slope = 0.514

Reversion to the mediocre

Regression to the mean


Issues in Linear Regression

Details of a known relationship

• Symmetry: which is X? Does it matter?
  – Regression is asymmetric: slopes -0.29, -1.60
  – Correlation is symmetric: -0.683
  – Note: (-0.29) × (-1.60) = (-0.683)²

[Figure: Scatterplot of Temperature vs Gas]
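The product relation noted on this slide follows from the standard formulas for the two least squares slopes. A short sketch, where $s_X$, $s_Y$ (sample standard deviations) and $r$ (sample correlation) are notation introduced here rather than taken from the slide:

```latex
\[
b_{Y|X} = r\,\frac{s_Y}{s_X}, \qquad
b_{X|Y} = r\,\frac{s_X}{s_Y}
\quad\Longrightarrow\quad
b_{Y|X}\, b_{X|Y} = r^2 .
\]
\[
(-0.29)\times(-1.60) = 0.464 \approx (-0.683)^2 = 0.466 .
\]
```

The small discrepancy is consistent with rounding of the quoted slopes.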


Issues in Linear Regression

Detailing a known quantitative relation for Y and X

Direction Important?

Galton’s heights data - 1078 pairs; Corr = 0.501

offspring = 33.9 + 0.514 mid-parent

mid-parent = 34.1 + 0.489 offspring
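One way to see why direction matters: the second line is not the algebraic inverse of the first. The sketch below uses only the three numbers quoted on the slide; the variable names are illustrative.

```python
# Values quoted on the slide for Galton's heights data.
slope_offspring_on_parent = 0.514   # offspring regressed on mid-parent
slope_parent_on_offspring = 0.489   # mid-parent regressed on offspring
corr = 0.501

# The two regression slopes multiply to the squared correlation.
product = slope_offspring_on_parent * slope_parent_on_offspring

# Simply rearranging "offspring = 33.9 + 0.514 mid-parent" for mid-parent
# would give a slope of 1 / 0.514, which is far from the fitted 0.489.
inverted_slope = 1 / slope_offspring_on_parent

print(f"product of slopes  = {product:.3f}")
print(f"correlation squared = {corr ** 2:.3f}")
print(f"slope from inverting the first line = {inverted_slope:.3f}")
```

The two fitted lines coincide only when the correlation is ±1, so the choice of response variable matters here.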


Issues in Linear Regression

Details of a known relationship

• Linear relationship? Does it matter? Why?

• Are coefficients important?


Are precision details important?

Precision about what?

Coefficients

Gas = 5.49 - 0.290 Temp (- 0.290 ± ?)

offspring = 33.9 + 0.514 mid-parent ( 0.514 ± ?)

Corr( Hts) = 0.501 (0.501 ± ?)


Issues in Linear Regression

Inverse use in calibration

Fit forward Regression model; use inversely


Example 1: Demonstration

Using ‘Case’ variable


Example 1: Demonstration

User 1: There are two different lines!! That must be because something happened. I bet it was...


Example 1: Demonstration

User 2: Of course there are. Insulation was installed. What seems to be interesting is that they have different slopes. That’s not what we were expecting!

Or is it just my eyes...?


Example 1: Demonstration

User 2: Of course there are. Insulation was installed. They don’t seem to be linear. That’s not what we were expecting.

Or is it just my eyes?


Example 1: Lesson

• Omitting a key variable makes nonsense of the analysis and interpretation

• Typically Naive Researcher who

– Is not quite sure what is the ROLE of regression in answering a question OR

– Is not quite sure what data will answer a question OR

– Is not quite sure what is the interesting research question
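The effect of the omitted variable can be sketched by fitting one pooled line and then one line per group. The split used below is an assumption: the slides say only that insulation was installed, so treating weeks 1 to 26 as "before" and weeks 27 to 56 as "after" is inferred from the layout of the data table. `fit_line` is an illustrative helper, not part of any package.

```python
import numpy as np

# Weeks 1 to 56 from the Example 1 data table: temperature (deg C), gas (1000 cu ft).
temperature = np.array([
    -0.8, -0.7, 0.4, 2.5, 2.9, 3.2, 3.6, 3.9, 4.2, 4.3, 5.4, 6.0, 6.0,
    6.0, 6.2, 6.3, 6.9, 7.0, 7.4, 7.5, 7.5, 7.6, 8.0, 8.5, 9.1, 10.2,
    -0.7, 0.8, 1.0, 1.4, 1.5, 1.6, 2.3, 2.5, 2.5, 3.1, 3.9, 4.0, 4.0,
    4.2, 4.3, 4.6, 4.7, 4.9, 4.9, 4.9, 5.0, 5.3, 6.2, 7.1, 7.2, 7.5,
    8.0, 8.7, 8.8, 9.7,
])
gas = np.array([
    7.2, 6.9, 6.4, 6.0, 5.8, 5.8, 5.6, 4.7, 5.8, 5.2, 4.9, 4.9, 4.3,
    4.4, 4.5, 4.6, 3.7, 3.9, 4.2, 4.0, 3.9, 3.5, 4.0, 3.6, 3.1, 2.6,
    4.8, 4.6, 4.7, 4.0, 4.2, 4.2, 4.1, 4.0, 3.5, 3.2, 3.9, 3.5, 3.7,
    3.5, 3.5, 3.7, 3.5, 3.4, 3.7, 4.0, 3.6, 3.7, 2.8, 3.0, 2.8, 2.6,
    2.7, 2.8, 1.3, 1.5,
])


def fit_line(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope


# ASSUMPTION: weeks 1-26 are before insulation, weeks 27-56 after.
before = slice(0, 26)
after = slice(26, 56)

fits = {
    "pooled": fit_line(temperature, gas),
    "before": fit_line(temperature[before], gas[before]),
    "after": fit_line(temperature[after], gas[after]),
}
for label, (a, b) in fits.items():
    print(f"{label:>6}: Gas = {a:.2f} + ({b:.3f}) Temperature")
```

Under that assumption the two groups have clearly different intercepts, which a single pooled line cannot represent.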


Example 2: CEO Compensation (US$) and Company Sales (US$m)

(Forbes Magazine, May 1994)

Total comp   Industry         Sales
     28816   Financial          242
     52000   ComputersComm      553
    100000   Insurance         3653
    102308   ComputersComm     2195
    221641   Financial          238
    250000   Entertainment      415


Example 2: CEO Compensation and Company Sales, logarithmic transformation


Example 2: CEO Compensation and Company Sales, logarithmic transformation

User 1: Which scale to use?

User 2: What does this tell us about the nature of the relationship?


Example 2: CEO Compensation and Company Sales, logarithmic transformation

User 2: What does this tell us about the nature of the relationship?

If, roughly, $\dfrac{\text{Comp}}{\text{Sales}} = \text{const}$, then

$\log_{10}\dfrac{\text{Comp}}{\text{Sales}} = \log_{10}\text{const}$, roughly

$\log_{10}\text{Comp} = \log_{10}\text{const} + \log_{10}\text{Sales}$

Lcomp = Lconst + LSales

Lcomp = a + b LSales, where b = 1

Regression:

Log10Comp = 5.28 + 0.26 Log10Sales
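To read the fitted log-log line on the original scale, back-transform it: log10(Comp) = 5.28 + 0.26 log10(Sales) corresponds to Comp = 10^5.28 × Sales^0.26. The sketch below uses the two fitted coefficients from the slide; the function name and the sales figures (in US$m) are illustrative only.

```python
import math

A = 5.28  # fitted intercept on the log10 scale (from the slide)
B = 0.26  # fitted slope on the log10 scale (from the slide)


def predicted_comp(sales_musd):
    """Back-transform the fitted line to predicted compensation in US$."""
    return 10 ** (A + B * math.log10(sales_musd))


# Doubling sales multiplies predicted compensation by 2**B, whatever the
# starting level, because the model is a power law on the original scale.
ratio = predicted_comp(2000) / predicted_comp(1000)
print(f"predicted compensation ratio when sales double: {ratio:.3f}")
```

A slope well below 1 on the log scale means predicted compensation grows much more slowly than sales: doubling sales raises it by roughly 20%.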


Example 3: Multiple regression

Relating Respiratory Muscle Strength to other measures of lung function in patients suffering from cystic fibrosis, adjusting for sex and body size.


The variables

PEmax    Maximal static expiratory pressure, a measure of expiratory muscle strength
FEV1     Forced expiratory volume in 1 second
RV       Residual volume (after 1 second)
FRC      Functional residual capacity
TLC      Total lung capacity
Sex      0 = Male, 1 = Female
Height   cm
Weight   kg
BMP      Body mass (percent of median of normal cases)


The regression equation is

PEmax = 102 + 1.36 FEV1 + 0.172 RV - 0.206 FRC + 0.275 TLC
        - 0.8 Sex - 0.373 Height + 2.15 Weight - 1.39 BMP

Predictor      Coef   SE Coef       T       P
Constant      101.7     172.3    0.59   0.563
FEV1         1.3626    0.9185    1.48   0.157
RV           0.1723    0.1860    0.93   0.368
FRC         -0.2056    0.4410   -0.47   0.647
TLC          0.2751    0.4614    0.60   0.559
Sex           -0.76     14.04   -0.05   0.958
Height      -0.3731    0.8721   -0.43   0.675
Weight        2.152     1.195    1.80   0.091
BMP         -1.3868    0.9135   -1.52   0.148
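Each T value in the output is the coefficient divided by its standard error. The sketch below recomputes the ratios from the Coef and SE Coef columns; the dictionary layout is illustrative.

```python
# (coefficient, standard error, reported T) copied from the Minitab output.
rows = {
    "Constant": (101.7, 172.3, 0.59),
    "FEV1": (1.3626, 0.9185, 1.48),
    "RV": (0.1723, 0.1860, 0.93),
    "FRC": (-0.2056, 0.4410, -0.47),
    "TLC": (0.2751, 0.4614, 0.60),
    "Sex": (-0.76, 14.04, -0.05),
    "Height": (-0.3731, 0.8721, -0.43),
    "Weight": (2.152, 1.195, 1.80),
    "BMP": (-1.3868, 0.9135, -1.52),
}

# t ratio = estimate / standard error
t_ratios = {name: coef / se for name, (coef, se, _) in rows.items()}

for name, t in t_ratios.items():
    reported = rows[name][2]
    print(f"{name:<9} t = {t:6.2f}   (reported {reported:5.2f})")
```

None of the ratios exceeds 2 in absolute value, matching the large P values in the table.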


Multiple vs Simple Regression

Prediction

• SLR Model: Y predicted linearly by X variable, apart from random error

Y = a+ bX + error

• MLR Y predicted by several X vars

Y = a+ b1X1 + b2X2 + ...error


Multiple vs Simple Regression

Prediction

• SLR Model Y = a+ bX + error

Y changes by b when X increases by 1

on average, or apart from error

• MLR Model Y = a+ b1X1 + b2X2 + ...error

Y changes by b1 when X1 increases by 1

on average, or apart from error

when other X variables do not change


Multiple Regression

Interpretation of coefficients

Y changes by b1 when X1 increases by 1
on average, or apart from error
when other X variables do not change

Research Issue: adjusting for sex and body size.

PEmax = 102 + 1.36 FEV1 + 0.172 RV - 0.206 FRC + 0.275 TLC
        - 0.8 Sex - 0.373 Height + 2.15 Weight - 1.39 BMP + error
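The "other X variables do not change" reading can be illustrated with the fitted PEmax equation. The coefficients below come from the Minitab output for Example 3; the patient values are hypothetical and chosen only for illustration, and `predict_pemax` is a name introduced here.

```python
# Fitted coefficients from the Minitab output for Example 3.
INTERCEPT = 101.7
COEFS = {
    "FEV1": 1.3626, "RV": 0.1723, "FRC": -0.2056, "TLC": 0.2751,
    "Sex": -0.76, "Height": -0.3731, "Weight": 2.152, "BMP": -1.3868,
}


def predict_pemax(patient):
    """Predicted PEmax (apart from error) for a dict of explanatory values."""
    return INTERCEPT + sum(COEFS[name] * patient[name] for name in COEFS)


# HYPOTHETICAL patient: these values are not from the slides.
patient = {
    "FEV1": 30, "RV": 250, "FRC": 150, "TLC": 110,
    "Sex": 0, "Height": 150, "Weight": 40, "BMP": 80,
}

# Increase Weight by 1 kg while holding every other variable fixed.
heavier = dict(patient, Weight=patient["Weight"] + 1)
change = predict_pemax(heavier) - predict_pemax(patient)
print(f"change in predicted PEmax for +1 kg: {change:.3f}")
```

The change equals the Weight coefficient whatever values the other variables take, which is what the interpretation on the slide asserts. Whether Weight can really change while Height and BMP stay fixed is a separate question.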


Part 3: Introduction to SLR

Case study: Productivity in mail sorting

– Initial analysis

– Linear model

– Prediction formula (least squares)

– Significance testing

– Confidence intervals


Mail Processing Hours (Fiscal Years 1962-63)

Period: four-week accounting period. Pieces: pieces of mail handled (in millions). Manhours: manhours used (in thousands).

          Fiscal Year 1962        Fiscal Year 1963
Period    Pieces   Manhours       Pieces   Manhours
   1        157       572           154       569
   2        161       570           157       564
   3        168       645           164       573
   4        186       645           188       667
   5        183       645           191       700
   6        184       671           180       765
   7        268      1053           270      1070
   8        180       675           180       637
   9        175       670           172       650
  10        193       710           184       655
  11        184       656           179       665
  12        179       640           169       599
  13        164       599           160       605


Exercise 1

Discuss the reasonableness of "man hours used" as a measure of cost of handling mail.

Suggest and discuss alternatives, where appropriate.


Initial analysis

Line plots of Manhours and Volume

[Figure: Manhours and Volume plotted against four-week periods]


Line plots of Manhours and Volume, Christmas excluded

[Figure: Manhours and Volume plotted against four-week periods, Christmas periods excluded]


Scatter plots of Manhours and Volume

[Figure: Manhours (vertical axis) against Volume (horizontal axis), annotated "exceptions?", "exception?" and "stable system?"]


Scatter plots of Manhours and Volume, with curve representing return to scale

[Figure: Manhours (vertical axis) against Volume (horizontal axis), with fitted curve]


Exercise 2

On a rough graph of y against x, plot the points

(3,4), (0,3), (9,6),

and the line with equation

y = 3 + ⅓ x


Linear model

Scatter plot, reduced data set

[Figure: Manhours (vertical axis) against Volume (horizontal axis)]


Simple linear regression model, with Normal model for chance variation

[Figure: Manhours (vertical axis) against Volume (horizontal axis)]

Y = α + βX + ε


The simple linear regression model

• Y = α + βX + ε
  Y is the Response variable
  X is the Explanatory variable
  ε represents chance variation

• Model parameters:
  α and β are the linear parameters
  hidden parameter, standard deviation σ, measures spread of Normal curve


The estimated model

$\hat{Y} = 50 + 3.3X \pm 40$

Illustration

Use the prediction formula to estimate the loss incurred through equipment breakdown in Period 6, Fiscal 1962, when Y was 765 and X was 180

X = 180 implies $\hat{Y} = 50 + 3.3 \times 180 \pm 40 = 644 \pm 40$, that is, 604 to 684

Y = 765 considerably exceeds these limits and so is not consistent with predicted behaviour of the system
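The arithmetic of this illustration can be written as a small helper. This is a sketch using the rounded prediction formula (intercept 50, slope 3.3, margin 40); the function name and default arguments are illustrative.

```python
def prediction_limits(volume, intercept=50.0, slope=3.3, margin=40.0):
    """Return (low, high) limits for manhours at a given volume."""
    centre = intercept + slope * volume
    return centre - margin, centre + margin


low, high = prediction_limits(180)
observed = 765  # manhours recorded in the breakdown period

predicted = (low + high) / 2
excess = observed - predicted

print(f"prediction: {predicted:.0f}, limits: {low:.0f} to {high:.0f}")
print(f"observed {observed} is outside the limits: {not (low <= observed <= high)}")
print(f"excess over prediction: {excess:.0f} thousand manhours")
```

The excess over the predicted value is one rough estimate of the loss the exercise asks about.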


Exercise 3

Use the prediction formula to predict the extra manpower requirement during Christmas period, based on the experience of Period 7, Fiscal 1962,

when Y was 1,053 and X was 268

Compare with actual.

Comment.


Homework

Exercise 3 (continued)

Use the prediction formula to predict the extra manpower requirement during Christmas period, based on the experience of Period 7, Fiscal 1963,

when Y was 1,070 and X was 270.

Compare with actual.

Comment.


Choosing a prediction formula

Find values for $\alpha$ and $\beta$ that minimise the deviations

$Y_1 - \alpha - \beta X_1,\; Y_2 - \alpha - \beta X_2,\; Y_3 - \alpha - \beta X_3,\; \ldots,\; Y_n - \alpha - \beta X_n$


Trial regression lines, with "residuals"


The method of least squares

Find values for $\alpha$ and $\beta$ that minimise the sum of the squared deviations:

$(Y_1 - \alpha - \beta X_1)^2 + (Y_2 - \alpha - \beta X_2)^2 + (Y_3 - \alpha - \beta X_3)^2 + \cdots + (Y_n - \alpha - \beta X_n)^2$


"Least squares" regression line, with "residuals"


The method of least squares

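Solution:

$\hat{\beta} = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$

$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$

For these data,

$\hat{\beta} = 3.3$

$\hat{\alpha} = 50$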
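The least squares solution can be applied directly to the mail data. One assumption is needed: the slides do not list the excluded cases, so the three dropped below (the two Christmas periods and the period with Y = 765, X = 180) are inferred from the exercises and the illustration. Variable names are illustrative.

```python
# (volume in millions of pieces, manhours in thousands), periods 1 to 13.
fy1962 = [(157, 572), (161, 570), (168, 645), (186, 645), (183, 645),
          (184, 671), (268, 1053), (180, 675), (175, 670), (193, 710),
          (184, 656), (179, 640), (164, 599)]
fy1963 = [(154, 569), (157, 564), (164, 573), (188, 667), (191, 700),
          (180, 765), (270, 1070), (180, 637), (172, 650), (184, 655),
          (179, 665), (169, 599), (160, 605)]

# ASSUMPTION: excluded cases are the two Christmas periods and the
# equipment breakdown period; the slides do not list them explicitly.
excluded = {(268, 1053), (270, 1070), (180, 765)}
kept = [pair for pair in fy1962 + fy1963 if pair not in excluded]

n = len(kept)
x_bar = sum(x for x, _ in kept) / n
y_bar = sum(y for _, y in kept) / n

# Least squares slope: sum of cross-deviations over sum of squared X deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in kept)
s_xx = sum((x - x_bar) ** 2 for x, _ in kept)
beta_hat = s_xy / s_xx

# The fitted line passes through the point of means.
alpha_hat = y_bar - beta_hat * x_bar

print(f"n = {n}, beta_hat = {beta_hat:.4f}, alpha_hat = {alpha_hat:.2f}")
```

With these exclusions the estimates round to the slide's values of 3.3 and 50.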


Interpretation

$\hat{\beta}$ is the marginal change in Y for a unit change in X.

Check the measurement units!

$\hat{\alpha}$ is overheads.

"Least squares" regression line, with non-linear extensions


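Using the fitted line; prediction

Prediction equation: $\hat{Y} = \hat{\alpha} + \hat{\beta}X$

Prediction equation allowing for chance variation: $\hat{Y} = \hat{\alpha} + \hat{\beta}X \pm 2\hat{\sigma}$

Original model: $Y = \alpha + \beta X + \varepsilon$, $\mathrm{SD}(\varepsilon) = \sigma$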


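Estimating $\sigma$

$\sigma$ measures spread of deviations from the true line;

s measures spread of deviations from the fitted line

Define

fitted values: $\hat{Y}_i = \hat{\alpha} + \hat{\beta}X_i$

residuals: $e_i = Y_i - \hat{Y}_i$

$\hat{\sigma} = s = \sqrt{\dfrac{\sum e_i^2}{n - 2}}$

N.B. no e-bar; n − 2 instead of n (or n − 1)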
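The residual standard deviation can be computed for the mail data as a sketch of these definitions. As before, the three excluded cases are an assumption inferred from the exercises, and the names are illustrative.

```python
import math

# (volume, manhours) for the 23 cases retained; ASSUMPTION: the two
# Christmas periods and the breakdown period (180, 765) are excluded.
kept = [(157, 572), (161, 570), (168, 645), (186, 645), (183, 645),
        (184, 671), (180, 675), (175, 670), (193, 710), (184, 656),
        (179, 640), (164, 599),
        (154, 569), (157, 564), (164, 573), (188, 667), (191, 700),
        (180, 637), (172, 650), (184, 655), (179, 665), (169, 599),
        (160, 605)]

n = len(kept)
x_bar = sum(x for x, _ in kept) / n
y_bar = sum(y for _, y in kept) / n
s_xx = sum((x - x_bar) ** 2 for x, _ in kept)
beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in kept) / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Residuals: observed minus fitted. They sum to zero, so no e-bar is needed.
residuals = [y - (alpha_hat + beta_hat * x) for x, y in kept]

# Divide by n - 2 because two parameters (alpha, beta) were estimated.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Standard error of the slope, for comparison with the Minitab output.
se_beta = s / math.sqrt(s_xx)

print(f"s = {s:.2f}, SE(beta_hat) = {se_beta:.4f}")
```

The value of s is about 18.93, so the ±40 margin used in the prediction formula is roughly 2s.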


Minitab results

Regression Analysis: Manhours versus Volume

The regression equation is

Manhours = 50.4 + 3.35 Volume

23 cases used, 3 cases contain missing values

Predictor Coef SE Coef T P

Constant 50.44 59.46 0.85 0.406

Volume 3.3454 0.3401 9.84 0.000

S = 18.9300


Standard errors of estimated regression coefficients

• Regression coefficient estimate subject to chance variation

• Normal model applies

• Standard deviation of the Normal model is the standard error of the coefficient estimate


Parameter estimation

Confidence interval for marginal change

Recall confidence interval for $\mu$:

$\bar{X} \pm 2\sigma/\sqrt{n}$

or

$\hat{\mu} \pm 2\,SE(\hat{\mu})$

Confidence interval for $\beta$:

$\hat{\beta} \pm 2\,SE(\hat{\beta})$
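Applied to the Minitab output for Manhours versus Volume, the "estimate ± 2 standard errors" rule gives an approximate 95% interval. This is a sketch using the multiplier 2 from the slide (a t multiplier would be slightly larger); the function name is illustrative.

```python
def approx_ci(estimate, std_error, multiplier=2.0):
    """Approximate confidence interval: estimate +/- multiplier * SE."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width


# Slope and its standard error from the Minitab output.
beta_hat = 3.3454
se_beta = 0.3401

low, high = approx_ci(beta_hat, se_beta)
print(f"approx. 95% CI for beta: {low:.2f} to {high:.2f}")

# A change of c units in X scales both the estimate and its SE by c,
# so the interval for a 10 million piece increase is 10 times as wide.
low10, high10 = approx_ci(10 * beta_hat, 10 * se_beta)
print(f"approx. 95% CI for a 10m increase: {low10:.1f} to {high10:.1f}")
```

The units follow the data: thousands of manhours per million pieces of mail.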


Exercise 4

Calculate a 95% confidence interval for $\beta$.

Calculate a 95% CI for change in manhours corresponding to a 10m. increase in pieces of mail handled.


Testing the statistical significance of the intercept

Formal test:

H0: $\alpha = 0$

Test statistic: $Z = \dfrac{\hat{\alpha} - 0}{\mathrm{SE}(\hat{\alpha})} = \dfrac{\hat{\alpha}}{\mathrm{SE}(\hat{\alpha})}$

Critical value: 2

Calculated value: 0.848

Comparison: Z < 2

Conclusion: Accept H0
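The steps of the formal test can be sketched in a few lines. The intercept estimate and its standard error are the case-study values; the variable names are chosen here for illustration, and the critical value 2 is the slides' rule of thumb rather than an exact tabulated value.

```python
# Intercept estimate and standard error from the mail-sorting regression.
alpha_hat = 50.4394
se_alpha = 59.46

NULL_VALUE = 0.0       # H0: intercept equals 0
CRITICAL_VALUE = 2.0   # rule-of-thumb critical value used in the slides

# Test statistic: distance of the estimate from the null value, in SE units.
z = (alpha_hat - NULL_VALUE) / se_alpha

# Compare the size of the statistic with the critical value.
reject_h0 = abs(z) >= CRITICAL_VALUE

print(f"Z = {z:.3f}")
print("Reject H0" if reject_h0 else "Accept H0: intercept not statistically significant")
```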


Testing the statistical significance of the intercept

Informal test:

$\hat{\alpha} = 50.4394$ is less than its standard error, $\mathrm{SE}(\hat{\alpha}) = 59.46$.

Equivalently, t, their ratio (check it!), is less than 2.


Administrative applications

Process monitoring:

– compare latest manhours with prediction, given latest volume

– plot point on scatter plot with "band" (a "regression control chart")
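A minimal sketch of the monitoring check follows. It assumes the "band" is the fitted line plus or minus 2S, with S = 18.93 taken from the regression output, and it ignores the extra uncertainty from estimating the line. The function names and the latest-period figures are hypothetical, made up only to show the mechanics.

```python
# Fitted line and residual standard deviation from the regression output.
A_HAT = 50.44     # intercept
B_HAT = 3.3454    # slope: manhours per unit of Volume
S = 18.93         # residual standard deviation


def predicted_manhours(volume):
    """Prediction from the fitted line for a given volume."""
    return A_HAT + B_HAT * volume


def check_period(volume, manhours, width=2.0):
    """Compare actual manhours with the band: prediction +/- width * S."""
    fitted = predicted_manhours(volume)
    lower = fitted - width * S
    upper = fitted + width * S
    return {
        "fitted": fitted,
        "lower": lower,
        "upper": upper,
        "in_control": lower <= manhours <= upper,
    }


# Hypothetical latest period, not taken from the case-study data.
result = check_period(volume=170.0, manhours=690.0)
print(f"band: {result['lower']:.1f} to {result['upper']:.1f}")
print("within band" if result["in_control"] else "outside band: investigate")
```

A point outside the band is a signal to look for a special cause, in the same spirit as a point outside the limits of an ordinary control chart.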


Administrative applications

Budgeting

– based on historical data

or

– reflecting marketing efforts

or

– reflecting local knowledge

"variance" analysis


Administrative applications

Strategic, rather than operational, changes in procedures

e.g., productivity improvement,

monitored through regression control chart


Homework

In a study of a wholesaler's distribution costs, undertaken with a view to cost control, the volume of goods handled and the overall costs were recorded for one month in each of ten depots in a distribution network. The results are presented in the following table.

Depot                   1   2   3   4   5   6   7   8   9  10
Volume (£ thousands)   48  57  49  45  50  62  58  55  38  51
Costs (£ hundreds)     20  22  19  18  20  24  21  21  15  20


Homework

The simple linear regression of costs (Y) on volume (X) was calculated, and resulted in the following numerical summary.

Dependent variable is: Costs
No Selector

R squared = 93.1%   R squared (adjusted) = 92.3%

s = 0.6676 with 10 - 2 = 8 degrees of freedom

Source       Sum of Squares   df   Mean Square   F-ratio
Regression   48.4344           1   48.4344       109
Residual     3.56555           8   0.445694

Variable   Coefficient   s.e. of Coeff   t-ratio   prob
Constant   2.98160       1.646           1.81      0.1077
Volume     0.331743      0.0318          10.4      ≤ 0.0001
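As a check on the numerical summary, the least-squares calculations can be reproduced directly from the ten depot values. The sketch below uses only the standard simple linear regression formulas; the variable names are chosen here and are not part of the original output.

```python
import math

# Depot data from the homework table.
volume = [48, 57, 49, 45, 50, 62, 58, 55, 38, 51]   # £ thousands
costs = [20, 22, 19, 18, 20, 24, 21, 21, 15, 20]    # £ hundreds

n = len(volume)
x_bar = sum(volume) / n
y_bar = sum(costs) / n

# Corrected sums of squares and cross-products.
sxx = sum((x - x_bar) ** 2 for x in volume)
syy = sum((y - y_bar) ** 2 for y in costs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(volume, costs))

# Least-squares coefficients.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Analysis of variance quantities.
ss_regression = slope * sxy
ss_residual = syy - ss_regression
s = math.sqrt(ss_residual / (n - 2))     # residual standard deviation
r_squared = ss_regression / syy

# Standard errors and t-ratios of the coefficients.
se_slope = s / math.sqrt(sxx)
se_intercept = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

print(f"Costs = {intercept:.4f} + {slope:.6f} Volume")
print(f"s = {s:.4f}, R squared = {100 * r_squared:.1f}%")
print(f"SE(intercept) = {se_intercept:.3f}, t = {t_intercept:.2f}")
print(f"SE(slope) = {se_slope:.4f}, t = {t_slope:.1f}")
```

The computed values agree with the summary to the number of digits shown there, which is a useful sanity check before interpreting the coefficients.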


Reading

SA, 1.5, 6.1, 6.3

EM, 6.1, 6.2.1, 6.2.2
