Introduction to Regression · 2016. 1. 25.

Date post: 31-Mar-2021
Introduction to Regression

Myra O’Regan

[email protected]

Room 142 Lloyd Institute


Description of module

• Practical module on regression

• Focussing on the application of multiple regression

• Software

• Lots of computer output – will use R sometimes

• 2 labs

• Some mathematics but no linear algebra


Topics to be covered

• Revision of simple linear regression

• Introduction to multiple regression

• Use of logs and other transformations

• Regression diagnostics

• Use of indicator variables

• Polynomial regression

• Building a regression model

• Dealing with multicollinearity

• Introduction to logistic regression

• Other fun techniques


Notes and Books

• I use BlackBoard

• Sheather, S. J. A Modern Approach to Regression with R. New York: Springer, 2009

• Neter, J., Wasserman, W. & Kutner, M. H. Applied Linear Models, 2nd edition. Boston: Irwin, 1989

• Kutner, M. H., Nachtsheim, C. J., Neter, J. & Li, W. Applied Linear Statistical Models, 5th edition. Boston: McGraw-Hill, 2005


Purpose of regression

• To build a model for prediction purposes

– Price of diamond from number of carats

– Price of a house

– Time to process invoices

– Measuring the volume of wood in trees

• To look at relationships

– Factors relating to cot death


Netflix competition

• Variables were

• user, movie, date of grade, grade

• Grade was measured from 1 to 5

• 100,480,507 ratings

• 480,189 users

• 17,770 movies

• Movie, title and year of release


308 diamonds: price, colour, clarity and size


Initial examination of data

• Know the story behind the data

• Understand the background

• Understand meanings of variables

• Look at each variable separately

• Check the quality of data

• Summary statistics and graphs

• How much missing data?


Revision of simple linear regression

• The manager of a purchasing department of a large company would like to predict the average amount of time it takes to process a given number of invoices. Data were collected over a sample of 30 days on the number of invoices and the time taken in hours

• Three variables: Time, Number of Invoices and Day


Variable   Invoices   Time
N          30         30
N*         0          0
Mean       130        2.11
SE Mean    13.7       0.165
StDev      74.8       0.905
Minimum    23         0.8
Q1         60         1.425
Median     127.5      2
Q3         190.8      2.8
Maximum    289        4.1


Model to fit

• $\mathrm{Time}_i = \alpha + \beta \cdot \mathrm{Invoices}_i + \varepsilon_i$

• Linear model

• Need estimates of α and β

• Need SE for estimates

• We use Minitab to calculate estimates of α and β
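The estimates that Minitab reports can be reproduced by hand. The sketch below applies the usual least-squares formulas to a handful of made-up (invoices, time) pairs; the data values and variable names are illustrative assumptions, not the 30 days used in these slides.

```python
# Least-squares estimates for Time = alpha + beta * Invoices + error.
# The data below are hypothetical illustration values, not the course data.
invoices = [20, 60, 100, 140, 180]
time_hours = [0.9, 1.3, 1.8, 2.2, 2.7]

n = len(invoices)
x_bar = sum(invoices) / n
y_bar = sum(time_hours) / n

# Slope = Sxy / Sxx, intercept = y_bar - slope * x_bar
sxx = sum((x - x_bar) ** 2 for x in invoices)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(invoices, time_hours))
beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar

# Residual standard deviation s on n - 2 degrees of freedom
residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(invoices, time_hours)]
s = (sum(r ** 2 for r in residuals) / (n - 2)) ** 0.5

# Standard error of the slope estimate
se_beta = s / sxx ** 0.5

print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.5f}")
print(f"s = {s:.4f}, SE(beta) = {se_beta:.5f}")
```

The same four quantities (intercept, slope, s and the standard errors) are what to look for in the Minitab output.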


What is going on here? What are the lines? More importantly, what are the differences?


Prediction vs Confidence intervals

• Confidence interval

• For a given value of x0 this is an interval for the average value of the dependent variable

• Point estimate $\pm\; t \cdot s \cdot \sqrt{\text{Distance value}}$

• t has n-(k+1) df where k = no. of predictors

• s = 0.330: what does this measure?

• Distance value $= \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}$


Prediction vs Confidence intervals

• Prediction interval

• For a given value of x0 this is an interval for a particular (individual) value of the dependent variable

• Point estimate $\pm\; t \cdot s \cdot \sqrt{1 + \text{Distance value}}$

• t has n-(k+1) df where k = no. of predictors

• s = 0.330: what does this measure?

• Distance value $= \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}$
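As a rough check on the two formulas, the sketch below evaluates both intervals for the invoices data at 50 invoices. It relies on two assumptions that are not stated on these slides: the sum of squares of x is recovered from the rounded standard deviation (74.8), and the t value of 2.048 for 28 df is read from tables. The results will therefore differ slightly from Minitab's.

```python
import math

# Values reported in the slides
n = 30               # number of days
s = 0.330            # residual standard deviation
x_bar = 130.0        # mean number of invoices
sd_x = 74.8          # standard deviation of invoices (rounded)
b0_centred = 2.11    # fitted intercept when invoices are centred at the mean
b1 = 0.0113          # fitted slope

# Assumptions made for this sketch
x0 = 50.0            # number of invoices at which we predict
t_crit = 2.048       # 97.5% point of t with n - 2 = 28 df, from tables

# Sum of squares of x about its mean, recovered from the rounded StDev
sxx = (n - 1) * sd_x ** 2

point_estimate = b0_centred + b1 * (x0 - x_bar)
distance_value = 1 / n + (x0 - x_bar) ** 2 / sxx

ci_half = t_crit * s * math.sqrt(distance_value)      # for the mean response
pi_half = t_crit * s * math.sqrt(1 + distance_value)  # for a single new day

ci = (point_estimate - ci_half, point_estimate + ci_half)
pi = (point_estimate - pi_half, point_estimate + pi_half)

print(f"point estimate = {point_estimate:.3f} hours")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95% PI = ({pi[0]:.3f}, {pi[1]:.3f})")
```

The only difference between the two intervals is the extra 1 under the square root, which is why the prediction interval is always the wider one.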


Approximate intervals for reasonably large samples

• Confidence interval: point estimate $\pm\; 2 \cdot s \cdot \sqrt{\dfrac{1}{n}}$

• Prediction interval: point estimate $\pm\; 2 \cdot s \cdot \sqrt{1 + \dfrac{1}{n}}$
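A small sketch of these approximations, using s = 0.330 and n = 30 from the invoices example. The variable names are illustrative.

```python
import math

n = 30      # sample size from the invoices example
s = 0.330   # residual standard deviation from the invoices example

# Approximate half-widths: t is replaced by 2 and the distance value by 1/n
ci_half_approx = 2 * s * math.sqrt(1 / n)
pi_half_approx = 2 * s * math.sqrt(1 + 1 / n)

print(f"approximate CI: point estimate +/- {ci_half_approx:.3f}")
print(f"approximate PI: point estimate +/- {pi_half_approx:.3f}")
```

The approximation works best near the mean of x, where the second term of the distance value is small.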


Example

• Let number of invoices = 50

• Where do these numbers come from roughly?


ANOVA table…

• Total sum of squares (SS) $= \sum_i (Y_i - \bar{Y})^2$

• Regression SS $= \sum_i (\hat{Y}_i - \bar{Y})^2$

• Error SS $= \sum_i (Y_i - \hat{Y}_i)^2$

• What is $R^2$?
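One way to see how the three sums of squares fit together is to compute them for a tiny data set. The sketch below uses made-up numbers (not the course data) and takes R² to be Regression SS divided by Total SS, the usual definition.

```python
# Hypothetical illustration data, not taken from the slides
x = [20, 60, 100, 140, 180]
y = [0.9, 1.3, 1.8, 2.2, 2.7]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Proportion of the variation in Y explained by the regression
r_squared = ss_regression / ss_total

print(f"Total SS = {ss_total:.4f}")
print(f"Regression SS + Error SS = {ss_regression + ss_error:.4f}")
print(f"R-squared = {r_squared:.4f}")
```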


What happens if we do the following?

• Let Invoices=X

• Subtract k from each case

• What will change?

• Time = α + β*X + ε (original model)

• Time = α + β*(X-k) + ε = (α - βk) + βX + ε

• Slope does not change but intercept does

• Intercept = expected value of Time when X=k

• Normally we use k=mean of the variable


The regression equation is Time = 2.11 + 0.0113 Centered invoices
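A quick check of the centring algebra, using the fitted equation above and the mean of 130 invoices from the summary table. The function names are illustrative; the uncentred intercept is derived here rather than quoted from the slides.

```python
b0_centred = 2.11   # intercept of the centred fit (the mean Time)
b1 = 0.0113         # slope, the same in both versions of the model
k = 130.0           # mean number of invoices, subtracted from each case

# Intercept of the equivalent uncentred equation: alpha - beta * k
b0_original = b0_centred - b1 * k


def predict_centred(invoices):
    """Predicted time using centred invoices."""
    return b0_centred + b1 * (invoices - k)


def predict_original(invoices):
    """Predicted time using raw invoices."""
    return b0_original + b1 * invoices


for invoices in (23, 130, 289):
    print(invoices, round(predict_centred(invoices), 3), round(predict_original(invoices), 3))
```

Both versions give identical predictions; only the meaning of the intercept changes.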


Trees data

• Sample of 31 black cherry trees in the Allegheny National Forest in Pennsylvania

• Volume in cubic feet

• Height in feet

• Diameter in inches, measured 54 inches above the ground


Variable   Diameter   Height   Volume
N          31         31       31
N*         0          0        0
Mean       13.248     76       30.17
SE Mean    0.564      1.14     2.95
StDev      3.138      6.37     16.44
Minimum    8.3        63       10.2
Q1         11         72       19.1
Median     12.9       76       24.2
Q3         16         80       38.3
Maximum    20.6       87       77


What does the F-test mean?

• Testing a hypothesis

• Null hypothesis H0: 𝛽1 = 𝛽2 = 0

• Alternative Hypothesis H1: Not all β’s =0

• F=254.97, df=(2,28) p<0.001

• Enough evidence against the null hypothesis


Interpretation of coefficients

• Volume = β0+ β1*Height+ β2*Diameter + ε

• E(Volume) or Predicted(Volume), sometimes written as $\hat{Y}$

• = -58.0 + 0.339*Height + 4.71*Diameter

• Constant (-58.0) is the mean response when Height=0 and Diameter=0

• β1 is the change in mean response per unit increase in Height when Diameter is held constant (at any value)

• Similarly, β2 is the change in mean response per unit increase in Diameter when Height is held constant (at any value)


And a little more

• Example let Diameter =12

• E(Volume) =-58.0 +0.339 Height + 4.71 *12

• = -1.48+0.339 Height

• Intercept changes but β1 stays the same.

• Effect on mean response of height does not depend on Diameter

• We say the effects are additive, or that they do not interact

• Partial regression coefficients
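The additive property can be checked numerically with the fitted equation above. In this sketch the function name and the chosen heights and diameters are illustrative assumptions.

```python
import math

# Fitted coefficients from the slides
B0, B_HEIGHT, B_DIAMETER = -58.0, 0.339, 4.71


def expected_volume(height, diameter):
    """Expected volume (cubic feet) from the fitted additive model."""
    return B0 + B_HEIGHT * height + B_DIAMETER * diameter


# Fixing Diameter at 12 folds its contribution into the intercept
intercept_at_12 = B0 + B_DIAMETER * 12

# Effect of one extra foot of height at several fixed diameters
height_effects = [
    expected_volume(77, d) - expected_volume(76, d) for d in (10, 12, 16)
]

print(f"intercept when Diameter = 12: {intercept_at_12:.2f}")
print("effect of +1 ft of height:", [round(e, 3) for e in height_effects])
```

The effect of height is the same at every diameter, which is what "no interaction" means.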


Changing coefficients

• Height by itself 1.54 (0.38)

• Diameter by itself 5.07 (0.25)

Multiple regression

• Height | Diameter 0.34 (0.13)

• Diameter | Height 4.71 (0.26)


Sums of squares

• Same calculation as before

• Sequential sums of squares, Diameter then Height

– Diameter 7581.8

– Height 102.4

• Sequential sums of squares, Height then Diameter

– Height 2901.1

– Diameter 4783.0
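The sequential sums of squares depend on the order of entry, but they add up to the same regression sum of squares. The sketch below checks this and then combines the total with the F statistic quoted earlier (F = 254.97 on 2 and 28 df) to back out the error sum of squares and R². Those last two values are derived here and do not appear on the slides.

```python
n, k = 31, 2          # number of trees, number of predictors
f_statistic = 254.97  # overall F test reported earlier

diameter_first = {"Diameter": 7581.8, "Height": 102.4}
height_first = {"Height": 2901.1, "Diameter": 4783.0}

# Either order gives the same regression SS (up to rounding)
regression_ss_a = sum(diameter_first.values())
regression_ss_b = sum(height_first.values())

# F = (Regression SS / k) / (Error SS / (n - k - 1)), solved for Error SS
error_ss = (regression_ss_a / k) * (n - k - 1) / f_statistic
r_squared = regression_ss_a / (regression_ss_a + error_ss)

print(f"Regression SS: {regression_ss_a:.1f} vs {regression_ss_b:.1f}")
print(f"Error SS (derived) = {error_ss:.1f}")
print(f"R-squared (derived) = {r_squared:.3f}")
```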


Derived variables

• Create a new x from the given x-variables

• Could be a transformation or a combination

• Use background knowledge to create new variable

• Tree crudely modeled by cylinder

• $\text{cylinder vol} = \pi r^2 \times \text{ht} = \dfrac{\pi}{4} (\text{Diam})^2 \times \text{ht}$

• $\propto \text{ht} \times (\text{Diam})^2$
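A minimal sketch of this derived variable for the trees data. The function names are illustrative, and the division by 12 is an added unit conversion: Diameter is recorded in inches while Height is in feet and Volume in cubic feet.

```python
import math


def cylinder_volume_cubic_feet(diameter_inches, height_feet):
    """Volume of a cylinder with the tree's diameter and height."""
    diameter_feet = diameter_inches / 12.0
    return math.pi / 4.0 * diameter_feet ** 2 * height_feet


def derived_predictor(diameter_inches, height_feet):
    """Derived x-variable ht * Diam^2, proportional to the cylinder volume."""
    return height_feet * diameter_inches ** 2


# Cylinder volume for a tree with the mean diameter and mean height
mean_tree = cylinder_volume_cubic_feet(13.248, 76)
print(f"cylinder volume at the mean diameter and height: {mean_tree:.1f} cubic feet")

# The two quantities differ only by a constant factor
ratio = cylinder_volume_cubic_feet(10, 70) / derived_predictor(10, 70)
print(f"constant of proportionality: {ratio:.6f}")
```

Because the two differ only by a constant, regression on ht × Diam² absorbs that constant into the slope.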


Plot first


Transform using logs

• $y = \log_b a$ means $b^y = a$

• $2^3 = 8$; $\log_2 8 = 3$

• b is called the base

• Typical bases are e and 10

• We are going to use base 10

• e is a mathematical constant ≈ 2.718

• logs to the base e are called natural logs often written as ln


Basic rules for logs using base 10

• $\log(10) = 1$

• $\log(10^a) = a$

• $\log(1) = 0$

• $\log(0)$ is not defined

• $\log(x^r) = r \log(x)$

• $10^{\log(a)} = a$

• Richter scale for measuring earthquake strength is on a log 10 scale


And some more

• $\log(ab) = \log(a) + \log(b)$

• $\log\left(\dfrac{a}{b}\right) = \log a - \log b$

• $10^{ab} = (10^a)^b$; $10^{a+b} = 10^a \cdot 10^b$; $10^{a-b} = \dfrac{10^a}{10^b}$
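The base-10 rules on this and the previous slide are easy to verify numerically. The sketch below checks each one for arbitrary illustrative values of a, b, x and r.

```python
import math

# Arbitrary illustrative values
a, b, x, r = 2.5, 4.0, 7.0, 3.0

checks = {
    "log(10) = 1": math.isclose(math.log10(10), 1.0),
    "log(10^a) = a": math.isclose(math.log10(10 ** a), a),
    "log(1) = 0": math.log10(1) == 0.0,
    "log(x^r) = r*log(x)": math.isclose(math.log10(x ** r), r * math.log10(x)),
    "10^log(a) = a": math.isclose(10 ** math.log10(a), a),
    "log(ab) = log(a) + log(b)": math.isclose(
        math.log10(a * b), math.log10(a) + math.log10(b)
    ),
    "log(a/b) = log(a) - log(b)": math.isclose(
        math.log10(a / b), math.log10(a) - math.log10(b)
    ),
    "10^(ab) = (10^a)^b": math.isclose(10 ** (a * b), (10 ** a) ** b),
    "10^(a+b) = 10^a * 10^b": math.isclose(10 ** (a + b), 10 ** a * 10 ** b),
    "10^(a-b) = 10^a / 10^b": math.isclose(10 ** (a - b), 10 ** a / 10 ** b),
}

# log(0) is not defined: Python raises ValueError rather than returning a number
try:
    math.log10(0)
    log_zero_defined = True
except ValueError:
    log_zero_defined = False

for rule, holds in checks.items():
    print(f"{rule}: {holds}")
print("log(0) defined:", log_zero_defined)
```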


What are we going to do with all this?

• Linear Model

• We can take logs of X; of Y; or of both;

• We are interested in how to interpret the coefficients in the original scale

• We will see later when taking logs is appropriate

• Let us start with the model

• Y=α + β*log(x) + ε


Interpretation of coefficients

• A 1 unit increase in log(X) is associated with a β increase in Y

• log(X) + 1 = log(X) + log(10) = log(10X)

• Converting to a percentage:

• Multiplying X by 10 is equivalent to a (10-1)*100% = 900% increase in X

• β is the expected change in Y when X is multiplied by 10

• β is the expected change in Y when X increases by 900%


And more

• For other percentage changes p

• A p% increase in X is associated with a $\beta \cdot \log\left(\dfrac{100+p}{100}\right)$ increase in Y

• A 10% increase in X is associated with a $\beta \cdot \log\left(\dfrac{100+10}{100}\right)$ increase in Y

• = β*log(1.1) increase in Y

• = β*0.041 increase in Y


What does this mean?

• Volume = - 461 + 262 logheight

• An increase of 1 in logheight will increase Volume by 262

• Multiplying height by 10 will increase Volume by 262

• A 10% increase in height will increase Volume by $\beta \cdot \log\left(\dfrac{100+p}{100}\right)$ = 262*log(1.1) = 10.84
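The arithmetic for the model with log(X) on the right-hand side can be wrapped in a small helper. This is a sketch; the function name is an illustrative choice and logs are base 10 as in the slides.

```python
import math


def change_in_y(beta, percent_increase_in_x):
    """Change in Y for a p% increase in X when Y = alpha + beta * log10(X)."""
    return beta * math.log10((100 + percent_increase_in_x) / 100)


beta = 262  # fitted coefficient of logheight

# Multiplying height by 10 is a 900% increase and changes Volume by beta
tenfold = change_in_y(beta, 900)

# A 10% increase in height
ten_percent = change_in_y(beta, 10)

print(f"height x 10: Volume changes by {tenfold:.2f}")
print(f"height + 10%: Volume changes by {ten_percent:.2f}")
```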


Next situation

• Log(Y)=α+β*X+ε

• A 1 unit increase in X is associated with β increase in log Y units

• log Y becomes log Y + β, so Y becomes $10^{\log Y + \beta} = Y \cdot 10^{\beta}$

• Each 1-unit increase in X multiplies the expected value of Y by $10^{\beta}$

• The effect of a c-unit increase in X is to multiply the expected value of Y by $10^{c\beta}$


More

• Calculate the multiplier ch = $10^{\beta}$, so that Y becomes Y*ch

• Calculate (ch-1)*100 to get the percentage change

• Ch=1.20 implies a 20% increase

• Ch=.7 implies a 30% decrease


And now ..

• logVolume = - 0.346 + 0.0233 Height

• A 1 unit increase in height increases logVolume by 0.0233

• Each unit increase in height multiplies Volume by $10^{0.0233} = 1.055$, a 5.5% increase
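The same calculation as a short sketch, for the model with log(Y) on the left-hand side. The helper names are illustrative; the 10-foot example is an added illustration of the c-unit rule.

```python
import math


def multiplier(beta, c=1.0):
    """Factor by which Y is multiplied for a c-unit increase in X
    when log10(Y) = alpha + beta * X."""
    return 10 ** (c * beta)


def percent_change(ch):
    """Convert a multiplier into a percentage change."""
    return (ch - 1) * 100


beta = 0.0233  # fitted coefficient of Height

per_foot = multiplier(beta)
per_ten_feet = multiplier(beta, c=10)

print(f"+1 ft: x{per_foot:.3f} ({percent_change(per_foot):.1f}%)")
print(f"+10 ft: x{per_ten_feet:.3f} ({percent_change(per_ten_feet):.1f}%)")
```

The effects compound: ten one-foot steps multiply Volume by the one-foot factor raised to the tenth power, not by ten times 5.5%.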


Last situation

• Log Y = α +β*log(X) +ε

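• A 1 unit increase in log(X) is associated with a β increase in log(Y) units

• A p% increase in X is associated with a $\beta \cdot \log\left(\dfrac{100+p}{100}\right)$ increase in log Y units

• Let $a = \beta \cdot \log\left(\dfrac{100+p}{100}\right)$

• log Y becomes log Y + a, so Y becomes $Y \cdot 10^{a}$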


Again some interpretation

• logVolume = - 6.06 + 3.98 logheight

• A 1 unit increase in logheight will increase logVolume by 3.98

• Multiplying height by 10 multiplies Volume by $10^{3.98}$

• A 10% increase in height multiplies Volume by $10^{3.98 \cdot \log(1.1)} = 1.46$

• Can interpret this as a 46% increase in Volume

• A 10% increase in height is associated with a 46% increase in Volume.
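A sketch of the log-log calculation. The function name is illustrative; the code also shows that the multiplier is simply (1 + p/100) raised to the power β, which is why β is often read as an elasticity.

```python
import math


def y_multiplier(beta, percent_increase_in_x):
    """Factor by which Y is multiplied for a p% increase in X
    when log10(Y) = alpha + beta * log10(X)."""
    a = beta * math.log10((100 + percent_increase_in_x) / 100)
    return 10 ** a


beta = 3.98  # fitted coefficient of logheight

ten_percent = y_multiplier(beta, 10)
tenfold = y_multiplier(beta, 900)

print(f"height + 10%: Volume x{ten_percent:.2f} ({(ten_percent - 1) * 100:.0f}% increase)")
print(f"height x 10: Volume x{tenfold:.0f}")
```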


Interpretation

• logVolume = - 6.06 + 3.98 logheight

• Can write as

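• $10^{\mathrm{logVolume}} = 10^{(-6.06 + 3.98\,\mathrm{logheight})}$

• $\mathrm{Volume} = 10^{-6.06} \cdot 10^{3.98\,\mathrm{logheight}} = 10^{-6.06} \cdot \mathrm{height}^{3.98}$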

• This is sometimes called a multiplicative model

• Using the above for prediction

• Height = 85 – remember to use log10(85)=1.929

• Using Minitab we get (1.2412, 1.9973) as PI


In the original units

• (1.2412, 1.9973) on the log scale becomes $(10^{1.2412}, 10^{1.9973})$ = (17.42, 99.38)

• The point prediction for Height = 85 (about 41.6) is not in the centre of this interval

• Will return to when to use logs
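The back-transformation can be sketched as follows. The prediction interval on the log scale is the one quoted from Minitab; the point prediction is recomputed here from the rounded coefficients, so it is approximate.

```python
import math

# Fitted log-log model: logVolume = -6.06 + 3.98 * logheight
height = 85
log_prediction = -6.06 + 3.98 * math.log10(height)

# Prediction interval on the log10 scale, as reported by Minitab
pi_log = (1.2412, 1.9973)

# Back-transform each end point and the point prediction
pi_volume = tuple(10 ** v for v in pi_log)
predicted_volume = 10 ** log_prediction
midpoint = sum(pi_volume) / 2

print(f"PI in cubic feet: ({pi_volume[0]:.2f}, {pi_volume[1]:.2f})")
print(f"point prediction: {predicted_volume:.1f}, interval midpoint: {midpoint:.1f}")
```

The interval is symmetric on the log scale but stretches further above the prediction than below it once converted back.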


Interpret the coefficients in the original scale. Calculate the predicted Sunday circulation for a weekday circulation of 300,000: both the prediction and the CI. You can just use the approximate solution.


Interpretations


A 10% increase in weekday circulation multiplies Sunday circulation by $10^{1.05 \cdot \log(1.1)} = 1.105$, equivalent to a 10.5% increase

% increase in weekday   Multiplier for Sunday   % increase in Sunday
10                      1.105                   10.5
20                      1.211                   21.1
30                      1.317                   31.7
40                      1.424                   42.4
50                      1.531                   53.1
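The table can be regenerated from the slope of 1.05 alone. A short sketch, with illustrative variable names:

```python
import math

BETA = 1.05  # slope of log(Sunday) on log(weekday circulation)

rows = []
for percent in (10, 20, 30, 40, 50):
    # Multiplier for Sunday circulation: 10^(beta * log10(1 + p/100))
    mult = 10 ** (BETA * math.log10(1 + percent / 100))
    rows.append((percent, round(mult, 3), round((mult - 1) * 100, 1)))

print("% increase   multiplier   % increase in Sunday")
for percent, mult, sunday_percent in rows:
    print(f"{percent:>10}   {mult:>10.3f}   {sunday_percent:>10.1f}")
```

Because the slope is slightly above 1, each percentage increase in the predictor maps to a slightly larger percentage increase in Sunday circulation.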


Approximate Confidence Intervals

• Calculate CIs for a weekday circulation of 300,000

• Predicted value = -0.134 + 1.05*log(300000)

• = 5.617 on the log scale

• n = 89; s = 0.056

• 95% CI = 5.617 ± 2*0.056*$\sqrt{1/89}$

• = (5.605, 5.629) on the log scale = 402,835 to 425,473 in the original units
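The steps above can be sketched in a few lines. This uses the approximate formula (2 in place of the t value, 1/n in place of the distance value), so the back-transformed limits may differ from the slide's in the last few digits because of rounding.

```python
import math

# Fitted model: log10(Sunday) = -0.134 + 1.05 * log10(circulation)
intercept, slope = -0.134, 1.05
n, s = 89, 0.056
circulation = 300_000

# Prediction and approximate 95% CI on the log10 scale
log_prediction = intercept + slope * math.log10(circulation)
half_width = 2 * s * math.sqrt(1 / n)
ci_log = (log_prediction - half_width, log_prediction + half_width)

# Back-transform to the original units
ci_sunday = tuple(10 ** v for v in ci_log)
predicted_sunday = 10 ** log_prediction

print(f"log scale: {log_prediction:.3f}, CI ({ci_log[0]:.3f}, {ci_log[1]:.3f})")
print(f"original units: {predicted_sunday:,.0f}, CI ({ci_sunday[0]:,.0f}, {ci_sunday[1]:,.0f})")
```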


Great chapter on derived variables

• Linoff, G. S & Berry, M. J. A. Data Mining Techniques 3rd Edition, Wiley: Indianapolis, 2011


Some summary thoughts

• Get to know the story of your data

• Use simple plots and summary statistics

• Does it look ok?

• Think about derived variables

• Start simply

• Don’t forget your common sense
