Lecture 7 Remedial Measures - Purdue Universityghobbs/STAT_512/Lecture_Notes/Regression/... · 7-1...

7-1

Lecture 7

Remedial Measures

STAT 512

Spring 2011

Background Reading

KNNL: 3.8-3.11, Chapter 4

7-2

Topic Overview

Review Assumptions & Diagnostics

Remedial Measures for

Non-normality

Non-constant variance

Non-linearity

Other Miscellaneous Topics (Chapter 4)

7-3

Regression Assumptions

X and Y are related linearly (scatter plot, residuals vs. X)

Assumptions on the errors:

Constancy of variance (residuals vs. X)

Normality (normal probability plot)

Independence (sequence plot)

7-4

Remedial Measures

Two basic choices when assumptions are violated:

Use a more appropriate model (often more complicated)

Find a transformation of the data for which the regression model is appropriate

7-5

Non-linear Relationships

Can potentially still use a “linear” model. For example,

$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$

$Y = \beta_0 + \beta_1 \ln X + \varepsilon$

These models are still “linear” in terms of the regression coefficients (parameters). Simply consider a new predictor variable, $X^2$ or $\ln X$, and treat it like any usual predictor.
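As a sketch of the “new predictor” idea, the SAS steps below create $X^2$ and $\ln X$ in a DATA step and fit each model with PROC REG. The data set `mydata` and the variables `x`, `y`, `xsq`, and `lnx` are hypothetical placeholders. `x` is assumed strictly positive so that `log(x)` is defined.

```sas
/* Hypothetical input: data set MYDATA with response Y and predictor X > 0 */
data mydata2;
  set mydata;
  xsq = x**2;    /* new predictor X^2 */
  lnx = log(x);  /* new predictor ln X (SAS LOG is the natural log) */
run;

/* Quadratic model: Y = b0 + b1*X + b2*X^2, still linear in the coefficients */
proc reg data=mydata2;
  model y = x xsq;
run;

/* Log model: Y = b0 + b1*ln(X) */
proc reg data=mydata2;
  model y = lnx;
run;
quit;
```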

7-6

Non-linear Relationships

Can use nonlinear regression models (beyond the scope of this course, but discussed in Chapter 13).

For now, we will try to guess at a good transformation and see if it works.

7-7

Variance Not Constant

Might be able to model the change in variance (if it is related to X). In this case, can use a weighted least squares analysis (Chapter 11.1)

Sometimes a variance-stabilizing transformation can be found (log and square root are common)

The Box-Cox procedure can help to find a transformation

Note: In this class, we use natural logs unless specified otherwise
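Below is a minimal sketch of a weighted analysis in SAS. Purely for illustration, it assumes that the error variance is proportional to $X^2$, so each observation gets weight $1/X^2$. The data set `mydata`, the variables `x` and `y`, and this variance model are hypothetical. In practice, the weights come from whatever variance function fits the data (Chapter 11.1).

```sas
/* Hypothetical: Var(error_i) assumed proportional to X_i**2 (X nonzero) */
data wdata;
  set mydata;
  w = 1 / (x**2);   /* weight = reciprocal of the assumed variance function */
run;

proc reg data=wdata;
  model y = x;
  weight w;         /* WEIGHT statement gives weighted least squares */
run;
quit;
```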

7-8

Errors Not Normal

Is the error distribution known? If so, can use the SAS GENMOD procedure (Chapter 14)

Binomial (yes/no or categorical response)

Poisson (response is a count)

Error distribution unknown?

Sometimes a transformation will help

Often, non-normality and non-constant variance occur together, and transformations can sometimes help both!
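For a count response, a GENMOD call might look like the sketch below. The data set `countdata` and the variables `count` and `x` are hypothetical. `dist=poisson link=log` requests a Poisson model with the usual log link.

```sas
/* Hypothetical: COUNT is a nonnegative integer response, X a predictor */
proc genmod data=countdata;
  model count = x / dist=poisson link=log;
run;
```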

7-9

Other Remedies

Correlated errors (not independent)

Use a model for the correlated error structure (Chapter 12)

Omission of Important Predictors

Multiple Regression (starts in Chapter 6)

7-10

Other Remedies

Outliers

Determine whether to keep them in the analysis (e.g., was there a recording error? Be very cautious about deleting observations!)

Determine their influence on parameter estimates and standard errors

Use a more robust estimation procedure that puts less emphasis on outliers (Chapter 11.3)
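The influence check and the robust fit can be sketched in SAS as follows. The data set `mydata` and its variables are hypothetical. The `influence` and `r` options print influence statistics and residual diagnostics. PROC ROBUSTREG with `method=m` is one robust (M-estimation) option; it is named here as an example, not as the specific method of Chapter 11.3.

```sas
/* Hypothetical data set MYDATA with response Y and predictor X */

/* influence: RStudent, hat values, DFFITS, DFBETAS; r: residual analysis incl. Cook's D */
proc reg data=mydata;
  model y = x / influence r;
run;
quit;

/* M-estimation: observations with large residuals get less weight */
proc robustreg data=mydata method=m;
  model y = x;
run;
```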

7-11

Transformations

Finding a good transformation gets easier with practice

Generally the method is to make an educated guess at a useful transformation and then check whether it works by rechecking the diagnostic plots

Transformations often help on both fronts: they tend to stabilize the variance and improve normality.

7-12

Transformations (X)

(For nonlinear relationship issues)

Log or Square-root

Square or Exp(x)

Reciprocal or Exp(-x)
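The sketch below shows how these candidate X transformations could be created and compared in SAS. The data set `mydata`, the variables, and the model labels are hypothetical, and `x` is assumed strictly positive. The comments describe the curve shape each pair is usually meant to straighten.

```sas
/* Hypothetical data set MYDATA with response Y and predictor X > 0 */
data xtrans;
  set mydata;
  logx  = log(x);   /* Y rises, then levels off */
  sqrtx = sqrt(x);
  x2    = x**2;     /* Y rises increasingly steeply */
  expx  = exp(x);
  recx  = 1 / x;    /* Y falls, then levels off */
  expnx = exp(-x);
run;

/* One labeled MODEL statement per candidate; compare fits and residual plots */
proc reg data=xtrans;
  logfit:  model y = logx;
  sqrtfit: model y = sqrtx;
  sqfit:   model y = x2;
  expfit:  model y = expx;
  recfit:  model y = recx;
  expnfit: model y = expnx;
run;
quit;
```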

Page 13: Lecture 7 Remedial Measures - Purdue Universityghobbs/STAT_512/Lecture_Notes/Regression/... · 7-1 Lecture 7 Remedial Measures STAT 512 Spring 2011 Background Reading KNNL: 3.8-3.11,

7-13

Transformations (Y)

(For non-constant variance issues)

See page 132

Standard transformations if increasing variance: square root or reciprocal

If decreasing variance: log

Simultaneous transformations on X may also be useful

7-14

Box-Cox Procedure

Automated procedure to determine a “best” power transformation for the response

Chooses among different values of $\lambda$ for $Y' = Y^{\lambda}$:

$\lambda = 1$ (no transformation)

$\lambda = 0.5$ (square root)

$\lambda = 0$ (natural log)

$\lambda = -0.5$ (reciprocal square root)

$\lambda = -1$ (reciprocal)

Use the TRANSREG procedure in SAS
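The criterion behind the “best” $\lambda$ can be written out as follows. This is the standard Box-Cox formulation with a standardized response $W$, as presented in KNNL Section 3.9. That TRANSREG's “Log Like” column equals this profile log-likelihood up to an additive constant is an assumption here; the slide does not state it.

```latex
W_i =
\begin{cases}
K_1\,(Y_i^{\lambda} - 1), & \lambda \neq 0\\[2pt]
K_2 \ln Y_i, & \lambda = 0
\end{cases}
\qquad
K_2 = \Big(\prod_{i=1}^{n} Y_i\Big)^{1/n},
\qquad
K_1 = \frac{1}{\lambda K_2^{\lambda - 1}}

% Regress W on X for each candidate lambda; keep the lambda with the smallest SSE(W).
% Equivalently, maximize the profile log-likelihood
\ell(\lambda) = -\frac{n}{2}\,\ln\!\Big(\frac{SSE_\lambda}{n}\Big) + (\lambda - 1)\sum_{i=1}^{n}\ln Y_i
```

Here $SSE_\lambda$ is the error sum of squares from regressing $(Y^{\lambda} - 1)/\lambda$ (or $\ln Y$ when $\lambda = 0$) on X. The two criteria agree because $SSE(W) = SSE_\lambda / K_2^{2(\lambda - 1)}$, so $-\tfrac{n}{2}\ln SSE(W) = -\tfrac{n}{2}\ln SSE_\lambda + (\lambda - 1)\sum \ln Y_i$.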

7-15

Example (1) – boxcox.sas

X – Age (years)

Y – Plasma level

25 healthy children

data orig;
  input age plasma @@;   /* @@ holds the line so several (age, plasma) pairs are read per line */
  datalines;
0 13.44 0 12.84 0 11.91 0 20.09 0 15.60
1 10.11 1 11.38 1 10.28 1 8.96 1 8.59
2 9.83 2 9.00 2 8.65 2 7.85 2 8.88
3 7.94 3 6.01 3 5.14 3 6.90 3 6.77
4 4.86 4 5.10 4 5.67 4 5.75 4 6.23
;
run;

proc print data=orig;
run;

7-16

Example (2)

First, let’s look at the scatter plot to see the relationship

goptions ftitle=centb ftext=swissb htitle=3
         htext=1.5 ctitle=blue ctext=black;
title1 'Original Variables';
symbol1 v=dot c=blue;
axis1 label=('Age (Years)');
axis2 label=(angle=90 'Plasma Level');
proc gplot data=orig;
  plot plasma*age / haxis=axis1 vaxis=axis2;
run;
quit;

Note the statements used to obtain titles and axis labels.
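On newer SAS releases that include ODS Graphics, roughly the same scatter plot can be drawn with PROC SGPLOT instead of the SAS/GRAPH statements above. This is a swapped-in alternative, not part of the original course code. It uses the same `orig` data set.

```sas
title1 'Original Variables';
proc sgplot data=orig;
  scatter x=age y=plasma / markerattrs=(symbol=circlefilled color=blue);
  xaxis label='Age (Years)';   /* horizontal axis label */
  yaxis label='Plasma Level';  /* vertical axis label */
run;
```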

7-17

Example (3)

7-18

Example (4)

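Run the SLR model and check the diagnostic plots

proc reg data=orig;
  model plasma=age;
  output out=notrans r=resid;   /* save residuals in data set NOTRANS */
run;
quit;

axis1 label=('Age (Years)');
axis2 label=(angle=90 'Residual');
proc gplot data=notrans;
  plot resid*age / vref=0 haxis=axis1 vaxis=axis2;
run;
quit;

proc univariate data=notrans;
  var resid;
  qqplot / normal(L=1 mu=est sigma=est);
run;

Note: vref=0 draws a horizontal reference line at zero in the residual plot. mu=est sigma=est adds a reference line (the "45-degree line") to the normal probability plot, with intercept and slope equal to the estimated mean and standard deviation of the residuals.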

7-19

Example (5)

Root MSE 1.84135, R-Square 0.7532

7-20

Example (6)

7-21

Example (7)

Residuals do not appear to have constant variance

Relationship is not quite linear

Use the Box-Cox procedure to suggest a possible transformation of the Y variable

7-22

Example (8)

proc transreg data=orig;
  model boxcox(plasma) = identity(age);
run;

The TRANSREG Procedure
Box-Cox Transformation Information for plasma

Lambda    R-Square    Log Like
------------------------------------
 -2.00      0.80      -12.3665
 -1.75      0.82      -10.1608
 -1.50      0.83       -8.1127
 -1.25      0.85       -6.3056
 -1.00      0.86       -4.8523 *
 -0.75      0.86       -3.8891 *
 -0.50      0.87       -3.5523 <
 -0.25      0.86       -3.9399 *
  0.00 +    0.85       -5.0754 *
  0.25      0.84       -6.8988
  0.50      0.82       -9.2925
------------------------------------
< - Best Lambda
* - 95% Confidence Interval
+ - Convenient Lambda
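To check how the starred rows arise, assume the 95% interval is the usual likelihood-ratio interval. It keeps every $\lambda$ whose log-likelihood is within $\tfrac{1}{2}\chi^2_{1,\,0.95} \approx 1.92$ of the maximum. With the values in the table:

```latex
\ell(\hat\lambda) - \tfrac{1}{2}\chi^2_{1,\,0.95} \approx -3.5523 - 1.9207 = -5.4730

% Rows with Log Like >= -5.4730:
%   lambda = -1.00 (-4.8523), -0.75 (-3.8891), -0.50 (-3.5523),
%   lambda = -0.25 (-3.9399),  0.00 (-5.0754)
% Nearest excluded rows: lambda = -1.25 (-6.3056) and 0.25 (-6.8988)
```

This reproduces the starred rows plus the best $\lambda = -0.50$, which is consistent with that assumption. Because $\lambda = 0$ ($\ln Y$) lies inside the interval, it can serve as the convenient choice.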

7-23

Example (9)

“+” indicates the most convenient $\lambda$; “<” indicates the best $\lambda$ as determined by the log-likelihood function.

Try $\ln(Y)$ and $1/\sqrt{Y}$

data trans;
  set orig;
  logplasma = log(plasma);    /* In SAS, log = ln and log10 = log base 10 */
  rsplasma = plasma**(-0.5);  /* reciprocal square root, lambda = -0.5 */
run;

proc print data=trans;
run;

7-24

Example (10)

Re-run the regression and diagnostic plots with the transformed variables

title1 'Natural Log Transformation';
proc reg data=trans;
  model logplasma = age;
  output out=logtrans r=logresid;   /* residuals on the log scale */
run;
quit;

axis1 label=('Age (Years)');
axis2 label=(angle=90 'ln(Plasma)');
proc gplot data=logtrans;
  plot logplasma*age / haxis=axis1 vaxis=axis2;
run;
quit;

axis1 label=('Age (Years)');
axis2 label=(angle=90 'Residual');
proc gplot data=logtrans;
  plot logresid*age / vref=0 haxis=axis1 vaxis=axis2;
run;
quit;

proc univariate data=logtrans;
  var logresid;
  qqplot / normal(L=1 mu=est sigma=est);
run;
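The slides report results for $1/\sqrt{Y}$ (presumably the Root MSE 0.02319 / R-Square 0.8665 output of Example (15)) without listing the code. The sketch below mirrors the log-scale steps using `rsplasma` from the `trans` data set. The output data set `rstrans`, the residual name `rsresid`, and the title are names chosen here for illustration.

```sas
title1 'Reciprocal Square Root Transformation';
proc reg data=trans;
  model rsplasma = age;
  output out=rstrans r=rsresid;   /* residuals on the 1/sqrt(Y) scale */
run;
quit;

axis1 label=('Age (Years)');
axis2 label=(angle=90 'Residual');
proc gplot data=rstrans;
  plot rsresid*age / vref=0 haxis=axis1 vaxis=axis2;
run;
quit;

proc univariate data=rstrans;
  var rsresid;
  qqplot / normal(L=1 mu=est sigma=est);
run;
```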

7-25

Example (11)

7-26

Example (12)

Root MSE 0.14385, R-Square 0.8535

7-27

Example (13)

7-28

Example (14)

7-29

Example (15)

Root MSE 0.02319, R-Square 0.8665

7-30

Example (16)

7-31

Example (17)

Both transformations, $\ln(Y)$ and $1/\sqrt{Y}$:

Led to a reasonably linear regression relation

Improved R-square (0.7532 untransformed; 0.8535 for $\ln(Y)$; 0.8665 for $1/\sqrt{Y}$)

Reduced the non-constant variance problem

Normality assumption supported in all cases

7-32

Summary of Remedial Measures

Nonlinear Relationships – Sometimes a transformation on X will fix this.

Nonconstant Variance – If we can model the way in which the error variance changes, we can use weighted regression. Sometimes a transformation on Y will work instead.

Nonnormal Errors – Could use a procedure that allows different distributions for the error term. Often, a transformation on Y will help.

7-33

Summary of Remedial Measures

Often, a transformation on Y may help with more than one issue (e.g., normality and non-constant variance).

Box-Cox Transformations – Suggests some possible Y transformations to try.

Sometimes a transformation on both X and Y may help.

Remember: the assumptions still need to be satisfied (on the transformed scale) if we are to use the linear regression model. So, we must always recheck the diagnostic plots after transforming any variable.

7-34

Chapter 4

Covers some miscellaneous but important topics

Joint (family) confidence levels (4.1-4.3)

Regression through the origin (4.4)

Measurement errors (4.5, optional reading)

Inverse predictions – when Y becomes X (4.6)

7-35

Summary of Inference - Reminder

$100(1-\alpha)\%$ Confidence Intervals

$b_1 \pm t_{crit}\, s\{b_1\}$

$b_0 \pm t_{crit}\, s\{b_0\}$

where $t_{crit} = t(1 - \alpha/2;\, n - 2)$.
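In SAS, the `clb` option on the MODEL statement prints these $100(1-\alpha)\%$ limits for $b_0$ and $b_1$ directly. The sketch below uses the log-scale model from the example and assumes the `trans` data set created earlier. `alpha=0.05` is the default and is written out only to make the level explicit. The DATA step shows how $t_{crit}$ itself could be computed with TINV, using the example's sample size n = 25.

```sas
proc reg data=trans alpha=0.05;
  model logplasma = age / clb;   /* clb: confidence limits for b0 and b1 */
run;
quit;

/* t(1 - alpha/2; n - 2) for the plasma example */
data tcrit;
  n = 25;
  alpha = 0.05;
  tcrit = tinv(1 - alpha/2, n - 2);
run;

proc print data=tcrit;
run;
```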

7-36

Family Confidence Levels

Separate confidence intervals for $\beta_0$ and $\beta_1$

Now: joint estimation of $\beta_0$ and $\beta_1$

If k 95% CIs are independent, then their family confidence coefficient is $0.95^k$. Here k = 2, and $0.95^2 = 0.9025$.

The intervals are usually not independent, so the family confidence coefficient will be somewhat larger than $0.95^k$, but certainly less than 0.95.
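Let $A_1$ and $A_2$ denote the events that the intervals for $\beta_0$ and $\beta_1$ each cover their parameter. The upper bound in the last bullet follows directly. A lower bound that holds with no independence assumption at all (the Bonferroni inequality) also follows:

```latex
P(A_1 \cap A_2) \le P(A_1) = 1 - \alpha = 0.95

P(A_1 \cap A_2) = 1 - P(A_1^c \cup A_2^c) \ge 1 - P(A_1^c) - P(A_2^c) = 1 - 2\alpha = 0.90
```

Without independence, the family level of two 95% intervals is therefore only guaranteed to lie between 0.90 and 0.95.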

7-37

Bonferroni Adjustment

We want the probability that both intervals are correct to be at least 0.95

Basic idea is that we have an error budget ($\alpha = .05$), so spend half on $\beta_0$ and half on $\beta_1$

We use $\alpha = .025$ for each CI (97.5% CI), leading to

$b_0 \pm t_c\, s\{b_0\}$

$b_1 \pm t_c\, s\{b_1\}$

where $t_c = t(1 - 0.025/2;\, n - 2)$.

We start with a 5% error budget and we have two intervals, so we give 2.5% to each

Each interval has two ends (tails), so we again divide by 2
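Here is a sketch of the Bonferroni joint intervals in SAS. Running PROC REG with `alpha=0.025` makes `clb` print 97.5% limits for each coefficient; together, these have family confidence of at least 95%. The sketch assumes the `trans` data set from the example, and the DATA step computes $t_c$ with TINV using n = 25.

```sas
/* Each interval at alpha = 0.05/2 = 0.025, i.e. 97.5% limits */
proc reg data=trans alpha=0.025;
  model logplasma = age / clb;
run;
quit;

/* t_c = t(1 - 0.025/2; n - 2) = t(1 - alpha/(2k); n - 2) */
data bonf;
  n = 25;
  k = 2;          /* number of intervals */
  alpha = 0.05;   /* family error budget */
  tc = tinv(1 - alpha/(2*k), n - 2);
run;

proc print data=bonf;
run;
```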

7-38

Bonferroni Adjustment (Summary)

To control the family confidence level at 95%, we need to make adjustments

Instead of $\alpha$, use $\alpha/k$.

This is often more conservative than necessary, but it works in all cases and for any number of CIs.

We can use this method for simultaneous estimation of mean responses and predictions of new observations too.

7-39

Mean Response CIs

We already talked about simultaneous estimation for all $X_h$ with a confidence band: use Working-Hotelling

$\hat{Y}_h \pm W\, s\{\hat{Y}_h\}$, where $W^2 = 2F(1 - \alpha;\, 2,\, n - 2)$

For simultaneous estimation at a few $X_h$, say k different values, we may use Bonferroni instead:

$\hat{Y}_h \pm B\, s\{\hat{Y}_h\}$, where $B = t(1 - \alpha/(2k);\, n - 2)$

Similar for simultaneous prediction intervals.
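The sketch below computes both procedures in SAS for k = 3 ages. The appended rows have a missing response, so PROC REG excludes them from the fit but still writes their predicted values (`p=`) and standard errors $s\{\hat{Y}_h\}$ (`stdp=`). The data sets `newx`, `both`, `fits`, and `bands`, the chosen ages 1, 2, 3, and the use of the log-scale model from the example are all illustrative assumptions.

```sas
/* Hypothetical X_h values of interest (k = 3 ages); response left missing */
data newx;
  input age @@;
  datalines;
1 2 3
;
run;

/* Append to the analysis data: missing LOGPLASMA keeps these rows out of the fit */
data both;
  set trans newx;
run;

proc reg data=both noprint;
  model logplasma = age;
  output out=fits p=yhat stdp=se_yhat;   /* Y-hat_h and s{Y-hat_h} */
run;
quit;

data bands;
  set fits;
  where logplasma is missing;            /* keep only the k new X_h rows */
  n = 25; k = 3; alpha = 0.05;           /* n = sample size of the fit */
  W = sqrt(2 * finv(1 - alpha, 2, n - 2));   /* Working-Hotelling multiplier */
  B = tinv(1 - alpha / (2 * k), n - 2);      /* Bonferroni multiplier */
  wh_lower  = yhat - W * se_yhat;
  wh_upper  = yhat + W * se_yhat;
  bon_lower = yhat - B * se_yhat;
  bon_upper = yhat + B * se_yhat;
run;

proc print data=bands;
  var age yhat se_yhat W B wh_lower wh_upper bon_lower bon_upper;
run;
```

Both sets of intervals have the same form, so whichever multiplier (W or B) is smaller gives the tighter simultaneous intervals at the same family level.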

7-40

Regression Through Origin

$Y_i = \beta_1 X_i + \varepsilon_i$

Should be very cautious using something like this. Forcing the regression line through (0,0) can introduce bias, especially if X = 0 isn’t in the scope of the model.

Problems with $r^2$ and other statistics

Generally safer not to use this method. If it is true that Y = 0 when X = 0, then the test of $H_0\colon \beta_0 = 0$ will probably not be significant anyway.
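If the no-intercept model is used anyway, the sketch below fits it with the `noint` option. For comparison, it also fits the usual model so that the intercept's t-test can be inspected. The data set `mydata` and the variables `x` and `y` are hypothetical. With `noint`, SAS reports an R-square computed about zero rather than about $\bar{Y}$, which is one of the problems with $r^2$ mentioned above.

```sas
/* Hypothetical data set MYDATA with response Y and predictor X */

/* Regression through the origin: Y_i = beta1*X_i + e_i */
proc reg data=mydata;
  model y = x / noint;
run;

/* Usual model: check the t-test of H0: beta0 = 0 in the parameter estimates */
proc reg data=mydata;
  model y = x;
run;
quit;
```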

7-41

Inverse Predictions

From the equation $\hat{Y} = b_0 + b_1 X$: instead of estimating Y based on X, we want to estimate X based on Y

Sometimes called calibration

Example: A regression analysis was performed on the amount of decrease in cholesterol level (Y) achieved with a given dose (X) of a new drug, based on observations of 50 patients. A physician needs to know the dose to give if a new patient’s cholesterol needs to be decreased by a certain amount ($Y_{h(new)}$).

7-42

Inverse Predictions

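Natural point estimate is

$\hat{X}_{h(new)} = \dfrac{Y_{h(new)} - b_0}{b_1}$

Approximate confidence limits, $\hat{X}_{h(new)} \pm t(1 - \alpha/2;\, n - 2)\, SE(predX)$, are obtained using the standard error:

$SE(predX) = \sqrt{\dfrac{MSE}{b_1^2}\left[1 + \dfrac{1}{n} + \dfrac{(\hat{X}_{h(new)} - \bar{X})^2}{SS_X}\right]}$, where $SS_X = \sum (X_i - \bar{X})^2$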
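The sketch below carries out the calibration calculation in SAS. It assumes the log-scale model from the earlier example and a hypothetical target value of 2.0 on the ln(plasma) scale (macro variable `ynew`). OUTEST= supplies $b_0$, $b_1$, and Root MSE, and PROC MEANS supplies n, $\bar{X}$, and $SS_X$ (the corrected sum of squares, CSS). The DATA step then applies the point estimate and standard error above with a 95% t multiplier. The data set names are illustrative.

```sas
%let ynew = 2.0;   /* hypothetical target value of ln(plasma) */

/* b0, b1 and Root MSE from the fitted line */
proc reg data=trans outest=est noprint;
  model logplasma = age;
run;
quit;

/* n, X-bar and SS_X = sum of (X - X-bar)**2 */
proc means data=trans noprint;
  var age;
  output out=xstats n=n mean=xbar css=ssx;
run;

data invpred;
  merge est(keep=intercept age _rmse_) xstats(keep=n xbar ssx);
  b0 = intercept;
  b1 = age;                     /* OUTEST names the slope after the regressor */
  mse = _rmse_**2;
  xhat = (&ynew - b0) / b1;     /* point estimate of X_h(new) */
  se_predx = sqrt((mse / b1**2) * (1 + 1/n + (xhat - xbar)**2 / ssx));
  tcrit = tinv(0.975, n - 2);
  lower = xhat - tcrit * se_predx;
  upper = xhat + tcrit * se_predx;
run;

proc print data=invpred;
  var xhat se_predx lower upper;
run;
```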

7-43

Upcoming in Lecture 8...

Review of Matrix Algebra in the context of simple linear regression (Chapter 5)

