Multiple Regression & OLS violations Week 4 Lecture MG461 Dr. Meredith Rolfe.

Post on 13-Jan-2016


Multiple Regression & OLS violations

Week 4 Lecture

MG461

Dr. Meredith Rolfe

Which group are you in?

1. Group 1: 10%
2. Group 2: 11%
3. Group 3: 21%
4. Group 4: 8%
5. Group 5: 13%
6. Group 6: 8%
7. Group 7: 16%
8. Group 8: 11%

Key Goals of the Week

• What is multiple regression?
• How to interpret regression results:
  • estimated regression coefficients
  • significance tests for coefficients
• Violations of OLS assumptions
  • Diagnostics
  • What to do

MG461, Week 3 Seminar 3

MULTIPLE REGRESSION

When to use Regression

• We want to know whether the outcome, y, varies depending on x

• Continuous variables (but many exceptions)
• Observational data (mostly)
• The relationship between x and y is linear


Simple Linear Model
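For reference, a simple linear model with a single x variable is conventionally written as follows. This is a sketch in the β and ε notation used later in the lecture; the exact rendering on the slide is an assumption.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
```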


Regression is a set of statistical tools to model the conditional expectation…

1. of one variable on another variable: 76%
2. of one variable on one or more other variables: 24%

Multiple Regression

[Diagram: Compensation explained by Performance, Size of Company, Years worked]

[Diagram: Ratings of Supervisor explained by Opportunity to learn, Critical of poor performance, Handles complaints]

Which best accounts for variation in supervisor ratings?

1. Does not allow special privileges: 5%
2. Opportunity to learn: 21%
3. Too critical of poor performance: 47%
4. Handles employee complaints: 28%

Simple linear model: Rating vs. No Special Privileges

                                   Estimate (s.e.)
(Constant)                         42.11*** (9.27)
No special privileges              0.42* (0.17)
n                                  30
R2                                 0.15

Note on significance of coefficients: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1

Source: Chatterjee et al., Regression Analysis by Example
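The star notation in the note can be read as a simple lookup from p-value to marker. A minimal sketch of that mapping; the function name and the example p-values are illustrative and not part of the lecture:

```python
def significance_stars(p_value: float) -> str:
    """Return the marker from the table note for a given p-value."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    if p_value < 0.1:
        return "."
    return ""  # not significant at any of the listed levels


# Hypothetical p-values, one per band of the note.
for p in (0.0004, 0.004, 0.02, 0.07, 0.72):
    print(f"p = {p}: '{significance_stars(p)}'")
```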

SPSS output -> Regression Table

                                   Estimate (s.e.)
(Constant)                         42.11*** (9.27)
No special privileges              0.42* (0.17)
n                                  30
R2                                 0.15

[Figure: SPSS coefficient output annotated with the labels x variable, βhat0, βhat1, se(βhat0), se(βhat1), t(βhat0-0), t(βhat1-0), and one column marked "ignore"]
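The t labels on this slide, t(βhat-0), describe the estimate minus the hypothesised value of zero, divided by its standard error. A minimal sketch using the rounded estimates from the table above; the function name is illustrative:

```python
def t_statistic(estimate: float, std_error: float, null_value: float = 0.0) -> float:
    """t = (estimate - hypothesised value) / standard error."""
    return (estimate - null_value) / std_error


# Rounded values from the Rating vs. No Special Privileges table.
t_constant = t_statistic(42.11, 9.27)
t_slope = t_statistic(0.42, 0.17)

print(round(t_constant, 2), round(t_slope, 2))  # 4.54 2.47
```

Because the inputs are rounded to two decimals, these ratios may differ slightly from the t values printed by the software.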

42% of employees value supervisors who don’t grant special privileges?

1. Yes: 32%
2. No: 68%

                                   Estimate (s.e.)
(Constant)                         42.11*** (9.27)
No special privileges              0.42* (0.17)
n                                  30
R2                                 0.15
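The coefficient 0.42 is a slope, not a share of employees: it is the change in the expected rating for a one-unit increase in the "no special privileges" score. A minimal sketch using the estimates in the table; the scores 50 and 51 are hypothetical inputs chosen only for illustration:

```python
INTERCEPT = 42.11  # (Constant) from the table
SLOPE = 0.42       # coefficient on "No special privileges"


def predicted_rating(no_special_privileges: float) -> float:
    """Fitted supervisor rating from the simple linear model."""
    return INTERCEPT + SLOPE * no_special_privileges


low = predicted_rating(50)   # hypothetical score
high = predicted_rating(51)  # one unit higher

print(round(high - low, 2))  # 0.42
```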

Simple linear model #2: Rating vs. Opportunity to Learn

                                   Estimate (s.e.)
(Constant)                         28.17*** (8.81)
Opportunity to learn               0.65* (0.15)
n                                  30
R2                                 0.37

Note on significance of coefficients: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1

                                   Model 1    Model 2    Model 3    Model 4    Model 5    Model 6
(Constant)                         42.11***   28.17***   14.38*     19.98      50.24**    56.76***
                                   (9.27)     (8.81)     (6.62)     (11.69)    (17.31)    (9.74)
No special privileges              0.42*
                                   (0.17)
Opportunity to learn                          0.65*
                                              (0.15)
Handles complaints                                       0.75***
                                                         (0.15)
Raises based on performance                                         0.69***
                                                                    (0.18)
Too critical of poor performance                                               0.19
                                                                               (0.23)
Rate of advancing to better jobs                                                          0.18
                                                                                          (0.22)
n                                  30         30         30         30         30         30
R2                                 0.15       0.37       0.68       0.35       0.02       0.02

Are these good estimates of the relationship between x and y?

1. Yes: 44%
2. No: 56%

Multiple potential explanations…

• Experimental Controls:
  • Random assignment
  • Experimental Design
• Observational data analysis:
  • Statistical Controls

[Diagram: Ratings of Supervisor explained by No special privileges, Opportunity to learn, Critical of poor performance, Handles complaints]

Multiple Regression Model

yi = β0 + β1x1,i + … + βpxp,i + εi

• Dependent variable: yi
• Independent variables: x1,i … xp,i
• Intercept: β0
• Coefficients: β1 … βp
• Error: εi
• Observation or data point, i, goes from 1…n

WHICH MODEL PARAMETER DO WE NOT NEED TO ESTIMATE?

1. β0: 5%
2. x1,i: 20%
3. βp: 5%
4. σ2: 70%

Multiple Regression: OLS Estimates (matrix)

Y = Xβ + ε
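In matrix form the standard OLS estimate solves the normal equations (X'X)b = X'Y. A minimal sketch with NumPy follows; the data are hypothetical and constructed without an error term so that the estimates can be checked by eye, and the function name is illustrative:

```python
import numpy as np


def ols_estimates(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Solve the normal equations (X'X) b = X'y for b."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical data: y = 1 + 2*x1 - 1*x2 exactly (no error term).
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1.0 + 2.0 * x1 - 1.0 * x2

# The first column of ones corresponds to the intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])
beta_hat = ols_estimates(X, y)

print(np.round(beta_hat, 6))
```

Solving the linear system is used here instead of forming the inverse of X'X explicitly, which is a common numerical choice and gives the same estimate.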

                                   Model 1    Model 2    Model 3    Model 4    Model 5    Model 6    ALL
(Constant)                         42.11***   28.17***   14.38*     19.98      50.24**    56.76***   10.79
                                   (9.27)     (8.81)     (6.62)     (11.69)    (17.31)    (9.74)     (11.59)
No special privileges              0.42*                                                             -0.07
                                   (0.17)                                                            (0.14)
Opportunity to learn                          0.65*                                                  0.32
                                              (0.15)                                                 (0.16)
Handles complaints                                       0.75***                                     0.61***
                                                         (0.15)                                      (0.16)
Raises based on performance                                         0.69***                          0.082
                                                                    (0.18)                           (0.22)
Too critical of poor performance                                               0.19                  0.038
                                                                               (0.23)                (0.14)
Rate of advancing to better jobs                                                          0.18       -0.21
                                                                                          (0.22)     (0.17)
n                                  30         30         30         30         30         30         30
R2                                 0.15       0.37       0.68       0.35       0.02       0.02       0.73

Significance of Results

Model Significance
• H0: None of the 1 (or more) independent variables covary with the dependent variable
• HA: At least one of the independent variables covaries with the dependent variable
• Application: compare two fitted models
• Test: Anova/F-test
• **assumes errors (εi) are normally distributed

Coefficient Significance
• H0: β1=0, there is no relationship (covariation) between x and y
• HA: β1≠0, there is a relationship (covariation) between x and y
• Application: a single estimated coefficient
• Test: t-test
• **assumes errors (εi) are normally distributed

Comparing Models: Anova

                                   Complaints only      Complaints & Learn   ALL
(Constant)                         14.38*               9.87                 10.79
                                   (6.62)               (7.06)               (11.59)
No special privileges                                                        -0.07
                                                                             (0.14)
Opportunity to learn                                    0.21                 0.32
                                                        (0.13)               (0.16)
Handles complaints                 0.75***              0.64***              0.61***
                                   (0.15)               (0.12)               (0.16)
Raises based on performance                                                  0.082
                                                                             (0.22)
Too critical of poor performance                                             0.038
                                                                             (0.14)
Rate of advancing to better jobs                                             -0.21
                                                                             (0.17)
n                                  30                   30                   30
R2                                 0.68                 0.71                 0.73

Anova Model Comparison

All Variables (Full) vs. Complaints & Learn: F=0.53, p=0.72

Complaints & Learn vs. Complaints: F=2.47, p=0.13
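For nested models, the F statistic behind such a comparison can be written in terms of the two R2 values. The sketch below applies the standard nested-model formula to the rounded R2 values from the comparison table; the function and argument names are illustrative, and predictor counts exclude the intercept:

```python
def nested_f_statistic(r2_full: float, r2_reduced: float,
                       n: int, k_full: int, k_reduced: int) -> float:
    """F statistic comparing a full model with a nested reduced model.

    k_full and k_reduced count predictors, excluding the intercept.
    """
    q = k_full - k_reduced          # number of coefficients being tested
    df_resid = n - k_full - 1       # residual degrees of freedom, full model
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / df_resid)


# Rounded R2 values from the table (n = 30).
f_full_vs_two = nested_f_statistic(0.73, 0.71, n=30, k_full=6, k_reduced=2)
f_two_vs_one = nested_f_statistic(0.71, 0.68, n=30, k_full=2, k_reduced=1)

print(round(f_full_vs_two, 2), round(f_two_vs_one, 2))
```

Because the R2 values are rounded to two decimals, the results only approximate the reported F values of 0.53 and 2.47; the unrounded sums of squares would be needed to reproduce them exactly.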

SPEED PRACTICE: INTERPRETING REGRESSION RESULTS

1) p-values & significance
2) Significant coefficients in tables
3) Substantive interpretation of coefficients

Does “Critical” have an effect on supervisor ratings?

                                   Coefficient  s.e.       t        p-value (sig)
(Constant)                         10.79        11.59      0.93     0.36
No special privileges              -0.07        0.14       -0.54    0.60
Opportunity to learn               0.32         0.16       1.90     0.07
Handles complaints                 0.61         0.16       3.81     0.0009
Raises based on performance        0.082        0.22       0.37     0.72
Too critical of poor performance   0.038        0.14       0.26     0.80
Rate of advancing to better jobs   -0.21        0.17       -1.22    0.24
R2                                 0.733
n                                  30

1. Yes: 33%
2. No: 67%

Does Income have an effect on Immigration Rate?

                                   Coefficient  s.e.       t        p-value (sig)
(Intercept)                        -149.6       117.9      -1.27    0.21
Average Income                     5.077e-06    1.640e-03  0.003    0.998
% Metropolitan                     -5.062e-03   3.129e-01  -0.016   0.987
Average Taxes                      -3.974e-02   1.505e-02  -2.64    0.012
Average Education                  2.73         1.22       2.25     0.030
Temperature                        0.76         0.90       0.84     0.41
R2                                 0.28
n                                  48

1. Yes: 50%
2. No: 50%

Does having a HS Degree affect salary?

                                   Coefficient  s.e.       t        p-value (sig)
Intercept                          11031.81     383.22     28.79    0.000
Years Experience                   546.18       30.52      17.90    0.000
HS Degree                          -2996.21     411.75     -7.28    0.000
B.S. Degree                        147.82       387.66     0.38     0.705
Management (1=Yes)                 6883.53      313.9      21.90    0.000
R2                                 0.957
n                                  46

1. Yes
2. No

Do strike outs affect salary?

                                   Coefficient  s.e.       t        p-value (sig)
(Intercept)                        5.32         0.10       50.86    0.000
Runs                               0.0045       0.004      1.00     0.32
Hits                               0.012        0.002      5.14     0.00
Home Runs                          0.039        0.008      4.81     0.00
Strike Outs                        -0.008       0.002      -3.63    0.0003
R2                                 0.49
n                                  337

1. Yes: 95%
2. No: 5%

Does %Female affect Cigarette Sales?

                                   Coefficient  s.e.       t        p-value (sig)
(Intercept)                        103.3        245.6      0.42     0.67
Average age                        4.52         3.22       1.40     0.17
% with HS Degree                   -0.062       0.81       -0.076   0.94
Average Income                     0.019        0.010      1.86     0.070
% Black                            0.36         0.48       0.73     0.47
% Female                           -1.05        5.56       -0.19    0.85
Avg. Price of Cigarettes           -3.25        1.03       -3.16    0.0029
R2                                 0.32
n                                  50

1. Yes: 11%
2. No: 89%

PRACTICE 2: SIGNIFICANT COEFFICIENTS IN TABLES

Does Total Employment affect CEO Compensation?
1. Yes: 86%
2. No: 14%

Does Restructuring Affect Firm ROA?
1. Yes: 14%
2. No: 86%

Does firm sales growth affect the length of CEO tenure?
1. Yes: 75%
2. No: 25%

Does Total Employment affect CEO Compensation?
1. Yes: 82%
2. No: 18%

Are employees more aggressive when their job is stressful?
1. Yes: 44%
2. No: 56%

Does employee turnover affect Firm Productivity?
1. Yes: 91%
2. No: 9%

PRACTICE 3: INTERPRETING COEFFICIENTS

High values of 1983 centralization produce a(n) ….. in current centralization
1. Increase: 2%
2. Decrease: 98%

Corporations are more likely to enter petitions when their market share is…
1. High: 81%
2. Low: 19%

Starting compensation is a good predictor of current compensation?
1. True: 68%
2. False: 32%

Managers at larger firms get paid more?
1. True: 18%
2. False: 82%

More centralized companies invest more in Research?
1. True: 60%
2. False: 40%


OLS VIOLATIONS & OTHER ISSUES

Assumptions of OLS Regression

• Correctly specified model
• Linear relationship
• Errors are normally distributed
• Errors have mean of 0: E(εi)=0
• Homoscedastic: Var(εi)=σ2
• Uncorrelated Errors: Cov(εi,εj)=0 for i≠j
• No multicollinearity

When is a model linear?

• Linear in the parameters

• Transformations of x and/or y variables can turn a relationship that isn’t linear initially into one that is linear in the parameters
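Two illustrative cases of "linear in the parameters" (these specific equations are examples chosen here, not taken from the slides). The first is curved in x but still linear in the β's; the second is not linear in the parameters as written, but becomes so after taking logs, assuming β0 > 0 and a positive multiplicative error:

```latex
% Linear in the parameters, although curved in x:
y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i

% Not linear in the parameters as written ...
y_i = \beta_0 \, e^{\beta_1 x_i} \, \varepsilon_i

% ... but taking logs of both sides gives a model that is:
\log y_i = \log \beta_0 + \beta_1 x_i + \log \varepsilon_i
```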

Example: The Challenger disaster

[Figure: Example: Challenger Shuttle disaster, with 30° marked]

What the managers didn’t see…

Diagnosis of Non-linearity and/or Errors not normally distributed

• Theoretical expectations
• Scatterplots of y against x variables prior to estimating model
• Scatterplot of yi-hat against ei-hat (predicted y-values against predicted residuals)
• Normal Probability Plot
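The quantities behind the last two diagnostics can be computed directly: fitted values and residuals for the residual plot, and theoretical normal quantiles paired with sorted residuals for a normal probability plot. A minimal sketch using only the standard library; the data are hypothetical (a deliberately curved relationship), and the plotting positions (i - 0.5)/n are one common convention assumed here:

```python
from statistics import NormalDist, mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def diagnostics(x, y):
    """Fitted values, residuals, and (normal quantile, sorted residual) pairs."""
    intercept, slope = fit_simple_ols(x, y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    n = len(residuals)
    # Theoretical normal quantiles at plotting positions (i - 0.5) / n.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return fitted, residuals, list(zip(quantiles, sorted(residuals)))


# Hypothetical data with a curved (quadratic) relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [xi ** 2 for xi in x]

fitted, residuals, qq_points = diagnostics(x, y)
for f, e in zip(fitted, residuals):
    print(f"fitted = {f:7.2f}   residual = {e:6.2f}")
```

With these data the residuals are positive at both ends and negative in the middle, the U-shaped pattern that signals a non-linear relationship when a straight line is fitted.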

Example: Number of Supervisors & Number of Employees

Re-estimated, including x2

Solutions to Non-linearity

• Better Model of Structure (transformations)
  • Exponential (squared, cubed)
  • Logs or natural logs (heteroscedasticity)
  • Proportional scaling (divide by x or y)
• If outliers cause the problem, omit them or use robust regression

Assumptions of OLS Regression

• Correctly specified model
• Linear relationship
• Errors have mean of 0: E(εi)=0
• Homoscedastic: Var(εi)=σ2
• Uncorrelated Errors: Cov(εi,εj)=0 for i≠j
• No multicollinearity

Diagnosis of Heteroscedasticity (like non-linearity)

• Theoretical expectations
• Scatterplots of y against x variables prior to estimating model
• Scatterplot of yi-hat against ei-hat (predicted y-values against predicted residuals)
• Scatterplot of xi against ei-hat (observed x-values against predicted residuals)
• Normal Probability Plot
• Statistical Tests (Breusch-Pagan, White, Goldfeld-Quandt)
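As an illustration of the statistical tests listed, here is a minimal sketch of the studentized (Koenker) form of the Breusch-Pagan test, restricted to a single x variable: regress the squared residuals on x and compare n times the R2 of that auxiliary regression with a chi-square distribution with 1 degree of freedom. The data and function names are hypothetical, and this is a sketch of the idea, not the output of any particular statistics package:

```python
import math
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """R2 of the simple regression of y on x."""
    intercept, slope = simple_ols(x, y)
    y_bar = mean(y)
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def breusch_pagan_one_x(x, y):
    """LM statistic and p-value (chi-square, 1 d.f.) for one x variable."""
    intercept, slope = simple_ols(x, y)
    squared_resid = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]
    lm = len(x) * r_squared(x, squared_resid)
    # Survival function of a chi-square with 1 d.f.: erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Hypothetical data: the spread of y around 2*x grows with x.
x = list(range(1, 21))
y = [2.0 * xi + (0.5 * xi if i % 2 == 0 else -0.5 * xi) for i, xi in enumerate(x)]

lm, p = breusch_pagan_one_x(x, y)
print(f"LM = {lm:.2f}, p = {p:.4f}")
```

A small p-value points towards heteroscedasticity; here the error spread was constructed to grow with x, so the test should reject constant variance.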

OLS estimates of Regression Line

Salary = -34 + 27.47*Runs

Distribution of D.V. (Salary)

Normal Probability Plot of Salary

Baseball Salary and Performance: Residuals vs. Fitted Values

Transformed Dependent Variable

log(Salary) = 5.3 + 0.026*Runs

Residual Plot of model with Log(Salary)

Normal Probability Plot of Residuals
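After the log transformation the slope is read multiplicatively: each additional run multiplies predicted salary by exp(slope). A minimal sketch using the fitted equation log(Salary) = 5.3 + 0.026*Runs, assuming the log is the natural log (the slides do not say which base was used); the function names are illustrative:

```python
import math

INTERCEPT = 5.3  # from log(Salary) = 5.3 + 0.026*Runs
SLOPE = 0.026


def predicted_salary(runs: float) -> float:
    """Back-transform the fitted log salary (natural log assumed)."""
    return math.exp(INTERCEPT + SLOPE * runs)


def salary_multiplier(extra_runs: float) -> float:
    """Factor applied to predicted salary for extra_runs additional runs."""
    return math.exp(SLOPE * extra_runs)


pct_per_run = (salary_multiplier(1) - 1.0) * 100.0
print(f"{pct_per_run:.2f}% higher predicted salary per additional run")
```

For small slopes the percentage change is close to 100 times the coefficient, which is why 0.026 is often read as roughly a 2.6% increase per run.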

Another Example: Salary

                                   Coefficient  s.e.       t        p-value (sig)
Intercept                          11031.81     383.22     28.79    0.000
Years Experience                   546.18       30.52      17.90    0.000
HS Degree                          -2996.21     411.75     -7.28    0.000
B.S. Degree                        147.82       387.66     0.38     0.705
Management (1=Yes)                 6883.53      313.9      21.90    0.000
R2                                 0.957
n                                  46

Plot of Residuals vs. Education (I.V.)

Plot of Residuals vs. Education × Manager

Solution: Include Interaction Term

                                   Coefficient  s.e.       t        p-value (sig)
Intercept                          11023.50     79.07      141.7    0.000
Years Experience                   496.98       5.57       89.3     0.000
HS Degree                          -1730.69     105.33     -16.4    0.000
B.S. Degree                        -349.03      97.57      -3.6     0.0009
Management (1=Yes)                 7047.32      102.60     68.7     0.000
HS × Management                    -3066.04     149.33     -20.5    0.000
BS × Management                    1836.49      131.17     14.0     0.000
R2                                 0.999
n                                  46

Results from Salary Model
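With interaction terms, the effect of being in management depends on education. A minimal sketch using the coefficients from the interaction table; it assumes HS, B.S. and Management are 0/1 indicators with a third education group as the reference category, and the 10 years of experience is a hypothetical input:

```python
# Coefficients from the interaction model table.
COEF = {
    "intercept": 11023.50,
    "years_experience": 496.98,
    "hs": -1730.69,
    "bs": -349.03,
    "management": 7047.32,
    "hs_x_management": -3066.04,
    "bs_x_management": 1836.49,
}


def predicted_salary(years: float, hs: int, bs: int, management: int) -> float:
    """Fitted salary; hs, bs and management are assumed to be 0/1 indicators."""
    return (COEF["intercept"]
            + COEF["years_experience"] * years
            + COEF["hs"] * hs
            + COEF["bs"] * bs
            + COEF["management"] * management
            + COEF["hs_x_management"] * hs * management
            + COEF["bs_x_management"] * bs * management)


def management_premium(hs: int, bs: int, years: float = 10) -> float:
    """Difference in fitted salary between managers and non-managers."""
    return predicted_salary(years, hs, bs, 1) - predicted_salary(years, hs, bs, 0)


print(round(management_premium(hs=1, bs=0), 2))  # 3981.28
print(round(management_premium(hs=0, bs=1), 2))  # 8883.81
print(round(management_premium(hs=0, bs=0), 2))  # 7047.32
```

The premium does not depend on the years of experience chosen, because experience enters the model without an interaction.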

Solutions for Heteroscedasticity:

• Better Model of Structure:
  • Interaction terms
  • Transformation
• Robust Standard Errors
• Weighted GLM
• ARCH models (in time series)

Assumptions of OLS Regression

• Correctly specified model
• Linear relationship
• Errors have mean of 0: E(εi)=0
• Homoscedastic: Var(εi)=σ2
• Uncorrelated Errors: Cov(εi,εj)=0 for i≠j
• No multicollinearity

Violation 2: Errors not Independent

• Across time
• Across cases (diffusion, network models)
• Time series data, panel data, cluster samples, hierarchical data, repeated measures data, longitudinal data, and other data with dependencies

Example: Consumer Spending vs. Money

Diagnosis & Solutions:

Diagnosis
• Type of Data
• Durbin-Watson Statistic
• Residual Plots

Solution
• Incorporate dependencies into estimates
• Difference Variables (Cochrane-Orcutt)
• Variables for Seasonality
• Various Time Series Models
• Various network/spatial dependence models
• Structural Models (SUR, SEM)
• GLS (generalized least squares)
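The Durbin-Watson statistic listed under diagnosis is the sum of squared successive differences of the residuals divided by the sum of squared residuals. A minimal sketch; the residual sequences are hypothetical and chosen to show the two extremes:

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals ordered in time.
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0 (same sign throughout)
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 (sign flips every period)
```

Values near 2 are consistent with no first-order autocorrelation; values towards 0 suggest positive autocorrelation and values towards 4 suggest negative autocorrelation.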

Assumptions of OLS Regression

• Correctly specified model
• Linear relationship
• Errors have mean of 0: E(εi)=0
• Uncorrelated Errors: Cov(εi,εj)=0 for i≠j
• Homoscedastic: Var(εi)=σ2
• No multicollinearity

Problem: Multicollinearity

Diagnosis
• High Correlation between two or more IVs
• Standard errors “blow up”
• Large changes in coefficients between estimated models
• Statistical tests (VIF)

Solutions
• Are the two x’s measuring the same thing? Create an index or use PCA
• Get more data!
• Centering of x variables
• Instrumental variables
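The VIF mentioned under diagnosis is 1/(1 - R2), where R2 comes from regressing one x variable on the others. With only two x variables this reduces to 1/(1 - r²), with r their correlation. A minimal sketch for that two-variable case; the data and function names are hypothetical:

```python
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two x variables."""
    return 1.0 / (1.0 - pearson_r(x1, x2) ** 2)


# Hypothetical x variables.
x1 = [1, 2, 3, 4, 5, 6]
x2_similar = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]  # nearly the same as x1
x2_uncorrelated = [2, -1, -1, -1, -1, 2]     # zero correlation with x1

vif_similar = vif_two_predictors(x1, x2_similar)
vif_uncorrelated = vif_two_predictors(x1, x2_uncorrelated)
print(round(vif_similar, 1), round(vif_uncorrelated, 1))
```

A commonly cited rule of thumb treats a VIF above about 10 as a sign of problematic multicollinearity, although the threshold is a convention and not a formal test.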