ETC3250: Regression
Semester 1, 2019
Professor Di Cook
Econometrics and Business Statistics
Monash University

Week 2 (a)

Outline

• Multiple regression

◦ Model

$$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \cdots + \beta_p x_{p,i} + e_i$$

• Each $x_{j,i}$ is numerical and is called a predictor.

• The coefficients $\beta_1, \dots, \beta_p$ measure the effect of each predictor after taking account of the effect of all other predictors in the model.

• Predictors may be transforms of other predictors, e.g., $x_{3,i} = x_{2,i}^2$.

• The model describes a line, plane or hyperplane in the predictor space.

1 / 21
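A minimal Python sketch of evaluating the model for one observation. The helper `linear_predictor`, the coefficients and the predictor values are all hypothetical, chosen only for illustration. The third predictor is the square of the second, which shows that a transformed predictor keeps the model linear in the coefficients even though it is curved in $x_2$.

```python
def linear_predictor(beta, x):
    """Return beta_0 + beta_1*x_1 + ... + beta_p*x_p for one observation.

    beta holds p + 1 coefficients (intercept first); x holds p predictor values.
    """
    if len(beta) != len(x) + 1:
        raise ValueError("need one coefficient per predictor plus an intercept")
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


# Hypothetical coefficients for three predictors.
beta = [1.0, 0.5, 2.0, -0.3]

x1, x2 = 4.0, 3.0
x = [x1, x2, x2 ** 2]  # x3 is a transform (the square) of x2

print(linear_predictor(beta, x))  # 1 + 0.5*4 + 2*3 - 0.3*9, approximately 6.3
```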

Outline

• Multiple regression

◦ Model

(Chapter3/3.1.pdf)

2 / 21


Outline

• Multiple regression

◦ Model

(Chapter3/3.5.pdf)

3 / 21

Outline

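• Multiple regression

◦ Model ◦ Categorical variables

Qualitative variables need to be converted to numeric. For a variable with two categories, define

$$x_i = \begin{cases} 1 & \text{if observation } i \text{ is in the second category,} \\ 0 & \text{if observation } i \text{ is in the first (baseline) category,} \end{cases}$$

which would result in the model

$$y_i = \beta_0 + \beta_1 x_i + e_i = \begin{cases} \beta_0 + \beta_1 + e_i & \text{if observation } i \text{ is in the second category,} \\ \beta_0 + e_i & \text{if observation } i \text{ is in the baseline category.} \end{cases}$$

These are called dummy variables.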

4 / 21
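With a single 0/1 dummy, least squares puts the intercept at the mean of the baseline group and the slope at the difference between the two group means. The Python sketch below checks this on a small hypothetical dataset; the category labels "A" and "B" and the helper `simple_ols` are illustrative, not from the slides.

```python
from statistics import mean


def simple_ols(x, y):
    """Least squares intercept and slope for a single predictor."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical data: a two-level factor and a numeric response.
groups = ["A", "A", "B", "B", "B", "A"]
y = [10.0, 12.0, 15.0, 17.0, 16.0, 11.0]

# Dummy variable: 1 for the second category "B", 0 for the baseline "A".
x = [1.0 if g == "B" else 0.0 for g in groups]

b0, b1 = simple_ols(x, y)
mean_a = mean(yi for yi, g in zip(y, groups) if g == "A")
mean_b = mean(yi for yi, g in zip(y, groups) if g == "B")
print(b0, mean_a)       # intercept equals the baseline mean (11.0)
print(b0 + b1, mean_b)  # intercept + slope equals the other group's mean (16.0)
```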


Outline

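• Multiple regression

◦ Model ◦ Categorical variables

More than two categories

With three categories, use two dummy variables:

$$x_{1,i} = \begin{cases} 1 & \text{if observation } i \text{ is in category 2,} \\ 0 & \text{otherwise,} \end{cases} \qquad x_{2,i} = \begin{cases} 1 & \text{if observation } i \text{ is in category 3,} \\ 0 & \text{otherwise,} \end{cases}$$

which would result in the model

$$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + e_i = \begin{cases} \beta_0 + \beta_1 + e_i & \text{if observation } i \text{ is in category 2,} \\ \beta_0 + \beta_2 + e_i & \text{if observation } i \text{ is in category 3,} \\ \beta_0 + e_i & \text{if observation } i \text{ is in category 1 (baseline).} \end{cases}$$

In general, a variable with $k$ categories needs $k - 1$ dummy variables; the baseline category gets none.

These are called dummy variables.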

5 / 21
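A small Python sketch of turning a $k$-level factor into $k - 1$ dummy columns, with the first level as the baseline (similar in spirit to R's default treatment coding). The `lm()` output later in these slides has no row for education level 1, consistent with that level being the baseline. The function name `dummy_encode` and the example values are hypothetical; the labels are styled on the Wage data.

```python
def dummy_encode(values, levels=None):
    """Encode a categorical variable as k - 1 dummy (0/1) columns.

    levels[0] is the baseline category and gets no column. Adding a column for
    every level as well as an intercept would make the columns of X linearly
    dependent: the dummy variable trap.
    """
    if levels is None:
        levels = sorted(set(values))
    others = levels[1:]
    rows = [[1 if v == lev else 0 for lev in others] for v in values]
    return others, rows


# Hypothetical values of a factor with three levels.
edu = ["1. < HS Grad", "3. Some College", "2. HS Grad", "3. Some College"]
names, rows = dummy_encode(edu)
print(names)  # ['2. HS Grad', '3. Some College']
print(rows)   # [[0, 0], [0, 1], [1, 0], [0, 1]]
```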

Outline

• Multiple regression

◦ Model ◦ Categorical variables ◦ OLS

Ordinary least squares (OLS) is the simplest way to fit the model. It chooses the coefficients to minimize

$$\sum_{i=1}^n e_i^2 = \sum_{i=1}^n \left(y_i - \beta_0 - \beta_1 x_{1,i} - \cdots - \beta_p x_{p,i}\right)^2.$$

Geometrically, this is the sum of the squared distances, parallel to the axis of the dependent variable, between each observed data point and the corresponding point on the regression surface: the smaller the sum of squared differences, the better the model fits the data.

6 / 21
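A Python sketch of the least squares idea for a single predictor: compute the OLS intercept and slope, then confirm that nudging either coefficient increases the sum of squared vertical distances. The five data points and the helper names `simple_ols` and `rss` are hypothetical.

```python
from statistics import mean


def simple_ols(x, y):
    """Least squares intercept and slope for a single predictor."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def rss(b0, b1, x, y):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = simple_ols(x, y)  # approximately 0.05 and 1.99
best = rss(b0, b1, x, y)   # approximately 0.107

# Moving either coefficient away from the OLS solution increases the RSS.
for db0, db1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert rss(b0 + db0, b1 + db1, x, y) > best
print(b0, b1, best)
```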


Outline

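• Multiple regression

◦ Model ◦ Categorical variables ◦ OLS ◦ Diagnostics

$$R^2 = 1 - \frac{\text{RSS}}{\text{TSS}},$$

where $\text{RSS} = \sum_{i=1}^n (y_i - \hat{y}_i)^2$ (read: Residual Sum of Squares) and $\text{TSS} = \sum_{i=1}^n (y_i - \bar{y})^2$ (read: Total Sum of Squares).

$R^2$ is the proportion of variation explained by the model, and measures the goodness of the fit: close to 1, the model explains most of the variability in $y$; close to 0, it explains very little.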

7 / 21
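A direct Python translation of $R^2 = 1 - \text{RSS}/\text{TSS}$, applied to a hypothetical five-point dataset whose least squares line is $\hat{y} = 0.05 + 1.99x$. The second call shows the other extreme: predicting $\bar{y}$ for every observation makes RSS equal TSS, so $R^2 = 0$.

```python
from statistics import mean


def r_squared(y, fitted):
    """R^2 = 1 - RSS/TSS for observed values y and fitted values."""
    ybar = mean(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return 1 - rss / tss


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Fitted values from the least squares line for these data.
fitted = [0.05 + 1.99 * xi for xi in x]
print(r_squared(y, fitted))  # close to 1: the line explains most of the variation

# Predicting the mean for every observation explains nothing.
print(r_squared(y, [mean(y)] * len(y)))  # 0.0
```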

Outline

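• Multiple regression

◦ Model ◦ Categorical variables ◦ OLS ◦ Diagnostics

$$\text{RSE} = \sqrt{\frac{\text{RSS}}{n - p - 1}}$$

The Residual Standard Error (RSE) is an estimate of the standard deviation of the errors $e_i$. This is meaningful under the assumption that $e_i \sim N(0, \sigma^2)$.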

8 / 21
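A Python sketch of the RSE formula. The residuals are hypothetical (a simple regression, so $p = 1$, on five points). The last line checks the degrees of freedom for the Wage fit shown later: $n = 3000$ and $p = 6$ (year, age and four education dummies) give $n - p - 1 = 2993$.

```python
import math


def rse(residuals, p):
    """Residual standard error sqrt(RSS / (n - p - 1)) for p predictors plus an intercept."""
    n = len(residuals)
    df = n - p - 1
    if df <= 0:
        raise ValueError("need more observations than coefficients")
    return math.sqrt(sum(r ** 2 for r in residuals) / df)


# Hypothetical residuals from a simple regression (p = 1) on five points.
res = [0.06, -0.13, 0.18, -0.21, 0.10]
print(rse(res, p=1))  # sqrt(0.107 / 3), roughly 0.19

# Degrees of freedom in the Wage fit: n = 3000, p = 6.
print(3000 - 6 - 1)  # 2993
```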


Outline

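• Multiple regression

◦ Model ◦ Categorical variables ◦ OLS ◦ Diagnostics

The F statistic tests whether any predictor explains the response, by testing

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \quad \text{vs} \quad H_a: \text{at least one } \beta_j \text{ is not } 0,$$

using

$$F = \frac{(\text{TSS} - \text{RSS})/p}{\text{RSS}/(n - p - 1)}.$$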

9 / 21
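Dividing the numerator and denominator of $F$ by TSS gives $F = \dfrac{R^2/p}{(1 - R^2)/(n - p - 1)}$. As a check, plugging in the Wage fit reported later ($n = 3000$, $p = 6$, $R^2 = 0.2631$) reproduces the printed F-statistic. The helper names are illustrative, and `adjusted_r2` uses the standard definition $1 - (1 - R^2)(n - 1)/(n - p - 1)$, which the slides do not cover but which appears in the output.

```python
def f_from_r2(r2, n, p):
    """F statistic written in terms of R^2 (numerator and denominator divided by TSS)."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


def adjusted_r2(r2, n, p):
    """Adjusted R^2: penalises R^2 for the number of predictors p."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Values reported for the Wage model: n = 3000, p = 6, R^2 = 0.2631.
print(round(f_from_r2(0.2631, 3000, 6), 1))    # 178.1, the reported F-statistic
print(round(adjusted_r2(0.2631, 3000, 6), 4))  # 0.2616, the reported Adjusted R-squared
```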

Outline

• Multiple regression

◦ Model ◦ Categorical variables ◦ OLS ◦ Diagnostics ◦ Think about

• Is at least one of the predictors useful in predicting the response?
• Do all the predictors help to explain $y$, or is only a subset of the predictors useful?
• How well does the model fit the data?
• Given a set of predictor values, what response value should we predict, and how accurate is our prediction?

10 / 21


Outline

• Multiple regression • Example

Wage and other data for a group of 3000 male workers in the Mid-Atlantic region. Interested in predicting wage based on worker characteristics.

## Observations: 3,000
## Variables: 11
## $ year       <int> 2006, 2004, 2003, 2003, 2005, 2008, 2009, 2008, 2
## $ age        <int> 18, 24, 45, 43, 50, 54, 44, 30, 41, 52, 45, 34, 3
## $ maritl     <fct> 1. Never Married, 1. Never Married, 2. Married, 2
## $ race       <fct> 1. White, 1. White, 1. White, 3. Asian, 1. White
## $ education  <fct> 1. < HS Grad, 4. College Grad, 3. Some College, 4
## $ region     <fct> 2. Middle Atlantic, 2. Middle Atlantic, 2. Middle
## $ jobclass   <fct> 1. Industrial, 2. Information, 1. Industrial, 2.
## $ health     <fct> 1. <=Good, 2. >=Very Good, 1. <=Good, 2. >=Very G
## $ health_ins <fct> 2. No, 2. No, 1. Yes, 1. Yes, 1. Yes, 1. Yes, 1.
## $ logwage    <dbl> 4.318063, 4.255273, 4.875061, 5.041393, 4.318063
## $ wage       <dbl> 75.04315, 70.47602, 130.98218, 154.68529, 75.0431

11 / 21
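The data summary lists both `wage` and `logwage`. A quick Python check on the first four printed values is consistent with `logwage` being the natural logarithm of `wage`, to the printed precision. This is an inference from the printed numbers, not a statement from the slides.

```python
import math

# The first four (wage, logwage) pairs from the data summary.
wage = [75.04315, 70.47602, 130.98218, 154.68529]
logwage = [4.318063, 4.255273, 4.875061, 5.041393]

for w, lw in zip(wage, logwage):
    # Agreement to about 6 decimals suggests logwage = ln(wage).
    assert abs(math.log(w) - lw) < 1e-5
print([round(math.log(w), 6) for w in wage])
```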

Outline

• Multiple regression • Example

◦ Take a look

12 / 21


Outline

• Multiple regression • Example

◦ Take a look ◦ Transform

13 / 21

Outline

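• Multiple regression • Example

◦ Take a look ◦ Transform ◦ Model

Proposed model

$$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{3,i} + e_i$$

where $y$ = log Wage, $x_1$ = Year information collected, $x_2$ = Age, $x_3$ = Education. Education is categorical, so it enters the fit as dummy variables.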

14 / 21


Outline

• Multiple regression • Example

◦ Take a look ◦ Transform ◦ Model

lm(formula = logwage ~ year + age + education, data = Wage)

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                 -1.745e+01  5.469e+00  -3.191  0.00143
year                         1.078e-02  2.727e-03   3.952 7.93e-05
age                          5.509e-03  4.813e-04  11.447  < 2e-16
education2. HS Grad          1.202e-01  2.086e-02   5.762 9.18e-09
education3. Some College     2.440e-01  2.195e-02  11.115  < 2e-16
education4. College Grad     3.680e-01  2.178e-02  16.894  < 2e-16
education5. Advanced Degree  5.411e-01  2.362e-02  22.909  < 2e-16

Residual standard error: 0.3023 on 2993 degrees of freedom
Multiple R-squared: 0.2631, Adjusted R-squared: 0.2616
F-statistic: 178.1 on 6 and 2993 DF, p-value: < 2.2e-16

15 / 21
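A Python sketch for reading the coefficient table. Each `t value` is `Estimate / Std. Error`; recomputed values can differ from the printed ones in the third decimal because the printed estimates are rounded. Because the response is log wage, a coefficient $b$ corresponds approximately to multiplying wage by $e^b$, holding the other predictors fixed. For the education dummies, the comparison is with the baseline level `1. < HS Grad`, which has no row in the table. The dictionary simply copies numbers from the output.

```python
import math

# (Estimate, Std. Error) pairs copied from the lm() output.
coefs = {
    "year": (1.078e-02, 2.727e-03),
    "age": (5.509e-03, 4.813e-04),
    "education2. HS Grad": (1.202e-01, 2.086e-02),
    "education3. Some College": (2.440e-01, 2.195e-02),
    "education4. College Grad": (3.680e-01, 2.178e-02),
    "education5. Advanced Degree": (5.411e-01, 2.362e-02),
}

for name, (est, se) in coefs.items():
    t = est / se  # t value = Estimate / Std. Error
    # Response is log(wage): a shift of est in log wage multiplies wage by exp(est).
    pct = 100 * (math.exp(est) - 1)
    print(f"{name:28s} t = {t:7.3f}   wage x {math.exp(est):.3f} ({pct:+.1f}%)")
```

For example, the Advanced Degree coefficient suggests wages roughly 72% higher than for workers without a high school diploma, for the same year and age.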

Outline

• Multiple regression • Example • Modelling

◦ Interpretation

• The ideal scenario is when the predictors are uncorrelated.
  ◦ Each coefficient can be interpreted and tested separately.

• Correlations amongst predictors cause problems.
  ◦ The variance of all coefficients tends to increase, sometimes dramatically.
  ◦ Interpretations become hazardous: when $x_j$ changes, the other predictors tend to change with it.
  ◦ Predictions still work provided new $x$ values are within the range of training $x$ values.

• Claims of causality should be avoided for observational data.

16 / 21
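To see why correlated predictors inflate coefficient variances, consider a simplified setting (an assumption for illustration): two centred predictors, each with unit variance, and correlation $r$. Then $\boldsymbol{X}'\boldsymbol{X} = n\begin{bmatrix}1 & r\\ r & 1\end{bmatrix}$ for the slope terms, and since $\text{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\boldsymbol{X}'\boldsymbol{X})^{-1}$, each slope variance is multiplied by $1/(1 - r^2)$ relative to uncorrelated predictors. The function name `coef_variance` is hypothetical.

```python
def coef_variance(r, n=100, sigma2=1.0):
    """Var(beta_1-hat) for two centred, unit-variance predictors with correlation r.

    Under that assumption X'X = n * [[1, r], [r, 1]] (the intercept separates out
    because the predictors are centred), and Var(beta-hat) = sigma^2 (X'X)^{-1}.
    """
    if abs(r) >= 1:
        raise ValueError("perfectly correlated predictors: X'X is singular")
    a, b = float(n), n * r  # X'X = [[a, b], [b, a]]
    det = a * a - b * b
    return sigma2 * a / det  # top-left entry of sigma^2 (X'X)^{-1}


base = coef_variance(0.0)
for r in (0.0, 0.5, 0.9, 0.99):
    # The ratio equals 1 / (1 - r^2): about 1.33 at r = 0.5 and about 50 at r = 0.99.
    print(r, coef_variance(r) / base)
```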


Outline

• Multiple regression • Example • Modelling

◦ Interpretation ◦ Interactions

• An interaction occurs when one variable changes the effect of a second variable (e.g., spending on radio advertising increases the effectiveness of TV advertising).
• To model an interaction, include the product $x_1 x_2$ in the model in addition to $x_1$ and $x_2$.
• Hierarchy principle: if we include an interaction in a model, we should also include the main effects, even if the p-values associated with their coefficients are not significant. (This is because interactions are almost impossible to interpret without the main effects.)

17 / 21
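With the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$, the effect of a one-unit increase in $x_1$ is $\beta_1 + \beta_3 x_2$, so it depends on the value of $x_2$. The Python sketch below uses hypothetical coefficients (think of $x_1$ as TV spend and $x_2$ as radio spend) and confirms the formula numerically.

```python
def predict(b, x1, x2):
    """Model with main effects and an interaction: b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    b0, b1, b2, b3 = b
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(b, x2):
    """Change in prediction per unit increase of x1, at a fixed x2: b1 + b3*x2."""
    return b[1] + b[3] * x2


# Hypothetical coefficients.
b = [6.0, 0.02, 0.03, 0.001]
for x2 in (0.0, 10.0, 40.0):
    # The numerical slope matches the formula: the effect of x1 grows with x2.
    numeric = predict(b, 101.0, x2) - predict(b, 100.0, x2)
    print(x2, slope_in_x1(b, x2), round(numeric, 6))
```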

Outline

• Multiple regression • Example • Modelling

◦ Interpretation ◦ Interactions

(Chapter3/3.7.pdf)

18 / 21


Outline

• Multiple regression • Example • Modelling

◦ Interpretation ◦ Interactions ◦ Residuals

• If a plot of the residuals vs any predictor in the model shows a pattern, then the relationship is nonlinear.

• If a plot of the residuals vs any predictor not in the model shows a pattern, then the predictor should be added to the model.

• If a plot of the residuals vs fitted values shows a pattern (such as a spread that changes with the fitted values), then there is heteroscedasticity in the errors. (Could try a transformation.)

18 / 21
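A small Python illustration of the first point: fitting a straight line to data that are exactly quadratic (hypothetical data $y = x^2$) leaves residuals that are positive at both ends and negative in the middle. That U-shaped pattern against the predictor signals a nonlinear relationship, which adding a transformed predictor such as $x^2$ would address.

```python
from statistics import mean


def simple_ols(x, y):
    """Least squares intercept and slope for a single predictor."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


x = [float(i) for i in range(-3, 4)]  # -3, -2, ..., 3
y = [xi ** 2 for xi in x]             # truly quadratic relationship

b0, b1 = simple_ols(x, y)             # the straight-line fit is flat: b0 = 4, b1 = 0
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(residuals)  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```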

Outline

• Multiple regression • Example • Modelling

◦ Interpretation ◦ Interactions ◦ Residuals

(Chapter3/3.9.pdf)

19 / 21


Outline

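• Multiple regression • Example • Modelling • Matrix formulation

◦ Model

Let $\boldsymbol{y} = (y_1, \dots, y_n)'$, $\boldsymbol{e} = (e_1, \dots, e_n)'$, $\boldsymbol{\beta} = (\beta_0, \dots, \beta_p)'$ and

$$\boldsymbol{X} = \begin{bmatrix} 1 & x_{1,1} & x_{2,1} & \cdots & x_{p,1} \\ 1 & x_{1,2} & x_{2,2} & \cdots & x_{p,2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1,n} & x_{2,n} & \cdots & x_{p,n} \end{bmatrix}.$$

Then

$$\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{e}.$$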

20 / 21
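A plain-Python sketch of the matrix formulation: build $\boldsymbol{X}$ with a leading column of ones for $\beta_0$, then check that row $i$ of $\boldsymbol{X}\boldsymbol{\beta}$ equals $\beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i}$. The helper names `design_matrix` and `mat_vec`, the data and the coefficients are hypothetical.

```python
def design_matrix(*predictors):
    """Stack a column of ones (for beta_0) and the predictor columns into X, one row per observation."""
    n = len(predictors[0])
    if any(len(col) != n for col in predictors):
        raise ValueError("all predictors need n observations")
    return [[1.0] + [col[i] for col in predictors] for i in range(n)]


def mat_vec(X, beta):
    """Matrix-vector product X beta using plain lists."""
    return [sum(xij * bj for xij, bj in zip(row, beta)) for row in X]


x1 = [1.0, 2.0, 3.0]
x2 = [0.0, 1.0, 0.0]
X = design_matrix(x1, x2)
print(X)  # [[1.0, 1.0, 0.0], [1.0, 2.0, 1.0], [1.0, 3.0, 0.0]]

beta = [0.5, 2.0, -1.0]  # hypothetical coefficients
print(mat_vec(X, beta))  # [2.5, 3.5, 6.5]
```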

Outline

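• Multiple regression • Example • Modelling • Matrix formulation

◦ Model ◦ Estimation

Least squares estimation

Minimize:

$$(\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta})'(\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta})$$

Differentiating with respect to $\boldsymbol{\beta}$ and setting the result equal to zero, $-2\boldsymbol{X}'(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}}) = \boldsymbol{0}$, gives

$$\boldsymbol{X}'\boldsymbol{X}\hat{\boldsymbol{\beta}} = \boldsymbol{X}'\boldsymbol{y} \quad\Longrightarrow\quad \hat{\boldsymbol{\beta}} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{y}.$$

(The "normal equation".)

Note: If you fall for the dummy variable trap (a dummy for every category as well as an intercept), the columns of $\boldsymbol{X}$ are linearly dependent, so $\boldsymbol{X}'\boldsymbol{X}$ is a singular matrix and cannot be inverted.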

20 / 21
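A standard-library Python sketch of solving the normal equation $\boldsymbol{X}'\boldsymbol{X}\hat{\boldsymbol{\beta}} = \boldsymbol{X}'\boldsymbol{y}$ by Gaussian elimination with partial pivoting (a textbook solver swapped in for illustration; R's `lm()` uses a QR decomposition instead). The data are hypothetical and generated exactly from $y = 1 + 2x_1 + 3d$, so the estimates recover those values. The second fit falls into the dummy variable trap: dummies for both categories plus an intercept make $\boldsymbol{X}'\boldsymbol{X}$ singular, and the solver reports it.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b, tol=1e-10):
    """Solve A z = b by Gaussian elimination with partial pivoting.

    Raises ValueError if A is singular (up to the tolerance tol).
    """
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        if abs(M[piv][k]) < tol:
            raise ValueError("matrix is singular")
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def ols(X, y):
    """Solve the normal equation X'X beta = X'y for beta-hat."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Hypothetical data: y = 1 + 2*x1 + 3*d exactly, where d is a 0/1 dummy.
x1 = [0, 1, 2, 3, 4, 5]
d = [0, 1, 0, 1, 0, 1]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, d)]

X = [[1, a, b] for a, b in zip(x1, d)]
print(ols(X, y))  # approximately [1.0, 2.0, 3.0]

# Dummy variable trap: the two dummy columns add up to the intercept column.
X_trap = [[1, a, b, 1 - b] for a, b in zip(x1, d)]
try:
    ols(X_trap, y)
except ValueError as err:
    print("dummy variable trap:", err)
```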

Outline

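• Multiple regression • Example • Modelling • Matrix formulation

◦ Model ◦ Estimation ◦ Likelihood

If the errors are iid and normally distributed, then

$$\boldsymbol{y} \sim N(\boldsymbol{X}\boldsymbol{\beta}, \sigma^2 \boldsymbol{I}).$$

So the likelihood is

$$L = \frac{1}{\sigma^n (2\pi)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}(\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta})'(\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta})\right),$$

which is maximized when $(\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta})'(\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta})$ is minimized.

So MLE = OLS (for $\boldsymbol{\beta}$).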

20 / 21
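A numerical check of "MLE = OLS" in Python for a single predictor. For a fixed $\sigma$ (a hypothetical value here), the Gaussian log-likelihood equals a constant minus $\text{RSS}/(2\sigma^2)$, so any coefficients other than the least squares ones give a lower likelihood. The data and helper names are illustrative.

```python
import math
from statistics import mean


def simple_ols(x, y):
    """Least squares intercept and slope for a single predictor."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def log_likelihood(b0, b1, x, y, sigma):
    """Log-likelihood when y_i ~ N(b0 + b1*x_i, sigma^2) independently."""
    n = len(y)
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return -n * math.log(sigma) - (n / 2) * math.log(2 * math.pi) - rss / (2 * sigma ** 2)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
sigma = 0.2  # hypothetical fixed error standard deviation

b0, b1 = simple_ols(x, y)
best = log_likelihood(b0, b1, x, y, sigma)

# Other coefficients have a larger RSS, hence a smaller likelihood.
for db0, db1 in [(0.1, 0.0), (0.0, 0.05), (-0.05, 0.02)]:
    assert log_likelihood(b0 + db0, b1 + db1, x, y, sigma) < best
print(b0, b1, best)
```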

Outline

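• Multiple regression • Example • Modelling • Matrix formulation

◦ Model ◦ Estimation ◦ Likelihood ◦ Predictions

Optimal predictions

$$\hat{y}^* = \boldsymbol{x}^*\hat{\boldsymbol{\beta}} = \boldsymbol{x}^*(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{y}$$

where $\boldsymbol{x}^*$ is a row vector containing the values of the regressors for the predictions (in the same format as $\boldsymbol{X}$).

Prediction variance

$$\text{Var}(y^* \mid \boldsymbol{X}, \boldsymbol{x}^*) = \sigma^2\left[1 + \boldsymbol{x}^*(\boldsymbol{X}'\boldsymbol{X})^{-1}(\boldsymbol{x}^*)'\right]$$

• This ignores any errors in $\boldsymbol{x}^*$.
• 95% prediction intervals assuming normal errors (in practice $\sigma$ is replaced by its estimate, the RSE):

$$\hat{y}^* \pm 1.96\sqrt{\text{Var}(y^* \mid \boldsymbol{X}, \boldsymbol{x}^*)}.$$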

20 / 21
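A Python sketch of these formulas for a simple regression on hypothetical data, where $\boldsymbol{X}$ has rows $(1, x_i)$ so $\boldsymbol{X}'\boldsymbol{X}$ is a $2 \times 2$ matrix that can be inverted in closed form. It computes $\hat{\boldsymbol{\beta}}$, the point prediction at a new $x = 6$, the prediction variance with $\sigma^2$ replaced by $\text{RSE}^2$, and the 95% interval using 1.96 as on the slide. The helper `inv2` is illustrative.

```python
import math


def inv2(a, b, c, d):
    """Inverse of the 2x2 matrix [[a, b], [c, d]]."""
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# X has rows (1, x_i), so X'X = [[n, sum x], [sum x, sum x^2]] and X'y = [sum y, sum x*y].
sx, sxx = sum(x), sum(xi * xi for xi in x)
XtX_inv = inv2(n, sx, sx, sxx)
Xty = [sum(y), sum(xi * yi for xi, yi in zip(x, y))]
beta = [sum(XtX_inv[r][c] * Xty[c] for c in range(2)) for r in range(2)]

resid = [yi - beta[0] - beta[1] * xi for xi, yi in zip(x, y)]
sigma2_hat = sum(e * e for e in resid) / (n - 2)  # RSE^2 with p = 1

x_star = [1.0, 6.0]  # new observation: intercept term and x = 6
y_hat = x_star[0] * beta[0] + x_star[1] * beta[1]
quad = sum(x_star[r] * XtX_inv[r][c] * x_star[c] for r in range(2) for c in range(2))
var_pred = sigma2_hat * (1 + quad)
lower = y_hat - 1.96 * math.sqrt(var_pred)
upper = y_hat + 1.96 * math.sqrt(var_pred)
print(y_hat, (lower, upper))
```

With so few observations, a $t$ quantile in place of 1.96 would give a somewhat wider and more honest interval; 1.96 is kept here to match the slide.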

Made by a human with a computer. Slides at https://monba.dicook.org.

Code and data at https://github.com/dicook/Business_Analytics.

Created using R Markdown with flair by xaringan, and kunoichi (female ninja) style.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

21 / 21
