Bradford S. Jones, UC-Davis, Dept. of Political Science
Statistical Inference for the Linear Regression Model
Brad Jones1
1 Department of Political Science, University of California, Davis
January 22, 2010
Jones POL 212: Research Methods
Today: Variance Components and Inference
Variance Components from a Regression Model
- Useful to think about the "standard error" of the regression.
- Quantity minimized: $\sum r_i^2$
- Suppose we compute the variance of the residuals:
$$\mathrm{var}(r) = \frac{\sum r_i^2}{n - k - 1}$$
- Why $n - k - 1$? [These are the consumed degrees of freedom. Note again what must happen as $k \to n$.]
Variance Components from a Regression Model
- Since the variance gives us the average squared deviation between the observed $Y$ and $\hat{Y}$, we take the square root:
$$\mathrm{s.e.}(r) = \sqrt{\frac{\sum r_i^2}{n - k - 1}}$$
- This gives us the standard error of the regression.
- . . . or the "average prediction error." The smaller the residual component, the smaller the s.e. of the regression.
- For the pedagogical regression using the calcount data, the s.e. is about 6.15.
- Average prediction error in the model is about 6 percentage points.
- The s.e. is in the units of $Y$, so it is easy to interpret.
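The standard error of the regression can be sketched numerically. The data below are hypothetical (not the calcount data), generated so that the true error s.d. is near the 6.15 reported on the slides; the slope/intercept formulas are the usual bivariate least squares estimators.

```python
import numpy as np

# Hypothetical bivariate data (not the calcount data from the slides).
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=58)
y = 100 - 0.9 * x + rng.normal(0, 6, size=58)

# Bivariate least squares fit: slope = Sxy / Sxx, intercept from the means.
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

r = y - (b0 + b1 * x)                       # residuals
k = 1                                       # one regressor
var_r = np.sum(r**2) / (len(y) - k - 1)     # divide by n - k - 1
se_reg = np.sqrt(var_r)                     # standard error of the regression
```

With an intercept in the model, the residuals sum to zero exactly, and `se_reg` lands near the true error s.d. of 6.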
Variance Components from a Regression Model
- It is clear the residual sum of squares is just one piece of the overall variance in the regression model.
- If the RSS gives us "error variance," then what informs us about predictive improvement over and above the mean?
- Recall that if $\beta_j = 0$ then $\hat{\beta}_0 = \bar{Y}$.
- Deviations of the predictions $\hat{Y}$ from the mean $\bar{Y}$ tell us the improvement gained by using $X$ to predict $Y$ over simply guessing the mean every time.
- The calcount data:
Regression Model
Variance Components from a Regression Model
- So the deviation $\hat{Y}_i - \bar{Y}$ gives the signed difference between the predicted value and the mean.
- Intuition: if the fitted values do not depart from the mean, $X$ is not doing a "good job" of predicting $Y$.
- Square and sum:
$$\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$
- This is the regression sum of squares (or sum of squares due to the regression). Fox refers to it as RegSS.
- It should be clear that the sum of RSS and RegSS accounts for the total variance in the model.
Variance Components from a Regression Model
- Total Sum of Squares (TSS):
$$\mathrm{TSS} = \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2 = \sum (Y_i - \bar{Y})^2 = \mathrm{RSS} + \mathrm{RegSS}$$
- This shows us again that the regression function must pass through the point of averages.
- From these variance components, an intuitive fit measure emerges:
$$R^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2} = \frac{\mathrm{RegSS}}{\mathrm{TSS}}$$
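The decomposition can be checked numerically. A minimal sketch with hypothetical data: fit a bivariate OLS line with an intercept and confirm that TSS = RSS + RegSS and that $R^2$ = RegSS/TSS.

```python
import numpy as np

# Hypothetical data: verify TSS = RSS + RegSS and R^2 = RegSS / TSS
# for an OLS fit that includes an intercept.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2 + 3 * x + rng.normal(size=50)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

rss = np.sum((y - yhat) ** 2)            # residual sum of squares
regss = np.sum((yhat - y.mean()) ** 2)   # regression sum of squares
tss = np.sum((y - y.mean()) ** 2)        # total sum of squares

r2 = regss / tss
```

The identity holds only because the model includes an intercept; dropping it breaks the decomposition.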
Variance Components from a Regression Model
- In multiple regression, this is the squared multiple correlation or, equivalently, the square of $r_{Y\hat{Y}}$.
- Obama model: RSS = 2119; RegSS = 7667.
- $R^2 = 7667/(2119 + 7667) \approx .78$.
- In terms of the total variance in the model, about 78 percent is accounted for by the linear regression of votes on Prop. 8 support (n.b.).
- Issues with the $R^2$:
Variance Components from a Regression Model
- The $R^2$ is nondecreasing in $X$ (why must this be the case? be able to show this mathematically).
- Usually better to use the "adjusted $R^2$":
$$\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - k - 1} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \times \frac{df_{\mathrm{TSS}}}{df_{\mathrm{RSS}}} = 1 - \frac{\mathrm{RSS}/(n - k - 1)}{\mathrm{TSS}/(n - 1)} \quad (1)$$
- The degrees of freedom are used as a correction factor.
- In passing, note that $\bar{R}^2$ can be negative (see next slide).
- If $R^2$ is nondecreasing, it is not very useful for model comparisons.
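The adjusted $R^2$ can be computed directly from the variance components reported in the slides' Obama-model output (RSS = 2119 on 56 df, RegSS = 7667, so n = 58 and k = 1):

```python
# Adjusted R^2 from the Obama-model ANOVA components on the slides:
# RSS = 2119, RegSS = 7667, n = 58, k = 1.
rss, regss = 2119.0, 7667.0
n, k = 58, 1
tss = rss + regss

r2 = regss / tss                                    # about 0.783
adj_r2 = 1 - (rss / (n - k - 1)) / (tss / (n - 1))  # about 0.780
```

Both values match the "Multiple R-squared: 0.783, Adjusted R-squared: 0.78" line in the R output, and the two forms of the adjustment in equation (1) agree term by term.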
Adj. R2 from Regression Model
Properties
- What are some properties of the model?
- Start with the assumption that the error is not systematic, implying:
$$E(\varepsilon_i) = E(\varepsilon_i \mid X) = 0 \quad (2)$$
- Linearity: $E(Y)$ is a linear function of $X_k$:
$$\mu_i = E(Y \mid x_i) = E(\beta_0 + \beta_1 x_i + \varepsilon_i) = \beta_0 + \beta_1 x_i + E(\varepsilon_i) = \beta_0 + \beta_1 x_i + 0 = \beta_0 + \beta_1 x_i \quad (3)$$
Assumptions and Properties
- Homoskedasticity (constant variance):
$$\mathrm{var}(\varepsilon_i \mid X_i) = E\{[\varepsilon_i - E(\varepsilon_i)]^2 \mid X_i\} = E(\varepsilon_i^2 \mid X_i) \;(\text{Why?})\; = \sigma^2 \quad (4)$$
- This implies the variance of $\varepsilon_i$ for each $X_i$ is equal to some positive constant (which is equal to $\sigma^2$).
- (Q: Since we usually do not observe $\sigma^2$ directly, what do you think is used as its estimator?)
- When this assumption does not hold, we have a condition known as heteroskedasticity, and the variance is equal to $\sigma_i^2$.
- Why might you care about this assumption?
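A quick simulation sketch makes the contrast concrete. All values below are hypothetical: one error series has constant variance, the other has a standard deviation that grows with $X$ (so its variance is $\sigma_i^2$), and we compare the error spread in the lower and upper halves of the $X$ range.

```python
import numpy as np

# Simulate homoskedastic vs. heteroskedastic errors (hypothetical values).
rng = np.random.default_rng(2)
x = np.linspace(1, 100, 500)

e_homo = rng.normal(0, 5, size=x.size)   # constant sd: var = sigma^2
e_hetero = rng.normal(0, 0.1 * x)        # sd grows with X: var = sigma_i^2

# Compare error spread in the lower and upper halves of the X range.
lo, hi = x < 50, x >= 50
spread_homo = e_homo[hi].std() / e_homo[lo].std()      # near 1
spread_hetero = e_hetero[hi].std() / e_hetero[lo].std()  # well above 1
```

Under homoskedasticity the two spreads are about equal; under heteroskedasticity the upper half is noticeably noisier, which is what residual-versus-$X$ diagnostic plots look for.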
Assumptions and Properties
- Independence assumptions:
$$\mathrm{cov}(\varepsilon_i, \varepsilon_j \mid X_i, X_j) = E[\varepsilon_i - E(\varepsilon_i) \mid X_i][\varepsilon_j - E(\varepsilon_j) \mid X_j] = E(\varepsilon_i \mid X_i)(\varepsilon_j \mid X_j) \;(\text{Why?})\; = 0 \quad (5)$$
- This implies that there is no correlation of the disturbances across the observations.
- Wrt sampling, the observations are sampled independently.
- Problem with time-series data: if $\varepsilon_{ti}$ and $\varepsilon_{t-1,i}$ are positively correlated, then $Y$ is a function of not only $X_i$ and $\varepsilon_{ti}$, but also $\varepsilon_{t-1,i}$.
Assumptions and Properties
I Xk are fixed in repeated sampling.
I Very strong assumption!
I Experimental designs (in principle) will satisfy this condition. . .
I Unfortunately, we often work with observational data.
- This is why causal inference is difficult (or at least one reason why).
Assumptions and Properties
- Covariance result: the covariance between $\varepsilon_i$ and $X_i$ is 0:
$$\mathrm{cov}(\varepsilon_i, X_i) = E[\varepsilon_i - E(\varepsilon_i)][X_i - E(X_i)] = E[\varepsilon_i (X_i - E(X_i))] \;(\text{Why?})\; = E(\varepsilon_i X_i) - E(X_i)E(\varepsilon_i) = E(\varepsilon_i X_i) = 0. \quad (6)$$
- The import of it is to say that the unsystematic component (given by $\varepsilon_i$) is not related to the systematic component (given by the $X_i$).
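The sample analogue of this result holds exactly for an OLS fit: with an intercept in the model, the residuals have zero sample covariance with $X$ by construction. A sketch with hypothetical data:

```python
import numpy as np

# For an OLS fit with an intercept, the sample covariance between the
# residuals and X is exactly zero (sample analogue of cov(eps_i, X_i) = 0).
rng = np.random.default_rng(3)
x = rng.normal(size=40)
y = 1 + 2 * x + rng.normal(size=40)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

cov_rx = np.cov(resid, x, ddof=1)[0, 1]   # numerically zero
```

This is why the in-sample covariance cannot be used to *test* the population assumption: least squares forces it to zero no matter what.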
Assumptions and Properties
- $X$ is not a constant.
- $n > k + 1$
- There is no perfect collinearity:
$$-1 < r_{X_i, X_j} < 1 \quad (7)$$
i.e., one variable is not a linear combination of another variable such that the correlation between the variables is 1 (or −1).
- The model is correctly specified.
- Note we have said nothing about distributions at this point.
Inference for the Regression Model
- The regression assumptions give us a baseline to evaluate the adequacy of the model.
- But we need more precision in connecting our estimates back to the population parameters.
- The $\hat{\beta}_k$ are derived from the sample data, so there will be sampling variability.
- We want to estimate the parameter's precision, or its reliability.
Inference for the Regression Model
- The usual measure of precision in statistics is the standard error.
- It is taken as the standard deviation of the sampling distribution of the estimator.
- Given that our estimator has a probability distribution (for a given sample size from a given population), it is natural to ask what the variance of that distribution is.
- This leads directly to the consideration of the variance of the estimators.
Inference for the Regression Model
- Bivariate model first (extension to the n-variable case is straightforward).
- The variance of the regression slope $\hat{\beta}$ is given by
$$\mathrm{var}(\hat{\beta}) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2},$$
and the standard error is the square root of the variance, giving us
$$\mathrm{se}(\hat{\beta}) = \frac{\sigma}{\sqrt{\sum (X_i - \bar{X})^2}}.$$
Inference for the Regression Model
- The variance of the regression intercept $\hat{\beta}_0$ is given by
$$\mathrm{var}(\hat{\beta}_0) = \left(\frac{\sum X_i^2}{n \sum (X_i - \bar{X})^2}\right)\sigma^2,$$
and the standard error is given by
$$\mathrm{se}(\hat{\beta}_0) = \sqrt{\frac{\sum X_i^2}{n \sum (X_i - \bar{X})^2}}\;\sigma.$$
- In general, we will be more interested in the precision around the slope coefficient than the intercept.
Inference for the Regression Model
- We have seen $\sigma^2$ before: the variance of the error component.
- Which is assumed to be constant.
- We usually will not directly observe this term (why?) but instead must estimate it directly from the data.
- What is the estimator we use?
- Recall the "standard error of the estimate":
$$\mathrm{s.e.}(r) = \sqrt{\frac{\sum r_i^2}{n - k - 1}}$$
Inference for the Regression Model
- For the bivariate setting:
$$\mathrm{var}(r_i) = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2} = \frac{\sum r_i^2}{n - 2} = \frac{\mathrm{SSE}}{n - 2},$$
which, after taking the square root, gives us
$$\sqrt{\frac{\mathrm{SSE}}{n - 2}}$$
- The square root is the s.e. of the estimate, aka the "root mean square error."
- . . . and the MSE is? $\sum (Y_i - \hat{Y}_i)^2/(n - 2)$
Call:
lm(formula = obamapercent ~ proportionforprop8)
Residuals:
Min 1Q Median 3Q Max
-8.795 -5.392 -0.669 4.117 19.317
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 102.3339 3.5434 28.9 <2e-16 ***
proportionforprop8 -0.8658 0.0608 -14.2 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.15 on 56 degrees of freedom
Multiple R-squared: 0.783, Adjusted R-squared: 0.78
F-statistic: 203 on 1 and 56 DF, p-value: <2e-16
> anova(regmod)
Analysis of Variance Table
Response: obamapercent
Df Sum Sq Mean Sq F value Pr(>F)
proportionforprop8 1 7667 7667 203 <2e-16 ***
Residuals 56 2119 38
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Inference for the Regression Model
- Model output:
  RSS: 2119
  RSS df: 56 (n − 2)
  MSE: 38
  RegSS: 7667
  RegSS df: 1 (k)
  MSR: 7667 (RegSS/df)
  TSS: RSS + RegSS (not shown)
  TSS df: 57 (n − 1)
  RMSE: 6.15
Inference for the Regression Model
- More model output . . .
- Standard error of the regression coefficient:
$$\mathrm{se}(\hat{\beta}) = \frac{\sigma}{\sqrt{\sum (X_i - \bar{X})^2}}.$$
- RMSE = 6.15 and $\sum (X_i - \bar{X})^2 \approx 10620$
- $\mathrm{s.e.}(\hat{\beta}) = 6.15/\sqrt{10620} \approx .06$
- You can verify the s.e. of the constant on your own.
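The back-of-the-envelope check above takes one line, using the RMSE and $\sum (X_i - \bar{X})^2$ values reported on the slide:

```python
import math

# Slide's check: se(beta) = RMSE / sqrt(sum((X_i - Xbar)^2)),
# with RMSE = 6.15 and sum((X_i - Xbar)^2) ~= 10620 (Obama model).
rmse = 6.15
sxx = 10620.0

se_beta = rmse / math.sqrt(sxx)   # about 0.06
```

This reproduces the 0.0608 standard error on the `proportionforprop8` line of the R output up to the rounding in the slide's inputs.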
Inference for the Regression Model
- The extension to multiple regressors is straightforward (although, like the least squares estimators, the presentation in scalar form gets ugly).
- A model with $\beta_0$, $\beta_1$, $\beta_2$ gives slope variances of:
$$\mathrm{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum (X_1 - \bar{X}_1)^2 (1 - r_{1,2}^2)},$$
for $\beta_1$, and
$$\mathrm{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum (X_2 - \bar{X}_2)^2 (1 - r_{1,2}^2)},$$
for $\beta_2$.
- The variance function for the constant is a bit ugly; consult Fox to see it.
Inference for the Regression Model
- Standard errors:
$$\mathrm{se}(\hat{\beta}_1) = \frac{\sigma}{\sqrt{\sum (X_1 - \bar{X}_1)^2 (1 - r_{1,2}^2)}},$$
for $\beta_1$, and
$$\mathrm{se}(\hat{\beta}_2) = \frac{\sigma}{\sqrt{\sum (X_2 - \bar{X}_2)^2 (1 - r_{1,2}^2)}},$$
for $\beta_2$.
- The term $1 - r_{1,2}^2$ comes from the "auxiliary regression": the $r^2$ is obtained by the regression of $X_1$ on $X_2$.
- Equivalently, the square root of the $r^2$ term gives you the correlation coefficient between $X_2$ and $X_1$.
- It is a measure of how collinear the covariates are.
Inference for the Regression Model
- Fox uses this notation for the variance:
$$\mathrm{var}(\hat{\beta}_k) = \frac{1}{1 - R_k^2} \times \frac{\sigma^2}{\sum_{i=1}^{n}(x_{ik} - \bar{x}_k)^2}$$
- Obviously the same result will be obtained if you take the square root.
- When $k > 2$, $R_k^2$ is the squared multiple correlation from the regression of some $X$ on all the other $X$s.
- Note that the first factor is sometimes called the "variance inflation factor."
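A sketch of the variance inflation factor for the two-regressor case, where the auxiliary $R^2$ is just the squared correlation between $X_1$ and $X_2$. The data are hypothetical, constructed so the regressors are correlated:

```python
import numpy as np

# VIF via the auxiliary regression of X1 on X2 (two-regressor case,
# where the auxiliary R^2 equals the squared correlation).
rng = np.random.default_rng(4)
x2 = rng.normal(size=200)
x1 = 0.8 * x2 + rng.normal(0, 0.6, size=200)   # collinear by construction

r = np.corrcoef(x1, x2)[0, 1]
r2_aux = r ** 2            # squared correlation from the auxiliary regression
vif = 1.0 / (1.0 - r2_aux)  # variance inflation factor
```

With uncorrelated regressors the VIF is 1; as $r_{1,2}^2 \to 1$ it blows up, which is exactly how collinearity inflates $\mathrm{var}(\hat{\beta}_k)$ in the formula above.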
Inference for the Regression Model
- We now have $\hat{\beta}$ and we have the s.e.($\hat{\beta}$).
I What can we do in the way of inference?
I It’s time to overlay some distributional assumptions here.
I Conventional to assume normality.
Inference for the Regression Model
- Now understand, we've gotten pretty far without the normality assumption.
- The only assumptions regarding $\varepsilon_i$ have been:
  a. conditional mean is 0.
  b. variance is homoskedastic.
  c. 0 covariance with $x_i$.
- But now we need to go beyond point estimation and enter the world of hypothesis testing. This requires us to say something about the distribution of the error term.
- The regression coefficients are a linear function of $\varepsilon_i$ (recall the least squares estimator).
- Therefore, the sampling distribution of our least squares estimator will depend on the distribution of $\varepsilon$.
Inference for the Regression Model
- The assumptions:
$$E(\varepsilon_i) = 0$$
$$E(\varepsilon_i^2) = \sigma^2$$
$$E(\varepsilon_i \varepsilon_j) = 0, \quad i \neq j,$$
which are the assumptions discussed earlier.
- But in addition to this, we're going to assume the $\varepsilon$ is normally distributed.
- This leads to the following assumption:
$$\varepsilon_i \sim N(0, \sigma^2),$$
which says that $\varepsilon$ is normally distributed with mean 0 and variance $\sigma^2$.
Inference for the Regression Model
- We can state this more explicitly by recognizing that for any two normally distributed random variables, a zero covariance between them implies independence.
- This means that if $\varepsilon_i$ and $\varepsilon_j$ have a 0 covariance (which they do by assumption), then they can be said to be independently distributed, leading to:
$$\varepsilon_i \sim NID(0, \sigma^2),$$
where NID means normally and independently distributed.
- Why assume the normal?
Inference for the Regression Model
- The principal reason is given by the central limit theorem.
- Under the CLT, if there is a large number of iid random variables, the distribution of their sum will tend to a normal distribution as n increases.
- So it is the central limit theorem that provides us with a strong justification to assume normality.
- An important result of the normal distribution is that any linear function of normally distributed random variables is itself normally distributed.
- The regression coefficients are linear functions of $\varepsilon_i$, so it must be the case that the sampling distributions for the regression estimates are also normally distributed.
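This claim can be illustrated by simulation with hypothetical values: hold $X$ fixed (as the slides assume), draw normal errors repeatedly, and check that the OLS slope estimates center on the true $\beta_1$ with standard deviation $\sigma/\sqrt{\sum (X_i - \bar{X})^2}$.

```python
import numpy as np

# Sampling distribution of the OLS slope under normal errors
# (hypothetical beta0, beta1, sigma; X fixed in repeated sampling).
rng = np.random.default_rng(5)
x = np.linspace(0, 10, 30)
beta0, beta1, sigma = 1.0, 2.0, 1.5
sxx = np.sum((x - x.mean()) ** 2)

slopes = []
for _ in range(2000):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=x.size)
    slopes.append(np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))
slopes = np.array(slopes)

theory_sd = sigma / np.sqrt(sxx)   # theoretical sd of the slope estimator
```

The empirical mean and spread of `slopes` match the theoretical values, and a histogram of `slopes` would look normal, as the linearity argument predicts.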
Inference for the Regression Model
- For the multiple regression setting we now can say
$$\hat{\beta}_k \sim N(\beta_k, \sigma^2_{\hat{\beta}_k}).$$
- Additional results: under the normal distribution, we can define a distribution for our estimator $\hat{\sigma}^2$ as
$$\frac{(n - k - 1)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-k-1},$$
where $\chi^2$ denotes the chi-square distribution with $n - k - 1$ degrees of freedom. Use of the $\chi^2$ statistic will allow us to compute confidence intervals around the estimator $\hat{\sigma}^2$.
- Under the normal distribution, the regression estimates have minimum variance in the entire class of unbiased estimators.
- Finally, if $\varepsilon_i$ is distributed normally, then $Y_i$ itself must be normally distributed:
$$Y_i \sim N(\beta_0 + \beta_k X_i, \sigma^2).$$
Inference for the Regression Model
- Under the normality condition, we can specify $Z = \frac{\hat{\beta}_1 - \beta_1}{\sigma_{\hat{\beta}_1}}$.
- A fundamental problem exists because $\sigma$ is usually unknown. In its place, we estimate $\sigma$ by using the standard error of $\hat{\beta}_1$.
- This gives rise to a t statistic:
$$t = \frac{\hat{\beta}_1 - \beta_1}{\mathrm{s.e.}(\hat{\beta}_1)},$$
which follows the t distribution with $n - k - 1$ degrees of freedom.
- Now since $\frac{\hat{\beta}_1 - \beta_1}{\mathrm{s.e.}(\hat{\beta}_1)} \sim t(n - k - 1)$, we can use the t distribution to establish a confidence interval:
$$\Pr(-t_{\alpha/2} \leq t \leq t_{\alpha/2}) = 1 - \alpha.$$
The term $t_{\alpha/2}$ denotes our critical value and $\alpha$ denotes the significance level. The level $\alpha = .05$ is common, but the .01 and .10 levels are also used.
Inference for the Regression Model
- Substituting terms for the interval, we can rewrite the previous statement as
$$\Pr\left(-t_{\alpha/2} \leq \frac{\hat{\beta}_1 - \beta_1}{\mathrm{s.e.}(\hat{\beta}_1)} \leq t_{\alpha/2}\right) = 1 - \alpha.$$
- Rearranging gives
$$\Pr[\hat{\beta}_1 - t_{\alpha/2}\,\mathrm{s.e.}(\hat{\beta}_1) \leq \beta_1 \leq \hat{\beta}_1 + t_{\alpha/2}\,\mathrm{s.e.}(\hat{\beta}_1)] = 1 - \alpha,$$
which is the $100(1 - \alpha)$ percent confidence interval.
- Hence, $\alpha = .05$ yields a 95 percent confidence interval:
$$\hat{\beta}_1 \pm t_{\alpha/2}\,\mathrm{s.e.}(\hat{\beta}_1).$$
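The interval can be computed from the slides' R output for the Prop. 8 slope: $\hat{\beta}_1 = -0.8658$, s.e. $= 0.0608$, and 56 residual degrees of freedom.

```python
from scipy import stats

# 95% CI for the Prop. 8 slope using the values in the slides' R output:
# beta1-hat = -0.8658, se = 0.0608, df = 56.
b1, se, df = -0.8658, 0.0608, 56
t_crit = stats.t.ppf(0.975, df)   # two-sided critical value, about 2.0

ci = (b1 - t_crit * se, b1 + t_crit * se)
```

The interval excludes zero, consistent with the large t value (−14.2) reported in the output.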
Inference for the Regression Model
- One important thing to note is the fact that we're dividing the significance level by two.
- Note also that the width of the c.i. is proportional to the standard error of the coefficient.
- We can now see why the standard error is a measure of precision: it directly affects the interval in which the population parameter will probabilistically reside (over repeated samples).
I Simple tests-of-significance can now be done.
Inference for the Regression Model
- We can test the condition of the null hypothesis using a t statistic.
- The t:
$$t = \frac{\hat{\beta}_1 - \beta_1}{\mathrm{s.e.}(\hat{\beta}_1)}.$$
- We could state the null as
$$H_0: \beta_1 = 0,$$
but we could easily specify $\beta_1$ under the null as being equal to any hypothetical value (i.e. 1, .5, 3.14, etc.).
- Define $\beta_1^*$ as the value of $\beta_1$ under the null and rewrite t as
$$t = \frac{\hat{\beta}_1 - \beta_1^*}{\mathrm{s.e.}(\hat{\beta}_1)},$$
where $\beta_1^*$ now reflects the condition of the null (and $t_{\alpha/2}$ denotes the critical t values).
Inference for the Regression Model
- We can utilize p values to determine the probability of a t value.
- In consulting a t table, we can look up the appropriate degrees of freedom and derive the probability for a given t value.
- Suppose we have 8 degrees of freedom and obtain a t value of 2.306.
- In looking at the t table, we see that the probability of obtaining a t value of 2.306 or greater in absolute value is 5 percent. This means that this result could have occurred by chance alone only about 5 percent of the time.
- This is all based on classical statistics.
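The t-table lookup in this example can be reproduced directly; 2.306 is the two-tailed 5 percent critical value for 8 degrees of freedom:

```python
from scipy import stats

# Check the slide's t-table example: with 8 degrees of freedom,
# |t| >= 2.306 has probability about 5 percent under the null.
df, t_obs = 8, 2.306
p_two_sided = 2 * stats.t.sf(t_obs, df)   # about 0.05
```

Doubling the upper-tail probability reflects the two-sided alternative, i.e. the same division of $\alpha$ by two noted for the confidence interval.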
Inference for the Regression Model
- Joint tests-of-significance are possible.
- Omnibus F test: $H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0$
- Easy to compute using variance components:
$$F_0 = \frac{\mathrm{RegSS}/k}{\mathrm{RSS}/(n - k - 1)} = \mathrm{MSR}/\mathrm{MSE} \quad (8)$$
- Consult an F table for $k$ and $n - k - 1$ degrees of freedom and you can obtain a p value for the test.
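The omnibus F statistic can be recovered from the ANOVA table in the slides (RegSS = 7667 on k = 1 df; RSS = 2119 on 56 df):

```python
from scipy import stats

# Omnibus F test from the slides' ANOVA table:
# RegSS = 7667 on k = 1 df, RSS = 2119 on n - k - 1 = 56 df.
regss, rss = 7667.0, 2119.0
k, df_resid = 1, 56

F = (regss / k) / (rss / df_resid)   # about 203, matching the R output
p = stats.f.sf(F, k, df_resid)       # essentially zero
```

This matches the "F-statistic: 203 on 1 and 56 DF" line; with one regressor, the F statistic is also the square of the slope's t value.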
Inference for the Regression Model
- Next week: matrix form of the model, as well as further proofs of the assumptions.
I Model matrices and diagnostic methods.
I Review matrix algebra.