
Linear Regression: Making Sense of Regression Results

Date post: 30-Dec-2015
Upload: shamara-amena
Transcript
Page 1: Linear Regression: Making Sense of Regression Results

Linear Regression: Making Sense of Regression Results

Interpreting Stata regression output
• Coefficients for independent variables
• Fit of the regression: R Square

Statistical significance
• How to reject the null hypothesis

Multivariate regressions
• College graduation rates
• Ethnicity and voting

Page 2: Linear Regression: Making Sense of Regression Results

SPSS Output – We’ll Use Stata – Benefit in Knowing Two Packages

[Figure: scatterplot of Graduation Rate (y-axis, 0 to 100) against Average SAT Score (x-axis, 0 to 1600) with a fitted regression line; Rsq = 0.3454. Annotations: "How tight is the fit?", "Y-intercept or “constant”", "Slope or “coefficient”".]

Page 3: Linear Regression: Making Sense of Regression Results

Interpreting regression output

Regression output typically includes two key tables for interpreting your results:

A “Coefficients” table that contains the y-intercept (or “constant”) of the regression, a coefficient for every independent variable, and the standard error of that coefficient.

A “Model Summary” table that gives you information on the fit of your regression.

Page 4: Linear Regression: Making Sense of Regression Results

Interpreting SPSS (another statistical package) regression: Coefficients – 1

Coefficients (a)

                     Unstandardized Coefficients   Standardized Coefficients
Model 1              B           Std. Error        Beta                        t        Sig.
(Constant)           4.236       7.048                                         .601     .549
Average SAT Score    5.88E-02    .007              .588                        8.778    .000

a. Dependent Variable: Graduation Rate

• The y-intercept is 4.2%, with a standard error of 7.0 percentage points.

• The coefficient for SAT Scores is 0.059 percentage points per SAT point, with a standard error of 0.007. Standardized coefficients are discussed later.

Page 5: Linear Regression: Making Sense of Regression Results

Interpreting regression output: Coefficients - 2

The y-intercept or constant is the predicted value of the dependent variable when the independent variable takes on the value of zero.

This basic model predicts that when a college admits a class of students who averaged zero on their SAT, 4.2% of them will graduate.

The constant is not the most helpful statistic.

Page 6: Linear Regression: Making Sense of Regression Results

Interpreting regression output: Coefficients - 3

The coefficient of an independent variable is the predicted change in the dependent variable that results from a one unit increase in the independent variable.

A college with students whose SAT scores are one point higher on average will have a predicted graduation rate that is 0.059 percentage points higher.

Increasing SAT scores by 200 points leads to a predicted (200)(0.059) = 11.8 percentage point rise in graduation rates.
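The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: the two constants are copied from the SPSS Coefficients table, and the function names are made up for this example. Because it uses the unrounded slope 0.0588 rather than 0.059, the 200-point change comes out as 11.76 before rounding.

```python
# Values copied from the SPSS "Coefficients" table (B column).
INTERCEPT = 4.236   # predicted graduation rate (%) when average SAT is 0
SAT_SLOPE = 0.0588  # percentage points of graduation rate per SAT point


def predicted_graduation_rate(avg_sat: float) -> float:
    """Point on the regression line: constant + slope * average SAT."""
    return INTERCEPT + SAT_SLOPE * avg_sat


def predicted_change(sat_increase: float) -> float:
    """Predicted change in graduation rate for a given change in average SAT."""
    return SAT_SLOPE * sat_increase


if __name__ == "__main__":
    print(round(predicted_change(200), 1))            # 11.8
    print(round(predicted_graduation_rate(1000), 1))  # 63.0
```

The second call shows why the constant alone is rarely interesting: predictions only become meaningful at SAT values that colleges actually have.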

Page 7: Linear Regression: Making Sense of Regression Results

Interpreting regression output: Fit of the Regression

Model Summary

Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .588 (a)   .345        .341                 12.45%

a. Predictors: (Constant), Average SAT Score

The R Square measures how closely a regression line fits the data in a scatterplot.

• It can range from zero (no explanatory power) to one (perfect prediction).

• An R Square of 0.345 means that differences in SAT scores can explain 35% of the variation in college graduation rates. Key sentence for quizzes!

Page 8: Linear Regression: Making Sense of Regression Results

Statistical Significance - 1

What would the null hypothesis look like in a scatterplot?

If the independent variable has no effect on the dependent variable, the scatterplot should look random, the regression line should be flat, and its slope should be zero.

Null hypothesis: The regression coefficient for an independent variable equals zero.

Page 9: Linear Regression: Making Sense of Regression Results

Statistical Significance - 2

Our formal test of statistical significance asks whether we can be SURE that a regression coefficient DIFFERS from zero.

The “standard error” is the standard deviation of the sampling distribution of the coefficient estimate.

If a coefficient is more than two standard errors away from zero, we can reject the null hypothesis (that it equals zero).

Page 10: Linear Regression: Making Sense of Regression Results

Statistical Significance - 3

So, if a coefficient is more than TWICE the size of its standard error, we REJECT the NULL hypothesis with roughly 95% confidence.

This works whether the coefficient is negative or positive.

The coefficient/standard error ratio is called the “test statistic” or “t-stat.”

A t-stat bigger than 2 or less than -2 indicates a statistically significant effect.
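As a rough sketch of this rule, the ratio and the "bigger than 2 or less than -2" check can be written as two small Python functions. The function names and the negative example are invented for illustration; the first pair of numbers is the (Constant) row of the SPSS Coefficients table.

```python
def t_stat(coefficient: float, std_error: float) -> float:
    """Ratio of a coefficient to its standard error."""
    return coefficient / std_error


def rejects_null(coefficient: float, std_error: float, threshold: float = 2.0) -> bool:
    """Rule of thumb from the slides: reject when |t| exceeds about 2."""
    return abs(t_stat(coefficient, std_error)) > threshold


# (Constant) row of the SPSS table: B = 4.236, Std. Error = 7.048.
constant_t = t_stat(4.236, 7.048)
print(round(constant_t, 3))        # 0.601, matching the t column
print(rejects_null(4.236, 7.048))  # False: cannot reject the null

# Hypothetical negative coefficient, to show that the sign does not matter.
print(rejects_null(-0.9, 0.3))     # True: t is about -3
```

Recomputing the SAT row from the printed values gives a slightly different t than the 8.778 in the table, because the table prints the standard error rounded to .007.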

Page 11: Linear Regression: Making Sense of Regression Results

Statistical Significance - 4

Page 12: Linear Regression: Making Sense of Regression Results

Regression of Tax on Cons, Party and Stinc in Stata

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  3,    96) =   65.44
       Model |  54886.5757     3  18295.5252           Prob > F      =  0.0000
    Residual |  26840.2643    96  279.586087           R-squared     =  0.6716
-------------+------------------------------           Adj R-squared =  0.6613
       Total |    81726.84    99  825.523636           Root MSE      =  16.721

------------------------------------------------------------------------------
         tax |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
        cons |    -.64472     .07560    -8.53   0.000                -.7010575
       party |   11.20792    4.67533     2.40   0.018                 .1902963
       stinc |    -.56008    1.28316    -0.44   0.663                -.0297112
       _cons |   67.38277   15.11393     4.46   0.000                        .
------------------------------------------------------------------------------

For which independent variables would we reject the null hypothesis? Why?
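The summary figures in the upper right of the Stata output can be recomputed from the sums of squares on the left, which is one way to see where they come from. The sketch below uses only numbers printed in the output; the variable names are chosen for this example.

```python
import math

# Sums of squares and degrees of freedom from the Stata output above.
model_ss, model_df = 54886.5757, 3
resid_ss, resid_df = 26840.2643, 96
total_ss = model_ss + resid_ss   # 81726.84
total_df = model_df + resid_df   # 99

# Share of the total variation accounted for by the model.
r_squared = model_ss / total_ss

# Adjusted R-squared compares mean squares instead of raw sums of squares.
adj_r_squared = 1 - (resid_ss / resid_df) / (total_ss / total_df)

# F is the model mean square over the residual mean square.
f_stat = (model_ss / model_df) / (resid_ss / resid_df)

# Root MSE is the square root of the residual mean square.
root_mse = math.sqrt(resid_ss / resid_df)

print(round(r_squared, 4), round(adj_r_squared, 4))  # 0.6716 0.6613
print(round(f_stat, 2), round(root_mse, 3))          # 65.44 16.721
```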

Page 13: Linear Regression: Making Sense of Regression Results

Visualizing a t ratio - 1

Which of the next two slides depicts a higher t ratio?

Page 14: Linear Regression: Making Sense of Regression Results

Visualizing a t ratio - 2

Page 15: Linear Regression: Making Sense of Regression Results

Visualizing a t ratio - 3

Page 16: Linear Regression: Making Sense of Regression Results

Multivariate Regression - 1

A “multivariate regression” uses more than one independent variable (including possible confounds) to explain variation in a dependent variable.

The coefficient for each independent variable reports its effect on the DV, holding constant all of the other IVs in the regression.

Page 17: Linear Regression: Making Sense of Regression Results

Multivariate Regression - 2

[Diagram: four independent variables (Year of Founding, SAT Scores, Tuition, Student/Faculty Ratio) and the dependent variable (Graduation Rates).]

Page 18: Linear Regression: Making Sense of Regression Results

Multivariate Regression - 3

Coefficients (a)

                           Unstandardized Coefficients   Standardized Coefficients
Model 1                    B           Std. Error        Beta                        t        Sig.
(Constant)                 59.187      47.203                                        1.254    .212
Year school was founded    -2.1E-02    .023              -.072                       -.917    .361
Average SAT Score          4.2E-02     .010              .410                        4.224    .000
In-state Tuition           8.4E-04     .000              .208                        2.109    .037
Student/faculty ratio      -.206       .329              -.054                       -.626    .533

a. Dependent Variable: Graduation Rate

Page 19: Linear Regression: Making Sense of Regression Results

Multivariate Regression - 4

Holding all other factors constant, a 200 point increase in SAT scores leads to a predicted (200)(0.042) = 8.4 percentage point increase in the graduation rate, and this effect is statistically significant.

Controlling for other factors, a college that is 100 years younger should have a graduation rate that differs by (100)(-0.021) = -2.1, that is, 2.1 percentage points lower, but this effect is NOT significantly different from zero.
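A minimal sketch of these two calculations, using the B and t columns of the multivariate SPSS table. The dictionary keys and the function name are invented for this example, and the significance check is the |t| > 2 rule of thumb from the earlier slides.

```python
# (B, t) pairs copied from the multivariate SPSS Coefficients table.
# The dictionary keys are illustrative names, not variable names from the data.
COEFFICIENTS = {
    "year_founded": (-0.021, -0.917),
    "avg_sat": (0.042, 4.224),
    "in_state_tuition": (0.00084, 2.109),
    "student_faculty_ratio": (-0.206, -0.626),
}


def partial_effect(variable: str, change: float) -> tuple[float, bool]:
    """Predicted change in graduation rate (percentage points) when one
    variable changes and the others are held constant, plus whether the
    coefficient passes the |t| > 2 rule of thumb."""
    b, t = COEFFICIENTS[variable]
    return b * change, abs(t) > 2


sat_effect, sat_significant = partial_effect("avg_sat", 200)
year_effect, year_significant = partial_effect("year_founded", 100)
print(round(sat_effect, 1), sat_significant)    # 8.4 True
print(round(year_effect, 1), year_significant)  # -2.1 False
```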

Page 20: Linear Regression: Making Sense of Regression Results

Multiple Regression: Comparative Politics – Stata - 1

Let’s examine the impact of government ideology on economic growth in 18 wealthy democracies (Western Europe, the United States, Canada, Japan, Australia and New Zealand) annually over the 1961-1994 period.

Page 21: Linear Regression: Making Sense of Regression Results

Comparative Politics - 2

Variable List:

growthpc – annual growth of per capita (i.e., per person) gross domestic product

govcons – strength of the conservative party in the national government

left – strength of the left party in the national government

Page 22: Linear Regression: Making Sense of Regression Results

Comparative Politics - 3

gdppc – per capita gross domestic product

unem – unemployment rate

Page 23: Linear Regression: Making Sense of Regression Results

Comparative Politics - 4

      Source |       SS       df       MS              Number of obs =     453
-------------+------------------------------           F(  4,   448) =   16.56
       Model |  272.295407     4  68.0738517           Prob > F      =  0.0000
    Residual |  1841.26412   448  4.10996456           R-squared     =  0.1288
-------------+------------------------------           Adj R-squared =  0.1211
       Total |  2113.55953   452  4.67601666           Root MSE      =  2.0273

------------------------------------------------------------------------------
    growthpc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     govcons |   -.168093   .0380607    -4.42   0.000    -.2428933   -.0932942
        left |    .001841   .0034541     0.53   0.594    -.0049468    .0086298
       gdppc |   -.000157   .0000585    -2.70   0.007    -.0002725   -.0000428
        unem |   -.086520   .0458576    -1.89   0.060     -.176643    .0036023
       _cons |   7.501013   .7285216    10.30   0.000     6.069269    8.932757
------------------------------------------------------------------------------

What do these results indicate?
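One way to read the last two columns: each 95% confidence interval is the coefficient plus or minus a critical t value times the standard error, and an interval that excludes zero corresponds to rejecting the null hypothesis. The sketch below assumes a critical value of about 1.9653 for 448 residual degrees of freedom; that number is an approximation supplied for this example and does not appear in the output. The function names are also illustrative.

```python
# Assumed: approximate two-sided 95% critical t value for 448 degrees of freedom.
T_CRIT = 1.9653

# (coefficient, standard error) pairs copied from the Stata output above.
ROWS = {
    "govcons": (-0.168093, 0.0380607),
    "left": (0.001841, 0.0034541),
    "gdppc": (-0.000157, 0.0000585),
    "unem": (-0.086520, 0.0458576),
}


def conf_interval(coef: float, se: float, t_crit: float = T_CRIT) -> tuple[float, float]:
    """Lower and upper bounds: coefficient +/- t_crit * standard error."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


def excludes_zero(coef: float, se: float) -> bool:
    """True when the whole interval lies on one side of zero."""
    low, high = conf_interval(coef, se)
    return low > 0 or high < 0


for name, (coef, se) in ROWS.items():
    low, high = conf_interval(coef, se)
    print(f"{name:8s} [{low: .6f}, {high: .6f}] excludes zero: {excludes_zero(coef, se)}")
```

Under this sketch, govcons and gdppc have intervals that exclude zero, while left and unem do not, which agrees with their P>|t| values of 0.594 and 0.060.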

Page 24: Linear Regression: Making Sense of Regression Results

Multicollinearity Check

vif

    Variable |       VIF       1/VIF
-------------+----------------------
     govcons |      1.37    0.730762
        unem |      1.31    0.763241
       gdppc |      1.29    0.776446
        left |      1.20    0.834291
-------------+----------------------
    Mean VIF |      1.29

Low multicollinearity: the highest VIF is for govcons, where 27% of its variance is explained by the other independent variables (1 - .73 = .27), which counts as “low.”
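The relationship between the two columns can be sketched as follows: the 1/VIF column (often called tolerance) is the share of a variable's variance NOT explained by the other independent variables, so one minus it gives the shared portion, and its reciprocal gives the VIF. The values are copied from the output; the function names are made up for this example.

```python
# 1/VIF (tolerance) values copied from the Stata vif output above.
TOLERANCE = {
    "govcons": 0.730762,
    "unem": 0.763241,
    "gdppc": 0.776446,
    "left": 0.834291,
}


def vif(tolerance: float) -> float:
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1 / tolerance


def shared_variance(tolerance: float) -> float:
    """Share of a variable's variance explained by the other IVs."""
    return 1 - tolerance


for name, tol in TOLERANCE.items():
    print(name, round(vif(tol), 2), round(shared_variance(tol), 2))

mean_vif = sum(vif(tol) for tol in TOLERANCE.values()) / len(TOLERANCE)
print(round(mean_vif, 2))  # 1.29
```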

Page 25: Linear Regression: Making Sense of Regression Results

Nonlinear Models - 1

While many/most variable relationships in political science are reasonably well approximated by the linear relationships shown on the next slide, some are not.

Page 26: Linear Regression: Making Sense of Regression Results
Page 27: Linear Regression: Making Sense of Regression Results

Nonlinear Models - 2

The next slide shows a negative nonlinear relationship between OSHA expenditures and the workplace injury rate. What theory would lead us to think that: (1) the relationship between OSHA expenditures and the workplace injury rate would be negative; (2) that the relationship would be nonlinear? What form should the nonlinearity take?

Page 28: Linear Regression: Making Sense of Regression Results

Nonlinear Models - 3

Page 29: Linear Regression: Making Sense of Regression Results

Nonlinear Models - 4

DON’T WORRY ABOUT THE MATH!

Since the rate of change decreases (i.e., the injury rate decreases but at a slower rate for each additional dollar spent on OSHA inspections), we can estimate a linear relationship by converting the OSHA budget to natural logarithms. Thus, an OSHA budget of 10 (i.e., $10,000,000) is read as 2.3 (i.e., base “e” = 2.71828 and 2.71828^2.3 ≈ 10).
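For anyone who does want to see the math, the conversion is a one-line natural log. The budget figures other than 10 are hypothetical values added for illustration, and the budget is assumed to be measured in millions of dollars as on the slide.

```python
import math

# Budget in millions of dollars; 10 is the slide's example, the rest are hypothetical.
budgets = [5, 10, 20, 40]

for budget in budgets:
    print(budget, round(math.log(budget), 2))  # natural log (base e)

# Each doubling of the budget adds the same amount on the log scale,
# which is why a curve with diminishing returns can look linear after logging.
step_10_to_20 = math.log(20) - math.log(10)
step_20_to_40 = math.log(40) - math.log(20)
print(round(step_10_to_20, 3), round(step_20_to_40, 3))
```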

Page 30: Linear Regression: Making Sense of Regression Results

Nonlinear Models - 5

The next slide shows the relationship between economic development and political violence. What form should such a relationship take? Should we expect the relationship to change direction (i.e., from negative to positive or vice versa)? Why? How would you measure the variables?

Page 31: Linear Regression: Making Sense of Regression Results

Nonlinear Models - 6

Page 32: Linear Regression: Making Sense of Regression Results

Nonlinear Models - 7

The next several slides examine nonlinear models from the comparative politics literature on political violence. The dependent variable is the death rate in a nation from political violence or violent acts (e.g., riots).

Page 33: Linear Regression: Making Sense of Regression Results

Nonlinear Models - 8

Page 34: Linear Regression: Making Sense of Regression Results

Nonlinear Models - 9

Page 35: Linear Regression: Making Sense of Regression Results

Nonlinear Models - 10

Page 36: Linear Regression: Making Sense of Regression Results

Nonlinear Models - 11

The next slide shows a graph in which the dependent variable (Y axis) is the percentage of elected county officials who are African-American and the independent variable (X axis) is the percentage of the county voters who are African-American. What would you expect the graph to look like? How many “changes of direction” (positive to negative or vice versa) in the relationship would you expect?

Page 37: Linear Regression: Making Sense of Regression Results

Nonlinear Models - 12

Page 38: Linear Regression: Making Sense of Regression Results

North Carolina

      Source |       SS       df       MS              Number of obs =     300
-------------+------------------------------           F(  4,   295) =   83.90
       Model |  8422.69127     4  2105.67282           Prob > F      =  0.0000
    Residual |   7404.1454   295   25.098798           R-squared     =  0.5322
-------------+------------------------------           Adj R-squared =  0.5258
       Total |  15826.8367   299  52.9325641           Root MSE      =  5.0099

------------------------------------------------------------------------------
      blktot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      blkreg |   .9915165   .1630062     6.08   0.000      .670714    1.312319
    blkregsq |   -.037464   .0071142    -5.27   0.000     -.051465    -.023463
   blkregcub |   .0005588     .00009     6.21   0.000     .0003817    .0007359
        wall |  -.1548252   .0395056    -3.92   0.000    -.2325737   -.0770767
       _cons |      1.051   .9752407     1.08   0.282     -.868311    2.970311
------------------------------------------------------------------------------
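To see what shape these coefficients imply, one can look at the slope of the fitted cubic. The sketch below assumes that blkregsq and blkregcub are the square and cube of blkreg, which the output suggests but does not state, and it holds wall fixed so that it drops out of the slope. The function names are illustrative.

```python
# Coefficients on blkreg, blkregsq, and blkregcub from the Stata output above.
# Assumption: blkregsq = blkreg**2 and blkregcub = blkreg**3.
B1, B2, B3 = 0.9915165, -0.037464, 0.0005588


def slope(blkreg: float) -> float:
    """Derivative of the fitted cubic with respect to blkreg."""
    return B1 + 2 * B2 * blkreg + 3 * B3 * blkreg ** 2


def slope_discriminant() -> float:
    """Discriminant of the quadratic slope; negative means the slope never hits zero."""
    return (2 * B2) ** 2 - 4 * (3 * B3) * B1


# Point where the curve switches from flattening to steepening.
inflection = -B2 / (3 * B3)

print(slope_discriminant() < 0)  # True: the fitted slope never changes sign
print(round(inflection, 1))      # 22.3
print(slope(inflection) > 0)     # True: even the flattest stretch still rises
```

Under these assumptions the fitted curve rises throughout: steeply at first, flattening near a blkreg value of about 22, then steepening again. The cubic terms bend the line without reversing its direction.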

Page 39: Linear Regression: Making Sense of Regression Results

Interaction Terms - 1

If our theory indicates that the impact of one independent variable on the dependent variable changes as the level of ANOTHER independent variable changes, we need an interaction term. We simply multiply the scores on the two independent variables and create a new independent variable.
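Creating the new variable is a row-by-row multiplication, as in the sketch below. All data values and coefficients here are hypothetical and exist only to show the mechanics; the function names are invented for this example.

```python
def add_interaction(x1: list[float], x2: list[float]) -> list[float]:
    """Multiply the two independent variables case by case."""
    return [a * b for a, b in zip(x1, x2)]


def effect_of_x1(b1: float, b_interaction: float, x2_level: float) -> float:
    """With an interaction term in the model, the effect of x1 depends on x2:
    effect = b1 + b_interaction * x2."""
    return b1 + b_interaction * x2_level


# Hypothetical scores on two independent variables for four cases.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.0, 1.0, 0.0, 1.0]
print(add_interaction(x1, x2))    # [0.0, 2.0, 0.0, 4.0]

# Hypothetical coefficients: the effect of x1 grows as x2 rises.
print(effect_of_x1(2.0, 0.5, 0))  # 2.0
print(effect_of_x1(2.0, 0.5, 4))  # 4.0
```

The new column is then entered in the regression alongside the two original variables.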

Page 40: Linear Regression: Making Sense of Regression Results

Interaction Terms - 2

Page 41: Linear Regression: Making Sense of Regression Results

Interaction Terms - 3

Page 42: Linear Regression: Making Sense of Regression Results

The Impact of Outliers

The next two slides show the impact of outlier (i.e., extreme) data. The argument that a lower corporate tax rate will actually raise more revenue is based on this conundrum. Spotting outliers is one of the reasons graphical analysis is useful. We sometimes re-run analyses removing an extreme score to see how fragile the initial results are.
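The re-run check can be sketched with a small slope calculation. The data below are entirely hypothetical (they are not the corporate tax data from the slides) and are chosen so that a single extreme case reverses the sign of the slope.

```python
def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Bivariate least-squares slope: sum of cross-products over sum of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: five well-behaved cases plus one extreme case at the end.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, -30]

slope_all = ols_slope(xs, ys)
slope_without_outlier = ols_slope(xs[:-1], ys[:-1])

print(round(slope_all, 2))              # negative: driven by the one extreme case
print(round(slope_without_outlier, 2))  # 2.0: the pattern in the other five cases
```

If dropping one case changes the sign or size of the slope this much, the initial result is fragile.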

Page 43: Linear Regression: Making Sense of Regression Results
Page 44: Linear Regression: Making Sense of Regression Results

Outlier Omitted

Page 45: Linear Regression: Making Sense of Regression Results

Causal Models – Presidents and the Economy - 1

20th Percentile (Dep. Variable: Growth Rate)

Democratic President 2.32 (.80)

Oil Prices (% lagged) -.032 (.016)

Labor Force Participation 4.66 (1.44)

Lagged Growth -.191 (.084)

Linear Trend -12.84 (5.88)

Quadratic Trend 9.68 (5.75)

Intercept 2.68 (1.26)

R - Squared .41

Page 46: Linear Regression: Making Sense of Regression Results

Causal Models – Presidents and the Economy - 2

Impact of Democratic President across Income Groups:

20th Percentile: 2.32 (.80)

40th Percentile: 1.60 (.56)

60th Percentile: 1.53 (.52)

80th Percentile: 1.23 (.51)

95th Percentile: .50 (.64)

Page 47: Linear Regression: Making Sense of Regression Results

Causal Models – Presidents and the Economy - 3

20th Percentile (Dep. Variable: Growth Rate)

Democratic President .51 (.64)

Unemployment (%) -.849 (.307)

Inflation (%) -.134 (.127)

GNP Growth (%) .798 (.144)

Oil Prices (% lagged) -.005 (.013)

Why are the results different? Does the partisanship of the President matter? (YES!)

Page 48: Linear Regression: Making Sense of Regression Results

Regression – Presidents and the Economy - 4

Democratic Presidential Adm. >>>> unemployment, inflation, GNP growth >>>> income growth rate, 20th percentile

