Sociology 601 Class 23: November 17, 2009
• Homework #8
• Review
– spurious, intervening, & interaction effects
– Stata regression commands & output
• F-tests and inferences (A&F 11.4)
Review: Types of 3-variable Causal Models
• Spurious
– x2 causes both x1 and y
– e.g., age causes both marital status and earnings
• Intervening
– x1 causes x2, which causes y
– e.g., marital status causes more hours worked, which raises annual earnings
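• No statistical difference between these models: both are estimated by the same regression of y on x1 and x2; only the assumed causal order differs.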
• Statistical interaction effects: The relationship between x1 and y depends on the value of another variable, x2
• e.g., the relationship between marital status and earnings is different for men and women.
Review: Causal Models with earnings & marital status
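1. bivariate relationship: married → earnings
2. spuriousness: married → earnings, with age → married and age → earnings
3. intervening: married → hours → earnings
4. interaction effect: married → earnings, with gender changing the strength of that relationship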
Review: Stata Commands
• describe
• summarize
• tab
• tab xcat, sum(yvar)
• drop if / keep if
• gen / replace
• ttest
• regress
• predict / predict, residuals
• histogram / scatter
• graph box yvar, over(xvar)
Review: Regression models using Stata
see:
http://www.bsos.umd.edu/socy/vanneman/socy601/conrinc.do
Review: Regression models with Earnings, Marital status and Age
bivariate relationship:
. * association of earnings and marital status:
. regress conrinc married

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  1,   723) =   31.29
       Model |  1.9321e+10     1  1.9321e+10           Prob > F      =  0.0000
    Residual |  4.4645e+11   723   617501240           R-squared     =  0.0415
-------------+------------------------------           Adj R-squared =  0.0402
       Total |  4.6577e+11   724   643334846           Root MSE      =   24850

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |    10383.4   1856.279     5.59   0.000     6739.057    14027.74
       _cons |   35065.27   1380.532    25.40   0.000     32354.94     37775.6
------------------------------------------------------------------------------
spuriousness (partial):
. * age makes the marriage-earnings relationship partly spurious:
. regress conrinc married age

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  2,   722) =   36.20
       Model |  4.2454e+10     2  2.1227e+10           Prob > F      =  0.0000
    Residual |  4.2332e+11   722   586315863           R-squared     =  0.0911
-------------+------------------------------           Adj R-squared =  0.0886
       Total |  4.6577e+11   724   643334846           Root MSE      =   24214

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   8243.081   1840.613     4.48   0.000     4629.489    11856.67
         age |   702.0977   111.7749     6.28   0.000     482.6551    921.5403
       _cons |   8836.284   4387.025     2.01   0.044     223.4344    17449.13
------------------------------------------------------------------------------
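As a worked illustration of reading these two sets of coefficients, the fitted equations can be evaluated for particular cases. This is only a sketch: the helper `predict` and the example age of 40 are assumptions chosen for illustration, while the coefficients are copied from the output.

```python
# Coefficients copied from the two regressions of conrinc shown above.
BIVARIATE = {"_cons": 35065.27, "married": 10383.4}
WITH_AGE = {"_cons": 8836.284, "married": 8243.081, "age": 702.0977}


def predict(coefs, **values):
    """Fitted value: intercept plus each coefficient times its x value."""
    return coefs["_cons"] + sum(coefs[name] * x for name, x in values.items())


# Bivariate model: predicted earnings for unmarried (0) and married (1).
unmarried = predict(BIVARIATE, married=0)
married = predict(BIVARIATE, married=1)

# Model with age: married/unmarried gap at the same (hypothetical) age of 40.
gap_at_40 = (predict(WITH_AGE, married=1, age=40)
             - predict(WITH_AGE, married=0, age=40))

print(f"bivariate: unmarried {unmarried:,.2f}, married {married:,.2f}")
print(f"gap holding age at 40: {gap_at_40:,.2f}")
```

Because the model with age has no interaction term, the gap is the same at every age and equals b(married) from that model.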
Review: Regression models with Earnings, Marital status and Hours Worked
Intervening variable relationship (hours worked):
. * hours worked explains some of how marital status increases earnings:
. regress conrinc married age hrs1

      Source |       SS       df       MS              Number of obs =     664
-------------+------------------------------           F(  3,   660) =   25.02
       Model |  4.4322e+10     3  1.4774e+10           Prob > F      =  0.0000
    Residual |  3.8970e+11   660   590458672           R-squared     =  0.1021
-------------+------------------------------           Adj R-squared =  0.0980
       Total |  4.3402e+11   663   654637868           Root MSE      =   24299

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   7328.527   1934.225     3.79   0.000     3530.551     11126.5
         age |   631.5836   117.8463     5.36   0.000     400.1848    862.9824
        hrs1 |   281.3472   71.47315     3.94   0.000     141.0051    421.6894
       _cons |  -232.1376   5465.426    -0.04   0.966    -10963.86    10499.58
------------------------------------------------------------------------------
But: problem with N! It drops from 725 to 664 because hrs1 is missing for some cases.
Create new hours worked:
. gen hrs=hrs1
(101 missing values generated)
. replace hrs=hrs2 if hrs1>=.
(24 real changes made, 2 to missing)
. replace hrs=0 if hrs1>=. & wrkstat>=3
(101 real changes made)
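The order of the three commands matters, so it may help to trace them case by case. The sketch below mirrors the logic in Python; the function `build_hrs` and the use of `None` for Stata's missing value are illustrative assumptions, not part of the course do-file.

```python
def build_hrs(hrs1, hrs2, wrkstat):
    """Mirror the three Stata commands in order; None stands in for missing (.)."""
    hrs = hrs1                      # gen hrs = hrs1
    if hrs1 is None:
        hrs = hrs2                  # replace hrs = hrs2 if hrs1 >= .
    # Stata treats missing as larger than any number, so a missing wrkstat
    # also satisfies wrkstat >= 3; the None check reproduces that behavior.
    if hrs1 is None and (wrkstat is None or wrkstat >= 3):
        hrs = 0                     # replace hrs = 0 if hrs1 >= . & wrkstat >= 3
    return hrs


# Hypothetical respondents: (hrs1, hrs2, wrkstat)
for case in [(40, None, 1), (None, 35, 2), (None, 35, 3), (None, None, 5)]:
    print(case, "->", build_hrs(*case))
```

Note that the last command also overwrites any value just copied from hrs2 when wrkstat is 3 or higher.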
Review: Regression models with Earnings, Marital status and Hours Worked
Intervening variable relationship (revised hours worked):
. regress conrinc married age hrs
      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  3,   721) =   36.27
       Model |  6.1081e+10     3  2.0360e+10           Prob > F      =  0.0000
    Residual |  4.0469e+11   721   561294582           R-squared     =  0.1311
-------------+------------------------------           Adj R-squared =  0.1275
       Total |  4.6577e+11   724   643334846           Root MSE      =   23692

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   7465.107   1805.967     4.13   0.000     3919.526    11010.69
         age |   640.1643    109.891     5.83   0.000     424.4197    855.9089
         hrs |   278.3368   48.31685     5.76   0.000     183.4783    373.1954
       _cons |  -493.7634    4587.79    -0.11   0.914    -9500.786    8513.259
------------------------------------------------------------------------------
b(married) reduced to 7465.1 from 8243.1 (N= 725 for both)
Review: Regression models with Earnings, Marital status, Age, and Hours worked
                 Model 0       Model 1       Model 2x      Model 2
Married          10,383.4***   8,243.1***    7,328.5***    7,465.1***
Age                            702.1***      631.6***      640.2***
Hours worked                                 281.3***      278.3***
Constant         35,065.3***   8,836.3*      -232.1 n.s.   -493.8 n.s.
N                725           725           664           725
R-square         0.042         0.091         0.102         0.131
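One way to summarize the table is to ask how much the married coefficient shrinks as each control is added. The arithmetic below is a sketch of that comparison; the helper `pct_reduction` is a name introduced here for illustration, and Model 2x is left out because its N differs.

```python
# "Married" coefficients copied from the table above (N = 725 in all three models).
b_married = {"Model 0": 10383.4, "Model 1": 8243.1, "Model 2": 7465.1}


def pct_reduction(before, after):
    """Percent by which a coefficient shrinks when a control is added."""
    return 100 * (before - after) / before


age_share = pct_reduction(b_married["Model 0"], b_married["Model 1"])
hours_share = pct_reduction(b_married["Model 1"], b_married["Model 2"])

print(f"adding age:   b(married) is {age_share:.1f}% smaller")
print(f"adding hours: b(married) is {hours_share:.1f}% smaller")
```

Under the causal models reviewed earlier, the first reduction would be read as spuriousness due to age and the second as mediation through hours worked.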
Review: Regression models with Earnings and Marital status, separately by Gender
Statistical Interaction Effect:
. * association of earnings and marital status for men:
. regress conrinc married if sex==1

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  1,   723) =   31.29
       Model |  1.9321e+10     1  1.9321e+10           Prob > F      =  0.0000
    Residual |  4.4645e+11   723   617501240           R-squared     =  0.0415
-------------+------------------------------           Adj R-squared =  0.0402
       Total |  4.6577e+11   724   643334846           Root MSE      =   24850

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |    10383.4   1856.279     5.59   0.000     6739.057    14027.74
       _cons |   35065.27   1380.532    25.40   0.000     32354.94     37775.6
------------------------------------------------------------------------------
. * association of earnings and marital status for women:
. regress conrinc married if sex==2
      Source |       SS       df       MS              Number of obs =     749
-------------+------------------------------           F(  1,   747) =    0.26
       Model |   106732224     1   106732224           Prob > F      =  0.6129
    Residual |  3.1118e+11   747   416578779           R-squared     =  0.0003
-------------+------------------------------           Adj R-squared = -0.0010
       Total |  3.1129e+11   748   416164546           Root MSE      =   20410

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   755.3387   1492.253     0.51   0.613     -2174.17    3684.848
       _cons |      26201   1038.855    25.22   0.000     24161.57    28240.42
------------------------------------------------------------------------------
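The two outputs show very different married coefficients for men and women. One rough way to check whether that gap is larger than sampling error is a large-sample z test for the difference between two coefficients from independent subsamples. This technique is not used in the slides and is offered only as a sketch under that independence assumption.

```python
import math

# Married coefficients and standard errors copied from the two outputs above.
b_men, se_men = 10383.4, 1856.279
b_women, se_women = 755.3387, 1492.253

# Difference in slopes and its standard error (assumes independent samples).
diff = b_men - b_women
se_diff = math.sqrt(se_men**2 + se_women**2)
z = diff / se_diff

print(f"difference in b(married): {diff:,.1f}")
print(f"approximate z: {z:.2f}")
```

A z this far above 2 is consistent with an interaction between gender and marital status.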
Inferences: F-tests of global model
H0: β1 = β2 = ... = βk = 0
• What about α (also written β0)? The intercept is not part of this hypothesis.
F-tests of H0:
• Calculate new test statistic, F
• ratio of “explained variance” / “unexplained variance”
• F-distribution: ratio of two independent chi-square variables, each divided by its degrees of freedom
• df1 (numerator); df2 (denominator)
• if df1 = 1, then F = t²
• Table D, pages 671-673
• Global F-test less useful (almost always significant unless you have a really bad model or very small N).
• Base for F-test comparing regression models (later)
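The F = t² relationship for df1 = 1 can be checked against the bivariate regression of conrinc on married reviewed earlier, where Stata reports F(1, 723) = 31.29. This is a quick arithmetic sketch using the coefficient and standard error from that output.

```python
# Values copied from the bivariate regression of conrinc on married.
b, se = 10383.4, 1856.279
reported_F = 31.29

t = b / se          # t statistic for b(married)
t_squared = t ** 2  # should match the global F when there is one predictor

print(f"t = {t:.2f}, t squared = {t_squared:.2f}, reported F = {reported_F}")
```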
F-test: Method 1, Stata output
. regress conrinc married age hrs

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  3,   721) =   36.27
       Model |  6.1081e+10     3  2.0360e+10           Prob > F      =  0.0000
    Residual |  4.0469e+11   721   561294582           R-squared     =  0.1311
-------------+------------------------------           Adj R-squared =  0.1275
       Total |  4.6577e+11   724   643334846           Root MSE      =   23692

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   7465.107   1805.967     4.13   0.000     3919.526    11010.69
         age |   640.1643    109.891     5.83   0.000     424.4197    855.9089
         hrs |   278.3368   48.31685     5.76   0.000     183.4783    373.1954
       _cons |  -493.7634    4587.79    -0.11   0.914    -9500.786    8513.259
------------------------------------------------------------------------------
df1 = 3 (= k = number of slope parameters: β(married), β(age), β(hrs))
df2 = 721 [ = N – (k+1) = 725 – (3+1) ]
Critical value of F(3, 721) at α = .05 is about 2.6; observed F = 36.27 >> 2.6, so reject H0
F-test: Method 2, using R-square
$$F = \frac{R^2 / k}{(1 - R^2) / [N - (k+1)]}, \qquad df_1 = k; \quad df_2 = N - (k+1)$$

$$F = \frac{.1311 / 3}{(1 - .1311) / [725 - (3+1)]} = \frac{.0437}{.8689 / 721} = 36.26$$
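The same calculation can be wrapped in a small function so it can be repeated for any model. This is a minimal sketch; `f_from_r2` is a name introduced here, and the inputs are the R-squared, N, and k from the regression of conrinc on married, age, and hrs.

```python
def f_from_r2(r2, n, k):
    """Global F statistic from R-squared, sample size n, and k predictors."""
    df1 = k
    df2 = n - (k + 1)
    f = (r2 / df1) / ((1 - r2) / df2)
    return f, df1, df2


f, df1, df2 = f_from_r2(r2=0.1311, n=725, k=3)
print(f"F({df1}, {df2}) = {f:.2f}")
```

The result differs from Stata's 36.27 only in the second decimal, because R-squared is rounded to four places here.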
F-test: Method 3, using SSE and Model SS
$$F = \frac{\text{Model SS} / k}{\text{SSE} / [N - (k+1)]} = \frac{\text{Model Mean Square}}{\text{Mean Square Error}}, \qquad df_1 = k; \quad df_2 = N - (k+1)$$

$$F = \frac{2.0360 \times 10^{10}}{561{,}294{,}582} = 36.27$$
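Starting from the sums of squares rather than the mean squares gives the same F and also recovers R-squared, which ties this method to the R-square method. The sketch below uses the Model and Residual SS from the regression of conrinc on married, age, and hrs; the variable names are illustrative.

```python
# Sums of squares copied from the Stata ANOVA table (N = 725, k = 3).
model_ss = 6.1081e10
resid_ss = 4.0469e11
n, k = 725, 3

model_ms = model_ss / k              # Model Mean Square
mse = resid_ss / (n - (k + 1))       # Mean Square Error
f = model_ms / mse

# R-squared is the share of the total SS accounted for by the model.
r2 = model_ss / (model_ss + resid_ss)

print(f"F({k}, {n - (k + 1)}) = {f:.2f}")
print(f"R-squared = {r2:.4f}")
```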
Inferences: βi
H0: βi = 0
• what we are usually most interested in
test statistic:
$$t = \frac{b_i}{\hat{\sigma}_{b_i}}, \qquad df = df_2 = N - (k+1)$$

$\hat{\sigma}_{b_i}$ is calculated from matrix routines
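As a check on the formula, each t in the Stata coefficient table is just the coefficient divided by its standard error. The sketch below reproduces the t column for the regression of conrinc on married, age, and hrs; the dictionary layout is an illustrative choice.

```python
# (coefficient, standard error) pairs copied from the Stata output.
estimates = {
    "married": (7465.107, 1805.967),
    "age": (640.1643, 109.891),
    "hrs": (278.3368, 48.31685),
    "_cons": (-493.7634, 4587.79),
}
n, k = 725, 3
df = n - (k + 1)  # degrees of freedom for every t test in this model

# t statistic for each coefficient: b divided by its standard error.
t_stats = {name: b / se for name, (b, se) in estimates.items()}

for name, t in t_stats.items():
    print(f"{name:>8}: t = {t:6.2f} (df = {df})")
```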
Next: Regression with Dummy Variables
Agresti and Finlay 12.3
• (skim 12.1-12.2 on analysis of variance)
Example: marital status, 3 categories
• currently married
• never married
• widowed / separated / divorced