The Effect of Urban Agglomeration on Wages:
Evidence from Samples of Siblings
Harry Krashinsky∗
University of Toronto
Abstract
The large and significant relationship between city population and wages has been well established in the agglomeration literature, yet its causal interpretation remains debated. This paper contributes new evidence to this debate by using multiple data sets of siblings in order to estimate the agglomeration premium while controlling for unobserved individual heterogeneity with a family-specific fixed effect. In the absence of this fixed effect, the agglomeration premium is large and significant. But after a familial fixed effect is included in the regression framework, the city-size wage premium becomes small in magnitude and statistically insignificant in all of the data sets used in the analysis. The results demonstrate the importance of family background for interpreting the agglomeration premium.
∗Corresponding author: Harry Krashinsky, 121 St. George Street, Centre for Industrial Relations, University of Toronto, Toronto, Ontario, Canada, M5S 2E8. Telephone: (416) 978-1744. Fax: (416) 978-5696. Email: [email protected]. I would like to thank Orley Ashenfelter and Cecilia Rouse for permitting me to access the data of identical twins. I would like to thank William Strange and Stuart Rosenthal for several helpful comments and discussions about this paper. I would also like to thank Jeremy Glazier for helpful research assistance.
1 Introduction
The effect of a city’s population on wages is highly significant and large in magni-
tude. Various studies have demonstrated that doubling the population of an individual’s
city would cause wages to rise by three to seven percent, and moving from a city of fewer
than 500,000 people to one with more than 500,000 residents would increase wages by
over 20 percent. Both of these effects are at least as large as the returns to many standard
variables included in a wage regression, and perhaps because of this, researchers have ques-
tioned the causal nature of the relationship between city size and wages. Generally, papers
which argue against the causal effects of city size on wages tend to find that non-random
selection is the reason wages are higher in cities than in non-urban areas — better workers
select into cities to obtain higher-paying jobs there. Conversely, studies which argue that
city size does have a causal effect on wages suggest that even after accounting for selection
issues, agglomeration causes better matches between workers and firms because larger cities
have “thicker” markets, or that cities contain other amenities (or disamenities) which cause
wages to be higher in urban areas.
To contribute to this debate, this paper presents new evidence from multiple data
sets of siblings in order to estimate the city-size wage premium while using an econometric
strategy that is new to the agglomeration literature: incorporating familial effects on wages.
The first data set used in the analysis is a sample of identical twins; this data is advantageous
for assessing the agglomeration premium because it is possible to contrast the wages of twins
in cities of different sizes. Such a contrast estimates the causal effect of agglomeration on
wages because it accounts for the unobserved component of ability, since the twin pairs
are genetically identical, and it also accounts for familial effects on earnings because the
twins also share the same family background. The evidence will show that in a cross-
sectional regression without controls for familial ability, there is a significant and large effect
of city size on wages, but this effect becomes insignificant in the within-twin analysis. More
importantly, controlling for familial ability causes the agglomeration premium to become
significantly reduced in many different specifications and econometric approaches. The
analysis uses two popular measures of city size to represent the effect of agglomeration:
the log of a city’s population and an indicator equal to one if the city’s population exceeds
500,000 residents. For both variables, the inclusion of a family-specific fixed effect eliminates
the city-size wage premium.
To address the robustness of these findings, two other data sets will also be incor-
porated into the analysis. First, the National Longitudinal Survey of Youth (NLSY) will
be employed because it is a longitudinal data set that also contains a large number of sib-
lings. Since there are few twins in this sample, it is not possible to assume that the siblings
are equally able because they are genetically identical; however, the NLSY does contain a
measure of unobserved ability since all respondents were assigned a score based upon their
performance on a series of standardized tests. As such, within-family differences in the
unobserved component of ability can be captured by differences in these standardized test
scores. The results will demonstrate that, as was the case with the data set of twins, there
is a large and significant effect of city size on wages in simple cross sectional regressions, but
the inclusion of a familial fixed effect into these regressions eliminates the significance of the
agglomeration premium.
The second data set that will be used to consider the robustness of the impact of
familial effects on the agglomeration premium is the five-percent Public Use Microdata Sam-
ple (PUMS) from the 2000 United States Census. The PUMS is a household-based sample,
and it contains information about each member of the household, as well as information re-
garding the familial relationship between the head of the household and all other household
members. As such, it is possible to identify two types of sibling sets in the data: household
heads and household members who are siblings of the household head, as well as children
of a household head who are living together in a given household. Furthermore, the infor-
mation collected by the Census makes it possible to determine the effect on wages of the
population of the public-use microdata area (PUMA) where the respondent works. This
measure of agglomeration does have a significantly positive effect on wages, but including
family-specific fixed effects eliminates the significance of the agglomeration premium for
both sets of siblings. Overall, the evidence from all three data sets is remarkably con-
sistent and underlines the importance of controlling for familial ability when assessing the
magnitude and significance of the agglomeration premium.
2 Literature Review
The literature on the agglomeration wage premium documents the significant effect
of city size on wages1, and discusses how this premium is affected by selection, which arises
from the fact that the distribution of individual ability may not be the same across cities
of different sizes. Wheeler (2001) uses the 1980 IPUMS sample and finds that the return
to doubling city size is equal to approximately 3 percent. He also finds that this return
varies significantly for less- or more-educated workers: whereas there is not a significant
effect of city size on wages for workers without any high school education, college graduates
exhibit a return of approximately 4 percent if city size is doubled. Glaeser and Mare
(2001) study the urban wage premium by comparing individuals who live in a city of at
least 500,000 people to those who do not, and find that the wage difference between these
two groups is approximately 25 to 30 percent. After incorporating individual fixed effects
into their framework, Glaeser and Mare show that this premium significantly decreases to
between 4 and 10 percent, and the authors suggest that this decrease could be consistent with
the agglomeration premium being caused by ability bias. However, they also argue that
the fixed-effect framework does not address the higher wage growth evident within cities.
Consistent with this notion, Glaeser and Mare find evidence of higher wage growth for
workers in cities, but it is smaller than the impact of the fixed effect on wages from living
in cities, and the authors suggest that this fixed effect is an important component of the
urban wage premium. Yankow (2006) also considers big-city wage premia by analyzing cities
with at least one million residents, and cities between one-quarter million and one million
residents. Similar to Glaeser and Mare, he finds that fixed effects can account for about
two-thirds of the urban premium in this specification, and that growth effects can account for
some of the remaining premium. Wheeler (2006) also investigates the effect of cities on wage
growth, and specifically compares wage growth which occurs within a job to wage growth
resulting from job changes. Overall, he finds mixed evidence for higher wage growth in cities,
and that wage growth within a job is not significantly higher for workers at jobs in large
cities; it is only wage growth resulting from job changes that is significantly higher in large
cities (however, this difference is not significant in the fixed effect specification). Rosenthal
and Strange (2006) consider the transmission mechanisms through which the urban wage
premium is conveyed, and find that it is primarily due to the concentration of more-educated
workers in urban areas, and that this effect attenuates with distance. After including
controls for endogeneity as well as using an instrumental variables procedure, the authors
find that an increase in population size has a significant effect on wages. Bacolod et al.
(2007) create measures of skills to explore the urban premium, and they find that attributes
such as cognitive skills are, in general, uniformly distributed across cities of different sizes.
However, these skills are more highly valued in larger cities than smaller ones, and this
greater valuation is robust to the inclusion of AFQT scores and individual fixed effects in
the wage equation.
1 One of the first papers to provide theoretical and empirical evidence of this idea is Roback (1982).
International evidence on the agglomeration premium is generally similar to the
evidence from U.S. data sources. Tabuchi and Yoshida (2000) use Japanese data to estimate
the urban wage premium, and find that wages should increase by 10 percent if city size
doubles. Combes, Duranton and Gobillon (2007) use a large French panel data set to
consider the impact of area fixed effects on wages. In their analysis of French wages,
Combes et al. find that individual fixed effects are, by far, the most important determinants
of wages as well as area fixed effects, which suggests that sorting on the basis of ability is
an important component of the urban wage premium. However, it should also be noted
that area fixed effects were not entirely eliminated by the incorporation of individual fixed
effects into a wage regression, and that agglomeration still plays a significant role in wage
determination.
3 Data and Estimation Approach
To consider the effect of familial ability on the urban wage premium, a common
estimation procedure will be used for all three data sets, and the analysis will begin with the
data set of identical twins. These data were collected during the summers of 1991, 1992, and 1993
at the Twinsburg Twins Festival in Twinsburg, Ohio, and the interview questionnaires were
modeled after the Census and CPS instruments.2 The data are drawn from the sub-sample
of identical white twins,3 both of whom have worked within two years prior to the interview
and are living within the United States, and the key question for the purpose of the analysis
is the population of the city in which each sibling lives. Respondents were asked to report
the city in which they lived, and then this city’s population was separately entered into
the data based upon Census statistics.4 Table 1 displays the characteristics of the twins
sample, and compares them to white workers from reweighted5 CPS supplements. The data
set composed of identical twins is generally similar to the reweighted CPS samples, with
some small differences evident in characteristics like marital status.
2 Some of the data from the first three waves of this survey were used by Ashenfelter and Krueger (1994) and Ashenfelter and Rouse (1998), who provide a discussion of the procedures used to collect this data. Some additional questions were specifically designed for interviewing twins, such as the twin’s report of his or her sibling’s educational attainment, which was used as an instrumental variable to account for the effect of measurement error on the return to education.
3 Ashenfelter and Krueger (1994) and Ashenfelter and Rouse (1998) discuss the fact that, on average, the black twins interviewed for the sample exhibited unrepresentative characteristics. As such, they were dropped from the sample. However, this exclusion does not affect any of the main results presented in this paper.
4 The respondent’s report of their hometown was matched against the city population provided by the 1990 U.S. Census.
5 Reweighting was conducted on the basis of the twin’s state of residence. As was the case with the results in Ashenfelter and Rouse (1998), these differences have no large effect on the results in this paper. Also, wage regressions using CPS and twin data yield very similar coefficients on all the variables in my wage regressions.
The data set of identical twins provides a unique advantage in assessing the ag-
glomeration premium; specifically, it is possible to determine the causal effect of a city’s
population on wages by assuming that the unobserved component of ability is equal for
both twins. This implies that the difference in earnings between a twin in a large city
and his sibling in a small town will be attributed to the effect of city size on earnings, and
will not be biased by the unobserved component of ability. Figures 1 and 2 graphically
demonstrate the effect of making this assumption about the city-size premium. In Figure
1, the log of each twin’s hourly wage is plotted against the log of their city’s size, and a
positive fitted relationship is evident between these two variables.6 If this positive effect of
city size were not prone to ability bias, then comparing the within-twin difference in wages
with the within-twin difference in city size should yield a roughly similar result, given that
each pair of twins is assumed to be equally able. However, Figure 2 plots the within-twin
differences in wages and city size, and there is not a significant relationship between these
two within-twin differences — that is, a difference in city size is not observed to correlate
with a difference in wages for each twin pair.7 This suggests that the effect of ability bias
is important within the analysis of the agglomeration premium.
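To put the two bivariate slopes on the "doubling" scale used in the literature, note that a coefficient b on log city size in a log-wage regression implies a proportional wage change of 2^b - 1 when city size doubles. The short Python sketch below applies this conversion to the coefficients reported in footnotes 6 and 7. The function name is illustrative, and the calculation uses point estimates only, ignoring their standard errors.

```python
import math

def doubling_premium(log_log_coef):
    """Proportional wage change implied by doubling city size,
    given the coefficient on log city size in a log-wage regression."""
    return math.exp(log_log_coef * math.log(2)) - 1

# Point estimates from the bivariate regressions in footnotes 6 and 7.
cross_section = doubling_premium(0.056)  # levels regression (Figure 1)
within_twin = doubling_premium(0.014)    # within-twin differences (Figure 2)

print(f"cross-section: {cross_section:.3f}, within-twin: {within_twin:.3f}")
```

On this scale, the cross-sectional slope corresponds to a premium of about four percent for a doubling of city size, while the within-twin slope corresponds to about one percent.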
6 In a simple bivariate regression of log wages on the log of city size, the coefficient on city size is 0.056 with a standard error of 0.012.
7 The bivariate regression of the within-twin difference in wages on the within-twin difference in city size yields a coefficient of 0.014 with a standard error of 0.016 for the within-twin difference in city size.
However, Figures 1 and 2 are only suggestive of the importance of ability bias,
and it is necessary to explore the agglomeration premium with a more formal econometric
framework. To operationalize this framework, it is assumed that ability has a linear effect
on earnings, and the earnings equations for each twin can be expressed as follows:
$$y_{1j} = \beta_{1j}' X_{1j} + \alpha' Z_j + A_j + \varepsilon_{1j}$$
$$y_{2j} = \beta_{2j}' X_{2j} + \alpha' Z_j + A_j + \varepsilon_{2j}$$
where $X_{ij}$ represents a vector of individual characteristics for twin $i$ from family $j$, $Z_j$
represents common characteristics for family $j$, $A_j$ is a family-specific ability term and $\varepsilon_{ij}$
is an individual-specific error term. The identifying assumption of the model is that the
returns to individual characteristics $X_{ij}$ are the same for both twins, and that ability is
correlated between twins. Specifically, $A_j$ is expressed as:
$$A_j = \gamma \left( \frac{X_{1j} + X_{2j}}{2} \right) + v_j$$
These assumptions lead to the reduced-form correlated random-effects model (Chamberlain
1982):
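$$y_{1j} = \beta X_{1j} + \alpha Z_j + \gamma \left( \frac{X_{1j} + X_{2j}}{2} \right) + v_j + \varepsilon_{1j}$$
$$y_{2j} = \beta X_{2j} + \alpha Z_j + \gamma \left( \frac{X_{1j} + X_{2j}}{2} \right) + v_j + \varepsilon_{2j}$$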
where γ represents the correlation between a family’s ability level and each twin’s individual
characteristics. An attractive component of this model is that it provides estimates of both
γ, the effect of familial ability on wages, and β, the effect of individual-specific variables on
earnings.
An alternative estimation procedure that accounts for familial ability bias is the
fixed-effects model, which differences the two regressions used in the correlated random
effects model. The resulting equation is:
$$(y_{1j} - y_{2j}) = \beta (X_{1j} - X_{2j}) + (\varepsilon_{1j} - \varepsilon_{2j})$$
Although the fixed-effect model yields unbiased estimates that are not correlated with ability,
it does not provide a direct estimate of γ.
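As an illustration of how the three estimators relate, the following Python sketch simulates twin pairs in which family ability depends on the pair mean of log city size. It then compares pooled OLS, the correlated random-effects regression (which adds the pair mean as a regressor), and the within-twin difference regression. All parameter values, the omission of family-level and other individual controls, and the function names are assumptions made for illustration. They are not the specification or the data behind Table 2.

```python
import random

def simulate_twins(n_pairs=5000, beta=0.01, gamma=0.06, seed=0):
    """Simulated twin pairs: x is log city size, y is log wage.
    Family ability depends on the pair mean of x (hypothetical values)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        common = rng.gauss(0, 1)                 # shared family component of x
        x1 = common + rng.gauss(0, 1)
        x2 = common + rng.gauss(0, 1)
        ability = gamma * (x1 + x2) / 2 + rng.gauss(0, 0.2)
        y1 = beta * x1 + ability + rng.gauss(0, 0.25)
        y2 = beta * x2 + ability + rng.gauss(0, 0.25)
        pairs.append((x1, x2, y1, y2))
    return pairs

def demean(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

pairs = simulate_twins()

# Stack both twins into one long sample; xbar repeats each pair mean twice.
x = [v for x1, x2, y1, y2 in pairs for v in (x1, x2)]
y = [v for x1, x2, y1, y2 in pairs for v in (y1, y2)]
xbar = [(x1 + x2) / 2 for x1, x2, y1, y2 in pairs for _twin in range(2)]
xd, yd, md = demean(x), demean(y), demean(xbar)

# Pooled OLS of y on x (with intercept): ignores family ability.
beta_ols = dot(xd, yd) / dot(xd, xd)

# Correlated random effects: OLS of y on x and the pair mean of x,
# solved from the 2x2 normal equations on demeaned data.
s11, s22, s12 = dot(xd, xd), dot(md, md), dot(xd, md)
s1y, s2y = dot(xd, yd), dot(md, yd)
det = s11 * s22 - s12 ** 2
beta_cre = (s22 * s1y - s12 * s2y) / det
gamma_cre = (s11 * s2y - s12 * s1y) / det

# Fixed effects: regress within-twin wage differences on within-twin
# city-size differences, with no intercept.
dx = [x1 - x2 for x1, x2, y1, y2 in pairs]
dy = [y1 - y2 for x1, x2, y1, y2 in pairs]
beta_fe = dot(dx, dy) / dot(dx, dx)

print(f"OLS {beta_ols:.4f}  CRE {beta_cre:.4f}  FE {beta_fe:.4f}  gamma {gamma_cre:.4f}")
```

With two members per family and no other regressors, the coefficient on city size from the correlated random-effects regression coincides with the within-twin estimate, while pooled OLS absorbs part of the ability term. Only the correlated random-effects regression returns an estimate of γ.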
Estimates from the OLS, correlated random effects and fixed effects models are
provided in Table 2, which displays results for earnings equations which use two different
measures to represent the agglomeration premium: the logarithm of the respondent’s city’s
population, and an indicator variable equal to one if the respondent’s city has a population
in excess of 500,000 residents. If familial ability had no effect on earnings, then the OLS
estimates displayed in Table 2 would provide an unbiased estimate of the effect of the ex-
ogenous regressors, including both variables used to capture the agglomeration premium.
Also, under these circumstances, the OLS and correlated random effects estimates would
differ only because of sampling error. However, this is not the case. Results in the first
three columns show that the coefficient for the log of city size differs dramatically depending
on the estimation procedure. Without controls for ability, the estimates in column one
show that the premium for doubling a city’s size is roughly four percent, which is within
the range of premia estimated by prior studies.8 However, the results in columns two and
three demonstrate that accounting for familial ability greatly reduces the significance and
the magnitude of this coefficient — the city size premium is basically reduced to zero, with a
correspondingly small t-value. In addition, the correlation between ability and city size is
large in magnitude and highly significant. These results suggest that unobserved ability is
a significant factor for determining the city size premium, which is consistent with studies
such as Combes et al. (2007), amongst others.9
The last three columns of Table 2 display results from regressions which include an
indicator equal to one if the city’s population is greater than half of a million residents. The
findings in the fourth column demonstrate that residing in a city whose population exceeds
500,000 generates a wage premium of approximately 19 percent. Glaeser and Mare argued
that this effect may be due to selection (captured by an individual fixed effect) as well as
higher wage growth that occurs in cities. The results in columns five and six of this Table
attest to the importance of fixed effects, since they demonstrate that there is a large effect of
familial ability on this wage premium. In fact, incorporating a familial fixed effect into the
econometric framework results in the premium becoming insignificant and significantly lower
than it was in the case where no such controls were included in the regression. This suggests
that, similar to the findings in the first three columns of the table, the agglomeration wage
premium for cities of at least 500,000 people is also prone to bias through unobserved ability.
8 As previously discussed, Wheeler (2001) finds a premium of 3 percent for doubling city size. Also, Bacolod et al. find a premium of approximately 6 to 7 percent.
9 It is also noted that the other estimated coefficients in the OLS model from the data set of twins (such as the returns to education, marital status, and tenure) are similar to those in commonly used data sets. Also, as demonstrated in earlier work, the return to education remains significant even after controlling for familial ability, as does tenure, but marital status does not. Ashenfelter and Krueger (1994) and Ashenfelter and Rouse (1998) showed that education remains significant even in the presence of a family fixed-effect, while Krashinsky (2004) showed that the marital premium drops to zero after familial controls are included in the regression.
Table 3 bifurcates the sample into men and women to determine whether or not
the effects of familial ability controls differ by gender, and uses both types of measures of
agglomeration employed in Table 2 — the log of the city’s population and an indicator equal to
one if the city’s population is over five-hundred thousand. The first four columns of Table 3
present the OLS and correlated-random effects estimates for men, and the last four columns
present the same estimates for women. The first and fifth columns show that the premium
associated with the log of a city’s population is highly significant for both men and women,
and roughly of the same magnitude for both groups — approximately 4 percent for women
and 5 percent for men. For both genders, however, columns two and six demonstrate that
the effect of controlling for familial ability is the same as in Table 2 — the city-size premium
is no longer significant, its magnitude is basically zero and it is significantly lower than it
was in the absence of familial controls. Columns three and seven display the return to
living in a city with a population of at least half of one million people, and for men, the
premium is large and highly significant. Columns four and eight demonstrate that with the
inclusion of familial controls, though, the coefficient on this variable is small in magnitude
and statistically insignificant.
Recent papers have also documented the value of various skills in cities. For instance,
Bacolod et al. (2007) have found that certain types of skills are rewarded more in cities than outside
of cities. To that end, the city-size wage premium is investigated within a quantile regression
context to allow the agglomeration premium to differ for workers with different levels of skill,
and to assess the impact of a familial fixed effect within this context as well. Results from
quantile regressions are displayed in Table 4 for each of the different variables that represent
the effect of city size on wages, both with and without the inclusion of family controls in
the regression specification.10 The first two rows of Table 4 show the results from quantile
regressions at the 10th, 25th, 50th, 75th and 90th percentiles for models which use the log of
the city’s population as the independent variable representing the effect of city size. For all
five percentiles, city size has a significant effect on wages, and this effect grows in magnitude
at higher percentiles (which was also evident in Bacolod et al.). However, for all five
cases, introducing controls for familial ability causes the premium to become statistically
insignificant, and significantly smaller than the return to city size in the absence of family
controls. This suggests that familial ability not only has an impact on the average return
to city size (as was demonstrated in Tables 2 and 3), but it also affects the agglomeration
premium throughout the wage distribution as well. The last two rows of Table 4 show
similar results for the quantile regressions which include a dummy variable equal to one if
the twin resides in a city whose population exceeds 500,000 people: at all five percentiles, the
effect of including familial controls reduces the magnitude of the effect of city size on wages.
Furthermore, at all percentiles except the 50th, the premium is significant in the absence of
familial controls, but insignificant with these controls.
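The logic of a percentile-specific premium can be seen in a stripped-down case. When the only regressor is the large-city indicator, the quantile-regression coefficient at a given percentile equals the difference between the large-city and small-city wage quantiles at that percentile, because each group's quantile separately minimizes the check-function loss. The Python sketch below shows this on simulated log wages. The data, the wider wage spread assumed for large cities, and the helper names are all hypothetical, and the regressions in Table 4 additionally include individual and family controls that are omitted here.

```python
import math
import random

def check_loss(residual, tau):
    """Check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def sample_quantile(values, tau):
    """An order statistic that minimizes the summed check loss."""
    ordered = sorted(values)
    rank = max(1, math.ceil(tau * len(ordered)))  # 1-indexed rank
    return ordered[rank - 1]

rng = random.Random(1)
# Simulated log wages. The wider spread in large cities is an assumption
# chosen only so that the premium rises across percentiles.
small_city = [2.50 + rng.gauss(0, 0.5) for _ in range(4000)]
large_city = [2.69 + rng.gauss(0, 0.6) for _ in range(4000)]

taus = [0.10, 0.25, 0.50, 0.75, 0.90]
premium = {
    tau: sample_quantile(large_city, tau) - sample_quantile(small_city, tau)
    for tau in taus
}
for tau in taus:
    print(f"tau={tau:.2f}  large-city premium={premium[tau]:.3f}")
```

Adding further regressors, as in the paper, requires a linear-programming or iterative solver rather than group quantiles, but the interpretation of the coefficient at each percentile is the same.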
Overall, the results in Tables 2 through 4 demonstrate that controlling for familial
ability between twins accounts for virtually all of the wage premium associated with city size,
and this is true for many different specifications. Familial controls had a significant effect
on the premium associated with the log of the city’s population and a dummy variable equal
to one if the city’s population exceeds half a million residents. Also, the family fixed effect
had a similar impact on the agglomeration premium evident in quantile regressions. This
suggests that the unobserved component of ability is a significant factor for explaining the
wage effects of agglomeration. However, given that the data set of twins is not a large data
set, it is important to demonstrate that the impact of familial effects on the agglomeration
premium is also present in other data sets. As such, the agglomeration premium
will be analyzed in two separate data sets in the next section.
10 For brevity’s sake, Table 4 only contains estimates from the OLS and correlated random effects models. Within-twin estimates of the agglomeration premium in the quantile regression context are similar to results using the correlated random effects model.
4 Results from the NLSY and the U.S. Census
To consider the robustness of the findings from the data set of twins, the analysis
will also explore the effect of familial ability on the agglomeration premium within the
National Longitudinal Survey of Youth (NLSY) and the five-percent Public Use Microdata
Sample (PUMS) from the 2000 United States Census. The NLSY contains several sets of
siblings because of its design: the data were assembled from an individual-level survey drawn
from households with youths between the ages of 14 and 22 in 1979. A large number of
households have multiple youths surveyed for the sample, and since the data also contains
longitudinal information about the urban status of each respondent’s town (well after they
move out of their parents’ home), it is possible to use this data to compare the wages of
siblings in different areas. Similarly, the U.S. Census is a household-level survey which
contains information about the head of the household, as well as other members within the
household, so it is possible to make two types of sibling comparisons with this data. First,
since some households contain a household head and his or her sibling, and because the data
contain information on each respondent’s Public Use Microdata Area (PUMA) of work, a
wage comparison may be conducted for these siblings working in different PUMAs.
Second, similar comparisons can be made for children of the household head who are working
and still living with the household head. Overall, the evidence from the NLSY and Census
will demonstrate that familial fixed effects make the agglomeration premium statistically
insignificant and small in magnitude, which corroborates the findings from the data set of
twins.
Table 5 presents descriptive statistics from a sample drawn from the 1979 to 2004
waves of the NLSY for all respondents who work at least 15 hours per week and more than
26 weeks of the year, not including respondents from the two oversamples collected by the
NLSY.11 For comparability’s sake, the analysis is restricted to include only those siblings who
are within three years of age of each other, and also uses same-gender siblings — sisters are
compared to sisters and brothers to brothers — in order to avoid issues regarding differential
labor force participation for brother-sister pairs.12 Table 5 displays means from the entire
cross-sectional sample of the NLSY as well as means from the sample of same-gendered
siblings; generally, the samples are quite similar.
The first row displays the percentage of respondents who reside in “urban” areas,
which the NLSY classifies as a central core or city and its adjacent, closely settled territory
which have a combined total population of 25,000 or more. Although not as detailed
as the population of the respondent’s city, the “urban” indicator is the best measure of
agglomeration in the publicly-available files of the NLSY, and it is consistent with measures
used in studies that examine agglomeration by analyzing cities with populations above and
below a given threshold. The results in the first row of the Table show that approximately
seventy-seven percent of the overall sample live in urban areas, and about eighty percent
of siblings of both genders live in urban places. The second through eighth rows of the
Table display various observable characteristics of the sample, including age, marital status
and average log wage. As it was with the results from the first row, the characteristics of
the overall sample from the NLSY are quite similar to the characteristics of the sample of
siblings.
11 The NLSY comprises three main subsamples: the representative cross-sectional subsample, a military oversample, and an oversampling of civilian Hispanic or Latino, black, and economically disadvantaged, non-black/non-Hispanic youth. In order to use the most representative data, the two oversamples were excluded from the analysis, and siblings were drawn from the representative cross-sectional subsample.
12 Since females have a lower probability of participating in the labor market than males, the exclusion of siblings of different genders circumvents the need to model these participation decisions.
The ninth row of the Table displays the adjusted average score from a standardized
test the respondents wrote, which is commonly referred to as the Armed Forces Qualification Test
(AFQT). The score on this test is created from an amalgam of scores on a series of tests
known as the Armed Services Vocational Aptitude Battery (ASVAB); these tests were given
to virtually all respondents in the NLSY in 1980.13 However, since respondents varied in
age and education at the time of writing the test, it is necessary to adjust the scores for these
two factors when analyzing the test scores. As such, Table 5 presents an “adjusted” score
from the AFQT — it is the residual from a regression of the AFQT score on a respondent’s
age and education at the time of writing the tests. The advantage of this variable is that it
provides an approximate measure of the unobserved component of the respondent’s ability;
that is, it represents his or her aptitude above and beyond observable measures.14
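The adjustment described here amounts to residualizing the test score on age and education. A minimal Python sketch of that step follows, using simulated data. The sample size, coefficients, variable names, and the helper `residualize` are assumptions for illustration and do not reproduce the NLSY variables or the exact regression behind Table 5.

```python
import random

def demean(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def residualize(score, age, educ):
    """Residuals from an OLS regression of score on a constant, age and educ,
    solved from the 2x2 normal equations on demeaned data."""
    s, a, e = demean(score), demean(age), demean(educ)
    saa, see, sae = dot(a, a), dot(e, e), dot(a, e)
    det = saa * see - sae ** 2
    b_age = (see * dot(a, s) - sae * dot(e, s)) / det
    b_educ = (saa * dot(e, s) - sae * dot(a, s)) / det
    return [si - b_age * ai - b_educ * ei for si, ai, ei in zip(s, a, e)]

rng = random.Random(2)
# Simulated respondents (hypothetical values throughout).
age = [rng.randint(15, 23) for _ in range(3000)]                  # age at test date
educ = [min(16, max(8, a - 6 - rng.randint(0, 2))) for a in age]  # years of schooling
latent = [rng.gauss(0, 10) for _ in age]                          # unobserved aptitude
afqt = [20 + 1.5 * a + 2.0 * e + u for a, e, u in zip(age, educ, latent)]

# "Adjusted" score: the part of the raw score not explained by age and education.
adjusted = residualize(afqt, age, educ)
print(f"mean adjusted score: {sum(adjusted) / len(adjusted):.6f}")
```

By construction, the adjusted score has mean zero and is uncorrelated with age and education in the sample, so differences in it between siblings reflect variation beyond those two observables.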
This variable is useful in a regression context because it can assist in accounting for
within-sibling differences in ability. This is explored in Table 6, which displays results from
wage regressions which use a simple OLS procedure as well as a fixed-effects framework to
measure the urban premium for siblings in the NLSY. The first two columns of the Table
analyze a pooled sample of both brothers and sisters, and the results in the first column
indicate that there is a highly significant 13 percent return to living in an “urban” area,
even after controlling for observable characteristics. However, including a family-specific
fixed effect into the analysis significantly alters the agglomeration premium. The findings in
column two suggest that the agglomeration premium is slightly less than two percent, after
accounting for a family-specific fixed effect. In addition, these results are strengthened by the
fact that each sibling’s adjusted AFQT score is included in the regression. Unlike the data
set of twins, the siblings in the NLSY are not genetically identical, and it is plausible that
a family fixed effect may not capture all of the within-sibling differences in the unobserved
component of ability. However, the within-sibling difference in the adjusted AFQT score
should serve as a good proxy for any remaining portion of unobserved ability that is not
captured by the familial fixed effect.
13 In a few cases, the ASVAB tests were written in 1981.
14 There are many other ways of normalizing the AFQT measure, such as converting the raw test score to a percentile score within each age cohort. All of the main findings from the NLSY are robust to the use of different normalization adjustments for the AFQT score.
The remaining four columns of Table 6 present results bifurcated for the sample of
brothers and the sample of sisters, since the returns to individual regressors in the wage
equation may be different for men and women. Generally, though, the impact of a familial
fixed effect for each gender is similar to its impact for the pooled sample in columns one
and two. Columns three and four demonstrate that the agglomeration premium for brothers
is statistically significant and approximately 13 percent in an OLS regression without any
family controls, but two-and-a-half percent (and only marginally significant) once familial
fixed effects are included within the regression. Similarly, columns five and six show that
the agglomeration premium for sisters is statistically significant and approximately twelve
percent in a simple OLS regression, but only one percent and statistically insignificant once
familial controls are included in the regression. Substantively, the results in columns three
through six do not alter any of the fundamental conclusions drawn in the first two columns
of the table: even separating the analysis by gender, the large and significant agglomeration
premium becomes small in magnitude and statistically insignificant at the five percent level
of significance after familial controls are included in the regression.
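The contrast between the OLS and family fixed-effects columns of Table 6 can be illustrated with a small simulation. The sketch below assumes a hypothetical data-generating process in which family ability raises both wages and the probability of urban residence; the parameter values and function names are illustrative and are not estimates from the NLSY.

```python
import random

def simple_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def within_family_slope(pairs):
    """Family fixed-effects slope for two-sibling families.

    With exactly two siblings per family and a single regressor, demeaning
    within the family is equivalent to regressing the sibling difference in y
    on the sibling difference in x without an intercept.
    """
    num = sum((x1 - x2) * (y1 - y2) for (x1, y1), (x2, y2) in pairs)
    den = sum((x1 - x2) ** 2 for (x1, y1), (x2, y2) in pairs)
    return num / den

rng = random.Random(42)
TRUE_PREMIUM = 0.02    # assumed causal urban premium (log points)
ABILITY_RETURN = 0.30  # assumed wage return to family ability

pairs = []
for _ in range(5000):
    family_ability = rng.gauss(0.0, 1.0)
    sibs = []
    for _ in range(2):
        # Higher-ability families are more likely to live in urban areas.
        urban = 1.0 if family_ability + rng.gauss(0.0, 1.0) > 0 else 0.0
        log_wage = (TRUE_PREMIUM * urban + ABILITY_RETURN * family_ability
                    + rng.gauss(0.0, 0.3))
        sibs.append((urban, log_wage))
    pairs.append(tuple(sibs))

urban_all = [x for pair in pairs for x, _ in pair]
wage_all = [y for pair in pairs for _, y in pair]
ols_premium = simple_slope(urban_all, wage_all)
fe_premium = within_family_slope(pairs)
print(f"OLS premium: {ols_premium:.3f}, family FE premium: {fe_premium:.3f}")
```

In this setup the OLS estimate absorbs the sorting of high-ability families into urban areas, while the within-family estimate is identified only from families whose siblings differ in urban status and stays close to the assumed causal premium.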
A further examination of the agglomeration premium in the NLSY is presented in
Table 7, which contains results from quantile regressions at the tenth, twenty-fifth, fiftieth,
seventy-fifth and ninetieth percentiles for the three subsamples considered in Table 6. The
first two columns present quantile regression results for the pooled sample of brothers and
sisters from the data, and the results are similar to those in Table 6 (and to the quantile
regression findings from the data set of twins). In the first column, it
is demonstrated that there is a significant urban premium at all five percentiles, and this
premium is also increasing in magnitude at higher percentiles. However, the second column
demonstrates that including a family-specific fixed effect makes this premium small and
statistically insignificant in all but one case. When the siblings are analyzed by gender in
the remaining columns of the Table, similar findings are evident. The results in columns
three and five for brothers and sisters, respectively, demonstrate that the urban premium is
significant at all of the percentiles, and that the premium is generally larger at higher percentiles.
In columns four and six, the results show that the urban premium becomes insignificant
(and small in magnitude) in all cases. Overall, as was the case in Table 4, the findings
suggest that the urban premium increases at higher percentiles, but becomes statistically
insignificant at nearly every point of the wage distribution once familial fixed effects are included in the
regression specification.
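The idea of a premium that varies across the wage distribution can be made concrete with a simplified example. A quantile regression on only an intercept and a binary urban indicator reduces to comparing the corresponding quantiles of the two groups, which is what the sketch below does on simulated log wages. The regressions in Table 7 include further covariates, so this is an illustration of the concept rather than a replication; the skill-dependent premium is an assumption of the simulation.

```python
import random

def quantile(values, q):
    """Empirical q-th quantile using a nearest-rank rule on sorted data."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]

def quantile_premium(urban_wages, rural_wages, q):
    """Urban premium at quantile q: gap between the two groups' q-th quantiles."""
    return quantile(urban_wages, q) - quantile(rural_wages, q)

rng = random.Random(7)
rural, urban = [], []
for _ in range(10000):
    skill = rng.gauss(0.0, 1.0)
    rural.append(2.0 + 0.50 * skill)
    skill = rng.gauss(0.0, 1.0)
    # Assumed: the urban premium rises with skill (0.10 log points at the median).
    urban.append(2.0 + 0.50 * skill + 0.10 + 0.05 * skill)

for q in (0.10, 0.25, 0.50, 0.75, 0.90):
    premium = quantile_premium(urban, rural, q)
    print(f"{round(q * 100)}th percentile premium: {premium:.3f}")
```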
The five-percent PUMS from the U.S. Census also allows for a within-sibling analysis,
as previously discussed, because it contains information about two main groups that will be
of particular use to the analysis: household heads and their siblings, and children of household
heads who are siblings and live in the home of the household head. Ideally, it would be
possible to identify siblings in different households (as was the case with the data set of
twins and the NLSY); however, the sample design makes such a comparison impossible.
This limitation has an effect on the types of individuals selected from the Census for this
analysis, since the sample of household heads who also have a sibling living with them (or
households with two working children still living at home) may not be representative of the
overall population. Table 8 confirms this notion by comparing sample means from the overall
census to means from the subsamples that will be used for the analysis. The first column
shows the means from the overall census population of male household heads, male children of
household heads and male siblings of household heads; comparing the characteristics of this
group to the male household heads who live with their male siblings (in column three), it is
clear that the latter sample is less educated, less wealthy and much younger than the former
sample. A similar comparison can be made with the females from the census; column two
reports sample means for all female household heads, siblings and children from the census,
and a comparison to the results from column five (for female household heads who also live
with their sisters) shows that the same differences are evident, although to a lesser degree:
there are only minor differences in hourly wages, suggesting that selection effects are less
severe for the female sample. Given the results from the sample of twins and from the
NLSY, it would be expected that the agglomeration premium would be smaller for siblings
from the Census (especially male siblings), given that they are drawn from lower percentiles
of the wage distribution.
Table 9 confirms this fact. Columns one and three show that there is a relatively
small agglomeration premium for the PUMA of work for both male household heads and their
siblings (approximately half of a percent) and for male children of household heads (approx-
imately one-and-a-half percent), and both are consistent with the returns to agglomeration
in the lower wage percentiles seen in the NLSY and data set of twins. However, columns
two and four demonstrate that, as was the case in the other data sets, familial fixed effects
eliminate the significance of the agglomeration premium for males, and substantially reduce
its magnitude. An analysis of female siblings from the Census reveals similar patterns.
Table 8 showed that female siblings within the Census appeared to be far more similar to
the overall sample of women in the Census (especially with regard to wages) than was the
case for men. As a result, the agglomeration premium for female siblings in the Census is
much more similar to that from the overall literature; columns five and seven reveal that
the agglomeration premium is approximately three percent for both female household heads
and their siblings, as well as female children living with the household head. Again, the
inclusion of a familial fixed effect in columns six and eight makes the premium insignificant
and substantially smaller in magnitude — virtually zero for both subsamples.
To further consider the agglomeration premium within the Census, Table 10 replicates
the analysis from Table 9, but instead of using the log of the population of the respondent’s
PUMA of work, the framework uses an indicator variable equal to one if the population
of the respondent’s PUMA of work exceeds five-hundred thousand, and zero otherwise. The
results in Table 10 are highly similar to findings in Table 9: the first and third columns
show that the two types of male siblings exhibit a significant wage premium (between six
and nine percent) for working in a PUMA whose population exceeds one-half million, but
columns two and four show that this premium becomes insignificant and small in magnitude
after family fixed effects are included in the framework. As well, the two female samples of
siblings show that this type of agglomeration premium is large and significant in the absence
of family fixed effects: columns five and seven report that these women exhibit a twelve to
fourteen percent premium for working in a PUMA whose population exceeds one-half of a
million people. However, these large premia become insignificant and small in magnitude
once familial fixed effects are included in the regression specification.
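Because the dependent variable is the log of the hourly wage, the reported coefficients are in log points, which approximate percentage effects when they are small. The sketch below shows the usual exact conversions, using coefficients from the OLS columns of Tables 6, 9 and 10; the function names are illustrative, and the conversions assume the standard reading of a log-wage regression.

```python
import math

def indicator_effect(coef):
    """Proportional wage change implied by a coefficient on a 0/1 regressor."""
    return math.exp(coef) - 1.0

def doubling_effect(coef):
    """Proportional wage change from doubling population, given a coefficient
    on the log of population (an elasticity)."""
    return 2.0 ** coef - 1.0

# NLSY urban indicator, pooled OLS (Table 6, column 1).
print(f"{indicator_effect(0.127):.3f}")
# Census PUMA population >= 500,000, female heads and siblings (Table 10, column 5).
print(f"{indicator_effect(0.140):.3f}")
# Census log PUMA population, female heads and siblings (Table 9, column 5).
print(f"{doubling_effect(0.027):.4f}")
```

For coefficients of this size the exact effects differ from the log-point values by at most about one percentage point, so the percentage readings used in the text are reasonable approximations.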
Overall, the NLSY and Census provide evidence that is consistent with findings from
the data set of twins, and the results reinforce the notion that the issue of selection is a highly
important factor when computing the agglomeration premium. One remaining issue for the
analysis, though, involves some econometric complications that can affect conclusions drawn
from any sibling-based framework; these issues will be discussed in the following section.
5 Within-Sibling Differences in Ability and Measurement Error
The key issue raised in this study was the manner in which the wage premium
for agglomeration involves the sorting of workers inside or outside of cities — in particular,
that there may be a non-random sorting of more able workers into cities which creates this
premium. The assumption used to identify the causal effect of agglomeration on wages
is that the unobserved component of ability is captured by a familial fixed effect, and any
further sorting into or out of cities that occurs after accounting for this fixed effect is due to
factors unrelated to productivity, such as the preference for amenities or disamenities present
within cities.15 However, as with other studies that use data on siblings, it
can be questioned whether a familial fixed effect actually captures all of a sibling’s
unobserved ability. In particular, it may be the case that even after including a family-based
fixed effect in the regression specification, within-sibling differences in ability still exist,
even with the inclusion of within-sibling differences in test scores, as was the case with data
from the NLSY.
15 Roback’s (1982) seminal work on this subject provides a model of individual choice to reside inside or outside of cities.
Both Neumark (1999) and Bound and Solon (1999) outlined the potential
biases that can affect within-twin estimates of the return to education, and the same biases
can affect within-sibling estimates of any other variable in the wage equation. If sibling
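i’s individual-specific component of ability is denoted by the variable $\widehat{A}_{ij}$, then the wage
equations for each sibling can be written as:
$$y_{1j} = \beta X_{1j} + \alpha Z_j + \theta A_j + \phi \widehat{A}_{1j} + \varepsilon_{1j}$$
$$y_{2j} = \beta X_{2j} + \alpha Z_j + \theta A_j + \phi \widehat{A}_{2j} + \varepsilon_{2j}$$
In this case, the within-sibling estimates of $\beta$ derived from a regression of $\Delta y_j$ on $\Delta X_j$ are
biased, because a within-sibling estimator will not fully remove the effects of ability:
$$(y_{1j} - y_{2j}) = \beta (X_{1j} - X_{2j}) + \phi (\widehat{A}_{1j} - \widehat{A}_{2j}) + (\varepsilon_{1j} - \varepsilon_{2j})$$
$$\Delta y_j = \beta \Delta X_j + \phi \Delta \widehat{A}_j + \Delta \varepsilon_j$$
and the resulting estimates of $\beta$ are biased by the correlation of $\Delta \widehat{A}_j$ and $\Delta X_j$:
$$b_{FE} = (\Delta X_j' \Delta X_j)^{-1} \Delta X_j' \Delta y_j = \beta + \phi (\Delta X_j' \Delta X_j)^{-1} \Delta X_j' \Delta \widehat{A}_j$$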
It has been suggested that there exists a positive correlation between $\widehat{A}_{ij}$ and a series of regressors
in the wage equation, such as education, marital status, tenure, and city size. Thus, the
row vector, $\Delta X_j' \Delta \widehat{A}_j$, would be expected to contain exclusively positive entries. The more
able sibling would also receive a higher wage than his or her counterpart, suggesting that
$\phi > 0$, causing an upward bias in the estimation results for $b_{FE}$. This led Bound and Solon
and Neumark to suggest that the within-sibling estimates are upper bounds of the unbiased
return to education, since it could be argued that differences in educational attainment
between the siblings were due to differences in unobserved ability that was not captured by
the family-specific fixed effect. However, this criticism is equally valid for any other variable
analyzed in the within-sibling framework, including city size, since it could be argued that
the more able sibling locates to a larger city.
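The upward bias described above can be checked numerically. The sketch below simulates within-pair differences for a single regressor under assumed values of the true return and the ability return, then decomposes the within-sibling estimate into the true coefficient, the ability-bias term, and a sampling-noise term. The noise term does not appear in the expression in the text because it vanishes in expectation; all parameter values are hypothetical.

```python
import random

rng = random.Random(1)
BETA = 0.02  # assumed true return to the regressor (e.g. log city size)
PHI = 0.10   # assumed return to the individual-specific ability component

dx, d_ability, d_eps, dy = [], [], [], []
for _ in range(5000):
    a = rng.gauss(0.0, 1.0)            # within-pair difference in individual ability
    x = 0.5 * a + rng.gauss(0.0, 1.0)  # the abler sibling tends to choose a larger x
    e = rng.gauss(0.0, 0.3)            # within-pair difference in wage shocks
    d_ability.append(a)
    dx.append(x)
    d_eps.append(e)
    dy.append(BETA * x + PHI * a + e)  # family-level terms difference out

sxx = sum(x * x for x in dx)
b_fe = sum(x * y for x, y in zip(dx, dy)) / sxx
ability_bias = PHI * sum(x * a for x, a in zip(dx, d_ability)) / sxx
noise_term = sum(x * e for x, e in zip(dx, d_eps)) / sxx

print(f"b_FE = {b_fe:.4f}, beta = {BETA}, ability bias = {ability_bias:.4f}")
```

Because the ability difference is positively related to both the regressor and the wage in this setup, the within-sibling estimate exceeds the assumed true return, which is the sense in which it acts as an upper bound.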
Although the existence of within-sibling differences in ability may weaken conclusions
drawn about estimates of the return to education from the data set of siblings (in particular,
the data set of identical twins), it has favorable implications for the evidence on the city
size wage premium presented herein. Since inter-sibling differences in ability cause an
upward bias in the within-sibling fixed-effect estimator, the fixed-effect estimate is an
upper bound on the true value of the return to city size. Because the fixed-effect
estimate of the city size wage premium is small and insignificant, the unbiased
coefficient should be no larger (and is possibly negative). Thus, the presence of any
within-sibling differences in ability would actually strengthen the conclusions drawn from the results
about the causal effects of city size on wages.
One additional consideration for the analysis is the potential effect of measurement
error. Many authors (Ashenfelter and Krueger (1994), Griliches (1979)) have demonstrated
that measurement error has an attenuating effect on coefficient estimates from a within-
sibling framework, so it could be the case that the within-sibling or family fixed effects
estimates of the urban premium are small because of these attenuating effects. However,
this is unlikely to be true, because little measurement error would be present for the variables
used to represent agglomeration in the analysis. In the NLSY and Census, the population of
a respondent’s city or PUMA of work is recorded through a relatively accurate administrative
record, not a relatively inaccurate self-report. Further, for the data set of identical twins,
each twin was asked for his or her city of residence, not the population of this place — the
population was coded into the data based upon each respondent’s report of their town. The
likelihood that a respondent misreported his or her hometown is exceptionally small, and as
such, for all three data sources, the accuracy of the variables used in the analysis is good.
Given this accuracy, the impact of attenuation bias due to measurement error should be
very small (if present at all), and could not account for the change in the estimated agglomeration
premium in the presence of familial fixed effects.
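The attenuation concern can be illustrated with a short simulation. Under classical measurement error, the cross-sectional estimate is scaled by the ratio of true to observed variance, while the within-sibling estimate is scaled by the same ratio computed on sibling differences, which is smaller when the true regressor is highly correlated within families. The variances, the sibling correlation and the coefficient below are assumptions chosen for the example, not values from the data sets used here.

```python
import random

rng = random.Random(3)
BETA = 1.0       # assumed true coefficient
RHO = 0.8        # assumed sibling correlation in the true regressor
SIGMA_U2 = 0.25  # assumed measurement-error variance (true regressor variance is 1)

x_obs, y_all, dx, dy = [], [], [], []
for _ in range(20000):
    common = rng.gauss(0.0, RHO ** 0.5)  # family component of the true regressor
    sib = []
    for _ in range(2):
        x_true = common + rng.gauss(0.0, (1.0 - RHO) ** 0.5)
        x_meas = x_true + rng.gauss(0.0, SIGMA_U2 ** 0.5)  # mismeasured regressor
        y = BETA * x_true + rng.gauss(0.0, 0.5)
        sib.append((x_meas, y))
        x_obs.append(x_meas)
        y_all.append(y)
    dx.append(sib[0][0] - sib[1][0])
    dy.append(sib[0][1] - sib[1][1])

n = len(x_obs)
mx, my = sum(x_obs) / n, sum(y_all) / n
cross_section = (sum((a - mx) * (b - my) for a, b in zip(x_obs, y_all))
                 / sum((a - mx) ** 2 for a in x_obs))
within = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

# Large-sample attenuation factors implied by the assumed variances.
expected_cross = BETA * 1.0 / (1.0 + SIGMA_U2)
expected_within = BETA * (1.0 - RHO) / ((1.0 - RHO) + SIGMA_U2)
print(f"cross-section: {cross_section:.3f} (expected {expected_cross:.3f})")
print(f"within-sibling: {within:.3f} (expected {expected_within:.3f})")
```

With a measurement-error variance close to zero, as argued above for the population variables, both factors approach one and attenuation cannot explain the drop in the estimated premium.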
6 Conclusion
The effect of agglomeration on wages is highly significant and large in magnitude. But
the question of the causal nature of this effect has been debated in the literature, which
remains divided on this subject. The evidence presented in this paper is derived from
multiple data sources and analyzed with an econometric approach that allows the causal
return to city size to be estimated by using a family fixed effect for samples of siblings,
including data from the U.S. Census, the NLSY and a sample of identical twins. The
results from all three data sources were remarkably consistent, and demonstrated that there
were no significant causal effects of the many different variables used to represent the effect of
agglomeration on wages, such as: the log of a city’s population, an indicator variable equal
to one if the city had a population in excess of 500,000 residents, an indicator variable equal
to one if the city had a population in excess of 25,000 residents, and similar measures for
the population of the respondent’s PUMA of work. In a simple cross-sectional regression,
all of these variables exhibited significant and large effects on wages. However, these effects
became statistically insignificant and small in magnitude once controls for familial ability
were included within the regression framework. In addition, it was found that the effect of
controlling for familial ability was evident not only in regressions which estimated the average
effect of agglomeration on wages, but also in quantile regressions. These results
relate to the recent finding that agglomeration has greater effects for more skilled workers;
even though the agglomeration premium is higher for more able workers, controlling for
familial ability causes the city size wage premium to become insignificant for both less- and
more-skilled workers. Overall, the evidence suggests that familial ability plays a significant
role in the effect of city size on wages.
References
[1] Aaronson, Daniel. “Using Sibling Data to Estimate the Impact of Neighborhoods on Children’s Educational Outcomes.” Journal of Human Resources, Autumn 1998, 33(4), pp. 915-946.
[2] Ashenfelter, Orley and Alan Krueger. “Estimates of the Economic Return to Schooling from a New Sample of Twins.” American Economic Review, December 1994, 84(5), pp. 1157-1173.
[3] Ashenfelter, Orley and Cecilia Rouse. “Income, Schooling and Ability: Evidence from a New Sample of Twins.” Quarterly Journal of Economics, February 1998, 113(1), pp. 253-284.
[4] Bacolod, Marigee; Blum, Bernardo S. and Strange, William C. “Skills in the City.” Mimeo, University of Toronto, 2007.
[5] Bound, John and Gary Solon. “Double Trouble: On the Value of Twins-Based Estimation of the Return to Schooling.” Economics of Education Review, April 1999, 18(2), pp. 169-182.
[6] Chamberlain, Gary. “Multivariate Regression Models for Panel Data.” Journal of Econometrics, January 1982, 18(1), pp. 5-46.
[7] Combes, Pierre-Philippe; Duranton, Gilles; and Gobillon, Laurent. “Spatial Wage Disparities: Sorting Matters!” Journal of Urban Economics, forthcoming.
[8] Glaeser, Edward L. and Mare, David C. “Cities and Skills.” Journal of Labor Economics, April 2001, 19(2), pp. 316-342.
[9] Griliches, Zvi. “Sibling Models and Data in Economics: Beginnings of a Survey.” Journal of Political Economy, October 1979, 87(5), Part 2, pp. S37-S64.
[10] Krashinsky, Harry A. “Do Marital Status and Computer Use Really Change the Wage Structure?” Journal of Human Resources, Summer 2004, pp. 774-791.
[11] Neumark, David. “Biases in Twin Estimates of the Return to Schooling.” Economics of Education Review, April 1999, 18(2), pp. 143-148.
[12] Roback, Jennifer. “Wages, Rents and the Quality of Life.” Journal of Political Economy, December 1982, 90(6), pp. 1257-1278.
[13] Rosenthal, Stuart S. and Strange, William C. “The Attenuation of Human Capital Spillovers.” Working paper, University of Toronto, 2006.
[14] Tabuchi, Takatoshi and Yoshida, Atsushi. “Separating Urban Agglomeration Economies in Consumption and Production.” Journal of Urban Economics, July 2000, 48, pp. 70-84.
[15] Wheeler, Christopher H. “Search, Sorting, and Urban Agglomeration.” Journal of Labor Economics, October 2001, 19(4), pp. 879-899.
[16] Wheeler, Christopher H. “Cities and the Growth of Wages Among Young Workers: Evidence from the NLSY.” Journal of Urban Economics, September 2006, 60, pp. 162-184.
[17] Yankow, Jeffrey J. “Why Do Cities Pay More? An Empirical Examination of Some Competing Theories of the Urban Wage Premium.” Journal of Urban Economics, September 2006, 60, pp. 139-161.
Table 1: Sample Means for Twins Data and Reweighted CPS MORG data
                      Twins Data       CPS Data
Age                   37.71 (11.19)    37.34 (11.44)
Hourly Wage           14.26 (12.90)    13.90 (8.80)
Years of Education    13.97 (2.04)     13.61 (3.14)
Married               0.492 (0.494)    0.596 (0.491)
Female                0.585 (0.493)    0.480 (0.500)
Married Female        0.315 (0.461)    0.270 (0.443)
Unionized             0.210 (0.403)    0.220 (0.415)
Standard deviations in parentheses. The CPS sample is drawn from the 1993 outgoing rotation groups, and consists of respondents between the ages of 18 and 65, who work full-time and earn real hourly wages of at least $2.50 per hour and no more than $100 per hour. The data on identical twins was collected from 1991 to 1993, and has the same age and hourly earnings restrictions as the CPS data. CPS data are reweighted on the basis of the geographic location of individuals in the data set of twins.
Table 2: Pooled Sample Wage Regressions from the Twins Data
                                          OLS (1)            CRE (2)            Fixed-Effects (3)  OLS (4)            CRE (5)            Fixed-Effects (6)
Log(City Population)                      0.043*** (0.009)   0.001 (0.015)      0.001 (0.015)      …                  …                  …
Family Average Log(City Population)                          0.056*** (0.021)                      …                  …                  …
City Population ≥ 500,000                 …                  …                  …                  0.188*** (0.068)   -0.075 (0.105)     -0.075 (0.105)
Family Average City Population ≥ 500,000  …                  …                  …                                     0.346** (0.151)
Education                                 0.102*** (0.011)   0.074*** (0.018)   0.074*** (0.018)   0.108*** (0.011)   0.075*** (0.018)   0.075*** (0.018)
Family Average Education                                     0.030 (0.023)                                            0.037 (0.023)
Married                                   0.260*** (0.069)   0.085 (0.086)      0.085 (0.086)      0.237*** (0.070)   0.085 (0.086)      0.085 (0.086)
Family Average Marital Status                                0.250* (0.130)                                           0.213 (0.131)
Married Female                            -0.235*** (0.085)  -0.008 (0.114)     -0.049 (0.111)     -0.240*** (0.086)  -0.052 (0.111)     -0.052 (0.111)
Family Average Married Female                                -0.227 (0.166)                                           -0.249 (0.167)
Covered by a Union                        0.077* (0.048)     0.096* (0.058)     0.096* (0.058)     0.079 (0.049)      0.094 (0.058)      0.094 (0.058)
Family Average Union Coverage                                -0.044 (0.100)                                           -0.041 (0.102)
Tenure                                    0.025*** (0.003)   0.024*** (0.004)   0.024*** (0.004)   0.024*** (0.003)   0.023*** (0.004)   0.023*** (0.004)
Family Average Tenure                                        0.003 (0.006)                                            0.002 (0.006)
Female                                    -0.153*** (0.060)  -0.131* (0.076)                       -0.155** (0.061)   -0.126 (0.077)
N                                         526                526                263                526                526                263
Standard Errors are listed in parentheses. *** Significant at the 1% level, ** Significant at the 5% level, * Significant at the 10% level. OLS and CRE regressions also include age and age squared terms.
Table 3: Wage Regressions for Male and Female Subsamples of the Twins Data
                                          Men                                                                     Women
                                          OLS (1)           CRE (2)           OLS (3)           CRE (4)           OLS (5)           CRE (6)           OLS (7)           CRE (8)
Log(City Population)                      0.049*** (0.016)  0.007 (0.023)     …                 …                 0.038*** (0.012)  -0.003 (0.020)    …                 …
Avg. Log(City Population)                                   0.063* (0.036)    …                 …                                   0.048* (0.026)    …                 …
City Population ≥ 500,000                 …                 …                 0.303*** (0.116)  -0.138 (0.149)    …                 …                 0.045 (0.077)     0.002 (0.151)
Family Average City Population ≥ 500,000  …                 …                                   0.623*** (0.241)  …                 …                 …                 0.023 (0.198)
Standard Errors are listed in parentheses. *** Significant at the 1% level, ** Significant at the 5% level, * Significant at the 10% level. The first four columns of the table display regression results for the male sample of twins and the last four columns display regression results for the female sample of twins. The columns entitled OLS report results from ordinary least squares regressions which do not include familial fixed effects, and the columns entitled CRE report results from a correlated random effects model which does include a familial fixed effect. The first two rows of the table report results from a set of wage regressions which use as their measure of agglomeration the logarithm of the respondent’s city’s population. The last two rows of the table report results from a set of wage regressions which use as their measure of agglomeration an indicator variable equal to one if the respondent resides in a city whose population exceeds 500,000 people. In addition to the variables representing agglomeration, all regressions also include the same variables used in Table 2, including: age and its square, education, a female indicator variable, marital status and its interaction with the female indicator variable, an indicator equal to one if the individual is covered by a labor union, and tenure on the current job.
Table 4: Quantile Wage Regressions for the Effect of Log of City Population and for the Effect of City Population Exceeding 500,000 for the Twins Data
                                          10th Percentile                     25th Percentile                     50th Percentile                     75th Percentile                     90th Percentile
                                          OLS (1)           CRE (2)           OLS (3)           CRE (4)           OLS (5)           CRE (6)           OLS (7)           CRE (8)           OLS (9)           CRE (10)
Log(City Population)                      0.034** (0.015)   -0.006 (0.021)    0.033*** (0.012)  -0.005 (0.014)    0.031** (0.013)   -0.003 (0.014)    0.044*** (0.017)  -0.004 (0.018)    0.062*** (0.020)  -0.003 (0.031)
Family Average Log(City Population)                         0.057** (0.028)                     0.048** (0.023)                     0.056** (0.026)                     0.065** (0.029)                     0.077** (0.035)
City Population ≥ 500,000                 0.291** (0.114)   0.055 (0.073)     0.161** (0.075)   0.009 (0.124)     0.134 (0.082)     0.019 (0.103)     0.251** (0.122)   -0.038 (0.219)    0.231* (0.139)    -0.576 (0.328)
Family Average City Population ≥ 500,000                    0.257** (0.129)                     0.206 (0.143)                       0.285 (0.204)                       0.332 (0.258)                       0.916** (0.349)
Standard Errors are listed in parentheses. *** Significant at the 1% level, ** Significant at the 5% level, * Significant at the 10% level. The first two columns of the table (labeled “10th Percentile”) display quantile regression results for the sample of twins at the tenth percentile, and the following pairs of columns display results from the twenty-fifth, fiftieth, seventy-fifth and ninetieth percentiles. The first two rows of the table report results from a set of wage regressions which use as their measure of agglomeration the logarithm of the respondent’s city’s population. The last two rows of the table report results from a set of wage regressions which use as their measure of agglomeration an indicator variable equal to one if the respondent resides in a city whose population exceeds 500,000 people. In addition to the variables representing agglomeration, all regressions also include the same variables used in Table 2, including: age and its square, education, a female indicator variable, marital status and its interaction with the female indicator variable, an indicator equal to one if the individual is covered by a labor union, and tenure on the current job.
Table 5: Sample Means for Siblings and Overall Sample in the NLSY from 1979 to 2004
                        Entire Sample     Pooled Brothers   Brothers          Sisters
                                          and Sisters
Urban                   0.768 (0.422)     0.790 (0.407)     0.780 (0.414)     0.801 (0.399)
Age                     29.38 (6.908)     28.70 (6.894)     28.56 (6.780)     28.85 (7.017)
Log Hourly Wage         2.201 (0.535)     2.203 (0.536)     2.267 (0.551)     2.131 (0.510)
Years of Education      12.90 (2.392)     12.98 (2.292)     12.71 (2.402)     13.27 (2.124)
Female                  0.487 (0.500)     0.473 (0.500)     …                 …
Married                 0.473 (0.499)     0.444 (0.497)     0.433 (0.496)     0.455 (0.498)
Collective Bargaining   0.154 (0.361)     0.169 (0.375)     0.181 (0.385)     0.156 (0.362)
Tenure (in years)       12.47 (22.48)     12.73 (23.63)     12.84 (24.57)     12.62 (22.55)
Adjusted AFQT           4.848 (17.57)     3.694 (18.41)     3.419 (19.18)     4.000 (17.52)
N                       74,491            14,197            7,480             6,717
Standard deviations are listed in parentheses. The NLSY sample is limited to respondents with a sibling who work at least 15 hours per week and whose wage exceeds $2/hour (in 1992 dollars).
Table 6: Wage Regressions from the NLSY With and Without a Familial Fixed Effect
                        Pooled Sample of Siblings             Brothers                              Sisters
                        OLS (1)            Family FE (2)      OLS (3)            Family FE (4)      OLS (5)            Family FE (6)
Urban                   0.127*** (0.017)   0.017* (0.010)     0.134*** (0.025)   0.026* (0.014)     0.115*** (0.023)   0.011 (0.015)
Education               0.086*** (0.004)   0.091*** (0.003)   0.083*** (0.006)   0.089*** (0.004)   0.087*** (0.007)   0.096*** (0.004)
Married                 0.180*** (0.021)   0.151*** (0.010)   0.178*** (0.022)   0.137*** (0.011)   0.010 (0.022)      -0.005 (0.011)
Married Female          -0.173*** (0.027)  -0.167*** (0.013)  …                  …                  …                  …
Collective Bargaining   0.154*** (0.017)   0.141*** (0.009)   0.162*** (0.022)   0.146*** (0.012)   0.139*** (0.027)   0.132*** (0.014)
Tenure                  0.002*** (0.000)   0.001*** (0.000)   0.002*** (0.0005)  0.001*** (0.000)   0.003*** (0.001)   0.002*** (0.000)
Female                  -0.067*** (0.021)  0.127*** (0.044)   …                  …                  …                  …
Adjusted AFQT           0.005*** (0.000)   0.005*** (0.000)   0.006*** (0.001)   0.006*** (0.001)   0.005*** (0.001)   0.004*** (0.001)
N                       14,197             14,197             7,480              7,480              6,717              6,717
Standard Errors are listed in parentheses and are clustered at the household level. *** Significant at the 1% level, ** Significant at the 5% level, * Significant at the 10% level. All regressions also include individual experience and its square, as well as eight indicator variables to capture industry effects. Columns one and two (entitled “Pooled Sample of Siblings”) report regression estimates from a sample of brothers and sisters from the NLSY; the next two columns only consider the subsample of brothers, and the last two columns only consider the subsample of sisters. Columns two, four and six, entitled “Family FE”, report results from regressions which include a family-specific fixed effect.
Table 7: Agglomeration Premium from Quantile Wage Regressions from the NLSY With and Without a Familial Fixed Effect
                        Pooled Sample of Siblings             Brothers                              Sisters
                        No Family FE (1)   Family FE (2)      No Family FE (3)   Family FE (4)      No Family FE (5)   Family FE (6)
10th Percentile         0.073*** (0.014)   -0.032 (0.034)     0.048** (0.020)    0.010 (0.049)      0.080*** (0.019)   -0.037 (0.039)
25th Percentile         0.094*** (0.011)   0.008 (0.023)      0.110*** (0.014)   0.040 (0.037)      0.096*** (0.016)   0.004 (0.034)
50th Percentile         0.123*** (0.010)   0.037 (0.025)      0.140*** (0.014)   0.052 (0.036)      0.114*** (0.015)   0.029 (0.029)
75th Percentile         0.132*** (0.011)   0.041 (0.027)      0.142*** (0.014)   0.035 (0.033)      0.121*** (0.015)   0.038 (0.031)
90th Percentile         0.141*** (0.015)   0.111*** (0.035)   0.125*** (0.020)   0.014 (0.045)      0.129*** (0.022)   0.012 (0.046)
N                       14,197             14,197             7,480              7,480              6,717              6,717
Standard Errors are listed in parentheses and are clustered at the household level. *** Significant at the 1% level, ** Significant at the 5% level, * Significant at the 10% level. In addition to the variables representing agglomeration, all regressions also include the same variables used in Table 6, including: experience and its square, education, a female indicator variable, marital status and its interaction with the female indicator variable, an indicator equal to one if the individual is covered by a collective bargaining agreement, tenure on the current job, adjusted AFQT score, as well as eight indicator variables to capture industry effects. Columns one and two (entitled “Pooled Sample of Siblings”) report regression estimates from a sample of brothers and sisters from the NLSY; the next two columns only consider the subsample of brothers, and the last two columns only consider the subsample of sisters. Columns two, four and six, entitled “Family FE”, report results from regressions which include a family-specific fixed effect within the regression.
Table 8: Sample Means for Siblings and Overall Sample from the Five-Percent PUMS of the 2000 U.S. Census
Columns: (1) Entire Male Sample; (2) Entire Female Sample; (3) Male Household Heads and Male Siblings; (4) Male Children of Household Head; (5) Female Household Heads and Female Siblings; (6) Female Children of Household Head.
                                    (1)             (2)             (3)             (4)             (5)             (6)
Log of Workplace PUMA Population    12.80 (1.369)   12.92 (1.475)   13.30 (1.692)   13.00 (1.626)   13.27 (1.775)   13.12 (1.708)
Age                                 40.22 (10.45)   38.39 (11.05)   32.80 (9.382)   27.44 (7.866)   36.08 (10.76)   27.47 (8.050)
Log Hourly Wage                     2.771 (0.716)   2.494 (0.638)   2.360 (0.625)   2.231 (0.577)   2.405 (0.609)   2.194 (0.569)
Years of Education                  13.69 (2.800)   13.93 (2.488)   11.55 (3.757)   12.59 (2.439)   13.33 (2.862)   13.37 (2.347)
Married                             0.688 (0.463)   0.197 (0.398)   0.314 (0.464)   0.071 (0.256)   0.108 (0.310)   0.064 (0.245)
N                                   2,261,412       874,757         42,688          46,095          21,529          25,944
Standard deviations are listed in parentheses. The sample is limited to respondents with a sibling who work at least 15 hours per week and whose wage exceeds $2/hour. The first column reports sample means from the sample of male household heads, male siblings of household heads and male children of household heads, and the second column reports sample means from the equivalent female sample. The third column reports sample means for male household heads and their male siblings (from households which have both types of people present), and the fifth column presents sample means from the equivalent female sample. The fourth column reports sample means for male children of household heads (in households with at least two male children who work), and the sixth column presents sample means from the equivalent female sample.
Table 9: Wage Regressions from the Census With and Without a Familial Fixed Effect
Columns (1)-(2): Male Household Heads and Male Siblings. Columns (3)-(4): Male Children of Household Head. Columns (5)-(6): Female Household Heads and Female Siblings. Columns (7)-(8): Female Children of Household Head.
                                      OLS (1)            Family FE (2)      OLS (3)            Family FE (4)      OLS (5)            Family FE (6)      OLS (7)            Family FE (8)
Log of Workplace PUMA Population      0.006*** (0.002)   -0.006 (0.004)     0.017*** (0.002)   -0.0004 (0.004)    0.027*** (0.002)   0.001 (0.006)      0.029*** (0.002)   -0.005 (0.005)
Education                             0.059*** (0.001)   0.037*** (0.002)   0.061*** (0.001)   0.045*** (0.003)   0.087*** (0.002)   0.069*** (0.003)   0.087*** (0.002)   0.066*** (0.003)
Experience                            0.027*** (0.001)   0.032*** (0.002)   0.037*** (0.001)   0.037*** (0.002)   0.028*** (0.001)   0.032*** (0.003)   0.040*** (0.001)   0.045*** (0.003)
Experience^2/100                      -0.039*** (0.002)  -0.045*** (0.004)  -0.071*** (0.004)  -0.062*** (0.006)  -0.040*** (0.003)  -0.039*** (0.007)  -0.073*** (0.005)  -0.083*** (0.008)
Married                               0.056*** (0.006)   0.117*** (0.010)   0.045*** (0.011)   0.005 (0.017)      -0.027** (0.013)   -0.001 (0.021)     0.010 (0.013)      0.022 (0.023)
N                                     42,688             42,688             46,090             46,090             21,529             21,529             25,944             25,944
Standard Errors are listed in parentheses and are clustered at the household level. *** Significant at the 1% level, ** Significant at the 5% level, * Significant at the 10% level. All regressions also include eight indicator variables to capture industry effects. The first and second columns report regression results for male household heads and their male siblings (from households which have both types of people present), and the fifth and sixth columns present regression results from the equivalent female sample. The third and fourth columns report regression results for male children of household heads (in households with at least two male children who work), and the seventh and eighth columns present regression results from the equivalent female sample. Columns two, four, six and eight, entitled “Family FE”, report results from regressions which include a family-specific fixed effect within the regression.
Table 10: Wage Regressions from the Census With and Without a Familial Fixed Effect
                                     Male Household        Male Children         Female Household      Female Children
                                     Heads and Male        of Household          Heads and Female      of Household
                                     Siblings              Head                  Siblings              Head
                                     OLS        Family FE  OLS        Family FE  OLS        Family FE  OLS        Family FE
                                     (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
Workplace PUMA Population ≥ 500,000  0.062***   -0.005     0.084***   0.025      0.140***   0.038      0.123***   0.006
                                     (0.006)    (0.016)    (0.006)    (0.017)    (0.008)    (0.023)    (0.007)    (0.022)
Education                            0.058***   0.037***   0.061***   0.045***   0.085***   0.069***   0.087***   0.066***
                                     (0.001)    (0.002)    (0.001)    (0.003)    (0.002)    (0.003)    (0.002)    (0.003)
Experience                           0.027***   0.032***   0.037***   0.037***   0.028***   0.032***   0.040***   0.045***
                                     (0.001)    (0.002)    (0.001)    (0.002)    (0.001)    (0.003)    (0.001)    (0.003)
Experience^2/100                     -0.039***  -0.045***  -0.072***  -0.062***  -0.041***  -0.039***  -0.072***  -0.083***
                                     (0.003)    (0.005)    (0.004)    (0.006)    (0.003)    (0.007)    (0.005)    (0.008)
Married                              0.059***   0.117***   0.047***   0.005      -0.026**   -0.001     0.009      0.022
                                     (0.006)    (0.010)    (0.011)    (0.017)    (0.013)    (0.021)    (0.013)    (0.023)
N                                    42,688     42,688     46,090     46,090     21,529     21,529     25,944     25,944
Standard errors are listed in parentheses and are clustered at the household level. *** Significant at the 1% level, ** Significant at the 5% level, * Significant at the 10% level. All regressions also include eight indicator variables to capture industry effects. The first and second columns report regression results for male household heads and their male siblings (from households which have both types of people present), and the fifth and sixth columns present regression results from the equivalent female sample. The third and fourth columns report regression results for male children of household heads (in households with at least two male children who work), and the seventh and eighth columns present regression results from the equivalent female sample. Columns two, four, six and eight, entitled “Family FE”, report results from regressions which include a family-specific fixed effect within the regression.
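The contrast between the OLS and Family FE columns can be illustrated with a small simulation. The sketch below is not the estimation code used for these tables: it uses synthetic sibling pairs, a single regressor, and hypothetical names and parameter values (`simulate`, `within_family`, the 0.5 and 0.3 loadings) chosen only for illustration. It assumes an unobserved family background term that raises both the log wage and the log population of the workplace, and a true population effect of zero. Under those assumptions, pooled OLS should recover a positive slope, while demeaning within families (which is equivalent to including a family-specific fixed effect) should push the slope toward zero.

```python
import random


def ols_slope(x, y):
    """Slope from a one-regressor OLS fit that includes an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def within_family(values, family_ids):
    """Subtract each family's mean from its members (within transformation)."""
    totals, counts = {}, {}
    for v, f in zip(values, family_ids):
        totals[f] = totals.get(f, 0.0) + v
        counts[f] = counts.get(f, 0) + 1
    return [v - totals[f] / counts[f] for v, f in zip(values, family_ids)]


def simulate(n_families=5000, true_effect=0.0, seed=0):
    """Synthetic sibling pairs; all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    family_ids, log_pop, log_wage = [], [], []
    for f in range(n_families):
        background = rng.gauss(0.0, 1.0)  # unobserved family effect
        for _ in range(2):  # two siblings per family
            x = 0.5 * background + rng.gauss(0.0, 1.0)
            y = true_effect * x + 0.3 * background + rng.gauss(0.0, 0.5)
            family_ids.append(f)
            log_pop.append(x)
            log_wage.append(y)
    return family_ids, log_pop, log_wage


family_ids, log_pop, log_wage = simulate()

# Pooled OLS ignores the family effect and picks up its correlation with x.
pooled = ols_slope(log_pop, log_wage)

# Family fixed effect: regress within-family deviations on each other.
fe = ols_slope(within_family(log_pop, family_ids),
               within_family(log_wage, family_ids))

print(f"pooled OLS slope: {pooled:.3f}")
print(f"family FE slope:  {fe:.3f}")
```

The sketch omits the other covariates, the industry indicators, and the household-level clustering of standard errors described in the table notes; it is meant only to show the mechanics of how a family fixed effect can absorb a spurious city-size coefficient.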