DISCUSSION PAPER SERIES

IZA DP No. 11878

Sonia Bhalotra

Damian Clarke

The Twin Instrument: Fertility and Human Capital Investment

OCTOBER 2018

Any opinions expressed in this paper are those of the author(s) and not those of IZA. Research published in this series may include views on policy, but IZA takes no institutional policy positions. The IZA research network is committed to the IZA Guiding Principles of Research Integrity.

The IZA Institute of Labor Economics is an independent economic research institute that conducts research in labor economics and offers evidence-based policy advice on labor market issues. Supported by the Deutsche Post Foundation, IZA runs the world’s largest network of economists, whose research aims to provide answers to the global labor market challenges of our time. Our key objective is to build bridges between academic research, policymakers and society.

IZA Discussion Papers often represent preliminary work and are circulated to encourage discussion. Citation of such a paper should account for its provisional character. A revised version may be available directly from the author.

Schaumburg-Lippe-Straße 5–9, 53113 Bonn, Germany

Phone: +49-228-3894-0

Email: [email protected]

www.iza.org

IZA – Institute of Labor Economics

Sonia Bhalotra, University of Essex and IZA

Damian Clarke, Universidad de Santiago de Chile and IZA

ABSTRACT

The Twin Instrument: Fertility and Human Capital Investment*

Twin births are often used to instrument fertility to address (negative) selection of women

into fertility. However recent work shows positive selection of women into twin birth.

Thus, while OLS estimates will tend to be downward biased, twin-IV estimates will tend

to be upward biased. This is pertinent given the emerging consensus that fertility has

limited impacts on women’s labour supply, or on investments in children. Using data for

developing countries and the United States, we demonstrate the nature and size of the

bias in the twin-IV estimator of the quantity-quality trade-off and estimate bounds on the

true parameter.

JEL Classification: J12, J13, C13, D13, I12

Keywords: twins, fertility, maternal health, quantity-quality trade-off, parental investment, bounds, IV

Corresponding author:

Sonia Bhalotra

ISER & Department of Economics

University of Essex

Wivenhoe Park

Colchester CO4 3SQ

United Kingdom

E-mail: [email protected]

* We are grateful to Paul Devereux, James Fenske, Judith Hall, Christian Hansen, Martin Karlsson, Toru Kitagawa,

Magne Mogstad, Rohini Pande, Adam Rosen, Paul Schulz, Margaret Stevens, Atheen Venkataramani and Marcos

Vera-Hernandez, along with various seminar audiences and discussants for helpful comments and/or sharing data.

Any remaining errors are our own. An earlier version of this paper appeared as “The Twin Instrument”, IZA DP 10405,

December 2016. A revised version of Part 1 of this working paper is forthcoming at The Review of Economics and

Statistics. This paper contains a revised version of Part 2 of the working paper.

1 Introduction

Following Becker (1960), fertility has been modeled jointly with investments in children and

with women’s labour force participation. In line with the average tendency for negative

selection into fertility, linear least squares estimates of associations of fertility with children’s

human capital, and with women’s employment, tend to overstate the negative relationship. Since the pioneering

work of Rosenzweig and Wolpin (1980a,b), a considerable literature has attempted to address

selection by using twins to instrument fertility (see Appendix Table A1). The premise is that

twin births are quasi-random, so that the event of a twin birth constitutes a “natural” natural

experiment (Rosenzweig and Wolpin, 2000).

In a recent paper (Bhalotra and Clarke, forthcoming), we presented new population-level

evidence that challenges this premise. Using individual data for 17 million births in 72

countries, we demonstrated that indicators of the mother’s health, her health-related behaviours,

and the prenatal health environment are systematically positively associated with the

probability of a twin birth. The estimated associations are large, evident in richer and poorer

countries, evident even among women who do not use IVF, and hold for sixteen different

measures of health. We provided evidence that selective miscarriage is the likely mechanism.

The upshot of our findings is that women who have twin births are positively selected on

unobservables related to health. If, as is plausible (and we will demonstrate), those

unobservables are correlated with child human capital or with women’s labour force participation,

then twin-instrumented estimates of the relationship between fertility and child outcomes, or

women’s labour supply will tend to be upward biased, moving towards a null-estimate.

This is pertinent as it could resolve the ambiguity of the available evidence on these

relationships. Recent studies using the twin instrument reject the presence of a quantity–

quality (QQ) fertility trade-off (Black et al., 2005; Angrist et al., 2010), challenging a long-

standing theoretical prior of Becker (1960), Becker and Lewis (1973) and Becker and Tomes (1976).

Similarly, research using the twin instrument finds that additional children have relatively little

influence on women’s labour market participation, at least after the first few years (Rosenzweig

and Wolpin, 1980a; Bronars and Grogger, 1994; Jacobsen et al., 1999; Vere, 2011). In principle,

addressing the omission of maternal health related variables could correct the bias towards

zero in these studies, and provide a true estimate of the trade-offs. In practice, maternal health

is multi-dimensional and almost impossible to fully measure and adjust for. To take a few

examples, foetal health is potentially a function of whether pregnant women skip breakfast

(Mazumder and Seeskin, 2015), whether they suffer bereavement in pregnancy (Black et al.,

2016), and of foetal exposure to air pollution (Chay and Greenstone, 2003).

In this paper we investigate how inference in a literature concerned with causal effects

of fertility on human capital can proceed with partial adjustment and bounding. We first

illustrate the hypothesized direction of the bias of the twin-IV estimator, by introducing

available controls for maternal health in the estimation. Since this adjustment is necessarily

partial, we proceed to estimate bounds on the IV estimates. Given that the first stage (twins

predicting fertility) is powerful, we follow Conley et al. (2012) in estimating bounds on the

premise that twin births are plausibly if not strictly exogenous. In a sensitivity check, we also

estimate bounds under the different assumptions of Nevo and Rosen (2012), again using twin

births as an “imperfect instrumental variable”.

We provide estimates for the US using about 225,000 births, drawn from the US National

Health Interview Surveys (NHIS) for 2004-2014, and for a pooled sample of developing

countries, containing more than 1 million births in 68 countries over 20 years, available from the

Demographic and Health Surveys, or DHS. These data are chosen because they contain

information on child outcomes and maternal health. Consistently using these two samples allows

us to assess the generality of our findings, and it allows for the possibility that the relationship

of interest, as well as the violation of the exclusion restriction that concerns us, differs between

richer and poorer countries.

We start by briefly demonstrating, on the particular data samples used in this analysis,

our earlier result that the probability of twin birth is significantly positively associated with

indicators of maternal health. We then set the stage by showing the routine OLS and twin-IV

estimates on our data samples. The OLS estimates suggest a fertility-human capital trade-off

and, following Altonji et al. (2005) to gauge the importance of unobservables, we conclude

that accounting for unobservables is unlikely to dissolve the trade-off. The twin-IV estimates

replicate, in our samples, the finding in recent studies that there is no discernible trade-off.

However, adjusting for available maternal health related characteristics, even though these are

only a small subset of the range of relevant indicators, leads to the emergence of a QQ trade-off.

This finding generalizes to recent non-linear models of the QQ trade-off (Brinch et al., 2017;

Mogstad and Wiswall, 2016), holding even when the impact of fertility is allowed to vary by

parity. For instance, in samples with at least three births, an additional child is associated

with lower human capital outcomes for the first two births: this is estimated as 0.05 s.d. for

years of education in developing countries, and 0.06 s.d. for an index of child health in the

US, and in the sample with at least two births it is 0.10 s.d. for grade progression in the US

(or 0.22 fewer grades progressed).

The bounds also confirm the presence of a trade-off at certain parities for education or

health outcomes. The lower bound is -0.05 to -0.06 s.d. for education in developing countries,

-0.13 to -0.24 s.d. for education in the USA and -0.02 to -0.10 of a s.d. for child health in the

USA.1 Observe that the trade-off is no smaller in the USA than in developing countries. This is

important given that the recent studies arguing there is no trade-off are set in richer countries,

and one proposed reconciliation of these results with earlier studies is that the trade-off

may exist but only in poorer countries where a larger share of families is credit constrained.

This said, the US sample is considerably smaller than the developing country sample and

bounds are correspondingly wider. As a result, bounds are uniformly more informative in the

developing country sample.

1 Each is a range because the coefficient varies with whether twins occur at the second, third or fourth birth order. We place these effect sizes in perspective in section 5.2.4.

The results indicate that marginal increases in fertility often lead to diminished investments

in the human capital of children, and the trade-off is not negligibly small. This is important,

especially in view of growing evidence of the long-run dynamic benefits of childhood

investments (Heckman et al., 2013). These estimates bring back to centre stage the issue of a potential

human capital cost of fertility. Governments actively devise policies to influence fertility. For

instance, countries like China have penalized fertility, while many countries, including Italy

and Canada, have incentivized it, often with non-linear rules.2 Moreover, advocates of policies

encouraging smaller families rest their case on larger families investing less in the quality of

each child, limiting human capital accumulation and living standards (Galor and Weil, 2000;

Moav, 2005).

2 The Fertility-Investment Trade-off and the Twin Instrument

A long-standing theoretical result in the literature on human capital formation is the existence

of a quantity-quality (QQ) trade-off (Becker, 1960; Becker and Lewis, 1973; Willis, 1973;

De Tray, 1973; Becker and Tomes, 1976). The essential idea of these studies is that the

shadow price of child quality is increasing in child quantity and vice versa. This provides

behavioural micro-foundations consistent with an empirical regularity that has been noted in

cross-sectional and time series data, which is that children from large families have weaker

educational outcomes (Hanushek, 1992; Blake, 1989; Galor, 2012). We replicate this pattern

using our data samples from the USA and developing countries (see Appendix Figures A1 and

A2).

However, empirical evidence of a QQ trade-off is ambiguous. Early work including Hanushek

(1992) and Rosenzweig and Wolpin (1980a) documented significant negative effects of additional

2 As discussed in Mogstad and Wiswall (2016), families with children receive special treatment under the tax and transfer provisions in 28 of the 30 Organisation for Economic Co-operation and Development countries (OECD, 2002). Many of these policies are designed such that they reduce the cost of having a single child more than the cost of having two or more children, in effect promoting smaller families. For example, welfare benefits or tax credits are, in many cases, reduced or even cut off after reaching a certain number of children.

births within a family on average child educational outcomes. Using IV or difference-in-

differences approaches, recent studies include estimates of a significantly positive relationship

(Qian, 2009), a significant negative relationship (Grawe, 2008; Ponczek and Souza, 2012; Lee,

2008; Bougma et al., 2015) and no significant relationship (Black et al., 2005; Angrist et al.,

2010; Fitzsimons and Malde, 2010); see the review in Clarke (2018). It has been argued that

where the usual twin-IV approach identifies no significant relationship, allowing for non-linear

and non-monotonic effects of family fertility on children’s education leads to the emergence of a

negative relationship (Brinch et al., 2017; Mogstad and Wiswall, 2016). In this paper, we

assess twin-IV estimates on two different data samples, examining sensitivity to adjustment

for maternal health in linear and non-linear models, and to (small or signed) violations of

the exclusion restriction.3 In this paper we focus nearly exclusively on the internal validity of

twins estimates (IV consistency). In recent work, Aaronson et al. (2017) and Bisbee et al. (2017)

examine the external validity of the twin instrumented or sex-mix instrumented estimates of

the impact of fertility on female labour supply.4

3 Methodology

3.1 Estimating The Quantity–Quality Trade-off with Twins

Analyses of the QQ trade-off attempt to produce consistent estimates of α1 in the following

population-level equation:

$$\text{quality}_{ij} = \alpha_0 + \alpha_1\,\text{quantity}_j + X\alpha_x + \varepsilon_{ij}. \tag{1}$$

Here, quality is a measure of human capital attainment of, or investment in, child i in family j,

and quantity is fertility or the number of siblings of child i. A significant QQ trade-off implies

3 The twin instrument has also been used to estimate varying effects of childbearing on women’s labour force participation (Rosenzweig and Wolpin, 1980b; Jacobsen et al., 1999; Angrist and Evans, 1998), and the consequences of out of wedlock births on marriage market outcomes, poverty and welfare receipt (Bronars and Grogger, 1994).

4 We note that like Aaronson et al. (2017), our estimates suggest considerable heterogeneity by country income levels. We also observe heterogeneity by child gender.

that α1 < 0. Relevant family and child level controls are included, denoted X. As has been

extensively discussed in a previous literature, estimation of α1 using OLS will result in biased

coefficients given that child quality and quantity are jointly determined (Becker and Lewis,

1973; Becker and Tomes, 1976), and unobservable parental behaviours and attributes influence

both fertility decisions, and investments in children’s education (Qian, 2009). The direction

of the OLS bias is determined by the sign on the conditional correlation between quantityj

and the unobserved error term: E[quantityj · εij|X]. If mothers with weaker preferences for

child quality have more children, OLS estimates will overstate the true QQ trade-off.

Following the seminal work of Rosenzweig and Wolpin (1980a), fertility has been

instrumented with the incidence of twin births on the premise that they constitute an exogenous

shock to family size. The 2SLS specification can be written as:

$$\text{quantity}_j = \pi_0 + \pi_1\,\text{twin}_j + X\pi_x + \nu_{ij}, \tag{2a}$$

$$\text{quality}_{ij} = \beta_0 + \beta_1\,\text{quantity}_j + X\beta_x + \eta_{ij}, \tag{2b}$$

where twinj is an indicator for whether the nth birth in family j is a twin birth. As described

further in section 4, a series of samples are constructed, referred to as the n+ groups, and

consisting of children born before birth n in families with at least n births. The idea is that

children born prior to birth n (subjects) are randomly assigned either one sibling (and make

up the control group) or two siblings (and make up the treatment group) at the nth birth,

and this allows us to estimate causal impacts of the additional birth on investments in, or

outcomes of, these children. The twins themselves are excluded from the estimation sample.5

If twins are a valid instrument, the estimate of β1 is consistent and hence, in the limit, equal to the

parameter α1 from the population equation 1.

5 This takes care of the concern that since twins tend to be born with weaker endowments (e.g. birth weight), they will tend to have systematically different quality outcomes. Using data from the US, Almond et al. (2005) document that twins have substantially lower birth weight, lower APGAR scores, higher use of assisted ventilation at birth and lower gestation period than singletons. We document similar endowment differences in our data samples (Appendix Figures A3 and A4).

Bhalotra and Clarke (forthcoming) provide evidence that omitted variables for maternal

health may contaminate ηij, and in Section 5.1 we document this for the data used in this

paper. If mothers who invest more in their pregnancies (for instance, by not smoking

before birth) also invest more in their children after birth, then the twin-IV estimates will be

inconsistent. There is some evidence, for instance in Uggla and Mace (2016), that healthier

mothers (as indicated by health measures such as those used in our earlier work) invest more in children

in a range of domains. Positive selection of mothers of twins implies:

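$$\operatorname*{plim}_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} \text{twin}_j \cdot \eta_{ij} > 0 \iff \operatorname*{plim}_{N\to\infty} \hat{\beta}_1 > \beta_1.$$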
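
The link between the two inequalities can be made explicit with the standard just-identified IV probability limit. The following is a sketch that omits the controls X for brevity (with controls, each variable would be replaced by its residual after partialling out X), and it assumes the first stage is positive, that is, a twin birth raises family size:

```latex
\operatorname*{plim}_{N\to\infty} \hat{\beta}_1
  = \frac{\operatorname{Cov}(\text{twin}_j,\ \text{quality}_{ij})}
         {\operatorname{Cov}(\text{twin}_j,\ \text{quantity}_j)}
  = \beta_1
  + \frac{\operatorname{Cov}(\text{twin}_j,\ \eta_{ij})}
         {\operatorname{Cov}(\text{twin}_j,\ \text{quantity}_j)} .
```

With a positive denominator, a positive covariance between the twin indicator and the structural error pushes the probability limit of the twin-IV estimate above β1.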

We can partition the stochastic error term from equation 2b into a vector of observable

measures of mother’s health capital (H), socioeconomic variables (S), and all other unob-

served components, as ηij = H + S + η∗ij. Assuming a positive (or zero) covariance between

the three components of the error term,6 the step-by-step removal of selection predictors will

result in the estimated coefficient becoming continually closer to the true parameter. Thus:

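$$\operatorname{plim} \hat{\beta}_1 > \operatorname{plim} \hat{\beta}_1^{H} > \operatorname{plim} \hat{\beta}_1^{S+H} > \beta_1. \tag{3}$$

The coefficients $\hat{\beta}_1^{H}$ and $\hat{\beta}_1^{S+H}$ refer to coefficients in a model augmented to control for observable

health capital H, and then also observable socioeconomic status S. Since, as discussed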

further in section 5.1, all determinants of twin birth are virtually impossible to account for,

twin-IV will under-estimate the magnitude of the QQ trade-off, although addition of predictors

of twins as controls will lead to the estimate approaching the true value from above.
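
The ordering in equation 3 can be illustrated with a small simulation. This is a minimal sketch under invented parameter values, not the estimation code used in the paper: `h_obs` and `h_unobs` are hypothetical stand-ins for observed and unobserved maternal health, fertility is treated as continuous for simplicity, and the true coefficient `beta_true` is set by assumption.

```python
import numpy as np

def ols_slope(y, x):
    """Bivariate OLS slope of y on x."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)

def iv_slope(y, x, z, controls=None):
    """Just-identified IV slope of y on x using z, after partialling out controls."""
    n = len(y)
    W = np.ones((n, 1)) if controls is None else np.column_stack([np.ones(n), controls])

    def resid(v):
        coef, *_ = np.linalg.lstsq(W, v, rcond=None)
        return v - W @ coef

    y_r, x_r, z_r = resid(y), resid(x), resid(z)
    return (z_r @ y_r) / (z_r @ x_r)

rng = np.random.default_rng(0)
n = 400_000
beta_true = -0.10  # assumed "true" quantity-quality trade-off

h_obs = rng.normal(size=n)    # maternal health the analyst can control for
h_unobs = rng.normal(size=n)  # maternal health that stays in the error term

# Positive selection into twinning: healthier mothers are more likely to have twins.
p_twin = np.clip(0.02 + 0.004 * h_obs + 0.004 * h_unobs, 0.001, 1.0)
twin = (rng.random(n) < p_twin).astype(float)

# Negative selection into fertility: healthier mothers have fewer children.
quantity = 3.0 + twin - 0.5 * h_obs - 0.5 * h_unobs + rng.normal(size=n)
quality = beta_true * quantity + h_obs + h_unobs + 0.5 * rng.normal(size=n)

b_ols = ols_slope(quality, quantity)                          # biased downward
b_iv = iv_slope(quality, quantity, twin)                      # biased upward
b_iv_h = iv_slope(quality, quantity, twin, controls=h_obs)    # closer to the truth

print(f"OLS {b_ols:.3f} | IV {b_iv:.3f} | IV + H {b_iv_h:.3f} | true {beta_true}")
```

In this stylised setting the partial control removes only part of the selection, so the health-adjusted IV estimate sits between the unadjusted IV estimate and the true parameter, while OLS lies below it.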

3.2 Estimating IV Bounds with an Imperfect Instrument

Given that we can never fully control for maternal health even with the full set of observable

controls, point estimation of the QQ trade-off is not possible. However, under additional

6 Given that the covariance between elements of S and H is found to be positive, and given that the covariance between each of these and other unobserved variables which positively affect child quality is also likely to be positive, it is very likely that each covariance term is positive. This is tested later in this paper when examining IV estimates.

assumptions relating to the failure of the IV exclusion restriction, or correlations between the

IV, the endogenous variable, and unobservables we can bound the QQ trade-off. In order to

proceed based on IV even in the presence of twin selection, we follow two procedures to bound

the QQ trade-off using the (potentially imperfect) twin instrument.

The first of these is the Nevo and Rosen (2012) “Imperfect IV” procedure. This procedure

is ideally suited to the context examined here, as it suggests that if twins are positively selected

and if fertility is negatively selected, and if twinning and fertility are positively correlated,

then the true parameter will be bounded by the OLS and the IV estimate discussed above.7

If we are willing to additionally assume that the twin instrument is “less endogenous” than

fertility (Nevo and Rosen’s assumption 4), we can further tighten the bounds by forming

a compound instrument based on the endogenous fertility variable, and the imperfect twin

instrument. This instrument, $V = \sigma_{\text{quantity}}\,\text{twin}_j - \sigma_{\text{twin}}\,\text{quantity}_j$, where σ refers to the

standard deviation, can provide tighter bounds on the β1 parameter, with $\hat{\beta}^{V}_{IV} \leq \beta_1 \leq \hat{\beta}^{\text{twin}}_{IV}$,

suggesting end points for a series of IV bounds on the parameter β1.

The Nevo and Rosen (2012) procedure is straightforward and relies on quite weak

assumptions. Namely, to produce bounds in this case, the only additional assumptions we require are

that there is negative selection into fertility, a widely accepted stance in the literature (Qian,

2009), and one that is verified in surveys querying fertility preferences, which show that less

educated women desire more children (e.g. Bhalotra and Cochrane (2010)); that twins are

positively selected, which is shown in Bhalotra and Clarke (forthcoming); that twin births

are positively associated with fertility, which we show in the first stage regressions below; and

that there is less selection into twin birth than into fertility, which seems reasonable.
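
The bounding logic can be sketched on simulated data. This is an illustration under invented parameter values, not the authors' implementation. It adopts the sign convention of footnote 7 (the twin indicator is multiplied by -1 before forming the compound instrument), uses the limiting case in which the weight on fertility equals one, and reports point estimates only rather than confidence intervals.

```python
import numpy as np

def iv_slope(y, x, z):
    """Just-identified IV slope of y on x using instrument z (z = x gives OLS)."""
    zc = z - z.mean()
    return (zc @ (y - y.mean())) / (zc @ (x - x.mean()))

rng = np.random.default_rng(1)
n = 400_000
beta_true = -0.10  # assumed "true" trade-off

u = rng.normal(size=n)  # unobservable that raises child quality

# Positive (but weak) selection into twinning, negative selection into fertility.
p_twin = np.clip(0.02 + 0.004 * u, 0.001, 1.0)
twin = (rng.random(n) < p_twin).astype(float)
quantity = 3.0 + twin - 0.5 * u + rng.normal(size=n)
quality = beta_true * quantity + u

b_ols = iv_slope(quality, quantity, quantity)  # OLS: lower bound without assumption 4
b_twin = iv_slope(quality, quantity, twin)     # twin-IV: upper bound

# Compound instrument, with twins multiplied by -1 as in footnote 7.
z = -twin
V = quantity.std() * z - z.std() * quantity
b_V = iv_slope(quality, quantity, V)           # tightened lower bound

print(f"OLS {b_ols:.3f} <= V-IV {b_V:.3f} <= true {beta_true} <= twin-IV {b_twin:.3f}")
```

In this design the twin instrument is much less correlated with the unobservable than fertility is, so the compound instrument yields a lower bound that is closer to the true value than OLS, while the upper bound remains the twin-IV estimate.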

The upper bound in the case of Nevo and Rosen is the upper end of the 95% confidence

7 We can follow the notation of Nevo and Rosen (2012) precisely if we multiply twins by -1, as their assumptions and lemmas are based on identically signed correlations between the endogenous variable and unobservables, and the IV and unobservables. In our case, once twins is multiplied by -1, assumption 3 is met assuming negative fertility selection and positive twin selection: $\rho_{xu}\rho_{zu} \geq 0$, where ρ denotes correlation. In the notation of our paper, x refers to quantity in equation 1, z refers to twin in equation 2a, u refers to the unobservable stochastic term εij in 1. Then, under Nevo and Rosen (2012, Lemma 1), $\sigma_{xz} < 0$, or the negative of twins and fertility will be negatively correlated, and as such $\hat{\beta}_{OLS} \leq \beta_1 \leq \hat{\beta}^{\text{twin}}_{IV}$.

interval on the original twin-IV estimate $\hat{\beta}^{\text{twin}}_{IV}$. From equation 3 we know that positive selection

of twins inflates this IV estimate upwards. As such, to offer a more informative identification

region at the upper bound, we also implement an alternative approach to inference for IV

models developed by Conley et al. (2012) for cases when the instrument is plausible but fails the

exclusion restriction. They provide an operational definition of plausibly (or approximately)

exogenous instruments, defining a parameter γ that reflects how close the exclusion restriction

is to being satisfied in the following model (adapted to the QQ model for this paper):

$$\text{quality}_{ij} = \delta_0 + \delta_1\,\text{quantity}_j + \gamma\,\text{twin}_j + X\delta_x + \vartheta_{ij}. \tag{4}$$

Since the parameters δ1 and γ are not jointly identified, prior information or assumptions

about γ are used to obtain estimates of the parameter of interest, δ1. The IV exclusion

restriction is equivalent to imposing ex-ante that γ is precisely equal to zero. Rather than

assuming this holds exactly, one can define plausible exogeneity as a situation in which γ

is nearly, but not precisely equal to zero. Estimating or imposing some (weaker) restriction

on γ buys the identifying information to bound the parameter of interest, even when the IV

exclusion restriction does not hold exactly.8
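
A short derivation shows why information on γ is enough to recover δ1. It is a sketch that omits the controls X and assumes the twin indicator is uncorrelated with the remaining error ϑ, so that the only violation is the direct effect γ; π1 is the first-stage coefficient from equation 2a.

```latex
\operatorname*{plim}_{N\to\infty} \hat{\beta}_1
  = \frac{\operatorname{Cov}(\text{twin}_j,\ \text{quality}_{ij})}
         {\operatorname{Cov}(\text{twin}_j,\ \text{quantity}_j)}
  = \delta_1
  + \gamma\,\frac{\operatorname{Var}(\text{twin}_j)}
                 {\operatorname{Cov}(\text{twin}_j,\ \text{quantity}_j)}
  = \delta_1 + \frac{\gamma}{\pi_1} .
```

For a given value of γ, subtracting γ·twin from the outcome and applying standard IV therefore returns a consistent estimate of δ1, and a stronger first stage (larger π1) shrinks the distortion caused by any given γ.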

Conley et al.’s methods are ideally suited to the empirical application of this paper because

they show that their bounds are most informative when the instruments are strong, and the

twin instrument is strong (evidence below). In section 5.1, we provide evidence that leads us to

suspect that γ will not equal zero. Specifically, γ will reflect the effect of unobserved maternal

health on child quality, interacted with the degree to which twin mothers are healthier than

non-twin mothers.9

8 Conley et al. (2012) state that “Manski and Pepper (2000) consider treatment effect bounds with instruments that are assumed to monotonically impact conditional expectations, which is roughly analogous to assuming γ ∈ [0,∞]”. The procedure we follow here is hence an extension of the Manski and Pepper procedure.

9 If one or other of these conditional correlations is equal to zero, IV estimates will not be inconsistent. Section 5.1 only shows that twin mothers are healthier than mothers of singletons. To complement this, we also show below a series of positive associations of maternal health and both investments in children and child outcomes. We also discuss how this can be estimated in reduced form from natural experiments in particular settings.

Conley et al. (2012) show that bounds for the IV parameter β1 from equation 2b can

be generated under a series of assumptions regarding γ. These include a simple assumption

regarding the support of γ (their “Union of Confidence Intervals”, or UCI, approach), or a

fully specified prior for the distribution of γ (their “Local to Zero”, or LTZ, approach). In the

latter case, a correctly specified prior often leads to tighter bounds. We follow both strategies:

the first is agnostic, placing little structure on the violation of the exclusion restriction by

simply allowing a large range for γ, while the second involves estimating γ as an auxiliary

model parameter.
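
The union-of-confidence-intervals idea can be sketched as follows. This is a simplified illustration, not the authors' code: it has no controls, uses homoskedastic standard errors, and the helper names (`iv_with_se`, `uci_bounds`), the support chosen for γ and all simulated values are assumptions made for the example.

```python
import numpy as np

def iv_with_se(y, x, z):
    """Just-identified IV slope of y on x (instrument z) with a homoskedastic s.e."""
    yc, xc, zc = y - y.mean(), x - x.mean(), z - z.mean()
    b = (zc @ yc) / (zc @ xc)
    resid = yc - b * xc
    sigma2 = (resid @ resid) / (len(y) - 2)
    se = np.sqrt(sigma2 * (zc @ zc)) / abs(zc @ xc)
    return b, se

def uci_bounds(y, x, z, gamma_grid, crit=1.96):
    """Union of confidence intervals over an assumed support for gamma."""
    intervals = []
    for g in gamma_grid:
        b, se = iv_with_se(y - g * z, x, z)  # impose gamma = g, then run standard IV
        intervals.append((b - crit * se, b + crit * se))
    return min(lo for lo, _ in intervals), max(hi for _, hi in intervals)

rng = np.random.default_rng(2)
n = 400_000
delta_true, gamma_true = -0.10, 0.10  # assumed trade-off and exclusion violation

v = rng.normal(size=n)                       # confounder of fertility and quality
twin = (rng.random(n) < 0.02).astype(float)  # instrument with a direct effect gamma
quantity = 3.0 + twin - 0.5 * v + rng.normal(size=n)
quality = delta_true * quantity + gamma_true * twin + v + 0.5 * rng.normal(size=n)

b_naive, se_naive = iv_with_se(quality, quantity, twin)
lower, upper = uci_bounds(quality, quantity, twin, np.linspace(0.0, 0.2, 21))

print(f"naive IV {b_naive:.3f} (s.e. {se_naive:.3f}); UCI bounds [{lower:.3f}, {upper:.3f}]")
```

Because the assumed support contains the value of γ used to generate the data, the resulting interval covers the true coefficient even though the naive IV estimate is biased upward. A wider support gives wider, more conservative bounds.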

In general, the Conley et al. (2012) procedure relies on additional assumptions, as we must

form a prior over the magnitude of the failure of the exclusion restriction, while in Nevo and

Rosen (2012) we only need to provide the sign.10 The advantage of the Conley et al. procedure,

which makes it worthwhile despite its stronger assumptions, is that it potentially returns tighter

bounds on both the upper and lower end, while Nevo and Rosen retains the original IV upper

bound and only tightens the lower bound using information from the original OLS estimates.

4 Data and Descriptive Statistics

Throughout, we produce OLS and twin-IV estimates employing microdata from the US

and from a sample of 68 developing countries. In order to estimate the (health and SES

augmented) specification 1, we require information on sibling-linked births, measures of child

quality and characteristics of the mother that include indicators of her health in addition to

the more commonly available age, race and education. The data we use are chosen to satisfy

these requirements. These are the US NHIS, which have been fielded in an identical way

from 2004 to 2014, and the DHS for 68 countries, which have been applied over 20 years using a

broadly similar design.

In both data sets, children are included in the sample if aged between 6 and 18 years

10 It is worth noting, however, that Conley et al.’s procedure allows for cases where the prior over γ is of indeterminate sign, which Nevo and Rosen (2012) does not.

when surveyed. While ideally we would observe completed education, to our knowledge no

large datasets are available measuring child’s completed education, mother’s total fertility,

and a wide range of maternal health measures taken before the birth of the child. We would

have liked to use the data used in recent prominent studies of the QQ trade-off (Black et al.,

2005; Angrist et al., 2010; Mogstad and Wiswall, 2016), but the Israeli data do not contain

indicators of maternal condition or maternal behaviours, and the Norwegian data are not

publicly accessible, and additionally contain very few markers of maternal health.

A measure of child ‘quality’ available in both data sets is educational attainment. Since the

children are 6-18 and in the process of acquiring education, we use an age-standardized z-score.

In the DHS, the reference group consists of children in the same country and birth cohort, while

in the NHIS, it consists of children with the same month and year of birth. Thus coefficients

are expressed in standard deviations. While in the developing country setting relative school

progress is an appropriate measure of child human capital given high rates of dropout and/or

over-age school entry, this is not the case in the USA. In these data, grade-retention is a

relevant measure of educational progress. It is estimated that between 2 and 6% of children

are held back at least one grade in primary school (Warren et al., 2014). Grade retention has

also been documented to have substantial subsequent impacts on school drop-out and long

term attainment (Manacorda, 2012). The NHIS also provides a subjectively assessed binary

indicator of child health (excellent or not), which we model as an additional indicator of child

quality.11 Case et al. (2002) have demonstrated that an identical self-reported measure of

health predicts mortality and morbidity in the US population. Further details on all variable

definitions are provided in Appendix B.
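
The age-standardized z-score described above can be sketched as a within-group standardisation. This is an illustration only: the field names (`country`, `birth_year`, `years_edu`) and the toy records are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which variant is used.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardise_within_group(records, value_key, group_keys):
    """Add a z-score of `value_key`, computed relative to each record's reference group."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[k] for k in group_keys)].append(r[value_key])

    # Mean and (population) standard deviation for each reference group.
    stats = {g: (mean(vals), pstdev(vals)) for g, vals in groups.items()}

    scored = []
    for r in records:
        m, s = stats[tuple(r[k] for k in group_keys)]
        z = (r[value_key] - m) / s if s > 0 else float("nan")
        scored.append({**r, "z": z})
    return scored

# Hypothetical children: the reference group is country by birth cohort, as in the DHS.
children = [
    {"country": "A", "birth_year": 2000, "years_edu": 4},
    {"country": "A", "birth_year": 2000, "years_edu": 6},
    {"country": "A", "birth_year": 2000, "years_edu": 8},
    {"country": "B", "birth_year": 2000, "years_edu": 2},
    {"country": "B", "birth_year": 2000, "years_edu": 4},
]
scored = standardise_within_group(children, "years_edu", ["country", "birth_year"])
for row in scored:
    print(row)
```

For the NHIS, the same function would be called with month and year of birth as the grouping keys.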

Appendix Table A2 provides summary statistics for the DHS and NHIS data. Fertility and

maternal characteristics are described at the level of the mother, while child education, and

11 While we would also like to analyze a health measure in the developing country sample, anthropometrics are only available for births that occur within five (or fewer) years of the survey, and infant mortality is unsuitable as the twin-IV estimator involves analysing child quality for children born prior to twins who will have already been fully exposed to infant mortality risk by the time the twins were born.

health outcomes are described at the level of the child. Twin births make up 1.98% of all births

in the DHS sample, and 2.57% in the NHIS sample. As expected, twin families are larger

than non-twin families. Figure 1 describes total fertility in twin and non-twin families. The

distribution of family size in families where at least one twin birth has occurred dominates the

distribution for all-singleton families in both the DHS sample (Figure 1a) and the US sample

(Figure 1b). This establishes the relevance (power) of the twin instrument for fertility, which

is formally assessed below.

Estimation Samples. Studies that instrument fertility with the occurrence of a twin birth

leverage the unexpected additional child to study impacts on outcomes of siblings born before

the additional child. Define families with at least two birth events as 2+ families. In this group,

we shall compare families in which twins occur at the second birth event (treated group) with

families in which a singleton occurs at second order (control group). The subjects, for whom

we measure indicators of child quality (proxies for parental investment) are the first-born

children. Following Black et al. (2005), we similarly construct a 3+ sample which consists of

families with at least three birth events and then we compare outcomes for the first two births

across families that have a twin birth at order three (treated) and families that have a single

birth at order three (control). Many existing studies, such as Angrist et al. (2010), focus upon

the 2+ and 3+ samples. Given higher fertility rates in the developing country sample that we

analyse, we also include 4+ families in which twins occur at fourth order and outcomes are

studied for the first three births.
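
The construction of the n+ samples can be sketched on toy data. This is a hypothetical illustration rather than the authors' data-processing code: each family is represented as an ordered list of birth events (1 for a singleton, 2 for twins), and the function name and record fields are invented for the example. How earlier multiple births are handled is not stated in the text, so skipping them here is an assumption.

```python
def build_n_plus_sample(families, n):
    """Return one row per subject child: children born before birth event n
    in families with at least n birth events."""
    rows = []
    for fam_id, events in families.items():
        if len(events) < n:
            continue  # family never reaches the n-th birth event
        twin_at_n = int(events[n - 1] >= 2)  # instrument: twins at the n-th event
        for order, size in enumerate(events[: n - 1], start=1):
            if size != 1:
                continue  # assumption: only earlier singleton births are subjects
            rows.append({"family": fam_id, "birth_order": order, "twin_at_n": twin_at_n})
    return rows

# Hypothetical families; the children born at event n itself are never subjects.
families = {
    "f1": [1, 2],        # twins at the second birth event
    "f2": [1, 1, 1],     # all singletons
    "f3": [1],           # a single birth event, so in no n+ sample
    "f4": [1, 1, 2, 1],  # twins at the third birth event
}

two_plus = build_n_plus_sample(families, 2)
three_plus = build_n_plus_sample(families, 3)
print(two_plus)
print(three_plus)
```

In the 2+ sample the subjects are first-born children and the treated family is f1. In the 3+ sample the subjects are the first two children of f2 and f4, with f4 treated.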

Restricting the sample to families with at least n births in this way primarily ensures that

we avoid selection on preferences over family size. It also addresses the potential problem

that, since the likelihood of a twin birth is increasing in birth order (see Appendix Figures A5

and A6), increasing family size raises the chances of having a twin birth. In the DHS sample,

42% of all children are in one of the 2+, 3+ or 4+ samples. In the US sample, this value is

13

45%. Children will be in none of these samples if they are either high birth order children, or

if they are low birth order children who do not have older siblings.

5 Results

5.1 Twin Births and Maternal Condition

In Bhalotra and Clarke (forthcoming) we document that mothers with greater health stocks

prior to conception or those who engage in more healthy behaviours or are in a healthier

environment during pregnancy are more likely to take twins to term. In other words, twins

are born to selectively healthy mothers. In order for this to invalidate twin-IV estimates,

two conditions must be satisfied. First, twins must be non-random conditional on observable

controls (non-independence) and second, twins must have an impact on the outcome of interest

beyond that mediated by fertility (non-excludability). Here we document that this is the

case in the two data samples used in this paper, and direct readers to Bhalotra and Clarke

(forthcoming) where additional evidence in other contexts is presented.

Using the two data sets analysed in this paper, we regress the probability of a twin birth

on indicators of maternal health, holding constant socioeconomic status and demographic

characteristics. In the US sample (which is much smaller, limiting statistical power, see

Table 1), twinning is positively associated with mother’s education and BMI, and negatively

associated with the mother’s smoking status prior to the birth. The smoking indicator is

statistically significant even in the pre-IVF period. In Bhalotra and Clarke (forthcoming)

we use the universe of births in the US, between 2010 and 2013, and after removing births

assisted by Artificial Reproductive procedures such as IVF, we document negative associations

of twinning with diabetes and hypertension before pregnancy, with smoking before and during

pregnancy and with being short or underweight before pregnancy.12

12 To the extent that educated women exhibit healthier behaviours (Currie and Moretti, 2003; Lleras-Muney and Lichtenberg, 2005), education may influence twin births via its impact on health-related behaviours that we do not have the data to capture directly.

In the developing country sample (Table 2), we observe that, conditional on maternal

age and country and year of birth fixed effects, twin births are positively associated with the

mother’s education and health, proxied by her height and body mass index (BMI). This result

holds even in a period before IVF became available (column 5), and in both low and middle

income countries. We also identify a statistically significant positive impact of public health

availability on the likelihood of twinning (column 6).13

We also investigated whether the source of twin non-randomness additionally has a direct

effect on the outcome of interest. This seems plausible since mothers with better health stocks

and mothers engaging in positive behaviours prior to pregnancy are likely to be healthier

themselves and have stronger preferences over health and educational investments in children

following pregnancy, with direct impacts on child outcomes. Evidence of positive causal

effects of maternal health on child health or education is harder to find, but evidence

of associations for health is in Uggla and Mace (2016) and Kahn et al. (2002). We document

similar associations using our analysis samples. The US results are in Table 3. We regress

available measures of child investment (whether the child has any type of health coverage)

and outcomes (whether the child has any health limits, the child’s standardised educational

achievement, and whether the child is classified by parents as being in excellent health), on

the maternal characteristics documented to predict twinning in this sample. In each case, we

observe that positive maternal health measures are correlated with a reduced likelihood of

having health limitations or not having insurance (columns 1-2), and correlated with positive

measures of human capital outcomes (education and self-informed health status; columns 3-

4). The developing country results are in Table 4. Maternal height, BMI and education are

all positively associated with the likelihood of making more positive antenatal investments

in child outcomes (the number of appointments, and the likelihood of giving birth at home

rather than in a medical centre). We also see impacts of the same maternal health indicators

on the child’s education.14

13 We include indicators of prenatal care by doctors or nurses in the mother’s DHS cluster, rather than the mother’s uptake, as this is potentially endogenous to birth type.

In summary, there is compelling evidence that mothers of twins are selectively healthy.

There is also suggestive evidence that healthier women make greater investments in children

and that their children have better human capital outcomes. We will test this more formally

when progressively introducing controls in IV models in the following section.

5.2 The QQ Trade-off

We now turn to estimates of the QQ trade-off. We initially present the routine OLS and twin-

IV estimates since, under the assumptions about selection into fertility discussed in section 3.1,

these provide bounds on the true parameter. In each case, we show how these estimates are

modified upon addition of available controls for the mother’s health. So as to ascertain that

the indicators of health are not simply proxying for socio-economic status, we also introduce

controls for mother’s education. Our expectation is that the introduction of controls will

tighten the bounds, diminishing the size of the trade-off estimated by OLS and increasing

the size of the IV estimated trade-off. The former would confirm the hypothesis of negative

selection into fertility and the latter would confirm positive selection into twin birth, affording

a direct test of our hypothesis that the twin-IV estimator understates the trade-off by virtue of

twins being born to healthier mothers.

5.2.1 OLS Estimates

OLS results for both samples are in Table 5. We consistently control for fixed effects for

age of the child, age of the mother at birth, and the year of the survey. In the developing

country sample we also condition on country fixed effects, and in the US sample on census

region and mother’s race fixed effects. We additionally show results with birth order controls.

14 The maternal health indicators are also all positively associated with infant survival; the reason this is not displayed is that we do not analyse infant survival as an outcome for the reasons indicated in footnote 11 above.

The available controls for mother’s health are height, BMI and cluster-level health service

availability in the developing country sample, and BMI and a self-reported assessment of own

health on a Likert scale in the US sample. In both samples, the control for socioeconomic

status is years of education of the mother (see Table A2 for summary statistics of these

variables) and in the developing country sample we also control for the wealth quintile of the

family.

The introduction of observable controls, first for mother’s health and then also for her

education progressively reduces the estimated trade-off to nearly half of the initial value in

both samples, consistent with negative fertility selection. The adjusted estimates for education

in developing countries are between 6.6 and 8.5% of a standard deviation. In the US they are

between 1 and 2.5% for education and between 0.3 and 1.7% for health status. The Altonji

et al. (2005) statistic for the DHS sample suggests that unobservable characteristics of the

mother would need to be about 1 to 1.2 times as important as observables for these estimates

of the QQ trade-off to be entirely driven by selection into fertility. The corresponding ratio in

the US varies between 1 and 3. In developing countries, the estimated education-fertility

trade-off is decreasing in the birth order at which twins (the additional child) occur, i.e. it is

largest in the 2+ sample and smallest in the 4+ sample. In the US, the trade-off is similar for

the 2+ and 3+ samples and smaller and insignificant in the 4+ sample. However, for health,

this “gradient” is reversed and the largest child health–fertility trade-off is in the 4+ sample

and the smallest in the 2+ sample. In contrast to the case in Black et al. (2005), the controls

for birth order do not eliminate the trade-off (Appendix Tables A3 and A4).

5.2.2 IV Estimates with the Twin Instrument

IV estimates using the twin instrument are in Tables 6 (DHS) and 7 (US), with the first-stage

estimates in panel A and the second-stage estimates in panel B. In these tables we present coefficients

on the variable of interest (fertility) only; full output for all coefficients is provided in Appendix

Tables A5, A6 and A7.

IV Estimates: Developing Countries. The first stage estimates demonstrate the well-

known power of the twin instrument. It consistently passes weak instrument tests (the

Kleibergen-Paap rk statistic and its p-value are presented in panel A). The point estimates

indicate that the incidence of twins raises total fertility by about 0.7 to 0.8 births. That

this estimate is always less than one is in line with other estimates in the twin literature

and is evidence of partial reduction of future fertility following twin births (compensating

behaviour). Consistent with this, the first stage coefficient is increasing in parity. In panel

B, the first column (“Base”) for each parity group presents estimates of β1 from equation 2a

using the current state of the art twin-IV 2SLS estimator. In each of the three samples, in line

with the findings of recent studies (Angrist et al., 2010; Black et al., 2005; Caceres-Delpiano,

2006; Fitzsimons and Malde, 2014; Aslund and Gronqvist, 2010), we find no significant QQ

trade-off. This is not simply because IV estimates are less precise than OLS estimates (as

emphasized in Angrist et al. (2010)), rather, the coefficients are much smaller.

Consistent with our hypothesis and the evidence we present in section 5.1 that twin mothers

are positively selected on health (and education), we see that upon introducing controls for

maternal selectors of twinning, a QQ trade-off emerges in the 3+ and 4+ samples, even though

the available controls are almost certainly a partial representation of the range of relevant

facets of maternal health stocks, health-related behaviours and environmental influences on

foetal health. The bias adjustment is meaningful and statistically significant. In the 3+

sample, the commonly estimated specification produces a point estimate of 2.8% which is

not statistically significant, and partial bias adjustment raises this to 4.1% (conditional on

maternal health indicators) or 4.6% (if mother’s education is also included). In the 4+ sample,

the corresponding figures are 2.7% and 3.7%.
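The mechanism described above can be illustrated with a small simulation. The sketch below is not the paper's estimation code: the data are synthetic, all parameter values (a true effect of -0.05, a first stage of 0.75, and so on) are assumptions chosen for illustration, and `iv_coef` is a hypothetical helper implementing a just-identified IV estimator with NumPy. Twinning is made more likely for healthier mothers, and maternal health also raises child quality directly, so the instrument is valid only conditional on health.

```python
import numpy as np


def iv_coef(y, endog, instrument, controls=None):
    """Just-identified IV coefficient on `endog`, with an intercept and
    optional exogenous controls (hypothetical helper, illustration only)."""
    n = y.shape[0]
    exog = np.ones((n, 1)) if controls is None else np.column_stack([np.ones(n), controls])
    X = np.column_stack([endog, exog])        # regressors: fertility + exogenous terms
    Z = np.column_stack([instrument, exog])   # instruments: twin + exogenous terms
    beta = np.linalg.solve(Z.T @ X, Z.T @ y)  # (Z'X)^{-1} Z'y
    return beta[0]


rng = np.random.default_rng(0)
n = 50_000
TRUE_BETA = -0.05                             # assumed true QQ trade-off

health = rng.standard_normal(n)               # maternal health (observed here)
# Positive selection into twinning: healthier mothers are more likely to have twins.
twin = (rng.random(n) < 0.2 + 0.2 * (health > 0)).astype(float)
# Negative selection into fertility: healthier mothers have fewer children.
fertility = 2 + 0.75 * twin - 0.5 * health + rng.standard_normal(n)
quality = TRUE_BETA * fertility + 0.3 * health + rng.standard_normal(n)

iv_base = iv_coef(quality, fertility, twin)               # no health control
iv_ctrl = iv_coef(quality, fertility, twin, health)       # conditioning on health
ols = iv_coef(quality, fertility, fertility)              # OLS as a special case
```

Under these assumptions the baseline IV coefficient lies above the true effect, the health-adjusted IV coefficient moves in the negative direction towards it, and OLS lies below it, which is the bounding pattern discussed in the text. In real data maternal health is only partly observed, so the adjustment is partial.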

While one way to compare the base and full control specifications is to test whether each

coefficient differs from zero, an alternative test is to compare the estimated coefficients (and

standard errors) to each other. We thus also test each coefficient compared to the “Base”

coefficient, and present the p-values of this test as “Coefficient Difference” at the foot of

panel B. We can often reject equality of the coefficients in the specifications with and without

controls for maternal health. Implementing these tests requires that we take account of the

correlations between error terms in each model. In order to do this we replicate IV estimates

using GMM, which allows us to estimate models simultaneously and hence compare coefficients

across models. Additional details related to this test are provided in Appendix C.

IV Estimates: United States. The first stage estimates for the US sample (Table 7) are

similar to those for the developing country sample, with a twin birth at parity 2, 3 or 4 leading

to an additional 0.7 to 0.8 total births. The second stage estimates also follow a similar pattern

insofar as the baseline specification indicates no significant relationship between twin-mediated

increases in fertility and either the indicator of school progression, or the indicator of child

health. However, upon the introduction of controls for maternal health and education, the

coefficient describing the QQ trade-off tends to increase in magnitude. In the case of education,

it grows more negative in each sample and is statistically significant in the 2+ sample, with a

point estimate of 10.2%. When child quality is indicated by health, the point estimate in the

2+ sample remains insignificant but in the 3+ and 4+ samples it grows more negative and

in the 3+ sample it is statistically significant at 5.9%. Notice that the USA samples range

between about 21,000 and 61,000 individuals while the developing country data samples range

between about 260,000 and 400,000, so we have more limited statistical power with the US

data. As discussed earlier in this section, it is well recognised that twin-IV estimates are often

not precise. So it is quite striking that we find a significant trade-off for education and health.

Overall, partial bias adjustment reveals a statistically significant QQ trade-off for education

in the 2+ sample (comprising about 50% of the total sample) and for health in the 3+ sample

(comprising about a third of the total sample).

Recent work suggests that focusing on monozygotic (MZ) rather than dizygotic (DZ) twins

may resolve issues related to the heritability of twinning and relationships between twinning

and some maternal characteristics (Farbmacher et al., 2016). While we cannot observe whether

a twin pair are MZ or DZ in either of our data sources, when we use only same sex twins to

construct the twin instrument, as they are considerably more likely to be MZ, we observe a

similar pattern, where once again estimates diverge from zero and become significant when

controls for maternal health are included. Results for the DHS are in Table A8 and for the

NHIS in Table A9.

5.2.3 Non-Linear Models

Theoretical statements of the QQ model tend to assume, for simplicity, that all children in

a family have the same endowments and receive the same parental investments. More recent

work, for example the theoretical work of Aizer and Cunha (2012), and empirical papers by

Rosenzweig and Zhang (2009); Brinch et al. (2017); Mogstad and Wiswall (2016); Bagger et al.

(2013) relax this assumption. Among other things, this allows for reinforcing or compensating

behaviours in parental investment choices (Almond and Mazumder, 2013). This implies that

we should allow the coefficient β1 to vary across children in the family.

Using DHS data for which we have a sufficiently large sample to split instruments, we

re-estimate our regressions following the non-linear marginal fertility models of Brinch et al.

(2017) and Mogstad and Wiswall (2016). We provide a full discussion of the methodology in

Appendix D, and in the analysis below we follow the procedure laid out by Mogstad and

Wiswall (2016) precisely. Models of this type loosen the linear marginal effects estimated on

fertility, and allow for a one-unit shift in fertility at different birth orders to have potentially

varied impacts on existing children.

We report the restricted (linear) and non-restricted (non-linear) IV models in Table 8, and

the corresponding first stage results in Appendix D (Appendix Table A14). We report results

by the same parity samples as the main IV results presented in Table 6.

In Table 8 we observe, firstly, that as in Table 6, the linear estimates are

universally lower (more negative), and often become statistically distinguishable from zero, when partially

controlling for the selection of twins as compared to the baseline estimate not controlling for

twin selection. These results only differ from those reported earlier in that we now restrict

the sample to families with 6 children or fewer in line with results reported in Mogstad and

Wiswall (2016), which involves a loss of between 5 and 18 percent of the sample depending

on the parity sample used. For full descriptives on family size in each parity group refer to

Appendix Figure A7. Turning to panel B, we observe a non-linear dynamic similar to that

reported in Mogstad and Wiswall (2016). For example, in the two-plus sample, we observe that

the twin instrumented estimate of the effect of moving from one to two siblings is large and

positive, while the impact of moving from two to three siblings is large and negative. However,

most interestingly for the present analysis, the non-linear impacts are nearly universally larger

in absolute terms when partially controlling for twin selection. As was the case with the linear

model, we observe that the marginal fertility effects become nearly everywhere more negative,

and in certain cases become statistically different from zero. Thus, our finding that the twin-

IV estimator tends to under-estimate the causal effect of fertility on child human capital holds

in the linear and non-linear specifications.

5.2.4 IV effect sizes in perspective

Since the QQ trade-off has been called into question, it is important to consider the size of

the partially-bias-adjusted estimates and not just their sign and statistical significance. Our

results (in the linear model) imply that an additional birth in a family is associated with 0.17

fewer years of completed education (developing countries) or 0.22 fewer grades progressed

(USA). In a widely cited study, Jensen (2010) shows that providing students with information

on the returns to secondary school in their area led, on average, to their completing 0.20-0.35

more years of school over the next four years. In a similarly high-profile experiment, Baird

et al. (2016) find that de-worming in school led to an increase of 0.26 years of schooling and

Bhalotra and Venkataramani (2013) find that a 1 s.d. decrease in under-5 diarrheal mortality

(11 deaths per 1000 live births) is associated with girls growing up to achieve an additional 0.38

years of schooling, while both studies find no increase in school years for boys. Almond (2006)

finds that foetal exposure to influenza in 1918 was associated with 0.126 years (1.5 months)

less schooling at the cohort-level and Bhalotra and Venkataramani (2014) show that exposure

to antibiotic-led reductions in pneumonia in infancy resulted in individuals completing 0.7

additional years of education in adulthood relative to unexposed cohorts. The PROGRESA

cash transfer in Mexico is estimated to have generated a 0.66 increase in years of schooling

(Schultz, 2004).

If we consider grade retention in the US, our estimates suggest that an additional birth

results in 0.22 fewer years completed. This is of similar magnitude to estimates of the effect

of an additional 1,000 grams of birthweight over the normal birthweight range (a 0.31 increase

in years of schooling) in Royer (2009), and estimates of the impact of historical exposure to

high rather than low malaria rates (a 0.4 year reduction) in Barreca (2010). Turning to the

effects on health, we find that an additional birth (at order 3 or 4) reduces the likelihood that

siblings are in excellent health by between 3 and 6%. Almond and Mazumder (2005) document

that in the long-run, the 1918 influenza pandemic increased the likelihood of being in poor or

fair health (the inverse of our health measure) by 10%. Overall, the adjusted estimates are

of a size that it is not prudent to dismiss. Moreover, our estimates indicate the change in

investment (education or health) for one additional birth but, as fertility rates remain high in

many developing countries, the total effect can be large.

5.3 Bounding the QQ Trade-off

5.3.1 Generalised Bounds

The adjusted twin-IV results will not provide consistent estimates of β1 as there are almost

certainly omitted indicators of maternal health. Although documenting that observable measures

of health (which also impact child quality) are correlated with the instrument does not

prove instrumental invalidity, it does suggest that it is highly likely that similar non-observable

factors will also be correlated, thus resulting in invalidity. A recent study proposes a formal

test of instrument invalidity (Kitagawa, 2015). Using the 2+ sample for the DHS data this

test rejects the validity of the twin instrument – see Appendix Figure A8 and Table A10;

however this test is sensitive to curse of dimensionality considerations, and so to implement

it we had to simplify the specification of controls.15 We do not report results for the NHIS

data because the sample is too small to obtain informative confidence intervals.

Rather than discard the twin-IV estimator altogether, we harness its power in predicting

fertility using IV bounds to assess the empirical significance of the omitted variables. As

outlined in section 3.2, we begin by estimating Nevo and Rosen (2012) bounds. These are

based on the assumptions that twins are positively selected and fertility is negatively selected.

Evidence for both of these assumptions is in Tables 5 and 6-7 where it is observed that

controlling for education and health results in the OLS coefficients on fertility growing less

negative and the twin-IV coefficients on fertility growing more negative. It is further assumed that

twinning is a less endogenous variable than fertility. The bounds are in Table 9 (columns 2-3; IV

point estimates are presented for comparison in column 1). These estimates provide a lower

bound on the QQ parameter estimated in Tables 6-7 of approximately 5-8% of a standard

deviation across the DHS and NHIS samples.16 As discussed in section 3.2, the upper bound in

Nevo and Rosen’s bounding procedure is determined by the upper bound of the 95% confidence

interval of the original twin IV estimates. As such, estimates which are not significant at 95%

confidence levels in Tables 6-7 will once again be non-informative when using the Nevo and

Rosen (2012) procedure.

15 In particular, the inclusion of a large number of fixed effects is prohibitive, and so we replace country and mother year of birth fixed effects with continent and decade of birth fixed effects respectively.

16 The NHIS data contain only 21,000-61,000 observations (depending on the parity sample), about 10-15% of the DHS sample. As highlighted by Angrist et al. (2010), the twin IV estimator is typically under-powered. When we construct bounds, we further challenge statistical power. So the bounds for the NHIS sample are often imprecise, irrespective of whether we use Conley or Nevo Rosen bounds. As a result, in general here we focus on bounds in the developing country sample, although we present bounds from both settings.

In order to gain additional precision in bounds estimates at the upper bound, we also

estimate Conley et al. (2012) bounds. As discussed, we need to define a prior belief over the

sign and magnitude that the coefficient on twin birth (γ) would take in equation 4. To begin,

we assume a range of values for γ from 0 to 0.05, or 5% of a standard deviation, in which case

instrument validity is violated, and having a twin mother has a positive effect on child quality

conditional on fertility. The results are in Figure 2 for developing countries and Figure 3 for

the US for the 3+ samples; results for the 2+ and 4+ samples are in Appendix Figures A9 and

A10. We assume γ ∼ U(0, δ) with δ displayed on the x-axis. Thus, when δ = 0, γ is exactly

0, and the bounds collapse to the 95% confidence interval for the traditional IV estimate.

Given that twin IV estimators tend to produce wide confidence intervals (Angrist et al.,

2010), Conley et al. (2012) bounds will also tend to be wide. As δ increases, the violation of

the exclusion restriction increases. We observe, firstly, a widening of the estimated bounds as

the size of the violation increases,17 and secondly that the upper bound becomes increasingly

negative, moving in the direction of finding a QQ trade-off.18 In both figures the vertical red

line displays our preferred estimate for γ, the estimation of which we discuss further below.

For developing countries and for the US (when the outcome is a measure of child health,

but not for education, where the estimates are considerably less precise) we observe baseline

IV results with bounds that are not informative of the sign of the trade-off when the exclusion

restriction is assumed to hold exactly. However, as γ grows, the bounds do quickly become

informative, suggesting that with a γ as low as 0.002 in the US or 0.008 in developing countries,

a significant QQ trade-off emerges. While using an interval of values for γ has the advantage

of being unrestrictive (0.05 is a very large violation of the exclusion restriction), the bounds are

quite wide.

17 As Conley et al. (2012) discuss, the degree of failure of the exclusion restriction is analogous to sampling uncertainty related to the IV parameter β1. As the exclusion restriction is increasingly relaxed, the “exogeneity error” (in Conley et al.’s terminology) related to the instrument inflates the traditional variance-covariance matrix.

18 This is in line with the twin-IV estimates becoming more negative upon including controls that mitigate the omitted variable bias which leads to violation of the exclusion restriction.

With a view to improving the precision and relevance of these bounds, we estimate rather

than assume γ, the measure of the extent of the violation of the exclusion restriction. This

is (as usual) the product of two relationships which, here, are the relationship between the

probability of a twin birth and maternal health, and the relationship between maternal health

and investments in children. The data requirements for this are non-trivial—we need data

on two generations, with an exogenous shock to maternal health in the first generation, and

measures of child quality in the second generation. For this, we exploit natural experiments

in the US and Nigeria. This is in line with Conley et al. (2012) who illustrate their estimator

with examples involving back of the envelope calculations of γ for each case. In Appendix E

we detail how we leverage two historical natural experiments involving a shock to the health

of women, namely, the Biafra war in Nigeria and the introduction of the first antibiotics to the

US, to estimate γ.19 We also conduct a number of back of the envelope plausibility tests. In

general these suggest that γ is around 0.004-0.006, or that having a (positively-selected) twin

mother has a direct effect of around 0.4 to 0.6% of a standard deviation in quality outcomes.

As we outline at more length in Appendix E.1 and E.2, the generation of this estimate

for γ is based on particular shocks which impact maternal health. We present evidence

supporting the assumptions for these estimates in Appendix E.3, and put these in the context

of Conley et al.’s methods in Appendix E.4. These reduced form estimates of γ based on

exogenous events provide a well founded estimate to use in the Conley et al. (2012) procedure,

but one may be concerned about external validity of these estimates, given that they are

derived from 1930s America (sulfa drugs) and 1970s Nigeria (Biafra war). We can, however,

show that estimates of γ from contemporary DHS data (which are used in the main analysis

and hence relevant for estimates of γ) are in fact of the same order of magnitude as our

estimates from America and Nigeria. Consider Appendix Table A11, which shows that a

one standard deviation change in maternal BMI is associated with a 0.070 s.d. increase in

19 We are agnostic about intermediate variables (“mediators”) and simply show what is key to the violation of the twin instrument, which is that the exogenous shock to maternal health impacts on an indicator of child quality. We then scale this estimate by the difference in health between women who give birth to twins and women who give birth to singletons.

the child’s educational Z-score (column 4). We observe that twin mothers in the same data

sample have BMI 0.050 s.d. higher than non-twin mothers. Scaling (multiplying) this by

the estimated association (0.070×0.050) produces an estimate of γ (a measure of the

violation in the exclusion restriction, or the twin-mediated effect of maternal BMI on child

outcomes) of 0.0035 s.d. This is of the same order of magnitude as the value of γ that we

estimate from the Biafra (0.0040) and Sulfa (0.0062) case studies. We can calculate a range

of such estimates using education and height (as well as BMI), and find values of 0.0026 for

education (0.215×0.0121) and 0.00196 for height (0.019×0.103). Importantly, all of these

values fall within the estimated distributions of γ used to calculate Conley et al. bounds,

displayed in Appendix Figure A13.
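The back-of-the-envelope scaling in this paragraph is a simple product, sketched below using only the figures quoted in the text. The function names are hypothetical, and "same order of magnitude" is operationalised here (as an assumption) by comparing the integer part of the base-10 logarithm.

```python
import math


def gamma_back_of_envelope(assoc_sd, twin_gap_sd):
    """Scale the association between a maternal characteristic and child
    quality (in s.d.) by the twin/non-twin gap in that characteristic (in s.d.)."""
    return assoc_sd * twin_gap_sd


def same_order_of_magnitude(a, b):
    """Assumed definition: both values share the same power of ten."""
    return math.floor(math.log10(abs(a))) == math.floor(math.log10(abs(b)))


# Maternal BMI: 0.070 s.d. association, twin mothers 0.050 s.d. higher.
gamma_bmi = gamma_back_of_envelope(0.070, 0.050)
# Height: the two factors are quoted in the text without labels.
gamma_height = gamma_back_of_envelope(0.019, 0.103)

# Compare with the natural-experiment estimates quoted in the text.
GAMMA_BIAFRA, GAMMA_SULFA = 0.0040, 0.0062
bmi_comparable = (same_order_of_magnitude(gamma_bmi, GAMMA_BIAFRA)
                  and same_order_of_magnitude(gamma_bmi, GAMMA_SULFA))
```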

It is important to note that while the degree of the violation of the exclusion restriction is

estimated to be relatively small (at 0.4 or 0.6 percent of a standard deviation of the child

quality measure, education), γ is obtained after scaling the estimated impact of maternal

health shocks/characteristics (that predict twinning) on the final outcomes of interest (child

quality indicators). The scaling factor is the difference in the maternal health indicator be-

tween mothers of twins and mothers of singletons – this is 0.050 in the BMI example above, it

is 0.125 in the sulfa experiment, and 0.267 in the Biafra experiment (all figures from Appendix

Table A15). Thus γ is in fact much smaller than the measure of violation that is of interest.

Using these estimates of γ, we are able to pin down the bounds described in Figures 2-

3. See Table 9, columns 4-5, where we present the UCI approach in which we assume that

γ ∈ [0, 2γ̂], where γ̂ is the estimate described in Appendix E. This assumption is chosen such that γ̂ will in each

case lie precisely in the middle of the assumed interval, following the empirical example in Conley et al. (2012).

For the LTZ approach, we use estimates of both γ and its distribution,

which allow uncertainty for our estimates of γ and assume that γ is distributed precisely

according to the estimated empirical distribution (refer to Appendix E.5).
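A simplified version of the union-of-confidence-intervals (UCI) idea can be sketched as follows. This is not the authors' implementation: it omits all controls, assumes homoskedastic errors, uses synthetic data with made-up parameter values, and the function names are hypothetical. For each candidate γ0 on a grid over the assumed support, the outcome is adjusted to y − γ0·z, the IV confidence interval is recomputed, and the reported bounds are the union of those intervals.

```python
import numpy as np


def iv_with_se(y, x, z):
    """Just-identified IV of y on x (with intercept), instrument z,
    and a homoskedastic standard error (illustrative assumption)."""
    yc, xc, zc = y - y.mean(), x - x.mean(), z - z.mean()
    beta = (zc @ yc) / (zc @ xc)
    resid = yc - beta * xc
    sigma2 = (resid @ resid) / (len(y) - 2)
    se = np.sqrt(sigma2 * (zc @ zc)) / abs(zc @ xc)
    return beta, se


def uci_bounds(y, x, z, gamma_max, grid=21, crit=1.96):
    """Union of 95% IV confidence intervals over gamma0 in [0, gamma_max]."""
    lows, highs = [], []
    for g in np.linspace(0.0, gamma_max, grid):
        beta, se = iv_with_se(y - g * z, x, z)   # remove the assumed direct effect
        lows.append(beta - crit * se)
        highs.append(beta + crit * se)
    return min(lows), max(highs)


# Synthetic example: twin has a small direct effect (gamma = 0.005) on quality.
rng = np.random.default_rng(1)
n = 20_000
twin = (rng.random(n) < 0.3).astype(float)
fertility = 2 + 0.75 * twin + rng.standard_normal(n)
quality = -0.05 * fertility + 0.005 * twin + rng.standard_normal(n)

gamma_hat = 0.005
ci_exact = uci_bounds(quality, fertility, twin, 0.0)            # exclusion holds exactly
ci_uci = uci_bounds(quality, fertility, twin, 2 * gamma_hat)    # gamma in [0, 2*gamma_hat]
```

When `gamma_max` is zero the bounds collapse to the conventional IV confidence interval. Allowing a positive γ extends the interval in the negative direction, since the adjusted outcome attributes less of the reduced-form effect to fertility.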

Our preferred bounds estimates are those in the right-hand columns of Table 9, as these are

more efficient, being based on the estimated bootstrap distribution. For the developing country

sample, estimates of the QQ trade-off in determining educational attainment, in the 3+ and

4+ samples, are bounded between slightly less than zero and 6% of a standard deviation and

the mid-point of these bounds falls at 2.6% and 3.7% of a standard deviation respectively. An

additional sibling thus does appear to depress a child’s educational attainment.

For the US sample we see that while the mid-point of the bounds is virtually always

negative (health in the 2+ group is the only exception), the bounds are most informative for

the 2+ (education) and 3+ (health) samples. These indicate that an additional birth reduces

the grade progression of an older sibling by 16.6% of a s.d. (upper bound), or 8.3% of a s.d.

(mid point), and their likelihood of being reported as being in excellent health by 7% (upper

bound) or 3.5% (mid point).

In Figure 4 we plot the estimated coefficients and bounds for the developing country sample

altogether so they are readily compared. (The corresponding plot of all estimates for the much

smaller US sample is in Appendix Figure A11.) The figures show the OLS and IV estimates,

with base controls, health, and health and socioeconomic controls and we show the Nevo–

Rosen and Conley et al. bounds for each of the 2+, 3+ and 4+ groups. The informativeness

of the bounds is evaluated against the criteria laid out by Hotz et al. (1997): firstly do the

bounds enable us to determine if the effect is negative or positive, secondly can we reject the

point estimates of linear IV, and thirdly do our bounds allow us to reject the OLS estimate

of the causal effect. In general, for the 3+ and 4+ samples in the DHS data, the bounds are

informative of the (negative) sign of the trade-off, but not for the 2+ sample. In terms of the

second and third criteria, we can never exclude the point estimate of the original IV estimate

from our bounds, however we often can reject the original OLS estimate, which is important

given recent evidence that many IV estimates are inaccurate, and frequently include OLS

point estimates in their confidence intervals (Young, 2018).
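The three informativeness criteria amount to asking whether zero, the IV point estimate and the OLS point estimate fall inside the estimated bounds. A minimal sketch of that check follows. The function name `informativeness` and the numbers in the usage note are hypothetical illustrations, not values taken from Table 9.

```python
def informativeness(lower, upper, iv_estimate, ols_estimate):
    """Evaluate an estimated bound [lower, upper] against three criteria.

    Returns a dict of booleans:
      sign_determined - the bound excludes zero, so the sign is identified
      rejects_iv      - the bound excludes the linear IV point estimate
      rejects_ols     - the bound excludes the OLS point estimate
    """
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")

    def inside(value):
        return lower <= value <= upper

    return {
        "sign_determined": not inside(0.0),
        "rejects_iv": not inside(iv_estimate),
        "rejects_ols": not inside(ols_estimate),
    }
```

For example, with made-up bounds of [-0.06, -0.01], an IV estimate of -0.04 and an OLS estimate of -0.10, `informativeness(-0.06, -0.01, -0.04, -0.10)` reports that the sign is determined and OLS is rejected, but the IV estimate is not. This is the pattern the text describes for the 3+ and 4+ developing country samples.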

Using summary statistics from Table A2, we can convert standardised estimates from these

bounds into years of education. The effect on education of first and second-borns from having

a fertility shock at the third birth, or on first to third-borns from a fertility shock at the fourth

birth is estimated to be approximately 5% of a standard deviation in the developing country

sample.20 Using the standard deviation in the sample of 3.8 years, this implies an average

effect of around 0.19 years of education per additional sibling at the age of 13 years (the

average age in the sample). In the case of the US estimates, for the same 2+ and 3+ groups

the average estimated effect based on the midpoint of bounds estimates is 8% of a standard

deviation in grade retention, which equates to a marginal effect of 0.22 years of education by

the age of 11 years. On average the likelihood of being reported as being in excellent health

falls by 4.2% according to the midpoint of bounds following an additional birth among the

same group. Overall, these are quite large effects relative to the marginal effects of different

policy interventions considered in the literature (see section 5.2.4).
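The conversion from standardised effects to years of education is a single multiplication by the sample standard deviation. The sketch below reproduces the developing country arithmetic, using the 3.8-year standard deviation quoted above. The function name `sd_to_years` is an assumption made for the example.

```python
def sd_to_years(effect_sd, sd_years):
    """Convert an effect in standard-deviation units into years of education."""
    return effect_sd * sd_years


# Standard deviation of education in the developing country sample (from the text).
DHS_SD_YEARS = 3.8

# An effect of roughly 5% of a standard deviation per additional sibling.
central = sd_to_years(0.05, DHS_SD_YEARS)
# The lower end of the 4%-5% range quoted in the Conclusion.
low = sd_to_years(0.04, DHS_SD_YEARS)

print(round(central, 2), round(low, 2))  # prints: 0.19 0.15
```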

Conclusion and Discussion

This paper demonstrates that twin-IV estimates of the fertility-human capital trade-off tend

to be biased downward on account of positive selection of women into twin birth, a problem

that has not been previously recognized. We show that even partially correcting for twin

endogeneity is sufficient to push estimates of the trade-off up by about 3%-5% of a standard

deviation. Using partial identification to bound the effect of child quantity on child quality

suggests that the true effect size may be as high as 8% of a standard deviation, though it is

typically centered around 3%-5% of a standard deviation.

We conclude that additional unexpected births do have quantitatively important effects on

their siblings’ educational outcomes. The estimated 4%-5% of a standard deviation increase

is equivalent to an additional 0.15 to 0.19 years in the classroom in the developing country

sample, and estimates of approximately 8% of a standard deviation in the US account for 0.22

more grades progressed on average. As detailed in the Introduction, the implications of these

findings are far-reaching, not only in terms of vindication of Beckerian theory but because

they guide fertility control policies.

20 This estimate is the average midpoint of the bound estimates from the three-plus and four-plus samples in Table 9 and can be calculated as: $\frac{1}{2} \times [(0.0646 - 0.0067)/2 + 0.0067 + (0.0748 + 0.0235)/2 + 0.0235]$.

Any human capital costs of fertility are naturally of greater concern not only when fertility

is high but also when a large share of it is unwanted. In 2015 the average number of births per

woman in low income countries was five and, comparing actual with stated desired fertility,

we estimate the share of unwanted births is as high as 60 per cent in some countries, with

a mean of 27 per cent. Unwanted fertility is not unique to poorer countries. For instance,

despite access to contraceptive methods, 21 percent of all pregnancies in 2011 in the US ended

in elective abortion (Guttmacher Institute, 2016). Moreover, there is a strong trend in IVF

use, and up to 40% of IVF successes result in multiple births to women who wanted one child

(Kulkarni et al., 2013), creating a growing set of unwanted children. This might exacerbate

impacts of additional births on investments in preceding births.

References

D. Aaronson, R. Dehejia, A. Jordan, C. Pop-Eleches, C. Samii, and K. Schulze. The Effect of Fertility on Mothers’ Labor Supply over the Last Two Centuries. IZA Discussion Papers 10559, Institute for the Study of Labor (IZA), Feb. 2017.

A. Aizer and F. Cunha. The Production of Human Capital: Endowments, Investments and Fertility. NBER Working Papers 18429, National Bureau of Economic Research, Inc, Sept. 2012.

D. Almond. Is the 1918 Influenza Pandemic Over? Long-Term Effects of In Utero Influenza Exposure in the Post-1940 U.S. Population. Journal of Political Economy, 114(4):672–712, 2006.

D. Almond and B. Mazumder. The 1918 Influenza Pandemic and Subsequent Health Outcomes: An Analysis of SIPP Data. American Economic Review, 95(2):258–262, May 2005.

D. Almond and B. Mazumder. Fetal Origins and Parental Responses. Annual Review of Economics, 5(1):37–56, May 2013.

D. Almond, K. Y. Chay, and D. S. Lee. The costs of low birth weight. The Quarterly Journal of Economics, 120(3):1031–1083, August 2005.

J. G. Altonji, T. E. Elder, and C. R. Taber. Selection on observed and unobserved variables: Assessing the effectiveness of catholic schools. Journal of Political Economy, 113(1):151–184, February 2005.

J. Angrist, V. Lavy, and A. Schlosser. Multiple experiments for the causal link between the quantity and quality of children. Journal of Labor Economics, 28(4):773–824, 2010.

J. D. Angrist and W. N. Evans. Children and their parents’ labor supply: Evidence from exogenous variation in family size. American Economic Review, 88(3):450–77, June 1998.

O. Aslund and H. Gronqvist. Family size and child outcomes: Is there really no trade-off? Labour Economics, 17(1):130–39, 2010.

J. Bagger, J. A. Birchenall, H. Mansour, and S. Urzúa. Education, Birth Order, and Family Size. NBER Working Papers 19111, National Bureau of Economic Research, Inc, June 2013.

S. Baird, J. H. Hicks, M. Kremer, and E. Miguel. Worms at Work: Long run Impacts of a Child Health Investment. The Quarterly Journal of Economics, 131(4):1637–1680, 2016.

A. I. Barreca. The Long-Term Economic Impact of In Utero and Postnatal Exposure to Malaria. Journal of Human Resources, 45(4):865–892, 2010.

G. S. Becker. An Economic Analysis of Fertility. In Demographic and Economic Change in Developed Countries, NBER Chapters, pages 209–240. National Bureau of Economic Research, Inc, June 1960.

G. S. Becker and H. G. Lewis. On the interaction between the quantity and quality of children. Journal of Political Economy, 81(2):S279–88, Part II, 1973.

G. S. Becker and N. Tomes. Child endowments and the quantity and quality of children. Journal of Political Economy, 84(4):S143–62, August 1976.

S. Bhalotra and D. Clarke. Twin Birth and Maternal Condition. Review of Economics and Statistics, xx(x):xxx–xxx, forthcoming.

S. Bhalotra and T. Cochrane. Where Have All the Young Girls Gone? Identification of Sex Selection in India. IZA Discussion Papers 5381, Institute for the Study of Labor (IZA), Dec. 2010.

S. Bhalotra and A. Venkataramani. Shadows of the Captain of the Men of Death: Early Life Health Interventions, Human Capital Investments, and Institutions. Mimeo, University of Essex, 2014.

S. R. Bhalotra and A. Venkataramani. Cognitive Development and Infectious Disease: Gender Differences in Investments and Outcomes. IZA Discussion Papers 7833, Institute for the Study of Labor (IZA), Dec. 2013.

J. Bisbee, R. Dehejia, C. Pop-Eleches, and C. Samii. Local instruments, global extrapolation: External validity of the labor supply–fertility local average treatment effect. Journal of Labor Economics, 35(S1):S99–S147, 2017.

S. E. Black, P. J. Devereux, and K. G. Salvanes. The more the merrier? The effect of family size and birth order on children’s education. The Quarterly Journal of Economics, 120(2):669–700, 2005.

S. E. Black, P. J. Devereux, and K. G. Salvanes. Does Grief Transfer across Generations? Bereavements during Pregnancy and Child Outcomes. American Economic Journal: Applied Economics, 8(1):193–223, January 2016.

J. Blake. Family Size and Achievement. University of California Press, Berkeley, 1989.

M. Bougma, T. K. LeGrand, and J.-F. Kobiane. Fertility Decline and Child Schooling in Urban Settings of Burkina Faso. Demography, 52(1):281–313, 2015.

C. Brinch, M. Mogstad, and M. Wiswall. Beyond LATE with a Discrete Instrument. Journal of Political Economy, 125(4):985–1039, 2017.

S. G. Bronars and J. Grogger. The economic consequences of unwed motherhood: Using twin births as a natural experiment. The American Economic Review, 84(5):1141–1156, 1994.

J. Caceres-Delpiano. The impacts of family size on investment in child quality. Journal of Human Resources, 41(4):738–754, 2006.

A. Case, D. Lubotsky, and C. Paxson. Economic Status and Health in Childhood: The Origins of the Gradient. The American Economic Review, 92(5):1308–1334, 2002.

K. Chay and M. Greenstone. The Impact of Air Pollution on Infant Mortality: Evidence from Geographic Variation in Pollution Shocks Induced by a Recession. The Quarterly Journal of Economics, 118(3):1121–1167, 2003.

D. Clarke. Children And Their Parents: A Review Of Fertility And Causality. Journal of Economic Surveys, 32(2):518–540, April 2018.

T. G. Conley, C. B. Hansen, and P. E. Rossi. Plausibly Exogenous. The Review of Economics and Statistics, 94(1):260–272, February 2012.

J. Currie and E. Moretti. Mother’s Education and the Intergenerational Transmission of Human Capital: Evidence from College Openings. The Quarterly Journal of Economics, 118(4):1495–1532, 2003.

D. N. De Tray. Child quality and the demand for children. Journal of Political Economy, 81(2):S70–95, March 1973.

H. Farbmacher, R. Guber, and J. Vikstrom. Increasing the credibility of the Twin birth instrument. Working Paper Series 2016:10, IFAU - Institute for Evaluation of Labour Market and Education Policy, June 2016.

E. Fitzsimons and B. Malde. Empirically probing the quantity-quality model. IFS Working Papers W10/20, Institute for Fiscal Studies, Sep 2010.

E. Fitzsimons and B. Malde. Empirically probing the quantity-quality model. Journal of Population Economics, 27(1):33–68, Jan 2014.

O. Galor. The demographic transition: causes and consequences. Cliometrica, Journal of Historical Economics and Econometric History, 6(1):1–28, January 2012.

O. Galor and D. N. Weil. Population, Technology, and Growth: From Malthusian Stagnation to the Demographic Transition and Beyond. The American Economic Review, 90(4):806–828, 2000.

N. D. Grawe. The quality–quantity trade-off in fertility across parent earnings levels: a test for credit market failure. Review of Economics of the Household, 6(1):29–45, 2008.

Guttmacher Institute. Induced Abortion in the United States. Fact sheet, Guttmacher Institute, Sept. 2016.

E. A. Hanushek. The trade-off between child quantity and quality. Journal of Political Economy, 100(1):84–117, February 1992.

J. Heckman, R. Pinto, and P. Savelyev. Understanding the Mechanisms through Which an Influential Early Childhood Program Boosted Adult Outcomes. American Economic Review, 103(6):2052–86, October 2013.

V. J. Hotz, C. H. Mullin, and S. G. Sanders. Bounding Causal Effects Using Data From a Contaminated Natural Experiment: Analysing the Effects of Teenage Childbearing. Review of Economic Studies, 64(4):575–603, Oct. 1997.

J. P. Jacobsen, J. W. Pearce III, and J. L. Rosenbloom. The effects of childbearing on married women’s labor supply and earnings: Using twin births as a natural experiment. Journal of Human Resources, 34(3):449–474, 1999.

R. Jensen. The (Perceived) Returns to Education and the Demand for Schooling. The Quarterly Journal of Economics, 125(2):515–548, 2010.

R. S. Kahn, B. Zuckerman, H. Bauchner, C. J. Homer, and P. H. Wise. Women’s Health After Pregnancy and Child Outcomes at Age 3 Years: A Prospective Cohort Study. American Journal of Public Health, 92(8):1312–1318, 2002.

T. Kitagawa. A test for instrument validity. Econometrica, 83(5):2043–2063, 2015.

A. D. Kulkarni, D. J. Jamieson, H. W. J. Jones, D. M. Kissin, M. F. Gallo, M. Macaluso, and E. Y. Adashi. Fertility Treatments and Multiple Births in the United States. New England Journal of Medicine, 369(23):2218–2225, 2013.

J. Lee. Sibling size and investment in children’s education: an Asian instrument. Journal of Population Economics, 21(4):855–875, October 2008.

A. Lleras-Muney and F. Lichtenberg. The Effect Of Education On Medical Technology Adoption: Are The More Educated More Likely To Use New Drugs? Annales d’Economie et Statistique, 79/80, 2005.

M. Manacorda. The Cost of Grade Retention. The Review of Economics and Statistics, 94(2):596–606, 2012.

C. F. Manski and J. V. Pepper. Monotone Instrumental Variables: With an Application to the Returns to Schooling. Econometrica, 68(4):997–1010, 2000.

B. Mazumder and Z. Seeskin. Breakfast skipping, extreme commutes and the sex composition at birth. Biodemography and Social Biology, 61(2):187–208, 2015.

O. Moav. Cheap Children and the Persistence of Poverty. The Economic Journal, 115(500):88–110, 2005.

M. Mogstad and M. Wiswall. Testing the Quantity-Quality Model of Fertility: Linearity, Marginal Effects, and Total Effects. Quantitative Economics, 7(1):157–192, 2016.

A. Nevo and A. M. Rosen. Identification with Imperfect Instruments. The Review of Economics and Statistics, 94(3):659–671, August 2012.

V. Ponczek and A. P. Souza. New Evidence of the Causal Effect of Family Size on Child Quality in a Developing Country. Journal of Human Resources, 47(1):64–106, 2012.

N. Qian. Quantity-quality and the one child policy: The only-child disadvantage in school enrollment in rural China. NBER Working Papers 14973, National Bureau of Economic Research, Inc, May 2009.

M. R. Rosenzweig and K. I. Wolpin. Testing the quantity-quality fertility model: The use of twins as a natural experiment. Econometrica, 48(1):227–40, January 1980a.

M. R. Rosenzweig and K. I. Wolpin. Life-cycle labor supply and fertility: Causal inferences from household models. Journal of Political Economy, 88(2):328–348, 1980b.

M. R. Rosenzweig and K. I. Wolpin. Natural “Natural Experiments” in Economics. Journal of Economic Literature, 38(4):827–874, December 2000.

M. R. Rosenzweig and J. Zhang. Do population control policies induce more human capital investment? Twins, birth weight and China’s one-child policy. Review of Economic Studies, 76(3):1149–1174, 2009.

H. Royer. Separated at Girth: US Twin Estimates of the Effects of Birth Weight. American Economic Journal: Applied Economics, 1(1):49–85, January 2009.

T. P. Schultz. School subsidies for the poor: evaluating the Mexican Progresa poverty program. Journal of Development Economics, 74(1):199–250, 2004.

C. Uggla and R. Mace. Parental investment in child health in sub-Saharan Africa: a cross-national study of health-seeking behaviour. Royal Society Open Science, 3(2), 2016.

J. Vere. Fertility and parents’ labour supply: new evidence from US census data. Oxford Economic Papers, 63(2):211–231, 2011.

J. R. Warren, E. Hoffman, and M. Andrew. Patterns and Trends in Grade Retention Rates in the United States, 1995–2010. Educational Researcher, 43(9):433–443, 2014.

R. J. Willis. A New Approach to the Economic Theory of Fertility Behavior. Journal of Political Economy, 81(2):S14–S64, 1973.

A. Young. Consistency without Inference: Instrumental Variables in Practical Application. Mimeo, London School of Economics, June 2018.

Tables

Table 1: Probability of Giving Birth to Twins USA (NHIS)

Twin×100                      All         Time
                                          1982-1990   1991-2013

Mother’s Education (Years)    0.060**     0.115       0.056**
                              (0.025)     (0.096)     (0.026)
Mother’s Height (Inches)      0.012       0.049       0.008
                              (0.025)     (0.087)     (0.026)
Mother’s BMI                  0.010**     0.025       0.009*
                              (0.004)     (0.016)     (0.005)
Smoked Prior to Birth         -0.285**    -1.336**    -0.183
                              (0.137)     (0.526)     (0.142)

Observations                  103,589     6,891       96,698
R-Squared                     0.004       0.031       0.004

This table presents regressions of whether each birth is a twin or a singleton

on a number of maternal characteristics. All specifications include a full set of

mother’s age, survey year, region of birth, and mother’s race dummies and are

estimated as linear probability models. Twin is multiplied by 100 for presentation.

Height is measured in inches and BMI is weight in kg divided by height in metres

squared. Heteroscedasticity-robust standard errors are included in parentheses.

*** p<0.01, ** p<0.05, * p<0.1

Table 2: Probability of Giving Birth to Twins (Developing Countries by Income and Time Period)

Twin×100                      All         Income                  Time                    Prenatal
                                          Low inc     Middle inc  1990-2013   1972-1989

Mother’s Age                  0.540***    0.550***    0.517***    0.601***    0.314***    0.541***
                              (0.027)     (0.033)     (0.047)     (0.031)     (0.058)     (0.027)
Mother’s Age Squared          -0.007***   -0.007***   -0.007***   -0.008***   -0.003**    -0.007***
                              (0.000)     (0.001)     (0.001)     (0.001)     (0.001)     (0.000)
Age at First Birth            -0.050***   -0.082***   -0.001      -0.052***   -0.040***   -0.050***
                              (0.008)     (0.010)     (0.013)     (0.010)     (0.015)     (0.008)
Mother’s Education (years)    0.021***    0.019**     0.013       0.021***    0.019       0.018**
                              (0.007)     (0.009)     (0.010)     (0.008)     (0.012)     (0.007)
Mother’s Height (cm)          0.059***    0.058***    0.058***    0.063***    0.044***    0.058***
                              (0.004)     (0.005)     (0.007)     (0.005)     (0.007)     (0.004)
Mother’s BMI                  0.047***    0.059***    0.038***    0.045***    0.050***    0.045***
                              (0.006)     (0.009)     (0.009)     (0.007)     (0.010)     (0.006)
Prenatal Care (Doctor)                                                                    0.333**
                                                                                          (0.142)
Prenatal Care (Nurse)                                                                     0.312**
                                                                                          (0.142)
Prenatal Care (None)                                                                      0.008
                                                                                          (0.181)

Observations                  2,046,907   1,287,585   759,322     1,525,966   520,941     2,043,217
R-Squared                     0.006       0.006       0.005       0.006       0.005       0.006

Notes: This table presents results for the developing country sample splitting by pre- and post-1990. Main specifications for the developing country sample are pooled for all years. All specifications include a full set of year of birth and country dummies, and are estimated as linear probability models. Twin is multiplied by 100 for presentation. Height is measured in cm and BMI is weight in kg divided by height in metres squared. Prenatal care variables refer to average levels of coverage in DHS clusters. These prenatal measures are only recorded for births in 5 years preceding each survey wave, and as such, a small number of (small) clusters do not have records available. Standard errors clustered by mothers are presented in parentheses. * p<0.1; ** p<0.05; *** p<0.01

Table 3: Maternal Health and Child Investments/Outcomes (NHIS)

                              No Health   Health      Education   Excellent
                              Insurance   Limits      Z-Score     Health

Mother’s Education (Years)    -0.016***   -0.001***   0.019***    0.020***
                              (0.001)     (0.000)     (0.002)     (0.001)
Mother’s Height (Inches)      -0.001**    -0.002***   0.005***    0.008***
                              (0.000)     (0.000)     (0.002)     (0.001)
Mother’s BMI                  0.000       0.000***    0.000       -0.002***
                              (0.000)     (0.000)     (0.000)     (0.000)
Smoked Prior to Birth         0.008***    0.031***    -0.046***   -0.058***
                              (0.002)     (0.003)     (0.009)     (0.004)

Observations                  103,589     103,589     74,777      103,589
R-Squared                     0.047       0.013       0.019       0.033

Regressions are presented of child investments or child outcomes on a number of maternal characteristics. All specifications and variable definitions follow Table 1 and include a full set of mother’s age, survey year, region of birth, and mother’s race dummies. No health insurance, health limits and excellent health are binary variables, and models are estimated as linear probability models. Education Z-Score is a standardized score of the child’s completed years of education compared with his or her birth year and birth month cohort. Height is measured in inches and BMI is weight in kg divided by height in metres squared. Heteroscedasticity-robust standard errors are included in parentheses. *** p<0.01, ** p<0.05, * p<0.1

Table 4: Maternal Health and Child Investments/Outcomes (Developing Country Sample)

                              Maternal Characteristics            With Cluster-Level Health Measures
                              Home        Antenatal   Education   Home        Antenatal   Education
                              Birth       Visits      Z-Score     Birth       Visits      Z-Score

Mother’s Age                  0.026***    0.024***    0.004**     0.025***    0.030***    0.002
                              (0.001)     (0.005)     (0.002)     (0.001)     (0.004)     (0.002)
Mother’s Age Squared          -0.000***   -0.001***   -0.000***   -0.000***   -0.001***   -0.000
                              (0.000)     (0.000)     (0.000)     (0.000)     (0.000)     (0.000)
Age at First Birth            -0.011***   0.070***    0.009***    -0.010***   0.060***    0.008***
                              (0.000)     (0.002)     (0.000)     (0.000)     (0.001)     (0.000)
Mother’s Education (years)    -0.034***   0.267***    0.075***    -0.029***   0.219***    0.072***
                              (0.000)     (0.001)     (0.000)     (0.000)     (0.001)     (0.000)
Mother’s Height (cm)          -0.001***   0.020***    0.005***    -0.001***   0.016***    0.005***
                              (0.000)     (0.001)     (0.000)     (0.000)     (0.001)     (0.000)
Mother’s BMI                  -0.013***   0.076***    0.022***    -0.011***   0.060***    0.020***
                              (0.000)     (0.001)     (0.000)     (0.000)     (0.001)     (0.000)
Prenatal Care (Doctor)                                            -0.190***   1.213***    0.063***
                                                                  (0.004)     (0.030)     (0.008)
Prenatal Care (Nurse)                                             -0.039***   0.110***    0.069***
                                                                  (0.004)     (0.028)     (0.008)
Prenatal Care (None)                                              0.336***    -3.673***   -0.421***
                                                                  (0.005)     (0.033)     (0.010)

Observations                  749,010     615,621     1,128,729   749,006     615,619     1,125,305
R-Squared                     0.292       0.334       0.129       0.320       0.385       0.137

Regressions are presented of child investments or child outcomes on a number of maternal characteristics. All specifications and variable definitions follow Table 2 and include a full set of country and year of birth fixed effects. Home birth and antenatal visits are recorded only for children aged 0-4 at the time of the survey, and the standardised education score is recorded only for children aged 6-18 (of school age). Additional notes are available in Table 2. DHS sample weights are used, and standard errors are clustered by mother. *** p<0.01, ** p<0.05, * p<0.1

Table 5: OLS Estimates of the QQ Trade-off: Developing Country and US

Dependent Variable:           2+                               3+                               4+
Child Quality                 Base       +H         +S&H       Base       +H         +S&H       Base       +H         +S&H

Panel A: Developing Country Results
Dependent Variable = School Z-Score
Fertility                     -0.152***  -0.130***  -0.085***  -0.139***  -0.116***  -0.074***  -0.120***  -0.098***  -0.061***
                              (0.002)    (0.002)    (0.002)    (0.002)    (0.002)    (0.001)    (0.002)    (0.002)    (0.001)
Observations                  259,958    259,958    259,958    395,687    395,687    395,687    409,576    409,576    409,576
R-Squared                     0.109      0.132      0.193      0.093      0.120      0.190      0.081      0.113      0.188
Altonji et al. Ratio                                1.258                            1.137                            1.035

Panel B: US Results
Dependent Variable = School Z-Score
Fertility                     -0.031***  -0.031***  -0.024***  -0.032***  -0.031***  -0.023***  -0.020     -0.018     -0.011
                              (0.006)    (0.006)    (0.006)    (0.007)    (0.007)    (0.007)    (0.013)    (0.013)    (0.013)
Observations                  61,267     61,267     61,267     47,308     47,308     47,308     21,352     21,352     21,352
R-Squared                     0.027      0.030      0.034      0.027      0.030      0.034      0.041      0.045      0.049
Altonji et al. Ratio                                3.202                            2.587                            1.157

Dependent Variable = Excellent Health
Fertility                     -0.002     -0.005**   -0.003     -0.010***  -0.008***  -0.007***  -0.024***  -0.018***  -0.016***
                              (0.003)    (0.002)    (0.002)    (0.003)    (0.002)    (0.002)    (0.004)    (0.003)    (0.003)
Observations                  70,277     70,277     70,277     53,393     53,393     53,393     24,358     24,358     24,358
R-Squared                     0.033      0.321      0.323      0.041      0.329      0.331      0.054      0.341      0.343
Altonji et al. Ratio                                -4.069                           1.801                            2.126

OLS regressions described in equation 1 are presented using developing country (DHS) and US (NHIS) data. The 2+, 3+ and 4+ samples are defined in the estimation sample section of the paper (section 4). Base controls consist of fixed effects for child’s age and year of birth, child gender, mother’s age at birth, and a cubic for mother’s age at time of survey. For the USA sample, mother’s race fixed effects are included. For DHS data, country fixed effects are also included. Additional socioeconomic controls consist of mother’s education and (for DHS data) wealth quintile fixed effects, and health controls include a continuous measure of mother’s BMI, and for DHS, mother’s height and coverage of prenatal care at the level of the survey cluster. For USA data, we include controls for mother’s self-assessed health on a Likert scale. Standard errors are clustered by mother. * p<0.1; ** p<0.05; *** p<0.01
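The notes to Table 5 do not spell out how the Altonji et al. ratio is computed. One common formulation, in the spirit of Altonji et al. (2005), divides the coefficient from the fully controlled model by the change in the coefficient between the base and fully controlled models. The sketch below assumes that formulation. The function name `altonji_ratio` is hypothetical, and because the inputs are the rounded coefficients from Panel A, the results only approximate the reported ratios.

```python
def altonji_ratio(beta_base, beta_full):
    """Ratio of the fully controlled coefficient to the movement in the
    coefficient when controls are added (assumed formulation, see lead-in)."""
    movement = beta_base - beta_full
    if movement == 0:
        raise ValueError("coefficient does not move when controls are added")
    return beta_full / movement


# Rounded Base and +S&H Fertility coefficients from Table 5, Panel A.
for group, base, full in [("2+", -0.152, -0.085),
                          ("3+", -0.139, -0.074),
                          ("4+", -0.120, -0.061)]:
    print(group, round(altonji_ratio(base, full), 2))
```

A ratio above one indicates that selection on unobservables would need to be stronger than selection on the included observables to drive the estimated coefficient to zero.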

Table 6: Developing Country IV Estimates

                              2+                               3+                               4+
                              Base       +H         +S&H       Base       +H         +S&H       Base       +H         +S&H

Panel A: First Stage
Dependent Variable = Fertility
Twins                         0.832***   0.842***   0.843***   0.829***   0.838***   0.839***   0.860***   0.866***   0.868***
                              (0.029)    (0.029)    (0.028)    (0.025)    (0.025)    (0.025)    (0.026)    (0.026)    (0.026)
Kleibergen-Paap rk statistic  826.32     872.87     928.69     1066.33    1124.76    1158.30    1119.75    1094.96    1141.43
p-value of rk statistic       0.000      0.000      0.000      0.000      0.000      0.000      0.000      0.000      0.000

Panel B: IV Results
Dependent Variable = School Z-Score
Fertility                     -0.004     -0.015     -0.012     -0.028     -0.041*    -0.046**   -0.027     -0.038*    -0.037**
                              (0.028)    (0.027)    (0.026)    (0.022)    (0.021)    (0.020)    (0.022)    (0.021)    (0.019)
Observations                  259,958    259,958    259,958    395,687    395,687    395,687    409,576    409,576    409,576
Coefficient Difference                   0.019      0.354                 0.006      0.055                 0.108      0.316

Panels A and B present coefficients and standard errors for the first and second stages in equations 2a and 2b. The 2+ subsample refers to all first-born children in families with at least two births. 3+ refers to first- and second-borns in families with at least three births, and 4+ refers to first- to third-borns in families with at least four births. Panel A presents the first-stage coefficients of twinning on fertility for each group. Base controls consist of child age and mother’s age at birth fixed effects plus country and year-of-birth FEs. Additional socioeconomic controls consist of mother’s education and wealth quintile fixed effects, and health controls include a continuous measure of mother’s height and BMI and coverage of prenatal care at the level of the survey cluster. In each case the sample is made up of all children aged between 6-18 years from families in the DHS who fulfill 2+ to 4+ requirements. In panel B each cell presents the coefficient of a 2SLS regression where fertility is instrumented by twinning at birth order two, three or four (for 2+, 3+ and 4+ groups respectively). The rk test statistic and corresponding p-value reject that the twin instruments are weak in each case. Coefficient Difference in Panel B refers to a test that the coefficient estimate on Fertility in a given model is identical to the estimate on Fertility in the base case. This test takes account of the correlation between errors in the base and augmented regression model (in the spirit of seemingly unrelated regressions), but is estimated by GMM to house the IV models estimated here. Low p-values are evidence against equality of the two estimates. Standard errors are clustered by mother. * p<0.1; ** p<0.05; *** p<0.01
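Because twinning is a single binary instrument, the 2SLS estimator without controls reduces to a ratio of two differences in means: the reduced-form difference in the outcome divided by the first-stage difference in fertility. The sketch below illustrates that special case only. It omits the fixed effects, controls and clustered standard errors used in Table 6, and the function name `wald_iv` is an assumption made for the example.

```python
def _mean(values):
    """Sample mean of a non-empty list."""
    return sum(values) / len(values)


def wald_iv(outcome, fertility, twin):
    """Wald (IV) estimate with a binary instrument and no controls.

    twin[i] is 1 if family i had a twin birth at the relevant parity, else 0.
    Returns (first_stage, reduced_form, iv_estimate).
    """
    treated = [i for i, t in enumerate(twin) if t == 1]
    control = [i for i, t in enumerate(twin) if t == 0]
    if not treated or not control:
        raise ValueError("need both twin and non-twin families")
    # First stage: effect of twinning on completed fertility.
    first_stage = (_mean([fertility[i] for i in treated])
                   - _mean([fertility[i] for i in control]))
    # Reduced form: effect of twinning on the child outcome.
    reduced_form = (_mean([outcome[i] for i in treated])
                    - _mean([outcome[i] for i in control]))
    if first_stage == 0:
        raise ValueError("instrument does not shift fertility")
    return first_stage, reduced_form, reduced_form / first_stage
```

On the toy data `wald_iv([1, 3, 2, 5], [1, 2, 3, 4], [0, 0, 1, 1])` the function returns `(2.0, 1.5, 0.75)`. In Table 6 the analogous first-stage coefficients on Twins lie between roughly 0.83 and 0.87.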

Tab

le7:

US

IVE

stim

ates

2+3+

4+

Bas

e+

H+

S&

HB

ase

+H

+S&

HB

ase

+H

+S&

H

Panel

A:

Fir

stSta

ge

Dep

enden

tV

aria

ble

=F

erti

lity

(Sch

ool

Z-S

core

Sec

ond

Sta

ge)

[1em

]T

win

s0.

698*

**0.

701*

**0.

704*

**0.

740*

**0.

740*

**0.

743*

**0.

795*

**0.

799*

**0.

813*

**(0

.026

)(0

.026

)(0

.026

)(0

.047

)(0

.047

)(0

.047

)(0

.080

)(0

.080

)(0

.079

)

Kle

iber

gen-P

aap

rkst

atis

tic

704.

1071

0.40

737.

6224

5.84

249.

1225

0.47

98.5

499

.30

105.

52p-v

alue

ofrk

stat

isti

c0.

000

0.00

00.

000

0.00

00.

000

0.00

00.

000

0.00

00.

000

Panel

B:

IVR

esu

lts

Dep

enden

tV

aria

ble

=Sch

ool

Z-S

core

Fer

tility

-0.0

98-0

.099

-0.1

01*

-0.0

12-0

.015

-0.0

17-0

.124

-0.1

34-0

.142

(0.0

61)

(0.0

61)

(0.0

60)

(0.0

67)

(0.0

67)

(0.0

67)

(0.1

52)

(0.1

52)

(0.1

49)

Obse

rvat

ions

61,2

6761

,267

61,2

6747

,308

47,3

0847

,308

21,3

5221

,352

21,3

52C

oeffi

cien

tD

iffer

ence

0.81

80.

434

0.54

70.

394

0.23

10.

113

Dep

enden

tV

aria

ble

=E

xce

llen

tH

ealt

hF

erti

lity

0.00

90.

027

0.02

6-0

.036

-0.0

58*

-0.0

57*

0.03

3-0

.025

-0.0

31(0

.025

)(0

.021

)(0

.021

)(0

.039

)(0

.032

)(0

.032

)(0

.060

)(0

.053

)(0

.052

)

Obse

rvat

ions

70,2

7770

,277

70,2

7753

,393

53,3

9353

,393

24,3

5824

,358

24,3

58C

oeffi

cien

tD

iffer

ence

0.16

40.

212

0.34

10.

366

0.12

20.

089

Notes: Regressions in each panel and the definition of the 2+, 3+ and 4+ groups are identical to Table 6 and are described in notes to Table 6. This table presents the same regressions however now using NHIS survey data (2004-2014). Base controls include child age FE (in months), mother’s age and mother’s race FEs. Additional socioeconomic controls consist of mother’s education fixed effects, and health controls include a continuous measure of mother’s BMI, and a Likert scale measure of a mother’s self-assessed health. In each case the sample is made up of all children aged between 6-18 years from families in the NHIS who fulfill 2+ to 4+ requirements for schooling variables, and for children aged between 1-18 years for health variables. The first stage results and tests of instrument strength are displayed for the regression using the education sample only. Qualitatively similar results are observed for the health sample. A description of the Kleibergen-Paap statistic and Coefficient Difference are provided in notes to Table 6. Descriptive statistics for each variable can be found in table A2. Standard errors are clustered by mother. ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01


Table 8: Linear and Non-Linear IV Estimates for Marginal Effects with and without Full Twin Controls

Two-Plus Two-Plus Three-Plus Three-Plus Four-Plus Four-Plus
Baseline +S&H Baseline +S&H Baseline +S&H
Instrument: Non-linear IV in all six columns

Panel A: Linear Estimates of Marginal Effects
Number of Children 0.031 0.011 -0.011 -0.048∗∗ 0.012 -0.024
(0.036) (0.033) (0.026) (0.024) (0.029) (0.025)

Panel B: Unrestricted Estimates of Marginal Effects
Siblings≥2 0.209∗∗ 0.0175
(0.083) (0.121)
Siblings≥3 -0.089 -0.102∗ -0.012 -0.063∗
(0.057) (0.054) (0.052) (0.037)
Siblings≥4 -0.032 -0.041 -0.036 -0.045 0.014 -0.036
(0.066) (0.044) (0.043) (0.039) (0.041) (0.051)
Siblings≥5 0.055 0.059 0.034 0.036 0.036 0.035
(0.078) (0.040) (0.073) (0.038) (0.041) (0.036)

Observations 245,534 245,534 356,892 356,892 334,924 334,924

Each column and panel presents a separate regression using DHS data. Siblings≥2 refers to the marginal effect of moving from 1 to 2 siblings, Siblings≥3 refers to moving from 2 to 3 siblings, and so forth. Each model includes maternal age, country, survey year and child age fixed effects as well as child’s gender. The regressions in columns 2, 4 and 6 are augmented with all socioeconomic and health controls described in Table 5 of the paper. Standard errors are estimated using a block bootstrap sampling each family with replacement, and for each bootstrap replication both the regression and the constructed instruments are reestimated. First stage regressions are displayed in Table A14.
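The family-level block bootstrap described in the notes to Table 8 can be sketched as follows. This is a minimal illustration and not the authors' code: the just-identified IV estimator without controls, the record keys `z`, `x`, `y`, and the function names are all assumptions made for the example. In the paper's setting the `estimator` callable would also rebuild the constructed instruments inside each replication.

```python
import random
from statistics import mean, stdev


def iv_estimate(rows):
    """Just-identified IV without controls: cov(z, y) / cov(z, x).

    Each row is a dict with hypothetical keys: z (instrument),
    x (fertility) and y (child outcome).
    """
    z = [r["z"] for r in rows]
    x = [r["x"] for r in rows]
    y = [r["y"] for r in rows]
    zbar, xbar, ybar = mean(z), mean(x), mean(y)
    czy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    czx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    return czy / czx


def block_bootstrap_se(families, estimator=iv_estimate, n_reps=200, seed=0):
    """Resample whole families with replacement and re-estimate each time.

    `families` maps a family id to the list of child rows in that family,
    so all siblings enter or leave a replication together.
    """
    rng = random.Random(seed)
    ids = list(families)
    draws = []
    for _ in range(n_reps):
        sample = []
        for fid in rng.choices(ids, k=len(ids)):
            sample.extend(families[fid])
        draws.append(estimator(sample))
    # The bootstrap standard error is the spread of the replicated estimates.
    return stdev(draws)
```

Resampling families rather than children keeps within-family correlation intact in every replication, which is the purpose of a block design.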


Table 9: Bounds Estimates of the Quantity–Quality Trade-off

Columns: (1) IV with Controls; (2)-(3) Nevo and Rosen (2012) Imperfect IV Bounds, Lower and Upper Bound; (4)-(5) Conley et al. (2012) UCI: γ ∈ [0, 2γ̂], Lower and Upper Bound; (6)-(7) Conley et al. (2012) LTZ: Empirical Distribution γ, Lower and Upper Bound.

Panel A: DHS
Two Plus -0.0124 -0.0842 0.0384 -0.0702 0.0184 -0.0657 0.0132
Three Plus -0.0456 -0.0759 -0.0068 -0.0690 -0.0012 -0.0646 -0.0067
Four Plus -0.0371 -0.0632 -0.0001 -0.0800 -0.0197 -0.0748 -0.0235

Panel B: USA (Education)
Two Plus -0.1023 -0.0480 0.0164 -0.2195 -0.0026 -0.2101 -0.0113
Three Plus -0.0164 -0.0448 0.1149 -0.1291 0.0795 -0.1208 0.0709
Four Plus -0.1488 -0.0709 0.1547 -0.4329 0.1200 -0.4242 0.1132

Panel B: USA (Health)
Two Plus 0.0267 -0.0843 0.0374 -0.0247 0.0615 -0.0164 0.0534
Three Plus -0.0539 -0.0764 -0.0072 -0.1107 -0.0137 -0.1027 -0.0219
Four Plus -0.0298 -0.0638 0.0001 -0.1041 0.0295 -0.0972 0.0217

Notes: This table presents upper and lower bounds of a 95% confidence interval for the effects of family size on (standardised) children’s educational attainment and health (health in USA only). Nevo and Rosen (2012) bounds are presented in columns 2 and 3, and variants of Conley et al. (2012) bounds are presented in columns 4-7. The IV point estimate with full controls is displayed for comparison in column 1. Nevo and Rosen (2012) bounds are based on the assumption that twinning is positively selected and fertility is negatively selected, and twins are “less endogenous” than fertility. Conley et al. (2012) bounds are estimated as described in section 3.2 under various priors about the direct effect that being from a twin family has on educational outcomes (γ). In the UCI (union of confidence interval) approach, it is assumed the true γ ∈ [0, 2γ̂], while in the LTZ (local to zero) approach it is assumed that γ follows the empirical distribution estimated in each case. The preferred prior for γ (γ̂) and its distribution is discussed in Appendix E, and estimates for γ are provided in Table A15. Comparisons under a range of priors are presented in Figures 2-3. Each estimate is based on the specifications with full controls from Tables 6 and 7.
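The union of confidence intervals (UCI) idea in the notes to Table 9 can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not the procedure used to produce the table: it treats the single-instrument case without controls, where subtracting γ times the instrument from the outcome shifts the IV estimate by γ divided by the first-stage coefficient, and it holds the IV standard error fixed across values of γ. The function name, arguments and the 1.96 critical value are hypothetical choices for the example.

```python
def uci_bounds(beta_iv, se_iv, first_stage, gamma_hat, n_grid=101, crit=1.96):
    """Union of confidence intervals over gamma in [0, 2 * gamma_hat].

    beta_iv     : IV point estimate obtained with gamma = 0
    se_iv       : its standard error (assumed constant across gamma here)
    first_stage : coefficient on the instrument in the first stage
    gamma_hat   : preferred value for the direct effect of the instrument
    """
    lows, highs = [], []
    for i in range(n_grid):
        gamma = 2 * gamma_hat * i / (n_grid - 1)
        # IV on the adjusted outcome (y - gamma * z) in the scalar case.
        beta = beta_iv - gamma / first_stage
        lows.append(beta - crit * se_iv)
        highs.append(beta + crit * se_iv)
    # The reported bounds are the extremes across all gamma on the grid.
    return min(lows), max(highs)
```

With a positive first stage and a positive γ, the interval widens only downward: the upper bound is the usual IV upper limit at γ = 0, and the lower bound comes from the largest γ considered.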


Figures

Figure 1: Twins shift the fertility distribution outward

[Kernel density plots of total children ever born (horizontal axis) against density (vertical axis), shown separately for twin families and singleton families. Panel (a): Developing Countries. Panel (b): United States.]

Note to Figure 1: Densities of family size come from the full estimation samples from DHS and NHIS data. Kernel densities are plotted (bandwidth equals two in all cases), and present the frequency of the total number of children per family by family type.

Figure 2: Plausibly Exogenous Bounds: School Z-Score (Developing Countries 3+)

[Plot of β (vertical axis) against δ (horizontal axis, 0 to 0.05), showing the LTZ point estimate and the LTZ confidence interval. Methodology described in Conley et al. (2012).]

Note to Figure 2: Confidence intervals and point estimates are calculated according to Conley et al. (2012) using DHS data and specifications described in section 5.3. Estimates reflect a range of priors regarding the validity of the exclusion restriction required to consistently estimate β_fert using twinning in a 2SLS framework. The local to zero (LTZ) approach applied here assumes that γ, the sign on the instrument when included in the structural equation, is distributed γ ∼ U(0, δ). The vertical dashed line indicates the point at which the preferred estimate γ̂ lies precisely at the centre of the assumed support for γ. Further discussion is provided in section 3.2 and Table 9.
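A rough sketch of how a local-to-zero adjustment with γ ∼ U(0, δ) moves the estimate is given below. It is an approximation under stated assumptions rather than the authors' implementation: it considers one instrument and one endogenous variable, matches only the mean (δ/2) and variance (δ²/12) of the uniform prior, and uses a normal approximation with a 1.96 critical value. The function name and arguments are hypothetical.

```python
from math import sqrt


def ltz_uniform(beta_iv, se_iv, first_stage, delta, crit=1.96):
    """Approximate LTZ point estimate and interval for gamma ~ U(0, delta).

    In the scalar case the prior mean shifts the estimate by
    mean(gamma) / first_stage, and the prior variance adds
    var(gamma) / first_stage**2 to the sampling variance.
    """
    prior_mean = delta / 2.0
    prior_var = delta ** 2 / 12.0
    beta = beta_iv - prior_mean / first_stage
    se = sqrt(se_iv ** 2 + prior_var / first_stage ** 2)
    return beta, se, (beta - crit * se, beta + crit * se)
```

Setting `delta` to zero returns the unadjusted IV estimate and interval, which corresponds to the left edge of the horizontal axis in the figure.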


Figure 3: Plausibly Exogenous Bounds: (USA 3+)

[Plots of β (vertical axis) against δ (horizontal axis, 0 to 0.05). Panel (a): Excellent Health. Panel (b): Education Z-Score.]

Notes to Figure 3: See notes to Figure 2. An identical approach is employed, however now using USA (NHIS) data.

Figure 4: Parameter and Bound Estimates of the Q–Q Trade-off

[Plot of estimated Q–Q 95% bounds (vertical axis) for the Two-Plus, Three-Plus and Four-Plus groups. Within each group the horizontal axis lists: OLS, Base IV, +H IV, +S&H IV, Nevo Rosen, Conley et al.]

Note to Figure 4: Each set of estimates refers to the 95% confidence intervals on parameter bounds of the impact of fertility on child education. Two-Plus, Three-Plus and Four-Plus refer to parity specific groups. Base IV refers to the IV estimate most closely following the existing literature, with +H and +S&H presenting IV estimates controlling for maternal health and socioeconomic variables. OLS point estimates are presented along with their 95% confidence intervals, which are quite narrow. OLS estimates include all maternal controls (corresponding to base, and +S&H). Versions without maternal controls are even more negative. The final two sets of bounds in each group are estimated following Nevo and Rosen (2012) and Conley et al. (2012) procedures, and do not have a corresponding point estimate.


ONLINE APPENDIX

For the paper:

The Twin Instrument: Fertility and Human Capital Investment

Sonia Bhalotra and Damian Clarke

Contents

A Appendix Figures and Tables

B Data Definitions

C Testing for Equality of Coefficients Between IV Models

D Loosening the Linear Effect Specification of the Q–Q Trade-off

E Estimating Values for γ
E.1 Estimating γ: A case from the United States
E.2 Estimating γ in Nigeria
E.3 Estimated Values for γ in US and Nigeria
E.4 Assumptions and Evidence Underlying the Calculation of Plausibly Exogenous Bounds
E.5 Bootstrap Confidence Intervals


A Appendix Figures and Tables

Figure A1: Education and Fertility Trends (USA)

[Time-series plot, 1960 to 2010 (horizontal axis: Year). Left vertical axis: Fertility per Woman (World Bank). Right vertical axis: Average Years of Education (ACS). Series: Average Years of Education; Fertility per Woman.]

Note to figure A1: Trends in fertility and education are compiled from the World Bank databank and the American Community Surveys (ACS), respectively. Trends in fertility are directly reported by the World Bank as completed fertility per woman were she exposed to prevailing rates in a given year for her whole fertile life. Education is calculated using all women aged over 25 years in the ongoing ACS (2001-2013) collected by the United States Census Bureau. The figure presents average completed education for all women aged 25 in the year in question.

Figure A2: Education and Fertility (Developing Countries)

[Panel (a) Trends in Fertility: births per woman (vertical axis) against mother’s year of birth (horizontal axis, 1940 to 1980); series: Fertility per Woman, Desired Fertility. Panel (b) Trend in Education: years of education (vertical axis) against DHS birth cohort (horizontal axis, 1940 to 2000).]

Note to figure A2: Cohorts are made up of all individuals from the DHS who are aged over 35 years (for fertility), and over 15 years (for education). In each case the sample is restricted to those who have approximately completed fertility and education respectively. Full summary statistics for these variables are provided in table A2, and a full list of country and survey years are available in table A12.


Figure A3: Birth Size of Twins versus Singletons (Developing Countries)

[Bar chart of the proportion of children by reported birth size, for singletons and twins respectively. Very Small: 0.05, 0.13. Small: 0.13, 0.26. Average: 0.52, 0.44. Large: 0.22, 0.13. Very Large: 0.07, 0.04.]

Note to figure A3: Estimation sample consists of all surveyed births from DHS countries occurring within 5 years prior to the date of the survey. For each of these births, all mothers retrospectively report the (subjective) size of the baby at the time of birth.

Figure A4: Birth Weight of Twins versus Singletons (USA)

[Histogram of birth weight (horizontal axis, 0 to 6000) against fraction (vertical axis), shown separately for twins and singletons.]

Note to figure A4: Estimation sample consists of all non-ART births from NVSS data between 2009 and 2013. Birthweights below 500 grams and above 6,500 grams are trimmed from the sample.


Figure A5: Proportion of Twins by Birth Order (United States)

[Plot of the fraction of twins (vertical axis) against birth order (horizontal axis, 1 to 6).]

Note to figure A5: The fraction of twin births is calculated from the full sample of non-ART users in NVSS data from 2009-2013. The solid line represents the average fraction of twins in the full sample (2.89%), while the dotted line presents twin frequency by birth order. The dotted line joins points at each birth order. Birth orders greater than 6 are removed from the sample given that these account for less than 0.5% of all recorded births.

Figure A6: Proportion of Twins by Birth Order (Developing Countries)

[Plot of the fraction of twins (vertical axis) against birth order (horizontal axis, 0 to 10).]

Note to figure A6: The fraction of twin births is calculated from the full sample of DHS data. The solid line represents the average fraction of twins in the full sample (1.85%), while the dotted line presents twin frequency by birth order. The dotted line joins points at each birth order ∈ {1, . . . , 10}. The fraction of singleton births is 1−frac(twin).


Figure A7: Total Family Size in Analysis Samples

[Histograms of the total number of children in the family (horizontal axis) against fraction (vertical axis). Panel (a): Two-Plus Group. Panel (b): Three-Plus Group. Panel (c): Four-Plus Group.]

Note to figure A7: Histograms display the total family size of families meeting inclusion criteria for each estimation sample (two-plus, three-plus, and four-plus). By definition, the two-plus sample only includes families with at least two births, the three-plus sample only includes families with at least three births, and the four-plus sample only includes families with at least four births.


Figure A8: Density Test of Instrumental Validity from Kitagawa (2015)

[Kernel density plot of the treated outcome Y(1), School Z-Score (horizontal axis, -2 to 4), against density (vertical axis), shown separately for Z=1: Twin Birth and Z=0: Singleton Birth.]

Note to figure A8: Kernel density plots document the sub-densities of the outcome variable of interest in IV regressions (school Z-score) for children preceding twins and for children not preceding twins in the 2+ sample. “Treated” refers to families with at least 3 children, and so both densities document frequencies only for this group. The Kitagawa (2015) test consists of determining whether the two densities intersect, with intersection being evidence of instrumental invalidity. We follow Kitagawa in using a Gaussian kernel and bandwidth of 0.08. Outliers are suppressed from the graph to ease visualisation of the sub-densities. Results for the full version of the test including controls along with p-values associated with instrumental invalidity are presented in table A10.

Figure A9: Plausibly Exogenous Bounds: School Z-Score (Developing Countries 2+ and 4+)

[Plots of β (vertical axis) against δ (horizontal axis, 0 to 0.05). Panel (a): Two Plus. Panel (b): Four Plus.]

Note to figure A9: Refer to notes to figure 5 of the main text.


Figure A10: Plausibly Exogenous Bounds: (USA 2+ and 4+)

[Plots of β (vertical axis) against δ (horizontal axis, 0 to 0.05). Panel (a): Excellent Health (2+). Panel (b): Excellent Health (4+). Panel (c): Education Z-Score (2+). Panel (d): Education Z-Score (4+).]

Note to figure A10: Refer to notes to figure 5 of the main text.

Figure A11: Parameter and Bound Estimates of the Q–Q Trade-off (USA)


[Plots of estimated Q–Q 95% bounds (vertical axis) for three parity groups. Within each group the horizontal axis lists: OLS, Base IV, +H IV, +S&H IV, Nevo Rosen, Conley et al. Panel (a): Education Z-Score. Panel (b): Excellent Health.]

Note to Figure A11: Refer to notes to Figure 4. Identical bounds are presented, but in this case based on NHIS data (with considerably fewer observations).


Table A1: The Quantity–Quality Trade-off and the Twin Instrument: Recent Studies

(1) Black et al. (2005)
Data, Period: Norway matched administrative files of individuals aged 16-74 during 1986-2000 (children > 25 years). Outcome is completed years of education.
Controls Included: Age, parents’ age, parents’ education, sex.
Estimates by sample:
Two Plus: OLS -0.060 (0.003); IV -0.038 (0.047)
Three Plus: OLS -0.076 (0.004); IV -0.016 (0.044)
Four Plus: OLS -0.059 (0.006); IV -0.024 (0.059)

(2) Caceres-Delpiano (2006)
Data, Period: USA 1980 Census Five-Percent Public Use Micro Sample. Children aged 6-16 years. Outcome (reported here) is an indicator of whether the child is behind his or her cohort.
Controls Included: Age, state of residence, mother’s education, race, mother’s age, sex.
Estimates by sample:
Two Plus: OLS 0.011 (0.000); IV 0.002 (0.003)
Three Plus: OLS 0.017 (0.001); IV 0.010 (0.006)

(3) Angrist et al. (2010)
Data, Period: Israel 20% public-use microdata samples from 1995 and 1983 censuses, 18-60 year old respondents. Outcome (reported here) is highest grade completed.
Controls Included: Age, missing month of birth, mother’s age, age at first birth and age at immigration, mother’s and father’s place of birth, and census year.
Estimates by sample:
Two Plus: OLS -0.145 (0.005); IV 0.174 (0.166)
Three Plus: OLS -0.143 (0.005); IV 0.167 (0.117)

(4) Li et al. (2008)
Data, Period: The 1 percent sample of the 1990 Chinese Population Census. Subjects are 6-17 year olds with mothers who are 35 years of age or younger. Outcome (reported here) is years of schooling.
Controls Included: Child age, gender, ethnic group, birth order, and place of residence. Parental age and educational level.
Estimates by sample:
Two Plus: OLS -0.031 (-29.6)†; IV 0.002 (0.18)†
Three Plus: OLS -0.038 (-21.4)†; IV -0.024 (-1.70)†

(5) Fitzsimons and Malde (2014)
Data, Period: Mexican Survey data (ENCASEH) from 1996-1999. Subjects are 12-17 year olds. Outcome (reported here) is years of schooling.
Controls Included: Parent’s age, parents’ years of schooling and schooling dummies, birth spacing, household goods (rooms, land, water, etc).
Estimates by sample:
Two Plus: OLS -0.020 (0.001); IV -0.019 (0.015)
Three Plus: OLS -0.020 (0.001); IV 0.007 (0.025)
Four Plus: OLS -0.018 (0.002); IV -0.032 (0.036)

(6) Rosenzweig and Zhang (2009)
Data, Period: The Chinese Child Twins Survey (CCTS), 2002-2003. Individuals selected from twins’ (aged 7-18) and non-twin households. Outcome (reported here) is years of schooling.
Controls Included: Mother’s age at time of birth, child gender and age.
Estimates by sample:
Reduced Form: -0.307 (1.92)†
Reduced Form + Bwt: -0.225 (1.31)†

(7) Ponczek and Souza (2012)
Data, Period: 1991 Brazilian Census micro-data, 10 and 20% sample. Children of 10-15 years, and 18-20 years old. Outcome reported here is years of school completed.
Controls Included: Child’s gender, age and race controls; mother and family head’s years of schooling, and age.
Estimates by sample:
Two Plus (M): OLS -0.233 (0.010); IV -0.137 (0.146)
Two Plus (F): OLS -0.277 (0.015); IV -0.372 (0.198)
Three Plus (M): OLS -0.230 (0.010); IV -0.060 (0.164)
Three Plus (F): OLS -0.283 (0.015); IV -0.634 (0.194)

Notes: Individual sources discussed further in the body of the text. Estimates reported in each study are presented along with their standard errors in parenthesis. Parentheses marked as † contain the t-statistic rather than the standard error.


Table A2: Summary Statistics

Developing Countries United States

Single Twins All Single Twins All

Mother’s Characteristics
Fertility 3.592 6.489 3.711 1.925 3.094 1.955
(2.351) (2.724) (2.436) (1.002) (1.185) (1.024)
Age 31.18 35.49 36.16 37.24 36.19
(8.095) (7.385) (8.113) (8.423) (8.069) (8.415)
Education 4.823 3.582 4.772 12.57 12.74 12.58
(4.721) (4.330) (4.712) (2.310) (2.220) (2.308)
Height 155.6 157.4 155.7 - - -
(7.075) (7.050) (7.083) - - -
BMI 23.31 23.69 23.32 27.65 28.12 27.66
(4.819) (5.004) (4.827) (6.715) (7.326) (6.732)
Pr(BMI<18.5) 0.124 0.100 0.123 0.0197 0.0159 0.0196
(0.330) (0.300) (0.329) (0.139) (0.125) (0.139)
Excellent Health - - - 0.318 0.324 0.318
- - - (0.465) (0.468) (0.465)

Children’s Outcomes
Age 11.55 11.67 11.56 11.19 10.77 11.18
(3.287) (3.278) (3.286) (3.891) (3.901) (3.891)
Education (Years) 3.584 3.174 3.556 5.151 4.650 5.139
(3.152) (3.022) (3.145) (3.851) (3.769) (3.850)
Education (Z-Score) 0.00423 -0.100 0.000 0.00274 -0.110 0.0000
(0.982) (0.981) (1.000) (1.001) (0.950) (1.000)
Infant Mortality 0.0587 0.137 0.0592 - - -
(0.235) (0.137) (0.236) - - -
Excellent Health - - - 0.531 0.541 0.531
- - - (0.499) (0.498) (0.499)
Fraction Twin 0.0203 0.0257
(0.139) (0.158)
Birth Order Twin 4.448 2.196
(2.457) (1.064)

Observations 2,046,879 41,547 2,005,332 221,381 5,832 227,213

Notes: Summary statistics are presented for the full estimation sample consisting of all children 18 years of age

and under born to the 874,945 mothers responding to any publicly available Demographic and Health Survey or the

88,178 mothers responding to the National Health Interview Survey from 2004 to 2014. Group means are presented

with standard deviation below in parenthesis. Education is reported as total years attained, and Z-score presents

educational attainment relative to birth and country cohort for DHS, and birth quarter cohort for NHIS (mean 0,

std deviation 1). Infant mortality refers to the proportion of children who die before 1 year of age. Maternal height

is reported in centimetres, and BMI is weight in kilograms over height in metres squared. For a full list of DHS

country and years of survey, see Appendix Table A12.
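The two derived variables defined in the notes to Table A2 can be illustrated with a short sketch. This is an illustration under assumptions, not the authors' code: the record keys `cohort` and `education` are hypothetical, the cohort label stands in for whichever grouping applies (birth and country cohort for DHS, birth quarter cohort for NHIS), and the use of the population standard deviation is a choice made here because the notes do not specify one.

```python
from collections import defaultdict
from statistics import mean, pstdev


def bmi(weight_kg, height_cm):
    """Body mass index: weight in kilograms over height in metres squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def cohort_z_scores(records):
    """Standardise education within each cohort (mean 0, std deviation 1).

    Each record is a dict with hypothetical keys `cohort` and `education`.
    Cohorts with no variation receive a score of 0.0 to avoid dividing by zero.
    """
    groups = defaultdict(list)
    for r in records:
        groups[r["cohort"]].append(r["education"])
    stats = {c: (mean(v), pstdev(v)) for c, v in groups.items()}
    scores = []
    for r in records:
        m, s = stats[r["cohort"]]
        scores.append((r["education"] - m) / s if s > 0 else 0.0)
    return scores
```

Standardising within cohort means a child's score measures attainment relative to peers born at the same time (and, for DHS, in the same country), so scores are comparable across ages and survey rounds.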


Table A3: OLS Estimates with and without Birth Order Controls (Pooled DHS Data)

Columns (1)-(3): No Birth Order FEs (Base, +S, +S+H). Columns (4)-(7): Birth Order FEs (No Fertility, Base, +S, +S+H).

Fertility, columns (1)-(3) and (5)-(7): -0.117*** -0.101*** -0.067*** -0.128*** -0.108*** -0.072***
(0.001) (0.001) (0.001) (0.001) (0.001) (0.001)

Birth order coefficients, columns (4)-(7):
Birth Order 2 -0.175*** -0.057*** -0.061*** -0.040***
(0.004) (0.004) (0.003) (0.003)
Birth Order 3 -0.352*** -0.099*** -0.109*** -0.071***
(0.005) (0.005) (0.005) (0.005)
Birth Order 4 -0.493*** -0.099*** -0.117*** -0.075***
(0.006) (0.006) (0.006) (0.006)
Birth Order 5 -0.596*** -0.062*** -0.089*** -0.057***
(0.007) (0.007) (0.007) (0.007)
Birth Order 6 -0.688*** -0.018** -0.057*** -0.044***
(0.008) (0.009) (0.009) (0.008)
Birth Order 7 -0.750*** 0.051*** -0.005 -0.014
(0.009) (0.010) (0.010) (0.010)
Birth Order 8 -0.786*** 0.138*** 0.066*** 0.031***
(0.010) (0.012) (0.012) (0.011)
Birth Order 9 -0.839*** 0.206*** 0.113*** 0.054***
(0.012) (0.014) (0.014) (0.013)
Birth Order ≥10 -0.856*** 0.395*** 0.268*** 0.163***
(0.014) (0.016) (0.016) (0.015)

Observations 1,128,699 1,128,699 1,128,699 1,128,699 1,128,699 1,128,699 1,128,699


Table A4: OLS Estimates with and without Birth Order Controls (USA)

Columns (1)-(3): No Birth Order FEs (Base, +S, +S+H). Columns (4)-(7): Birth Order FEs (No Fertility, Base, +S, +S+H).

Fertility, columns (1)-(3) and (5)-(7): -0.026*** -0.027*** -0.023*** -0.023*** -0.024*** -0.020***
(0.004) (0.004) (0.004) (0.004) (0.004) (0.004)

Birth order coefficients, columns (4)-(7):
Birth Order 2 -0.049*** -0.032*** -0.033*** -0.033***
(0.008) (0.008) (0.008) (0.008)
Birth Order 3 -0.103*** -0.061*** -0.060*** -0.059***
(0.015) (0.015) (0.015) (0.015)
Birth Order 4 -0.121*** -0.053** -0.050** -0.046*
(0.025) (0.025) (0.025) (0.025)
Birth Order 5 -0.095** 0.002 0.012 0.018
(0.046) (0.045) (0.045) (0.045)
Birth Order 6 -0.194** -0.065 -0.057 -0.043
(0.083) (0.081) (0.081) (0.081)
Birth Order 7 -0.236 -0.079 -0.062 -0.047
(0.157) (0.157) (0.156) (0.158)
Birth Order 8 0.012 0.191 0.196 0.220
(0.498) (0.497) (0.495) (0.495)
Birth Order 9 -0.460*** -0.259** -0.250** -0.207
(0.107) (0.115) (0.123) (0.133)
Birth Order ≥10 -0.421*** -0.181*** -0.184*** -0.148**
(0.054) (0.056) (0.054) (0.068)

Observations 163,931 163,931 163,931 163,931 163,931 163,931 163,931


Table A5: Full Output on Health and Socioeconomic Controls from IV Estimates (Developing Countries)

Dependent Variable 2+ 3+ 4+

School Z-Score +H +S&H +H +S&H +H +S&H

Fertility -0.015 -0.012 -0.041* -0.046** -0.038* -0.037**
(0.027) (0.026) (0.021) (0.020) (0.021) (0.019)
Mother’s Height 0.009*** 0.003*** 0.009*** 0.003*** 0.008*** 0.003***
(0.000) (0.000) (0.000) (0.000) (0.000) (0.000)
Mother’s BMI 0.026*** 0.012*** 0.027*** 0.013*** 0.028*** 0.014***
(0.001) (0.001) (0.001) (0.001) (0.001) (0.001)
Doctor Availability 0.203*** -0.029 0.189*** -0.056*** 0.194*** -0.048***
(0.024) (0.019) (0.020) (0.016) (0.020) (0.017)
Nurse Availability 0.103*** 0.113*** 0.122*** 0.114*** 0.156*** 0.120***
(0.013) (0.012) (0.013) (0.012) (0.014) (0.013)
No Prenatal Care Available -0.457*** -0.259*** -0.483*** -0.302*** -0.523*** -0.364***
(0.022) (0.019) (0.019) (0.017) (0.019) (0.017)

Wealth quintile coefficients (+S&H columns only):
Poorest Quintile -0.275*** -0.265*** -0.246***
(0.014) (0.011) (0.011)
Quintile 2 -0.114*** -0.114*** -0.088***
(0.011) (0.010) (0.010)
Quintile 3 -0.037*** -0.030*** 0.002
(0.011) (0.010) (0.010)
Quintile 4 0.026** 0.058*** 0.116***
(0.010) (0.010) (0.010)
Richest Quintile 0.155*** 0.229*** 0.327***
(0.012) (0.011) (0.012)

Observations 259,958 259,958 395,687 395,687 409,576 409,576
R-Squared 0.075 0.153 0.078 0.158 0.071 0.154

Notes: Full output is presented from IV regressions displayed in Table 6 on health and socioeconomic controls

from models denoted “+H” (adding health controls) and “+S&H” (adding health and socioeconomic controls).

Additionally, fixed effects for years of education of the mother are included in regressions though are not displayed in

the interests of space. These fixed effects show a positive gradient with higher education associated with additional

child education. Full notes are available in Table 6.


Table A6: Full Output on Health and Socioeconomic Controls from IV Estimates (USA Education)


School Z-Score +H +S&H +H +S&H +H +S&H

Fertility -0.099 -0.101* -0.015 -0.017 -0.134 -0.142
(0.061) (0.060) (0.067) (0.067) (0.152) (0.149)
Excellent Health 0.139 0.131 -0.047 -0.027 0.325 0.353
(0.181) (0.178) (0.228) (0.230) (0.602) (0.606)
Very good Health 0.141 0.134 -0.049 -0.028 0.293 0.322
(0.181) (0.178) (0.228) (0.229) (0.602) (0.606)
Good Health 0.080 0.086 -0.102 -0.066 0.247 0.289
(0.181) (0.178) (0.228) (0.230) (0.602) (0.606)
Fair Health 0.006 0.024 -0.186 -0.140 0.200 0.249
(0.181) (0.179) (0.229) (0.230) (0.602) (0.606)
Poor Health -0.098 -0.070 -0.293 -0.232 -0.020 0.047
(0.186) (0.183) (0.235) (0.236) (0.609) (0.612)
Mother’s Height 0.079 0.061 0.187 0.168 0.123 0.128
(0.102) (0.102) (0.139) (0.138) (0.239) (0.240)
Mother’s Height Squared -0.001 -0.000 -0.001 -0.001 -0.001 -0.001
(0.001) (0.001) (0.001) (0.001) (0.002) (0.002)
Smoked Prior to Pregnancy -0.047*** -0.041*** -0.051** -0.046** -0.055 -0.051
(0.015) (0.015) (0.020) (0.020) (0.040) (0.041)
No Response to Smoking 0.046* 0.041 0.062* 0.052 0.094* 0.079
(0.026) (0.025) (0.034) (0.033) (0.055) (0.054)

Observations 61,267 61,267 47,308 47,308 21,352 21,352
R-Squared 0.000 0.003 0.003 0.008 -0.005 -0.004



Additionally, fixed effects for years of education of the mother are included in regressions though are not displayed

in the interests of space. These fixed effects show a positive gradient with higher education associated with

additional child education. Full notes are available in Table 7.


Table A7: Full Output on Health and Socioeconomic Controls from IV Estimates (USA Health)


Excellent Health +H +S&H +H +S&H +H +S&H

Fertility 0.027 0.026 -0.058* -0.057* -0.025 -0.031
(0.021) (0.021) (0.032) (0.032) (0.053) (0.052)
Excellent Health 0.501*** 0.499*** 0.451*** 0.455*** 0.089 0.090
(0.090) (0.090) (0.136) (0.136) (0.134) (0.128)
Very good Health -0.022 -0.023 -0.076 -0.071 -0.435*** -0.434***
(0.090) (0.090) (0.136) (0.136) (0.134) (0.128)
Good Health -0.112 -0.107 -0.172 -0.164 -0.547*** -0.541***
(0.090) (0.090) (0.136) (0.136) (0.134) (0.128)
Fair Health -0.096 -0.087 -0.146 -0.136 -0.492*** -0.485***
(0.090) (0.090) (0.137) (0.136) (0.134) (0.128)
Poor Health -0.097 -0.085 -0.132 -0.119 -0.598*** -0.588***
(0.092) (0.091) (0.139) (0.138) (0.138) (0.132)
Mother’s Height -0.018 -0.024 -0.001 -0.003 0.013 0.022
(0.046) (0.046) (0.068) (0.068) (0.120) (0.121)
Mother’s Height Squared 0.000 0.000 0.000 0.000 -0.000 -0.000
(0.000) (0.000) (0.001) (0.001) (0.001) (0.001)
Smoked Prior to Pregnancy 0.016** 0.019*** 0.008 0.011 0.008 0.010
(0.007) (0.007) (0.010) (0.010) (0.019) (0.019)
No Response to Smoking 0.001 -0.001 -0.004 -0.005 -0.025 -0.027
(0.011) (0.011) (0.016) (0.016) (0.027) (0.027)

Observations 70,277 70,277 53,393 53,393 24,358 24,358
R-Squared 0.295 0.298 0.295 0.296 0.304 0.306



Additionally, fixed effects for years of education of the mother are included in regressions though are not displayed

in the interests of space. These fixed effects show a positive gradient with higher education associated with

additional child education. Full notes are available in Table 7.


Table A8: Developing Country IV Estimates Using Same Sex Twins Only

Columns: 2+ (Base, +H, +S&H); 3+ (Base, +H, +S&H); 4+ (Base, +H, +S&H)

Panel A: First Stage
Dependent Variable = Fertility
Same Sex Twins 0.703*** 0.713*** 0.717*** 0.687*** 0.709*** 0.713*** 0.773*** 0.776*** 0.783***
(0.034) (0.034) (0.033) (0.031) (0.030) (0.030) (0.033) (0.034) (0.034)
Kleibergen-Paap rk statistic 419.61 440.97 475.44 506.94 547.91 561.61 552.39 517.49 544.88
p-value of rk statistic 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Panel B: IV Results
Dependent Variable = School Z-Score
Fertility 0.007 -0.006 -0.007 -0.039 -0.065* -0.072** 0.013 0.006 -0.000
(0.046) (0.045) (0.043) (0.036) (0.033) (0.031) (0.035) (0.032) (0.028)
Observations 259,954 259,954 259,954 395,693 395,693 395,693 409,573 409,573 409,573
Coefficient Difference 0.102 0.313 0.001 0.022 0.522 0.399

Refer to notes to table 6. This table follows identical specifications, however now only same sex twins are used as an instrument instead of all twins. In the DHS, 64.1% of twin pairs are of the same gender. Standard errors are clustered by mother. ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01


Table A9: US IV Estimates Using Same Sex Twins Only

                                2+                            3+                            4+
                                Base      +H        +S&H      Base      +H        +S&H      Base      +H        +S&H

Panel A: First Stage
Dependent Variable = Fertility (School Z-Score Second Stage)
Same Sex Twins                  0.715***  0.717***  0.718***  0.767***  0.767***  0.770***  0.840***  0.845***  0.853***
                                (0.031)   (0.031)   (0.031)   (0.055)   (0.054)   (0.054)   (0.112)   (0.112)   (0.111)
Kleibergen-Paap rk statistic    522.05    526.26    541.38    196.87    199.35    202.38    55.86     56.51     59.56
p-value of rk statistic         0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000

Dependent Variable = Fertility (Excellent Health Second Stage)
Same Sex Twins                  0.752***  0.754***  0.755***  0.780***  0.781***  0.783***  0.823***  0.831***  0.841***
                                (0.030)   (0.030)   (0.030)   (0.051)   (0.050)   (0.050)   (0.105)   (0.105)   (0.104)
Kleibergen-Paap rk statistic    630.23    637.62    654.36    235.37    239.03    243.67    61.01     62.54     66.02
p-value of rk statistic         0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000

Panel B: IV Results
Dependent Variable = School Z-Score
Fertility                       -0.063    -0.063    -0.065    -0.018    -0.022    -0.026    0.096     0.089     0.084
                                (0.061)   (0.060)   (0.060)   (0.082)   (0.082)   (0.083)   (0.119)   (0.116)   (0.115)
Observations                    61,267    61,267    61,267    47,308    47,308    47,308    21,352    21,352    21,352
Coefficient Difference                    0.963     0.772               0.421     0.272               0.536     0.436

Dependent Variable = Excellent Health
Fertility                       0.003     0.032     0.031     -0.020    -0.062*   -0.061*   0.074     -0.001    -0.004
                                (0.030)   (0.025)   (0.025)   (0.046)   (0.037)   (0.037)   (0.067)   (0.055)   (0.054)
Observations                    70,277    70,277    70,277    53,393    53,393    53,393    24,358    24,358    24,358
Coefficient Difference                    0.056     0.070               0.133     0.143               0.083     0.073

Notes: Refer to notes in table 7. This table follows identical specifications, however now only same sex twins are used as an instrument instead of all twins. In the NHIS, 66.0% of twin pairs are of the same gender. Standard errors are clustered by mother. ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01


Table A10: Results for Kitagawa (2015) Tests with Controls (DHS)

                                  Baseline   Socioeconomic   Socioeconomic plus Health

Kitagawa Test Statistic           14.559     15.963          16.558
Instrumental Validity (p-value)   0.028      0.224           0.462

Coefficient (IV model)            -0.013     -0.032          -0.042
                                  (0.073)    (0.068)         (0.068)

Observations                      251,831    251,831         251,831

Notes: Results are presented for the Kitagawa (2015) test of instrumental validity. This

test exists for a binary endogenous variable, and as such rather than estimate a model

with fertility as the endogenous variable, we estimate a model with the binary variable

“greater than 2 births” as the endogenous variable. The instrument considered is twinning

at birth order 2. The estimation results of a typical IV model using this specification are

presented and indicated as “IV model”. Instrumental validity can not be proven, but can

be disproven, with low p-values being evidence against instrumental validity. The first row

shows the value for the variance weighted test statistic proposed by Kitagawa (2015), and

the second row displays the p-values associated with the Kitagawa test. Baseline controls

consist of mother year of birth fixed effects, continent fixed effects, child sex, and decade of

birth fixed effects. Socioeconomic controls add indicators for mother’s education (0 years,

1-6 years, 7-11 years, or 12+ years), and Health controls add indicators for overweight

or underweight mothers, and whether the majority of births in the mother’s region were

attended by doctors, nurses or unattended. A trimming constant of 0.07 is used for the

instrumental validity test, (as laid out in Kitagawa (2015)), and 500 bootstrap replications

are run to determine the p-value.


Table A11: Maternal Characteristics and Child Educational Outcomes

                          Z-Scores                                    Continuous Variables/Indicators
                          (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
                          Educ       Educ       Educ       Educ       Educ       Educ       Educ       Educ

Maternal Education (Z)    0.225***                         0.215***
                          (0.002)                          (0.002)
BMI (Z-Score)                        0.088***              0.070***
                                     (0.001)               (0.001)
Height (Z-Score)                                0.032***   0.019***
                                                (0.001)    (0.001)
Body Mass Index                                                       0.014***              0.015***   0.014***
                                                                      (0.000)               (0.000)    (0.000)
Height in Centimetres                                                            0.003***   0.003***   0.003***
                                                                                 (0.000)    (0.000)    (0.000)
Availability of Doctors                                                                                -0.058***
                                                                                                       (0.009)
Availability of Nurses                                                                                 0.101***
                                                                                                       (0.008)
No Prenatal Care                                                                                       -0.305***
                                                                                                       (0.010)

R-Squared                 0.16       0.13       0.12       0.17       0.17       0.16       0.17       0.17
Observations              1,128,699  1,128,699  1,128,699  1,128,699  1,128,699  1,128,699  1,128,699  1,128,699

The dependent variable in each model is each child’s standardized educational attainment compared with his/her cohort. Table headers (Z-Scores and Continuous Variables) refer to the form of the independent variables at the level of the mother. All regressions are estimated by OLS, and cluster standard errors by mother. Each specification includes fixed effects for mother and child age, total fertility, country and year, and family wealth quintile. In columns 5-7 maternal year of education fixed effects are included. Columns 1-4 use standardised variables for education, BMI and height, where Z-scores are constructed comparing each mother to those in her country and survey wave.


Table A12: Full Survey Countries and Years (DHS)

                                      Survey Year
Country                      Income   1     2     3     4     5     6     7

Albania                      Middle   2008
Armenia                      Low      2000  2005  2010
Azerbaijan                   Middle   2006
Bangladesh                   Low      1994  1997  2000  2004  2007  2011
Benin                        Low      1996  2001  2006
Bolivia                      Middle   1994  1998  2003  2008
Brazil                       Middle   1991  1996
Burkina Faso                 Low      1993  1999  2003  2010
Burundi                      Low      2010
Cambodia                     Low      2000  2005  2010
Cameroon                     Middle   1991  1998  2004  2011
Central African Republic     Low      1994
Chad                         Low      1997  2004
Colombia                     Middle   1990  1995  2000  2005  2010
Comoros                      Low      1996
Congo Brazzaville            Middle   2005  2011
Congo Democratic Republic    Low      2007
Cote d'Ivoire                Low      1994  1998  2005  2012
Dominican Republic           Middle   1991  1996  1999  2002  2007
Egypt                        Low      1992  1995  2000  2005  2008
Ethiopia                     Low      2000  2005  2011
Gabon                        Middle   2000  2012
Ghana                        Low      1993  1998  2003  2008
Guatemala                    Middle   1995
Guinea                       Low      1999  2005
Guyana                       Middle   2005  2009
Haiti                        Low      1994  2000  2006  2012
Honduras                     Middle   2005  2011
India                        Low      1993  1999  2006
Indonesia                    Low      1991  1994  1997  2003  2007  2012
Jordan                       Middle   1990  1997  2002  2007
Kazakhstan                   Middle   1995  1999
Kenya                        Low      1993  1998  2003  2008
Kyrgyz Republic              Low      1997
Lesotho                      Low      2004  2009
Liberia                      Low      2007
Madagascar                   Low      1992  1997  2004  2008
Malawi                       Low      1992  2000  2004  2010
Maldives                     Middle   2009
Mali                         Low      1996  2001  2006
Moldova                      Middle   2005
Morocco                      Middle   1992  2003
Mozambique                   Low      1997  2003  2011
Namibia                      Middle   1992  2000  2006
Nepal                        Low      1996  2001  2006  2011
Nicaragua                    Low      1998  2001
Niger                        Low      1992  1998  2006
Nigeria                      Low      1990  1999  2003  2008
Pakistan                     Low      1991  2006
Paraguay                     Middle   1990
Peru                         Middle   1992  1996  2000
Philippines                  Middle   1993  1998  2003  2008
Rwanda                       Low      1992  2000  2005  2010
Sao Tome and Principe        Middle   2008
Senegal                      Middle   1993  1997  2005  2010
Sierra Leone                 Low      2008
South Africa                 Middle   1998
Swaziland                    Middle   2006
Tanzania                     Low      1992  1996  1999  2004  2007  2010  2012
Togo                         Low      1998
Turkey                       Middle   1993  1998  2003
Uganda                       Low      1995  2000  2006  2011
Ukraine                      Middle   2007
Uzbekistan                   Middle   1996
Vietnam                      Low      1997  2002
Yemen                        Low      1991
Zambia                       Low      1992  1996  2002  2007
Zimbabwe                     Low      1994  1999  2005  2010

Notes: Country income status is based upon World Bank classifications described

at http://data.worldbank.org/about/country-classifications and available for download at

http://siteresources.worldbank.org/DATASTATISTICS/Resources/OGHIST.xls (consulted 1 April,

2014). Income status varies by country and time. Where a country’s status changed between DHS

waves only the most recent status is listed above. Middle refers to both lower-middle and upper-middle

income countries, while low refers just to those considered to be low-income economies.


B Data Definitions

All outcome and control variables used in the principal IV and OLS analyses are described in the following table. Alongside the variable definitions, we indicate units and any functional forms, which refer to the way variables enter the IV or OLS models.

Table A13: Variable Definitions

Variable                        Definition

Panel A: DHS Data
School Z-score                  Z-score of years of schooling, standardised relative to country and year of birth cohort.
Male Child                      Binary measure, one for boy, zero for girls
Country                         Fixed effect for country of survey
Year of Birth                   Fixed effect for year of birth
Child’s Age                     Fixed effect for child’s age
Contraceptive Intent            Fixed effect for mother’s use of contraceptive methods
Mother’s Age                    Fixed effect for mother’s age at child birth
Mother’s Age at First Birth     Inferred from age at survey time and age of child
Mother’s Education              Fixed effect for total years of education achieved
Family Wealth                   Fixed effect for DHS-assigned wealth quintile. Where not recorded a separate fixed effect for “no wealth quintile” is included
Mother’s Height                 Measured in centimetres
Mother’s BMI                    Measured in units (weight in kilograms divided by height in metres squared)
Prenatal Doctor Availability    Proportion of births in the same DHS cluster which received a prenatal check-up from a doctor
Prenatal Nurse Availability     Proportion of births in the same DHS cluster which received a prenatal check-up from a nurse
No Prenatal Care                Proportion of births in the same DHS cluster which received no prenatal check-ups from health professionals

Panel B: NHIS Data
Education Z-Score               Z-score of grade progression, standardised relative to month and year of birth cohort
Excellent Health                Indicator of whether a child is classified by the family as being in “excellent health” (chosen from a categorical list)
Male Child                      Binary measure, one for boy, zero for girls
Survey Year                     Fixed effect for year NHIS wave was run
Child Age                       Fixed effect for age at interview in months and years
Region                          Fixed effect for census bureau region of residence
Mother’s Race                   Fixed effect for mother’s race
Mother’s Age                    Fixed effect for mother’s age in years
Mother’s Age at First Birth     Inferred from age at survey time and age of child
Mother’s Education              Fixed effects for mother’s highest completed year of education
Mother’s Health Status          Self-reported based on categorical list
Mother’s Height                 Mother’s Height in Inches
Smoking Status                  Binary variable indicating whether the mother smoked prior to pregnancy
Smoking Status Missing          Binary variable indicating no response to the mother’s smoking status


C Testing for Equality of Coefficients Between IV Models

When estimating subsequent IV models with the progressive inclusion of controls to capture maternal selection, our point is really that column 1 (“Base”) is not distinguishable from 0, while column 3 (“+S&H”) often is. This is the important comparison when considering the literature and when showing that partial bias adjustment recovers the trade-off. We have nevertheless added a formal test of coefficients between IV models in all IV tables, reported in a row called “Coefficient Difference” at the bottom of Tables 6 and 7. This computation is not entirely trivial, as these tests must take account of correlations between the variance-covariance matrices of each IV regression, in the style of seemingly unrelated regression. Thus, we calculate these test statistics by jointly estimating the models with GMM (seemingly unrelated regression is a Feasible Generalised Least Squares technique, and hence not suitable for IV models). To do this we form two equations, which are the two models we wish to compare, in the following format:

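$$\mathit{quality}_{ij} = b_0 + b_1 \times \mathit{quantity}_j + \mathit{baseline}'_{ij} \times b_b \tag{A1}$$

$$\mathit{quality}_{ij} = c_0 + c_1 \times \mathit{quantity}_j + \mathit{baseline}'_{ij} \times c_b + \mathit{health}'_{ij} \times c_h. \tag{A2}$$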

Our goal is to test the equality of coefficients b1 = c1. Given that we are using instruments for endogenous quantity (fertility) in each case, we can form the following population moment conditions, which hold under the null of instrumental validity in each case (i.e., they replicate the specifications we are estimating in the paper):

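$$\mathit{twin}'_i\,(\mathit{quality}_{ij} - b_0 - b_1 \times \mathit{quantity}_j - \mathit{baseline}'_{ij} \times b_b) = 0 \tag{A3}$$

$$\mathit{twin}'_i\,(\mathit{quality}_{ij} - c_0 - c_1 \times \mathit{quantity}_j - \mathit{baseline}'_{ij} \times c_b - \mathit{health}'_{ij} \times c_h) = 0. \tag{A4}$$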

Using the sample analogues of these moments, we can then estimate the parameters b and c via GMM. Denoting the two moments as the 2 element vector g(b, c), we estimate the parameters b and c using the GMM objective function J(b, c) = n g(b, c)′ W g(b, c). An unadjusted weight matrix is used which assumes that the moment conditions are independent. This replicates all parameters and standard errors from the original IV models, but now the estimates can be formally tested for equality against one another using a χ² test which also considers the correlation between the observations in the two models when estimating the eventual variance-covariance matrix.
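The logic of this stacked-moment test can be sketched in a few lines. The following is a minimal illustration on simulated data, not the code used in the paper: the function names (`iv_fit`, `coefficient_difference`, `simulate`) and the data-generating process are hypothetical, each model is taken to be just identified, and the variance is heteroscedasticity-robust rather than clustered by mother as in the paper's tables. Because both models are just identified, the joint GMM point estimates coincide with the separate IV estimates, and the cross-model covariance comes from stacking the per-observation moment contributions.

```python
import math
import numpy as np

def iv_fit(y, X, Z):
    """Just-identified linear IV: coefficients and per-observation influence functions."""
    n = len(y)
    theta = np.linalg.solve(Z.T @ X, Z.T @ y)            # (Z'X)^{-1} Z'y
    resid = y - X @ theta
    A = Z.T @ X / n                                      # Jacobian of the sample moments
    infl = np.linalg.solve(A, (Z * resid[:, None]).T).T  # row i = A^{-1} z_i u_i
    return theta, infl

def coefficient_difference(y, X1, Z1, X2, Z2, k1=1, k2=1):
    """Chi-squared(1) test of theta1[k1] == theta2[k2] across two IV models.

    Stacking the influence functions of both models keeps the covariance
    between the two estimators, which separate regressions would ignore.
    """
    n = len(y)
    t1, infl1 = iv_fit(y, X1, Z1)
    t2, infl2 = iv_fit(y, X2, Z2)
    diff = t1[k1] - t2[k2]
    infl_diff = infl1[:, k1] - infl2[:, k2]
    var_diff = float(infl_diff @ infl_diff) / n**2       # robust variance of the difference
    chi2 = diff**2 / var_diff
    p_value = math.erfc(math.sqrt(chi2 / 2.0))           # survival function of chi2 with 1 df
    return diff, chi2, p_value, t1, t2

def simulate(n=5000, seed=0):
    """Hypothetical data: twin is independent of health, true fertility effect is -0.1."""
    rng = np.random.default_rng(seed)
    twin = rng.binomial(1, 0.3, n).astype(float)
    health = rng.normal(size=n)
    v = rng.normal(size=n)
    u = 0.5 * v + rng.normal(size=n)                     # endogeneity of quantity
    quantity = 2.0 + 0.8 * twin + 0.5 * health + v
    quality = 1.0 - 0.1 * quantity + 0.3 * health + u
    const = np.ones(n)
    X1 = np.column_stack([const, quantity])              # "Base" model
    Z1 = np.column_stack([const, twin])
    X2 = np.column_stack([const, quantity, health])      # model with a health control
    Z2 = np.column_stack([const, twin, health])
    return quality, X1, Z1, X2, Z2

if __name__ == "__main__":
    y, X1, Z1, X2, Z2 = simulate()
    diff, chi2, p, t1, t2 = coefficient_difference(y, X1, Z1, X2, Z2)
    print(f"b1 = {t1[1]:.3f}, c1 = {t2[1]:.3f}, chi2 = {chi2:.3f}, p = {p:.3f}")
```

In this simulated design the twin indicator is unrelated to health, so both models estimate the same coefficient and the test should rarely reject. The setting of interest in the paper is the opposite case, where twinning is correlated with maternal health.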

D Loosening the Linear Effect Specification of the Q–Q Trade-off

Theoretical statements of the QQ model tend to assume, for simplicity, that all children in a family have the same endowments and receive the same parental investment. More recent work (for example the theoretical work of Aizer and Cunha (2012) and empirical papers by Rosenzweig and Zhang (2009); Brinch et al. (2017); Mogstad and Wiswall (2016); Bagger et al. (2013)) relaxes this assumption. Among other things, this allows for reinforcing or compensating behaviours in parental investment choices (Almond and Mazumder, 2013). This implies allowing the coefficient β1 to vary across children in the family.¹

Using DHS data, for which we have sufficient power to split instruments, we re-estimate our regressions following the non-linear marginal fertility models of Brinch et al. (2017) and Mogstad and Wiswall (2016). We find that, as is the case with the linear models reported in Tables 6 and 7, the inclusion of twin predictors nearly universally increases the size of the estimated QQ trade-off in non-linear models, and in some specifications the trade-off is statistically significant. Thus the emergence of a trade-off following partial correction for twin non-randomness is not sensitive to functional form and, in particular, holds when the impact of fertility is allowed to vary by parity.

¹ In this paper we focus nearly exclusively on the internal validity of twins estimates (IV consistency). In recent work, Aaronson et al. (2017) examine the external validity of twin instrumented estimates of the impact of fertility on female labour supply.


As laid out in Mogstad and Wiswall (2016), this consists of the following 2SLS procedure (for the two-plus sample):

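$$\mathit{quantity}_{sj} = \lambda_{s2}\,\mathit{twin}_{2j} + \lambda_{s3}\,\mathit{twin}^{*}_{3j} + \lambda_{s4}\,\mathit{twin}^{*}_{4j} + \lambda_{s5}\,\mathit{twin}^{*}_{5j} + X\lambda_{Xs} + \nu_{sj}, \quad \text{for } s = 2, 3, 4, 5 \tag{A5}$$

$$\mathit{quality}_{ij} = \beta_0 + \beta_1\,\mathit{quantity}_{2j} + \beta_1\,\mathit{quantity}_{3j} + \beta_1\,\mathit{quantity}_{4j} + \beta_1\,\mathit{quantity}_{5j} + X\beta_X + \eta_{ij}, \tag{A6}$$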

where (A5) is a series of first stages for the likelihood effect of moving from the sth to the (s + 1)th child, and (A6) is the second stage estimate of the effect of an additional child after s births on the human capital of the first born child. As the estimation sample consists of families with at least two births, twin2j, a binary variable for a twin at the second birth, is defined for all families. However, when moving to higher birth orders, twin3j is not defined for families with only two births. We thus follow Mogstad and Wiswall (2016) in replacing higher-order twin birth indicators with:

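$$\mathit{twin}^{*}_{cj} = \begin{cases} 0, & \text{if } c_j < c \\ \mathit{twin}_{cj} - E[\mathit{twin}_{cj} \mid X_j, c_j \geq c], & \text{if } c_j \geq c \end{cases}$$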

where, as described in Mogstad and Wiswall (2016), E[twincj | Xj, cj ≥ c] is a non-parametric estimate of the conditional mean of the probability of twin birth in the non-missing subsample. We similarly follow Mogstad and Wiswall (2016) in considering family sizes up to 6 children. The above specification (A5 and A6) is estimated for the two-plus sample. We also estimate analogous specifications for the three-plus and four-plus samples, where in each case we only consider the marginal impacts of fertility at birth orders greater than the birth orders of the children included in the estimation sample.
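The construction of the recentred instrument can be illustrated as follows. This is a hedged sketch rather than the procedure actually run for the paper: the function name `twin_star` and its arguments are hypothetical, and the non-parametric conditional mean is approximated by cell means over a single discrete covariate cell identifier (`x_cell`), which is one simple way to estimate E[twin | X, family size ≥ c] when X is discrete.

```python
import numpy as np

def twin_star(twin_c, family_size, x_cell, c):
    """Recentred twin indicator at birth order c.

    twin_c      : 1 if the c-th birth was a twin birth (may be NaN when undefined)
    family_size : number of births in family j (c_j in the text)
    x_cell      : discrete covariate cell of family j (stand-in for X_j)
    c           : birth order at which twinning is measured
    """
    twin_c = np.asarray(twin_c, dtype=float)
    family_size = np.asarray(family_size)
    x_cell = np.asarray(x_cell)

    out = np.zeros(len(twin_c))          # families with fewer than c births get 0
    eligible = family_size >= c          # twin_c is only defined for these families
    for cell in np.unique(x_cell[eligible]):
        mask = eligible & (x_cell == cell)
        # subtract the cell-specific twinning rate among eligible families
        out[mask] = twin_c[mask] - twin_c[mask].mean()
    return out

if __name__ == "__main__":
    twin3 = np.array([np.nan, np.nan, 1, 0, 0, 1, 0, 0])
    size = np.array([2, 2, 3, 3, 3, 4, 4, 4])
    cell = np.array([0, 1, 0, 0, 0, 1, 1, 1])
    print(twin_star(twin3, size, cell, c=3))
```

By construction the recentred indicator has mean zero within each covariate cell among families that reach birth order c, so setting it to zero for smaller families does not shift its conditional mean.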

As our interest is in examining the impact of non-random twin births, we estimate the above specifications in two circumstances: the first follows exactly the procedure laid out in Mogstad and Wiswall (2016), where twins are assumed to be exogenous, and the second additionally controls for observable health and socioeconomic predictors of twins in A5 and A6. These results are presented and discussed in Section 5.2.3 of the paper.


Table A14: First Stages for Non-Linear IV Estimates

Instrument                    twin2j     twin∗3j    twin∗4j    twin∗5j

Panel A: Two Plus Sample
Siblings ≥ 2                  0.296***   0.213***   0.121***   0.032***
                              (0.005)    (0.012)    (0.010)    (0.007)
Siblings ≥ 3                  -0.011     0.429***   0.189***   0.077***
                              (0.008)    (0.007)    (0.013)    (0.010)
Siblings ≥ 4                  -0.014*    -0.012     0.525***   0.174***
                              (0.008)    (0.017)    (0.008)    (0.017)
Siblings ≥ 5                  -0.001     -0.023     -0.009     0.653***
                              (0.009)    (0.021)    (0.034)    (0.012)

Panel B: Three Plus Sample
Siblings ≥ 3                             0.393***   0.199***   0.075***
                                         (0.004)    (0.009)    (0.007)
Siblings ≥ 4                             0.014      0.518***   0.186***
                                         (0.009)    (0.006)    (0.011)
Siblings ≥ 5                             -0.007     0.008      0.645***
                                         (0.011)    (0.020)    (0.008)

Panel C: Four Plus Sample
Siblings ≥ 4                                        0.480***   0.190***
                                                    (0.004)    (0.009)
Siblings ≥ 5                                        0.009      0.634***
                                                    (0.012)    (0.005)

Each row reports the first stage estimate of the number of children on twin births from the IV regressions displayed in table 8. In each case we report the first stages for the baseline specification of the Non-Linear IV, although results are quantitatively similar in the case of the +S&H specification. Standard errors are clustered by family (three plus and four plus samples), or robust to heteroscedasticity when only one child from each family is included in the regressions (two plus sample).


E Estimating Values for γ

We propose a number of methods for arriving at a non-arbitrary prior regarding γ in the Conley et al. (2012) method, where γ is the violation of the exclusion restriction when using twins as an instrument in the QQ model. From equation 4, γ represents the conditional effect of being born of a twin mother on child quality:

$$\gamma = \left.\frac{\partial\, \mathit{quality}_{ij}}{\partial\, \mathit{twin}_j}\right|_{X}$$

In practice, bounds identification based on γ only pushes the identification problem back by one step, as consistent bounds rely on having an unbiased estimate of γ, which is not trivial. In this appendix we first discuss a proposed way to causally estimate γ, and then present a number of consistency checks, based on the data used in the QQ models of the paper, which support the estimated values of γ.

So as to obtain a consistent estimate of γ, albeit from different samples, we exploit quasi-experimental changes in maternal health (healthj) and use these to obtain consistent estimates of the impact of maternal health on (a) child quality and (b) the probability of a twin birth. We then ‘scale’ the first by the second. First, we estimate

$$\left.\frac{\partial\, \mathit{quality}_{ij}}{\partial\, \mathit{health}_j}\right|_{X} = \phi_q.$$

Under the assumption that the change in health is quasi-experimental, this is a causal estimate of a 1 unit change in healthj on child quality. Since γ is the effect of maternal health scaled by the difference in health between twin and non-twin mothers, we also estimate:

$$\left.\frac{\partial\, \mathit{health}_j}{\partial\, \mathit{twin}_j}\right|_{X} = \phi_t.$$

With these two parameters in hand, we obtain a causal estimate of γ as:

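$$\gamma = \left.\frac{\partial\, \mathit{quality}_{ij}}{\partial\, \mathit{twin}_j}\right|_{X} = \left.\frac{\partial\, \mathit{quality}_{ij}}{\partial\, \mathit{health}_j}\right|_{X} \times \left.\frac{\partial\, \mathit{health}_j}{\partial\, \mathit{twin}_j}\right|_{X} = \phi_q \times \phi_t. \tag{A7}$$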

As it involves the estimated quantities $\hat{\phi}_q$ and $\hat{\phi}_t$, the estimate of γ will be subject to sampling uncertainty: $\hat{\gamma} = \hat{\phi}_q \times \hat{\phi}_t$. Thus, the estimate $\hat{\gamma}$ will have a distribution. If we can estimate both $\hat{\gamma}$ and its distribution, this gives us the consistent prior for the full distribution of γ required in Conley et al.’s LTZ approach. We estimate the distribution using resampling (bootstrap) methods, which allow us to compare the analytical distribution with a series of known distributions,² or indeed to use the analytical distribution of γ directly in the bounds estimate of β1.³ We provide a summary of the assumptions underlying our bounds estimates and evidence in their favour in appendix E.4, and a full description of the resampling process in appendix E.5.
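The resampling idea can be sketched as below. This is an illustrative simplification and not the procedure of appendix E.5: both components are computed on one simulated sample without controls, φq is a bivariate OLS slope, φt is a raw difference in mean health between twin and non-twin mothers, and all function names and the data-generating process are hypothetical.

```python
import numpy as np

def phi_q(quality, health):
    """OLS slope of child quality on the maternal health measure."""
    h = health - health.mean()
    return float(h @ (quality - quality.mean()) / (h @ h))

def phi_t(health, twin):
    """Difference in mean maternal health between twin and non-twin mothers."""
    return float(health[twin == 1].mean() - health[twin == 0].mean())

def bootstrap_gamma(quality, health, twin, reps=100, seed=0):
    """Point estimate of gamma = phi_q * phi_t and its bootstrap draws."""
    rng = np.random.default_rng(seed)
    n = len(quality)
    point = phi_q(quality, health) * phi_t(health, twin)
    draws = np.empty(reps)
    for b in range(reps):
        idx = rng.integers(0, n, n)      # resample mothers with replacement
        draws[b] = phi_q(quality[idx], health[idx]) * phi_t(health[idx], twin[idx])
    return point, draws

def simulate_shock_sample(n=2000, seed=1):
    """Hypothetical sample in which healthier mothers are more likely to have twins."""
    rng = np.random.default_rng(seed)
    health = rng.normal(size=n)
    p_twin = 1.0 / (1.0 + np.exp(-(-2.0 + 0.5 * health)))
    twin = (rng.random(n) < p_twin).astype(int)
    quality = 0.05 * health + rng.normal(size=n)
    return quality, health, twin

if __name__ == "__main__":
    q, h, t = simulate_shock_sample()
    point, draws = bootstrap_gamma(q, h, t, reps=100)
    print(f"gamma_hat = {point:.4f}, bootstrap variance = {draws.var(ddof=1):.6f}")
```

The vector of draws plays the role of the empirical distribution of γ: it can be compared with a normal distribution of the same mean and variance, or passed directly to a simulation-based bounds routine.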

Implementing this approach imposes fairly strong data requirements. We require data that capture differential exposure of women to a quasi-experimental change in their pre-pregnancy health, together with measures of the quality of their children. In addition, we need information on the prevalence of twin births in this sample of women. In the following subsections, we describe two studies, one set in the United States and the other in Nigeria, each of which offers a large and representative sample of women with birth data and intergenerational linkage, and in which we observe the incidence of a quasi-experimental shock to maternal health. In the United States the shock is the introduction of antibiotics in 1937, and in Nigeria it is the Biafra war that raged through 1967-1970. We show how we exploit these cases to estimate γ and its distribution. We observe that bounds

² If, for example, we determine that γ is normally distributed, estimation then proceeds by imposing the prior distribution for γ as: $\gamma \sim N(\mu_\gamma, \sigma^2_\gamma)$.

³ Conley et al. (2012) discuss a simulation-based algorithm (p. 265) for estimation which can be used given any prior, including non-normal priors, for the distribution of γ. In practice, our preferred estimates are based on the entire empirical distribution, and we proceed using Conley et al. (2012)’s suggested simulation method.


estimates of this type are necessarily case specific (see, for instance, the examples provided in the Conley et al. paper) so, although our approach is of general interest in suggesting a process for bounding when violation of the exclusion restriction is small, the estimates produced here are only representative of the cases examined.

E.1 Estimating γ: A case from the United States

The first antibiotics, sulfonamide drugs, were introduced across the United States in 1937, following clinical trials in London and New York, and there was nothing else on the stage until penicillin was introduced during the Second World War. There was immediate and widespread uptake and the drugs were hailed as a “miracle” (Lesch, 2006). Their arrival was associated with a sharp drop in a range of infectious diseases that were treatable by these drugs (Jayachandran et al., 2010). In particular, pneumonia, the leading cause of death among children after congenital causes, fell sharply and this decline was largest among infants (Bhalotra and Venkataramani, 2014). Although there are no direct measures of the adult health of individuals exposed to the antibiotics at birth, it is plausible that infant health improvements persist and generate improvements in adult health; some evidence of this is in Almond et al. (2011), Butikofer and Salvanes (2015), Hjort et al. (2016) and Bhalotra et al. (2015).

What is pertinent for our purposes is whether any improvements in the adult health of women are such as to influence the quality of their children.⁴ We therefore estimate this reduced form using the identification strategy of Bhalotra and Venkataramani (2014), but with outcomes of the children of exposed women, rather than the outcomes of the women themselves, as dependent variables. Identification exploits the timing of this shock to health at birth together with the fact that the largest drops in pneumonia occurred in states with the highest initial burdens of disease. This assumes that states with high versus low burdens of pneumonia did not have different trends in the outcomes before the introduction of antibiotics. To demonstrate that this is the case we estimated an event study (see Figure A12).⁵

Let m signify the mother, and m + 1 signify her children. Using the United States micro-census files, we estimate:

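$$\mathit{quality}^{m+1}_{stc} = \alpha + \phi_{q1}\,(\mathit{Post}_t \times \mathit{basePneumonia}^{m}_{s}) + \theta_{rs} + \eta_{rt} + \varphi X^{m}_{st} + \lambda_{rc} + (\theta_s \times t) + \varepsilon_{stc} \tag{A8}$$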

where φq1 is an estimate of the change in child quality associated with the mother’s exposure to antibiotics in her infancy. The pre-intervention mean pneumonia mortality rate at the state level, s, is denoted $\mathit{basePneumonia}^{m}_{s}$ and is interacted with Postt, which indicates birth cohorts 1937 and after. We control for race-specific fixed effects for census year t, mother’s birth cohort c, and mother’s birth state s, as well as state-specific linear time trends. The coefficient of interest is of similar size and significance conditional upon the state and time varying controls (health and education infrastructure, state income) and upon a vector of rates of mortality from control diseases (diseases not treatable with sulfonamides) interacted with the indicator Post.

The second step is to estimate the association of the health shock experienced by women at their birth with the probability that they have a twin birth. This is an experimental analogue of the twin non-randomness associations we present in the paper. We take the conditional average rate of baseline pneumonia in the state of residence for all women who give birth to a twin, and the similar conditional rate for non-twin mothers, using the same controls as in equation A8. In other words, we calculate

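$$\phi_{t1} = bP_{s}^{\,\mathit{twin}_j = 1 \mid X} - bP_{s}^{\,\mathit{twin}_j = 0 \mid X} = \left.\frac{\partial\, bP_s}{\partial\, \mathit{twin}_j}\right|_{X}.$$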

⁴ Results from Bhalotra and Venkataramani (2014) show that on a range of outcomes, scarring dominates selection, i.e. Sulfa exposure improves all socioeconomic outcomes. This suggests that selection due to survival of weaker births is small, and that the arrival of Sulfa drugs is appropriately viewed as a positive health shock.

⁵ Bhalotra and Venkataramani (2014) demonstrate parallel trends for first generation outcomes; we demonstrate this for second generation outcomes.


In view of our findings related to twin selection, our expectation is that women with lower exposure to pneumonia at birth will be more likely to have twins, and hence φt1 < 0.

As discussed, with these two quantities in hand, we can estimate γ by taking their product:

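$$\phi_{q1} \times \phi_{t1} = \frac{\partial\, \mathit{quality}_{ij}}{\partial\, bP_s} \times \left.\frac{\partial\, bP_s}{\partial\, \mathit{twin}_j}\right|_{X} = \left.\frac{\partial\, \mathit{quality}_{ij}}{\partial\, \mathit{twin}_j}\right|_{X} = \gamma_{US}. \tag{A9}$$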

We can plug this into our estimates of the bounds on β1 following Conley et al. (2012), as described earlier.

E.2 Estimating γ in Nigeria

Since we shall proceed to analyse alternative estimators of the QQ trade-off in developing countries and not only in the US, we obtained an estimate of γ from Nigeria. Here, we exploit the exposure of individuals through their growing years to the Nigerian civil war. This was the first modern war in sub-Saharan Africa after independence and one of the bloodiest. It raged in Biafra, the secessionist region in the South-East of Nigeria, from 6 July 1967 to 15 January 1970, killing between 1 and 3 million people and causing widespread malnutrition and devastation. The war created a virtual famine in the Southeast, where it was fought, and the effects of under-nutrition were potentially reinforced by trauma and the increased incidence of infections. Akresh et al. (2012, 2016) investigate long run effects of war exposure, exploiting the differential exposure of the Christian Igbo community resident in Biafra relative to other ethnic groups (in other states), interacted with the timing of the war. They show that women exposed to the war were shorter as adults, and more likely to be over-weight. As height and obesity are measures of health, they thus establish that the war was a shock to maternal health. We use their identification strategy to estimate impacts on children’s education of the mother being exposed to the war in utero, using a continuous measure for the number of months exposed.

The estimated equation is:

$$\mathit{quality}^{m+1}_{ites} = \alpha + \phi_{q2}\,\mathit{war}^{m}_{te} + \alpha_t + \theta_e + \lambda_s + \mu_e t + u_{ites} \tag{A10}$$

for woman i of ethnicity e born in year t and state s. The indicator of quality is a z-score (standardized by age and gender) for the years of education of children in generation m + 1, and φq2 is the reduced form effect on this of the maternal health shock created by the war. Analogous to the US case, we thus estimate $\phi_{t2} = \mathit{war}_{\mathit{twin}=1} - \mathit{war}_{\mathit{twin}=0} = \left.\frac{\partial\, \mathit{war}}{\partial\, \mathit{twin}}\right|_{X}$, so that we can estimate γ, the twin-mediated effect of maternal health on child-quality as:

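$$\phi_{q2} \times \phi_{t2} = \frac{\partial\, \mathit{quality}_{ij}}{\partial\, \mathit{war}_s} \times \left.\frac{\partial\, \mathit{war}_s}{\partial\, \mathit{twin}_j}\right|_{X} = \left.\frac{\partial\, \mathit{quality}_{ij}}{\partial\, \mathit{twin}_j}\right|_{X} = \gamma_{Nigeria}. \tag{A11}$$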

E.3 Estimated Values for γ in US and Nigeria

The United States. In panel A, we use quasi-experimental variation in the exposure of women to antibiotics in their birth year in early twentieth century America to estimate impacts of mother’s health on children’s education, cast as a Z-score, with the standardization using the birth cohort distribution. Following equation A8 (and Bhalotra and Venkataramani (2014)), we estimate that the reduced form effect of the mother’s exposure is an increase in the child’s completed education of 4.97% of a standard deviation, or approximately 0.15 years of education.⁶ This estimate is the quantity φq1 in equations A8 and A9. In the second column,

⁶ The results from Bhalotra and Venkataramani (2014) suggest that exposure to sulfa drugs increased schooling of the first generation (the mothers) by 0.7 years. Our estimates suggest that the trickle down to the next generation was smaller (by more than a factor of four), but still significant.


we show estimates that imply that, conditional upon health and fertility controls, mothers who produce twin births are, on average, in states with 12.5% lower rates of pneumonia. This augments the evidence presented in twin non-randomness tests of this paper, adding a further case of twin births being a function of health conditions.

Following equation A9, in column 3 we interact φq and φt to form a consistent estimate for γ of 0.62% of a standard deviation. Bootstrapping this distribution results in an estimated variance of 0.0027. The empirical distribution estimated from 100 bootstrap replications is displayed in Figure A13a, overlaid with an analytical normal distribution with the same mean and variance. When comparing our estimate of γ to the IV estimates discussed in section 5.2, we see that the direct effect of having a (healthier) twin mother on child quality (the violation) is considerably smaller than the point estimates of the effect of fertility on child outcomes (the parameter of interest). While it is reassuring that the violation of the exclusion restriction is estimated as small, in that it implies that the instrument is “close to” being exogenous (in Conley et al. (2012)’s terminology), the evidence we provide shows that it is nevertheless sufficient to generate substantively different conclusions regarding the QQ trade-off.

Nigeria. We repeat the procedure for estimating the violation of the exclusion restriction using quasi-experimental variation in the mother’s foetal exposure to the Biafra war that was fought in Nigeria in 1967-1970. Results are in panel B of Table A15. The first column presents an estimate of φq2 from equation A10. Children of mothers exposed to the war in utero have 1.54% of a standard deviation less education, equivalent to 0.052 years (compared to children of mothers unexposed to the war in utero).⁷ The second column shows that, on average, twin mothers come from states and cohorts that are 26.7% less likely to have suffered war. Together these estimates imply a positive estimate for γ of 0.4% of a standard deviation in education, not dissimilar to the value estimated using a different shock to maternal health in early twentieth century America. The bootstrapped distribution of γ based on 100 replications is displayed in Figure A13b (bootstrap variance 0.0022).
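As a check on the arithmetic, the two point estimates of γ can be reproduced from the component estimates quoted above. The US figures are taken as reported in Table A15. For Nigeria, the negative signs are an assumption read off the verbal description (“less education”, “less likely to have suffered war”), which is why the product is positive:

$$\gamma_{US} = \phi_{q1} \times \phi_{t1} = 0.0497 \times 0.125 \approx 0.0062$$

$$\gamma_{Nigeria} = \phi_{q2} \times \phi_{t2} = (-0.0154) \times (-0.267) \approx 0.0041$$

These correspond to roughly 0.62% and 0.4% of a standard deviation respectively, matching the values stated in the text.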

E.4 Assumptions and Evidence Underlying the Calculation of Plausibly Exogenous Bounds

The calculation of bounds using Conley et al. (2012)’s plausibly exogenous methodology relies on a number of assumptions relating to the exclusion restriction. We provide a full list of these assumptions, their precedence (be it from Conley et al. (2012)’s methodology or our extension to estimating γ and its empirical distribution), and supporting evidence for each.

1. There exists prior information that implies γ (the violation of the exclusion restriction) is near 0 but perhaps not exactly 0. Precedence: Conley et al. (2012), p. 262. Evidence: Tables 1-2 of our paper document that twins occur more frequently to healthy women. This renders the exclusion restriction on which the twin-IV rests invalid if, in addition, earlier children of healthy women are higher quality children. Nevertheless, it is unlikely that the violation of the exclusion restriction is very large given that maternal health is only a small part of the production function of child quality.

2. The prior which is assumed for γ ∼ F is correct. Precedence: Conley et al. (2012), p. 265. Evidence: Refer to subpoints below.

(a) The average value of γ for a particular context can be estimated using a single maternal health shock as a mediator to examine both elements of the violation of the exclusion restriction (twin non-randomness and the effect of maternal health on child quality). Precedence: This paper, equation A7. Evidence: the particular maternal health shock examined is a common factor in both effects. In the simplest case, if we scale a maternal health shock by a fixed parameter (for example

7 This is not directly relevant here but, again, notice that the second-generation effect is smaller than the impact on the first generation, which is 0.6 years of education (Akresh et al., 2016).


Table A15: Estimates of γ Using Maternal Health Shocks

                          ∂Educ/∂Health   ∂Health/∂Twin   γ = ∂Educ/∂Twin   γ (bootstrap)
Panel A: United States
Estimate                  0.0497***       0.125***        0.0062            0.0062
                          (0.0181)        (0.0181)                          (0.0027)
Observations              943,038         943,038
R-squared                 0.011           0.069

Panel B: Nigeria
Estimate                  -0.0154**       -0.267**        0.0040            0.0040
                          (0.00637)       (0.00637)                         (0.0022)
Observations              26,205          26,205
R-squared                 0.022           0.991

Notes: Regression results for panel A use the 5% sample of 1980 US census data and follow the specifications in Bhalotra and Venkataramani (2014). Regression results from panel B are based on all Nigerian DHS data in which children can be linked to their mothers. Specifications and samples are identical to those described in Akresh et al. (2012). The estimate of γ is formed by taking the product of the column 1 and column 2 estimates. A full description of this process, along with the non-pivotal bootstrap process to estimate the standard error of γ, is provided in this Appendix.

Figure A12: Test of Parallel Trends of Second Generation Sulfa Effects for γ

[Figure: point estimates and 95% confidence intervals of the estimated schooling effect (vertical axis, −0.2 to 0.2) by year (horizontal axis, 1930 to 1943).]

Note to Figure A12: Graph replicates specification (A8), however now interacting basePneumonia with each mother’s birth year, rather than a single Post dummy starting from 1937. Each coefficient and confidence interval displays the differential effect of a child’s mother being born in a high- or low-pneumonia state by birth year surrounding the sulfa reform. The year preceding the arrival of sulfa reform is omitted (1936) and post sulfa estimates and confidence intervals represent the differential impact of sulfa drugs on second generation (educational) outcomes of children of affected women. Standard errors are clustered by state.


Figure A13: Bootstrap Estimates of γ

[Figure, two panels: (a) United States (Sulfonamide); (b) Nigeria (Biafra Civil War). Each panel plots the empirical distribution of the bootstrapped γ (vertical axis: fraction) with the analytical distribution overlaid.]

Notes to Figure A13: The empirical distribution is generated by performing J=100 bootstrap replications to estimate φt and φq for each of Nigeria and USA (see complete discussion in section 3.2). The overlaid analytical distribution in each figure is a normal distribution ∼ N(µγ, σγ). The estimates for φt, φq and γ are displayed in Table A15.

considering the effect of being exposed to a 1% reduction in rates of pneumonia or the effect of being exposed to a 10% reduction in pneumonia) these scale effects will be perfectly canceled out in the numerator and denominator of equation A7. To the degree that a large or small health shock impacts maternal health and rates of twinning by a similarly large or small amount, the particular mediator used will produce an identical value for γ. This assumption would be violated if different health shocks have different relative effects on twinning and on child quality, for example a shock which is particularly important for child quality but not for twinning. We return to this point in the caveat below.

(b) The true distribution of γ around its mean can be approximated by a resampling algorithm. Precedence: Conley et al. (2012), p. 265. Evidence: Conley et al. (2012) demonstrate that a simulation-based estimate for the confidence intervals of β can be generated based on resampling of the underlying distribution of interest. In this paper we propose the use of an analytical distribution. This follows if we view our sample of data as the population, and resample from the population, as is typical in bootstrap methods. In both cases (USA and Nigeria) our resampling is based on a representative sample of the full population of mothers, leading to a valid bootstrapped distribution.

Caveat: If the above assumptions are not met, particularly assumption 2 or any of its parts, our estimate of the bounds on β will no longer be correct. However, as Conley et al. (2012) point out:

“It [this method] will produce valid frequentist inference under the assumption that the prior is correct and will provide robustness relative to the conventional approach (which assumes γ ≡ 0) even when incorrect.”

In the case that the above assumptions are not correct, we provide a full set of bounds over a wider range of values in Figures 2 and 3, to determine the robustness of bounds estimates to (even non-conservative) changes in assumptions of γ.


E.5 Bootstrap Confidence Intervals

The methodology to estimate γ in equations (A9) and (A11) is described in previous sub-sections of this Appendix. In the case of Conley et al.’s UCI approach, this estimate is then sufficient to produce bounds on β1, assuming that γ ∈ [0, 2γ̂]. We scale γ̂ by a factor of 2 so that the estimate falls precisely in the middle of the range. Conley et al. (2012) provide a similar example to calculate the returns to education using the UCI approach. In the case of the more precise LTZ approach (our preferred method) the logic is similar; however, we must now form a prior over the entire distribution of γ. Calculating the variance of γ is not as straightforward as using the variance-covariance matrix corresponding to each of the estimates φt and φq. In this case, however, we can use bootstrapping to calculate J replications of φt × φq, and from these estimates construct an estimated distribution of γ, which allows us to determine our prior for the distribution of γ. From this empirical distribution, we observe the estimated mean and standard deviation, and finally test whether the distribution is normal using a Shapiro-Wilk test for normality. We also use Kolmogorov-Smirnov tests for equality of distributions to test whether the distribution is more likely to be log normal, uniform, or one of a number of other known analytical distributions. In order to do this, we first estimate the empirical distribution as described previously. We then observe the mean µ and the standard deviation σ, and run a one-sample test to determine whether the observed empirical distribution is significantly different from each analytical distribution N(µ, σ²), U(µ, σ²) or lnN(µ, σ²).
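The UCI bounds mentioned at the start of this sub-section can be illustrated with a deliberately simplified sketch. It assumes a single endogenous regressor and a single instrument, so that a direct effect γ0 shifts the IV estimate by A × γ0, and it holds the standard error fixed across values of γ0 (a simplification; re-estimating the model for each γ0 would let it vary). The function name and all numerical inputs are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist


def uci_bounds(beta_hat, se_beta, a_scale, gamma_hat, grid_points=21, level=0.95):
    """Union of confidence intervals over a grid of gamma0 in [0, 2 * gamma_hat].

    Simplifying assumptions (ours): scalar beta, the IV estimate shifts by
    a_scale * gamma0, and the standard error does not depend on gamma0.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal critical value
    lowers, uppers = [], []
    for k in range(grid_points):
        gamma0 = 2 * gamma_hat * k / (grid_points - 1)
        beta0 = beta_hat - a_scale * gamma0  # estimate after netting out gamma0
        lowers.append(beta0 - z * se_beta)
        uppers.append(beta0 + z * se_beta)
    return min(lowers), max(uppers)


# Hypothetical inputs, for illustration only.
lower, upper = uci_bounds(beta_hat=-0.10, se_beta=0.05, a_scale=2.0, gamma_hat=0.006)
print(lower, upper)
```

Because the union is taken over the whole support, the resulting interval is at least as wide as the conventional interval obtained by imposing γ = 0.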

Estimates of the full distribution of γ are presented in Figures A13a and A13b. These are the estimated γj from j ∈ {1, . . . , 100} bootstrap replications for γ in Nigeria and the United States. In all cases, when the underlying empirical distribution is tested for equality against the overlaid analytical distribution (uniform, normal, log normal, χ2), the normal distribution provides the best fit of the analytical with the empirical distribution.8
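The bootstrap and the one-sample comparison with a normal distribution can be sketched as follows. This is a minimal illustration on synthetic data, not the specification used in the paper: the regressions here are bivariate with no controls, both coefficients are estimated on the same resampled observations, and the data-generating values are invented. The helper names (`ols_slope`, `bootstrap_gamma`, `ks_statistic_normal`) are hypothetical, and only the Python standard library is used.

```python
import random
from statistics import NormalDist, mean, stdev


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_gamma(shock, twin, educ, reps=100, seed=0):
    """Resample observations with replacement; return reps draws of phi_t * phi_q."""
    rng = random.Random(seed)
    n = len(shock)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        s = [shock[i] for i in idx]
        t = [twin[i] for i in idx]
        e = [educ[i] for i in idx]
        phi_q = ols_slope(s, e)  # child education on maternal health shock
        phi_t = ols_slope(t, s)  # maternal health shock on twin indicator
        draws.append(phi_t * phi_q)
    return draws


def ks_statistic_normal(sample):
    """One-sample Kolmogorov-Smirnov distance to N(sample mean, sample sd)."""
    xs = sorted(sample)
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = dist.cdf(x)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Synthetic data (hypothetical values, for illustration only).
rng = random.Random(1)
n = 500
shock = [rng.gauss(0, 1) for _ in range(n)]
twin = [1.0 if 0.3 * s + rng.gauss(0, 1) > 0 else 0.0 for s in shock]
educ = [0.05 * s + rng.gauss(0, 1) for s in shock]

draws = bootstrap_gamma(shock, twin, educ)
print(mean(draws), stdev(draws), ks_statistic_normal(draws))
```

Since the mean and standard deviation of the reference normal are estimated from the same draws, standard Kolmogorov-Smirnov critical values would be conservative here; the statistic is best read as a descriptive measure of fit.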

However, the underlying distribution appears not to be perfectly normal, and it appears doubtful that this would be the case asymptotically. Fortunately, Conley et al. (2012) describe a simulation-based estimation method to calculate γ in the case of a non-normal distribution for γ. We have followed this methodology using the empirical distribution calculated by bootstrapping for γ. This code has been publicly released as plausexog for Stata (Clarke, 2014). The simulation-based estimation procedure is described fully in Conley et al. (2012), p. 265, as a five-step algorithm. The procedure consists of taking repeated draws from the variance-covariance matrix estimated using IV with the plausibly exogenous instrument, and in each case adding to it a draw from the distribution of γ, scaled by a quantity which depends on the strength of the instrument. Conley et al. refer to the underlying distribution of γ as F, and the scale parameter as A, where A = (X′Z(Z′Z)⁻¹Z′X)⁻¹(X′Z). These repeated draws then lead to a large number of estimates for β, the parameter of interest, and a 95% confidence interval is taken by forming [β − c_{1−α/2}, β + c_{α/2}], where c are percentiles of the distribution of simulated estimates.

Thus, as well as estimating the LTZ case where we assume that γ is distributed ∼ N(µγ, σ²γ), we can estimate a version fully utilizing the bootstrapped distribution of γ described in the previous sub-section. In this case, we use as F, the distribution of γ, the empirically estimated distribution of γ. The simulation-based algorithm then consists of taking B draws (indexed b ∈ {1, . . . , B}) from the empirically estimated F, as well as B draws from the variance-covariance matrix, and defining the 95% confidence interval based on the 2.5% and 97.5% quantiles of the resulting simulated values for β.
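One way to read the simulation-based procedure is sketched below for the scalar case. With a single endogenous regressor, a single instrument and no further controls, the scale parameter A = (X′Z(Z′Z)⁻¹Z′X)⁻¹(X′Z) reduces to Z′Z / X′Z. The sketch assumes that the IV estimate equals the true β plus A × γ plus normally distributed sampling noise, draws γ from an empirical set of bootstrap values, and reads off the 2.5% and 97.5% quantiles of the simulated β. It is not the plausexog implementation; the function name and all inputs are hypothetical.

```python
import random
from statistics import quantiles


def simulated_ci(beta_hat, se_beta, a_scale, gamma_draws, n_sims=5000, seed=0):
    """95% interval for beta from simulated draws (scalar-case sketch).

    Assumption (ours): beta_hat = beta + a_scale * gamma + eta, with
    eta ~ N(0, se_beta^2) and gamma drawn from the empirical distribution F.
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        eta = rng.gauss(0.0, se_beta)    # sampling noise of the IV estimate
        gamma = rng.choice(gamma_draws)  # draw from the bootstrapped F
        sims.append(beta_hat - a_scale * gamma - eta)
    cuts = quantiles(sims, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


# Hypothetical inputs, for illustration only.
gamma_draws = [0.004, 0.005, 0.006, 0.007, 0.008]
ci_low, ci_high = simulated_ci(-0.10, 0.05, 2.0, gamma_draws)
print(ci_low, ci_high)
```

Passing a list containing only zero for `gamma_draws` recovers, up to simulation error, the conventional interval that assumes the exclusion restriction holds exactly.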

8 In the US, we cannot reject that γ is normal, with a p-value of 0.782. In this case, although we cannot reject that γ is log normal, the p-value is much lower, at 0.203. Values for Nigeria suggest a quantitatively similar result.


References

D. Aaronson, R. Dehejia, A. Jordan, C. Pop-Eleches, C. Samii, and K. Schulze. The Effect of Fertility on Mothers’ Labor Supply over the Last Two Centuries. IZA Discussion Papers 10559, Institute for the Study of Labor (IZA), Feb. 2017.

A. Aizer and F. Cunha. The Production of Human Capital: Endowments, Investments and Fertility. NBER Working Papers 18429, National Bureau of Economic Research, Inc, Sept. 2012.

R. Akresh, S. Bhalotra, M. Leone, and U. Osili. War and Stature: Growing Up During the Nigerian Civil War. American Economic Review (Papers & Proceedings), 102(3):273–77, 2012.

R. Akresh, S. Bhalotra, M. Leone, and U. Osili. First and Second Generation Impacts of the Nigeria-Biafra War. Mimeo, 2016.

D. Almond and B. Mazumder. Fetal Origins and Parental Responses. Annual Review of Economics, 5(1):37–56, May 2013.

D. Almond, H. W. Hoynes, and D. W. Schanzenbach. Inside the War on Poverty: The Impact of Food Stamps on Birth Outcomes. The Review of Economics and Statistics, 93(2):387–403, May 2011.

J. Angrist, V. Lavy, and A. Schlosser. Multiple experiments for the causal link between the quantity and quality of children. Journal of Labor Economics, 28(4):773–824, 2010.

J. Bagger, J. A. Birchenall, H. Mansour, and S. Urzúa. Education, Birth Order, and Family Size. NBER Working Papers 19111, National Bureau of Economic Research, Inc, June 2013.

S. Bhalotra and A. Venkataramani. Shadows of the Captain of the Men of Death: Early Life Health Interventions, Human Capital Investments, and Institutions. Mimeo, University of Essex, 2014.

S. Bhalotra, M. Karlsson, and T. Nilsson. Infant Health and Longevity: Evidence from a Historical Trial in Sweden. Discussion Paper 8969, IZA, April 2015.

S. E. Black, P. J. Devereux, and K. G. Salvanes. The more the merrier? The effect of family size and birth order on children’s education. The Quarterly Journal of Economics, 120(2):669–700, 2005.

C. Brinch, M. Mogstad, and M. Wiswall. Beyond LATE with a Discrete Instrument. Journal of Political Economy, 125(4):985–1039, 2017.

A. Bütikofer and K. G. Salvanes. Disease Control and Inequality Reduction: Evidence from a Tuberculosis Testing and Vaccination Campaign. Discussion Paper 28/2015, NHH Dept. of Economics, November 2015.

J. Caceres-Delpiano. The impacts of family size on investment in child quality. Journal of Human Resources, 41(4):738–754, 2006.

D. Clarke. PLAUSEXOG: Stata module to implement Conley et al.’s plausibly exogenous bounds. Statistical Software Components, Boston College Department of Economics, May 2014.

T. G. Conley, C. B. Hansen, and P. E. Rossi. Plausibly Exogenous. The Review of Economics and Statistics, 94(1):260–272, February 2012.

E. Fitzsimons and B. Malde. Empirically probing the quantity-quality model. Journal of Population Economics, 27(1):33–68, Jan 2014.

J. Hjort, M. Sølvsten, and M. Wüst. Universal Investment in Infants and Long-run Health. Mimeo, Technical Report, 2016.


S. Jayachandran, A. Lleras-Muney, and K. V. Smith. Modern Medicine and the 20th-Century Decline in Mortality: Evidence on the Impact of Sulfa Drugs. American Economic Journal: Applied Economics, 2(2):118–46, 2010.

T. Kitagawa. A test for instrument validity. Econometrica, 83(5):2043–2063, 2015.

J. E. Lesch. The First Miracle Drugs: How the Sulfa Drugs Transformed Medicine. Oxford University Press, Oxford, 2006.

H. Li, J. Zhang, and Y. Zhu. The quantity-quality trade-off of children in a developing country: Identification using Chinese twins. Demography, 45:223–243, 2008.

M. Mogstad and M. Wiswall. Testing the Quantity-Quality Model of Fertility: Linearity, Marginal Effects, and Total Effects. Quantitative Economics, 7(1):157–192, 2016.

V. Ponczek and A. P. Souza. New Evidence of the Causal Effect of Family Size on Child Quality in a Developing Country. Journal of Human Resources, 47(1):64–106, 2012.

M. R. Rosenzweig and J. Zhang. Do population control policies induce more human capital investment? Twins, birth weight and China’s one-child policy. Review of Economic Studies, 76(3):1149–1174, 2009.
