
Risky Business: Correlation and Causation in Longitudinal Studies of Skill Development

Drew H. Bailey, Greg J. Duncan, and Tyler Watts
University of California, Irvine

Doug H. Clements and Julie Sarama
University of Denver

Developmental theories often posit that changes in children’s early psychological characteristics will affect much later psychological, social, and economic outcomes. However, tests of these theories frequently yield results that are consistent with plausible alternative theories that posit a much smaller causal role for earlier levels of these psychological characteristics. Our article explores this issue with empirical tests of skill-building theories, which predict that early boosts to simpler skills (e.g., numeracy or literacy) or behaviors (e.g., antisocial behavior or executive functions) support the long-term development of more sophisticated skills or behaviors. Substantial longitudinal associations between academic or socioemotional skills measured early and then later in childhood or adolescence are often taken as support of these skill-building processes. Using the example of skill-building in mathematics, we argue that longitudinal correlations, even if adjusted for an extensive set of baseline covariates, constitute an insufficiently risky test of skill-building theories. We first show that experimental manipulation of early math skills generates much smaller effects on later math achievement than the nonexperimental literature has suggested. We then conduct falsification tests that show puzzlingly high cross-domain associations between early math and later literacy achievement. Finally, we show that a skill-building model positing a combination of unmeasured stable factors and skill-building processes can reproduce the pattern of experimental impacts on children’s mathematics achievement. Implications for developmental theories, methods, and practice are discussed.

Keywords: early childhood, interventions, skill-building, cognitive development, education

Supplemental materials: http://dx.doi.org/10.1037/amp0000146.supp

Developmental theories often posit that changes in children’s early psychological characteristics will affect their much later psychological, social, and economic outcomes.

Such theories include skill-building theories (e.g., Baroody, 1987; Cunha & Heckman, 2007; Stanovich, 1986), theories of the life-course development of psychopathology (e.g., Moffitt, 1993), theories that posit reciprocal effects between children and their environments (e.g., Scarr & McCartney, 1983), and theories of early critical periods in children’s social and cognitive development (Fraley & Roisman, 2015).

Tests of these theories are often conducted by estimating correlations between important outcomes and children’s early psychological characteristics, adjusted by statistical controls for variables that might affect both early and later child characteristics. These findings are given varying degrees of causal interpretation. A cautious, yet superficial, alternative approach adopted by many authors writing about these kinds of correlations is to assert that because only random-assignment designs can prove causation, correlational evidence should not be interpreted as evidence of causality. But many of these same authors then go on to discuss the policy implications of their evidence (for review, see Reinhart, Haring, Levin, Patall, &

Drew H. Bailey, Greg J. Duncan, and Tyler Watts, School of Education, University of California, Irvine; Doug H. Clements and Julie Sarama, Morgridge College of Education, University of Denver.

We are grateful to the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health under award number P01-HD065704. This research was also supported by the Institute of Education Sciences, U.S. Department of Education through Grants R305K05157 and R305A120813. The opinions expressed are those of the authors and do not represent views of the U.S. Department of Education nor the views of the NIH. We would also like to thank Peg Burchinal, Paul Hanselman, Fred Oswald, David Purpura, Deborah Stipek and seminar participants at Teachers College, Columbia University, for helpful comments on a prior draft and Ken T.H. Lee for his help with analyses. Finally, we would like to express appreciation to the school districts, teachers, and students who participated in the TRIAD research.

Correspondence concerning this article should be addressed to Drew H. Bailey, School of Education, University of California, 3200 Education, Irvine, CA 92697-5500. E-mail: [email protected]

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

American Psychologist © 2018 American Psychological Association
2018, Vol. 73, No. 1, 81–94 0003-066X/18/$12.00 http://dx.doi.org/10.1037/amp0000146


Robinson, 2013), a linkage that requires causal evidence. One cannot have it both ways.

We argue that regression-adjusted correlations often provide insufficiently “risky” tests of developmental theories. We borrow from Meehl’s (1978, 1990) insight that when diverse theories make the same predictions, it is important to conduct “risky” tests that can distinguish among them. A prediction is considered risky if the probability of such a prediction being true, assuming that the theory is false, is low. Nonzero regression-adjusted correlations between children’s early psychological characteristics and their much later psychological, social, and economic outcomes do not constitute a risky test of a developmental theory because such correlations are consistent with a number of plausible competing theories, including those positing that a combination of differentially stable general cognitive abilities, personality, and environmental affordances is responsible for generating the correlational patterns. We explore these issues in the context of a well-trodden area in child development and education: the substantial correlations between children’s early and much later academic achievement.
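One way to make the notion of riskiness concrete is a Bayesian reading. This is our illustration, not Meehl’s own formalism. Under that reading, a confirmed prediction shifts belief in a theory according to the ratio of P(prediction holds | theory true) to P(prediction holds | theory false).

The minimal Python sketch below uses hypothetical probabilities throughout. When rival theories predict the same nonzero adjusted correlation, the two probabilities are similar, so a confirmed prediction barely moves the posterior.

```python
def posterior_after_confirmation(prior, p_if_true, p_if_false):
    """Posterior probability of a theory after one of its predictions is confirmed.

    Bayesian illustration only (not Meehl's formalism); all inputs are hypothetical:
      prior       -- belief in the theory before the test, in (0, 1)
      p_if_true   -- P(prediction holds | theory true)
      p_if_false  -- P(prediction holds | theory false); a low value = a risky test
    """
    joint_true = prior * p_if_true
    joint_false = (1 - prior) * p_if_false
    return joint_true / (joint_true + joint_false)


# Non-risky test: competing theories predict the same nonzero correlation.
print(posterior_after_confirmation(0.5, 0.9, 0.9))
# Risky test: the prediction is unlikely to hold unless the theory is true.
print(posterior_after_confirmation(0.5, 0.9, 0.1))
```

With the non-risky inputs the posterior equals the prior of 0.5. With the risky inputs it rises to 0.9.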

Correlational Tests of Skill-Building Theories

Even after adjusting for a large set of controls, including baseline measures of other academic and socioemotional skills and capacities, domain-general cognitive abilities, and socioeconomic status, strong longitudinal correlations are often observed in studies of academic domains of school readiness, across many years (Aunola, Leskinen, Lerkkanen, & Nurmi, 2004; Bailey, Siegler, & Geary, 2014; Duncan et al., 2007; Geary, Hoard, Nugent, & Bailey, 2013;

Jordan, Kaplan, Ramineni, & Locuniak, 2009; Siegler et al., 2012; Watts, Duncan, Siegler, & Davis-Kean, 2014). These longitudinal correlations constitute an important part of the empirical basis for skill-building theories.

Researchers, including authors of this article (Bailey, Siegler, et al., 2014; Duncan et al., 2007; Watts et al., 2014), attribute varying degrees of causality to these robust correlations. For example, the Duncan et al. (2007, p. 1430) study of school readiness states: “. . . we implement rigorous analytic methods that attempt to isolate the effects of school-entry academic, attention, and socioemotional skills by controlling for an extensive set of prior child, family, and contextual influences that may be related to children’s achievement.” Focusing on the development of children’s mathematics achievement, this article uses experimental evidence, falsification tests, and alternative model structures to show that riskier tests suggest a much smaller causal role for skill-building processes than is commonly believed.

Causal Mechanisms and Correlational Patterns

What are the causal mechanisms through which boosts in early school readiness skills and behaviors promote the development of much later academic and socioemotional skills? Skill-building models provide one clear answer: For math and literacy, early academic skills are the foundations upon which later skills are built. In the case of math, counting serves as a basis for children’s early addition problem solving (Baroody, 1987), and addition is often employed as a subroutine of children’s multiplication problem solving (Lemaire & Siegler, 1995). Such findings might reasonably lead one to predict that the children with the most solid early foundations of math skills will, in the context of K–12 instruction, tend to maintain higher levels of math skills throughout childhood and adolescence.

In the development of reading skills, children’s ability to match letters to sounds supports their learning to recognize written words, which in turn supports their vocabulary learning, which then supports their reading comprehension. Causal relations among these literacy skills are likely bidirectional; for example, increases in reading comprehension facilitate more reading, which increases vocabulary (Stanovich, 1986).

The skill-building model of Cunha and Heckman (2007) is more comprehensive in that it allows for simpler skills to support more sophisticated skills, but also posits a kind of multiplier effect in which early skills and capacities can increase the productivity of subsequent schooling and other investments. Moreover, it assumes that the list of “inputs” to produce any particular skill or behavior may include a wide array of past skills and behaviors.

Substantial longitudinal correlations within domains of academic achievement and socioemotional behaviors are predicted by these skill-building causal models and are



generally found in studies that estimate zero-order and regression-adjusted correlations within many domains of achievement and socioemotional skills across time (e.g., Duncan et al., 2007). For example, the “math to math,” “reading to reading,” and “anti-social to anti-social” interwave correlations with fall of kindergarten values, calculated from national data from the 1998–99 Early Childhood Longitudinal Study (ECLS-K), appear in Figure 1 (measure and sample details in the online supplemental material Appendix). These correlations decay the most across the kindergarten year, but then flatten out to a moderate (about .45 to .65) magnitude by fifth grade. These kinds of patterns have been well documented in longitudinal correlational studies of children’s cognitive abilities (Bayley, 1949; Tucker-Drob & Briley, 2014) and in the development of personality (Anusic & Schimmack, 2015).

Although clearly consistent with skill-building developmental theories, these correlations do not constitute a risky test of those theories because they are also consistent with alternative theories in which the causal effects of early skills on much later skills play a minor role in contributing to the stability of psychological characteristics during cognitive development. We focus on one set of competing theories: those that posit an important role for some combination of foundational and relatively stable psychological characteristics and persistent environmental characteristics, such as family functioning or neighborhood poverty. If these influences are not captured sufficiently with regression controls or other techniques for reducing omitted-variables bias (OVB), then the apparent role of skill-building processes in generating cross-time correlations could be seriously overstated.
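The omitted-variables concern can be stated with the textbook omitted-variable-bias formula. The setup is a sketch under simplifying assumptions of our own:

- a linear model;
- a single stable factor S that the regression controls fail to capture;
- an error term uncorrelated with early math X and with S.

The symbols β, γ, and S are ours, not the article’s. Suppose later math Y is generated as below. Regressing Y on X alone then recovers the causal coefficient β plus a bias term. That term is large whenever S persists into later achievement (γ large) and is correlated with early skill.

```latex
Y = \alpha + \beta X + \gamma S + \varepsilon,
\qquad
\operatorname{plim} \hat{\beta}_{\mathrm{OLS}}
  = \beta + \gamma \, \frac{\operatorname{Cov}(X, S)}{\operatorname{Var}(X)}
```

Adding baseline covariates shrinks this bias only to the extent that they absorb S. Whatever part of S they leave unexplained still enters through the same term.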

Riskier Tests

We discuss three promising approaches to understanding the importance of skill-building processes. First, and most important, is an experimental manipulation of children’s early skills or behaviors, which provides the riskiest (i.e., most at risk of being refuted) test of theories of children’s skill development. Second, we detail how falsification tests can be applied more widely to correlational data. And third, we show that longitudinal correlations and experimental impact patterns can be modeled in ways that make more precise (and thus riskier) predictions about the effects of prior skills or behaviors on later skills or behavior. Our empirical evidence on these approaches is taken exclusively from the domain of math achievement. As we argue below, our analysis has implications for a much broader set of developmentally important skills and behaviors.

Experimental Evidence

Random assignment to programs that boost school readiness skills and behaviors provides a very risky test of the theory that early skills are a powerful cause of the learning of new content in a manner that allows students with early skill advantages to maintain this advantage throughout school. If, as Duncan et al. (2007) imply, controls for an extensive set of prior child, family, and contextual influences enable an analyst to use nonexperimental data to compare otherwise similar groups of children who differ only in one particular school-entry skill or behavior, then the multivariate regression approach of Duncan et al. (2007) and others could identify the causal impact of a particular skill or behavior on later school success. We would then expect the patterns of predicted impacts from these regression models to match those generated by a genuine random-assignment experiment.
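The logic of this comparison can be illustrated with a small simulation. The sketch is hypothetical throughout and assumes a linear data-generating process with two features:

- an unmeasured stable factor S drives both early and later math;
- early math has only a small true causal effect (0.1) on later math.

Under these assumptions, a regression of later on early math in the control group overstates the causal effect. Randomly assigning a boost to early skill recovers it. Parameter values are illustrative and are not estimates from TRIAD or ECLS-K.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Slope from a least-squares regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=40_000, causal=0.1, stable=1.0, boost=1.0, seed=1):
    """Return (regression estimate, experimental estimate) of the early -> later effect.

    Hypothetical model: early = S + noise (+ boost if treated);
    later = causal * early + stable * S + noise, with S never observed.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s = rng.gauss(0, 1)                  # unmeasured stable factor
        treated = rng.random() < 0.5         # random assignment
        early = s + rng.gauss(0, 1) + (boost if treated else 0.0)
        later = causal * early + stable * s + rng.gauss(0, 1)
        rows.append((treated, early, later))

    control = [r for r in rows if not r[0]]
    treat = [r for r in rows if r[0]]
    # Nonexperimental estimate: regress later on early skill in the control group.
    regression = ols_slope([r[1] for r in control], [r[2] for r in control])
    # Experimental estimate: impact on later skill per unit of impact on early skill.
    early_impact = fmean(r[1] for r in treat) - fmean(r[1] for r in control)
    later_impact = fmean(r[2] for r in treat) - fmean(r[2] for r in control)
    return regression, later_impact / early_impact


reg, experimental = simulate()
print(f"regression: {reg:.2f}, experiment: {experimental:.2f}")
```

Under these assumptions the regression slope should land near 0.6. That value is the true 0.1 plus the omitted-variable term 1 × Cov(S, early)/Var(early) = 1/2. The experimental ratio should land near 0.1.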

To investigate whether well-controlled correlational models of long-run achievement patterns reliably generate causal estimates, we draw data from a test of the

Figure 1. Bivariate correlations with fall of kindergarten measures.



Technology-Enhanced, Research-Based, Instruction, Assessment, and Professional Development learning intervention model (TRIAD). TRIAD features a preschool mathematics curriculum, called Building Blocks, as its key component (see Clements & Sarama, 2008). As explained in the Appendix, the TRIAD study randomly assigned 42 schools with state-funded preschool programs in Massachusetts and New York either to a treatment condition in which the Building Blocks curriculum was implemented in preschool classes, or to a control condition in which preschool math was taught as usual. In treatment schools, the curriculum was administered over the course of the preschool year. Math achievement was measured in the fall and spring of the prekindergarten year; in the spring of the kindergarten, first-, fourth-, and fifth-grade years; and in the fall of the fourth-grade year. Random assignment checks showed that treatment and control groups were balanced (Clements, Sarama, Spitler, Lange, & Wolfe, 2011).

We first ignored experimental variation in the TRIAD data and used the study’s control group to generate cross-time correlations between math achievement in the spring of the prekindergarten year and math achievement measured in all of the study’s follow-ups. In contrast to Figure 1, we adjusted these correlations for the baseline achievement and demographic measures described in the Appendix and also show 95% confidence intervals associated with each of the estimates. These confidence intervals were derived from SEs that were adjusted for school-level clustering. The correlations shown in the “TRIAD regression-adjusted correlations” line of Figure 2 display the same kind of asymptotic pattern found with the unadjusted ECLS-K-based “math-to-math” correlations shown in Figure 1.¹
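Children in the same school share influences, so their residuals tend to be correlated. Cluster-robust standard errors account for this by summing each observation’s contribution to the estimating equation within a cluster before squaring.

The sketch below is hypothetical and is not the authors’ analysis code. It computes the basic CR0 sandwich estimator for a single-regressor slope. It omits the baseline covariates and the finite-sample corrections a real analysis would use.

```python
from collections import defaultdict


def slope_with_cluster_se(x, y, cluster):
    """OLS slope of y on x (with intercept) and its CR0 cluster-robust SE.

    Hypothetical sketch: single regressor, no covariates, and no
    finite-sample correction such as G/(G-1).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [xi - mx for xi in x]                      # centred regressor
    sxx = sum(v * v for v in xc)
    beta = sum(v * (yi - my) for v, yi in zip(xc, y)) / sxx
    alpha = my - beta * mx

    # Sum the score (centred x times residual) within each cluster, e.g. school.
    scores = defaultdict(float)
    for v, xi, yi, g in zip(xc, x, y, cluster):
        scores[g] += v * (yi - alpha - beta * xi)

    variance = sum(s * s for s in scores.values()) / sxx ** 2
    return beta, variance ** 0.5


beta, se = slope_with_cluster_se([0, 1, 2, 3], [1, 0, 3, 2], ["a", "b", "b", "a"])
print(beta, se)
```

When every observation is its own cluster, the estimator reduces to the heteroskedasticity-robust (HC0) standard error.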

Regression adjustments drop the estimated effect by around .20 SD units, although these estimates still exceed .40 SD in fifth grade. Duncan and colleagues (2007) reported an average regression-adjusted math-to-math estimated predicted impact of a remarkably similar .42 SD when comparing an early measure of children’s mathematics achievement with later measures across six data sets. If the baseline covariates included in these regression-adjusted TRIAD estimates eliminate OVB, then we would expect to see a similar pattern in the experimental data.

The “TRIAD Treatment Impacts” line in Figure 2 shows that this is not at all the case. Treatment and control differences at the end of the pre-K year amounted to .63 SD, a large impact. To establish comparability between this .63 SD impact and the 1.0 SD predicted impact implicit in the regression-adjusted estimate shown in Figure 2’s top line, we rescale this and all other experimental impact estimates by multiplying by 1/.63.² Rescaled impact estimates fall to about .46 SD within a year and drop to statistically nonsignificant .08 SD and −.02 SD values for the two fourth-grade tests. The partial recovery of impacts in fifth grade (.14 SD) is intriguing, but statistically indistinguishable from zero in this analysis.³ Overall, the correlation-based estimate of the treatment effect is very close to the observed treatment effect 1 year after the end of treatment, but much higher than the observed treatment effects at all subsequent waves. In highlighting the discrepancy between correlational estimates and experimental impacts, we do not intend to discourage the use of early intervention (we discuss possible implications for research and practice below). Rather, our goal is to highlight the inaccuracy of estimates of theoretically important causal effects when confounds are assumed to be largely or fully controlled in a regression model.
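The rescaling is a single division by the end-of-pre-K impact. After rescaling, the initial experimental impact equals 1.0 and sits on the same scale as a standardized regression-adjusted correlation.

In this minimal sketch, only the .63 SD divisor comes from the text. The raw impacts in the usage line are hypothetical.

```python
def rescale_impacts(impacts, initial_impact=0.63):
    """Divide each experimental impact (in SD units) by the end-of-pre-K impact.

    After rescaling, the initial impact equals 1.0, matching the 1 SD boost
    implicit in a standardized regression-adjusted correlation.
    """
    if initial_impact <= 0:
        raise ValueError("initial_impact must be positive")
    return [d / initial_impact for d in impacts]


# Hypothetical raw impacts, for illustration only.
print([round(v, 2) for v in rescale_impacts([0.63, 0.315, 0.0])])
```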

TRIAD’s ability to shed light on math skill-building processes is a function of the comprehensiveness of its initial impacts. End-of-preschool mathematics knowledge was assessed using the Research-Based Early Maths Assessment (REMA; Clements, Sarama, & Liu, 2008; described more completely in the supplemental materials, which are available online). The REMA assessed children’s conceptual and procedural knowledge, as well as problem-solving and strategic competencies in the domain of early mathematics, and has been shown to strongly correlate with

¹ Full correlation matrices for measures of children’s mathematics achievement administered at several waves across development are available in Bailey, Watts, Littlefield, and Geary (2014).

² This means, for example, that TRIAD’s .63 SD impact at the end of prekindergarten is shown as 1.0 and its .29 SD experimental impact at the end of kindergarten is shown as .46 SD (= .29/.63). The scale for the correlations and rescaled impact estimates is shown on the left y-axis, while the non-rescaled estimates appear on the right y-axis.

³ Using a 2-level random intercept logistic regression model, Clements and colleagues (2017) found a statistically significant treatment impact of the TRIAD intervention at the spring of fifth grade.



other measures of early math learning (e.g., Applied Problems, Child Math Assessment). Furthermore, it has been shown to strongly predict later mathematics achievement measured through Grade 5 (Watts, Duncan, Clements, & Sarama, 2017). Thus, the REMA should provide a strong measure of the early mathematical competencies needed to build later skills in mathematics.

As explained in the Appendix, the end-of-pre-K REMA test showed considerable variation in four subdomains of preschool mathematics knowledge: counting, patterning, measurement, and geometry. Bearing in mind that the psychometric properties of the overall REMA test, but not its subscales, have been established (Clements, Sarama, & Liu, 2008; Weiland et al., 2012), we grouped REMA items into each of these subdomains and created four measures defined as the proportion of correct responses on the items included in each category. We then tested the impact of the treatment on each standardized subdomain score. The intervention generated statistically significant impacts on all four subdomains: counting (β = 0.45, SE = 0.06), patterning (β = 0.36, SE = 0.06), geometry (β = 0.67, SE = 0.06), and measurement (β = 0.20, SE = 0.06), all of which have been shown to predict later mathematics knowledge (Nguyen et al., 2016). Because the intervention boosted a wide variety of preschool math skills, treated children should have had a much stronger base of math competencies from which to build further math skills when compared with children in the control group. In other words, these robust causal impacts at the end of the pre-K year suggest that the TRIAD intervention provides an excellent foundation for tests of subsequent skill-building processes.
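The two scoring steps described above can be sketched as follows:

1. Compute each child’s proportion correct within a subdomain.
2. Compare standardized scores across arms.

This is a simplification, not the authors’ procedure. The reported estimates come from models that this sketch does not reproduce. The item-to-subdomain mapping is hypothetical. Standardizing on the pooled sample is our assumption. The impact is a raw difference in mean z-scores.

```python
from statistics import fmean, pstdev


def subdomain_scores(responses, item_domain):
    """Proportion of items answered correctly in each subdomain for one child.

    responses   -- dict mapping item id -> 1 (correct) or 0 (incorrect)
    item_domain -- dict mapping item id -> subdomain name (hypothetical mapping)
    """
    correct, total = {}, {}
    for item, score in responses.items():
        domain = item_domain[item]
        correct[domain] = correct.get(domain, 0) + score
        total[domain] = total.get(domain, 0) + 1
    return {d: correct[d] / total[d] for d in total}


def standardized_difference(scores, treated):
    """Treatment-minus-control difference in pooled-sample z-scores (an assumption)."""
    mean, sd = fmean(scores), pstdev(scores)
    z = [(s - mean) / sd for s in scores]
    z_t = [v for v, t in zip(z, treated) if t]
    z_c = [v for v, t in zip(z, treated) if not t]
    return fmean(z_t) - fmean(z_c)


items = {"c1": "counting", "c2": "counting", "p1": "patterning"}  # hypothetical items
print(subdomain_scores({"c1": 1, "c2": 0, "p1": 1}, items))
```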

Returning to the patterns of experimental impacts in Figure 2, it is noteworthy that TRIAD treatment effects do not disappear immediately after treatment ends, which is indeed consistent with skill-building processes at work in children’s academic development. However, skill-building processes following the conclusion of the intervention do not appear to sustain a substantial treatment effect much beyond first grade. This pattern of declining treatment effects is consistent with the patterns observed in many randomized, controlled trials testing the effects of interventions designed to boost children’s early academic skills (Bus & van IJzendoorn, 1999; Puma et al., 2012; Smith, Cobb, Farran, Cordray, & Munter, 2013; for

Figure 2. Regression-adjusted correlations and experimental impacts in Technology-Enhanced, Research-Based, Instruction, Assessment, and Professional Development learning intervention model (TRIAD).



review, see Bailey, Duncan, Odgers, & Yu, 2017, and for a review of treatment impacts on children’s intelligence scores, see Protzko, 2015).

The divergent lines in Figure 2 pose a profound challenge to the large correlational literature (including our own work) that has relied on longitudinal trajectories based on nonexperimental data to infer developmental processes. It appears that experimentally induced changes in early skills may have temporary effects on children’s subsequent learning. Yet, in the longer run, and in the context of the elementary schools most of these children attended, children’s skills converge to trajectories governed by other processes.

A Falsification Test Based on Cross-Domain Correlations

Even in the absence of experimental data, it is possible to subject skill-building models to riskier tests using correlational data. Falsification tests provide one example. One set is based on the argument that if within-domain skill-building processes were generating the strong pattern of longitudinal correlations shown in Figure 1, then cross-domain correlational patterns should be much weaker. Specifically, in the case of mathematics and reading, correlations between early and later math achievement scores should be persistently higher than correlations between early math and later reading. Figure 1 shows this is the case for math and antisocial behavior but decidedly not for math-to-reading correlations, which are virtually indistinguishable from math-to-math correlations beyond first grade. That school-entry mathematics achievement is a robust predictor of children’s long-term reading outcomes was also observed by Duncan and colleagues (2007) in their analysis of six longitudinal data sets (including the ECLS-K).
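A shared stable factor alone can produce the pattern this falsification test looks for. The hypothetical simulation below has no cross-domain skill building at all. Math and reading each follow their own weak within-domain chain, and both load on one stable factor G. Every parameter value is an illustrative assumption.

The test compares, wave by wave, the correlation of early math with later math against the correlation of early math with later reading.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_panel(n=20_000, waves=4, a=0.3, b=0.6, c=0.7, seed=2):
    """Hypothetical math and reading scores sharing one stable factor G.

    score_0 = c*G + noise;  score_t = a*score_{t-1} + b*G + noise.
    Math never feeds into reading (and vice versa).
    """
    rng = random.Random(seed)
    math_waves = [[] for _ in range(waves + 1)]
    read_waves = [[] for _ in range(waves + 1)]
    u_sd = (1 - c * c) ** 0.5               # keeps wave-0 variance at 1
    for _ in range(n):
        g = rng.gauss(0, 1)                 # stable factor shared across domains
        m = c * g + rng.gauss(0, u_sd)
        r = c * g + rng.gauss(0, u_sd)
        for t in range(waves + 1):
            if t > 0:
                m = a * m + b * g + rng.gauss(0, 0.5)
                r = a * r + b * g + rng.gauss(0, 0.5)
            math_waves[t].append(m)
            read_waves[t].append(r)
    return math_waves, read_waves


m, r = simulate_panel()
for t in range(1, len(m)):
    print(f"wave {t}: math->math {pearson(m[0], m[t]):.2f}, "
          f"math->reading {pearson(m[0], r[t]):.2f}")
```

In this toy model the math-to-math correlation exceeds the math-to-reading one only at the first follow-up. After that the two converge, even though math never causes reading.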

Skill-building models of children’s academic development have a difficult time explaining why early mathematics achievement would exert a strong causal impact on later reading achievement. To be sure, skills such as language comprehension are common to both mathematics and reading achievement. But other evidence shows that correlations between early math and later reading scores (.26 in meta-analytic estimates in Duncan et al., 2007) are much higher than correlations between early reading and later math (.10). Is it plausible that boosting children’s early mathematics skills would affect later reading skills 1 to 2 years later to the same extent that it affected children’s mathematics skills? On one hand, the TRIAD early mathematics intervention did show effects on some measures of early oral language skills (Sarama, Lange, Clements, & Wolfe, 2012). On the other hand, the effects did not generalize to other tests of early reading skills, and the statistically significant effects were much smaller than they were for children’s mathematics skills. Learning mathematics may have a nonzero effect on children’s early reading achievement, but we doubt that the effect would be almost identical to the effect on children’s achievement in the same domain. Still, we consider additional falsification tests below.

Models Consistent With Temporal Patterns of Within-Domain Correlations

Consider the asymptotic rather than complete decline in the within-domain correlations shown in Figure 1 and the regression-based correlations in Figure 2. What kind of skill-building processes would cause the later impacts of early achievement to become constant throughout development? A simple skill-building model could explain the shape of these lines if learning a basic skill earlier than one’s peers persistently enabled a child to learn more advanced skills before his or her peers. For example, if learning to count before one’s peers resulted in a high probability of learning to add before one’s peers, which in turn resulted in a high probability of learning to multiply before one’s peers, the correlation between counting skills and multiplication fluency could be high.

However, probabilities of learning later skills conditional on learning an early skill are the product of these interim probabilities. This model is shown with solid lines in Figure 3, where the paths MS1 and MS2 (i.e., “math skills”) represent the impacts of a previous math skill on the immediately following math skill. As long as these probabilities are less than 1, we should observe some kind of exponential decay in early-to-late correlations as skills become more advanced. Research on transfer of learning, wherein knowledge or skills learned in one domain or setting are applied to other situations, provides a strong theoretical basis for this

Julie Sarama

Thi

sdo

cum

ent

isco

pyri

ghte

dby

the

Am

eric

anPs

ycho

logi

cal

Ass

ocia

tion

oron

eof

itsal

lied

publ

ishe

rs.

Thi

sar

ticle

isin

tend

edso

lely

for

the

pers

onal

use

ofth

ein

divi

dual

user

and

isno

tto

bedi

ssem

inat

edbr

oadl

y.

86 BAILEY, DUNCAN, WATTS, CLEMENTS, AND SARAMA

decay. In particular, this work suggests that as the featuresof domains and settings diverge from those in which theinitial learning takes place, transfer becomes decreasinglylikely (Perkins & Salomon, 1988). As children progressthrough school, the settings and content to which they areexposed grow increasingly dissimilar, on average, to thesettings and content to which they are exposed at schoolentry.
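The product rule described above can be illustrated in a few lines of Python. This is a sketch under the simplified model in the text, in which a child can reach each later skill early only by way of the step immediately before it; the per-step value of 0.8 is hypothetical and is not an estimate from any study. With standardized variables and only the solid-line paths of Figure 3, the same arithmetic gives the correlation between the first and the k-th skill as the product of the intervening MS paths.

```python
from math import prod


def chained_value(step_values):
    """Multiply per-step values along a simple chain of skills.

    Read as conditional probabilities (count early -> add early -> multiply early ...),
    this is the probability of reaching the last skill early given early mastery of
    the first, assuming each step depends only on the step immediately before it.
    Read as standardized path coefficients (MS1, MS2, ...), it is the correlation
    between the first and last skill in a pure chain model with no other paths.
    """
    return prod(step_values)


# Hypothetical per-step value of 0.8 (illustrative only).
steps = [0.8] * 5
for k in range(1, len(steps) + 1):
    # Each additional step multiplies the early-to-late association by 0.8,
    # so the association decays exponentially (0.8 ** k).
    print(k, round(chained_value(steps[:k]), 4))
```

Because every factor is below 1, the product shrinks geometrically with each added step, which is the exponential decay the simple skill-building model predicts.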

The correlational estimates in Figure 1 and the top line of Figure 2 show a different pattern. They decay a bit over time, which is predicted by the hypothesis that skill-building plays a role in the stability of individual differences in children’s early academic achievement. However, they soon show a great deal of stability, especially after the first year of the study, which suggests that other factors or processes are at work.

Skill-building models may account for an asymptote in the effects of early math achievement on much later math achievement if individual differences in early achievement skills provide a basis for learning across development. This theory, illustrated by the dashed line in Figure 3, is appealing, given that skills acquired early in development clearly provide a basis for children’s subsequent academic development. However, given that the most basic skills are quickly mastered by the vast majority of children (Engel, Claessens, Watts, & Farkas, 2016; Paris, 2005), individual differences in such skills are unlikely to account for robust longitudinal associations between earlier and later academic skills. For example, if almost all fifth graders can count to 10, it is difficult to imagine how children’s ability to count to 10 would underlie individual differences in fifth graders’ learning, despite the obvious importance of being able to count to 10 for learning mathematics throughout development. It is an open question whether broader foundational proficiencies (as described by Kilpatrick, Swafford, & Findell, 2001), which are not quickly or easily mastered (e.g., OECD, 2016), could be developed early and result in more persistent effects on children’s mathematics achievement.

An Alternative Developmental Model

What might account for the lingering discrepancy between experimental and correlational estimates of the effects of changes in early academic skills on academic skills several years later? The discrepant correlational and experimental patterns shown in Figure 2 and the estimated patterns of correlations across time within and between academic domains all suggest that omitted variables influencing development throughout the observed period may be imparting a substantial upward bias to the correlational estimates. Directly and precisely measuring all of the important variables omitted from the regressions producing the top line of Figure 2, and then controlling for them in a regression, is a Herculean task.

An instructive alternative approach is to partition the causes of children’s academic development into two sets: factors that exert a stable influence on children’s academic skills throughout development and (to continue with the example of mathematics achievement) children’s mathematics knowledge assessed in the immediately preceding wave of data collection. This approach provides an additional risky test of the hypothesis that skill-building effects can be recovered in longitudinal data sets. If confounds are sufficiently controlled in standard regression models, then removing variance attributable to stable factors influencing children’s learning across development should not change estimates of the effects of early achievement on later achievement. If, however, persistent confounds are responsible for the discrepancy between experimental and regression-derived estimates, then controlling for these confounds across development should enable us to reproduce experimental estimates using correlational data.
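The logic of this test can be illustrated with a small simulation. The sketch below uses a hypothetical two-wave data-generating process, not the authors’ analysis: a stable factor (`trait`) raises scores at both waves, and the true 1-year skill-building effect is set to `ms = 0.35`. Regressing later on earlier scores while omitting the stable factor yields a much larger slope; removing the stable factor’s contribution (possible here only because the simulation knows it, which is why a state-trait model treats it as latent) recovers roughly the true value. All names and parameter values are illustrative assumptions.

```python
import random


def ols_slope(x, y):
    """Least-squares slope from regressing y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, ms=0.35, trait_weight=1.0, seed=1):
    """Hypothetical two-wave data with a stable factor.

    y1 = trait + noise1
    y2 = ms * y1 + trait_weight * trait + noise2
    The true skill-building effect of y1 on y2 is `ms`.
    """
    rng = random.Random(seed)
    trait = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y1 = [t + rng.gauss(0.0, 1.0) for t in trait]
    y2 = [ms * a + trait_weight * t + rng.gauss(0.0, 1.0) for a, t in zip(y1, trait)]

    # Naive regression: the stable factor is omitted, so its influence on both
    # waves is credited to y1 (with the defaults, the expected slope is
    # ms + 0.5 = 0.85).
    naive = ols_slope(y1, y2)

    # Remove the stable factor's contribution before regressing. Real data never
    # reveal `trait` directly; a state-trait model instead treats it as latent.
    y2_without_trait = [b - trait_weight * t for b, t in zip(y2, trait)]
    adjusted = ols_slope(y1, y2_without_trait)
    return naive, adjusted


naive, adjusted = simulate()
print(f"naive slope: {naive:.2f}, slope with stable factor removed: {adjusted:.2f}")
```

The gap between the two slopes is the upward bias that a persistent, unmeasured confound imparts to a standard lagged regression in this toy setup.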

Figure 3. Direct and indirect paths in a math skill-building model.

Figure 4. An alternative math skill-building model with unmeasured persistent influences.

A simple version of one such model is depicted in Figure 4. Developed by Steyer (1987) and implemented in structural equation modeling software (for discussion, see Cole, Martin, & Steiger, 2005; Steyer & Schmitt, 1994; for an accessible introduction and code, see Prenoveau, 2016), this so-called latent state-trait model considers children’s mathematics achievement at any time to be caused by two sets of factors:

(1) Children’s mathematics skill at the immediately preceding measurement occasion. The impacts of children’s immediately preceding mathematics achievement on their subsequent mathematics achievement, indicated in Figure 4 by MS1, MS2, . . ., MSk−1, could occur through content overlap and two aspects of skill-building: transfer of learning and indirect effects of mathematics achievement via increased motivation, teacher placement, or other mediators.

(2) Characteristics with a stable influence on children’s learning throughout development, represented by the loading of children’s mathematics achievement at each occasion on a latent variable, labeled in Figure 4 as “Unmeasured persistent factor.” This factor is likely composed of environmental and personal factors that differ between individuals in a similar manner across development and could include a much broader set of stable influences than is usually implied by the term “trait.”

The model depicted in Figure 4 requires at least three waves of data to estimate but has several advantages over traditional regression and some other SEM-based approaches for estimating the effects of children’s prior skills and behaviors on their subsequent skills and behaviors. First, in the case of math, it simultaneously estimates effects of children’s prior achievement on their later achievement during several interwave periods (the MS paths), thereby allowing for simultaneous tests of several theoretically important predictions (e.g., that a treatment effect on children’s Time 1 mathematics achievement will be reduced to the treatment effect times MS1 at the second wave, to the original treatment effect times MS1 times MS2 by the third wave, etc.).
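The chain of predictions in this first advantage can be written out directly. In the hypothetical sketch below, the function name and input values are illustrative assumptions; it also assumes that the treatment does not shift the persistent factor, so the effect at each wave is the previous wave’s effect times the corresponding MS path.

```python
import operator
from itertools import accumulate


def predicted_impacts(initial_effect, ms_paths):
    """Treatment effect at Time 1, 2, 3, ... implied by a chain of MS paths.

    Assumes the treatment changes only Time 1 achievement and not the
    unmeasured persistent factor, so effect_t = effect_1 * MS1 * ... * MS(t-1).
    """
    return list(accumulate([initial_effect, *ms_paths], operator.mul))


# Hypothetical example: a 0.50 SD effect at Time 1 and three interwave MS paths.
effects = predicted_impacts(0.50, [0.40, 0.30, 0.50])
print([round(e, 3) for e in effects])  # [0.5, 0.2, 0.06, 0.03]
```

Comparing a list like this against observed impacts at each wave is one way to check all of the model’s interwave predictions at once.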

Second, the model may plausibly account for the apparent discrepancy between correlational and experimental estimates of the effects of children’s prior mathematics achievement on their later mathematics achievement. If one assumes relatively large effects of stable environmental and personal factors on children’s mathematics learning, then the model predicts an approximately exponential decay of treatment effects as the time between measurement occasions increases (a pattern consistent with experimental estimates), accompanied by high correlational stability. In the presence of stable confounding, more common alternative approaches such as the cross-lagged panel model might yield upwardly biased predicted effects across development (Hamaker, Kuiper, & Grasman, 2015).

Third, because of the accumulating effects of stable factors on children’s achievement across development, the model generates a testable prediction of increasing interyear stability as children get older (Cole et al., 2005). This pattern is well established in the development of children’s general cognitive ability, at both the phenotypic and genetic levels (Bayley, 1949; Tucker-Drob & Briley, 2014), and in the development of personality (Anusic & Schimmack, 2015). It is also evident in the ECLS-K data set, where the correlations between mathematics achievement scores across waves increase despite growing interwave intervals. Children’s mathematics achievement in the spring of kindergarten correlates .77 (SE = .01) with their mathematics achievement 1 year later in the spring of first grade, which correlates .80 (SE = .01) with their mathematics achievement 2 years later in the spring of third grade, which in turn correlates .89 (SE = .01) with their mathematics achievement 2 years later in the spring of fifth grade.
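The increasing-stability prediction can be checked analytically for a stripped-down version of the Figure 4 model. The sketch below rests on illustrative assumptions and is not the specification estimated in the article: every score loads on a single persistent factor T with variance 1, each later score also depends on the previous score through one common MS path, waves are equally spaced, and residuals are independent with equal variance. For parameter values like those below, the persistent factor’s contribution accumulates across waves, so adjacent-wave correlations rise even though the MS path is constant. The values are hypothetical and are not fitted to the ECLS-K correlations.

```python
from math import sqrt


def implied_adjacent_correlations(n_waves, ms, trait_load, first_load=None, resid_var=1.0):
    """Adjacent-wave correlations under a simplified, hypothetical version of Figure 4.

    y_1 = first_load * T + e_1
    y_t = ms * y_(t-1) + trait_load * T + e_t     for t >= 2
    with Var(T) = 1 and independent residuals e_t of variance resid_var.
    Each y_t is written as c_t * T + u_t, where u_t collects the residuals.
    """
    if n_waves < 2:
        raise ValueError("need at least two waves")
    if first_load is None:
        first_load = trait_load

    coefs = [first_load]      # c_t: total loading of y_t on T
    u_vars = [resid_var]      # Var(u_t)
    for _ in range(n_waves - 1):
        coefs.append(ms * coefs[-1] + trait_load)
        u_vars.append(ms ** 2 * u_vars[-1] + resid_var)

    corrs = []
    for t in range(n_waves - 1):
        # Cov(y_t, y_(t+1)) = c_t * c_(t+1) * Var(T) + Cov(u_t, u_(t+1)),
        # and Cov(u_t, u_(t+1)) = ms * Var(u_t) because u_(t+1) = ms * u_t + e_(t+1).
        cov = coefs[t] * coefs[t + 1] + ms * u_vars[t]
        var_t = coefs[t] ** 2 + u_vars[t]
        var_next = coefs[t + 1] ** 2 + u_vars[t + 1]
        corrs.append(cov / sqrt(var_t * var_next))
    return corrs


# Hypothetical parameters (not fitted to ECLS-K): MS path .35, trait loading .5.
print([round(r, 3) for r in implied_adjacent_correlations(4, ms=0.35, trait_load=0.5)])
```

The rise here comes entirely from the accumulating loading on T; setting `trait_load=0` removes it.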

Fourth, the model can be easily adapted into an experimental design, in which the first wave of posttreatment achievement and the stable latent variable are simultaneously regressed on treatment status. Relying on the TRIAD data used previously, Watts and colleagues (2016) found a substantial impact of the intervention on children’s math skills but no effect on the latent variable representing stable factors that influence children’s achievement across development. However, this finding warrants replication under conditions in which persistence may be most likely, including for subgroups, treatments, and populations for which skills affected by the intervention are least likely to develop under counterfactual conditions.

To be sure, the model depicted in Figure 4 leaves much to be desired because it merely assigns a key role to unmeasured persistent factors but does not identify them. As discussed previously, the ideal test of what constitutes a persistent factor, and indeed, of whether such a factor truly exists, is to regress the unmeasured persistent factor on a source of exogenous variation, such as a randomly assigned intervention. In correlational data sets, the unmeasured persistent factor can be regressed on hypothesized sources of persistent variation (Bailey, Watts, Littlefield, & Geary, 2014), but this analysis is vulnerable to problems associated with all cross-sectional regression analyses, such as omitted variable bias (OVB). Furthermore, the model is only one of many possible explanations for how early and later mathematics achievement are related. We hope other models will be compared with the one we use here, both on the basis of fit to correlational data sets and their predictions about experimentally induced effects across time.

Comparing the Persistent Factor Model With Experimental Estimates

Assuming no effects of early mathematics interventions on the unmeasured persistent factor (an assumption that deserves continued scrutiny), the key model parameters for estimating the pattern of treatment effects over time are the MS paths in Figure 4. Table 1 shows estimates of the MS paths from all of the correlational studies of the development of children’s math achievement across prekindergarten and the elementary school years to which the state-trait model has been applied, to our knowledge. The average 1-year lagged MS paths from the three data sets are all modest, ranging from .29 to .35, depending on how the estimates are averaged and whether they are weighted by their underlying sample sizes. Put another way, the model depicted in Figure 4 predicts that the treatment effect should decay to approximately one third of its previous magnitude each year.
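The three averages can be reproduced from the state-trait rows of Table 1. In the sketch below, the “simple average” is the mean of all eight rows, the “unweighted study average” is the mean of the three study means, and the “weighted study average” weights each study mean by its sample size. That last scheme is our reading, chosen because it reproduces the reported .35; the text does not spell it out. The helper that converts a multiyear path to a 1-year equivalent follows the Table 1 note (raise the estimate to the power 1/t); the function names are hypothetical.

```python
from collections import defaultdict

# State-trait rows from Table 1: (study, sample size, implied 1-year MS path).
STATE_TRAIT_ROWS = [
    ("Missouri Math Study", 292, 0.26),
    ("Missouri Math Study", 292, 0.18),
    ("Missouri Math Study", 292, 0.20),
    ("SECCYD", 1124, 0.58),
    ("SECCYD", 1124, 0.30),
    ("TRIAD", 834, 0.25),
    ("TRIAD", 834, 0.04),
    ("TRIAD", 834, 0.51),
]


def one_year_path(estimate, years):
    """Convert a path spanning `years` years to a 1-year equivalent: estimate ** (1/years)."""
    return estimate ** (1.0 / years)


def averages(rows):
    """Return (simple, unweighted study, sample-size-weighted study) averages."""
    simple = sum(ms for _, _, ms in rows) / len(rows)

    by_study = defaultdict(list)
    sizes = {}
    for study, n, ms in rows:
        by_study[study].append(ms)
        sizes[study] = n
    study_means = {s: sum(v) / len(v) for s, v in by_study.items()}

    unweighted = sum(study_means.values()) / len(study_means)
    total_n = sum(sizes.values())
    weighted = sum(sizes[s] * m for s, m in study_means.items()) / total_n
    return simple, unweighted, weighted


simple, unweighted, weighted = averages(STATE_TRAIT_ROWS)
print(round(simple, 2), round(unweighted, 2), round(weighted, 2))
```

Under a path near .35, an effect shrinks to about a third of its size each year, which is the “one third” figure in the text.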

How well do these estimates track patterns observed in the TRIAD experimental study? The bottom half of Table 1 shows estimates of MS paths implied by experimental impacts from three early mathematics interventions that followed children for at least a year after the conclusion of the intervention. One-year lagged MS paths can be calculated by dividing experimental impacts at the end of a 1-year period by the impacts at the beginning of that period. The average 1-year lagged MS path from the three studies ranged from .39 to .44, depending on how effects were weighted across studies. In other words, MS paths inferred from patterns of experimental impacts across time follow a pattern of decay that is only slightly less steep than that predicted by estimates derived from models based on a persistent latent factor. Patterns of impacts across time predicted by the estimated and inferred weighted study average MS paths appear in Figure 5. These are calculated using the formula MS^t, where MS is the estimated (.35) or inferred (.44) weighted study average MS path and t is the number of years since the end of the treatment. They are similar to each other and to the pattern of impacts in the TRIAD study (this is unsurprising for the inferred paths, given that these were based in large part on the TRIAD impact estimates). In fact, the average MS path estimated from the state-trait model falls within the confidence interval of every observed impact in the TRIAD study, whereas this is true of only one out of five regression-based estimates.
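The two calculations in this paragraph, inferring an MS path from successive experimental impacts and projecting impacts as MS^t, are sketched below. The function names and the example impacts are hypothetical; only the .35 and .44 values come from Table 1. Projected impacts are expressed as a fraction of the end-of-treatment impact.

```python
def inferred_ms_path(later_impact, earlier_impact, years=1.0):
    """MS path implied by two experimental impacts measured `years` apart.

    The ratio later/earlier is rescaled to a 1-year interval by raising it
    to the power 1/years, as described in the Table 1 note.
    """
    ratio = later_impact / earlier_impact
    if ratio <= 0:
        raise ValueError("impacts must have the same sign for a decay ratio")
    return ratio ** (1.0 / years)


def decay_curve(ms, n_years):
    """Impact t years after treatment as a fraction of the end-of-treatment impact: ms ** t."""
    return [ms ** t for t in range(n_years + 1)]


# Hypothetical impacts: 0.25 SD at the end of treatment, 0.10 SD a year later.
print(round(inferred_ms_path(0.10, 0.25), 2))  # 0.4

# Weighted study averages from Table 1: estimated (.35) and inferred (.44).
for label, ms in [("state-trait", 0.35), ("experimental", 0.44)]:
    print(label, [round(v, 3) for v in decay_curve(ms, 4)])
```

The sign check guards against taking a fractional power of a negative ratio, which would not correspond to a decay rate.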

Table 1
Estimates of Math Skills (MS) Paths From Observational and Experimental Data

| Source | Sample size | Period | Implied 1-year MS estimate |
|---|---|---|---|
| State-trait estimates | | | |
| Bailey, Watts, et al., 2014: Missouri Math Study | 292 | Grade 1–Grade 2 | .26 |
| Bailey, Watts, et al., 2014: Missouri Math Study | 292 | Grade 2–Grade 3 | .18 |
| Bailey, Watts, et al., 2014: Missouri Math Study | 292 | Grade 3–Grade 4 | .20 |
| Bailey, Watts, et al., 2014: SECCYD | 1124 | Grade 1–Grade 3 | .58 |
| Bailey, Watts, et al., 2014: SECCYD | 1124 | Grade 3–Grade 5 | .30 |
| Watts et al., 2016: TRIAD | 834 | PreK–K | .25 |
| Watts et al., 2016: TRIAD | 834 | K–Grade 1 | .04 |
| Watts et al., 2016: TRIAD | 834 | Grade 1–Grade 4 | .51 |
| Simple average | | | .29 |
| Unweighted study average | | | .31 |
| Weighted study average | | | .35 |
| Experimental estimates | | | |
| Current article: TRIAD | 834 | PreK–K | .46 |
| Current article: TRIAD | 834 | K–Grade 1 | .48 |
| Current article: TRIAD | 834 | Grade 1–Grade 4 | .48 (a) |
| Current article: TRIAD | 834 | Grade 4–Grade 5 | N/A (b) |
| Hofer et al., 2013: TRIAD | 1192 | PreK–K | .28 |
| Hofer et al., 2013: TRIAD | 1129 | K–Grade 1 | .67 |
| Smith et al., 2013 | 320 | Grade 1–Grade 2 | .22 (c) |
| Simple average | | | .43 |
| Unweighted study average | | | .39 |
| Weighted study average | | | .44 |

Note. SECCYD = Study of Early Child Care and Youth Development (see NICHD Early Child Care Research Network, 2002); TRIAD = Technology-Enhanced, Research-Based, Instruction, Assessment, and Professional Development. One-year MS estimates from intervals with multiple lags are calculated by raising the reported estimate to the power of 1/t, where t is the number of years in the given interval. MS estimates from experimental studies are calculated by dividing a treatment effect by a prior treatment effect and are corrected using the same exponential transformation when the interval between measurements differs from 1 year. All state-trait models corrected correlations among math tests for measurement error by setting the path from the latent state factor to the mathematics measure equal to the square root of the reliability of the test. For information regarding the Missouri Math Study, see Geary, 2010. Information regarding the TRIAD study is presented in the Appendix and in Clements et al., 2013.
(a) Fourth grade is the average of two fourth-grade scores. Average interval is 2.75 years.
(b) Fifth-grade estimate is higher than fourth-grade estimate; neither is statistically distinguishable from 0.
(c) Treatment effects were calculated as the average of the three standardized mathematics tests administered at both waves.
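The measurement-error correction described in the note can be illustrated for the simplest case of two standardized tests. Assuming each observed score equals the square root of its reliability times a standardized latent score plus independent error, the observed correlation equals the product of the two loadings times the latent correlation; fixing loadings at the square root of reliability therefore amounts to the classical correction for attenuation. The sketch below is our illustration of that equivalence, not the estimation code used for the state-trait models, and the example numbers are hypothetical.

```python
from math import sqrt


def loading_from_reliability(reliability):
    """Standardized loading of a test on its latent state when the loading is
    fixed at the square root of the test's reliability."""
    return sqrt(reliability)


def disattenuate(r_observed, reliability_x, reliability_y):
    """Classical correction for attenuation: the correlation between the
    error-free (latent) scores implied by an observed correlation."""
    return r_observed / (loading_from_reliability(reliability_x) * loading_from_reliability(reliability_y))


# Hypothetical example: two tests with reliability .90 that correlate .72.
print(round(disattenuate(0.72, 0.90, 0.90), 3))  # 0.8
```

Because measurement error lowers observed correlations, correcting for it raises the stability estimates that feed into the MS paths.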

These patterns may generalize well to other kinds of studies of children’s academic achievement outcomes. In a review of experimental estimates from 67 high-quality studies of early childhood education programs, Li and colleagues (2016) reported an average end-of-treatment effect size of .23, with estimates 0–1 year after treatment averaging approximately .10. Impact estimates from subsequent waves were smaller, but their precise values were sensitive to inclusion criteria.

As mentioned previously in the discussion of Figure 2, correlational data track experimental impact estimates from TRIAD much more closely at the end of kindergarten than in later grades. In light of the estimates of the MS paths shown in Table 1, the model depicted in Figure 4 appears to track experimental estimates more closely in later grades than at the end of kindergarten. An obvious possible explanation for the larger 1-year lagged MS paths inferred from experimental estimates is some misspecification or bias in the Figure 4 model. Another possible explanation for the difference is that early mathematics interventions generate transitory impacts on a broader set of children’s capacities (e.g., oral language [Sarama et al., 2012] or motivation), which independently boost children’s later mathematics achievement. In the latter case, the state-trait estimates would actually provide more accurate estimates of MS paths than those inferred from experimental impacts; in either case, however, the experimental impacts are more policy-relevant than the state-trait estimates.

In summary, a model that allows for persistent unmeasured factors produces estimates consistent with the exponential decay in treatment effects observed in the most relevant set of experimental studies on children’s early mathematics achievement, whereas traditional methods fail to do so after the first year or so. Notably, the correlational estimates most in line with experimental data were produced by the state-trait model from the largest sample to which it has been applied, which also spanned the longest time interval. In our view, this model has advantages over standard ordinary least squares regression at approximating experimental impacts because it considers persistent unmeasured factors influencing children across development. However, the model does not resolve several important issues. First, it is unclear whether this particular model is the ideal specification (see Cole et al., 2005, for a discussion, and Hamaker et al., 2015, for alternatives), or whether intervention effects are likely to be confined to occasion-specific variation rather than to stable environmental and personal factors. Furthermore, more research should investigate what variables constitute the stable environmental and personal factors that influence children’s mathematics learning throughout development.

What Stable Factors Are Missing From Our Regression Models?

As noted previously, we think that the “unmeasured persistent factors” in Figure 4 that influence children’s academic and social development are in all likelihood a set of stable environmental and personal factors. Probable influences on child achievement throughout development include domain-general cognitive abilities, personality, and environmental affordances. Intelligence and working memory have been strongly implicated as key drivers of children’s academic development in correlational studies (Deary, Strand, Smith, & Fernandes, 2007; Geary, Hoard, Nugent, & Bailey, 2012; Szücs, Devine, Soltesz, Nobes, & Gabriel, 2014), so much so that a general factor extracted from various cognitive tests was found to correlate .83 with a general factor extracted from academic achievement tests (Kaufman, Reynolds, Liu, Kaufman, & McGrew, 2012). Personality also likely plays a significant and complex role in children’s academic development.

Figure 5. Correlations inferred from math skills (MS) path estimates in Table 1.

Both personality and domain-general cognitive abilities are substantially influenced by differences in both genes and environments (Bouchard & McGue, 2003). Aside from latent environmental effects inferred from imperfect correlations between identical twins, strong designs have also identified effects of measured environments, such as adoption (Kendler, Turkheimer, Ohlsson, Sundquist, & Sundquist, 2015; van IJzendoorn, Juffer, & Poelhuis, 2005), maternal nutrition during prenatal development (Almond & Mazumder, 2011), or a very intensive early childhood education program (Campbell, Pungello, Miller-Johnson, Burchinal, & Ramey, 2001), on cognitive abilities many years later.

Implications for Design and Analysis in Developmental Research

We have argued that commonly used approaches to inferring skill-building processes from longitudinal correlations are based on insufficiently risky tests. In particular, we have shown that longitudinal correlations imply a much stronger skill-building process than does more direct evidence from experimental studies; that the similarity of within- and cross-domain correlations over time constitutes a falsification test that a simple math skill-building model does not pass; and that at least one alternative developmental model, which accounts for unmeasured factors, better reproduces the declining pattern of impacts generated by a large random-assignment evaluation of an intensive math skills intervention.

Although our review has been confined to mathematics learning, we suspect that we would find similar patterns in data on literacy, given the parallel nature of skill-building in those two domains of learning and the similar patterns of within-domain longitudinal correlations shown in Figure 1. Whether our conclusions about math generalize to other domains of interest in developmental research, such as antisocial behavior or executive functions, is less obvious. The differing heights of the math-to-reading and math-to-antisocial-behavior correlation lines shown in Figure 1 suggest that whatever latent factor may underlie math and reading trajectories does not substantially affect antisocial behavior across development. However, an analogous developmental story may apply: Correlations between kindergarten antisocial behavior and antisocial behavior at subsequent waves follow a pattern similar to those for children’s academic skills (Figure 1), suggesting that other factors may generate stability in children’s antisocial behavior across time. If these factors are not well measured, the autoregressive effects of antisocial behavior will be exaggerated. Consistent with this possibility, Anusic and Schimmack (2015) observed substantial stability in the interwave correlations of personality, affect, self-esteem, and life satisfaction, with the highest stability observed in personality (see Footnote 4). It is important to emphasize that the existence of effects of stable environmental and personal factors on children’s academic development does not preclude skill-building processes, nor does the existence of empirically stable environmental and personal factors imply that such factors are immutable.

Better measurement of children’s skills enables riskier tests of developmental hypotheses. Measures in large longitudinal studies are often based on single scores rather than on more cognitively complex and diagnostic assessments. Thus, children assigned the same score are assumed to have the same knowledge state; for example, children who raised their scores by participating in an intervention are treated as equivalent to control children who achieved the same score without the intervention (Bailey et al., 2016). The intervention may have taught the former certain concepts and skills, raising their scores. The control children, however, will likely have had a far longer and more extensive set of experiences that led to the same score. For example, they may have built parallel distributed processing networks of broad reach across the brain, which, because they have been reinforced for years, have established retrieval paths that are myriad, strong, and stable. The intervention children have none of these advantages. Therefore, future measures of the two groups of children may yield different scores even if subsequent experiences are the same. Future assessments and research designs, including those that investigate measurement invariance across subgroups (Wicherts, 2016), are needed to investigate such possibilities.

Some research has included much more specific measures of children’s knowledge in longitudinal studies. The advantage of this approach is that such studies can, in principle, provide riskier tests of skill-building theories by relating gains in specific knowledge states to subsequent knowledge states. If researchers are testing a very specific theory of learning, this approach enables them to make very specific predictions about where nonzero estimated effects should appear, how big they should be, and, perhaps most importantly, when they should vanish. This approach is what Shadish, Cook, and Campbell (2002) refer to as coherent pattern matching. The downsides to this approach are that (a) better measurement in one domain often comes at the cost of lower sample size and/or worse measurement in other domains, including family background characteristics; (b) some developmental theories are sufficiently open-ended that a very wide range of estimated effects can be argued to be plausible after the fact; and (c) in the context of early cognitive development, individual differences may be sufficiently nondifferentiated that differential prediction of later outcomes better reflects the extent to which a measure captures commonalities in knowledge states rather than uniqueness to a specific knowledge state (Paris, 2005; Purpura & Lonigan, 2013; Schenke, Lam, Rutherford, & Bailey, 2016). These shortcomings may be addressed by precision and completeness in measurement and theory and by formulating the riskiest tests possible. We see as particularly useful the practice of comparing correlational estimates to the most relevant experimental impacts whenever possible.

4 Anusic and Schimmack (2015) reported values of “1-year stability of the change component” comparable to the 1-year MS paths reported in Table 1 of this article. For personality, affect, self-esteem, and life satisfaction, the authors reported values of .25, .88, .79, and .78, respectively. Thus, the .35 estimate in Table 1 indicates that inter-individual stability across time in children’s mathematics achievement more closely resembles the pattern observed for personality than for affect, self-esteem, or life satisfaction.

Following a demonstration of mismatched correlational and experimental findings, it is common to call for more randomized controlled trials. We certainly endorse randomized controlled trials because they can provide the strongest evidence on skill-building processes, but we recognize that many (including most lab-based experiments) have limited external validity, sometimes target a bundle of constructs that may benefit children but render implications for developmental processes unclear, and rarely track longer-term persistence. We also endorse the pursuit of data from sibling, neighborhood, and school fixed-effects models and from so-called “natural experiments,” such as school policy changes that provide significantly more intensive academic training to a subset of students who can be compared with a very similar group of “untreated” students. An example is the “double-dose” algebra training introduced to low-achieving ninth graders in the Chicago Public Schools, with eligibility defined by an eighth-grade math test score cutoff (Cortes & Goodman, 2014). Natural experiments (including the double-dose program) are not unproblematic (they often face the same problems with external and construct validity), but they can help to adjudicate competing theories of learning.

Should we give up on the idea of using correlational data analysis to make theoretical or policy-relevant inferences about children’s academic development? We are not so pessimistic. Indeed, we believe that when exposed to riskier tests and informed by prior experimental work, correlational data analyses can also help triangulate to the most useful theories of children’s academic development: theories that can accurately predict when the effects of academic interventions will fade out or persist.

References

Almond, D., & Mazumder, B. (2011). Health capital and the prenatalenvironment: The effect of Ramadan observance during pregnancy.American Economic Journal: Applied Economics, 3, 56–85. http://dx.doi.org/10.1257/app.3.4.56

Anusic, I., & Schimmack, U. (2015). Stability and change of personalitytraits, self-esteem, and well-being: Introducing the meta-analytic stabil-ity and change model of retest correlations. Journal of Personality andSocial Psychology, 110, 766–781.

Aunola, K., Leskinen, E., Lerkkanen, M.-L., & Nurmi, J.-E. (2004).Developmental dynamics of math performance from pre-school to Grade2. Journal of Educational Psychology, 96, 699–713. http://dx.doi.org/10.1037/0022-0663.96.4.699

Bailey, D. H., Duncan, G., Odgers, C., & Yu, W. (2017). Persistence andfadeout in the impacts of child and adolescent interventions (WorkingPaper No. 2015–27). Retrieved from: http://www.lifecoursecentre.org.au/working-papers/persistence-and-fadeout-in-the-impacts-of-child-and-adolescent-interventions

Bailey, D. H., Nguyen, T., Jenkins, J. M., Domina, T., Clements, D. H., &Sarama, J. S. (2016). Fadeout in an early mathematics intervention:Constraining content or preexisting differences? Developmental Psy-chology, 52, 1457–1469. http://dx.doi.org/10.1037/dev0000188

Bailey, D. H., Siegler, R. S., & Geary, D. C. (2014). Early predictors ofmiddle school fraction knowledge. Developmental Science, 17, 775–785. http://dx.doi.org/10.1111/desc.12155

Bailey, D. H., Watts, T. W., Littlefield, A. K., & Geary, D. C. (2014). Stateand trait effects on individual differences in children’s mathematicaldevelopment. Psychological Science, 25, 2017–2026. http://dx.doi.org/10.1177/0956797614547539

Baroody, A. J. (1987). The development of counting strategies for single-digit addition. Journal for Research in Mathematics Education, 18,141–157. http://dx.doi.org/10.2307/749248

Bayley, N. (1949). Consistency and variability in the growth of intelligencefrom birth to eighteen years. The Pedagogical Seminary and Journal ofGenetic Psychology, 75, 165–196. http://dx.doi.org/10.1080/08856559.1949.10533516

Bouchard, T. J., Jr., & McGue, M. (2003). Genetic and environmentalinfluences on human psychological differences. Journal of Neurobiol-ogy, 54, 4–45. http://dx.doi.org/10.1002/neu.10160

Bus, A. G., & van IJzendoorn, M. H. (1999). Phonological awareness andearly reading: A meta-analysis of experimental training studies. Journalof Educational Psychology, 91, 403–414. http://dx.doi.org/10.1037/0022-0663.91.3.403

Campbell, F. A., Pungello, E. P., Miller-Johnson, S., Burchinal, M., &Ramey, C. T. (2001). The development of cognitive and academicabilities: Growth curves from an early childhood educational experi-ment. Developmental Psychology, 37, 231–242. http://dx.doi.org/10.1037/0012-1649.37.2.231

Clements, D. H., & Sarama, J. (2008). Experimental evaluation of theeffects of a research-based preschool mathematics curriculum. AmericanEducational Research Journal, 45, 443–494. http://dx.doi.org/10.3102/0002831207312908

Clements, D. H., & Sarama, J. (2013). Building blocks (Vol. 1 and 2).Columbus, OH: McGraw-Hill Education.

Clements, D. H., Sarama, J., Khasanova, E., & Van Dine, D. W. (2012). TEAM 3–5: Tools for elementary assessment in mathematics. Denver, CO: University of Denver.

Clements, D. H., Sarama, J. H., & Liu, X. H. (2008). Development of a measure of early mathematics achievement using the Rasch model: The Research-Based Early Maths Assessment. Educational Psychology, 28, 457–482. http://dx.doi.org/10.1080/01443410701777272

Clements, D. H., Sarama, J., Layzer, C., Unlu, F., Wolfe, C. B., Fesler, L., . . . Spitler, M. E. (2017). Effects of TRIAD on mathematics achievement: Long-term impacts. Manuscript submitted for publication.

Clements, D. H., Sarama, J., Spitler, M. E., Lange, A. A., & Wolfe, C. B. (2011). Mathematics learned by young children in an intervention based on learning trajectories: A large-scale cluster randomized trial. Journal for Research in Mathematics Education, 42, 127–166.

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

92 BAILEY, DUNCAN, WATTS, CLEMENTS, AND SARAMA

Clements, D. H., Sarama, J., Wolfe, C. B., & Spitler, M. E. (2013). Longitudinal evaluation of a scale-up model for teaching mathematics with trajectories and technologies: Persistence of effects in the third year. American Educational Research Journal, 50, 812–850. http://dx.doi.org/10.3102/0002831212469270

Cole, D. A., Martin, N. C., & Steiger, J. H. (2005). Empirical and conceptual problems with longitudinal trait-state models: Introducing a trait-state-occasion model. Psychological Methods, 10, 3–20. http://dx.doi.org/10.1037/1082-989X.10.1.3

Cortes, K. E., & Goodman, J. S. (2014). Ability-tracking, instructional time, and better pedagogy: The effect of Double-Dose Algebra on student achievement. The American Economic Review, 104, 400–405. http://dx.doi.org/10.1257/aer.104.5.400

Cunha, F., & Heckman, J. J. (2007). The technology of skill formation. The American Economic Review, 97, 31–47. http://dx.doi.org/10.1257/aer.97.2.31

Deary, I. J., Strand, S., Smith, P., & Fernandes, C. (2007). Intelligence and educational achievement. Intelligence, 35, 13–21. http://dx.doi.org/10.1016/j.intell.2006.02.001

Duncan, G. J., Dowsett, C. J., Claessens, A., Magnuson, K., Huston, A. C., Klebanov, P., . . . Japel, C. (2007). School readiness and later achievement. Developmental Psychology, 43, 1428–1446. http://dx.doi.org/10.1037/0012-1649.43.6.1428

Engel, M., Claessens, A., Watts, T. W., & Farkas, G. (2016). Mathematics content coverage and student learning in kindergarten. Educational Researcher, 45, 293–300. http://dx.doi.org/10.3102/0013189X16656841

Fraley, R. C., & Roisman, G. I. (2015). Do early caregiving experiences leave an enduring or transient mark on developmental adaptation? Current Opinion in Psychology, 1, 101–106. http://dx.doi.org/10.1016/j.copsyc.2014.11.007

Geary, D. C. (2010). Missouri longitudinal study of mathematical development and disability. British Journal of Educational Psychology Monograph Series II, 7, 31–49. http://dx.doi.org/10.1348/97818543370009X12583699332410

Geary, D. C., Hoard, M. K., Nugent, L., & Bailey, D. H. (2012). Mathematical cognition deficits in children with learning disabilities and persistent low achievement: A five-year prospective study. Journal of Educational Psychology, 104, 206–223. http://dx.doi.org/10.1037/a0025398

Geary, D. C., Hoard, M. K., Nugent, L., & Bailey, D. H. (2013). Adolescents’ functional numeracy is predicted by their school entry number system knowledge. PLoS ONE, 8, e54651. http://dx.doi.org/10.1371/journal.pone.0054651

Gresham, F., & Elliot, S. (1990). Social skills rating system. Circle Pines, MN: American Guidance Services, Inc.

Hamaker, E. L., Kuiper, R. M., & Grasman, R. P. (2015). A critique of the cross-lagged panel model. Psychological Methods, 20, 102–116. http://dx.doi.org/10.1037/a0038889

Hofer, K. G., Lipsey, M. W., Dong, N., & Farran, D. (2013). Results of the early math project: Scale-up cross-site results. Nashville, TN: Vanderbilt University. Retrieved from https://my.vanderbilt.edu/mathfollowup/reports/technicalreports

Jordan, N. C., Kaplan, D., Ramineni, C., & Locuniak, M. N. (2009). Early math matters: Kindergarten number competence and later mathematics outcomes. Developmental Psychology, 45, 850–867. http://dx.doi.org/10.1037/a0014939

Kaufman, S. B., Reynolds, M. R., Liu, X., Kaufman, A. S., & McGrew, K. S. (2012). Are cognitive g and academic achievement g one and the same g? An exploration on the Woodcock–Johnson and Kaufman tests. Intelligence, 40, 123–138. http://dx.doi.org/10.1016/j.intell.2012.01.009

Kendler, K. S., Turkheimer, E., Ohlsson, H., Sundquist, J., & Sundquist, K. (2015). Family environment and the malleability of cognitive ability: A Swedish national home-reared and adopted-away cosibling control study. Proceedings of the National Academy of Sciences of the United States of America, 112, 4612–4617. http://dx.doi.org/10.1073/pnas.1417106112

Kilpatrick, J., Swafford, J., & Findell, B. (Eds.). (2001). Adding it up: Helping children learn mathematics. Washington, DC: National Academies Press.

Lemaire, P., & Siegler, R. S. (1995). Four aspects of strategic change: Contributions to children’s learning of multiplication. Journal of Experimental Psychology: General, 124, 83–97. http://dx.doi.org/10.1037/0096-3445.124.1.83

Li, W., Leak, J., Duncan, G. J., Magnuson, K., Schindler, H., & Yoshikawa, H. (2016). Is timing everything? How early childhood education program impacts vary by starting age, program duration and time since the end of the program (Working Paper). National Forum on Early Childhood Policy and Programs, Meta-analytic Database Project. Center on the Developing Child, Harvard University.

Massachusetts Department of Elementary and Secondary Education. (2011). Massachusetts curriculum framework for mathematics. Malden, MA: Massachusetts Department of Elementary and Secondary Education.

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834. http://dx.doi.org/10.1037/0022-006X.46.4.806

Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66, 195–244. http://dx.doi.org/10.2466/pr0.1990.66.1.195

Moffitt, T. E. (1993). Adolescence-limited and life-course-persistent antisocial behavior: A developmental taxonomy. Psychological Review, 100, 674–701. http://dx.doi.org/10.1037/0033-295X.100.4.674

National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics (Vol. 1). Reston, VA: National Council of Teachers of Mathematics.

Nguyen, T., Watts, T. W., Duncan, G. J., Clements, D. H., Sarama, J. S., Wolfe, C., & Spitler, M. E. (2016). Which preschool mathematics competencies are most predictive of fifth grade achievement? Early Childhood Research Quarterly, 36, 550–560. http://dx.doi.org/10.1016/j.ecresq.2016.02.003

NICHD Early Child Care Research Network. (2002). Early child care and children’s development prior to school entry: Results from the NICHD Study of Early Child Care. American Educational Research Journal, 39, 133–164. http://dx.doi.org/10.3102/00028312039001133

OECD. (2016). The Survey of Adult Skills: Reader’s Companion, Second Edition. OECD Skills Studies. Paris: OECD Publishing. http://dx.doi.org/10.1787/9789264258075-en

Paris, S. G. (2005). Reinterpreting the development of reading skills. Reading Research Quarterly, 40, 184–202. http://dx.doi.org/10.1598/RRQ.40.2.3

Perkins, D. N., & Salomon, G. (1988). Teaching for transfer. Educational Leadership, 46, 22–32.

Pollack, J. M., Rock, D. A., Weiss, M. J., Burnett, S. A., Tourangeau, K., West, J., & Hausken, E. G. (2005a). Early Childhood Longitudinal Study–Kindergarten Class of 1998–99 (ECLS-K), psychometric report for the fifth grade. Washington, DC: U.S. Department of Education, National Center for Education Statistics. http://dx.doi.org/10.1037/e428752005-001

Pollack, J. M., Rock, D. A., Weiss, M. J., Burnett, S. A., Tourangeau, K., West, J., & Hausken, E. G. (2005b). Early Childhood Longitudinal Study–Kindergarten Class of 1998–99 (ECLS-K), psychometric report for the third grade. Washington, DC: U.S. Department of Education, National Center for Education Statistics. http://dx.doi.org/10.1037/e609682011-001

Prenoveau, J. M. (2016). Specifying and interpreting latent state–trait models with autoregression: An illustration. Structural Equation Modeling, 23, 731–749. http://dx.doi.org/10.1080/10705511.2016.1186550


93 RISKY BUSINESS

Protzko, J. (2015). The environment in raising early intelligence: A meta-analysis of the fadeout effect. Intelligence, 53, 202–210. http://dx.doi.org/10.1016/j.intell.2015.10.006

Puma, M., Bell, S., Cook, R., Heid, C., Broene, P., Jenkins, F., . . . Downer, J. (2012). Third Grade Follow-up to the Head Start Impact Study Final Report, OPRE Report # 2012–45. Washington, DC: Office of Planning, Research and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services.

Purpura, D. J., & Lonigan, C. J. (2013). Informal numeracy skills: The structure and relations among numbering, relations, and arithmetic operations in preschool. American Educational Research Journal, 50, 178–209.

Reinhart, A. L., Haring, S. H., Levin, J. R., Patall, E. A., & Robinson, D. H. (2013). Models of not-so-good behavior: Yet another way to squeeze causality and recommendations for practice out of correlational data. Journal of Educational Psychology, 105, 241–247. http://dx.doi.org/10.1037/a0030368

Rock, D. A., & Pollack, J. M. (2002). Early Childhood Longitudinal Study–Kindergarten Class of 1998–99 (ECLS-K), psychometric report for kindergarten through first grade. Washington, DC: U.S. Department of Education, National Center for Education Statistics.

Sarama, J., Lange, A. A., Clements, D. H., & Wolfe, C. B. (2012). The impacts of an early mathematics curriculum on oral language and literacy. Early Childhood Research Quarterly, 27, 489–502. http://dx.doi.org/10.1016/j.ecresq.2011.12.002

Scarr, S., & McCartney, K. (1983). How people make their own environments: A theory of genotype greater than environment effects. Child Development, 54, 424–435.

Schenke, K., Lam, A. C., Rutherford, T., & Bailey, D. H. (2016). Construct confounding among predictors of mathematics achievement. AERA Open, 2. http://dx.doi.org/10.1177/2332858416648930

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Siegler, R. S., Duncan, G. J., Davis-Kean, P. E., Duckworth, K., Claessens, A., Engel, M., . . . Chen, M. (2012). Early predictors of high school mathematics achievement. Psychological Science, 23, 691–697. http://dx.doi.org/10.1177/0956797612440101

Smith, T. M., Cobb, P., Farran, D. C., Cordray, D. S., & Munter, C. (2013). Evaluating math recovery: Assessing the causal impact of a diagnostic tutoring program on student achievement. American Educational Research Journal, 50, 397–428. http://dx.doi.org/10.3102/0002831212469045

Stanovich, K. E. (1986). Matthew effects in reading: Some consequences of individual differences in the acquisition of literacy. Reading Research Quarterly, 21, 360–407. http://dx.doi.org/10.1598/RRQ.21.4.1

Steyer, R. (1987). Konsistenz und Spezifitaet: Definition zweier zentraler Begriffe der Differentiellen Psychologie und ein einfaches Modell zu ihrer Identifikation [Consistency and specificity: Definition of two central concepts of differential psychology and a simple model for their identification]. Zeitschrift für Differentielle und Diagnostische Psychologie, 8, 245–258.

Steyer, R., & Schmitt, T. (1994). The theory of confounding and its application in causal modeling with latent variables. In A. von Eye & C. C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 36–67). Thousand Oaks, CA: Sage.

Szücs, D., Devine, A., Soltesz, F., Nobes, A., & Gabriel, F. (2014). Cognitive components of a mathematical processing network in 9-year-old children. Developmental Science, 17, 506–524. http://dx.doi.org/10.1111/desc.12144

Tourangeau, K., Nord, C., Lê, T., Sorongon, A. G., Najarian, M., & Hausken, E. G. (2009). Early childhood longitudinal study, kindergarten class of 1998–99 (ECLS-K) combined user’s manual for the ECLS-K Eighth-Grade and K-8 full sample data files and electronic codebook. Washington, DC: U.S. Department of Education, National Center for Education Statistics.

Tucker-Drob, E. M., & Briley, D. A. (2014). Continuity of genetic and environmental influences on cognition across the life span: A meta-analysis of longitudinal twin and adoption studies. Psychological Bulletin, 140, 949–979. http://dx.doi.org/10.1037/a0035893

van IJzendoorn, M. H., Juffer, F., & Poelhuis, C. W. K. (2005). Adoption and cognitive development: A meta-analytic comparison of adopted and nonadopted children’s IQ and school performance. Psychological Bulletin, 131, 301–316. http://dx.doi.org/10.1037/0033-2909.131.2.301

Watts, T. W., Clements, D. H., Sarama, J., Wolfe, C. B., Spitler, M. E., & Bailey, D. H. (2016). Does early mathematics intervention change the processing underlying children’s mathematics achievement? Journal of Research on Educational Effectiveness.

Watts, T. W., Duncan, G. J., Clements, D. H., & Sarama, J. (2017). What is the long-run impact of learning mathematics during preschool? Child Development. Advance online publication. http://dx.doi.org/10.1111/cdev.12713

Watts, T. W., Duncan, G. J., Siegler, R. S., & Davis-Kean, P. E. (2014). What’s past is prologue: Relations between early mathematics knowledge and high school achievement. Educational Researcher, 43, 352–360. http://dx.doi.org/10.3102/0013189X14553660

Weiland, C., Wolfe, C. B., Hurwitz, M. D., Clements, D. H., Sarama, J. H., & Yoshikawa, H. (2012). Early mathematics assessment: Validation of the short form of a prekindergarten and kindergarten mathematics measure. Educational Psychology, 32, 311–333. http://dx.doi.org/10.1080/01443410.2011.654190

Wicherts, J. M. (2016). The importance of measurement invariance in neurocognitive ability testing. The Clinical Neuropsychologist, 30, 1006–1016. http://dx.doi.org/10.1080/13854046.2016.1205136

Received August 22, 2016
Revision received December 21, 2016
Accepted January 22, 2017
